Track training loss while using Doc2Vec #2983

@skwolvie

Description

Problem description

I am trying to track the training loss of a Doc2Vec model, but it fails (see the traceback below). Is there a way to track training loss in Doc2Vec?
Also, I couldn't find any documentation on early stopping during the Doc2Vec training phase.

The similarity scores vary a lot depending on the number of epochs, and I want to use callbacks to stop training once the model reaches its best performance. Keras has an EarlyStopping callback for this, but I'm not sure how to do the same with gensim models.

Any response is appreciated. Thank you!
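Gensim does not appear to ship a Keras-style EarlyStopping callback. A common workaround is to build the vocabulary once and then call `train()` one epoch at a time, checking a validation score between calls and stopping when it has not improved for a few epochs. This is a minimal sketch under assumptions: the model is taken to expose a gensim-style `train(corpus, total_examples=..., epochs=..., start_alpha=..., end_alpha=...)` method plus `alpha` and `min_alpha` attributes, and `train_with_early_stopping` and `evaluate(model)` (for example, the similarity check mentioned above, returning higher-is-better scores) are hypothetical names, not part of gensim.

```python
def train_with_early_stopping(model, corpus, evaluate, max_epochs=50,
                              patience=3, min_delta=0.0):
    """Hypothetical helper: train one epoch at a time, stop when `evaluate` plateaus.

    Assumes a gensim-style model (train(), alpha, min_alpha) whose vocabulary
    has already been built, and a user-supplied evaluate(model) -> float where
    higher is better. Returns (best_epoch, list_of_scores).
    """
    start, end = model.alpha, model.min_alpha
    best_score, best_epoch, history = float("-inf"), -1, []
    for epoch in range(max_epochs):
        # Spread one linear learning-rate decay over all manual epochs,
        # instead of restarting the alpha -> min_alpha schedule on every call.
        a0 = start - (start - end) * epoch / max_epochs
        a1 = start - (start - end) * (epoch + 1) / max_epochs
        model.train(corpus, total_examples=len(corpus), epochs=1,
                    start_alpha=a0, end_alpha=a1)
        score = evaluate(model)
        history.append(score)
        if score > best_score + min_delta:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` consecutive epochs
    return best_epoch, history
```

With gensim, this would be called after `model.build_vocab(tagged_data)`, passing `tagged_data` as `corpus`. Passing explicit `start_alpha`/`end_alpha` matters because a one-epoch `train()` call without them decays the learning rate from `alpha` to `min_alpha` within that single call. Note that the model keeps training until the patience runs out; to keep the weights from the best epoch rather than the last one, the model could be saved whenever the score improves.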

Steps/code/corpus to reproduce

from multiprocessing import cpu_count

from gensim.models.callbacks import CallbackAny2Vec
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from nltk.tokenize import word_tokenize  # needs the NLTK 'punkt' tokenizer data

# Assumed globals (not shown in the original report); values match the log below.
max_epochs = 50
vec_size = 100
cores = cpu_count()

class EpochLogger(CallbackAny2Vec):
    '''Callback to log information about training'''

    def __init__(self):
        self.epoch = 0

    def on_epoch_begin(self, model):
        print("Epoch #{} start".format(self.epoch))

    def on_epoch_end(self, model):
        print("Epoch #{} end".format(self.epoch))
        self.epoch += 1

epoch_logger = EpochLogger()

class LossLogger(CallbackAny2Vec):
    '''Output loss at each epoch'''
    def __init__(self):
        self.epoch = 1
        self.losses = []

    def on_epoch_begin(self, model):
        print(f'Epoch: {self.epoch}', end='\t')

    def on_epoch_end(self, model):
        # Raises AttributeError for Doc2Vec here (see the traceback below)
        loss = model.get_latest_training_loss()
        self.losses.append(loss)
        print(f'  Loss: {loss}')
        self.epoch += 1

loss_logger = LossLogger()

def train_model(data, ids, destination, alpha):

    print('\tTagging data .. ')
    # One TaggedDocument per input text, tagged with its id as a string
    tagged_data = [TaggedDocument(words=word_tokenize(str(_d).lower()), tags=[str(ids[i])]) for i, _d in enumerate(data)]

    print('\tPreparing model with the following parameters: epochs = {}, vector_size = {}, alpha = {} .. '.
          format(max_epochs, vec_size, alpha))

    model = Doc2Vec(vector_size=vec_size,
                    workers=max(1, cores // 2),  # at least one worker thread
                    alpha=alpha,  # initial learning rate
                    min_count=2,  # ignore words with total frequency below this
                    dm_mean=1,  # use the mean (not the sum) of the context vectors
                    dm=1,  # use PV-DM rather than PV-DBOW
                    callbacks=[epoch_logger, loss_logger])

    model.build_vocab(tagged_data, keep_raw_vocab=False, progress_per=100000)
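The callbacks passed to `Doc2Vec(...)` above can also monitor a quality score rather than a loss: a callback can compute any score in `on_epoch_end` and remember which epoch scored best. This is a minimal sketch; `ScoreTracker` and `score_fn(model)` are hypothetical (for instance, a similarity check on held-out document pairs returning a higher-is-better float). It only observes training and does not stop it, and the `try`/`except` fallback to `object` exists just so the sketch can run without gensim installed.

```python
try:
    from gensim.models.callbacks import CallbackAny2Vec
except ImportError:  # fallback so the sketch also runs without gensim
    CallbackAny2Vec = object


class ScoreTracker(CallbackAny2Vec):
    """Hypothetical callback: record a user-defined score after every epoch."""

    def __init__(self, score_fn):
        self.score_fn = score_fn  # assumed: score_fn(model) -> float, higher is better
        self.epoch = 0
        self.scores = []

    def on_epoch_end(self, model):
        score = self.score_fn(model)
        self.scores.append(score)
        print(f"Epoch #{self.epoch} score: {score:.4f}")
        self.epoch += 1

    def best_epoch(self):
        # Index of the first highest-scoring epoch, or None before any epoch ends.
        if not self.scores:
            return None
        return max(range(len(self.scores)), key=self.scores.__getitem__)
```

It would be passed alongside the existing loggers, e.g. `callbacks=[epoch_logger, ScoreTracker(my_score_fn)]`, and `best_epoch()` then suggests a value for `epochs` in a later run.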

Versions

Please provide the output of:

2017 4673
        Tagging data ..
        Preparing model with the following parameters: epochs = 50, vector_size = 100, alpha = 0.01 ..
        Beginning model training ..
                Iteration 0
                Learning Rate =  0.01
Epoch #0 start
Epoch: 1        Epoch #0 end
Traceback (most recent call last):
    loss = model.get_latest_training_loss()
AttributeError: 'Doc2Vec' object has no attribute 'get_latest_training_loss'
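The traceback shows that `LossLogger` assumes a `get_latest_training_loss()` method that the `Doc2Vec` object in this gensim version does not have; that method belongs to `Word2Vec` (which reports loss only when created with `compute_loss=True`). We are not aware of a Doc2Vec training-loss computation in gensim, so even where the method exists on a Doc2Vec model its value should be treated with caution. A minimal defensive variant, sketched here as a hypothetical `SafeLossLogger`, records `None` instead of crashing when the method is missing:

```python
try:
    from gensim.models.callbacks import CallbackAny2Vec
except ImportError:  # fallback so the sketch also runs without gensim
    CallbackAny2Vec = object


class SafeLossLogger(CallbackAny2Vec):
    """Hypothetical callback: log the loss if the model exposes it, else None."""

    def __init__(self):
        self.epoch = 1
        self.losses = []

    def on_epoch_end(self, model):
        # Look the method up instead of calling it directly, to avoid AttributeError.
        get_loss = getattr(model, "get_latest_training_loss", None)
        loss = get_loss() if callable(get_loss) else None
        self.losses.append(loss)
        print(f"Epoch: {self.epoch}  Loss: {loss if loss is not None else 'n/a'}")
        self.epoch += 1
```

This avoids the crash but does not produce a Doc2Vec loss; for deciding when to stop, a task-specific validation score is the more direct signal.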
