Open
Description
Problem description
I am trying to track the training loss of a Doc2Vec model, and it fails. Is there a way to track training loss in Doc2Vec?
Also, I didn't find any documentation on performing early stopping during the Doc2Vec training phase.
The similarity score varies a lot depending on the number of epochs, and I want to use callbacks to stop training once the model has reached its optimal capacity. I have used Keras, which has an EarlyStopping feature, but I am not sure how to do this with gensim models.
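For reference, the behaviour I am after is roughly the patience-based rule sketched here. This is plain Python only: `EarlyStopper`, `patience`, and `min_delta` are names made up for illustration (they are not part of gensim), and the scores are made-up numbers. The idea would be to call `should_stop` from a callback's `on_epoch_end` with whatever evaluation score is being tracked; how to actually interrupt gensim training from there is the open question.

```python
class EarlyStopper:
    """Patience-based stopping rule, similar in spirit to Keras' EarlyStopping."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum change that counts as improvement
        self.best = None              # best score seen so far
        self.bad_epochs = 0           # consecutive epochs without improvement

    def should_stop(self, score):
        """Record one epoch's score (higher is better); return True to stop."""
        if self.best is None or score > self.best + self.min_delta:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up per-epoch similarity scores, for illustration only.
scores = [0.50, 0.60, 0.65, 0.64, 0.65, 0.63, 0.70]
stopper = EarlyStopper(patience=3)
stopped_at = None
for epoch, score in enumerate(scores, start=1):
    if stopper.should_stop(score):
        stopped_at = epoch
        break
print(f"stopped at epoch {stopped_at}, best score {stopper.best}")
```

With `patience=3` the loop stops after three consecutive epochs that fail to beat the best score, so the late improvement in the last epoch is never seen; a larger `patience` trades longer training for less risk of stopping too early.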
Any response is appreciated. Thank you!
Steps/code/corpus to reproduce
from gensim.models.callbacks import CallbackAny2Vec
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from nltk.tokenize import word_tokenize

# max_epochs, vec_size and cores are globals set elsewhere in the script (not shown here).

class EpochLogger(CallbackAny2Vec):
    '''Callback to log information about training'''
    def __init__(self):
        self.epoch = 0

    def on_epoch_begin(self, model):
        print("Epoch #{} start".format(self.epoch))

    def on_epoch_end(self, model):
        print("Epoch #{} end".format(self.epoch))
        self.epoch += 1

epoch_logger = EpochLogger()

class LossLogger(CallbackAny2Vec):
    '''Output loss at each epoch'''
    def __init__(self):
        self.epoch = 1
        self.losses = []

    def on_epoch_begin(self, model):
        print(f'Epoch: {self.epoch}', end='\t')

    def on_epoch_end(self, model):
        # This is the call that raises the AttributeError on Doc2Vec.
        loss = model.get_latest_training_loss()
        self.losses.append(loss)
        print(f' Loss: {loss}')
        self.epoch += 1

loss_logger = LossLogger()

def train_model(data, ids, destination, alpha):
    print('\tTagging data .. ')
    tagged_data = [TaggedDocument(words=word_tokenize(str(_d).lower()), tags=[str(ids[i])])
                   for i, _d in enumerate(data)]
    print('\tPreparing model with the following parameters: epochs = {}, vector_size = {}, alpha = {} .. '.
          format(max_epochs, vec_size, alpha))
    model = Doc2Vec(vector_size=vec_size,
                    workers=cores//2,
                    alpha=alpha,   # initial learning rate
                    min_count=2,   # ignore words with a total frequency below this
                    dm_mean=1,     # take the mean of the context vectors rather than the sum
                    dm=1,          # use PV-DM rather than PV-DBOW
                    callbacks=[epoch_logger, loss_logger])
    model.build_vocab(tagged_data, keep_raw_vocab=False, progress_per=100000)
Versions
Please provide the output of:
2017 4673
Tagging data ..
Preparing model with the following parameters: epochs = 50, vector_size = 100, alpha = 0.01 ..
Beginning model training ..
Iteration 0
Learning Rate = 0.01
Epoch #0 start
Epoch: 1 Epoch #0 end
Traceback (most recent call last):
loss = model.get_latest_training_loss()
AttributeError: 'Doc2Vec' object has no attribute 'get_latest_training_loss'
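As a stopgap, the callback could avoid the crash by checking for the method before calling it. A minimal sketch: `latest_loss_or_none` and the two stand-in classes are hypothetical names, used only so the snippet runs without gensim. It does not make Doc2Vec report a loss; it only avoids the AttributeError.

```python
def latest_loss_or_none(model):
    """Return model.get_latest_training_loss() if the model has it, else None."""
    getter = getattr(model, "get_latest_training_loss", None)
    return getter() if callable(getter) else None


# Hypothetical stand-ins for real models, so the sketch is self-contained.
class ModelWithLoss:
    def get_latest_training_loss(self):
        return 123.0


class ModelWithoutLoss:
    pass


print(latest_loss_or_none(ModelWithLoss()))
print(latest_loss_or_none(ModelWithoutLoss()))
```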