Hi,
I have recently been reading your excellent continual-learning implementation, in particular the SI part. In the following line of code, you use p.grad, which is the gradient of the regularized loss. However, based on my understanding of SI, the gradient should be computed on the data loss alone, so that it measures how much each weight contributes to the fitting error of the present task. Am I wrong about this, or have I missed an important factor in your implementation? Thanks in advance for your clarification.
Line 248 in d281967

```python
W[n].add_(-p.grad*(p.detach()-p_old[n]))
```
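To make the question concrete, here is a minimal sketch of the alternative I have in mind. The names `data_loss`, `surrogate_loss`, `W`, and `p_old` are hypothetical placeholders (not taken from your code) for the task loss, the SI regularization term, the running path integral, and the previous parameter values. The idea is to capture the data-loss gradients with a separate `torch.autograd.grad` call before the regularized backward pass:

```python
import torch

def training_step(model, data_loss, surrogate_loss, W, p_old, optimizer):
    # Hypothetical sketch: accumulate the SI path integral from the
    # gradient of the DATA loss only, not the regularized total loss.
    named_params = [(n, p) for n, p in model.named_parameters()
                    if p.requires_grad]

    # Gradients of the data loss alone, taken at the pre-update parameters.
    # retain_graph=True so the full loss can still be backpropagated below.
    data_grads = torch.autograd.grad(
        data_loss, [p for _, p in named_params], retain_graph=True
    )

    # Standard backward pass on the regularized loss for the actual update.
    total_loss = data_loss + surrogate_loss
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()

    # Accumulate W with the data-loss gradient times the parameter change,
    # then record the new parameter values for the next step.
    with torch.no_grad():
        for (n, p), g in zip(named_params, data_grads):
            W[n].add_(-g * (p.detach() - p_old[n]))
            p_old[n] = p.detach().clone()
```

This costs one extra gradient computation over the data loss per step; I am not sure whether that overhead is why the implementation uses `p.grad` from the combined loss instead.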