
Grad in SI #14

@ssydasheng

Hi,

I have recently been reading your excellent continual-learning implementation, in particular the part on SI (Synaptic Intelligence). In the following line of code, you use p.grad, which is the gradient of the regularized loss (the data loss plus the SI penalty). However, as I understand SI, the gradient should be computed on the data loss alone, so that it measures how much each weight contributes to the fitting error of the present task. Am I wrong about this, or have I missed something important in your implementation? Thanks in advance for your clarification.

W[n].add_(-p.grad*(p.detach()-p_old[n]))
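The difference the question raises can be illustrated with a toy sketch. It is not the repository's code: it assumes a 2-parameter model with a quadratic data loss, a quadratic SI-style penalty anchored at hypothetical previous-task parameters `theta_star` with hypothetical importances `omega` and strength `c`, and plain gradient descent (the repository may use a different optimizer). It accumulates the path integral twice, once with the regularized-loss gradient (as `p.grad` does in the quoted line) and once with the data-loss gradient only.

```python
# Toy comparison of two ways to accumulate SI's path integral W.
# All names and values here are hypothetical, chosen only for illustration.

def data_grad(theta, target):
    # Gradient of the data loss 0.5 * sum((theta_i - target_i)^2).
    return [t - y for t, y in zip(theta, target)]

def penalty_grad(theta, theta_star, omega, c):
    # Gradient of the penalty c * sum(omega_i * (theta_i - theta_star_i)^2).
    return [2.0 * c * o * (t - s) for t, s, o in zip(theta, theta_star, omega)]

def train_and_accumulate(steps=50, lr=0.1, c=1.0):
    theta = [0.0, 0.0]        # current parameters
    target = [1.0, -2.0]      # minimiser of the data loss (hypothetical)
    theta_star = [0.5, 0.5]   # previous-task parameters (hypothetical)
    omega = [1.0, 0.0]        # previous-task importances (hypothetical)
    w_total = [0.0, 0.0]      # W built from the regularized-loss gradient
    w_data = [0.0, 0.0]       # W built from the data-loss gradient only
    for _ in range(steps):
        g_data = data_grad(theta, target)
        g_pen = penalty_grad(theta, theta_star, omega, c)
        g_total = [a + b for a, b in zip(g_data, g_pen)]
        theta_old = list(theta)
        # Gradient-descent step on the regularized (total) loss.
        theta = [t - lr * g for t, g in zip(theta, g_total)]
        for i in range(len(theta)):
            delta = theta[i] - theta_old[i]
            # Same form as W[n].add_(-p.grad * (p.detach() - p_old[n])).
            w_total[i] += -g_total[i] * delta
            w_data[i] += -g_data[i] * delta
    return w_total, w_data

w_total, w_data = train_and_accumulate()
```

In this toy setting the second parameter has zero importance, so its penalty gradient vanishes and both variants give the same W; the first parameter is pulled by the penalty, so the two accumulated values differ (here the regularized-loss version comes out larger).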
