Fix missing diff[i] in legacy contrastive loss GPU backward #7100

Open
Chessing234 wants to merge 1 commit into BVLC:master from Chessing234:fix/contrastive-loss-legacy-gpu-gradient

Conversation

@Chessing234

Summary

The GPU backward kernel (CLLBackward) for ContrastiveLossLayer produces wrong gradients for dissimilar pairs when legacy_version=true: it writes -alpha to each element of bottom_diff instead of -alpha * diff[i].

Root cause: The CPU backward computes the gradient via caffe_cpu_axpby:

beta = -alpha;
caffe_cpu_axpby(channels, beta, diff_.cpu_data() + (j*channels), ...);
// result: bout[d] = -alpha * diff[d]
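
To make the CPU arithmetic concrete, here is a minimal sketch of what the call above is assumed to compute. axpby_ref and legacy_cpu_row are hypothetical stand-ins, not Caffe functions, and the sketch assumes the elided arguments scale the existing output by zero, so the result reduces to beta * diff.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for caffe_cpu_axpby: y = a * x + b * y, element-wise.
// Only the assumed arithmetic is reproduced here, not Caffe's implementation.
void axpby_ref(float a, const std::vector<float>& x, float b,
               std::vector<float>& y) {
  assert(x.size() == y.size());
  for (std::size_t d = 0; d < x.size(); ++d) {
    y[d] = a * x[d] + b * y[d];
  }
}

// Legacy dissimilar-pair gradient for one sample, CPU style:
// beta = -alpha, then bout = beta * diff + 0 * bout.
std::vector<float> legacy_cpu_row(float alpha, const std::vector<float>& diff) {
  std::vector<float> bout(diff.size(), 0.0f);
  const float beta = -alpha;
  axpby_ref(beta, diff, 0.0f, bout);  // assumption: output scale is zero
  return bout;
}
```

Under that assumption, every output element carries the diff factor, which is the behaviour the GPU kernel has to reproduce.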

The GPU kernel sets bottom_diff[i] = beta = -alpha directly, omitting the multiplication by diff[i]:

beta = -alpha;          // missing: * diff[i]
bottom_diff[i] = beta;  // gives -alpha instead of -alpha * diff[i]

The non-legacy GPU path correctly absorbs diff[i] into beta:

beta = -alpha * mdist / (dist + 1e-4) * diff[i];  // correct

Fix: In the legacy branch of the GPU kernel, change beta = -alpha to beta = -alpha * diff[i] so that it matches the CPU implementation.
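
The per-element difference can be sketched in plain C++ as follows. The function names are illustrative and do not exist in Caffe. Each function returns the value that would be written to bottom_diff[i] for a dissimilar pair, based only on the expressions quoted in this description.

```cpp
#include <cassert>

// Legacy branch before the fix: the diff[i] factor is dropped.
float legacy_beta_before(float alpha, float /*diff_i*/) {
  return -alpha;
}

// Legacy branch after the fix: matches the CPU result -alpha * diff[i].
float legacy_beta_after(float alpha, float diff_i) {
  return -alpha * diff_i;
}

// Non-legacy branch, already correct: diff[i] is folded into beta.
float nonlegacy_beta(float alpha, float mdist, float dist, float diff_i) {
  assert(dist >= 0.0f);  // dist is a Euclidean distance in this sketch
  return -alpha * mdist / (dist + 1e-4f) * diff_i;
}
```

The two legacy variants agree only when diff[i] equals 1, so a test whose inputs happen to have unit differences would not catch the bug.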

Test plan

  • Run contrastive loss layer tests with legacy_version=true on GPU
  • Verify CPU and GPU backward outputs match for dissimilar pairs
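
A host-only sketch of the parity check in the second item, for readers without a GPU at hand. It emulates the kernel's flat per-element loop on the CPU and is not the Caffe test suite. All names are hypothetical, and every pair is assumed to be dissimilar and to contribute a non-zero gradient.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU-style reference: walk sample by sample and scale each row of diff by -alpha.
std::vector<float> cpu_style(float alpha, int num, int channels,
                             const std::vector<float>& diff) {
  assert(diff.size() == static_cast<std::size_t>(num) * channels);
  std::vector<float> out(diff.size(), 0.0f);
  for (int j = 0; j < num; ++j) {
    for (int d = 0; d < channels; ++d) {
      out[j * channels + d] = -alpha * diff[j * channels + d];
    }
  }
  return out;
}

// GPU-style emulation: one flat index per element, as a kernel thread sees it.
// apply_fix selects between the old and the proposed legacy expression.
std::vector<float> gpu_style(float alpha, const std::vector<float>& diff,
                             bool apply_fix) {
  std::vector<float> out(diff.size(), 0.0f);
  for (std::size_t i = 0; i < diff.size(); ++i) {
    const float beta = apply_fix ? -alpha * diff[i] : -alpha;
    out[i] = beta;
  }
  return out;
}

// Largest element-wise absolute difference between two gradient buffers.
float max_abs_diff(const std::vector<float>& a, const std::vector<float>& b) {
  assert(a.size() == b.size());
  float m = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    m = std::max(m, std::fabs(a[i] - b[i]));
  }
  return m;
}
```

With the fix applied the two buffers should agree element for element. Without it they diverge wherever diff[i] differs from 1.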

🤖 Generated with Claude Code

The CPU backward multiplies beta by diff via caffe_cpu_axpby, giving
gradient = -alpha * diff[i] for legacy dissimilar pairs. The GPU kernel
sets bottom_diff[i] = -alpha directly, omitting the diff[i] factor.
The non-legacy GPU path correctly includes * diff[i] in beta.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>