Fix indexedarray adjoint#1504
Conversation
📝 WalkthroughWalkthroughThis PR fixes adjoint gradient computation through ChangesIndexedarray Adjoint Gradients
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
warp/native/array.h (1)
1321-1324: ⚡ Quick winConsider adding a bounds check after index remapping.
After remapping the index through
buf.indices[0], the forward pass verifies the result is within bounds (line 619). The adjoint pass should include a similar check to guard against invalid indices arrays:if (buf.indices[0]) i = buf.indices[0][i]; assert(i >= 0 && i < buf.arr.shape[0]);While existing adjoint functions omit assertions for performance, validating remapped indices would catch external data corruption early.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@warp/native/array.h` around lines 1321 - 1324, Add a bounds check after remapping the index via buf.indices[0] in the adjoint pass: after assigning i = buf.indices[0][i] verify that i is within [0, buf.arr.shape[0]) and handle or assert on failure. Specifically update the block that currently does "if (buf.indices[0]) i = buf.indices[0][i];" to perform the bounds validation using buf.arr.shape[0] (and buf.indices[0] and i) so corrupted or out-of-range index mappings are caught early.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@warp/native/array.h`:
- Around line 1321-1324: Add a bounds check after remapping the index via
buf.indices[0] in the adjoint pass: after assigning i = buf.indices[0][i] verify
that i is within [0, buf.arr.shape[0]) and handle or assert on failure.
Specifically update the block that currently does "if (buf.indices[0]) i =
buf.indices[0][i];" to perform the bounds validation using buf.arr.shape[0] (and
buf.indices[0] and i) so corrupted or out-of-range index mappings are caught
early.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Enterprise
Run ID: b2df894d-50f5-4f0f-b39b-6910f9539deb
📒 Files selected for processing (4)
warp/_src/context.pywarp/_src/types.pywarp/native/array.hwarp/tests/test_indexedarray.py
Greptile SummaryThis PR fixes a long-standing bug (GH-1479) where Confidence Score: 4/5Safe to merge for 1-D indexed array differentiation; multi-dim will fail at C++ compile time rather than produce wrong gradients. The 1-D implementation is logically consistent across Python and C++: the grad property, the pack_arg relaxation, and both adj_address overloads correctly follow gather indirection before accumulating into base.grad. The main gap is that indexedarray.grad is accessible for ndim > 1 but C++ templates are absent, so multi-dim differentiation fails at kernel compilation with an unhelpful error. Test coverage is good for the happy path but misses tape.zero() and negative-index scenarios. warp/_src/types.py — the grad property should guard against ndim > 1 to prevent opaque C++ template failures. Important Files Changed
Reviews (1): Last reviewed commit: "1-D unit test as example" | Re-trigger Greptile |
| def grad(self): | ||
| if self.data is None or self.data.grad is None: | ||
| return None | ||
| return indexedarray(self.data.grad, self.indices[: self.ndim], ndim=self.ndim) |
There was a problem hiding this comment.
grad property silently permits ndim > 1 without C++ support
The property creates a valid-looking indexedarray for any ndim, but the C++ adj_address overloads added in this PR only exist for the 1-D signature. When a 2-D+ indexed array is differentiated, the generated adjoint kernel calls adj_address(indexedarray_t, i, j, ...), which has no matching template, producing an opaque C++ compilation error at kernel-launch time. Adding a guard here would surface a clear, actionable Python error instead.
| // indexedarray with a regular-array adjoint (as passed by the CUDA codegen): resolve the | ||
| // index indirection, then accumulate into the base grad or the base array's embedded grad | ||
| template <typename T> | ||
| inline CUDA_CALLABLE void | ||
| adj_address(const indexedarray_t<T>& buf, int i, const array_t<T>& adj_buf, int adj_i, const T& adj_output) | ||
| { | ||
| if (i < 0) | ||
| i += buf.shape[0]; | ||
| if (buf.indices[0]) | ||
| i = buf.indices[0][i]; | ||
|
|
||
| if (adj_buf.data) | ||
| adj_atomic_add(&index(adj_buf, i), adj_output); | ||
| else if (buf.arr.grad) | ||
| adj_atomic_add(&index_grad(buf.arr, i), adj_output); | ||
| } |
There was a problem hiding this comment.
Second overload's triggering path is underdocumented
The comment says "as passed by the CUDA codegen", but after this PR the tape backward always passes an indexedarray_t (not array_t) as the adjoint via the new grad property. The array_t adjoint overload is actually needed for the manual adjoint-launch use case (e.g., wp.launch(..., adj_inputs=[plain_array], adjoint=True)), which was the path that segfaulted on CPU. Clarifying this in the comment would prevent confusion about which code path actually exercises this overload.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| def test_indexedarray_grad_1d(test, device): | ||
| # gradients must flow back through a differentiable indexedarray input (GH-1479): | ||
| # the adjoint follows the gather indirection and accumulates into the base array's grad | ||
| base = wp.array(np.linspace(1.0, 6.0, 6, dtype=np.float32), dtype=float, device=device, requires_grad=True) | ||
| weights_np = np.array([0.25, 0.5, 1.0], dtype=np.float32) | ||
| weights = wp.array(weights_np, dtype=float, device=device) | ||
| indices = wp.array([1, 3, 5], dtype=int, device=device) | ||
| samples = wp.indexedarray1d(base, [indices]) | ||
| total = wp.zeros(1, dtype=float, device=device, requires_grad=True) | ||
|
|
||
| tape = wp.Tape() | ||
| with tape: | ||
| wp.launch( | ||
| kernel_indexedarray_grad_1d, dim=samples.size, inputs=[samples, weights], outputs=[total], device=device | ||
| ) | ||
|
|
||
| # forward: sum of base[i] * weight over the gathered indices | ||
| assert_np_equal(total.numpy(), np.array([8.5], dtype=np.float32), tol=1e-6) | ||
|
|
||
| tape.backward(loss=total) | ||
|
|
||
| # d(total)/d(base[j]) is the matching weight at each gathered index, zero elsewhere | ||
| expected = np.zeros(6, dtype=np.float32) | ||
| expected[[1, 3, 5]] = weights_np | ||
| assert_np_equal(base.grad.numpy(), expected, tol=1e-6) |
There was a problem hiding this comment.
Missing
tape.zero() and negative-index coverage
Two behaviors introduced by this PR are not exercised: (1) tape.zero() now calls samples.grad.zero_(), which is an indexed fill_(0) over base.grad — a regression here would leave stale gradients across backward passes; (2) the C++ code handles negative index wrap-around (if (i < 0) i += buf.shape[0]), but there is no test that passes a negative index through the kernel and verifies gradient accumulation at the correct base-array slot.
Description
wp.indexedarrayinputs were not differentiable:wp.TaperaisedAttributeError(no
.grad), a manual adjoint launch segfaulted on CPU, and on CUDA it silentlyproduced zero gradients. This adds gradient support for indexed-array inputs so the
adjoint follows the gather indirection and accumulates into the base array's gradient.
right now this is only handling 1-D arrays, multi-dim is still needing to be implemented
closes #1479
Checklist
Test plan
Verified gradients are correct on CPU and CUDA via both
wp.Tapeand a manualadjoint launch (expected
[0, 0.25, 0, 0.5, 0, 1.0]).Bug fix
Summary by CodeRabbit
New Features
Tests