Review updated until commit 08a5f45
PR Reviewer Guide

Here are some key observations to aid the review process:

- 🧪 PR contains tests
- ⚡ Recommended focus areas for review: Error Handling Robustness
for multi-GPU debugging. Multi-GPU scheduling happens before segmentation and the shardings are encoded as loop transforms.
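As a rough illustration of that encoding (a minimal sketch in plain Python, not the NVFuser API; the function name is hypothetical): sharding a logical axis amounts to a loop split whose outer extent equals the device-mesh size, with the outer loop bound to the device-parallel dimension (DIDx) and the inner extent becoming the per-device shard.

```python
# Conceptual sketch only -- not NVFuser's implementation. A sharding expressed
# as a loop transform: split a logical axis of extent N by the mesh size D,
# bind the outer loop (extent D) to the device dimension (DIDx), and keep
# N // D elements per device.
def shard_axis(extent: int, num_devices: int) -> tuple[int, int]:
    """Split `extent` into (device_loop_extent, per_device_extent)."""
    assert extent % num_devices == 0, "axis must divide evenly across the mesh"
    return num_devices, extent // num_devices

# A sequence axis of 8192 sharded across 8 GPUs: each device owns 1024 rows.
device_loop, per_device = shard_axis(8192, 8)
```

Because the split happens before segmentation, downstream passes see the sharding as an ordinary loop transform rather than a special annotation.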
I got the "triangle updates incoming" test passing in this PR. Below are the key issues identified, their workarounds, and current status:
1. Sharding Propagation Rework
2. Multi-Dimensional Sharding & `getCommunicationInfo`
   Updated `getCommunicationInfo` to support multi-dimensional sharding. It reuses `haveDifferentShardings` to identify inconsistencies between input and output `TensorView` objects. The commit needs cleanup and further test verification before it can be merged. `haveDifferentShardings` is currently bottlenecked by the expensive `ExpressionSimplifier`; we need to transition it to be IdModel-based in a future iteration.
3. Misaligned Memory Access in Transpose Kernels
   Updated `ReorderShardedAxisPass` to ensure the scattered axis of the `ReduceScatter` is allocated outermost.
4. Performance Bottleneck: AllGather Memory
   The `AllGather` preceding the Einsum is functional but consumes too much memory for AlphaFold3 workloads due to long sequence lengths.

cc @DejunL
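The sharding-mismatch check behind item 2 can be sketched as follows (an assumed model in plain Python, not NVFuser's `haveDifferentShardings`; the dict-based sharding representation is hypothetical): if the producer and consumer annotate different axes as device-parallel, a resharding communication is required between them.

```python
# Illustrative sketch (assumed model, not NVFuser's implementation): represent
# a tensor's sharding as a mapping from axis index to the device-parallel
# dimension it is bound to (here just "DIDx"). A producer/consumer pair whose
# shardings differ requires a communication (e.g. AllGather, ReduceScatter,
# or an all-to-all style reshard) to be inserted between them.
def have_different_shardings(producer: dict, consumer: dict) -> bool:
    """True if the two tensors are sharded differently and need a reshard."""
    return producer != consumer

row_sharded = {0: "DIDx"}  # sharded on axis 0
col_sharded = {1: "DIDx"}  # sharded on axis 1

needs_reshard = have_different_shardings(row_sharded, col_sharded)  # True
```

The real check compares `TensorView` loop domains, which is why it currently leans on `ExpressionSimplifier`; an IdModel-based version would compare mapped iter domains instead of simplifying index expressions.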
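To make the AllGather memory concern in item 4 concrete, here is a back-of-the-envelope sketch (assumptions: bf16 elements and illustrative AlphaFold3-like pair-representation sizes; the exact shapes are hypothetical): the gathered operand costs `world_size` times the per-rank shard, and that full buffer grows quadratically with sequence length for a pair representation.

```python
# Rough memory model (assumptions: bf16 = 2 bytes/elem, no chunking/overlap).
# An AllGather materializes the full unsharded operand on EVERY rank, so its
# footprint is world_size times the per-rank shard.
def gathered_bytes(shape: list[int], bytes_per_elem: int = 2) -> int:
    """Bytes for the fully gathered tensor held on each rank."""
    n = bytes_per_elem
    for d in shape:
        n *= d
    return n

def shard_bytes(shape: list[int], world_size: int, bytes_per_elem: int = 2) -> int:
    """Bytes for one rank's shard before the AllGather."""
    return gathered_bytes(shape, bytes_per_elem) // world_size

# Hypothetical pair representation [seq, seq, channels] at seq_len 1536:
full = gathered_bytes([1536, 1536, 128])        # ~576 MiB per rank, per tensor
per_rank = shard_bytes([1536, 1536, 128], 8)    # 8x smaller before gathering
```

This is why keeping the operand sharded through the Einsum (or chunking the gather) matters at long sequence lengths: the gathered buffer scales with the square of `seq`, independent of how many GPUs share the work.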