Conversation
|
!test |
|
Review updated until commit 987672b Description
|
| Relevant files | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Enhancement |
| ||||||||||||
| Tests |
|
PR Reviewer Guide
Here are some key observations to aid the review process:
| 🧪 PR contains tests |
| ⚡ Recommended focus areas for review |
Error handling robustness
NVF_THROW for unexpected transform types, but the old code used NVF_ERROR. Consider if this change in error handling behavior is intentional and consistent with the project's error handling patterns. |
Test failures
-
(Medium, 3)
Shape mismatch in thunder.tests.test_update_aliases higher-order inplace alias update (thunderfx path)Test Name A100 GB200 H100 Source thunder.tests.test_update_aliases.test_higher_order_inplace_alias_update_nvfuser_cuda_thunder.dtypes.float32 ❌ ❌ ❌ -
(Medium, 1)
Large scalar mismatch in Thunder nanoGPT autograd nvFuser CUDA test (test_networks)Test Name H100 Source thunder.tests.test_networks.test_nanogpt_complete_autograd_nvfuser_cuda_thunder.dtypes.float32 ❌
Greptile OverviewGreptile SummaryThis PR enables replay support for Key Changes:
The implementation follows the same pattern as existing transform types (Split, Merge, Resize), with proper error handling and domain mapping. The test validates that swizzle1d replay works correctly with stream parallelization. Confidence Score: 5/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Client
participant HostIR as Host IR Container
participant Shard as shardByStream
participant Replay as ReplaySelf
participant PropLoop as transformLoopDomain
participant AllocUtil as shardAllocationAsLoop
Client->>HostIR: Create tensors with swizzle1d transforms
Note over HostIR: tv->outer_split(1, d)<br/>tv->axis(1)->parallelize(DIDx)<br/>tv->outer_split(0, d)<br/>tv->swizzle1d(0, DIDx)<br/>tv->axis(0)->parallelize(Stream)
Client->>Shard: shardByStream(tv, stream_index, expr)
Shard->>PropLoop: transformLoopDomain()
Note over PropLoop: Get transforms between domains
PropLoop->>PropLoop: Process Split transforms
Note over PropLoop: Replay splits on target domain
PropLoop->>PropLoop: Process Swizzle1D transforms
Note over PropLoop: Create replayed_id = IterDomain::swizzle1d()<br/>Preserve parallelType<br/>Update ref2target mapping
PropLoop-->>Shard: Return transformed loop domain
Shard->>AllocUtil: shardAllocationAsLoop()
Note over AllocUtil: Process allocation transforms
AllocUtil->>AllocUtil: Handle Swizzle1D in allocation
Note over AllocUtil: Preserve contiguity through swizzle<br/>allocation_to_contiguity.insert()
AllocUtil->>AllocUtil: Handle Split in allocation
Note over AllocUtil: Split contiguity for inner/outer
AllocUtil-->>Shard: Return updated allocation domain
Shard->>Replay: Replay transforms using ReplaySelf
Note over Replay: handle(Swizzle1D*)<br/>Map input to output ID<br/>Create replayed swizzle1d<br/>Update loop_ids_
Replay-->>Shard: Return replayed shard
Shard-->>Client: Return sharded tensor view
|
| if (e->isA<Swizzle1D>()) { | ||
| auto* swizzle1d = e->as<Swizzle1D>(); |
There was a problem hiding this comment.
| if (e->isA<Swizzle1D>()) { | |
| auto* swizzle1d = e->as<Swizzle1D>(); | |
| if (auto* swizzle_1d = dynamic_cast<Swizzle1D*>(e)) { |
| } else if (e->isA<Split>()) { | ||
| auto* split = e->as<Split>(); |
| allocation_to_contiguity.insert( | ||
| split_i, split->inner(), inner_contiguity); | ||
| } else { | ||
| NVF_THROW("Expected a swizzle1d or split transform. Got: ", e); |
There was a problem hiding this comment.
for (...) {
if (a) {
...
} else if (b) {
...
} else {
...
}
}
=>
for (...) {
if (a) {
...
continue;
}
if (b) {
...
continue;
}
...
}
There was a problem hiding this comment.
What is the reason for preferring if .. continue over if-else blocks?
There was a problem hiding this comment.
The same reason why I prefer early exit.
- Once I see
continue, I know the rest of the for loop has nothing to do with conditiona. Otherwise, I have to scroll down to check where the end-if is. - Less indentation for the final
elseblock.
csrc/multidevice/propagation.cpp
Outdated
| if (hasRootToLogicalTransform(consumer_id, consumer_tv)) { | ||
| validate_split(split, target_id); | ||
| } | ||
| if (transform->isA<Swizzle1D>()) { |
csrc/transform_iter.cpp
Outdated
| auto is_supported_expr = | ||
| e->isOneOf<Split, Merge, Swizzle, Swizzle1D, Resize>(); | ||
| NVF_ERROR( | ||
| is_supported_expr, "Invalid expr type found in transform traversal."); |
There was a problem hiding this comment.
| is_supported_expr, "Invalid expr type found in transform traversal."); | |
| (e->isOneOf<Split, Merge, Swizzle, Swizzle1D, Resize>())); |
The extra pair of parentheses is needed for disambiguate the commas.
|
!test |
Enables replaying
swizzle1Das required byshardByStream