Improve moe grouped gemm logging#652

Open
araina-amd wants to merge 9 commits into main from
araina/dev/projection-moe-grouped-gemm-logging

Conversation

@araina-amd
Contributor

Improve moe grouped gemm logging

root and others added 9 commits April 1, 2026 09:43
… scheduler comparison

Performance projection fixes:
- Fix double-counting of DeepEP A2A overlap when EP is unchanged
- Correctly reconstruct sequential compute time when EP changes with DeepEP ON
- Fix VPP handling: use interleaved_1f1b when zero-bubble + VPP>1
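The overlap-accounting fixes above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the function name, arguments, and the EP scaling rule are all assumptions made for the example.

```python
def projected_time_ms(measured_ms, a2a_ms, deepep_on, ep_changed, ep_scale=1.0):
    """Illustrative per-layer time projection (hypothetical helper).

    measured_ms: measured layer time; with DeepEP ON, A2A is already
                 overlapped (hidden) inside this number.
    a2a_ms:      DeepEP all-to-all communication time.
    deepep_on:   whether DeepEP A2A/compute overlap is enabled.
    ep_changed:  whether the projected EP degree differs from the measured one.
    ep_scale:    illustrative compute scaling factor for the new EP degree.
    """
    if deepep_on and not ep_changed:
        # The measurement already reflects the overlap; subtracting a2a_ms
        # again would double-count it (the bug this PR fixes).
        return measured_ms
    if deepep_on and ep_changed:
        # Reconstruct the sequential (no-overlap) compute time by adding
        # back the hidden A2A, rescale both sides for the new EP degree,
        # then re-apply overlap: the longer of the two dominates.
        sequential_ms = measured_ms + a2a_ms
        new_compute = sequential_ms * ep_scale
        new_a2a = a2a_ms / ep_scale  # illustrative scaling only
        return max(new_compute, new_a2a)
    # DeepEP OFF: compute and A2A run sequentially.
    return measured_ms * ep_scale + a2a_ms
```

The key distinction is the first two branches: with EP unchanged the measurement is used as-is, while an EP change forces the sequential time to be reconstructed before overlap is re-applied.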

Scheduler comparison (--pipeline-schedule-algorithm):
- Thread scheduler_algorithm from CLI through projection engine
- Add zbv-formatted and zbv-greedy as CLI choices
- Add _print_scheduler_comparison for multi-scheduler results table
- 'all' mode runs all applicable schedulers + SeaAILab ILP and picks the best
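The 'all' mode and comparison table might look like the sketch below. Function name, table layout, and result shape are assumptions for illustration; the PR's `_print_scheduler_comparison` may differ.

```python
def pick_best_scheduler(results):
    """Print a scheduler comparison table and return the fastest scheduler.

    results: dict mapping scheduler name -> projected iteration time (ms),
             e.g. the output of running every applicable scheduler in
             'all' mode (hypothetical data shape).
    """
    best = min(results, key=results.get)
    width = max(len(name) for name in results)
    # Sort fastest-first and mark the winner, comparison-table style.
    for name, ms in sorted(results.items(), key=lambda kv: kv[1]):
        marker = "  <- best" if name == best else ""
        print(f"{name:<{width}}  {ms:8.2f} ms{marker}")
    return best
```

Usage with made-up numbers: `pick_best_scheduler({"1f1b": 120.0, "zbv-greedy": 110.0, "seaailab-ilp": 105.0})` prints the three rows fastest-first and returns `"seaailab-ilp"`.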

CLI fixes:
- Re-add --pipeline-schedule-algorithm argument with full choices
- Rename megatron-ilp to seaailab-ilp
- Log Megatron grouped-GEMM flags and Origami-style M/H/F, grouped_batch,
  token counts, and layer pattern from MoE block config (GPU benchmark
  and simulation).
- Simulation: print expert routed GEMM-only fwd/bwd ms before router overhead.
- Add training_config_debug_one_line() for a compact one-line config in
  profiler logs.
- Add optional backward autograd label/args and a CUDA profiler hook in
  utils; wire them through the layer profilers.
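Re-adding the argument with its full choice list could look like this. Only the choices named in this PR description are used; the dest, default, and help text are assumptions, and the real Megatron argument registration may differ.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--pipeline-schedule-algorithm",
    dest="scheduler_algorithm",
    # Choice list assembled from this PR's description: zbv-formatted and
    # zbv-greedy are newly added; seaailab-ilp is the renamed megatron-ilp.
    choices=["1f1b", "interleaved_1f1b", "zbv-formatted", "zbv-greedy",
             "seaailab-ilp", "all"],
    default="1f1b",  # assumed default
    help="Pipeline scheduler to project; 'all' runs every applicable "
         "scheduler plus the SeaAILab ILP and picks the best.",
)

# The chosen value is then threaded from the CLI into the projection engine.
args = parser.parse_args(["--pipeline-schedule-algorithm", "all"])
```

An unrecognized value such as the old `megatron-ilp` would now be rejected by argparse at parse time, which is the point of keeping the rename and the choice list in sync.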
