[release/2.7] Optimize Flex-Attention occupancy for head_dim=128 by tomjen12 · Pull Request #2957 · ROCm/pytorch

tomjen12 · 2026-02-04T03:25:53Z

[ROCm] Optimize Flex-Attention occupancy for head_dim=128

Summary

This PR reduces n_warps from 8 to 4 for head_dim=128 Flex-Attention configurations on ROCm, targeting gfx942.
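A minimal sketch of the kind of override described above, not the actual diff. The `FlexConfig` tuple, the `tune_for_rocm` helper, and the block sizes in the usage example are hypothetical names and values chosen for illustration. Only the 8 to 4 change for head_dim=128 on gfx942 comes from this PR.

```python
from typing import NamedTuple


class FlexConfig(NamedTuple):
    """Hypothetical kernel config; field names are illustrative only."""
    block_m: int
    block_n: int
    num_warps: int
    num_stages: int


def tune_for_rocm(config: FlexConfig, head_dim: int, gcn_arch: str) -> FlexConfig:
    """Return the config with num_warps lowered to 4 for head_dim=128 on gfx942.

    Assumption: the architecture string starts with "gfx942", possibly
    followed by feature flags. Every other case is returned unchanged.
    """
    if head_dim == 128 and gcn_arch.startswith("gfx942"):
        return config._replace(num_warps=4)
    return config


# Hypothetical baseline values; only num_warps=8 is taken from the PR text.
baseline = FlexConfig(block_m=128, block_n=64, num_warps=8, num_stages=1)
tuned = tune_for_rocm(baseline, head_dim=128, gcn_arch="gfx942:sramecc+:xnack-")
print(tuned.num_warps)
```

Keeping the override in one small function leaves every other head_dim and architecture on its existing configuration.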

Performance Impact (320 Valid Cases)

The following table shows the Geometric Mean (Geomean) speedups compared to the current n_warps=8 baseline:

Attention Pattern	Test Count	Fwd Speedup	Bwd Speedup
alibi	32	1.07x	1.60x
causal	32	1.08x	1.44x
noop	32	1.07x	1.38x
prefix_lm	112	1.09x	1.40x
sliding_window	112	1.07x	1.27x
Overall (Geomean)	320	1.08x	1.41x
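For reference, a geometric mean of speedups can be computed as below. This is a sketch using only the rounded per-pattern values from the table, not the raw per-case timings, which are not included in this PR. With these rounded inputs, an unweighted geomean over the five pattern rows reproduces the Overall row. The PR does not state how the Overall figure was actually aggregated.

```python
import math


def geomean(values):
    """Geometric mean: exp of the arithmetic mean of the logs."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Rounded per-pattern speedups copied from the table above, in row order:
# alibi, causal, noop, prefix_lm, sliding_window.
fwd = [1.07, 1.08, 1.07, 1.09, 1.07]
bwd = [1.60, 1.44, 1.38, 1.40, 1.27]

print(round(geomean(fwd), 2))
print(round(geomean(bwd), 2))
```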

Benchmark Coverage:

Batch Size: [1, 2, 4, 8]
Heads: [16, 32]
Sequence Length: [512, 1024, 2048, 4096]
Masking: window_size and prefix_lm ∈ [128, 256, 512, 1024, 2048].
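Note: Redundant cases (e.g., window_size ≥ seq_len) were excluded; the 112-case counts for prefix_lm and sliding_window correspond to keeping only mask sizes strictly below seq_len.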
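The coverage above can be enumerated to check the case counts. This is a sketch, not the benchmark script used for the PR. The `build_cases` helper and the tuple layout are hypothetical, and the filter `mask_size < seq_len` is an assumption chosen because it reproduces the 112 and 320 totals in the table.

```python
from collections import Counter
from itertools import product

BATCH_SIZES = [1, 2, 4, 8]
HEADS = [16, 32]
SEQ_LENS = [512, 1024, 2048, 4096]
MASK_SIZES = [128, 256, 512, 1024, 2048]

UNPARAMETERIZED = ("alibi", "causal", "noop")
PARAMETERIZED = ("prefix_lm", "sliding_window")


def build_cases():
    """Enumerate (pattern, batch, heads, seq_len, mask_size) tuples."""
    cases = []
    for batch, heads, seq_len in product(BATCH_SIZES, HEADS, SEQ_LENS):
        for pattern in UNPARAMETERIZED:
            cases.append((pattern, batch, heads, seq_len, None))
        for pattern in PARAMETERIZED:
            for mask_size in MASK_SIZES:
                # Assumption: a mask as large as the sequence is redundant.
                if mask_size < seq_len:
                    cases.append((pattern, batch, heads, seq_len, mask_size))
    return cases


cases = build_cases()
counts = Counter(case[0] for case in cases)
print(len(cases), dict(counts))
```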

Adjust n_warps from 8 to 4 for head_dim=128 configurations to improve performance stability across different attention patterns.
- Forward speedup: ~1.07x geomean uplift.
- Backward speedup: 1.27x to 1.60x geomean uplift.
- Validated with a filtered sweep of 320 unique cases.

rocm-repo-management-api · 2026-02-04T03:54:51Z

Jenkins build for 4b7cecf5fbae5ab9ef60a3b7ae25abfa63074041 commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

tomjen12 requested review from jataylo and jeffdaily February 4, 2026 03:25

tomjen12 changed the title ~~[ROCm] Optimize Flex-Attention occupancy for head_dim=128~~ [release/2.7] Optimize Flex-Attention occupancy for head_dim=128 Feb 4, 2026

jataylo merged commit 6b53931 into release/2.7 Feb 11, 2026
0 of 2 checks passed

jataylo deleted the tomjen12/release/2.7-flex-attn-warp-optimization branch February 11, 2026 12:17
