[No Merge][WIP] feat: primus-turbo attn add sbhd format support #650
RuibinCheung wants to merge 2 commits into main
Conversation
Pull request overview
Adds experimental support for additional QKV tensor layouts (notably sbhd) in the Primus Turbo attention wrapper, while introducing special-casing for sink attention to force a specific layout.
Changes:
- Removes the previous manual sbhd -> bshd transpose and instead forwards qkv_format into the underlying flash_attn op.
- Introduces a use_sink_attn flag and forces sink-attention execution to use bshd, including explicit tensor permutations for Q/K/V and the output.
```python
# NOTE: sink attention only support bshd format
query = query.permute(1, 0, 2, 3).contiguous()
key = key.permute(1, 0, 2, 3).contiguous()
value = value.permute(1, 0, 2, 3).contiguous()
```
When use_sink_attn is enabled, query/key/value are always permuted as if the incoming layout were sbhd (S,B,H,D) -> bshd (B,S,H,D). If qkv_format is already bshd (or any non-sbhd value coming from packed_seq_params), this permutation will corrupt the tensor layout while qkv_format is forced to "bshd", creating a format/tensor mismatch.
Consider either (a) explicitly asserting qkv_format == "sbhd" before permuting in the sink-attention path, or (b) permuting conditionally based on qkv_format and ensuring o is permuted back consistently to preserve the expected output layout.
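A minimal sketch of option (b): a hypothetical helper (the name `to_bshd` and the error message are not from the PR) that only permutes when the incoming layout is actually sbhd, and rejects layouts the sink-attention path cannot represent. It assumes the wrapper's 4-D layout convention, sbhd = (S, B, H, D) and bshd = (B, S, H, D):

```python
def to_bshd(t, qkv_format):
    """Convert a 4-D tensor to bshd only when it arrives as sbhd.

    Hypothetical helper for the sink-attention path; layout names follow
    the wrapper's convention: sbhd = (S, B, H, D), bshd = (B, S, H, D).
    """
    if qkv_format == "sbhd":
        # Swap the sequence and batch dims: (S, B, H, D) -> (B, S, H, D).
        return t.permute(1, 0, 2, 3).contiguous()
    if qkv_format == "bshd":
        return t  # already in the layout sink attention expects
    # Anything else (e.g. thd) has no well-defined permutation here.
    raise ValueError(f"sink attention does not support qkv_format={qkv_format!r}")
```

The caller would apply this to query/key/value before the kernel, and apply the inverse permutation to `o` only when the input was sbhd, so the output layout matches what the rest of the model expects.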
```
@@ -465,9 +465,6 @@ def forward(
)

qkv_format = packed_seq_kwargs.get("qkv_format", self.qkv_format)
```
qkv_format is now passed through to self.attn(...) without any local validation. Since this value can come from packed_seq_params, an unexpected value will likely fail deeper in the kernel with a less actionable error (and could also interact badly with the explicit format conversions in the sink-attention path).
Recommend validating qkv_format against the set of supported formats in this backend and raising a clear ValueError (or keeping an assert) before using it.
```python
qkv_format = packed_seq_kwargs.get("qkv_format", self.qkv_format)
supported_qkv_formats = ("sbhd", "bshd", "thd")
if qkv_format not in supported_qkv_formats:
    raise ValueError(
        f"Unsupported qkv_format: {qkv_format}. "
        f"Supported formats: {supported_qkv_formats}"
    )
```
```python
else:
    window_size = (self.sink_sliding_window, 0)

# NOTE: sink attention only support bshd format
```
Typo/grammar in the new comment: "sink attention only support bshd format" → "sink attention only supports bshd format".
```python
# NOTE: sink attention only supports bshd format
```
This PR is experimental. Please do not merge!