[WIP] Support fused neighborhood attention for NPU #1034

Hailey-Zh · 2026-01-20T10:48:03Z

Summary

This PR introduces support for Fused Neighborhood Attention (FNA) optimized specifically for NPU architectures. The implementation focuses on memory efficiency and hardware affinity to prevent performance bottlenecks. Key modifications include:

Grid Dimension Refactoring: Adjusted the attention grid to a 2D structure. This change improves how thread blocks are mapped onto the hardware and prevents Unified Buffer (UB) overflow, so that each block's workload fits within the NPU's on-chip memory.

NPU-Affinity Softmax: Refactored the softmax tiling and grid dimensions to align with the NPU's compute unit sizes, improving throughput and reducing synchronization overhead.
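A rough way to reason about the grid change: split the launch into a (batch × head) axis and a query-block axis, and size the query block so that one program's tiles fit in the UB. The sketch below is plain Python, not the kernel from this PR. The helper names (`attention_tile_bytes`, `pick_block_m`, `grid_2d`, `decode_program_ids`) and the 64 KiB budget are hypothetical, and the footprint formula assumes fp16/bf16 inputs with fp32 scores and accumulators.

```python
import math
from typing import Tuple

# Hypothetical sketch: estimate the on-chip (UB) footprint of one attention
# program and pick the largest query block that fits a caller-supplied budget.
# The budget is an assumption; the real UB capacity and how much of it the
# compiler reserves (e.g. for double buffering) depend on the chip and toolchain.


def attention_tile_bytes(block_m: int, block_n: int, head_dim: int,
                         in_bytes: int = 2, acc_bytes: int = 4) -> int:
    """Approximate bytes one program keeps resident for a single Q/K/V tile step."""
    q = block_m * head_dim * in_bytes          # query tile (fp16/bf16)
    kv = 2 * block_n * head_dim * in_bytes     # key tile + value tile
    scores = block_m * block_n * acc_bytes     # QK^T scores / probabilities in fp32
    acc = block_m * head_dim * acc_bytes       # fp32 output accumulator
    stats = 2 * block_m * acc_bytes            # running max and running sum per query row
    return q + kv + scores + acc + stats


def pick_block_m(block_n: int, head_dim: int, ub_budget_bytes: int,
                 max_block_m: int = 128) -> int:
    """Halve the query block until the estimated tile footprint fits the budget."""
    block_m = max_block_m
    while block_m > 1 and attention_tile_bytes(block_m, block_n, head_dim) > ub_budget_bytes:
        block_m //= 2
    if attention_tile_bytes(block_m, block_n, head_dim) > ub_budget_bytes:
        raise ValueError("even a single-row query block exceeds the UB budget")
    return block_m


def grid_2d(batch: int, num_heads: int, seq_len: int, block_m: int) -> Tuple[int, int]:
    """Axis 0 enumerates (batch, head) pairs; axis 1 enumerates query blocks."""
    return (batch * num_heads, math.ceil(seq_len / block_m))


def decode_program_ids(pid0: int, pid1: int, num_heads: int,
                       block_m: int) -> Tuple[int, int, int]:
    """Recover (batch index, head index, first query row) from the two grid axes."""
    return pid0 // num_heads, pid0 % num_heads, pid1 * block_m


if __name__ == "__main__":
    block_m = pick_block_m(block_n=64, head_dim=64, ub_budget_bytes=64 * 1024)
    print(block_m)                                # 64
    print(grid_2d(2, 8, 1000, block_m))           # (16, 16)
    print(decode_program_ids(13, 3, 8, block_m))  # (1, 5, 192)
```

Because the query-block count lives on its own axis, shrinking `block_m` to respect the budget only changes axis 1 of the launch, and each program recovers its batch, head and query offset with one division and one modulo.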
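For the softmax change, one common pattern is to tile each row along the column dimension with a tile width that is a whole number of hardware alignment units, and to carry a running max and running sum across tiles so the result stays numerically stable. The plain-Python sketch below assumes a 32-byte alignment unit and a 1024-column cap; both values, and the helper names `aligned_block_cols` and `tiled_softmax`, are illustrative assumptions rather than details of this PR.

```python
import math
from typing import List

# Hypothetical sketch: choose a softmax column tile that is a whole number of
# alignment units, then compute a numerically stable softmax tile by tile
# (online max/sum), as a kernel would when a full row does not fit on chip.
# The 32-byte alignment unit is an assumption, not a value taken from the PR.


def aligned_block_cols(n_cols: int, dtype_bytes: int = 4,
                       align_bytes: int = 32, max_cols: int = 1024) -> int:
    """Round the column tile up to a multiple of align_bytes worth of elements."""
    elems_per_unit = align_bytes // dtype_bytes
    block = min(n_cols, max_cols)
    return ((block + elems_per_unit - 1) // elems_per_unit) * elems_per_unit


def tiled_softmax(row: List[float], block_cols: int) -> List[float]:
    """Pass 1 streams tiles to get the row max and sum; pass 2 normalizes."""
    if not row:
        raise ValueError("row must be non-empty")
    running_max = -math.inf
    running_sum = 0.0
    for start in range(0, len(row), block_cols):
        tile = row[start:start + block_cols]  # the last tile may be short (masked in a kernel)
        new_max = max(running_max, max(tile))
        # rescale the sum accumulated so far to the new max, then add this tile
        running_sum = running_sum * math.exp(running_max - new_max) + sum(
            math.exp(x - new_max) for x in tile)
        running_max = new_max
    return [math.exp(x - running_max) / running_sum for x in row]


if __name__ == "__main__":
    row = [0.5 * i for i in range(100)]
    block = aligned_block_cols(len(row))       # 100 fp32 columns rounded up to 104
    probs = tiled_softmax(row, block_cols=16)  # small tiles exercise the rescaling step
    print(block, round(sum(probs), 6))         # 104 1.0
```

In a real kernel the padded tail of the last tile would be masked to `-inf` before the max and exponentials are taken; Python slicing simply returns a shorter tile here.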

Details

Testing Done

Hardware Type: NPU (910B3)
run make test to ensure correctness
run make checkstyle to ensure code style
run make test-convergence to ensure convergence

Tcc0403 · 2026-01-22T10:04:03Z

Thank you! Could you also attach the benchmark results and keep comments in English?

Hailey-Zh · 2026-01-22T11:57:19Z

This is currently a draft, and there are still a few outstanding issues to resolve. I will include the benchmark results and switch all comments to English in the final version.

lowdy1 · 2026-02-05T09:05:03Z

Since we’re currently focused on the Ascend CI and this kernel is still not working, I was wondering whether you have the bandwidth to keep working on it. If you’d like, we could also help move it forward.

Hailey-Zh · 2026-02-05T12:41:38Z

We'll keep working on it.

support fused neighborhood attention for npu

2eca39f

xuedinge233 mentioned this pull request on Jan 27, 2026 in [NPU] RFC: Ascend CI Integration #1022 (Open)

Hailey-Zh added 2 commits February 5, 2026 20:47

change grid size

57755c2

delete draft file and num_wraps

45efab4
