Releases: sgl-project/sgl-kernel-npu
20251128
What's Changed
- Add internode test for deepep by @zuje123 in #193
- Support running normal-mode deepep on a single A2 machine by @luanyundu in #201
- [Test] Testing the generalization of fused moe by @kaniel-outis in #167
- Add whl packages to GitHub Release by @BourneSun0527 in #204
- Add two scripts by @DubhepPan in #119
- Support LongCat on A3 by @luanyundu in #182
- Calculate dispatch normal-mode input parameters on NPU instead of CPU by @lih827 in #177
- Add alloc_extend_kernel by @hw-csong in #196
- Modify deepep README_CN.md by @oagniqgnat in #187
- notify_dispatch kernel: change magic from int32_t to uint64_t by @zuje123 in #202
Full Changelog: 2025112...2025112
20251120
What's Changed
- Dispatch and combine support batch size 4096 on A2 by @ruiqiangworking in #173
- remove redundant check by @ruiqiangworking in #175
- Optimize deepep setup; include CANN version in package name by @zuje123 in #178
- deepep low_latency dispatch/combine support a single A2 server by @zuje123 in #176
- Add README files for mlapo and batch_transpose_matmul by @randgun in #104
- Support devices with different AICore counts (FusedDeepMoe operator) by @wangqiankun13 in #180
- Add triton decode attention kernels by @RuixuanZhang06 in #184
- fix cann version check by @hustmf in #188
- Update the HCCL_BUFFSIZE verification for moe by @goosj in #183
- Fix a bug in the transfer_kv op by @husf1130 in #194
- add_norm_bias and split_qkv_norm_rope for qwen3 by @chenxu140 in #157
- [Chore] Upgrade CANN to 8.3.RC1 by @iforgetmyname in #195
New Contributors
- @hustmf made their first contribution in #188
- @chenxu140 made their first contribution in #157
Full Changelog: 2025111...2025112
20251110
What's Changed
- Added custom low_latency operators for dispatch/combine in the A2 dec… by @oagniqgnat in #166
- deepep support internode api by @zuje123 in #169
- add layout to ops2 directory by @luanyundu in #171
- Modify the deep_ep README and add A2 operator performance data by @oagniqgnat in #168
- feat: add verify_tree_greedy_kernel triton kernel by @ranjiewen in #165
- optimize a2 layered combine kernel code by @ruiqiangworking in #172
- feat: tiny bugfix & performance optimization by @Yael-X in #170
New Contributors
- @ruiqiangworking made their first contribution in #172
Full Changelog: 2025110...2025111
20251106
What's Changed
- Add dependency on the moe header file of CANN by @DubhepPan in #152
- Support small batch sizes (bs = 1 or 2) by @wangyibo1005 in #150
- feat: adapt x86_64 compilation by @Yael-X in #143
- [DFX] Compatible with CANN 8.2 and CANN 8.3 by @kaniel-outis in #158
- add mla_preprocess test script by @LinyuanLi0046 in #153
- [DFX] adapt CANN 8.3 by @kaniel-outis in #159
- [bugfix] swiglu quant by @Liwansi in #162
- [New Ops] build tree efficient by @hw-csong in #161
- support shallow fused topk=-1 by @wangyibo1005 in #160
- support kvcacheio by @husf1130 in #163
- improve layout kernel on a2 by @luanyundu in #164
New Contributors
- @DubhepPan made their first contribution in #152
- @Liwansi made their first contribution in #162
- @hw-csong made their first contribution in #161
Full Changelog: 2025103...2025110
20251030
What's Changed
- add a2 dispatch layout and update its test by @luanyundu in #149
- support topk=-1 by @wangyibo1005 in #132
- add env to decide whether send out prefix sum or not by @luanyundu in #151
- refactor: make hiddenStateDim a class member in MlaTilingData, follow-up to closed PR #82 by @LinyuanLi0046 in #133
- support cachemode int8_nzcache with bf16 in mla_preprocess by @LinyuanLi0046 in #135
- add op transfer_kv_dim_exchange by @husf1130 in #148
- impl fused_swiglu_quant with group_list for deepep-low-latency by @xiaobaicxy in #155
- [Kernel] add Flash-Linear-Attention/layernorm_gated Triton op by @iforgetmyname in #154
New Contributors
- @LinyuanLi0046 made their first contribution in #133
- @husf1130 made their first contribution in #148
- @xiaobaicxy made their first contribution in #155
Full Changelog: 2025102...2025103
20251023
What's Changed
- Change the padding generation from randperm back to arange by @oagniqgnat in #140
- LoRA: moving kernels from vllm-ascend repo by @vlserov in #128
- Update README.md of DeepEp by @goosj in #144
Full Changelog: 2025102...2025102
20251022
What's Changed
- Update README.md: add performance of normal and low latency dispatch/combine by @oagniqgnat in #106
- Support debug info for build by @jia-rundong in #99
- Update README by @oagniqgnat in #115
- Synchronous fusion moe by @kaniel-outis in #108
- Fix the severe performance degradation issue of the top9 dispatch in normal mode compared to top8. by @oagniqgnat in #117
- feat: add moe fused operator test draft by @Yael-X in #120
- mlapo fit different hidden state dim by @Todobe in #82
- Not use download.pytorch.org by @jia-rundong in #121
- EPLB for fused_deep_moe by @wangyibo1005 in #116
- [FusedDeepMoe] Support EPLB by @kaniel-outis in #118
- Support different token hidden sizes and gmm hidden sizes [FusedDeepMoe Operator] by @wangqiankun13 in #123
- Remove leftover unused code [FusedDeepMoe Operator] by @wangqiankun13 in #129
- update qwen3-next performance kernels by @iforgetmyname in #130
- [Bugfix] Remove unused code that causes split failure in Qwen3-Next by @iforgetmyname in #142
New Contributors
- @Todobe made their first contribution in #82
- @wangqiankun13 made their first contribution in #123
Full Changelog: 2025092...2025102