forked from ggml-org/llama.cpp
Forks: 326
Pull requests: TheTom/llama-cpp-turboquant
Pull requests list
vulkan: bound per-workgroup KV in flash attention (candidate fix for #185 device-lost)
ggml
Vulkan
#186
opened Jun 19, 2026 by
TheTom
Owner
Feature/vulkan fa large buffer
ggml
Nvidia GPU
testing
Vulkan
#181
opened Jun 13, 2026 by
Yvi71
ggml: add ROCmFP4 CPU quantization (experimental Q4_0_ROCMFP4 / _FAST)
examples
ggml
#170
opened Jun 6, 2026 by
TheTom
Owner
hip: VEC flash-attn for D=512 (Gemma 4) on ROCm with quantized KV
ggml
Nvidia GPU
#156
opened May 24, 2026 by
cclecle
vulkan: add TurboQuant KV cache support and optimized turbo mat-vec paths
ggml
Vulkan
#140
opened May 10, 2026 by
Fenix46
fix(qwen35): support Qwen3.5:9B loading from Ollama GGUF
model
#135
opened May 8, 2026 by
Jordan-HS
vendor: bump cpp-httplib to 0.43.2 (openssl 4.0.0 fix)
python
script
#121
opened May 4, 2026 by
TheTom
Owner
1 of 3 tasks
HIP mixed TurboQuant vec FA on gfx900/gfx906
build
ggml
Nvidia GPU
#99
opened Apr 21, 2026 by
2bigO
perf: turbo VEC flash attention — +9% decode on CUDA via autoresearch
ggml
Nvidia GPU
script
#53
opened Apr 4, 2026 by
signalnine
7 tasks done
fix: HIP/ROCm compatibility — check cudaMemcpyToSymbol errors, guard …
ggml
Nvidia GPU
#41
opened Apr 1, 2026 by
terrysimons
Draft