fix: add missing prototype for turbo_cpu_fwht_inverse to resolve -Wmissing-prototypes CI error #12
Open
sujitvasanth wants to merge 1 commit into
turbolego pushed a commit to turbolego/atomic-llama-cpp-turboquant that referenced this pull request Jun 22, 2026

…ILE FA routing (TheTom#176)

* HIP: fix turbo KV decode crash under graph capture; batch-aware VEC/TILE FA routing

Route small-batch (decode) quantized-KV flash attention through the graph-safe VEC kernel and let large prefill batches fall through to the fast TILE/MMA kernel.

Make the f16 dequant temp allocation capture-aware: allocate from the ggml pool while a stream is capturing (no cudaMalloc/cudaFree/cudaStreamSynchronize), keep raw alloc for large eager prefill so the multi-GB buffer is released immediately (gfx1201 has no VMM, the legacy pool would retain it).

Fixes 'FLASH_ATTN_EXT failed: operation not permitted when stream is capturing' with GGML_HIP_GRAPHS=ON and turbo KV types on RDNA4.

Tested on gfx1201 (Radeon AI PRO R9700, Windows, HIP SDK 7.1): pp2048 735 t/s (vs 188 t/s without graphs), tg128 22.9 t/s, no decode crash.

Possibly related: AtomicBot-ai#12.

* fattn (HIP): note pool-retention tradeoff for non-VEC captured decode

Address review on TheTom#176: document that head_dim==192 / K-stride-mismatch configs fall through to the TILE/MMA path under capture and pool-alloc the full f16 dequant buffer, which the legacy pool retains permanently -- a VRAM tradeoff, not a crash. VEC-eligible head dims (Gemma) never hit this.

---------

Co-authored-by: KaiAtAdesso <KaiAtAdesso@users.noreply.github.com>
Overview
turbo_cpu_fwht_inverse was added in 0759506 without a forward declaration. Because it is a non-static function with no prior prototype, its definition triggers -Wmissing-prototypes. The expanded CI suite builds with -Werror, so the warning becomes a hard error and every build fails.
Fix: add a forward declaration ahead of the function definition in ggml-turbo-quant.c.
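For reviewers less familiar with the warning, here is a minimal sketch of the pattern. The function name example_fwht_inverse, its signature, and the 1/n normalization are assumptions made for illustration only; they are not taken from ggml-turbo-quant.c, and the real turbo_cpu_fwht_inverse may differ. The point is only the prototype placed before the definition.

```c
#include <assert.h>
#include <stddef.h>

/* Forward declaration (prototype). With -Wmissing-prototypes, GCC and Clang
 * warn when a non-static function is defined without a previous prototype.
 * Declaring it first, here or in a shared header, satisfies the check.
 * NOTE: name and signature are hypothetical, not the real turbo function. */
void example_fwht_inverse(float *x, size_t n);

/* In-place inverse Walsh-Hadamard transform for a power-of-two length n.
 * Assumes the forward transform is unnormalized, so the inverse is the same
 * butterfly pass followed by a 1/n scale. */
void example_fwht_inverse(float *x, size_t n) {
    assert(n != 0 && (n & (n - 1)) == 0); /* n must be a power of two */

    for (size_t len = 1; len < n; len <<= 1) {
        for (size_t i = 0; i < n; i += 2 * len) {
            for (size_t j = i; j < i + len; ++j) {
                float a = x[j];
                float b = x[j + len];
                x[j]       = a + b;
                x[j + len] = a - b;
            }
        }
    }

    const float scale = 1.0f / (float) n;
    for (size_t i = 0; i < n; ++i) {
        x[i] *= scale;
    }
}
```

If a function is only used inside one translation unit, marking it static is the other common way to silence the warning; a prototype is the right choice when other files need to call it.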
Additional information
Referenced in #8
Requirements