
fix: add missing prototype for turbo_cpu_fwht_inverse to resolve -Wmissing-prototypes CI error #12

Open
sujitvasanth wants to merge 1 commit into AtomicBot-ai:feature/turboquant-kv-cache from sujitvasanth:fix/turbo-fwht-prototype


@sujitvasanth


Overview

turbo_cpu_fwht_inverse was added in 0759506 without a prior prototype, so its definition triggers -Wmissing-prototypes. The expanded CI suite builds with -Werror, which promotes that warning to an error and causes every build to fail.
Fix: add a prototype (forward declaration) for turbo_cpu_fwht_inverse before its definition in ggml-turbo-quant.c.
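A minimal sketch of the pattern, assuming a hypothetical signature `void turbo_cpu_fwht_inverse(float * x, int n)` and a plain unnormalized in-place Walsh-Hadamard transform. The real signature and body in ggml-turbo-quant.c are not shown in this PR and may differ. The point is only where the prototype sits: GCC's -Wmissing-prototypes fires when a non-static function is defined with no earlier prototype in scope, and declaring it first silences the warning. The `static` helper needs no prototype because the warning ignores file-local functions.

```c
#include <assert.h>

/* Prototype placed before the definition so -Wmissing-prototypes has a
   previous declaration to match. HYPOTHETICAL signature for illustration. */
void turbo_cpu_fwht_inverse(float * x, int n);

/* In-place unnormalized fast Walsh-Hadamard butterfly (hypothetical helper).
   static, so -Wmissing-prototypes does not apply to it. */
static void fwht_butterfly(float * x, int n) {
    for (int h = 1; h < n; h *= 2) {
        for (int i = 0; i < n; i += 2 * h) {
            for (int j = i; j < i + h; ++j) {
                const float a = x[j];
                const float b = x[j + h];
                x[j]     = a + b;
                x[j + h] = a - b;
            }
        }
    }
}

/* Inverse of the unnormalized transform: since H * H = n * I,
   applying H again and scaling by 1/n recovers the input. */
void turbo_cpu_fwht_inverse(float * x, int n) {
    assert(n > 0 && (n & (n - 1)) == 0); /* length must be a power of two */
    fwht_butterfly(x, n);
    const float inv_n = 1.0f / (float) n;
    for (int i = 0; i < n; ++i) {
        x[i] *= inv_n;
    }
}
```

If the function were only used inside that one file, marking it `static` would also silence the warning; the PR instead keeps it non-static and adds the prototype.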

Additional information

Referenced in #8

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: Yes, co-written with Claude. I reran the build locally and it now compiles without warnings on Ubuntu 20.04.

github-actions (bot) added the ggml label on May 13, 2026
turbolego pushed a commit to turbolego/atomic-llama-cpp-turboquant that referenced this pull request Jun 22, 2026
…ILE FA routing (TheTom#176)

* HIP: fix turbo KV decode crash under graph capture; batch-aware VEC/TILE FA routing

Route small-batch (decode) quantized-KV flash attention through the graph-safe VEC kernel and let large prefill batches fall through to the fast TILE/MMA kernel. Make the f16 dequant temp allocation capture-aware: while a stream is capturing, allocate from the ggml pool (no cudaMalloc/cudaFree/cudaStreamSynchronize); for large eager prefill, keep the raw allocation so the multi-GB buffer is released immediately (gfx1201 has no VMM, so the legacy pool would retain it).

Fixes 'FLASH_ATTN_EXT failed: operation not permitted when stream is capturing' with GGML_HIP_GRAPHS=ON and turbo KV types on RDNA4. Tested on gfx1201 (Radeon AI PRO R9700, Windows, HIP SDK 7.1): pp2048 735 t/s (vs 188 t/s without graphs), tg128 22.9 t/s, no decode crash. Possibly related: AtomicBot-ai#12.

* fattn (HIP): note pool-retention tradeoff for non-VEC captured decode

Address review on TheTom#176: document that head_dim==192 / K-stride-mismatch
configs fall through to the TILE/MMA path under capture and pool-alloc the
full f16 dequant buffer, which the legacy pool retains permanently -- a VRAM
tradeoff, not a crash. VEC-eligible head dims (Gemma) never hit this.

---------

Co-authored-by: KaiAtAdesso <KaiAtAdesso@users.noreply.github.com>
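The routing and allocation rules described in the referencing commit can be sketched as two small decision functions. Everything here is hypothetical: the enum and function names, and especially the `DECODE_MAX_BATCH` cutoff between decode and prefill, which the commit does not state. The VEC-eligibility test follows the review note (head_dim == 192 or a K-stride mismatch falls through to TILE/MMA), and returning "no dequant buffer" for VEC is an inference from the note that VEC-eligible head dims never hit the pool-retention path. This is not the actual fattn code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL names; not the identifiers used in the HIP/CUDA backend. */
typedef enum { FATTN_KERNEL_VEC, FATTN_KERNEL_TILE_MMA } fattn_kernel;
typedef enum { DEQUANT_ALLOC_NONE, DEQUANT_ALLOC_POOL, DEQUANT_ALLOC_RAW } dequant_alloc;

/* HYPOTHETICAL decode/prefill cutoff; the commit gives no number. */
#define DECODE_MAX_BATCH 8

/* Per the review note: head_dim 192 and K-stride mismatches are not VEC-eligible. */
static bool vec_eligible(int head_dim, bool k_stride_matches) {
    return head_dim != 192 && k_stride_matches;
}

/* Small (decode) batches go to the graph-safe VEC kernel when eligible;
   everything else falls through to TILE/MMA. */
static fattn_kernel route_turbo_fattn(int64_t n_queries, int head_dim, bool k_stride_matches) {
    assert(n_queries > 0);
    if (n_queries <= DECODE_MAX_BATCH && vec_eligible(head_dim, k_stride_matches)) {
        return FATTN_KERNEL_VEC;
    }
    return FATTN_KERNEL_TILE_MMA;
}

/* Capture-aware choice for the f16 dequant temp buffer:
   - VEC: assumed to need no full f16 dequant buffer (inferred, see lead-in).
   - capturing: pool allocation, since raw alloc/free/sync is not allowed
     while a stream is capturing; the legacy pool then retains the buffer.
   - eager: raw allocation, so a multi-GB prefill buffer is freed at once. */
static dequant_alloc choose_dequant_alloc(fattn_kernel kernel, bool stream_capturing) {
    if (kernel == FATTN_KERNEL_VEC) {
        return DEQUANT_ALLOC_NONE;
    }
    return stream_capturing ? DEQUANT_ALLOC_POOL : DEQUANT_ALLOC_RAW;
}
```

Under these assumptions, a captured decode step with head_dim 192 routes to TILE/MMA and takes the pool path. That is the VRAM-retention tradeoff the review note documents, not a crash.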
