
SYCL Turboquant implementation attempt #144

Open
cclecle wants to merge 3 commits into TheTom:feature/turboquant-kv-cache from cclecle:feature/turboquant-kv-cache


@cclecle cclecle commented May 13, 2026


Overview

SYCL implementation written with Claude, lightly tested on an Intel A380 with oneAPI 2025.2.

Additional information

The code has not been properly reviewed.

Requirements


TheTom commented Jun 4, 2026

Owner

Thanks for this, we really appreciate the SYCL port. We reviewed it and are happy to merge. It is cleanly isolated to ggml/src/ggml-sycl/, so it can't affect the CUDA, Metal, or ROCm paths, and the hardcoded centroid and WHT sign tables byte-match our canonical turbo tables, so the kernels look like a faithful port of the CUDA set-rows.cu.

Before we pull it we just need some compilation and test evidence, since we don't have an Intel/oneAPI box on our side to verify it ourselves:

  1. A clean -DGGML_SYCL=ON build log (icpx), ideally on current feature/turboquant-kv-cache.
  2. A quick functional check that turbo KV actually works on your A380, e.g. llama-server/llama-cli with -ctk turbo3 -ctv turbo3 producing coherent output, and if you can, a short llama-perplexity run showing turbo3 PPL close to q8_0 (our gate is within 5%).
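For reference, one way to read the 5% perplexity gate from item 2 is sketched below. This is only an illustration: the function name `within_ppl_gate` is hypothetical, and it assumes the gate is one-sided (turbo3 PPL may be at most 5% higher than q8_0, measured relative to q8_0), which the comment above does not spell out.

```python
def within_ppl_gate(ppl_candidate: float, ppl_reference: float,
                    tolerance: float = 0.05) -> bool:
    """Return True if the candidate PPL is at most `tolerance` worse than the reference.

    Assumption: the gate is relative to the reference (e.g. q8_0) and one-sided,
    so a candidate with *lower* perplexity always passes.
    """
    if ppl_reference <= 0:
        raise ValueError("reference perplexity must be positive")
    relative_increase = (ppl_candidate - ppl_reference) / ppl_reference
    return relative_increase <= tolerance
```

With made-up numbers, `within_ppl_gate(6.2, 6.0)` passes (about 3.3% higher), while `within_ppl_gate(6.4, 6.0)` does not (about 6.7% higher).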

Our build-sycl.yml CI (ubuntu-24.04 + oneAPI) can also produce the compile half. It is currently sitting at action_required because it is a fork PR, so once you push any update we can approve the run and let it compile against current HEAD.

A couple of small things worth a look while you're at it: the new src[0]->type == GGML_TYPE_F32 gate tightens the non-turbo set_rows path too (intended?), and please double check WHT_SIGNS2 and the turbo4 rnorm=0 write against the CUDA reference. Nothing blocking. Thanks again, nice work.
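A table double check like the one suggested for WHT_SIGNS2 could be done with a small helper along these lines. It is a generic sketch, not code from either backend: `first_mismatch` is a hypothetical name, and it assumes both tables have already been extracted into plain Python sequences.

```python
from typing import Optional, Sequence


def first_mismatch(ported: Sequence[float],
                   reference: Sequence[float]) -> Optional[int]:
    """Return the index of the first differing entry, or None if the tables match.

    If one table is a strict prefix of the other, the index just past the
    shorter table is returned, so a length difference counts as a mismatch.
    """
    for i, (p, r) in enumerate(zip(ported, reference)):
        if p != r:
            return i
    if len(ported) != len(reference):
        return min(len(ported), len(reference))
    return None
```

Reporting the first differing index rather than a plain yes/no makes it easier to locate a transcription slip in a long hardcoded table.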
