Commit c7f40dd

unamedkr and claude committed
quantcpp 0.8.2: add quant_free_string, eliminate ask() leak
New public C API export in quant.h:

void quant_free_string(char* str);

The implementation lives in the same translation unit as quant_ask, so its free() call uses the dylib's own malloc zone, with no cross-heap abort on macOS arm64 / Windows. The Python ctypes wrapper calls it instead of the v0.8.1 skip-and-leak workaround.

Backwards compat: bindings use hasattr(lib, 'quant_free_string') so older loaded single-headers still work (with the old leak behavior). New PyPI installs ship the updated quant.h.

Verified: 3x consecutive Model.ask() calls clean exit under faulthandler.

Honest correction track now at 7 (all self-found, all logged).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent eb41858 commit c7f40dd

5 files changed

Lines changed: 56 additions & 9 deletions

CHANGELOG.md

Lines changed: 29 additions & 0 deletions
@@ -1,5 +1,34 @@
 # Changelog
 
+## [0.8.2] — 2026-04-09 (quant_free_string + leak fix)
+
+### Eliminated the v0.8.1 leak in `Model.ask()`
+
+v0.8.1 worked but leaked ~65 KB per `ask()` call, because the Python wrapper couldn't safely call `libc.free()` on a pointer allocated inside `libquant.dylib`'s malloc heap (cross-zone abort on macOS arm64).
+
+v0.8.2 adds a tiny new export to the public C API:
+
+```c
+// quant.h
+void quant_free_string(char* str);
+```
+
+The implementation lives in the same translation unit as `quant_ask`, so its `free()` call uses the dylib's malloc zone — same heap, no abort. The Python wrapper now calls `lib.quant_free_string(ptr)` instead of skipping the free.
+
+Backwards compat: the binding uses `hasattr(lib, 'quant_free_string')` so older single-headers loaded via `QUANTCPP_LIB=...` continue to work (with the old leak behavior). New installs from PyPI 0.8.2 ship the updated `quant.h`.
+
+Verified: `Model("model.gguf").ask("hi")` × 3 in a row, clean exit, no abort, no leak warning under faulthandler.
+
+### Honest correction track is now 7 (still all self-found)
+
+This is the 7th correction logged in v0.6.x → v0.8.x. Found by running `Model.ask` repeatedly in the v0.8.1 verification cycle. Goal: stay 100% self-found before any external user reports a regression.
+
+### Still pending in v0.8.x
+
+- **`kv_compress=1` / `=2` re-enable in Python bindings** — still requires `quant.h` regeneration against the v0.8.0+ multi-file source (the bundled header is an Apr-6 snapshot whose UNIFORM_4B path aborts on Llama). Tracked in [Issue #18](https://github.com/quantumaikr/quant.cpp/issues/18). Will land when we regenerate the single header.
+
+---
+
 ## [0.8.1] — 2026-04-09 (Python bindings hotfix)
 
 ### `pip install quantcpp` is now actually usable
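
The verification step described in the 0.8.2 entry can be approximated with a small smoke test. This is a sketch, not the project's own test code: `smoke_test` and its `make_model` parameter are hypothetical names, and the only assumption is that the object returned by `make_model()` has an `ask(prompt)` method, as `Model` does in the changelog text.

```python
import faulthandler


def smoke_test(make_model, prompt="hi", repeats=3):
    """Call ask() several times on one model with faulthandler enabled.

    make_model is a hypothetical zero-argument factory; any object with an
    ask(prompt) -> str method works.
    """
    # faulthandler dumps a Python traceback to stderr if the process gets a
    # fatal signal such as SIGSEGV or SIGABRT, e.g. from a bad free().
    try:
        faulthandler.enable()
    except (AttributeError, OSError, RuntimeError, ValueError):
        pass  # stderr has no usable file descriptor (e.g. captured output)

    model = make_model()
    return [model.ask(prompt) for _ in range(repeats)]
```

With the real package this would presumably be invoked as `smoke_test(lambda: Model("model.gguf"))`. A run that returns three strings and exits without a faulthandler traceback matches the "clean exit" reported in the changelog. It does not by itself prove the absence of a leak.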

bindings/python/pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "quantcpp"
-version = "0.8.1"
+version = "0.8.2"
 description = "Single-header LLM inference engine with KV cache compression (7× compression at fp32 parity)"
 readme = "README.md"
 license = { text = "Apache-2.0" }

bindings/python/quantcpp/__init__.py

Lines changed: 6 additions & 8 deletions
@@ -21,7 +21,7 @@
     from importlib.metadata import version as _pkg_version
     __version__ = _pkg_version("quantcpp")
 except Exception:
-    __version__ = "0.8.1" # fallback for editable / source-tree imports
+    __version__ = "0.8.2" # fallback for editable / source-tree imports
 
 import os
 import threading
@@ -159,13 +159,11 @@ def ask(self, prompt: str) -> str:
         result = ctypes.cast(ptr, ctypes.c_char_p).value
         text = result.decode("utf-8", errors="replace") if result else ""
 
-        # NOTE (v0.8.1): the C string returned by quant_ask is allocated
-        # inside libquant.dylib's malloc heap. Calling ctypes.CDLL(None).free
-        # on it crashes on macOS arm64 because Python's libc handle resolves
-        # to a different malloc zone than the dylib's. We accept a ~65 KB
-        # leak per ask() call as a temporary tradeoff. quant_free_ctx /
-        # quant_free_model release the bulk of the memory at end of session.
-        # Tracked: add quant_free_string(void*) to quant.h in v0.8.2.
+        # Free via the dylib's own free wrapper (added in v0.8.2). Falls back
+        # to a leak if the loaded library is an older single-header that
+        # doesn't export quant_free_string — preserves binary compat.
+        if hasattr(lib, "quant_free_string"):
+            lib.quant_free_string(ptr)
 
         return text
 
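
The ordering in `ask()` matters: the bytes are copied into a Python object first, and only then is the C buffer released. A minimal sketch of that pattern follows. It is POSIX-only and uses the process's own C library as a stand-in for libquant, with `strdup` playing the role of `quant_ask` and `free` the role of `quant_free_string`. The helper name `take_string` is hypothetical and is not part of the bindings.

```python
import ctypes

# Stand-in for libquant: the process's own C library (POSIX only).
# strdup plays the role of quant_ask, free the role of quant_free_string.
libc = ctypes.CDLL(None)
libc.strdup.argtypes = [ctypes.c_char_p]
libc.strdup.restype = ctypes.c_void_p  # keep the raw address
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def take_string(ptr, free_fn=None):
    """Copy a NUL-terminated C string into a str, then release the buffer.

    free_fn is the owning library's deallocator, or None if the loaded
    library does not export one (the buffer is then leaked).
    """
    if not ptr:
        return ""
    raw = ctypes.cast(ptr, ctypes.c_char_p).value  # copies the bytes
    text = raw.decode("utf-8", errors="replace") if raw else ""
    if free_fn is not None:
        free_fn(ptr)  # safe: `raw` is an independent Python bytes object
    return text


ptr = libc.strdup("héllo".encode("utf-8"))
print(take_string(ptr, libc.free))
```

Because allocation and release both go through the same library here, the allocator always matches. That is the property `quant_free_string` provides for strings returned by `quant_ask`.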

bindings/python/quantcpp/_binding.py

Lines changed: 7 additions & 0 deletions
@@ -134,6 +134,13 @@ def _setup_signatures(lib: ctypes.CDLL) -> None:
     lib.quant_ask.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
     lib.quant_ask.restype = ctypes.c_void_p # We use c_void_p so we can free()
 
+    # void quant_free_string(char*) — added in v0.8.2 to free quant_ask
+    # results without cross-heap libc.free() crashes on macOS arm64.
+    # Optional: older single-headers may not export this symbol.
+    if hasattr(lib, "quant_free_string"):
+        lib.quant_free_string.argtypes = [ctypes.c_void_p]
+        lib.quant_free_string.restype = None
+
     # void quant_free_ctx(quant_ctx* ctx)
     lib.quant_free_ctx.argtypes = [ctypes.c_void_p]
     lib.quant_free_ctx.restype = None
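
The `hasattr` guard works because ctypes resolves exported symbols lazily and raises `AttributeError` when a symbol is missing. A small sketch of that feature test, factored into a helper, is shown here. `bind_optional` is a hypothetical name, and the C library loaded via `ctypes.CDLL(None)` (POSIX only) stands in for libquant.

```python
import ctypes


def bind_optional(lib, name, argtypes, restype):
    """Declare the signature of `name` if `lib` exports it.

    Returns True when the symbol exists, False otherwise. ctypes looks the
    symbol up on attribute access and raises AttributeError if it is
    missing, which is what makes hasattr() usable as a feature test.
    """
    if not hasattr(lib, name):
        return False
    fn = getattr(lib, name)
    fn.argtypes = argtypes
    fn.restype = restype
    return True


libc = ctypes.CDLL(None)  # POSIX only; stands in for libquant
has_free = bind_optional(libc, "free", [ctypes.c_void_p], None)
has_bogus = bind_optional(libc, "hypothetical_missing_symbol", [], None)
print(has_free, has_bogus)
```

Declaring `argtypes = [ctypes.c_void_p]` is more than documentation. Without it, ctypes passes a Python integer as a C `int`, so a 64-bit address can be truncated before it reaches the callee.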

quant.h

Lines changed: 13 additions & 0 deletions
@@ -58,6 +58,12 @@ int quant_generate(quant_ctx* ctx, const char* prompt,
 // Generate and return full response as string. Caller must free().
 char* quant_ask(quant_ctx* ctx, const char* prompt);
 
+// Free a string returned by quant_ask. Always use this — never call libc
+// free() on the returned pointer directly. The string lives in the dylib's
+// malloc heap, which on macOS arm64 / Windows can be a different malloc
+// zone than the caller's libc.
+void quant_free_string(char* str);
+
 // Free resources.
 void quant_free_ctx(quant_ctx* ctx);
 void quant_free_model(quant_model* model);
@@ -15770,6 +15776,13 @@ char* quant_ask(quant_ctx* ctx, const char* prompt) {
     return output;
 }
 
+void quant_free_string(char* str) {
+    /* The string was malloc()'d inside this translation unit (quant_ask),
+     * so it must be free()'d here too — same malloc zone, no cross-heap
+     * crash on macOS arm64 / Windows. */
+    if (str) free(str);
+}
+
 void quant_free_ctx(quant_ctx* ctx) {
     if (!ctx) return;
     tq_free_state(ctx->state);