quantcpp 0.10.1: includes S2 infinite scrollback + S4 context persistence

unamedkr · unamedkr · commit cc74de27ccb0 · 2026-04-09T23:51:34.000+09:00
diff --git a/bindings/python/pyproject.toml b/bindings/python/pyproject.toml
@@ -7,7 +7,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "quantcpp"
-version = "0.10.0"
+version = "0.10.1"
 description = "Single-header LLM inference engine with KV cache compression (7× compression at fp32 parity)"
 readme = "README.md"
 license = { text = "Apache-2.0" }
diff --git a/bindings/python/quantcpp/__init__.py b/bindings/python/quantcpp/__init__.py
@@ -19,7 +19,7 @@
     from importlib.metadata import version as _pkg_version
     __version__ = _pkg_version("quantcpp")
 except Exception:
-    __version__ = "0.10.0"  # fallback for editable / source-tree imports
+    __version__ = "0.10.1"  # fallback for editable / source-tree imports
 
 import os
 import sys
diff --git a/docs/strategy_progressive_kv.md b/docs/strategy_progressive_kv.md
@@ -52,5 +52,15 @@ degradation from +3.8% to +0.6% at 28 KB cost.
 - Added progressive=True to Model()
 - Published v0.10.0 to PyPI
 
-### Round 3: Infinite Scrollback (IN PROGRESS)
-- Goal: replace "context exceeded → stop" with "context full → compress oldest → continue"
+### Round 3: Infinite Scrollback (DONE)
+- Implemented context shift in tq_generate.c + quant.h
+- Verified: SmolLM2-135M at ctx=64, 500 tokens with 9 auto-shifts
+- Context never overflows; generation continues instead of stopping
+
+### Round 4: Compressed Persistence (DONE)
+- quant_save_context / quant_load_context API
+- QKVC file format: 64-byte header + raw compressed KV data
+- Python: m.save_context("doc.kv") / m.load_context("doc.kv")
+- "Read once, query forever": save/load round-trip verified
+
+### Round 5: S5 WASM Demo or PyPI publish v0.10.1 (NEXT)
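The context shift described in the Round 3 notes can be illustrated with a toy model. This is a minimal sketch under an assumed policy, not the code in tq_generate.c or quant.h. `ShiftingContext`, `n_keep`, and the keep-prefix/drop-half rule are hypothetical (the rule mirrors the common llama.cpp-style shift). The sketch also discards old positions, whereas the original goal line speaks of compressing them. It tracks only token ids. A real KV cache would also have to move K/V entries and adjust positional encodings.

```python
from dataclasses import dataclass, field


@dataclass
class ShiftingContext:
    """Toy fixed-size context that shifts on overflow instead of stopping.

    Assumed policy (hypothetical, not taken from tq_generate.c): keep the
    first `n_keep` positions and discard the oldest half of the rest.
    Only token ids are tracked; no K/V data or positions are modeled.
    """
    n_ctx: int
    n_keep: int = 0
    tokens: list = field(default_factory=list)
    shifts: int = 0

    def __post_init__(self) -> None:
        # Each shift must free at least one slot, or append() would overflow.
        if self.n_ctx - self.n_keep < 2:
            raise ValueError("need n_ctx - n_keep >= 2")

    def append(self, token: int) -> None:
        if len(self.tokens) == self.n_ctx:  # window full: shift, don't stop
            self._shift()
        self.tokens.append(token)

    def _shift(self) -> None:
        n_discard = (len(self.tokens) - self.n_keep) // 2
        # Drop the oldest n_discard tokens that follow the protected prefix.
        del self.tokens[self.n_keep:self.n_keep + n_discard]
        self.shifts += 1


ctx = ShiftingContext(n_ctx=64, n_keep=4)
for t in range(500):
    ctx.append(t)
print(len(ctx.tokens), ctx.shifts)  # 50 15
```

The shift count printed here follows from the toy policy and the prompt-free setup. It is not expected to match the 9 auto-shifts reported for SmolLM2-135M.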
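The "read once, query forever" workflow amounts to three steps. First, ingest a document once. Then save the compressed context. Finally, restore it in later sessions instead of re-processing the text. The following is a minimal sketch, assuming `save_context(path)` and `load_context(path)` behave as the Python lines in the notes suggest. `load_or_build_context` and the `build` callback (which stands in for however the document is actually fed to the model) are hypothetical.

```python
import os


def load_or_build_context(model, cache_path: str, build) -> bool:
    """Restore a saved context if present; otherwise build it once and save it.

    `model` is assumed to expose save_context(path) / load_context(path) as in
    the commit notes; `build(model)` is a hypothetical caller-supplied step that
    feeds the document through the model. Returns True if it had to build.
    """
    if os.path.exists(cache_path):
        model.load_context(cache_path)  # cheap path: reuse saved KV from disk
        return False
    build(model)                        # expensive path: process the document once
    model.save_context(cache_path)      # persist the context for later sessions
    return True
```

Any object with those two methods works, so the helper can be exercised with a stub in place of a real model. The cache is keyed only on the file path. If the document or model changes, the stale file would still be loaded, so a real setup might put a content hash or model identifier in the filename.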