Summary
TQ+ per-coordinate calibration is fitted from the first add batch only and then frozen for the life of the index. If that first batch has fewer than TQPLUS_MIN_SAMPLES (1000) vectors, the fit falls back to identity (shift=0, scale=1) — but that identity is stored as non-empty and frozen, so every subsequent add (even millions of vectors) silently reuses identity and never fits real TQ+. The index permanently loses the TQ+ recall gain, with no error or warning.
The empty-first-add case was already fixed (the n == 0 early-return in add); the 1–999 case was not.
Repro
idx = TurboQuantIndex(dim=128, bit_width=4)
idx.add(seed_vectors) # e.g. 500 vectors -> identity calibration, frozen
idx.add(one_million_vectors) # never re-fits; whole index runs on identity
Root cause
- encode::compute_tqplus_calibration returns identity when n < TQPLUS_MIN_SAMPLES (turbovec/src/encode.rs:149).
- TurboQuantIndex::add fits and freezes calibration on the first add only (turbovec/src/lib.rs:298-307); subsequent adds reuse the frozen value.
- The index discards the original vectors after encoding (keeps only quantized codes + scales), and every encoded vector must share one frozen calibration (one coordinate system). So calibration cannot be re-fitted "late" — the already-encoded vectors can't be re-encoded.
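The freeze described above can be modelled in a few lines. This is a toy sketch, not the turbovec code: `ToyIndex` and `compute_calibration` are hypothetical names, and the mean/std fit is only a placeholder for whatever statistics TQ+ actually computes. It shows only the control flow: identity is returned for a small first batch, stored as a real value, and never revisited.

```python
import numpy as np

TQPLUS_MIN_SAMPLES = 1000  # threshold named in the issue


def compute_calibration(batch):
    """Toy stand-in: per-coordinate (shift, scale), or identity if too few rows."""
    n, dim = batch.shape
    if n < TQPLUS_MIN_SAMPLES:
        return np.zeros(dim), np.ones(dim)  # identity fallback
    # Placeholder fit (assumption): the real TQ+ statistics may differ.
    return batch.mean(axis=0), batch.std(axis=0) + 1e-12


class ToyIndex:
    """Hypothetical model of the first-add freeze; not the turbovec code."""

    def __init__(self, dim):
        self.dim = dim
        self.calibration = None  # None means "not fitted yet"
        self.count = 0

    def add(self, batch):
        if len(batch) == 0:
            return  # the already-fixed empty-first-add case
        if self.calibration is None:
            # Fitted once, then frozen, even when the result is identity.
            self.calibration = compute_calibration(batch)
        self.count += len(batch)

    def is_identity(self):
        shift, scale = self.calibration
        return bool(np.all(shift == 0) and np.all(scale == 1))


rng = np.random.default_rng(0)
idx = ToyIndex(dim=8)
idx.add(rng.normal(3.0, 2.0, size=(500, 8)))     # small seed -> identity, frozen
idx.add(rng.normal(3.0, 2.0, size=(50_000, 8)))  # large load never re-fits
print(idx.is_identity())  # True: the whole index runs on identity
```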
Why the trivial fix is insufficient
The minimal fix ("don't freeze identity; fit on the first batch that is >= 1000") handles "small seed then big load" but still fails for drip-fed small batches (every add < 1000 -> never calibrates), and only ever reflects one batch's distribution.
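A small model makes the drip-feed failure concrete. `MinimalFixIndex` is a hypothetical class that tracks only batch sizes, assuming the minimal fix is implemented as "fit on the first batch of at least TQPLUS_MIN_SAMPLES vectors".

```python
TQPLUS_MIN_SAMPLES = 1000


class MinimalFixIndex:
    """Hypothetical model of the minimal fix: fit on the first batch >= threshold."""

    def __init__(self):
        self.calibrated = False
        self.count = 0

    def add(self, batch_size):
        # Only batch sizes matter for this illustration, so no vectors are stored.
        if not self.calibrated and batch_size >= TQPLUS_MIN_SAMPLES:
            self.calibrated = True
        self.count += batch_size


seed_then_bulk = MinimalFixIndex()
seed_then_bulk.add(500)
seed_then_bulk.add(1_000_000)

drip = MinimalFixIndex()
for _ in range(2_000):
    drip.add(999)  # every batch is just under the threshold

print(seed_then_bulk.calibrated, drip.calibrated, drip.count)
# True False 1998000
```

The drip-fed index holds almost two million vectors and has still never calibrated.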
Proper fix: dual-mode warm-up
Because originals are discarded and a single coordinate system is required, doing this properly means a dual-mode index:
- Warm-up (below the sample threshold): buffer the raw vectors, don't quantize yet, serve search by exact brute-force over that small buffer (<= threshold vectors — cheap, and higher quality than quantized at small scale).
- At the threshold: fit calibration from the full warm-up set, encode the whole buffer at once, freeze, drop the raw buffer.
- Steady state: stream-quantize with the frozen calibration, as today.
This eliminates the silent identity-freeze and calibrates from a proper cumulative sample, while keeping vectors searchable throughout.
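The three phases above could be sketched as follows. Everything here is illustrative: `WarmupIndex` is a hypothetical class, the uint8 rounding stands in for the real TurboQuant encoder, the mean/std fit stands in for the real TQ+ calibration, and inner-product scoring is an assumption. In this sketch the batch that crosses the threshold is included in the fit, so the buffer can briefly exceed the threshold.

```python
import numpy as np


class WarmupIndex:
    """Hypothetical dual-mode index; quantization is a placeholder (uint8 rounding)."""

    def __init__(self, dim, threshold=1000):
        self.dim = dim
        self.threshold = threshold
        self.buffer = []         # raw float32 rows held during warm-up
        self.calibration = None  # (shift, scale) once frozen
        self.codes = np.empty((0, dim), dtype=np.uint8)

    @property
    def warmed_up(self):
        return self.calibration is not None

    def _encode(self, x):
        shift, scale = self.calibration
        z = (x - shift) / scale  # calibrated coordinates
        return np.clip(np.round(z * 32 + 128), 0, 255).astype(np.uint8)

    def _decode(self, codes):
        shift, scale = self.calibration
        return ((codes.astype(np.float32) - 128) / 32) * scale + shift

    def add(self, batch):
        batch = np.asarray(batch, dtype=np.float32).reshape(-1, self.dim)
        if self.warmed_up:
            # Steady state: stream-quantize with the frozen calibration.
            self.codes = np.vstack([self.codes, self._encode(batch)])
            return
        self.buffer.extend(batch)
        if len(self.buffer) >= self.threshold:
            raw = np.stack(self.buffer)
            # Fit from the full cumulative warm-up set, then freeze.
            self.calibration = (raw.mean(axis=0), raw.std(axis=0) + 1e-6)
            self.codes = self._encode(raw)
            self.buffer = []  # drop the raw vectors

    def search(self, query, k):
        query = np.asarray(query, dtype=np.float32)
        if self.warmed_up:
            data = self._decode(self.codes)  # approximate scores
        elif self.buffer:
            data = np.stack(self.buffer)     # exact brute force in warm-up
        else:
            return np.empty(0, dtype=np.int64)
        return np.argsort(-(data @ query))[:k]


rng = np.random.default_rng(0)
idx = WarmupIndex(dim=16, threshold=1000)
for _ in range(3):
    idx.add(rng.normal(size=(400, 16)))  # drip-fed batches, each below threshold
print(idx.warmed_up, len(idx.buffer), idx.codes.shape)
# True 0 (1200, 16)
```

Unlike the minimal fix, drip-fed batches accumulate toward the threshold, so calibration happens on the third add here even though no single batch reaches 1000.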
Open design decisions
- Search during warm-up: brute-force exact over the buffer (recommended) vs return nothing until warmed up.
- Persistence: persist the raw buffer + mode (new .tv/.tvim format version) vs force a flush-with-current-data on save.
- Keep TQPLUS_MIN_SAMPLES = 1000?
- Memory: buffer costs up to threshold * dim * 4 bytes raw (~6 MB at dim=1536, threshold 1000).
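The memory figure follows directly from the formula. A quick check, assuming float32 storage (4 bytes per value); `warmup_buffer_bytes` is a hypothetical helper name, and the dimensions other than 1536 are illustrative:

```python
def warmup_buffer_bytes(threshold, dim, bytes_per_value=4):
    """Upper bound on raw warm-up buffer size, assuming float32 storage."""
    return threshold * dim * bytes_per_value


for dim in (128, 768, 1536):
    size = warmup_buffer_bytes(1000, dim)
    print(f"dim={dim}: {size / 1e6:.2f} MB")
# dim=128: 0.51 MB
# dim=768: 3.07 MB
# dim=1536: 6.14 MB
```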
Prior art
Qdrant's TurboQuant implementation makes the same base-vs-calibration split and solves the sampling with a streaming estimator (P²/P-Square over a Vitter's Algorithm R reservoir), because it quantizes during a build/optimize phase when vectors are present. turbovec is pure online ingest, hence the warm-up-buffer approach above. See https://qdrant.tech/articles/turboquant-quantization/
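For reference, the reservoir half of that approach is the textbook Algorithm R. The sketch below is the generic algorithm only, not Qdrant's code, and it omits the P² quantile estimator; `reservoir_sample` is a hypothetical helper name.

```python
import random


def reservoir_sample(stream, k, seed=0):
    """Vitter's Algorithm R: uniform sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir first
        else:
            j = rng.randint(0, i)   # inclusive on both ends
            if j < k:
                reservoir[j] = item  # replace with probability k / (i + 1)
    return reservoir


sample = reservoir_sample(range(1_000_000), k=1000)
print(len(sample))  # 1000
```

A reservoir still has to hold raw vectors, so for turbovec it does not remove the need to encode everything under one frozen calibration.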
Positioning note (separate)
The base TurboQuant is genuinely data-oblivious / training-free; TQ+ is a lightweight data-dependent calibration. The "no training" claim is fine when scoped to the base algorithm (as Qdrant scopes it), but README/docs should be precise that TQ+ does a data-dependent calibration step. Worth a docs precision pass, tracked separately from the code change.