
TQ+ calibration: silent identity-freeze on small first add; fit from a cumulative warm-up sample #107

Description

@RyanCodrai

Summary

TQ+ per-coordinate calibration is fitted from the first add batch only and then frozen for the life of the index. If that first batch has fewer than TQPLUS_MIN_SAMPLES (1000) vectors, the fit falls back to identity (shift=0, scale=1). That identity is nevertheless stored as a non-empty calibration and frozen, so every subsequent add (even one of millions of vectors) silently reuses it and never fits real TQ+. The index permanently loses the TQ+ recall gain, with no error or warning.

The empty-first-add case was already fixed (the n == 0 early-return in add); the 1–999 case was not.

Repro

idx = TurboQuantIndex(dim=128, bit_width=4)
idx.add(seed_vectors)          # e.g. 500 vectors  -> identity calibration, frozen
idx.add(one_million_vectors)   # never re-fits; whole index runs on identity

Root cause

  • encode::compute_tqplus_calibration returns identity when n < TQPLUS_MIN_SAMPLES (turbovec/src/encode.rs:149).
  • TurboQuantIndex::add fits + freezes calibration on the first add only (turbovec/src/lib.rs:298-307); subsequent adds reuse the frozen value.
  • The index discards the original vectors after encoding (keeps only quantized codes + scales), and every encoded vector must share one frozen calibration (one coordinate system). So calibration cannot be re-fitted "late" — the already-encoded vectors can't be re-encoded.
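The three points above combine into the bug. Here is a toy Python model of today's add() flow. It is not the real TurboQuantIndex. `FirstAddFreezeIndex` and `compute_calibration` are hypothetical names. The per-coordinate mean/std fit is an ASSUMPTION standing in for whatever TQ+ actually computes. Only the identity fallback and the freeze-on-first-add logic come from the issue.

```python
import random
import statistics

TQPLUS_MIN_SAMPLES = 1000  # threshold named in the issue


def compute_calibration(batch):
    """Hypothetical stand-in for encode::compute_tqplus_calibration.
    ASSUMPTION: fits a per-coordinate mean/std; the real TQ+ fit may differ."""
    dim = len(batch[0])
    if len(batch) < TQPLUS_MIN_SAMPLES:
        return [0.0] * dim, [1.0] * dim  # identity fallback: shift=0, scale=1
    cols = list(zip(*batch))
    shift = [statistics.fmean(c) for c in cols]
    scale = [statistics.pstdev(c) or 1.0 for c in cols]
    return shift, scale


class FirstAddFreezeIndex:
    """Toy model of the current add() flow (hypothetical, not the real API)."""

    def __init__(self, dim):
        self.dim = dim
        self.calibration = None  # None means "not fitted yet"
        self.encoded = []

    def add(self, batch):
        if not batch:
            return  # models the existing n == 0 early-return
        if self.calibration is None:
            # Fitted once and frozen, even when the fit fell back to identity.
            self.calibration = compute_calibration(batch)
        shift, scale = self.calibration
        for v in batch:
            # Stand-in for quantization: only applies the frozen calibration.
            self.encoded.append([(x - s) / k for x, s, k in zip(v, shift, scale)])


rng = random.Random(0)


def make_batch(n, dim=4):
    return [[rng.gauss(5.0, 2.0) for _ in range(dim)] for _ in range(n)]


idx = FirstAddFreezeIndex(dim=4)
idx.add(make_batch(500))    # small seed: identity fallback is frozen
idx.add(make_batch(5_000))  # large load: never re-fits
print(idx.calibration)      # ([0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0])
```

Even though the second batch alone would clear the threshold, the index keeps the identity frozen by the 500-vector seed.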

Why the trivial fix is insufficient

The minimal fix ("don't freeze identity; fit on the first batch that is >= 1000") handles "small seed then big load" but still fails for drip-fed small batches (every add < 1000 -> never calibrates), and only ever reflects one batch's distribution.
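A short sketch makes the drip-feed failure concrete. It compares which add() call would trigger fitting under the minimal fix with a policy that counts the running total of added vectors. Both function names are hypothetical, and only batch sizes are modelled.

```python
TQPLUS_MIN_SAMPLES = 1000  # threshold named in the issue


def minimal_fix_calibrates(batch_sizes):
    """'Fit on the first batch that is >= threshold': index of the add() call
    that would trigger the fit, or None if it never happens."""
    for i, n in enumerate(batch_sizes):
        if n >= TQPLUS_MIN_SAMPLES:
            return i
    return None


def cumulative_calibrates(batch_sizes):
    """Fit once the running total of added vectors reaches the threshold."""
    total = 0
    for i, n in enumerate(batch_sizes):
        total += n
        if total >= TQPLUS_MIN_SAMPLES:
            return i
    return None


drip_fed = [200] * 100  # 20,000 vectors arriving in small batches
print(minimal_fix_calibrates(drip_fed))  # None: never calibrates
print(cumulative_calibrates(drip_fed))   # 4: the fifth add reaches 1,000
```

Both policies handle "small seed, then big load" (for `[500, 1_000_000]`, both trigger on the second add). Only the cumulative count copes with drip-feeding.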

Proper fix: dual-mode warm-up

Because originals are discarded and a single coordinate system is required, doing this properly means a dual-mode index:

  • Warm-up (below the sample threshold): buffer the raw vectors, don't quantize yet, and serve search by exact brute force over that small buffer (at most threshold vectors, so it is cheap, and exact search is higher quality than quantized search at small scale).
  • At the threshold: fit calibration from the full warm-up set, encode the whole buffer at once, freeze, drop the raw buffer.
  • Steady state: stream-quantize with the frozen calibration, as today.

This eliminates the silent identity-freeze and calibrates from a proper cumulative sample, while keeping vectors searchable throughout.
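The three modes can be sketched as a small state machine. Everything below is hypothetical: `WarmupIndex`, `fit_calibration` and the stand-in "encoding" are not turbovec's API. The per-coordinate mean/std fit is an ASSUMPTION in place of the real TQ+ fit. Warm-up search uses exact inner product, which is also an assumption about the metric.

```python
import heapq
import statistics

TQPLUS_MIN_SAMPLES = 1000  # threshold named in the issue


def fit_calibration(vectors):
    """ASSUMPTION: per-coordinate mean/std stands in for the real TQ+ fit."""
    cols = list(zip(*vectors))
    shift = [statistics.fmean(c) for c in cols]
    scale = [statistics.pstdev(c) or 1.0 for c in cols]
    return shift, scale


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class WarmupIndex:
    """Hypothetical dual-mode index: raw buffer below the threshold, then a
    single fit on the cumulative buffer, then frozen streaming encoding."""

    def __init__(self, threshold=TQPLUS_MIN_SAMPLES):
        self.threshold = threshold
        self.buffer = []         # raw vectors, warm-up mode only
        self.calibration = None  # set exactly once, at the threshold
        self.encoded = []        # stand-in for quantized codes

    @property
    def warming_up(self):
        return self.calibration is None

    def _encode(self, v):
        # Stand-in for quantization: applies the frozen calibration only.
        shift, scale = self.calibration
        return [(x - s) / k for x, s, k in zip(v, shift, scale)]

    def add(self, batch):
        if not self.warming_up:
            self.encoded.extend(self._encode(v) for v in batch)  # steady state
            return
        self.buffer.extend(batch)
        if len(self.buffer) >= self.threshold:
            # Fit on the whole cumulative warm-up set, encode it at once,
            # freeze the calibration and drop the raw buffer.
            self.calibration = fit_calibration(self.buffer)
            self.encoded = [self._encode(v) for v in self.buffer]
            self.buffer = []

    def search(self, query, k):
        if self.warming_up:
            # Exact brute-force inner-product search over the small raw buffer.
            scored = ((dot(query, v), i) for i, v in enumerate(self.buffer))
            return [i for _, i in heapq.nlargest(k, scored)]
        raise NotImplementedError("quantized search is out of scope for this sketch")


demo = WarmupIndex(threshold=4)
demo.add([[1.0, 0.0], [0.0, 1.0]])
print(demo.warming_up, demo.search([1.0, 0.2], k=1))  # True [0]
demo.add([[2.0, 2.0], [3.0, 1.0]])
print(demo.warming_up, len(demo.encoded), len(demo.buffer))  # False 4 0
```

Because the threshold is checked against the accumulated buffer, drip-fed batches and one large first batch both end up fitted on at least `threshold` vectors. Every vector is still encoded under a single calibration.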

Open design decisions

  1. Search during warm-up: brute-force exact over the buffer (recommended) vs return nothing until warmed up.
  2. Persistence: persist the raw buffer + mode (new .tv/.tvim format version) vs force a flush-with-current-data on save.
  3. Keep TQPLUS_MIN_SAMPLES = 1000?
  4. Memory: buffer costs up to threshold * dim * 4 bytes raw (~6 MB at dim=1536, threshold 1000).
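The memory figure in point 4 is plain arithmetic. It assumes float32 components (4 bytes each), as the issue's `dim * 4` implies. The helper name below is hypothetical.

```python
def warmup_buffer_bytes(threshold, dim, bytes_per_component=4):
    """Upper bound on the raw warm-up buffer: threshold vectors of dim float32s."""
    return threshold * dim * bytes_per_component


b = warmup_buffer_bytes(threshold=1000, dim=1536)
print(b)                          # 6144000
print(f"{b / 1e6:.2f} MB")        # 6.14 MB
print(f"{b / 2**20:.2f} MiB")     # 5.86 MiB
print(warmup_buffer_bytes(1000, 128))  # 512000 (dim from the repro)
```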

Prior art

Qdrant's TurboQuant implementation makes the same base-vs-calibration split. It solves the sampling problem with a streaming estimator: the P² (P-Square) quantile estimator over a reservoir sample maintained with Vitter's Algorithm R. That works because Qdrant quantizes during a build/optimize phase, when the vectors are available. turbovec is pure online ingest, hence the warm-up-buffer approach above. See https://qdrant.tech/articles/turboquant-quantization/
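For reference, Vitter's Algorithm R keeps a uniform sample of k items from a stream of unknown length in O(k) memory. This is a generic textbook sketch, not Qdrant's code. Feeding the sample into a P² quantile estimator is not shown.

```python
import random


def reservoir_sample(stream, k, rng):
    """Vitter's Algorithm R: each item of the stream ends up in the
    k-slot reservoir with equal probability."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir first
        else:
            j = rng.randrange(i + 1)  # uniform in [0, i]
            if j < k:
                reservoir[j] = item   # replace with probability k / (i + 1)
    return reservoir


sample = reservoir_sample(range(100_000), k=1000, rng=random.Random(42))
print(len(sample))  # 1000
```

The reservoir bounds memory regardless of stream length. Applied to turbovec, it would still need the raw vectors to be available while sampling, which the warm-up buffer already provides.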

Positioning note (separate)

The base TurboQuant is genuinely data-oblivious / training-free; TQ+ is a lightweight data-dependent calibration. The "no training" claim is fine when scoped to the base algorithm (as Qdrant scopes it), but README/docs should be precise that TQ+ does a data-dependent calibration step. Worth a docs precision pass, tracked separately from the code change.

Labels: enhancement (New feature or request)
