
Cold anchor never refreshed after partial prefix hits: steady-state regression for repeated fresh agent sessions #393

@TerryChengTW


TL;DR

Cold checkpoints are written only when cached == 0. After any change inside the stable prefix region (for Claude Code, the harness rewrites a git-status block at ~token 26,850 on every commit), fresh sessions partially hit the 20,480-token continued waypoint instead. Because cached != 0, the cold path is skipped, the stale anchor is never replaced, and every subsequent fresh session re-prefills the anchor - waypoint tokens again, indefinitely. Without a fully cold prefill, the cache can never climb back to its best state.

How it fails today

With a valid anchor (27,005 tokens), a fresh session costs 13.8s of prefill. Then one git commit changes the early prompt block, and:

01:31:30 ds4-server: live kv cache miss live=30311 prompt=30224 common=26852 reason=token-mismatch
01:31:30 ds4-server: kv cache hit text tokens=20480 ... file=92b3c7af....kv
01:32:00 ds4-server: chat ctx=20480..30224:9744 TOOLS prompt done 29.967s
          (cached=20480 != 0 → cold path never evaluated → no anchor refresh)
01:32:14 ds4-server: live kv cache miss live=30261 prompt=30224 common=30224 reason=token-mismatch
01:32:14 ds4-server: kv cache hit text tokens=20480 ...
01:32:44 ds4-server: chat ctx=20480..30224:9744 TOOLS prompt done 29.962s   ← same again, forever
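The token counts in the log can be checked with simple arithmetic. The sketch below uses only numbers quoted in this issue; the variable and function names are illustrative, and it assumes a refreshed anchor would sit at roughly the same position as the old one (27,005 tokens), which the log does not confirm.

```python
# Numbers copied from the log lines and text above; names are illustrative only.
PROMPT_TOKENS = 30224    # prompt=30224
WAYPOINT_TOKENS = 20480  # "kv cache hit text tokens=20480"
ANCHOR_TOKENS = 27005    # size of the valid anchor quoted above (assumed position
                         # of a refreshed anchor as well)


def tokens_to_prefill(prompt_tokens: int, cached_tokens: int) -> int:
    """Tokens that still have to be prefilled after a cache hit of the given length."""
    return prompt_tokens - cached_tokens


# What the stale-anchor path pays: matches ctx=20480..30224:9744 in the log.
stale = tokens_to_prefill(PROMPT_TOKENS, WAYPOINT_TOKENS)

# What a session would pay if an anchor at the assumed position were available.
fresh = tokens_to_prefill(PROMPT_TOKENS, ANCHOR_TOKENS)

# The "anchor - waypoint" tokens repaid on every fresh session.
extra = stale - fresh

print(stale, fresh, extra)
```

The measured prefill times (13.8s vs ~30s) are not derived here; only the token counts follow directly from the log.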

Self-locking loop:

stable prefix changes → anchor stale → waypoint hit (cached != 0)
        → anchor never refreshed → every fresh session pays 30s instead of 14s
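The loop can be reproduced with a toy model. This is a minimal sketch, not ds4-server code: `PrefixCache`, `fresh_session_current_policy`, and the tiny token counts are all hypothetical stand-ins, and the only behaviour taken from this issue is that the anchor is written solely when cached == 0.

```python
from dataclasses import dataclass, field


@dataclass
class PrefixCache:
    """Toy stand-in for on-disk KV checkpoints: a set of cached token prefixes."""
    entries: set = field(default_factory=set)

    def longest_hit(self, prompt):
        # Length of the longest stored prefix that matches the start of the prompt.
        best = 0
        for entry in self.entries:
            if len(entry) > best and tuple(prompt[:len(entry)]) == entry:
                best = len(entry)
        return best

    def store(self, prompt, n):
        self.entries.add(tuple(prompt[:n]))


def fresh_session_current_policy(cache, prompt, anchor_len):
    """Behaviour described in this issue: write the anchor only on a fully cold miss."""
    cached = cache.longest_hit(prompt)
    if cached == 0:
        cache.store(prompt, anchor_len)
    return cached


ANCHOR_LEN, WAYPOINT_LEN = 8, 4          # scaled-down stand-ins for 27,005 and 20,480
old_prompt = list(range(10))
new_prompt = old_prompt.copy()
new_prompt[6] = 99                       # change past the waypoint, before the anchor

cache = PrefixCache()
cache.store(old_prompt, WAYPOINT_LEN)    # waypoint from an earlier session
cache.store(old_prompt, ANCHOR_LEN)      # anchor that is about to go stale

before = cache.longest_hit(old_prompt)
hits = [fresh_session_current_policy(cache, new_prompt, ANCHOR_LEN) for _ in range(3)]
print(before, hits)
```

In this model every fresh session after the change lands on the waypoint, and the cache contents never change, which mirrors the repeated 20480-token hits in the log.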

For agent workloads that commit regularly, this is the normal state, not the exception: we measured 13.8s of prefill with a fresh anchor versus ~30s in steady state with a stale anchor, so every fresh session takes roughly twice as long to prefill as the cache is designed to allow.

Backend: Apple Silicon / Metal (M2 Ultra 192 GiB), DeepSeek V4 Flash q2-q4-imatrix. Found while investigating #392.

Proposed fix

When a request partially hits below the anchor position (cached < anchor), write the cold anchor for the current prompt during the same prefill. The lookup having returned a shorter entry already proves no matching anchor exists on disk, so this never duplicates an existing checkpoint; hits at or past the anchor store nothing. Cost is one extra checkpoint write (~70 ms for a 27K-token anchor here) on the first session after the prefix changed; every later fresh session returns to the fast path.
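The difference between the two write conditions can be sketched as a pair of predicates. The function names below are hypothetical and are not taken from the ds4-server source; they only encode the conditions stated in this issue (cached == 0 today, cached < anchor as proposed).

```python
def should_write_anchor_current(cached: int) -> bool:
    """Condition described in this issue: only a fully cold prefill writes the anchor."""
    return cached == 0


def should_write_anchor_proposed(cached: int, anchor: int) -> bool:
    """Proposed condition: any hit that lands below the anchor position refreshes it.

    A fully cold prefill (cached == 0) is still covered, and hits at or past
    the anchor store nothing.
    """
    return cached < anchor


ANCHOR = 27005    # anchor size quoted in this issue
WAYPOINT = 20480  # continued waypoint hit seen in the log

for cached in (0, WAYPOINT, ANCHOR):
    print(cached,
          should_write_anchor_current(cached),
          should_write_anchor_proposed(cached, ANCHOR))
```

With the waypoint hit from the log (cached = 20480), the current condition skips the write while the proposed one triggers it once; the next fresh session then hits the new anchor and neither condition writes again.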

PR with the change and unit tests incoming.
