Cold checkpoint silently skipped when the first prompt exceeds kv-cache-cold-max-tokens (Claude Code + MCP easily > 30K)

## TL;DR

The cold-checkpoint store is gated by `prompt_len <= cold_max_tokens` (default 30,000). A Claude Code opening prompt with a moderate MCP toolset is already past that: 59 tools rendered to **30,246 tokens** in our deployment. The gate fails, the cold anchor is never written, and nothing is logged — the exact mechanism designed for "independent agent sessions" reuse is disabled invisibly for the workload it was built for.

## How it fails today

Every fresh Claude Code session (`/clear` + first message) repays the full stable prefix. Two consecutive fresh sessions, 100 seconds apart, byte-identical prompts:

```
00:05:01 ds4-server: chat ctx=0..30246:30246 TOOLS prompt start
00:05:55 ds4-server: kv cache stored tokens=20480 trimmed=0 reason=continued ...
00:06:26 ds4-server: chat ctx=0..30246:30246 TOOLS prompt done 84.679s
          (no reason=cold store: 30246 > 30000, silently skipped)
00:06:42 ds4-server: live kv cache miss live=30276 prompt=30246 common=30246 reason=token-mismatch
00:08:07 ds4-server: chat ctx=0..30246:30246 TOOLS prompt done 84.710s   ← full prefill again
```

After raising the limit (`--kv-cache-cold-max-tokens 60000`), the designed behavior kicks in immediately:

```
01:09:20 ds4-server: kv cache stored tokens=27005 trimmed=3213 reason=cold ...
01:09:40 ds4-server: kv cache hit text tokens=27005 ... load=72.4 ms
01:09:54 ds4-server: chat ctx=27005..30218:3213 TOOLS prompt done 13.786s   ← 84.7s → 13.8s
```

Backend: Apple Silicon / Metal (M2 Ultra 192 GiB), DeepSeek V4 Flash q2-q4-imatrix.

We only found this by reading the source; the flag's help text gives no hint that agent first prompts have outgrown the default.

## Proposed fix

1. Log a hint when a fully cold prompt skips the cold store because it exceeds `cold_max_tokens` (one line, only when `cached == 0`, so no noise).
2. Consider raising the default: Claude Code first prompts crossed 30K once a handful of MCP servers are configured, and they only grow. 50,000–60,000 keeps the original "don't burn disk on huge contexts" intent while covering current agent CLIs.

PR for (1) incoming together with a related cold-anchor fix.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cold checkpoint silently skipped when the first prompt exceeds kv-cache-cold-max-tokens (Claude Code + MCP easily > 30K) #392

TL;DR

How it fails today

Proposed fix

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Cold checkpoint silently skipped when the first prompt exceeds kv-cache-cold-max-tokens (Claude Code + MCP easily > 30K) #392

Description

TL;DR

How it fails today

Proposed fix

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions