feat(ui): show last-turn input + cache hit % in context panel (instead of cumulative) #174
Open
lxistired wants to merge 1 commit into
Header used to display input/output token counts cumulatively across the
entire chat session. After ~10 turns the input number reads 'hundreds
of thousands of tokens' even though the cached prefix is being reused
at deep discount on each turn — users panic and think they're being
billed full rate every turn.
Switch the display to last-turn (current prompt size + last response
size) plus an overall cache-hit percentage, which honestly reflects
what the user is actually paying for.
Three changes:
1. types/index.ts: TokenUsage gains optional cacheRead / cacheWrite
   fields (the existing provider/SDK already produces them; the UI just
   ignored them).
2. agent-runner.ts normalizeTokenUsage: extract cacheRead / cacheWrite
   from common provider field-name variants:
   - Anthropic: cache_read_input_tokens / cache_creation_input_tokens
   - OpenAI Responses: prompt_tokens_details.cached_tokens
   - pi-ai-core normalized: cacheRead / cacheWrite
3. ContextPanel.tsx:
   - tokenUsage now exposes lastInput + lastOutput as 'input'/'output',
     plus aggregates (totalCacheRead, cacheHitRate) for the header.
   - contextUsage bar uses (lastInput + lastCacheRead) so the bar
     doesn't shrink when a cache hit lands (which would feel
     counterintuitive; the prompt is still that big).
   - Header shows 'cache N%' inline next to input when totalCacheRead
     > 0; absent for non-cache providers (current behavior preserved).
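As a rough illustration of changes 1 and 2, the sketch below pulls cacheRead / cacheWrite out of the three field-name variants listed in the commit message. The helper names (`firstNumber`, `extractCacheFields`, `RawUsage`, `CacheFields`) are hypothetical and not taken from the PR; only the provider field names come from the commit message. It leaves the fields undefined when a provider reports no cache data.

```typescript
// Sketch only: mirrors the field-name variants named in the commit message.
// All helper and type names other than TokenUsage are assumptions for illustration.
interface TokenUsage {
  input: number;
  output: number;
  cacheRead?: number;  // tokens served from the prompt cache
  cacheWrite?: number; // tokens written to the prompt cache
}

type RawUsage = Record<string, unknown>;

interface CacheFields {
  cacheRead: number | undefined;
  cacheWrite: number | undefined;
}

// Return the first candidate that is a finite number, or undefined if none is.
function firstNumber(...candidates: unknown[]): number | undefined {
  for (const c of candidates) {
    if (typeof c === "number" && Number.isFinite(c)) return c;
  }
  return undefined;
}

function extractCacheFields(raw: RawUsage): CacheFields {
  // OpenAI-style usage nests the cached count one level down.
  const details = raw.prompt_tokens_details as
    | { cached_tokens?: unknown }
    | null
    | undefined;
  return {
    cacheRead: firstNumber(
      raw.cacheRead,               // pi-ai-core normalized
      raw.cache_read_input_tokens, // Anthropic
      details?.cached_tokens,      // OpenAI
    ),
    cacheWrite: firstNumber(
      raw.cacheWrite,                  // pi-ai-core normalized
      raw.cache_creation_input_tokens, // Anthropic
    ),
  };
}

// Usage: merge the extracted fields into a TokenUsage value.
const exampleUsage: TokenUsage = {
  input: 12,
  output: 340,
  ...extractCacheFields({ cache_read_input_tokens: 900, cache_creation_input_tokens: 100 }),
};
console.log(exampleUsage.cacheRead, exampleUsage.cacheWrite); // 900 100
```

Note that this only extracts the cache counts; it does not change what `input` means for each provider.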
Findings
- [Major] TokenUsage.input is being treated as "uncached-only" in the UI, but normalizeTokenUsage() preserves provider-native semantics. That makes the new panel wrong in both directions: OpenAI-style usage already includes cached prompt tokens in the prompt total, while Anthropic-style usage splits cache writes into cache_creation_input_tokens, which the new input + cacheRead calculation never adds back.
Evidence: src/main/claude/agent-runner.ts:395-417, src/renderer/components/ContextPanel.tsx:117-119, src/renderer/components/ContextPanel.tsx:142-149, src/renderer/types/index.ts:95-98.
Suggested fix:
const cacheRead = typeof cacheReadCandidate === 'number' ? cacheReadCandidate : 0;
const cacheWrite = typeof cacheWriteCandidate === 'number' ? cacheWriteCandidate : 0;
const baseInput = raw.input ?? raw.input_tokens ?? raw.inputTokens;
const normalizedInput =
  raw.prompt_tokens_details !== undefined
    ? (baseInput as number)
    : (baseInput as number) + cacheRead + cacheWrite;
return { input: normalizedInput, output, cacheRead, cacheWrite };
Then use u.input directly in ContextPanel for both the header input value and contextUsage.used.
Summary
- Review mode: initial. 1 issue found.
CLAUDE.md was not found in repo/docs. No PR-side regression tests were added for provider-specific token accounting in the context panel.
Testing
- Not run (automation). Suggested tests: Anthropic cache-write first turn, Anthropic cache-read follow-up turn, and OpenAI cached prompt totals.
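The three suggested scenarios could be exercised with something like the sketch below. It wraps the reviewer's suggested normalization in a standalone function; `normalizeUsage`, `RawUsage`, and `num` are hypothetical names, the output field names are assumptions, and the token counts are made-up fixtures rather than real provider responses. Following the suggested fix, any payload without prompt_tokens_details has cacheRead and cacheWrite added back onto its input.

```typescript
// Sketch of the reviewer's suggested normalization, under stated assumptions.
interface TokenUsage {
  input: number; // full prompt size: uncached + cache reads + cache writes
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

type RawUsage = Record<string, unknown>;

// Coerce anything that is not a finite number to 0.
const num = (v: unknown): number =>
  typeof v === "number" && Number.isFinite(v) ? v : 0;

function normalizeUsage(raw: RawUsage): TokenUsage {
  const details = raw.prompt_tokens_details as
    | { cached_tokens?: unknown }
    | null
    | undefined;
  // Per the suggested fix: presence of prompt_tokens_details marks a payload
  // whose input total already includes cached tokens.
  const inputIncludesCache = raw.prompt_tokens_details !== undefined;

  const cacheRead = num(raw.cache_read_input_tokens ?? raw.cacheRead ?? details?.cached_tokens);
  const cacheWrite = num(raw.cache_creation_input_tokens ?? raw.cacheWrite);
  const baseInput = num(raw.input ?? raw.input_tokens ?? raw.inputTokens);
  // Output field names are an assumption; the review does not list them.
  const output = num(raw.output ?? raw.output_tokens ?? raw.outputTokens);

  const input = inputIncludesCache ? baseInput : baseInput + cacheRead + cacheWrite;
  return { input, output, cacheRead, cacheWrite };
}

// Anthropic-style first turn: the prefix is written to the cache.
const firstTurn = normalizeUsage({
  input_tokens: 20,
  cache_creation_input_tokens: 4000,
  cache_read_input_tokens: 0,
  output_tokens: 300,
});
console.log(firstTurn.input); // 4020

// Anthropic-style follow-up turn: the prefix is read back from the cache.
const followUp = normalizeUsage({
  input_tokens: 50,
  cache_read_input_tokens: 4000,
  cache_creation_input_tokens: 120,
  output_tokens: 200,
});
console.log(followUp.input); // 4170

// OpenAI-style turn: the input total already includes the cached tokens.
const openAiTurn = normalizeUsage({
  input_tokens: 5000,
  prompt_tokens_details: { cached_tokens: 4000 },
  output_tokens: 100,
});
console.log(openAiTurn.input, openAiTurn.cacheRead); // 5000 4000
```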
Open Cowork Bot
const cacheWrite = typeof cacheWriteCandidate === 'number' ? cacheWriteCandidate : undefined;

return {
  input,
[MAJOR] ContextPanel now assumes tokenUsage.input is the uncached tail and rebuilds prompt size as input + cacheRead, but normalizeTokenUsage() is still passing provider-native values through unchanged. That breaks in both directions: OpenAI-style usage already includes cached prompt tokens in the prompt total, while Anthropic-style usage splits cache writes into cache_creation_input_tokens, which this calculation never adds back.
Suggested fix:
const cacheRead = typeof cacheReadCandidate === 'number' ? cacheReadCandidate : 0;
const cacheWrite = typeof cacheWriteCandidate === 'number' ? cacheWriteCandidate : 0;
const baseInput = raw.input ?? raw.input_tokens ?? raw.inputTokens;
const normalizedInput =
raw.prompt_tokens_details !== undefined
? (baseInput as number)
: (baseInput as number) + cacheRead + cacheWrite;
return { input: normalizedInput, output, cacheRead, cacheWrite };
Then use u.input directly in ContextPanel for both the header input value and contextUsage.used.
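On the panel side, reading u.input directly might look like the sketch below. It assumes u.input has already been normalized to the full prompt size as the comment suggests; `contextUsageFor`, the `ContextUsage` shape, and the `limit` parameter (the model's context window) are hypothetical and not taken from ContextPanel.tsx.

```typescript
// Sketch only: assumes input is already the full prompt size (cached + uncached).
interface TokenUsage {
  input: number;
  output: number;
  cacheRead?: number;
  cacheWrite?: number;
}

interface ContextUsage {
  used: number;    // prompt tokens occupying the context window
  limit: number;   // context window size (assumed to be supplied by the caller)
  percent: number; // 0..100, clamped
}

function contextUsageFor(u: TokenUsage, limit: number): ContextUsage {
  // No "+ cacheRead" here: with normalized input, adding it would double count.
  const used = u.input;
  const percent = limit > 0 ? Math.min(100, (used / limit) * 100) : 0;
  return { used, limit, percent };
}

// A turn that is mostly a cache hit still fills the bar according to its full prompt size.
const bar = contextUsageFor({ input: 50_000, output: 10, cacheRead: 48_000 }, 200_000);
console.log(bar.used, bar.percent); // 50000 25
```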
Problem
The context panel currently displays input/output token counts as cumulative sums across the entire session. After ~10 turns of normal usage, the input number reads "hundreds of thousands of tokens" — even though the cached prefix is being reused at deep discount on each turn. Users panic and assume they're being billed full rate every turn.
The context-usage bar has the inverse problem: it uses only tokenUsage.input (uncached portion), so the bar visually shrinks when a cache hit lands. But the prompt is still that big; the model still has to attend to all of it. Users see a session that feels "lighter" right when it gets fuller.
Fix
Switch the display to last-turn input + last-turn output + overall cache-hit percentage. That honestly reflects what the user is paying for.
Three changes:
1. types/index.ts: TokenUsage gains optional cacheRead / cacheWrite fields. The provider/SDK already produces these; the UI was discarding them.
2. agent-runner.ts normalizeTokenUsage: extract cacheRead / cacheWrite from common provider field-name variants:
   - Anthropic: cache_read_input_tokens / cache_creation_input_tokens
   - OpenAI Responses: prompt_tokens_details.cached_tokens
   - pi-ai-core normalized: cacheRead / cacheWrite
3. ContextPanel.tsx:
   - tokenUsage now exposes lastInput + lastOutput as input / output, plus aggregates (totalCacheRead, cacheHitRate) for the header.
   - contextUsage bar uses lastInput + lastCacheRead (true prompt size), so the bar doesn't shrink when a cache hit lands.
   - Header shows cache N% inline next to input when totalCacheRead > 0; absent for non-cache providers (existing behavior preserved).
Files
- src/renderer/types/index.ts (+4)
- src/renderer/components/ContextPanel.tsx (+47/-18)
- src/main/claude/agent-runner.ts (normalizeTokenUsage only, +24/-3)
Test plan
- tsc --noEmit passes
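For reference, the header values described under Fix (last-turn input and output plus an overall cache-hit percentage) could be derived roughly as follows. The PR does not state the exact formula for cacheHitRate, so the one here is an assumption: total cache-read tokens divided by total prompt tokens, where each turn's input is taken to be the full prompt size. If input instead held only the uncached portion, the denominator would need input + cacheRead per turn. `headerStats` and `HeaderStats` are hypothetical names.

```typescript
// Sketch only: the cacheHitRate formula is an assumption, not taken from the PR.
interface TokenUsage {
  input: number; // assumed here to be the full prompt size for the turn
  output: number;
  cacheRead?: number;
  cacheWrite?: number;
}

interface HeaderStats {
  input: number;               // last-turn prompt size
  output: number;              // last-turn response size
  totalCacheRead: number;      // summed over the session
  cacheHitRate: number | null; // whole percent, or null when no cache reads were seen
}

function headerStats(turns: TokenUsage[]): HeaderStats | null {
  const last = turns[turns.length - 1];
  if (!last) return null; // nothing to show before the first turn

  let totalCacheRead = 0;
  let totalPrompt = 0;
  for (const t of turns) {
    totalCacheRead += t.cacheRead ?? 0;
    totalPrompt += t.input;
  }

  // Hide the percentage for providers that never report cache reads.
  const cacheHitRate =
    totalCacheRead > 0 && totalPrompt > 0
      ? Math.round((totalCacheRead / totalPrompt) * 100)
      : null;

  return { input: last.input, output: last.output, totalCacheRead, cacheHitRate };
}

const stats = headerStats([
  { input: 1000, output: 200, cacheRead: 0 },
  { input: 1500, output: 100, cacheRead: 1000 },
]);
console.log(stats); // { input: 1500, output: 100, totalCacheRead: 1000, cacheHitRate: 40 }
```

Returning null for cacheHitRate keeps the "absent for non-cache providers" behavior a rendering decision rather than a magic 0%.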