Skip to content

feat(ui): show last-turn input + cache hit % in context panel (instead of cumulative)#174

Open
lxistired wants to merge 1 commit into
OpenCoworkAI:mainfrom
lxistired:feat/token-ui-cache-display
Open

feat(ui): show last-turn input + cache hit % in context panel (instead of cumulative)#174
lxistired wants to merge 1 commit into
OpenCoworkAI:mainfrom
lxistired:feat/token-ui-cache-display

Conversation

@lxistired

Copy link
Copy Markdown

Problem

The context panel currently displays input/output token counts as cumulative sums across the entire session. After ~10 turns of normal usage, the input number reads "hundreds of thousands of tokens" — even though the cached prefix is being reused at deep discount on each turn. Users panic and assume they're being billed full rate every turn.

The context-usage bar has the inverse problem: it uses only tokenUsage.input (uncached portion), so the bar visually shrinks when a cache hit lands. But the prompt is still that big — the model still has to attend to all of it. Users see a session that feels "lighter" right when it gets fuller.

Fix

Switch the display to last-turn input + last-turn output + overall cache-hit percentage. That honestly reflects what the user is paying for.

Three changes:

  1. types/index.tsTokenUsage gains optional cacheRead / cacheWrite fields. The provider/SDK already produces these; the UI was discarding them.

  2. agent-runner.ts normalizeTokenUsage — extract cacheRead / cacheWrite from common provider field-name variants:

    • Anthropic: cache_read_input_tokens / cache_creation_input_tokens
    • OpenAI Responses: prompt_tokens_details.cached_tokens
    • pi-ai-core normalized: cacheRead / cacheWrite
  3. ContextPanel.tsx

    • tokenUsage now exposes lastInput + lastOutput as input/output, plus aggregates (totalCacheRead, cacheHitRate) for the header.
    • contextUsage bar uses lastInput + lastCacheRead (true prompt size), so the bar doesn't shrink when a cache hit lands.
    • Header shows cache N% inline next to input when totalCacheRead > 0; absent for non-cache providers (existing behavior preserved).

Files

  • src/renderer/types/index.ts (+4)
  • src/renderer/components/ContextPanel.tsx (+47/-18)
  • src/main/claude/agent-runner.ts (normalizeTokenUsage only, +24/-3)

Test plan

  • tsc --noEmit passes
  • First message: input/output reflect that single turn (no aggregation)
  • After 10 turns with high cache hit rate: input shows current-turn prompt + 'cache 90%' style label, NOT 1.5M
  • Context bar % grows with chat length (no shrink on cache hit)
  • Provider with no cache info (e.g. Ollama, OpenAI Chat-Completions w/o caching): no 'cache N%' label, behavior identical to before

…d of cumulative)

Header used to display input/output token counts cumulatively across the
entire chat session. After ~10 turns the input number reads 'hundreds
of thousands of tokens' even though the cached prefix is being reused
at deep discount on each turn — users panic and think they're being
billed full rate every turn.

Switch the display to last-turn (current prompt size + last response
size) plus an overall cache-hit percentage, which honestly reflects
what the user is actually paying for.

Three changes:

1. types/index.ts — TokenUsage gains optional cacheRead / cacheWrite
   fields (the existing provider/SDK already produces them; UI just
   ignored them).

2. agent-runner.ts normalizeTokenUsage — extract cacheRead / cacheWrite
   from common provider field-name variants:
     - Anthropic: cache_read_input_tokens / cache_creation_input_tokens
     - OpenAI Responses: prompt_tokens_details.cached_tokens
     - pi-ai-core normalized: cacheRead / cacheWrite

3. ContextPanel.tsx —
     - tokenUsage now exposes lastInput + lastOutput as 'input'/'output',
       plus aggregates (totalCacheRead, cacheHitRate) for the header.
     - contextUsage bar uses (lastInput + lastCacheRead) so the bar
       doesn't shrink when a cache hit lands (which would feel
       counterintuitive — the prompt is still that big).
     - Header shows 'cache N%' inline next to input when totalCacheRead
       > 0; absent for non-cache providers (current behavior preserved).
@hqhq1025 hqhq1025 added bot-rerun Temporary label for rerunning bot automation and removed bot-rerun Temporary label for rerunning bot automation labels Apr 30, 2026

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

  • [Major] TokenUsage.input is being treated as "uncached-only" in the UI, but normalizeTokenUsage() preserves provider-native semantics. That makes the new panel wrong in both directions: OpenAI-style usage already includes cached prompt tokens in the prompt total, while Anthropic-style usage splits cache writes into cache_creation_input_tokens, which the new input + cacheRead calculation never adds back. Evidence src/main/claude/agent-runner.ts:395-417, src/renderer/components/ContextPanel.tsx:117-119, src/renderer/components/ContextPanel.tsx:142-149, src/renderer/types/index.ts:95-98.
    Suggested fix:
    const cacheRead = typeof cacheReadCandidate === 'number' ? cacheReadCandidate : 0;
    const cacheWrite = typeof cacheWriteCandidate === 'number' ? cacheWriteCandidate : 0;
    const baseInput = raw.input ?? raw.input_tokens ?? raw.inputTokens;
    
    const normalizedInput =
      raw.prompt_tokens_details !== undefined
        ? (baseInput as number)
        : (baseInput as number) + cacheRead + cacheWrite;
    
    return { input: normalizedInput, output, cacheRead, cacheWrite };
    Then use u.input directly in ContextPanel for both the header input value and contextUsage.used.

Summary

  • Review mode: initial. 1 issue found. CLAUDE.md was not found in repo/docs. No PR-side regression tests were added for provider-specific token accounting in the context panel.

Testing

  • Not run (automation). Suggested tests: Anthropic cache-write first turn, Anthropic cache-read follow-up turn, and OpenAI cached prompt totals.

Open Cowork Bot

const cacheWrite = typeof cacheWriteCandidate === 'number' ? cacheWriteCandidate : undefined;

return {
input,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MAJOR] ContextPanel now assumes tokenUsage.input is the uncached tail and rebuilds prompt size as input + cacheRead, but normalizeTokenUsage() is still passing provider-native values through unchanged. That breaks in both directions: OpenAI-style usage already includes cached prompt tokens in the prompt total, while Anthropic-style usage splits cache writes into cache_creation_input_tokens, which this calculation never adds back.

Suggested fix:

const cacheRead = typeof cacheReadCandidate === 'number' ? cacheReadCandidate : 0;
const cacheWrite = typeof cacheWriteCandidate === 'number' ? cacheWriteCandidate : 0;
const baseInput = raw.input ?? raw.input_tokens ?? raw.inputTokens;

const normalizedInput =
  raw.prompt_tokens_details !== undefined
    ? (baseInput as number)
    : (baseInput as number) + cacheRead + cacheWrite;

return { input: normalizedInput, output, cacheRead, cacheWrite };

Then use u.input directly in ContextPanel for both the header input value and contextUsage.used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants