fix(agent-runner): forward pasted image content to vision-capable models #168
Open
lxistired wants to merge 1 commit into
When a user pastes an image in ChatView, the renderer creates a content
block of shape `{type:'image', source:{type:'base64', media_type, data}}`
following the Anthropic Messages API. The agent runner only forwarded
the prompt text via `piSession.prompt(string)`, so the image bytes were
silently dropped before reaching the model.
This change extracts image blocks from the current user message,
normalises them to the pi-coding-agent shape
(`{type:'image', mimeType, data}`), and passes them via the
`{images: [...]}` option on `prompt()`. The text-only path is preserved
so we don't change the wire format when no images are present.
Notes:
- Reads `source.media_type` / `source.data` (Anthropic shape) but also
accepts already-normalised `mimeType` / `data` for forward compat.
- Only the current user turn is processed; prior assistant/tool turns
are unchanged.
- The existing `hasImages` log line now reports the number of extracted
images for easier debugging.
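The normalisation step described above can be sketched roughly as follows. This is a minimal illustration, not the code in the patch: `ContentBlock`, `NormalisedImage`, and `extractImages` are hypothetical names, and skipping blocks that lack a string MIME type or payload is an assumption about how malformed input might be handled.

```typescript
// Hypothetical types for illustration; the real definitions live in the
// renderer and in the pi-coding-agent SDK.
type ContentBlock = { type: string; [key: string]: unknown };
type NormalisedImage = { type: "image"; mimeType: string; data: string };

// Collect image blocks from one user message and convert them to the
// `{type:'image', mimeType, data}` shape. Accepts the Anthropic shape
// (`source.media_type` / `source.data`) and already-normalised blocks.
function extractImages(content: ContentBlock[]): NormalisedImage[] {
  const images: NormalisedImage[] = [];
  for (const block of content) {
    if (block.type !== "image") continue;
    const source = block.source as
      | { media_type?: unknown; data?: unknown }
      | undefined;
    const mimeType = source?.media_type ?? block.mimeType;
    const data = source?.data ?? block.data;
    // Assumption: blocks without a usable MIME type or payload are skipped.
    if (typeof mimeType === "string" && typeof data === "string") {
      images.push({ type: "image", mimeType, data });
    }
  }
  return images;
}
```

The length of the returned array is the kind of count the `hasImages` log line could report.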
Findings
- No high-confidence issues found in the added/modified lines.
Summary
- Review mode: initial. No correctness, security, or regression issues were identified in `src/main/claude/agent-runner.ts` on the modified lines.
- Residual risk: this change adds a new image-forwarding path in `src/main/claude/agent-runner.ts` without focused regression coverage under `tests/`, so future SDK-shape drift may not be caught automatically.
Testing
- Not run (automation). Suggested follow-up: add an `agent-runner` regression test that asserts image blocks are normalized to `{ type: 'image', mimeType, data }` and passed to `piSession.prompt(..., { images })`.
Open Cowork Bot
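A regression check along the lines suggested in the review might look something like the sketch below. It uses a hand-written recording stub instead of the real SDK session; `PromptSession`, `ImagePayload`, `sendPrompt`, and `RecordingSession` are hypothetical names, and the `prompt(text, { images })` signature is assumed from the PR description rather than taken from the pi-coding-agent source.

```typescript
// Hypothetical stand-ins for the SDK types, for illustration only.
type ImagePayload = { type: "image"; mimeType: string; data: string };
interface PromptSession {
  prompt(text: string, options?: { images: ImagePayload[] }): Promise<void>;
}

// Pass the images option only when there is something to send, so the
// text-only call keeps its original single-argument form.
async function sendPrompt(
  session: PromptSession,
  text: string,
  images: ImagePayload[],
): Promise<void> {
  if (images.length > 0) {
    await session.prompt(text, { images });
  } else {
    await session.prompt(text);
  }
}

// Stub session that records the arguments of every prompt() call.
class RecordingSession implements PromptSession {
  calls: unknown[][] = [];
  async prompt(text: string, options?: { images: ImagePayload[] }): Promise<void> {
    this.calls.push(options === undefined ? [text] : [text, options]);
  }
}
```

A test can then call `sendPrompt` once with an empty image list and once with a single image, and inspect `calls` to confirm that only the second call carries an `images` option.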
Summary
When a user pastes an image in `ChatView`, the renderer creates a content block in Anthropic Messages API shape (`{type:'image', source:{type:'base64', media_type, data}}`). `ClaudeAgentRunner` in `src/main/claude/agent-runner.ts` only forwarded the prompt text via `piSession.prompt(string)`, so the image bytes were silently dropped before reaching the model. The pre-existing `hasImages` flag was set but never acted on, so vision-capable agents received only the text.
This patch:
- … `type:'image'` entries.
- … (`source.media_type`, `source.data`) to the pi-coding-agent shape (`{type:'image', mimeType, data}`). Already-normalised blocks are accepted too for forward compat.
- … `piSession.prompt(text, { images: currentTurnImages })`. The text-only path is preserved when the user didn't paste anything, so the wire format is unchanged in the common case.
Only the current user turn is processed; prior turns are untouched. The existing `hasImages` log line now also reports the extracted count for easier debugging.
Test plan
- … `images` key in the underlying request).
- … `User message contains images: 2 extracted`.