fix(agent-runner): forward pasted image content to vision-capable models#168

Open
lxistired wants to merge 1 commit into OpenCoworkAI:main from lxistired:feat/image-paste-fix
@lxistired

Summary

When a user pastes an image in ChatView, the renderer creates a content block in Anthropic Messages API shape:

{ type: 'image', source: { type: 'base64', media_type, data } }

ClaudeAgentRunner in src/main/claude/agent-runner.ts only forwarded the prompt text via piSession.prompt(string), so the image bytes were silently dropped before reaching the model. The pre-existing hasImages flag was set but never acted on — vision-capable agents received only the text.

This patch:

  1. Walks the current user message's content blocks and extracts any type:'image' entries.
  2. Normalises the Anthropic shape (source.media_type, source.data) to the pi-coding-agent shape ({type:'image', mimeType, data}). Already-normalised blocks are accepted too for forward compat.
  3. Passes the extracted images via piSession.prompt(text, { images: currentTurnImages }). The text-only path is preserved when the user didn't paste anything, so the wire format is unchanged in the common case.

Only the current user turn is processed — prior turns are untouched. The existing hasImages log line now also reports the extracted count for easier debugging.
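The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the type names `ContentBlock` and `PiImage` and the helpers `normaliseImageBlock` and `extractCurrentTurnImages` are hypothetical, and only the two block shapes and the `piSession.prompt(text, { images })` call are taken from the description.

```typescript
// Hypothetical types for illustration; the real definitions in
// src/main/claude/agent-runner.ts may differ.
type ContentBlock = Record<string, unknown>;
type PiImage = { type: 'image'; mimeType: string; data: string };

// Convert one content block to the pi-coding-agent image shape.
// Returns null when the block is not an image or lacks the expected fields.
function normaliseImageBlock(block: ContentBlock): PiImage | null {
  if (block.type !== 'image') return null;

  // Already-normalised shape: { type: 'image', mimeType, data }
  const { mimeType, data } = block;
  if (typeof mimeType === 'string' && typeof data === 'string') {
    return { type: 'image', mimeType, data };
  }

  // Anthropic Messages API shape:
  // { type: 'image', source: { type: 'base64', media_type, data } }
  const source = block.source as { media_type?: unknown; data?: unknown } | undefined;
  if (source && typeof source.media_type === 'string' && typeof source.data === 'string') {
    return { type: 'image', mimeType: source.media_type, data: source.data };
  }
  return null;
}

// Walk the current user message's blocks and keep only usable images.
function extractCurrentTurnImages(content: ContentBlock[]): PiImage[] {
  const images: PiImage[] = [];
  for (const block of content) {
    const image = normaliseImageBlock(block);
    if (image !== null) images.push(image);
  }
  return images;
}

// Intended call site (piSession is not defined in this sketch):
//   const currentTurnImages = extractCurrentTurnImages(content);
//   if (currentTurnImages.length > 0) {
//     await piSession.prompt(text, { images: currentTurnImages });
//   } else {
//     await piSession.prompt(text); // text-only path, wire format unchanged
//   }
```

Returning `null` for malformed image blocks, rather than throwing, is a choice made for this sketch; the patch may handle that case differently.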

Test plan

  • Paste an image (PNG/JPEG) into ChatView, send with a question like "what's in this image?", confirm the model describes it instead of saying it can't see it.
  • Send a text-only message, confirm no behaviour change (no images key in the underlying request).
  • Send a message with multiple pasted images, confirm all are forwarded.
  • Confirm log line shows e.g. User message contains images: 2 extracted.

When a user pastes an image in ChatView, the renderer creates a content
block of shape `{type:'image', source:{type:'base64', media_type, data}}`
following the Anthropic Messages API. The agent runner only forwarded
the prompt text via `piSession.prompt(string)`, so the image bytes were
silently dropped before reaching the model.

This change extracts image blocks from the current user message,
normalises them to the pi-coding-agent shape
(`{type:'image', mimeType, data}`), and passes them via the
`{images: [...]}` option on `prompt()`. The text-only path is preserved
so we don't change the wire format when no images are present.

Notes:
- Reads `source.media_type` / `source.data` (Anthropic shape) but also
  accepts already-normalised `mimeType` / `data` for forward compat.
- Only the current user turn is processed; prior assistant/tool turns
  are unchanged.
- The existing `hasImages` log line now reports the number of extracted
  images for easier debugging.
@hqhq1025 added and then removed the bot-rerun label (temporary label for rerunning bot automation) on Apr 30, 2026

@github-actions github-actions Bot left a comment

Findings

  • No high-confidence issues found in the added/modified lines.

Summary

  • Review mode: initial. No correctness, security, or regression issues were identified in src/main/claude/agent-runner.ts on the modified lines.
  • Residual risk: this change adds a new image-forwarding path in src/main/claude/agent-runner.ts without focused regression coverage under tests/, so future SDK-shape drift may not be caught automatically.

Testing

  • Not run (automation). Suggested follow-up: add an agent-runner regression test that asserts image blocks are normalized to { type: 'image', mimeType, data } and passed to piSession.prompt(..., { images }).
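One possible shape for the suggested regression test is sketched below. It is framework-neutral and makes several assumptions: `PromptSession`, `createRecordingSession`, `sendTurn`, and `expectEqual` are hypothetical names, `prompt()` is assumed to return a promise, and `sendTurn` is only a stand-in for the runner logic that a real test under tests/ would import from src/main/claude/agent-runner.ts.

```typescript
// Hypothetical shapes for illustration; the real pi-coding-agent types may differ.
type PiImage = { type: 'image'; mimeType: string; data: string };
type PromptOptions = { images?: PiImage[] };
type PromptCall = { text: string; options?: PromptOptions };
interface PromptSession {
  prompt(text: string, options?: PromptOptions): Promise<void>;
}
type Block = Record<string, unknown>;

// Fake session that records prompt() calls instead of contacting a model.
function createRecordingSession(): { session: PromptSession; calls: PromptCall[] } {
  const calls: PromptCall[] = [];
  const session: PromptSession = {
    async prompt(text: string, options?: PromptOptions): Promise<void> {
      calls.push(options === undefined ? { text } : { text, options });
    },
  };
  return { session, calls };
}

// Stand-in for the runner logic under test; a real test would import it instead.
async function sendTurn(session: PromptSession, text: string, content: Block[]): Promise<void> {
  const images: PiImage[] = [];
  for (const block of content) {
    if (block.type !== 'image') continue;
    const source = block.source as { media_type?: unknown; data?: unknown } | undefined;
    if (source && typeof source.media_type === 'string' && typeof source.data === 'string') {
      images.push({ type: 'image', mimeType: source.media_type, data: source.data });
    }
  }
  if (images.length > 0) {
    await session.prompt(text, { images });
  } else {
    await session.prompt(text);
  }
}

// Compare two values structurally and throw a labelled error on mismatch.
function expectEqual(actual: unknown, expected: unknown, label: string): void {
  const a = JSON.stringify(actual);
  const e = JSON.stringify(expected);
  if (a !== e) throw new Error(`${label}: expected ${e}, got ${a}`);
}

// Runs both cases and returns the recorded calls for further inspection.
async function runImageForwardingChecks(): Promise<PromptCall[]> {
  const { session, calls } = createRecordingSession();

  // Case 1: an Anthropic-shaped image block is normalised and forwarded.
  await sendTurn(session, "what's in this image?", [
    { type: 'text', text: "what's in this image?" },
    { type: 'image', source: { type: 'base64', media_type: 'image/png', data: 'AAAA' } },
  ]);
  expectEqual(
    calls[0].options,
    { images: [{ type: 'image', mimeType: 'image/png', data: 'AAAA' }] },
    'image turn',
  );

  // Case 2: a text-only turn is sent without an options argument.
  await sendTurn(session, 'hello', [{ type: 'text', text: 'hello' }]);
  expectEqual(calls[1], { text: 'hello' }, 'text-only turn');

  return calls;
}
```

Recording calls on a fake session keeps the test independent of any model or network access, so it would mainly guard against drift in the block shapes rather than verify end-to-end vision behaviour.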

Open Cowork Bot

2 participants