Summary
When a coding agent finishes (ready event), the ElevenLabs voice assistant often does not read back the agent's answer reliably. The user may need to ask multiple times; ConvAI sometimes hallucinates a partial summary.
This affects all agent flavors, but not equally - see below.
Two related gaps in web/src/realtime/hooks/contextFormatters.ts:
- formatReadyEvent (older behavior): the injected text told ConvAI "the previous message(s) are the summary" without embedding the assistant text inline. This affects all flavors: ConvAI is pointed at context that may not contain the answer, with no inline fallback.
- formatMessage: does not format stream-json / codex-family payloads ({ type: "codex", data: { type: "message", message: "..." } }), so onMessages and session history context updates are empty for those sessions. Claude-style content arrays (text blocks, plain strings) do work.
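To make the second gap concrete, here is a minimal sketch of a formatter that accepts both payload shapes. It is not the code in contextFormatters.ts: the type names, the `role` field, and the function name `formatMessageSketch` are assumptions for illustration, and only the codex envelope shape is taken from the payload quoted above.

```typescript
// Hypothetical message shapes, inferred from the payloads quoted in this issue.
type TextBlock = { type: "text"; text: string };
type ClaudeStyleMessage = {
  role: "assistant" | "user";
  content: string | Array<TextBlock | { type: string }>;
};
type CodexEnvelope = { type: "codex"; data: { type: string; message?: string } };
type HubMessage = ClaudeStyleMessage | CodexEnvelope;

function isCodexEnvelope(msg: HubMessage): msg is CodexEnvelope {
  return "type" in msg && msg.type === "codex";
}

// Returns speakable text, or null when the message carries none.
function formatMessageSketch(msg: HubMessage): string | null {
  if (isCodexEnvelope(msg)) {
    // stream-json / codex-family path: only "message" events carry assistant text.
    const { data } = msg;
    if (data.type === "message" && typeof data.message === "string") {
      const trimmed = data.message.trim();
      return trimmed === "" ? null : trimmed;
    }
    return null;
  }

  // Claude-style path: plain string content.
  if (typeof msg.content === "string") {
    const trimmed = msg.content.trim();
    return trimmed === "" ? null : trimmed;
  }

  // Claude-style path: array of blocks; keep text blocks, skip everything else.
  const joined = msg.content
    .filter((b): b is TextBlock => b.type === "text" && typeof (b as TextBlock).text === "string")
    .map((b) => b.text)
    .join("\n")
    .trim();
  return joined === "" ? null : joined;
}
```

With a branch like this, a codex "message" event yields the same kind of string a Claude text block does, so whatever consumes formatMessage output for onMessages and history would no longer see empty context for those sessions.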
Affected agents (how)
| Flavor | Ready inject gap | Live context (onMessages / history) | Typical outcome |
| --- | --- | --- | --- |
| Claude | Yes | Usually populated (text blocks reach formatMessage) | Unreliable readback - context may help, but ready inject does not embed text; still fails in dogfood |
| Codex | Yes | Empty (stream-json not formatted) | Readback usually fails; hallucination risk if user presses for a summary |
| Cursor | Yes | Empty (same stream-json shape as Codex) | Same as Codex |
| Gemini, Kimi, OpenCode | Yes | Empty if using codex-family stream-json envelopes | Same as Codex/Cursor |
| All | Yes | Varies | Even when live context exists, old ready text did not paste the final answer inline |
Bottom line: the ready-event bug is universal. Codex/Cursor (and other stream-json flavors) also had silent context - no history and no live updates - so they were hit hardest.
Expected
On ready, voice should receive the last assistant speakable text inline (e.g. wrapped in <text>…</text>) so ConvAI can summarize immediately without the user re-prompting. Stream-json messages should flow through formatMessage for live updates and session history.
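As a rough illustration of the expected ready behavior, the sketch below embeds the last assistant text inline in a `<text>…</text>` wrapper. The function name, its parameters, and the exact prompt wording are assumptions; the real formatReadyEvent may take different arguments and phrase the instruction differently.

```typescript
// Hypothetical signature: the real formatReadyEvent may differ.
function formatReadyEventSketch(sessionId: string, lastAssistantText: string | null): string {
  // Illustrative wording only; not the prompt text used in the repo.
  const header = `Agent in session ${sessionId} has finished working.`;
  const text = lastAssistantText?.trim() ?? "";

  if (text === "") {
    // No speakable text: say so explicitly instead of pointing at earlier context.
    return `${header} No final assistant text is available; say so rather than guessing.`;
  }

  // Assumption: neutralise a literal closing tag so the wrapper cannot be cut short.
  const safe = text.split("</text>").join("&lt;/text&gt;");
  return `${header} Summarize the final answer below for the user.\n<text>${safe}</text>`;
}
```

The point of the design is that the answer travels with the ready inject itself, so readback no longer depends on whether earlier context updates happened to contain it.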
Repro
Any flavor (ready gap):
- Open a session with voice enabled
- Ask a factual question
- Agent replies in chat with a substantive answer
- Voice signals "finished" but does not convey the answer unless the user re-prompts
Codex / Cursor / stream-json flavors (additional context gap):
Same steps; voice also lacks interim agent text in context between messages, so readback fails more consistently and confabulation is more likely.
Evidence
Dogfood on ElevenLabs ConvAI: ready inject appeared as a fake user turn with no embedded assistant text; when context was thin (especially Codex/Cursor sessions), assistant confabulated instead of reading hub message content. A Cursor session with the fix applied showed embedded <text>…</text> at ready and accurate readback one second later (conv conv_4501ksdt0athfhfr189tq3jehkcq).
Suggested fix (minimal)
- Add extractLastAssistantSpeakable() and embed its result in formatReadyEvent() (ready hook) - all flavors
- Teach formatMessage() the codex stream-json path so live onMessages context is not empty - Codex/Cursor/stream-json flavors
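A minimal sketch of what the extraction step could look like, assuming session messages have already been reduced to a role plus formatted text. The `FormattedEntry` type and the `Sketch` suffix are hypothetical; the helper in the draft PR may work on raw hub messages instead.

```typescript
// Hypothetical, simplified entry: role plus already-formatted text
// (null when the message had nothing speakable, e.g. a tool call).
type FormattedEntry = { role: "user" | "assistant"; text: string | null };

// Walks the history backwards and returns the most recent non-empty assistant text.
function extractLastAssistantSpeakableSketch(entries: readonly FormattedEntry[]): string | null {
  for (let i = entries.length - 1; i >= 0; i--) {
    const entry = entries[i];
    if (!entry || entry.role !== "assistant" || entry.text === null) continue;
    const trimmed = entry.text.trim();
    if (trimmed !== "") return trimmed;
  }
  return null;
}
```

Scanning from the end skips trailing tool-call or empty entries, so the ready inject picks up the final substantive answer rather than whatever message happened to arrive last.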
Might be related to open PR #640 (Codex voice context) - not sure if that covers ready readback specifically.
Draft fix: PR #682