feat(hub+cli): surface mermaid parse failures back to the agent so it can self-correct #829

@heavygee

Summary

The web client already validates mermaid blocks before rendering: explicit mermaid.parse() and mermaid.render() calls wrapped in try/catch, securityLevel: 'strict', suppressErrorRendering: true, and a fallback to raw source on failure (see web/src/components/assistant-ui/mermaid-diagram.tsx). That stack prevents a broken chart from becoming an XSS vector or crashing the page (closed by #785), and it is the foundation the lightbox in #737 / #741 sits on.

What it does not do: tell the agent that its diagram failed to parse. The validation loop is one-way. The user sees raw mermaid source in a code-block fallback; the agent sees nothing and happily emits another broken chart next turn. Over a long session, this produces a quiet UX regression - the operator stops getting diagrams entirely until they manually point out "your charts have been raw text for 20 messages."

This is the same class of gap that #675 / #676 closed for OpenCode errors ("invisible failure"), applied to the agent-side feedback path instead of the user-side display.

Current behavior

  1. Agent emits a markdown mermaid code block in assistant text.
  2. Hub stores + forwards the message verbatim. No mermaid awareness.
  3. Browser tries mermaid.parse(code, { suppressErrors: true }).
  4. Parse returns falsy or mermaid.render() throws → MermaidFallback renders the raw source in a <pre><code> block.
  5. User sees the raw text. Agent receives no signal. Hub has no record that anything failed.

No SSE event, no tool-result reply, no system-channel hint into the transcript.

Proposed approaches (pick one, in increasing weight)

A. CLI-side post-emit re-parse (smallest)

In the CLI's outgoing assistant-text path, scan for ```mermaid fences, run a lightweight mermaid-parse (or call the existing client validator via bun/Node), and on failure inject a single system-channel turn back into the agent transcript:

The previous mermaid block did not parse:

<parser error excerpt, 1-3 lines>

The block was preserved in the user-visible message as raw text. Please re-emit it with corrected syntax if the diagram was intended.

  • Pros: hub stays mermaid-unaware; the change lives in one place; works for every UI client.
  • Cons: CLI carries a mermaid dependency (it does not today); version drift could make the CLI parser and the browser parser disagree about what is valid.
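
A minimal sketch of approach A, assuming the parser is injected as a plain function so that the choice of mermaid dependency stays open. Every name here (ParseResult, MermaidParser, extractMermaidBlocks, buildParseFailureHint, hintForMessage) is hypothetical and not taken from the codebase; only the hint wording comes from the text above. It reports the first failing block only, to keep to a single system-channel turn.

```typescript
// Hypothetical types: a real CLI would wrap whichever mermaid parser it adopts.
type ParseResult = { ok: true } | { ok: false; error: string };
type MermaidParser = (source: string) => ParseResult;

/** Collect the body of every ```mermaid fenced block in a markdown string. */
function extractMermaidBlocks(markdown: string): string[] {
  // Opening fence on its own line, lazy body, closing fence on its own line.
  const fence = /^```mermaid[ \t]*\r?\n([\s\S]*?)^```[ \t]*$/gm;
  const blocks: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = fence.exec(markdown)) !== null) {
    // Drop the newline that precedes the closing fence.
    blocks.push(match[1].replace(/\r?\n$/, ""));
  }
  return blocks;
}

/** Build the system-channel hint, keeping at most maxLines non-empty error lines. */
function buildParseFailureHint(error: string, maxLines: number = 3): string {
  const excerpt = error
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .slice(0, maxLines)
    .join("\n");
  return [
    "The previous mermaid block did not parse:",
    "",
    excerpt,
    "",
    "The block was preserved in the user-visible message as raw text. " +
      "Please re-emit it with corrected syntax if the diagram was intended.",
  ].join("\n");
}

/** Return a hint for the first block that fails to parse, or null if all parse. */
function hintForMessage(markdown: string, parse: MermaidParser): string | null {
  for (const block of extractMermaidBlocks(markdown)) {
    const result = parse(block);
    if (result.ok === false) {
      return buildParseFailureHint(result.error);
    }
  }
  return null;
}
```

Injecting the parser also makes the version-drift concern testable: the same fence scanner can be run against two parser versions to see where they disagree.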

B. Client → hub → CLI signal (most accurate to what the user actually sees)

MermaidDiagram emits a typed event (mermaid-parse-failure) up to a session-level handler that POSTs to a new /api/sessions/:id/render-issues endpoint. Hub persists a small render_issues row (sessionId, messageId, kind=mermaid_parse_failure, snippet, parser version) and broadcasts via SSE so the CLI can convert it into a system-channel hint to the agent on the next turn.

  • Pros: ground truth is "what the user actually saw fail in their browser." No CLI mermaid dependency. Captures parser-version drift exactly because it's the user's parser.
  • Cons: needs a new endpoint, persistence row, SSE type, and CLI consumer. More surface to maintain.
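
One possible shape for the approach B payload, sketched under assumptions: the field list and route come from the description above, while the type names, the snippet cap, and the helper functions (buildRenderIssueRequest, parseRenderIssue) are hypothetical. The HTTP call itself is left out so the sketch does not commit to a client library.

```typescript
// Hypothetical names; only the field list and the route shape come from the proposal.
type RenderIssueKind = "mermaid_parse_failure";

interface RenderIssue {
  sessionId: string;
  messageId: string;
  kind: RenderIssueKind;
  snippet: string;
  parserVersion: string;
}

// Assumed cap so the persisted row stays small.
const MAX_SNIPPET_CHARS = 500;

/** Client side: describe the POST without performing it. */
function buildRenderIssueRequest(
  issue: RenderIssue
): { method: "POST"; path: string; body: string } {
  const trimmed: RenderIssue = {
    ...issue,
    snippet: issue.snippet.slice(0, MAX_SNIPPET_CHARS),
  };
  return {
    method: "POST",
    path: `/api/sessions/${encodeURIComponent(issue.sessionId)}/render-issues`,
    body: JSON.stringify(trimmed),
  };
}

/** Hub side: accept only a well-formed render issue, otherwise return null. */
function parseRenderIssue(body: string): RenderIssue | null {
  let data: unknown;
  try {
    data = JSON.parse(body);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const record = data as Record<string, unknown>;
  const sessionId = record["sessionId"];
  const messageId = record["messageId"];
  const snippet = record["snippet"];
  const parserVersion = record["parserVersion"];
  if (
    typeof sessionId !== "string" ||
    typeof messageId !== "string" ||
    typeof snippet !== "string" ||
    typeof parserVersion !== "string" ||
    record["kind"] !== "mermaid_parse_failure"
  ) {
    return null;
  }
  return { sessionId, messageId, kind: "mermaid_parse_failure", snippet, parserVersion };
}
```

Validating on the hub keeps the new endpoint from persisting arbitrary client input, which matters because the snippet is agent-authored text echoed back through a browser.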

C. Both A and B, layered

A for fast feedback in the same turn; B as the system of record so the agent gets a high-confidence hint on the next turn even if A missed the failure. Probably overbuilt for v1.

Acceptance criteria

  • Agent receives a structured, low-noise signal when one of its emitted mermaid blocks fails to parse / render in at least one connected client.
  • Signal is delivered at most once per failing message, not once per render: if 4 PWAs render the same broken chart, dedupe per messageId so the agent is not spammed.
  • User-visible fallback behavior is unchanged.
  • No additional XSS surface; securityLevel: 'strict' and suppressErrorRendering: true stay in place.
  • Optional: surface the same signal to the UI as a subtle marker on the failing message (e.g. small "diagram could not be rendered" pill) so the user knows the agent has been notified.
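
The dedupe criterion could be met with something as small as the following sketch. The class name and the in-memory Set are assumptions for illustration; a hub that persists render_issues rows might enforce the same rule with a uniqueness constraint instead.

```typescript
// Hypothetical helper: remembers which (sessionId, messageId, kind) triples
// have already produced a signal, so repeat reports from other clients are dropped.
class RenderIssueDeduper {
  private readonly seen = new Set<string>();

  /** Returns true only the first time a given triple is reported. */
  shouldNotify(sessionId: string, messageId: string, kind: string): boolean {
    // JSON-encode the triple so ids containing separators cannot collide.
    const key = JSON.stringify([sessionId, messageId, kind]);
    if (this.seen.has(key)) {
      return false;
    }
    this.seen.add(key);
    return true;
  }
}
```

With four clients reporting the same failing message, only the first call returns true; a different messageId in the same session still gets its own signal.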

Out of scope

Related

Environment

  • Reproduces on web PWA + mobile, any agent flavor that can emit markdown (Claude, Codex, Cursor, etc.).

Labels: enhancement
