
fix(zod-stream): accumulate ReadableStream chunks before JSON.parse #98

Closed
nikmeiser wants to merge 1 commit into hack-dance:main from nikmeiser:fix/zod-stream-chunk-boundary


@nikmeiser


Symptom

An Electron tool I'm using, which depends on zod-stream, fails when it tries to parse a large LLM stream.

ReadableStream.read() does not guarantee chunk boundaries align with TransformStream output boundaries. For large string values (e.g. file content in streaming tool calls), a single JSON-encoded partial object can span multiple read() calls, causing JSON.parse to throw on truncated input.

Root Cause: S3() in zod-stream Does Not Handle Stream Chunk Boundaries

The S3 function in zod-stream/dist/index.js reads from the final ReadableStream and calls JSON.parse on each chunk:

// zod-stream/dist/index.js — the broken function
async function* S3(t21) {
  let e19 = t21.getReader(), o25 = new TextDecoder();
  for (;;) {
    let { done: n29, value: r23 } = await e19.read();
    if (n29) break;
    let a27 = A2(o25.decode(r23));  // A2 strips control chars \x00-\x1F\x7F-\x9F
    yield JSON.parse(a27);           // ← FAILS when r23 is a partial chunk
  }
}

The problem: ReadableStream.read() does not guarantee that each call returns exactly one complete JSON object. The upstream TransformStream emits one JSON-encoded partial object per input chunk, but the ReadableStream layer may split these across read() calls, especially when a large field (such as file content) makes the encoded object large.

When JSON.parse receives a truncated JSON string, it throws. S3 has no catch block, so the exception escapes the generator and ends iteration instead of producing a value. Because that error is silently swallowed rather than surfaced, the for await loop in handleCreate is left holding whatever it saw last: lastChunk is either undefined or an earlier partial object whose text is still null.
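An exception inside an async generator does not turn into an undefined value: the generator finishes, and the consumer's for await loop rejects with that same error. A minimal sketch of that behavior, using a stand-in for S3 (`parseEach`) and a hypothetical consumer (`consume`) that keeps the last chunk it received and swallows the error; neither is zod-stream's actual code.

```javascript
// Stand-in for S3: one JSON.parse per chunk, no try/catch.
async function* parseEach(chunks) {
  for (const chunk of chunks) {
    yield JSON.parse(chunk) // a truncated chunk throws here and ends the generator
  }
}

// Hypothetical consumer (not handleCreate's real code): remembers the last
// chunk it received and swallows the error thrown by the iteration.
async function consume(chunks) {
  let lastChunk
  let error = null
  try {
    for await (const chunk of parseEach(chunks)) {
      lastChunk = chunk
    }
  } catch (e) {
    error = e
  }
  return { lastChunk, error }
}

consume([
  '{"path":"output.md","text":null}', // an early partial object
  '{"path":"output.md","text":"xxxx', // truncated by a read() boundary
]).then(({ lastChunk, error }) => {
  console.log(lastChunk)                    // { path: 'output.md', text: null }
  console.log(error instanceof SyntaxError) // true
})
```

If the very first chunk is the truncated one, `lastChunk` never gets assigned and stays undefined, which matches the other outcome described above.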

Why text is null specifically

SchemaStream is initialized with typeDefaults: { string: null }:

// zod-stream/dist/index.js — chatCompletionStream
let i25 = new import_schema_stream.SchemaStream(n29.schema, {
  typeDefaults: { string: null, number: null, boolean: null },
  ...
});

This means the partial object starts as { path: null, text: null }. As chunks arrive, fields are populated incrementally. If S3 yields a partial object before text is fully received (due to the chunk boundary issue), text is still null.

Fix

Fix readableStreamToAsyncGenerator() and the validationStream TransformStream to accumulate bytes across read() calls and only parse/yield once a complete JSON object is available. Use TextDecoder with stream:true to handle multi-byte UTF-8 sequences split across chunk boundaries.
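A minimal sketch of the accumulating reader, assuming the stream carries concatenated (optionally whitespace-separated) JSON objects as UTF-8 bytes. `readJsonObjects` and `findObjectEnd` are hypothetical names, not the code in this PR's diff; the brace/string scanner is one way to find where a complete object ends.

```javascript
// Hypothetical helper: index just past the first complete top-level JSON object
// in `text`, or -1 if that object has not been closed yet. Braces inside string
// literals (including escaped quotes) are ignored.
function findObjectEnd(text) {
  let depth = 0
  let inString = false
  let escaped = false
  for (let i = 0; i < text.length; i++) {
    const ch = text[i]
    if (inString) {
      if (escaped) escaped = false
      else if (ch === '\\') escaped = true
      else if (ch === '"') inString = false
    } else if (ch === '"') {
      inString = true
    } else if (ch === '{') {
      depth++
    } else if (ch === '}') {
      depth--
      if (depth === 0) return i + 1
    }
  }
  return -1
}

// Sketch of an accumulating readableStreamToAsyncGenerator().
async function* readJsonObjects(stream) {
  const reader = stream.getReader()
  const decoder = new TextDecoder()
  let buffer = ''
  try {
    for (;;) {
      const { done, value } = await reader.read()
      // stream: true holds back an incomplete multi-byte UTF-8 sequence until
      // the next read(); the final decode() call flushes whatever is left.
      buffer += done ? decoder.decode() : decoder.decode(value, { stream: true })
      let end
      while ((end = findObjectEnd(buffer)) !== -1) {
        yield JSON.parse(buffer.slice(0, end))
        buffer = buffer.slice(end)
      }
      if (done) break
    }
  } finally {
    reader.releaseLock()
  }
  if (buffer.trim() !== '') {
    throw new SyntaxError('stream ended in the middle of a JSON object')
  }
}
```

Usage would mirror S3: `for await (const partial of readJsonObjects(stream)) { ... }`. The A2 control-character stripping that S3 applies is left out here. Each read() rescans the buffer from its start, which is quadratic in the worst case for one object split across many reads; keeping the scanner state between reads avoids that.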

Steps to Reproduce

Environment note: The bug occurs in browser and Electron environments (including Kiro IDE, which runs on Electron 39). Node.js 18+ does not split ReadableStream chunks in the same way, so the full zod-stream pipeline may not trigger the bug on Node. The reproduction below isolates the broken assumption directly, without requiring zod-stream to be installed.

Reproduce in any browser DevTools console (no install, no Node):

// Paste in a browser DevTools console (Chrome, Firefox, Safari, or any modern browser).
// No dependencies required.

const encoder = new TextEncoder()

// An earlier, complete partial object: SchemaStream's typeDefaults leave text as null.
const earlyPartial = encoder.encode(JSON.stringify({ path: 'output.md', text: null }))

// The final partial object, carrying ~70KB of file content.
const fullJson = JSON.stringify({ path: 'output.md', text: 'x'.repeat(70_000) })
const encoded = encoder.encode(fullJson)

// Simulate what the browser ReadableStream does with a 70KB chunk:
// it splits at the internal buffer boundary (~64KB = 65536 bytes).
const splitAt = 65536
const reader = new ReadableStream({
  start(controller) {
    controller.enqueue(earlyPartial)              // first read(): complete JSON
    controller.enqueue(encoded.slice(0, splitAt)) // second read(): truncated JSON
    controller.enqueue(encoded.slice(splitAt))    // third read(): the remainder
    controller.close()
  }
}).getReader()

// Same one-JSON.parse-per-read() logic as readableStreamToAsyncGenerator();
// the try/catch only counts the failures so the script can report them.
const decoder = new TextDecoder()
let parseFailures = 0
let lastValue = null

while (true) {
  const { done, value } = await reader.read()
  if (done) break
  try {
    lastValue = JSON.parse(decoder.decode(value))  // throws on both halves of the split object
  } catch {
    parseFailures++  // silently swallowed: this is the bug
  }
}

console.log('parse failures (silently swallowed):', parseFailures)  // 2
console.log('last parsed text is null:', lastValue?.text === null)   // true  ← BUG
console.log()
console.log('Expected: text should be a 70000-char string')
console.log('Actual:   text is null because JSON.parse threw on both halves')
console.log('          of the split object and the errors were ignored')

Expected output (once reads are accumulated before parsing):

parse failures (silently swallowed): 0
last parsed text is null: false

Actual output:

parse failures (silently swallowed): 2
last parsed text is null: true

Expected: text should be a 70000-char string
Actual:   text is null because JSON.parse threw on both halves
          of the split object and the errors were ignored
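A sketch of the validationStream side of the fix, assuming the stream carries concatenated JSON objects as UTF-8 bytes: a hypothetical `jsonObjectFramer()` TransformStream that keeps its brace-scanning state between chunks and enqueues an object only once it is complete. It emits parsed objects for brevity; the real validationStream's output format is not shown in this PR. Replaying a chunk layout like the one above (an early partial object, then a ~70KB object split at byte 65536) yields both objects with no parse failures.

```javascript
// Hypothetical framing TransformStream: UTF-8 bytes in, parsed JSON objects out.
// Scanner state persists across chunks, so each character is examined once
// even when a single object spans many read() calls.
function jsonObjectFramer() {
  const decoder = new TextDecoder()
  let buffer = ''
  let scanned = 0 // index of the next character in buffer to examine
  let depth = 0
  let inString = false
  let escaped = false

  function drain(controller) {
    while (scanned < buffer.length) {
      const ch = buffer[scanned++]
      if (inString) {
        if (escaped) escaped = false
        else if (ch === '\\') escaped = true
        else if (ch === '"') inString = false
      } else if (ch === '"') {
        inString = true
      } else if (ch === '{') {
        depth++
      } else if (ch === '}' && --depth === 0) {
        // buffer[0..scanned) now holds one complete object.
        controller.enqueue(JSON.parse(buffer.slice(0, scanned)))
        buffer = buffer.slice(scanned)
        scanned = 0
      }
    }
  }

  return new TransformStream({
    transform(chunk, controller) {
      buffer += decoder.decode(chunk, { stream: true })
      drain(controller)
    },
    flush(controller) {
      buffer += decoder.decode()
      drain(controller)
      if (buffer.trim() !== '') {
        throw new SyntaxError('stream ended in the middle of a JSON object')
      }
    }
  })
}

// Collects every chunk a stream produces into an array.
async function collect(stream) {
  const reader = stream.getReader()
  const out = []
  for (;;) {
    const { done, value } = await reader.read()
    if (done) return out
    out.push(value)
  }
}

// Replay: an early partial object, then a ~70KB object split at byte 65536.
const encoder = new TextEncoder()
const big = encoder.encode(JSON.stringify({ path: 'output.md', text: 'x'.repeat(70_000) }))
const source = new ReadableStream({
  start(controller) {
    controller.enqueue(encoder.encode(JSON.stringify({ path: 'output.md', text: null })))
    controller.enqueue(big.slice(0, 65536))
    controller.enqueue(big.slice(65536))
    controller.close()
  }
})

collect(source.pipeThrough(jsonObjectFramer())).then((objects) => {
  console.log(objects.length)         // 2
  console.log(objects[1].text.length) // 70000
})
```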

Why Node doesn't reproduce it but browsers do

Node 18+ ReadableStream does not split chunks; each read() returns exactly what was enqueued. The browser Web Streams API (used by Electron, which many desktop IDEs are built on) applies backpressure and splits large chunks at the internal buffer boundary (~64KB). The readableStreamToAsyncGenerator function assumes each read() returns a complete JSON object, which holds on Node but breaks in browser/Electron for objects larger than ~64KB.
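On a runtime whose streams hand back each enqueued chunk whole, such as Node, the split can still be forced in a test by re-slicing chunks before they reach the parser. A minimal sketch, assuming a hypothetical `rechunk(maxBytes)` TransformStream helper (not part of zod-stream) and a naive parser with the same one-JSON.parse-per-read() assumption:

```javascript
// Hypothetical test helper: re-slices each incoming Uint8Array into pieces of at
// most maxBytes, so downstream read() calls never see a larger chunk.
function rechunk(maxBytes) {
  return new TransformStream({
    transform(chunk, controller) {
      for (let i = 0; i < chunk.length; i += maxBytes) {
        controller.enqueue(chunk.subarray(i, i + maxBytes))
      }
    }
  })
}

// Naive parser with the one-JSON.parse-per-read() assumption; counts failures.
async function countParseFailures(stream) {
  const reader = stream.getReader()
  const decoder = new TextDecoder()
  let failures = 0
  for (;;) {
    const { done, value } = await reader.read()
    if (done) return failures
    try {
      JSON.parse(decoder.decode(value))
    } catch {
      failures++
    }
  }
}

const payload = new TextEncoder().encode(
  JSON.stringify({ path: 'output.md', text: 'x'.repeat(70_000) })
)
// Fresh single-chunk stream for each run (a stream can only be read once).
const source = () =>
  new ReadableStream({
    start(controller) {
      controller.enqueue(payload)
      controller.close()
    }
  })

Promise.all([
  countParseFailures(source()),                             // whole object in one read()
  countParseFailures(source().pipeThrough(rechunk(65536))), // split at 65536 bytes
]).then(([whole, split]) => console.log(whole, split))      // 0 2
```

Dropping `rechunk` in front of the real parser under test gives a Node-side regression check without needing an Electron runtime.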

@vercel

vercel Bot commented Apr 16, 2026


Someone is attempting to deploy a commit to the Hack Dance Team on Vercel.

A member of the Team first needs to authorize it.

@changeset-bot

changeset-bot Bot commented Apr 16, 2026


⚠️ No Changeset found

Latest commit: e87886c

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@nikmeiser nikmeiser closed this by deleting the head repository Apr 16, 2026