fix(zod-stream): accumulate ReadableStream chunks before JSON.parse #98
Closed
nikmeiser wants to merge 1 commit into
Symptom
An Electron tool I'm using, which depends on zod-stream, fails when trying to parse a large LLM stream. ReadableStream.read() does not guarantee that chunk boundaries align with TransformStream output boundaries. For large string values (e.g. file content in streaming tool calls), a single JSON-encoded partial object can span multiple read() calls, causing JSON.parse to throw on truncated input.
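The failure described above can be illustrated with a hypothetical sketch of the read-then-parse pattern, with one JSON.parse per read() call. It assumes chunks arrive as UTF-8 bytes; naiveReadableStreamToAsyncGenerator and splitJsonStream are illustrative names, not zod-stream's actual S3 code.

```javascript
// Hypothetical sketch, NOT the zod-stream source: it mirrors the assumption
// that every read() returns exactly one complete JSON object (as UTF-8 bytes).
async function* naiveReadableStreamToAsyncGenerator(stream) {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) return;
    // If one JSON object was split across two read() calls, this sees a
    // truncated string and JSON.parse throws a SyntaxError.
    yield JSON.parse(decoder.decode(value));
  }
}

// Hypothetical helper: delivers one JSON object as two byte chunks split at
// splitAt, simulating a chunk boundary that falls mid-object.
function splitJsonStream(obj, splitAt) {
  const bytes = new TextEncoder().encode(JSON.stringify(obj));
  return new ReadableStream({
    start(controller) {
      const head = bytes.slice(0, splitAt);
      const tail = bytes.slice(splitAt);
      if (head.length > 0) controller.enqueue(head);
      if (tail.length > 0) controller.enqueue(tail);
      controller.close();
    },
  });
}

// Usage: the object is cut after 10 bytes ('{"path":"a'), so parsing fails.
(async () => {
  try {
    const stream = splitJsonStream({ path: "a.txt", text: "hello" }, 10);
    for await (const chunk of naiveReadableStreamToAsyncGenerator(stream)) {
      console.log(chunk);
    }
  } catch (err) {
    console.log(err.name); // "SyntaxError"
  }
})();
```

When the same object arrives in a single chunk, the naive generator parses it without complaint, which is why the bug only shows up once an object is large enough to be split.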
Root Cause: S3() in zod-stream Does Not Handle Stream Chunk Boundaries
The S3 function in zod-stream/dist/index.js reads from the final ReadableStream and calls JSON.parse on each chunk.
The problem: ReadableStream.read() does not guarantee that each call returns exactly one complete JSON object. The upstream TransformStream emits one JSON-encoded partial object per input chunk, but the ReadableStream layer may split these across read() calls, especially for large content where the encoded JSON object is large.
When JSON.parse receives a truncated JSON string, it throws. The error is silently swallowed (there is no catch block in S3), and the generator yields undefined. The for await loop in handleCreate then processes undefined as the last chunk, so lastChunk ends up as undefined, or the partial object has text: null.
Why text is null specifically
SchemaStream is initialized with typeDefaults: { string: null }. This means the partial object starts as { path: null, text: null }. As chunks arrive, fields are populated incrementally. If S3 yields a partial object before text is fully received (due to the chunk boundary issue), text is still null.
Fix
Fix readableStreamToAsyncGenerator() and the validationStream TransformStream to accumulate bytes across read() calls and only parse/yield once a complete JSON object is available. Use TextDecoder with stream:true to handle multi-byte UTF-8 sequences split across chunk boundaries.
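A minimal sketch of the accumulate-then-parse idea, under stated assumptions: the stream carries UTF-8 bytes of back-to-back top-level JSON objects (or arrays) with no delimiter, and "complete" is detected by counting brackets outside of strings. That completeness check is an assumption of this sketch; the PR's diff may detect it differently. findObjectEnd, accumulatingReadableStreamToAsyncGenerator and streamFromByteChunks are hypothetical names.

```javascript
// Minimal sketch of accumulate-then-parse. Assumptions (not from the PR's
// diff): chunks are UTF-8 bytes carrying back-to-back top-level JSON objects
// or arrays with no delimiter, and "complete" means the brackets balance.

// Returns the index just past the first complete top-level {...} or [...]
// in text, or -1 if it is not complete yet. Brackets inside JSON strings,
// including after escaped quotes, are ignored.
function findObjectEnd(text) {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{" || ch === "[") {
      depth++;
    } else if (ch === "}" || ch === "]") {
      depth--;
      if (depth === 0) return i + 1;
    }
  }
  return -1;
}

async function* accumulatingReadableStreamToAsyncGenerator(stream) {
  const reader = stream.getReader();
  const decoder = new TextDecoder("utf-8");
  let buffer = "";
  try {
    while (true) {
      const { done, value } = await reader.read();
      // stream: true holds back a multi-byte UTF-8 sequence split across
      // chunks until its remaining bytes arrive; decode() with no args flushes.
      buffer += done ? decoder.decode() : decoder.decode(value, { stream: true });
      // Yield every complete object now in the buffer; keep the remainder.
      let end;
      while ((end = findObjectEnd(buffer)) !== -1) {
        yield JSON.parse(buffer.slice(0, end));
        buffer = buffer.slice(end);
      }
      if (done) break;
    }
  } finally {
    reader.releaseLock();
  }
  if (buffer.trim() !== "") {
    // Only text left over after the stream closed counts as truncated input.
    throw new SyntaxError("stream ended inside an incomplete JSON value");
  }
}

// Hypothetical test helper: encodes text and enqueues fixed-size byte slices,
// so boundaries can fall mid-object and mid-character.
function streamFromByteChunks(text, chunkSize) {
  const bytes = new TextEncoder().encode(text);
  return new ReadableStream({
    start(controller) {
      for (let i = 0; i < bytes.length; i += chunkSize) {
        controller.enqueue(bytes.slice(i, i + chunkSize));
      }
      controller.close();
    },
  });
}

// Usage: 3-byte chunks split both objects and the multi-byte "é" and "✓".
(async () => {
  const text = '{"path":"a.txt","text":"é {x} ✓"}{"path":"b.txt","text":"ok"}';
  const stream = streamFromByteChunks(text, 3);
  for await (const obj of accumulatingReadableStreamToAsyncGenerator(stream)) {
    console.log(obj.path, obj.text);
  }
})();
```

For brevity, findObjectEnd rescans the buffer from the start after every chunk; a production version would likely keep the scanner state (depth, inString, escaped) between chunks so very large objects are scanned only once.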
Steps to Reproduce
Environment note: The bug occurs in browser and Electron environments (including Kiro IDE, which runs on Electron 39). Node.js 18+ does not split ReadableStream chunks in the same way, so the full zod-stream pipeline may not trigger the bug on Node. The reproduction below isolates the broken assumption directly, without requiring zod-stream to be installed.
Reproduce in any browser DevTools console (no install, no Node):
Expected output:
Actual output:
Why Node doesn't reproduce it but browsers do
Node 18+ ReadableStream does not split chunks: each read() returns exactly what was enqueued. The browser Web Streams API (used by Electron, the desktop-app framework that Kiro IDE runs on) applies backpressure and splits large chunks at the internal buffer boundary (~64KB). The readableStreamToAsyncGenerator function assumes each read() returns a complete JSON object, which holds on Node but breaks in browser/Electron for objects larger than ~64KB.
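Since the split is attributed to browser/Electron stream buffering, one way to exercise the same code path deterministically on Node 18+ is to force the boundaries yourself. The sketch below assumes a fixed 64 KiB limit to mirror the ~64KB figure above; rechunk and chunkSizes are hypothetical helpers, not part of zod-stream or the Web Streams API.

```javascript
// Hypothetical helper, not part of zod-stream: re-chunks a byte stream so no
// chunk exceeds maxBytes. Assumption: this approximates the ~64KB splitting
// described above, so the split-boundary path can be exercised on Node 18+.
function rechunk(maxBytes) {
  return new TransformStream({
    transform(chunk, controller) {
      // subarray() creates views over the same bytes instead of copying them.
      for (let i = 0; i < chunk.length; i += maxBytes) {
        controller.enqueue(chunk.subarray(i, i + maxBytes));
      }
    },
  });
}

// Reads a stream to the end and records the byte length of every chunk.
async function chunkSizes(stream) {
  const reader = stream.getReader();
  const sizes = [];
  while (true) {
    const { done, value } = await reader.read();
    if (done) return sizes;
    sizes.push(value.length);
  }
}

// Usage: a single ~150 KB JSON object enqueued as one chunk comes out in three.
(async () => {
  const big = JSON.stringify({ path: "big.txt", text: "x".repeat(150000) });
  const source = new ReadableStream({
    start(controller) {
      controller.enqueue(new TextEncoder().encode(big));
      controller.close();
    },
  });
  console.log(await chunkSizes(source.pipeThrough(rechunk(64 * 1024))));
})();
```

Piping a stream through rechunk before handing it to a consumer that parses one object per read() should surface the truncated-JSON failure outside the browser, and gives a Node-side check for an accumulating fix.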