Fix: Correct SSE streaming JSON parsing (resolves #48) #74
Summary
This pull request fixes the SSE streaming implementation in the Cohere Java SDK. The current SSEIterator in Stream.java collects multiple data: lines into a single buffer and then attempts to parse the entire block as one JSON object. This does not match the structure of Cohere’s SSE responses, in which each data: line carries its own JSON event. As a result, the existing logic produces JSON parsing errors and type-conversion failures during streaming.
This PR updates the parsing logic to correctly interpret SSE events and restore functional streaming behavior.
Root Cause
Cohere’s streaming API uses the standard Server-Sent Events format:
event: message-start
data: { ... }
event: content-delta
data: { ... }
The issue arises because the current implementation:
Appends all data: lines into a shared buffer.
Attempts to parse the entire buffer as JSON when a blank line appears.
Fails for several reasons:
Multiple data: events do not form a valid combined JSON object.
event: lines are not JSON and should not be parsed.
content-delta events are emitted incrementally and must be handled individually.
This incorrect aggregation leads to parse errors and prevents proper incremental streaming.
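To make the failure concrete, the sketch below reproduces the aggregation pattern described above in isolation. It is not the SDK's SSEIterator. The class AggregationDemo and its helpers are hypothetical, and countTopLevelObjects is only a rough brace-depth check, not a JSON validator. When two events are buffered together, the buffer holds two top-level objects, and a parser that expects a single JSON value rejects it.

```java
import java.util.List;

// Hypothetical illustration of the buffering bug; not the SDK's SSEIterator.
final class AggregationDemo {

    // Mimics the faulty behavior: every data: payload is appended to one shared buffer.
    static String aggregate(List<String> sseLines) {
        StringBuilder buffer = new StringBuilder();
        for (String line : sseLines) {
            if (line.startsWith("data:")) {
                buffer.append(line.substring("data:".length()).trim());
            }
        }
        return buffer.toString();
    }

    // Rough structural check: counts top-level {...} objects, ignoring braces inside strings.
    static int countTopLevelObjects(String text) {
        int depth = 0;
        int objects = 0;
        boolean inString = false;
        boolean escaped = false;
        for (char c : text.toCharArray()) {
            if (inString) {
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            if (c == '"') {
                inString = true;
            } else if (c == '{') {
                if (depth == 0) {
                    objects++;
                }
                depth++;
            } else if (c == '}') {
                depth--;
            }
        }
        return objects;
    }
}
```

Feeding it a message-start event followed by a content-delta event yields a single string that contains two top-level objects.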
Fix Implemented
This pull request introduces the following corrections:
Only lines beginning with data: are parsed as individual JSON events.
event: lines and other non-JSON lines are skipped entirely.
Each data: line is decoded and delivered to the consumer immediately.
Additional null checks and type-safety guards prevent ClassCastException and similar type-conversion failures.
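Here is a minimal sketch of the corrected line handling, assuming the response body is available as a java.io.BufferedReader. DataLineIterator is a hypothetical name and is not the class used in Stream.java. It skips every line that does not start with data:, strips the optional single space after the colon, and yields each payload lazily so that the consumer receives it as soon as the line arrives.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch of per-line data: handling; names do not come from the SDK.
final class DataLineIterator implements Iterator<String> {
    private static final String DATA_PREFIX = "data:";

    private final BufferedReader reader;
    private String pending; // next payload already read from the stream, or null

    DataLineIterator(BufferedReader reader) {
        this.reader = reader;
    }

    @Override
    public boolean hasNext() {
        if (pending != null) {
            return true;
        }
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                // event:, id:, comment and blank lines are skipped; only data: lines carry JSON.
                if (!line.startsWith(DATA_PREFIX)) {
                    continue;
                }
                String payload = line.substring(DATA_PREFIX.length());
                // Per the SSE format, one leading space after the colon is not part of the value.
                if (payload.startsWith(" ")) {
                    payload = payload.substring(1);
                }
                if (!payload.isEmpty()) {
                    pending = payload;
                    return true;
                }
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String payload = pending;
        pending = null;
        return payload;
    }
}
```

Each payload returned by next() would then be handed to the SDK's JSON decoder. The sketch deliberately treats every data: line as one complete event, as described in this PR. A server that split a single JSON object across several data: lines would instead need the SSE rule of joining those lines until the blank line that ends the event.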
The updated flow follows SSE semantics and Cohere’s documented streaming behavior.
These changes ensure correct handling of message-start, content-start, and content-delta events, and allow incremental output to function as expected.
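The guards and the per-event handling can be sketched as follows, assuming each payload has already been decoded into nested java.util.Map values. DeltaAccumulator is hypothetical. The delta.message.content.text path for content-delta text is an assumption about the event shape and does not come from this PR. The sketch requires Java 16 or later for instanceof pattern matching.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: events are assumed to be already decoded into nested Maps.
final class DeltaAccumulator {
    private final StringBuilder text = new StringBuilder();

    // Dispatches on the "type" field; null or malformed events are ignored instead of cast.
    void accept(Map<String, Object> event) {
        if (event == null || !(event.get("type") instanceof String type)) {
            return;
        }
        switch (type) {
            case "message-start", "content-start" -> { /* no text to append yet */ }
            case "content-delta" -> deltaText(event).ifPresent(text::append);
            default -> { /* other event types are not needed for text assembly */ }
        }
    }

    String text() {
        return text.toString();
    }

    // Assumption: content-delta text lives at delta.message.content.text.
    // Each step is checked with instanceof rather than an unchecked cast.
    static Optional<String> deltaText(Map<String, Object> event) {
        Object node = event;
        for (String key : new String[] {"delta", "message", "content", "text"}) {
            if (!(node instanceof Map<?, ?> map)) {
                return Optional.empty();
            }
            node = map.get(key);
        }
        return node instanceof String s ? Optional.of(s) : Optional.empty();
    }
}
```

A content-delta event with an unexpected shape simply contributes nothing to the assembled text, so it cannot surface as a ClassCastException partway through the stream.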
Tests
The streaming tests in StreamTest.java have been added or updated to verify:
Correct parsing of message-start and content-delta events.
That only data: lines are parsed as JSON.
That no parsing exceptions occur during streaming.
That incremental output is assembled correctly from multiple content-delta chunks.
All tests pass:
./gradlew test --tests com.cohere.api.StreamTest
Impact
With this fix, SSE streaming in the Java SDK behaves correctly again.
Applications that depend on token-by-token or incremental output will now receive updates reliably without runtime parsing errors. The changes do not affect the public API and are fully backward-compatible.
Closing
This pull request addresses and resolves Issue #48 by correcting the SSE parsing logic and aligning the SDK with standard SSE behavior and Cohere’s streaming protocol. I am happy to make any adjustments requested during review.