
fix: align /v1/chat/completions SSE stream with OpenAI spec#1262

Open
sufubao wants to merge 2 commits into main from stream_done

Conversation

Collaborator

sufubao (Collaborator) commented Apr 7, 2026

Summary

  • Add missing `data: [DONE]\n\n` terminator at the end of /v1/chat/completions streams — clients following the SSE spec were hanging while waiting for this sentinel (the /v1/completions endpoint already emitted it, so this was an inconsistency within LightLLM itself)
  • Emit a role-only initial chunk with delta `{role: "assistant", content: ""}` once per choice before any content, matching OpenAI API / vLLM behaviour
  • Remove `role: "assistant"` from all subsequent delta chunks (plain content, tool-call text, and reasoning paths) so role appears only in the first chunk, as the spec requires

Chunk sequence after this fix

```
data: {"delta": {"role": "assistant", "content": ""}, ...}   ← role-only first chunk
data: {"delta": {"content": "Hello"}, ...}                   ← token chunks (no role)
...
data: {"delta": {}, "finish_reason": "stop", ...}            ← finish chunk
data: {"choices": [], "usage": {...}, ...}                   ← only if include_usage=true
data: [DONE]
```
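The sequence above can be sketched as a generator. This is a minimal illustration of the framing, not LightLLM's actual implementation — the function name and payload shape beyond the `delta`/`finish_reason`/`usage` fields are assumptions:

```python
import json


def sse_chunks(choice_index, tokens, include_usage=False, usage=None):
    """Hypothetical sketch of the chunk sequence above; yields SSE-framed strings."""

    def frame(payload):
        return f"data: {json.dumps(payload)}\n\n"

    # Role-only first chunk: role appears here and nowhere else.
    yield frame({"choices": [{"index": choice_index,
                              "delta": {"role": "assistant", "content": ""}}]})
    # Token chunks carry only content.
    for tok in tokens:
        yield frame({"choices": [{"index": choice_index,
                                  "delta": {"content": tok}}]})
    # Finish chunk: empty delta plus finish_reason.
    yield frame({"choices": [{"index": choice_index,
                              "delta": {}, "finish_reason": "stop"}]})
    # Optional usage chunk (stream_options.include_usage) before the sentinel.
    if include_usage:
        yield frame({"choices": [], "usage": usage or {}})
    # Sentinel that SSE clients wait for before closing the connection.
    yield "data: [DONE]\n\n"
```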

Test Plan

  • Verify streaming response with a standard SSE client no longer hangs after generation completes
  • Confirm first chunk carries role: "assistant" with content: ""
  • Confirm subsequent content chunks do not repeat role
  • Confirm data: [DONE] appears as the final line of the stream
  • Test with stream_options: {include_usage: true} — usage chunk precedes [DONE]
  • Test tool-call streaming — role-only chunk still emitted first, tool call chunks unaffected
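The ordering checks in the test plan can be scripted. A minimal sketch, assuming a captured raw SSE response body (a string) rather than a live client:

```python
import json


def check_stream(raw: str) -> None:
    """Assert the chunk ordering described in the test plan over a raw SSE body."""
    events = [line[len("data: "):] for line in raw.split("\n\n")
              if line.startswith("data: ")]
    assert events[-1] == "[DONE]", "stream must end with the [DONE] sentinel"
    chunks = [json.loads(e) for e in events[:-1]]
    # First chunk carries role: "assistant" with empty content.
    first_delta = chunks[0]["choices"][0]["delta"]
    assert first_delta == {"role": "assistant", "content": ""}
    # No subsequent chunk repeats the role.
    for chunk in chunks[1:]:
        for choice in chunk.get("choices", []):
            assert "role" not in choice.get("delta", {})
```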

- Add missing `data: [DONE]\n\n` terminator at end of stream; clients
  that follow the SSE spec were hanging waiting for this sentinel
- Emit a role-only initial chunk `{role:"assistant",content:""}` once
  per choice before any content, matching OpenAI / vLLM behaviour
- Remove `role:"assistant"` from all subsequent delta chunks (reasoning,
  tool-call text, and plain content paths) so role appears only in the
  first chunk as the spec requires
gemini-code-assist (bot, Contributor) left a comment


Code Review

This pull request updates the OpenAI API streaming implementation to comply with the SSE specification by emitting an initial role-only chunk and removing the role from subsequent deltas. It also adds a [DONE] terminator to the end of the stream. Feedback was provided to encode the terminator string to bytes to ensure consistency with the function's type hint and other parts of the codebase.
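The encoding point can be illustrated in isolation (the generator name and signature here are hypothetical, not LightLLM's actual function): if the streaming generator is annotated to yield bytes, the sentinel must be encoded too, or a str slips into a byte-oriented response path.

```python
from typing import Iterator


def terminate_stream() -> Iterator[bytes]:
    """Hypothetical tail of a byte-yielding SSE generator: the [DONE]
    sentinel is encoded to bytes, matching the annotated yield type."""
    # yield "data: [DONE]\n\n"  # wrong: str, inconsistent with Iterator[bytes]
    yield "data: [DONE]\n\n".encode("utf-8")
```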

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
