
fix: align /v1/chat/completions SSE stream with OpenAI spec#1262

Open
sufubao wants to merge 2 commits into main from stream_done

Conversation

Collaborator

sufubao (Collaborator) commented Apr 7, 2026

Summary

  • Add missing `data: [DONE]\n\n` terminator at the end of /v1/chat/completions streams — clients following the SSE spec were hanging while waiting for this sentinel (the /v1/completions endpoint already emitted it, so this was an inconsistency within LightLLM itself)
  • Emit a role-only initial chunk with delta `{role: "assistant", content: ""}` once per choice before any content, matching OpenAI API / vLLM behaviour
  • Remove `role: "assistant"` from all subsequent delta chunks (plain content, tool-call text, and reasoning paths) so role appears only in the first chunk, as the spec requires

Chunk sequence after this fix

```
data: {"delta": {"role": "assistant", "content": ""}, ...}   ← role-only first chunk
data: {"delta": {"content": "Hello"}, ...}                   ← token chunks (no role)
...
data: {"delta": {}, "finish_reason": "stop", ...}            ← finish chunk
data: {"choices": [], "usage": {...}, ...}                   ← only if include_usage=true
data: [DONE]
```
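The sequence above can be sketched as a generator. This is a minimal illustration of the framing, not LightLLM's actual implementation — the function name and payload shape beyond the `delta`/`finish_reason`/`usage` fields are assumptions:

```python
import json


def sse_chunks(choice_index, tokens, include_usage=False, usage=None):
    """Hypothetical sketch of the chunk sequence above; yields SSE-framed strings."""

    def frame(payload):
        return f"data: {json.dumps(payload)}\n\n"

    # Role-only first chunk: role appears here and nowhere else.
    yield frame({"choices": [{"index": choice_index,
                              "delta": {"role": "assistant", "content": ""}}]})
    # Token chunks carry only content.
    for tok in tokens:
        yield frame({"choices": [{"index": choice_index,
                                  "delta": {"content": tok}}]})
    # Finish chunk: empty delta plus finish_reason.
    yield frame({"choices": [{"index": choice_index,
                              "delta": {}, "finish_reason": "stop"}]})
    # Optional usage chunk (stream_options.include_usage) before the sentinel.
    if include_usage:
        yield frame({"choices": [], "usage": usage or {}})
    # Sentinel that SSE clients wait for before closing the connection.
    yield "data: [DONE]\n\n"
```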

Test Plan

  • Verify streaming response with a standard SSE client no longer hangs after generation completes
  • Confirm first chunk carries role: "assistant" with content: ""
  • Confirm subsequent content chunks do not repeat role
  • Confirm data: [DONE] appears as the final line of the stream
  • Test with stream_options: {include_usage: true} — usage chunk precedes [DONE]
  • Test tool-call streaming — role-only chunk still emitted first, tool call chunks unaffected
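The ordering checks in the test plan can be scripted. A minimal sketch, assuming a captured raw SSE response body (a string) rather than a live client:

```python
import json


def check_stream(raw: str) -> None:
    """Assert the chunk ordering described in the test plan over a raw SSE body."""
    events = [line[len("data: "):] for line in raw.split("\n\n")
              if line.startswith("data: ")]
    assert events[-1] == "[DONE]", "stream must end with the [DONE] sentinel"
    chunks = [json.loads(e) for e in events[:-1]]
    # First chunk carries role: "assistant" with empty content.
    first_delta = chunks[0]["choices"][0]["delta"]
    assert first_delta == {"role": "assistant", "content": ""}
    # No subsequent chunk repeats the role.
    for chunk in chunks[1:]:
        for choice in chunk.get("choices", []):
            assert "role" not in choice.get("delta", {})
```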

- Add missing `data: [DONE]\n\n` terminator at end of stream; clients
  that follow the SSE spec were hanging waiting for this sentinel
- Emit a role-only initial chunk `{role:"assistant",content:""}` once
  per choice before any content, matching OpenAI / vLLM behaviour
- Remove `role:"assistant"` from all subsequent delta chunks (reasoning,
  tool-call text, and plain content paths) so role appears only in the
  first chunk as the spec requires
gemini-code-assist (bot, Contributor) left a comment


Code Review

This pull request updates the OpenAI API streaming implementation to comply with the SSE specification by emitting an initial role-only chunk and removing the role from subsequent deltas. It also adds a [DONE] terminator to the end of the stream. Feedback was provided to encode the terminator string to bytes to ensure consistency with the function's type hint and other parts of the codebase.
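The encoding point can be illustrated in isolation (the generator name and signature here are hypothetical, not LightLLM's actual function): if the streaming generator is annotated to yield bytes, the sentinel must be encoded too, or a str slips into a byte-oriented response path.

```python
from typing import Iterator


def terminate_stream() -> Iterator[bytes]:
    """Hypothetical tail of a byte-yielding SSE generator: the [DONE]
    sentinel is encoded to bytes, matching the annotated yield type."""
    # yield "data: [DONE]\n\n"  # wrong: str, inconsistent with Iterator[bytes]
    yield "data: [DONE]\n\n".encode("utf-8")
```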

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
