OpenAI SSE accumulator assumes one token channel per streamed chunk #325

@hvagadia

Description

The current OpenAISSEAccumulator design appears to assume that each streamed SSE delta belongs to only one token channel at a time: content, reasoning, or tool-call data. That invariant is not guaranteed by OpenAI-compatible serving frameworks, especially when stream batching / larger stream intervals are used.

Code pointers:

Concrete example:

If one streamed delta contains both content and reasoning:

from inference_endpoint.openai.accumulator import OpenAISSEAccumulator
from inference_endpoint.openai.types import SSEChoice, SSEDelta

acc = OpenAISSEAccumulator("qid", stream_all_chunks=True)
acc.add_chunk(
    SSEChoice(
        delta=SSEDelta(
            content="answer",
            reasoning_content="think",
            tool_calls=[
                {
                    "index": 0,
                    "id": "call_1",
                    "type": "function",
                    "function": {"name": "f", "arguments": "{}"},
                }
            ],
        )
    )
)

result = acc.get_final_output()
print(result.response_output.as_message_parts())

Observed result:

("answer", None, (...tool_calls...))

The reasoning_content="think" token data is silently dropped because the content branch wins and the reasoning_content branch is skipped.
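The failure mode can be illustrated without the library. The following is a minimal sketch, not the actual OpenAISSEAccumulator code: `ToyDelta` and `ingest_exclusive` are hypothetical names, and the if/elif shape is an assumption inferred from the observed result above (content kept, reasoning dropped, tool calls kept).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyDelta:
    """Hypothetical stand-in for one streamed delta (not the library's SSEDelta)."""
    content: Optional[str] = None
    reasoning_content: Optional[str] = None
    tool_calls: list = field(default_factory=list)


def ingest_exclusive(deltas):
    """Assumed ingestion pattern: content and reasoning are treated as mutually exclusive."""
    content, reasoning, tool_calls = [], [], []
    for d in deltas:
        if d.content:
            content.append(d.content)
        elif d.reasoning_content:
            # Never reached when the same delta also carries content.
            reasoning.append(d.reasoning_content)
        if d.tool_calls:
            tool_calls.extend(d.tool_calls)
    return "".join(content) or None, "".join(reasoning) or None, tuple(tool_calls)


mixed = ToyDelta(content="answer", reasoning_content="think", tool_calls=[{"index": 0}])
exclusive_result = ingest_exclusive([mixed])
print(exclusive_result)
```

With a delta that carries all three channels, the reasoning slot comes back as None, matching the shape of the observed result.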

This may be less likely with stream_interval=1, but it is not a safe design invariant to rely on. Multiple token channels in the same streamed chunk could cause incorrect final outputs, missing reasoning data, and potentially incorrect downstream token accounting.

Potential solution:

Represent each streamed SSE frame as a structured chunk object that can carry all token-channel deltas present in that frame, e.g. content delta, reasoning delta, and tool-call delta together. The accumulator can then maintain a list of these structured chunks and build the final TextModelOutput by independently folding each channel across all chunks, instead of using mutually exclusive branch logic during ingestion.
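A rough sketch of that idea, under stated assumptions: `FrameChunk`, `fold_text`, and `fold_tool_calls` are hypothetical names that do not exist in the codebase, the result is a plain tuple rather than a real TextModelOutput, and tool-call fragments are assumed to be dicts merged by their `index` field with `arguments` strings concatenated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FrameChunk:
    """Hypothetical record of every channel delta one SSE frame carried."""
    content: Optional[str] = None
    reasoning: Optional[str] = None
    tool_calls: tuple = ()


def fold_text(chunks, channel):
    """Concatenate one text channel across all chunks, ignoring the other channels."""
    parts = [getattr(c, channel) for c in chunks if getattr(c, channel)]
    return "".join(parts) or None


def fold_tool_calls(chunks):
    """Merge tool-call fragments by index, concatenating argument strings."""
    merged = {}
    for c in chunks:
        for tc in c.tool_calls:
            slot = merged.setdefault(
                tc["index"], {"id": None, "name": None, "arguments": ""}
            )
            fn = tc.get("function") or {}
            slot["id"] = tc.get("id") or slot["id"]
            slot["name"] = fn.get("name") or slot["name"]
            slot["arguments"] += fn.get("arguments") or ""
    return tuple(merged[i] for i in sorted(merged))


# Frames may mix channels freely; ingestion only appends, folding happens at the end.
chunks = [
    FrameChunk(reasoning="thi"),
    FrameChunk(
        content="ans",
        reasoning="nk",
        tool_calls=(
            {"index": 0, "id": "call_1", "function": {"name": "f", "arguments": "{"}},
        ),
    ),
    FrameChunk(content="wer", tool_calls=({"index": 0, "function": {"arguments": "}"}},)),
]
folded = (
    fold_text(chunks, "content"),
    fold_text(chunks, "reasoning"),
    fold_tool_calls(chunks),
)
print(folded)
```

Because each channel is folded on its own, a frame that carries content and reasoning together contributes to both outputs, and keeping the per-frame chunks around preserves the information needed for per-channel token accounting.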
