feat(streaming): emit ReasoningDeltaEvent for reasoning/thinking deltas #2913
adityasingh2400 wants to merge 6 commits into
a2d4974 to 3a823e4
This PR is stale because it has been open for 10 days with no activity.
3a823e4 to 5c36515
This PR is stale because it has been open for 10 days with no activity.
…as (openai#825)
Add a new ReasoningDeltaEvent to StreamEvent so callers can react to reasoning/thinking tokens in real time without unpacking low-level raw response events.
The event is emitted whenever a ResponseReasoningSummaryTextDeltaEvent (o-series extended thinking via the Responses API) or a ResponseReasoningTextDeltaEvent (third-party models like DeepSeek-R1 via LiteLLM) passes through the stream. The underlying RawResponsesStreamEvent is still emitted as well, so nothing breaks for consumers that already inspect raw events.
Fields:
  delta - the incremental text fragment from this chunk
  snapshot - full accumulated reasoning text so far in this turn
  type - always 'reasoning_delta'
Closes openai#825
- Import ResponseCreatedEvent and reset _reasoning_snapshot to "" when a ResponseCreatedEvent is received inside the retry stream loop, fixing the bug where snapshot text would be duplicated across retries
- In test_reasoning_delta_event_type_field: add found=False flag and assert found after the loop so the test properly fails when no ReasoningDeltaEvent is emitted
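The reset described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in run_loop.py: `ResponseCreated`, `ReasoningTextDelta`, and `SnapshotAccumulator` are hypothetical stand-ins for the real SDK event types and the `_reasoning_snapshot` variable.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins for the raw stream events named in the commit
# message; the real SDK types carry more fields.
@dataclass
class ResponseCreated:
    pass


@dataclass
class ReasoningTextDelta:
    delta: str


class SnapshotAccumulator:
    """Tracks the reasoning text accumulated during the current attempt."""

    def __init__(self) -> None:
        self.snapshot = ""

    def feed(self, event: object) -> Optional[str]:
        """Return the updated snapshot for a reasoning delta, else None."""
        if isinstance(event, ResponseCreated):
            # A new response (for example after a retry) starts from scratch,
            # so text from the abandoned attempt is not carried over.
            self.snapshot = ""
            return None
        if isinstance(event, ReasoningTextDelta):
            self.snapshot += event.delta
            return self.snapshot
        return None
```

Without the reset branch, a retried stream that replays "Hello" would produce a snapshot of "HelloHello", which is the duplication this commit fixes.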
The two stream-event tests were only asserting on data conditional on a
ReasoningDeltaEvent being emitted at all, so a regression that stopped
emitting the event entirely would have passed silently.
* test_reasoning_delta_snapshot_accumulates: assert that snapshots is
non-empty before checking monotonic length and the "Hello world"
inclusion (previously gated on `if snapshots:`).
* test_no_reasoning_delta_event_without_reasoning: count yielded events
and assert the stream produced at least one, so the negative
not-isinstance assertion can't pass on an empty event stream.
Picked up the remaining nitpicks from the CodeRabbit review of PR #3.
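For readers unfamiliar with the failure mode, here is a small sketch of the guard pattern this commit describes. `check_snapshots_grow` is a hypothetical helper, not a function from the test suite, and checking "Hello world" against the final snapshot is an assumption about how the real test is written.

```python
from typing import Sequence


def check_snapshots_grow(snapshots: Sequence[str]) -> None:
    """Assert that at least one snapshot exists and that snapshots only grow."""
    # Guard first: with an empty sequence the checks below would pass
    # vacuously, hiding a regression that stops the event being emitted.
    assert snapshots, "expected at least one ReasoningDeltaEvent"
    lengths = [len(s) for s in snapshots]
    assert lengths == sorted(lengths), "snapshots should never shrink"
    # Assumption: the expected text is checked against the final snapshot.
    assert "Hello world" in snapshots[-1]
```

The earlier `if snapshots:` form skipped every assertion when the list was empty; asserting non-emptiness up front turns that silent pass into a failure.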
5c36515 to 5b04cf8
Bumping after the stale-bot ping. I just rebased onto current main and CI is still green across the 5-file diff. The change stays narrow: one new ReasoningDeltaEvent type, a small emit path in run_loop, and tests. Happy to split the docs/example out into a follow-up PR if that helps land the core type sooner. cc @rm-openai @seratch when you have a moment.
@adityasingh2400 Thanks for sharing this. This change could drastically increase the number of events sent, so we hesitate to change the default behavior in this way for now.
Thanks for the read. Totally fair point on event volume. Would you accept the change behind an opt-in flag so the default stream stays unchanged? I can gate emission behind a RunConfig flag (e.g. emit_reasoning_deltas defaulting to False) so anyone with a reasoning UI can opt in, and existing consumers see identical event counts. If you'd prefer to close this and reopen as the gated version, happy to do that too.
…ning_deltas
Default the new ReasoningDeltaEvent emission to off so the streamed event count is unchanged for existing consumers. Opt in via RunConfig.emit_reasoning_deltas=True to receive reasoning deltas without unwrapping the raw events. Addresses the event-volume concern raised in review.
I went ahead and gated this behind an opt-in flag so the default behavior is unchanged. I added RunConfig.emit_reasoning_deltas, defaulting to False. When it is False, the runner emits exactly the same events as before, so existing consumers see no change in event volume. Setting it to True turns on ReasoningDeltaEvent for anyone building a reasoning UI. The two emission points in run_single_turn_streamed are now wrapped in that flag check. I also added a test asserting that no ReasoningDeltaEvent is emitted by default, and updated the existing streaming tests to opt in. Locally, ruff format, ruff lint, mypy on the two changed source files, and the full tests/test_stream_events.py and tests/test_reasoning_delta_stream_event.py suites all pass. Let me know if you would prefer the flag live somewhere other than RunConfig.
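To make the event-volume argument concrete, the gating can be sketched like this. It is an illustration only: `RunConfigSketch` and `stream` are hypothetical, events are modelled as plain tuples, and the real check sits inside run_single_turn_streamed.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class RunConfigSketch:
    # Hypothetical stand-in for RunConfig; only the new flag is modelled.
    emit_reasoning_deltas: bool = False


def stream(raw_deltas: List[str], config: RunConfigSketch) -> Iterator[Tuple[str, ...]]:
    """Yield one raw event per delta, plus a reasoning event when opted in."""
    snapshot = ""
    for delta in raw_deltas:
        # The raw event is always emitted, exactly as before.
        yield ("raw", delta)
        if config.emit_reasoning_deltas:
            snapshot += delta
            yield ("reasoning_delta", delta, snapshot)
```

With the default config, two raw deltas produce two events; with the flag set, the same input produces four, which is the doubling the opt-in avoids for existing consumers.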
Adding ReasoningDeltaEvent to the StreamEvent union widened PrintableEvent, so mypy could no longer prove the trailing match handled a RunItemStreamEvent. Add an early return for ReasoningDeltaEvent, mirroring the existing RawResponsesStreamEvent guard, which restores narrowing and fixes typecheck.
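The narrowing issue can be reproduced in miniature. The classes and `describe` function below are hypothetical and use isinstance guards throughout; they only illustrate why a widened union needs an extra early return before the final branch.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical, simplified stand-ins for the three members of the union.
@dataclass
class RawEvent:
    data: str


@dataclass
class ReasoningDelta:
    delta: str


@dataclass
class RunItemEvent:
    name: str


PrintableEvent = Union[RawEvent, ReasoningDelta, RunItemEvent]


def describe(event: PrintableEvent) -> str:
    """Return a printable label for run-item events, empty string otherwise."""
    if isinstance(event, RawEvent):
        return ""
    # Without this second guard, a type checker still sees
    # ReasoningDelta | RunItemEvent below and rejects the .name access.
    if isinstance(event, ReasoningDelta):
        return ""
    # After both early returns, event is narrowed to RunItemEvent.
    return f"run item: {event.name}"
```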
Summary
When models like o3 or DeepSeek-R1 produce reasoning/thinking tokens during streaming, those deltas currently only surface as raw RawResponsesStreamEvent wrappers around low-level response.reasoning_summary_text.delta or response.reasoning_text.delta events. To consume them, callers have to inspect .data.type and cast the event themselves; there's no clean signal in the StreamEvent union.
This PR adds ReasoningDeltaEvent to StreamEvent and emits it alongside the existing raw event so reasoning deltas are as easy to consume as message deltas.
Closes #825
What changed
- Added ReasoningDeltaEvent dataclass to stream_events.py with delta, snapshot, and type fields
- Updated StreamEvent type alias to include ReasoningDeltaEvent
- … agents/__init__.py
- In run_internal/run_loop.py, the run_single_turn_streamed loop now emits a ReasoningDeltaEvent after each ResponseReasoningSummaryTextDeltaEvent (o-series) and ResponseReasoningTextDeltaEvent (DeepSeek/LiteLLM)
- The snapshot field accumulates the full reasoning text so far in the turn, so callers don't have to maintain their own buffer
Usage example
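The original snippet did not survive on this page. As a rough sketch of the consumer side, assuming only the three fields described above: the dataclass here is a local stand-in rather than an import from the SDK, and `collect_reasoning` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ReasoningDeltaEvent:
    # Local stand-in mirroring the fields this PR describes; the real class
    # would be imported from the SDK instead.
    delta: str
    snapshot: str
    type: str = "reasoning_delta"


def collect_reasoning(events: Iterable[object]) -> str:
    """Print reasoning fragments as they arrive and return the final text."""
    latest = ""
    for event in events:
        if isinstance(event, ReasoningDeltaEvent):
            print(event.delta, end="", flush=True)
            # The snapshot already holds everything so far, so the caller
            # does not need to maintain a buffer of its own.
            latest = event.snapshot
    return latest
```

In the SDK the same loop would presumably be an `async for` over `result.stream_events()` from `Runner.run_streamed`; that wiring is left out because it needs a live model.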
Tests
Added tests/test_reasoning_delta_stream_event.py covering:
- ReasoningDeltaEvent is emitted for reasoning items
- … agents
Also updated tests/test_stream_events.py::test_complete_streaming_events to account for the new event in the event sequence (count goes from 27 → 28).
Summary by CodeRabbit
New Features
- ReasoningDeltaEvent for streaming incremental reasoning updates from AI agents, featuring:
  - delta: incremental reasoning fragment
  - snapshot: accumulated reasoning text
Tests