
🚀 Centralize streaming orchestration into a single StreamHandler #894

@vinitkadam03

Description

Problem

The current streaming architecture distributes orchestration logic across every provider's Stream handler. Each of the 9 providers independently manages:

  • Lifecycle event emission (StreamStartEvent, StepStartEvent, TextStartEvent, ThinkingStartEvent, and their corresponding complete/finish events)
  • State tracking (stream/step/text/thinking flags, message IDs, accumulated text)
  • Tool execution and recursive turn handling
  • Usage accumulation across tool-call turns

This results in ~5,000 lines of duplicated control flow spread across provider handlers, making the streaming pipeline difficult to reason about, modify, or extend. Adding a new lifecycle event or changing tool-call recursion behavior requires touching every provider.

Proposed Solution

Introduce a clear separation between orchestration and parsing:

  1. StreamHandler - a single concrete orchestrator that owns the entire streaming lifecycle: event emission, state management, tool execution, and recursion. This is the only place where the streaming flow is defined.

  2. StreamParser - a minimal interface that provider handlers implement. Each parser's sole responsibility is to call the provider's API, read raw chunks, and yield primitive content events (TextDeltaEvent, ThinkingEvent, ToolCallEvent, etc.). Turn-level metadata (finish reason, usage, model) is communicated back via the generator's return value (TurnResult).

  3. ChatCompletionsStreamParser - an abstract base class for providers that follow the OpenAI chat completions SSE format (DeepSeek, XAI, Groq, Mistral, OpenRouter), reducing them to a handful of extraction hooks.

Providers with unique response formats (OpenAI Responses API, Anthropic, Gemini, Ollama) implement StreamParser directly.
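A minimal PHP sketch of the parser side follows. It assumes a `parse(array $request)` method and plain-array events in place of TextDeltaEvent and friends. `TurnResult`'s fields, the hook names (`lines()`, `extractText()`, `extractFinishReason()`, `extractUsage()`) and `FakeChatParser` are all hypothetical and are not taken from vinitkadam03/prism#4. It shows two ideas from the list above. First, a parser only yields content and reports turn metadata through the generator's `return` value, which the caller reads with PHP's `Generator::getReturn()`. Second, an OpenAI-style base class owns the SSE loop, so a chat-completions provider only supplies extraction hooks.

```php
<?php

// Hypothetical sketch: StreamParser, TurnResult, ChatCompletionsStreamParser,
// its hook names and FakeChatParser are illustrative, not Prism's code.
// Events are plain arrays here instead of TextDeltaEvent etc.

final class TurnResult
{
    public function __construct(
        public readonly ?string $finishReason,
        public readonly int $inputTokens,
        public readonly int $outputTokens,
    ) {}
}

interface StreamParser
{
    // Yields content events; returns a TurnResult when the turn ends.
    public function parse(array $request): Generator;
}

// Owns the shared "data: {...}" SSE loop; subclasses only extract fields.
abstract class ChatCompletionsStreamParser implements StreamParser
{
    /** Raw SSE lines; a real parser would read these from the HTTP response. */
    abstract protected function lines(array $request): iterable;
    abstract protected function extractText(array $chunk): ?string;
    abstract protected function extractFinishReason(array $chunk): ?string;
    /** @return array{0: int, 1: int}|null [input tokens, output tokens] */
    abstract protected function extractUsage(array $chunk): ?array;

    public function parse(array $request): Generator
    {
        $finishReason = null;
        $inputTokens = $outputTokens = 0;

        foreach ($this->lines($request) as $line) {
            if (! str_starts_with($line, 'data: ')) {
                continue; // blank keep-alives, comments, etc.
            }
            $payload = substr($line, strlen('data: '));
            if ($payload === '[DONE]') {
                break;
            }
            $chunk = json_decode($payload, true, 512, JSON_THROW_ON_ERROR);

            $text = $this->extractText($chunk);
            if ($text !== null && $text !== '') {
                yield ['type' => 'text_delta', 'delta' => $text];
            }
            $finishReason = $this->extractFinishReason($chunk) ?? $finishReason;
            if (($usage = $this->extractUsage($chunk)) !== null) {
                [$inputTokens, $outputTokens] = $usage;
            }
        }

        // Turn-level metadata is returned, not yielded.
        return new TurnResult($finishReason, $inputTokens, $outputTokens);
    }
}

// A concrete chat-completions provider reduces to the extraction hooks.
final class FakeChatParser extends ChatCompletionsStreamParser
{
    public function __construct(private array $rawLines) {}

    protected function lines(array $request): iterable
    {
        return $this->rawLines;
    }

    protected function extractText(array $chunk): ?string
    {
        return $chunk['choices'][0]['delta']['content'] ?? null;
    }

    protected function extractFinishReason(array $chunk): ?string
    {
        return $chunk['choices'][0]['finish_reason'] ?? null;
    }

    protected function extractUsage(array $chunk): ?array
    {
        $u = $chunk['usage'] ?? null;

        return $u === null ? null : [$u['prompt_tokens'], $u['completion_tokens']];
    }
}

$gen = (new FakeChatParser([
    'data: {"choices":[{"delta":{"content":"Hel"},"finish_reason":null}]}',
    '',
    'data: {"choices":[{"delta":{"content":"lo"},"finish_reason":"stop"}],"usage":{"prompt_tokens":9,"completion_tokens":2}}',
    'data: [DONE]',
]))->parse([]);

$text = '';
foreach ($gen as $event) {
    $text .= $event['delta'];
}
$turn = $gen->getReturn();
echo "{$text} ({$turn->finishReason}, {$turn->outputTokens} tokens)", PHP_EOL; // Hello (stop, 2 tokens)
```

The concrete class has no lifecycle flags, message IDs or event emission. That bookkeeping is what moves into the orchestrator.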

Every provider's stream() method becomes:

$parser = new Stream($this->client(...));

return (new StreamHandler)->handle($parser, $request);
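The orchestration side can be sketched in the same hedged way. The class below is a hypothetical, simplified `StreamHandler`. It wraps any parser, derives lifecycle events (stream/step/text start and finish), accumulates usage across turns, executes tool calls, and then starts another turn with the results. Recursion is written as a bounded loop. The tool map, message shape, `maxSteps` limit, `ScriptedParser` and the array-shaped events are assumptions made for illustration. They are not the API of the actual PR.

```php
<?php

// Hypothetical sketch of the orchestrator. Event arrays, TurnResult fields,
// the $tools map, the message shape and maxSteps are assumptions for
// illustration; Prism's actual StreamHandler and event classes may differ.

final class TurnResult
{
    public function __construct(
        public readonly string $finishReason, // e.g. 'stop' or 'tool_calls'
        public readonly int $inputTokens,
        public readonly int $outputTokens,
    ) {}
}

interface StreamParser
{
    // Yields content events only; returns a TurnResult when the turn ends.
    public function parse(array $request): Generator;
}

final class StreamHandler
{
    /** @param array<string, callable> $tools tool name => implementation */
    public function __construct(private array $tools = [], private int $maxSteps = 5) {}

    public function handle(StreamParser $parser, array $request): Generator
    {
        $usage = ['input' => 0, 'output' => 0];
        yield ['type' => 'stream_start'];

        for ($step = 0; $step < $this->maxSteps; $step++) {
            yield ['type' => 'step_start', 'step' => $step];
            $textOpen = false;
            $toolCalls = [];

            $turn = $parser->parse($request);
            foreach ($turn as $event) {
                // Lifecycle events are derived here, not inside each parser.
                if ($event['type'] === 'text_delta' && ! $textOpen) {
                    $textOpen = true;
                    yield ['type' => 'text_start'];
                }
                if ($event['type'] === 'tool_call') {
                    $toolCalls[] = $event;
                }
                yield $event;
            }
            if ($textOpen) {
                yield ['type' => 'text_complete'];
            }

            // Turn metadata arrives via the generator's return value.
            $result = $turn->getReturn();
            $usage['input'] += $result->inputTokens;
            $usage['output'] += $result->outputTokens;
            yield ['type' => 'step_finish', 'finish_reason' => $result->finishReason];

            $isLastStep = $step === $this->maxSteps - 1;
            if ($result->finishReason !== 'tool_calls' || $toolCalls === [] || $isLastStep) {
                break;
            }

            // Run tools once, centrally, then start the next turn with their results.
            foreach ($toolCalls as $call) {
                $output = ($this->tools[$call['name']])(...$call['arguments']);
                $request['messages'][] = ['role' => 'tool', 'name' => $call['name'], 'content' => $output];
            }
        }

        yield ['type' => 'stream_end', 'usage' => $usage];
    }
}

// Scripted stand-in for a provider: first turn asks for a tool, second answers.
final class ScriptedParser implements StreamParser
{
    private int $calls = 0;

    public function parse(array $request): Generator
    {
        if ($this->calls++ === 0) {
            yield ['type' => 'tool_call', 'name' => 'add', 'arguments' => [2, 3]];

            return new TurnResult('tool_calls', 10, 5);
        }
        $toolMessage = end($request['messages']);
        yield ['type' => 'text_delta', 'delta' => "The sum is {$toolMessage['content']}"];

        return new TurnResult('stop', 20, 7);
    }
}

$handler = new StreamHandler(['add' => fn (int $a, int $b): int => $a + $b]);
$types = [];
foreach ($handler->handle(new ScriptedParser, ['messages' => []]) as $event) {
    $types[] = $event['type'];
    $final = $event;
}
echo implode(' > ', $types), PHP_EOL;
```

Running it yields `stream_start`, then one step containing the tool call, then a second step that wraps the text delta in `text_start`/`text_complete`, and finally a `stream_end` carrying the summed usage (30 input, 12 output tokens). None of that bookkeeping lives in `ScriptedParser`.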

Benefits

  • Single source of truth for the streaming flow - one class to read, one class to debug
  • ~3,600 net lines removed across 32 files
  • Easier to extend - new lifecycle events or tool-call behavior changes happen in one place
  • Provider handlers become pure translators - simpler to write, test, and review
  • Consistent behavior across all providers for edge cases (usage accumulation, step boundaries, thinking/text transitions)
  • 🔭 Enables telemetry: a single orchestrator means instrumentation (spans, metrics, logging) can be added in one place rather than in every provider 🎉

Implementation

A working implementation is available at vinitkadam03/prism#4. It builds on top of #880 (client-executed tools), which is currently under review.

Future scope

The same orchestrator + parser pattern can be applied to other handler types to further reduce duplication:

  • Text - a single TextHandler orchestrator with provider-specific parsers for request building and response mapping
  • Structured - same approach for structured/JSON responses
  • Embeddings - centralize embedding request/response flow

This issue focuses on streaming first since it has the most duplication and complexity. The other handler types can follow the same pattern in subsequent PRs once the direction is confirmed.

Status

vinitkadam03/prism#4 is a draft and a direction check: the implementation is functional and all tests pass, but class names and file locations may still change. The goal is to confirm this architectural direction before finalizing. @sixlive

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
