# Add structured, channel-aware streaming architecture to LLMEngine #24
Draft
Co-authored-by: SerialKicked <1781563+SerialKicked@users.noreply.github.com>
Copilot (AI) changed the title from **[WIP] Introduce structured streaming architecture for inference events** to **Add structured, channel-aware streaming architecture to LLMEngine** on Feb 26, 2026.
The existing `OnInferenceStreamed`/`OnInferenceEnded` events emit flat strings, making it impossible to cleanly separate thinking/CoT content from response text at the stream level — and entirely inadequate for future tool/function calling, where the LLM response is structured data, not text.

## New types — `LLM/InferenceStream.cs`

- `InferenceChannel` enum: `Text`, `Thinking`, `ToolCall`, `ToolResult`, `System`
- `InferenceSegment`: channel-tagged streaming chunk with a `Text`/`ToolCall`/`ToolResult` payload and an `IsComplete` flag
- `InferenceResult`: final structured result with `Response` (thinking stripped), `ThinkingContent`, `ToolCalls`, `FinishReason`
- `ToolCallInfo`, `ToolResultInfo`, `ToolCallRecord`: data classes for future tool-call plumbing

## New events on `LLMEngine`

Old events are marked `[Obsolete]` but continue to fire unchanged for backward compatibility.

## Streaming state management

- `_currentChannel`, `_thinkingBuffer`, and `_textBuffer` track per-generation state
- `ResetStreamingState()` clears all state; it is called at the start of `StartGeneration`, `RerollLastMessage`, and `SimpleQuery`

## Streaming

- `Client_StreamingMessageReceived` uses the existing `Instruct.IsThinkingPrompt()` for robust detection — no hardcoded `<think>` tags
- `InferenceResult.Response` is derived from the plugin-processed response via `RemoveThinkingBlocks()`
- `FinishReason == "tool_calls"` emits a terminal `ToolCall` segment as a hook for future implementation

## Usage
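A minimal consumer might look like the sketch below. `OnInferenceCompleted` is named in the spec; the segment event name (`OnInferenceSegment`), the `engine` variable, and the handler signatures are assumptions for illustration:

```csharp
// Illustrative sketch only: subscribing to the new structured events.
// Event and type names follow the PR description; exact delegate
// signatures are assumptions and may differ from the shipped API.
engine.OnInferenceSegment += (sender, segment) =>
{
    switch (segment.Channel)
    {
        case InferenceChannel.Text:
            Console.Write(segment.Text);   // visible response stream
            break;
        case InferenceChannel.Thinking:
            // CoT content arrives on its own channel, e.g. render it collapsed
            break;
        case InferenceChannel.ToolCall:
            // hook for future function calling
            break;
    }
};

engine.OnInferenceCompleted += (sender, result) =>
{
    Console.WriteLine($"\nFinish reason: {result.FinishReason}");
    // result.Response has thinking stripped;
    // result.ThinkingContent holds the CoT block, if any
};
```

Legacy subscribers to `OnInferenceStreamed`/`OnInferenceEnded` keep working unchanged; only text-channel tokens reach them.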
## Docs

The Events section of `Docs/LLMSYSTEM.md` was updated with structured event examples and a legacy-event deprecation note.

## Original prompt
### Problem

The current streaming event system in `LLMEngine` is built around flat `string` events (`OnInferenceStreamed`/`OnInferenceEnded`). This was fine when the only content was plain text, but it's already becoming awkward with thinking/CoT models (thinking blocks are mashed into the same string and stripped out post-hoc in multiple places), and it won't scale at all to function calling / tool use, where the LLM response isn't text — it's structured tool-call data.

The goal of this PR is to introduce a structured, channel-aware streaming architecture that cleanly separates different types of inference content (text, thinking, tool calls, tool results, errors) at the stream level, while maintaining full backward compatibility with the existing `string`-based events.

This is the foundational infrastructure needed before function calling can be implemented.
### What to implement

#### 1. New types — `LLM/InferenceStream.cs` (new file)

Create a new file `LLM/InferenceStream.cs` with these types:

`InferenceChannel` enum:

- `Text` — normal visible text response
- `Thinking` — chain-of-thought / thinking block content
- `ToolCall` — the LLM is requesting a tool/function call
- `ToolResult` — result being fed back after tool execution
- `System` — error or system-level message

`InferenceSegment` class:

- `InferenceChannel Channel` — what kind of content this is
- `string? Text` — the text delta (for Text/Thinking channels)
- `ToolCallInfo? ToolCall` — tool call data (for ToolCall channel)
- `ToolResultInfo? ToolResult` — tool result data (for ToolResult channel)
- `bool IsComplete` — whether this is the final chunk in its channel

`ToolCallInfo` class:

- `string CallId`
- `string FunctionName`
- `string ArgumentsJson` — raw JSON arguments

`ToolResultInfo` class:

- `string CallId`
- `string FunctionName`
- `bool Success`
- `string ResultJson`
- `string? Error`

`InferenceResult` class (final structured result of a complete inference cycle):

- `string Response` — the final visible text response
- `string? ThinkingContent` — the thinking/CoT block, if any
- `List<ToolCallRecord> ToolCalls` — all tool calls made during this inference
- `string? FinishReason`

`ToolCallRecord` class:

- `string CallId`
- `string FunctionName`
- `string ArgumentsJson`
- `string ResultJson`
- `bool Success`
- `TimeSpan Duration`
#### 2. New events on `LLMEngine` (in `LLM/LLMEngine.cs`)

Add these new events alongside the existing ones:
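The exact declarations aren't shown in this prompt; a plausible shape follows. `OnInferenceCompleted` is named elsewhere in the spec, while the segment event name and the use of `EventHandler<T>` are assumptions:

```csharp
// Sketch: structured events on LLMEngine (delegate types assumed)
public event EventHandler<InferenceSegment>? OnInferenceSegment;
public event EventHandler<InferenceResult>? OnInferenceCompleted;
```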
Add private raise methods:
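A sketch of the raise helpers, assuming they mirror the existing `RaiseOnInferenceStreamed` naming pattern and the event declarations above:

```csharp
// Sketch: private raise helpers (names and shapes assumed)
private void RaiseOnInferenceSegment(InferenceSegment segment)
    => OnInferenceSegment?.Invoke(this, segment);

private void RaiseOnInferenceCompleted(InferenceResult result)
    => OnInferenceCompleted?.Invoke(this, result);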
Mark the old events as obsolete (but do NOT remove them):
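For example (the `[Obsolete]` messages and the `EventHandler<string>` delegate type are assumptions; the attribute only warns at compile time, so the events keep firing):

```csharp
// Sketch: old events stay and keep firing unchanged
[Obsolete("Use OnInferenceSegment instead; this flat-string event may be removed in a future version.")]
public event EventHandler<string>? OnInferenceStreamed;

[Obsolete("Use OnInferenceCompleted instead.")]
public event EventHandler<string>? OnInferenceEnded;
```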
#### 3. Refactor `Client_StreamingMessageReceived` in `LLM/LLMEngine.cs`

This is the core change. The current method (around line 635) does flat string accumulation. It needs to:
- Track `_currentChannel` (default `InferenceChannel.Text`)
- Accumulate thinking content into a `StringBuilder _thinkingBuffer`
- Detect the `ThinkingStart`/`ThinkingEnd` tags from `Instruct`
- Emit `InferenceSegment` events tagged with the correct channel
- For a `Text`-channel segment, also call `RaiseOnInferenceStreamed(segment.Text)`; when completing, also call `RaiseOnInferenceEnded(response)`
- On completion (`e.IsComplete`), build an `InferenceResult` with the separated text and thinking content, and raise `OnInferenceCompleted`

Key behavior:

- Thinking content accumulates in `_thinkingBuffer`; text content still goes into the existing `StreamingTextProgress` (for backward compat), but only text-channel tokens go there
- On `e.IsComplete`, build an `InferenceResult` with `Response` (text only) and `ThinkingContent` (if any)
- `FinishReason` from the event args should be carried through to `InferenceResult.FinishReason`

This pull request was created from Copilot chat.
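Putting section 3 together, the channel-switching logic might look like the sketch below. The `StreamingEventArgs` shape (`Token`, `IsComplete`, `FinishReason`) and the raise-helper names are hypothetical; the real method around line 635 will differ:

```csharp
// Sketch: channel-aware streaming handler (event-args shape assumed)
private void Client_StreamingMessageReceived(object? sender, StreamingEventArgs e)
{
    var token = e.Token;

    // Channel switching via the Instruct-configured tags (no hardcoded <think>)
    if (token == Instruct.ThinkingStart)
    {
        _currentChannel = InferenceChannel.Thinking;
        return;
    }
    if (token == Instruct.ThinkingEnd)
    {
        _currentChannel = InferenceChannel.Text;
        return;
    }

    // Accumulate per-channel state
    if (_currentChannel == InferenceChannel.Thinking)
        _thinkingBuffer.Append(token);
    else
        _textBuffer.Append(token);

    // Emit the structured segment, tagged with the current channel
    RaiseOnInferenceSegment(new InferenceSegment
    {
        Channel = _currentChannel,
        Text = token,
        IsComplete = e.IsComplete
    });

    // Backward compat: only text-channel tokens reach the legacy event
    if (_currentChannel == InferenceChannel.Text)
        RaiseOnInferenceStreamed(token);

    if (e.IsComplete)
    {
        var result = new InferenceResult
        {
            Response = _textBuffer.ToString(),
            ThinkingContent = _thinkingBuffer.Length > 0 ? _thinkingBuffer.ToString() : null,
            FinishReason = e.FinishReason   // carried through from the event args
        };
        RaiseOnInferenceEnded(result.Response);  // legacy completion event
        RaiseOnInferenceCompleted(result);
    }
}
```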