feat(agents-server-ui): stream model reasoning into the UI#4508
kevin-dp wants to merge 15 commits into
Conversation
While the model is "thinking" (Anthropic extended thinking, DeepSeek-R1 `reasoning_content`, Moonshot K2, OpenAI Responses summaries) the agent response now shows the reasoning text faded above the answer, with the existing `Thinking` shimmer heading + elapsed-time ticker. Once the reasoning settles, it collapses to `▸ Thought for 12s`; click to expand. Multiple reasoning rows per run render independently in order (one per LLM step in tool-using turns).

End-to-end plumbing:

- Schema: `reasoning` row gains `run_id`, `encrypted` (Anthropic redacted blocks must round-trip back to the model), and `summary_title` (extracted at write time). New `reasoningDeltas` collection mirrors `textDeltas` for streamed content.
- Bridge: `OutboundBridge` gains `onReasoningStart` / `onReasoningDelta` / `onReasoningEnd`, parallel to text.
- Adapter: `pi-adapter.ts` routes `thinking_start` / `thinking_delta` / `thinking_end` from pi-ai. Parses a `**Title**\n\n<body>` heading once at write time (OpenAI Responses; no-op for others).
- Timeline: live `reasoning: Collection<EntityTimelineReasoningItem>` on `EntityTimelineRunRow`, content built via delta-join.
- UI: new `<ReasoningSection>` renders above items in `AgentResponseLive`. Streamdown body, click-to-expand on settle, redacted-block placeholder for opaque Anthropic payloads.
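To make the `**Title**\n\n<body>` parsing step concrete, here is a minimal sketch of how a leading bold heading could be split off once at write time. The function name, return shape, and regex are assumptions for illustration; they are not taken from `pi-adapter.ts`.

```typescript
// Hypothetical helper: the name, return shape, and regex are illustrative only.
interface ParsedReasoning {
  summaryTitle: string | null;
  body: string;
}

// A leading "**Title**" line, then a blank line, then the rest of the text.
const TITLE_HEADING = /^\*\*(.+?)\*\*\r?\n\r?\n([\s\S]*)$/;

function parseSummaryTitle(text: string): ParsedReasoning {
  const match = TITLE_HEADING.exec(text);
  if (match === null) {
    // Providers that emit no heading fall through unchanged (the "no-op" case).
    return { summaryTitle: null, body: text };
  }
  const [, title = "", body = ""] = match;
  return { summaryTitle: title.trim(), body };
}
```

Text without a leading heading passes through untouched, which matches the "no-op for others" behaviour described above.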
Codecov Report
Additional details and impacted files:
@@ Coverage Diff @@
## main #4508 +/- ##
===========================================
- Coverage 74.83% 58.22% -16.62%
===========================================
Files 54 371 +317
Lines 7300 40887 +33587
Branches 2353 11594 +9241
===========================================
+ Hits 5463 23806 +18343
- Misses 1820 17006 +15186
- Partials 17 75 +58
Previously `withProviderPayloadDefaults` short-circuited for any
provider other than OpenAI / OpenAI-Codex, so picking Claude with a
`reasoningEffort` higher than `auto` had no effect: no `thinking`
parameter was added to the request, so Anthropic ran in standard mode
and the model emitted no `thinking_delta` events. The inbound
reasoning plumbing that landed in the same PR was correct but
unreachable from Anthropic without this.

Now: when the chosen model is Anthropic-capable for reasoning AND
`reasoningEffort` is explicit (minimal/low/medium/high), inject

    thinking: { type: "enabled", budget_tokens: <by effort> }

into the payload. Budgets follow Anthropic's docs (≥ 1024 floor):
minimal=1024, low=2048, medium=8192, high=24576. `auto` stays opted out
of thinking so default sessions don't silently incur the extra
reasoning tokens.
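The effort-to-budget mapping in this commit can be sketched as a small lookup. The budget values come from the commit message; the type and function names are hypothetical and do not reflect the actual shape of `withProviderPayloadDefaults`.

```typescript
// Hypothetical names; only the budget numbers are taken from the commit message.
type ReasoningEffort = "auto" | "minimal" | "low" | "medium" | "high";

interface ThinkingParam {
  type: "enabled";
  budget_tokens: number;
}

const ANTHROPIC_THINKING_BUDGETS: Record<Exclude<ReasoningEffort, "auto">, number> = {
  minimal: 1024, // the documented floor
  low: 2048,
  medium: 8192,
  high: 24576,
};

// Returns undefined for `auto`, so no `thinking` parameter is injected at all.
function anthropicThinkingFor(effort: ReasoningEffort): ThinkingParam | undefined {
  if (effort === "auto") {
    return undefined;
  }
  return { type: "enabled", budget_tokens: ANTHROPIC_THINKING_BUDGETS[effort] };
}
```

This reflects the behaviour as of this commit only, where `auto` opts out of thinking.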
KyleAMathews
left a comment
Lovely! Could you add a screenshot of the UI to the PR body?
Three latent bugs in the reasoning-content branch that together made
extended thinking and the assistant's answer text fail to render:
1. **Alias collision in the timeline live query** —
`entity-timeline.ts` had two correlated sub-queries (one for
`items.text.content`, one for `reasoning.content`) both using
`chunk` as the `from({...})` alias. TanStack DB silently
mis-bound the correlation when both were active in the same run
projection, so `items.text.content` came back as an empty string
even though the deltas were present in `db.collections.textDeltas`.
Reasoning won the binding; the answer didn't render at all.
Fix: rename the inner alias to `textChunk`, and hoist the union
row's text fields to top-level scalars (`text_key`, `text_run_id`,
…) so the correlation references a top-level field instead of a
nested `item.text.key` (also a source of empty joins).
2. **Anthropic thinking made always-on instead of opt-in.**
`withProviderPayloadDefaults` short-circuited for Anthropic when
`reasoningEffort` was `auto`, so no `thinking` parameter ever
reached the API. The OpenAI branch already defaulted `auto` to
`minimal`; Anthropic now does the same (1024-token budget). `low`
/ `medium` / `high` scale the budget exactly as before.
3. **Anthropic `thinking` merge order** — pi-ai writes
`thinking: { type: "disabled" }` into the request body by default.
Our `onPayload` was merging `existingThinking` _last_, so the
default `type: "disabled"` clobbered our `type: "enabled"` and
the API rejected `budget_tokens` with
`thinking.disabled.budget_tokens: Extra inputs are not permitted`.
Spread `existingThinking` first now, then `type` + `budget_tokens`.
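The merge-order bug in item 3 comes down to object-spread precedence: later keys win. A minimal sketch, with hypothetical function names and a loosely typed payload standing in for the real request body:

```typescript
// Illustrative only: the payload type and function names are assumptions.
type ThinkingPayload = Record<string, unknown>;

// Buggy order: a pre-existing `type: "disabled"` is spread last and wins.
function mergeThinkingBuggy(
  existing: ThinkingPayload | undefined,
  budgetTokens: number,
): ThinkingPayload {
  return { type: "enabled", budget_tokens: budgetTokens, ...existing };
}

// Fixed order: spread the existing fields first, then override `type` and
// `budget_tokens`, so unrelated keys are preserved but ours take precedence.
function mergeThinking(
  existing: ThinkingPayload | undefined,
  budgetTokens: number,
): ThinkingPayload {
  return { ...existing, type: "enabled", budget_tokens: budgetTokens };
}
```

With `existing = { type: "disabled" }`, the buggy version produces a disabled block that still carries `budget_tokens`, which is the combination the API rejected.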
Tests:
- `entity-timeline.test.ts` — regression test exercises
`createEntityTimelineQuery` end-to-end with text and reasoning rows
in the same run; fails on the alias collision, passes with the
rename + flat-field projection.
- `model-catalog.test.ts` — adds Anthropic-side coverage that mirrors
the existing OpenAI tests: always-on minimal budget on `auto`,
scaled budget on explicit effort, and `type: disabled` override
for pre-existing `thinking` in the payload.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eltas

The reasoning sub-collection's `content` field, projected via `concat(toArray(<correlated delta-join>))`, went stale in the running app after the row's status flipped to `completed`, surfacing `content: null` in the live query even though the deltas were still present in the local DB. The expand-thought-block view rendered an empty body until the user navigated away and back (forcing a fresh live-query subscription), at which point the join evaluated cleanly.

Unit tests for the same projection pattern all pass. The bug only reproduces in the running app, against an established live-query graph with overlapping text/reasoning subscriptions. The sub-query itself is correct (data is there after a fresh subscription), but something about the long-lived subscription state makes the correlated row binding stale.

Sidestep the unreliable projection entirely:

- **Timeline query**: drop the `content` field from `EntityTimelineReasoningItem`. Expose `run.reasoningDeltas` as a parallel sub-collection (mirroring `run.reasoning`), surfacing the raw deltas keyed by `reasoning_id`.
- **UI**: `AgentResponseLive` subscribes to both `run.reasoning` and `run.reasoningDeltas`, builds a `Map<reasoning_id, content>` from the deltas client-side, and merges it onto the reasoning rows before handing them to `<ReasoningSection>`. Reactive on every delta arrival, no stale state.
- **State lift**: `expanded` for the collapsed "Thought for Ns" toggle moves from `ReasoningEntryView` (per-entry) up to `ReasoningSection` (keyed by `entry.key`), so the user's choice survives any spurious unmount of the entry view (virtualizer measurement passes, brief entries-empty states, etc.).

Tests:

- New regressions in `entity-timeline.test.ts` exercise the deltas sub-collection with the same shape as the failing production scenario: reasoning + text together, multi-step run-row updates, status transitions.

Follow-up: investigate why the original correlated sub-query goes stale only against long-lived live-query graphs (passes in tests). The `content` projection has been left commented-out in case we want to restore it after fixing the underlying TanStack DB issue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
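The client-side assembly described in this commit (a `Map<reasoning_id, content>` built from raw deltas, then merged onto the reasoning rows) might look roughly like the sketch below. The row and delta field names (`order`, `delta`, `key`) are assumptions; only `reasoning_id` is named in the commit message.

```typescript
// Assumed shapes: only `reasoning_id` appears in the commit message.
interface ReasoningDelta {
  reasoning_id: string;
  order: number; // assumed per-row sequence number
  delta: string;
}

interface ReasoningRow {
  key: string;
  status: "streaming" | "completed";
}

// Concatenate deltas per reasoning row, in sequence order.
function buildContentById(deltas: readonly ReasoningDelta[]): Map<string, string> {
  const sorted = [...deltas].sort((a, b) => a.order - b.order);
  const contentById = new Map<string, string>();
  for (const d of sorted) {
    contentById.set(d.reasoning_id, (contentById.get(d.reasoning_id) ?? "") + d.delta);
  }
  return contentById;
}

// Attach the assembled content to each row; rows without deltas get "".
function mergeContent(
  rows: readonly ReasoningRow[],
  contentById: ReadonlyMap<string, string>,
): Array<ReasoningRow & { content: string }> {
  return rows.map((row) => ({ ...row, content: contentById.get(row.key) ?? "" }));
}
```

Because the map is rebuilt from whatever deltas are currently subscribed, the result depends only on the latest delta snapshot rather than on a server-side projection.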
The original `reasoning.content` projection used `concat(toArray(<correlated delta-join>))`, which TanStack DB compiles to a `buildIncludesSubquery(..., 'concat')` node: a specialized differential-dataflow operator that incrementally maintains a string-concatenation of a child query's projection.

Unit tests of the same projection shape pass cleanly: a fresh `createLiveQueryCollection` evaluates the join correctly on initial preload, and again after status flips. Tests do not reproduce the production failure mode (long-lived subscription where `content` silently goes from populated → null after the row's status flips, recovering only after a full live-query teardown).

Leaving a placeholder test as a marker. When we have a repro, drop the body in here and restore the `content` field in `entity-timeline.ts:buildEntityTimelineQuery`. The current fix sidesteps the issue by exposing `run.reasoningDeltas` and assembling content client-side, which is reliable but bypasses what should be a working server-side projection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restore the original nested-text shape on `runItemsSource`
(`text: caseWhen(text.key, {...})` and `textContent: concat(toArray(...))`
projected together on the union row) and undo the flat-scalar
hoist (`text_key`, `text_run_id`, `text_order`, `text_status`).
The `textChunk` alias on the delta-join stays, since that's the
load-bearing change that actually fixed the original `chunk`
alias collision with the reasoning sub-query.
When fixing the original alias-collision bug I made two changes in
one commit:
1. Renamed the text delta-join alias `chunk` → `textChunk` so it
no longer collided with the `chunk` used in reasoning content.
2. Hoisted text fields to flat scalars on the union row so the join
could move out of `runItemsSource`'s select and into the items
consumer's select.
I never bisected the two. Turns out (1) alone is sufficient: the
nested `text: caseWhen(text.key, {...})` + co-located `textContent`
projection works fine once the alias collision is gone. The
flat-scalar hoist was unnecessary churn that just made the code
harder to read for no behavioral benefit.
Tested by reverting (2), running unit tests (60 still pass), and
verifying in the running app that text content still streams in
and renders correctly through a full Claude exchange.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ection

Reverts the client-side `run.reasoningDeltas` workaround in favor of the server-side `concat(toArray(...))` projection on `run.reasoning.content`.

Currently broken in production against `@tanstack/db@0.6.7`, as documented in `packages/agents-runtime/test/entity-timeline.test.ts`'s `reasoning content remains populated after status flips to completed` and friends. Unit tests against the projection pass cleanly; the bug only surfaces in a long-lived stream-backed live query after the parent row's `.update()`, with the field silently becoming `null` even though deltas are present in the local DB. A fresh subscription (navigate-away + back, or reload) recovers.

Holding this branch as a draft PR so the work isn't lost. Merge once TanStack DB ships an upstream fix that makes the placeholder tests pass against a long-lived production live query.

Diff vs `kevin/reasoning-content`:

- `entity-timeline.ts`: add `content: concat(toArray(<delta-join>))` back to `reasoning.select(...)`, drop the parallel `reasoningDeltas` sub-collection. Alias stays `reasoningChunk` (not the generic `chunk`) to avoid the alias-collision class of bug.
- `EntityTimelineReasoningItem`: `content: string` reinstated; `EntityTimelineReasoningDeltaItem` removed.
- `client.ts`: drop `EntityTimelineReasoningDeltaItem` export.
- `AgentResponseLive`: drop the `run.reasoningDeltas` subscription + client-side concat; `reasoningEntries` reads `content` straight off the projected row.
- Tests: three reasoning-content tests assert `reasoning[0].content` (rather than concatenating raw deltas).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
✅ Deploy Preview for electric-next ready!
Tracks down and fixes the bug that's been driving the client-side-concat workaround in #4508 and blocking #4532.

## Root cause

TanStack DB's "includes" (fields whose value is a sub-query like `concat(toArray(...))`) are deferred. A row carrying an include arrives with the field set to `null` and a hidden `Symbol(includesRouting)` marker describing how to compute it. The include is only materialized when something downstream reads it *in the right way*.

The empirical rule (figured out via DevTools probes: `.toArray` on the sub-collection always showed the populated string, `useLiveQuery` output had `content: null`):

**An include is materialized only when it's referenced inside a `caseWhen` object body in a downstream `.select(...)`. A bare top-level reference doesn't trigger it; the include is just aliased forward, still deferred.**

This is why `items.text.content` has always worked and reasoning hasn't. The items consumer derefs `item.textContent` inside the `text: caseWhen(item.text.key, { ..., content: item.textContent })` body. The reasoning consumer had `content: concat(toArray(...))` (or, after the source/consumer split, `content: r.reasoningContent`) at the top level of its select. `useLiveQuery` handed the row to React with `content: null`.

## Fix

Wrap the include reference inside a `caseWhen` object body, mirroring items:

```ts
reasoning: q
  .from({ r: runReasoningSource })
  ...
  .select(({ r }) => ({
    key: r.key,
    run_id: r.run_id,
    order: r.order,
    status: r.status,
    body: caseWhen(r.key, {
      content: r.reasoningContent,
    }),
    summary_title: r.summary_title,
    encrypted: r.encrypted,
  }))
```

`r.key` is always truthy on a real row, so the `caseWhen` is effectively unconditional; its only purpose is being an object body that forces the include reference to materialize.

UI reads `entry.body?.content` (via the type) and `AgentResponseLive` maps it back into a flat `content: string` on `ReasoningEntry` so `ReasoningSection`'s API is unchanged. This drops the need for the client-side concat workaround that was the original target of #4532.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
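The mapping from the nested `body?.content` projection back to a flat `content: string` could be sketched as follows. The interfaces are simplified assumptions based on the fields listed in the `.select(...)` above; the real `EntityTimelineReasoningItem` and `ReasoningEntry` types may differ.

```typescript
// Simplified, assumed shapes based on the projected fields named in the commit.
interface ProjectedReasoningRow {
  key: string;
  status: string;
  body?: { content: string | null } | null;
  summary_title: string | null;
  encrypted: string | null;
}

interface ReasoningEntry {
  key: string;
  status: string;
  content: string;
  summaryTitle: string | null;
  encrypted: string | null;
}

// Flatten the nested body so downstream components keep a plain `content` string.
function toReasoningEntry(row: ProjectedReasoningRow): ReasoningEntry {
  return {
    key: row.key,
    status: row.status,
    content: row.body?.content ?? "",
    summaryTitle: row.summary_title,
    encrypted: row.encrypted,
  };
}
```

Defaulting to an empty string keeps a not-yet-materialized include from surfacing as `null` in the component props.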
@KyleAMathews here are some screenshots showing how it displays while it's thinking and how it displays when it's done thinking (the "Thought for 2s" block is expandable on click).
The entity-stream-db mock omitted the reasoning and reasoningDeltas collections, so loadOutboundIdSeed crashed when reading db.collections.reasoning.toArray under three process-wake scenarios. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts:
#	packages/agents-runtime/src/entity-timeline.ts
#	packages/agents-runtime/test/entity-timeline.test.ts
#	packages/agents-server-ui/src/components/AgentResponse.tsx

# Conflicts:
#	packages/agents-runtime/src/entity-timeline.ts

# Conflicts:
#	packages/agents/src/model-catalog.ts
#	packages/agents/test/model-catalog.test.ts
…m with run items (#4570)

## Summary

Two fixes for the reasoning-stream UI added in #4508 (note: that feature is on `kevin/reasoning-content`, not yet on `main`, so this PR targets the feature branch):

1. **No more empty thinking blocks.** Some models report that they reasoned but never expose the tokens (e.g. OpenAI codex models). `pi-adapter.ts` deliberately opens a reasoning row on `thinking_start` even when no delta ever arrives, so the UI rendered a blank live block that settled into an empty `▸ Thought` row. `AgentResponseLive` now filters out rows with no content client-side. Anthropic redacted rows (`encrypted` set) are kept and still render their placeholder, and a genuinely-streaming block appears as soon as its first delta lands. Persistence is untouched: empty rows are still recorded (they can carry the encrypted payload that must round-trip to the model).
2. **Reasoning blocks interleave with the response instead of stacking at the top.** Previously all of a run's reasoning rows rendered in one `<ReasoningSection>` above every text/tool-call item, so in multi-step tool-using runs step-3 thinking appeared above step-1 output. Reasoning rows already carry the same `_timeline_order` as text/tool-call rows, so `AgentResponseLive` now merges both streams into one ordered render list, and each block renders at the position the model emitted it (think → write → call tool → think → …). On an order tie (legacy rows without `_timeline_order`), reasoning sorts before output.

## Implementation

- `ReasoningSection` → `ReasoningBlock`: the component now renders a single entry; expand/collapse state is lifted to `AgentResponseLive` (keyed by row key) so it still survives the block unmounting/remounting, same as before.
- `ReasoningEntry` gains an `order` field (same `TimelineOrder` space as run items).
- New `LiveRenderEntry` union + `compareLiveRenderEntries` comparator; item-vs-item ties keep delegating to `compareLiveRunItems`.
- The `.root` width wrapper in `ReasoningSection.module.css` is gone. Blocks are now direct children of the `AgentResponse` root, which applies the same width treatment, so they align with text items.
- The streaming flag for the last text item now compares against `lastItem` by identity instead of array index (the index no longer maps 1:1 once reasoning entries are interleaved).

## Test plan

- [x] `pnpm typecheck` clean in `agents-server-ui`
- [x] `pnpm test` in `agents-server-ui` (88 passed)
- [ ] Manual: codex-model run shows no empty thought block; multi-step Anthropic extended-thinking run shows blocks interleaved between text/tool calls

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
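The two fixes above (dropping empty reasoning rows, then interleaving reasoning with run items) can be sketched as a filter followed by a sort. The entry shapes, the numeric `order`, and `buildRenderList` are assumptions for illustration; the real `TimelineOrder` type and `compareLiveRunItems` are not reproduced here.

```typescript
// Assumed shapes; `order` is modelled as a number purely for illustration.
interface ReasoningRenderEntry {
  kind: "reasoning";
  key: string;
  order: number;
  content: string;
  encrypted: string | null;
}

interface ItemRenderEntry {
  kind: "item";
  key: string;
  order: number;
}

type LiveRenderEntry = ReasoningRenderEntry | ItemRenderEntry;

// Keep rows that have streamed content, plus redacted rows (encrypted payload set).
function hasVisibleReasoning(entry: ReasoningRenderEntry): boolean {
  return entry.content.length > 0 || entry.encrypted !== null;
}

// Order by timeline position; on a tie, reasoning sorts before output.
// Same-kind ties return 0 here, whereas the PR delegates item-vs-item ties elsewhere.
function compareLiveRenderEntries(a: LiveRenderEntry, b: LiveRenderEntry): number {
  if (a.order !== b.order) return a.order - b.order;
  if (a.kind === b.kind) return 0;
  return a.kind === "reasoning" ? -1 : 1;
}

function buildRenderList(
  reasoning: readonly ReasoningRenderEntry[],
  items: readonly ItemRenderEntry[],
): LiveRenderEntry[] {
  const visible = reasoning.filter(hasVisibleReasoning);
  return [...visible, ...items].sort(compareLiveRenderEntries);
}
```

Filtering happens only at render time in this sketch, consistent with the note that persistence is untouched.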


## Summary

While the model is "thinking" (Anthropic extended thinking, DeepSeek-R1 reasoning, Moonshot K2, OpenAI Responses summaries) the agent response now shows the reasoning text faded above the answer, with the existing `Thinking` shimmer heading plus elapsed-time ticker. Once the reasoning settles it collapses to `▸ Thought for 12s`; click to expand. Multiple reasoning rows per run render independently in order (one per LLM step in tool-using turns). UX intentionally mirrors Claude Code + OpenCode patterns.

## Implementation (end-to-end)

- Schema: `reasoning` row gains `run_id`, `encrypted` (Anthropic redacted-thinking opaque payload, must round-trip back to the model verbatim), and `summary_title` (extracted at write time). New `reasoningDeltas` collection mirrors `textDeltas`. Strictly additive.
- Bridge: `OutboundBridge` gains `onReasoningStart` / `onReasoningDelta` / `onReasoningEnd`, parallel to the text path. Reasoning counter added to `OutboundIdSeed`.
- Adapter: `pi-adapter.ts` routes pi-ai's `thinking_start` / `thinking_delta` / `thinking_end` events to the bridge. Parses a `**Title**\n\n<body>` heading once at write time (OpenAI Responses; no-op for Anthropic / DeepSeek / Moonshot). Defensive: handles late `thinking_delta` without a preceding `thinking_start`, and closes an open reasoning row on `message_end` (e.g. provider abort).
- Timeline: `reasoning: Collection<EntityTimelineReasoningItem>` on `EntityTimelineRunRow`, content built via the same delta-join pattern as `EntityTimelineTextItem.content`.
- UI: `<ReasoningSection>` renders above items in `AgentResponseLive`:
  - `Streamdown` with `ThinkingIndicator` heading + summary title + elapsed-time ticker
  - `▸ Thought for Ns` with click-to-expand. Closure duration snapshotted from `Date.now() - timestamp` using the same `sawStreamingRef` trick from the elapsed-time PR: accurate for in-session settles, stays a bare `Thought` for rows already settled on first mount (no real end timestamp available client-side).
  - `⊘ Reasoning redacted by provider safety filters`. The encrypted payload is still persisted server-side so the model gets it back on the next turn.

## Reference

Patterns informed by reading OpenCode's reasoning implementation:

- (`reasoning-start` / `reasoning-delta` / `reasoning-end`)
- `ReasoningPart` storage shape including `encrypted` for Anthropic round-trip
- `reasoningSummary()` headline parser (5-line regex, OpenAI Responses only)

## Test plan

- `pnpm typecheck` clean in `agents-runtime` + `agents-server-ui`
- `pnpm test outbound-bridge pi-adapter entity-timeline` in `agents-runtime` (95 passed: 18 bridge + 21 adapter + 56 timeline)
- `pnpm test` in `agents-server-ui` (66 passed)
- `pnpm -C packages/agents-runtime build`: dist artifacts emit cleanly
- `Thought for Ns` on settle

## Notes

- `AgentResponse` (the non-Live path used for old scrollback sections) doesn't yet surface reasoning; historical rows recorded before this PR lack the data anyway. Follow-up if we discover sessions where this matters.
- `runtime-dsl.test.ts` 401 failures (and `dispatch-policy-routing.test.ts` 500 failures) reproduce identically on clean `main` and were not introduced by this PR.

🤖 Generated with Claude Code
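The defensive adapter behaviour described in the implementation notes (a late `thinking_delta` with no preceding `thinking_start`, and closing an open row on `message_end`) can be illustrated with a small state machine. The event and sink types are simplified assumptions, the real `OutboundBridge` callbacks likely take more arguments, and closing the previous row on a repeated `thinking_start` is a choice made for this sketch, not something the PR states.

```typescript
// Simplified, assumed event and sink shapes for illustration.
type ThinkingEvent =
  | { type: "thinking_start" }
  | { type: "thinking_delta"; delta: string }
  | { type: "thinking_end" }
  | { type: "message_end" };

interface ReasoningSink {
  onReasoningStart(): void;
  onReasoningDelta(delta: string): void;
  onReasoningEnd(): void;
}

function createReasoningRouter(sink: ReasoningSink): (event: ThinkingEvent) => void {
  let open = false;

  const ensureOpen = (): void => {
    if (!open) {
      sink.onReasoningStart();
      open = true;
    }
  };

  const closeIfOpen = (): void => {
    if (open) {
      sink.onReasoningEnd();
      open = false;
    }
  };

  return (event) => {
    switch (event.type) {
      case "thinking_start":
        closeIfOpen(); // assumption: a new start closes any row still open
        ensureOpen();
        break;
      case "thinking_delta":
        ensureOpen(); // late delta without a start still opens a row
        sink.onReasoningDelta(event.delta);
        break;
      case "thinking_end":
      case "message_end":
        closeIfOpen(); // e.g. provider abort leaves no dangling open row
        break;
    }
  };
}
```

Keeping the open/closed flag inside the router means the sink never sees a delta outside a start/end pair, whatever order the provider emits events in.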