Durable execution for agent-openai-advanced + agent-langgraph-advanced#195
Pins both advanced templates to the ai-bridge PR branch so the long-running agent server crash-resumes in-flight runs via heartbeat + CAS claim. Revert the [tool.uv.sources] entry once that PR merges and a new release is cut.

Also fixes a latent IndexError in agent-openai-advanced's deduplicate_input: when the long-running server re-invokes the handler with input=[] to resume from the session (the agnostic resume contract validated by prototyping), messages[-1] blew up. Now we return [] for empty input — the session already has prior turns so there is nothing to dedupe.

No change to either template's agent.py.
Makes the bundled chat UI durable end-to-end without any client-side
changes. The Express /invocations proxy in e2e-chatbot-app-next now:
- Rewrites streaming POSTs to { ...body, background: true, stream: true },
so every user turn persists each SSE event to Lakebase via
LongRunningAgentServer.
- Sniffs response.id + sequence_number out of the forwarded SSE stream.
- If upstream closes before [DONE] (pod died, lost connection), the proxy
transparently reconnects via
GET /responses/{id}?stream=true&starting_after=N
and resumes emitting events to the still-connected browser client. The
browser sees one continuous stream.
Non-streaming requests and non-POST methods keep the original passthrough
behavior.
Also points agent-openai-advanced/scripts/start_app.py at the
dhruv0811/durable-execution-templates branch of app-templates so the new
proxy code is actually deployed (override via APP_TEMPLATES_BRANCH env
var). Revert once this lands on main.
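The two transforms the proxy applies, as a condensed sketch (function names and the `backendBase` parameter are illustrative, not the template's code):

```typescript
type InvocationsBody = { [key: string]: unknown };

function toBackgroundBody(body: InvocationsBody): InvocationsBody {
  // Every streaming user turn becomes a persisted background run.
  return { ...body, background: true, stream: true };
}

function resumeUrl(backendBase: string, responseId: string, lastSeq: number): string {
  // Replay persisted SSE events strictly after the last sequence number seen.
  return `${backendBase}/responses/${responseId}?stream=true&starting_after=${lastSeq}`;
}
```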
… actually fires
Previous attempt left the proxy dead-code: the Node AI SDK honored API_PROXY
verbatim and sent requests straight to http://localhost:8000/invocations
(FastAPI), skipping the Express /invocations handler at :3000 entirely.
Confirmed in logs: requests reached the backend with {"stream": true}
but never with "background": true.
Split the two concerns across env vars:
API_PROXY=http://localhost:3000/invocations (AI SDK -> Express proxy)
AGENT_BACKEND_URL=http://localhost:8000/invocations (Express proxy -> FastAPI)
Express handler prefers AGENT_BACKEND_URL, falls back to API_PROXY for
backwards compat so existing templates don't break.
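The precedence, sketched (helper name is ours; the real check lives in the Express handler):

```typescript
// AGENT_BACKEND_URL wins; API_PROXY is the backwards-compat fallback.
function resolveBackendUrl(env: Record<string, string | undefined>): string | undefined {
  return env.AGENT_BACKEND_URL ?? env.API_PROXY;
}
```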
response_id is buried in the raw backend SSE stream and never surfaces to the browser because the Vercel AI SDK re-wraps the stream as its own message format before sending to the client. Log it on the server side instead so test instructions can `grep 'background started response_id='` from apps logs. Also distinguish the startup log so it's clear the durable-resume code path is live. No behavior change; pure observability.
app.yaml env vars were overriding databricks.yml at runtime, so the AI SDK was still talking directly to the Python FastAPI backend and the Express /invocations proxy never saw the request. Keep both files in sync.
…RL to FastAPI

The script was unconditionally overwriting API_PROXY with the backend URL right before launching the frontend, which defeated our whole durable-resume-rewrite story: the Node AI SDK bypassed the Express /invocations handler and streamed straight from FastAPI.

Fix: API_PROXY now points at CHAT_APP_PORT (the Express proxy), and we default AGENT_BACKEND_URL (previously unset) to the Python backend. Use os.environ.setdefault for AGENT_BACKEND_URL so operators can still override via databricks.yml or app.yaml.
…resp_*

Broadens the response_id parser so it works whether the backend tags frames with top-level response_id (preferred) or the older nested-only shape.
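A sketch of that tolerant parser (payload shapes assumed from the two formats named above):

```typescript
function extractResponseId(payload: any): string | undefined {
  if (typeof payload?.response_id === "string") return payload.response_id; // preferred top-level tag
  if (typeof payload?.response?.id === "string") return payload.response.id; // older nested-only shape
  return undefined;
}
```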
…tally

Matches the [/invocations] prefix so the full story is greppable from apps logs without correlating Node and Python timestamps.
The library logger inherits from root (default WARNING) so INFO-level lifecycle messages from LongRunningAgentServer (heartbeat, claim, resume, stream lifecycle) were being dropped. Set both the ai-bridge logger and the root level to LOG_LEVEL so apps logs carry the full durable-resume story without requiring callers to tune logging themselves.
When a response is killed mid-stream, the partial assistant text that was already rendered to the client kept receiving fresh deltas from attempt 2 — users saw attempt-1-partial + attempt-2-full concatenated in one bubble.

Express /invocations proxy now seals the in-progress assistant message across an attempt boundary:
1. On upstream close without [DONE], immediately append a '(connection interrupted — reconnecting…)' suffix delta to the active message so the user sees something is happening during the ~10s stale window.
2. On the response.resumed sentinel, emit synthetic response.content_part.done + response.output_item.done events for the active message — effectively ending the first assistant bubble at OpenAI Responses API level.
3. Attempt 2's natural response.output_item.added (with a fresh item_id) then creates a clean second bubble showing the full answer.

Tool calls naturally de-dup by call_id across attempts, so no closure synthesis needed for them.

Also mirrors the routing + logging fixes previously applied to agent-openai-advanced onto agent-langgraph-advanced so both templates get durable resume with the full [durable] log lifecycle visible:
- app.yaml + databricks.yml: split API_PROXY (-> Express :3000) from AGENT_BACKEND_URL (-> FastAPI :8000).
- scripts/start_app.py: honor AGENT_BACKEND_URL, point API_PROXY at the Express proxy, clone e2e-chatbot-app-next from the durable-execution branch.
- agent_server/start_server.py: raise databricks_ai_bridge + root logger to LOG_LEVEL so [durable] INFO lines surface in apps logs.
Durable-resume can interrupt the pod between an LLM emitting tool_calls and the SDK finishing the tool executions — the Session is left with function_call items whose matching function_call_output never got written. The next LLM request over that session fails:

400 BAD_REQUEST: An assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. The following tool_call_ids did not have response messages: call_xxx, call_yyy, ...

Piggy-back on deduplicate_input (which already touches the session each turn) to inject synthetic function_call_output items for every orphan function_call. Message is plain-text, so the LLM sees 'tool X was interrupted, please retry if needed' and can decide whether to re-call or continue.

No change to agent.py.
The previous heal added synthetic function_call_output at the END of the session (add_items only appends). When the conversation has a message between the orphan function_call and the synthetic output, the SDK rebuilds the LLM request as an assistant-with-tool_calls message that doesn't have its tool responses right after it, and the API rejects with 'assistant message with tool_calls must be followed by tool messages'.

Also: the Vercel AI SDK client echoes the full conversation back each turn. deduplicate_input drops most of it but the Runner.run path can still re-persist prior items, leaving DUPLICATE function_call rows for the same call_id.

Replace with a clear+rebuild sanitize pass: dedupe function_call / function_call_output by call_id, inject synthetic outputs immediately after any orphan function_call, clear the session, and re-add the canonical sequence. No-op when already clean.
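The sanitize pass is language-agnostic; a sketch in TypeScript over simplified stand-ins for session items (the real items live Python-side; all names here are illustrative):

```typescript
type Item =
  | { type: "function_call"; call_id: string }
  | { type: "function_call_output"; call_id: string; output: string }
  | { type: "message"; content: string };

function sanitizeItems(items: Item[]): Item[] {
  // call_ids that already have a real output somewhere in the session
  const answered = new Set(
    items.flatMap((i) => (i.type === "function_call_output" ? [i.call_id] : []))
  );
  const seen = new Set<string>();
  const out: Item[] = [];
  for (const item of items) {
    if (item.type === "function_call" || item.type === "function_call_output") {
      const key = `${item.type}:${item.call_id}`;
      if (seen.has(key)) continue; // duplicate row for the same call_id: drop
      seen.add(key);
    }
    out.push(item);
    if (item.type === "function_call" && !answered.has(item.call_id)) {
      // Orphan call: synthetic output goes IMMEDIATELY after it, so the rebuilt
      // request keeps tool responses adjacent to the tool_calls message.
      out.push({
        type: "function_call_output",
        call_id: item.call_id,
        output: "tool call was interrupted, please retry if needed",
      });
      answered.add(item.call_id);
    }
  }
  return out;
}
```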
Keep the UI minimal but fix the doubled-text issue: when a mid-stream
kill happens, the AI SDK merges all deltas within one streamText call
into one UIMessage — so our proxy-level seal events were valid but
invisible, and attempt 2's text kept appending to attempt 1's partial.
Minimal solution:
1. Express /invocations proxy already emits response.resumed at the
attempt boundary (unchanged).
2. chat.ts server: detect response.resumed via onChunk and forward it
to the UI stream as { type: 'data-resumed', data: { attempt } }.
3. chat.tsx client: on 'data-resumed', call setMessages to drop all
text parts from the last (assistant) message. Tool call parts stay
because they dedupe by call_id naturally.
Also: fix auto-resume loop burning MAX_RESUME_ATTEMPTS on terminal
errors by exiting early when an error event with code=task_failed or
code=task_timeout comes through the proxy.
No changes to agent.py. Agnosticism tenet intact.
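Step 3's wipe, sketched over simplified stand-ins for the AI SDK's message/part shapes (names are ours, not the SDK's):

```typescript
type Part = { type: "text"; text: string } | { type: "tool-call"; callId: string };
type Msg = { role: "user" | "assistant"; parts: Part[] };

function wipeLastAssistantText(messages: Msg[]): Msg[] {
  const last = messages[messages.length - 1];
  if (!last || last.role !== "assistant") return messages;
  // New message object + new parts array so React sees the change;
  // tool-call parts stay because they dedupe by call_id naturally.
  const wiped: Msg = { ...last, parts: last.parts.filter((p) => p.type !== "text") };
  return [...messages.slice(0, -1), wiped];
}
```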
Your 'clean up at end of stream' idea — much more robust than relying on mid-stream mutation sticking. On data-resumed we now snapshot the attempt-1 text length, and in onFinish we slice exactly that many chars off the front of the last assistant message's text parts. Whatever the AI SDK accumulator did during streaming, the final rendered state contains only attempt 2's content. The mid-stream mutation wipe stays in place too — when it sticks the text visibly clears during the 10s stale window, which is nicer UX than waiting for onFinish. When it doesn't stick, onFinish catches it.
PreviewMessage is memoized: while loading it compares prevProps.message to nextProps.message by reference; when not loading it deep-equals the parts array (which short-circuits on identical references). Our previous truncate mutated part.text in place and returned [...prev] — same message + same parts array refs, so the memo skipped the re-render and the old text stuck on screen even though state was technically updated. Map to NEW part objects with sliced text and wrap a NEW message object so both the reference check (loading path) and deep-equal (done path) see a change and re-render.
State-level wipes were getting clobbered by the AI SDK accumulator — ReactChatState.replaceMessage deep-clones state.message on every write(), and activeTextParts keeps mutating the originals behind the UI's back.

Solution: transform at the VIEW layer instead of fighting the state machine. Chat component tracks attempt1TextLen per messageId (state, not ref, so it propagates to children). Messages maps each message through a render-time slice that drops the leading attempt-1 chars from text parts before passing to PreviewMessage. Creates new message + part objects so the memo's reference check trips and the component re-renders.

onFinish still does the authoritative setMessages truncate so the persisted-to-DB final message reflects only attempt 2. That truncate now also clears attempt1TextLen, so the render-time slice becomes a no-op after completion (state is already truncated).
…cution-templates

# Conflicts:
#	agent-openai-advanced/databricks.yml
Drop the [chat][onData] / [chat][onFinish] / [chat][onChunk] tracing statements that were used to trace the attempt-1 → attempt-2 flow while tuning the render-time slice and post-stream truncate. The server-side Express proxy still logs resume lifecycle (background started / resume fetch / terminal error / stream done) since that's operationally useful; the ai-bridge backend's [durable] INFO logs stay as-is.

Co-authored-by: Isaac
Move the per-template workarounds for mid-tool crash-resume into the
databricks-ai-bridge library and wire them in:
- agent-openai-advanced/utils.py: deduplicate_input now calls
session.repair() (new public method on AsyncDatabricksSession) instead
of the 100-line in-template _sanitize_session. Same behavior — dedupe
function_call/function_call_output by call_id, inject synthetic
outputs for orphans — just owned by the library.
- agent-langgraph-advanced/agent.py: before agent.astream, call
build_tool_resume_repair on the checkpointer's messages and apply via
agent.aupdate_state(..., as_node="tools"). The as_node is critical —
without it LangGraph re-evaluates the model→{tools,END} branch from
the updated state and crashes with KeyError: 'model'.
- agent-langgraph-advanced/agent.py: when the checkpointer already has
a thread, only forward the latest user turn from request.input — the
UI client (Vercel AI SDK) re-echoes the full history on every turn,
which can re-inject orphan tool_uses from a previously-interrupted
attempt that the client kept in its buffer.
Both pyproject.toml files now pin databricks-openai / databricks-langchain
to the same ai-bridge branch (subdirectory git sources) so the new
helpers are picked up. Temporary; revert to registry once the bridge PR
merges.
Co-authored-by: Isaac
Library side (databricks-langchain, PR #416):
- New build_tool_resume_repair_middleware() returns an AgentMiddleware whose
before_model hook runs build_tool_resume_repair. Swaps the manual
aget_state / aupdate_state(as_node="tools") surgery in the template for a
one-line `middleware=[...]` arg to create_agent.
- The as_node="tools" footgun (KeyError: 'model' in the model→{tools,END}
conditional branch re-eval) disappears entirely; repair runs inside the
graph's own execution flow, not as external state surgery.
Template (agent-langgraph-advanced):
- init_agent: add middleware=[build_tool_resume_repair_middleware()] to
create_agent. stream_handler drops the 8-line repair block.
- utils.py process_agent_astream_events: skip None node_data (the graph's
updates stream emits {middleware_node: None} when the middleware is a
no-op, which is every turn on the happy path).
UI (e2e-chatbot-app-next):
- On data-resumed from the backend, wipe text parts from the last assistant
message in one setMessages. Tool-call parts are kept as-is (they already
dedupe across attempts by call_id). Dropped:
* attempt1TextLen state + per-message snapshot in onData
* render-time text slice in Messages.tsx
* onFinish authoritative post-stream truncate
The AI SDK's seal-on-resume synthesis (Express proxy) still creates a
fresh output_item_id for attempt 2, so new deltas land in a fresh text
part — our wipe of the old text part is sufficient.
Net: -99 LOC across 4 files. Same behavior for the "delete old text,
leave tools alone" UX; substantially less state-machine choreography.
Co-authored-by: Isaac
setMessages can't wipe mid-stream — the AI SDK's activeResponse.state is a snapshot taken at makeRequest time, and every text-delta calls write() → this.state.replaceMessage(lastIdx, activeResponse.state.message), which overwrites any setMessages we do. Our wipe was visible for a single chunk then reverted.

Fix: snapshot the assistant message's parts.length at data-resumed, and at render time hide text parts at indices BEFORE that cutoff. Tool / step parts render normally at every index. Works for openai and langgraph because it transforms at the view layer rather than fighting the AI SDK state machine.

Removes server-side debug log. Keeps the minimal delete-old-text UX.

Co-authored-by: Isaac
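The cutoff-index hide, as a sketch (part shape simplified; `visibleParts` is an illustrative name, not the component code):

```typescript
type UIPart = { type: string; text?: string };

function visibleParts(parts: UIPart[], resumeCutIndex: number | null): UIPart[] {
  if (resumeCutIndex === null) return parts;
  // Hide attempt-1 TEXT parts (indices before the snapshot);
  // tool / step parts render normally at every index.
  return parts.filter((p, i) => !(p.type === "text" && i < resumeCutIndex));
}
```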
…lper

- Removed the "_(connection interrupted — reconnecting…)_" delta block. Render-time slice hides attempt-1 text on resume anyway, so the suffix was invisible past the 10s stale window and too subtle during it.
- Extracted writeEvent(type, payload) helper; sealActiveMessage went from 45 → 22 lines, no behavior change.
- Removed readActive() TS-widening helper (no longer needed without the suffix block).
- Inlined onFirstResponseId helper into its single call site.

Net: 92 lines removed, 36 added in this file.

Co-authored-by: Isaac
Durability mechanics now live entirely in databricks-ai-bridge's LongRunningAgentServer (rotate conv_id on resume + full-history input sanitizer, see ai-bridge PR #416). Templates can drop the explicit repair surface:

- agent-langgraph-advanced/agent.py: drop middleware=[build_tool_resume_repair_middleware()] from create_agent and the unused import. Also drop the stream_handler UI-echo dedupe block — the server sanitizer handles mid-history orphans end-to-end.
- agent-openai-advanced/utils.py: drop await session.repair() from deduplicate_input. session.repair() stays available as a public method for callers who want destructive session cleanup.

Net: agent.py / utils.py in both advanced templates have zero durability-specific lines. The contract becomes "use our checkpointer/session classes with LongRunningAgentServer — durable resume + orphan repair is free."

Co-authored-by: Isaac
Temporarily short-circuit the resumeCutIndex write so attempt-1's text stays visible while attempt-2 streams over it. Lets us see how the server-side inheritance + synthetic-output prompt shape the LLM's mid-turn continuation behavior without the visual wipe hiding what attempt-2 actually emits.

Re-enable by uncommenting the block; the rest of the wipe plumbing (state hook, Messages prop threading, render-time slice) is left in place so re-enabling is a 1-line flip.

Co-authored-by: Isaac
…les resume

Server-side changes earlier in this branch (prior-attempt tool-event inheritance + partial-stream reassembly in databricks-ai-bridge) make the client-side "wipe attempt-1 text when resume fires" machinery unnecessary: attempt-2's LLM sees attempt-1's work as history and continues seamlessly instead of restarting. The wipe was also hiding the new continuation quality from the user. Turning the wipe off in UI testing confirmed the server-side story is sufficient.

Delete the full stack:
- packages/core/src/types.ts: drop `resumed` from CustomUIDataTypes.
- server/src/routes/chat.ts: drop writerRef + emittedResumedAttempts + the onChunk raw-event branch that emitted data-resumed parts. Trace-extraction stays; only the resume-forwarding path is removed.
- client/src/components/chat.tsx: drop resumeCutIndex state hook, the data-resumed onData handler (was already commented out), and the prop pass to <Messages/>.
- client/src/components/messages.tsx: drop resumeCutIndex prop from MessagesProps + its destructuring + the render-time text-part slice.

The server still emits `response.resumed` as a sentinel so the Express proxy's sealActiveMessage() call correctly closes attempt-1's open text part before attempt-2's fresh output_item.added creates a new one. The proxy no longer extracts it into a UI data part.

Co-authored-by: Isaac
Remove everything that isn't strictly required for durable resume with the server-side-only approach in ai-bridge PR #416:

- agent-langgraph-advanced/agent_server/agent.py: revert entirely. The test-scaffolding tools (get_weather, get_stock_price, deep_research) were only for crash-test harnesses; the asyncio import only existed to support them. User-space durability surface for this template is now zero lines.
- agent-openai-advanced/agent_server/agent.py: revert entirely. Drop the test-scaffolding tools (get_weather, get_stock_price, search_best_restaurants, deep_research) and asyncio import. Same zero-user-space result.
- agent-langgraph-advanced/agent_server/utils.py: revert. The "middleware nodes that no-op return None" guard was defensive against middleware we no longer install.
- agent-openai-advanced/agent_server/utils.py: revert. The empty-input guard was defensive against the old input=[] resume replay that no longer happens — server always replays the original input.
- e2e-chatbot-app-next/server/src/index.ts: drop the activeMessage / sealActiveMessage / writeEvent machinery. Was synthesizing closure events on response.resumed to seal attempt-1's text part for the UI wipe. UI wipe is gone; the AI SDK creates parts by item_id so attempt-2's fresh output_item.added naturally starts a new part and attempt-1's open part finalizes on stream end.
- Plus the earlier UI cleanup (chat.tsx, messages.tsx, types.ts, routes/chat.ts) that removed the data-resumed / resumeCutIndex plumbing.

Remaining essentials:
- agent_server/start_server.py: log-level setup so [durable] logs surface in app logs.
- scripts/start_app.py: API_PROXY / AGENT_BACKEND_URL wiring so the Node AI SDK routes streaming POSTs through the Express background-mode + auto-resume proxy. Clone-from-branch is marked TEMPORARY (revert when ai-bridge ships).
- pyproject.toml: databricks-ai-bridge git source pointer (TEMPORARY).
- e2e-chatbot-app-next/server/src/index.ts: background-mode rewrite + auto-resume proxy for the /invocations route.

Co-authored-by: Isaac
Infinite Stream Resume loop seen with Claude multi-tool turns via
durable retrieve. Root:
- useChat's onStreamPart reset resumeAttemptCountRef on every chunk,
so the 3-retry cap was only enforced when a stream ended empty.
- When Claude's provider failed to emit a clean `finish` UIMessageChunk
at the end of the stream, lastPart.type !== 'finish' kept
streamIncomplete = true. Each resume replayed the cached stream,
delivered chunks, reset the counter to 0, onFinish fired without
`finish`, looped.
Fix:
- Remove the per-chunk reset in onStreamPart.
- Reset only in prepareSendMessagesRequest when the last message is a
user message (a genuine new turn). Tool-result continuations
(non-user-message continuations) don't reset.
- Cap stays at 3; after that, fetchChatHistory() pulls the
DB-persisted state so the user sees the final assistant output
instead of spinning forever.
Co-authored-by: Isaac
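The reset + cap rules above, sketched as pure predicates (names are illustrative; the real logic lives in the useChat callbacks):

```typescript
const MAX_RESUME_ATTEMPTS = 3;

type TurnMsg = { role: "user" | "assistant" | "tool" };

// Reset only on a genuine new user turn; tool-result continuations
// (non-user last message) keep the counter.
function shouldResetResumeCounter(messages: TurnMsg[]): boolean {
  return messages[messages.length - 1]?.role === "user";
}

function mayResume(attempts: number): boolean {
  // Past the cap, the client falls back to fetching DB-persisted history.
  return attempts < MAX_RESUME_ATTEMPTS;
}
```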
Final stable state for durable execution. End-to-end UI-validated
scenarios that now work:
- Multi-tool turn interrupted mid-sequence, durable resume inherits
completed tool pairs + narrative (reordered) + synthetic output
for the interrupted call, agent continues from where it left off.
- Text-only mid-stream crash, partial-text reassembly + Claude
prefill → continuation.
- Cross-turn recall after crash-and-resume (stable thread via read-
time checkpoint repair on LangGraph / session auto-repair on
OpenAI).
- Multi-tool on GPT-5 + openai-agents (single-response-per-turn).
Template fix here: process_agent_stream_events now disambiguates by
(a) item.type bucket for delta routing and (b) call_id bucket for
multiple open function_calls. The original single curr_item_id bucket
worked for GPT-5's strictly serial events but collided on Claude's
interleaved + parallel tool-call events, which produced two items
sharing one id and broke the client's part tracking.
Pairs with databricks-ai-bridge PR #416 changes (rotate + replay +
full-history sanitizer + prior-attempt tool-pair inheritance +
narrative hoist + checkpoint read-time repair + session auto-repair).
Co-authored-by: Isaac
End-to-end UI test on Claude (via deployed agent-openai-advanced with the updated databricks-ai-bridge) confirmed that the bridge-side ordering fix (sanitizer + narrative hoist + tool-pair inheritance + session auto-repair) is sufficient on its own. The two template-side guards added in earlier commits are no longer needed:

- Revert 0ddbd60: `process_agent_stream_events` per-type + per-call-id id tracking. The single-bucket implementation handles Claude's interleaved + parallel tool-call events correctly now that the upstream ordering is clean.
- Revert 5f3c507: `chat.tsx` user-message-only resume-counter reset. Claude now emits a clean `finish` UIMessageChunk through the durable retrieve path, so the per-chunk reset no longer traps the 3-retry cap in an infinite loop.

Keeps the advanced templates lean — durability logic lives entirely in databricks-ai-bridge (LongRunningAgentServer).

Co-authored-by: Isaac
Extract three pure helpers above the route handler so the SSE frame loop reads like prose:

- parseSseFrame(frame): classifies a frame as done / passthrough / data.
- extractResponseId(payload): tolerates FastAPI's three response_id locations (response_id, response.id, top-level id with resp_ prefix).
- isTerminalErrorFrame(payload): detects task_failed / task_timeout so the resume loop can short-circuit.

pumpStream now just drives the reader + forwards bytes; the parsing logic is testable in isolation and the handler body is substantially shorter.

Co-authored-by: Isaac
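Sketches of two of those helpers, assuming the SSE framing described in this branch (not the actual file contents):

```typescript
type FrameKind =
  | { kind: "done" }
  | { kind: "passthrough" }          // comments, keep-alives, malformed data
  | { kind: "data"; payload: any };  // parsed JSON event

function parseSseFrame(frame: string): FrameKind {
  // Join all data: lines of the frame, per the SSE spec.
  const data = frame
    .split("\n")
    .filter((l) => l.startsWith("data:"))
    .map((l) => l.slice(5).trim())
    .join("");
  if (!data) return { kind: "passthrough" };
  if (data === "[DONE]") return { kind: "done" };
  try {
    return { kind: "data", payload: JSON.parse(data) };
  } catch {
    return { kind: "passthrough" }; // unparseable: forward bytes untouched
  }
}

function isTerminalErrorFrame(payload: any): boolean {
  // task_failed / task_timeout mean the run is dead server-side;
  // resuming would only replay the same failure.
  const code = payload?.code ?? payload?.error?.code;
  return code === "task_failed" || code === "task_timeout";
}
```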
Both advanced templates were setting these env vars to hard-coded
localhost URLs that match the bundled-process topology (Node on 3000,
FastAPI on 8000). The values are fixed by the templates themselves —
a customer deploying the advanced stack can't change them without
breaking the bundle. Making them required in yaml adds noise without
adding configurability.
Push the defaults into the chatbot:
- New ``getApiProxyUrl()`` helper in ``packages/ai-sdk-providers/src/
api-proxy.ts`` resolves the effective proxy URL:
1. explicit ``API_PROXY`` wins,
2. ``DATABRICKS_SERVING_ENDPOINT`` set → direct-endpoint mode, no
proxy,
3. otherwise → ``http://localhost:${CHAT_APP_PORT|PORT|3000}/invocations``
(advanced-template convention).
Used from ``providers-server.ts`` and ``request-context.ts`` so both
agree on proxy activation.
- ``server/src/index.ts`` defaults ``AGENT_BACKEND_URL`` to
``http://localhost:8000/invocations`` when unset. Explicit empty
string still disables the ``/invocations`` proxy route.
- Drop the ``API_PROXY`` / ``AGENT_BACKEND_URL`` block (and its comment)
from both advanced templates' ``app.yaml`` and ``databricks.yml``.
Preserves direct-serving-endpoint CUJs: when
``DATABRICKS_SERVING_ENDPOINT`` is set (basic chatbot deployments), the
AI SDK talks straight to the endpoint and never hits ``/invocations``.
Co-authored-by: Isaac
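The resolution order above, as a condensed sketch (env passed as a plain record so the precedence is testable in isolation; not the actual helper body):

```typescript
function getApiProxyUrl(env: Record<string, string | undefined>): string | undefined {
  if (env.API_PROXY) return env.API_PROXY;               // 1. explicit setting wins
  if (env.DATABRICKS_SERVING_ENDPOINT) return undefined; // 2. direct-endpoint mode: no proxy
  const port = env.CHAT_APP_PORT ?? env.PORT ?? "3000";  // 3. advanced-template convention
  return `http://localhost:${port}/invocations`;
}
```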
Prior cleanup commit dropped ``API_PROXY=http://localhost:8000/invocations`` from the advanced templates' ``app.yaml`` and ``databricks.yml``. That line pre-existed on ``main``; the PR never meant to remove it. Scope of the previous change was only the *newly-added* ``API_PROXY`` + ``AGENT_BACKEND_URL`` block that activated the Node proxy path. Restore the four files to exactly match ``main``. The chatbot-side ``getApiProxyUrl()`` default only fires when ``API_PROXY`` is unset, so users with main's explicit setting keep their existing behavior. Co-authored-by: Isaac
Both helpers answer routing-decision questions for the provider layer (proxy URL + context-injection gate), and the separate file wasn't buying isolation — providers-server.ts already imports from request-context.ts. One file, same logic.

Co-authored-by: Isaac
Summary
Wires agent-openai-advanced + agent-langgraph-advanced + the shared e2e-chatbot-app-next frontend to the durable-execution contract in databricks-ai-bridge PR #416 (ML-64230). Agent code stays unchanged. All durability logic lives in the bridge or in the chatbot proxy.
Template changes
- pyproject.toml — pin databricks-ai-bridge + integration package to the bridge PR branch (revert to registry once released).
- start_server.py — raise the databricks_ai_bridge logger to LOG_LEVEL so [durable] messages surface in app logs. The LongRunningAgentServer subclass already exists on main.
- agent.py — no changes. Read-time repair happens inside AsyncDatabricksSession.get_items() (openai) and _repair_loaded_checkpoint_tuple wrapping the checkpointer (langgraph) — both in the bridge.

Chatbot proxy (e2e-chatbot-app-next/server/src/index.ts)

The Express /invocations handler rewrites streaming POSTs into the bridge's background-mode contract and transparently resumes on upstream drops. Zero client-side changes.
- POST /invocations {stream: true} → backend {background: true, stream: true}
- pumpStream forwards SSE frames to the browser; three pure helpers (parseSseFrame, extractResponseId, isTerminalErrorFrame) classify each frame
- If upstream closes before [DONE], the loop reconnects via GET /responses/{id}?stream=true&starting_after={lastSeq}, capped at 10 attempts
- Short-circuits on task_failed / task_timeout terminal errors

AI SDK provider (packages/ai-sdk-providers/src/request-context.ts)

getApiProxyUrl() helper resolves the proxy URL:
1. explicit API_PROXY wins
2. DATABRICKS_SERVING_ENDPOINT set → direct-endpoint mode, no proxy
3. otherwise → http://localhost:${CHAT_APP_PORT|PORT|3000}/invocations (advanced-template convention)

No required API_PROXY / AGENT_BACKEND_URL in databricks.yml / app.yaml — the defaults live in chatbot code.

Testing

- agent-openai-advanced: multi-tool turns (get_current_time, get_weather, get_stock_price, deep_research) interrupted mid-stream via /_debug/kill_task/{id}. Durable resume inherits completed tool pairs and injects a synthetic [INTERRUPTED] output for the killed call; the agent continues without re-running completed tools. Tool cards dedupe across attempts. Run ends with status=completed, attempt_number=2.
- openai-advanced [autoscaling] passes end-to-end. Other failures in the local run are environmental (Python 3.14 mlflow simulator compat, vector-embedding cold-start on langgraph LTM test) — not PR-caused.

How to test
Mid-stream crash test (UI)
- Start a long-running turn (e.g. "Do deep_research on quantum computing basics").
- Grab the response_id from the [/invocations] background started response_id=resp_... log line.

Mid-stream crash test (HTTP only)
Pre-merge checklist
- Revert pyproject.toml git-branch pins in both advanced templates to registry versions
- Flip the APP_TEMPLATES_BRANCH default in both scripts/start_app.py from dhruv0811/durable-execution-templates to main
- Remove LONG_RUNNING_ENABLE_DEBUG_KILL=1 from deploy configs before production use