fix(session): detect, discard, and retry text-form tool calls #1336
Merged
Under large context the model can serialize a tool call as prose text instead of emitting a structured tool_use, leaving a turn with finish="tool-calls" but no tool part. That turn strands the conversation on an assistant message (provider prefill rejection) and, if kept, poisons the rest of the session. classify now detects this state (finish tool-calls + zero tool parts + text carrying tool-call markup) as a distinct text-tool-call category. runLoop discards the bad turn by setting assistant.error (TextToolCallError), so toModelMessages drops it from request history, then retries the request up to MIMOCODE_TEXT_TOOL_CALL_RETRY_LIMIT (default 2) times; on exhaustion the error stays terminal. Wired at all three classify call-sites (existing-assistant, main after-process, fork).
The recovery helper set the turn's error but appended no new user message, so on continue the loop re-entered, re-detected the same (still-latest) turn, and burned retries with zero model calls — every case ended in a hard error. Now it appends a synthetic user turn (mirroring autoRetryStructuredOutput) so the discarded turn goes stale and the loop reaches generation, plus an already-errored early-return. classify #3a is gated with a staleness guard (lastUser.id < assistant.id) and !assistant.error so a degraded turn left in history can't re-fire across turns/resumes. Adds an integration test asserting the model is re-called (calls === 2) and recovers, plus classify guard tests.
Adds TextToolCallError to the generated SDK Assistant.error union and the openapi schema so consumers switching on error.name see the complete union.
Summary
The model can serialize a tool call as prose text (<invoke name=...> / <parameter name=...> in the message body) instead of emitting a structured tool_use. The result is a contradictory turn: finish === "tool-calls" but zero structured tool parts. With no tool part there is no tool_result to close the turn, so the next request ends on that assistant message, which Bedrock-routed Claude rejects (400: This model does not support assistant message prefill). Worse, if the bad turn lingers in history it poisons the rest of the session.

This is a model-side degradation, observed with a notably higher incidence on Claude Opus. This change detects the state and recovers by discarding the bad turn and retrying the request, treating it as a transient error.
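As a rough illustration of the contradictory state described above, the check can be sketched as a single predicate. This is not the code in classify.ts: the `Part` and `AssistantTurn` shapes, the `isTextToolCall` name, and the markup regex are all assumptions made for the example, and message ids are assumed to be lexicographically sortable strings. The two guards at the top correspond to the already-errored and staleness conditions the PR describes.

```typescript
// Hypothetical, simplified shapes; the real message types will differ.
type Part =
  | { type: "text"; text: string }
  | { type: "tool"; tool: string };

interface AssistantTurn {
  id: string; // assumed to sort lexicographically in creation order
  finish: string;
  parts: Part[];
  error?: { name: string };
}

// Assumed pattern, based on the <invoke name=...> / <parameter name=...> forms.
const TOOL_CALL_MARKUP = /<(invoke|parameter)\s+name=/;

function isTextToolCall(turn: AssistantTurn, lastUserId: string): boolean {
  // Guard: a turn that was already discarded must not fire again.
  if (turn.error) return false;
  // Guard: if a newer user message exists, the turn is stale.
  if (lastUserId >= turn.id) return false;
  // The contradiction: the turn claims tool calls...
  if (turn.finish !== "tool-calls") return false;
  // ...but carries no structured tool part...
  if (turn.parts.some((p) => p.type === "tool")) return false;
  // ...and its text contains tool-call markup instead.
  return turn.parts.some(
    (p) => p.type === "text" && TOOL_CALL_MARKUP.test(p.text),
  );
}

// Usage: a degraded turn that follows the latest user message.
const degraded: AssistantTurn = {
  id: "msg_002",
  finish: "tool-calls",
  parts: [{ type: "text", text: 'Reading: <invoke name="read">' }],
};
console.log(isTextToolCall(degraded, "msg_001")); // true
console.log(isTextToolCall(degraded, "msg_003")); // false (stale)
```

Requiring markup in the text keeps the category narrow: a turn with finish "tool-calls" and ordinary prose would fall through to whatever handling already exists rather than being discarded.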
How
- Detect (classify.ts): new text-tool-call category: finish === "tool-calls" AND no structured tool part AND text carrying tool-call markup. Placed before the unconditional tool-calls ⇒ continue so it isn't swallowed; guarded with a staleness check (lastUser.id < assistant.id) and !assistant.error so a discarded turn left in history can't re-fire.
- Discard (runLoop): set assistant.error = TextToolCallError, which makes toModelMessages drop the whole turn from request history, so it can neither strand the conversation on an assistant turn nor poison later context. (The ignored flag is not used: the assistant-role branch of toModelMessages does not honor it; only the error-skip path drops a whole assistant message.)
- Retry: up to MIMOCODE_TEXT_TOOL_CALL_RETRY_LIMIT (default 2) times. On exhaustion the error stays terminal and is published.

Why discard instead of keep+nudge
Keeping the turn and appending a reminder was tested and fails: any bad case lingering in context degrades the whole session. Discarding removes the contamination source.
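The discard-and-retry flow can be sketched in a few lines. This is a simplified model under stated assumptions, not the actual runLoop: `Message`, `toHistory`, `runWithRetry`, and the `Generate` callback are hypothetical names, the `textToolCall` flag stands in for the real classification step, and the synthetic user message text is a placeholder. The limit defaults to 2 as in the PR description.

```typescript
// Hypothetical, simplified types for illustration only.
interface Message {
  id: number;
  role: "user" | "assistant";
  text: string;
  error?: { name: string };
}

// Stand-in for the model call plus classification of its result.
type Generate = (history: Message[]) => { text: string; textToolCall: boolean };

// Models the described behavior: errored assistant turns never reach the request.
function toHistory(messages: Message[]): Message[] {
  return messages.filter((m) => !(m.role === "assistant" && m.error));
}

function runWithRetry(messages: Message[], generate: Generate, limit = 2): Message {
  let nextId = messages.length + 1;
  let attempt = 0;
  while (true) {
    const out = generate(toHistory(messages));
    const turn: Message = { id: nextId++, role: "assistant", text: out.text };
    messages.push(turn);
    if (!out.textToolCall) return turn; // clean turn: done

    // Discard: marking the turn errored removes it from future requests.
    turn.error = { name: "TextToolCallError" };
    if (attempt >= limit) return turn; // retries exhausted: error stays terminal
    attempt++;

    // Synthetic user turn (placeholder text) so the bad turn is no longer
    // the latest message and the loop reaches generation again.
    messages.push({ id: nextId++, role: "user", text: "(retry)" });
  }
}

// Usage: the first call degrades, the second recovers.
let calls = 0;
const history: Message[] = [{ id: 1, role: "user", text: "list the files" }];
const final = runWithRetry(history, () => {
  calls++;
  return calls === 1
    ? { text: '<invoke name="ls">', textToolCall: true }
    : { text: "Here are the files.", textToolCall: false };
});
console.log(calls, final.error === undefined); // 2 true
```

Because the errored turn stays in the stored messages but is filtered out of the request, the session keeps a record of the failure without feeding it back to the model.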
Verification
- classify.test.ts: 29 pass (detection + negative cases: real tool part, text without markup; plus the staleness / already-errored guards).
- prompt.test.ts: a mock LLM returns a text-form tool call on call 1 and clean text on call 2; asserts the model is re-called (calls === 2) and recovers, proving the retry actually regenerates.
- classify-integration.test.ts + prompt-sweep.test.ts: no regressions (52 pass across the four files).
- bun typecheck: clean.

Notes
Root cause traced from the errored session's raw trajectory: a turn with finish="tool-calls" + 0 tool parts + text ending mid tool-call markup. This is a separate path from the max-steps prefill fixed in #1239.