fix(session): detect, discard, and retry text-form tool calls#1336

Merged: wqymi merged 3 commits into main from vb/fec8-session-error on Jun 25, 2026

wqymi (Collaborator) commented on Jun 25, 2026

Summary

The model can serialize a tool call as prose text (<invoke name=...>/<parameter name=...> in the message body) instead of emitting a structured tool_use. The result is a contradictory turn: finish === "tool-calls" but zero structured tool parts. With no tool part there is no tool_result to close the turn, so the next request ends on that assistant message — which Bedrock-routed Claude rejects (400 — This model does not support assistant message prefill). Worse, if the bad turn lingers in history it poisons the rest of the session.

This is a model-side degradation, observed with a notably higher incidence on Claude Opus. This change detects the state and recovers by discarding the bad turn and retrying the request, treating it as a transient error.

How

  • Detect (classify.ts): new text-tool-call category — finish === "tool-calls" AND no structured tool part AND text carrying tool-call markup. Placed before the unconditional tool-calls ⇒ continue so it isn't swallowed; guarded with a staleness check (lastUser.id < assistant.id) and !assistant.error so a discarded turn left in history can't re-fire.
  • Discard (runLoop): set assistant.error = TextToolCallError, which makes toModelMessages drop the whole turn from request history — so it can neither strand the conversation on an assistant turn nor poison later context. (The ignored flag is not used: the assistant-role branch of toModelMessages does not honor it; only the error-skip path drops a whole assistant message.)
  • Retry: append a synthetic user turn (mirroring the structured-output retry) so the discarded turn goes stale and the loop reaches generation, bounded by MIMOCODE_TEXT_TOOL_CALL_RETRY_LIMIT (default 2). On exhaustion the error stays terminal and is published.
  • Wired at all three classify call-sites (existing-assistant, main after-process, fork).
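The detection rule and its two guards can be sketched as a single predicate. This is a minimal illustration, not the code in classify.ts: the `Part`, `AssistantTurn`, and `UserTurn` shapes, the `isTextToolCall` name, and the `TOOL_MARKUP` regex are all assumptions made for the example, and IDs are assumed to be lexicographically sortable strings so that `lastUser.id < assistant.id` means "the assistant turn is newer than the last user turn".

```typescript
// Hypothetical, simplified shapes; the real types are not shown in this PR.
type Part =
  | { type: "text"; text: string }
  | { type: "tool"; name: string };

interface AssistantTurn {
  id: string; // assumed to sort lexicographically in creation order
  finish: string;
  parts: Part[];
  error?: { name: string };
}

interface UserTurn {
  id: string;
}

// Assumed pattern, based on the <invoke name=...> / <parameter name=...>
// forms named in the summary. The real matcher may differ.
const TOOL_MARKUP = /<(invoke|parameter)\s+name=/;

function isTextToolCall(assistant: AssistantTurn, lastUser: UserTurn): boolean {
  // Guard: a turn that was already discarded must not re-fire.
  if (assistant.error) return false;
  // Guard: if a newer user turn exists, this assistant turn is stale.
  if (!(lastUser.id < assistant.id)) return false;
  // The contradictory state: finish says "tool-calls"...
  if (assistant.finish !== "tool-calls") return false;
  // ...but there is no structured tool part...
  if (assistant.parts.some((p) => p.type === "tool")) return false;
  // ...and the text carries tool-call markup.
  return assistant.parts.some(
    (p) => p.type === "text" && TOOL_MARKUP.test(p.text),
  );
}
```

Running the guards first keeps the check cheap for the common case and mirrors the ordering concern above: the predicate has to be evaluated before the generic tool-calls ⇒ continue branch, otherwise the bad turn is treated as a normal tool step.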

Why discard instead of keep+nudge

Keeping the turn and appending a reminder was tested and failed: any bad case lingering in context degrades the whole session. Discarding removes the contamination source.
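For illustration, the discard path amounts to filtering errored assistant turns out of the request history. The sketch below is a simplified stand-in for toModelMessages, not its real implementation: the `Message` and `ModelMessage` shapes and the `toModelMessagesSketch` name are hypothetical.

```typescript
// Hypothetical message shapes for illustration only.
type Message =
  | { role: "user"; id: string; text: string }
  | { role: "assistant"; id: string; text: string; error?: { name: string } };

interface ModelMessage {
  role: "user" | "assistant";
  content: string;
}

// Simplified stand-in for toModelMessages: an assistant turn that carries
// an error is dropped whole, so it never reaches the provider.
function toModelMessagesSketch(history: Message[]): ModelMessage[] {
  const out: ModelMessage[] = [];
  for (const m of history) {
    if (m.role === "assistant" && m.error) continue; // error-skip path
    out.push({ role: m.role, content: m.text });
  }
  return out;
}
```

With the bad turn marked errored and a synthetic user turn appended after it, the request sent to the provider ends on a user message and no longer contains the markup-bearing text.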

Verification

  • classify.test.ts — 29 pass (detection + negative cases: real tool part, text without markup; plus the staleness / already-errored guards).
  • Integration test in prompt.test.ts: a mock LLM returns a text-form tool call on call 1 and clean text on call 2; asserts the model is re-called (calls === 2) and recovers — proving the retry actually regenerates.
  • classify-integration.test.ts + prompt-sweep.test.ts — no regressions (52 pass across the four files).
  • bun typecheck — clean.
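The shape of the integration check (bad reply on call 1, clean reply on call 2, model re-called exactly once) can be sketched with a scripted mock. This is not the test in prompt.test.ts: `Reply`, `mockLlm`, `generateWithRetry`, and `looksLikeTextToolCall` are hypothetical names, the mock is synchronous for brevity, and the history bookkeeping (setting assistant.error, appending the synthetic user turn) is left out. `RETRY_LIMIT` stands in for the default of MIMOCODE_TEXT_TOOL_CALL_RETRY_LIMIT.

```typescript
// Hypothetical reply shape; the real session types are richer.
interface Reply {
  finish: string;
  text: string;
  hasToolPart: boolean;
}

type Llm = () => Reply;

// Stand-in for the default of MIMOCODE_TEXT_TOOL_CALL_RETRY_LIMIT.
const RETRY_LIMIT = 2;

const looksLikeTextToolCall = (r: Reply): boolean =>
  r.finish === "tool-calls" && !r.hasToolPart && /<invoke\s+name=/.test(r.text);

// Regenerate while the reply is a text-form tool call, up to `limit` retries.
// On exhaustion the error is terminal.
function generateWithRetry(llm: Llm, limit: number = RETRY_LIMIT): Reply {
  let retries = 0;
  for (;;) {
    const reply = llm();
    if (!looksLikeTextToolCall(reply)) return reply;
    if (retries >= limit) throw new Error("TextToolCallError");
    retries++;
  }
}

// Scripted mock: returns the replies in order, repeating the last one,
// and counts how many times it was called.
function mockLlm(replies: Reply[]): { llm: Llm; calls: () => number } {
  let calls = 0;
  const llm: Llm = () => replies[Math.min(calls++, replies.length - 1)];
  return { llm, calls: () => calls };
}
```

Asserting on the call count, not only on the final reply, is what catches a loop that burns retries without ever regenerating.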

Notes

Root cause traced from the errored session's raw trajectory: a turn with finish="tool-calls" + 0 tool parts + text ending mid tool-call markup. This is a separate path from the max-steps prefill fixed in #1239.

wqymi added 3 commits June 25, 2026 18:26
Under large context the model can serialize a tool call as prose text instead
of emitting a structured tool_use, leaving a turn with finish="tool-calls" but
no tool part. That turn strands the conversation on an assistant message
(provider prefill rejection) and, if kept, poisons the rest of the session.

classify now detects this state (finish tool-calls + zero tool parts + text
carrying tool-call markup) as a distinct text-tool-call category. runLoop
discards the bad turn by setting assistant.error (TextToolCallError), so
toModelMessages drops it from request history, then retries the request up to
MIMOCODE_TEXT_TOOL_CALL_RETRY_LIMIT (default 2) times; on exhaustion the error
stays terminal. Wired at all three classify call-sites (existing-assistant,
main after-process, fork).

The recovery helper set the turn's error but appended no new user message, so
on continue the loop re-entered, re-detected the same (still-latest) turn, and
burned retries with zero model calls — every case ended in a hard error.

Now it appends a synthetic user turn (mirroring autoRetryStructuredOutput) so
the discarded turn goes stale and the loop reaches generation, plus an
already-errored early-return. classify #3a is gated with a staleness guard
(lastUser.id < assistant.id) and !assistant.error so a degraded turn left in
history can't re-fire across turns/resumes. Adds an integration test asserting
the model is re-called (calls === 2) and recovers, plus classify guard tests.

Adds TextToolCallError to the generated SDK Assistant.error union and the
openapi schema so consumers switching on error.name see the complete union.
wqymi merged commit 64fceb2 into main on Jun 25, 2026
3 of 6 checks passed