fix(session): detect, discard, and retry text-form tool calls by wqymi · Pull Request #1336 · XiaomiMiMo/MiMo-Code

wqymi · 2026-06-25T10:29:20Z

Summary

The model can serialize a tool call as prose text (<invoke name=...>/<parameter name=...> in the message body) instead of emitting a structured tool_use. The result is a contradictory turn: finish === "tool-calls" but zero structured tool parts. With no tool part there is no tool_result to close the turn, so the next request ends on that assistant message — which Bedrock-routed Claude rejects (400 — This model does not support assistant message prefill). Worse, if the bad turn lingers in history it poisons the rest of the session.

This is a model-side degradation, observed with a notably higher incidence on Claude Opus. This change detects the state and recovers by discarding the bad turn and retrying the request, treating it as a transient error.

How

Detect (classify.ts): new text-tool-call category — finish === "tool-calls" AND no structured tool part AND text carrying tool-call markup. Placed before the unconditional tool-calls ⇒ continue so it isn't swallowed; guarded with a staleness check (lastUser.id < assistant.id) and !assistant.error so a discarded turn left in history can't re-fire.
Discard (runLoop): set assistant.error = TextToolCallError, which makes toModelMessages drop the whole turn from request history — so it can neither strand the conversation on an assistant turn nor poison later context. (The ignored flag is not used: the assistant-role branch of toModelMessages does not honor it; only the error-skip path drops a whole assistant message.)
Retry: append a synthetic user turn (mirroring the structured-output retry) so the discarded turn goes stale and the loop reaches generation, bounded by MIMOCODE_TEXT_TOOL_CALL_RETRY_LIMIT (default 2). On exhaustion the error stays terminal and is published.
Wired at all three classify call-sites (existing-assistant, main after-process, fork).
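The detection rule above can be sketched as a single predicate. This is a minimal illustration under stated assumptions, not the contents of classify.ts: the `Part` and `AssistantTurn` shapes, the `isTextToolCall` name, the markup regex, and the use of lexicographically sortable string ids are all hypothetical.

```typescript
// Hypothetical, simplified shapes; the real session types may differ.
type Part =
  | { type: "text"; text: string }
  | { type: "tool"; tool: string };

interface AssistantTurn {
  id: string; // assumed to sort lexicographically in creation order
  finish: string;
  parts: Part[];
  error?: { name: string };
}

// Assumed pattern, based on the <invoke name=...>/<parameter name=...> forms named in the summary.
const TOOL_CALL_MARKUP = /<(invoke|parameter)\s+name=/;

function isTextToolCall(assistant: AssistantTurn, lastUserId: string): boolean {
  // Only the contradictory state qualifies: finish says "tool-calls"...
  if (assistant.finish !== "tool-calls") return false;
  // ...the turn has not already been discarded...
  if (assistant.error !== undefined) return false;
  // ...the turn is not stale (no user turn newer than it)...
  if (!(lastUserId < assistant.id)) return false;
  // ...there is no structured tool part...
  if (assistant.parts.some((p) => p.type === "tool")) return false;
  // ...and some text part carries tool-call markup.
  return assistant.parts.some(
    (p) => p.type === "text" && TOOL_CALL_MARKUP.test(p.text),
  );
}
```

In this sketch the check would need to run before any generic "finish is tool-calls, so continue" branch, matching the ordering constraint described under Detect.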

Why discard instead of keep+nudge

Keeping the turn and appending a reminder was tested and fails: any bad case lingering in context degrades the whole session. Discarding removes the contamination source.
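As a rough illustration of what discarding means for the request history, the sketch below filters out any assistant message that carries an error. It is an assumption-laden stand-in for the error-skip path of toModelMessages: the `HistoryMessage` shape and the `toRequestHistory` name are hypothetical.

```typescript
// Hypothetical message shape for illustration only.
interface HistoryMessage {
  id: string;
  role: "user" | "assistant";
  text: string;
  error?: { name: string };
}

// Drop every assistant message that has been marked with an error,
// so the degraded turn never reaches the model again.
function toRequestHistory(history: HistoryMessage[]): HistoryMessage[] {
  return history.filter(
    (m) => !(m.role === "assistant" && m.error !== undefined),
  );
}
```

With the bad turn filtered out and a user turn appended after it, the request no longer ends on an assistant message, and the tool-call markup is absent from later context.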

Verification

classify.test.ts — 29 pass (detection + negative cases: real tool part, text without markup; plus the staleness / already-errored guards).
Integration test in prompt.test.ts: a mock LLM returns a text-form tool call on call 1 and clean text on call 2; asserts the model is re-called (calls === 2) and recovers — proving the retry actually regenerates.
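The shape of that integration check can be sketched without the real session machinery. Everything here is hypothetical and simplified: the `Reply` shape, the synchronous `callModel` signature, the `"stop"` finish value, and the helper names are assumptions, and the actual test in prompt.test.ts drives the real loop instead.

```typescript
// Hypothetical reply shape; the real session types are richer.
interface Reply {
  finish: string;
  text: string;
  toolParts: number;
}

// Mirrors the documented default of MIMOCODE_TEXT_TOOL_CALL_RETRY_LIMIT.
const RETRY_LIMIT = 2;

function looksLikeTextToolCall(reply: Reply): boolean {
  return (
    reply.finish === "tool-calls" &&
    reply.toolParts === 0 &&
    /<invoke\s+name=/.test(reply.text)
  );
}

// Call the model, discard text-form tool calls, and retry up to `limit` times.
function generateWithRetry(
  callModel: () => Reply,
  limit: number = RETRY_LIMIT,
): Reply {
  let retries = 0;
  for (;;) {
    const reply = callModel();
    if (!looksLikeTextToolCall(reply)) return reply;
    // On exhaustion the error is terminal.
    if (retries >= limit) throw new Error("TextToolCallError");
    retries += 1;
  }
}

// Mock model: text-form tool call on call 1, clean text afterwards.
function makeMockModel(): { callModel: () => Reply; calls: () => number } {
  let calls = 0;
  const callModel = (): Reply => {
    calls += 1;
    return calls === 1
      ? { finish: "tool-calls", text: '<invoke name="bash">', toolParts: 0 }
      : { finish: "stop", text: "done", toolParts: 0 };
  };
  return { callModel, calls: () => calls };
}
```

Running `generateWithRetry(mock.callModel)` against this mock returns the clean reply after exactly two model calls, which is the property the integration test asserts.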
classify-integration.test.ts + prompt-sweep.test.ts — no regressions (52 pass across the four files).
bun typecheck — clean.

Notes

Root cause traced from the errored session's raw trajectory: a turn with finish="tool-calls" + 0 tool parts + text ending mid tool-call markup. This is a separate path from the max-steps prefill fixed in #1239.

Under large context the model can serialize a tool call as prose text instead of emitting a structured tool_use, leaving a turn with finish="tool-calls" but no tool part. That turn strands the conversation on an assistant message (provider prefill rejection) and, if kept, poisons the rest of the session. classify now detects this state (finish tool-calls + zero tool parts + text carrying tool-call markup) as a distinct text-tool-call category. runLoop discards the bad turn by setting assistant.error (TextToolCallError), so toModelMessages drops it from request history, then retries the request up to MIMOCODE_TEXT_TOOL_CALL_RETRY_LIMIT (default 2) times; on exhaustion the error stays terminal. Wired at all three classify call-sites (existing-assistant, main after-process, fork).

Previously, the recovery helper set the turn's error but appended no new user message, so on continue the loop re-entered, re-detected the same (still-latest) turn, and burned retries with zero model calls; every case ended in a hard error. The helper now appends a synthetic user turn (mirroring autoRetryStructuredOutput) so the discarded turn goes stale and the loop reaches generation, and it returns early if the turn is already errored. classify #3a is gated with a staleness guard (lastUser.id < assistant.id) and !assistant.error so a degraded turn left in history can't re-fire across turns/resumes. Adds an integration test asserting the model is re-called (calls === 2) and recovers, plus classify guard tests.
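The interaction between the recovery helper and the two guards can be sketched as follows. This is a simplified model, not the project's code: the `Turn` shape, numeric ids, the `degraded` flag standing in for the full detection, and the `needsRecovery`/`recover` names are all hypothetical.

```typescript
// Hypothetical turn shape; `degraded` stands in for the full text-tool-call detection.
interface Turn {
  id: number; // assumed to increase with creation order
  role: "user" | "assistant";
  degraded?: boolean;
  error?: string;
}

function lastWithRole(history: Turn[], role: Turn["role"]): Turn | undefined {
  for (let i = history.length - 1; i >= 0; i--) {
    const turn = history[i];
    if (turn !== undefined && turn.role === role) return turn;
  }
  return undefined;
}

// Fires only for a fresh, not-yet-errored degraded turn.
function needsRecovery(history: Turn[]): boolean {
  const assistant = lastWithRole(history, "assistant");
  const user = lastWithRole(history, "user");
  if (!assistant || !user) return false;
  return (
    assistant.degraded === true &&
    assistant.error === undefined && // the !assistant.error guard
    user.id < assistant.id // the staleness guard
  );
}

// Discard the turn and append a synthetic user turn so it goes stale.
function recover(history: Turn[]): void {
  const assistant = lastWithRole(history, "assistant");
  if (!assistant || assistant.error !== undefined) return; // already-errored early return
  assistant.error = "TextToolCallError";
  const nextId = history.reduce((max, t) => Math.max(max, t.id), 0) + 1;
  history.push({ id: nextId, role: "user" });
}
```

After one `recover` call, `needsRecovery` returns false for the same history, so the loop proceeds to a real model call instead of re-detecting the discarded turn.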

Adds TextToolCallError to the generated SDK Assistant.error union and the openapi schema so consumers switching on error.name see the complete union.
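For SDK consumers, the practical effect is that a switch on error.name can be checked for exhaustiveness. The sketch below is illustrative only: the `AssistantError` type is a stand-in for the generated union, `OtherError` is a hypothetical placeholder for its remaining members, and the `message` field is an assumption.

```typescript
// Stand-in for the generated Assistant.error union.
// Only TextToolCallError comes from this PR; OtherError is a placeholder.
type AssistantError =
  | { name: "TextToolCallError"; message: string }
  | { name: "OtherError"; message: string };

function describeError(err: AssistantError): string {
  switch (err.name) {
    case "TextToolCallError":
      return "model emitted a tool call as text and retries were exhausted";
    case "OtherError":
      return `unhandled error: ${err.message}`;
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a member is unhandled.
      const unreachable: never = err;
      return unreachable;
    }
  }
}
```

If a union member is missing from the generated types, a consumer written this way either fails to compile or silently falls through, which is why regenerating the SDK alongside the new error matters.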

wqymi added 3 commits June 25, 2026 18:26

chore(sdk): regenerate SDK + openapi for TextToolCallError

6ef7c0f

wqymi merged commit 64fceb2 into main Jun 25, 2026
3 of 6 checks passed

wqymi merged 3 commits into main from vb/fec8-session-error
