
Humanize the opaque "An unknown error occurred" provider error #322

Merged
dannon merged 1 commit into galaxyproject:main from dannon:fix/320-chat-history-save-error
Jun 19, 2026

Conversation

@dannon dannon commented Jun 18, 2026

Member

What this does

Asking the agent to "save the full chat history to a Markdown file" (provider google / gemini-2.5-flash) replied with a bogus "...saved as chat_history.md" immediately followed by "An unknown error occurred" -- no file written, nothing in the Files pane (#320). This humanizes that opaque error into something the user can act on.

Root cause

"An unknown error occurred" isn't a Loom string -- it's absent from the repo at every tag (incl. v0.3.1) and from git history. It's a pi-ai provider sentinel: when a turn ends with stopReason "error"/"aborted", the provider streams throw new Error("An unknown error occurred") and discard the real reason. For Gemini, mapStopReason buckets RECITATION, SAFETY, MALFORMED_FUNCTION_CALL, OTHER, BLOCKLIST, ... all into "error" (only STOP/MAX_TOKENS escape). The sentinel reaches the renderer via the message_end path (stopReason === "error"), where humanizeAgentError previously echoed the bare string verbatim. pi-ai is pinned ^0.78.0 in both v0.3.1 and current main, so the path is unchanged -- the bug reproduces today.
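
To make the collapse concrete, here is a minimal sketch of the behavior described above. It is not the pi-ai source: `mapStopReasonSketch`, `finishTurnSketch`, and the `StopReason` values other than "error"/"aborted" are hypothetical names chosen for illustration, and the mapping only reflects what this PR states (STOP and MAX_TOKENS escape, everything else becomes "error").

```typescript
// Hypothetical sketch of the behavior described in this PR, NOT the pi-ai source.
type StopReason = "stop" | "length" | "error" | "aborted";

const SENTINEL = "An unknown error occurred";

// Assumption: only STOP and MAX_TOKENS map to non-error stop reasons.
function mapStopReasonSketch(finishReason: string): StopReason {
  switch (finishReason) {
    case "STOP":
      return "stop";
    case "MAX_TOKENS":
      return "length";
    default:
      // RECITATION, SAFETY, MALFORMED_FUNCTION_CALL, OTHER, BLOCKLIST, ...
      return "error";
  }
}

// Assumption: on "error"/"aborted" the stream throws the fixed sentinel,
// so the original finish reason never reaches the caller.
function finishTurnSketch(finishReason: string): StopReason {
  const stopReason = mapStopReasonSketch(finishReason);
  if (stopReason === "error" || stopReason === "aborted") {
    throw new Error(SENTINEL);
  }
  return stopReason;
}
```

In this sketch a RECITATION stop and a SAFETY stop produce the same thrown message, which is why the renderer cannot tell them apart later.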

Asking the model to reproduce the whole conversation verbatim is itself a classic recitation/safety trigger, and there's no agent-callable chat-export tool (the "Export Chat" button is renderer-only), so the model has to reconstruct the transcript -- which is what trips the stop.

The fix

A sentinel case in error-humanizer.ts that detects the fixed phrase and returns an actionable message (rephrase/resend, ask for a summary instead of verbatim text, switch models). The exact reason is gone by the time we see it, so the message names the common causes rather than guessing one. The match is caught only in the bare-string branch, so a structured error that merely embeds the phrase still gets its typed handling.
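
The branch ordering can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in error-humanizer.ts: `humanizeAgentErrorSketch`, both message constants, the `api_error` shape check, and the exact-equality match on the sentinel are all hypothetical choices made to show the ordering.

```typescript
// Hypothetical sketch of the ordering described in this PR, NOT the real humanizer.
const PROVIDER_SENTINEL = "An unknown error occurred";

// Assumption: wording is illustrative; the real message text may differ.
const SENTINEL_MESSAGE =
  "The model stopped without giving a reason. Try rephrasing and resending, " +
  "ask for a summary instead of verbatim text, or switch models.";

const RETRY_MESSAGE = "The provider reported a temporary API error. Please retry.";

function humanizeAgentErrorSketch(raw: string): string {
  const trimmed = raw.trim();

  // Structured branch runs first, so a JSON error that merely embeds the
  // sentinel phrase still gets its typed handling.
  if (trimmed.startsWith("{")) {
    try {
      const parsed = JSON.parse(trimmed) as { type?: unknown };
      if (parsed.type === "api_error") {
        return RETRY_MESSAGE;
      }
    } catch {
      // Not valid JSON: fall through to the bare-string branch.
    }
  }

  // Bare-string branch: only here is the sentinel recognized.
  if (trimmed === PROVIDER_SENTINEL) {
    return SENTINEL_MESSAGE;
  }
  return trimmed;
}
```

With this ordering, the bare sentinel is replaced by the actionable message, while a structured `api_error` whose message contains the same phrase is still reported as retriable.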

What's intentionally NOT here

The issue's second angle -- don't let the agent claim "saved as X" when no write happened -- is a model-narration-without-tool-call behavior with no clean deterministic lever (the prose is already streamed before the turn errors). It is left for discussion rather than patched, so this PR addresses #320 but doesn't close it. Open questions:

  1. A context.ts prompt nudge (write files via a tool and confirm only after success; prefer a summary over reproducing large text verbatim)? Best-effort on weak models, but it would plausibly cut the recitation trigger, not just relabel the failure.
  2. A real agent-callable "save transcript" tool (Loom already has exportAsMarkdown) so the model never reconstructs the conversation -- sidesteps both the recitation stop and the false claim. That's a feature, not a bugfix.
  3. Upstream pi-ai note: the sentinel discards the real finish reason; preserving it would let us give a precise message. This shell-side humanization is the right stopgap regardless.
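
For discussion of option 2, a rough sketch of what an agent-callable tool could look like. Everything here is hypothetical: `saveTranscriptToolSketch`, `renderTranscriptSketch`, the `ChatMessage` shape, and the injected `WriteFile` function are invented for illustration and do not reflect Loom's exportAsMarkdown or its tool API. The point is only that the transcript comes from stored messages and the confirmation is produced after the write succeeds.

```typescript
// Hypothetical sketch for discussion; none of these names exist in Loom.
interface ChatMessage {
  role: "user" | "assistant";
  text: string;
}

// Assumption: the file writer is injected so the tool stays testable.
type WriteFile = (path: string, contents: string) => void;

interface ToolResult {
  ok: boolean;
  message: string;
}

// Render stored messages to Markdown; the model never reconstructs them.
function renderTranscriptSketch(messages: ChatMessage[]): string {
  return messages.map((m) => `## ${m.role}\n\n${m.text}\n`).join("\n");
}

function saveTranscriptToolSketch(
  messages: ChatMessage[],
  path: string,
  writeFile: WriteFile,
): ToolResult {
  try {
    writeFile(path, renderTranscriptSketch(messages));
    // Confirmation is produced only after the write returned without throwing.
    return { ok: true, message: `Saved ${messages.length} messages to ${path}` };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { ok: false, message: `Could not save ${path}: ${reason}` };
  }
}
```

Because the result object is the only source of the "saved" claim, a failed write yields `ok: false` rather than a false confirmation.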

Review

Ran a Codex adversarial pass. It surfaced two risks:

  • Ordering / false positive (fixed in this PR): the sentinel check originally ran ahead of the JSON branch, so a structured api_error whose message embedded the phrase would be swallowed instead of getting its typed, retriable handling. Moved into the bare-string branch; added a regression test.
  • Activity feed (noted, out of scope): feedShell's top-level error case appends event.message raw. It doesn't affect #320 (this error arrives via message_end, which feedShell doesn't handle), and that feed is a raw/technical view by design -- flagging rather than widening scope.

Testing

  • Test-first; watched it fail (echoed verbatim) before implementing.
  • root npm test: 998 passed / 1 skipped · app tsc --noEmit clean · prettier/eslint clean.
  • Not live-eyeballed end-to-end (no Gemini key driven to organically trip the stop); this is a code-path + unit-test reproduction.

@dannon dannon force-pushed the fix/320-chat-history-save-error branch from f515b9b to d5fb49d Compare June 18, 2026 23:43
@dannon dannon marked this pull request as ready for review June 18, 2026 23:43
When a Gemini turn ends on a finish reason like RECITATION, SAFETY, or a
malformed tool call, pi-ai collapses it to a bare "An unknown error occurred"
and we echoed that straight into chat (galaxyproject#320) -- no file written, no hint at what
to do. The real reason is gone by the time the string reaches the renderer
(message_end with stopReason "error"), so detect the sentinel in the humanizer
and return something actionable: rephrase and resend, ask for a summary instead
of reproducing text verbatim, or switch models. The same sentinel shows up
across every pi-ai provider, so the match isn't Gemini-specific. It's caught
only in the bare-string branch so a structured error that merely embeds the
phrase still gets its typed handling.
@dannon dannon force-pushed the fix/320-chat-history-save-error branch from d5fb49d to 5ff827a Compare June 19, 2026 00:07
@dannon dannon merged commit bf970e2 into galaxyproject:main Jun 19, 2026
3 checks passed