Surface Galaxy job failures after a single tool run by dannon · Pull Request #249 · galaxyproject/loom

dannon · 2026-06-09T20:50:25Z

Fixes the trust gap in #210: a galaxy_run_tool whose job fails immediately wasn't showing up in Orbit chat. run_tool returns at submission ("Started tool ..." / HTTP 200), the agent reads that as success and moves on, and -- unlike a workflow invocation -- a single tool run leaves no loom-invocation block, so the background poller never advances it either. The failure stayed invisible until the user explicitly asked.

Two parts, both aimed at the within-turn case:

Deterministic nudge. New job-status-hint.ts, a message_end hook mirroring confusables-hint.ts. On a non-error galaxy_run_tool / galaxy_run_user_tool result it appends a short reminder that a 200 means submitted, not finished -- verify the job reached a terminal ok state (galaxy_get_job_details on one of the run's output dataset ids, or check the output dataset's state) and report failures now rather than waiting to be asked. It's idempotent and format-agnostic (no id parsing, no Galaxy call, no timers); it fires at the exact moment the model is about to misread the 200.
Sharper prompt guidance. The verification block now splits "workflow invocation" from "single tool run" instead of lumping them, and the Galaxy step section gains a "Running a single tool" subsection spelling out the run_tool -> poll get_job_details -> report-errors path. The old generic guidance already existed and a weak model ignored it, which is why the deterministic hook carries the weight here.

Note: galaxy_get_job_details takes the output dataset id (it returns the job that produced that dataset), not a job id -- the guidance reflects that.

Scope. This covers the within-turn case, which is the reported incident ("Galaxy UI immediately showed error"). Proactive cross-turn polling of tool-run jobs -- the background-poller equivalent for single jobs -- is a larger design call and intentionally left out. A cleaner durable fix would also live upstream in galaxy-mcp (have run_tool return clearer "submitted, not complete" language plus the ids), which would make this brain-side nudge belt-and-suspenders.

Testing: 529 root tests green (15 new across tests/job-status-hint.test.ts + tests/job-run-context.test.ts), root + app typecheck clean. Not yet live-eyeballed in Orbit.

Mitigates #210.

A galaxy_run_tool whose job failed immediately wasn't showing up in Orbit chat. run_tool returns at submission ("Started tool ..." / HTTP 200), the agent reads that as success and moves on, and unlike a workflow invocation a single tool run leaves no loom-invocation block, so the background poller never advances it either -- the failure stayed invisible until the user explicitly asked. Add a message_end hook (mirrors confusables-hint) that appends a "submitted != finished, verify terminal state and report failures" reminder onto successful run_tool / run_user_tool results, so the agent checks the job in the same turn. Also sharpen the context guidance: the verification block now splits workflow invocation from single tool run, and the Galaxy step section spells out the run_tool -> poll get_job_details -> report-errors path. This covers the within-turn case. Proactive cross-turn polling of tool-run jobs (the background-poller equivalent for jobs) is a separate, larger design call and is intentionally left out.

mvdbeek · 2026-06-10T07:31:23Z

+leaves no \`loom-invocation\` block — so nothing polls it for you. The
+"Started tool …" / HTTP 200 response means the job was **submitted, not
+that it finished**. Before you tell the user the step worked, confirm the
+job reached a terminal \`ok\` state: call \`galaxy_get_job_details\` with one


why aren't we directly checking the job via the job id ? Jobs are not obliged to produce outputs at all.

or might only produce collections, which would probably fail ?
does this work for map over jobs ?

mvdbeek · 2026-06-10T07:35:22Z

+// they get their own loom-invocation block + background poller.
+const JOB_RUN_TOOLS = new Set(["galaxy_run_tool", "galaxy_run_user_tool"]);
+
+export const JOB_SUBMITTED_HINT =


Could this live in a yaml document that is easy to review and update, like the help terms document we have in Galaxy ? The instructions also seem partially redundant with the context.ts changes

dannon · 2026-06-11T02:21:31Z

This is what I've been iterating on with Claude -- still thinking it through, so welcome your read before I rework the PR:

Both your points landed, and they pushed this somewhere a bit different from what's up now. Short version: the contested how-to (check via a dataset id, mind the jobs with no outputs / only collections / map-over) isn't content that needs a better home -- it's a sign the tool surface is wrong for the question. The only reason the prose has to spell out those caveats is that get_job_details(dataset_id) can't directly answer "did this job reach a terminal ok state?" Give get_job_details a job-id path and the no-output and collection-only caveats go away -- and run_tool already returns the job id(s), so the agent has the handle without guessing. Relocating the prose to a yaml doc or a skill just moves a half-wrong claim into a tidier drawer. The real fix is the tool.

That said, map-over is the case that doesn't fully fall out of a job-id path: a batch run is N jobs, so "check the job" becomes "check all the jobs, or the output collection's populated_state." That's exactly the kind of thing the tool should encapsulate rather than prose -- but worth flagging as the one that needs real thought, not a one-liner.

This is the plan Claude came up with after some iteration -- posting it to react to, not as a settled call. The cut is by what's actually stable:

The imperative is stable and belongs in the deterministic hook: "this returns at submission, not completion -- before you report results, confirm the job hit a terminal ok state, and surface a failure now." Zero id-mechanic needed to be correct.
The argument shape belongs in the tool schema, not a prose nudge. When prose has to compensate for the tool, that's the smell.

On the redundancy you flagged: the first instinct (move the detail into a skill the hook points at) is actually worse. The whole reason the hook is deterministic is that the ambient guidance already got ignored by a weak model. Shrink it to "verify -- go fetch the skill for how" and you've put a pull-on-demand hop back on the exact path that needed push: the model gets "verify" but has to choose to fetch the method, and if it doesn't it either ignores it (the original bug) or guesses a call (the wrong-dataset-id thing). A deterministic nudge can't depend on a pull for the part that makes it actionable.

Concretely:

Slim the hook to the pure imperative -- no dataset-id mechanic, no skill pointer. That answers the correctness point by deleting the wrong claim, not relocating it.
Dedupe context.ts to the one fact that isn't redundant: a single tool run is a job, not an invocation, so nothing polls it.
File the galaxy-mcp job-id path as a separate issue -- that's the structural fix for the no-output / collection / map-over cases. This loom PR ships on its own; the hook's imperative stays permanently (weak models ignore ambient guidance), and the tool fix just removes any temptation to put the mechanic back into prose.

On the yaml idea: standing up a doc for a single gotcha is overkill, but you're onto something longer-term -- there's a growing family of these deterministic hint injectors, and a loom-local data-driven hint registry is probably the right "data not code" move for that family. Worth doing as its own refactor when it's clearly earning its keep, rather than bolting it onto this issue.

Flipped back to draft while we sort the shape. Does the tool-fix-first framing match how you'd want to handle the no-output / collection / map-over cases, or is there a reason to keep a dataset-path stopgap in loom in the meantime?

mvdbeek reviewed Jun 10, 2026

View reviewed changes

dannon marked this pull request as draft June 11, 2026 01:23

dannon mentioned this pull request Jun 13, 2026

Galaxy job errors aren't surfaced proactively in Orbit chat -- only when the user asks #210

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Surface Galaxy job failures after a single tool run#249

Surface Galaxy job failures after a single tool run#249
dannon wants to merge 1 commit into
galaxyproject:mainfrom
dannon:fix/210-job-error-surfacing

dannon commented Jun 9, 2026

Uh oh!

mvdbeek Jun 10, 2026

Uh oh!

mvdbeek Jun 10, 2026

Uh oh!

mvdbeek Jun 10, 2026

Uh oh!

dannon commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dannon commented Jun 9, 2026

Uh oh!

mvdbeek Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

mvdbeek Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

mvdbeek Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

dannon commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants