From 2ec4c3c6e63e428ff44d2ad9d697f50906ab9bc3 Mon Sep 17 00:00:00 2001 From: John Chilton Date: Wed, 17 Jun 2026 17:43:53 -0400 Subject: [PATCH] discover-shed-tool: normalize tool-id-shaped queries before declaring miss MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit A tool-id token (`integron_finder`, `iuc/integron_finder`) — what a template `_plan_context` candidate hands the discover step — scores zero in `tool-search` while the human name (`integron finder`) hits the real `iuc/integron_finder` wrapper. Without normalization the loop emits a false `miss` and wrongly fall-throughs to authoring a tool that already exists. - index.md §1: strip `owner/`, split `_`/`-` into words, try human name + bare word; `miss` only honest after variants tried. Strengthen the caveat. - eval.md: add property "a miss survives query-variant normalization". - re-cast claude bundle (verify clean). - refinement journals: discover-shed-tool (applied fix) + advance-galaxy-draft-step (the test-pipeline finding; upstream _plan_context-hint question stays open). Surfaced by /test-pipeline interview-to-galaxy UC1 MRSA re-run. Co-Authored-By: Claude Opus 4.8 (1M context) --- .../claude/skills/discover-shed-tool/SKILL.md | 4 +- .../discover-shed-tool/_provenance.json | 6 +- ...6-06-17-uc1-mrsa-interview-galaxy-rerun.md | 56 +++++++++++++++++++ content/molds/discover-shed-tool/eval.md | 11 ++++ content/molds/discover-shed-tool/index.md | 4 +- .../2026-06-17-tool-id-query-normalization.md | 42 ++++++++++++++ 6 files changed, 118 insertions(+), 5 deletions(-) create mode 100644 content/molds/advance-galaxy-draft-step/refinements/2026-06-17-uc1-mrsa-interview-galaxy-rerun.md create mode 100644 content/molds/discover-shed-tool/refinements/2026-06-17-tool-id-query-normalization.md diff --git a/casts/claude/skills/discover-shed-tool/SKILL.md b/casts/claude/skills/discover-shed-tool/SKILL.md index b5a42d9..72ea725 100644 --- a/casts/claude/skills/discover-shed-tool/SKILL.md +++ b/casts/claude/skills/discover-shed-tool/SKILL.md @@ -100,6 +100,8 @@ gxwf tool-search "" --json --max-results 10 If an owner hint is present, add `--owner `. If an exact-name hint is present, add `--match-name`. Lowercase the query (the tool index does not lowercase, see component-tool-shed-search §6). +**Normalize a tool-id-shaped need before searching.** A caller (e.g. a template `_plan_context` that guessed a candidate) may hand this skill an XML-id token rather than a human name — `iuc/integron_finder`, `integron_finder`. The lexical index does **not** reliably match the underscored token: `tool-search integron_finder` returns no hits while `integron finder` and `integron` both score. So derive query variants instead of feeding the token verbatim: strip any `owner/` prefix, split on `_` / `-` into space-separated words, and also try the bare significant word. A `miss` is only honest after the name variants have been tried — a no-hit on the raw underscored token alone is a search artifact, not evidence the tool is absent. + #### 2. Triage hits For each hit, score on: @@ -144,7 +146,7 @@ Validate the recommendation with `validate-galaxy-tool-discovery` before returni The procedure assumes — and the skill must surface in its rationale when relevant — the following Tool Shed realities (full detail in component-tool-shed-search §6): - **Indexes are stale by design.** A freshly published tool may not appear; a deprecated tool may still appear. Treat absence as soft evidence, not proof. -- **Wildcard `*term*` wrapping** disables stemming; spelling matters. Try alternate phrasings before declaring `miss`. +- **Wildcard `*term*` wrapping** disables stemming; spelling matters. Try alternate phrasings before declaring `miss`. In particular an underscored or owner-prefixed **tool-id token** (`integron_finder`, `iuc/integron_finder`) can score zero where the **human name** (`integron finder`) hits — never declare `miss` on a single id-token query (see §1 normalization). - **No EDAM in shed search** — semantic queries that work in Galaxy's installed-toolbox search will not work here. Stick to lexical name/keyword queries. - **Same XML id across repos.** Hits collapse only on `(repoName, owner)`; expect duplicates that need triage. - **Repo-level discovery is a different surface.** For "find me the *package* that contains a tool about X" with server-side `owner:` / `category:` keywords, `gxwf repo-search` is the right command — out of scope for this skill but a known sibling. diff --git a/casts/claude/skills/discover-shed-tool/_provenance.json b/casts/claude/skills/discover-shed-tool/_provenance.json index 4c8dfde..cf1e68c 100644 --- a/casts/claude/skills/discover-shed-tool/_provenance.json +++ b/casts/claude/skills/discover-shed-tool/_provenance.json @@ -5,10 +5,10 @@ "name": "discover-shed-tool", "path": "content/molds/discover-shed-tool/index.md", "revision": 4, - "content_hash": "256e3861e4fd54e5afe1db0cb99954523faea71c06d46b32b2442d39fb646056", - "commit": "4e21bc4ffe429471256a2d268127e2a807769fa8" + "content_hash": "2e5fa1f8b22f1f9c984fa4c4dc05999294de5db706a9c1691d3af7b3459cbecb", + "commit": "80a19256fffab433c59f19940f1464d2e7a8779b" }, - "cast_at": "2026-06-17T19:52:17.013Z", + "cast_at": "2026-06-17T21:07:28.232Z", "cast_date": "2026-05-07", "cast_revision": 5, "cast_history": [ diff --git a/content/molds/advance-galaxy-draft-step/refinements/2026-06-17-uc1-mrsa-interview-galaxy-rerun.md b/content/molds/advance-galaxy-draft-step/refinements/2026-06-17-uc1-mrsa-interview-galaxy-rerun.md new file mode 100644 index 0000000..f0d6d84 --- /dev/null +++ b/content/molds/advance-galaxy-draft-step/refinements/2026-06-17-uc1-mrsa-interview-galaxy-rerun.md @@ -0,0 +1,56 @@ +--- +mold: advance-galaxy-draft-step +date: 2026-06-17 +intent: Fresh /test-pipeline re-run of interview-to-galaxy on the UC1 MRSA mobile-AMR case; drive the per-step loop's discover/resolve half against the live Tool Shed. +decision: open-question +--- + +## What worked + +- The loop oracles behaved exactly to spec on a real 15-step draft: + `draft-next-step` deterministically picked `integron_finder` (stable across + re-runs, `work` listed the step's TODO sentinels + `_plan_*`); + `draft-validate --concrete` → `draft valid` + `Concrete: OK` with the one + Resolved step's tool_state validated and the 14 drafty steps excluded from the + concrete projection; `draft-extract` returned the maximal concrete prefix + (staramr only, dependents dropped). +- The discover/resolve half resolved the named domain tools against the live + Tool Shed to real pinnable changesets: `iuc/isescan/isescan` @ `81539b9ae80a` + (1.7.3+galaxy0, newest) and `iuc/integron_finder/integron_finder` @ + `5429646e486d` (2.0.5+galaxy1, newest). + +## Gap that surfaced — discover query normalization + +`gxwf tool-search integron_finder` (the verbatim **tool-id token**, which is what +the template wrote into the step's `_plan_context` candidate — +"candidate iuc/integron_finder") returns **`No hits for query: integron_finder`**. +Only the **human-name** variants score: `integron finder` (34.60), +`integronfinder` (15.97), `integron` (51.33). ISEScan happened to score on its +bare token (`isescan` → 27.78) so it masked the issue. + +Concretely: a discover-shed-tool invocation that feeds the raw `_plan_context` +candidate string (`integron_finder`, an underscored tool-id token) straight into +`tool-search` would conclude "no acceptable shed candidate" and fall through to +`author-galaxy-tool-wrapper` — authoring a wrapper for a tool that **already +exists** in `iuc`. That's a false-negative discovery, the expensive wrong branch. + +## Open question + +Where should query normalization live, and what should it do? + +- The loop's step 2 ("Resolve a wrapper… run discover-shed-tool against the + step's `_plan_*` context") implies the query is derived from `_plan_context`. + If templates write tool-id-shaped candidates (`iuc/integron_finder`), discovery + needs to **variant-expand**: strip owner prefix, split on `_`/`-`, try the + human name, not just the literal token. +- Alternatively the template Mold should record the **human tool name** in + `_plan_context` (e.g. "Integron Finder") alongside or instead of the tool-id + guess, since the id token is exactly what a fresh search should not assume. +- Either way the eval property "no acceptable shed candidate falls through to + authoring" has a hidden precondition — that the *query* was a fair search. + A single-token search that misses an existing tool is a discovery bug + masquerading as a legitimate fallthrough. Worth an eval note or a + discover-shed-tool refinement on query construction. + +Not changing schema or references here — flagging the query-construction contract +between template `_plan_context`, the loop's discover step, and discover-shed-tool. diff --git a/content/molds/discover-shed-tool/eval.md b/content/molds/discover-shed-tool/eval.md index 3dd4efc..b8fd415 100644 --- a/content/molds/discover-shed-tool/eval.md +++ b/content/molds/discover-shed-tool/eval.md @@ -19,3 +19,14 @@ need it ran on. Concrete fixtures and their expected pins live in or similarly named repositories, the recommendation classifies the result as `weak`, explains the ambiguity, and surfaces alternates instead of silently pinning a low-confidence hit. + +## Property: a miss survives query-variant normalization + +- check: llm-judged +- assertion: a `miss` is not emitted on the strength of a single raw query when + the need is a tool-id-shaped token (underscored or `owner/`-prefixed). Before + falling through to authoring, the run tries lexical variants — strip the + `owner/` prefix, split `_`/`-` into words, try the bare significant word — so a + wrapper that exists under its human name (e.g. `integron finder` for an + `integron_finder` token) is found rather than mistaken for absent. A `miss` + that an obvious name-variant would have turned into a hit is a failure. diff --git a/content/molds/discover-shed-tool/index.md b/content/molds/discover-shed-tool/index.md index 052393f..1d6706f 100644 --- a/content/molds/discover-shed-tool/index.md +++ b/content/molds/discover-shed-tool/index.md @@ -122,6 +122,8 @@ gxwf tool-search "" --json --max-results 10 If an owner hint is present, add `--owner `. If an exact-name hint is present, add `--match-name`. Lowercase the query (the tool index does not lowercase, see [[component-tool-shed-search]] §6). +**Normalize a tool-id-shaped need before searching.** A caller (e.g. a template `_plan_context` that guessed a candidate) may hand this Mold an XML-id token rather than a human name — `iuc/integron_finder`, `integron_finder`. The lexical index does **not** reliably match the underscored token: `tool-search integron_finder` returns no hits while `integron finder` and `integron` both score. So derive query variants instead of feeding the token verbatim: strip any `owner/` prefix, split on `_` / `-` into space-separated words, and also try the bare significant word. A `miss` is only honest after the name variants have been tried — a no-hit on the raw underscored token alone is a search artifact, not evidence the tool is absent. + ### 2. Triage hits For each hit, score on: @@ -166,7 +168,7 @@ Validate the recommendation with `validate-galaxy-tool-discovery` before returni The procedure assumes — and the cast skill must surface in its rationale when relevant — the following Tool Shed realities (full detail in [[component-tool-shed-search]] §6): - **Indexes are stale by design.** A freshly published tool may not appear; a deprecated tool may still appear. Treat absence as soft evidence, not proof. -- **Wildcard `*term*` wrapping** disables stemming; spelling matters. Try alternate phrasings before declaring `miss`. +- **Wildcard `*term*` wrapping** disables stemming; spelling matters. Try alternate phrasings before declaring `miss`. In particular an underscored or owner-prefixed **tool-id token** (`integron_finder`, `iuc/integron_finder`) can score zero where the **human name** (`integron finder`) hits — never declare `miss` on a single id-token query (see §1 normalization). - **No EDAM in shed search** — semantic queries that work in Galaxy's installed-toolbox search will not work here. Stick to lexical name/keyword queries. - **Same XML id across repos.** Hits collapse only on `(repoName, owner)`; expect duplicates that need triage. - **Repo-level discovery is a different surface.** For "find me the *package* that contains a tool about X" with server-side `owner:` / `category:` keywords, `gxwf repo-search` is the right command — out of scope for this Mold but a known sibling. diff --git a/content/molds/discover-shed-tool/refinements/2026-06-17-tool-id-query-normalization.md b/content/molds/discover-shed-tool/refinements/2026-06-17-tool-id-query-normalization.md new file mode 100644 index 0000000..9880f96 --- /dev/null +++ b/content/molds/discover-shed-tool/refinements/2026-06-17-tool-id-query-normalization.md @@ -0,0 +1,42 @@ +--- +mold: discover-shed-tool +date: 2026-06-17 +intent: Close the discover-query false-negative surfaced by the UC1 MRSA interview-to-galaxy test-pipeline re-run. +decision: eval-add +--- + +## Finding + +During a `/test-pipeline interview-to-galaxy` run, the per-step loop's deferred +`integron_finder` step carried a `_plan_context` candidate of `iuc/integron_finder`. +`gxwf tool-search integron_finder` (the verbatim tool-id token) returns +**`No hits`**, while `integron finder` (34.60), `integronfinder` (15.97), and +`integron` (51.33) all score and resolve to the real `iuc/integron_finder` +wrapper (changeset `5429646e486d`, 2.0.5+galaxy1). ISEScan masked the class of +bug because its bare token (`isescan`) happens to score. + +A discover-shed-tool run that fed the raw underscored/owner-prefixed token into +`tool-search` would therefore emit `miss` and trigger the `discover-or-author` +fall-through to author a wrapper for a tool that **already exists** — the +expensive wrong branch, and a false `miss` that passes every static check. + +## Change applied + +- **Procedure §1 (Search):** added a "normalize a tool-id-shaped need before + searching" step — strip `owner/` prefix, split `_`/`-` into words, try the + human name and the bare significant word; a `miss` is only honest after the + variants are tried. +- **Caveats:** strengthened the "try alternate phrasings" bullet with the + concrete id-token-vs-human-name failure mode. +- **eval.md:** added property *"a miss survives query-variant normalization"* — + a `miss` that an obvious name-variant would have turned into a hit is a failure. +- Re-cast the Claude bundle (verify clean, no drift). + +## Open + +- The deeper contract question is whether the **template** Mold should write a + human-readable tool name into `_plan_context` (not only an id-token guess), so + the discover step gets a fair query by construction. Tracked in the sibling + refinement on [[advance-galaxy-draft-step]] (the loop that constructs the + discover query from `_plan_context`). This entry fixes the discovery side; + the upstream-hint side stays open.