Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion casts/claude/skills/discover-shed-tool/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -100,6 +100,8 @@ gxwf tool-search "<keywords>" --json --max-results 10

If an owner hint is present, add `--owner <owner>`. If an exact-name hint is present, add `--match-name`. Lowercase the query (the tool index does not lowercase, see component-tool-shed-search §6).

**Normalize a tool-id-shaped need before searching.** A caller (e.g. a template `_plan_context` that guessed a candidate) may hand this skill an XML-id token rather than a human name — `iuc/integron_finder`, `integron_finder`. The lexical index does **not** reliably match the underscored token: `tool-search integron_finder` returns no hits while `integron finder` and `integron` both score. So derive query variants instead of feeding the token verbatim: strip any `owner/` prefix, split on `_` / `-` into space-separated words, and also try the bare significant word. A `miss` is only honest after the name variants have been tried — a no-hit on the raw underscored token alone is a search artifact, not evidence the tool is absent.

#### 2. Triage hits

For each hit, score on:
Expand Down Expand Up @@ -144,7 +146,7 @@ Validate the recommendation with `validate-galaxy-tool-discovery` before returni
The procedure assumes — and the skill must surface in its rationale when relevant — the following Tool Shed realities (full detail in component-tool-shed-search §6):

- **Indexes are stale by design.** A freshly published tool may not appear; a deprecated tool may still appear. Treat absence as soft evidence, not proof.
- **Wildcard `*term*` wrapping** disables stemming; spelling matters. Try alternate phrasings before declaring `miss`.
- **Wildcard `*term*` wrapping** disables stemming; spelling matters. Try alternate phrasings before declaring `miss`. In particular an underscored or owner-prefixed **tool-id token** (`integron_finder`, `iuc/integron_finder`) can score zero where the **human name** (`integron finder`) hits — never declare `miss` on a single id-token query (see §1 normalization).
- **No EDAM in shed search** — semantic queries that work in Galaxy's installed-toolbox search will not work here. Stick to lexical name/keyword queries.
- **Same XML id across repos.** Hits collapse only on `(repoName, owner)`; expect duplicates that need triage.
- **Repo-level discovery is a different surface.** For "find me the *package* that contains a tool about X" with server-side `owner:` / `category:` keywords, `gxwf repo-search` is the right command — out of scope for this skill but a known sibling.
Expand Down
6 changes: 3 additions & 3 deletions casts/claude/skills/discover-shed-tool/_provenance.json
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,10 @@
"name": "discover-shed-tool",
"path": "content/molds/discover-shed-tool/index.md",
"revision": 4,
"content_hash": "256e3861e4fd54e5afe1db0cb99954523faea71c06d46b32b2442d39fb646056",
"commit": "4e21bc4ffe429471256a2d268127e2a807769fa8"
"content_hash": "2e5fa1f8b22f1f9c984fa4c4dc05999294de5db706a9c1691d3af7b3459cbecb",
"commit": "80a19256fffab433c59f19940f1464d2e7a8779b"
},
"cast_at": "2026-06-17T19:52:17.013Z",
"cast_at": "2026-06-17T21:07:28.232Z",
"cast_date": "2026-05-07",
"cast_revision": 5,
"cast_history": [
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
---
mold: advance-galaxy-draft-step
date: 2026-06-17
intent: Fresh /test-pipeline re-run of interview-to-galaxy on the UC1 MRSA mobile-AMR case; drive the per-step loop's discover/resolve half against the live Tool Shed.
decision: open-question
---

## What worked

- The loop oracles behaved exactly to spec on a real 15-step draft:
`draft-next-step` deterministically picked `integron_finder` (stable across
re-runs, `work` listed the step's TODO sentinels + `_plan_*`);
`draft-validate --concrete` → `draft valid` + `Concrete: OK` with the one
Resolved step's tool_state validated and the 14 drafty steps excluded from the
concrete projection; `draft-extract` returned the maximal concrete prefix
(staramr only, dependents dropped).
- The discover/resolve half resolved the named domain tools against the live
Tool Shed to real pinnable changesets: `iuc/isescan/isescan` @ `81539b9ae80a`
(1.7.3+galaxy0, newest) and `iuc/integron_finder/integron_finder` @
`5429646e486d` (2.0.5+galaxy1, newest).

## Gap that surfaced — discover query normalization

`gxwf tool-search integron_finder` (the verbatim **tool-id token**, which is what
the template wrote into the step's `_plan_context` candidate —
"candidate iuc/integron_finder") returns **`No hits for query: integron_finder`**.
Only the **human-name** variants score: `integron finder` (34.60),
`integronfinder` (15.97), `integron` (51.33). ISEScan happened to score on its
bare token (`isescan` → 27.78) so it masked the issue.

Concretely: a discover-shed-tool invocation that feeds the raw `_plan_context`
candidate string (`integron_finder`, an underscored tool-id token) straight into
`tool-search` would conclude "no acceptable shed candidate" and fall through to
`author-galaxy-tool-wrapper` — authoring a wrapper for a tool that **already
exists** in `iuc`. That's a false-negative discovery, the expensive wrong branch.

## Open question

Where should query normalization live, and what should it do?

- The loop's step 2 ("Resolve a wrapper… run discover-shed-tool against the
step's `_plan_*` context") implies the query is derived from `_plan_context`.
If templates write tool-id-shaped candidates (`iuc/integron_finder`), discovery
needs to **variant-expand**: strip owner prefix, split on `_`/`-`, try the
human name, not just the literal token.
- Alternatively the template Mold should record the **human tool name** in
`_plan_context` (e.g. "Integron Finder") alongside or instead of the tool-id
guess, since the id token is exactly what a fresh search should not assume.
- Either way the eval property "no acceptable shed candidate falls through to
authoring" has a hidden precondition — that the *query* was a fair search.
A single-token search that misses an existing tool is a discovery bug
masquerading as a legitimate fallthrough. Worth an eval note or a
discover-shed-tool refinement on query construction.

Not changing schema or references here — flagging the query-construction contract
between template `_plan_context`, the loop's discover step, and discover-shed-tool.
11 changes: 11 additions & 0 deletions content/molds/discover-shed-tool/eval.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,3 +19,14 @@ need it ran on. Concrete fixtures and their expected pins live in
or similarly named repositories, the recommendation classifies the result as
`weak`, explains the ambiguity, and surfaces alternates instead of silently
pinning a low-confidence hit.

## Property: a miss survives query-variant normalization

- check: llm-judged
- assertion: a `miss` is not emitted on the strength of a single raw query when
the need is a tool-id-shaped token (underscored or `owner/`-prefixed). Before
falling through to authoring, the run tries lexical variants — strip the
`owner/` prefix, split `_`/`-` into words, try the bare significant word — so a
wrapper that exists under its human name (e.g. `integron finder` for an
`integron_finder` token) is found rather than mistaken for absent. A `miss`
that an obvious name-variant would have turned into a hit is a failure.
4 changes: 3 additions & 1 deletion content/molds/discover-shed-tool/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -122,6 +122,8 @@ gxwf tool-search "<keywords>" --json --max-results 10

If an owner hint is present, add `--owner <owner>`. If an exact-name hint is present, add `--match-name`. Lowercase the query (the tool index does not lowercase, see [[component-tool-shed-search]] §6).

**Normalize a tool-id-shaped need before searching.** A caller (e.g. a template `_plan_context` that guessed a candidate) may hand this Mold an XML-id token rather than a human name — `iuc/integron_finder`, `integron_finder`. The lexical index does **not** reliably match the underscored token: `tool-search integron_finder` returns no hits while `integron finder` and `integron` both score. So derive query variants instead of feeding the token verbatim: strip any `owner/` prefix, split on `_` / `-` into space-separated words, and also try the bare significant word. A `miss` is only honest after the name variants have been tried — a no-hit on the raw underscored token alone is a search artifact, not evidence the tool is absent.

### 2. Triage hits

For each hit, score on:
Expand Down Expand Up @@ -166,7 +168,7 @@ Validate the recommendation with `validate-galaxy-tool-discovery` before returni
The procedure assumes — and the cast skill must surface in its rationale when relevant — the following Tool Shed realities (full detail in [[component-tool-shed-search]] §6):

- **Indexes are stale by design.** A freshly published tool may not appear; a deprecated tool may still appear. Treat absence as soft evidence, not proof.
- **Wildcard `*term*` wrapping** disables stemming; spelling matters. Try alternate phrasings before declaring `miss`.
- **Wildcard `*term*` wrapping** disables stemming; spelling matters. Try alternate phrasings before declaring `miss`. In particular an underscored or owner-prefixed **tool-id token** (`integron_finder`, `iuc/integron_finder`) can score zero where the **human name** (`integron finder`) hits — never declare `miss` on a single id-token query (see §1 normalization).
- **No EDAM in shed search** — semantic queries that work in Galaxy's installed-toolbox search will not work here. Stick to lexical name/keyword queries.
- **Same XML id across repos.** Hits collapse only on `(repoName, owner)`; expect duplicates that need triage.
- **Repo-level discovery is a different surface.** For "find me the *package* that contains a tool about X" with server-side `owner:` / `category:` keywords, `gxwf repo-search` is the right command — out of scope for this Mold but a known sibling.
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
---
mold: discover-shed-tool
date: 2026-06-17
intent: Close the discover-query false-negative surfaced by the UC1 MRSA interview-to-galaxy test-pipeline re-run.
decision: eval-add
---

## Finding

During a `/test-pipeline interview-to-galaxy` run, the per-step loop's deferred
`integron_finder` step carried a `_plan_context` candidate of `iuc/integron_finder`.
`gxwf tool-search integron_finder` (the verbatim tool-id token) returns
**`No hits`**, while `integron finder` (34.60), `integronfinder` (15.97), and
`integron` (51.33) all score and resolve to the real `iuc/integron_finder`
wrapper (changeset `5429646e486d`, 2.0.5+galaxy1). ISEScan masked the class of
bug because its bare token (`isescan`) happens to score.

A discover-shed-tool run that fed the raw underscored/owner-prefixed token into
`tool-search` would therefore emit `miss` and trigger the `discover-or-author`
fall-through to author a wrapper for a tool that **already exists** — the
expensive wrong branch, and a false `miss` that passes every static check.

## Change applied

- **Procedure §1 (Search):** added a "normalize a tool-id-shaped need before
searching" step — strip `owner/` prefix, split `_`/`-` into words, try the
human name and the bare significant word; a `miss` is only honest after the
variants are tried.
- **Caveats:** strengthened the "try alternate phrasings" bullet with the
concrete id-token-vs-human-name failure mode.
- **eval.md:** added property *"a miss survives query-variant normalization"* —
a `miss` that an obvious name-variant would have turned into a hit is a failure.
- Re-cast the Claude bundle (verify clean, no drift).

## Open

- The deeper contract question is whether the **template** Mold should write a
human-readable tool name into `_plan_context` (not only an id-token guess), so
the discover step gets a fair query by construction. Tracked in the sibling
refinement on [[advance-galaxy-draft-step]] (the loop that constructs the
discover query from `_plan_context`). This entry fixes the discovery side;
the upstream-hint side stays open.
Loading