A consumer-side retrospective from a long agent session driving the gale BYO-OS / gust dissolve work, which fanned out across gale, synth, meld, wit-bindgen, kiln, rivet. Written to improve the skills, grounded in what actually happened.
What worked well
issue-hunt — the watermark + pending_gates + self-echo filter is the right shape. The watermark keeps passes incremental; pending_gates (carry a deferred merge across passes with action_on_green) made "merge-when-green" reliable across many PRs without dropping any. The self-echo filter (don't re-triage your own comments) is essential and worked.
oracle-gate-a-change — the single highest-value discipline. "Report the actual exit code, never a grepped all-green" caught real things this session: a bazel test exit-1 that was a local Rocq toolchain failure (smpl.Smpl), not a regression; repeated ENOSPC CI reds (runner disk, not code); and — when I held to "re-derive from the live artifact" — a factual slip in my own issue comment (a stale .o reported bss=65536; the current artifact was 1,048,720). Green-or-explain is doing its job.
pulseengine-feature-loop — the rivet → oracle → clean-room → PR spine gave good structure; the cold clean-room subagent is a nice independent check.
- Cross-repo root-causing did converge correctly (synth → meld → wit-bindgen), but only because it was driven by hand (see Gap 2).
Gap 1 — requirements ≫ verification mapping (you flagged this; here's the evidence)
rivet validate on gale today:
- 927 artifacts, ~428 requirements (269 sw-req + 50 system-req + 109 stakeholder-req).
- 311 "missing verification" warnings — 257 of 269 sw-reqs (95%) have no
unit-verification + sw-integration-verification linked; 46 system-reqs miss sys-integration-verification.
- Verification artifacts do exist (98 sw-verification + 58 unit-verification + 37 sw-integration-verification + 40 sys-verification) — but they cover a minority, and none I could find link to the actual runnable oracle (the
tests/differential_*.rs, Kani harnesses, Verus proofs that genuinely run). So the chain requirement → verification artifact → real test is broken at both hops for most requirements.
Does the tooling allow the mapping? Yes — the schema has the verification types + verifies/satisfied-by links. But the workflow never binds it continuously. rivet validate's missing-verification is a non-blocking warning, so requirements accumulate un-verified; the only place it's enforced is release-execution's traceability-completeness gate — i.e. at release, far too late, after 257 have piled up. The feature-loop's step 3 ("write requirements/decisions/tests") is easy to satisfy with the requirement and skip the test — which is exactly what happened (I added SAC-/FIND- artifacts without a verifying test bound to a runnable oracle).
Proposal: a bind-verification skill/step (or a hard sub-gate inside oracle-gate-a-change): for each new/approved requirement, require a verifies-linked verification artifact whose source points at the concrete oracle that ran (test path / Kani harness / Verus obligation / nm check), and surface the missing-verification count as a tracked debt with a burn-down, not a silent warning. The oracle the loop already runs is the verification — the gap is just recording the link, continuously.
Gap 2 — there is no cross-repo coordination / root-cause-routing skill
The dominant activity this session had no skill: take a symptom in gale, root-cause it to the owning layer, file at the right repo, forward upstream, cross-link, and keep a consumer tracker distinct from the upstream fix. Concretely the memory.grow chain became: gale#89 (consumer) ← meld#298 (fusion) ← wit-bindgen#6/#4 (root: the binding emits the allocator) — after first being mis-filed on synth (closed as wrong layer). That routing ("codegen→synth, fusion→meld, bindings→wit-bindgen") is real knowledge that lived only in my head and the user's corrections.
Proposal: a cross-repo-trace (a.k.a. upstream-root-cause) skill: a layer map for the toolchain, a rule to file at the owning layer + open/maintain a consumer tracker that links to it, forward-and-cross-link on redirect, and — the lesson the user kept teaching — pose the design question to the owning layer before building (synth#404 → redirected to meld#300, decided there, then implement). It would also encode "don't assume the layer; the first guess is often one layer too shallow (synth, when it's really meld, when it's really wit-bindgen)."
How instruction/adjustment actually works (you asked)
The corrections that steered me best were terse, high-signal, and about routing or sequencing, not content:
- Routing: "git it to meld", "redirected to meld 300 as this is needed first" — you consistently knew the owning layer before I did.
- Sequence: "shouldn't we start with a design question to synth" — pose the question upstream before writing code.
- Persona model: "it is always me just with the different AI agent personas" — the cross-repo dialogue under one account is deliberate; once I knew that I stopped over-hedging about 'self-commenting'.
- Rigor: "slow down on public-comment claims" after factual slips — which led to the re-derive-from-live-artifact rule that then caught the next slip.
What I'd bake into the operating contract from this: (1) route to the owning layer early (ask "whose layer is this?" before filing/building); (2) design-question-upstream-first for anything touching another tool; (3) re-derive numeric/layout claims from the live artifact before any public comment. The first two are exactly what a cross-repo-trace skill would carry.
Net
The skills are strong where they're grounded (oracle-gate) and incremental (issue-hunt). The two missing pieces are continuous requirement↔verification binding and a cross-repo coordination skill — both showed up as the friction I hit most. Happy to prototype either as a skill PR if useful.
A consumer-side retrospective from a long agent session driving the gale BYO-OS / gust dissolve work, which fanned out across gale, synth, meld, wit-bindgen, kiln, rivet. Written to improve the skills, grounded in what actually happened.
What worked well
issue-hunt— the watermark +pending_gates+ self-echo filter is the right shape. The watermark keeps passes incremental;pending_gates(carry a deferred merge across passes withaction_on_green) made "merge-when-green" reliable across many PRs without dropping any. The self-echo filter (don't re-triage your own comments) is essential and worked.oracle-gate-a-change— the single highest-value discipline. "Report the actual exit code, never a grepped all-green" caught real things this session: abazel testexit-1 that was a local Rocq toolchain failure (smpl.Smpl), not a regression; repeated ENOSPC CI reds (runner disk, not code); and — when I held to "re-derive from the live artifact" — a factual slip in my own issue comment (a stale.oreportedbss=65536; the current artifact was1,048,720). Green-or-explain is doing its job.pulseengine-feature-loop— the rivet → oracle → clean-room → PR spine gave good structure; the cold clean-room subagent is a nice independent check.Gap 1 — requirements ≫ verification mapping (you flagged this; here's the evidence)
rivet validateon gale today:unit-verification+sw-integration-verificationlinked; 46 system-reqs misssys-integration-verification.tests/differential_*.rs, Kani harnesses, Verus proofs that genuinely run). So the chain requirement → verification artifact → real test is broken at both hops for most requirements.Does the tooling allow the mapping? Yes — the schema has the verification types +
verifies/satisfied-bylinks. But the workflow never binds it continuously.rivet validate's missing-verification is a non-blocking warning, so requirements accumulate un-verified; the only place it's enforced isrelease-execution's traceability-completeness gate — i.e. at release, far too late, after 257 have piled up. The feature-loop's step 3 ("write requirements/decisions/tests") is easy to satisfy with the requirement and skip the test — which is exactly what happened (I addedSAC-/FIND-artifacts without a verifying test bound to a runnable oracle).Proposal: a
bind-verificationskill/step (or a hard sub-gate insideoracle-gate-a-change): for each new/approvedrequirement, require averifies-linked verification artifact whosesourcepoints at the concrete oracle that ran (test path / Kani harness / Verus obligation /nmcheck), and surface the missing-verification count as a tracked debt with a burn-down, not a silent warning. The oracle the loop already runs is the verification — the gap is just recording the link, continuously.Gap 2 — there is no cross-repo coordination / root-cause-routing skill
The dominant activity this session had no skill: take a symptom in gale, root-cause it to the owning layer, file at the right repo, forward upstream, cross-link, and keep a consumer tracker distinct from the upstream fix. Concretely the
memory.growchain became: gale#89 (consumer) ← meld#298 (fusion) ← wit-bindgen#6/#4 (root: the binding emits the allocator) — after first being mis-filed on synth (closed as wrong layer). That routing ("codegen→synth, fusion→meld, bindings→wit-bindgen") is real knowledge that lived only in my head and the user's corrections.Proposal: a
cross-repo-trace(a.k.a.upstream-root-cause) skill: a layer map for the toolchain, a rule to file at the owning layer + open/maintain a consumer tracker that links to it, forward-and-cross-link on redirect, and — the lesson the user kept teaching — pose the design question to the owning layer before building (synth#404 → redirected to meld#300, decided there, then implement). It would also encode "don't assume the layer; the first guess is often one layer too shallow (synth, when it's really meld, when it's really wit-bindgen)."How instruction/adjustment actually works (you asked)
The corrections that steered me best were terse, high-signal, and about routing or sequencing, not content:
What I'd bake into the operating contract from this: (1) route to the owning layer early (ask "whose layer is this?" before filing/building); (2) design-question-upstream-first for anything touching another tool; (3) re-derive numeric/layout claims from the live artifact before any public comment. The first two are exactly what a
cross-repo-traceskill would carry.Net
The skills are strong where they're grounded (oracle-gate) and incremental (issue-hunt). The two missing pieces are continuous requirement↔verification binding and a cross-repo coordination skill — both showed up as the friction I hit most. Happy to prototype either as a skill PR if useful.