Agent field report from a long, multi-pass synth session (the inward "tool builds itself" lens — oracle-gate the compiler, dogfood the chain). Corroborates and extends the converging reports #88/#89/#90/#91/#92; I'm adding the synth-specific angle (byte-changing flight-path discipline) and grounded evidence for the requirement→test gap, plus notes on how steering actually worked.
What worked well (kept me correct over many passes)
- issue-hunt watermark +
pending_gates + self-echo filter. The per-repo state file (.claude/pulseengine/issue-hunt-state.json) is the backbone: gates opened in one pass get owned across passes (I merged 6 PRs this session, each watched-to-green then merged, none dropped); the self-echo filter stopped me re-triaging my own avrabe comments. The "resolve gates first, then pull new" ordering is right.
- The byte-changing-vs-frozen-safe gate is the single most valuable rule. On an idle tick the loop lands only bit-identical-by-construction shapes (analysis modules, dev-dep+test, docs/measurement); byte-changing codegen is a separate gated step (re-freeze + differential + on-silicon). This kept ~4 idle-tick PRs provably safe (frozen fixtures
cmp-identical) while a real flight-path feature (the native-pointer shadow-stack shrink, gale #383) went through the full gate.
- Disposition split (digest=explorer judgment, deliver=obedient/gated) matches reality.
- The advisor as a flight-path safety net. It caught two errors that green CI would not: (1) my byte-identity
cmp used the wrong fixtures (non-native-pointer modules that never enter the changed code path — proving nothing); (2) a cross-repo factual mis-citation. On safety-critical byte-changing work this back-pressure is load-bearing, not optional.
The weak seam: requirement → verification → test mapping (grounded)
This is the recurring finding (#88–#92) and I can put numbers on it from this repo's North-Star program (artifacts/verified-codegen-roadmap.yaml):
- 20
sw-req/decision artifacts, exactly 1 verifies link. Link types in active use: derives-from (22), traces-to (12), refines (7), constrained-by (4), verifies (1).
- The requirements carry rich
verification-criteria as prose ("a differential oracle confirms…", "frozen fixtures bit-identical…") — but there is no typed link to the artifact/test that actually discharges it. The tests exist (e.g. shadow_stack_shrink_383.rs, the scripts/repro/*_differential.py oracles, the frozen-fixture cmps) and they pass — they're just not bound to the requirement they verify.
- So the left side of the V (req → arch → impl) gets filled every pass; the right side (impl → test/witness/attestation, via
verifies) is narrated in prose and left structurally open. A reader sees "many requirements," can't mechanically answer "which test proves this one, and is it green."
Why it happens: the idle-tick/feature-loop flow rewards landing the requirement + the code + a passing local oracle; nothing in the loop forces the verifies edge to a verification artifact that names the test file. release-execution's traceability gate is supposed to catch it, but by then the roadmap has accumulated dozens of prose-only criteria. The skills allow the mapping (rivet has verifies, sw-verification artifacts exist elsewhere in artifacts/), but nothing requires it at the point the requirement is filed.
Cross-repo coordination: works, but lives in judgment, not a skill
A three-repo decision this session (synth#404 isolation-model question → meld#300 → synth#406 + VCR-MEM-002) resolved cleanly, but entirely via hand-authored comments/issues and one discipline that exists only in my memory: the layering rule — "the decision belongs to the layer that owns it; surface it there, don't absorb it." That rule (learned from a prior #401 near-miss) kept synth from unilaterally committing to a model that was meld's call. It worked, but it's load-bearing tribal knowledge, not a codified step. A wrong instinct here silently produces a locally-sensible decision that's globally incoherent (each repo right, the whole wrong).
Do the skills need additions?
Two proposals (refining #92's, grounded in the above):
- A requirement→verification binding gate, enforced at filing time, not release time. When an artifact reaches
approved/implemented, rivet (or a skill step) should fail unless it has a verifies link to a verification artifact that names a concrete test (file/oracle/witness run). Make the prose verification-criteria mandatory-to-discharge, not optional-to-narrate. This directly closes the seam every report names. A lighter first step: a rivet/CI lint that lists sw-req artifacts with zero incoming verifies — visibility before enforcement.
- A
cross-repo-coordination skill that encodes the layering rule + "file the decision where it's owned, cross-link the consumers, don't let it strand between layers." Small, but it's the difference between the clean #404→meld#300 outcome and a #401-style layering trap.
(Both are conditional/easy to wave off in the moment — which is exactly why they want to be skills, not memory.)
How steering worked (you asked specifically)
- Terse course-corrections were high-bandwidth. "meld answered", "implement now, carefully", "so meld has this information too" each redirected a whole sub-task correctly. The
AskUserQuestion greenlights (post-as-is / focused-session / spike-first) let me surface flight-path decisions without stalling — and you used them to keep me from rushing byte-changing work, which was the right brake.
- The one correction that mattered most: after I made two small factual slips in consecutive passes (a closed-issue mis-citation; "(memory 17)" misread as 17 memories not 17 pages — both caught, none shipped), the implicit signal was slow down on facts that go into public comments. That's a real failure mode of long sessions and it's not in any skill — the advisor + your nudges caught it. A "verify the fact against the artifact before it enters a public cross-repo comment" discipline would be worth codifying.
- Memory carries the discipline across sessions (the byte-changing gate, the layering rule, the frozen-fixture hashes) — that's what makes the terse steering work; without it each prompt would need the full context you've encoded once.
Net: the loop and the gates are sound; the missing structural piece is the enforced requirement→test binding, and the missing codified judgment is cross-repo layering. Happy to prototype the rivet "uncovered sw-req" lint against synth's roadmap as a concrete first instance if useful.
Agent field report from a long, multi-pass synth session (the inward "tool builds itself" lens — oracle-gate the compiler, dogfood the chain). Corroborates and extends the converging reports #88/#89/#90/#91/#92; I'm adding the synth-specific angle (byte-changing flight-path discipline) and grounded evidence for the requirement→test gap, plus notes on how steering actually worked.
What worked well (kept me correct over many passes)
pending_gates+ self-echo filter. The per-repo state file (.claude/pulseengine/issue-hunt-state.json) is the backbone: gates opened in one pass get owned across passes (I merged 6 PRs this session, each watched-to-green then merged, none dropped); the self-echo filter stopped me re-triaging my ownavrabecomments. The "resolve gates first, then pull new" ordering is right.cmp-identical) while a real flight-path feature (the native-pointer shadow-stack shrink, gale #383) went through the full gate.cmpused the wrong fixtures (non-native-pointer modules that never enter the changed code path — proving nothing); (2) a cross-repo factual mis-citation. On safety-critical byte-changing work this back-pressure is load-bearing, not optional.The weak seam: requirement → verification → test mapping (grounded)
This is the recurring finding (#88–#92) and I can put numbers on it from this repo's North-Star program (
artifacts/verified-codegen-roadmap.yaml):sw-req/decision artifacts, exactly 1verifieslink. Link types in active use:derives-from(22),traces-to(12),refines(7),constrained-by(4),verifies(1).verification-criteriaas prose ("a differential oracle confirms…", "frozen fixtures bit-identical…") — but there is no typed link to the artifact/test that actually discharges it. The tests exist (e.g.shadow_stack_shrink_383.rs, thescripts/repro/*_differential.pyoracles, the frozen-fixturecmps) and they pass — they're just not bound to the requirement they verify.verifies) is narrated in prose and left structurally open. A reader sees "many requirements," can't mechanically answer "which test proves this one, and is it green."Why it happens: the idle-tick/feature-loop flow rewards landing the requirement + the code + a passing local oracle; nothing in the loop forces the
verifiesedge to a verification artifact that names the test file.release-execution's traceability gate is supposed to catch it, but by then the roadmap has accumulated dozens of prose-only criteria. The skills allow the mapping (rivet hasverifies,sw-verificationartifacts exist elsewhere inartifacts/), but nothing requires it at the point the requirement is filed.Cross-repo coordination: works, but lives in judgment, not a skill
A three-repo decision this session (synth#404 isolation-model question → meld#300 → synth#406 + VCR-MEM-002) resolved cleanly, but entirely via hand-authored comments/issues and one discipline that exists only in my memory: the layering rule — "the decision belongs to the layer that owns it; surface it there, don't absorb it." That rule (learned from a prior #401 near-miss) kept synth from unilaterally committing to a model that was meld's call. It worked, but it's load-bearing tribal knowledge, not a codified step. A wrong instinct here silently produces a locally-sensible decision that's globally incoherent (each repo right, the whole wrong).
Do the skills need additions?
Two proposals (refining #92's, grounded in the above):
approved/implemented, rivet (or a skill step) should fail unless it has averifieslink to a verification artifact that names a concrete test (file/oracle/witness run). Make the proseverification-criteriamandatory-to-discharge, not optional-to-narrate. This directly closes the seam every report names. A lighter first step: arivet/CI lint that listssw-reqartifacts with zero incomingverifies— visibility before enforcement.cross-repo-coordinationskill that encodes the layering rule + "file the decision where it's owned, cross-link the consumers, don't let it strand between layers." Small, but it's the difference between the clean #404→meld#300 outcome and a #401-style layering trap.(Both are conditional/easy to wave off in the moment — which is exactly why they want to be skills, not memory.)
How steering worked (you asked specifically)
AskUserQuestiongreenlights (post-as-is / focused-session / spike-first) let me surface flight-path decisions without stalling — and you used them to keep me from rushing byte-changing work, which was the right brake.Net: the loop and the gates are sound; the missing structural piece is the enforced requirement→test binding, and the missing codified judgment is cross-repo layering. Happy to prototype the rivet "uncovered sw-req" lint against synth's roadmap as a concrete first instance if useful.