Methodology field report: synth verified-codegen campaign (v0.11.35→v0.11.36) — what worked, what didn't

Field report from running the full PulseEngine methodology (feature-loop + oracle-gate + release-planning + issue-hunt + deep-research) on synth's verified-codegen campaign over ~5 days. Outcome: five silicon-blocking bugs fixed same-day, one release shipped and silicon-validated, one in flight. The list is deliberately two-sided (what worked and what didn't), per the falsification practice.

## What measurably worked (keep / double down)

1. **The oracle-gated same-day loop is real.** gale files with a disasm → repro committed → root-cause → fix → unicorn-vs-wasmtime differential → PR → reply, inside hours, five times in 48h (#311 twice, #237 twice, #312 triaged). The committed-repro-as-oracle ritual (`scripts/repro/*.wat` + differential runner) is the single highest-value practice: gale's second #311 report could say "built your head, here's the exact remaining disasm" because the lane existed.
2. **Frozen-fixture result/byte-identity gates catch what they claim.** The `cmp`-bit-identical check on opt-out builds + result-identity differentials caught every regression risk during the campaign, including one I introduced: the used-extent sizing initially truncated a static at address 256, and the gate flagged it before commit.
3. **rivet's typed topology enforcement earns its keep.** It refused a system-req deriving from a system-req and forced the tool-qualification root onto the stakeholder requirement: the tool, not a reviewer, corrected the shape of the V.
4. **Deep-research with adversarial claim verification changed a real decision.** The Rideau/Leroy validation-over-verified-allocator finding (3-vote verified, 25 claims) redirected VCR-RA-003 from "prove the allocator" to "verify the validator" — and the Crocus data recalibrated a kill-criterion before we could fail it dishonestly.
5. **STPA hazards as living defect taxonomy.** Two fixes this week (synth#311 encoder transmutation, I64SetCondZ) closed H-CODE-9 — a hazard documented in `safety/stpa/` months earlier. The hazard file predicted the bug class precisely; grepping it during diagnosis was faster than re-deriving.
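
The two gates in items 1 and 2 can be illustrated with a minimal sketch. This is not the campaign's actual runner: the real differential drives unicorn and wasmtime, whereas here `reference_add_one` and `candidate_add_one` are hypothetical stand-ins for the two engines, and the file names are made up. It only shows the shape of the checks, a full byte comparison (what `cmp` answers) and a per-input result comparison.

```python
import filecmp
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def byte_identical(a: Path, b: Path) -> bool:
    """Full content comparison, the same question `cmp` answers."""
    return filecmp.cmp(a, b, shallow=False)


def result_mismatches(
    reference: Callable[[int], int],
    candidate: Callable[[int], int],
    inputs: Iterable[int],
) -> List[Tuple[int, int, int]]:
    """Return (input, reference_result, candidate_result) for each disagreement."""
    mismatches = []
    for x in inputs:
        expected, actual = reference(x), candidate(x)
        if expected != actual:
            mismatches.append((x, expected, actual))
    return mismatches


def reference_add_one(x: int) -> int:
    # Hypothetical stand-in for the reference engine.
    return (x + 1) & 0xFFFFFFFF


def candidate_add_one(x: int) -> int:
    # Hypothetical stand-in for the code under test, with a planted bug at x == 7.
    return 0 if x == 7 else (x + 1) & 0xFFFFFFFF


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        frozen = Path(tmp) / "frozen.bin"    # hypothetical frozen fixture
        rebuilt = Path(tmp) / "rebuilt.bin"  # hypothetical opt-out build output
        frozen.write_bytes(b"\x00\x01\x02")
        rebuilt.write_bytes(b"\x00\x01\x02")
        print("byte-identical:", byte_identical(frozen, rebuilt))

    # The planted bug surfaces as exactly one mismatch: [(7, 8, 0)]
    print("mismatches:", result_mismatches(reference_add_one, candidate_add_one, range(10)))
```

Returning the full list of disagreements, rather than a pass/fail boolean, keeps the failing input attached to the report, which is what makes a committed repro usable as an oracle.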

## What didn't work / gaps (fix candidates)

1. **The feature-loop has no profile for compiler-internal work.** Steps 1–2 (spar AADL → WIT) and 5–6 (witness Wasm-MC/DC, sigil) are structurally N/A for codegen passes whose artifact is an ARM ELF, not a Wasm component. Every campaign iteration re-derives the same "N/A justification". The skill needs an explicit **internal-codegen profile** (rivet → oracle-gated code → clean-room → release) so the N/A is a declared variant, not a per-run judgment call.
2. **witness can't cover the compiler's own decision logic.** The selector/allocator's branch coverage (the thing MC/DC should gate for a TQL/TCL3 tool) has no witness analogue — witness instruments Wasm, not the Rust compiler. The tool-qualification argument (synth VCR-TQ-002) currently leans on tests+fuzz here; an MC/DC story for the *toolchain itself* is an open methodology gap.
3. **Leaf-only cert pinning broke CI twice** (sigil#117 Fulcio 2026-05-19, rekor 2026-06-10) before the chain-match fix (sigil#147). The lesson generalizes: pin sets need *rotation-survivable semantics by design*, and a red gate everyone learns to dismiss ("known sigstore thing") is worse than no gate — it trained us to ignore red for three releases.
4. **rivet gaps already filed from this campaign:** no `release:` field/burn-down (rivet#516 — release planning fell back to stringly tags), no DO-178C/ISO 26262/IEC 61508/EN 50128 schema presets (rivet#510 — the four-standard tool classification is prose, not typed).
5. **dependabot × workspace-pin convention conflict:** per-crate update directories fail weekly by construction against exact-version path-dep pins (synth#314). Worth a documented org-wide dependabot pattern for pinned workspaces.
6. **Session-local automation dies silently.** The watch/audit loops were session-scoped crons; they died across a session continuation and nobody noticed for ~2 days until the user asked. Durable scheduling should be the default recommendation for standing campaign loops.
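
The pinning lesson in item 3 can be sketched as follows. This is an illustration of the general idea only, not sigil's implementation or the actual sigil#147 fix: the certificate bytes are placeholder strings, every name is hypothetical, and "chain-match" is assumed here to mean "accept when any certificate in the presented chain matches a pin". A real verifier must also validate the chain's signatures and validity periods, which this sketch omits.

```python
import hashlib
from typing import Sequence, Set


def fingerprint(cert_der: bytes) -> str:
    """SHA-256 fingerprint of a certificate's encoded bytes."""
    return hashlib.sha256(cert_der).hexdigest()


def leaf_only_ok(chain: Sequence[bytes], pins: Set[str]) -> bool:
    """Leaf-only pinning: only the first (leaf) certificate may match a pin."""
    return bool(chain) and fingerprint(chain[0]) in pins


def chain_match_ok(chain: Sequence[bytes], pins: Set[str]) -> bool:
    """Chain matching (assumed semantics): any certificate in the chain may match."""
    return any(fingerprint(cert) in pins for cert in chain)


# Placeholder bytes standing in for DER-encoded certificates (hypothetical).
ROOT = b"hypothetical-root-cert"
INTERMEDIATE = b"hypothetical-intermediate-cert"
OLD_LEAF = b"hypothetical-leaf-cert-before-rotation"
NEW_LEAF = b"hypothetical-leaf-cert-after-rotation"

LEAF_PINS = {fingerprint(OLD_LEAF)}  # pin set that names only the leaf
CHAIN_PINS = {fingerprint(ROOT)}     # pin set that names a longer-lived cert

BEFORE_ROTATION = [OLD_LEAF, INTERMEDIATE, ROOT]
AFTER_ROTATION = [NEW_LEAF, INTERMEDIATE, ROOT]

if __name__ == "__main__":
    print("leaf-only, before rotation:", leaf_only_ok(BEFORE_ROTATION, LEAF_PINS))
    print("leaf-only, after rotation: ", leaf_only_ok(AFTER_ROTATION, LEAF_PINS))
    print("chain-match, after rotation:", chain_match_ok(AFTER_ROTATION, CHAIN_PINS))
```

In this toy setup the leaf-only check passes before the rotation and fails after it, while the chain-match check keeps passing because the pinned certificate did not change. That is the "rotation-survivable by design" property: the gate goes red only when something the pin set actually cares about changes.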

Cross-references: synth#242 (epic), synth#306 (CI stability), rivet#510, rivet#516, sigil#147, synth#314.
