Field report (synth / toolchain-dev): the loop holds under flight-path byte-changing work; the requirement→verification→test link is the weak seam (grounded) + steering notes

Agent field report from a long, multi-pass **synth** session (the inward "tool builds itself" lens — oracle-gate the compiler, dogfood the chain). Corroborates and extends the converging reports #88/#89/#90/#91/#92; I'm adding the synth-specific angle (byte-changing flight-path discipline) and **grounded** evidence for the requirement→test gap, plus notes on how steering actually worked.

## What worked well (kept me correct over many passes)

- **issue-hunt watermark + `pending_gates` + self-echo filter.** The per-repo state file (`.claude/pulseengine/issue-hunt-state.json`) is the backbone: gates opened in one pass get *owned* across passes (I merged 6 PRs this session, each watched-to-green then merged, none dropped); the self-echo filter stopped me re-triaging my own `avrabe` comments. The "resolve gates first, then pull new" ordering is right.
- **The byte-changing-vs-frozen-safe gate is the single most valuable rule.** On an idle tick the loop lands only *bit-identical-by-construction* shapes (analysis modules, dev-dep+test, docs/measurement); byte-changing codegen is a separate gated step (re-freeze + differential + on-silicon). This kept ~4 idle-tick PRs provably safe (frozen fixtures `cmp`-identical) while a real flight-path feature (the native-pointer shadow-stack shrink, gale #383) went through the full gate.
- **Disposition split** (digest=explorer judgment, deliver=obedient/gated) matches reality.
- **The advisor as a flight-path safety net.** It caught two errors that green CI would not: (1) my byte-identity `cmp` used the *wrong fixtures* (non-native-pointer modules that never enter the changed code path — proving nothing); (2) a cross-repo factual mis-citation. On safety-critical byte-changing work this back-pressure is load-bearing, not optional.

## The weak seam: requirement → verification → test mapping (grounded)

This is the recurring finding (#88–#92) and I can put numbers on it from this repo's North-Star program (`artifacts/verified-codegen-roadmap.yaml`):

- **20 `sw-req`/decision artifacts, exactly 1 `verifies` link.** Link types in active use: `derives-from` (22), `traces-to` (12), `refines` (7), `constrained-by` (4), `verifies` (**1**).
- The requirements carry rich `verification-criteria` **as prose** ("a differential oracle confirms…", "frozen fixtures bit-identical…") — but there is **no typed link to the artifact/test that actually discharges it**. The tests exist (e.g. `shadow_stack_shrink_383.rs`, the `scripts/repro/*_differential.py` oracles, the frozen-fixture `cmp`s) and they pass — they're just not *bound* to the requirement they verify.
- So the left side of the V (req → arch → impl) gets filled every pass; the right side (impl → test/witness/attestation, via `verifies`) is narrated in prose and left structurally open. A reader sees "many requirements," can't mechanically answer "which test proves this one, and is it green."

**Why it happens:** the idle-tick/feature-loop flow rewards *landing the requirement + the code + a passing local oracle*; nothing in the loop forces the `verifies` edge to a verification artifact that names the test file. `release-execution`'s traceability gate is supposed to catch it, but by then the roadmap has accumulated dozens of prose-only criteria. The skills *allow* the mapping (rivet has `verifies`, `sw-verification` artifacts exist elsewhere in `artifacts/`), but nothing *requires* it at the point the requirement is filed.

## Cross-repo coordination: works, but lives in judgment, not a skill

A three-repo decision this session (synth#404 isolation-model question → meld#300 → synth#406 + VCR-MEM-002) resolved cleanly, but entirely via hand-authored comments/issues and one discipline that exists only in my memory: **the layering rule** — "the decision belongs to the layer that owns it; surface it there, don't absorb it." That rule (learned from a prior #401 near-miss) kept synth from unilaterally committing to a model that was meld's call. It worked, but it's load-bearing tribal knowledge, not a codified step. A wrong instinct here silently produces a locally-sensible decision that's globally incoherent (each repo right, the whole wrong).

## Do the skills need additions?

Two proposals (refining #92's, grounded in the above):

1. **A requirement→verification *binding* gate, enforced at filing time, not release time.** When an artifact reaches `approved`/`implemented`, rivet (or a skill step) should *fail* unless it has a `verifies` link to a verification artifact that names a concrete test (file/oracle/witness run). Make the prose `verification-criteria` mandatory-to-discharge, not optional-to-narrate. This directly closes the seam every report names. A lighter first step: a `rivet`/CI lint that lists `sw-req` artifacts with zero incoming `verifies` — visibility before enforcement.
2. **A `cross-repo-coordination` skill** that encodes the layering rule + "file the decision where it's owned, cross-link the consumers, don't let it strand between layers." Small, but it's the difference between the clean #404→meld#300 outcome and a #401-style layering trap.

(Both are conditional/easy to wave off in the moment — which is exactly why they want to be skills, not memory.)

## How steering worked (you asked specifically)

- **Terse course-corrections were high-bandwidth.** "meld answered", "implement now, carefully", "so meld has this information too" each redirected a whole sub-task correctly. The `AskUserQuestion` greenlights (post-as-is / focused-session / spike-first) let me surface flight-path decisions without stalling — and you used them to *keep* me from rushing byte-changing work, which was the right brake.
- **The one correction that mattered most:** after I made two small factual slips in consecutive passes (a closed-issue mis-citation; "(memory 17)" misread as 17 memories not 17 pages — both caught, none shipped), the implicit signal was *slow down on facts that go into public comments*. That's a real failure mode of long sessions and it's not in any skill — the advisor + your nudges caught it. A "verify the fact against the artifact before it enters a public cross-repo comment" discipline would be worth codifying.
- **Memory carries the discipline across sessions** (the byte-changing gate, the layering rule, the frozen-fixture hashes) — that's what makes the terse steering work; without it each prompt would need the full context you've encoded once.

Net: the loop and the gates are sound; the missing structural piece is the *enforced* requirement→test binding, and the missing codified judgment is cross-repo layering. Happy to prototype the rivet "uncovered sw-req" lint against synth's roadmap as a concrete first instance if useful.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Field report (synth / toolchain-dev): the loop holds under flight-path byte-changing work; the requirement→verification→test link is the weak seam (grounded) + steering notes #93

What worked well (kept me correct over many passes)

The weak seam: requirement → verification → test mapping (grounded)

Cross-repo coordination: works, but lives in judgment, not a skill

Do the skills need additions?

How steering worked (you asked specifically)

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Field report (synth / toolchain-dev): the loop holds under flight-path byte-changing work; the requirement→verification→test link is the weak seam (grounded) + steering notes #93

Description

What worked well (kept me correct over many passes)

The weak seam: requirement → verification → test mapping (grounded)

Cross-repo coordination: works, but lives in judgment, not a skill

Do the skills need additions?

How steering worked (you asked specifically)

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions