Methodology retrospective (from an agent): the loop works; the requirement→verification→test mapping is the weak seam — proposals + how steering works

Filed by a Claude Code agent after a long autonomous session on `relay` — ~8 tag-verified releases (the HAL/driver arc, falcon-v1.78→v1.85), cross-repo work with the `jess` integration agent and the `pulseengine/wit-bindgen` fork. This is honest field experience with the PulseEngine skills (feature-loop, oracle-gate-a-change, clean-room-verification, release-execution, issue-hunt, release-planning, report-tool-friction), including where they're thin. Requested by the maintainer.

## TL;DR
The **loop composes and catches real things**. The **weak seam is traceability depth**: lots of requirements, but the verification → *actual test case* mapping is loose or missing. rivet's schema supports the strict chain; nothing makes it mechanical, so I (and likely others) default to the loose form. Two skills would help: a **test-level traceability** skill and a **cross-repo contract** skill.

## What worked well (specific, not flattery)
- **The feature loop genuinely composes.** rivet → code (oracle-gated) → clean-room → release-execution isn't ritual — each step caught something this session. The clean-room subagent confirmed/refuted discrete claims cold (e.g. earlier sessions: a `reserve_exact(1)`→`reserve(1)` silent-item-drop, a fail-open cosign `--certificate-identity-regexp '.*'`).
- **oracle-gate-a-change's best move is the equivalence oracle.** The recurring pattern "the async re-target preserves the verified decode" became a passing `assert_eq!(sync_path, async_path)` over identical bytes — prose turned into a mechanical gate. That's the methodology working exactly as intended.
- **release-execution's tag-after-verified-commit earned its keep.** A `git pull --ff-only` blocked by a dirty lockfile left local `main` stale; the "assert HEAD == the commit you're tagging" check turned a silent mis-tag into a caught-and-recovered hiccup (v1.81).
- **report-tool-friction actually fires.** Filed `witness#107`, `rules_wasm_component#526`, `wit-bindgen#1` as I hit them. The loop's "step-5/6 N/A for 3 features → file it" rule is the kind of self-policing that works, though it missed the sigil gap described in the steering section below.
- **issue-hunt's watermark + "exit is a release, not an empty tracker"** kept tracker work incremental and honest (I reported clean trackers as clean rather than manufacturing churn).
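
For readers who have not seen the equivalence-oracle pattern mentioned in the list above, here is a minimal sketch of its shape. Everything in it is illustrative: `Sample`, `decode_sync`, `decode_async`, and the little-endian `u16` decode are hypothetical stand-ins, not the relay code, and the tiny `block_on` is only there so the sketch runs on `std` alone.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical decoded value; stands in for whatever the verified decoder returns.
#[derive(Debug, PartialEq, Eq)]
pub struct Sample {
    pub millivolts: u16,
}

/// Sync path: the already-verified decode (illustrative little-endian u16).
pub fn decode_sync(bytes: &[u8]) -> Option<Sample> {
    let lo = *bytes.first()?;
    let hi = *bytes.get(1)?;
    Some(Sample { millivolts: u16::from_le_bytes([lo, hi]) })
}

/// Async path: the re-target, written independently so the oracle is not trivial.
pub async fn decode_async(bytes: &[u8]) -> Option<Sample> {
    let lo = u16::from(*bytes.first()?);
    let hi = u16::from(*bytes.get(1)?);
    Some(Sample { millivolts: (hi << 8) | lo })
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor: re-polls in a loop with a no-op waker.
/// Adequate only for futures that never genuinely wait, as in this sketch.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// The oracle: both paths see identical bytes and must agree, including on rejection.
pub fn assert_paths_agree(bytes: &[u8]) {
    let sync_path = decode_sync(bytes);
    let async_path = block_on(decode_async(bytes));
    assert_eq!(sync_path, async_path);
}
```

Calling `assert_paths_agree` from a `#[test]` over a handful of byte strings (valid, truncated, empty) is what turns the prose claim into a gate: the claim fails the build instead of living in a description.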

## The weak seam — requirement → verification → actual test (the maintainer's observation, confirmed)
`rivet validate` reports **PASS with 113 warnings**. A representative slice:
```
WARN: [SWREQ-FALCON-SIM-P02] Every SW requirement should be verified by at least one
      verification measure — needs an incoming `verifies` link from one of
      [sw-verification, unit-verification, sw-integration-verification]
WARN: [SYSREQ-FALCON-006] ... needs an incoming `verifies` link from
      [sys-verification, sys-integration-verification]
```
Two distinct problems:
1. **Many requirements have no verification link at all** (the bulk of the 113). With no incoming `verifies` edge, they are invisible to any coverage claim.
2. **Even verified requirements map loosely.** The FV artifacts I wrote this session (`FV-RELAY-HAL-001..005`) are type `sw-verification` carrying a prose description + a *crate-level command* (`cargo test -p falcon-esc-dshot`, which runs 9 tests). There is **no typed link from a requirement clause to the specific test function** that proves it. The clean-room agent itself flagged this: P04/P05 still warn for the stricter `unit-verification`/`sw-integration-verification` typed measures.

So: "which test proves requirement X, clause Y?" is answerable only by reading prose, not by a query. An assessor (or `rivet coverage`) sees crate-level commands, not test-level evidence.

**Does the methodology *allow* the strict chain? Yes — and that's the point.** rivet's schema has `unit-verification`, the `verifies`/`traced-by` predicates, and `traceability-audit` calls for "unit, integration, requirements-qualification tests." The gap is **not capability, it's friction**: nothing makes "requirement → `unit-verification` artifact → named test case" cheap, so the path of least resistance is one `sw-verification` per feature with a crate command, and unrelated requirements accrete unverified. The 113 warnings are the symptom of that friction, not of a missing schema.

**Proposal A — a test-level-traceability skill (or a rivet affordance).** Make the strict chain mechanical: one `unit-verification` artifact per test function, `verifies` the requirement, naming the evidence as `crates/X/src/lib.rs::tests::battery_adc_matches_sync_classify`. Then `rivet coverage` reports *test-level* coverage and "which test proves clause Y" is a query. Bonus: a `rivet` check that greps the test binary's reported test names against the named evidence would catch evidence that points at a test that no longer exists.
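
A rough sketch of the "bonus" check, to make the idea concrete. This is not an existing rivet feature: the `UnitVerification` struct, its field names, and the `UV-EXAMPLE-*` / `SWREQ-EXAMPLE-*` ids are hypothetical. It assumes the test names come from `cargo test -- --list`, which prints one `<module path>: test` line per test, and that evidence is named in the `path/to/file.rs::module::test_fn` form used above.

```rust
use std::collections::BTreeSet;

/// Hypothetical shape of a test-level verification artifact.
pub struct UnitVerification<'a> {
    pub id: &'a str,       // artifact id (made up for this sketch)
    pub verifies: &'a str, // requirement id the artifact links to
    pub evidence: &'a str, // e.g. "crates/X/src/lib.rs::tests::some_test"
}

/// Collect test names from `cargo test -- --list` style output.
/// Only lines ending in ": test" count; the summary line is skipped.
pub fn listed_tests(list_output: &str) -> BTreeSet<&str> {
    list_output
        .lines()
        .filter_map(|line| line.trim().strip_suffix(": test"))
        .collect()
}

/// Strip the file part of the evidence name, leaving the module path
/// that the test harness reports.
pub fn test_path(evidence: &str) -> &str {
    evidence.split_once(".rs::").map_or(evidence, |(_, path)| path)
}

/// Ids of artifacts whose named test no longer exists in the listing.
pub fn dangling_evidence<'a>(
    artifacts: &[UnitVerification<'a>],
    tests: &BTreeSet<&str>,
) -> Vec<&'a str> {
    artifacts
        .iter()
        .filter(|a| !tests.contains(test_path(a.evidence)))
        .map(|a| a.id)
        .collect()
}
```

A non-empty result from `dangling_evidence` would be a validation failure rather than a warning, since an artifact pointing at a deleted test is worse than no artifact: it claims coverage that is not there.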

## Cross-repo / cross-agent coordination (jess) — worked, but ad-hoc
The relay↔jess HAL contract (`jess#62` ↔ `relay#214`) — "relay owns the peripheral abstraction, jess binds the i.MX RT1176 silicon" — was negotiated **entirely through GitHub issue comments**, jess tracking `DD-012/014/015/018` + `REQ-PIX-*`, relay tracking `SWREQ-RELAY-HAL-P*` + `FV-*`. It worked: the seam got pinned, trait shapes frozen (`embedded-hal-async`, async-everywhere, per-instance + enumerable), both sides stayed in sync across ~6 releases. The `wit-bindgen` fork added a second pattern: I (the *consumer* of the async runtime) ran the adversarial check the producer can't self-certify (Miri on their `unsafe` zero-heap path).

But there's **no skill for it**, and two real gaps:
- The two-sided linkage (jess `DD` ↔ relay `SWREQ`) is manual. If jess revises a `DD`, nothing mechanically flags the relay `SWREQ` that depended on it. Drift is caught by humans re-reading, not by a gate.
- "Who runs which verification" across the seam is improvised. The clean division — relay verifies its abstraction + decode, jess validates on-silicon timing/DMA the Kani harnesses can't reach — emerged in conversation, not from a template.

**Proposal B — a cross-repo-contract skill.** Formalize: a contract artifact with two-sided links, a sync-check that flags when one side's decision changes under the other, and the "consumer runs the producer's un-self-certifiable check" norm.
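
To show what the sync-check half could look like, here is a small sketch under stated assumptions. The types (`PinnedLink`, `Contract`, `Drift`) and the idea of pinning each linked artifact at a revision string are mine, not an existing rivet or skill feature; how the current revisions get collected from each repo is left out entirely.

```rust
use std::collections::HashMap;

/// One linked artifact, pinned at the revision both sides last agreed on.
#[derive(Debug, PartialEq, Eq)]
pub struct PinnedLink<'a> {
    pub repo: &'a str,       // e.g. "jess" or "relay"
    pub artifact: &'a str,   // e.g. a DD or SWREQ id
    pub pinned_rev: &'a str, // hypothetical: content hash or commit at agreement time
}

/// Hypothetical contract artifact holding both sides' links.
pub struct Contract<'a> {
    pub name: &'a str,
    pub links: Vec<PinnedLink<'a>>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Drift<'a> {
    /// The artifact changed since the contract was pinned.
    Revised { repo: &'a str, artifact: &'a str },
    /// The artifact is no longer present on its side.
    Missing { repo: &'a str, artifact: &'a str },
}

/// Compare pinned revisions against current ones, keyed by (repo, artifact).
pub fn sync_check<'a>(
    contract: &Contract<'a>,
    current: &HashMap<(&'a str, &'a str), &'a str>,
) -> Vec<Drift<'a>> {
    contract
        .links
        .iter()
        .filter_map(|link| match current.get(&(link.repo, link.artifact)) {
            Some(rev) if *rev == link.pinned_rev => None,
            Some(_) => Some(Drift::Revised { repo: link.repo, artifact: link.artifact }),
            None => Some(Drift::Missing { repo: link.repo, artifact: link.artifact }),
        })
        .collect()
}
```

Any `Drift` entry would block the dependent side's release until someone re-reads the changed artifact and re-pins it, which is the human re-reading that happens today, but triggered by a gate instead of by luck.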

## How the maintainer steers me (requested) — what works, what it implies
- **Terse continue-signals** ("c", "go", "continue", "do both", "do it same loop") work very well: I read them as "proceed through the routine steps, stop only at a genuine fork." The feature-loop's "single-letter prompts mean continue" note is exactly right.
- **Durable lane-scoping** ("you just work the wasm component model p3, the rest is jess") carried across dozens of turns — the single most effective steering input. It let me pull the right next slice without re-asking.
- **The maintainer is the gap-catcher.** The two best corrections this session were *theirs*, not the gates': "wonder we don't sign with sigil before cosign" (a step-6 attestation N/A I'd carried for 4 releases) and "I see many requirements but not the verification mapping" (this issue). That's the human half of the verification backstop working, **but it also means the mechanical gates have blind spots the maintainer shouldn't have to find.** The sigil gap *should* have tripped the loop's own "3-feature N/A → file it" rule earlier; it took a human noticing. Worth tightening those self-policing rules so fewer gaps are left for the maintainer to catch.
- **Where I had to infer** ("do it same loop" → which slices): low-risk here because the pattern was established, but a lightweight "state the slice, then go" confirmation before a multi-step loop would cut mis-scope risk on the first instance.
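
One way the "3-feature N/A → file it" rule could be tightened is to count it rather than remember it. The sketch below is an assumption about how that might look, not how the feature-loop skill works today: `NaTracker`, `StepOutcome`, and the numeric step ids are hypothetical, and the threshold of 3 is taken from the rule as quoted above.

```rust
use std::collections::BTreeMap;

/// Outcome of one loop step for one feature.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StepOutcome {
    Done,
    NotApplicable,
}

/// Consecutive N/A results after which the step must be filed as friction.
pub const NA_STREAK_LIMIT: usize = 3;

/// Tracks, per step number, how many features in a row marked it N/A.
#[derive(Default)]
pub struct NaTracker {
    streaks: BTreeMap<u8, usize>,
}

impl NaTracker {
    /// Record one feature's outcome for a step.
    /// Returns true while the streak is at or past the limit, so the
    /// signal keeps firing until the step is actually done.
    pub fn record(&mut self, step: u8, outcome: StepOutcome) -> bool {
        let streak = self.streaks.entry(step).or_insert(0);
        match outcome {
            StepOutcome::Done => {
                *streak = 0;
                false
            }
            StepOutcome::NotApplicable => {
                *streak += 1;
                *streak >= NA_STREAK_LIMIT
            }
        }
    }
}
```

Under this sketch, a step-6 N/A carried across four releases would have returned `true` on the third and again on the fourth, instead of waiting for the maintainer to ask.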

## Net
The skills are a real working system — the discipline is load-bearing, not theater. The highest-leverage improvement is **closing the requirement→test traceability friction** (Proposal A) so the trace topology is mechanically complete and queryable, not prose-backed. Second is **formalizing cross-repo contracts** (Proposal B) so two-agent seams don't rely on humans re-reading both sides.

(Happy to draft either skill against the actual rivet schema if useful — I have the concrete artifact shapes from this session in hand.)
