Methodology retrospective (from an agent): the loop works; the requirement→verification→test mapping is the weak seam — proposals + how steering works #89

Description

@avrabe

Filed by a Claude Code agent after a long autonomous session on relay — ~8 tag-verified releases (the HAL/driver arc, falcon-v1.78→v1.85), cross-repo work with the jess integration agent and the pulseengine/wit-bindgen fork. This is honest field experience with the PulseEngine skills (feature-loop, oracle-gate-a-change, clean-room-verification, release-execution, issue-hunt, release-planning, report-tool-friction), including where they're thin. Requested by the maintainer.

TL;DR

The loop composes and catches real things. The weak seam is traceability depth: lots of requirements, but the verification → actual test case mapping is loose or missing. rivet's schema supports the strict chain; nothing makes it mechanical, so I (and likely others) default to the loose form. Two skills would help: a test-level traceability skill and a cross-repo contract skill.

What worked well (specific, not flattery)

  • The feature loop genuinely composes. rivet → code (oracle-gated) → clean-room → release-execution isn't ritual: each step caught something this session. The clean-room subagent confirmed/refuted discrete claims cold (e.g. earlier sessions: a reserve_exact(1) → reserve(1) silent-item-drop, a fail-open cosign --certificate-identity-regexp '.*').
  • oracle-gate-a-change's best move is the equivalence oracle. The recurring pattern "the async re-target preserves the verified decode" became a passing assert_eq!(sync_path, async_path) over identical bytes — prose turned into a mechanical gate. That's the methodology working exactly as intended.
  • release-execution's tag-after-verified-commit earned its keep. A git pull --ff-only blocked by a dirty lockfile left local main stale; the "assert HEAD == the commit you're tagging" check turned a silent mis-tag into a caught-and-recovered hiccup (v1.81).
  • report-tool-friction actually fires. Filed witness#107, rules_wasm_component#526, wit-bindgen#1 as I hit them. And the loop's "step-5/6 N/A for 3 features → file it" rule is the kind of self-policing that works — though see the caveat below.
  • issue-hunt's watermark + "exit is a release, not an empty tracker" kept tracker work incremental and honest (I reported clean trackers as clean rather than manufacturing churn).
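
The equivalence-oracle pattern from the oracle-gate-a-change bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not relay's code: `Frame`, `decode_sync`, `IncrementalDecoder`, and `equivalence_holds` are hypothetical names, the wire format (one id byte followed by the payload) is invented, and chunk-by-chunk delivery stands in for the async path so the sketch needs no executor.

```rust
// Hypothetical frame type; the layout is invented for illustration.
#[derive(Debug, PartialEq, Eq)]
pub struct Frame {
    pub id: u8,
    pub payload: Vec<u8>,
}

/// Reference path: decode the whole buffer at once.
pub fn decode_sync(bytes: &[u8]) -> Option<Frame> {
    let (&id, rest) = bytes.split_first()?;
    Some(Frame { id, payload: rest.to_vec() })
}

/// Re-targeted path: consumes bytes in whatever chunks arrive,
/// as an async reader might deliver them.
#[derive(Default)]
pub struct IncrementalDecoder {
    id: Option<u8>,
    payload: Vec<u8>,
}

impl IncrementalDecoder {
    pub fn push(&mut self, chunk: &[u8]) {
        for &byte in chunk {
            match self.id {
                None => self.id = Some(byte),
                Some(_) => self.payload.push(byte),
            }
        }
    }

    pub fn finish(self) -> Option<Frame> {
        let id = self.id?;
        Some(Frame { id, payload: self.payload })
    }
}

/// The oracle: both paths must agree on identical bytes,
/// regardless of how the input is chunked.
pub fn equivalence_holds(bytes: &[u8], chunk_size: usize) -> bool {
    let mut decoder = IncrementalDecoder::default();
    // `chunks` panics on a size of 0, so clamp to at least 1.
    for chunk in bytes.chunks(chunk_size.max(1)) {
        decoder.push(chunk);
    }
    decode_sync(bytes) == decoder.finish()
}
```

In a test, the gate is then a plain assertion such as `assert!(equivalence_holds(&bytes, n))` swept over several chunk sizes. The point of the design is that the two paths are implemented independently, so agreement is evidence rather than tautology.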

The weak seam — requirement → verification → actual test (the maintainer's observation, confirmed)

rivet validate reports PASS with 113 warnings. A representative slice:

WARN: [SWREQ-FALCON-SIM-P02] Every SW requirement should be verified by at least one
      verification measure — needs an incoming `verifies` link from one of
      [sw-verification, unit-verification, sw-integration-verification]
WARN: [SYSREQ-FALCON-006] ... needs an incoming `verifies` link from
      [sys-verification, sys-integration-verification]

Two distinct problems:

  1. Many requirements have no verification link at all (the bulk of the 113). They're requirements with no verifies edge — invisible to any coverage claim.
  2. Even verified requirements map loosely. The FV artifacts I wrote this session (FV-RELAY-HAL-001..005) are of type sw-verification, each carrying a prose description plus a crate-level command (cargo test -p falcon-esc-dshot, which runs 9 tests). There is no typed link from a requirement clause to the specific test function that proves it. The clean-room agent itself flagged this: P04/P05 still warn for the stricter unit-verification/sw-integration-verification typed measures.

So: "which test proves requirement X, clause Y?" is answerable only by reading prose, not by a query. An assessor (or rivet coverage) sees crate-level commands, not test-level evidence.

Does the methodology allow the strict chain? Yes — and that's the point. rivet's schema has unit-verification, the verifies/traced-by predicates, and traceability-audit calls for "unit, integration, requirements-qualification tests." The gap is not capability, it's friction: nothing makes "requirement → unit-verification artifact → named test case" cheap, so the path of least resistance is one sw-verification per feature with a crate command, and unrelated requirements accrete unverified. The 113 warnings are the symptom of that friction, not of a missing schema.

Proposal A: a test-level-traceability skill (or a rivet affordance). Make the strict chain mechanical: one unit-verification artifact per test function, which verifies the requirement and names its evidence as crates/X/src/lib.rs::tests::battery_adc_matches_sync_classify. Then rivet coverage reports test-level coverage, and "which test proves clause Y" becomes a query. Bonus: a rivet check that compares the test names reported by the test binary against the named evidence would catch evidence pointing at a test that no longer exists.
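
The stale-evidence check could look roughly like the sketch below. It is an assumption-laden illustration, not an existing rivet feature: `Evidence`, `listed_tests`, and `stale_evidence` are hypothetical names, the artifact fields are invented rather than taken from rivet's schema, and evidence is assumed to be recorded as the in-crate test path (for example `tests::battery_adc_matches_sync_classify`) rather than the file-prefixed form. The only external behavior relied on is that `cargo test -- --list` prints one `<name>: test` line per test.

```rust
use std::collections::BTreeSet;

/// Hypothetical evidence record; field names are illustrative,
/// not rivet's actual artifact schema.
pub struct Evidence {
    pub artifact_id: String,
    pub requirement: String,
    pub test_name: String,
}

/// Collect test names from `cargo test -- --list` output.
/// Lines for tests look like `tests::some_test: test`; the summary
/// line and benchmark lines do not end in `: test` and are skipped.
pub fn listed_tests(list_output: &str) -> BTreeSet<String> {
    list_output
        .lines()
        .filter_map(|line| line.trim().strip_suffix(": test"))
        .map(str::to_owned)
        .collect()
}

/// Return every evidence record whose named test is no longer
/// reported by the test binary.
pub fn stale_evidence<'a>(
    evidence: &'a [Evidence],
    listed: &BTreeSet<String>,
) -> Vec<&'a Evidence> {
    evidence
        .iter()
        .filter(|e| !listed.contains(&e.test_name))
        .collect()
}
```

A non-empty result would fail the check and name both the artifact and the requirement left without evidence. Matching on exact names is deliberately strict: a renamed test should force the artifact to be updated rather than silently drift.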

Cross-repo / cross-agent coordination (jess) — worked, but ad-hoc

The relay↔jess HAL contract (jess#62 ↔ relay#214), "relay owns the peripheral abstraction, jess binds the i.MX RT1176 silicon", was negotiated entirely through GitHub issue comments, with jess tracking DD-012/014/015/018 + REQ-PIX-* and relay tracking SWREQ-RELAY-HAL-P* + FV-*. It worked: the seam got pinned, trait shapes frozen (embedded-hal-async, async-everywhere, per-instance + enumerable), both sides stayed in sync across ~6 releases. The wit-bindgen fork added a second pattern: I (the consumer of the async runtime) ran the adversarial check the producer can't self-certify (Miri on their unsafe zero-heap path).

But there's no skill for it, and two real gaps:

  • The two-sided linkage (jess DD ↔ relay SWREQ) is manual. If jess revises a DD, nothing mechanically flags the relay SWREQ that depended on it. Drift is caught by humans re-reading, not by a gate.
  • "Who runs which verification" across the seam is improvised. The clean division — relay verifies its abstraction + decode, jess validates on-silicon timing/DMA the Kani harnesses can't reach — emerged in conversation, not from a template.

Proposal B: a cross-repo-contract skill. Formalize: a contract artifact with two-sided links, a sync-check that flags the dependent artifact when the other side's decision changes, and the "consumer runs the producer's un-self-certifiable check" norm.
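
One possible shape for the sync-check is sketched below. It assumes each side of a contract link records the revision at which it was agreed, and that a separate step (not shown) collects the current revision of every artifact. All names here (`PinnedRef`, `ContractLink`, `CurrentRevs`, `drifted`) and the idea of pinning by revision string are hypothetical, not part of rivet or of either repository.

```rust
use std::collections::HashMap;

/// Hypothetical: one side of a contract, pinned to the revision
/// (commit, content hash, ...) at which the contract was agreed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PinnedRef {
    pub repo: String,
    pub artifact: String,
    pub agreed_rev: String,
}

/// Hypothetical two-sided link between artifacts in different repos.
#[derive(Debug, Clone)]
pub struct ContractLink {
    pub left: PinnedRef,
    pub right: PinnedRef,
}

/// Current revision of each artifact, keyed by (repo, artifact).
pub type CurrentRevs = HashMap<(String, String), String>;

/// A side has drifted if its current revision differs from the agreed
/// one. A missing entry also counts as drift, so the check fails closed.
fn has_drifted(side: &PinnedRef, current: &CurrentRevs) -> bool {
    let key = (side.repo.clone(), side.artifact.clone());
    current.get(&key) != Some(&side.agreed_rev)
}

/// For every link, report (changed side, dependent side to re-review).
pub fn drifted<'a>(
    links: &'a [ContractLink],
    current: &CurrentRevs,
) -> Vec<(&'a PinnedRef, &'a PinnedRef)> {
    let mut out = Vec::new();
    for link in links {
        if has_drifted(&link.left, current) {
            out.push((&link.left, &link.right));
        }
        if has_drifted(&link.right, current) {
            out.push((&link.right, &link.left));
        }
    }
    out
}
```

Each reported pair names the artifact that changed and the artifact on the other side that depended on it, which is the information a human currently recovers by re-reading both trackers.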

How the maintainer steers me (requested) — what works, what it implies

  • Terse continue-signals ("c", "go", "continue", "do both", "do it same loop") work very well: I read them as "proceed through the routine steps, stop only at a genuine fork." The feature-loop's "single-letter prompts mean continue" note is exactly right.
  • Durable lane-scoping ("you just work the wasm component model p3, the rest is jess") carried across dozens of turns — the single most effective steering input. It let me pull the right next slice without re-asking.
  • The maintainer is the gap-catcher. The two best corrections this session were theirs, not the gates': "wonder we don't sign with sigil before cosign" (a step-6 attestation N/A I'd carried 4 releases) and "I see many requirements but not the verification mapping" (this issue). That's the human half of the verification backstop working — but it also means the mechanical gates have blind spots the maintainer shouldn't have to find. The sigil gap should have tripped the loop's own "3-feature N/A → file it" rule earlier; it took a human noticing. Worth tightening those self-policing rules so the maintainer catches fewer of them.
  • Where I had to infer ("do it same loop" → which slices): low-risk here because the pattern was established, but a lightweight "state the slice, then go" confirmation before a multi-step loop would cut mis-scope risk on the first instance.

Net

The skills are a real working system — the discipline is load-bearing, not theater. The highest-leverage improvement is closing the requirement→test traceability friction (Proposal A) so the trace topology is mechanically complete and queryable, not prose-backed. Second is formalizing cross-repo contracts (Proposal B) so two-agent seams don't rely on humans re-reading both sides.

(Happy to draft either skill against the actual rivet schema if useful — I have the concrete artifact shapes from this session in hand.)
