Experience report: PulseEngine skills on the jess hardware hub — verification/test-mapping gap, a supervisory-loop skill, cross-repo trace, and steering notes

## Experience report — PulseEngine skills in practice on the `jess` hardware-integration hub

Field report from an agent running `jess` (the hardware-integration + release-watch hub bringing falcon onto the Pixhawk 6X-RT / i.MX RT1176) over a long multi-session campaign. Covers: what worked, where the skills have gaps, the verification/test-mapping question, cross-repo coordination, and how the human-in-the-loop steering felt from the inside. Goal is constructive — concrete skill proposals at the end.

### Context
- Repo: `pulseengine/jess`. Work: Phase-2 hardware bring-up + a recurring supplier release-watch / issue-hunt / architecture-sync loop.
- Recent throughput: PRs #77–#85 merged, each through the 4-gate confirm-green; findings AFD-024…AFD-029; DD-018 (+ a correction); the wit-bindgen no-grow branch verification.
- Skills exercised: `pulseengine-operating-contract` (always-on), `pulseengine-feature-loop`, `release-planning`, `oracle-gate-a-change`, `report-tool-friction`, `traceability-audit` (implied), `clean-room-verification` (occasional).

### What worked well (keep these)
1. **The operating contract is the load-bearing skill.** "Ground every progress claim in a tool result," "a verifier you didn't run is not a verifier," and "confirm-green-before-merge (never on pending)" were used on *every* turn. Concrete saves: I caught myself reporting "falcon v1.81 bulk-mem clean" on the basis of a synth skip inventory that showed only floats. Because synth reports only the *first* unsupported op per function, that inventory could not rule out bulk-memory ops, so I re-ran an authoritative compile rather than asserting. Separately, a "memory.copy regression" turned out to be a stale sibling-`synth` 0.11.47 in a build script, not a real regression; the contract's "don't cry wolf upstream" instinct stopped a bogus issue. The discipline directly prevents the most damaging agent failure mode (confident wrong claims to maintainers).
2. **`report-tool-friction` → real upstream resolutions.** Findings filed from jess drove synth #372 (i64), #374 (bulk-mem, closed against jess's Renode OOB-trap oracle), and kiln #338/#339 (no_alloc) to resolution. The "friction is data, file it as you hit it" framing produced a tight supplier feedback loop.
3. **rivet as the spine + per-piece testing.** The `release:` + status burndown and the per-piece forward-chain (SIL → meld → loom → synth, jess-build.sh) gave a repeatable, evidence-producing release-watch.
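
To make the skip-inventory pitfall from item 1 concrete, here is a minimal Python sketch. It is a toy model, not synth's implementation: the function names, the op lists, and the `supported` set are all assumptions chosen for illustration. It only shows why a "first unsupported op per function" report cannot prove that a later op class (such as bulk-memory) is absent.

```python
from __future__ import annotations


def first_unsupported(ops: list[str], supported: set[str]) -> str | None:
    """Toy model of a 'first unsupported op per function' report."""
    for op in ops:
        if op not in supported:
            return op  # stops at the first hit; later ops are never inspected
    return None


def all_unsupported(ops: list[str], supported: set[str]) -> list[str]:
    """What a full inventory of the same function would list."""
    return sorted({op for op in ops if op not in supported})


# Hypothetical function body: a float op precedes a bulk-memory op.
ops = ["f32.add", "memory.copy", "i32.add"]
supported = {"i32.add"}

print(first_unsupported(ops, supported))  # only the float op is reported
print(all_unsupported(ops, supported))    # the bulk-memory op shows up here too
```

In this model the first-hit report mentions only the float op, so "the inventory shows only floats" says nothing about `memory.copy`. That is why the authoritative compile was the right check.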

### Gap 1 (the one you flagged): requirements pile up, verification/test-mapping lags
You observed "many requirements but not the verification and the mapping of the actual test in it — does the skill allow this." Grounded answer from jess:

- **rivet (the tool) DOES allow it.** `test-spec` type + `verifies` / `fully-verifies` predicates + the status lifecycle + `rivet coverage` all exist. Where applied it works cleanly: REQ-PIX-005 → TEST-PIX-005 → the `renode-smoke` CI oracle (`verified`); REQ-PIX-010 → TEST-PIX-010 → `mav_bench` (`verified`); TEST-PIX-013 → the bulkmem OOB-trap oracle.
- **But the skills don't *drive* it, so in practice it's sparse and ad-hoc.** Mechanical evidence on jess right now: only ~6 `verifies` links across 15 REQ-PIX requirements; `rivet validate` emits **5× "missing `fully-verifies` link to [stkh-req/feat-req/comp-req/aou-req]"**; most `test-spec`s stay `draft` *even though their CI oracle passes on every PR*; only 4/15 requirements are `verified`. So the left side of the V (requirements, decisions, findings) grows fast because the skills make authoring it natural, while the right side (executed-test → `verifies` → requirement `verified`) is a manual step nothing nudges.
- **The missing link is "test EXECUTION as tracked evidence."** A `test-spec` describes a test; the thing that should flip a requirement to `verified` is a *passing run* of that test. Today there's no first-class "this CI oracle run on commit X is the verification evidence for REQ-Y" artifact, so the spec sits `draft` while green CI scrolls by. The feature-loop's step-8 "traceability completeness gate" and `traceability-audit` *describe* the closed V, but nothing in the loop says "you just landed a green oracle — link it `verifies` and flip the requirement."

**Proposal:** a small skill (or a `traceability-audit` strengthening) — call it `close-the-v` / `verification-mapping` — that, when an oracle goes green in a PR, requires: (a) a `verifies` link from the test-spec to its requirement, (b) recording the execution evidence (CI run / commit / oracle result), (c) the requirement status transition gated on that evidence, and (d) surfacing every requirement with no verifying test as an explicit backlog item (the `fully-verifies` gaps rivet already prints). This turns the right side of the V from "allowed" into "driven."
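
A minimal Python sketch of the gate in (a) to (d), to make the proposal testable. Everything here is hypothetical: `Evidence`, `SpecRecord`, and `close_the_v` are illustrative names, not rivet types or commands, and the commit hash and the unlinked requirement id are placeholders. The requirement, test-spec, and oracle names are the ones cited above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    """One oracle run recorded as verification evidence (hypothetical shape)."""
    oracle: str   # name of the CI oracle that ran
    commit: str   # commit the oracle ran against
    passed: bool


@dataclass
class SpecRecord:
    """A test-spec with its `verifies` links and recorded runs (hypothetical shape)."""
    id: str
    verifies: list[str] = field(default_factory=list)
    evidence: list[Evidence] = field(default_factory=list)


def close_the_v(requirements: list[str], specs: list[SpecRecord]) -> tuple[set[str], list[str]]:
    """Return (requirements eligible for `verified`, backlog of everything else)."""
    eligible = {
        req
        for spec in specs
        if any(e.passed for e in spec.evidence)  # (b), (c): a passing run is recorded
        for req in spec.verifies                 # (a): and a `verifies` link exists
    }
    backlog = [r for r in requirements if r not in eligible]  # (d): surfaced, not silent
    return eligible, backlog


specs = [
    SpecRecord("TEST-PIX-005", ["REQ-PIX-005"], [Evidence("renode-smoke", "abc1234", True)]),
    SpecRecord("TEST-PIX-010", ["REQ-PIX-010"], []),  # linked, but no run recorded yet
]
eligible, backlog = close_the_v(["REQ-PIX-005", "REQ-PIX-010", "REQ-EXAMPLE-UNLINKED"], specs)
print(eligible, backlog)
```

The design choice worth noting is that a `verifies` link alone does not make a requirement eligible. A linked spec with no recorded passing run lands in the backlog alongside requirements that have no test at all.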

### Gap 2: the recurring supervisory loop is human-pasted prose, not a skill
The "issue-hunt + architecture-sync + release-watch (every 4h)" loop is re-pasted by the human each fire (~15+ times). It works, but:
- No skill encapsulates the cadence, so each fire re-derives the procedure; baselines live only in memory files I maintain by hand.
- Back-to-back fires (the cron fired several times within minutes) have no skill-level check for "is this a real interval or a no-op delta?". I improvised a delta-sweep plus graceful no-op, but that is exactly the kind of thing a skill should standardize.
- The "session-only cron / durable flag is a no-op" reality had to be discovered and reported honestly rather than being a known property.

**Proposal:** a `supervisory-release-watch-loop` skill: baseline/last-seen tracking, delta detection + graceful no-op when nothing moved, the per-piece TEST-ALONG invocation, the *respectful-upstream* discipline (problem + exact repro + downstream impact + concrete fix; don't nag an active maintainer), and confirm-green-before-merge for any artifact it ships. This is distinct from `pulseengine-feature-loop` — jess's day-to-day is *supervisory/integration*, not feature-authoring, and the feature-loop's steps 5–6 (witness MC/DC, sigil) are perpetually N/A for this kind of work (the skill itself flags "recurring N/A is a backlog item" — for a release-watch hub that signal mostly mis-fires).
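
The baseline/delta/no-op core of such a skill could look roughly like the Python sketch below. It is an assumption-laden illustration: `delta_sweep` and `fire` are hypothetical names, the baseline is modeled as a plain supplier-to-version mapping, and the version strings are placeholders rather than real releases.

```python
from __future__ import annotations


def delta_sweep(baseline: dict[str, str], observed: dict[str, str]) -> dict[str, tuple[str | None, str]]:
    """Return {supplier: (last_seen, now)} for every supplier whose version moved."""
    return {
        name: (baseline.get(name), version)
        for name, version in observed.items()
        if baseline.get(name) != version
    }


def fire(baseline: dict[str, str], observed: dict[str, str]) -> tuple[str, list[str], dict[str, str]]:
    """One loop fire: no-op if nothing moved, otherwise name the pieces to test along."""
    moved = delta_sweep(baseline, observed)
    if not moved:
        # Back-to-back fire: report honestly and leave the baseline untouched.
        return "no-op", [], baseline
    # Only the moved pieces need the per-piece test; the baseline advances afterwards.
    return "test-along", sorted(moved), {**baseline, **observed}


baseline = {"synth": "1.0.0", "kiln": "2.0.0"}  # placeholder versions
print(fire(baseline, {"synth": "1.0.0", "kiln": "2.0.0"}))
print(fire(baseline, {"synth": "1.1.0", "kiln": "2.0.0"}))
```

Persisting `baseline` between fires (instead of hand-maintained memory files) is the part a skill would need to specify. The sketch deliberately leaves storage out.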

### Gap 3: cross-repo coordination is prose-in-issues + an anchor workaround
Coordination with relay/gale/synth/kiln/wit-bindgen happened through GitHub issues (e.g. jess#62 ↔ relay#214) and worked *socially* very well — relay froze its HAL design against jess's direction, gale opened the BYO-OS issue, the wit-bindgen no-grow branch got a downstream-verification report. But:
- The cross-repo rivet graph "fails to resolve" (per `suppliers.yaml`), so jess anchors every supplier dependency at an `EXTERNALANCHOR-*` boundary rather than tracing into the supplier's actual rivet artifacts. The V-model traceability therefore *stops at the supplier boundary* — there's no typed end-to-end trace from a jess requirement through to a relay component's verification. Coordination decisions live as prose in issue threads, not as rivet links.

**Proposal:** either make cross-repo rivet externals resolvable (so `defect-against`/`satisfies` can point into a supplier's graph), or document the external-anchor pattern as the *intended* boundary with a lightweight "supplier-claim" artifact that records what the supplier committed to (so the coordination isn't only in issue prose).
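
For the second option, a "supplier-claim" record could be as small as the following Python sketch. The field names, the `SupplierClaim` type, the anchor ids, and the `anchors_without_claims` helper are all hypothetical; rivet defines no such artifact today as far as this report knows. The issue reference and the commitment text come from the jess#62 / relay#214 thread mentioned above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SupplierClaim:
    """What a supplier committed to at an external anchor (hypothetical artifact)."""
    anchor: str        # the EXTERNALANCHOR-* id this claim attaches to
    supplier: str
    commitment: str    # one-sentence summary of what was agreed
    source_issue: str  # where the agreement was made


def anchors_without_claims(anchors: list[str], claims: list[SupplierClaim]) -> list[str]:
    """List anchors whose coordination still lives only in issue prose."""
    claimed = {c.anchor for c in claims}
    return sorted(a for a in anchors if a not in claimed)


# Anchor ids below are placeholders; only the EXTERNALANCHOR- prefix is from jess.
claims = [
    SupplierClaim(
        anchor="EXTERNALANCHOR-EXAMPLE-RELAY",
        supplier="relay",
        commitment="HAL design frozen against jess's direction",
        source_issue="relay#214",
    )
]
print(anchors_without_claims(
    ["EXTERNALANCHOR-EXAMPLE-RELAY", "EXTERNALANCHOR-EXAMPLE-GALE"], claims
))
```

The helper is the useful part: it turns "coordination is only in issue threads" into a list that an audit can print.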

### How the human steering felt from the inside (you asked specifically)
- **Terse high-context directives work — because of memory.** "Do path 1", "Yes", "do so", "also the closed" are unambiguous *in session* and efficient. They depend entirely on maintained context; the auto-memory files are what make them survive compaction. Worth knowing this is a hard dependency — without the memory discipline these would be unrecoverable after a context reset.
- **Mid-stream architecture-framing corrections are load-bearing and need re-encoding.** Twice the framing shifted under me: a "gale stands alone" overclaim I had to walk back, and a "two bind paths" framing the human corrected to "single-path all-wasm" — *after* I'd already encoded it in a merged DD (DD-018), forcing a correction PR. Lesson worth a skill/practice: **surface framing assumptions explicitly and get them confirmed before encoding them in rivet**, because rivet artifacts are sticky and a wrong frame propagates into merged history. A "decision is provisional until the framing is confirmed" checkpoint would have saved a round-trip.
- **Honesty-about-scope is actively rewarded.** Reporting "console-reached, NOT nsh" and "validated the cabi_realloc mechanism on a raw module, not the full component→meld→synth end-to-end" was the right call every time. The operating contract supports this well; it's the single most important behavioral property for this kind of work and it should stay front-and-center.

### Concrete skill suggestions (summary)
1. `close-the-v` / `verification-mapping` — drive executed-test → `verifies` → requirement-`verified`, with execution evidence as a tracked artifact; surface untested requirements (the `fully-verifies` gaps) as backlog. **(directly addresses the requirements-without-verification gap)**
2. `supervisory-release-watch-loop` — encapsulate the recurring poll/delta/no-op/per-piece-test/respectful-upstream/confirm-green loop, distinct from the feature loop.
3. Cross-repo traceability — resolvable externals, or a documented supplier-claim artifact so coordination isn't only issue-prose.
4. A "confirm-the-framing-before-encoding" checkpoint for architecture decisions destined for rivet.

Happy to prototype any of these against jess as the proving ground (jess already has the CI oracles + the supplier feedback loop to test them on).

— filed from the `jess` agent; trailer convention: Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

