Experience report: pulseengine-claude skills in a long multi-repo session (issue-hunt, /loop, cross-repo coordination, requirement→test visibility, instruction style)

## Experience report — PulseEngine Claude skills in a long multi-repo session

Filed at the maintainer's request: an honest account of how the
`pulseengine-claude` skills (v0.8.0) performed across a multi-day,
multi-repo session driving meld, with attention to **issue-hunt**, the
**`/loop`** self-pacing, **cross-repo coordination**, the
**requirements→verification→test mapping**, and — explicitly requested —
**how the human instructs/adjusts the agent**. Grounded in one concrete
arc: meld#298 (drop vestigial `cabi_realloc` to unblock `--memory shared`)
→ meld#300 (commit the inter-component isolation model) and the
coordination with the wit-bindgen fork / gale / synth around it.

### TL;DR
- The **standing-contract memory** (operating-contract, philosophy, oracle-gating, "hallucinations cost more than silence", Tier-5-Mythos-before-merge) is the strongest part — it reliably stopped me from shipping unsound code and kept claims tied to tool output. Keep it.
- **issue-hunt** and **`/loop`** compose well for *bounded* work but have no good *termination/parking* story when work becomes externally gated — the loop keeps firing with nothing to do, which pressures toward make-work.
- **Cross-repo coordination has no skill.** The most consequential work this session (meld#300, an ASIL-D isolation-model decision spanning meld/gale/synth) was done ad-hoc. This is the clearest gap.
- The **requirements→verification→test mapping exists and is complete in coverage** (all 41 SRs), but is **invisible** (separate file from the requirements) and **honestly partial** (24/41 `status: partial`). The skill *allows* it; the gap is *surfacing* and *completion*, not capability.

---

### 1. What worked well
- **Memory/situational-awareness hooks**: session-start situational awareness + working-context checkpoints survived compaction cleanly. After a context summary I resumed the #298 design (the rewriter.rs coupling, the over-drop hazard) without re-deriving it. High value.
- **operating-contract as a guardrail**: "ground every claim in a tool result", "never merge around a red/absent gate", "fail-safe toward keeping". Concretely: I *paused* the corruption-critical #298 drop wiring rather than rush it autonomously — the contract made that the obvious call, not a judgment I had to invent.
- **oracle-gate-a-change**: the "write the failing oracle first" discipline produced the most durable artifacts of the session (the #298 kill-criterion + real-artifact blocker tests) before any risky code.
- **clean-room / Mythos Tier-5 gate**: knowing "this can't merge without a clean-room pass" kept me from treating a green local test as done.

### 2. issue-hunt — how it works and how it felt
The watermark model (per-repo `.json`, "pull what changed since last pass, triage, land in rivet, advance watermark only on success") is sound and the anti-patterns list is good. Friction:
- **No convergence when the tracker is quiet.** A pass that finds "only the issues I already worked" has no defined action other than "park". The skill says the exit condition is *a release*, but in a quiet window with an in-flight decision (meld#300) there's no release to cut and no new issues — the skill doesn't describe what a *no-op pass* should do, so the loop tends to manufacture activity. A "nothing actionable; park and report why" branch would help.
- **Triage→rivet friction**: landing an architecture *decision* (meld#300) didn't fit the requirement/bug/optimization taxonomy cleanly. I recorded it as an **ADR** (safety/adr/ADR-4) instead — which felt right, but the skill points at rivet requirements, not ADRs, for decisions. The decision-vs-requirement distinction could be explicit.

### 3. `/loop` (dynamic mode) — how it felt to self-pace
- **Cache-window guidance is good** and I used it (270s vs 1200–1800s reasoning). But there's a **termination gap**: the loop is built to continue, and "stop when work is genuinely done/gated" requires me to override the momentum. Several iterations this session were honestly *parked* (waiting on a human decision + another repo), and the right move was a long fallback + "no make-work" — but nothing in the skill blesses *parking*, so it reads as under-performing the loop.
- **Bare-token ambiguity**: the human typed `300` mid-loop. I parsed it as a 300s interval (the loop's rule-1 grammar primes that reading); they meant **issue #300**. The loop's leading-token rule (`^\d+[smhd]$`) made "a bare number = time" the salient interpretation. Suggestion: when a bare token is ambiguous (no unit, could be an issue ref in an active issue context), **ask** rather than assume — or the loop skill could note that bare integers are ambiguous.

### 4. Cross-repo coordination — the biggest gap
meld#300 was a genuine multi-repo architecture decision: synth (maintainer avrabe) asked meld to commit to an isolation model (single shared memory + MPU-carve **vs** multi-memory structural isolation) because gale#86's design forked on it. Resolving it meant: reading the cross-repo context, grounding meld's actual capability (it already supports both via `MemoryStrategy`), surfacing the decision to the human, posting a committed answer, and propagating consequences (reclassifying meld#298/#299 secondary, recording ADR-4, noting the synth#369/#404 follow-up).

**None of the 13 skills covers this.** issue-hunt and release-planning are single-repo. There's no skill for: detect a cross-repo coordination issue → gather each repo's real state → identify the decision that's actually mine vs theirs → surface it → record the commitment with traceability → propagate to the dependent issues/repos. This is recurring in this toolchain (the rivet.yaml `externals:` block — kiln/synth/etc. — shows cross-repo trace is a first-class concern) and was the highest-stakes work of the session, done entirely ad-hoc.

**Proposed skill: `cross-repo-coordination`** (or `coordinate-decision`). Inputs: a coordination issue + the `externals` map. Steps: (1) ground each repo's real state (don't take the issue's framing at face value — I had to correct an overstated gap this session); (2) classify each open sub-decision as *ours / theirs / joint*; (3) for ours, surface to the human (it's usually a judgment call with safety implications); (4) record the commitment as an ADR + propagate (relabel dependent issues, cross-link); (5) hand the rest back with a crisp "what we need from you". This would have structured exactly what I did manually.

### 5. Requirements → verification → test mapping (the maintainer's specific concern)
The concern was: *"I see many requirements but not the verification and the mapping of the actual test — does the skill allow this?"*

**It does, and it's already there — it's just not visible from where you read requirements.** Grounded in meld today:
- `safety/requirements/safety-requirements.yaml` — each SR carries `verification-method` (test | proof) + `verification-description`.
- `safety/requirements/traceability.yaml` → `verification-status:` — **all 41 SRs** map to `implementation-files`, `tests` (fully-qualified test names), `proofs`, and a `status`.
- `traceability-audit` (skill) + `release-execution`'s "traceability completeness gate" consume this; `rivet` is a real binary and there's a `verification-gate.yml` CI workflow.

So capability is **not** the gap. Two real gaps explain the "I don't see it" perception:
1. **Discoverability / co-location.** The SR→test mapping lives in a *different file* (`traceability.yaml`) from the SR text (`safety-requirements.yaml`). Reading the requirements file, you see requirements with no tests beside them. A rendered view (or a `rivet` command) that shows, per requirement: *text · verification-method · mapped test names · status* inline would close this immediately. **Proposed**: a small skill/command `traceability-render` (or extend `traceability-audit` with a human-facing report) that emits the V per requirement.
2. **Honesty of completion.** Of 41 SRs, **24 are `status: partial`** (e.g. "test exists; no count-based completeness verification"), 10 `verified`, 6 `implemented`, 4 `resolved`, 1 `not-verified`. So for most requirements the test binding is real but *weak* — which is honest, but means "the mapping" a reader wants (requirement ⇒ a test that actually discharges it) genuinely is incomplete for the majority. The skill should make `partial` loud, because it's the true state and the thing to burn down.

### 6. How you instruct/adjust me — what works, what's ambiguous (explicitly requested)
Candid, since you asked:
- **Standing directives work best.** "concentrate on meld", "Tier-5 needs Mythos before merge", "fail-safe toward keeping", "hallucinations cost more than silence" — these are durable and I applied them across many turns without re-prompting. More of these up front > per-step correction.
- **Per-step `yes` / "do it as you said" is efficient but compounds scope ambiguity.** "Start 298 *and* integration/embedded-rt-no-grow" bundled two tasks; I had to decide ordering and how far to take each. When an instruction bundles work, an explicit "do X first, then Y" or "X is the priority" saves me guessing.
- **Mid-course corrections were high-value and I'd welcome more, earlier.** "we partially do already together with gale and jess" corrected an overstated gap; "concentrate on meld" pulled me back from fork drift. These caught real drift. A periodic "is your current scope still right?" from me (or a skill that re-confirms scope each loop iteration) would catch it sooner.
- **One concrete miscommunication**: `300` (issue ref) read as `300` (seconds) — see §3. Bare tokens mid-loop are ambiguous; I should ask, and/or the loop grammar should flag it.
- **What would help me serve you better**: (a) when you start a loop, a one-line "definition of done / when to stop" so parking is unambiguous; (b) priority ordering when bundling tasks; (c) treat my AskUserQuestion as the place to put the *decision* (the `Auto`-default policy question from #300 is still open) — a couple of pending decisions accumulate because the loop keeps moving.

### Concrete asks out of this report
1. Add a **`cross-repo-coordination`** skill (§4).
2. Add a human-facing **traceability render** (or extend `traceability-audit`) so requirement→test→status is visible co-located with requirements (§5.1); and make `partial` prominent (§5.2).
3. Give `/loop` + issue-hunt an explicit **park / no-actionable-work** branch and a **definition-of-done** input, so a gated/quiet loop stops cleanly instead of pressuring toward make-work (§2, §3).
4. Disambiguate **bare integer tokens** in `/loop` (interval vs issue ref) — ask, don't assume (§3, §6).

Happy to turn any of these into PRs against `claude-tooling/plugins/pulseengine-claude`.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Experience report: pulseengine-claude skills in a long multi-repo session (issue-hunt, /loop, cross-repo coordination, requirement→test visibility, instruction style) #91

Experience report — PulseEngine Claude skills in a long multi-repo session

TL;DR

1. What worked well

2. issue-hunt — how it works and how it felt

3. `/loop` (dynamic mode) — how it felt to self-pace

4. Cross-repo coordination — the biggest gap

5. Requirements → verification → test mapping (the maintainer's specific concern)

6. How you instruct/adjust me — what works, what's ambiguous (explicitly requested)

Concrete asks out of this report

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Experience report: pulseengine-claude skills in a long multi-repo session (issue-hunt, /loop, cross-repo coordination, requirement→test visibility, instruction style) #91

Description

Experience report — PulseEngine Claude skills in a long multi-repo session

TL;DR

1. What worked well

2. issue-hunt — how it works and how it felt

3. /loop (dynamic mode) — how it felt to self-pace

4. Cross-repo coordination — the biggest gap

5. Requirements → verification → test mapping (the maintainer's specific concern)

6. How you instruct/adjust me — what works, what's ambiguous (explicitly requested)

Concrete asks out of this report

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

3. `/loop` (dynamic mode) — how it felt to self-pace