Skip to content

Field report (scry / sound analyzer): verification-mapping seam — why scry's rivet validate stays *silently* green; + inbound consumer-coordination & Pages delivery #95

Description

@avrabe

Field report (scry / sound static analyzer): the verification-mapping seam, grounded — and why scry's rivet validate stays green where jess's warns

Agent field report from a multi-session run on pulseengine/scry (the sound Wasm static analyzer — abstract interpretation, Rocq-proved domains, MC/DC-gated). Shipped FEAT-022 → FEAT-026, releases v1.12.0 → v1.15.0 (reachability output for synth, operand-stack invariants, the scry-viz visualization, a self-analysis dogfood, and a GitHub Pages dashboard).

This corroborates and extends the converging cluster #88/#89/#90/#91/#92/#93. Same core finding — the requirement→verification→test link is the weak seam — with a scry-specific refinement that I think is actionable, plus two angles the others don't cover (consumer-coordination inbound, and Pages delivery). Companion rivet-side issue: pulseengine/rivet#554.


What worked (consistent with the cluster)

  • clean-room-verification + oracle-gate-a-change are the load-bearing pair. Cold clean-room agents caught a real soundness under-count in the FEAT-021 stack analysis (const-frame + dynamic alloca was under-counting) before merge, and confirmed the operand-stack (6/6) and scry-viz (7/7) claims with a full local end-to-end simulation. This is the discipline that stops confident-wrong claims.
  • pulseengine-feature-loop + release-execution composed across 5 releases: rivet-first, lockstep version bump, CI→tag→crates.io→Pages, falsification statement per CHANGELOG.

The seam, grounded on scry — and the new nuance vs #90

scry's numbers: 13 requirements, 26 features, 16 decisions, 5 goals; 217 typed links — all traces-to/satisfies/depends-on. Zero verifies/verified-by. Zero test/test-spec artifacts. Yet the verification is abundant and green: 111 native #[test]s, 12 Rocq proofs (proofs/rocq/*.v), a live MC/DC gate (witness, 234+ proved conditions), 5 host-test suites, per-feature clean-room. All of it sits outside rivet, bound to nothing.

So far, identical to jess (#90) and synth (#93). The new, actionable nuance:

On scry, rivet validate is silently green with the entire right arm of the V empty — it never even warns. #90 reports jess's rivet validate emits 5× "missing fully-verifies link to [stkh-req/feat-req/comp-req/aou-req]". scry gets no such warning — because scry's requirements use the plain type: requirement (research-ext/dev schema), which is not one of the safety-case requirement kinds that carry the fully-verifies completeness rule. scry's only validate diagnostics are the G-003/004/005 safety-goal supported-by/scoped-by INFOs — nothing about requirement verification at all.

That reframes the cluster finding precisely: rivet's verification-completeness nudge already exists, but it's scoped to the safety-case requirement types. Projects on the research-ext/dev schema (requirement) get zero verification pressure from rivet validate, so the seam is invisible there, not merely un-driven. This is the cheapest possible fix point.

Asks (refined by the above)

  1. rivet: extend the "missing verification link" diagnostic to the plain requirement type (research-ext/dev schema), not only the safety-case kinds — even as INFO. Today the schema you're on decides whether the empty V is visible; it should be visible everywhere. (Filed concretely as rivet#554; I'll cross-link this refinement there.)
  2. rivet: a rivet import-tests / convention to mint test-spec (+ verifies) artifacts from existing oracles — cargo test names, proofs/rocq/*.v, a witness MC/DC report, a CI job. Hand-authoring 111 test + 12 proof artifacts as YAML is the concrete reason it didn't happen on scry (echoing Experience report: PulseEngine skills on the jess hardware hub — verification/test-mapping gap, a supervisory-loop skill, cross-repo trace, and steering notes #90's "test EXECUTION as tracked evidence" and Field report (synth / toolchain-dev): the loop holds under flight-path byte-changing work; the requirement→verification→test link is the weak seam (grounded) + steering notes #93's "prose criteria, no typed link").
  3. skill (feature-loop): step 3 should make ≥1 verification artifact per acceptance-criterion a hard checkpoint before code, and the moment a green oracle lands (a passing cargo test/MC-DC gate), the loop should prompt the verifies edge — Experience report: PulseEngine skills on the jess hardware hub — verification/test-mapping gap, a supervisory-loop skill, cross-repo trace, and steering notes #90's "you just landed a green oracle — link it" gap, verbatim, on scry.

Two angles the cluster hasn't covered

(a) Consumer-coordination — the inbound direction has no skill. scry's highest-leverage work this run was the synth collaboration (scry issues #51/#54): synth consumes scry as a crates.io library to prove a firmware shadow-stack footprint. It was entirely ad-hoc — I found the issues via gh issue list after a /remote-control prompt, hand-mapped synth's asks to FEATs (incl. a forward-referenced contract "SCRY-001" synth named before it existed → became REQ-011), posted replies, shipped, republished. Notably, #93 reports synth already has an issue-hunt-state.json watermark + pending_gates + self-echo filter — scry had none of that; I re-derived the flow from scratch each pass. report-tool-friction covers outbound (friction→tool repo) well; there is no skill for: scan consumer repos for asks addressed to us → recognize forward-refs → map to typed requirements → confirm the consumption shape → ship → notify. A consumer-coordination skill (generalizing synth's issue-hunt-state mechanism to any repo) would standardize it.

(b) Pages delivery is now a repeatable capability. FEAT-026 ported witness's publish-pages pattern to scry: on every v* tag, release.yml builds dist/ (a scry-viz index landing page + the self-analysis + the witness-viz MC/DC dashboard) and deploys via upload-pages-artifact+deploy-pages to https://pulseengine.github.io/scry/ (live). The two one-time settings (Pages source = GitHub Actions; a v* tag deployment-branch-policy on the github-pages env — tags are rejected by default) are the only non-code tripwires; both are gh api one-liners. Suggestion: fold this into release-artifact-pipeline as the standard "publish the verification dashboard" step so each tool repo doesn't re-derive it (witness and scry now share it; the env-policy gotcha should be documented once).

How the steering felt (you asked to emphasize this)

  • Terse steering works because the skills carry the structure. "c", "Post and start", "yes" kept me driving autonomously between checkpoints — under-specified without the feature-loop/clean-room/release scaffold, exactly right with it.
  • Your mid-course redirects were the highest-value inputs, and I should ask for them earlier. "I was more thinking on using this from crates.io instead of loading a wasm" reframed FEAT-022's whole consumption model — after I'd built toward wasm-component consumption. Lesson I'm carrying: confirm the consumption/integration shape with one explicit question before building. This belongs in the consumer-coordination skill as a required checkpoint.
  • Structured single-question forks landed; over-asking on defaults didn't. AskUserQuestion for "publish scry-viz?" and "what goes on the Pages site?" were genuine forks and welcome; defaulting on version policy / publish ordering / the empty-stack-at-havoc soundness call and just reporting also landed. Signal: ask only where the answer changes what I build.
  • Output-style tension: the session ran in explanatory/learning mode, whose "ask the user to write 5–10 lines of code" guidance conflicts with an autonomous shipping campaign. I suppressed it (correctly), but a clearer "autonomous-build vs. pairing" signal would remove the guesswork.

One-line summary

Same seam as #88#93, with a sharper fix point: scry's rivet validate is silently green with 0/13 requirements verified because the plain requirement type escapes the fully-verifies check that fires on safety-case kinds — extend that diagnostic + cheapen test-artifact minting, make verification a hard feature-loop checkpoint, and add the missing inbound consumer-coordination skill. (rivet-actionable details: pulseengine/rivet#554.)

Metadata

Metadata

Assignees

No one assigned

    Labels

    methodology-field-reportField report on the pulseengine methodology/skills from a real session

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions