diff --git a/docs/02-injection-pipeline.md b/docs/02-injection-pipeline.md index 7ecc1f1..6522a65 100644 --- a/docs/02-injection-pipeline.md +++ b/docs/02-injection-pipeline.md @@ -30,6 +30,7 @@ This document explains how vercel-plugin decides **which skills to inject**, **w - [Vercel.json Key-Aware Routing](#verceljson-key-aware-routing) - [Profiler Boost](#profiler-boost) - [Setup Mode Routing](#setup-mode-routing) + - [Route-Scoped Verified Policy Recall](#route-scoped-verified-policy-recall) - [Unified Ranker](#unified-ranker) 7. [Dedup State Machine](#dedup-state-machine) - [Three State Sources](#three-state-sources) @@ -481,6 +482,61 @@ effectivePriority = base + vercelJsonAdjustment + profilerBoost When the project is greenfield (`VERCEL_PLUGIN_GREENFIELD=true`), the `bootstrap` skill gets a massive priority boost of **+50**, ensuring it's always injected first. If `bootstrap` didn't naturally match the tool call, it's synthetically added to the match set. +### Route-Scoped Verified Policy Recall + +**Source**: `hooks/src/policy-recall.mts` → `selectPolicyRecallCandidates()`, integrated in `pretooluse-skill-inject.mts` at Stage 4.95 + +After all pattern-matched skills are ranked and before injection, the hook checks whether an **active verification story** with a non-null `targetBoundary` exists. If so, it queries the project's routing policy for historically winning skills that pattern matching missed. + +**Preconditions** (all must be true): +1. `cwd` and `sessionId` are available +2. An active verification story exists (via `loadCachedPlanResult` → `selectPrimaryStory`) +3. `primaryNextAction.targetBoundary` is non-null + +**Lookup precedence** (first qualifying bucket wins — no cross-bucket merging): +1. **Exact route** — e.g. `PreToolUse|flow-verification|clientRequest|Bash|/settings` +2. **Wildcard route** — e.g. `PreToolUse|flow-verification|clientRequest|Bash|*` +3. **Legacy 4-part key** — e.g. 
`PreToolUse|flow-verification|clientRequest|Bash` + +**Qualification thresholds** (same conservatism as `derivePolicyBoost`): +- Minimum 3 exposures +- Minimum 65% success rate (weighted: `directiveWins` count at 0.25×) +- Minimum +2 policy boost + +**Tie-breaking** is deterministic: `recallScore` DESC → `exposures` DESC → skill name ASC (lexicographic). + +**Insertion behavior** — recalled skills are **bounded second-order candidates**, not slot-1 overrides: +- When direct pattern matches exist: recalled skill inserts at index 1 (behind the top direct match) +- When no direct matches exist: recalled skill takes index 0 +- At most 1 recalled skill per PreToolUse invocation (`maxCandidates: 1`) +- Skills already in `rankedSkills` or `injectedSkills` (dedup) are excluded + +**How recall differs from ordinary policy boosts**: +- Policy boosts adjust `effectivePriority` of already-matched skills — they only amplify what pattern matching found +- Policy recall **injects a skill that pattern matching missed entirely**, based on historical verification evidence +- Recalled skills are marked `synthetic: true` in the routing decision trace +- Recalled skills use `trigger: "policy-recall"` and `reasonCode: "route-scoped-verified-policy-recall"` in injection metadata +- Recalled skills are NOT forced to summary-only mode (the summary and full payloads are identical via `skillInvocationMessage`) + +**Trace output**: Recalled candidates appear in `ranked[]` with: +```json +{ + "skill": "verification", + "synthetic": true, + "pattern": { + "type": "policy-recall", + "value": "route-scoped-verified-policy-recall" + } +} +``` + +**When recall is skipped**, a `policy-recall-skipped` log line is emitted with reason `no_active_verification_story` or `no_target_boundary`. 
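The lookup precedence and tie-breaking rules can be illustrated with a small, self-contained sketch. It is not the plugin's `selectPolicyRecallCandidates()`: the function names `scenarioKeys` and `selectRecall`, and the `policy[scenarioKey][skill] = { qualified, recallScore, exposures }` shape, are hypothetical, and threshold checks are collapsed into a precomputed `qualified` flag. It shows the two properties that matter here: exclusion and qualification happen per bucket, and the first bucket with any survivor wins outright, even when a later bucket holds a higher-scoring skill.

```javascript
// Illustrative sketch only: names and the `policy` shape are assumptions,
// not the plugin's real API.
// policy[scenarioKey][skill] = { qualified, recallScore, exposures }

// Candidate keys in precedence order: exact route, wildcard route, legacy 4-part key.
function scenarioKeys(baseKey, route) {
  const keys = [];
  if (route && route !== "*") keys.push(`${baseKey}|${route}`);
  keys.push(`${baseKey}|*`);
  keys.push(baseKey);
  return keys;
}

// Return the top skill(s) from the FIRST bucket that has any qualifying,
// non-excluded entry. Later buckets are never merged in.
function selectRecall(policy, baseKey, route, excludeSkills, maxCandidates = 1) {
  const checked = []; // keys inspected before a bucket qualified
  for (const key of scenarioKeys(baseKey, route)) {
    checked.push(key);
    const bucket = policy[key];
    if (!bucket) continue;
    const skills = Object.entries(bucket)
      .filter(([skill, s]) => s.qualified && !excludeSkills.has(skill))
      .sort(
        ([nameA, a], [nameB, b]) =>
          b.recallScore - a.recallScore || // recallScore DESC
          b.exposures - a.exposures || // exposures DESC
          nameA.localeCompare(nameB) // skill name ASC
      )
      .slice(0, maxCandidates)
      .map(([skill]) => skill);
    if (skills.length > 0) return { selectedBucket: key, skills, checked };
  }
  return { selectedBucket: null, skills: [], checked };
}

// Example: no exact-route bucket exists, so the wildcard bucket wins and the
// higher-scoring legacy entry is ignored (no cross-bucket merging).
const base = "PreToolUse|flow-verification|clientRequest|Bash";
const policy = {
  [`${base}|*`]: {
    verification: { qualified: true, recallScore: 3818, exposures: 4 },
    "agent-browser-verify": { qualified: true, recallScore: 3818, exposures: 4 },
  },
  [base]: { "other-skill": { qualified: true, recallScore: 9000, exposures: 9 } },
};
const result = selectRecall(policy, base, "/settings", new Set());
// result.skills -> ["agent-browser-verify"] (score and exposures tie, name ASC decides)
```

Passing the already-ranked and already-injected skills as `excludeSkills` mirrors the dedup rule above. If every entry in the wildcard bucket is excluded, the sketch falls through to the legacy key.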
+ + **Observability**: + - `policy-recall-lookup` is emitted before any recalled skill is inserted + - It includes `requestedScenario`, `checkedScenarios[]`, `selectedBucket`, `selectedSkills[]`, `rejected[]`, and `hintCodes[]` + - This is the canonical machine-readable explanation for why route-scoped recall did or did not fire + ### Unified Ranker **Source**: `patterns.mts:rankEntries()` diff --git a/docs/06-runtime-internals.md b/docs/06-runtime-internals.md index 551fe1b..3a6ce15 100644 --- a/docs/06-runtime-internals.md +++ b/docs/06-runtime-internals.md @@ -28,6 +28,7 @@ This document covers implementation details that go beyond the pipeline overview - [Profiler Boost (+5)](#profiler-boost-5) - [Vercel.json Key Routing (±10)](#verceljson-key-routing-10) - [Special-Case Boosts](#special-case-boosts) + - [Route-Scoped Verified Policy Recall](#route-scoped-verified-policy-recall) - [Ranking Function](#ranking-function) - [Budget Enforcement](#budget-enforcement) - [Prompt Signal Scoring](#prompt-signal-scoring) @@ -507,6 +508,7 @@ Every matched skill receives an **effective priority** computed from its base pr | Setup-mode bootstrap | **+50** | `bootstrap` skill | Greenfield or ≥3 bootstrap hints | | TSX review trigger | **+40** | `react-best-practices` | After N `.tsx` edits (default 3) | | Dev-server verify | **+45** | `agent-browser-verify` | Dev server command detected | +| Policy recall | *splice at idx 1 (idx 0 if no direct match)* | Any verified skill | Active story + target boundary + policy evidence | ### Base Priority Range (4–8) @@ -556,6 +558,79 @@ If the skill's associated key **exists** in `vercel.json` → +10. If the skill - **Dev-server verify (+45)**: On `npm run dev`, `next dev`, `vercel dev`, etc., injects `agent-browser-verify` + `verification` companion. Capped at 2 injections per session (loop guard). - **Vercel env help**: One-time injection when `vercel env add/update/pull` commands are detected.
+### Route-Scoped Verified Policy Recall + +**Source**: `hooks/src/policy-recall.mts` → `selectPolicyRecallCandidates()` + +Policy recall is a **post-ranking injection stage** (Stage 4.95) that fires between ranking and skill body loading. It is fundamentally different from policy boosts: + +| Aspect | Policy Boost | Policy Recall | +|--------|-------------|---------------| +| Input | Skill already matched by patterns | Skill **not** matched by patterns | +| Effect | Adjusts `effectivePriority` | Splices skill into `rankedSkills` array | +| Trace field | `policyBoost` (number) | `synthetic: true`, `pattern.type: "policy-recall"` | +| Reason code | `"policy-boost"` | `"route-scoped-verified-policy-recall"` | +| Trigger | Always (when policy data exists) | Only when active verification story + target boundary exist | + +**Selector algorithm** (`selectPolicyRecallCandidates`): + +1. Generate scenario key candidates via `scenarioKeyCandidates()` — exact route, wildcard (`*`), legacy 4-part key +2. For each candidate key (in precedence order), look up the policy bucket +3. Filter entries: `exposures >= 3`, `successRate >= 0.65`, `policyBoost >= 2`, not in `excludeSkills` +4. Sort: `recallScore` DESC → `exposures` DESC → `skill` ASC +5. Return first qualifying bucket's top `maxCandidates` entries (default 1) — no cross-bucket merging + +**Recall score formula**: +``` +recallScore = derivePolicyBoost(stats) × 1000 + + round(successRate × 100) × 10 + + directiveWins × 5 + + wins + − staleMisses +``` + +Where `successRate = (wins + directiveWins × 0.25) / max(exposures, 1)`. + +**Insertion semantics**: The recalled skill is spliced at `index = rankedSkills.length > 0 ? 1 : 0`, ensuring it never preempts the strongest direct match. It then flows through normal budget enforcement and cap logic. 
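The recall-score arithmetic and the insertion rule can be checked with a short sketch. The helper names (`successRate`, `qualifiesForRecall`, `recallScore`, `spliceRecalled`) are hypothetical. The policy boost is passed in as a number because `derivePolicyBoost()` is not reproduced here. The formula and thresholds are transcribed from the text above.

```javascript
// Sketch of the documented recall-score formula and splice rule.
// Helper names are illustrative; policyBoost stands in for derivePolicyBoost(stats).

// successRate = (wins + directiveWins * 0.25) / max(exposures, 1)
function successRate({ wins, directiveWins, exposures }) {
  return (wins + directiveWins * 0.25) / Math.max(exposures, 1);
}

// Qualification thresholds: >= 3 exposures, >= 65% success, boost >= 2.
function qualifiesForRecall(stats, policyBoost) {
  return stats.exposures >= 3 && successRate(stats) >= 0.65 && policyBoost >= 2;
}

function recallScore(stats, policyBoost) {
  return (
    policyBoost * 1000 +
    Math.round(successRate(stats) * 100) * 10 +
    stats.directiveWins * 5 +
    stats.wins -
    stats.staleMisses
  );
}

// Insert at index 1 (behind the top direct match), or at 0 when nothing matched.
// Returns a new array and leaves the input untouched.
function spliceRecalled(rankedSkills, skill) {
  const insertionIndex = rankedSkills.length > 0 ? 1 : 0;
  const next = [...rankedSkills];
  next.splice(insertionIndex, 0, skill);
  return { rankedSkills: next, insertionIndex };
}

// Worked example:
// successRate = (3 + 1 * 0.25) / 4 = 0.8125 -> round(81.25) = 81 -> 810
// recallScore with policyBoost 3 = 3000 + 810 + 5 + 3 - 0 = 3818
const stats = { exposures: 4, wins: 3, directiveWins: 1, staleMisses: 0 };
```

The `× 1000` weight means the policy boost dominates the ordering. Success rate, directive wins, and raw wins only separate skills with equal boosts.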
+ +**Synthetic trace marking**: All recalled skills are added to the `syntheticSkills` set and appear in the routing decision trace with: +```json +{ + "skill": "", + "synthetic": true, + "pattern": { "type": "policy-recall", "value": "route-scoped-verified-policy-recall" }, + "summaryOnly": false +} +``` + +**Log events**: +- `policy-recall-injected` (debug): Emitted per recalled skill with `skill`, `scenario`, `insertionIndex`, `exposures`, `wins`, `directiveWins`, `successRate`, `policyBoost`, `recallScore` +- `policy-recall-skipped` (debug): Emitted when preconditions fail, with `reason`: `"no_active_verification_story"` or `"no_target_boundary"` +- `policy-recall-lookup` (debug): Emitted before any recalled skill is inserted, with `requestedScenario`, `checkedScenarios[]`, `selectedBucket`, `selectedSkills[]`, `rejected[]`, and `hintCodes[]` + +### Routing Doctor (`session-explain --json`) + +`session-explain` includes an additive `doctor` object that explains the latest routing decision without changing routing behavior. + +```json +{ + "doctor": { + "latestDecisionId": "abc123", + "latestScenario": "PreToolUse|flow-verification|clientRequest|Bash|/settings", + "latestRanked": [], + "policyRecall": { + "selectedBucket": "PreToolUse|flow-verification|clientRequest|Bash|/settings", + "selected": [], + "rejected": [], + "hints": [] + }, + "hints": [] + } +} +``` + +The contract is additive-only and intended for downstream agents, CI diagnostics, and local operator debugging. 
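Because the `doctor` contract is additive-only, downstream consumers should tolerate missing fields rather than assume a fixed schema. Below is a minimal sketch of such a consumer, for example a CI step that reports whether recall fired. `summarizeDoctor` is a hypothetical name. The sketch relies only on the field names shown in the example above and treats entries of `selected`, `rejected`, and `hints` as opaque values, since their element shape is not documented here.

```javascript
// Hypothetical consumer of `session-explain --json` output.
// Accepts either the raw JSON string or an already-parsed object.
function summarizeDoctor(explainOutput) {
  const report =
    typeof explainOutput === "string" ? JSON.parse(explainOutput) : explainOutput;
  const doctor = report?.doctor;
  // Additive contract: older output may not carry `doctor` at all.
  if (!doctor) return { hasDoctor: false };
  const recall = doctor.policyRecall ?? {};
  const selected = Array.isArray(recall.selected) ? recall.selected : [];
  const rejected = Array.isArray(recall.rejected) ? recall.rejected : [];
  return {
    hasDoctor: true,
    decisionId: doctor.latestDecisionId ?? null,
    scenario: doctor.latestScenario ?? null,
    recallBucket: recall.selectedBucket ?? null,
    recallFired: selected.length > 0,
    rejectedCount: rejected.length,
    hintCount: Array.isArray(doctor.hints) ? doctor.hints.length : 0,
  };
}
```

Feeding it the example object above yields `recallFired: false` with a non-null `recallBucket`. That combination means a bucket was identified but no candidate was selected.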
+ ### Ranking Function **Source**: `hooks/src/patterns.mts` → `rankEntries()` diff --git a/docs/skill-injection.md b/docs/skill-injection.md index f813a3b..564307f 100644 --- a/docs/skill-injection.md +++ b/docs/skill-injection.md @@ -764,3 +764,64 @@ This catches `cookies()` calls without `await`, but skips client components (whi | `VERCEL_PLUGIN_LOG_LEVEL` | `off` | — | `off` / `summary` / `debug` / `trace` | | `VERCEL_PLUGIN_HOOK_DEDUP` | — | — | `off` to disable dedup entirely | | `VERCEL_PLUGIN_AUDIT_LOG_FILE` | — | — | Audit log path or `off` | + +--- + +## Learned Routing Rulebook & Capsule Provenance + +When the routing-policy compiler promotes verified rules into a **Learned Routing Rulebook**, the ranking pipeline can apply per-rule boosts at injection time. Every decision capsule records which rule (if any) fired via the `rulebookProvenance` field, so downstream consumers never need to re-derive ranking state. + +### Canonical Rulebook JSON + +```json +{ + "version": 1, + "createdAt": "2026-03-28T08:15:00.000Z", + "sessionId": "sess_123", + "rules": [ + { + "id": "PreToolUse|flow-verification|uiRender|Bash|agent-browser-verify", + "scenario": "PreToolUse|flow-verification|uiRender|Bash", + "skill": "agent-browser-verify", + "action": "promote", + "boost": 8, + "confidence": 0.93, + "reason": "replay verified: no regressions, learned routing matched winning skill", + "sourceSessionId": "sess_123", + "promotedAt": "2026-03-28T08:15:00.000Z", + "evidence": { + "baselineWins": 4, + "baselineDirectiveWins": 2, + "learnedWins": 4, + "learnedDirectiveWins": 2, + "regressionCount": 0 + } + } + ] +} +``` + +### Decision Capsule Provenance + +When a rulebook rule fires, the capsule includes: + +```json +{ + "rulebookProvenance": { + "matchedRuleId": "PreToolUse|flow-verification|uiRender|Bash|agent-browser-verify", + "ruleBoost": 8, + "ruleReason": "replay verified: no regressions, learned routing matched winning skill", + "rulebookPath": 
"/tmp/vercel-plugin-routing-policy--rulebook.json" + } +} +``` + +When no rule fires, the field is `null`: + +```json +{ + "rulebookProvenance": null +} +``` + +Each ranked entry in the capsule's `ranked` array also carries the per-skill fields `matchedRuleId`, `ruleBoost`, `ruleReason`, and `rulebookPath` for full traceability. diff --git a/generated/build-from-skills.manifest.json b/generated/build-from-skills.manifest.json index dcadd99..460895f 100644 --- a/generated/build-from-skills.manifest.json +++ b/generated/build-from-skills.manifest.json @@ -1,6 +1,6 @@ { "version": 1, - "generatedAt": "2026-03-23T18:09:18.986Z", + "generatedAt": "2026-03-28T19:42:43.133Z", "templates": [ { "template": "agents/ai-architect.md.tmpl", diff --git a/generated/skill-catalog.md b/generated/skill-catalog.md index 3c2632f..2d98f28 100644 --- a/generated/skill-catalog.md +++ b/generated/skill-catalog.md @@ -1,7 +1,7 @@ # Skill Catalog > Auto-generated by `scripts/generate-catalog.ts` — do not edit manually. 
-> Generated: 2026-03-23T18:07:35.703Z +> Generated: 2026-03-28T00:24:28.407Z > Skills: 39 ## Table of Contents diff --git a/generated/skill-manifest.json b/generated/skill-manifest.json index 8b34ceb..601375f 100644 --- a/generated/skill-manifest.json +++ b/generated/skill-manifest.json @@ -1,6 +1,12 @@ { - "generatedAt": "2026-03-23T18:09:21.758Z", + "generatedAt": "2026-03-28T19:42:43.059Z", "version": 2, + "excludedSkills": [ + { + "slug": "fake-banned-test-skill", + "reason": "test-only-pattern" + } + ], "skills": { "vercel-agent": { "priority": 4, diff --git a/hooks/cli-routing-replay.mjs b/hooks/cli-routing-replay.mjs new file mode 100644 index 0000000..e950d82 --- /dev/null +++ b/hooks/cli-routing-replay.mjs @@ -0,0 +1,33 @@ +// hooks/src/cli-routing-replay.mts +import { replayRoutingSession } from "./routing-replay.mjs"; +import { createLogger } from "./logger.mjs"; +var log = createLogger(); +var sessionId = process.argv[2]; +if (!sessionId) { + log.summary("cli_error", { reason: "missing_session_id" }); + process.stderr.write( + JSON.stringify({ + ok: false, + error: "missing_session_id", + usage: "node cli-routing-replay.mjs " + }) + "\n" + ); + process.exit(1); +} +try { + const report = replayRoutingSession(sessionId); + log.summary("cli_complete", { + sessionId, + traceCount: report.traceCount, + scenarioCount: report.scenarioCount, + recommendationCount: report.recommendations.length + }); + process.stdout.write(JSON.stringify(report, null, 2) + "\n"); +} catch (err) { + const message = err instanceof Error ? 
err.message : String(err); + log.summary("cli_error", { reason: "replay_failed", message }); + process.stderr.write( + JSON.stringify({ ok: false, error: "replay_failed", message }) + "\n" + ); + process.exit(2); +} diff --git a/hooks/companion-distillation.mjs b/hooks/companion-distillation.mjs new file mode 100644 index 0000000..c4da194 --- /dev/null +++ b/hooks/companion-distillation.mjs @@ -0,0 +1,193 @@ +// hooks/src/companion-distillation.mts +import { + createEmptyCompanionRulebook +} from "./learned-companion-rulebook.mjs"; +import { createLogger } from "./logger.mjs"; +function precision(wins, support) { + return support === 0 ? 0 : wins / support; +} +function round4(value) { + return Number(value.toFixed(4)); +} +function distillCompanionRules(params) { + const log = createLogger(); + const generatedAt = params.generatedAt ?? (/* @__PURE__ */ new Date()).toISOString(); + const minSupport = params.minSupport ?? 4; + const minPrecision = params.minPrecision ?? 0.75; + const minLift = params.minLift ?? 1.25; + const maxStaleMissDelta = params.maxStaleMissDelta ?? 0.1; + log.summary("companion-distillation.start", { + exposureCount: params.exposures.length, + traceCount: params.traces.length, + minSupport, + minPrecision, + minLift, + maxStaleMissDelta + }); + const rulebook = createEmptyCompanionRulebook( + params.projectRoot, + generatedAt + ); + const byGroup = /* @__PURE__ */ new Map(); + for (const exposure of params.exposures) { + if (!exposure.exposureGroupId) continue; + const list = byGroup.get(exposure.exposureGroupId) ?? 
[]; + list.push(exposure); + byGroup.set(exposure.exposureGroupId, list); + } + log.summary("companion-distillation.grouped", { + groupCount: byGroup.size, + skippedNoGroupId: params.exposures.filter((e) => !e.exposureGroupId).length + }); + const pairBuckets = /* @__PURE__ */ new Map(); + const candidateBaseline = /* @__PURE__ */ new Map(); + for (const [groupId, group] of byGroup) { + const candidate = group.find((e) => e.attributionRole === "candidate"); + if (!candidate) continue; + const outcome = candidate.outcome; + const scenario = [ + candidate.hook, + candidate.storyKind ?? "none", + candidate.targetBoundary ?? "none", + candidate.toolName, + candidate.route ?? "*" + ].join("|"); + const baselineKey = `${scenario}::${candidate.skill}`; + const baseline = candidateBaseline.get(baselineKey) ?? { + support: 0, + wins: 0, + staleMisses: 0 + }; + baseline.support += 1; + if (outcome === "win" || outcome === "directive-win") baseline.wins += 1; + if (outcome === "stale-miss") baseline.staleMisses += 1; + candidateBaseline.set(baselineKey, baseline); + for (const context of group.filter( + (e) => e.attributionRole === "context" + )) { + const key = `${scenario}::${candidate.skill}::${context.skill}`; + const bucket = pairBuckets.get(key) ?? 
{ + scenario, + hook: candidate.hook, + storyKind: candidate.storyKind, + targetBoundary: candidate.targetBoundary, + toolName: candidate.toolName, + routeScope: candidate.route, + candidateSkill: candidate.skill, + companionSkill: context.skill, + support: 0, + winsWithCompanion: 0, + directiveWinsWithCompanion: 0, + staleMissesWithCompanion: 0, + sourceExposureGroupIds: [] + }; + bucket.support += 1; + if (outcome === "win" || outcome === "directive-win") + bucket.winsWithCompanion += 1; + if (outcome === "directive-win") bucket.directiveWinsWithCompanion += 1; + if (outcome === "stale-miss") bucket.staleMissesWithCompanion += 1; + bucket.sourceExposureGroupIds.push(groupId); + pairBuckets.set(key, bucket); + } + } + log.summary("companion-distillation.buckets", { + pairBucketCount: pairBuckets.size, + baselineCount: candidateBaseline.size + }); + const rules = []; + for (const bucket of pairBuckets.values()) { + const baseline = candidateBaseline.get( + `${bucket.scenario}::${bucket.candidateSkill}` + ); + if (!baseline) continue; + const winsWithoutCompanion = Math.max( + baseline.wins - bucket.winsWithCompanion, + 0 + ); + const supportWithoutCompanion = Math.max( + baseline.support - bucket.support, + 0 + ); + const precisionWithCompanion = precision( + bucket.winsWithCompanion, + bucket.support + ); + const baselinePrecisionWithoutCompanion = precision( + winsWithoutCompanion, + supportWithoutCompanion + ); + const liftVsCandidateAlone = baselinePrecisionWithoutCompanion === 0 ? 
precisionWithCompanion : precisionWithCompanion / baselinePrecisionWithoutCompanion; + const staleRateWithCompanion = precision( + bucket.staleMissesWithCompanion, + bucket.support + ); + const staleRateWithoutCompanion = precision( + Math.max(baseline.staleMisses - bucket.staleMissesWithCompanion, 0), + supportWithoutCompanion + ); + const staleMissDelta = staleRateWithCompanion - staleRateWithoutCompanion; + const promoted = bucket.support >= minSupport && precisionWithCompanion >= minPrecision && liftVsCandidateAlone >= minLift && staleMissDelta <= maxStaleMissDelta; + const rule = { + id: `${bucket.scenario}::${bucket.candidateSkill}->${bucket.companionSkill}`, + scenario: bucket.scenario, + hook: bucket.hook, + storyKind: bucket.storyKind, + targetBoundary: bucket.targetBoundary, + toolName: bucket.toolName, + routeScope: bucket.routeScope, + candidateSkill: bucket.candidateSkill, + companionSkill: bucket.companionSkill, + support: bucket.support, + winsWithCompanion: bucket.winsWithCompanion, + winsWithoutCompanion, + directiveWinsWithCompanion: bucket.directiveWinsWithCompanion, + staleMissesWithCompanion: bucket.staleMissesWithCompanion, + precisionWithCompanion: round4(precisionWithCompanion), + baselinePrecisionWithoutCompanion: round4( + baselinePrecisionWithoutCompanion + ), + liftVsCandidateAlone: round4(liftVsCandidateAlone), + staleMissDelta: round4(staleMissDelta), + confidence: promoted ? "promote" : "holdout-fail", + promotedAt: promoted ? generatedAt : null, + reason: promoted ? 
"companion beats candidate-alone within same verified scenario" : "insufficient support or lift", + sourceExposureGroupIds: [...bucket.sourceExposureGroupIds].sort() + }; + rules.push(rule); + log.summary("companion-distillation.rule-evaluated", { + id: rule.id, + confidence: rule.confidence, + support: rule.support, + precisionWithCompanion: rule.precisionWithCompanion, + liftVsCandidateAlone: rule.liftVsCandidateAlone, + staleMissDelta: rule.staleMissDelta + }); + } + rules.sort( + (a, b) => a.scenario.localeCompare(b.scenario) || a.candidateSkill.localeCompare(b.candidateSkill) || a.companionSkill.localeCompare(b.companionSkill) + ); + rulebook.rules = rules; + const promotedCount = rules.filter( + (r) => r.confidence === "promote" + ).length; + rulebook.replay = { + baselineWins: 0, + learnedWins: 0, + deltaWins: 0, + regressions: [] + }; + rulebook.promotion = { + accepted: true, + errorCode: null, + reason: `${promotedCount} promoted companion rules` + }; + log.summary("companion-distillation.complete", { + totalRules: rules.length, + promotedCount, + holdoutFailCount: rules.length - promotedCount + }); + return rulebook; +} +export { + distillCompanionRules +}; diff --git a/hooks/companion-recall.mjs b/hooks/companion-recall.mjs new file mode 100644 index 0000000..2c7a817 --- /dev/null +++ b/hooks/companion-recall.mjs @@ -0,0 +1,71 @@ +// hooks/src/companion-recall.mts +import { loadCompanionRulebook } from "./learned-companion-rulebook.mjs"; +import { + scenarioKeyCandidates +} from "./routing-policy.mjs"; +import { createLogger } from "./logger.mjs"; +function recallVerifiedCompanions(params) { + const log = createLogger(); + const loaded = loadCompanionRulebook(params.projectRoot); + if (!loaded.ok) { + log.summary("companion-recall.load-error", { + code: loaded.error.code, + message: loaded.error.message + }); + return { selected: [], checkedScenarios: [], rejected: [] }; + } + const checkedScenarios = scenarioKeyCandidates(params.scenario); + const 
selected = []; + const rejected = []; + const selectedCompanions = /* @__PURE__ */ new Set(); + log.summary("companion-recall.lookup", { + checkedScenarios, + candidateSkills: params.candidateSkills, + excludeCount: params.excludeSkills.size, + maxCompanions: params.maxCompanions, + rulebookRuleCount: loaded.rulebook.rules.length + }); + for (const scenario of checkedScenarios) { + const matching = loaded.rulebook.rules.filter( + (rule) => rule.scenario === scenario && rule.confidence === "promote" && params.candidateSkills.includes(rule.candidateSkill) + ).sort( + (a, b) => b.liftVsCandidateAlone - a.liftVsCandidateAlone || b.support - a.support || a.companionSkill.localeCompare(b.companionSkill) + ); + for (const rule of matching) { + if (selected.length >= params.maxCompanions) break; + if (selectedCompanions.has(rule.companionSkill)) continue; + if (params.excludeSkills.has(rule.companionSkill)) { + rejected.push({ + candidateSkill: rule.candidateSkill, + companionSkill: rule.companionSkill, + scenario, + rejectedReason: "excluded" + }); + continue; + } + selected.push({ + candidateSkill: rule.candidateSkill, + companionSkill: rule.companionSkill, + scenario, + confidence: rule.liftVsCandidateAlone, + reason: rule.reason + }); + selectedCompanions.add(rule.companionSkill); + } + } + log.summary("companion-recall.result", { + selectedCount: selected.length, + rejectedCount: rejected.length, + checkedScenarioCount: checkedScenarios.length, + selected: selected.map((s) => ({ + candidate: s.candidateSkill, + companion: s.companionSkill, + scenario: s.scenario, + lift: s.confidence + })) + }); + return { selected, checkedScenarios, rejected }; +} +export { + recallVerifiedCompanions +}; diff --git a/hooks/hooks.json b/hooks/hooks.json index d763e2d..ecd563d 100644 --- a/hooks/hooks.json +++ b/hooks/hooks.json @@ -89,6 +89,16 @@ } ] }, + { + "matcher": "Read|Edit|Write|Glob|Grep|WebFetch", + "hooks": [ + { + "type": "command", + "command": "node 
\"${CLAUDE_PLUGIN_ROOT}/hooks/posttooluse-verification-observe.mjs\"", + "timeout": 5 + } + ] + }, { "matcher": "Write|Edit", "hooks": [ diff --git a/hooks/learned-companion-rulebook.mjs b/hooks/learned-companion-rulebook.mjs new file mode 100644 index 0000000..eadcb40 --- /dev/null +++ b/hooks/learned-companion-rulebook.mjs @@ -0,0 +1,223 @@ +// hooks/src/learned-companion-rulebook.mts +import { createHash, randomUUID } from "crypto"; +import { + readFileSync, + writeFileSync, + renameSync +} from "fs"; +import { tmpdir } from "os"; +import { join } from "path"; +import { createLogger } from "./logger.mjs"; +function companionRulebookPath(projectRoot) { + const hash = createHash("sha256").update(projectRoot).digest("hex"); + return `${tmpdir()}/vercel-plugin-learned-companions-${hash}.json`; +} +function serializeCompanionRulebook(rulebook) { + const sorted = { + ...rulebook, + rules: [...rulebook.rules].sort( + (a, b) => a.scenario.localeCompare(b.scenario) || a.candidateSkill.localeCompare(b.candidateSkill) || a.companionSkill.localeCompare(b.companionSkill) + ) + }; + return JSON.stringify(sorted, null, 2) + "\n"; +} +function validateCompanionRulebookSchema(parsed) { + if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) { + return { + code: "COMPANION_RULEBOOK_SCHEMA_INVALID", + message: "Companion rulebook must be a JSON object", + detail: { receivedType: typeof parsed } + }; + } + const obj = parsed; + if (obj.version !== 1) { + return { + code: "COMPANION_RULEBOOK_VERSION_UNSUPPORTED", + message: `Unsupported companion rulebook version: ${String(obj.version)}`, + detail: { version: obj.version, supportedVersions: [1] } + }; + } + if (typeof obj.generatedAt !== "string") { + return { + code: "COMPANION_RULEBOOK_SCHEMA_INVALID", + message: "Missing or invalid generatedAt field", + detail: { field: "generatedAt", receivedType: typeof obj.generatedAt } + }; + } + if (typeof obj.projectRoot !== "string") { + return { + code: 
"COMPANION_RULEBOOK_SCHEMA_INVALID", + message: "Missing or invalid projectRoot field", + detail: { field: "projectRoot", receivedType: typeof obj.projectRoot } + }; + } + if (!Array.isArray(obj.rules)) { + return { + code: "COMPANION_RULEBOOK_SCHEMA_INVALID", + message: "Missing or invalid rules field", + detail: { field: "rules", receivedType: typeof obj.rules } + }; + } + for (let i = 0; i < obj.rules.length; i++) { + const rule = obj.rules[i]; + if (typeof rule !== "object" || rule === null) { + return { + code: "COMPANION_RULEBOOK_SCHEMA_INVALID", + message: `Rule at index ${i} is not an object`, + detail: { index: i, receivedType: typeof rule } + }; + } + const requiredStrings = [ + "id", + "scenario", + "candidateSkill", + "companionSkill", + "reason" + ]; + for (const field of requiredStrings) { + if (typeof rule[field] !== "string") { + return { + code: "COMPANION_RULEBOOK_SCHEMA_INVALID", + message: `Rule at index ${i} has invalid ${field}`, + detail: { index: i, field, receivedType: typeof rule[field] } + }; + } + } + const requiredNumbers = [ + "support", + "winsWithCompanion", + "winsWithoutCompanion", + "precisionWithCompanion", + "baselinePrecisionWithoutCompanion", + "liftVsCandidateAlone", + "staleMissDelta" + ]; + for (const field of requiredNumbers) { + if (typeof rule[field] !== "number") { + return { + code: "COMPANION_RULEBOOK_SCHEMA_INVALID", + message: `Rule at index ${i} has invalid ${field}`, + detail: { index: i, field, receivedType: typeof rule[field] } + }; + } + } + const validConfidence = ["candidate", "promote", "holdout-fail"]; + if (!validConfidence.includes(rule.confidence)) { + return { + code: "COMPANION_RULEBOOK_SCHEMA_INVALID", + message: `Rule at index ${i} has invalid confidence: ${String(rule.confidence)}`, + detail: { index: i, field: "confidence", value: rule.confidence } + }; + } + } + if (typeof obj.replay !== "object" || obj.replay === null) { + return { + code: "COMPANION_RULEBOOK_SCHEMA_INVALID", + message: "Missing 
or invalid replay field", + detail: { field: "replay", receivedType: typeof obj.replay } + }; + } + if (typeof obj.promotion !== "object" || obj.promotion === null) { + return { + code: "COMPANION_RULEBOOK_SCHEMA_INVALID", + message: "Missing or invalid promotion field", + detail: { field: "promotion", receivedType: typeof obj.promotion } + }; + } + return null; +} +function createEmptyCompanionRulebook(projectRoot, generatedAt) { + return { + version: 1, + generatedAt, + projectRoot, + rules: [], + replay: { + baselineWins: 0, + learnedWins: 0, + deltaWins: 0, + regressions: [] + }, + promotion: { + accepted: true, + errorCode: null, + reason: "empty rulebook" + } + }; +} +function loadCompanionRulebook(projectRoot) { + const path = companionRulebookPath(projectRoot); + const log = createLogger(); + let raw; + try { + raw = readFileSync(path, "utf-8"); + } catch { + log.summary("learned-companion-rulebook.load-miss", { + path, + reason: "file_not_found" + }); + return { + ok: true, + rulebook: createEmptyCompanionRulebook( + projectRoot, + (/* @__PURE__ */ new Date(0)).toISOString() + ) + }; + } + let parsed; + try { + parsed = JSON.parse(raw); + } catch (err) { + const error = { + code: "COMPANION_RULEBOOK_READ_FAILED", + message: "Companion rulebook file contains invalid JSON", + detail: { path, parseError: String(err) } + }; + log.summary("learned-companion-rulebook.load-error", { + code: error.code, + path + }); + return { ok: false, error }; + } + const validationError = validateCompanionRulebookSchema(parsed); + if (validationError) { + log.summary("learned-companion-rulebook.load-error", { + code: validationError.code, + path, + detail: validationError.detail + }); + return { ok: false, error: validationError }; + } + const rulebook = parsed; + log.summary("learned-companion-rulebook.load-ok", { + path, + ruleCount: rulebook.rules.length, + promotedCount: rulebook.rules.filter((r) => r.confidence === "promote").length, + version: rulebook.version + }); + 
return { ok: true, rulebook }; +} +function saveCompanionRulebook(projectRoot, rulebook) { + const dest = companionRulebookPath(projectRoot); + const tempPath = join( + tmpdir(), + `vercel-plugin-companion-rulebook-${randomUUID()}.tmp` + ); + const log = createLogger(); + const content = serializeCompanionRulebook(rulebook); + writeFileSync(tempPath, content); + renameSync(tempPath, dest); + log.summary("learned-companion-rulebook.save", { + path: dest, + ruleCount: rulebook.rules.length, + promotedCount: rulebook.rules.filter((r) => r.confidence === "promote").length, + bytesWritten: Buffer.byteLength(content) + }); +} +export { + companionRulebookPath, + createEmptyCompanionRulebook, + loadCompanionRulebook, + saveCompanionRulebook, + serializeCompanionRulebook +}; diff --git a/hooks/learned-playbook-rulebook.mjs b/hooks/learned-playbook-rulebook.mjs new file mode 100644 index 0000000..5cdbce7 --- /dev/null +++ b/hooks/learned-playbook-rulebook.mjs @@ -0,0 +1,76 @@ +// hooks/src/learned-playbook-rulebook.mts +import { mkdirSync, readFileSync, writeFileSync } from "fs"; +import { dirname, join } from "path"; +import { createLogger } from "./logger.mjs"; +function playbookRulebookPath(projectRoot) { + return join(projectRoot, "generated", "learned-playbooks.json"); +} +function createEmptyPlaybookRulebook(projectRoot, generatedAt = (/* @__PURE__ */ new Date()).toISOString()) { + return { + version: 1, + generatedAt, + projectRoot, + rules: [], + replay: { + baselineWins: 0, + learnedWins: 0, + deltaWins: 0, + regressions: [] + }, + promotion: { + accepted: true, + errorCode: null, + reason: "No promoted playbooks" + } + }; +} +function savePlaybookRulebook(projectRoot, rulebook) { + const path = playbookRulebookPath(projectRoot); + mkdirSync(dirname(path), { recursive: true }); + writeFileSync(path, JSON.stringify(rulebook, null, 2) + "\n"); + createLogger().summary("learned-playbook-rulebook.save", { + path, + ruleCount: rulebook.rules.length, + promotedCount: 
rulebook.rules.filter((r) => r.confidence === "promote").length + }); +} +function loadPlaybookRulebook(projectRoot) { + const path = playbookRulebookPath(projectRoot); + try { + const raw = readFileSync(path, "utf-8"); + const parsed = JSON.parse(raw); + if (parsed?.version !== 1 || typeof parsed.generatedAt !== "string" || typeof parsed.projectRoot !== "string" || !Array.isArray(parsed.rules) || typeof parsed.replay !== "object" || typeof parsed.promotion !== "object") { + return { + ok: false, + error: { + code: "EINVALID", + message: `Invalid learned playbook rulebook at ${path}` + } + }; + } + return { ok: true, rulebook: parsed }; + } catch (error) { + if (error instanceof Error && "code" in error && error.code === "ENOENT") { + return { + ok: false, + error: { + code: "ENOENT", + message: `No learned playbook rulebook found at ${path}` + } + }; + } + return { + ok: false, + error: { + code: "EINVALID", + message: `Failed to read learned playbook rulebook at ${path}` + } + }; + } +} +export { + createEmptyPlaybookRulebook, + loadPlaybookRulebook, + playbookRulebookPath, + savePlaybookRulebook +}; diff --git a/hooks/learned-routing-rulebook.mjs b/hooks/learned-routing-rulebook.mjs new file mode 100644 index 0000000..76035db --- /dev/null +++ b/hooks/learned-routing-rulebook.mjs @@ -0,0 +1,195 @@ +// hooks/src/learned-routing-rulebook.mts +import { createHash, randomUUID } from "crypto"; +import { + readFileSync, + writeFileSync, + renameSync +} from "fs"; +import { tmpdir } from "os"; +import { join } from "path"; +import { createLogger } from "./logger.mjs"; +function rulebookPath(projectRoot) { + const hash = createHash("sha256").update(projectRoot).digest("hex"); + return `${tmpdir()}/vercel-plugin-routing-policy-${hash}-rulebook.json`; +} +function serializeRulebook(rulebook) { + const sorted = { + ...rulebook, + rules: [...rulebook.rules].sort( + (a, b) => a.scenario.localeCompare(b.scenario) || a.skill.localeCompare(b.skill) || a.id.localeCompare(b.id) + 
) + }; + return JSON.stringify(sorted, null, 2) + "\n"; +} +function validateRulebookSchema(parsed) { + if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) { + return { + code: "RULEBOOK_SCHEMA_INVALID", + message: "Rulebook must be a JSON object", + detail: { receivedType: typeof parsed } + }; + } + const obj = parsed; + if (obj.version !== 1) { + return { + code: "RULEBOOK_VERSION_UNSUPPORTED", + message: `Unsupported rulebook version: ${String(obj.version)}`, + detail: { version: obj.version, supportedVersions: [1] } + }; + } + if (typeof obj.createdAt !== "string") { + return { + code: "RULEBOOK_SCHEMA_INVALID", + message: "Missing or invalid createdAt field", + detail: { field: "createdAt", receivedType: typeof obj.createdAt } + }; + } + if (typeof obj.sessionId !== "string") { + return { + code: "RULEBOOK_SCHEMA_INVALID", + message: "Missing or invalid sessionId field", + detail: { field: "sessionId", receivedType: typeof obj.sessionId } + }; + } + if (!Array.isArray(obj.rules)) { + return { + code: "RULEBOOK_SCHEMA_INVALID", + message: "Missing or invalid rules field", + detail: { field: "rules", receivedType: typeof obj.rules } + }; + } + for (let i = 0; i < obj.rules.length; i++) { + const rule = obj.rules[i]; + if (typeof rule !== "object" || rule === null) { + return { + code: "RULEBOOK_SCHEMA_INVALID", + message: `Rule at index ${i} is not an object`, + detail: { index: i, receivedType: typeof rule } + }; + } + const requiredStrings = ["id", "scenario", "skill", "reason", "sourceSessionId", "promotedAt"]; + for (const field of requiredStrings) { + if (typeof rule[field] !== "string") { + return { + code: "RULEBOOK_SCHEMA_INVALID", + message: `Rule at index ${i} has invalid ${field}`, + detail: { index: i, field, receivedType: typeof rule[field] } + }; + } + } + if (rule.action !== "promote" && rule.action !== "demote") { + return { + code: "RULEBOOK_SCHEMA_INVALID", + message: `Rule at index ${i} has invalid action: 
${String(rule.action)}`, + detail: { index: i, field: "action", value: rule.action } + }; + } + if (typeof rule.boost !== "number" || typeof rule.confidence !== "number") { + return { + code: "RULEBOOK_SCHEMA_INVALID", + message: `Rule at index ${i} has invalid boost or confidence`, + detail: { index: i, boost: rule.boost, confidence: rule.confidence } + }; + } + const evidence = rule.evidence; + if (typeof evidence !== "object" || evidence === null) { + return { + code: "RULEBOOK_SCHEMA_INVALID", + message: `Rule at index ${i} has invalid evidence`, + detail: { index: i, field: "evidence" } + }; + } + const evidenceNumbers = [ + "baselineWins", + "baselineDirectiveWins", + "learnedWins", + "learnedDirectiveWins", + "regressionCount" + ]; + for (const field of evidenceNumbers) { + if (typeof evidence[field] !== "number") { + return { + code: "RULEBOOK_SCHEMA_INVALID", + message: `Rule at index ${i} evidence has invalid ${field}`, + detail: { index: i, field, receivedType: typeof evidence[field] } + }; + } + } + } + return null; +} +function loadRulebook(projectRoot) { + const path = rulebookPath(projectRoot); + const log = createLogger(); + let raw; + try { + raw = readFileSync(path, "utf-8"); + } catch { + log.summary("learned-routing-rulebook.load-miss", { path, reason: "file_not_found" }); + return { + ok: true, + rulebook: createEmptyRulebook("", "") + }; + } + let parsed; + try { + parsed = JSON.parse(raw); + } catch (err) { + const error = { + code: "RULEBOOK_SCHEMA_INVALID", + message: "Rulebook file contains invalid JSON", + detail: { path, parseError: String(err) } + }; + log.summary("learned-routing-rulebook.load-error", { code: error.code, path }); + return { ok: false, error }; + } + const validationError = validateRulebookSchema(parsed); + if (validationError) { + log.summary("learned-routing-rulebook.load-error", { + code: validationError.code, + path, + detail: validationError.detail + }); + return { ok: false, error: validationError }; + } + 
log.summary("learned-routing-rulebook.load-ok", { + path, + ruleCount: parsed.rules.length, + version: parsed.version + }); + return { ok: true, rulebook: parsed }; +} +function saveRulebook(projectRoot, rulebook) { + const dest = rulebookPath(projectRoot); + const tempPath = join(tmpdir(), `vercel-plugin-rulebook-${randomUUID()}.tmp`); + const log = createLogger(); + const content = serializeRulebook(rulebook); + writeFileSync(tempPath, content); + renameSync(tempPath, dest); + log.summary("learned-routing-rulebook.save", { + path: dest, + ruleCount: rulebook.rules.length, + sessionId: rulebook.sessionId, + bytesWritten: Buffer.byteLength(content) + }); +} +function createEmptyRulebook(sessionId, createdAt) { + return { + version: 1, + createdAt, + sessionId, + rules: [] + }; +} +function createRule(params) { + const id = `${params.scenario}|${params.skill}`; + return { id, ...params }; +} +export { + createEmptyRulebook, + createRule, + loadRulebook, + rulebookPath, + saveRulebook, + serializeRulebook +}; diff --git a/hooks/playbook-distillation.mjs b/hooks/playbook-distillation.mjs new file mode 100644 index 0000000..c0fc17e --- /dev/null +++ b/hooks/playbook-distillation.mjs @@ -0,0 +1,184 @@ +// hooks/src/playbook-distillation.mts +import { + createEmptyPlaybookRulebook +} from "./learned-playbook-rulebook.mjs"; +import { createLogger } from "./logger.mjs"; +function round4(value) { + return Number(value.toFixed(4)); +} +function precision(wins, support) { + return support === 0 ? 0 : wins / support; +} +function orderedUnique(skills) { + const seen = /* @__PURE__ */ new Set(); + const out = []; + for (const skill of skills) { + if (!skill || seen.has(skill)) continue; + seen.add(skill); + out.push(skill); + } + return out; +} +function distillPlaybooks(params) { + const log = createLogger(); + const generatedAt = params.generatedAt ?? (/* @__PURE__ */ new Date()).toISOString(); + const minSupport = params.minSupport ?? 
3; + const minPrecision = params.minPrecision ?? 0.75; + const minLift = params.minLift ?? 1.25; + const maxStaleMissDelta = params.maxStaleMissDelta ?? 0.1; + const maxSkills = Math.max(2, params.maxSkills ?? 3); + const rulebook = createEmptyPlaybookRulebook( + params.projectRoot, + generatedAt + ); + const byGroup = /* @__PURE__ */ new Map(); + for (const exposure of params.exposures) { + if (!exposure.exposureGroupId) continue; + const list = byGroup.get(exposure.exposureGroupId) ?? []; + list.push(exposure); + byGroup.set(exposure.exposureGroupId, list); + } + const playbookBuckets = /* @__PURE__ */ new Map(); + const anchorBaselines = /* @__PURE__ */ new Map(); + for (const [groupId, group] of byGroup) { + const candidate = group.find( + (e) => (e.attributionRole ?? "candidate") === "candidate" + ); + if (!candidate) continue; + if (candidate.outcome === "pending") continue; + const scenario = [ + candidate.hook, + candidate.storyKind ?? "none", + candidate.targetBoundary ?? "none", + candidate.toolName, + candidate.route ?? "*" + ].join("|"); + const orderedSkills = orderedUnique(group.map((e) => e.skill)).slice( + 0, + maxSkills + ); + const anchorSkill = candidate.candidateSkill ?? candidate.skill; + const baselineKey = `${scenario}::${anchorSkill}`; + const baseline = anchorBaselines.get(baselineKey) ?? { + support: 0, + wins: 0, + staleMisses: 0 + }; + baseline.support += 1; + if (candidate.outcome === "win" || candidate.outcome === "directive-win") { + baseline.wins += 1; + } + if (candidate.outcome === "stale-miss") { + baseline.staleMisses += 1; + } + anchorBaselines.set(baselineKey, baseline); + if (orderedSkills.length < 2) continue; + const bucketKey = `${scenario}::${orderedSkills.join(">")}`; + const bucket = playbookBuckets.get(bucketKey) ?? 
{ + scenario, + hook: candidate.hook, + storyKind: candidate.storyKind, + targetBoundary: candidate.targetBoundary, + toolName: candidate.toolName, + routeScope: candidate.route, + anchorSkill, + orderedSkills, + support: 0, + wins: 0, + directiveWins: 0, + staleMisses: 0, + sourceExposureGroupIds: [] + }; + bucket.support += 1; + if (candidate.outcome === "win" || candidate.outcome === "directive-win") { + bucket.wins += 1; + } + if (candidate.outcome === "directive-win") { + bucket.directiveWins += 1; + } + if (candidate.outcome === "stale-miss") { + bucket.staleMisses += 1; + } + bucket.sourceExposureGroupIds.push(groupId); + playbookBuckets.set(bucketKey, bucket); + } + const rules = []; + for (const bucket of playbookBuckets.values()) { + const baseline = anchorBaselines.get( + `${bucket.scenario}::${bucket.anchorSkill}` + ); + if (!baseline) continue; + const supportWithoutPlaybook = Math.max( + baseline.support - bucket.support, + 0 + ); + const winsWithoutPlaybook = Math.max(baseline.wins - bucket.wins, 0); + const staleWithoutPlaybook = Math.max( + baseline.staleMisses - bucket.staleMisses, + 0 + ); + const precisionWithPlaybook = precision(bucket.wins, bucket.support); + const baselinePrecisionWithoutPlaybook = precision( + winsWithoutPlaybook, + supportWithoutPlaybook + ); + const liftVsAnchorBaseline = baselinePrecisionWithoutPlaybook === 0 ? 
precisionWithPlaybook : precisionWithPlaybook / baselinePrecisionWithoutPlaybook; + const staleRateWithPlaybook = precision( + bucket.staleMisses, + bucket.support + ); + const staleRateWithoutPlaybook = precision( + staleWithoutPlaybook, + supportWithoutPlaybook + ); + const staleMissDelta = staleRateWithPlaybook - staleRateWithoutPlaybook; + const promoted = bucket.support >= minSupport && precisionWithPlaybook >= minPrecision && liftVsAnchorBaseline >= minLift && staleMissDelta <= maxStaleMissDelta; + rules.push({ + id: `${bucket.scenario}::${bucket.orderedSkills.join(">")}`, + scenario: bucket.scenario, + hook: bucket.hook, + storyKind: bucket.storyKind, + targetBoundary: bucket.targetBoundary, + toolName: bucket.toolName, + routeScope: bucket.routeScope, + anchorSkill: bucket.anchorSkill, + orderedSkills: bucket.orderedSkills, + support: bucket.support, + wins: bucket.wins, + directiveWins: bucket.directiveWins, + staleMisses: bucket.staleMisses, + precision: round4(precisionWithPlaybook), + baselinePrecisionWithoutPlaybook: round4( + baselinePrecisionWithoutPlaybook + ), + liftVsAnchorBaseline: round4(liftVsAnchorBaseline), + staleMissDelta: round4(staleMissDelta), + confidence: promoted ? "promote" : "holdout-fail", + promotedAt: promoted ? generatedAt : null, + reason: promoted ? 
"verified ordered playbook beats same anchor without this exact sequence" : "insufficient support, precision, lift, or stale-miss performance", + sourceExposureGroupIds: [...bucket.sourceExposureGroupIds].sort() + }); + } + rules.sort( + (a, b) => a.scenario.localeCompare(b.scenario) || a.anchorSkill.localeCompare(b.anchorSkill) || a.orderedSkills.join(">").localeCompare(b.orderedSkills.join(">")) + ); + const promotedCount = rules.filter( + (r) => r.confidence === "promote" + ).length; + rulebook.rules = rules; + rulebook.promotion = { + accepted: true, + errorCode: null, + reason: `${promotedCount} promoted playbooks` + }; + log.summary("playbook-distillation.complete", { + exposureCount: params.exposures.length, + groupCount: byGroup.size, + ruleCount: rules.length, + promotedCount + }); + return rulebook; +} +export { + distillPlaybooks +}; diff --git a/hooks/playbook-recall.mjs b/hooks/playbook-recall.mjs new file mode 100644 index 0000000..dfb0c04 --- /dev/null +++ b/hooks/playbook-recall.mjs @@ -0,0 +1,77 @@ +// hooks/src/playbook-recall.mts +import { + loadPlaybookRulebook +} from "./learned-playbook-rulebook.mjs"; +import { + scenarioKeyCandidates +} from "./routing-policy.mjs"; +function rankRule(rule, candidateSkills) { + const anchorIdx = candidateSkills.indexOf(rule.anchorSkill); + return [ + anchorIdx === -1 ? 
Number.MAX_SAFE_INTEGER : anchorIdx, + -rule.support, + -rule.liftVsAnchorBaseline, + -rule.precision, + rule.id + ]; +} +function formatPlaybookBanner(selected) { + return [ + "", + "**[Verified Playbook]**", + `Anchor: \`${selected.anchorSkill}\``, + `Sequence: ${selected.orderedSkills.map((s) => `\`${s}\``).join(" \u2192 ")}`, + `Evidence: support=${selected.support}, precision=${selected.precision}, lift=${selected.lift}`, + "Use the sequence before inventing a new debugging workflow.", + "" + ].join("\n"); +} +function recallVerifiedPlaybook(params) { + const loaded = loadPlaybookRulebook(params.projectRoot); + if (!loaded.ok) { + return { selected: null, banner: null, rejected: [] }; + } + const exclude = new Set(params.excludeSkills ?? []); + const maxInsertedSkills = Math.max(0, params.maxInsertedSkills ?? 2); + const rejected = []; + for (const scenario of scenarioKeyCandidates(params.scenario)) { + const eligible = loaded.rulebook.rules.filter( + (rule) => rule.confidence === "promote" && rule.scenario === scenario && params.candidateSkills.includes(rule.anchorSkill) + ).sort((a, b) => { + const ra = rankRule(a, params.candidateSkills); + const rb = rankRule(b, params.candidateSkills); + return ra[0] - rb[0] || ra[1] - rb[1] || ra[2] - rb[2] || ra[3] - rb[3] || ra[4].localeCompare(rb[4]); + }); + for (const rule of eligible) { + const anchorPos = rule.orderedSkills.indexOf(rule.anchorSkill); + const tail = anchorPos === -1 ? 
rule.orderedSkills.slice(1) : rule.orderedSkills.slice(anchorPos + 1); + const insertedSkills = tail.filter((skill) => !exclude.has(skill)).slice(0, maxInsertedSkills); + if (insertedSkills.length === 0) { + rejected.push({ + ruleId: rule.id, + reason: "all_playbook_steps_already_present_or_no_budget" + }); + continue; + } + const selected = { + ruleId: rule.id, + scenario: rule.scenario, + anchorSkill: rule.anchorSkill, + orderedSkills: rule.orderedSkills, + insertedSkills, + support: rule.support, + precision: rule.precision, + lift: rule.liftVsAnchorBaseline + }; + return { + selected, + banner: formatPlaybookBanner(selected), + rejected + }; + } + } + return { selected: null, banner: null, rejected }; +} +export { + recallVerifiedPlaybook +}; diff --git a/hooks/policy-recall.mjs b/hooks/policy-recall.mjs new file mode 100644 index 0000000..cd253e9 --- /dev/null +++ b/hooks/policy-recall.mjs @@ -0,0 +1,42 @@ +// hooks/src/policy-recall.mts +import { + derivePolicyBoost, + scenarioKeyCandidates +} from "./routing-policy.mjs"; +function successRate(stats) { + const weightedWins = stats.wins + stats.directiveWins * 0.25; + return weightedWins / Math.max(stats.exposures, 1); +} +function recallScore(stats) { + return derivePolicyBoost(stats) * 1e3 + Math.round(successRate(stats) * 100) * 10 + stats.directiveWins * 5 + stats.wins - stats.staleMisses; +} +function selectPolicyRecallCandidates(policy, scenarioInput, options = {}) { + const maxCandidates = options.maxCandidates ?? 1; + const minExposures = options.minExposures ?? 3; + const minSuccessRate = options.minSuccessRate ?? 0.65; + const minBoost = options.minBoost ?? 2; + const exclude = new Set(options.excludeSkills ?? []); + for (const scenario of scenarioKeyCandidates(scenarioInput)) { + const bucket = policy.scenarios[scenario] ?? 
{}; + const candidates = Object.entries(bucket).map(([skill, stats]) => ({ + skill, + scenario, + exposures: stats.exposures, + wins: stats.wins, + directiveWins: stats.directiveWins, + staleMisses: stats.staleMisses, + successRate: successRate(stats), + policyBoost: derivePolicyBoost(stats), + recallScore: recallScore(stats) + })).filter((entry) => !exclude.has(entry.skill)).filter((entry) => entry.exposures >= minExposures).filter((entry) => entry.successRate >= minSuccessRate).filter((entry) => entry.policyBoost >= minBoost).sort( + (a, b) => b.recallScore - a.recallScore || b.exposures - a.exposures || a.skill.localeCompare(b.skill) + ); + if (candidates.length > 0) { + return candidates.slice(0, maxCandidates); + } + } + return []; +} +export { + selectPolicyRecallCandidates +}; diff --git a/hooks/posttooluse-verification-observe.mjs b/hooks/posttooluse-verification-observe.mjs index 23cc27a..1169170 100755 --- a/hooks/posttooluse-verification-observe.mjs +++ b/hooks/posttooluse-verification-observe.mjs @@ -6,6 +6,27 @@ import { resolve } from "path"; import { fileURLToPath } from "url"; import { generateVerificationId } from "./hook-env.mjs"; import { createLogger } from "./logger.mjs"; +import { redactCommand } from "./pretooluse-skill-inject.mjs"; +import { + recordObservation +} from "./verification-ledger.mjs"; +import { resolveBoundaryOutcome } from "./routing-policy-ledger.mjs"; +import { selectActiveStory } from "./verification-plan.mjs"; +import { + appendRoutingDecisionTrace, + createDecisionId +} from "./routing-decision-trace.mjs"; +import { + classifyVerificationSignal +} from "./verification-signal.mjs"; +import { + evaluateResolutionGate, + diagnosePendingExposureMatch +} from "./verification-closure-diagnosis.mjs"; +import { + buildVerificationClosureCapsule, + persistVerificationClosureCapsule +} from "./verification-closure-capsule.mjs"; function isVerificationReport(value) { if (typeof value !== "object" || value === null) return false; 
const obj = value; @@ -13,6 +34,52 @@ function isVerificationReport(value) { (b) => typeof b === "object" && b !== null && b.event === "verification.boundary_observed" ); } +var LOCAL_DEV_HOSTS = /* @__PURE__ */ new Set([ + "localhost", + "127.0.0.1", + "0.0.0.0", + "::1", + "[::1]" +]); +function isLocalVerificationUrl(rawUrl, env = process.env) { + try { + const url = new URL(rawUrl); + if (url.protocol !== "http:" && url.protocol !== "https:") return false; + const hostname = url.hostname.toLowerCase(); + if (LOCAL_DEV_HOSTS.has(hostname)) return true; + const configuredOrigin = envString(env, "VERCEL_PLUGIN_LOCAL_DEV_ORIGIN"); + if (!configuredOrigin) return false; + const configured = new URL(configuredOrigin); + return configured.host.toLowerCase() === url.host.toLowerCase(); + } catch { + return false; + } +} +function resolveObservedStory(plan, observedRoute, env = process.env) { + const explicit = envString(env, "VERCEL_PLUGIN_VERIFICATION_STORY_ID"); + if (explicit) return { storyId: explicit, method: "explicit-env" }; + if (observedRoute) { + const exact = plan.stories.filter((story) => story.route === observedRoute); + if (exact.length === 1) { + return { storyId: exact[0].id, method: "exact-route" }; + } + } + if (plan.activeStoryId) { + return { storyId: plan.activeStoryId, method: "active-story" }; + } + return { storyId: null, method: "none" }; +} +function resolveObservedStoryId(plan, observedRoute, env = process.env) { + return resolveObservedStory(plan, observedRoute, env).storyId; +} +function shouldResolveRoutingOutcome(event, env = process.env) { + if (event.boundary === "unknown") return false; + if (event.signalStrength !== "strong") return false; + if (event.toolName === "WebFetch") { + return isLocalVerificationUrl(event.command, env); + } + return true; +} var BOUNDARY_PATTERNS = [ // uiRender: browser/screenshot/playwright/puppeteer commands { boundary: "uiRender", pattern: 
/\b(open|launch|browse|screenshot|puppeteer|playwright|chromium|firefox|webkit)\b/i, label: "browser-tool" }, @@ -42,6 +109,157 @@ function classifyBoundary(command) { } return { boundary: "unknown", matchedPattern: "none" }; } +function classifyToolSignal(toolName, toolInput) { + if (toolName === "Read") { + const filePath = String(toolInput.file_path || ""); + if (!filePath) return null; + if (/\.env(\.\w+)?$/.test(filePath)) { + return { + boundary: "environment", + matchedPattern: "env-file-read", + signalStrength: "soft", + evidenceSource: "env-read", + summary: filePath + }; + } + if (/vercel\.json$/.test(filePath) || /\.vercel\/project\.json$/.test(filePath)) { + return { + boundary: "environment", + matchedPattern: "vercel-config-read", + signalStrength: "soft", + evidenceSource: "env-read", + summary: filePath + }; + } + if (/\.(log|out|err)$/.test(filePath) || /vercel-logs/.test(filePath) || /\.next\/.*server.*\.log/.test(filePath)) { + return { + boundary: "serverHandler", + matchedPattern: "log-file-read", + signalStrength: "soft", + evidenceSource: "log-read", + summary: filePath + }; + } + return null; + } + if (toolName === "WebFetch") { + const url = String(toolInput.url || ""); + if (!url) return null; + return { + boundary: "clientRequest", + matchedPattern: "web-fetch", + signalStrength: "strong", + evidenceSource: "http", + summary: url.slice(0, 200) + }; + } + if (toolName === "Grep") { + const path = String(toolInput.path || ""); + if (/\.(log|out|err)$/.test(path) || /logs?\//.test(path)) { + return { + boundary: "serverHandler", + matchedPattern: "log-grep", + signalStrength: "soft", + evidenceSource: "log-read", + summary: `grep ${toolInput.pattern || ""} in ${path}`.slice(0, 200) + }; + } + if (/\.env/.test(path)) { + return { + boundary: "environment", + matchedPattern: "env-grep", + signalStrength: "soft", + evidenceSource: "env-read", + summary: `grep ${toolInput.pattern || ""} in ${path}`.slice(0, 200) + }; + } + return null; + } + if 
(toolName === "Glob") { + const pattern = String(toolInput.pattern || ""); + if (/\*\.(log|out|err)/.test(pattern) || /logs?\//.test(pattern)) { + return { + boundary: "serverHandler", + matchedPattern: "log-glob", + signalStrength: "soft", + evidenceSource: "log-read", + summary: `glob ${pattern}`.slice(0, 200) + }; + } + if (/\.env/.test(pattern)) { + return { + boundary: "environment", + matchedPattern: "env-glob", + signalStrength: "soft", + evidenceSource: "env-read", + summary: `glob ${pattern}`.slice(0, 200) + }; + } + return null; + } + if (toolName === "Edit" || toolName === "Write") { + return null; + } + return null; +} +function buildBoundaryEvent(input) { + const env = input.env ?? process.env; + const redactedCommand = redactCommand(input.command).slice(0, 200); + const suggestedBoundary = env.VERCEL_PLUGIN_VERIFICATION_BOUNDARY || null; + const suggestedAction = env.VERCEL_PLUGIN_VERIFICATION_ACTION ? redactCommand(env.VERCEL_PLUGIN_VERIFICATION_ACTION).slice(0, 200) : null; + return { + event: "verification.boundary_observed", + boundary: input.boundary, + verificationId: input.verificationId, + command: redactedCommand, + matchedPattern: input.matchedPattern, + inferredRoute: input.inferredRoute, + timestamp: input.timestamp ?? (/* @__PURE__ */ new Date()).toISOString(), + suggestedBoundary, + suggestedAction, + matchedSuggestedAction: suggestedBoundary !== null && suggestedBoundary === input.boundary || suggestedAction !== null && suggestedAction === redactedCommand, + signalStrength: input.signalStrength ?? "strong", + evidenceSource: input.evidenceSource ?? "bash", + toolName: input.toolName ?? 
"Bash" + }; +} +function buildLedgerObservation(event, env = process.env) { + const storyIdValue = env.VERCEL_PLUGIN_VERIFICATION_STORY_ID; + const sourceMap = { + "bash": "bash", + "browser": "bash", + "http": "bash", + "log-read": "edit", + "env-read": "edit", + "file-read": "edit", + "unknown": "bash" + }; + return { + id: event.verificationId, + timestamp: event.timestamp, + source: sourceMap[event.evidenceSource] ?? "bash", + boundary: event.boundary === "unknown" ? null : event.boundary, + route: event.inferredRoute, + storyId: typeof storyIdValue === "string" && storyIdValue.trim() !== "" ? storyIdValue.trim() : null, + summary: event.command, + meta: { + matchedPattern: event.matchedPattern, + suggestedBoundary: event.suggestedBoundary, + suggestedAction: event.suggestedAction, + matchedSuggestedAction: event.matchedSuggestedAction, + toolName: event.toolName, + signalStrength: event.signalStrength, + evidenceSource: event.evidenceSource + } + }; +} +function envString(env, key) { + const value = env[key]; + return typeof value === "string" && value.trim() !== "" ? value.trim() : null; +} +function resolveObservedRoute(inferredRoute, env = process.env) { + return inferredRoute ?? envString(env, "VERCEL_PLUGIN_VERIFICATION_ROUTE"); +} var ROUTE_REGEX = /\b(?:app|pages|src\/pages|src\/app)\/([\w[\].-]+(?:\/[\w[\].-]+)*)/; var URL_ROUTE_REGEX = /https?:\/\/[^/\s]+(\/([\w-]+(?:\/[\w-]+)*))/; function inferRoute(command, recentEdits) { @@ -61,6 +279,15 @@ function inferRoute(command, recentEdits) { } return null; } +function inferRouteFromFilePath(filePath) { + const match = ROUTE_REGEX.exec(filePath); + if (match) { + const route = "/" + match[1].replace(/\/page\.\w+$/, "").replace(/\/route\.\w+$/, "").replace(/\/layout\.\w+$/, "").replace(/\/loading\.\w+$/, "").replace(/\/error\.\w+$/, "").replace(/\[([^\]]+)\]/g, ":$1"); + return route === "/" ? 
"/" : route.replace(/\/$/, ""); + } + return null; +} +var SUPPORTED_TOOLS = /* @__PURE__ */ new Set(["Bash", "Read", "Edit", "Write", "Glob", "Grep", "WebFetch"]); function parseInput(raw, logger) { const trimmed = (raw || "").trim(); if (!trimmed) return null; @@ -71,14 +298,16 @@ function parseInput(raw, logger) { return null; } const toolName = input.tool_name || ""; - if (toolName !== "Bash") return null; + if (!SUPPORTED_TOOLS.has(toolName)) return null; const toolInput = input.tool_input || {}; - const command = toolInput.command || ""; - if (!command) return null; + if (toolName === "Bash") { + const command = toolInput.command || ""; + if (!command) return null; + } const sessionId = input.session_id || null; const cwdCandidate = input.cwd ?? input.working_directory; const cwd = typeof cwdCandidate === "string" && cwdCandidate.trim() !== "" ? cwdCandidate : null; - return { command, sessionId, cwd }; + return { toolName, toolInput, sessionId, cwd }; } function run(rawInput) { const log = createLogger(); @@ -94,28 +323,243 @@ function run(rawInput) { } const parsed = parseInput(raw, log); if (!parsed) { - log.debug("verification-observe-skip", { reason: "no_bash_input" }); + log.debug("verification-observe-skip", { reason: "no_supported_input" }); return "{}"; } - const { command, sessionId } = parsed; - const { boundary, matchedPattern } = classifyBoundary(command); - if (boundary === "unknown") { - log.trace("verification-observe-skip", { reason: "no_boundary_match", command: command.slice(0, 120) }); + const { toolName, toolInput, sessionId } = parsed; + const env = process.env; + const signal = classifyVerificationSignal({ toolName, toolInput, env }); + if (!signal) { + log.trace("verification-observe-skip", { + reason: "no_boundary_match", + toolName + }); return "{}"; } + if (signal.boundary === "unknown") { + log.trace("verification-observe-skip", { + reason: "no_boundary_match", + toolName, + summary: signal.summary.slice(0, 120) + }); + return 
"{}"; + } + const { boundary, matchedPattern, signalStrength, evidenceSource, summary } = signal; const verificationId = generateVerificationId(); - const recentEdits = process.env.VERCEL_PLUGIN_RECENT_EDITS || ""; - const inferredRoute = inferRoute(command, recentEdits); - const boundaryEvent = { - event: "verification.boundary_observed", + const recentEdits = env.VERCEL_PLUGIN_RECENT_EDITS || ""; + let inferredRoute; + if (toolName === "Bash") { + inferredRoute = resolveObservedRoute(inferRoute(summary, recentEdits), env); + } else { + const filePath = String(toolInput.file_path || toolInput.path || toolInput.url || ""); + inferredRoute = resolveObservedRoute( + inferRouteFromFilePath(filePath) ?? inferRoute(summary, recentEdits), + env + ); + } + const boundaryEvent = buildBoundaryEvent({ + command: summary, boundary, - verificationId, - command: command.slice(0, 200), matchedPattern, inferredRoute, - timestamp: (/* @__PURE__ */ new Date()).toISOString() - }; + verificationId, + signalStrength, + evidenceSource, + toolName + }); log.summary("verification.boundary_observed", boundaryEvent); + if (sessionId) { + const plan = recordObservation( + sessionId, + buildLedgerObservation(boundaryEvent), + { + agentBrowserAvailable: process.env.VERCEL_PLUGIN_AGENT_BROWSER_AVAILABLE !== "0", + lastAttemptedAction: process.env.VERCEL_PLUGIN_VERIFICATION_ACTION || null + }, + log + ); + log.summary("verification.plan_feedback", { + verificationId, + toolName, + signalStrength, + evidenceSource, + matchedSuggestedAction: boundaryEvent.matchedSuggestedAction, + satisfiedBoundaries: Array.from(plan.satisfiedBoundaries).sort(), + missingBoundaries: [...plan.missingBoundaries], + primaryNextAction: plan.primaryNextAction, + blockedReasons: [...plan.blockedReasons] + }); + const activeStory = plan.stories.length > 0 ? 
selectActiveStory(plan) : null; + const storyResolution = resolveObservedStory( + { + stories: plan.stories.map((s) => ({ id: s.id, route: s.route })), + activeStoryId: activeStory?.id ?? null + }, + inferredRoute, + env + ); + const gate = evaluateResolutionGate( + { + boundary: boundaryEvent.boundary, + signalStrength, + toolName, + command: boundaryEvent.command + }, + env + ); + const exposureDiagnosis = boundaryEvent.boundary === "unknown" ? null : diagnosePendingExposureMatch({ + sessionId, + boundary: boundaryEvent.boundary, + storyId: storyResolution.storyId, + route: inferredRoute + }); + let resolved = []; + if (gate.eligible && boundaryEvent.boundary !== "unknown") { + resolved = resolveBoundaryOutcome({ + sessionId, + boundary: boundaryEvent.boundary, + matchedSuggestedAction: boundaryEvent.matchedSuggestedAction, + storyId: storyResolution.storyId, + route: inferredRoute, + now: boundaryEvent.timestamp + }); + } else { + log.debug("verification.routing-policy-skipped", { + verificationId, + boundary: boundaryEvent.boundary, + toolName, + blockingReasonCodes: gate.blockingReasonCodes, + signalStrength + }); + } + if (gate.eligible && resolved.length === 0) { + log.debug("verification.routing-policy-unresolved", { + verificationId, + boundary: boundaryEvent.boundary, + toolName, + storyId: storyResolution.storyId, + route: inferredRoute, + unresolvedReasonCodes: exposureDiagnosis?.unresolvedReasonCodes ?? [ + "no_exact_pending_match" + ], + pendingBoundaryCount: exposureDiagnosis?.pendingBoundaryCount ?? 
0
+ });
+ }
+ const closureCapsule = buildVerificationClosureCapsule({
+ sessionId,
+ verificationId,
+ toolName,
+ createdAt: boundaryEvent.timestamp,
+ observation: {
+ boundary: boundaryEvent.boundary,
+ signalStrength,
+ evidenceSource,
+ matchedPattern,
+ command: boundaryEvent.command,
+ inferredRoute,
+ matchedSuggestedAction: boundaryEvent.matchedSuggestedAction
+ },
+ storyResolution: {
+ resolvedStoryId: storyResolution.storyId,
+ method: storyResolution.method,
+ activeStoryId: activeStory?.id ?? null,
+ activeStoryKind: activeStory?.kind ?? null,
+ activeStoryRoute: activeStory?.route ?? null
+ },
+ gate,
+ exposureDiagnosis,
+ resolvedExposures: resolved,
+ plan: {
+ activeStoryId: plan.activeStoryId ?? null,
+ satisfiedBoundaries: plan.satisfiedBoundaries,
+ missingBoundaries: [...plan.missingBoundaries],
+ blockedReasons: [...plan.blockedReasons],
+ primaryNextAction: plan.primaryNextAction ? {
+ action: plan.primaryNextAction.action,
+ targetBoundary: plan.primaryNextAction.targetBoundary,
+ reason: plan.primaryNextAction.reason
+ } : null
+ }
+ });
+ const capsulePath = persistVerificationClosureCapsule(
+ closureCapsule,
+ log
+ );
+ log.summary("verification.routing-policy-resolution-gate", {
+ verificationId,
+ toolName,
+ boundary: boundaryEvent.boundary,
+ inferredRoute,
+ resolvedStoryId: storyResolution.storyId,
+ storyResolutionMethod: storyResolution.method,
+ resolutionEligible: gate.eligible,
+ blockingReasonCodes: gate.blockingReasonCodes,
+ exactPendingMatchCount: exposureDiagnosis?.exactMatchCount ?? 0,
+ capsulePath
+ });
+ if (resolved.length > 0) {
+ const outcomeKind = boundaryEvent.matchedSuggestedAction ? "directive-win" : "win";
+ log.summary("verification.routing-policy-resolved", {
+ verificationId,
+ boundary: boundaryEvent.boundary,
+ storyId: storyResolution.storyId,
+ route: inferredRoute,
+ resolvedCount: resolved.length,
+ outcomeKind,
+ skills: resolved.map((e) => e.skill)
+ });
+ }
+ const redactedTarget = toolName === "Bash" ? redactCommand(summary).slice(0, 200) : summary.slice(0, 200);
+ const decisionId = createDecisionId({
+ hook: "PostToolUse",
+ sessionId,
+ toolName,
+ toolTarget: redactedTarget,
+ timestamp: boundaryEvent.timestamp
+ });
+ appendRoutingDecisionTrace({
+ version: 2,
+ decisionId,
+ sessionId,
+ hook: "PostToolUse",
+ toolName,
+ toolTarget: redactedTarget,
+ timestamp: boundaryEvent.timestamp,
+ primaryStory: {
+ id: storyResolution.storyId,
+ kind: activeStory?.kind ?? null,
+ storyRoute: activeStory?.route ?? inferredRoute,
+ targetBoundary: boundaryEvent.boundary === "unknown" ? null : boundaryEvent.boundary
+ },
+ observedRoute: inferredRoute,
+ policyScenario: storyResolution.storyId ? `PostToolUse|${activeStory?.kind ?? "none"}|${boundaryEvent.boundary}|${toolName}` : null,
+ matchedSkills: [],
+ injectedSkills: [],
+ skippedReasons: [
+ ...storyResolution.storyId ? [] : ["no_active_verification_story"],
+ ...gate.blockingReasonCodes.map((code) => `gate:${code}`),
+ ...gate.eligible && resolved.length === 0 ? (exposureDiagnosis?.unresolvedReasonCodes ?? ["no_exact_pending_match"]).map(
+ (code) => `resolution:${code}`
+ ) : []
+ ],
+ ranked: [],
+ verification: {
+ verificationId,
+ observedBoundary: boundaryEvent.boundary,
+ matchedSuggestedAction: boundaryEvent.matchedSuggestedAction
+ },
+ causes: [],
+ edges: []
+ });
+ log.summary("routing.decision_trace_written", {
+ decisionId,
+ hook: "PostToolUse",
+ verificationId,
+ boundary: boundaryEvent.boundary,
+ toolName,
+ signalStrength
+ });
+ }
 log.complete("verification-observe-done", {
 matchedCount: 1,
 injectedCount: 0
@@ -147,9 +591,20 @@ if (isMainModule()) {
 }
 }
 export {
+ buildBoundaryEvent,
+ buildLedgerObservation,
 classifyBoundary,
+ classifyToolSignal,
+ classifyVerificationSignal,
+ envString,
 inferRoute,
+ isLocalVerificationUrl,
 isVerificationReport,
 parseInput,
- run
+ redactCommand,
+ resolveObservedRoute,
+ resolveObservedStory,
+ resolveObservedStoryId,
+ run,
+ shouldResolveRoutingOutcome
 };
diff --git a/hooks/pretooluse-skill-inject.mjs b/hooks/pretooluse-skill-inject.mjs
index 096155e..23da6a4 100644
--- a/hooks/pretooluse-skill-inject.mjs
+++ b/hooks/pretooluse-skill-inject.mjs
@@ -36,6 +36,27 @@ import {
 import { resolveVercelJsonSkills, isVercelJsonPath, VERCEL_JSON_SKILLS } from "./vercel-config.mjs";
 import { createLogger, logDecision } from "./logger.mjs";
 import { trackBaseEvents } from "./telemetry.mjs";
+import { loadCachedPlanResult, selectActiveStory } from "./verification-plan.mjs";
+import { resolveVerificationRuntimeState, buildVerificationEnv } from "./verification-directive.mjs";
+import { applyPolicyBoosts, applyRulebookBoosts } from "./routing-policy.mjs";
+import {
+ appendSkillExposure,
+ loadProjectRoutingPolicy
+} from "./routing-policy-ledger.mjs";
+import { loadRulebook, rulebookPath } from "./learned-routing-rulebook.mjs";
+import { buildAttributionDecision } from "./routing-attribution.mjs";
+import { explainPolicyRecall } from "./routing-diagnosis.mjs";
+import {
+ appendRoutingDecisionTrace,
+ createDecisionId
+} from "./routing-decision-trace.mjs";
+import {
+ createDecisionCausality,
+ addCause,
+ addEdge
+} from "./routing-decision-causality.mjs";
+import { recallVerifiedCompanions } from "./companion-recall.mjs";
+import { recallVerifiedPlaybook } from "./playbook-recall.mjs";
 var MAX_SKILLS = 3;
 var DEFAULT_INJECTION_BUDGET_BYTES = 18e3;
 var SETUP_MODE_BOOTSTRAP_SKILL = "bootstrap";
@@ -471,7 +492,7 @@ function matchSkills(toolName, toolInput, compiledSkills, logger) {
 l.debug("matches-found", { matched: [...matched], reasons: matchReasons });
 return { matchedEntries, matchReasons, matched };
 }
-function deduplicateSkills({ matchedEntries, matched, toolName, toolInput, injectedSkills, dedupOff, maxSkills, likelySkills, compiledSkills, setupMode }, logger) {
+function deduplicateSkills({ matchedEntries, matched, toolName, toolInput, injectedSkills, dedupOff, maxSkills, likelySkills, compiledSkills, setupMode, cwd, sessionId }, logger) {
 const l = logger || log;
 const cap = maxSkills ?? MAX_SKILLS;
 const likely = likelySkills || /* @__PURE__ */ new Set();
@@ -552,23 +573,122 @@ function deduplicateSkills({ matchedEntries, matched, toolName, toolInput, injec
 });
 }
 }
+ const policyBoosted = [];
+ if (cwd) {
+ const plan = sessionId ? loadCachedPlanResult(sessionId, l) : null;
+ const primaryStory = plan ? selectActiveStory(plan) : null;
+ if (primaryStory) {
+ const policyScenario = {
+ hook: "PreToolUse",
+ storyKind: primaryStory.kind ?? null,
+ targetBoundary: plan?.primaryNextAction?.targetBoundary ?? null,
+ toolName
+ };
+ const policy = loadProjectRoutingPolicy(cwd);
+ const boosted = applyPolicyBoosts(
+ newEntries.map((e) => ({
+ ...e,
+ skill: e.skill,
+ priority: e.priority,
+ effectivePriority: typeof e.effectivePriority === "number" ? e.effectivePriority : e.priority
+ })),
+ policy,
+ policyScenario
+ );
+ for (let i = 0; i < newEntries.length; i++) {
+ const b = boosted[i];
+ newEntries[i].effectivePriority = b.effectivePriority;
+ if (b.policyBoost !== 0) {
+ policyBoosted.push({
+ skill: b.skill,
+ boost: b.policyBoost,
+ reason: b.policyReason
+ });
+ }
+ }
+ if (policyBoosted.length > 0) {
+ l.debug("policy-boosted", {
+ scenario: `${policyScenario.hook}|${policyScenario.storyKind ?? "none"}|${policyScenario.targetBoundary ?? "none"}|${policyScenario.toolName}`,
+ boostedSkills: policyBoosted
+ });
+ }
+ } else {
+ l.debug("policy-boost-skipped", { reason: "no active verification story" });
+ }
+ }
+ const rulebookBoosted = [];
+ if (cwd) {
+ const rbResult = loadRulebook(cwd);
+ if (rbResult.ok && rbResult.rulebook.rules.length > 0) {
+ const plan = sessionId ? loadCachedPlanResult(sessionId, l) : null;
+ const primaryStory = plan ? selectActiveStory(plan) : null;
+ if (primaryStory) {
+ const rbScenario = {
+ hook: "PreToolUse",
+ storyKind: primaryStory.kind ?? null,
+ targetBoundary: plan?.primaryNextAction?.targetBoundary ?? null,
+ toolName
+ };
+ const rbPath = rulebookPath(cwd);
+ const withRulebook = applyRulebookBoosts(
+ newEntries.map((e) => ({
+ ...e,
+ skill: e.skill,
+ priority: e.priority,
+ effectivePriority: typeof e.effectivePriority === "number" ? e.effectivePriority : e.priority,
+ policyBoost: policyBoosted.find((p) => p.skill === e.skill)?.boost ?? 0,
+ policyReason: policyBoosted.find((p) => p.skill === e.skill)?.reason ?? null
+ })),
+ rbResult.rulebook,
+ rbScenario,
+ rbPath
+ );
+ for (let i = 0; i < newEntries.length; i++) {
+ const rb = withRulebook[i];
+ newEntries[i].effectivePriority = rb.effectivePriority;
+ if (rb.matchedRuleId) {
+ rulebookBoosted.push({
+ skill: rb.skill,
+ matchedRuleId: rb.matchedRuleId,
+ ruleBoost: rb.ruleBoost,
+ ruleReason: rb.ruleReason ?? "",
+ rulebookPath: rb.rulebookPath ?? ""
+ });
+ const pIdx = policyBoosted.findIndex((p) => p.skill === rb.skill);
+ if (pIdx !== -1) {
+ policyBoosted.splice(pIdx, 1);
+ }
+ }
+ }
+ if (rulebookBoosted.length > 0) {
+ l.debug("rulebook-boosted", {
+ scenario: `${rbScenario.hook}|${rbScenario.storyKind ?? "none"}|${rbScenario.targetBoundary ?? "none"}|${rbScenario.toolName}`,
+ boostedSkills: rulebookBoosted
+ });
+ }
+ }
+ } else if (!rbResult.ok) {
+ l.debug("rulebook-load-error", { code: rbResult.error.code, message: rbResult.error.message });
+ }
+ }
 newEntries = rankEntries(newEntries);
 const rankedSkills = newEntries.map((e) => e.skill);
 for (const entry of newEntries) {
 const eff = typeof entry.effectivePriority === "number" ? entry.effectivePriority : entry.priority;
+ const reason = rulebookBoosted.some((r) => r.skill === entry.skill) ? "rulebook_boosted" : policyBoosted.some((p) => p.skill === entry.skill) ? "policy_boosted" : profilerBoosted.includes(entry.skill) ? "profiler_boosted" : "pattern_match";
 logDecision(l, {
 hook: "PreToolUse",
 event: "skill_ranked",
 skill: entry.skill,
 score: eff,
- reason: profilerBoosted.includes(entry.skill) ? "profiler_boosted" : "pattern_match"
+ reason
 });
 }
 l.debug("dedup-filtered", { rankedSkills, previouslyInjected: [...injectedSkills] });
- return { newEntries, rankedSkills, vercelJsonRouting, profilerBoosted, setupModeRouting };
+ return { newEntries, rankedSkills, vercelJsonRouting, profilerBoosted, setupModeRouting, policyBoosted, rulebookBoosted };
 }
 function skillInvocationMessage(skill, platform) {
 return platform === "cursor" ? `Load the /${skill} skill.` : `You must run the Skill(${skill}) tool.`;
@@ -673,6 +793,80 @@ function injectSkills(rankedSkills, options) {
 l.debug("skills-injected", { injected: loaded, summaryOnly, skippedByConcurrentClaim, totalParts: parts.length, usedBytes, budgetBytes: budget });
 return { parts, loaded, summaryOnly, droppedByCap, droppedByBudget, skippedByConcurrentClaim };
 }
+function applyVerifiedPlaybookInsertion(params) {
+ const rankedSkills = [...params.rankedSkills];
+ const matched = new Set(params.matched);
+ const forceSummarySkills = new Set(params.forceSummarySkills);
+ const reasons = {};
+ if (!params.selection) {
+ return {
+ rankedSkills,
+ matched,
+ forceSummarySkills,
+ reasons,
+ applied: false,
+ appliedOrderedSkills: [],
+ appliedInsertedSkills: [],
+ banner: null
+ };
+ }
+ const anchorIdx = rankedSkills.indexOf(params.selection.anchorSkill);
+ if (anchorIdx === -1) {
+ return {
+ rankedSkills,
+ matched,
+ forceSummarySkills,
+ reasons,
+ applied: false,
+ appliedOrderedSkills: [],
+ appliedInsertedSkills: [],
+ banner: null
+ };
+ }
+ const appliedInsertedSkills = [];
+ let insertOffset = 1;
+ for (const skill of params.selection.insertedSkills) {
+ if (rankedSkills.includes(skill)) continue;
+ rankedSkills.splice(anchorIdx + insertOffset, 0, skill);
+ matched.add(skill);
+ appliedInsertedSkills.push(skill);
+ if (!params.dedupOff && params.injectedSkills.has(skill)) {
+ forceSummarySkills.add(skill);
+ }
+ reasons[skill] = {
+ trigger: "verified-playbook",
+ reasonCode: "scenario-playbook-rulebook"
+ };
+ insertOffset += 1;
+ }
+ const applied = appliedInsertedSkills.length > 0;
+ return {
+ rankedSkills,
+ matched,
+ forceSummarySkills,
+ reasons,
+ applied,
+ appliedOrderedSkills: applied ? [params.selection.anchorSkill, ...appliedInsertedSkills] : [],
+ appliedInsertedSkills,
+ banner: applied ? params.selection.banner : null
+ };
+}
+function buildPlaybookExposureRoles(orderedSkills) {
+ const [anchorSkill, ...rest] = orderedSkills.filter(Boolean);
+ if (!anchorSkill) return [];
+ return [
+ {
+ skill: anchorSkill,
+ attributionRole: "candidate",
+ candidateSkill: anchorSkill
+ },
+ ...rest.map((skill) => ({
+ skill,
+ attributionRole: "context",
+ candidateSkill: anchorSkill
+ }))
+ ];
+}
 function formatPlatformOutput(platform, additionalContext, env) {
 if (platform === "cursor") {
 const output2 = {};
@@ -830,6 +1024,23 @@ function run() {
 if (!matchResult) return "{}";
 if (log.active) timing.match = Math.round(log.now() - tMatch);
 const { matchedEntries, matchReasons, matched } = matchResult;
+ const causality = createDecisionCausality();
+ for (const [skill, reason] of Object.entries(matchReasons)) {
+ addCause(causality, {
+ code: "pattern-match",
+ stage: "match",
+ skill,
+ synthetic: false,
+ scoreDelta: 0,
+ message: `Matched ${reason.matchType} pattern`,
+ detail: {
+ matchType: reason.matchType,
+ pattern: reason.pattern,
+ toolName,
+ toolTarget: toolName === "Bash" ? redactCommand(toolTarget) : toolTarget
+ }
+ });
+ }
 const tsxReview = checkTsxReviewTrigger(toolName, toolInput, injectedSkills, dedupOff, sessionId, log);
 const devServerVerify = checkDevServerVerify(toolName, toolInput, injectedSkills, dedupOff, sessionId, log);
 const vercelEnvHelp = checkVercelEnvHelp(toolName, toolInput, injectedSkills, dedupOff, log);
@@ -851,9 +1062,38 @@ function run() {
 dedupOff,
 likelySkills,
 compiledSkills,
- setupMode
+ setupMode,
+ cwd,
+ sessionId
 }, log);
- const { newEntries, rankedSkills, profilerBoosted } = dedupResult;
+ const { newEntries, rankedSkills, profilerBoosted, policyBoosted, rulebookBoosted } = dedupResult;
+ for (const boosted of policyBoosted) {
+ addCause(causality, {
+ code: "policy-boost",
+ stage: "rank",
+ skill: boosted.skill,
+ synthetic: false,
+ scoreDelta: boosted.boost,
+ message: boosted.reason ?? "Policy boost applied",
+ detail: { boost: boosted.boost, reason: boosted.reason ?? "" }
+ });
+ }
+ for (const boosted of rulebookBoosted) {
+ addCause(causality, {
+ code: "rulebook-boost",
+ stage: "rank",
+ skill: boosted.skill,
+ synthetic: false,
+ scoreDelta: boosted.ruleBoost,
+ message: boosted.ruleReason || "Rulebook boost applied",
+ detail: {
+ matchedRuleId: boosted.matchedRuleId,
+ ruleBoost: boosted.ruleBoost,
+ ruleReason: boosted.ruleReason,
+ rulebookPath: boosted.rulebookPath
+ }
+ });
+ }
 let tsxReviewInjected = false;
 if (tsxReview.triggered && !rankedSkills.includes(TSX_REVIEW_SKILL)) {
 const reviewTemplate = compiledSkills.find((e) => e.skill === TSX_REVIEW_SKILL);
@@ -969,6 +1209,252 @@ function run() {
 log.debug("ai-sdk-companion-inject", { skill: companion });
 }
 }
+ const policyRecallSynthetic = /* @__PURE__ */ new Set();
+ if (cwd && sessionId) {
+ const recallPlan = loadCachedPlanResult(sessionId, log);
+ const recallStory = recallPlan ? selectActiveStory(recallPlan) : null;
+ const recallBoundary = recallPlan?.primaryNextAction?.targetBoundary ?? null;
+ if (recallStory && recallBoundary) {
+ const recallScenario = {
+ hook: "PreToolUse",
+ storyKind: recallStory.kind ?? null,
+ targetBoundary: recallBoundary,
+ toolName,
+ routeScope: recallStory.route ?? null
+ };
+ const policy = loadProjectRoutingPolicy(cwd);
+ const excludeSkills = /* @__PURE__ */ new Set([...rankedSkills, ...injectedSkills]);
+ const recallDiagnosis = explainPolicyRecall(policy, recallScenario, {
+ maxCandidates: 1,
+ excludeSkills
+ });
+ log.debug("policy-recall-lookup", {
+ requestedScenario: `${recallScenario.hook}|${recallScenario.storyKind ?? "none"}|${recallScenario.targetBoundary ?? "none"}|${recallScenario.toolName}|${recallScenario.routeScope ?? "*"}`,
+ checkedScenarios: recallDiagnosis.checkedScenarios,
+ selectedBucket: recallDiagnosis.selectedBucket,
+ selectedSkills: recallDiagnosis.selected.map((candidate) => candidate.skill),
+ rejected: recallDiagnosis.rejected.map((candidate) => ({
+ skill: candidate.skill,
+ scenario: candidate.scenario,
+ exposures: candidate.exposures,
+ successRate: candidate.successRate,
+ policyBoost: candidate.policyBoost,
+ excluded: candidate.excluded,
+ rejectedReason: candidate.rejectedReason
+ })),
+ hintCodes: recallDiagnosis.hints.map((hint) => hint.code)
+ });
+ for (const candidate of recallDiagnosis.selected) {
+ if (rankedSkills.includes(candidate.skill)) continue;
+ const insertIdx = rankedSkills.length > 0 ? 1 : 0;
+ rankedSkills.splice(insertIdx, 0, candidate.skill);
+ matched.add(candidate.skill);
+ policyRecallSynthetic.add(candidate.skill);
+ addCause(causality, {
+ code: "policy-recall",
+ stage: "rank",
+ skill: candidate.skill,
+ synthetic: true,
+ scoreDelta: 0,
+ message: `Recalled historically verified skill for ${candidate.scenario}`,
+ detail: {
+ scenario: candidate.scenario,
+ exposures: candidate.exposures,
+ wins: candidate.wins,
+ directiveWins: candidate.directiveWins,
+ successRate: candidate.successRate,
+ recallScore: candidate.recallScore
+ }
+ });
+ log.debug("policy-recall-injected", {
+ skill: candidate.skill,
+ scenario: candidate.scenario,
+ insertionIndex: insertIdx,
+ exposures: candidate.exposures,
+ wins: candidate.wins,
+ directiveWins: candidate.directiveWins,
+ successRate: candidate.successRate,
+ policyBoost: candidate.policyBoost,
+ recallScore: candidate.recallScore
+ });
+ }
+ } else {
+ log.debug("policy-recall-skipped", {
+ reason: !recallStory ? "no_active_verification_story" : "no_target_boundary"
+ });
+ }
+ }
+ const companionRecallReasons = {};
+ if (cwd && sessionId) {
+ const companionPlan = loadCachedPlanResult(sessionId, log);
+ const companionStory = companionPlan ? selectActiveStory(companionPlan) : null;
+ const companionBoundary = companionPlan?.primaryNextAction?.targetBoundary ?? null;
+ if (companionStory && companionBoundary) {
+ const companionRecall = recallVerifiedCompanions({
+ projectRoot: cwd,
+ scenario: {
+ hook: "PreToolUse",
+ storyKind: companionStory.kind ?? null,
+ targetBoundary: companionBoundary,
+ toolName,
+ routeScope: companionStory.route ?? null
+ },
+ candidateSkills: [...rankedSkills],
+ excludeSkills: /* @__PURE__ */ new Set([...rankedSkills, ...injectedSkills]),
+ maxCompanions: 1
+ });
+ for (const recall of companionRecall.selected) {
+ const candidateIdx = rankedSkills.indexOf(recall.candidateSkill);
+ if (candidateIdx === -1) continue;
+ rankedSkills.splice(candidateIdx + 1, 0, recall.companionSkill);
+ matched.add(recall.companionSkill);
+ const alreadySeen = !dedupOff && injectedSkills.has(recall.companionSkill);
+ if (alreadySeen) {
+ forceSummarySkills.add(recall.companionSkill);
+ }
+ companionRecallReasons[recall.companionSkill] = {
+ trigger: "verified-companion",
+ reasonCode: "scenario-companion-rulebook"
+ };
+ addCause(causality, {
+ code: "verified-companion",
+ stage: "rank",
+ skill: recall.companionSkill,
+ synthetic: true,
+ scoreDelta: 0,
+ message: `Inserted learned companion after ${recall.candidateSkill}`,
+ detail: {
+ candidateSkill: recall.candidateSkill,
+ scenario: recall.scenario,
+ confidence: recall.confidence,
+ summaryOnly: alreadySeen
+ }
+ });
+ addEdge(causality, {
+ fromSkill: recall.candidateSkill,
+ toSkill: recall.companionSkill,
+ relation: "companion-of",
+ code: "verified-companion",
+ detail: {
+ scenario: recall.scenario,
+ confidence: recall.confidence
+ }
+ });
+ log.debug("companion-recall-injected", {
+ candidateSkill: recall.candidateSkill,
+ companionSkill: recall.companionSkill,
+ scenario: recall.scenario,
+ lift: recall.confidence,
+ summaryOnly: alreadySeen
+ });
+ }
+ if (companionRecall.rejected.length > 0) {
+ log.debug("companion-recall-rejected", {
+ rejected: companionRecall.rejected
+ });
+ }
+ } else {
+ log.debug("companion-recall-skipped", {
+ reason: !companionStory ? "no_active_verification_story" : "no_target_boundary"
+ });
+ }
+ }
+ const playbookRecallReasons = {};
+ let playbookBanner = null;
+ const playbookExposureRoles = /* @__PURE__ */ new Map();
+ if (cwd && sessionId) {
+ const playbookPlan = loadCachedPlanResult(sessionId, log);
+ const playbookStory = playbookPlan ? selectActiveStory(playbookPlan) : null;
+ const playbookBoundary = playbookPlan?.primaryNextAction?.targetBoundary ?? null;
+ if (playbookStory && playbookBoundary) {
+ const playbookRecall = recallVerifiedPlaybook({
+ projectRoot: cwd,
+ scenario: {
+ hook: "PreToolUse",
+ storyKind: playbookStory.kind ?? null,
+ targetBoundary: playbookBoundary,
+ toolName,
+ routeScope: playbookStory.route ?? null
+ },
+ candidateSkills: [...rankedSkills],
+ excludeSkills: /* @__PURE__ */ new Set([...rankedSkills, ...injectedSkills]),
+ maxInsertedSkills: 2
+ });
+ const playbookApply = applyVerifiedPlaybookInsertion({
+ rankedSkills,
+ matched,
+ injectedSkills,
+ dedupOff,
+ forceSummarySkills,
+ selection: playbookRecall.selected ? {
+ anchorSkill: playbookRecall.selected.anchorSkill,
+ insertedSkills: playbookRecall.selected.insertedSkills,
+ banner: playbookRecall.banner
+ } : null
+ });
+ rankedSkills.length = 0;
+ rankedSkills.push(...playbookApply.rankedSkills);
+ matched.clear();
+ for (const skill of playbookApply.matched) matched.add(skill);
+ forceSummarySkills.clear();
+ for (const skill of playbookApply.forceSummarySkills) {
+ forceSummarySkills.add(skill);
+ }
+ Object.assign(playbookRecallReasons, playbookApply.reasons);
+ if (playbookApply.applied) {
+ if (playbookApply.banner) {
+ playbookBanner = playbookApply.banner;
+ }
+ for (const role of buildPlaybookExposureRoles(playbookApply.appliedOrderedSkills)) {
+ playbookExposureRoles.set(role.skill, role);
+ }
+ if (playbookRecall.selected) {
+ for (const skill of playbookApply.appliedInsertedSkills) {
+ addCause(causality, {
+ code: "verified-playbook",
+ stage: "rank",
+ skill,
+ synthetic: true,
+ scoreDelta: 0,
+ message: `Inserted verified playbook step after ${playbookRecall.selected.anchorSkill}`,
+ detail: {
+ ruleId: playbookRecall.selected.ruleId,
+ orderedSkills: playbookApply.appliedOrderedSkills,
+ support: playbookRecall.selected.support,
+ precision: playbookRecall.selected.precision,
+ lift: playbookRecall.selected.lift
+ }
+ });
+ addEdge(causality, {
+ fromSkill: playbookRecall.selected.anchorSkill,
+ toSkill: skill,
+ relation: "playbook-step",
+ code: "verified-playbook",
+ detail: {
+ ruleId: playbookRecall.selected.ruleId
+ }
+ });
+ }
+ log.debug("playbook-recall-injected", {
+ ruleId: playbookRecall.selected.ruleId,
+ anchorSkill: playbookRecall.selected.anchorSkill,
+ insertedSkills: playbookApply.appliedInsertedSkills
+ });
+ }
+ } else if (playbookRecall.selected) {
+ log.debug("playbook-recall-noop", {
+ ruleId: playbookRecall.selected.ruleId,
+ anchorSkill: playbookRecall.selected.anchorSkill,
+ requestedInsertedSkills: playbookRecall.selected.insertedSkills,
+ reason: "no_new_skills_inserted"
+ });
+ }
+ } else {
+ log.debug("playbook-recall-skipped", {
+ reason: !playbookStory ? "no_active_verification_story" : "no_target_boundary"
+ });
+ }
+ }
 let vercelEnvHelpInjected = false;
 if (vercelEnvHelp.triggered) {
 let helpClaimed = true;
@@ -997,10 +1483,12 @@ function run() {
 devServerVerifyTriggered: devServerVerify.triggered,
 matchedSkills: [...matched],
 injectedSkills: [],
- boostsApplied: profilerBoosted
+ boostsApplied: profilerBoosted,
+ policyBoosted
 }, log.active ? timing : null);
- const envUpdates2 = finalizeRuntimeEnvUpdates(platform, runtimeEnvBefore);
- return formatPlatformOutput(platform, void 0, envUpdates2);
+ const earlyEnv = finalizeRuntimeEnvUpdates(platform, runtimeEnvBefore);
+ const clearingEnv = { ...earlyEnv ?? {}, ...buildVerificationEnv(null) };
+ return formatPlatformOutput(platform, void 0, clearingEnv);
 }
 const tSkillRead = log.active ? log.now() : 0;
 const { parts, loaded, summaryOnly, droppedByCap, droppedByBudget } = injectSkills(rankedSkills, {
@@ -1015,6 +1503,79 @@ function run() {
 platform
 });
 if (log.active) timing.skill_read = Math.round(log.now() - tSkillRead);
+ for (const skill of droppedByCap) {
+ addCause(causality, {
+ code: "dropped-cap",
+ stage: "inject",
+ skill,
+ synthetic: false,
+ scoreDelta: 0,
+ message: "Dropped because max skill cap was exceeded",
+ detail: { maxSkills: MAX_SKILLS }
+ });
+ }
+ for (const skill of droppedByBudget) {
+ addCause(causality, {
+ code: "dropped-budget",
+ stage: "inject",
+ skill,
+ synthetic: false,
+ scoreDelta: 0,
+ message: "Dropped because injection budget was exhausted",
+ detail: { budgetBytes: getInjectionBudget() }
+ });
+ }
+ if (loaded.length > 0 && sessionId) {
+ const plan = loadCachedPlanResult(sessionId, log);
+ const story = plan ? selectActiveStory(plan) : null;
+ if (story) {
+ const targetBoundary = plan?.primaryNextAction?.targetBoundary ?? null;
+ const attribution = buildAttributionDecision({
+ sessionId,
+ hook: "PreToolUse",
+ storyId: story.id ?? null,
+ route: story.route ?? null,
+ targetBoundary,
+ loadedSkills: loaded,
+ preferredSkills: policyRecallSynthetic
+ });
+ for (const skill of loaded) {
+ const playbookRole = playbookExposureRoles.get(skill);
+ appendSkillExposure({
+ id: `${sessionId}:${skill}:${Date.now()}`,
+ sessionId,
+ projectRoot: cwd,
+ storyId: story.id ?? null,
+ storyKind: story.kind ?? null,
+ route: story.route ?? null,
+ hook: "PreToolUse",
+ toolName,
+ skill,
+ targetBoundary,
+ exposureGroupId: attribution.exposureGroupId,
+ attributionRole: playbookRole?.attributionRole ?? (skill === attribution.candidateSkill ? "candidate" : "context"),
+ candidateSkill: playbookRole?.candidateSkill ?? attribution.candidateSkill,
+ createdAt: (/* @__PURE__ */ new Date()).toISOString(),
+ resolvedAt: null,
+ outcome: "pending"
+ });
+ }
+ log.summary("routing-policy-exposures-recorded", {
+ hook: "PreToolUse",
+ skills: loaded,
+ storyId: story.id,
+ storyKind: story.kind ?? null,
+ candidateSkill: attribution.candidateSkill,
+ exposureGroupId: attribution.exposureGroupId
+ });
+ } else {
+ log.debug("routing-policy-exposures-skipped", {
+ hook: "PreToolUse",
+ reason: "no active verification story",
+ skills: loaded
+ });
+ }
+ }
 if (tsxReviewInjected && loaded.includes(TSX_REVIEW_SKILL)) {
 parts.push(REVIEW_MARKER);
 const prevCount = getTsxEditCount(sessionId);
@@ -1049,10 +1610,12 @@ function run() {
 injectedSkills: [],
 droppedByCap,
 droppedByBudget,
- boostsApplied: profilerBoosted
+ boostsApplied: profilerBoosted,
+ policyBoosted
 }, log.active ? timing : null);
- const envUpdates2 = finalizeRuntimeEnvUpdates(platform, runtimeEnvBefore);
- return formatPlatformOutput(platform, void 0, envUpdates2);
+ const earlyEnv2 = finalizeRuntimeEnvUpdates(platform, runtimeEnvBefore);
+ const clearingEnv2 = { ...earlyEnv2 ?? {}, ...buildVerificationEnv(null) };
+ return formatPlatformOutput(platform, void 0, clearingEnv2);
 }
 if (log.active) timing.total = log.elapsed();
 const cappedCount = droppedByCap.length + droppedByBudget.length;
@@ -1104,6 +1667,18 @@ function run() {
 }
 }
 }
+ for (const skill of policyRecallSynthetic) {
+ reasons[skill] = {
+ trigger: "policy-recall",
+ reasonCode: "route-scoped-verified-policy-recall"
+ };
+ }
+ for (const [skill, reason] of Object.entries(companionRecallReasons)) {
+ reasons[skill] = reason;
+ }
+ for (const [skill, reason] of Object.entries(playbookRecallReasons)) {
+ reasons[skill] = reason;
+ }
 for (const skill of loaded) {
 if (!reasons[skill] && matchReasons?.[skill]) {
 reasons[skill] = {
@@ -1112,7 +1687,27 @@ function run() {
 };
 }
 }
- const envUpdates = finalizeRuntimeEnvUpdates(platform, runtimeEnvBefore);
+ const verificationRuntime = resolveVerificationRuntimeState(sessionId, {
+ agentBrowserAvailable: process.env.VERCEL_PLUGIN_AGENT_BROWSER_AVAILABLE !== "0",
+ lastAttemptedAction: process.env.VERCEL_PLUGIN_VERIFICATION_ACTION || null
+ }, log);
+ if (verificationRuntime.banner) {
+ parts.unshift(verificationRuntime.banner);
+ log.summary("pretooluse.verification-banner-injected", {
+ sessionId,
+ storyId: verificationRuntime.directive?.storyId ?? null,
+ route: verificationRuntime.directive?.route ?? null,
+ source: verificationRuntime.plan ? "cache-or-compute" : "none"
+ });
+ }
+ if (playbookBanner) {
+ parts.unshift(playbookBanner);
+ }
+ const runtimeEnv = finalizeRuntimeEnvUpdates(platform, runtimeEnvBefore);
+ const envUpdates = {
+ ...runtimeEnv ?? {},
+ ...verificationRuntime.env
+ };
 const result = formatOutput({
 parts,
 matched,
@@ -1127,7 +1722,7 @@ function run() {
 verificationId,
 skillMap: skills.skillMap,
 platform,
- env: envUpdates
+ env: Object.keys(envUpdates).length > 0 ? envUpdates : void 0
 });
 if (loaded.length > 0) {
 appendAuditLog({
@@ -1156,6 +1751,159 @@ function run() {
 });
 }
 }
+ {
+ const tracePlan = sessionId ? loadCachedPlanResult(sessionId, log) : null;
+ const traceStory = tracePlan ? selectActiveStory(tracePlan) : null;
+ const traceTimestamp = (/* @__PURE__ */ new Date()).toISOString();
+ const traceToolTarget = toolName === "Bash" ? redactCommand(toolTarget) : toolTarget;
+ const decisionId = createDecisionId({
+ hook: "PreToolUse",
+ sessionId,
+ toolName,
+ toolTarget: traceToolTarget,
+ timestamp: traceTimestamp
+ });
+ const syntheticSkills = /* @__PURE__ */ new Set();
+ if (tsxReviewInjected && tsxReview.triggered) syntheticSkills.add(TSX_REVIEW_SKILL);
+ if (devServerVerifyInjected && devServerVerify.triggered) syntheticSkills.add(DEV_SERVER_VERIFY_SKILL);
+ if (devServerVerify.triggered && !devServerVerify.unavailable) {
+ for (const companion of DEV_SERVER_COMPANION_SKILLS) {
+ if (rankedSkills.includes(companion) && !newEntries.some((e) => e.skill === companion)) {
+ syntheticSkills.add(companion);
+ }
+ }
+ }
+ if (devServerVerify.loopGuardHit && !devServerVerify.unavailable) {
+ for (const companion of DEV_SERVER_COMPANION_SKILLS) {
+ if (rankedSkills.includes(companion)) syntheticSkills.add(companion);
+ }
+ }
+ if (aiSdkCompanionInjected) {
+ for (const companion of AI_SDK_COMPANION_SKILLS) {
+ if (rankedSkills.includes(companion) && !newEntries.some((e) => e.skill === companion)) {
+ syntheticSkills.add(companion);
+ }
+ }
+ }
+ for (const skill of policyRecallSynthetic) {
+ syntheticSkills.add(skill);
+ }
+ for (const skill of Object.keys(companionRecallReasons)) {
+ syntheticSkills.add(skill);
+ }
+ const traceRanked = [];
+ const trackedSkills = /* @__PURE__ */ new Set();
+ for (const entry of newEntries) {
+ const match = matchReasons?.[entry.skill];
+ const policy = policyBoosted.find((p) => p.skill === entry.skill);
+ const rb = rulebookBoosted.find((r) => r.skill === entry.skill);
+ const companionReason = companionRecallReasons[entry.skill];
+ trackedSkills.add(entry.skill);
+ traceRanked.push({
+ skill: entry.skill,
+ basePriority: entry.priority,
+ effectivePriority: typeof entry.effectivePriority === "number" ? entry.effectivePriority : entry.priority,
+ pattern: companionReason ? { type: companionReason.trigger, value: companionReason.reasonCode } : match ? { type: match.matchType, value: match.pattern } : null,
+ profilerBoost: profilerBoosted.includes(entry.skill) ? 5 : 0,
+ policyBoost: policy?.boost ?? 0,
+ policyReason: policy?.reason ?? null,
+ matchedRuleId: rb?.matchedRuleId ?? null,
+ ruleBoost: rb?.ruleBoost ?? 0,
+ ruleReason: rb?.ruleReason ?? null,
+ rulebookPath: rb?.rulebookPath ?? null,
+ summaryOnly: summaryOnly.includes(entry.skill),
+ synthetic: syntheticSkills.has(entry.skill),
+ droppedReason: droppedByCap.includes(entry.skill) ? "cap_exceeded" : droppedByBudget.includes(entry.skill) ? "budget_exhausted" : null
+ });
+ }
+ for (const skill of syntheticSkills) {
+ if (trackedSkills.has(skill)) continue;
+ trackedSkills.add(skill);
+ const reason = reasons[skill];
+ traceRanked.push({
+ skill,
+ basePriority: 0,
+ effectivePriority: 0,
+ pattern: reason ? { type: reason.trigger, value: reason.reasonCode } : null,
+ profilerBoost: 0,
+ policyBoost: 0,
+ policyReason: null,
+ matchedRuleId: null,
+ ruleBoost: 0,
+ ruleReason: null,
+ rulebookPath: null,
+ summaryOnly: summaryOnly.includes(skill),
+ synthetic: true,
+ droppedReason: droppedByCap.includes(skill) ? "cap_exceeded" : droppedByBudget.includes(skill) ? "budget_exhausted" : null
+ });
+ }
+ for (const entry of matchedEntries) {
+ if (trackedSkills.has(entry.skill)) continue;
+ if (!injectedSkills.has(entry.skill)) continue;
+ trackedSkills.add(entry.skill);
+ const match = matchReasons?.[entry.skill];
+ traceRanked.push({
+ skill: entry.skill,
+ basePriority: entry.priority,
+ effectivePriority: typeof entry.effectivePriority === "number" ? entry.effectivePriority : entry.priority,
+ pattern: match ? { type: match.matchType, value: match.pattern } : null,
+ profilerBoost: profilerBoosted.includes(entry.skill) ? 5 : 0,
+ policyBoost: 0,
+ policyReason: null,
+ matchedRuleId: null,
+ ruleBoost: 0,
+ ruleReason: null,
+ rulebookPath: null,
+ summaryOnly: false,
+ synthetic: false,
+ droppedReason: "deduped"
+ });
+ }
+ appendRoutingDecisionTrace({
+ version: 2,
+ decisionId,
+ sessionId,
+ hook: "PreToolUse",
+ toolName,
+ toolTarget: traceToolTarget,
+ timestamp: traceTimestamp,
+ primaryStory: {
+ id: traceStory?.id ?? null,
+ kind: traceStory?.kind ?? null,
+ storyRoute: traceStory?.route ?? null,
+ targetBoundary: tracePlan?.primaryNextAction?.targetBoundary ?? null
+ },
+ observedRoute: null,
+ // PreToolUse fires before execution; no observed route yet
+ policyScenario: traceStory ? `PreToolUse|${traceStory.kind ?? "none"}|${tracePlan?.primaryNextAction?.targetBoundary ?? "none"}|${toolName}` : null,
+ matchedSkills: [...matched],
+ injectedSkills: loaded,
+ skippedReasons: [
+ ...traceStory ? [] : ["no_active_verification_story"],
+ ...droppedByCap.map((skill) => `cap_exceeded:${skill}`),
+ ...droppedByBudget.map((skill) => `budget_exhausted:${skill}`)
+ ],
+ ranked: traceRanked,
+ verification: verificationId ? { verificationId, observedBoundary: null, matchedSuggestedAction: null } : null,
+ causes: causality.causes,
+ edges: causality.edges
+ });
+ log.summary("routing.decision_trace_written", {
+ decisionId,
+ hook: "PreToolUse",
+ toolName,
+ matchedSkills: [...matched],
+ injectedSkills: loaded
+ });
+ log.summary("routing.decision_causality", {
+ decisionId,
+ hook: "PreToolUse",
+ causeCount: causality.causes.length,
+ edgeCount: causality.edges.length,
+ causes: causality.causes,
+ edges: causality.edges
+ });
+ }
 return result;
 }
 var REDACT_MAX = 200;
@@ -1258,6 +2006,8 @@ export {
 DEV_SERVER_VERIFY_SKILL,
 REVIEW_MARKER,
 TSX_REVIEW_SKILL,
+ applyVerifiedPlaybookInsertion,
+ buildPlaybookExposureRoles,
 captureRuntimeEnvSnapshot,
 checkDevServerVerify,
 checkTsxReviewTrigger,
diff --git a/hooks/prompt-policy-recall.mjs b/hooks/prompt-policy-recall.mjs
new file mode 100644
index 0000000..421ba24
--- /dev/null
+++ b/hooks/prompt-policy-recall.mjs
@@ -0,0 +1,71 @@
+// hooks/src/prompt-policy-recall.mts
+import {
+ explainPolicyRecall
+} from "./routing-diagnosis.mjs";
+function applyPromptPolicyRecall(params) {
+ const seenSkills = new Set(params.seenSkills ?? []);
+ const selectedSkills = [...params.selectedSkills];
+ const matchedSkills = [...params.matchedSkills];
+ const syntheticSkills = [];
+ const reasons = {};
+ if (!params.binding.storyId || !params.binding.targetBoundary) {
+ return {
+ selectedSkills,
+ matchedSkills,
+ syntheticSkills,
+ reasons,
+ diagnosis: null
+ };
+ }
+ const availableSlots = Math.max(0, params.maxSkills - selectedSkills.length);
+ if (availableSlots === 0) {
+ return {
+ selectedSkills,
+ matchedSkills,
+ syntheticSkills,
+ reasons,
+ diagnosis: null
+ };
+ }
+ const excludeSkills = /* @__PURE__ */ new Set([
+ ...selectedSkills,
+ ...seenSkills
+ ]);
+ const diagnosis = explainPolicyRecall(
+ params.policy,
+ {
+ hook: "UserPromptSubmit",
+ storyKind: params.binding.storyKind,
+ targetBoundary: params.binding.targetBoundary,
+ toolName: "Prompt",
+ routeScope: params.binding.route ?? null
+ },
+ {
+ maxCandidates: availableSlots,
+ excludeSkills
+ }
+ );
+ const baseInsertIdx = selectedSkills.length > 0 ? 1 : 0;
+ let insertedCount = 0;
+ for (const candidate of diagnosis.selected) {
+ if (selectedSkills.includes(candidate.skill)) continue;
+ const insertIdx = baseInsertIdx + insertedCount;
+ selectedSkills.splice(insertIdx, 0, candidate.skill);
+ insertedCount += 1;
+ if (!matchedSkills.includes(candidate.skill)) {
+ matchedSkills.push(candidate.skill);
+ }
+ syntheticSkills.push(candidate.skill);
+ reasons[candidate.skill] = `route-scoped verified policy recall (${candidate.wins}/${candidate.exposures} wins, success=${candidate.successRate})`;
+ }
+ return {
+ selectedSkills,
+ matchedSkills,
+ syntheticSkills,
+ reasons,
+ diagnosis
+ };
+}
+export {
+ applyPromptPolicyRecall
+};
diff --git a/hooks/prompt-verification-binding.mjs b/hooks/prompt-verification-binding.mjs
new file mode 100644
index 0000000..eb35d68
--- /dev/null
+++ b/hooks/prompt-verification-binding.mjs
@@ -0,0 +1,31 @@
+// hooks/src/prompt-verification-binding.mts
+import {
+ selectActiveStory
+} from "./verification-plan.mjs";
+function resolvePromptVerificationBinding(input) {
+ const story = input.plan ? selectActiveStory(input.plan) : null;
+ const targetBoundary = input.plan?.primaryNextAction?.targetBoundary ?? null;
+ if (story && targetBoundary) {
+ return {
+ targetBoundary,
+ storyId: story.id ?? null,
+ storyKind: story.kind ?? null,
+ route: story.route ?? null,
+ source: "active-plan",
+ confidence: 1,
+ reason: `active verification plan predicted ${targetBoundary}`
+ };
+ }
+ return {
+ targetBoundary: null,
+ storyId: story?.id ?? null,
+ storyKind: story?.kind ?? null,
+ route: story?.route ?? null,
+ source: "none",
+ confidence: 0,
+ reason: story ? "active verification story exists but no primary next boundary is available" : "no active verification story"
+ };
+}
+export {
+ resolvePromptVerificationBinding
+};
diff --git a/hooks/routing-attribution.mjs b/hooks/routing-attribution.mjs
new file mode 100644
index 0000000..c591092
--- /dev/null
+++ b/hooks/routing-attribution.mjs
@@ -0,0 +1,42 @@
+// hooks/src/routing-attribution.mts
+import { createLogger } from "./logger.mjs";
+function chooseAttributedSkill(loadedSkills, preferredSkills = []) {
+ const preferred = new Set(preferredSkills);
+ for (const skill of loadedSkills) {
+ if (preferred.has(skill)) return skill;
+ }
+ return loadedSkills[0] ?? null;
+}
+function buildAttributionDecision(input) {
+ const log = createLogger();
+ const timestamp = input.now ?? (/* @__PURE__ */ new Date()).toISOString();
+ const candidateSkill = chooseAttributedSkill(
+ input.loadedSkills,
+ input.preferredSkills
+ );
+ const decision = {
+ exposureGroupId: [
+ input.sessionId,
+ input.hook,
+ input.storyId ?? "none",
+ input.route ?? "*",
+ input.targetBoundary ??
"none", + timestamp + ].join(":"), + candidateSkill, + loadedSkills: [...input.loadedSkills] + }; + log.summary("routing-attribution.decision", { + exposureGroupId: decision.exposureGroupId, + candidateSkill: decision.candidateSkill, + loadedSkills: decision.loadedSkills, + hook: input.hook, + storyId: input.storyId, + route: input.route + }); + return decision; +} +export { + buildAttributionDecision, + chooseAttributedSkill +}; diff --git a/hooks/routing-decision-capsule.mjs b/hooks/routing-decision-capsule.mjs new file mode 100644 index 0000000..f6d781b --- /dev/null +++ b/hooks/routing-decision-capsule.mjs @@ -0,0 +1,171 @@ +// hooks/src/routing-decision-capsule.mts +import { mkdirSync, readFileSync, writeFileSync } from "fs"; +import { createHash } from "crypto"; +import { join } from "path"; +import { tmpdir } from "os"; +import { createLogger, logCaughtError } from "./logger.mjs"; +var SAFE_SESSION_ID_RE = /^[a-zA-Z0-9_-]+$/; +function safeSessionSegment(sessionId) { + if (!sessionId) return "no-session"; + if (SAFE_SESSION_ID_RE.test(sessionId)) return sessionId; + return createHash("sha256").update(sessionId).digest("hex"); +} +function decisionCapsuleDir(sessionId) { + return join( + tmpdir(), + `vercel-plugin-${safeSessionSegment(sessionId)}-capsules` + ); +} +function decisionCapsulePath(sessionId, decisionId) { + return join(decisionCapsuleDir(sessionId), `${decisionId}.json`); +} +function stableSha256(value) { + return createHash("sha256").update(JSON.stringify(value)).digest("hex"); +} +function deriveIssues(input) { + const issues = []; + if (!input.trace.primaryStory.id) { + issues.push({ + code: "no_active_verification_story", + severity: "warning", + message: "No active verification story was available for this decision.", + action: "Create or record a verification story before expecting policy learning or directed verification." 
+ }); + } + if (!input.directive?.primaryNextAction) { + issues.push({ + code: "env_cleared", + severity: "info", + message: "Verification env keys were cleared for this decision.", + action: "Expected when no next action exists; unexpected if a flow is mid-debug." + }); + } + if (input.directive?.blockedReasons?.length) { + issues.push({ + code: "verification_blocked", + severity: "warning", + message: input.directive.blockedReasons[0], + action: "Resolve the blocking condition before relying on automated verification." + }); + } + if (input.trace.skippedReasons.some( + (reason) => reason.startsWith("budget_exhausted:") + )) { + issues.push({ + code: "budget_exhausted", + severity: "warning", + message: "At least one ranked skill was dropped because the injection budget was exhausted.", + action: "Inspect the ranked list in this capsule to see which skills were trimmed." + }); + } + if (input.hook !== "PostToolUse") { + issues.push({ + code: "machine_output_hidden_in_html_comment", + severity: "info", + message: "Some hook metadata still travels through additionalContext comments due hook schema limits.", + action: "Use VERCEL_PLUGIN_DECISION_PATH instead of scraping hook output." + }); + } + return issues; +} +function deriveRulebookProvenance(trace) { + for (const entry of trace.ranked) { + if (entry.matchedRuleId && entry.rulebookPath) { + return { + matchedRuleId: entry.matchedRuleId, + ruleBoost: entry.ruleBoost, + ruleReason: entry.ruleReason ?? "", + rulebookPath: entry.rulebookPath + }; + } + } + return null; +} +function buildDecisionCapsule(input) { + const platform = input.platform === "cursor" || input.platform === "claude-code" ? 
input.platform : "unknown"; + const base = { + type: "routing.decision-capsule/v1", + version: 1, + decisionId: input.trace.decisionId, + sessionId: input.sessionId, + hook: input.hook, + createdAt: input.createdAt, + input: { + toolName: input.toolName, + toolTarget: input.toolTarget, + platform + }, + activeStory: { + id: input.trace.primaryStory.id, + kind: input.trace.primaryStory.kind, + route: input.trace.primaryStory.storyRoute, + targetBoundary: input.trace.primaryStory.targetBoundary + }, + directive: input.directive, + matchedSkills: [...input.trace.matchedSkills], + injectedSkills: [...input.trace.injectedSkills], + ranked: [...input.trace.ranked], + attribution: input.attribution ?? null, + rulebookProvenance: deriveRulebookProvenance(input.trace), + verification: input.trace.verification, + reasons: { ...input.reasons ?? {} }, + skippedReasons: [...input.trace.skippedReasons], + env: { ...input.env ?? {} }, + issues: deriveIssues({ + hook: input.hook, + directive: input.directive, + trace: input.trace + }) + }; + return { ...base, sha256: stableSha256(base) }; +} +function persistDecisionCapsule(capsule, logger) { + const log = logger ?? 
createLogger(); + const path = decisionCapsulePath(capsule.sessionId, capsule.decisionId); + try { + mkdirSync(decisionCapsuleDir(capsule.sessionId), { recursive: true }); + writeFileSync(path, JSON.stringify(capsule, null, 2) + "\n", "utf-8"); + log.summary("routing.decision_capsule_written", { + decisionId: capsule.decisionId, + sessionId: capsule.sessionId, + hook: capsule.hook, + path, + sha256: capsule.sha256 + }); + } catch (error) { + logCaughtError(log, "routing.decision_capsule_write_failed", error, { + decisionId: capsule.decisionId, + sessionId: capsule.sessionId, + path + }); + } + return path; +} +function buildDecisionCapsuleEnv(capsule, artifactPath) { + return { + VERCEL_PLUGIN_DECISION_ID: capsule.decisionId, + VERCEL_PLUGIN_DECISION_PATH: artifactPath, + VERCEL_PLUGIN_DECISION_SHA256: capsule.sha256 + }; +} +function readDecisionCapsule(artifactPath, logger) { + const log = logger ?? createLogger(); + try { + return JSON.parse( + readFileSync(artifactPath, "utf-8") + ); + } catch (error) { + logCaughtError(log, "routing.decision_capsule_read_failed", error, { + artifactPath + }); + return null; + } +} +export { + buildDecisionCapsule, + buildDecisionCapsuleEnv, + decisionCapsuleDir, + decisionCapsulePath, + persistDecisionCapsule, + readDecisionCapsule +}; diff --git a/hooks/routing-decision-causality.mjs b/hooks/routing-decision-causality.mjs new file mode 100644 index 0000000..051a2f1 --- /dev/null +++ b/hooks/routing-decision-causality.mjs @@ -0,0 +1,53 @@ +// hooks/src/routing-decision-causality.mts +function sortUnknown(value) { + if (Array.isArray(value)) { + return value.map(sortUnknown); + } + if (!value || typeof value !== "object") { + return value; + } + const input = value; + const output = {}; + for (const key of Object.keys(input).sort()) { + output[key] = sortUnknown(input[key]); + } + return output; +} +function causeKey(cause) { + return [cause.skill, cause.stage, cause.code, cause.message].join("\0"); +} +function edgeKey(edge) { 
+ return [edge.fromSkill, edge.toSkill, String(edge.relation), edge.code].join( + "\0" + ); +} +function createDecisionCausality() { + return { causes: [], edges: [] }; +} +function addCause(store, cause) { + store.causes.push({ + ...cause, + detail: sortUnknown(cause.detail) + }); + store.causes.sort( + (left, right) => causeKey(left).localeCompare(causeKey(right)) + ); +} +function addEdge(store, edge) { + store.edges.push({ + ...edge, + detail: sortUnknown(edge.detail) + }); + store.edges.sort( + (left, right) => edgeKey(left).localeCompare(edgeKey(right)) + ); +} +function causesForSkill(store, skill) { + return store.causes.filter((cause) => cause.skill === skill); +} +export { + addCause, + addEdge, + causesForSkill, + createDecisionCausality +}; diff --git a/hooks/routing-decision-trace.mjs b/hooks/routing-decision-trace.mjs new file mode 100644 index 0000000..d45080f --- /dev/null +++ b/hooks/routing-decision-trace.mjs @@ -0,0 +1,84 @@ +// hooks/src/routing-decision-trace.mts +import { + appendFileSync, + mkdirSync, + readFileSync +} from "fs"; +import { join } from "path"; +import { tmpdir } from "os"; +import { createHash } from "crypto"; +var SAFE_SESSION_ID_RE = /^[a-zA-Z0-9_-]+$/; +function safeSessionSegment(sessionId) { + if (!sessionId) return "no-session"; + if (SAFE_SESSION_ID_RE.test(sessionId)) return sessionId; + return createHash("sha256").update(sessionId).digest("hex"); +} +function normalizeTrace(raw) { + if (raw.version === 2) { + const trace = raw; + return { + ...trace, + causes: trace.causes ?? [], + edges: trace.edges ?? 
[] + }; + } + const v1 = raw; + return { + ...v1, + version: 2, + primaryStory: { + id: v1.primaryStory.id, + kind: v1.primaryStory.kind, + storyRoute: v1.primaryStory.route, + targetBoundary: v1.primaryStory.targetBoundary + }, + observedRoute: v1.primaryStory.route, + // best-effort: v1 conflated the two + causes: [], + edges: [] + }; +} +function traceDir(sessionId) { + return join( + tmpdir(), + `vercel-plugin-${safeSessionSegment(sessionId)}-trace` + ); +} +function tracePath(sessionId) { + return join(traceDir(sessionId), "routing-decision-trace.jsonl"); +} +function createDecisionId(input) { + const timestamp = input.timestamp ?? (/* @__PURE__ */ new Date()).toISOString(); + return createHash("sha256").update( + [ + input.hook, + input.sessionId ?? "", + input.toolName, + input.toolTarget, + timestamp + ].join("|") + ).digest("hex").slice(0, 16); +} +function appendRoutingDecisionTrace(trace) { + mkdirSync(traceDir(trace.sessionId), { recursive: true }); + appendFileSync( + tracePath(trace.sessionId), + JSON.stringify(trace) + "\n", + "utf8" + ); +} +function readRoutingDecisionTrace(sessionId) { + try { + const content = readFileSync(tracePath(sessionId), "utf8"); + return content.split("\n").filter((line) => line.trim() !== "").map((line) => normalizeTrace(JSON.parse(line))); + } catch { + return []; + } +} +export { + appendRoutingDecisionTrace, + createDecisionId, + readRoutingDecisionTrace, + traceDir, + tracePath +}; diff --git a/hooks/routing-diagnosis.mjs b/hooks/routing-diagnosis.mjs new file mode 100644 index 0000000..7ef5006 --- /dev/null +++ b/hooks/routing-diagnosis.mjs @@ -0,0 +1,288 @@ +// hooks/src/routing-diagnosis.mts +import { + computePolicySuccessRate, + derivePolicyBoost, + scenarioKeyCandidates +} from "./routing-policy.mjs"; +import { selectPolicyRecallCandidates } from "./policy-recall.mjs"; +var POLICY_RECALL_MIN_EXPOSURES = 3; +var POLICY_RECALL_MIN_SUCCESS_RATE = 0.65; +var POLICY_RECALL_MIN_BOOST = 2; +var HOOK_NAMES = 
["PreToolUse", "UserPromptSubmit"]; +var TOOL_NAMES = ["Read", "Edit", "Write", "Bash", "Prompt"]; +var BOUNDARIES = [ + "uiRender", + "clientRequest", + "serverHandler", + "environment" +]; +function round(value) { + return Number(value.toFixed(4)); +} +function diagnosticRecallScore(stats) { + return round( + derivePolicyBoost(stats) * 1e3 + computePolicySuccessRate(stats) * 100 + stats.exposures + ); +} +function qualifies(stats) { + const successRate = round(computePolicySuccessRate(stats)); + const policyBoost = derivePolicyBoost(stats); + const qualified = stats.exposures >= POLICY_RECALL_MIN_EXPOSURES && successRate >= POLICY_RECALL_MIN_SUCCESS_RATE && policyBoost >= POLICY_RECALL_MIN_BOOST; + return { successRate, policyBoost, qualified }; +} +function pushHint(target, hint) { + const key = JSON.stringify([ + hint.code, + hint.action?.type ?? null, + hint.action?.skill ?? null, + hint.action?.scenario ?? null + ]); + const exists = target.some((existing) => { + const existingKey = JSON.stringify([ + existing.code, + existing.action?.type ?? null, + existing.action?.skill ?? null, + existing.action?.scenario ?? null + ]); + return existingKey === key; + }); + if (!exists) { + target.push(hint); + } +} +function isHookName(value) { + return HOOK_NAMES.includes(value); +} +function isToolName(value) { + return TOOL_NAMES.includes(value); +} +function isBoundary(value) { + return BOUNDARIES.includes(value); +} +function parsePolicyScenario(value) { + if (!value) return null; + const parts = value.split("|"); + if (parts.length < 4) return null; + const [hook, storyKind, targetBoundary, toolName, routeScope] = parts; + if (!isHookName(hook) || !isToolName(toolName)) { + return null; + } + return { + hook, + storyKind: storyKind === "none" ? null : storyKind, + targetBoundary: targetBoundary === "none" ? null : isBoundary(targetBoundary) ? targetBoundary : null, + toolName, + routeScope: typeof routeScope === "string" && routeScope.length > 0 ? 
routeScope : null + }; +} +function candidateFromStats(skill, scenario, stats, selectedBucket, selectedSkills, excludeSkills) { + if (selectedBucket === scenario && selectedSkills.has(skill)) { + return null; + } + const { successRate, policyBoost, qualified } = qualifies(stats); + const excluded = excludeSkills.has(skill); + let rejectedReason = null; + if (selectedBucket && scenario !== selectedBucket) { + rejectedReason = `shadowed_by_selected_bucket:${selectedBucket}`; + } else if (excluded) { + rejectedReason = "already_ranked_or_injected"; + } else if (stats.exposures < POLICY_RECALL_MIN_EXPOSURES) { + rejectedReason = `needs_${POLICY_RECALL_MIN_EXPOSURES - stats.exposures}_more_exposures`; + } else if (qualified) { + rejectedReason = "lost_tiebreak_in_selected_bucket"; + } else if (successRate < POLICY_RECALL_MIN_SUCCESS_RATE) { + rejectedReason = `success_rate_${successRate.toFixed(3)}_below_${POLICY_RECALL_MIN_SUCCESS_RATE.toFixed(3)}`; + } else if (policyBoost < POLICY_RECALL_MIN_BOOST) { + rejectedReason = `policy_boost_${policyBoost}_below_${POLICY_RECALL_MIN_BOOST}`; + } + return { + skill, + scenario, + exposures: stats.exposures, + wins: stats.wins, + directiveWins: stats.directiveWins, + staleMisses: stats.staleMisses, + successRate, + policyBoost, + recallScore: diagnosticRecallScore(stats), + qualified, + excluded, + rejectedReason + }; +} +function buildHints(input, diagnosis) { + const hints = []; + const preferredExactScenario = scenarioKeyCandidates(input)[0] ?? null; + if (diagnosis.selectedBucket && diagnosis.selectedBucket.endsWith("|*")) { + pushHint(hints, { + severity: "info", + code: "POLICY_RECALL_USING_WILDCARD_ROUTE", + message: `Policy recall is selecting the wildcard bucket for ${input.toolName}`, + hint: "Collect exact-route wins for the active route so recall can promote from * to the concrete route key", + action: { + type: "seed_exact_route_history", + scenario: preferredExactScenario ?? 
void 0 + } + }); + } + if (diagnosis.checkedScenarios.every((bucket) => bucket.skillCount === 0)) { + pushHint(hints, { + severity: "info", + code: "POLICY_RECALL_NO_HISTORY", + message: "No routing-policy history exists for this scenario", + hint: "Let the current verification loop complete once to seed exposures and outcomes", + action: { + type: "no_history", + scenario: preferredExactScenario ?? void 0 + } + }); + } + const needsExposure = diagnosis.rejected.find( + (candidate) => typeof candidate.rejectedReason === "string" && candidate.rejectedReason.startsWith("needs_") + ); + if (needsExposure) { + pushHint(hints, { + severity: "warning", + code: "POLICY_RECALL_NEEDS_EXPOSURES", + message: `${needsExposure.skill} is close to qualifying but needs more exposures`, + hint: `Record ${POLICY_RECALL_MIN_EXPOSURES - needsExposure.exposures} more exposure(s) for ${needsExposure.scenario}`, + action: { + type: "collect_more_exposures", + skill: needsExposure.skill, + scenario: needsExposure.scenario, + remainingExposures: POLICY_RECALL_MIN_EXPOSURES - needsExposure.exposures + } + }); + } + const lowSuccess = diagnosis.rejected.find( + (candidate) => typeof candidate.rejectedReason === "string" && candidate.rejectedReason.startsWith("success_rate_") + ); + if (lowSuccess) { + pushHint(hints, { + severity: "warning", + code: "POLICY_RECALL_LOW_SUCCESS_RATE", + message: `${lowSuccess.skill} has history, but its success rate is below the recall threshold`, + hint: "Inspect stale misses and directive adherence before trusting policy recall here", + action: { + type: "improve_success_rate", + skill: lowSuccess.skill, + scenario: lowSuccess.scenario + } + }); + } + const alreadyPresent = diagnosis.rejected.find( + (candidate) => candidate.rejectedReason === "already_ranked_or_injected" + ); + if (alreadyPresent) { + pushHint(hints, { + severity: "info", + code: "POLICY_RECALL_ALREADY_PRESENT", + message: `${alreadyPresent.skill} already exists in the ranked or injected 
set`, + hint: "No recall action is needed; the candidate is already present via direct routing or prior injection", + action: { + type: "candidate_already_present", + skill: alreadyPresent.skill, + scenario: alreadyPresent.scenario + } + }); + } + const precedence = diagnosis.rejected.find( + (candidate) => typeof candidate.rejectedReason === "string" && candidate.rejectedReason.startsWith("shadowed_by_selected_bucket:") + ); + if (precedence) { + pushHint(hints, { + severity: "info", + code: "POLICY_RECALL_PRECEDENCE_APPLIED", + message: "A higher-precedence bucket won, so lower-precedence buckets were intentionally ignored", + hint: "This is expected: exact route > wildcard route > legacy 4-part key", + action: { + type: "selected_bucket_precedence", + skill: precedence.skill, + scenario: diagnosis.selectedBucket ?? precedence.scenario + } + }); + } + return hints; +} +function explainPolicyRecall(policy, input, options = {}) { + const excludeSkills = options.excludeSkills ?? /* @__PURE__ */ new Set(); + const maxCandidates = options.maxCandidates ?? 1; + if (!input.targetBoundary) { + return { + eligible: false, + skipReason: "no_target_boundary", + checkedScenarios: [], + selectedBucket: null, + selected: [], + rejected: [], + hints: [] + }; + } + const selectedRaw = selectPolicyRecallCandidates(policy, input, { + maxCandidates, + excludeSkills + }); + const selectedBucket = selectedRaw[0]?.scenario ?? null; + const selectedSkills = new Set( + selectedRaw.map((candidate) => candidate.skill) + ); + const checkedScenarios = scenarioKeyCandidates(input).map((scenario) => { + const bucket = policy.scenarios[scenario] ?? 
{}; + const qualifiedCount = Object.entries(bucket).filter(([, stats]) => { + const { qualified } = qualifies(stats); + return qualified; + }).length; + return { + scenario, + skillCount: Object.keys(bucket).length, + qualifiedCount, + selected: scenario === selectedBucket + }; + }); + const selected = selectedRaw.map((candidate) => ({ + skill: candidate.skill, + scenario: candidate.scenario, + exposures: candidate.exposures, + wins: candidate.wins, + directiveWins: candidate.directiveWins, + staleMisses: candidate.staleMisses ?? 0, + successRate: round(candidate.successRate), + policyBoost: candidate.policyBoost, + recallScore: candidate.recallScore, + qualified: true, + excluded: false, + rejectedReason: null + })); + const rejected = []; + for (const scenario of scenarioKeyCandidates(input)) { + const bucket = policy.scenarios[scenario] ?? {}; + for (const [skill, stats] of Object.entries(bucket)) { + const candidate = candidateFromStats( + skill, + scenario, + stats, + selectedBucket, + selectedSkills, + excludeSkills + ); + if (candidate) { + rejected.push(candidate); + } + } + } + const diagnosis = { + eligible: true, + skipReason: null, + checkedScenarios, + selectedBucket, + selected, + rejected, + hints: [] + }; + diagnosis.hints = buildHints(input, diagnosis); + return diagnosis; +} +export { + explainPolicyRecall, + parsePolicyScenario +}; diff --git a/hooks/routing-policy-compiler.mjs b/hooks/routing-policy-compiler.mjs new file mode 100644 index 0000000..f18979e --- /dev/null +++ b/hooks/routing-policy-compiler.mjs @@ -0,0 +1,200 @@ +// hooks/src/routing-policy-compiler.mts +import { + derivePolicyBoost +} from "./routing-policy.mjs"; +import { + createRule as createRulebookRule +} from "./learned-routing-rulebook.mjs"; +import { createLogger } from "./logger.mjs"; +function boostForAction(rec) { + switch (rec.action) { + case "promote": + return 8; + case "demote": + return -2; + case "investigate": + return 0; + } +} +function 
compilePolicyPatch(policy, report) { + const log = createLogger(); + log.summary("policy_compiler_start", { + sessionId: report.sessionId, + recommendationCount: report.recommendations.length + }); + const entries = []; + for (const rec of report.recommendations) { + const bucket = policy.scenarios[rec.scenario] ?? {}; + const stats = bucket[rec.skill]; + const currentBoost = derivePolicyBoost(stats); + const proposedBoost = boostForAction(rec); + const delta = proposedBoost - currentBoost; + if (delta !== 0 || rec.action === "investigate") { + const action = rec.action === "investigate" ? "investigate" : delta > 0 ? "promote" : delta < 0 ? "demote" : "no-op"; + const entry = { + scenario: rec.scenario, + skill: rec.skill, + action, + currentBoost, + proposedBoost, + delta, + confidence: rec.confidence, + reason: rec.reason + }; + entries.push(entry); + log.debug("policy_patch_entry", { + scenario: rec.scenario, + skill: rec.skill, + action, + currentBoost, + proposedBoost, + delta + }); + } else { + log.debug("policy_patch_no_op", { + scenario: rec.scenario, + skill: rec.skill, + currentBoost, + proposedBoost, + reason: "boost already aligned" + }); + } + } + entries.sort( + (a, b) => a.scenario.localeCompare(b.scenario) || a.skill.localeCompare(b.skill) + ); + log.summary("policy_compiler_complete", { + sessionId: report.sessionId, + patchCount: entries.length, + promotes: entries.filter((e) => e.action === "promote").length, + demotes: entries.filter((e) => e.action === "demote").length, + investigates: entries.filter((e) => e.action === "investigate").length, + noOps: entries.filter((e) => e.action === "no-op").length + }); + return { + version: 1, + sessionId: report.sessionId, + patchCount: entries.length, + entries + }; +} +function applyPolicyPatch(patch, now) { + const log = createLogger(); + const timestamp = now ?? 
(/* @__PURE__ */ new Date()).toISOString(); + const rules = []; + for (const entry of patch.entries) { + if (entry.action === "investigate" || entry.action === "no-op") { + log.debug("policy_apply_skip", { + scenario: entry.scenario, + skill: entry.skill, + action: entry.action, + reason: "non-actionable" + }); + continue; + } + rules.push({ + scenario: entry.scenario, + skill: entry.skill, + action: entry.action, + boost: Math.abs(entry.proposedBoost), + confidence: entry.confidence, + reason: entry.reason + }); + log.summary("policy_apply_entry", { + scenario: entry.scenario, + skill: entry.skill, + action: entry.action, + proposedBoost: entry.proposedBoost, + delta: entry.delta + }); + } + log.summary("policy_apply_complete", { + sessionId: patch.sessionId, + applied: rules.length, + total: patch.entries.length + }); + return { + version: 1, + sessionId: patch.sessionId, + promotedAt: timestamp, + applied: rules.length, + rules + }; +} +function evaluatePromotionGate(params) { + const { artifact, replay, now = artifact.promotedAt } = params; + const log = createLogger(); + if (replay.regressions.length > 0) { + const result = { + accepted: false, + errorCode: "RULEBOOK_PROMOTION_REJECTED_REGRESSION", + reason: `Promotion rejected: ${replay.regressions.length} regression(s) detected`, + replay, + rulebook: null + }; + log.summary("promotion_gate_rejected", { + errorCode: result.errorCode, + regressionCount: replay.regressions.length, + regressions: replay.regressions + }); + return result; + } + if (replay.learnedWins < replay.baselineWins) { + const result = { + accepted: false, + errorCode: "RULEBOOK_PROMOTION_REJECTED_REGRESSION", + reason: `Promotion rejected: learned wins (${replay.learnedWins}) < baseline wins (${replay.baselineWins})`, + replay, + rulebook: null + }; + log.summary("promotion_gate_rejected", { + errorCode: result.errorCode, + learnedWins: replay.learnedWins, + baselineWins: replay.baselineWins + }); + return result; + } + const 
rulebookRules = artifact.rules.map( + (r) => createRulebookRule({ + scenario: r.scenario, + skill: r.skill, + action: r.action, + boost: r.boost, + confidence: r.confidence, + reason: r.reason, + sourceSessionId: artifact.sessionId, + promotedAt: now, + evidence: { + baselineWins: replay.baselineWins, + baselineDirectiveWins: replay.baselineDirectiveWins, + learnedWins: replay.learnedWins, + learnedDirectiveWins: replay.learnedDirectiveWins, + regressionCount: replay.regressions.length + } + }) + ); + const rulebook = { + version: 1, + createdAt: now, + sessionId: artifact.sessionId, + rules: rulebookRules + }; + log.summary("promotion_gate_accepted", { + sessionId: artifact.sessionId, + ruleCount: rulebookRules.length, + learnedWins: replay.learnedWins, + baselineWins: replay.baselineWins + }); + return { + accepted: true, + errorCode: null, + reason: `Promotion accepted: ${rulebookRules.length} rule(s), ${replay.learnedWins} learned wins, 0 regressions`, + replay, + rulebook + }; +} +export { + applyPolicyPatch, + compilePolicyPatch, + evaluatePromotionGate +}; diff --git a/hooks/routing-policy-ledger.mjs b/hooks/routing-policy-ledger.mjs new file mode 100644 index 0000000..0326da4 --- /dev/null +++ b/hooks/routing-policy-ledger.mjs @@ -0,0 +1,237 @@ +// hooks/src/routing-policy-ledger.mts +import { + appendFileSync, + readFileSync, + writeFileSync +} from "fs"; +import { createHash } from "crypto"; +import { tmpdir } from "os"; +import { + createEmptyRoutingPolicy, + recordExposure as policyRecordExposure, + recordOutcome as policyRecordOutcome +} from "./routing-policy.mjs"; +import { createLogger } from "./logger.mjs"; +var SAFE_SESSION_ID_RE = /^[a-zA-Z0-9_-]+$/; +function safeSessionSegment(sessionId) { + if (SAFE_SESSION_ID_RE.test(sessionId)) return sessionId; + return createHash("sha256").update(sessionId).digest("hex"); +} +function projectPolicyPath(projectRoot) { + const hash = createHash("sha256").update(projectRoot).digest("hex"); + return 
`${tmpdir()}/vercel-plugin-routing-policy-${hash}.json`;
+}
+function sessionExposurePath(sessionId) {
+  return `${tmpdir()}/vercel-plugin-${safeSessionSegment(sessionId)}-routing-exposures.jsonl`;
+}
+function loadProjectRoutingPolicy(projectRoot) {
+  const path = projectPolicyPath(projectRoot);
+  try {
+    const raw = readFileSync(path, "utf-8");
+    const parsed = JSON.parse(raw);
+    if (parsed && parsed.version === 1 && typeof parsed.scenarios === "object") {
+      return parsed;
+    }
+  } catch {
+  }
+  return createEmptyRoutingPolicy();
+}
+function saveProjectRoutingPolicy(projectRoot, policy) {
+  const path = projectPolicyPath(projectRoot);
+  const log = createLogger();
+  writeFileSync(path, JSON.stringify(policy, null, 2) + "\n");
+  log.summary("routing-policy-ledger.save", {
+    path,
+    scenarioCount: Object.keys(policy.scenarios).length
+  });
+}
+function shouldAffectPolicy(exposure) {
+  if (!exposure.attributionRole) return true;
+  return exposure.attributionRole === "candidate";
+}
+function appendSkillExposure(exposure) {
+  const path = sessionExposurePath(exposure.sessionId);
+  const log = createLogger();
+  appendFileSync(path, JSON.stringify(exposure) + "\n");
+  if (shouldAffectPolicy(exposure)) {
+    const policy = loadProjectRoutingPolicy(exposure.projectRoot);
+    policyRecordExposure(policy, {
+      hook: exposure.hook,
+      storyKind: exposure.storyKind,
+      targetBoundary: exposure.targetBoundary,
+      toolName: exposure.toolName,
+      routeScope: exposure.route,
+      skill: exposure.skill,
+      now: exposure.createdAt
+    });
+    saveProjectRoutingPolicy(exposure.projectRoot, policy);
+  }
+  log.summary("routing-policy-ledger.exposure-append", {
+    id: exposure.id,
+    skill: exposure.skill,
+    hook: exposure.hook,
+    targetBoundary: exposure.targetBoundary,
+    route: exposure.route,
+    outcome: exposure.outcome,
+    attributionRole: exposure.attributionRole ?? "legacy",
+    exposureGroupId: exposure.exposureGroupId ?? null,
+    policyAffected: shouldAffectPolicy(exposure)
+  });
+}
+function loadSessionExposures(sessionId) {
+  const path = sessionExposurePath(sessionId);
+  try {
+    const raw = readFileSync(path, "utf-8");
+    return raw.split("\n").filter((line) => line.trim().length > 0).map((line) => JSON.parse(line));
+  } catch {
+    return [];
+  }
+}
+function resolveBoundaryOutcome(params) {
+  const { sessionId, boundary, matchedSuggestedAction } = params;
+  const storyId = params.storyId ?? null;
+  const route = params.route ?? null;
+  const now = params.now ?? (/* @__PURE__ */ new Date()).toISOString();
+  const log = createLogger();
+  const exposures = loadSessionExposures(sessionId);
+  const resolved = [];
+  const pending = exposures.filter(
+    (e) => e.outcome === "pending" && e.sessionId === sessionId && e.targetBoundary === boundary && e.storyId === storyId && e.route === route
+  );
+  log.summary("routing-policy-ledger.resolve-filter", {
+    sessionId,
+    boundary,
+    storyId,
+    route,
+    totalExposures: exposures.length,
+    pendingCount: exposures.filter((e) => e.outcome === "pending").length,
+    matchedCount: pending.length
+  });
+  if (pending.length === 0) {
+    log.trace("routing-policy-ledger.resolve-skip", {
+      sessionId,
+      boundary,
+      storyId,
+      route,
+      reason: "no_matching_pending_exposures"
+    });
+    return [];
+  }
+  const outcome = matchedSuggestedAction ? "directive-win" : "win";
+  for (const exposure of pending) {
+    exposure.outcome = outcome;
+    exposure.resolvedAt = now;
+    resolved.push(exposure);
+    log.summary("routing-policy-ledger.exposure-resolved", {
+      id: exposure.id,
+      skill: exposure.skill,
+      outcome,
+      storyId: exposure.storyId,
+      route: exposure.route,
+      boundary
+    });
+  }
+  const path = sessionExposurePath(sessionId);
+  const lines = exposures.map((e) => JSON.stringify(e)).join("\n") + "\n";
+  writeFileSync(path, lines);
+  const candidateResolved = resolved.filter(shouldAffectPolicy);
+  const projectRoots = new Set(resolved.map((e) => e.projectRoot));
+  for (const projectRoot of projectRoots) {
+    const candidates = candidateResolved.filter((r) => r.projectRoot === projectRoot);
+    if (candidates.length === 0) continue;
+    const policy = loadProjectRoutingPolicy(projectRoot);
+    for (const e of candidates) {
+      policyRecordOutcome(policy, {
+        hook: e.hook,
+        storyKind: e.storyKind,
+        targetBoundary: e.targetBoundary,
+        toolName: e.toolName,
+        routeScope: e.route,
+        skill: e.skill,
+        outcome,
+        now
+      });
+    }
+    saveProjectRoutingPolicy(projectRoot, policy);
+  }
+  log.summary("routing-policy-ledger.resolve", {
+    sessionId,
+    boundary,
+    storyId,
+    route,
+    outcome,
+    resolvedCount: resolved.length,
+    candidateCount: candidateResolved.length,
+    contextCount: resolved.length - candidateResolved.length,
+    skills: resolved.map((e) => e.skill)
+  });
+  return resolved;
+}
+function finalizeStaleExposures(sessionId, now) {
+  const timestamp = now ?? (/* @__PURE__ */ new Date()).toISOString();
+  const log = createLogger();
+  const exposures = loadSessionExposures(sessionId);
+  const stale = exposures.filter(
+    (e) => e.outcome === "pending" && e.sessionId === sessionId
+  );
+  if (stale.length === 0) {
+    log.trace("routing-policy-ledger.finalize-skip", {
+      sessionId,
+      reason: "no_pending_exposures"
+    });
+    return [];
+  }
+  for (const exposure of stale) {
+    exposure.outcome = "stale-miss";
+    exposure.resolvedAt = timestamp;
+    log.summary("routing-policy-ledger.exposure-stale", {
+      id: exposure.id,
+      skill: exposure.skill,
+      outcome: "stale-miss",
+      storyId: exposure.storyId,
+      route: exposure.route,
+      targetBoundary: exposure.targetBoundary
+    });
+  }
+  const path = sessionExposurePath(sessionId);
+  const lines = exposures.map((e) => JSON.stringify(e)).join("\n") + "\n";
+  writeFileSync(path, lines);
+  const candidateStale = stale.filter(shouldAffectPolicy);
+  const projectRoots = new Set(stale.map((e) => e.projectRoot));
+  for (const projectRoot of projectRoots) {
+    const candidates = candidateStale.filter((r) => r.projectRoot === projectRoot);
+    if (candidates.length === 0) continue;
+    const policy = loadProjectRoutingPolicy(projectRoot);
+    for (const e of candidates) {
+      policyRecordOutcome(policy, {
+        hook: e.hook,
+        storyKind: e.storyKind,
+        targetBoundary: e.targetBoundary,
+        toolName: e.toolName,
+        routeScope: e.route,
+        skill: e.skill,
+        outcome: "stale-miss",
+        now: timestamp
+      });
+    }
+    saveProjectRoutingPolicy(projectRoot, policy);
+  }
+  log.summary("routing-policy-ledger.finalize-stale", {
+    sessionId,
+    staleCount: stale.length,
+    candidateCount: candidateStale.length,
+    contextCount: stale.length - candidateStale.length,
+    skills: stale.map((e) => e.skill)
+  });
+  return stale;
+}
+export {
+  appendSkillExposure,
+  finalizeStaleExposures,
+  loadProjectRoutingPolicy,
+  loadSessionExposures,
+  projectPolicyPath,
+  resolveBoundaryOutcome,
+  saveProjectRoutingPolicy,
+  sessionExposurePath
+};
diff --git a/hooks/routing-policy.mjs b/hooks/routing-policy.mjs
new file mode 100644
index 0000000..bcb0ba8
--- /dev/null
+++ b/hooks/routing-policy.mjs
@@ -0,0 +1,159 @@
+// hooks/src/routing-policy.mts
+function createEmptyRoutingPolicy() {
+  return {
+    version: 1,
+    scenarios: {}
+  };
+}
+function scenarioKey(input) {
+  return [
+    input.hook,
+    input.storyKind ?? "none",
+    input.targetBoundary ?? "none",
+    input.toolName
+  ].join("|");
+}
+function scenarioKeyWithRoute(input) {
+  return [
+    input.hook,
+    input.storyKind ?? "none",
+    input.targetBoundary ?? "none",
+    input.toolName,
+    input.routeScope ?? "*"
+  ].join("|");
+}
+function scenarioKeyCandidates(input) {
+  const keys = [];
+  if (input.routeScope && input.routeScope !== "*") {
+    keys.push(scenarioKeyWithRoute(input));
+  }
+  keys.push(scenarioKeyWithRoute({ ...input, routeScope: "*" }));
+  keys.push(scenarioKey(input));
+  return [...new Set(keys)];
+}
+function computePolicySuccessRate(stats) {
+  const weightedWins = stats.wins + stats.directiveWins * 0.25;
+  return weightedWins / Math.max(stats.exposures, 1);
+}
+function lookupPolicyStats(policy, input, skill) {
+  for (const key of scenarioKeyCandidates(input)) {
+    const stats = policy.scenarios[key]?.[skill];
+    if (stats) return { scenario: key, stats };
+  }
+  return { scenario: null, stats: void 0 };
+}
+function ensureScenario(policy, scenario, skill, now) {
+  if (!policy.scenarios[scenario]) policy.scenarios[scenario] = {};
+  if (!policy.scenarios[scenario][skill]) {
+    policy.scenarios[scenario][skill] = {
+      exposures: 0,
+      wins: 0,
+      directiveWins: 0,
+      staleMisses: 0,
+      lastUpdatedAt: now
+    };
+  }
+  return policy.scenarios[scenario][skill];
+}
+function recordExposure(policy, input) {
+  const now = input.now ?? (/* @__PURE__ */ new Date()).toISOString();
+  for (const key of scenarioKeyCandidates(input)) {
+    const stats = ensureScenario(policy, key, input.skill, now);
+    stats.exposures += 1;
+    stats.lastUpdatedAt = now;
+  }
+  return policy;
+}
+function recordOutcome(policy, input) {
+  const now = input.now ?? (/* @__PURE__ */ new Date()).toISOString();
+  for (const key of scenarioKeyCandidates(input)) {
+    const stats = ensureScenario(policy, key, input.skill, now);
+    if (input.outcome === "win") {
+      stats.wins += 1;
+    } else if (input.outcome === "directive-win") {
+      stats.wins += 1;
+      stats.directiveWins += 1;
+    } else {
+      stats.staleMisses += 1;
+    }
+    stats.lastUpdatedAt = now;
+  }
+  return policy;
+}
+function derivePolicyBoost(stats) {
+  if (!stats) return 0;
+  if (stats.exposures < 3) return 0;
+  const weightedWins = stats.wins + stats.directiveWins * 0.25;
+  const successRate = weightedWins / Math.max(stats.exposures, 1);
+  if (successRate >= 0.8) return 8;
+  if (successRate >= 0.65) return 5;
+  if (successRate >= 0.4) return 2;
+  if (stats.exposures >= 5 && successRate < 0.15) return -2;
+  return 0;
+}
+function applyPolicyBoosts(entries, policy, scenarioInput) {
+  return entries.map((entry) => {
+    const { scenario, stats } = lookupPolicyStats(policy, scenarioInput, entry.skill);
+    const boost = derivePolicyBoost(stats);
+    const base = typeof entry.effectivePriority === "number" ? entry.effectivePriority : entry.priority;
+    return {
+      ...entry,
+      effectivePriority: base + boost,
+      policyBoost: boost,
+      policyReason: stats && scenario ? `${scenario}: ${stats.wins} wins / ${stats.exposures} exposures, ${stats.directiveWins} directive wins, ${stats.staleMisses} stale misses` : null
+    };
+  });
+}
+function matchRulebookRule(rulebook, scenarioInput, skill) {
+  if (rulebook.rules.length === 0) return null;
+  for (const key of scenarioKeyCandidates(scenarioInput)) {
+    const rule = rulebook.rules.find(
+      (r) => r.scenario === key && r.skill === skill
+    );
+    if (rule) return { rule, matchedScenario: key };
+  }
+  return null;
+}
+function applyRulebookBoosts(entries, rulebook, scenarioInput, rulebookFilePath) {
+  return entries.map((entry) => {
+    const match = matchRulebookRule(rulebook, scenarioInput, entry.skill);
+    if (!match) {
+      return {
+        ...entry,
+        matchedRuleId: null,
+        ruleBoost: 0,
+        ruleReason: null,
+        rulebookPath: null
+      };
+    }
+    const { rule } = match;
+    const ruleBoost = rule.action === "promote" ? rule.boost : -rule.boost;
+    const base = (typeof entry.effectivePriority === "number" ? entry.effectivePriority : entry.priority) - entry.policyBoost;
+    return {
+      ...entry,
+      effectivePriority: base + ruleBoost,
+      policyBoost: 0,
+      // suppressed — rulebook takes precedence
+      policyReason: null,
+      matchedRuleId: rule.id,
+      ruleBoost,
+      ruleReason: rule.reason,
+      rulebookPath: rulebookFilePath
+    };
+  });
+}
+export {
+  applyPolicyBoosts,
+  applyRulebookBoosts,
+  computePolicySuccessRate,
+  createEmptyRoutingPolicy,
+  derivePolicyBoost,
+  ensureScenario,
+  lookupPolicyStats,
+  matchRulebookRule,
+  recordExposure,
+  recordOutcome,
+  scenarioKey,
+  scenarioKeyCandidates,
+  scenarioKeyWithRoute
+};
diff --git a/hooks/routing-replay.mjs b/hooks/routing-replay.mjs
new file mode 100644
index 0000000..42de956
--- /dev/null
+++ b/hooks/routing-replay.mjs
@@ -0,0 +1,130 @@
+// hooks/src/routing-replay.mts
+import {
+  readRoutingDecisionTrace
+} from "./routing-decision-trace.mjs";
+import { loadSessionExposures } from "./routing-policy-ledger.mjs";
+import { createLogger } from "./logger.mjs";
+function buildScenarioKey(exposure) {
+  return [
+    exposure.hook,
+    exposure.storyKind ?? "none",
+    exposure.targetBoundary ?? "none",
+    exposure.toolName
+  ].join("|");
+}
+function emptyStats() {
+  return { exposures: 0, wins: 0, directiveWins: 0, staleMisses: 0 };
+}
+var PROMOTE_MIN_EXPOSURES = 3;
+var PROMOTE_MIN_SUCCESS_RATE = 0.8;
+var PROMOTE_BOOST = 8;
+var DEMOTE_MIN_EXPOSURES = 5;
+var DEMOTE_MAX_SUCCESS_RATE = 0.15;
+var DEMOTE_BOOST = -2;
+var INVESTIGATE_MIN_EXPOSURES = 3;
+var INVESTIGATE_MIN_RATE = 0.4;
+var INVESTIGATE_MAX_RATE = 0.65;
+function replayRoutingSession(sessionId) {
+  const log = createLogger();
+  log.summary("replay_start", { sessionId });
+  const traces = readRoutingDecisionTrace(sessionId);
+  const exposures = loadSessionExposures(sessionId);
+  log.debug("replay_loaded", {
+    sessionId,
+    traceCount: traces.length,
+    exposureCount: exposures.length
+  });
+  const buckets = /* @__PURE__ */ new Map();
+  for (const trace of traces) {
+    const scenario = trace.policyScenario;
+    if (scenario && !buckets.has(scenario)) {
+      buckets.set(scenario, /* @__PURE__ */ new Map());
+    }
+  }
+  for (const exposure of exposures) {
+    const scenario = buildScenarioKey(exposure);
+    let bySkill = buckets.get(scenario);
+    if (!bySkill) {
+      bySkill = /* @__PURE__ */ new Map();
+      buckets.set(scenario, bySkill);
+    }
+    const current = bySkill.get(exposure.skill) ?? emptyStats();
+    current.exposures += 1;
+    if (exposure.outcome === "win") {
+      current.wins += 1;
+    } else if (exposure.outcome === "directive-win") {
+      current.wins += 1;
+      current.directiveWins += 1;
+    } else if (exposure.outcome === "stale-miss") {
+      current.staleMisses += 1;
+    }
+    bySkill.set(exposure.skill, current);
+  }
+  const scenarios = [...buckets.entries()].sort(([a], [b]) => a.localeCompare(b)).map(([scenario, bySkill]) => {
+    const topSkills = [...bySkill.entries()].map(([skill, stats]) => ({ skill, ...stats })).sort(
+      (a, b) => b.wins - a.wins || b.exposures - a.exposures || a.skill.localeCompare(b.skill)
+    );
+    return {
+      scenario,
+      exposures: topSkills.reduce((n, s) => n + s.exposures, 0),
+      wins: topSkills.reduce((n, s) => n + s.wins, 0),
+      directiveWins: topSkills.reduce((n, s) => n + s.directiveWins, 0),
+      staleMisses: topSkills.reduce((n, s) => n + s.staleMisses, 0),
+      topSkills
+    };
+  });
+  const recommendations = [];
+  for (const scenario of scenarios) {
+    for (const skill of scenario.topSkills) {
+      const successRate = skill.exposures === 0 ? 0 : skill.wins / skill.exposures;
+      if (skill.exposures >= PROMOTE_MIN_EXPOSURES && successRate >= PROMOTE_MIN_SUCCESS_RATE) {
+        recommendations.push({
+          scenario: scenario.scenario,
+          skill: skill.skill,
+          action: "promote",
+          suggestedBoost: PROMOTE_BOOST,
+          confidence: Math.min(0.99, successRate),
+          reason: `${skill.wins}/${skill.exposures} wins in ${scenario.scenario}`
+        });
+      } else if (skill.exposures >= DEMOTE_MIN_EXPOSURES && successRate < DEMOTE_MAX_SUCCESS_RATE) {
+        recommendations.push({
+          scenario: scenario.scenario,
+          skill: skill.skill,
+          action: "demote",
+          suggestedBoost: DEMOTE_BOOST,
+          confidence: 1 - successRate,
+          reason: `${skill.wins}/${skill.exposures} wins in ${scenario.scenario}`
+        });
+      } else if (skill.exposures >= INVESTIGATE_MIN_EXPOSURES && successRate >= INVESTIGATE_MIN_RATE && successRate < INVESTIGATE_MAX_RATE) {
+        recommendations.push({
+          scenario: scenario.scenario,
+          skill: skill.skill,
+          action: "investigate",
+          suggestedBoost: 0,
+          confidence: successRate,
+          reason: `${skill.wins}/${skill.exposures} mixed results in ${scenario.scenario}`
+        });
+      }
+    }
+  }
+  recommendations.sort(
+    (a, b) => a.scenario.localeCompare(b.scenario) || a.skill.localeCompare(b.skill)
+  );
+  log.summary("replay_complete", {
+    sessionId,
+    traceCount: traces.length,
+    scenarioCount: scenarios.length,
+    recommendationCount: recommendations.length
+  });
+  return {
+    version: 1,
+    sessionId,
+    traceCount: traces.length,
+    scenarioCount: scenarios.length,
+    scenarios,
+    recommendations
+  };
+}
+export {
+  replayRoutingSession
+};
diff --git a/hooks/rule-distillation.mjs b/hooks/rule-distillation.mjs
new file mode 100644
index 0000000..9a822ec
--- /dev/null
+++ b/hooks/rule-distillation.mjs
@@ -0,0 +1,270 @@
+// hooks/src/rule-distillation.mts
+import { createLogger } from "./logger.mjs";
+import { replayLearnedRules } from "./rule-replay.mjs";
+import { replayLearnedRules as replayLearnedRules2 } from "./rule-replay.mjs";
+function computeRuleLift(input) {
+  const rulePrecision = input.wins / Math.max(input.support, 1);
+  const scenarioPrecision = input.scenarioWins / Math.max(input.scenarioExposures, 1);
+  if (scenarioPrecision === 0) return rulePrecision;
+  return rulePrecision / scenarioPrecision;
+}
+function classifyRuleConfidence(input) {
+  if (input.regressions > 0) return "holdout-fail";
+  if (input.support >= 5 && input.precision >= 0.8 && input.lift >= 1.5)
+    return "promote";
+  if (input.support >= 3 && input.precision >= 0.65 && input.lift >= 1.1)
+    return "candidate";
+  return "holdout-fail";
+}
+function scenarioKeyFromTrace(trace) {
+  const story = trace.primaryStory;
+  return [
+    trace.hook,
+    story.kind ?? "_",
+    story.targetBoundary ?? "_",
+    trace.toolName,
+    story.storyRoute ?? "_"
+  ].join("|");
+}
+function scenarioFromTrace(trace) {
+  const story = trace.primaryStory;
+  return {
+    hook: trace.hook,
+    storyKind: story.kind ?? null,
+    targetBoundary: story.targetBoundary ?? null,
+    toolName: trace.toolName,
+    routeScope: story.storyRoute ?? null
+  };
+}
+function inferRuleKind(ranked, hook) {
+  if (!ranked.pattern) {
+    return hook === "UserPromptSubmit" ? "promptPhrase" : "pathPattern";
+  }
+  switch (ranked.pattern.type) {
+    case "path":
+    case "pathPattern":
+      return "pathPattern";
+    case "bash":
+    case "bashPattern":
+      return "bashPattern";
+    case "import":
+    case "importPattern":
+      return "importPattern";
+    case "prompt":
+    case "promptPhrase":
+      return "promptPhrase";
+    case "promptAllOf":
+      return "promptAllOf";
+    case "promptNoneOf":
+      return "promptNoneOf";
+    case "companion":
+      return "companion";
+    default:
+      return hook === "UserPromptSubmit" ? "promptPhrase" : "pathPattern";
+  }
+}
+function extractPatternValue(ranked, trace) {
+  if (ranked.pattern?.value) return ranked.pattern.value;
+  if (trace.hook === "UserPromptSubmit") return trace.toolTarget || "";
+  return trace.toolTarget || "";
+}
+function candidateKey(scenarioKey, skill, kind, value) {
+  const v = Array.isArray(value) ? value.join(",") : value;
+  return `${scenarioKey}|${skill}|${kind}|${v}`;
+}
+function distillRulesFromTrace(params) {
+  const {
+    projectRoot,
+    traces,
+    exposures,
+    policy,
+    minSupport = 5,
+    minPrecision = 0.8,
+    minLift = 1.5,
+    generatedAt = (/* @__PURE__ */ new Date()).toISOString()
+  } = params;
+  const logger = createLogger("summary");
+  logger.summary("distill_start", {
+    traceCount: traces.length,
+    exposureCount: exposures.length,
+    minSupport,
+    minPrecision,
+    minLift
+  });
+  const exposureByKey = /* @__PURE__ */ new Map();
+  for (const exp of exposures) {
+    const key = `${exp.sessionId}|${exp.skill}|${exp.hook}|${exp.route ?? "_"}`;
+    exposureByKey.set(key, exp);
+  }
+  const candidates = /* @__PURE__ */ new Map();
+  const scenarioExposureCounts = /* @__PURE__ */ new Map();
+  const scenarioWinCounts = /* @__PURE__ */ new Map();
+  for (const trace of traces) {
+    const sKey = scenarioKeyFromTrace(trace);
+    const scenario = scenarioFromTrace(trace);
+    for (const ranked of trace.ranked) {
+      if (ranked.droppedReason) continue;
+      const expKey = `${trace.sessionId}|${ranked.skill}|${trace.hook}|${trace.primaryStory.storyRoute ?? "_"}`;
+      const exposure = exposureByKey.get(expKey);
+      if (!exposure) continue;
+      if (exposure.attributionRole !== "candidate") continue;
+      const kind = inferRuleKind(ranked, trace.hook);
+      const value = extractPatternValue(ranked, trace);
+      const cKey = candidateKey(sKey, ranked.skill, kind, value);
+      let acc = candidates.get(cKey);
+      if (!acc) {
+        acc = {
+          skill: ranked.skill,
+          kind,
+          value,
+          scenario,
+          scenarioKey: sKey,
+          support: 0,
+          wins: 0,
+          directiveWins: 0,
+          staleMisses: 0,
+          sourceDecisionIds: []
+        };
+        candidates.set(cKey, acc);
+      }
+      acc.support++;
+      acc.sourceDecisionIds.push(trace.decisionId);
+      scenarioExposureCounts.set(
+        sKey,
+        (scenarioExposureCounts.get(sKey) ?? 0) + 1
+      );
+      if (exposure.outcome === "win" || exposure.outcome === "directive-win") {
+        scenarioWinCounts.set(sKey, (scenarioWinCounts.get(sKey) ?? 0) + 1);
+      }
+      switch (exposure.outcome) {
+        case "win":
+          acc.wins++;
+          break;
+        case "directive-win":
+          acc.wins++;
+          acc.directiveWins++;
+          break;
+        case "stale-miss":
+          acc.staleMisses++;
+          break;
+      }
+    }
+  }
+  logger.summary("distill_candidates_extracted", {
+    candidateCount: candidates.size,
+    scenarioCount: scenarioExposureCounts.size
+  });
+  const rules = [];
+  for (const acc of candidates.values()) {
+    const precision = acc.wins / Math.max(acc.support, 1);
+    const scenarioWins = scenarioWinCounts.get(acc.scenarioKey) ?? 0;
+    const scenarioExposures = scenarioExposureCounts.get(acc.scenarioKey) ?? 0;
+    const lift = computeRuleLift({
+      wins: acc.wins,
+      support: acc.support,
+      scenarioWins,
+      scenarioExposures
+    });
+    const confidence = classifyRuleConfidence({
+      support: acc.support,
+      precision,
+      lift,
+      regressions: 0
+    });
+    const ruleId = `${acc.kind}:${acc.skill}:${Array.isArray(acc.value) ? acc.value.join("+") : acc.value}`;
+    const sortedIds = [...acc.sourceDecisionIds].sort();
+    rules.push({
+      id: ruleId,
+      skill: acc.skill,
+      kind: acc.kind,
+      value: acc.value,
+      scenario: acc.scenario,
+      support: acc.support,
+      wins: acc.wins,
+      directiveWins: acc.directiveWins,
+      staleMisses: acc.staleMisses,
+      precision: Number(precision.toFixed(4)),
+      lift: Number(lift.toFixed(4)),
+      sourceDecisionIds: sortedIds,
+      confidence,
+      promotedAt: confidence === "promote" ? generatedAt : null
+    });
+  }
+  logger.summary("distill_scoring_complete", {
+    totalRules: rules.length,
+    promoted: rules.filter((r) => r.confidence === "promote").length,
+    candidate: rules.filter((r) => r.confidence === "candidate").length,
+    holdoutFail: rules.filter((r) => r.confidence === "holdout-fail").length
+  });
+  rules.sort((a, b) => {
+    const scenarioA = [a.scenario.hook, a.scenario.storyKind ?? "_", a.scenario.targetBoundary ?? "_", a.scenario.toolName, a.scenario.routeScope ?? "_"].join("|");
+    const scenarioB = [b.scenario.hook, b.scenario.storyKind ?? "_", b.scenario.targetBoundary ?? "_", b.scenario.toolName, b.scenario.routeScope ?? "_"].join("|");
+    const sc = scenarioA.localeCompare(scenarioB);
+    if (sc !== 0) return sc;
+    const sk = a.skill.localeCompare(b.skill);
+    if (sk !== 0) return sk;
+    return a.id.localeCompare(b.id);
+  });
+  const replay = replayLearnedRules({ traces, rules });
+  let promotion;
+  const rejected = replay.regressions.length > 0 || replay.learnedWins < replay.baselineWins;
+  if (rejected) {
+    for (const rule of rules) {
+      if (rule.confidence === "promote") {
+        rule.confidence = "holdout-fail";
+        rule.promotedAt = null;
+      }
+    }
+    const reasons = [];
+    if (replay.regressions.length > 0) {
+      reasons.push(`${replay.regressions.length} regression(s) detected`);
+    }
+    if (replay.learnedWins < replay.baselineWins) {
+      reasons.push(`learned wins (${replay.learnedWins}) < baseline wins (${replay.baselineWins})`);
+    }
+    promotion = {
+      accepted: false,
+      errorCode: "RULEBOOK_PROMOTION_REJECTED_REGRESSION",
+      reason: `Promotion rejected: ${reasons.join("; ")}`
+    };
+    logger.summary("distill_promotion_rejected", {
+      errorCode: promotion.errorCode,
+      reason: promotion.reason,
+      regressions: replay.regressions.length,
+      learnedWins: replay.learnedWins,
+      baselineWins: replay.baselineWins
+    });
+  } else {
+    const promotedCount = rules.filter((r) => r.confidence === "promote").length;
+    promotion = {
+      accepted: true,
+      errorCode: null,
+      reason: `Promotion accepted: ${promotedCount} rule(s) promoted, ${replay.learnedWins} learned wins, 0 regressions`
+    };
+    logger.summary("distill_promotion_accepted", {
+      promotedCount,
+      learnedWins: replay.learnedWins,
+      baselineWins: replay.baselineWins
+    });
+  }
+  logger.summary("distill_complete", {
+    ruleCount: rules.length,
+    replayDelta: replay.deltaWins,
+    regressions: replay.regressions.length,
+    promotionAccepted: promotion.accepted
+  });
+  return {
+    version: 1,
+    generatedAt,
+    projectRoot,
+    rules,
+    replay,
+    promotion
+  };
+}
+export {
+  classifyRuleConfidence,
+  computeRuleLift,
+  distillRulesFromTrace,
+  replayLearnedRules2 as replayLearnedRules
+};
diff --git a/hooks/rule-replay.mjs b/hooks/rule-replay.mjs
new file mode 100644
index 0000000..7291e0d
--- /dev/null
+++ b/hooks/rule-replay.mjs
@@ -0,0 +1,98 @@
+// hooks/src/rule-replay.mts
+import { createLogger } from "./logger.mjs";
+function scenarioKeyFromTrace(trace) {
+  const story = trace.primaryStory;
+  return [
+    trace.hook,
+    story.kind ?? "_",
+    story.targetBoundary ?? "_",
+    trace.toolName,
+    story.storyRoute ?? "_"
+  ].join("|");
+}
+function scenarioKeyFromRule(rule) {
+  return [
+    rule.scenario.hook,
+    rule.scenario.storyKind ?? "_",
+    rule.scenario.targetBoundary ?? "_",
+    rule.scenario.toolName,
+    rule.scenario.routeScope ?? "_"
+  ].join("|");
+}
+function replayLearnedRules(params) {
+  const { traces, rules } = params;
+  const logger = createLogger("summary");
+  logger.summary("replay_start", {
+    traceCount: traces.length,
+    ruleCount: rules.length,
+    promotedCount: rules.filter((r) => r.confidence === "promote").length
+  });
+  const promotedByScenario = /* @__PURE__ */ new Map();
+  for (const rule of rules) {
+    if (rule.confidence !== "promote") continue;
+    const sKey = scenarioKeyFromRule(rule);
+    let skills = promotedByScenario.get(sKey);
+    if (!skills) {
+      skills = /* @__PURE__ */ new Set();
+      promotedByScenario.set(sKey, skills);
+    }
+    skills.add(rule.skill);
+  }
+  let baselineWins = 0;
+  let baselineDirectiveWins = 0;
+  let learnedWins = 0;
+  let learnedDirectiveWins = 0;
+  const regressions = [];
+  for (const trace of traces) {
+    const sKey = scenarioKeyFromTrace(trace);
+    const promotedSkills = promotedByScenario.get(sKey);
+    const verifiedSuccess = trace.verification?.observedBoundary != null && trace.injectedSkills.length > 0;
+    const directiveAdherent = verifiedSuccess && trace.verification?.matchedSuggestedAction === true;
+    if (verifiedSuccess) baselineWins++;
+    if (directiveAdherent) baselineDirectiveWins++;
+    if (promotedSkills) {
+      const learnedOverlap = trace.injectedSkills.some(
+        (s) => promotedSkills.has(s)
+      );
+      if (verifiedSuccess && !learnedOverlap) {
+        regressions.push(trace.decisionId);
+        logger.summary("replay_regression", {
+          decisionId: trace.decisionId,
+          scenario: sKey,
+          injectedSkills: trace.injectedSkills,
+          promotedSkills: [...promotedSkills]
+        });
+      } else if (learnedOverlap) {
+        learnedWins++;
+        if (directiveAdherent) learnedDirectiveWins++;
+      }
+    } else if (verifiedSuccess) {
+      learnedWins++;
+      if (directiveAdherent) learnedDirectiveWins++;
+    }
+  }
+  regressions.sort();
+  const result = {
+    baselineWins,
+    baselineDirectiveWins,
+    learnedWins,
+    learnedDirectiveWins,
+    deltaWins: learnedWins - baselineWins,
+    deltaDirectiveWins: learnedDirectiveWins - baselineDirectiveWins,
+    regressions
+  };
+  logger.summary("replay_complete", {
+    baselineWins: result.baselineWins,
+    baselineDirectiveWins: result.baselineDirectiveWins,
+    learnedWins: result.learnedWins,
+    learnedDirectiveWins: result.learnedDirectiveWins,
+    deltaWins: result.deltaWins,
+    deltaDirectiveWins: result.deltaDirectiveWins,
+    regressionCount: result.regressions.length,
+    regressionIds: result.regressions
+  });
+  return result;
+}
+export {
+  replayLearnedRules
+};
diff --git a/hooks/session-end-cleanup.mjs b/hooks/session-end-cleanup.mjs
index 5ac02f1..d9c2157 100755
--- a/hooks/session-end-cleanup.mjs
+++ b/hooks/session-end-cleanup.mjs
@@ -6,6 +6,7 @@ import { readdirSync, readFileSync, rmSync, unlinkSync, writeFileSync } from "fs
 import { homedir, tmpdir } from "os";
 import { join, resolve } from "path";
 import { fileURLToPath } from "url";
+import { finalizeStaleExposures } from "./routing-policy-ledger.mjs";
 var SAFE_SESSION_ID_RE = /^[a-zA-Z0-9_-]+$/;
 function tempSessionIdSegment(sessionId) {
   if (SAFE_SESSION_ID_RE.test(sessionId)) {
@@ -56,6 +57,10 @@ function main() {
   }
   const tempRoot = tmpdir();
   const prefix = `vercel-plugin-${tempSessionIdSegment(sessionId)}-`;
+  try {
+    finalizeStaleExposures(sessionId, (/* @__PURE__ */ new Date()).toISOString());
+  } catch {
+  }
   let entries = [];
   try {
     entries = readdirSync(tempRoot).filter((name) => name.startsWith(prefix));
diff --git a/hooks/src/cli-routing-replay.mts b/hooks/src/cli-routing-replay.mts
new file mode 100644
index 0000000..788c2b0
--- /dev/null
+++ b/hooks/src/cli-routing-replay.mts
@@ -0,0 +1,46 @@
+/**
+ * CLI entry point for the routing replay analyzer.
+ *
+ * Usage: node cli-routing-replay.mjs 
+ *
+ * Outputs a deterministic JSON RoutingReplayReport to stdout.
+ * Exits non-zero on missing or malformed input.
+ * Designed for machine consumption — JSON is the only output format.
+ */
+
+import { replayRoutingSession } from "./routing-replay.mjs";
+import { createLogger } from "./logger.mjs";
+
+const log = createLogger();
+
+const sessionId = process.argv[2];
+
+if (!sessionId) {
+  log.summary("cli_error", { reason: "missing_session_id" });
+  process.stderr.write(
+    JSON.stringify({
+      ok: false,
+      error: "missing_session_id",
+      usage: "node cli-routing-replay.mjs ",
+    }) + "\n",
+  );
+  process.exit(1);
+}
+
+try {
+  const report = replayRoutingSession(sessionId);
+  log.summary("cli_complete", {
+    sessionId,
+    traceCount: report.traceCount,
+    scenarioCount: report.scenarioCount,
+    recommendationCount: report.recommendations.length,
+  });
+  process.stdout.write(JSON.stringify(report, null, 2) + "\n");
+} catch (err: unknown) {
+  const message = err instanceof Error ? err.message : String(err);
+  log.summary("cli_error", { reason: "replay_failed", message });
+  process.stderr.write(
+    JSON.stringify({ ok: false, error: "replay_failed", message }) + "\n",
+  );
+  process.exit(2);
+}
diff --git a/hooks/src/companion-distillation.mts b/hooks/src/companion-distillation.mts
new file mode 100644
index 0000000..6bba871
--- /dev/null
+++ b/hooks/src/companion-distillation.mts
@@ -0,0 +1,314 @@
+/**
+ * companion-distillation.mts — Distill grouped exposures into promotable
+ * companion rules.
+ *
+ * Reads grouped SkillExposure records (exposureGroupId, candidateSkill,
+ * attributionRole, outcome) and compares candidate+companion performance
+ * against candidate-alone within the same scenario. Promotion thresholds:
+ *
+ *   support >= 4
+ *   precisionWithCompanion >= 0.75
+ *   liftVsCandidateAlone >= 1.25
+ *   staleMissDelta <= 0.10
+ *
+ * Does NOT write files. Does NOT change candidate-only policy credit semantics.
+ * All derived metrics are rounded to 4 decimal places for determinism.
+ */
+
+import type { SkillExposure } from "./routing-policy-ledger.mjs";
+import type { RoutingDecisionTrace } from "./routing-decision-trace.mjs";
+import {
+  createEmptyCompanionRulebook,
+  type LearnedCompanionRule,
+  type LearnedCompanionRulebook,
+} from "./learned-companion-rulebook.mjs";
+import { createLogger } from "./logger.mjs";
+
+// ---------------------------------------------------------------------------
+// Helpers
+// ---------------------------------------------------------------------------
+
+function precision(wins: number, support: number): number {
+  return support === 0 ? 0 : wins / support;
+}
+
+function round4(value: number): number {
+  return Number(value.toFixed(4));
+}
+
+// ---------------------------------------------------------------------------
+// Distillation parameters
+// ---------------------------------------------------------------------------
+
+export interface DistillationParams {
+  projectRoot: string;
+  traces: RoutingDecisionTrace[];
+  exposures: SkillExposure[];
+  generatedAt?: string;
+  minSupport?: number;
+  minPrecision?: number;
+  minLift?: number;
+  maxStaleMissDelta?: number;
+}
+
+// ---------------------------------------------------------------------------
+// Internal bucket types
+// ---------------------------------------------------------------------------
+
+interface PairBucket {
+  scenario: string;
+  hook: SkillExposure["hook"];
+  storyKind: string | null;
+  targetBoundary: SkillExposure["targetBoundary"];
+  toolName: SkillExposure["toolName"];
+  routeScope: string | null;
+  candidateSkill: string;
+  companionSkill: string;
+  support: number;
+  winsWithCompanion: number;
+  directiveWinsWithCompanion: number;
+  staleMissesWithCompanion: number;
+  sourceExposureGroupIds: string[];
+}
+
+interface BaselineBucket {
+  support: number;
+  wins: number;
+  staleMisses: number;
+}
+
+// ---------------------------------------------------------------------------
+// Main distillation function
+// ---------------------------------------------------------------------------
+
+/**
+ * Distill grouped exposures into a companion rulebook. Pure computation —
+ * reads exposure fields only and does not write files or modify policy credit.
+ */
+export function distillCompanionRules(
+  params: DistillationParams,
+): LearnedCompanionRulebook {
+  const log = createLogger();
+  const generatedAt = params.generatedAt ?? new Date().toISOString();
+  const minSupport = params.minSupport ?? 4;
+  const minPrecision = params.minPrecision ?? 0.75;
+  const minLift = params.minLift ?? 1.25;
+  const maxStaleMissDelta = params.maxStaleMissDelta ?? 0.10;
+
+  log.summary("companion-distillation.start", {
+    exposureCount: params.exposures.length,
+    traceCount: params.traces.length,
+    minSupport,
+    minPrecision,
+    minLift,
+    maxStaleMissDelta,
+  });
+
+  const rulebook = createEmptyCompanionRulebook(
+    params.projectRoot,
+    generatedAt,
+  );
+
+  // Group exposures by exposureGroupId
+  const byGroup = new Map();
+  for (const exposure of params.exposures) {
+    if (!exposure.exposureGroupId) continue;
+    const list = byGroup.get(exposure.exposureGroupId) ?? [];
+    list.push(exposure);
+    byGroup.set(exposure.exposureGroupId, list);
+  }
+
+  log.summary("companion-distillation.grouped", {
+    groupCount: byGroup.size,
+    skippedNoGroupId: params.exposures.filter((e) => !e.exposureGroupId).length,
+  });
+
+  // Accumulate pair buckets and candidate baselines
+  const pairBuckets = new Map();
+  const candidateBaseline = new Map();
+
+  for (const [groupId, group] of byGroup) {
+    const candidate = group.find((e) => e.attributionRole === "candidate");
+    if (!candidate) continue;
+
+    const outcome = candidate.outcome;
+
+    const scenario = [
+      candidate.hook,
+      candidate.storyKind ?? "none",
+      candidate.targetBoundary ?? "none",
+      candidate.toolName,
+      candidate.route ?? "*",
+    ].join("|");
+
+    // Update candidate baseline
+    const baselineKey = `${scenario}::${candidate.skill}`;
+    const baseline = candidateBaseline.get(baselineKey) ?? {
+      support: 0,
+      wins: 0,
+      staleMisses: 0,
+    };
+    baseline.support += 1;
+    if (outcome === "win" || outcome === "directive-win") baseline.wins += 1;
+    if (outcome === "stale-miss") baseline.staleMisses += 1;
+    candidateBaseline.set(baselineKey, baseline);
+
+    // Update pair buckets for each context companion
+    for (const context of group.filter(
+      (e) => e.attributionRole === "context",
+    )) {
+      const key = `${scenario}::${candidate.skill}::${context.skill}`;
+      const bucket = pairBuckets.get(key) ?? {
+        scenario,
+        hook: candidate.hook,
+        storyKind: candidate.storyKind,
+        targetBoundary: candidate.targetBoundary,
+        toolName: candidate.toolName,
+        routeScope: candidate.route,
+        candidateSkill: candidate.skill,
+        companionSkill: context.skill,
+        support: 0,
+        winsWithCompanion: 0,
+        directiveWinsWithCompanion: 0,
+        staleMissesWithCompanion: 0,
+        sourceExposureGroupIds: [],
+      };
+      bucket.support += 1;
+      if (outcome === "win" || outcome === "directive-win")
+        bucket.winsWithCompanion += 1;
+      if (outcome === "directive-win") bucket.directiveWinsWithCompanion += 1;
+      if (outcome === "stale-miss") bucket.staleMissesWithCompanion += 1;
+      bucket.sourceExposureGroupIds.push(groupId);
+      pairBuckets.set(key, bucket);
+    }
+  }
+
+  log.summary("companion-distillation.buckets", {
+    pairBucketCount: pairBuckets.size,
+    baselineCount: candidateBaseline.size,
+  });
+
+  // Evaluate each pair bucket against thresholds
+  const rules: LearnedCompanionRule[] = [];
+
+  for (const bucket of pairBuckets.values()) {
+    const baseline = candidateBaseline.get(
+      `${bucket.scenario}::${bucket.candidateSkill}`,
+    );
+    if (!baseline) continue;
+
+    const winsWithoutCompanion = Math.max(
+      baseline.wins - bucket.winsWithCompanion,
+      0,
+    );
+    const supportWithoutCompanion = Math.max(
+      baseline.support - bucket.support,
+      0,
+    );
+
+    const precisionWithCompanion = precision(
+      bucket.winsWithCompanion,
+      bucket.support,
+    );
+    const baselinePrecisionWithoutCompanion = precision(
+      winsWithoutCompanion,
+      supportWithoutCompanion,
+    );
+
+    const liftVsCandidateAlone =
+      baselinePrecisionWithoutCompanion === 0
+        ? precisionWithCompanion
+        : precisionWithCompanion / baselinePrecisionWithoutCompanion;
+
+    const staleRateWithCompanion = precision(
+      bucket.staleMissesWithCompanion,
+      bucket.support,
+    );
+    const staleRateWithoutCompanion = precision(
+      Math.max(baseline.staleMisses - bucket.staleMissesWithCompanion, 0),
+      supportWithoutCompanion,
+    );
+    const staleMissDelta = staleRateWithCompanion - staleRateWithoutCompanion;
+
+    const promoted =
+      bucket.support >= minSupport &&
+      precisionWithCompanion >= minPrecision &&
+      liftVsCandidateAlone >= minLift &&
+      staleMissDelta <= maxStaleMissDelta;
+
+    const rule: LearnedCompanionRule = {
+      id: `${bucket.scenario}::${bucket.candidateSkill}->${bucket.companionSkill}`,
+      scenario: bucket.scenario,
+      hook: bucket.hook,
+      storyKind: bucket.storyKind,
+      targetBoundary: bucket.targetBoundary,
+      toolName: bucket.toolName,
+      routeScope: bucket.routeScope,
+      candidateSkill: bucket.candidateSkill,
+      companionSkill: bucket.companionSkill,
+      support: bucket.support,
+      winsWithCompanion: bucket.winsWithCompanion,
+      winsWithoutCompanion,
+      directiveWinsWithCompanion: bucket.directiveWinsWithCompanion,
+      staleMissesWithCompanion: bucket.staleMissesWithCompanion,
+      precisionWithCompanion: round4(precisionWithCompanion),
+      baselinePrecisionWithoutCompanion: round4(
+        baselinePrecisionWithoutCompanion,
+      ),
+      liftVsCandidateAlone: round4(liftVsCandidateAlone),
+      staleMissDelta: round4(staleMissDelta),
+      confidence: promoted ? "promote" : "holdout-fail",
+      promotedAt: promoted ? generatedAt : null,
+      reason: promoted
+        ?
"companion beats candidate-alone within same verified scenario" + : "insufficient support or lift", + sourceExposureGroupIds: [...bucket.sourceExposureGroupIds].sort(), + }; + + rules.push(rule); + + log.summary("companion-distillation.rule-evaluated", { + id: rule.id, + confidence: rule.confidence, + support: rule.support, + precisionWithCompanion: rule.precisionWithCompanion, + liftVsCandidateAlone: rule.liftVsCandidateAlone, + staleMissDelta: rule.staleMissDelta, + }); + } + + // Sort deterministically + rules.sort( + (a, b) => + a.scenario.localeCompare(b.scenario) || + a.candidateSkill.localeCompare(b.candidateSkill) || + a.companionSkill.localeCompare(b.companionSkill), + ); + + rulebook.rules = rules; + + const promotedCount = rules.filter( + (r) => r.confidence === "promote", + ).length; + + rulebook.replay = { + baselineWins: 0, + learnedWins: 0, + deltaWins: 0, + regressions: [], + }; + + rulebook.promotion = { + accepted: true, + errorCode: null, + reason: `${promotedCount} promoted companion rules`, + }; + + log.summary("companion-distillation.complete", { + totalRules: rules.length, + promotedCount, + holdoutFailCount: rules.length - promotedCount, + }); + + return rulebook; +} diff --git a/hooks/src/companion-recall.mts b/hooks/src/companion-recall.mts new file mode 100644 index 0000000..b452f86 --- /dev/null +++ b/hooks/src/companion-recall.mts @@ -0,0 +1,145 @@ +/** + * companion-recall.mts — Recall verified companion skills during hook injection. + * + * When a promoted companion rule matches the current scenario and candidate + * skills, the recalled companion is inserted immediately after its candidate + * in the ranked skill list. Excluded or already-injected companions fall back + * to the existing summary path instead of violating dedup rules. 
+ * + * No-ops safely when: + * - The companion rulebook artifact is missing, invalid, or unsupported + * - No promoted rule matches the current scenario + * - All matched companions are excluded or already seen + * + * Routing reasons for recalled companions use: + * trigger: "verified-companion" + * reasonCode: "scenario-companion-rulebook" + */ + +import { loadCompanionRulebook } from "./learned-companion-rulebook.mjs"; +import { + scenarioKeyCandidates, + type RoutingPolicyScenario, +} from "./routing-policy.mjs"; +import { createLogger } from "./logger.mjs"; + +// --------------------------------------------------------------------------- +// Types +// --------------------------------------------------------------------------- + +export interface CompanionRecallCandidate { + candidateSkill: string; + companionSkill: string; + scenario: string; + confidence: number; + reason: string; +} + +export interface CompanionRecallRejection { + candidateSkill: string; + companionSkill: string; + scenario: string; + rejectedReason: string; +} + +export interface CompanionRecallResult { + selected: CompanionRecallCandidate[]; + checkedScenarios: string[]; + rejected: CompanionRecallRejection[]; +} + +// --------------------------------------------------------------------------- +// Main recall function +// --------------------------------------------------------------------------- + +/** + * Look up verified companion rules for the given scenario and candidate skills. + * Returns companions sorted by lift (desc), support (desc), name (asc). + * Respects maxCompanions cap and excludeSkills set. 
+ */ +export function recallVerifiedCompanions(params: { + projectRoot: string; + scenario: RoutingPolicyScenario; + candidateSkills: string[]; + excludeSkills: Set<string>; + maxCompanions: number; +}): CompanionRecallResult { + const log = createLogger(); + + const loaded = loadCompanionRulebook(params.projectRoot); + if (!loaded.ok) { + log.summary("companion-recall.load-error", { + code: loaded.error.code, + message: loaded.error.message, + }); + return { selected: [], checkedScenarios: [], rejected: [] }; + } + + const checkedScenarios = scenarioKeyCandidates(params.scenario); + const selected: CompanionRecallCandidate[] = []; + const rejected: CompanionRecallRejection[] = []; + const selectedCompanions = new Set<string>(); + + log.summary("companion-recall.lookup", { + checkedScenarios, + candidateSkills: params.candidateSkills, + excludeCount: params.excludeSkills.size, + maxCompanions: params.maxCompanions, + rulebookRuleCount: loaded.rulebook.rules.length, + }); + + for (const scenario of checkedScenarios) { + const matching = loaded.rulebook.rules + .filter( + (rule) => + rule.scenario === scenario && + rule.confidence === "promote" && + params.candidateSkills.includes(rule.candidateSkill), + ) + .sort( + (a, b) => + b.liftVsCandidateAlone - a.liftVsCandidateAlone || + b.support - a.support || + a.companionSkill.localeCompare(b.companionSkill), + ); + + for (const rule of matching) { + if (selected.length >= params.maxCompanions) break; + + if (selectedCompanions.has(rule.companionSkill)) continue; + + if (params.excludeSkills.has(rule.companionSkill)) { + rejected.push({ + candidateSkill: rule.candidateSkill, + companionSkill: rule.companionSkill, + scenario, + rejectedReason: "excluded", + }); + continue; + } + + selected.push({ + candidateSkill: rule.candidateSkill, + companionSkill: rule.companionSkill, + scenario, + confidence: rule.liftVsCandidateAlone, + reason: rule.reason, + }); + 
selectedCompanions.add(rule.companionSkill); + } + } +
log.summary("companion-recall.result", { + selectedCount: selected.length, + rejectedCount: rejected.length, + checkedScenarioCount: checkedScenarios.length, + selected: selected.map((s) => ({ + candidate: s.candidateSkill, + companion: s.companionSkill, + scenario: s.scenario, + lift: s.confidence, + })), + }); + + return { selected, checkedScenarios, rejected }; +} diff --git a/hooks/src/learned-companion-rulebook.mts b/hooks/src/learned-companion-rulebook.mts new file mode 100644 index 0000000..fe7c79f --- /dev/null +++ b/hooks/src/learned-companion-rulebook.mts @@ -0,0 +1,380 @@ +/** + * learned-companion-rulebook.mts — Persisted learned companion rulebook artifact. + * + * Stores which companion skills improve a candidate skill's verification + * closure rate within the same scenario. Separate from the single-skill + * routing-policy ledger to keep causal credit clean. + * + * Persistence contract: + * - Path: `/vercel-plugin-learned-companions-.json` + * - Atomic write semantics via write-to-tmp + rename. + * - Independent of the single-skill routing-policy path. 
+ * + * Error codes: + * - COMPANION_RULEBOOK_VERSION_UNSUPPORTED — unrecognized version + * - COMPANION_RULEBOOK_SCHEMA_INVALID — structural validation failure + * - COMPANION_RULEBOOK_READ_FAILED — I/O or JSON parse error + */ + +import { createHash, randomUUID } from "node:crypto"; +import { + readFileSync, + writeFileSync, + renameSync, +} from "node:fs"; +import { tmpdir } from "node:os"; +import { join } from "node:path"; +import { createLogger } from "./logger.mjs"; +import type { + RoutingBoundary, + RoutingHookName, + RoutingToolName, +} from "./routing-policy.mjs"; + +// --------------------------------------------------------------------------- +// Types +// --------------------------------------------------------------------------- + +export type CompanionConfidence = "candidate" | "promote" | "holdout-fail"; + +export interface LearnedCompanionRule { + id: string; + scenario: string; + hook: RoutingHookName; + storyKind: string | null; + targetBoundary: RoutingBoundary | null; + toolName: RoutingToolName; + routeScope: string | null; + candidateSkill: string; + companionSkill: string; + support: number; + winsWithCompanion: number; + winsWithoutCompanion: number; + directiveWinsWithCompanion: number; + staleMissesWithCompanion: number; + precisionWithCompanion: number; + baselinePrecisionWithoutCompanion: number; + liftVsCandidateAlone: number; + staleMissDelta: number; + confidence: CompanionConfidence; + promotedAt: string | null; + reason: string; + sourceExposureGroupIds: string[]; +} + +export interface CompanionReplay { + baselineWins: number; + learnedWins: number; + deltaWins: number; + regressions: string[]; +} + +export interface CompanionPromotion { + accepted: boolean; + errorCode: string | null; + reason: string; +} + +export interface LearnedCompanionRulebook { + version: 1; + generatedAt: string; + projectRoot: string; + rules: LearnedCompanionRule[]; + replay: CompanionReplay; + promotion: CompanionPromotion; +} + +// 
--------------------------------------------------------------------------- +// Error types +// --------------------------------------------------------------------------- + +export type CompanionRulebookErrorCode = + | "COMPANION_RULEBOOK_VERSION_UNSUPPORTED" + | "COMPANION_RULEBOOK_SCHEMA_INVALID" + | "COMPANION_RULEBOOK_READ_FAILED"; + +export interface CompanionRulebookError { + code: CompanionRulebookErrorCode; + message: string; + detail: Record<string, unknown>; +} + +export type CompanionRulebookLoadResult = + | { ok: true; rulebook: LearnedCompanionRulebook } + | { ok: false; error: CompanionRulebookError }; + +// --------------------------------------------------------------------------- +// Path helpers +// --------------------------------------------------------------------------- + +export function companionRulebookPath(projectRoot: string): string { + const hash = createHash("sha256").update(projectRoot).digest("hex"); + return `${tmpdir()}/vercel-plugin-learned-companions-${hash}.json`; +} + +// --------------------------------------------------------------------------- +// Deterministic serialization +// --------------------------------------------------------------------------- + +/** + * Serialize a companion rulebook to deterministic JSON. Rules are sorted by + * scenario asc → candidateSkill asc → companionSkill asc.
+ */ +export function serializeCompanionRulebook( + rulebook: LearnedCompanionRulebook, +): string { + const sorted: LearnedCompanionRulebook = { + ...rulebook, + rules: [...rulebook.rules].sort( + (a, b) => + a.scenario.localeCompare(b.scenario) || + a.candidateSkill.localeCompare(b.candidateSkill) || + a.companionSkill.localeCompare(b.companionSkill), + ), + }; + return JSON.stringify(sorted, null, 2) + "\n"; +} + +// --------------------------------------------------------------------------- +// Validation +// --------------------------------------------------------------------------- + +function validateCompanionRulebookSchema( + parsed: unknown, +): CompanionRulebookError | null { + if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) { + return { + code: "COMPANION_RULEBOOK_SCHEMA_INVALID", + message: "Companion rulebook must be a JSON object", + detail: { receivedType: typeof parsed }, + }; + } + + const obj = parsed as Record<string, unknown>; + + if (obj.version !== 1) { + return { + code: "COMPANION_RULEBOOK_VERSION_UNSUPPORTED", + message: `Unsupported companion rulebook version: ${String(obj.version)}`, + detail: { version: obj.version, supportedVersions: [1] }, + }; + } + + if (typeof obj.generatedAt !== "string") { + return { + code: "COMPANION_RULEBOOK_SCHEMA_INVALID", + message: "Missing or invalid generatedAt field", + detail: { field: "generatedAt", receivedType: typeof obj.generatedAt }, + }; + } + + if (typeof obj.projectRoot !== "string") { + return { + code: "COMPANION_RULEBOOK_SCHEMA_INVALID", + message: "Missing or invalid projectRoot field", + detail: { field: "projectRoot", receivedType: typeof obj.projectRoot }, + }; + } + + if (!Array.isArray(obj.rules)) { + return { + code: "COMPANION_RULEBOOK_SCHEMA_INVALID", + message: "Missing or invalid rules field", + detail: { field: "rules", receivedType: typeof obj.rules }, + }; + } + + for (let i = 0; i < obj.rules.length; i++) { + const rule = 
obj.rules[i] as Record<string, unknown>; + if (typeof rule
!== "object" || rule === null) { + return { + code: "COMPANION_RULEBOOK_SCHEMA_INVALID", + message: `Rule at index ${i} is not an object`, + detail: { index: i, receivedType: typeof rule }, + }; + } + const requiredStrings = [ + "id", "scenario", "candidateSkill", "companionSkill", "reason", + ] as const; + for (const field of requiredStrings) { + if (typeof rule[field] !== "string") { + return { + code: "COMPANION_RULEBOOK_SCHEMA_INVALID", + message: `Rule at index ${i} has invalid ${field}`, + detail: { index: i, field, receivedType: typeof rule[field] }, + }; + } + } + const requiredNumbers = [ + "support", "winsWithCompanion", "winsWithoutCompanion", + "precisionWithCompanion", "baselinePrecisionWithoutCompanion", + "liftVsCandidateAlone", "staleMissDelta", + ] as const; + for (const field of requiredNumbers) { + if (typeof rule[field] !== "number") { + return { + code: "COMPANION_RULEBOOK_SCHEMA_INVALID", + message: `Rule at index ${i} has invalid ${field}`, + detail: { index: i, field, receivedType: typeof rule[field] }, + }; + } + } + const validConfidence = ["candidate", "promote", "holdout-fail"]; + if (!validConfidence.includes(rule.confidence as string)) { + return { + code: "COMPANION_RULEBOOK_SCHEMA_INVALID", + message: `Rule at index ${i} has invalid confidence: ${String(rule.confidence)}`, + detail: { index: i, field: "confidence", value: rule.confidence }, + }; + } + } + + // Validate replay object + if (typeof obj.replay !== "object" || obj.replay === null) { + return { + code: "COMPANION_RULEBOOK_SCHEMA_INVALID", + message: "Missing or invalid replay field", + detail: { field: "replay", receivedType: typeof obj.replay }, + }; + } + + // Validate promotion object + if (typeof obj.promotion !== "object" || obj.promotion === null) { + return { + code: "COMPANION_RULEBOOK_SCHEMA_INVALID", + message: "Missing or invalid promotion field", + detail: { field: "promotion", receivedType: typeof obj.promotion }, + }; + } + + return null; +} + +// 
--------------------------------------------------------------------------- +// Factory helpers +// --------------------------------------------------------------------------- + +export function createEmptyCompanionRulebook( + projectRoot: string, + generatedAt: string, +): LearnedCompanionRulebook { + return { + version: 1, + generatedAt, + projectRoot, + rules: [], + replay: { + baselineWins: 0, + learnedWins: 0, + deltaWins: 0, + regressions: [], + }, + promotion: { + accepted: true, + errorCode: null, + reason: "empty rulebook", + }, + }; +} + +// --------------------------------------------------------------------------- +// Load +// --------------------------------------------------------------------------- + +/** + * Load a project-scoped companion rulebook from disk. Returns a structured + * error for version mismatches, schema violations, or I/O failures. + * Returns an empty version-1 rulebook when the file does not exist. + */ +export function loadCompanionRulebook( + projectRoot: string, +): CompanionRulebookLoadResult { + const path = companionRulebookPath(projectRoot); + const log = createLogger(); + + let raw: string; + try { + raw = readFileSync(path, "utf-8"); + } catch { + log.summary("learned-companion-rulebook.load-miss", { + path, + reason: "file_not_found", + }); + return { + ok: true, + rulebook: createEmptyCompanionRulebook( + projectRoot, + new Date(0).toISOString(), + ), + }; + } + + let parsed: unknown; + try { + parsed = JSON.parse(raw); + } catch (err) { + const error: CompanionRulebookError = { + code: "COMPANION_RULEBOOK_READ_FAILED", + message: "Companion rulebook file contains invalid JSON", + detail: { path, parseError: String(err) }, + }; + log.summary("learned-companion-rulebook.load-error", { + code: error.code, + path, + }); + return { ok: false, error }; + } + + const validationError = validateCompanionRulebookSchema(parsed); + if (validationError) { + log.summary("learned-companion-rulebook.load-error", { + code: 
validationError.code, + path, + detail: validationError.detail, + }); + return { ok: false, error: validationError }; + } + + const rulebook = parsed as LearnedCompanionRulebook; + log.summary("learned-companion-rulebook.load-ok", { + path, + ruleCount: rulebook.rules.length, + promotedCount: rulebook.rules.filter((r) => r.confidence === "promote") + .length, + version: rulebook.version, + }); + + return { ok: true, rulebook }; +} + +// --------------------------------------------------------------------------- +// Save (atomic write) +// --------------------------------------------------------------------------- + +/** + * Persist a companion rulebook to disk with atomic write semantics. + * Writes to a temp file then renames to prevent partial reads. + */ +export function saveCompanionRulebook( + projectRoot: string, + rulebook: LearnedCompanionRulebook, +): void { + const dest = companionRulebookPath(projectRoot); + const tempPath = join( + tmpdir(), + `vercel-plugin-companion-rulebook-${randomUUID()}.tmp`, + ); + const log = createLogger(); + + const content = serializeCompanionRulebook(rulebook); + writeFileSync(tempPath, content); + renameSync(tempPath, dest); + + log.summary("learned-companion-rulebook.save", { + path: dest, + ruleCount: rulebook.rules.length, + promotedCount: rulebook.rules.filter((r) => r.confidence === "promote") + .length, + bytesWritten: Buffer.byteLength(content), + }); +} diff --git a/hooks/src/learned-playbook-rulebook.mts b/hooks/src/learned-playbook-rulebook.mts new file mode 100644 index 0000000..95446a0 --- /dev/null +++ b/hooks/src/learned-playbook-rulebook.mts @@ -0,0 +1,183 @@ +/** + * learned-playbook-rulebook.mts — Learned playbook artifact persistence. + * + * A playbook is a verified ordered multi-skill sequence (e.g. A → B → C) scoped + * to a (hook, storyKind, targetBoundary, toolName, routeScope) scenario. 
Unlike + * single-skill routing rules or pairwise companion rules, playbooks capture + * proven procedural strategies that repeatedly close a verification gap. + * + * The rulebook is written to `generated/learned-playbooks.json` under the + * project root. Unlike the routing and companion rulebooks, which live in the + * OS temp directory, this file is project-local. `savePlaybookRulebook` + * serializes rules in the order it receives them, so deterministic output + * depends on the caller sorting rules by scenario/anchor/sequence first. + */ + +import { mkdirSync, readFileSync, writeFileSync } from "node:fs"; +import { dirname, join } from "node:path"; +import { createLogger } from "./logger.mjs"; +import type { + RoutingBoundary, + RoutingHookName, + RoutingToolName, +} from "./routing-policy.mjs"; + +// --------------------------------------------------------------------------- +// Types +// --------------------------------------------------------------------------- + +export interface LearnedPlaybookRule { + id: string; + scenario: string; + hook: RoutingHookName; + storyKind: string | null; + targetBoundary: RoutingBoundary | null; + toolName: RoutingToolName; + routeScope: string | null; + anchorSkill: string; + orderedSkills: string[]; + support: number; + wins: number; + directiveWins: number; + staleMisses: number; + precision: number; + baselinePrecisionWithoutPlaybook: number; + liftVsAnchorBaseline: number; + staleMissDelta: number; + confidence: "promote" | "holdout-fail"; + promotedAt: string | null; + reason: string; + sourceExposureGroupIds: string[]; +} + +export interface LearnedPlaybookRulebook { + version: 1; + generatedAt: string; + projectRoot: string; + rules: LearnedPlaybookRule[]; + replay: { + baselineWins: number; + learnedWins: number; + deltaWins: number; + regressions: string[]; + }; + promotion: { + accepted: boolean; + errorCode: string | null; + reason: string; + }; +} + +// 
--------------------------------------------------------------------------- +// Path +// --------------------------------------------------------------------------- + +export function playbookRulebookPath(projectRoot:
string): string { + return join(projectRoot, "generated", "learned-playbooks.json"); +} + +// --------------------------------------------------------------------------- +// Factory +// --------------------------------------------------------------------------- + +export function createEmptyPlaybookRulebook( + projectRoot: string, + generatedAt = new Date().toISOString(), +): LearnedPlaybookRulebook { + return { + version: 1, + generatedAt, + projectRoot, + rules: [], + replay: { + baselineWins: 0, + learnedWins: 0, + deltaWins: 0, + regressions: [], + }, + promotion: { + accepted: true, + errorCode: null, + reason: "No promoted playbooks", + }, + }; +} + +// --------------------------------------------------------------------------- +// Save +// --------------------------------------------------------------------------- + +export function savePlaybookRulebook( + projectRoot: string, + rulebook: LearnedPlaybookRulebook, +): void { + const path = playbookRulebookPath(projectRoot); + mkdirSync(dirname(path), { recursive: true }); + writeFileSync(path, JSON.stringify(rulebook, null, 2) + "\n"); + createLogger().summary("learned-playbook-rulebook.save", { + path, + ruleCount: rulebook.rules.length, + promotedCount: rulebook.rules.filter((r) => r.confidence === "promote") + .length, + }); +} + +// --------------------------------------------------------------------------- +// Load +// --------------------------------------------------------------------------- + +export type LoadPlaybookRulebookResult = + | { ok: true; rulebook: LearnedPlaybookRulebook } + | { + ok: false; + error: { + code: "ENOENT" | "EINVALID"; + message: string; + }; + }; + +export function loadPlaybookRulebook( + projectRoot: string, +): LoadPlaybookRulebookResult { + const path = playbookRulebookPath(projectRoot); + try { + const raw = readFileSync(path, "utf-8"); + const parsed = JSON.parse(raw) as Partial<LearnedPlaybookRulebook>; + if ( + parsed?.version !== 1 || + typeof parsed.generatedAt !== 
"string" || + typeof
parsed.projectRoot !== "string" || + !Array.isArray(parsed.rules) || + typeof parsed.replay !== "object" || + typeof parsed.promotion !== "object" + ) { + return { + ok: false, + error: { + code: "EINVALID", + message: `Invalid learned playbook rulebook at ${path}`, + }, + }; + } + return { ok: true, rulebook: parsed as LearnedPlaybookRulebook }; + } catch (error) { + if ( + error instanceof Error && + "code" in error && + (error as NodeJS.ErrnoException).code === "ENOENT" + ) { + return { + ok: false, + error: { + code: "ENOENT", + message: `No learned playbook rulebook found at ${path}`, + }, + }; + } + return { + ok: false, + error: { + code: "EINVALID", + message: `Failed to read learned playbook rulebook at ${path}`, + }, + }; + } +} diff --git a/hooks/src/learned-routing-rulebook.mts b/hooks/src/learned-routing-rulebook.mts new file mode 100644 index 0000000..16ccc3a --- /dev/null +++ b/hooks/src/learned-routing-rulebook.mts @@ -0,0 +1,330 @@ +/** + * learned-routing-rulebook.mts — Canonical learned routing rulebook artifact. + * + * Surfaces the routing-policy compiler's promotion decisions as a versioned, + * machine-readable, project-scoped artifact with per-rule evidence and + * deterministic serialization. + * + * Persistence contract: + * - Rulebook path: `/vercel-plugin-routing-policy--rulebook.json` + * - Sits next to the project routing policy file. + * - Atomic write semantics via write-to-tmp + rename. 
+ * + * Error codes: + * - RULEBOOK_VERSION_UNSUPPORTED — loaded file has an unrecognized version + * - RULEBOOK_SCHEMA_INVALID — loaded file fails structural validation + */ + +import { createHash, randomUUID } from "node:crypto"; +import { + readFileSync, + writeFileSync, + renameSync, +} from "node:fs"; +import { tmpdir } from "node:os"; +import { join } from "node:path"; +import { createLogger } from "./logger.mjs"; + +// --------------------------------------------------------------------------- +// Types +// --------------------------------------------------------------------------- + +export type LearnedRuleAction = "promote" | "demote"; + +export interface LearnedRoutingRuleEvidence { + baselineWins: number; + baselineDirectiveWins: number; + learnedWins: number; + learnedDirectiveWins: number; + regressionCount: number; +} + +export interface LearnedRoutingRule { + id: string; + scenario: string; + skill: string; + action: LearnedRuleAction; + boost: number; + confidence: number; + reason: string; + sourceSessionId: string; + promotedAt: string; + evidence: LearnedRoutingRuleEvidence; +} + +export interface LearnedRoutingRulebook { + version: 1; + createdAt: string; + sessionId: string; + rules: LearnedRoutingRule[]; +} + +// --------------------------------------------------------------------------- +// Error types +// --------------------------------------------------------------------------- + +export type RulebookErrorCode = + | "RULEBOOK_VERSION_UNSUPPORTED" + | "RULEBOOK_SCHEMA_INVALID" + | "RULEBOOK_PROMOTION_REJECTED_REGRESSION"; + +export interface RulebookError { + code: RulebookErrorCode; + message: string; + detail: Record<string, unknown>; +} + +export type RulebookLoadResult = + | { ok: true; rulebook: LearnedRoutingRulebook } + | { ok: false; error: RulebookError }; + +// --------------------------------------------------------------------------- +// Path helpers +// 
--------------------------------------------------------------------------- + +export
function rulebookPath(projectRoot: string): string { + const hash = createHash("sha256").update(projectRoot).digest("hex"); + return `${tmpdir()}/vercel-plugin-routing-policy-${hash}-rulebook.json`; +} + +// --------------------------------------------------------------------------- +// Deterministic serialization +// --------------------------------------------------------------------------- + +/** + * Serialize a rulebook to deterministic JSON. Rules are sorted by + * scenario asc → skill asc → id asc to guarantee byte-identical output + * for the same logical content. + */ +export function serializeRulebook(rulebook: LearnedRoutingRulebook): string { + const sorted: LearnedRoutingRulebook = { + ...rulebook, + rules: [...rulebook.rules].sort( + (a, b) => + a.scenario.localeCompare(b.scenario) || + a.skill.localeCompare(b.skill) || + a.id.localeCompare(b.id), + ), + }; + return JSON.stringify(sorted, null, 2) + "\n"; +} + +// --------------------------------------------------------------------------- +// Validation +// --------------------------------------------------------------------------- + +function validateRulebookSchema(parsed: unknown): RulebookError | null { + if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) { + return { + code: "RULEBOOK_SCHEMA_INVALID", + message: "Rulebook must be a JSON object", + detail: { receivedType: typeof parsed }, + }; + } + + const obj = parsed as Record<string, unknown>; + + if (obj.version !== 1) { + return { + code: "RULEBOOK_VERSION_UNSUPPORTED", + message: `Unsupported rulebook version: ${String(obj.version)}`, + detail: { version: obj.version, supportedVersions: [1] }, + }; + } + + if (typeof obj.createdAt !== "string") { + return { + code: "RULEBOOK_SCHEMA_INVALID", + message: "Missing or invalid createdAt field", + detail: { field: "createdAt", receivedType: typeof obj.createdAt }, + }; + } + + if (typeof obj.sessionId !== "string") { + return { + code: "RULEBOOK_SCHEMA_INVALID", + message: 
"Missing or
invalid sessionId field", + detail: { field: "sessionId", receivedType: typeof obj.sessionId }, + }; + } + + if (!Array.isArray(obj.rules)) { + return { + code: "RULEBOOK_SCHEMA_INVALID", + message: "Missing or invalid rules field", + detail: { field: "rules", receivedType: typeof obj.rules }, + }; + } + + for (let i = 0; i < obj.rules.length; i++) { + const rule = obj.rules[i] as Record<string, unknown>; + if (typeof rule !== "object" || rule === null) { + return { + code: "RULEBOOK_SCHEMA_INVALID", + message: `Rule at index ${i} is not an object`, + detail: { index: i, receivedType: typeof rule }, + }; + } + const requiredStrings = ["id", "scenario", "skill", "reason", "sourceSessionId", "promotedAt"] as const; + for (const field of requiredStrings) { + if (typeof rule[field] !== "string") { + return { + code: "RULEBOOK_SCHEMA_INVALID", + message: `Rule at index ${i} has invalid ${field}`, + detail: { index: i, field, receivedType: typeof rule[field] }, + }; + } + } + if (rule.action !== "promote" && rule.action !== "demote") { + return { + code: "RULEBOOK_SCHEMA_INVALID", + message: `Rule at index ${i} has invalid action: ${String(rule.action)}`, + detail: { index: i, field: "action", value: rule.action }, + }; + } + if (typeof rule.boost !== "number" || typeof rule.confidence !== "number") { + return { + code: "RULEBOOK_SCHEMA_INVALID", + message: `Rule at index ${i} has invalid boost or confidence`, + detail: { index: i, boost: rule.boost, confidence: rule.confidence }, + }; + } + const evidence = rule.evidence as Record<string, unknown> | undefined; + if (typeof evidence !== "object" || evidence === null) { + return { + code: "RULEBOOK_SCHEMA_INVALID", + message: `Rule at index ${i} has invalid evidence`, + detail: { index: i, field: "evidence" }, + }; + } + const evidenceNumbers = [ + "baselineWins", "baselineDirectiveWins", + "learnedWins", "learnedDirectiveWins", "regressionCount", + ] as const; + for (const field of evidenceNumbers) { + 
if (typeof evidence[field] !== "number") { + return
{ + code: "RULEBOOK_SCHEMA_INVALID", + message: `Rule at index ${i} evidence has invalid ${field}`, + detail: { index: i, field, receivedType: typeof evidence[field] }, + }; + } + } + } + + return null; +} + +// --------------------------------------------------------------------------- +// Load +// --------------------------------------------------------------------------- + +/** + * Load a project-scoped rulebook from disk. Returns structured errors + * for version mismatches or schema violations. + */ +export function loadRulebook(projectRoot: string): RulebookLoadResult { + const path = rulebookPath(projectRoot); + const log = createLogger(); + + let raw: string; + try { + raw = readFileSync(path, "utf-8"); + } catch { + log.summary("learned-routing-rulebook.load-miss", { path, reason: "file_not_found" }); + return { + ok: true, + rulebook: createEmptyRulebook("", ""), + }; + } + + let parsed: unknown; + try { + parsed = JSON.parse(raw); + } catch (err) { + const error: RulebookError = { + code: "RULEBOOK_SCHEMA_INVALID", + message: "Rulebook file contains invalid JSON", + detail: { path, parseError: String(err) }, + }; + log.summary("learned-routing-rulebook.load-error", { code: error.code, path }); + return { ok: false, error }; + } + + const validationError = validateRulebookSchema(parsed); + if (validationError) { + log.summary("learned-routing-rulebook.load-error", { + code: validationError.code, + path, + detail: validationError.detail, + }); + return { ok: false, error: validationError }; + } + + log.summary("learned-routing-rulebook.load-ok", { + path, + ruleCount: (parsed as LearnedRoutingRulebook).rules.length, + version: (parsed as LearnedRoutingRulebook).version, + }); + + return { ok: true, rulebook: parsed as LearnedRoutingRulebook }; +} + +// --------------------------------------------------------------------------- +// Save (atomic write) +// --------------------------------------------------------------------------- + +/** + * Persist a 
rulebook to disk with atomic write semantics. + * Writes to a temp file beside the destination, then renames it into place (rename is only atomic, and only succeeds, within one filesystem). + */ +export function saveRulebook( + projectRoot: string, + rulebook: LearnedRoutingRulebook, +): void { + const dest = rulebookPath(projectRoot); + const tempPath = `${dest}.${randomUUID()}.tmp`; + const log = createLogger(); + + const content = serializeRulebook(rulebook); + writeFileSync(tempPath, content); + renameSync(tempPath, dest); + + log.summary("learned-routing-rulebook.save", { + path: dest, + ruleCount: rulebook.rules.length, + sessionId: rulebook.sessionId, + bytesWritten: Buffer.byteLength(content), + }); +} + +// --------------------------------------------------------------------------- +// Factory helpers +// --------------------------------------------------------------------------- + +export function createEmptyRulebook( + sessionId: string, + createdAt: string, +): LearnedRoutingRulebook { + return { + version: 1, + createdAt, + sessionId, + rules: [], + }; +} + +export function createRule(params: { + scenario: string; + skill: string; + action: LearnedRuleAction; + boost: number; + confidence: number; + reason: string; + sourceSessionId: string; + promotedAt: string; + evidence: LearnedRoutingRuleEvidence; +}): LearnedRoutingRule { + const id = `${params.scenario}|${params.skill}`; + return { id, ...params }; +} diff --git a/hooks/src/logger.mts b/hooks/src/logger.mts index 72387aa..3e5efc5 100644 --- a/hooks/src/logger.mts +++ b/hooks/src/logger.mts @@ -39,6 +39,7 @@ interface CompleteCounts { droppedByCap?: string[]; droppedByBudget?: string[]; boostsApplied?: string[]; + policyBoosted?: Array<{ skill: string; boost: number; reason: string | null }>; } interface SharedLoggerContext { diff --git a/hooks/src/playbook-distillation.mts b/hooks/src/playbook-distillation.mts new file mode 100644 index 0000000..d39675d --- /dev/null +++ b/hooks/src/playbook-distillation.mts @@ -0,0 +1,295 @@ +/** + *
playbook-distillation.mts — Distill ordered multi-skill playbooks from + * grouped SkillExposure records. + * + * Groups exposures by `exposureGroupId`, derives ordered unique skill sequences + * capped by `maxSkills`, and emits `promote` only when support, precision, lift, + * and stale-miss thresholds all pass. The distiller compares each playbook + * (ordered sequence) against the anchor skill's baseline performance without + * that exact sequence. + */ + +import type { SkillExposure } from "./routing-policy-ledger.mjs"; +import { + createEmptyPlaybookRulebook, + type LearnedPlaybookRule, + type LearnedPlaybookRulebook, +} from "./learned-playbook-rulebook.mjs"; +import { createLogger } from "./logger.mjs"; + +// --------------------------------------------------------------------------- +// Params +// --------------------------------------------------------------------------- + +export interface DistillPlaybooksParams { + projectRoot: string; + exposures: SkillExposure[]; + generatedAt?: string; + minSupport?: number; + minPrecision?: number; + minLift?: number; + maxStaleMissDelta?: number; + maxSkills?: number; +} + +// --------------------------------------------------------------------------- +// Internal buckets +// --------------------------------------------------------------------------- + +interface PlaybookBucket { + scenario: string; + hook: SkillExposure["hook"]; + storyKind: string | null; + targetBoundary: SkillExposure["targetBoundary"]; + toolName: SkillExposure["toolName"]; + routeScope: string | null; + anchorSkill: string; + orderedSkills: string[]; + support: number; + wins: number; + directiveWins: number; + staleMisses: number; + sourceExposureGroupIds: string[]; +} + +interface BaselineBucket { + support: number; + wins: number; + staleMisses: number; +} + +// --------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + +function 
round4(value: number): number { + return Number(value.toFixed(4)); +} + +function precision(wins: number, support: number): number { + return support === 0 ? 0 : wins / support; +} + +function orderedUnique(skills: string[]): string[] { + const seen = new Set(); + const out: string[] = []; + for (const skill of skills) { + if (!skill || seen.has(skill)) continue; + seen.add(skill); + out.push(skill); + } + return out; +} + +// --------------------------------------------------------------------------- +// Main distiller +// --------------------------------------------------------------------------- + +export function distillPlaybooks( + params: DistillPlaybooksParams, +): LearnedPlaybookRulebook { + const log = createLogger(); + const generatedAt = params.generatedAt ?? new Date().toISOString(); + const minSupport = params.minSupport ?? 3; + const minPrecision = params.minPrecision ?? 0.75; + const minLift = params.minLift ?? 1.25; + const maxStaleMissDelta = params.maxStaleMissDelta ?? 0.1; + const maxSkills = Math.max(2, params.maxSkills ?? 3); + + const rulebook = createEmptyPlaybookRulebook( + params.projectRoot, + generatedAt, + ); + + // Group exposures by exposureGroupId + const byGroup = new Map(); + for (const exposure of params.exposures) { + if (!exposure.exposureGroupId) continue; + const list = byGroup.get(exposure.exposureGroupId) ?? []; + list.push(exposure); + byGroup.set(exposure.exposureGroupId, list); + } + + // Bucket playbook sequences and anchor baselines + const playbookBuckets = new Map(); + const anchorBaselines = new Map(); + + for (const [groupId, group] of byGroup) { + const candidate = group.find( + (e) => (e.attributionRole ?? "candidate") === "candidate", + ); + if (!candidate) continue; + if (candidate.outcome === "pending") continue; + + const scenario = [ + candidate.hook, + candidate.storyKind ?? "none", + candidate.targetBoundary ?? "none", + candidate.toolName, + candidate.route ?? 
"*", + ].join("|"); + + const orderedSkills = orderedUnique(group.map((e) => e.skill)).slice( + 0, + maxSkills, + ); + + const anchorSkill = candidate.candidateSkill ?? candidate.skill; + const baselineKey = `${scenario}::${anchorSkill}`; + + // Update anchor baseline (all groups with this anchor, regardless of sequence) + const baseline = anchorBaselines.get(baselineKey) ?? { + support: 0, + wins: 0, + staleMisses: 0, + }; + baseline.support += 1; + if ( + candidate.outcome === "win" || + candidate.outcome === "directive-win" + ) { + baseline.wins += 1; + } + if (candidate.outcome === "stale-miss") { + baseline.staleMisses += 1; + } + anchorBaselines.set(baselineKey, baseline); + + // Only bucket multi-skill sequences + if (orderedSkills.length < 2) continue; + + const bucketKey = `${scenario}::${orderedSkills.join(">")}`; + const bucket = playbookBuckets.get(bucketKey) ?? { + scenario, + hook: candidate.hook, + storyKind: candidate.storyKind, + targetBoundary: candidate.targetBoundary, + toolName: candidate.toolName, + routeScope: candidate.route, + anchorSkill, + orderedSkills, + support: 0, + wins: 0, + directiveWins: 0, + staleMisses: 0, + sourceExposureGroupIds: [], + }; + + bucket.support += 1; + if ( + candidate.outcome === "win" || + candidate.outcome === "directive-win" + ) { + bucket.wins += 1; + } + if (candidate.outcome === "directive-win") { + bucket.directiveWins += 1; + } + if (candidate.outcome === "stale-miss") { + bucket.staleMisses += 1; + } + bucket.sourceExposureGroupIds.push(groupId); + playbookBuckets.set(bucketKey, bucket); + } + + // Evaluate each playbook against anchor baseline + const rules: LearnedPlaybookRule[] = []; + + for (const bucket of playbookBuckets.values()) { + const baseline = anchorBaselines.get( + `${bucket.scenario}::${bucket.anchorSkill}`, + ); + if (!baseline) continue; + + const supportWithoutPlaybook = Math.max( + baseline.support - bucket.support, + 0, + ); + const winsWithoutPlaybook = Math.max(baseline.wins - 
bucket.wins, 0); + const staleWithoutPlaybook = Math.max( + baseline.staleMisses - bucket.staleMisses, + 0, + ); + + const precisionWithPlaybook = precision(bucket.wins, bucket.support); + const baselinePrecisionWithoutPlaybook = precision( + winsWithoutPlaybook, + supportWithoutPlaybook, + ); + const liftVsAnchorBaseline = + baselinePrecisionWithoutPlaybook === 0 + ? precisionWithPlaybook + : precisionWithPlaybook / baselinePrecisionWithoutPlaybook; + + const staleRateWithPlaybook = precision( + bucket.staleMisses, + bucket.support, + ); + const staleRateWithoutPlaybook = precision( + staleWithoutPlaybook, + supportWithoutPlaybook, + ); + const staleMissDelta = staleRateWithPlaybook - staleRateWithoutPlaybook; + + const promoted = + bucket.support >= minSupport && + precisionWithPlaybook >= minPrecision && + liftVsAnchorBaseline >= minLift && + staleMissDelta <= maxStaleMissDelta; + + rules.push({ + id: `${bucket.scenario}::${bucket.orderedSkills.join(">")}`, + scenario: bucket.scenario, + hook: bucket.hook, + storyKind: bucket.storyKind, + targetBoundary: bucket.targetBoundary, + toolName: bucket.toolName, + routeScope: bucket.routeScope, + anchorSkill: bucket.anchorSkill, + orderedSkills: bucket.orderedSkills, + support: bucket.support, + wins: bucket.wins, + directiveWins: bucket.directiveWins, + staleMisses: bucket.staleMisses, + precision: round4(precisionWithPlaybook), + baselinePrecisionWithoutPlaybook: round4( + baselinePrecisionWithoutPlaybook, + ), + liftVsAnchorBaseline: round4(liftVsAnchorBaseline), + staleMissDelta: round4(staleMissDelta), + confidence: promoted ? "promote" : "holdout-fail", + promotedAt: promoted ? generatedAt : null, + reason: promoted + ? 
"verified ordered playbook beats same anchor without this exact sequence" + : "insufficient support, precision, lift, or stale-miss performance", + sourceExposureGroupIds: [...bucket.sourceExposureGroupIds].sort(), + }); + } + + // Deterministic sort + rules.sort( + (a, b) => + a.scenario.localeCompare(b.scenario) || + a.anchorSkill.localeCompare(b.anchorSkill) || + a.orderedSkills.join(">").localeCompare(b.orderedSkills.join(">")), + ); + + const promotedCount = rules.filter( + (r) => r.confidence === "promote", + ).length; + rulebook.rules = rules; + rulebook.promotion = { + accepted: true, + errorCode: null, + reason: `${promotedCount} promoted playbooks`, + }; + + log.summary("playbook-distillation.complete", { + exposureCount: params.exposures.length, + groupCount: byGroup.size, + ruleCount: rules.length, + promotedCount, + }); + + return rulebook; +} diff --git a/hooks/src/playbook-recall.mts b/hooks/src/playbook-recall.mts new file mode 100644 index 0000000..335ad1b --- /dev/null +++ b/hooks/src/playbook-recall.mts @@ -0,0 +1,157 @@ +/** + * playbook-recall.mts — Recall verified playbook sequences during hook injection. + * + * When a promoted playbook rule matches the current scenario and one of the + * candidate skills is the playbook's anchor, the missing follow-on steps are + * inserted after the anchor in ranked order. This upgrades injection from + * recalling isolated winners to recalling proven multi-skill procedures. 
+ * + * No-ops safely when: + * - The playbook rulebook artifact is missing, invalid, or unsupported + * - No promoted rule matches the current scenario + * - All playbook steps are already present or excluded + */ + +import { + loadPlaybookRulebook, + type LearnedPlaybookRule, +} from "./learned-playbook-rulebook.mjs"; +import { + scenarioKeyCandidates, + type RoutingPolicyScenario, +} from "./routing-policy.mjs"; + +// --------------------------------------------------------------------------- +// Types +// --------------------------------------------------------------------------- + +export interface SelectedPlaybook { + ruleId: string; + scenario: string; + anchorSkill: string; + orderedSkills: string[]; + insertedSkills: string[]; + support: number; + precision: number; + lift: number; +} + +export interface RecallPlaybookResult { + selected: SelectedPlaybook | null; + banner: string | null; + rejected: Array<{ ruleId: string; reason: string }>; +} + +// --------------------------------------------------------------------------- +// Ranking +// --------------------------------------------------------------------------- + +function rankRule( + rule: LearnedPlaybookRule, + candidateSkills: string[], +): [number, number, number, number, string] { + const anchorIdx = candidateSkills.indexOf(rule.anchorSkill); + return [ + anchorIdx === -1 ? 
Number.MAX_SAFE_INTEGER : anchorIdx, + -rule.support, + -rule.liftVsAnchorBaseline, + -rule.precision, + rule.id, + ]; +} + +// --------------------------------------------------------------------------- +// Banner formatting +// --------------------------------------------------------------------------- + +function formatPlaybookBanner(selected: SelectedPlaybook): string { + return [ + "", + "**[Verified Playbook]**", + `Anchor: \`${selected.anchorSkill}\``, + `Sequence: ${selected.orderedSkills.map((s) => `\`${s}\``).join(" → ")}`, + `Evidence: support=${selected.support}, precision=${selected.precision}, lift=${selected.lift}`, + "Use the sequence before inventing a new debugging workflow.", + "", + ].join("\n"); +} + +// --------------------------------------------------------------------------- +// Main recall function +// --------------------------------------------------------------------------- + +export function recallVerifiedPlaybook(params: { + projectRoot: string; + scenario: RoutingPolicyScenario; + candidateSkills: string[]; + excludeSkills?: Iterable<string>; + maxInsertedSkills?: number; +}): RecallPlaybookResult { + const loaded = loadPlaybookRulebook(params.projectRoot); + if (!loaded.ok) { + return { selected: null, banner: null, rejected: [] }; + } + + const exclude = new Set(params.excludeSkills ?? []); + const maxInsertedSkills = Math.max(0, params.maxInsertedSkills ??
2); + const rejected: Array<{ ruleId: string; reason: string }> = []; + + for (const scenario of scenarioKeyCandidates(params.scenario)) { + const eligible = loaded.rulebook.rules + .filter( + (rule) => + rule.confidence === "promote" && + rule.scenario === scenario && + params.candidateSkills.includes(rule.anchorSkill), + ) + .sort((a, b) => { + const ra = rankRule(a, params.candidateSkills); + const rb = rankRule(b, params.candidateSkills); + return ( + ra[0] - rb[0] || + ra[1] - rb[1] || + ra[2] - rb[2] || + ra[3] - rb[3] || + ra[4].localeCompare(rb[4]) + ); + }); + + for (const rule of eligible) { + const anchorPos = rule.orderedSkills.indexOf(rule.anchorSkill); + const tail = + anchorPos === -1 + ? rule.orderedSkills.slice(1) + : rule.orderedSkills.slice(anchorPos + 1); + const insertedSkills = tail + .filter((skill) => !exclude.has(skill)) + .slice(0, maxInsertedSkills); + + if (insertedSkills.length === 0) { + rejected.push({ + ruleId: rule.id, + reason: "all_playbook_steps_already_present_or_no_budget", + }); + continue; + } + + const selected: SelectedPlaybook = { + ruleId: rule.id, + scenario: rule.scenario, + anchorSkill: rule.anchorSkill, + orderedSkills: rule.orderedSkills, + insertedSkills, + support: rule.support, + precision: rule.precision, + lift: rule.liftVsAnchorBaseline, + }; + + return { + selected, + banner: formatPlaybookBanner(selected), + rejected, + }; + } + } + + return { selected: null, banner: null, rejected }; +} diff --git a/hooks/src/policy-recall.mts b/hooks/src/policy-recall.mts new file mode 100644 index 0000000..57a4a19 --- /dev/null +++ b/hooks/src/policy-recall.mts @@ -0,0 +1,120 @@ +/** + * Route-Scoped Verified Policy Recall Selector + * + * Pure selector that picks at most one historically winning skill from the + * project routing policy. Prefers exact-route buckets before wildcard fallback. + * No filesystem access — operates entirely on the policy data structure. 
+ * + * Thresholds inherit the same conservatism as derivePolicyBoost: minimum 3 + * exposures, minimum 65% success rate, and minimum +2 policy boost. + */ + +import { + derivePolicyBoost, + scenarioKeyCandidates, + type RoutingPolicyFile, + type RoutingPolicyScenario, + type RoutingPolicyStats, +} from "./routing-policy.mjs"; + +// --------------------------------------------------------------------------- +// Types +// --------------------------------------------------------------------------- + +export interface PolicyRecallCandidate { + skill: string; + scenario: string; + exposures: number; + wins: number; + directiveWins: number; + staleMisses: number; + successRate: number; + policyBoost: number; + recallScore: number; +} + +export interface PolicyRecallOptions { + maxCandidates?: number; + minExposures?: number; + minSuccessRate?: number; + minBoost?: number; + excludeSkills?: Iterable<string>; +} + +// --------------------------------------------------------------------------- +// Internal helpers +// --------------------------------------------------------------------------- + +function successRate(stats: RoutingPolicyStats): number { + const weightedWins = stats.wins + stats.directiveWins * 0.25; + return weightedWins / Math.max(stats.exposures, 1); +} + +function recallScore(stats: RoutingPolicyStats): number { + return ( + derivePolicyBoost(stats) * 1000 + + Math.round(successRate(stats) * 100) * 10 + + stats.directiveWins * 5 + + stats.wins - + stats.staleMisses + ); +} + +// --------------------------------------------------------------------------- +// Selector +// --------------------------------------------------------------------------- + +/** + * Select at most `maxCandidates` (default 1) historically winning skills + * from the project routing policy for a given scenario. + * + * Lookup order follows scenarioKeyCandidates: exact route first, then + * wildcard, then legacy 4-part key.
The first bucket that produces at + * least one qualified candidate wins — no cross-bucket merging. + * + * Tie-breaking is deterministic: recallScore desc, exposures desc, + * skill name asc (lexicographic). + */ +export function selectPolicyRecallCandidates( + policy: RoutingPolicyFile, + scenarioInput: RoutingPolicyScenario, + options: PolicyRecallOptions = {}, +): PolicyRecallCandidate[] { + const maxCandidates = options.maxCandidates ?? 1; + const minExposures = options.minExposures ?? 3; + const minSuccessRate = options.minSuccessRate ?? 0.65; + const minBoost = options.minBoost ?? 2; + const exclude = new Set(options.excludeSkills ?? []); + + for (const scenario of scenarioKeyCandidates(scenarioInput)) { + const bucket = policy.scenarios[scenario] ?? {}; + const candidates = Object.entries(bucket) + .map(([skill, stats]) => ({ + skill, + scenario, + exposures: stats.exposures, + wins: stats.wins, + directiveWins: stats.directiveWins, + staleMisses: stats.staleMisses, + successRate: successRate(stats), + policyBoost: derivePolicyBoost(stats), + recallScore: recallScore(stats), + })) + .filter((entry) => !exclude.has(entry.skill)) + .filter((entry) => entry.exposures >= minExposures) + .filter((entry) => entry.successRate >= minSuccessRate) + .filter((entry) => entry.policyBoost >= minBoost) + .sort( + (a, b) => + b.recallScore - a.recallScore || + b.exposures - a.exposures || + a.skill.localeCompare(b.skill), + ); + + if (candidates.length > 0) { + return candidates.slice(0, maxCandidates); + } + } + + return []; +} diff --git a/hooks/src/posttooluse-verification-observe.mts b/hooks/src/posttooluse-verification-observe.mts index d5b08f1..e8bcd4f 100644 --- a/hooks/src/posttooluse-verification-observe.mts +++ b/hooks/src/posttooluse-verification-observe.mts @@ -1,11 +1,16 @@ #!/usr/bin/env node /** - * PostToolUse hook: verification observer for Bash tool calls. + * PostToolUse hook: verification observer for tool calls. 
* - * Maps bash commands to verification boundaries (uiRender, clientRequest, + * Maps tool calls to verification boundaries (uiRender, clientRequest, * serverHandler, environment) and emits structured log events for the * verification pipeline. * + * Supports Bash, Read, Edit, Write, Glob, Grep, and WebFetch tools. + * Non-Bash tools produce "soft" evidence that records observations but + * does not resolve long-term routing policy outcomes. Only "strong" signals + * (Bash HTTP/browser commands, WebFetch) resolve routing policy. + * * Story inference derives the target route from recent file edits stored * in VERCEL_PLUGIN_RECENT_EDITS env var (set by PreToolUse), falling back * to extracting route hints from the command itself. @@ -18,9 +23,34 @@ import type { SyncHookJSONOutput } from "@anthropic-ai/claude-agent-sdk"; import { readFileSync, realpathSync } from "node:fs"; import { resolve } from "node:path"; import { fileURLToPath } from "node:url"; -import { pluginRoot as resolvePluginRoot, generateVerificationId } from "./hook-env.mjs"; +import { generateVerificationId } from "./hook-env.mjs"; import { createLogger } from "./logger.mjs"; import type { Logger } from "./logger.mjs"; +import { redactCommand } from "./pretooluse-skill-inject.mjs"; +import { + recordObservation, + type VerificationObservation, +} from "./verification-ledger.mjs"; +import { resolveBoundaryOutcome, type SkillExposure } from "./routing-policy-ledger.mjs"; +import { selectActiveStory } from "./verification-plan.mjs"; +import { + appendRoutingDecisionTrace, + createDecisionId, +} from "./routing-decision-trace.mjs"; +import { + classifyVerificationSignal, +} from "./verification-signal.mjs"; +import { + evaluateResolutionGate, + diagnosePendingExposureMatch, +} from "./verification-closure-diagnosis.mjs"; +import { + buildVerificationClosureCapsule, + persistVerificationClosureCapsule, +} from "./verification-closure-capsule.mjs"; + +export { redactCommand }; +export { 
classifyVerificationSignal }; // --------------------------------------------------------------------------- // Types @@ -33,6 +63,17 @@ export type BoundaryType = | "environment" | "unknown"; +export type VerificationSignalStrength = "strong" | "soft"; + +export type VerificationEvidenceSource = + | "bash" + | "browser" + | "http" + | "log-read" + | "env-read" + | "file-read" + | "unknown"; + export interface VerificationBoundaryEvent { event: "verification.boundary_observed"; boundary: BoundaryType; @@ -41,6 +82,12 @@ export interface VerificationBoundaryEvent { matchedPattern: string; inferredRoute: string | null; timestamp: string; + suggestedBoundary: string | null; + suggestedAction: string | null; + matchedSuggestedAction: boolean; + signalStrength: VerificationSignalStrength; + evidenceSource: VerificationEvidenceSource; + toolName: string; } export interface VerificationReport { @@ -73,7 +120,123 @@ export function isVerificationReport(value: unknown): value is VerificationRepor } // --------------------------------------------------------------------------- -// Boundary pattern mapping +// Local verification URL gating +// --------------------------------------------------------------------------- + +const LOCAL_DEV_HOSTS = new Set([ + "localhost", + "127.0.0.1", + "0.0.0.0", + "::1", + "[::1]", +]); + +/** + * Returns true when rawUrl targets a local development server. + * Recognizes well-known loopback hosts and the user-configured + * VERCEL_PLUGIN_LOCAL_DEV_ORIGIN. 
+ */ +export function isLocalVerificationUrl( + rawUrl: string, + env: NodeJS.ProcessEnv = process.env, +): boolean { + try { + const url = new URL(rawUrl); + if (url.protocol !== "http:" && url.protocol !== "https:") return false; + const hostname = url.hostname.toLowerCase(); + if (LOCAL_DEV_HOSTS.has(hostname)) return true; + const configuredOrigin = envString(env, "VERCEL_PLUGIN_LOCAL_DEV_ORIGIN"); + if (!configuredOrigin) return false; + const configured = new URL(configuredOrigin); + return configured.host.toLowerCase() === url.host.toLowerCase(); + } catch { + return false; + } +} + +// --------------------------------------------------------------------------- +// Story-ID resolution from observed route +// --------------------------------------------------------------------------- + +export interface ObservedStoryResolution { + storyId: string | null; + method: "explicit-env" | "exact-route" | "active-story" | "none"; +} + +/** + * Resolve the story ID that owns the observed route, with method tracking. + * + * Priority: + * 1. Explicit env override (VERCEL_PLUGIN_VERIFICATION_STORY_ID) + * 2. Unique exact-match story whose route === observedRoute + * 3. 
Fallback to plan.activeStoryId + */ +export function resolveObservedStory( + plan: { + stories: Array<{ id: string; route: string | null }>; + activeStoryId: string | null; + }, + observedRoute: string | null, + env: NodeJS.ProcessEnv = process.env, +): ObservedStoryResolution { + const explicit = envString(env, "VERCEL_PLUGIN_VERIFICATION_STORY_ID"); + if (explicit) return { storyId: explicit, method: "explicit-env" }; + + if (observedRoute) { + const exact = plan.stories.filter((story) => story.route === observedRoute); + if (exact.length === 1) { + return { storyId: exact[0]!.id, method: "exact-route" }; + } + } + + if (plan.activeStoryId) { + return { storyId: plan.activeStoryId, method: "active-story" }; + } + + return { storyId: null, method: "none" }; +} + +/** + * @deprecated Use resolveObservedStory instead. + */ +export function resolveObservedStoryId( + plan: { + stories: Array<{ id: string; route: string | null }>; + activeStoryId: string | null; + }, + observedRoute: string | null, + env: NodeJS.ProcessEnv = process.env, +): string | null { + return resolveObservedStory(plan, observedRoute, env).storyId; +} + +// --------------------------------------------------------------------------- +// Signal strength gating +// --------------------------------------------------------------------------- + +/** + * Determine whether a verification event should resolve long-term routing + * policy outcomes. Only strong signals on known boundaries qualify. + * WebFetch is additionally gated to local-dev-origin URLs to prevent + * external fetches from poisoning routing policy. + */ +export function shouldResolveRoutingOutcome( + event: Pick, + env: NodeJS.ProcessEnv = process.env, +): boolean { + if (event.boundary === "unknown") return false; + if (event.signalStrength !== "strong") return false; + + // WebFetch should only train policy when it targets local verification. 
if (event.toolName === "WebFetch") { + return isLocalVerificationUrl(event.command, env); + } + + return true; +} + +// --------------------------------------------------------------------------- +// Boundary pattern mapping (Bash) // --------------------------------------------------------------------------- interface BoundaryPattern { @@ -119,6 +282,257 @@ export function classifyBoundary(command: string): { boundary: BoundaryType; mat return { boundary: "unknown", matchedPattern: "none" }; } +// --------------------------------------------------------------------------- +// Non-Bash tool classification +// --------------------------------------------------------------------------- + +/** + * Classify a non-Bash tool call into a verification boundary and evidence metadata. + */ +export function classifyToolSignal(toolName: string, toolInput: Record<string, unknown>): { + boundary: BoundaryType; + matchedPattern: string; + signalStrength: VerificationSignalStrength; + evidenceSource: VerificationEvidenceSource; + summary: string; +} | null { + if (toolName === "Read") { + const filePath = String(toolInput.file_path || ""); + if (!filePath) return null; + + // .env files → environment + soft + if (/\.env(\.\w+)?$/.test(filePath)) { + return { + boundary: "environment", + matchedPattern: "env-file-read", + signalStrength: "soft", + evidenceSource: "env-read", + summary: filePath, + }; + } + + // vercel.json, .vercel/project.json → environment + soft + if (/vercel\.json$/.test(filePath) || /\.vercel\/project\.json$/.test(filePath)) { + return { + boundary: "environment", + matchedPattern: "vercel-config-read", + signalStrength: "soft", + evidenceSource: "env-read", + summary: filePath, + }; + } + + // Log files → serverHandler + soft + if (/\.(log|out|err)$/.test(filePath) || /vercel-logs/.test(filePath) || /\.next\/.*server.*\.log/.test(filePath)) { + return { + boundary: "serverHandler", + matchedPattern: "log-file-read", + signalStrength: "soft", + evidenceSource: "log-read", +
summary: filePath, + }; + } + + // Generic file read — not useful for verification + return null; + } + + if (toolName === "WebFetch") { + const url = String(toolInput.url || ""); + if (!url) return null; + + return { + boundary: "clientRequest", + matchedPattern: "web-fetch", + signalStrength: "strong", + evidenceSource: "http", + summary: url.slice(0, 200), + }; + } + + if (toolName === "Grep") { + const path = String(toolInput.path || ""); + + // Grep in log files → serverHandler + soft + if (/\.(log|out|err)$/.test(path) || /logs?\//.test(path)) { + return { + boundary: "serverHandler", + matchedPattern: "log-grep", + signalStrength: "soft", + evidenceSource: "log-read", + summary: `grep ${toolInput.pattern || ""} in ${path}`.slice(0, 200), + }; + } + + // Grep in .env files → environment + soft + if (/\.env/.test(path)) { + return { + boundary: "environment", + matchedPattern: "env-grep", + signalStrength: "soft", + evidenceSource: "env-read", + summary: `grep ${toolInput.pattern || ""} in ${path}`.slice(0, 200), + }; + } + + return null; + } + + if (toolName === "Glob") { + const pattern = String(toolInput.pattern || ""); + + // Glob for log files → serverHandler + soft + if (/\*\.(log|out|err)/.test(pattern) || /logs?\//.test(pattern)) { + return { + boundary: "serverHandler", + matchedPattern: "log-glob", + signalStrength: "soft", + evidenceSource: "log-read", + summary: `glob ${pattern}`.slice(0, 200), + }; + } + + // Glob for env files → environment + soft + if (/\.env/.test(pattern)) { + return { + boundary: "environment", + matchedPattern: "env-glob", + signalStrength: "soft", + evidenceSource: "env-read", + summary: `glob ${pattern}`.slice(0, 200), + }; + } + + return null; + } + + // Edit and Write on route files could infer route but aren't verification evidence + // They don't observe system behavior, they modify it + if (toolName === "Edit" || toolName === "Write") { + return null; + } + + return null; +} + +// 
--------------------------------------------------------------------------- +// Boundary event builder (pure, testable) +// --------------------------------------------------------------------------- + +/** + * Build a structured boundary event with redacted commands and directive matching. + * Compares the observed boundary/action against the suggested directive from env vars. + */ +export function buildBoundaryEvent(input: { + command: string; + boundary: BoundaryType; + matchedPattern: string; + inferredRoute: string | null; + verificationId: string; + timestamp?: string; + env?: NodeJS.ProcessEnv; + signalStrength?: VerificationSignalStrength; + evidenceSource?: VerificationEvidenceSource; + toolName?: string; +}): VerificationBoundaryEvent { + const env = input.env ?? process.env; + const redactedCommand = redactCommand(input.command).slice(0, 200); + const suggestedBoundary = env.VERCEL_PLUGIN_VERIFICATION_BOUNDARY || null; + const suggestedAction = env.VERCEL_PLUGIN_VERIFICATION_ACTION + ? redactCommand(env.VERCEL_PLUGIN_VERIFICATION_ACTION).slice(0, 200) + : null; + + return { + event: "verification.boundary_observed", + boundary: input.boundary, + verificationId: input.verificationId, + command: redactedCommand, + matchedPattern: input.matchedPattern, + inferredRoute: input.inferredRoute, + timestamp: input.timestamp ?? new Date().toISOString(), + suggestedBoundary, + suggestedAction, + matchedSuggestedAction: + (suggestedBoundary !== null && suggestedBoundary === input.boundary) || + (suggestedAction !== null && suggestedAction === redactedCommand), + signalStrength: input.signalStrength ?? "strong", + evidenceSource: input.evidenceSource ?? "bash", + toolName: input.toolName ?? 
"Bash", + }; +} + +// --------------------------------------------------------------------------- +// Ledger observation builder (pure, testable) +// --------------------------------------------------------------------------- + +/** + * Convert a boundary event into a VerificationObservation for ledger persistence. + */ +export function buildLedgerObservation( + event: VerificationBoundaryEvent, + env: NodeJS.ProcessEnv = process.env, +): VerificationObservation { + const storyIdValue = env.VERCEL_PLUGIN_VERIFICATION_STORY_ID; + + // Map evidenceSource to ledger source type + const sourceMap: Record = { + "bash": "bash", + "browser": "bash", + "http": "bash", + "log-read": "edit", + "env-read": "edit", + "file-read": "edit", + "unknown": "bash", + }; + + return { + id: event.verificationId, + timestamp: event.timestamp, + source: sourceMap[event.evidenceSource] ?? "bash", + boundary: event.boundary === "unknown" ? null : event.boundary, + route: event.inferredRoute, + storyId: typeof storyIdValue === "string" && storyIdValue.trim() !== "" + ? storyIdValue.trim() + : null, + summary: event.command, + meta: { + matchedPattern: event.matchedPattern, + suggestedBoundary: event.suggestedBoundary, + suggestedAction: event.suggestedAction, + matchedSuggestedAction: event.matchedSuggestedAction, + toolName: event.toolName, + signalStrength: event.signalStrength, + evidenceSource: event.evidenceSource, + }, + }; +} + +// --------------------------------------------------------------------------- +// Directive env helpers +// --------------------------------------------------------------------------- + +/** + * Read a trimmed non-empty string from the environment, or null. + */ +export function envString( + env: NodeJS.ProcessEnv, + key: string, +): string | null { + const value = env[key]; + return typeof value === "string" && value.trim() !== "" ? 
value.trim() : null; +} + +/** + * Resolve the observed route: prefer command/edit inference, fall back to + * VERCEL_PLUGIN_VERIFICATION_ROUTE from the directive env. + */ +export function resolveObservedRoute( + inferredRoute: string | null, + env: NodeJS.ProcessEnv = process.env, +): string | null { + return inferredRoute ?? envString(env, "VERCEL_PLUGIN_VERIFICATION_ROUTE"); +} + // --------------------------------------------------------------------------- // Story inference // --------------------------------------------------------------------------- @@ -162,17 +576,49 @@ export function inferRoute(command: string, recentEdits?: string): string | null return null; } +// --------------------------------------------------------------------------- +// Route inference for file-path-based tools +// --------------------------------------------------------------------------- + +/** + * Infer route from a file path (used for Read, Edit, Write, Glob, Grep). + */ +function inferRouteFromFilePath(filePath: string): string | null { + const match = ROUTE_REGEX.exec(filePath); + if (match) { + const route = "/" + match[1] + .replace(/\/page\.\w+$/, "") + .replace(/\/route\.\w+$/, "") + .replace(/\/layout\.\w+$/, "") + .replace(/\/loading\.\w+$/, "") + .replace(/\/error\.\w+$/, "") + .replace(/\[([^\]]+)\]/g, ":$1"); + return route === "/" ? 
"/" : route.replace(/\/$/, ""); + } + return null; +} + // --------------------------------------------------------------------------- // Input parsing // --------------------------------------------------------------------------- -export interface ParsedBashInput { - command: string; +export interface ParsedToolInput { + toolName: string; + toolInput: Record; sessionId: string | null; cwd: string | null; } -export function parseInput(raw: string, logger?: Logger): ParsedBashInput | null { +/** @deprecated Use ParsedToolInput instead */ +export type ParsedBashInput = { + command: string; + sessionId: string | null; + cwd: string | null; +}; + +const SUPPORTED_TOOLS = new Set(["Bash", "Read", "Edit", "Write", "Glob", "Grep", "WebFetch"]); + +export function parseInput(raw: string, logger?: Logger): ParsedToolInput | null { const trimmed = (raw || "").trim(); if (!trimmed) return null; @@ -184,17 +630,21 @@ export function parseInput(raw: string, logger?: Logger): ParsedBashInput | null } const toolName = (input.tool_name as string) || ""; - if (toolName !== "Bash") return null; + if (!SUPPORTED_TOOLS.has(toolName)) return null; const toolInput = (input.tool_input as Record) || {}; - const command = (toolInput.command as string) || ""; - if (!command) return null; + + // Bash requires a non-empty command + if (toolName === "Bash") { + const command = (toolInput.command as string) || ""; + if (!command) return null; + } const sessionId = (input.session_id as string) || null; const cwdCandidate = input.cwd ?? input.working_directory; const cwd = typeof cwdCandidate === "string" && cwdCandidate.trim() !== "" ? 
cwdCandidate : null; - return { command, sessionId, cwd }; + return { toolName, toolInput, sessionId, cwd }; } // --------------------------------------------------------------------------- @@ -217,34 +667,302 @@ export function run(rawInput?: string): string { const parsed = parseInput(raw, log); if (!parsed) { - log.debug("verification-observe-skip", { reason: "no_bash_input" }); + log.debug("verification-observe-skip", { reason: "no_supported_input" }); return "{}"; } - const { command, sessionId } = parsed; - const { boundary, matchedPattern } = classifyBoundary(command); + const { toolName, toolInput, sessionId } = parsed; + const env = process.env; + + // Unified multi-tool classification via verification-signal module + const signal = classifyVerificationSignal({ toolName, toolInput, env }); + if (!signal) { + log.trace("verification-observe-skip", { + reason: "no_boundary_match", + toolName, + }); + return "{}"; + } - if (boundary === "unknown") { - log.trace("verification-observe-skip", { reason: "no_boundary_match", command: command.slice(0, 120) }); + if (signal.boundary === "unknown") { + log.trace("verification-observe-skip", { + reason: "no_boundary_match", + toolName, + summary: signal.summary.slice(0, 120), + }); return "{}"; } + const { boundary, matchedPattern, signalStrength, evidenceSource, summary } = signal; + const verificationId = generateVerificationId(); - const recentEdits = process.env.VERCEL_PLUGIN_RECENT_EDITS || ""; - const inferredRoute = inferRoute(command, recentEdits); + const recentEdits = env.VERCEL_PLUGIN_RECENT_EDITS || ""; - const boundaryEvent: VerificationBoundaryEvent = { - event: "verification.boundary_observed", + // Infer route: for Bash use command + recent edits, for file tools use file path + let inferredRoute: string | null; + if (toolName === "Bash") { + inferredRoute = resolveObservedRoute(inferRoute(summary, recentEdits), env); + } else { + const filePath = String(toolInput.file_path || toolInput.path || 
toolInput.url || ""); + inferredRoute = resolveObservedRoute( + inferRouteFromFilePath(filePath) ?? inferRoute(summary, recentEdits), + env, + ); + } + + const boundaryEvent = buildBoundaryEvent({ + command: summary, boundary, - verificationId, - command: command.slice(0, 200), matchedPattern, inferredRoute, - timestamp: new Date().toISOString(), - }; + verificationId, + signalStrength, + evidenceSource, + toolName, + }); log.summary("verification.boundary_observed", boundaryEvent as unknown as Record); + if (sessionId) { + const plan = recordObservation( + sessionId, + buildLedgerObservation(boundaryEvent), + { + agentBrowserAvailable: + process.env.VERCEL_PLUGIN_AGENT_BROWSER_AVAILABLE !== "0", + lastAttemptedAction: + process.env.VERCEL_PLUGIN_VERIFICATION_ACTION || null, + }, + log, + ); + + log.summary("verification.plan_feedback", { + verificationId, + toolName, + signalStrength, + evidenceSource, + matchedSuggestedAction: boundaryEvent.matchedSuggestedAction, + satisfiedBoundaries: Array.from(plan.satisfiedBoundaries).sort(), + missingBoundaries: [...plan.missingBoundaries], + primaryNextAction: plan.primaryNextAction, + blockedReasons: [...plan.blockedReasons], + }); + + // Resolve story from observed route, preferring exact match over active story + const activeStory = plan.stories.length > 0 + ? selectActiveStory(plan) + : null; + + const storyResolution = resolveObservedStory( + { + stories: plan.stories.map((s) => ({ id: s.id, route: s.route })), + activeStoryId: activeStory?.id ?? null, + }, + inferredRoute, + env, + ); + + // Structured gate evaluation with explicit blocking reason codes + const gate = evaluateResolutionGate( + { + boundary: boundaryEvent.boundary, + signalStrength, + toolName, + command: boundaryEvent.command, + }, + env, + ); + + // Diagnose pending exposure matches (skip for unknown boundaries) + const exposureDiagnosis = + boundaryEvent.boundary === "unknown" + ? 
null + : diagnosePendingExposureMatch({ + sessionId, + boundary: boundaryEvent.boundary as + | "uiRender" + | "clientRequest" + | "serverHandler" + | "environment", + storyId: storyResolution.storyId, + route: inferredRoute, + }); + + // Resolve routing policy only when the gate passes + let resolved: SkillExposure[] = []; + if (gate.eligible && boundaryEvent.boundary !== "unknown") { + resolved = resolveBoundaryOutcome({ + sessionId, + boundary: boundaryEvent.boundary as + | "uiRender" + | "clientRequest" + | "serverHandler" + | "environment", + matchedSuggestedAction: boundaryEvent.matchedSuggestedAction, + storyId: storyResolution.storyId, + route: inferredRoute, + now: boundaryEvent.timestamp, + }); + } else { + log.debug("verification.routing-policy-skipped", { + verificationId, + boundary: boundaryEvent.boundary, + toolName, + blockingReasonCodes: gate.blockingReasonCodes, + signalStrength, + }); + } + + if (gate.eligible && resolved.length === 0) { + log.debug("verification.routing-policy-unresolved", { + verificationId, + boundary: boundaryEvent.boundary, + toolName, + storyId: storyResolution.storyId, + route: inferredRoute, + unresolvedReasonCodes: + exposureDiagnosis?.unresolvedReasonCodes ?? [ + "no_exact_pending_match", + ], + pendingBoundaryCount: + exposureDiagnosis?.pendingBoundaryCount ?? 0, + }); + } + + // Build and persist the closure capsule + const closureCapsule = buildVerificationClosureCapsule({ + sessionId, + verificationId, + toolName, + createdAt: boundaryEvent.timestamp, + observation: { + boundary: boundaryEvent.boundary, + signalStrength, + evidenceSource, + matchedPattern, + command: boundaryEvent.command, + inferredRoute, + matchedSuggestedAction: boundaryEvent.matchedSuggestedAction, + }, + storyResolution: { + resolvedStoryId: storyResolution.storyId, + method: storyResolution.method, + activeStoryId: activeStory?.id ?? null, + activeStoryKind: activeStory?.kind ?? null, + activeStoryRoute: activeStory?.route ?? 
null, + }, + gate, + exposureDiagnosis, + resolvedExposures: resolved, + plan: { + activeStoryId: plan.activeStoryId ?? null, + satisfiedBoundaries: plan.satisfiedBoundaries, + missingBoundaries: [...plan.missingBoundaries], + blockedReasons: [...plan.blockedReasons], + primaryNextAction: plan.primaryNextAction + ? { + action: plan.primaryNextAction.action, + targetBoundary: plan.primaryNextAction.targetBoundary, + reason: plan.primaryNextAction.reason, + } + : null, + }, + }); + + const capsulePath = persistVerificationClosureCapsule( + closureCapsule, + log, + ); + + log.summary("verification.routing-policy-resolution-gate", { + verificationId, + toolName, + boundary: boundaryEvent.boundary, + inferredRoute, + resolvedStoryId: storyResolution.storyId, + storyResolutionMethod: storyResolution.method, + resolutionEligible: gate.eligible, + blockingReasonCodes: gate.blockingReasonCodes, + exactPendingMatchCount: exposureDiagnosis?.exactMatchCount ?? 0, + capsulePath, + }); + + if (resolved.length > 0) { + const outcomeKind = boundaryEvent.matchedSuggestedAction + ? "directive-win" + : "win"; + log.summary("verification.routing-policy-resolved", { + verificationId, + boundary: boundaryEvent.boundary, + storyId: storyResolution.storyId, + route: inferredRoute, + resolvedCount: resolved.length, + outcomeKind, + skills: resolved.map((e) => e.skill), + }); + } + + // Emit routing decision trace with diagnostic skipped reasons + const redactedTarget = toolName === "Bash" + ? redactCommand(summary).slice(0, 200) + : summary.slice(0, 200); + const decisionId = createDecisionId({ + hook: "PostToolUse", + sessionId, + toolName, + toolTarget: redactedTarget, + timestamp: boundaryEvent.timestamp, + }); + + appendRoutingDecisionTrace({ + version: 2, + decisionId, + sessionId, + hook: "PostToolUse", + toolName, + toolTarget: redactedTarget, + timestamp: boundaryEvent.timestamp, + primaryStory: { + id: storyResolution.storyId, + kind: activeStory?.kind ?? 
null, + storyRoute: activeStory?.route ?? inferredRoute, + targetBoundary: boundaryEvent.boundary === "unknown" ? null : boundaryEvent.boundary, + }, + observedRoute: inferredRoute, + policyScenario: storyResolution.storyId + ? `PostToolUse|${activeStory?.kind ?? "none"}|${boundaryEvent.boundary}|${toolName}` + : null, + matchedSkills: [], + injectedSkills: [], + skippedReasons: [ + ...(storyResolution.storyId ? [] : ["no_active_verification_story"]), + ...gate.blockingReasonCodes.map((code) => `gate:${code}`), + ...(gate.eligible && resolved.length === 0 + ? (exposureDiagnosis?.unresolvedReasonCodes ?? ["no_exact_pending_match"]).map( + (code) => `resolution:${code}`, + ) + : []), + ], + ranked: [], + verification: { + verificationId, + observedBoundary: boundaryEvent.boundary, + matchedSuggestedAction: boundaryEvent.matchedSuggestedAction, + }, + causes: [], + edges: [], + }); + + log.summary("routing.decision_trace_written", { + decisionId, + hook: "PostToolUse", + verificationId, + boundary: boundaryEvent.boundary, + toolName, + signalStrength, + }); + } + log.complete("verification-observe-done", { matchedCount: 1, injectedCount: 0, diff --git a/hooks/src/pretooluse-skill-inject.mts b/hooks/src/pretooluse-skill-inject.mts index 5edf9bb..59582a7 100644 --- a/hooks/src/pretooluse-skill-inject.mts +++ b/hooks/src/pretooluse-skill-inject.mts @@ -60,6 +60,29 @@ import type { VercelJsonRouting } from "./vercel-config.mjs"; import { createLogger, logDecision } from "./logger.mjs"; import type { Logger } from "./logger.mjs"; import { trackBaseEvents } from "./telemetry.mjs"; +import { loadCachedPlanResult, selectActiveStory } from "./verification-plan.mjs"; +import { resolveVerificationRuntimeState, buildVerificationEnv } from "./verification-directive.mjs"; +import { applyPolicyBoosts, applyRulebookBoosts } from "./routing-policy.mjs"; +import type { RoutingHookName, RoutingToolName, RulebookBoostExplanation } from "./routing-policy.mjs"; +import { + 
appendSkillExposure, + loadProjectRoutingPolicy, +} from "./routing-policy-ledger.mjs"; +import { loadRulebook, rulebookPath } from "./learned-routing-rulebook.mjs"; +import { buildAttributionDecision } from "./routing-attribution.mjs"; +import { explainPolicyRecall } from "./routing-diagnosis.mjs"; +import { + appendRoutingDecisionTrace, + createDecisionId, +} from "./routing-decision-trace.mjs"; +import { + createDecisionCausality, + addCause, + addEdge, +} from "./routing-decision-causality.mjs"; +import type { RoutingDecisionCausality } from "./routing-decision-causality.mjs"; +import { recallVerifiedCompanions } from "./companion-recall.mjs"; +import { recallVerifiedPlaybook } from "./playbook-recall.mjs"; const MAX_SKILLS = 3; const DEFAULT_INJECTION_BUDGET_BYTES = 18_000; @@ -775,6 +798,10 @@ export interface DeduplicateParams { likelySkills?: Set; compiledSkills?: CompiledSkillEntry[]; setupMode?: boolean; + /** Project root for loading routing policy. */ + cwd?: string; + /** Session ID for loading cached verification plan. */ + sessionId?: string | null; } export interface SetupModeRouting { @@ -789,13 +816,15 @@ export interface DeduplicateResult { vercelJsonRouting: VercelJsonRouting | null; profilerBoosted: string[]; setupModeRouting: SetupModeRouting | null; + policyBoosted: Array<{ skill: string; boost: number; reason: string | null }>; + rulebookBoosted: RulebookBoostExplanation[]; } /** * Filter already-seen skills, apply vercel.json key-aware routing and profiler boost, rank, and cap. 
*/ export function deduplicateSkills( - { matchedEntries, matched, toolName, toolInput, injectedSkills, dedupOff, maxSkills, likelySkills, compiledSkills, setupMode }: DeduplicateParams, + { matchedEntries, matched, toolName, toolInput, injectedSkills, dedupOff, maxSkills, likelySkills, compiledSkills, setupMode, cwd, sessionId }: DeduplicateParams, logger?: Logger, ): DeduplicateResult { const l = logger || log; @@ -901,6 +930,127 @@ export function deduplicateSkills( } } + // Policy boost: apply learned routing-policy boosts from verification outcomes + // Only apply when an active verification story exists to avoid training on junk none|none buckets + const policyBoosted: Array<{ skill: string; boost: number; reason: string | null }> = []; + if (cwd) { + const plan = sessionId ? loadCachedPlanResult(sessionId, l) : null; + const primaryStory = plan ? selectActiveStory(plan) : null; + + if (primaryStory) { + const policyScenario = { + hook: "PreToolUse" as RoutingHookName, + storyKind: primaryStory.kind ?? null, + targetBoundary: (plan?.primaryNextAction?.targetBoundary as + | "uiRender" + | "clientRequest" + | "serverHandler" + | "environment" + | null) ?? null, + toolName: toolName as RoutingToolName, + }; + const policy = loadProjectRoutingPolicy(cwd); + const boosted = applyPolicyBoosts( + newEntries.map((e) => ({ + ...e, + skill: e.skill, + priority: e.priority, + effectivePriority: typeof e.effectivePriority === "number" ? e.effectivePriority : e.priority, + })), + policy, + policyScenario, + ); + + for (let i = 0; i < newEntries.length; i++) { + const b = boosted[i]; + newEntries[i].effectivePriority = b.effectivePriority; + if (b.policyBoost !== 0) { + policyBoosted.push({ + skill: b.skill, + boost: b.policyBoost, + reason: b.policyReason, + }); + } + } + + if (policyBoosted.length > 0) { + l.debug("policy-boosted", { + scenario: `${policyScenario.hook}|${policyScenario.storyKind ?? "none"}|${policyScenario.targetBoundary ?? 
"none"}|${policyScenario.toolName}`, + boostedSkills: policyBoosted, + }); + } + } else { + l.debug("policy-boost-skipped", { reason: "no active verification story" }); + } + } + + // Rulebook boost: when a learned-routing-rulebook exists, apply rulebook rules + // with explicit precedence — rulebook replaces stats-policy for matching skills + const rulebookBoosted: RulebookBoostExplanation[] = []; + if (cwd) { + const rbResult = loadRulebook(cwd); + if (rbResult.ok && rbResult.rulebook.rules.length > 0) { + const plan = sessionId ? loadCachedPlanResult(sessionId, l) : null; + const primaryStory = plan ? selectActiveStory(plan) : null; + + if (primaryStory) { + const rbScenario = { + hook: "PreToolUse" as RoutingHookName, + storyKind: primaryStory.kind ?? null, + targetBoundary: (plan?.primaryNextAction?.targetBoundary as + | "uiRender" + | "clientRequest" + | "serverHandler" + | "environment" + | null) ?? null, + toolName: toolName as RoutingToolName, + }; + const rbPath = rulebookPath(cwd); + const withRulebook = applyRulebookBoosts( + newEntries.map((e) => ({ + ...e, + skill: e.skill, + priority: e.priority, + effectivePriority: typeof e.effectivePriority === "number" ? e.effectivePriority : e.priority, + policyBoost: policyBoosted.find((p) => p.skill === e.skill)?.boost ?? 0, + policyReason: policyBoosted.find((p) => p.skill === e.skill)?.reason ?? null, + })), + rbResult.rulebook, + rbScenario, + rbPath, + ); + + for (let i = 0; i < newEntries.length; i++) { + const rb = withRulebook[i]; + newEntries[i].effectivePriority = rb.effectivePriority; + if (rb.matchedRuleId) { + rulebookBoosted.push({ + skill: rb.skill, + matchedRuleId: rb.matchedRuleId, + ruleBoost: rb.ruleBoost, + ruleReason: rb.ruleReason ?? "", + rulebookPath: rb.rulebookPath ?? 
"", + }); + // Suppress stats-policy boost for skills where rulebook takes precedence + const pIdx = policyBoosted.findIndex((p) => p.skill === rb.skill); + if (pIdx !== -1) { + policyBoosted.splice(pIdx, 1); + } + } + } + + if (rulebookBoosted.length > 0) { + l.debug("rulebook-boosted", { + scenario: `${rbScenario.hook}|${rbScenario.storyKind ?? "none"}|${rbScenario.targetBoundary ?? "none"}|${rbScenario.toolName}`, + boostedSkills: rulebookBoosted, + }); + } + } + } else if (!rbResult.ok) { + l.debug("rulebook-load-error", { code: rbResult.error.code, message: rbResult.error.message }); + } + } + // Sort by effectivePriority (if set) or priority DESC, then skill name ASC newEntries = rankEntries(newEntries); @@ -909,12 +1059,17 @@ export function deduplicateSkills( // Emit skill_ranked for each candidate in priority order for (const entry of newEntries) { const eff = typeof entry.effectivePriority === "number" ? entry.effectivePriority : entry.priority; + const reason = rulebookBoosted.some((r) => r.skill === entry.skill) + ? "rulebook_boosted" + : policyBoosted.some((p) => p.skill === entry.skill) + ? "policy_boosted" + : profilerBoosted.includes(entry.skill) ? "profiler_boosted" : "pattern_match"; logDecision(l, { hook: "PreToolUse", event: "skill_ranked", skill: entry.skill, score: eff, - reason: profilerBoosted.includes(entry.skill) ? 
"profiler_boosted" : "pattern_match", + reason, }); } @@ -923,7 +1078,7 @@ export function deduplicateSkills( previouslyInjected: [...injectedSkills], }); - return { newEntries, rankedSkills, vercelJsonRouting, profilerBoosted, setupModeRouting }; + return { newEntries, rankedSkills, vercelJsonRouting, profilerBoosted, setupModeRouting, policyBoosted, rulebookBoosted }; } // --------------------------------------------------------------------------- @@ -1094,6 +1249,119 @@ export interface SkillInjectionReason { reasonCode: string; } +export interface VerifiedPlaybookSelection { + anchorSkill: string; + insertedSkills: string[]; + banner: string | null; +} + +export function applyVerifiedPlaybookInsertion(params: { + rankedSkills: string[]; + matched: Set; + injectedSkills: Set; + dedupOff: boolean; + forceSummarySkills: Set; + selection: VerifiedPlaybookSelection | null; +}): { + rankedSkills: string[]; + matched: Set; + forceSummarySkills: Set; + reasons: Record; + applied: boolean; + appliedOrderedSkills: string[]; + appliedInsertedSkills: string[]; + banner: string | null; +} { + const rankedSkills = [...params.rankedSkills]; + const matched = new Set(params.matched); + const forceSummarySkills = new Set(params.forceSummarySkills); + const reasons: Record = {}; + + if (!params.selection) { + return { + rankedSkills, matched, forceSummarySkills, reasons, + applied: false, appliedOrderedSkills: [], appliedInsertedSkills: [], + banner: null, + }; + } + + const anchorIdx = rankedSkills.indexOf(params.selection.anchorSkill); + if (anchorIdx === -1) { + return { + rankedSkills, matched, forceSummarySkills, reasons, + applied: false, appliedOrderedSkills: [], appliedInsertedSkills: [], + banner: null, + }; + } + + const appliedInsertedSkills: string[] = []; + let insertOffset = 1; + for (const skill of params.selection.insertedSkills) { + if (rankedSkills.includes(skill)) continue; + rankedSkills.splice(anchorIdx + insertOffset, 0, skill); + matched.add(skill); + 
appliedInsertedSkills.push(skill); + if (!params.dedupOff && params.injectedSkills.has(skill)) { + forceSummarySkills.add(skill); + } + reasons[skill] = { + trigger: "verified-playbook", + reasonCode: "scenario-playbook-rulebook", + }; + insertOffset += 1; + } + + const applied = appliedInsertedSkills.length > 0; + return { + rankedSkills, matched, forceSummarySkills, reasons, + applied, + appliedOrderedSkills: applied + ? [params.selection.anchorSkill, ...appliedInsertedSkills] + : [], + appliedInsertedSkills, + banner: applied ? params.selection.banner : null, + }; +} + +// --------------------------------------------------------------------------- +// Playbook exposure role assignment (credit-safe attribution) +// --------------------------------------------------------------------------- + +export type ExposureRole = "candidate" | "context"; + +export interface PlaybookExposureRole { + skill: string; + attributionRole: ExposureRole; + candidateSkill: string | null; +} + +/** + * Deterministically marks the anchor skill as `candidate` and every inserted + * playbook step as `context`, all sharing the anchor as `candidateSkill`. + * + * Only the anchor exposure affects project policy counters; inserted context + * steps remain inspectable in the session exposure ledger but never + * accumulate policy wins. 
+ */ +export function buildPlaybookExposureRoles( + orderedSkills: string[], +): PlaybookExposureRole[] { + const [anchorSkill, ...rest] = orderedSkills.filter(Boolean); + if (!anchorSkill) return []; + return [ + { + skill: anchorSkill, + attributionRole: "candidate", + candidateSkill: anchorSkill, + }, + ...rest.map((skill) => ({ + skill, + attributionRole: "context" as const, + candidateSkill: anchorSkill, + })), + ]; +} + export interface FormatOutputParams { parts: string[]; matched: Set; @@ -1108,13 +1376,13 @@ export interface FormatOutputParams { verificationId?: string; skillMap?: Record; platform?: HookPlatform; - env?: RuntimeEnvUpdates; + env?: Record; } function formatPlatformOutput( platform: HookPlatform, additionalContext?: string, - env?: RuntimeEnvUpdates, + env?: Record, ): string { if (platform === "cursor") { const output: Record = {}; @@ -1325,6 +1593,27 @@ function run(): string { const { matchedEntries, matchReasons, matched } = matchResult; + // Causality store — accumulates explicit causes and edges across all stages + const causality = createDecisionCausality(); + + // Record pattern-match causes + for (const [skill, reason] of Object.entries(matchReasons)) { + addCause(causality, { + code: "pattern-match", + stage: "match", + skill, + synthetic: false, + scoreDelta: 0, + message: `Matched ${reason.matchType} pattern`, + detail: { + matchType: reason.matchType, + pattern: reason.pattern, + toolName, + toolTarget: toolName === "Bash" ? 
redactCommand(toolTarget) : toolTarget, + }, + }); + } + // Stage 3.5: TSX review trigger — check before dedup to inform synthetic injection const tsxReview = checkTsxReviewTrigger(toolName, toolInput, injectedSkills, dedupOff, sessionId, log); @@ -1356,9 +1645,42 @@ function run(): string { likelySkills, compiledSkills, setupMode, + cwd, + sessionId, }, log); - const { newEntries, rankedSkills, profilerBoosted } = dedupResult; + const { newEntries, rankedSkills, profilerBoosted, policyBoosted, rulebookBoosted } = dedupResult; + + // Record policy boost causes + for (const boosted of policyBoosted) { + addCause(causality, { + code: "policy-boost", + stage: "rank", + skill: boosted.skill, + synthetic: false, + scoreDelta: boosted.boost, + message: boosted.reason ?? "Policy boost applied", + detail: { boost: boosted.boost, reason: boosted.reason ?? "" }, + }); + } + + // Record rulebook boost causes + for (const boosted of rulebookBoosted) { + addCause(causality, { + code: "rulebook-boost", + stage: "rank", + skill: boosted.skill, + synthetic: false, + scoreDelta: boosted.ruleBoost, + message: boosted.ruleReason || "Rulebook boost applied", + detail: { + matchedRuleId: boosted.matchedRuleId, + ruleBoost: boosted.ruleBoost, + ruleReason: boosted.ruleReason, + rulebookPath: boosted.rulebookPath, + }, + }); + } // Stage 4.5: Synthetically inject react-best-practices if TSX review triggered let tsxReviewInjected = false; @@ -1504,6 +1826,301 @@ function run(): string { } } + // Stage 4.95: Route-scoped policy recall — inject historically verified winners + // that pattern matching missed. Only fires when an active verification story + // and target boundary exist. Phase 1: max 1 recalled skill. + const policyRecallSynthetic = new Set(); + if (cwd && sessionId) { + const recallPlan = loadCachedPlanResult(sessionId, log); + const recallStory = recallPlan ? 
selectActiveStory(recallPlan) : null; + const recallBoundary = (recallPlan?.primaryNextAction?.targetBoundary as + | "uiRender" + | "clientRequest" + | "serverHandler" + | "environment" + | null) ?? null; + + if (recallStory && recallBoundary) { + const recallScenario = { + hook: "PreToolUse" as RoutingHookName, + storyKind: recallStory.kind ?? null, + targetBoundary: recallBoundary, + toolName: toolName as RoutingToolName, + routeScope: recallStory.route ?? null, + }; + + const policy = loadProjectRoutingPolicy(cwd); + const excludeSkills = new Set([...rankedSkills, ...injectedSkills]); + + const recallDiagnosis = explainPolicyRecall(policy, recallScenario, { + maxCandidates: 1, + excludeSkills, + }); + + log.debug("policy-recall-lookup", { + requestedScenario: + `${recallScenario.hook}|${recallScenario.storyKind ?? "none"}|` + + `${recallScenario.targetBoundary ?? "none"}|${recallScenario.toolName}|` + + `${recallScenario.routeScope ?? "*"}`, + checkedScenarios: recallDiagnosis.checkedScenarios, + selectedBucket: recallDiagnosis.selectedBucket, + selectedSkills: recallDiagnosis.selected.map((candidate) => candidate.skill), + rejected: recallDiagnosis.rejected.map((candidate) => ({ + skill: candidate.skill, + scenario: candidate.scenario, + exposures: candidate.exposures, + successRate: candidate.successRate, + policyBoost: candidate.policyBoost, + excluded: candidate.excluded, + rejectedReason: candidate.rejectedReason, + })), + hintCodes: recallDiagnosis.hints.map((hint) => hint.code), + }); + + for (const candidate of recallDiagnosis.selected) { + if (rankedSkills.includes(candidate.skill)) continue; + const insertIdx = rankedSkills.length > 0 ? 
1 : 0; + rankedSkills.splice(insertIdx, 0, candidate.skill); + matched.add(candidate.skill); + policyRecallSynthetic.add(candidate.skill); + addCause(causality, { + code: "policy-recall", + stage: "rank", + skill: candidate.skill, + synthetic: true, + scoreDelta: 0, + message: `Recalled historically verified skill for ${candidate.scenario}`, + detail: { + scenario: candidate.scenario, + exposures: candidate.exposures, + wins: candidate.wins, + directiveWins: candidate.directiveWins, + successRate: candidate.successRate, + recallScore: candidate.recallScore, + }, + }); + log.debug("policy-recall-injected", { + skill: candidate.skill, + scenario: candidate.scenario, + insertionIndex: insertIdx, + exposures: candidate.exposures, + wins: candidate.wins, + directiveWins: candidate.directiveWins, + successRate: candidate.successRate, + policyBoost: candidate.policyBoost, + recallScore: candidate.recallScore, + }); + } + } else { + log.debug("policy-recall-skipped", { + reason: !recallStory ? "no_active_verification_story" : "no_target_boundary", + }); + } + } + + // Stage 4.96: Verified companion recall — insert learned companion skills + // immediately after their candidate in the ranked list. Separate from + // single-skill policy recall to keep causal credit clean. + const companionRecallReasons: Record = {}; + if (cwd && sessionId) { + const companionPlan = loadCachedPlanResult(sessionId, log); + const companionStory = companionPlan ? selectActiveStory(companionPlan) : null; + const companionBoundary = (companionPlan?.primaryNextAction?.targetBoundary as + | "uiRender" + | "clientRequest" + | "serverHandler" + | "environment" + | null) ?? null; + + if (companionStory && companionBoundary) { + const companionRecall = recallVerifiedCompanions({ + projectRoot: cwd, + scenario: { + hook: "PreToolUse" as RoutingHookName, + storyKind: companionStory.kind ?? 
null, + targetBoundary: companionBoundary, + toolName: toolName as RoutingToolName, + routeScope: companionStory.route ?? null, + }, + candidateSkills: [...rankedSkills], + excludeSkills: new Set([...rankedSkills, ...injectedSkills]), + maxCompanions: 1, + }); + + for (const recall of companionRecall.selected) { + const candidateIdx = rankedSkills.indexOf(recall.candidateSkill); + if (candidateIdx === -1) continue; + rankedSkills.splice(candidateIdx + 1, 0, recall.companionSkill); + matched.add(recall.companionSkill); + + const alreadySeen = !dedupOff && injectedSkills.has(recall.companionSkill); + if (alreadySeen) { + forceSummarySkills.add(recall.companionSkill); + } + + companionRecallReasons[recall.companionSkill] = { + trigger: "verified-companion", + reasonCode: "scenario-companion-rulebook", + }; + + addCause(causality, { + code: "verified-companion", + stage: "rank", + skill: recall.companionSkill, + synthetic: true, + scoreDelta: 0, + message: `Inserted learned companion after ${recall.candidateSkill}`, + detail: { + candidateSkill: recall.candidateSkill, + scenario: recall.scenario, + confidence: recall.confidence, + summaryOnly: alreadySeen, + }, + }); + addEdge(causality, { + fromSkill: recall.candidateSkill, + toSkill: recall.companionSkill, + relation: "companion-of", + code: "verified-companion", + detail: { + scenario: recall.scenario, + confidence: recall.confidence, + }, + }); + + log.debug("companion-recall-injected", { + candidateSkill: recall.candidateSkill, + companionSkill: recall.companionSkill, + scenario: recall.scenario, + lift: recall.confidence, + summaryOnly: alreadySeen, + }); + } + + if (companionRecall.rejected.length > 0) { + log.debug("companion-recall-rejected", { + rejected: companionRecall.rejected, + }); + } + } else { + log.debug("companion-recall-skipped", { + reason: !companionStory ? 
"no_active_verification_story" : "no_target_boundary", + }); + } + } + + // Stage 4.97: Verified playbook recall — insert learned ordered multi-skill + // sequences after the anchor skill. Upgrades injection from isolated winners + // to proven procedural strategies. + const playbookRecallReasons: Record = {}; + let playbookBanner: string | null = null; + const playbookExposureRoles = new Map(); + if (cwd && sessionId) { + const playbookPlan = loadCachedPlanResult(sessionId, log); + const playbookStory = playbookPlan ? selectActiveStory(playbookPlan) : null; + const playbookBoundary = (playbookPlan?.primaryNextAction?.targetBoundary as + | "uiRender" + | "clientRequest" + | "serverHandler" + | "environment" + | null) ?? null; + + if (playbookStory && playbookBoundary) { + const playbookRecall = recallVerifiedPlaybook({ + projectRoot: cwd, + scenario: { + hook: "PreToolUse" as RoutingHookName, + storyKind: playbookStory.kind ?? null, + targetBoundary: playbookBoundary, + toolName: toolName as RoutingToolName, + routeScope: playbookStory.route ?? null, + }, + candidateSkills: [...rankedSkills], + excludeSkills: new Set([...rankedSkills, ...injectedSkills]), + maxInsertedSkills: 2, + }); + + const playbookApply = applyVerifiedPlaybookInsertion({ + rankedSkills, + matched, + injectedSkills, + dedupOff, + forceSummarySkills, + selection: playbookRecall.selected + ? 
{ + anchorSkill: playbookRecall.selected.anchorSkill, + insertedSkills: playbookRecall.selected.insertedSkills, + banner: playbookRecall.banner, + } + : null, + }); + + rankedSkills.length = 0; + rankedSkills.push(...playbookApply.rankedSkills); + matched.clear(); + for (const skill of playbookApply.matched) matched.add(skill); + forceSummarySkills.clear(); + for (const skill of playbookApply.forceSummarySkills) { + forceSummarySkills.add(skill); + } + Object.assign(playbookRecallReasons, playbookApply.reasons); + + if (playbookApply.applied) { + if (playbookApply.banner) { + playbookBanner = playbookApply.banner; + } + for (const role of buildPlaybookExposureRoles(playbookApply.appliedOrderedSkills)) { + playbookExposureRoles.set(role.skill, role); + } + + if (playbookRecall.selected) { + for (const skill of playbookApply.appliedInsertedSkills) { + addCause(causality, { + code: "verified-playbook", + stage: "rank", + skill, + synthetic: true, + scoreDelta: 0, + message: `Inserted verified playbook step after ${playbookRecall.selected.anchorSkill}`, + detail: { + ruleId: playbookRecall.selected.ruleId, + orderedSkills: playbookApply.appliedOrderedSkills, + support: playbookRecall.selected.support, + precision: playbookRecall.selected.precision, + lift: playbookRecall.selected.lift, + }, + }); + addEdge(causality, { + fromSkill: playbookRecall.selected.anchorSkill, + toSkill: skill, + relation: "playbook-step", + code: "verified-playbook", + detail: { + ruleId: playbookRecall.selected.ruleId, + }, + }); + } + log.debug("playbook-recall-injected", { + ruleId: playbookRecall.selected.ruleId, + anchorSkill: playbookRecall.selected.anchorSkill, + insertedSkills: playbookApply.appliedInsertedSkills, + }); + } + } else if (playbookRecall.selected) { + log.debug("playbook-recall-noop", { + ruleId: playbookRecall.selected.ruleId, + anchorSkill: playbookRecall.selected.anchorSkill, + requestedInsertedSkills: playbookRecall.selected.insertedSkills, + reason: 
"no_new_skills_inserted", + }); + } + } else { + log.debug("playbook-recall-skipped", { + reason: !playbookStory ? "no_active_verification_story" : "no_target_boundary", + }); + } + } + let vercelEnvHelpInjected = false; if (vercelEnvHelp.triggered) { let helpClaimed = true; @@ -1534,9 +2151,11 @@ function run(): string { matchedSkills: [...matched], injectedSkills: [], boostsApplied: profilerBoosted, + policyBoosted, }, log.active ? timing : null); - const envUpdates = finalizeRuntimeEnvUpdates(platform, runtimeEnvBefore); - return formatPlatformOutput(platform, undefined, envUpdates); + const earlyEnv = finalizeRuntimeEnvUpdates(platform, runtimeEnvBefore); + const clearingEnv: Record = { ...(earlyEnv ?? {}), ...buildVerificationEnv(null) }; + return formatPlatformOutput(platform, undefined, clearingEnv); } // Stage 5: injectSkills (enforces byte budget + MAX_SKILLS ceiling) @@ -1554,6 +2173,92 @@ function run(): string { }); if (log.active) timing.skill_read = Math.round(log.now() - tSkillRead); + // Record cap/budget drop causes + for (const skill of droppedByCap) { + addCause(causality, { + code: "dropped-cap", + stage: "inject", + skill, + synthetic: false, + scoreDelta: 0, + message: "Dropped because max skill cap was exceeded", + detail: { maxSkills: MAX_SKILLS }, + }); + } + for (const skill of droppedByBudget) { + addCause(causality, { + code: "dropped-budget", + stage: "inject", + skill, + synthetic: false, + scoreDelta: 0, + message: "Dropped because injection budget was exhausted", + detail: { budgetBytes: getInjectionBudget() }, + }); + } + + // Record routing-policy exposures for actually injected skills + // Only record when an active verification story exists to prevent none|none scenario pollution + if (loaded.length > 0 && sessionId) { + const plan = loadCachedPlanResult(sessionId, log); + const story = plan ? 
selectActiveStory(plan) : null; + if (story) { + const targetBoundary = (plan?.primaryNextAction?.targetBoundary as + | "uiRender" + | "clientRequest" + | "serverHandler" + | "environment" + | null) ?? null; + + const attribution = buildAttributionDecision({ + sessionId, + hook: "PreToolUse", + storyId: story.id ?? null, + route: story.route ?? null, + targetBoundary, + loadedSkills: loaded, + preferredSkills: policyRecallSynthetic, + }); + + for (const skill of loaded) { + const playbookRole = playbookExposureRoles.get(skill); + appendSkillExposure({ + id: `${sessionId}:${skill}:${Date.now()}`, + sessionId, + projectRoot: cwd, + storyId: story.id ?? null, + storyKind: story.kind ?? null, + route: story.route ?? null, + hook: "PreToolUse", + toolName: toolName as RoutingToolName, + skill, + targetBoundary, + exposureGroupId: attribution.exposureGroupId, + attributionRole: playbookRole?.attributionRole + ?? (skill === attribution.candidateSkill ? "candidate" : "context"), + candidateSkill: playbookRole?.candidateSkill ?? attribution.candidateSkill, + createdAt: new Date().toISOString(), + resolvedAt: null, + outcome: "pending", + }); + } + log.summary("routing-policy-exposures-recorded", { + hook: "PreToolUse", + skills: loaded, + storyId: story.id, + storyKind: story.kind ?? null, + candidateSkill: attribution.candidateSkill, + exposureGroupId: attribution.exposureGroupId, + }); + } else { + log.debug("routing-policy-exposures-skipped", { + hook: "PreToolUse", + reason: "no active verification story", + skills: loaded, + }); + } + } + // Append review marker if tsx review was triggered and skill was loaded if (tsxReviewInjected && loaded.includes(TSX_REVIEW_SKILL)) { parts.push(REVIEW_MARKER); @@ -1596,9 +2301,11 @@ function run(): string { droppedByCap, droppedByBudget, boostsApplied: profilerBoosted, + policyBoosted, }, log.active ? 
timing : null); - const envUpdates = finalizeRuntimeEnvUpdates(platform, runtimeEnvBefore); - return formatPlatformOutput(platform, undefined, envUpdates); + const earlyEnv2 = finalizeRuntimeEnvUpdates(platform, runtimeEnvBefore); + const clearingEnv2: Record = { ...(earlyEnv2 ?? {}), ...buildVerificationEnv(null) }; + return formatPlatformOutput(platform, undefined, clearingEnv2); } if (log.active) timing.total = log.elapsed(); @@ -1653,6 +2360,19 @@ function run(): string { } } } + // Add policy-recall reasons + for (const skill of policyRecallSynthetic) { + reasons[skill] = { + trigger: "policy-recall", + reasonCode: "route-scoped-verified-policy-recall", + }; + } + for (const [skill, reason] of Object.entries(companionRecallReasons)) { + reasons[skill] = reason; + } + for (const [skill, reason] of Object.entries(playbookRecallReasons)) { + reasons[skill] = reason; + } // Add pattern-match reasons for remaining skills for (const skill of loaded) { if (!reasons[skill] && matchReasons?.[skill]) { @@ -1663,8 +2383,32 @@ function run(): string { } } - // Stage 6: formatOutput - const envUpdates = finalizeRuntimeEnvUpdates(platform, runtimeEnvBefore); + // Stage 6: resolve verification directive and formatOutput + const verificationRuntime = resolveVerificationRuntimeState(sessionId, { + agentBrowserAvailable: process.env.VERCEL_PLUGIN_AGENT_BROWSER_AVAILABLE !== "0", + lastAttemptedAction: process.env.VERCEL_PLUGIN_VERIFICATION_ACTION || null, + }, log); + + if (verificationRuntime.banner) { + parts.unshift(verificationRuntime.banner); + log.summary("pretooluse.verification-banner-injected", { + sessionId, + storyId: verificationRuntime.directive?.storyId ?? null, + route: verificationRuntime.directive?.route ?? null, + source: verificationRuntime.plan ? 
"cache-or-compute" : "none", + }); + } + + if (playbookBanner) { + parts.unshift(playbookBanner); + } + + const runtimeEnv = finalizeRuntimeEnvUpdates(platform, runtimeEnvBefore); + const envUpdates: Record = { + ...(runtimeEnv ?? {}), + ...verificationRuntime.env, + }; + const result = formatOutput({ parts, matched, @@ -1679,7 +2423,7 @@ function run(): string { verificationId, skillMap: skills.skillMap, platform, - env: envUpdates, + env: Object.keys(envUpdates).length > 0 ? envUpdates : undefined, }); if (loaded.length > 0) { @@ -1711,6 +2455,206 @@ function run(): string { } } + // Stage 7: Emit routing decision trace (v2) + { + const tracePlan = sessionId ? loadCachedPlanResult(sessionId, log) : null; + const traceStory = tracePlan ? selectActiveStory(tracePlan) : null; + const traceTimestamp = new Date().toISOString(); + const traceToolTarget = toolName === "Bash" ? redactCommand(toolTarget) : toolTarget; + const decisionId = createDecisionId({ + hook: "PreToolUse", + sessionId, + toolName, + toolTarget: traceToolTarget, + timestamp: traceTimestamp, + }); + + // Build synthetic skill set for accurate trace marking + const syntheticSkills = new Set(); + if (tsxReviewInjected && tsxReview.triggered) syntheticSkills.add(TSX_REVIEW_SKILL); + if (devServerVerifyInjected && devServerVerify.triggered) syntheticSkills.add(DEV_SERVER_VERIFY_SKILL); + if (devServerVerify.triggered && !devServerVerify.unavailable) { + for (const companion of DEV_SERVER_COMPANION_SKILLS) { + if (rankedSkills.includes(companion) && !newEntries.some((e) => e.skill === companion)) { + syntheticSkills.add(companion); + } + } + } + if (devServerVerify.loopGuardHit && !devServerVerify.unavailable) { + for (const companion of DEV_SERVER_COMPANION_SKILLS) { + if (rankedSkills.includes(companion)) syntheticSkills.add(companion); + } + } + if (aiSdkCompanionInjected) { + for (const companion of AI_SDK_COMPANION_SKILLS) { + if (rankedSkills.includes(companion) && !newEntries.some((e) => e.skill === 
companion)) { + syntheticSkills.add(companion); + } + } + } + for (const skill of policyRecallSynthetic) { + syntheticSkills.add(skill); + } + for (const skill of Object.keys(companionRecallReasons)) { + syntheticSkills.add(skill); + } + + // Build ranked entries: pattern-matched entries + synthetic injections + deduped candidates + const traceRanked: Array<{ + skill: string; + basePriority: number; + effectivePriority: number; + pattern: { type: string; value: string } | null; + profilerBoost: number; + policyBoost: number; + policyReason: string | null; + matchedRuleId: string | null; + ruleBoost: number; + ruleReason: string | null; + rulebookPath: string | null; + summaryOnly: boolean; + synthetic: boolean; + droppedReason: "deduped" | "cap_exceeded" | "budget_exhausted" | "concurrent_claim" | null; + }> = []; + const trackedSkills = new Set(); + + // 1. Pattern-matched entries (from newEntries, post-dedup) + for (const entry of newEntries) { + const match = matchReasons?.[entry.skill]; + const policy = policyBoosted.find((p) => p.skill === entry.skill); + const rb = rulebookBoosted.find((r) => r.skill === entry.skill); + const companionReason = companionRecallReasons[entry.skill]; + trackedSkills.add(entry.skill); + traceRanked.push({ + skill: entry.skill, + basePriority: entry.priority, + effectivePriority: typeof entry.effectivePriority === "number" + ? entry.effectivePriority + : entry.priority, + pattern: companionReason + ? { type: companionReason.trigger, value: companionReason.reasonCode } + : match + ? { type: match.matchType, value: match.pattern } + : null, + profilerBoost: profilerBoosted.includes(entry.skill) ? 5 : 0, + policyBoost: policy?.boost ?? 0, + policyReason: policy?.reason ?? null, + matchedRuleId: rb?.matchedRuleId ?? null, + ruleBoost: rb?.ruleBoost ?? 0, + ruleReason: rb?.ruleReason ?? null, + rulebookPath: rb?.rulebookPath ?? 
null, + summaryOnly: summaryOnly.includes(entry.skill), + synthetic: syntheticSkills.has(entry.skill), + droppedReason: droppedByCap.includes(entry.skill) + ? "cap_exceeded" + : droppedByBudget.includes(entry.skill) + ? "budget_exhausted" + : null, + }); + } + + // 2. Synthetic injections not already in newEntries + for (const skill of syntheticSkills) { + if (trackedSkills.has(skill)) continue; + trackedSkills.add(skill); + const reason = reasons[skill]; + traceRanked.push({ + skill, + basePriority: 0, + effectivePriority: 0, + pattern: reason ? { type: reason.trigger, value: reason.reasonCode } : null, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + matchedRuleId: null, + ruleBoost: 0, + ruleReason: null, + rulebookPath: null, + summaryOnly: summaryOnly.includes(skill), + synthetic: true, + droppedReason: droppedByCap.includes(skill) + ? "cap_exceeded" + : droppedByBudget.includes(skill) + ? "budget_exhausted" + : null, + }); + } + + // 3. Deduped candidates (matched but filtered by seen-skills) + for (const entry of matchedEntries) { + if (trackedSkills.has(entry.skill)) continue; + if (!injectedSkills.has(entry.skill)) continue; // only mark actually-deduped ones + trackedSkills.add(entry.skill); + const match = matchReasons?.[entry.skill]; + traceRanked.push({ + skill: entry.skill, + basePriority: entry.priority, + effectivePriority: typeof entry.effectivePriority === "number" + ? entry.effectivePriority + : entry.priority, + pattern: match ? { type: match.matchType, value: match.pattern } : null, + profilerBoost: profilerBoosted.includes(entry.skill) ? 
5 : 0, + policyBoost: 0, + policyReason: null, + matchedRuleId: null, + ruleBoost: 0, + ruleReason: null, + rulebookPath: null, + summaryOnly: false, + synthetic: false, + droppedReason: "deduped", + }); + } + + appendRoutingDecisionTrace({ + version: 2, + decisionId, + sessionId, + hook: "PreToolUse", + toolName, + toolTarget: traceToolTarget, + timestamp: traceTimestamp, + primaryStory: { + id: traceStory?.id ?? null, + kind: traceStory?.kind ?? null, + storyRoute: traceStory?.route ?? null, + targetBoundary: tracePlan?.primaryNextAction?.targetBoundary ?? null, + }, + observedRoute: null, // PreToolUse fires before execution; no observed route yet + policyScenario: traceStory + ? `PreToolUse|${traceStory.kind ?? "none"}|${tracePlan?.primaryNextAction?.targetBoundary ?? "none"}|${toolName}` + : null, + matchedSkills: [...matched], + injectedSkills: loaded, + skippedReasons: [ + ...(traceStory ? [] : ["no_active_verification_story"]), + ...droppedByCap.map((skill) => `cap_exceeded:${skill}`), + ...droppedByBudget.map((skill) => `budget_exhausted:${skill}`), + ], + ranked: traceRanked, + verification: verificationId + ? { verificationId, observedBoundary: null, matchedSuggestedAction: null } + : null, + causes: causality.causes, + edges: causality.edges, + }); + log.summary("routing.decision_trace_written", { + decisionId, + hook: "PreToolUse", + toolName, + matchedSkills: [...matched], + injectedSkills: loaded, + }); + log.summary("routing.decision_causality", { + decisionId, + hook: "PreToolUse", + causeCount: causality.causes.length, + edgeCount: causality.edges.length, + causes: causality.causes, + edges: causality.edges, + }); + } + return result; } diff --git a/hooks/src/prompt-policy-recall.mts b/hooks/src/prompt-policy-recall.mts new file mode 100644 index 0000000..7a0a8bb --- /dev/null +++ b/hooks/src/prompt-policy-recall.mts @@ -0,0 +1,123 @@ +/** + * Pure verified prompt-policy recall helper. 
+ * + * When an active verification story exists, recalls the highest-confidence + * historically winning skill for that exact storyKind + targetBoundary + + * routeScope — even when prompt signals miss it entirely. + * + * Pure: does not mutate caller-provided arrays or iterables. + */ + +import { + explainPolicyRecall, + type PolicyRecallDiagnosis, +} from "./routing-diagnosis.mjs"; +import type { + RoutingBoundary, + RoutingPolicyFile, +} from "./routing-policy.mjs"; + +// --------------------------------------------------------------------------- +// Types +// --------------------------------------------------------------------------- + +export interface PromptPolicyRecallBinding { + storyId: string | null; + storyKind: string | null; + route: string | null; + targetBoundary: RoutingBoundary | null; +} + +export interface PromptPolicyRecallResult { + selectedSkills: string[]; + matchedSkills: string[]; + syntheticSkills: string[]; + reasons: Record; + diagnosis: PolicyRecallDiagnosis | null; +} + +// --------------------------------------------------------------------------- +// Core +// --------------------------------------------------------------------------- + +export function applyPromptPolicyRecall(params: { + selectedSkills: string[]; + matchedSkills: string[]; + seenSkills?: Iterable; + maxSkills: number; + binding: PromptPolicyRecallBinding; + policy: RoutingPolicyFile; +}): PromptPolicyRecallResult { + const seenSkills = new Set(params.seenSkills ?? 
[]); + const selectedSkills = [...params.selectedSkills]; + const matchedSkills = [...params.matchedSkills]; + const syntheticSkills: string[] = []; + const reasons: Record = {}; + + if (!params.binding.storyId || !params.binding.targetBoundary) { + return { + selectedSkills, + matchedSkills, + syntheticSkills, + reasons, + diagnosis: null, + }; + } + + const availableSlots = Math.max(0, params.maxSkills - selectedSkills.length); + if (availableSlots === 0) { + return { + selectedSkills, + matchedSkills, + syntheticSkills, + reasons, + diagnosis: null, + }; + } + + const excludeSkills = new Set([ + ...selectedSkills, + ...seenSkills, + ]); + + const diagnosis = explainPolicyRecall( + params.policy, + { + hook: "UserPromptSubmit", + storyKind: params.binding.storyKind, + targetBoundary: params.binding.targetBoundary, + toolName: "Prompt", + routeScope: params.binding.route ?? null, + }, + { + maxCandidates: availableSlots, + excludeSkills, + }, + ); + + const baseInsertIdx = selectedSkills.length > 0 ? 
1 : 0; + let insertedCount = 0; + + for (const candidate of diagnosis.selected) { + if (selectedSkills.includes(candidate.skill)) continue; + + const insertIdx = baseInsertIdx + insertedCount; + selectedSkills.splice(insertIdx, 0, candidate.skill); + insertedCount += 1; + + if (!matchedSkills.includes(candidate.skill)) { + matchedSkills.push(candidate.skill); + } + syntheticSkills.push(candidate.skill); + reasons[candidate.skill] = + `route-scoped verified policy recall (${candidate.wins}/${candidate.exposures} wins, success=${candidate.successRate})`; + } + + return { + selectedSkills, + matchedSkills, + syntheticSkills, + reasons, + diagnosis, + }; +} diff --git a/hooks/src/prompt-verification-binding.mts b/hooks/src/prompt-verification-binding.mts new file mode 100644 index 0000000..5d14359 --- /dev/null +++ b/hooks/src/prompt-verification-binding.mts @@ -0,0 +1,61 @@ +/** + * Prompt Verification Binding + * + * Deterministically binds prompt-time routing decisions to the active + * verification plan's primaryNextAction.targetBoundary. This closes the + * loop so prompt exposures become resolvable training data — without it, + * prompt exposures record targetBoundary: null and fall through to + * stale-miss at session end. + * + * Rule: no prompt exposure append and no prompt policy boost unless + * targetBoundary is non-null. + */ + +import type { RoutingBoundary } from "./routing-policy.mjs"; +import { + selectActiveStory, + type VerificationPlanResult, +} from "./verification-plan.mjs"; + +export interface PromptVerificationBinding { + targetBoundary: RoutingBoundary | null; + storyId: string | null; + storyKind: string | null; + route: string | null; + source: "active-plan" | "none"; + confidence: number; + reason: string; +} + +export function resolvePromptVerificationBinding(input: { + plan: VerificationPlanResult | null; +}): PromptVerificationBinding { + const story = input.plan ? 
selectActiveStory(input.plan) : null; + const targetBoundary = + (input.plan?.primaryNextAction?.targetBoundary as RoutingBoundary | null) ?? + null; + + if (story && targetBoundary) { + return { + targetBoundary, + storyId: story.id ?? null, + storyKind: story.kind ?? null, + route: story.route ?? null, + source: "active-plan", + confidence: 1, + reason: `active verification plan predicted ${targetBoundary}`, + }; + } + + return { + targetBoundary: null, + storyId: story?.id ?? null, + storyKind: story?.kind ?? null, + route: story?.route ?? null, + source: "none", + confidence: 0, + reason: story + ? "active verification story exists but no primary next boundary is available" + : "no active verification story", + }; +} diff --git a/hooks/src/routing-attribution.mts b/hooks/src/routing-attribution.mts new file mode 100644 index 0000000..60b48e0 --- /dev/null +++ b/hooks/src/routing-attribution.mts @@ -0,0 +1,101 @@ +/** + * Routing Attribution: causal credit assignment for co-injected skills. + * + * When multiple skills are injected in a single batch, one is designated the + * "candidate" (the skill that causally drove the injection) and the rest are + * "context" (helpers along for the ride). Only the candidate's outcomes update + * long-term project routing policy — context exposures are still fully logged + * for replay and operator inspection, but they do not move policy stats. + * + * Selection heuristic (v1): prefer skills that appear in policyRecallSynthetic + * (i.e. skills the policy system explicitly chose to re-inject). If none match, + * fall back to the first skill in the ranked load order (highest priority). 
+ */ + +import { createLogger } from "./logger.mjs"; + +// --------------------------------------------------------------------------- +// Types +// --------------------------------------------------------------------------- + +export type ExposureAttributionRole = "candidate" | "context"; + +export interface AttributionDecision { + exposureGroupId: string; + candidateSkill: string | null; + loadedSkills: string[]; +} + +// --------------------------------------------------------------------------- +// Candidate selection +// --------------------------------------------------------------------------- + +/** + * Choose which skill in a batch owns the policy credit. + * + * Prefers skills in `preferredSkills` (policy-recall synthetic injections). + * Falls back to the first loaded skill (highest-ranked by priority). + * Returns null only when the batch is empty. + */ +export function chooseAttributedSkill( + loadedSkills: string[], + preferredSkills: Iterable = [], +): string | null { + const preferred = new Set(preferredSkills); + for (const skill of loadedSkills) { + if (preferred.has(skill)) return skill; + } + return loadedSkills[0] ?? null; +} + +// --------------------------------------------------------------------------- +// Attribution decision builder +// --------------------------------------------------------------------------- + +export function buildAttributionDecision(input: { + sessionId: string; + hook: "PreToolUse" | "UserPromptSubmit"; + storyId: string | null; + route: string | null; + targetBoundary: + | "uiRender" + | "clientRequest" + | "serverHandler" + | "environment" + | null; + loadedSkills: string[]; + preferredSkills?: Iterable; + now?: string; +}): AttributionDecision { + const log = createLogger(); + const timestamp = input.now ?? 
new Date().toISOString(); + + const candidateSkill = chooseAttributedSkill( + input.loadedSkills, + input.preferredSkills, + ); + + const decision: AttributionDecision = { + exposureGroupId: [ + input.sessionId, + input.hook, + input.storyId ?? "none", + input.route ?? "*", + input.targetBoundary ?? "none", + timestamp, + ].join(":"), + candidateSkill, + loadedSkills: [...input.loadedSkills], + }; + + log.summary("routing-attribution.decision", { + exposureGroupId: decision.exposureGroupId, + candidateSkill: decision.candidateSkill, + loadedSkills: decision.loadedSkills, + hook: input.hook, + storyId: input.storyId, + route: input.route, + }); + + return decision; +} diff --git a/hooks/src/routing-decision-capsule.mts b/hooks/src/routing-decision-capsule.mts new file mode 100644 index 0000000..d0d9463 --- /dev/null +++ b/hooks/src/routing-decision-capsule.mts @@ -0,0 +1,299 @@ +import { mkdirSync, readFileSync, writeFileSync } from "node:fs"; +import { createHash } from "node:crypto"; +import { join } from "node:path"; +import { tmpdir } from "node:os"; +import type { + DecisionHook, + RankedSkillTrace, + RoutingDecisionTrace, +} from "./routing-decision-trace.mjs"; +import type { VerificationDirective } from "./verification-directive.mjs"; +import { createLogger, logCaughtError, type Logger } from "./logger.mjs"; + +const SAFE_SESSION_ID_RE = /^[a-zA-Z0-9_-]+$/; + +function safeSessionSegment(sessionId: string | null): string { + if (!sessionId) return "no-session"; + if (SAFE_SESSION_ID_RE.test(sessionId)) return sessionId; + return createHash("sha256").update(sessionId).digest("hex"); +} + +export type DecisionCapsulePlatform = "claude-code" | "cursor" | "unknown"; + +export interface DecisionCapsuleReason { + trigger: string; + reasonCode: string; +} + +export interface DecisionCapsuleAttribution { + exposureGroupId: string | null; + candidateSkill: string | null; + loadedSkills: string[]; +} + +export interface DecisionCapsuleRulebookProvenance { + /** The 
rule ID that matched, e.g. "PreToolUse|flow-verification|uiRender|Bash|agent-browser-verify" */ + matchedRuleId: string; + /** Boost applied by the matched rule */ + ruleBoost: number; + /** Human-readable reason from the rule */ + ruleReason: string; + /** Absolute path to the rulebook JSON file on disk */ + rulebookPath: string; +} + +export interface DecisionCapsuleIssue { + code: string; + severity: "info" | "warning" | "error"; + message: string; + action?: string; +} + +export interface DecisionCapsuleV1 { + type: "routing.decision-capsule/v1"; + version: 1; + decisionId: string; + sessionId: string | null; + hook: DecisionHook; + createdAt: string; + input: { + toolName: string; + toolTarget: string; + platform: DecisionCapsulePlatform; + }; + activeStory: { + id: string | null; + kind: string | null; + route: string | null; + targetBoundary: string | null; + }; + directive: VerificationDirective | null; + matchedSkills: string[]; + injectedSkills: string[]; + ranked: RankedSkillTrace[]; + attribution: DecisionCapsuleAttribution | null; + rulebookProvenance: DecisionCapsuleRulebookProvenance | null; + verification: RoutingDecisionTrace["verification"]; + reasons: Record; + skippedReasons: string[]; + env: Record; + issues: DecisionCapsuleIssue[]; + sha256: string; +} + +export function decisionCapsuleDir(sessionId: string | null): string { + return join( + tmpdir(), + `vercel-plugin-${safeSessionSegment(sessionId)}-capsules`, + ); +} + +export function decisionCapsulePath( + sessionId: string | null, + decisionId: string, +): string { + return join(decisionCapsuleDir(sessionId), `${decisionId}.json`); +} + +function stableSha256(value: Omit): string { + return createHash("sha256").update(JSON.stringify(value)).digest("hex"); +} + +function deriveIssues(input: { + hook: DecisionHook; + directive: VerificationDirective | null; + trace: RoutingDecisionTrace; +}): DecisionCapsuleIssue[] { + const issues: DecisionCapsuleIssue[] = []; + + if 
(!input.trace.primaryStory.id) { + issues.push({ + code: "no_active_verification_story", + severity: "warning", + message: + "No active verification story was available for this decision.", + action: + "Create or record a verification story before expecting policy learning or directed verification.", + }); + } + + if (!input.directive?.primaryNextAction) { + issues.push({ + code: "env_cleared", + severity: "info", + message: "Verification env keys were cleared for this decision.", + action: + "Expected when no next action exists; unexpected if a flow is mid-debug.", + }); + } + + if (input.directive?.blockedReasons?.length) { + issues.push({ + code: "verification_blocked", + severity: "warning", + message: input.directive.blockedReasons[0]!, + action: + "Resolve the blocking condition before relying on automated verification.", + }); + } + + if ( + input.trace.skippedReasons.some((reason) => + reason.startsWith("budget_exhausted:"), + ) + ) { + issues.push({ + code: "budget_exhausted", + severity: "warning", + message: + "At least one ranked skill was dropped because the injection budget was exhausted.", + action: + "Inspect the ranked list in this capsule to see which skills were trimmed.", + }); + } + + if (input.hook !== "PostToolUse") { + issues.push({ + code: "machine_output_hidden_in_html_comment", + severity: "info", + message: + "Some hook metadata still travels through additionalContext comments due to hook schema limits.", + action: + "Use VERCEL_PLUGIN_DECISION_PATH instead of scraping hook output.", + }); + } + + return issues; +} + +/** + * Extract the first rulebook-matched entry from ranked traces. + * Returns null when no rule fired for any ranked skill.
+ */ +function deriveRulebookProvenance( + trace: RoutingDecisionTrace, +): DecisionCapsuleRulebookProvenance | null { + for (const entry of trace.ranked) { + if (entry.matchedRuleId && entry.rulebookPath) { + return { + matchedRuleId: entry.matchedRuleId, + ruleBoost: entry.ruleBoost, + ruleReason: entry.ruleReason ?? "", + rulebookPath: entry.rulebookPath, + }; + } + } + return null; +} + +export function buildDecisionCapsule(input: { + sessionId: string | null; + hook: DecisionHook; + createdAt: string; + toolName: string; + toolTarget: string; + platform?: string | null; + trace: RoutingDecisionTrace; + directive: VerificationDirective | null; + attribution?: DecisionCapsuleAttribution | null; + reasons?: Record; + env?: Record; +}): DecisionCapsuleV1 { + const platform: DecisionCapsulePlatform = + input.platform === "cursor" || input.platform === "claude-code" + ? input.platform + : "unknown"; + + const base: Omit = { + type: "routing.decision-capsule/v1", + version: 1, + decisionId: input.trace.decisionId, + sessionId: input.sessionId, + hook: input.hook, + createdAt: input.createdAt, + input: { + toolName: input.toolName, + toolTarget: input.toolTarget, + platform, + }, + activeStory: { + id: input.trace.primaryStory.id, + kind: input.trace.primaryStory.kind, + route: input.trace.primaryStory.storyRoute, + targetBoundary: input.trace.primaryStory.targetBoundary, + }, + directive: input.directive, + matchedSkills: [...input.trace.matchedSkills], + injectedSkills: [...input.trace.injectedSkills], + ranked: [...input.trace.ranked], + attribution: input.attribution ?? null, + rulebookProvenance: deriveRulebookProvenance(input.trace), + verification: input.trace.verification, + reasons: { ...(input.reasons ?? {}) }, + skippedReasons: [...input.trace.skippedReasons], + env: { ...(input.env ?? 
{}) }, + issues: deriveIssues({ + hook: input.hook, + directive: input.directive, + trace: input.trace, + }), + }; + + return { ...base, sha256: stableSha256(base) }; +} + +export function persistDecisionCapsule( + capsule: DecisionCapsuleV1, + logger?: Logger, +): string { + const log = logger ?? createLogger(); + const path = decisionCapsulePath(capsule.sessionId, capsule.decisionId); + + try { + mkdirSync(decisionCapsuleDir(capsule.sessionId), { recursive: true }); + writeFileSync(path, JSON.stringify(capsule, null, 2) + "\n", "utf-8"); + log.summary("routing.decision_capsule_written", { + decisionId: capsule.decisionId, + sessionId: capsule.sessionId, + hook: capsule.hook, + path, + sha256: capsule.sha256, + }); + } catch (error) { + logCaughtError(log, "routing.decision_capsule_write_failed", error, { + decisionId: capsule.decisionId, + sessionId: capsule.sessionId, + path, + }); + } + + return path; +} + +export function buildDecisionCapsuleEnv( + capsule: DecisionCapsuleV1, + artifactPath: string, +): Record<string, string> { + return { + VERCEL_PLUGIN_DECISION_ID: capsule.decisionId, + VERCEL_PLUGIN_DECISION_PATH: artifactPath, + VERCEL_PLUGIN_DECISION_SHA256: capsule.sha256, + }; +} + +export function readDecisionCapsule( + artifactPath: string, + logger?: Logger, +): DecisionCapsuleV1 | null { + const log = logger ??
createLogger(); + try { + return JSON.parse( + readFileSync(artifactPath, "utf-8"), + ) as DecisionCapsuleV1; + } catch (error) { + logCaughtError(log, "routing.decision_capsule_read_failed", error, { + artifactPath, + }); + return null; + } +} diff --git a/hooks/src/routing-decision-causality.mts b/hooks/src/routing-decision-causality.mts new file mode 100644 index 0000000..dd5e3a5 --- /dev/null +++ b/hooks/src/routing-decision-causality.mts @@ -0,0 +1,113 @@ +/** + * Routing Decision Causality: first-class, machine-readable decision capsule + * that records explicit causes[] and edges[] for every injected, boosted, + * recalled, companion-linked, or dropped skill. + * + * Deterministic: detail objects are key-sorted on insertion so JSON.stringify + * output is stable regardless of insertion order. + */ + +// --------------------------------------------------------------------------- +// Types +// --------------------------------------------------------------------------- + +export type RoutingDecisionCauseStage = "match" | "rank" | "inject" | "observe"; + +export type RoutingDecisionEdgeRelation = + | "companion-of" + | "recalled-after" + | "boosted-by-policy" + | "boosted-by-rulebook"; + +export interface RoutingDecisionCause { + code: string; + stage: RoutingDecisionCauseStage; + skill: string; + synthetic: boolean; + scoreDelta: number; + message: string; + detail: Record<string, unknown>; +} + +export interface RoutingDecisionEdge { + fromSkill: string; + toSkill: string; + relation: RoutingDecisionEdgeRelation | string; + code: string; + detail: Record<string, unknown>; +} + +export interface RoutingDecisionCausality { + causes: RoutingDecisionCause[]; + edges: RoutingDecisionEdge[]; +} + +// --------------------------------------------------------------------------- +// Deterministic sorting helpers +// --------------------------------------------------------------------------- + +function sortUnknown(value: unknown): unknown { + if (Array.isArray(value)) { + return value.map(sortUnknown);
+ } + if (!value || typeof value !== "object") { + return value; + } + const input = value as Record<string, unknown>; + const output: Record<string, unknown> = {}; + for (const key of Object.keys(input).sort()) { + output[key] = sortUnknown(input[key]); + } + return output; +} + +function causeKey(cause: RoutingDecisionCause): string { + return [cause.skill, cause.stage, cause.code, cause.message].join("\0"); +} + +function edgeKey(edge: RoutingDecisionEdge): string { + return [edge.fromSkill, edge.toSkill, String(edge.relation), edge.code].join( + "\0", + ); +} + +// --------------------------------------------------------------------------- +// Public API +// --------------------------------------------------------------------------- + +export function createDecisionCausality(): RoutingDecisionCausality { + return { causes: [], edges: [] }; +} + +export function addCause( + store: RoutingDecisionCausality, + cause: RoutingDecisionCause, +): void { + store.causes.push({ + ...cause, + detail: sortUnknown(cause.detail) as Record<string, unknown>, + }); + store.causes.sort((left, right) => + causeKey(left).localeCompare(causeKey(right)), + ); +} + +export function addEdge( + store: RoutingDecisionCausality, + edge: RoutingDecisionEdge, +): void { + store.edges.push({ + ...edge, + detail: sortUnknown(edge.detail) as Record<string, unknown>, + }); + store.edges.sort((left, right) => + edgeKey(left).localeCompare(edgeKey(right)), + ); +} + +export function causesForSkill( + store: RoutingDecisionCausality, + skill: string, +): RoutingDecisionCause[] { + return store.causes.filter((cause) => cause.skill === skill); +} diff --git a/hooks/src/routing-decision-trace.mts b/hooks/src/routing-decision-trace.mts new file mode 100644 index 0000000..da60a24 --- /dev/null +++ b/hooks/src/routing-decision-trace.mts @@ -0,0 +1,245 @@ +/** + * Routing Decision Flight Recorder: append-only JSONL trace of every routing + * decision (skill injection, prompt scoring, verification closure).
+ * + * Persistence contract: + * - Trace dir: `/vercel-plugin--trace/` + * - Trace file: `/routing-decision-trace.jsonl` + * + * Each routing event appends one JSON object per line. Reads return all traces + * in append order. Missing files return `[]` without throwing. + * + * v2 — separates storyRoute from observedRoute, marks synthetic injections, + * and encodes explicit drop reasons for all non-selected candidates. + * Backward-compatible: v1 lines are normalized to v2 on read. + */ + +import { + appendFileSync, + mkdirSync, + readFileSync, +} from "node:fs"; +import { join } from "node:path"; +import { tmpdir } from "node:os"; +import { createHash } from "node:crypto"; +import type { + RoutingDecisionCause, + RoutingDecisionEdge, +} from "./routing-decision-causality.mjs"; + +// --------------------------------------------------------------------------- +// Safe session-id segment (mirrors routing-policy-ledger.mts) +// --------------------------------------------------------------------------- + +const SAFE_SESSION_ID_RE = /^[a-zA-Z0-9_-]+$/; + +function safeSessionSegment(sessionId: string | null): string { + if (!sessionId) return "no-session"; + if (SAFE_SESSION_ID_RE.test(sessionId)) return sessionId; + return createHash("sha256").update(sessionId).digest("hex"); +} + +// --------------------------------------------------------------------------- +// Types +// --------------------------------------------------------------------------- + +export type DecisionHook = "PreToolUse" | "UserPromptSubmit" | "PostToolUse"; + +export interface RankedSkillTrace { + skill: string; + basePriority: number; + effectivePriority: number; + pattern: { type: string; value: string } | null; + profilerBoost: number; + policyBoost: number; + policyReason: string | null; + /** Matched learned-rulebook rule ID, or null when no rule applies. */ + matchedRuleId: string | null; + /** Boost applied from a learned-rulebook rule (0 when no rule matches). 
*/ + ruleBoost: number; + /** Human-readable reason from the matched rulebook rule. */ + ruleReason: string | null; + /** Path to the rulebook file that provided the matched rule. */ + rulebookPath: string | null; + summaryOnly: boolean; + synthetic: boolean; + droppedReason: + | "deduped" + | "cap_exceeded" + | "budget_exhausted" + | "concurrent_claim" + | null; +} + +export interface RoutingDecisionTrace { + version: 2; + decisionId: string; + sessionId: string | null; + hook: DecisionHook; + toolName: string; + toolTarget: string; + timestamp: string; + primaryStory: { + id: string | null; + kind: string | null; + storyRoute: string | null; + targetBoundary: string | null; + }; + observedRoute: string | null; + policyScenario: string | null; + matchedSkills: string[]; + injectedSkills: string[]; + skippedReasons: string[]; + ranked: RankedSkillTrace[]; + verification: { + verificationId: string | null; + observedBoundary: string | null; + matchedSuggestedAction: boolean | null; + } | null; + /** Explicit causal reasons for each routing action (pattern match, boost, recall, drop). */ + causes: RoutingDecisionCause[]; + /** Explicit relationships between skills (companion-of, recalled-after, etc.). */ + edges: RoutingDecisionEdge[]; +} + +// Re-export causality types for downstream consumers +export type { RoutingDecisionCause, RoutingDecisionEdge } from "./routing-decision-causality.mjs"; + +/** + * V1 trace shape for backward-compatible reads. V1 stored `route` inside + * primaryStory and had no top-level `observedRoute`. 
+ */ +interface RoutingDecisionTraceV1 { + version: 1; + decisionId: string; + sessionId: string | null; + hook: DecisionHook; + toolName: string; + toolTarget: string; + timestamp: string; + primaryStory: { + id: string | null; + kind: string | null; + route: string | null; + targetBoundary: string | null; + }; + policyScenario: string | null; + matchedSkills: string[]; + injectedSkills: string[]; + skippedReasons: string[]; + ranked: RankedSkillTrace[]; + verification: { + verificationId: string | null; + observedBoundary: string | null; + matchedSuggestedAction: boolean | null; + } | null; +} + +type PersistedTrace = RoutingDecisionTrace | RoutingDecisionTraceV1; + +// --------------------------------------------------------------------------- +// V1 → V2 normalization +// --------------------------------------------------------------------------- + +function normalizeTrace(raw: PersistedTrace): RoutingDecisionTrace { + if (raw.version === 2) { + // Backfill causes/edges for v2 traces written before the causality feature + const trace = raw as RoutingDecisionTrace; + return { + ...trace, + causes: trace.causes ?? [], + edges: trace.edges ?? 
[], + }; + } + + // V1 → V2: move primaryStory.route to storyRoute, add observedRoute + const v1 = raw as RoutingDecisionTraceV1; + return { + ...v1, + version: 2, + primaryStory: { + id: v1.primaryStory.id, + kind: v1.primaryStory.kind, + storyRoute: v1.primaryStory.route, + targetBoundary: v1.primaryStory.targetBoundary, + }, + observedRoute: v1.primaryStory.route, // best-effort: v1 conflated the two + causes: [], + edges: [], + }; +} + +// --------------------------------------------------------------------------- +// Path helpers (exported for testing) +// --------------------------------------------------------------------------- + +export function traceDir(sessionId: string | null): string { + return join( + tmpdir(), + `vercel-plugin-${safeSessionSegment(sessionId)}-trace`, + ); +} + +export function tracePath(sessionId: string | null): string { + return join(traceDir(sessionId), "routing-decision-trace.jsonl"); +} + +// --------------------------------------------------------------------------- +// Decision ID — deterministic for identical causal inputs +// --------------------------------------------------------------------------- + +export function createDecisionId(input: { + hook: DecisionHook; + sessionId: string | null; + toolName: string; + toolTarget: string; + timestamp?: string; +}): string { + const timestamp = input.timestamp ?? new Date().toISOString(); + return createHash("sha256") + .update( + [ + input.hook, + input.sessionId ?? 
"", + input.toolName, + input.toolTarget, + timestamp, + ].join("|"), + ) + .digest("hex") + .slice(0, 16); +} + +// --------------------------------------------------------------------------- +// Append (write) — one JSONL line per decision +// --------------------------------------------------------------------------- + +export function appendRoutingDecisionTrace( + trace: RoutingDecisionTrace, +): void { + mkdirSync(traceDir(trace.sessionId), { recursive: true }); + appendFileSync( + tracePath(trace.sessionId), + JSON.stringify(trace) + "\n", + "utf8", + ); +} + +// --------------------------------------------------------------------------- +// Read — returns all traces in append order, [] on missing file +// Normalizes v1 lines to v2 for backward compatibility. +// --------------------------------------------------------------------------- + +export function readRoutingDecisionTrace( + sessionId: string | null, +): RoutingDecisionTrace[] { + try { + const content = readFileSync(tracePath(sessionId), "utf8"); + return content + .split("\n") + .filter((line) => line.trim() !== "") + .map((line) => normalizeTrace(JSON.parse(line) as PersistedTrace)); + } catch { + return []; + } +} diff --git a/hooks/src/routing-diagnosis.mts b/hooks/src/routing-diagnosis.mts new file mode 100644 index 0000000..22ba990 --- /dev/null +++ b/hooks/src/routing-diagnosis.mts @@ -0,0 +1,474 @@ +/** + * Pure route-recall diagnosis engine with deterministic why-not output. + * + * All functions are pure — no filesystem access, no side effects. + * Designed for hooks, CLI, tests, and downstream agent consumers. 
+ */ + +import { + computePolicySuccessRate, + derivePolicyBoost, + scenarioKeyCandidates, + type RoutingBoundary, + type RoutingHookName, + type RoutingPolicyFile, + type RoutingPolicyScenario, + type RoutingPolicyStats, + type RoutingToolName, +} from "./routing-policy.mjs"; +import { selectPolicyRecallCandidates } from "./policy-recall.mjs"; + +const POLICY_RECALL_MIN_EXPOSURES = 3; +const POLICY_RECALL_MIN_SUCCESS_RATE = 0.65; +const POLICY_RECALL_MIN_BOOST = 2; + +const HOOK_NAMES: RoutingHookName[] = ["PreToolUse", "UserPromptSubmit"]; +const TOOL_NAMES: RoutingToolName[] = ["Read", "Edit", "Write", "Bash", "Prompt"]; +const BOUNDARIES: RoutingBoundary[] = [ + "uiRender", + "clientRequest", + "serverHandler", + "environment", +]; + +// --------------------------------------------------------------------------- +// Types +// --------------------------------------------------------------------------- + +export interface RoutingDiagnosisAction { + type: + | "collect_more_exposures" + | "improve_success_rate" + | "seed_exact_route_history" + | "candidate_already_present" + | "selected_bucket_precedence" + | "no_history"; + skill?: string; + scenario?: string; + remainingExposures?: number; +} + +export interface RoutingDiagnosisHint { + severity: "info" | "warning" | "error"; + code: string; + message: string; + hint?: string; + action?: RoutingDiagnosisAction; +} + +export interface PolicyRecallSelectedCandidate { + skill: string; + scenario: string; + exposures: number; + wins: number; + directiveWins: number; + successRate: number; + policyBoost: number; + recallScore: number; + staleMisses?: number; +} + +export interface PolicyRecallCandidateDiagnosis { + skill: string; + scenario: string; + exposures: number; + wins: number; + directiveWins: number; + staleMisses: number; + successRate: number; + policyBoost: number; + recallScore: number; + qualified: boolean; + excluded: boolean; + rejectedReason: string | null; +} + +export interface 
PolicyRecallBucketDiagnosis { + scenario: string; + skillCount: number; + qualifiedCount: number; + selected: boolean; +} + +export interface PolicyRecallDiagnosis { + eligible: boolean; + skipReason: string | null; + checkedScenarios: PolicyRecallBucketDiagnosis[]; + selectedBucket: string | null; + selected: PolicyRecallCandidateDiagnosis[]; + rejected: PolicyRecallCandidateDiagnosis[]; + hints: RoutingDiagnosisHint[]; +} + +export interface ExplainPolicyRecallOptions { + excludeSkills?: Set<string>; + maxCandidates?: number; +} + +// --------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + +function round(value: number): number { + return Number(value.toFixed(4)); +} + +function diagnosticRecallScore(stats: RoutingPolicyStats): number { + return round( + derivePolicyBoost(stats) * 1000 + + computePolicySuccessRate(stats) * 100 + + stats.exposures, + ); +} + +function qualifies(stats: RoutingPolicyStats): { + successRate: number; + policyBoost: number; + qualified: boolean; +} { + const successRate = round(computePolicySuccessRate(stats)); + const policyBoost = derivePolicyBoost(stats); + const qualified = + stats.exposures >= POLICY_RECALL_MIN_EXPOSURES && + successRate >= POLICY_RECALL_MIN_SUCCESS_RATE && + policyBoost >= POLICY_RECALL_MIN_BOOST; + return { successRate, policyBoost, qualified }; +} + +function pushHint( + target: RoutingDiagnosisHint[], + hint: RoutingDiagnosisHint, +): void { + const key = JSON.stringify([ + hint.code, + hint.action?.type ?? null, + hint.action?.skill ?? null, + hint.action?.scenario ?? null, + ]); + const exists = target.some((existing) => { + const existingKey = JSON.stringify([ + existing.code, + existing.action?.type ?? null, + existing.action?.skill ?? null, + existing.action?.scenario ??
null, + ]); + return existingKey === key; + }); + if (!exists) { + target.push(hint); + } +} + +function isHookName(value: string): value is RoutingHookName { + return HOOK_NAMES.includes(value as RoutingHookName); +} + +function isToolName(value: string): value is RoutingToolName { + return TOOL_NAMES.includes(value as RoutingToolName); +} + +function isBoundary(value: string): value is RoutingBoundary { + return BOUNDARIES.includes(value as RoutingBoundary); +} + +// --------------------------------------------------------------------------- +// parsePolicyScenario +// --------------------------------------------------------------------------- + +export function parsePolicyScenario( + value: string | null, +): RoutingPolicyScenario | null { + if (!value) return null; + const parts = value.split("|"); + if (parts.length < 4) return null; + + const [hook, storyKind, targetBoundary, toolName, routeScope] = parts; + if (!isHookName(hook) || !isToolName(toolName)) { + return null; + } + + return { + hook, + storyKind: storyKind === "none" ? null : storyKind, + targetBoundary: + targetBoundary === "none" + ? null + : isBoundary(targetBoundary) + ? targetBoundary + : null, + toolName, + routeScope: + typeof routeScope === "string" && routeScope.length > 0 + ? 
routeScope + : null, + }; +} + +// --------------------------------------------------------------------------- +// candidateFromStats (rejected candidates only) +// --------------------------------------------------------------------------- + +function candidateFromStats( + skill: string, + scenario: string, + stats: RoutingPolicyStats, + selectedBucket: string | null, + selectedSkills: Set<string>, + excludeSkills: Set<string>, +): PolicyRecallCandidateDiagnosis | null { + // Skip candidates that are already in the selected set + if (selectedBucket === scenario && selectedSkills.has(skill)) { + return null; + } + + const { successRate, policyBoost, qualified } = qualifies(stats); + const excluded = excludeSkills.has(skill); + let rejectedReason: string | null = null; + + if (selectedBucket && scenario !== selectedBucket) { + rejectedReason = `shadowed_by_selected_bucket:${selectedBucket}`; + } else if (excluded) { + rejectedReason = "already_ranked_or_injected"; + } else if (stats.exposures < POLICY_RECALL_MIN_EXPOSURES) { + rejectedReason = `needs_${POLICY_RECALL_MIN_EXPOSURES - stats.exposures}_more_exposures`; + } else if (qualified) { + rejectedReason = "lost_tiebreak_in_selected_bucket"; + } else if (successRate < POLICY_RECALL_MIN_SUCCESS_RATE) { + rejectedReason = `success_rate_${successRate.toFixed(3)}_below_${POLICY_RECALL_MIN_SUCCESS_RATE.toFixed(3)}`; + } else if (policyBoost < POLICY_RECALL_MIN_BOOST) { + rejectedReason = `policy_boost_${policyBoost}_below_${POLICY_RECALL_MIN_BOOST}`; + } + + return { + skill, + scenario, + exposures: stats.exposures, + wins: stats.wins, + directiveWins: stats.directiveWins, + staleMisses: stats.staleMisses, + successRate, + policyBoost, + recallScore: diagnosticRecallScore(stats), + qualified, + excluded, + rejectedReason, + }; +} + +// --------------------------------------------------------------------------- +// buildHints +// --------------------------------------------------------------------------- + +function buildHints( +
input: RoutingPolicyScenario, + diagnosis: PolicyRecallDiagnosis, +): RoutingDiagnosisHint[] { + const hints: RoutingDiagnosisHint[] = []; + const preferredExactScenario = scenarioKeyCandidates(input)[0] ?? null; + + if ( + diagnosis.selectedBucket && + diagnosis.selectedBucket.endsWith("|*") + ) { + pushHint(hints, { + severity: "info", + code: "POLICY_RECALL_USING_WILDCARD_ROUTE", + message: `Policy recall is selecting the wildcard bucket for ${input.toolName}`, + hint: "Collect exact-route wins for the active route so recall can promote from * to the concrete route key", + action: { + type: "seed_exact_route_history", + scenario: preferredExactScenario ?? undefined, + }, + }); + } + + if ( + diagnosis.checkedScenarios.every((bucket) => bucket.skillCount === 0) + ) { + pushHint(hints, { + severity: "info", + code: "POLICY_RECALL_NO_HISTORY", + message: "No routing-policy history exists for this scenario", + hint: "Let the current verification loop complete once to seed exposures and outcomes", + action: { + type: "no_history", + scenario: preferredExactScenario ?? 
undefined, + }, + }); + } + + const needsExposure = diagnosis.rejected.find( + (candidate) => + typeof candidate.rejectedReason === "string" && + candidate.rejectedReason.startsWith("needs_"), + ); + if (needsExposure) { + pushHint(hints, { + severity: "warning", + code: "POLICY_RECALL_NEEDS_EXPOSURES", + message: `${needsExposure.skill} is close to qualifying but needs more exposures`, + hint: `Record ${POLICY_RECALL_MIN_EXPOSURES - needsExposure.exposures} more exposure(s) for ${needsExposure.scenario}`, + action: { + type: "collect_more_exposures", + skill: needsExposure.skill, + scenario: needsExposure.scenario, + remainingExposures: + POLICY_RECALL_MIN_EXPOSURES - needsExposure.exposures, + }, + }); + } + + const lowSuccess = diagnosis.rejected.find( + (candidate) => + typeof candidate.rejectedReason === "string" && + candidate.rejectedReason.startsWith("success_rate_"), + ); + if (lowSuccess) { + pushHint(hints, { + severity: "warning", + code: "POLICY_RECALL_LOW_SUCCESS_RATE", + message: `${lowSuccess.skill} has history, but its success rate is below the recall threshold`, + hint: "Inspect stale misses and directive adherence before trusting policy recall here", + action: { + type: "improve_success_rate", + skill: lowSuccess.skill, + scenario: lowSuccess.scenario, + }, + }); + } + + const alreadyPresent = diagnosis.rejected.find( + (candidate) => + candidate.rejectedReason === "already_ranked_or_injected", + ); + if (alreadyPresent) { + pushHint(hints, { + severity: "info", + code: "POLICY_RECALL_ALREADY_PRESENT", + message: `${alreadyPresent.skill} already exists in the ranked or injected set`, + hint: "No recall action is needed; the candidate is already present via direct routing or prior injection", + action: { + type: "candidate_already_present", + skill: alreadyPresent.skill, + scenario: alreadyPresent.scenario, + }, + }); + } + + const precedence = diagnosis.rejected.find( + (candidate) => + typeof candidate.rejectedReason === "string" && + 
candidate.rejectedReason.startsWith("shadowed_by_selected_bucket:"), + ); + if (precedence) { + pushHint(hints, { + severity: "info", + code: "POLICY_RECALL_PRECEDENCE_APPLIED", + message: + "A higher-precedence bucket won, so lower-precedence buckets were intentionally ignored", + hint: "This is expected: exact route > wildcard route > legacy 4-part key", + action: { + type: "selected_bucket_precedence", + skill: precedence.skill, + scenario: diagnosis.selectedBucket ?? precedence.scenario, + }, + }); + } + + return hints; +} + +// --------------------------------------------------------------------------- +// explainPolicyRecall +// --------------------------------------------------------------------------- + +export function explainPolicyRecall( + policy: RoutingPolicyFile, + input: RoutingPolicyScenario, + options: ExplainPolicyRecallOptions = {}, +): PolicyRecallDiagnosis { + const excludeSkills = options.excludeSkills ?? new Set<string>(); + const maxCandidates = options.maxCandidates ?? 1; + + if (!input.targetBoundary) { + return { + eligible: false, + skipReason: "no_target_boundary", + checkedScenarios: [], + selectedBucket: null, + selected: [], + rejected: [], + hints: [], + }; + } + + const selectedRaw = selectPolicyRecallCandidates(policy, input, { + maxCandidates, + excludeSkills, + }) as PolicyRecallSelectedCandidate[]; + + const selectedBucket = selectedRaw[0]?.scenario ?? null; + const selectedSkills = new Set<string>( + selectedRaw.map((candidate) => candidate.skill), + ); + + const checkedScenarios = scenarioKeyCandidates(input).map((scenario) => { + const bucket = policy.scenarios[scenario] ??
{}; + const qualifiedCount = Object.entries(bucket).filter(([, stats]) => { + const { qualified } = qualifies(stats); + return qualified; + }).length; + return { + scenario, + skillCount: Object.keys(bucket).length, + qualifiedCount, + selected: scenario === selectedBucket, + }; + }); + + const selected = selectedRaw.map((candidate) => ({ + skill: candidate.skill, + scenario: candidate.scenario, + exposures: candidate.exposures, + wins: candidate.wins, + directiveWins: candidate.directiveWins, + staleMisses: candidate.staleMisses ?? 0, + successRate: round(candidate.successRate), + policyBoost: candidate.policyBoost, + recallScore: candidate.recallScore, + qualified: true, + excluded: false, + rejectedReason: null, + })); + + const rejected: PolicyRecallCandidateDiagnosis[] = []; + for (const scenario of scenarioKeyCandidates(input)) { + const bucket = policy.scenarios[scenario] ?? {}; + for (const [skill, stats] of Object.entries(bucket)) { + const candidate = candidateFromStats( + skill, + scenario, + stats, + selectedBucket, + selectedSkills, + excludeSkills, + ); + if (candidate) { + rejected.push(candidate); + } + } + } + + const diagnosis: PolicyRecallDiagnosis = { + eligible: true, + skipReason: null, + checkedScenarios, + selectedBucket, + selected, + rejected, + hints: [], + }; + + diagnosis.hints = buildHints(input, diagnosis); + return diagnosis; +} diff --git a/hooks/src/routing-policy-compiler.mts b/hooks/src/routing-policy-compiler.mts new file mode 100644 index 0000000..da49cb0 --- /dev/null +++ b/hooks/src/routing-policy-compiler.mts @@ -0,0 +1,370 @@ +/** + * Routing Policy Compiler: pure function that converts a replay report into + * bounded policy patches against an existing RoutingPolicyFile. + * + * Contract: + * - compilePolicyPatch is a pure function of (existing policy, replay report). + * - applyPolicyPatch produces a PromotionArtifact without mutating policy stats. 
+ * - Patch recommendations reuse derivePolicyBoost thresholds — no second scoring system. + * - Deterministic patch ordering: scenario asc, skill asc. + * - Covers promote, demote, investigate, and no-op cases. + * - Routing-policy remains the observational evidence store; promotions live in a separate artifact. + */ + +import { + type RoutingPolicyFile, + type RoutingPolicyStats, + derivePolicyBoost, +} from "./routing-policy.mjs"; +import type { + RoutingReplayReport, + RoutingRecommendation, +} from "./routing-replay.mjs"; +import type { ReplayResult } from "./rule-distillation.mjs"; +import { + type LearnedRoutingRulebook, + type RulebookErrorCode, + createRule as createRulebookRule, +} from "./learned-routing-rulebook.mjs"; +import { createLogger } from "./logger.mjs"; + +// --------------------------------------------------------------------------- +// Types +// --------------------------------------------------------------------------- + +export type PatchAction = "promote" | "demote" | "investigate" | "no-op"; + +export interface PolicyPatchEntry { + scenario: string; + skill: string; + action: PatchAction; + currentBoost: number; + proposedBoost: number; + delta: number; + confidence: number; + reason: string; +} + +export interface PolicyPatchReport { + version: 1; + sessionId: string; + patchCount: number; + entries: PolicyPatchEntry[]; +} + +// --------------------------------------------------------------------------- +// Promotion artifact — separate from the observational policy ledger +// --------------------------------------------------------------------------- + +export interface PromotedRule { + scenario: string; + skill: string; + action: "promote" | "demote"; + boost: number; + confidence: number; + reason: string; +} + +export interface PromotionArtifact { + version: 1; + sessionId: string; + promotedAt: string; + applied: number; + rules: PromotedRule[]; +} + +// --------------------------------------------------------------------------- 
+// Thresholds — reuses derivePolicyBoost ladder exactly +// --------------------------------------------------------------------------- + +/** + * Compute the boost that derivePolicyBoost *would* produce if we applied + * the replay recommendation's implied stats. We translate the replay + * recommendation back through the same bounded ladder. + */ +function boostForAction(rec: RoutingRecommendation): number { + // These match the thresholds in derivePolicyBoost and routing-replay.mts: + // promote (>=80% success, >=3 exposures) → +8 + // demote (<15% success, >=5 exposures) → -2 + // investigate (40-65%, >=3 exposures) → 0 (no change) + switch (rec.action) { + case "promote": + return 8; + case "demote": + return -2; + case "investigate": + return 0; + } +} + +// --------------------------------------------------------------------------- +// Core compiler — pure function, no side effects +// --------------------------------------------------------------------------- + +export function compilePolicyPatch( + policy: RoutingPolicyFile, + report: RoutingReplayReport, +): PolicyPatchReport { + const log = createLogger(); + + log.summary("policy_compiler_start", { + sessionId: report.sessionId, + recommendationCount: report.recommendations.length, + }); + + const entries: PolicyPatchEntry[] = []; + + for (const rec of report.recommendations) { + const bucket = policy.scenarios[rec.scenario] ?? {}; + const stats: RoutingPolicyStats | undefined = bucket[rec.skill]; + const currentBoost = derivePolicyBoost(stats); + const proposedBoost = boostForAction(rec); + const delta = proposedBoost - currentBoost; + + // Only emit a patch entry when the proposed boost differs from current, + // or when the action is "investigate" (always surface for visibility). + if (delta !== 0 || rec.action === "investigate") { + // Preserve the original recommendation action. The replay analyzer + // already classified it correctly using the same thresholds as + // derivePolicyBoost. 
Re-classifying based on delta direction would + // break investigate entries that have non-zero delta. + const action: PatchAction = + rec.action === "investigate" + ? "investigate" + : delta > 0 + ? "promote" + : delta < 0 + ? "demote" + : "no-op"; + + const entry: PolicyPatchEntry = { + scenario: rec.scenario, + skill: rec.skill, + action, + currentBoost, + proposedBoost, + delta, + confidence: rec.confidence, + reason: rec.reason, + }; + + entries.push(entry); + + log.debug("policy_patch_entry", { + scenario: rec.scenario, + skill: rec.skill, + action, + currentBoost, + proposedBoost, + delta, + }); + } else { + log.debug("policy_patch_no_op", { + scenario: rec.scenario, + skill: rec.skill, + currentBoost, + proposedBoost, + reason: "boost already aligned", + }); + } + } + + // Deterministic ordering: scenario asc, skill asc + entries.sort( + (a, b) => + a.scenario.localeCompare(b.scenario) || + a.skill.localeCompare(b.skill), + ); + + log.summary("policy_compiler_complete", { + sessionId: report.sessionId, + patchCount: entries.length, + promotes: entries.filter((e) => e.action === "promote").length, + demotes: entries.filter((e) => e.action === "demote").length, + investigates: entries.filter((e) => e.action === "investigate").length, + noOps: entries.filter((e) => e.action === "no-op").length, + }); + + return { + version: 1, + sessionId: report.sessionId, + patchCount: entries.length, + entries, + }; +} + +// --------------------------------------------------------------------------- +// Apply — produces a PromotionArtifact without mutating the policy ledger +// --------------------------------------------------------------------------- + +/** + * Convert a compiled patch into a PromotionArtifact — a standalone record of + * promotion decisions that does NOT touch the observational routing-policy stats. + * + * The routing-policy ledger remains the evidence store (exposures, wins, + * directiveWins, staleMisses are never fabricated). 
Promotion boosts are + * recorded in the returned artifact and can be inspected, replayed, or + * applied by a downstream consumer without corrupting ground truth. + * + * Pure function: same inputs always produce the same artifact. + * Idempotent: calling twice with the same patch yields identical output. + */ +export function applyPolicyPatch( + patch: PolicyPatchReport, + now?: string, +): PromotionArtifact { + const log = createLogger(); + const timestamp = now ?? new Date().toISOString(); + const rules: PromotedRule[] = []; + + for (const entry of patch.entries) { + if (entry.action === "investigate" || entry.action === "no-op") { + log.debug("policy_apply_skip", { + scenario: entry.scenario, + skill: entry.skill, + action: entry.action, + reason: "non-actionable", + }); + continue; + } + + rules.push({ + scenario: entry.scenario, + skill: entry.skill, + action: entry.action as "promote" | "demote", + boost: Math.abs(entry.proposedBoost), + confidence: entry.confidence, + reason: entry.reason, + }); + + log.summary("policy_apply_entry", { + scenario: entry.scenario, + skill: entry.skill, + action: entry.action, + proposedBoost: entry.proposedBoost, + delta: entry.delta, + }); + } + + log.summary("policy_apply_complete", { + sessionId: patch.sessionId, + applied: rules.length, + total: patch.entries.length, + }); + + return { + version: 1, + sessionId: patch.sessionId, + promotedAt: timestamp, + applied: rules.length, + rules, + }; +} + +// --------------------------------------------------------------------------- +// Promotion gate — bridges PromotionArtifact + ReplayResult → Rulebook +// --------------------------------------------------------------------------- + +export interface PromotionGateResult { + accepted: boolean; + errorCode: RulebookErrorCode | null; + reason: string; + replay: ReplayResult; + rulebook: LearnedRoutingRulebook | null; +} + +/** + * Evaluate whether a promotion artifact should be accepted or rejected based + * on replay evidence. 
Produces a LearnedRoutingRulebook on acceptance. + * + * Rejection criteria: + * - `regressions.length > 0`: any historical win would regress under learned rules. + * - `learnedWins < baselineWins`: net reduction in verified wins. + * + * Pure function: same inputs always produce the same result. + */ +export function evaluatePromotionGate(params: { + artifact: PromotionArtifact; + replay: ReplayResult; + now?: string; +}): PromotionGateResult { + const { artifact, replay, now = artifact.promotedAt } = params; + const log = createLogger(); + + // Rejection: regressions detected + if (replay.regressions.length > 0) { + const result: PromotionGateResult = { + accepted: false, + errorCode: "RULEBOOK_PROMOTION_REJECTED_REGRESSION", + reason: `Promotion rejected: ${replay.regressions.length} regression(s) detected`, + replay, + rulebook: null, + }; + log.summary("promotion_gate_rejected", { + errorCode: result.errorCode, + regressionCount: replay.regressions.length, + regressions: replay.regressions, + }); + return result; + } + + // Rejection: learned wins worse than baseline + if (replay.learnedWins < replay.baselineWins) { + const result: PromotionGateResult = { + accepted: false, + errorCode: "RULEBOOK_PROMOTION_REJECTED_REGRESSION", + reason: `Promotion rejected: learned wins (${replay.learnedWins}) < baseline wins (${replay.baselineWins})`, + replay, + rulebook: null, + }; + log.summary("promotion_gate_rejected", { + errorCode: result.errorCode, + learnedWins: replay.learnedWins, + baselineWins: replay.baselineWins, + }); + return result; + } + + // Accepted: build rulebook from artifact + const rulebookRules = artifact.rules.map((r) => + createRulebookRule({ + scenario: r.scenario, + skill: r.skill, + action: r.action, + boost: r.boost, + confidence: r.confidence, + reason: r.reason, + sourceSessionId: artifact.sessionId, + promotedAt: now, + evidence: { + baselineWins: replay.baselineWins, + baselineDirectiveWins: replay.baselineDirectiveWins, + learnedWins: 
replay.learnedWins, + learnedDirectiveWins: replay.learnedDirectiveWins, + regressionCount: replay.regressions.length, + }, + }), + ); + + const rulebook: LearnedRoutingRulebook = { + version: 1, + createdAt: now, + sessionId: artifact.sessionId, + rules: rulebookRules, + }; + + log.summary("promotion_gate_accepted", { + sessionId: artifact.sessionId, + ruleCount: rulebookRules.length, + learnedWins: replay.learnedWins, + baselineWins: replay.baselineWins, + }); + + return { + accepted: true, + errorCode: null, + reason: `Promotion accepted: ${rulebookRules.length} rule(s), ${replay.learnedWins} learned wins, 0 regressions`, + replay, + rulebook, + }; +} diff --git a/hooks/src/routing-policy-ledger.mts b/hooks/src/routing-policy-ledger.mts new file mode 100644 index 0000000..51410bd --- /dev/null +++ b/hooks/src/routing-policy-ledger.mts @@ -0,0 +1,388 @@ +/** + * Routing Policy Ledger: exposure tracking and project-scoped policy persistence. + * + * Records every skill injection as an exposure in an append-only JSONL session + * ledger. Resolves exposures against verification-boundary outcomes and persists + * a deterministic project-scoped policy file across sessions. + * + * Persistence contract: + * - Project policy: `<tmpdir>/vercel-plugin-routing-policy-<sha256(projectRoot)>.json` + * - Session exposures: `<tmpdir>/vercel-plugin-<sessionId>-routing-exposures.jsonl` + * + * v1 — Bash-only verification observer; non-Bash signals will be added in future.
+ */ + +import { + appendFileSync, + readFileSync, + writeFileSync, +} from "node:fs"; +import { createHash } from "node:crypto"; +import { tmpdir } from "node:os"; +import { + createEmptyRoutingPolicy, + recordExposure as policyRecordExposure, + recordOutcome as policyRecordOutcome, + type RoutingBoundary, + type RoutingHookName, + type RoutingOutcome, + type RoutingPolicyFile, + type RoutingToolName, +} from "./routing-policy.mjs"; +import { createLogger } from "./logger.mjs"; + +// --------------------------------------------------------------------------- +// Safe session-id segment (mirrors verification-ledger.mts) +// --------------------------------------------------------------------------- + +const SAFE_SESSION_ID_RE = /^[a-zA-Z0-9_-]+$/; + +function safeSessionSegment(sessionId: string): string { + if (SAFE_SESSION_ID_RE.test(sessionId)) return sessionId; + return createHash("sha256").update(sessionId).digest("hex"); +} + +// --------------------------------------------------------------------------- +// Types +// --------------------------------------------------------------------------- + +export interface SkillExposure { + id: string; + sessionId: string; + projectRoot: string; + storyId: string | null; + storyKind: string | null; + route: string | null; + hook: RoutingHookName; + toolName: RoutingToolName; + skill: string; + targetBoundary: RoutingBoundary | null; + /** Shared across all exposures in the same injection batch. Null for legacy rows. */ + exposureGroupId: string | null; + /** Whether this skill is the causal candidate or a co-injected context helper. */ + attributionRole: "candidate" | "context"; + /** Which skill in the group owns policy credit. Null for legacy rows.
*/ + candidateSkill: string | null; + createdAt: string; + resolvedAt: string | null; + outcome: "pending" | "win" | "directive-win" | "stale-miss"; +} + +// --------------------------------------------------------------------------- +// Path helpers +// --------------------------------------------------------------------------- + +export function projectPolicyPath(projectRoot: string): string { + const hash = createHash("sha256").update(projectRoot).digest("hex"); + return `${tmpdir()}/vercel-plugin-routing-policy-${hash}.json`; +} + +export function sessionExposurePath(sessionId: string): string { + return `${tmpdir()}/vercel-plugin-${safeSessionSegment(sessionId)}-routing-exposures.jsonl`; +} + +// --------------------------------------------------------------------------- +// Project policy persistence +// --------------------------------------------------------------------------- + +export function loadProjectRoutingPolicy(projectRoot: string): RoutingPolicyFile { + const path = projectPolicyPath(projectRoot); + try { + const raw = readFileSync(path, "utf-8"); + const parsed = JSON.parse(raw); + if (parsed && parsed.version === 1 && typeof parsed.scenarios === "object") { + return parsed as RoutingPolicyFile; + } + } catch { + // File doesn't exist or is corrupt — start fresh + } + return createEmptyRoutingPolicy(); +} + +export function saveProjectRoutingPolicy( + projectRoot: string, + policy: RoutingPolicyFile, +): void { + const path = projectPolicyPath(projectRoot); + const log = createLogger(); + writeFileSync(path, JSON.stringify(policy, null, 2) + "\n"); + log.summary("routing-policy-ledger.save", { + path, + scenarioCount: Object.keys(policy.scenarios).length, + }); +} + +// --------------------------------------------------------------------------- +// Attribution gating +// --------------------------------------------------------------------------- + +/** + * Only candidate exposures update long-term project routing policy. 
+ * Context exposures are still fully persisted in the session JSONL for + * replay, operator inspection, and diagnostics. + * + * Legacy rows (missing attributionRole) default to candidate behavior + * for backward compatibility. + */ +function shouldAffectPolicy(exposure: SkillExposure): boolean { + // Legacy rows without attribution fields default to candidate + if (!exposure.attributionRole) return true; + return exposure.attributionRole === "candidate"; +} + +// --------------------------------------------------------------------------- +// Session exposure ledger (append-only JSONL) +// --------------------------------------------------------------------------- + +export function appendSkillExposure(exposure: SkillExposure): void { + const path = sessionExposurePath(exposure.sessionId); + const log = createLogger(); + + // Always persist to session JSONL — full history for replay and inspection + appendFileSync(path, JSON.stringify(exposure) + "\n"); + + // Only candidate exposures update project routing policy + if (shouldAffectPolicy(exposure)) { + const policy = loadProjectRoutingPolicy(exposure.projectRoot); + policyRecordExposure(policy, { + hook: exposure.hook, + storyKind: exposure.storyKind, + targetBoundary: exposure.targetBoundary, + toolName: exposure.toolName, + routeScope: exposure.route, + skill: exposure.skill, + now: exposure.createdAt, + }); + saveProjectRoutingPolicy(exposure.projectRoot, policy); + } + + log.summary("routing-policy-ledger.exposure-append", { + id: exposure.id, + skill: exposure.skill, + hook: exposure.hook, + targetBoundary: exposure.targetBoundary, + route: exposure.route, + outcome: exposure.outcome, + attributionRole: exposure.attributionRole ?? "legacy", + exposureGroupId: exposure.exposureGroupId ?? 
null, + policyAffected: shouldAffectPolicy(exposure), + }); +} + +export function loadSessionExposures(sessionId: string): SkillExposure[] { + const path = sessionExposurePath(sessionId); + try { + const raw = readFileSync(path, "utf-8"); + return raw + .split("\n") + .filter((line) => line.trim().length > 0) + .map((line) => JSON.parse(line) as SkillExposure); + } catch { + return []; + } +} + +// --------------------------------------------------------------------------- +// Outcome resolution +// --------------------------------------------------------------------------- + +/** + * Resolve pending exposures whose session, boundary, story scope, and route + * scope all match the observed verification event. + * + * Only resolves exposures from the same session that are still `pending`. + * If `matchedSuggestedAction` is true, the outcome is `directive-win`; + * otherwise it is `win`. + * + * Also updates the project policy with the resolved outcomes. + * + * Returns the list of resolved exposures. + */ +export function resolveBoundaryOutcome(params: { + sessionId: string; + boundary: RoutingBoundary; + matchedSuggestedAction: boolean; + storyId?: string | null; + route?: string | null; + now?: string; +}): SkillExposure[] { + const { sessionId, boundary, matchedSuggestedAction } = params; + const storyId = params.storyId ?? null; + const route = params.route ?? null; + const now = params.now ?? new Date().toISOString(); + const log = createLogger(); + + const exposures = loadSessionExposures(sessionId); + const resolved: SkillExposure[] = []; + + // Strict null matching: a null observed route/storyId only resolves + // exposures that also have null route/storyId. This prevents + // over-crediting exposures across unrelated routes or stories when + // the observed route is null, inferred, or missing. 
+ const pending = exposures.filter( + (e) => + e.outcome === "pending" && + e.sessionId === sessionId && + e.targetBoundary === boundary && + e.storyId === storyId && + e.route === route, + ); + + log.summary("routing-policy-ledger.resolve-filter", { + sessionId, + boundary, + storyId, + route, + totalExposures: exposures.length, + pendingCount: exposures.filter((e) => e.outcome === "pending").length, + matchedCount: pending.length, + }); + + if (pending.length === 0) { + log.trace("routing-policy-ledger.resolve-skip", { + sessionId, + boundary, + storyId, + route, + reason: "no_matching_pending_exposures", + }); + return []; + } + + const outcome: "win" | "directive-win" = matchedSuggestedAction + ? "directive-win" + : "win"; + + // Update each pending exposure in-place + for (const exposure of pending) { + exposure.outcome = outcome; + exposure.resolvedAt = now; + resolved.push(exposure); + log.summary("routing-policy-ledger.exposure-resolved", { + id: exposure.id, + skill: exposure.skill, + outcome, + storyId: exposure.storyId, + route: exposure.route, + boundary, + }); + } + + // Rewrite the full session ledger with updated outcomes + const path = sessionExposurePath(sessionId); + const lines = exposures.map((e) => JSON.stringify(e)).join("\n") + "\n"; + writeFileSync(path, lines); + + // Update project policy only for candidate exposures + const candidateResolved = resolved.filter(shouldAffectPolicy); + const projectRoots = new Set(resolved.map((e) => e.projectRoot)); + for (const projectRoot of projectRoots) { + const candidates = candidateResolved.filter((r) => r.projectRoot === projectRoot); + if (candidates.length === 0) continue; + const policy = loadProjectRoutingPolicy(projectRoot); + for (const e of candidates) { + policyRecordOutcome(policy, { + hook: e.hook, + storyKind: e.storyKind, + targetBoundary: e.targetBoundary, + toolName: e.toolName, + routeScope: e.route, + skill: e.skill, + outcome: outcome as RoutingOutcome, + now, + }); + } + 
saveProjectRoutingPolicy(projectRoot, policy); + } + + log.summary("routing-policy-ledger.resolve", { + sessionId, + boundary, + storyId, + route, + outcome, + resolvedCount: resolved.length, + candidateCount: candidateResolved.length, + contextCount: resolved.length - candidateResolved.length, + skills: resolved.map((e) => e.skill), + }); + + return resolved; +} + +/** + * Convert remaining pending exposures into stale-miss at session end. + * + * Updates both the session ledger and the project policy. + * Returns the list of finalized exposures. + */ +export function finalizeStaleExposures( + sessionId: string, + now?: string, +): SkillExposure[] { + const timestamp = now ?? new Date().toISOString(); + const log = createLogger(); + + const exposures = loadSessionExposures(sessionId); + const stale = exposures.filter( + (e) => e.outcome === "pending" && e.sessionId === sessionId, + ); + + if (stale.length === 0) { + log.trace("routing-policy-ledger.finalize-skip", { + sessionId, + reason: "no_pending_exposures", + }); + return []; + } + + for (const exposure of stale) { + exposure.outcome = "stale-miss"; + exposure.resolvedAt = timestamp; + log.summary("routing-policy-ledger.exposure-stale", { + id: exposure.id, + skill: exposure.skill, + outcome: "stale-miss", + storyId: exposure.storyId, + route: exposure.route, + targetBoundary: exposure.targetBoundary, + }); + } + + // Rewrite ledger + const path = sessionExposurePath(sessionId); + const lines = exposures.map((e) => JSON.stringify(e)).join("\n") + "\n"; + writeFileSync(path, lines); + + // Update project policies — only for candidate exposures + const candidateStale = stale.filter(shouldAffectPolicy); + const projectRoots = new Set(stale.map((e) => e.projectRoot)); + for (const projectRoot of projectRoots) { + const candidates = candidateStale.filter((r) => r.projectRoot === projectRoot); + if (candidates.length === 0) continue; + const policy = loadProjectRoutingPolicy(projectRoot); + for (const e of 
candidates) { + policyRecordOutcome(policy, { + hook: e.hook, + storyKind: e.storyKind, + targetBoundary: e.targetBoundary, + toolName: e.toolName, + routeScope: e.route, + skill: e.skill, + outcome: "stale-miss", + now: timestamp, + }); + } + saveProjectRoutingPolicy(projectRoot, policy); + } + + log.summary("routing-policy-ledger.finalize-stale", { + sessionId, + staleCount: stale.length, + candidateCount: candidateStale.length, + contextCount: stale.length - candidateStale.length, + skills: stale.map((e) => e.skill), + }); + + return stale; +} diff --git a/hooks/src/routing-policy.mts b/hooks/src/routing-policy.mts new file mode 100644 index 0000000..a7846b4 --- /dev/null +++ b/hooks/src/routing-policy.mts @@ -0,0 +1,354 @@ +/** + * Verified Routing Policy Engine — pure deterministic core. + * + * Records skill exposures, resolves them against verification-boundary + * outcomes, and applies bounded policy boosts during skill ranking. + * + * Precedence rule: when a learned-routing-rulebook exists and contains a + * matching rule for a (scenario, skill) pair, the rulebook boost is used + * and the stats-policy boost is suppressed for that skill. This prevents + * double-boosting from both systems. 
+ */ + +import type { + LearnedRoutingRulebook, + LearnedRoutingRule, +} from "./learned-routing-rulebook.mjs"; + +// --------------------------------------------------------------------------- +// Types +// --------------------------------------------------------------------------- + +export type RoutingBoundary = + | "uiRender" + | "clientRequest" + | "serverHandler" + | "environment"; + +export type RoutingHookName = "PreToolUse" | "UserPromptSubmit"; + +export type RoutingToolName = + | "Read" + | "Edit" + | "Write" + | "Bash" + | "Prompt"; + +export interface RoutingPolicyScenario { + hook: RoutingHookName; + storyKind: string | null; + targetBoundary: RoutingBoundary | null; + toolName: RoutingToolName; + routeScope?: string | null; +} + +export interface RoutingPolicyStats { + exposures: number; + wins: number; + directiveWins: number; + staleMisses: number; + lastUpdatedAt: string; +} + +export interface RoutingPolicyFile { + version: 1; + scenarios: Record<string, Record<string, RoutingPolicyStats>>; +} + +export interface PolicyBoostExplanation { + skill: string; + scenario: string; + boost: number; + reason: string; +} + +export interface RankableSkill { + skill: string; + priority: number; + effectivePriority?: number; +} + +// --------------------------------------------------------------------------- +// Factory +// --------------------------------------------------------------------------- + +export function createEmptyRoutingPolicy(): RoutingPolicyFile { + return { + version: 1, + scenarios: {}, + }; +} + +// --------------------------------------------------------------------------- +// Scenario key — deterministic, pipe-delimited +// --------------------------------------------------------------------------- + +export function scenarioKey(input: RoutingPolicyScenario): string { + return [ + input.hook, + input.storyKind ?? "none", + input.targetBoundary ??
"none", + input.toolName, + ].join("|"); +} + +export function scenarioKeyWithRoute(input: RoutingPolicyScenario): string { + return [ + input.hook, + input.storyKind ?? "none", + input.targetBoundary ?? "none", + input.toolName, + input.routeScope ?? "*", + ].join("|"); +} + +/** + * Deterministic candidate lookup order: + * 1. Exact route key (if routeScope is a non-wildcard string) + * 2. Wildcard route key (hook|story|boundary|tool|*) + * 3. Legacy 4-part key (hook|story|boundary|tool) + */ +export function scenarioKeyCandidates(input: RoutingPolicyScenario): string[] { + const keys: string[] = []; + if (input.routeScope && input.routeScope !== "*") { + keys.push(scenarioKeyWithRoute(input)); + } + keys.push(scenarioKeyWithRoute({ ...input, routeScope: "*" })); + keys.push(scenarioKey(input)); // legacy fallback + return [...new Set(keys)]; +} + +export function computePolicySuccessRate(stats: RoutingPolicyStats): number { + const weightedWins = stats.wins + stats.directiveWins * 0.25; + return weightedWins / Math.max(stats.exposures, 1); +} + +export function lookupPolicyStats( + policy: RoutingPolicyFile, + input: RoutingPolicyScenario, + skill: string, +): { scenario: string | null; stats: RoutingPolicyStats | undefined } { + for (const key of scenarioKeyCandidates(input)) { + const stats = policy.scenarios[key]?.[skill]; + if (stats) return { scenario: key, stats }; + } + return { scenario: null, stats: undefined }; +} + +// --------------------------------------------------------------------------- +// Ensure scenario + skill slot exists +// --------------------------------------------------------------------------- + +export function ensureScenario( + policy: RoutingPolicyFile, + scenario: string, + skill: string, + now: string, +): RoutingPolicyStats { + if (!policy.scenarios[scenario]) policy.scenarios[scenario] = {}; + if (!policy.scenarios[scenario][skill]) { + policy.scenarios[scenario][skill] = { + exposures: 0, + wins: 0, + directiveWins: 0, + 
staleMisses: 0, + lastUpdatedAt: now, + }; + } + return policy.scenarios[scenario][skill]; +} + +// --------------------------------------------------------------------------- +// Record an exposure (skill was injected) +// --------------------------------------------------------------------------- + +export function recordExposure( + policy: RoutingPolicyFile, + input: RoutingPolicyScenario & { skill: string; now?: string }, +): RoutingPolicyFile { + const now = input.now ?? new Date().toISOString(); + for (const key of scenarioKeyCandidates(input)) { + const stats = ensureScenario(policy, key, input.skill, now); + stats.exposures += 1; + stats.lastUpdatedAt = now; + } + return policy; +} + +// --------------------------------------------------------------------------- +// Record an outcome (verification boundary resolved) +// --------------------------------------------------------------------------- + +export type RoutingOutcome = "win" | "directive-win" | "stale-miss"; + +export function recordOutcome( + policy: RoutingPolicyFile, + input: RoutingPolicyScenario & { + skill: string; + outcome: RoutingOutcome; + now?: string; + }, +): RoutingPolicyFile { + const now = input.now ?? 
new Date().toISOString(); + for (const key of scenarioKeyCandidates(input)) { + const stats = ensureScenario(policy, key, input.skill, now); + + if (input.outcome === "win") { + stats.wins += 1; + } else if (input.outcome === "directive-win") { + stats.wins += 1; + stats.directiveWins += 1; + } else { + stats.staleMisses += 1; + } + + stats.lastUpdatedAt = now; + } + return policy; +} + +// --------------------------------------------------------------------------- +// Derive a bounded policy boost from stats +// --------------------------------------------------------------------------- + +export function derivePolicyBoost(stats: RoutingPolicyStats | undefined): number { + if (!stats) return 0; + if (stats.exposures < 3) return 0; + + const weightedWins = stats.wins + stats.directiveWins * 0.25; + const successRate = weightedWins / Math.max(stats.exposures, 1); + + if (successRate >= 0.80) return 8; + if (successRate >= 0.65) return 5; + if (successRate >= 0.40) return 2; + + if (stats.exposures >= 5 && successRate < 0.15) return -2; + return 0; +} + +// --------------------------------------------------------------------------- +// Apply policy boosts to a set of rankable skills +// --------------------------------------------------------------------------- + +export function applyPolicyBoosts<T extends RankableSkill>( + entries: T[], + policy: RoutingPolicyFile, + scenarioInput: RoutingPolicyScenario, +): Array<T & { policyBoost: number; policyReason: string | null }> { + return entries.map((entry) => { + const { scenario, stats } = lookupPolicyStats(policy, scenarioInput, entry.skill); + const boost = derivePolicyBoost(stats); + const base = typeof entry.effectivePriority === "number" + ? entry.effectivePriority + : entry.priority; + + return { + ...entry, + effectivePriority: base + boost, + policyBoost: boost, + policyReason: stats && scenario + ?
`${scenario}: ${stats.wins} wins / ${stats.exposures} exposures, ${stats.directiveWins} directive wins, ${stats.staleMisses} stale misses` + : null, + }; + }); +} + +// --------------------------------------------------------------------------- +// Rulebook match — find a matching learned rule for a (scenario, skill) pair +// --------------------------------------------------------------------------- + +export interface RulebookMatchResult { + rule: LearnedRoutingRule; + matchedScenario: string; +} + +/** + * Look up a matching rulebook rule for a skill in a given scenario. + * Checks scenario key candidates in precedence order (route-scoped first, + * then wildcard, then legacy). Only "promote" rules contribute positive + * boosts; "demote" rules contribute negative boosts. + */ +export function matchRulebookRule( + rulebook: LearnedRoutingRulebook, + scenarioInput: RoutingPolicyScenario, + skill: string, +): RulebookMatchResult | null { + if (rulebook.rules.length === 0) return null; + + for (const key of scenarioKeyCandidates(scenarioInput)) { + const rule = rulebook.rules.find( + (r) => r.scenario === key && r.skill === skill, + ); + if (rule) return { rule, matchedScenario: key }; + } + return null; +} + +// --------------------------------------------------------------------------- +// Apply rulebook boosts with precedence over stats-policy +// --------------------------------------------------------------------------- + +export interface RulebookBoostExplanation { + skill: string; + matchedRuleId: string; + ruleBoost: number; + ruleReason: string; + rulebookPath: string; +} + +/** + * Apply learned-rulebook boosts with explicit precedence over stats-policy. + * + * Precedence rule: when a rulebook rule matches a (scenario, skill) pair, + * the rulebook boost replaces the stats-policy boost. The stats-policy + * boost is zeroed out for that skill to prevent double-boosting. + * + * Skills without a matching rule keep their stats-policy boost unchanged. 
+ */ +export function applyRulebookBoosts< + T extends RankableSkill & { policyBoost: number; policyReason: string | null }, +>( + entries: T[], + rulebook: LearnedRoutingRulebook, + scenarioInput: RoutingPolicyScenario, + rulebookFilePath: string, +): Array< + T & { + matchedRuleId: string | null; + ruleBoost: number; + ruleReason: string | null; + rulebookPath: string | null; + } +> { + return entries.map((entry) => { + const match = matchRulebookRule(rulebook, scenarioInput, entry.skill); + if (!match) { + return { + ...entry, + matchedRuleId: null, + ruleBoost: 0, + ruleReason: null, + rulebookPath: null, + }; + } + + const { rule } = match; + const ruleBoost = rule.action === "promote" ? rule.boost : -rule.boost; + + // Precedence: subtract old stats-policy boost, apply rulebook boost instead + const base = (typeof entry.effectivePriority === "number" + ? entry.effectivePriority + : entry.priority) - entry.policyBoost; + + return { + ...entry, + effectivePriority: base + ruleBoost, + policyBoost: 0, // suppressed — rulebook takes precedence + policyReason: null, + matchedRuleId: rule.id, + ruleBoost, + ruleReason: rule.reason, + rulebookPath: rulebookFilePath, + }; + }); +} diff --git a/hooks/src/routing-replay.mts b/hooks/src/routing-replay.mts new file mode 100644 index 0000000..10ac4de --- /dev/null +++ b/hooks/src/routing-replay.mts @@ -0,0 +1,248 @@ +/** + * Routing Replay Analyzer: deterministic replay compiler that reads routing + * traces and exposure ledgers, groups by policy scenario, and emits a stable + * RoutingReplayReport with scenario summaries and bounded recommendations. + * + * Contract: + * - Same trace input always yields byte-for-byte identical JSON output. + * - Scenario ordering is stable (lexicographic). + * - Skill ordering within scenarios is stable (wins desc, exposures desc, name asc). + * - Recommendations are derived from observed behavior with bounded thresholds. 
+ */ + +import { + readRoutingDecisionTrace, + type RoutingDecisionTrace, +} from "./routing-decision-trace.mjs"; +import { loadSessionExposures, type SkillExposure } from "./routing-policy-ledger.mjs"; +import { createLogger } from "./logger.mjs"; + +// --------------------------------------------------------------------------- +// Types +// --------------------------------------------------------------------------- + +export interface RoutingScenarioSummary { + scenario: string; + exposures: number; + wins: number; + directiveWins: number; + staleMisses: number; + topSkills: Array<{ + skill: string; + exposures: number; + wins: number; + directiveWins: number; + staleMisses: number; + }>; +} + +export interface RoutingRecommendation { + scenario: string; + skill: string; + action: "promote" | "demote" | "investigate"; + suggestedBoost: number; + confidence: number; + reason: string; +} + +export interface RoutingReplayReport { + version: 1; + sessionId: string; + traceCount: number; + scenarioCount: number; + scenarios: RoutingScenarioSummary[]; + recommendations: RoutingRecommendation[]; +} + +// --------------------------------------------------------------------------- +// Scenario key builder +// --------------------------------------------------------------------------- + +function buildScenarioKey(exposure: SkillExposure): string { + return [ + exposure.hook, + exposure.storyKind ?? "none", + exposure.targetBoundary ??
"none", + exposure.toolName, + ].join("|"); +} + +// --------------------------------------------------------------------------- +// Skill stats accumulator +// --------------------------------------------------------------------------- + +interface SkillStats { + exposures: number; + wins: number; + directiveWins: number; + staleMisses: number; +} + +function emptyStats(): SkillStats { + return { exposures: 0, wins: 0, directiveWins: 0, staleMisses: 0 }; +} + +// --------------------------------------------------------------------------- +// Recommendation thresholds (bounded; same cutoffs as derivePolicyBoost, but the rate here is unweighted wins / exposures) +// --------------------------------------------------------------------------- + +const PROMOTE_MIN_EXPOSURES = 3; +const PROMOTE_MIN_SUCCESS_RATE = 0.8; +const PROMOTE_BOOST = 8; + +const DEMOTE_MIN_EXPOSURES = 5; +const DEMOTE_MAX_SUCCESS_RATE = 0.15; +const DEMOTE_BOOST = -2; + +const INVESTIGATE_MIN_EXPOSURES = 3; +const INVESTIGATE_MIN_RATE = 0.4; +const INVESTIGATE_MAX_RATE = 0.65; + +// --------------------------------------------------------------------------- +// Core replay +// --------------------------------------------------------------------------- + +export function replayRoutingSession(sessionId: string): RoutingReplayReport { + const log = createLogger(); + + log.summary("replay_start", { sessionId }); + + const traces = readRoutingDecisionTrace(sessionId); + const exposures = loadSessionExposures(sessionId); + + log.debug("replay_loaded", { + sessionId, + traceCount: traces.length, + exposureCount: exposures.length, + }); + + // Group exposures by scenario → skill + const buckets = new Map<string, Map<string, SkillStats>>(); + + // Seed scenario keys from traces so empty scenarios still appear + for (const trace of traces) { + const scenario = trace.policyScenario; + if (scenario && !buckets.has(scenario)) { + buckets.set(scenario, new Map()); + } + } + + // Accumulate exposure outcomes + for (const exposure of exposures) { + const scenario =
buildScenarioKey(exposure); + let bySkill = buckets.get(scenario); + if (!bySkill) { + bySkill = new Map(); + buckets.set(scenario, bySkill); + } + + const current = bySkill.get(exposure.skill) ?? emptyStats(); + current.exposures += 1; + + if (exposure.outcome === "win") { + current.wins += 1; + } else if (exposure.outcome === "directive-win") { + current.wins += 1; + current.directiveWins += 1; + } else if (exposure.outcome === "stale-miss") { + current.staleMisses += 1; + } + // "pending" contributes only to exposure count + + bySkill.set(exposure.skill, current); + } + + // Build deterministic scenario summaries + const scenarios: RoutingScenarioSummary[] = [...buckets.entries()] + .sort(([a], [b]) => a.localeCompare(b)) + .map(([scenario, bySkill]) => { + const topSkills = [...bySkill.entries()] + .map(([skill, stats]) => ({ skill, ...stats })) + .sort( + (a, b) => + b.wins - a.wins || + b.exposures - a.exposures || + a.skill.localeCompare(b.skill), + ); + + return { + scenario, + exposures: topSkills.reduce((n, s) => n + s.exposures, 0), + wins: topSkills.reduce((n, s) => n + s.wins, 0), + directiveWins: topSkills.reduce((n, s) => n + s.directiveWins, 0), + staleMisses: topSkills.reduce((n, s) => n + s.staleMisses, 0), + topSkills, + }; + }); + + // Derive bounded recommendations + const recommendations: RoutingRecommendation[] = []; + + for (const scenario of scenarios) { + for (const skill of scenario.topSkills) { + const successRate = + skill.exposures === 0 ? 
0 : skill.wins / skill.exposures; + + if ( + skill.exposures >= PROMOTE_MIN_EXPOSURES && + successRate >= PROMOTE_MIN_SUCCESS_RATE + ) { + recommendations.push({ + scenario: scenario.scenario, + skill: skill.skill, + action: "promote", + suggestedBoost: PROMOTE_BOOST, + confidence: Math.min(0.99, successRate), + reason: `${skill.wins}/${skill.exposures} wins in ${scenario.scenario}`, + }); + } else if ( + skill.exposures >= DEMOTE_MIN_EXPOSURES && + successRate < DEMOTE_MAX_SUCCESS_RATE + ) { + recommendations.push({ + scenario: scenario.scenario, + skill: skill.skill, + action: "demote", + suggestedBoost: DEMOTE_BOOST, + confidence: 1 - successRate, + reason: `${skill.wins}/${skill.exposures} wins in ${scenario.scenario}`, + }); + } else if ( + skill.exposures >= INVESTIGATE_MIN_EXPOSURES && + successRate >= INVESTIGATE_MIN_RATE && + successRate < INVESTIGATE_MAX_RATE + ) { + recommendations.push({ + scenario: scenario.scenario, + skill: skill.skill, + action: "investigate", + suggestedBoost: 0, + confidence: successRate, + reason: `${skill.wins}/${skill.exposures} mixed results in ${scenario.scenario}`, + }); + } + } + } + + recommendations.sort( + (a, b) => + a.scenario.localeCompare(b.scenario) || + a.skill.localeCompare(b.skill), + ); + + log.summary("replay_complete", { + sessionId, + traceCount: traces.length, + scenarioCount: scenarios.length, + recommendationCount: recommendations.length, + }); + + return { + version: 1, + sessionId, + traceCount: traces.length, + scenarioCount: scenarios.length, + scenarios, + recommendations, + }; +} diff --git a/hooks/src/rule-distillation.mts b/hooks/src/rule-distillation.mts new file mode 100644 index 0000000..3ba1bb3 --- /dev/null +++ b/hooks/src/rule-distillation.mts @@ -0,0 +1,470 @@ +/** + * rule-distillation.mts — Verification-backed rule distiller. 
+ * + * Reads routing decision traces, exposure ledgers, and verification outcomes, + * mines repeated high-precision patterns that predict successful skills, + * and distills them into a deterministic, reviewable rules artifact. + */ + +import type { RoutingDecisionTrace, RankedSkillTrace } from "./routing-decision-trace.mjs"; +import type { SkillExposure } from "./routing-policy-ledger.mjs"; +import type { + RoutingPolicyFile, + RoutingBoundary, + RoutingHookName, + RoutingToolName, +} from "./routing-policy.mjs"; +import { createLogger } from "./logger.mjs"; +import { replayLearnedRules } from "./rule-replay.mjs"; + +// --------------------------------------------------------------------------- +// Types +// --------------------------------------------------------------------------- + +export type LearnedRuleKind = + | "promptPhrase" + | "promptAllOf" + | "promptNoneOf" + | "pathPattern" + | "bashPattern" + | "importPattern" + | "companion"; + +export interface LearnedRoutingRule { + id: string; + skill: string; + kind: LearnedRuleKind; + value: string | string[]; + scenario: { + hook: RoutingHookName | "PostToolUse"; + storyKind: string | null; + targetBoundary: RoutingBoundary | null; + toolName: RoutingToolName; + routeScope: string | null; + }; + support: number; + wins: number; + directiveWins: number; + staleMisses: number; + precision: number; + lift: number; + sourceDecisionIds: string[]; + confidence: "candidate" | "promote" | "holdout-fail"; + promotedAt: string | null; +} + +export interface ReplayResult { + baselineWins: number; + baselineDirectiveWins: number; + learnedWins: number; + learnedDirectiveWins: number; + deltaWins: number; + deltaDirectiveWins: number; + regressions: string[]; +} + +export interface PromotionStatus { + accepted: boolean; + errorCode: string | null; + reason: string; +} + +export interface LearnedRoutingRulesFile { + version: 1; + generatedAt: string; + projectRoot: string; + rules: LearnedRoutingRule[]; + replay: 
ReplayResult; + promotion: PromotionStatus; +} + +// --------------------------------------------------------------------------- +// Scoring primitives +// --------------------------------------------------------------------------- + +export function computeRuleLift(input: { + wins: number; + support: number; + scenarioWins: number; + scenarioExposures: number; +}): number { + const rulePrecision = input.wins / Math.max(input.support, 1); + const scenarioPrecision = + input.scenarioWins / Math.max(input.scenarioExposures, 1); + if (scenarioPrecision === 0) return rulePrecision; + return rulePrecision / scenarioPrecision; +} + +export function classifyRuleConfidence(input: { + support: number; + precision: number; + lift: number; + regressions: number; +}): "candidate" | "promote" | "holdout-fail" { + if (input.regressions > 0) return "holdout-fail"; + if (input.support >= 5 && input.precision >= 0.8 && input.lift >= 1.5) + return "promote"; + if (input.support >= 3 && input.precision >= 0.65 && input.lift >= 1.1) + return "candidate"; + return "holdout-fail"; +} + +// --------------------------------------------------------------------------- +// Internal helpers +// --------------------------------------------------------------------------- + +/** Deterministic scenario key from trace context. */ +function scenarioKeyFromTrace( + trace: RoutingDecisionTrace, +): string { + const story = trace.primaryStory; + return [ + trace.hook, + story.kind ?? "_", + story.targetBoundary ?? "_", + trace.toolName, + story.storyRoute ?? "_", + ].join("|"); +} + +/** Build a scenario descriptor from a trace. */ +function scenarioFromTrace(trace: RoutingDecisionTrace): LearnedRoutingRule["scenario"] { + const story = trace.primaryStory; + return { + hook: trace.hook as LearnedRoutingRule["scenario"]["hook"], + storyKind: story.kind ?? null, + targetBoundary: (story.targetBoundary as RoutingBoundary) ?? 
null, + toolName: trace.toolName as RoutingToolName, + routeScope: story.storyRoute ?? null, + }; +} + +/** Infer a rule kind from a ranked skill's pattern info. */ +function inferRuleKind(ranked: RankedSkillTrace, hook: string): LearnedRuleKind { + if (!ranked.pattern) { + return hook === "UserPromptSubmit" ? "promptPhrase" : "pathPattern"; + } + switch (ranked.pattern.type) { + case "path": + case "pathPattern": + return "pathPattern"; + case "bash": + case "bashPattern": + return "bashPattern"; + case "import": + case "importPattern": + return "importPattern"; + case "prompt": + case "promptPhrase": + return "promptPhrase"; + case "promptAllOf": + return "promptAllOf"; + case "promptNoneOf": + return "promptNoneOf"; + case "companion": + return "companion"; + default: + return hook === "UserPromptSubmit" ? "promptPhrase" : "pathPattern"; + } +} + +/** Extract the pattern value for a rule. */ +function extractPatternValue(ranked: RankedSkillTrace, trace: RoutingDecisionTrace): string | string[] { + if (ranked.pattern?.value) return ranked.pattern.value; + // For prompt hooks without explicit pattern, use the tool target as a proxy + if (trace.hook === "UserPromptSubmit") return trace.toolTarget || ""; + return trace.toolTarget || ""; +} + +/** + * Composite key for grouping exposures into candidate rules. + * Combines scenario + skill + kind + pattern value for uniqueness. + */ +function candidateKey( + scenarioKey: string, + skill: string, + kind: LearnedRuleKind, + value: string | string[], +): string { + const v = Array.isArray(value) ? 
value.join(",") : value; + return `${scenarioKey}|${skill}|${kind}|${v}`; +} + +// --------------------------------------------------------------------------- +// Candidate accumulator +// --------------------------------------------------------------------------- + +interface CandidateAccumulator { + skill: string; + kind: LearnedRuleKind; + value: string | string[]; + scenario: LearnedRoutingRule["scenario"]; + scenarioKey: string; + support: number; + wins: number; + directiveWins: number; + staleMisses: number; + sourceDecisionIds: string[]; +} + +// --------------------------------------------------------------------------- +// Core distillation +// --------------------------------------------------------------------------- + +export interface DistillRulesParams { + projectRoot: string; + traces: RoutingDecisionTrace[]; + exposures: SkillExposure[]; + policy: RoutingPolicyFile; + minSupport?: number; + minPrecision?: number; + minLift?: number; + /** Override timestamp for deterministic output in tests. */ + generatedAt?: string; +} + +export function distillRulesFromTrace(params: DistillRulesParams): LearnedRoutingRulesFile { + const { + projectRoot, + traces, + exposures, + policy, + minSupport = 5, + minPrecision = 0.8, + minLift = 1.5, + generatedAt = new Date().toISOString(), + } = params; + + const logger = createLogger("summary"); + + logger.summary("distill_start", { + traceCount: traces.length, + exposureCount: exposures.length, + minSupport, + minPrecision, + minLift, + }); + + // Index exposures by decisionId-like key (sessionId + skill + hook) + const exposureByKey = new Map(); + for (const exp of exposures) { + const key = `${exp.sessionId}|${exp.skill}|${exp.hook}|${exp.route ?? 
"_"}`; + exposureByKey.set(key, exp); + } + + // Phase 1: Extract candidates from traces + const candidates = new Map<string, CandidateAccumulator>(); + + // Scenario-level aggregate counters (for lift computation) + const scenarioExposureCounts = new Map<string, number>(); + const scenarioWinCounts = new Map<string, number>(); + + for (const trace of traces) { + const sKey = scenarioKeyFromTrace(trace); + const scenario = scenarioFromTrace(trace); + + for (const ranked of trace.ranked) { + // Only consider skills that were actually injected (not dropped) + if (ranked.droppedReason) continue; + + // Find corresponding exposure + const expKey = `${trace.sessionId}|${ranked.skill}|${trace.hook}|${trace.primaryStory.storyRoute ?? "_"}`; + const exposure = exposureByKey.get(expKey); + + // Only attribute from verified evidence + if (!exposure) continue; + // Only count candidate-role exposures for causal credit + if (exposure.attributionRole !== "candidate") continue; + + const kind = inferRuleKind(ranked, trace.hook); + const value = extractPatternValue(ranked, trace); + const cKey = candidateKey(sKey, ranked.skill, kind, value); + + let acc = candidates.get(cKey); + if (!acc) { + acc = { + skill: ranked.skill, + kind, + value, + scenario, + scenarioKey: sKey, + support: 0, + wins: 0, + directiveWins: 0, + staleMisses: 0, + sourceDecisionIds: [], + }; + candidates.set(cKey, acc); + } + + acc.support++; + acc.sourceDecisionIds.push(trace.decisionId); + + // Track scenario totals + scenarioExposureCounts.set( + sKey, + (scenarioExposureCounts.get(sKey) ?? 0) + 1, + ); + + if (exposure.outcome === "win" || exposure.outcome === "directive-win") { + scenarioWinCounts.set(sKey, (scenarioWinCounts.get(sKey) ??
0) + 1); + } + + switch (exposure.outcome) { + case "win": + acc.wins++; + break; + case "directive-win": + acc.wins++; + acc.directiveWins++; + break; + case "stale-miss": + acc.staleMisses++; + break; + } + } + } + + logger.summary("distill_candidates_extracted", { + candidateCount: candidates.size, + scenarioCount: scenarioExposureCounts.size, + }); + + // Phase 2: Score and classify each candidate + const rules: LearnedRoutingRule[] = []; + + for (const acc of candidates.values()) { + const precision = acc.wins / Math.max(acc.support, 1); + const scenarioWins = scenarioWinCounts.get(acc.scenarioKey) ?? 0; + const scenarioExposures = scenarioExposureCounts.get(acc.scenarioKey) ?? 0; + + const lift = computeRuleLift({ + wins: acc.wins, + support: acc.support, + scenarioWins, + scenarioExposures, + }); + + // No regressions at distillation time — replay gate handles that + const confidence = classifyRuleConfidence({ + support: acc.support, + precision, + lift, + regressions: 0, + }); + + const ruleId = `${acc.kind}:${acc.skill}:${Array.isArray(acc.value) ? acc.value.join("+") : acc.value}`; + + // Sort sourceDecisionIds for determinism + const sortedIds = [...acc.sourceDecisionIds].sort(); + + rules.push({ + id: ruleId, + skill: acc.skill, + kind: acc.kind, + value: acc.value, + scenario: acc.scenario, + support: acc.support, + wins: acc.wins, + directiveWins: acc.directiveWins, + staleMisses: acc.staleMisses, + precision: Number(precision.toFixed(4)), + lift: Number(lift.toFixed(4)), + sourceDecisionIds: sortedIds, + confidence, + promotedAt: confidence === "promote" ? 
generatedAt : null, + }); + } + + logger.summary("distill_scoring_complete", { + totalRules: rules.length, + promoted: rules.filter((r) => r.confidence === "promote").length, + candidate: rules.filter((r) => r.confidence === "candidate").length, + holdoutFail: rules.filter((r) => r.confidence === "holdout-fail").length, + }); + + // Phase 3: Sort deterministically — by scenario key, then skill, then rule id + rules.sort((a, b) => { + const scenarioA = [a.scenario.hook, a.scenario.storyKind ?? "_", a.scenario.targetBoundary ?? "_", a.scenario.toolName, a.scenario.routeScope ?? "_"].join("|"); + const scenarioB = [b.scenario.hook, b.scenario.storyKind ?? "_", b.scenario.targetBoundary ?? "_", b.scenario.toolName, b.scenario.routeScope ?? "_"].join("|"); + const sc = scenarioA.localeCompare(scenarioB); + if (sc !== 0) return sc; + const sk = a.skill.localeCompare(b.skill); + if (sk !== 0) return sk; + return a.id.localeCompare(b.id); + }); + + // Phase 4: Replay gate + const replay = replayLearnedRules({ traces, rules }); + + // Determine promotion status + let promotion: PromotionStatus; + const rejected = replay.regressions.length > 0 || replay.learnedWins < replay.baselineWins; + + if (rejected) { + // Downgrade promoted rules + for (const rule of rules) { + if (rule.confidence === "promote") { + rule.confidence = "holdout-fail"; + rule.promotedAt = null; + } + } + + const reasons: string[] = []; + if (replay.regressions.length > 0) { + reasons.push(`${replay.regressions.length} regression(s) detected`); + } + if (replay.learnedWins < replay.baselineWins) { + reasons.push(`learned wins (${replay.learnedWins}) < baseline wins (${replay.baselineWins})`); + } + + promotion = { + accepted: false, + errorCode: "RULEBOOK_PROMOTION_REJECTED_REGRESSION", + reason: `Promotion rejected: ${reasons.join("; ")}`, + }; + + logger.summary("distill_promotion_rejected", { + errorCode: promotion.errorCode, + reason: promotion.reason, + regressions: replay.regressions.length, + 
learnedWins: replay.learnedWins, + baselineWins: replay.baselineWins, + }); + } else { + const promotedCount = rules.filter((r) => r.confidence === "promote").length; + promotion = { + accepted: true, + errorCode: null, + reason: `Promotion accepted: ${promotedCount} rule(s) promoted, ${replay.learnedWins} learned wins, 0 regressions`, + }; + + logger.summary("distill_promotion_accepted", { + promotedCount, + learnedWins: replay.learnedWins, + baselineWins: replay.baselineWins, + }); + } + + logger.summary("distill_complete", { + ruleCount: rules.length, + replayDelta: replay.deltaWins, + regressions: replay.regressions.length, + promotionAccepted: promotion.accepted, + }); + + return { + version: 1, + generatedAt, + projectRoot, + rules, + replay, + promotion, + }; +} + +// --------------------------------------------------------------------------- +// Re-export replayLearnedRules from rule-replay module for backward compat +// --------------------------------------------------------------------------- + +export { replayLearnedRules } from "./rule-replay.mjs"; diff --git a/hooks/src/rule-replay.mts b/hooks/src/rule-replay.mts new file mode 100644 index 0000000..442ea30 --- /dev/null +++ b/hooks/src/rule-replay.mts @@ -0,0 +1,177 @@ +/** + * rule-replay.mts — Deterministic replay gate for learned routing rules. + * + * Replays historical routing decision traces against baseline (existing) + * routing vs learned (promoted) routing rules. Blocks promotion when any + * trace that succeeded under baseline would regress under learned rules. + * + * Contract: + * - Pure function: no file I/O, no project reads, no side effects beyond logging. + * - Deterministic: identical inputs produce identical output, including + * regression ordering (sorted by decisionId). + * - Machine-readable: ReplayResult is structured JSON. 
+ */ + +import type { RoutingDecisionTrace } from "./routing-decision-trace.mjs"; +import type { LearnedRoutingRule, ReplayResult } from "./rule-distillation.mjs"; +import { createLogger } from "./logger.mjs"; + +// --------------------------------------------------------------------------- +// Re-export ReplayResult so consumers can import from this module directly +// --------------------------------------------------------------------------- + +export type { ReplayResult } from "./rule-distillation.mjs"; + +// --------------------------------------------------------------------------- +// Internal: scenario key from a trace (mirrors rule-distillation.mts) +// --------------------------------------------------------------------------- + +function scenarioKeyFromTrace(trace: RoutingDecisionTrace): string { + const story = trace.primaryStory; + return [ + trace.hook, + story.kind ?? "_", + story.targetBoundary ?? "_", + trace.toolName, + story.storyRoute ?? "_", + ].join("|"); +} + +// --------------------------------------------------------------------------- +// Internal: scenario key from a rule +// --------------------------------------------------------------------------- + +function scenarioKeyFromRule(rule: LearnedRoutingRule): string { + return [ + rule.scenario.hook, + rule.scenario.storyKind ?? "_", + rule.scenario.targetBoundary ?? "_", + rule.scenario.toolName, + rule.scenario.routeScope ?? "_", + ].join("|"); +} + +// --------------------------------------------------------------------------- +// Core replay +// --------------------------------------------------------------------------- + +/** + * Replay historical traces against baseline vs learned routing. + * + * For each trace: + * - **Baseline win**: verification succeeded and at least one skill was injected. 
+ * - **Learned win**: either (a) no promoted rules target this scenario so + * baseline carries through, or (b) at least one promoted rule's skill + * overlaps with the trace's injected skills. + * - **Regression**: baseline won but promoted rules exist for this scenario + * and none of them cover the injected winning skill. + * + * Only rules with `confidence === "promote"` participate. + */ +export function replayLearnedRules(params: { + traces: RoutingDecisionTrace[]; + rules: LearnedRoutingRule[]; +}): ReplayResult { + const { traces, rules } = params; + const logger = createLogger("summary"); + + logger.summary("replay_start", { + traceCount: traces.length, + ruleCount: rules.length, + promotedCount: rules.filter((r) => r.confidence === "promote").length, + }); + + // Build promoted-skill set per scenario key + const promotedByScenario = new Map<string, Set<string>>(); + for (const rule of rules) { + if (rule.confidence !== "promote") continue; + const sKey = scenarioKeyFromRule(rule); + let skills = promotedByScenario.get(sKey); + if (!skills) { + skills = new Set<string>(); + promotedByScenario.set(sKey, skills); + } + skills.add(rule.skill); + } + + let baselineWins = 0; + let baselineDirectiveWins = 0; + let learnedWins = 0; + let learnedDirectiveWins = 0; + const regressions: string[] = []; + + for (const trace of traces) { + const sKey = scenarioKeyFromTrace(trace); + const promotedSkills = promotedByScenario.get(sKey); + + // Verified success requires an observed verification outcome and at least + // one injected skill. PreToolUse may write a placeholder verification + // object before PostToolUse observation arrives; those pending traces must + // not count as successes. + const verifiedSuccess = + trace.verification?.observedBoundary != null && + trace.injectedSkills.length > 0; + + // Directive adherence is the stricter subset where the suggested action + // matched on an observed verification.
+ const directiveAdherent = + verifiedSuccess && + trace.verification?.matchedSuggestedAction === true; + + if (verifiedSuccess) baselineWins++; + if (directiveAdherent) baselineDirectiveWins++; + + if (promotedSkills) { + // Learned rules exist for this scenario + const learnedOverlap = trace.injectedSkills.some((s) => + promotedSkills.has(s), + ); + + if (verifiedSuccess && !learnedOverlap) { + // Baseline won but promoted rules don't cover the winning skill. + // This is a regression: learned rules would displace the winner. + regressions.push(trace.decisionId); + logger.summary("replay_regression", { + decisionId: trace.decisionId, + scenario: sKey, + injectedSkills: trace.injectedSkills, + promotedSkills: [...promotedSkills], + }); + } else if (learnedOverlap) { + // Promoted rule covers an injected skill — learned win + learnedWins++; + if (directiveAdherent) learnedDirectiveWins++; + } + } else if (verifiedSuccess) { + // No promoted rules for this scenario — baseline win carries through + learnedWins++; + if (directiveAdherent) learnedDirectiveWins++; + } + } + + // Sort regressions for deterministic output + regressions.sort(); + + const result: ReplayResult = { + baselineWins, + baselineDirectiveWins, + learnedWins, + learnedDirectiveWins, + deltaWins: learnedWins - baselineWins, + deltaDirectiveWins: learnedDirectiveWins - baselineDirectiveWins, + regressions, + }; + + logger.summary("replay_complete", { + baselineWins: result.baselineWins, + baselineDirectiveWins: result.baselineDirectiveWins, + learnedWins: result.learnedWins, + learnedDirectiveWins: result.learnedDirectiveWins, + deltaWins: result.deltaWins, + deltaDirectiveWins: result.deltaDirectiveWins, + regressionCount: result.regressions.length, + regressionIds: result.regressions, + }); + + return result; +} diff --git a/hooks/src/session-end-cleanup.mts b/hooks/src/session-end-cleanup.mts index ffbd200..adeec3e 100644 --- a/hooks/src/session-end-cleanup.mts +++ 
b/hooks/src/session-end-cleanup.mts @@ -10,6 +10,7 @@ import { readdirSync, readFileSync, rmSync, unlinkSync, writeFileSync } from "no import { homedir, tmpdir } from "node:os"; import { join, resolve } from "node:path"; import { fileURLToPath } from "node:url"; +import { finalizeStaleExposures } from "./routing-policy-ledger.mjs"; type SessionEndHookInput = { session_id?: string; @@ -84,6 +85,13 @@ function main(): void { const tempRoot = tmpdir(); const prefix = `vercel-plugin-${tempSessionIdSegment(sessionId)}-`; + // Finalize any pending routing policy exposures before deleting temp files + try { + finalizeStaleExposures(sessionId, new Date().toISOString()); + } catch { + // Best-effort: don't block cleanup on policy finalization failure + } + // Glob all session-scoped temp entries (main + agent-scoped claim dirs, files, profile cache) let entries: string[] = []; try { diff --git a/hooks/src/subagent-start-bootstrap.mts b/hooks/src/subagent-start-bootstrap.mts index 2cb0404..aa1475a 100644 --- a/hooks/src/subagent-start-bootstrap.mts +++ b/hooks/src/subagent-start-bootstrap.mts @@ -25,6 +25,17 @@ import { compilePromptSignals, matchPromptWithReason, normalizePromptText } from import { loadSkills } from "./pretooluse-skill-inject.mjs"; import { extractFrontmatter } from "./skill-map-frontmatter.mjs"; import { claimPendingLaunch } from "./subagent-state.mjs"; +import { + computePlan, + loadCachedPlanResult, + selectActiveStory, + type VerificationPlanResult, +} from "./verification-plan.mjs"; +import { + buildVerificationDirective, + buildVerificationEnv, + type VerificationDirective, +} from "./verification-directive.mjs"; const PLUGIN_ROOT = resolvePluginRoot(); @@ -209,6 +220,144 @@ function resolveLikelySkillsFromPendingLaunch( } } +// --------------------------------------------------------------------------- +// Verification plan resolution (cached → fresh fallback) +// --------------------------------------------------------------------------- + +/** + * 
Resolve the verification plan for a session, trying the cached state first + * and falling back to a fresh computation from the ledger when the cache is + * missing or empty but ledger data exists. + */ +function resolveVerificationPlan( + sessionId: string | undefined, +): VerificationPlanResult | null { + if (!sessionId) return null; + + try { + const cached = loadCachedPlanResult(sessionId); + if (cached?.hasStories) { + log.debug("subagent-start-bootstrap:verification-plan-cached", { sessionId }); + return cached; + } + log.debug("subagent-start-bootstrap:verification-plan-cache-miss", { sessionId }); + } catch (error) { + logCaughtError(log, "subagent-start-bootstrap:verification-plan-cache-failed", error, { + sessionId, + }); + } + + try { + const fresh = computePlan(sessionId, { + agentBrowserAvailable: process.env.VERCEL_PLUGIN_AGENT_BROWSER_AVAILABLE !== "0", + lastAttemptedAction: process.env.VERCEL_PLUGIN_VERIFICATION_ACTION || null, + }); + if (fresh.hasStories) { + log.debug("subagent-start-bootstrap:verification-plan-fresh", { sessionId }); + return fresh; + } + log.debug("subagent-start-bootstrap:verification-plan-empty", { sessionId }); + } catch (error) { + logCaughtError(log, "subagent-start-bootstrap:verification-plan-fresh-failed", error, { + sessionId, + }); + } + + return null; +} + +// --------------------------------------------------------------------------- +// Verification context scoping +// --------------------------------------------------------------------------- + +/** + * Format a scoped verification context snippet from a resolved plan. + * Uses deterministic story selection via selectPrimaryStory. 
+ * + * - minimal: story kind + route only + * - light: story + missing boundaries + candidate actions + * - standard: story + full primary action + evidence summary + */ +function buildVerificationContextFromPlan( + plan: VerificationPlanResult, + category: BudgetCategory, +): string | null { + if (!plan.hasStories || plan.stories.length === 0) return null; + + const story = selectActiveStory(plan); + if (!story) return null; + + const routePart = story.route ? ` (${story.route})` : ""; + + switch (category) { + case "minimal": { + return [ + ``, + `Verification story: ${story.kind}${routePart}`, + ``, + ].join("\n"); + } + case "light": { + const lines: string[] = [ + ``, + `Verification story: ${story.kind}${routePart} — "${story.promptExcerpt}"`, + ]; + if (plan.missingBoundaries.length > 0) { + lines.push(`Missing boundaries: ${plan.missingBoundaries.join(", ")}`); + } + if (plan.primaryNextAction) { + lines.push(`Candidate action: ${plan.primaryNextAction.action}`); + } + if (plan.blockedReasons.length > 0) { + lines.push(`Blocked: ${plan.blockedReasons[0]}`); + } + lines.push(``); + return lines.join("\n"); + } + case "standard": { + const lines: string[] = [ + ``, + `Verification story: ${story.kind}${routePart} — "${story.promptExcerpt}"`, + `Evidence: ${plan.satisfiedBoundaries.length}/4 boundaries [${plan.satisfiedBoundaries.join(", ") || "none"}]`, + ]; + if (plan.missingBoundaries.length > 0) { + lines.push(`Missing: ${plan.missingBoundaries.join(", ")}`); + } + if (plan.primaryNextAction) { + lines.push(`Primary action: \`${plan.primaryNextAction.action}\``); + lines.push(`Reason: ${plan.primaryNextAction.reason}`); + } + if (plan.blockedReasons.length > 0) { + for (const reason of plan.blockedReasons) { + lines.push(`Blocked: ${reason}`); + } + } + if (plan.recentRoutes.length > 0) { + lines.push(`Recent routes: ${plan.recentRoutes.join(", ")}`); + } + lines.push(``); + return lines.join("\n"); + } + } +} + +/** + * Load verification plan for the 
session and format a scoped snippet + * appropriate for the agent type's budget category. + * + * Uses resolveVerificationPlan for cached→fresh fallback, and + * selectPrimaryStory for deterministic story selection. + * + * Returns null if no verification plan exists or no stories are active. + */ +function buildVerificationContext( + sessionId: string | undefined, + category: BudgetCategory, +): string | null { + const plan = resolveVerificationPlan(sessionId); + return plan ? buildVerificationContextFromPlan(plan, category) : null; +} + // --------------------------------------------------------------------------- // Context assembly // --------------------------------------------------------------------------- @@ -221,10 +370,21 @@ function profileLine(agentType: string, likelySkills: string[]): string { * Build minimal context (~1KB): project profile + skill name list. * Used for Explore agents that only need orientation. */ -function buildMinimalContext(agentType: string, likelySkills: string[]): string { +function buildMinimalContext(agentType: string, likelySkills: string[], sessionId?: string): string { const parts: string[] = []; parts.push(``); parts.push(profileLine(agentType, likelySkills)); + + // Append verification context if present (minimal: story + route) + const verificationCtx = buildVerificationContext(sessionId, "minimal"); + if (verificationCtx) { + const verBytes = Buffer.byteLength(verificationCtx, "utf8"); + const currentBytes = Buffer.byteLength(parts.join("\n"), "utf8"); + if (currentBytes + verBytes + 50 <= MINIMAL_BUDGET_BYTES) { + parts.push(verificationCtx); + } + } + parts.push(""); return parts.join("\n"); } @@ -233,7 +393,7 @@ function buildMinimalContext(agentType: string, likelySkills: string[]): string * Build light context (~3KB): profile + skill summaries + deployment constraints. * Used for Plan agents that need enough context to architect solutions. 
*/ -function buildLightContext(agentType: string, likelySkills: string[], budgetBytes: number): string { +function buildLightContext(agentType: string, likelySkills: string[], budgetBytes: number, sessionId?: string): string { const parts: string[] = []; parts.push(``); parts.push(profileLine(agentType, likelySkills)); @@ -269,6 +429,16 @@ function buildLightContext(agentType: string, likelySkills: string[], budgetByte usedBytes += lineBytes + 1; } + // Append verification context if present (light: story + missing boundaries + candidates) + const verificationCtx = buildVerificationContext(sessionId, "light"); + if (verificationCtx) { + const verBytes = Buffer.byteLength(verificationCtx, "utf8"); + if (usedBytes + verBytes + 1 <= budgetBytes) { + parts.push(verificationCtx); + usedBytes += verBytes + 1; + } + } + parts.push(""); return parts.join("\n"); } @@ -277,7 +447,7 @@ function buildLightContext(agentType: string, likelySkills: string[], budgetByte * Build standard context (~8KB): profile + top skill full bodies. * Used for general-purpose agents that need actionable skill content. 
  */
-function buildStandardContext(agentType: string, likelySkills: string[], budgetBytes: number): string {
+function buildStandardContext(agentType: string, likelySkills: string[], budgetBytes: number, sessionId?: string): string {
   const parts: string[] = [];
   parts.push(``);
   parts.push(profileLine(agentType, likelySkills));
@@ -316,6 +486,16 @@ function buildStandardContext(agentType: string, likelySkills: string[], budgetB
     }
   }

+  // Append verification context if present (standard: full evidence + primary action)
+  const verificationCtx = buildVerificationContext(sessionId, "standard");
+  if (verificationCtx) {
+    const verBytes = Buffer.byteLength(verificationCtx, "utf8");
+    if (usedBytes + verBytes + 1 <= budgetBytes) {
+      parts.push(verificationCtx);
+      usedBytes += verBytes + 1;
+    }
+  }
+
   parts.push("");
   return parts.join("\n");
 }
@@ -349,13 +529,13 @@ function main(): void {
   let context: string;
   switch (category) {
     case "minimal":
-      context = buildMinimalContext(agentType, likelySkills);
+      context = buildMinimalContext(agentType, likelySkills, sessionId);
       break;
     case "light":
-      context = buildLightContext(agentType, likelySkills, maxBytes);
+      context = buildLightContext(agentType, likelySkills, maxBytes, sessionId);
       break;
     case "standard":
-      context = buildStandardContext(agentType, likelySkills, maxBytes);
+      context = buildStandardContext(agentType, likelySkills, maxBytes, sessionId);
       break;
   }

@@ -385,6 +565,11 @@ function main(): void {
   const pendingLaunchMatched = likelySkills.length !== profilerLikelySkills.length ||
     likelySkills.some((s) => !profilerLikelySkills.includes(s));

+  // Build verification directive for downstream hooks
+  const verificationPlan = resolveVerificationPlan(sessionId);
+  const verificationDirective = buildVerificationDirective(verificationPlan);
+  const verificationEnv = buildVerificationEnv(verificationDirective);
+
   log.summary("subagent-start-bootstrap:complete", {
     agent_id: agentId,
     agent_type: agentType,
@@ -393,13 +578,16 @@ function main(): void {
     budget_max: maxBytes,
     budget_category: category,
     pending_launch_matched: pendingLaunchMatched,
+    verification_directive: verificationDirective !== null,
+    verification_env_keys: Object.keys(verificationEnv),
   });

-  const output: SyncHookJSONOutput = {
+  const output: SyncHookJSONOutput & { env?: Record } = {
     hookSpecificOutput: {
       hookEventName: "SubagentStart",
       additionalContext: context,
     },
+    ...(Object.keys(verificationEnv).length > 0 ? { env: verificationEnv } : {}),
   };

   process.stdout.write(JSON.stringify(output));
@@ -421,7 +609,14 @@ export {
   buildMinimalContext,
   buildLightContext,
   buildStandardContext,
+  buildVerificationContext,
+  buildVerificationContextFromPlan,
+  buildVerificationDirective,
+  buildVerificationEnv,
+  resolveVerificationPlan,
   getLikelySkills,
+  resolveBudgetCategory,
   main,
 };
+export type { VerificationDirective } from "./verification-directive.mjs";
 export type { SubagentStartInput, ProfileCache, BudgetCategory };
diff --git a/hooks/src/user-prompt-submit-skill-inject.mts b/hooks/src/user-prompt-submit-skill-inject.mts
index c3dfb3a..0bed61c 100644
--- a/hooks/src/user-prompt-submit-skill-inject.mts
+++ b/hooks/src/user-prompt-submit-skill-inject.mts
@@ -48,6 +48,29 @@ import type { PromptAnalysisReport } from "./prompt-analysis.mjs";
 import { createLogger, logDecision } from "./logger.mjs";
 import type { Logger } from "./logger.mjs";
 import { trackBaseEvents } from "./telemetry.mjs";
+import { loadCachedPlanResult } from "./verification-plan.mjs";
+import { resolvePromptVerificationBinding } from "./prompt-verification-binding.mjs";
+import { applyPolicyBoosts, applyRulebookBoosts } from "./routing-policy.mjs";
+import type { RoutingHookName, RoutingToolName, RulebookBoostExplanation } from "./routing-policy.mjs";
+import {
+  appendSkillExposure,
+  loadProjectRoutingPolicy,
+} from "./routing-policy-ledger.mjs";
+import { loadRulebook, rulebookPath } from "./learned-routing-rulebook.mjs";
+import { applyPromptPolicyRecall } from "./prompt-policy-recall.mjs";
+import { recallVerifiedCompanions } from "./companion-recall.mjs";
+import { recallVerifiedPlaybook } from "./playbook-recall.mjs";
+import { buildAttributionDecision } from "./routing-attribution.mjs";
+import {
+  appendRoutingDecisionTrace,
+  createDecisionId,
+} from "./routing-decision-trace.mjs";
+import type { RoutingDecisionTrace } from "./routing-decision-trace.mjs";
+import {
+  buildDecisionCapsule,
+  buildDecisionCapsuleEnv,
+  persistDecisionCapsule,
+} from "./routing-decision-capsule.mjs";

 const MAX_SKILLS = 2;
 const DEFAULT_INJECTION_BUDGET_BYTES = 8_000;
@@ -1042,12 +1065,89 @@ export function run(): string {
     });
   }

-  // No matches at all
-  const allMatched = Object.entries(report.perSkillResults)
+  // Stage 3c: Resolve prompt verification binding before early returns
+  // so policy recall can rescue zero-match scenarios
+  const promptPlan = sessionId ? loadCachedPlanResult(sessionId, log) : null;
+  const promptBinding = resolvePromptVerificationBinding({ plan: promptPlan });
+  log.debug("prompt-verification-binding", {
+    source: promptBinding.source,
+    storyId: promptBinding.storyId,
+    targetBoundary: promptBinding.targetBoundary,
+    confidence: promptBinding.confidence,
+    reason: promptBinding.reason,
+  });
+
+  let matchedSkills = Object.entries(report.perSkillResults)
     .filter(([, r]) => r.matched)
     .map(([skill]) => skill);

-  if (allMatched.length === 0) {
+  const promptPolicy = cwd ? loadProjectRoutingPolicy(cwd) : null;
+  const promptPolicyRecallSynthetic = new Set();
+  const promptPolicyRecallReasons: Record = {};
+
+  if (promptPolicy && promptBinding.storyId && promptBinding.targetBoundary) {
+    const recall = applyPromptPolicyRecall({
+      selectedSkills: report.selectedSkills,
+      matchedSkills,
+      seenSkills: dedupOff ? [] : parseSeenSkills(seenState),
+      maxSkills: MAX_SKILLS,
+      binding: {
+        storyId: promptBinding.storyId,
+        storyKind: promptBinding.storyKind,
+        route: promptBinding.route,
+        targetBoundary: promptBinding.targetBoundary,
+      },
+      policy: promptPolicy,
+    });
+
+    report.selectedSkills.length = 0;
+    report.selectedSkills.push(...recall.selectedSkills);
+    matchedSkills = recall.matchedSkills;
+    for (const skill of recall.syntheticSkills) {
+      promptPolicyRecallSynthetic.add(skill);
+    }
+    Object.assign(promptPolicyRecallReasons, recall.reasons);
+
+    if (recall.diagnosis) {
+      log.debug("prompt-policy-recall-lookup", {
+        requestedScenario: `UserPromptSubmit|${promptBinding.storyKind ?? "none"}|` +
+          `${promptBinding.targetBoundary ?? "none"}|Prompt|${promptBinding.route ?? "*"}`,
+        checkedScenarios: recall.diagnosis.checkedScenarios,
+        selectedBucket: recall.diagnosis.selectedBucket,
+        selectedSkills: recall.diagnosis.selected.map((c) => c.skill),
+        rejected: recall.diagnosis.rejected.map((c) => ({
+          skill: c.skill,
+          scenario: c.scenario,
+          exposures: c.exposures,
+          successRate: c.successRate,
+          policyBoost: c.policyBoost,
+          excluded: c.excluded,
+          rejectedReason: c.rejectedReason,
+        })),
+        hintCodes: recall.diagnosis.hints.map((h) => h.code),
+      });
+      for (const candidate of recall.diagnosis.selected) {
+        log.debug("prompt-policy-recall-injected", {
+          skill: candidate.skill,
+          scenario: candidate.scenario,
+          exposures: candidate.exposures,
+          wins: candidate.wins,
+          directiveWins: candidate.directiveWins,
+          successRate: candidate.successRate,
+          policyBoost: candidate.policyBoost,
+          recallScore: candidate.recallScore,
+        });
+      }
+    }
+  } else if (cwd) {
+    log.debug("prompt-policy-recall-skipped", {
+      reason: !promptBinding.storyId
+        ? "no_active_verification_story"
+        : "no_target_boundary",
+    });
+  }
+
+  if (matchedSkills.length === 0) {
     log.debug("prompt-analysis-issue", {
       issue: "no_prompt_matches",
       evaluatedSkills: Object.keys(report.perSkillResults),
@@ -1059,21 +1159,240 @@ export function run(): string {
     return formatEmptyOutput(platform, finalizePromptEnvUpdates(platform, promptEnvBefore));
   }

-  // All matched but filtered by dedup
   if (report.selectedSkills.length === 0) {
     log.debug("prompt-analysis-issue", {
       issue: "all_deduped",
-      matchedSkills: allMatched,
+      matchedSkills,
       seenSkills: report.dedupState.seenSkills,
       dedupStrategy: report.dedupState.strategy,
     });
     log.complete("all_deduped", {
-      matchedCount: allMatched.length,
-      dedupedCount: allMatched.length,
+      matchedCount: matchedSkills.length,
+      dedupedCount: matchedSkills.length,
     }, log.active ? timing : null);
     return formatEmptyOutput(platform, finalizePromptEnvUpdates(platform, promptEnvBefore));
   }

+  // Stage 3d: Apply routing-policy boosts to reorder selected skills
+  const promptPolicyBoosted: Array<{ skill: string; boost: number; reason: string | null }> = [];
+  if (promptPolicy && report.selectedSkills.length > 0 && promptBinding.storyId && promptBinding.targetBoundary) {
+    const promptPolicyScenario = {
+      hook: "UserPromptSubmit" as RoutingHookName,
+      storyKind: promptBinding.storyKind,
+      targetBoundary: promptBinding.targetBoundary,
+      toolName: "Prompt" as RoutingToolName,
+    };
+    const rankable = report.selectedSkills.map((skill) => {
+      const r = report.perSkillResults[skill];
+      return {
+        skill,
+        priority: r?.score ?? 0,
+        effectivePriority: r?.score ?? 0,
+      };
+    });
+    const boosted = applyPolicyBoosts(rankable, promptPolicy, promptPolicyScenario);
+
+    // Re-sort selected skills by boosted effective priority (desc), then skill name (asc) for determinism
+    boosted.sort((a, b) =>
+      b.effectivePriority - a.effectivePriority || a.skill.localeCompare(b.skill),
+    );
+    report.selectedSkills.length = 0;
+    report.selectedSkills.push(...boosted.map((b) => b.skill));
+
+    for (const b of boosted) {
+      if (b.policyBoost !== 0) {
+        promptPolicyBoosted.push({
+          skill: b.skill,
+          boost: b.policyBoost,
+          reason: b.policyReason,
+        });
+      }
+    }
+
+    if (promptPolicyBoosted.length > 0) {
+      log.debug("prompt-policy-boosted", {
+        scenario: `${promptPolicyScenario.hook}|${promptPolicyScenario.storyKind ?? "none"}|${promptPolicyScenario.targetBoundary}|Prompt`,
+        boostedSkills: promptPolicyBoosted,
+      });
+    }
+  } else if (cwd && report.selectedSkills.length > 0) {
+    log.debug("prompt-policy-boost-skipped", {
+      reason: !promptBinding.storyId
+        ? "no_active_verification_story"
+        : "no_target_boundary",
+    });
+  }
+
+  // Stage 3e: Apply learned-rulebook boosts with precedence over stats-policy
+  const promptRulebookBoosted: RulebookBoostExplanation[] = [];
+  if (cwd && report.selectedSkills.length > 0 && promptBinding.storyId && promptBinding.targetBoundary) {
+    const rbResult = loadRulebook(cwd);
+    if (rbResult.ok && rbResult.rulebook.rules.length > 0) {
+      const rbScenario = {
+        hook: "UserPromptSubmit" as RoutingHookName,
+        storyKind: promptBinding.storyKind,
+        targetBoundary: promptBinding.targetBoundary,
+        toolName: "Prompt" as RoutingToolName,
+      };
+      const rbPath = rulebookPath(cwd);
+      const rankable = report.selectedSkills.map((skill) => {
+        const r = report.perSkillResults[skill];
+        const pb = promptPolicyBoosted.find((p) => p.skill === skill);
+        return {
+          skill,
+          priority: r?.score ?? 0,
+          effectivePriority: (r?.score ?? 0) + (pb?.boost ?? 0),
+          policyBoost: pb?.boost ?? 0,
+          policyReason: pb?.reason ?? null,
+        };
+      });
+      const withRulebook = applyRulebookBoosts(rankable, rbResult.rulebook, rbScenario, rbPath);
+
+      // Re-sort by effective priority after rulebook application
+      withRulebook.sort((a, b) =>
+        b.effectivePriority - a.effectivePriority || a.skill.localeCompare(b.skill),
+      );
+      report.selectedSkills.length = 0;
+      report.selectedSkills.push(...withRulebook.map((r) => r.skill));
+
+      for (const rb of withRulebook) {
+        if (rb.matchedRuleId) {
+          promptRulebookBoosted.push({
+            skill: rb.skill,
+            matchedRuleId: rb.matchedRuleId,
+            ruleBoost: rb.ruleBoost,
+            ruleReason: rb.ruleReason ?? "",
+            rulebookPath: rb.rulebookPath ?? "",
+          });
+          // Suppress stats-policy boost for skills where rulebook takes precedence
+          const pIdx = promptPolicyBoosted.findIndex((p) => p.skill === rb.skill);
+          if (pIdx !== -1) {
+            promptPolicyBoosted.splice(pIdx, 1);
+          }
+        }
+      }
+
+      if (promptRulebookBoosted.length > 0) {
+        log.debug("prompt-rulebook-boosted", {
+          scenario: `${rbScenario.hook}|${rbScenario.storyKind ?? "none"}|${rbScenario.targetBoundary}|Prompt`,
+          boostedSkills: promptRulebookBoosted,
+        });
+      }
+    } else if (!rbResult.ok) {
+      log.debug("prompt-rulebook-load-error", { code: rbResult.error.code, message: rbResult.error.message });
+    }
+  }
+
+  // Stage 3f: Verified companion recall — insert learned companion skills
+  // immediately after their candidate in the selected list. Symmetric with
+  // PreToolUse Stage 4.96.
+  const promptCompanionRecallReasons: Record = {};
+  const promptForceSummarySkills = new Set();
+  if (cwd && promptBinding.storyId && promptBinding.targetBoundary) {
+    const companionRecall = recallVerifiedCompanions({
+      projectRoot: cwd,
+      scenario: {
+        hook: "UserPromptSubmit" as RoutingHookName,
+        storyKind: promptBinding.storyKind,
+        targetBoundary: promptBinding.targetBoundary,
+        toolName: "Prompt" as RoutingToolName,
+        routeScope: promptBinding.route ?? null,
+      },
+      candidateSkills: [...report.selectedSkills],
+      excludeSkills: new Set([
+        ...report.selectedSkills,
+        ...(dedupOff ? [] : parseSeenSkills(seenState)),
+      ]),
+      maxCompanions: 1,
+    });
+
+    for (const recall of companionRecall.selected) {
+      const candidateIdx = report.selectedSkills.indexOf(recall.candidateSkill);
+      if (candidateIdx === -1) continue;
+      report.selectedSkills.splice(candidateIdx + 1, 0, recall.companionSkill);
+      matchedSkills.push(recall.companionSkill);
+
+      const seenSkills = dedupOff ? new Set() : parseSeenSkills(seenState);
+      const alreadySeen = !dedupOff && seenSkills.has(recall.companionSkill);
+      if (alreadySeen) {
+        promptForceSummarySkills.add(recall.companionSkill);
+      }
+
+      promptCompanionRecallReasons[recall.companionSkill] = {
+        trigger: "verified-companion",
+        reasonCode: "scenario-companion-rulebook",
+      };
+
+      log.debug("prompt-companion-recall-injected", {
+        candidateSkill: recall.candidateSkill,
+        companionSkill: recall.companionSkill,
+        scenario: recall.scenario,
+        lift: recall.confidence,
+        summaryOnly: alreadySeen,
+      });
+    }
+
+    if (companionRecall.rejected.length > 0) {
+      log.debug("prompt-companion-recall-rejected", {
+        rejected: companionRecall.rejected,
+      });
+    }
+  } else if (cwd) {
+    log.debug("prompt-companion-recall-skipped", {
+      reason: !promptBinding.storyId
+        ? "no_active_verification_story"
+        : "no_target_boundary",
+    });
+  }
+
+  // Stage 3g: Verified playbook recall — insert learned ordered multi-skill
+  // sequences after the anchor skill. Symmetric with PreToolUse Stage 4.97.
+  const promptPlaybookRecallReasons: Record = {};
+  let promptPlaybookBanner: string | null = null;
+  const availablePlaybookSlots = Math.max(0, MAX_SKILLS - report.selectedSkills.length);
+  if (cwd && promptBinding.storyId && promptBinding.targetBoundary && availablePlaybookSlots > 0) {
+    const playbookRecall = recallVerifiedPlaybook({
+      projectRoot: cwd,
+      scenario: {
+        hook: "UserPromptSubmit" as RoutingHookName,
+        storyKind: promptBinding.storyKind,
+        targetBoundary: promptBinding.targetBoundary,
+        toolName: "Prompt" as RoutingToolName,
+        routeScope: promptBinding.route ?? null,
+      },
+      candidateSkills: [...report.selectedSkills],
+      excludeSkills: new Set([
+        ...report.selectedSkills,
+        ...(dedupOff ? [] : parseSeenSkills(seenState)),
+      ]),
+      maxInsertedSkills: availablePlaybookSlots,
+    });
+
+    if (playbookRecall.selected) {
+      promptPlaybookBanner = playbookRecall.banner;
+      const anchorIdx = report.selectedSkills.indexOf(playbookRecall.selected.anchorSkill);
+      let insertOffset = 1;
+      for (const skill of playbookRecall.selected.insertedSkills) {
+        report.selectedSkills.splice(anchorIdx + insertOffset, 0, skill);
+        matchedSkills.push(skill);
+        const seenSkills = dedupOff ? new Set() : parseSeenSkills(seenState);
+        if (!dedupOff && seenSkills.has(skill)) {
+          promptForceSummarySkills.add(skill);
+        }
+        promptPlaybookRecallReasons[skill] = {
+          trigger: "verified-playbook",
+          reasonCode: "scenario-playbook-rulebook",
+        };
+        insertOffset += 1;
+      }
+      log.debug("prompt-playbook-recall-injected", {
+        ruleId: playbookRecall.selected.ruleId,
+        anchorSkill: playbookRecall.selected.anchorSkill,
+        insertedSkills: playbookRecall.selected.insertedSkills,
+      });
+    }
+  }
+
   // Stage 4: inject selected skills (file I/O for SKILL.md bodies)
   const tInject = log.active ? log.now() : 0;
   const injectedSkills = dedupOff ? new Set() : parseSeenSkills(seenState);
@@ -1087,6 +1406,7 @@ export function run(): string {
     maxSkills: MAX_SKILLS,
     skillMap: skills.skillMap,
     logger: log,
+    forceSummarySkills: promptForceSummarySkills.size > 0 ? promptForceSummarySkills : undefined,
     platform: platform as "claude-code" | "cursor",
   });
   if (log.active) timing.inject = Math.round(log.now() - tInject);
@@ -1098,7 +1418,58 @@ export function run(): string {
   }
   const droppedByCap = [...injectResult.droppedByCap, ...report.droppedByCap];
   const droppedByBudget = [...injectResult.droppedByBudget, ...report.droppedByBudget];
-  const matchedSkills = allMatched;
+  // Record routing-policy exposures for actually injected skills
+  // Only record when binding has both storyId and targetBoundary — prevents unresolvable exposures
+  let promptAttribution: ReturnType | null = null;
+  if (loaded.length > 0 && sessionId && promptBinding.storyId && promptBinding.targetBoundary) {
+    promptAttribution = buildAttributionDecision({
+      sessionId,
+      hook: "UserPromptSubmit",
+      storyId: promptBinding.storyId,
+      route: promptBinding.route,
+      targetBoundary: promptBinding.targetBoundary,
+      loadedSkills: loaded,
+      preferredSkills: promptPolicyRecallSynthetic,
+    });
+
+    for (const skill of loaded) {
+      appendSkillExposure({
+        id: `${sessionId}:prompt:${skill}:${Date.now()}`,
+        sessionId,
+        projectRoot: cwd,
+        storyId: promptBinding.storyId,
+        storyKind: promptBinding.storyKind,
+        route: promptBinding.route,
+        hook: "UserPromptSubmit",
+        toolName: "Prompt",
+        skill,
+        targetBoundary: promptBinding.targetBoundary,
+        exposureGroupId: promptAttribution!.exposureGroupId,
+        attributionRole: skill === promptAttribution!.candidateSkill ? "candidate" : "context",
+        candidateSkill: promptAttribution!.candidateSkill,
+        createdAt: new Date().toISOString(),
+        resolvedAt: null,
+        outcome: "pending",
+      });
+    }
+    log.summary("routing-policy-exposures-recorded", {
+      hook: "UserPromptSubmit",
+      skills: loaded,
+      storyId: promptBinding.storyId,
+      storyKind: promptBinding.storyKind,
+      targetBoundary: promptBinding.targetBoundary,
+      candidateSkill: promptAttribution!.candidateSkill,
+      exposureGroupId: promptAttribution!.exposureGroupId,
+    });
+  } else if (loaded.length > 0 && sessionId) {
+    log.debug("routing-policy-exposures-skipped", {
+      hook: "UserPromptSubmit",
+      reason: !promptBinding.storyId
+        ? "no active verification story"
+        : "no target boundary",
+      skills: loaded,
+    });
+  }

   if (parts.length === 0) {
     log.complete("all_deduped", {
@@ -1152,15 +1523,134 @@ export function run(): string {
     }
     outputEnv = finalizePromptEnvUpdates(platform, promptEnvBefore);
   }
-  // Stage 5: formatOutput
+  // Stage 5a: Emit routing decision trace + decision capsule
+  {
+    const traceTimestamp = new Date().toISOString();
+    const decisionId = createDecisionId({
+      hook: "UserPromptSubmit",
+      sessionId,
+      toolName: "Prompt",
+      toolTarget: normalizedPrompt,
+      timestamp: traceTimestamp,
+    });
+    const promptTrace: RoutingDecisionTrace = {
+      version: 2,
+      decisionId,
+      sessionId,
+      hook: "UserPromptSubmit",
+      toolName: "Prompt",
+      toolTarget: normalizedPrompt,
+      timestamp: traceTimestamp,
+      primaryStory: {
+        id: promptBinding.storyId,
+        kind: promptBinding.storyKind,
+        storyRoute: promptBinding.route,
+        targetBoundary: promptBinding.targetBoundary,
+      },
+      observedRoute: null, // UserPromptSubmit fires before execution; no observed route
+      policyScenario: promptBinding.storyId && promptBinding.targetBoundary
+        ? `UserPromptSubmit|${promptBinding.storyKind ?? "none"}|${promptBinding.targetBoundary}|Prompt`
+        : null,
+      matchedSkills,
+      injectedSkills: loaded,
+      skippedReasons: [
+        ...(promptBinding.storyId ? [] : ["no_active_verification_story"]),
+        ...(promptBinding.storyId && !promptBinding.targetBoundary ? ["no_target_boundary"] : []),
+        ...droppedByCap.map((skill) => `cap_exceeded:${skill}`),
+        ...droppedByBudget.map((skill) => `budget_exhausted:${skill}`),
+      ],
+      ranked: report.selectedSkills.map((skill) => {
+        const result = report.perSkillResults[skill];
+        const policy = promptPolicyBoosted.find((p) => p.skill === skill);
+        const rb = promptRulebookBoosted.find((r) => r.skill === skill);
+        const companionReason = promptCompanionRecallReasons[skill];
+        const playbookReason = promptPlaybookRecallReasons[skill];
+        const synthetic = promptPolicyRecallSynthetic.has(skill) || Boolean(companionReason) || Boolean(playbookReason);
+        const baseScore = result?.score ?? 0;
+        const effectiveBoost = rb ? rb.ruleBoost : (policy?.boost ?? 0);
+        return {
+          skill,
+          basePriority: baseScore,
+          effectivePriority: baseScore + effectiveBoost,
+          pattern: playbookReason
+            ? { type: playbookReason.trigger, value: playbookReason.reasonCode }
+            : companionReason
+            ? { type: companionReason.trigger, value: companionReason.reasonCode }
+            : promptPolicyRecallSynthetic.has(skill)
+            ? { type: "policy-recall", value: promptPolicyRecallReasons[skill] }
+            : result?.reason
+            ? { type: "prompt-signal", value: result.reason }
+            : null,
+          profilerBoost: 0,
+          policyBoost: policy?.boost ?? 0,
+          policyReason: policy?.reason ?? null,
+          matchedRuleId: rb?.matchedRuleId ?? null,
+          ruleBoost: rb?.ruleBoost ?? 0,
+          ruleReason: rb?.ruleReason ?? null,
+          rulebookPath: rb?.rulebookPath ?? null,
+          summaryOnly: summaryOnly.includes(skill),
+          synthetic,
+          droppedReason: droppedByCap.includes(skill)
+            ? "cap_exceeded"
+            : droppedByBudget.includes(skill)
+            ? "budget_exhausted"
+            : null,
+        };
+      }),
+      verification: null,
+      causes: [],
+      edges: [],
+    };
+    appendRoutingDecisionTrace(promptTrace);
+
+    // Build and persist decision capsule
+    const promptCapsule = buildDecisionCapsule({
+      sessionId,
+      hook: "UserPromptSubmit",
+      createdAt: traceTimestamp,
+      toolName: "Prompt",
+      toolTarget: normalizedPrompt,
+      platform,
+      trace: promptTrace,
+      directive: null, // UserPromptSubmit has no verification directive
+      attribution: promptAttribution
+        ? {
+            exposureGroupId: promptAttribution.exposureGroupId,
+            candidateSkill: promptAttribution.candidateSkill,
+            loadedSkills: promptAttribution.loadedSkills,
+          }
+        : null,
+      env: outputEnv,
+    });
+    const promptCapsulePath = persistDecisionCapsule(promptCapsule, log);
+    const capsuleEnv = buildDecisionCapsuleEnv(promptCapsule, promptCapsulePath);
+    outputEnv = { ...(outputEnv ?? {}), ...capsuleEnv };
+
+    log.summary("routing.decision_trace_written", {
+      decisionId,
+      hook: "UserPromptSubmit",
+      matchedSkills,
+      injectedSkills: loaded,
+      capsulePath: promptCapsulePath,
+    });
+  }
+
+  // Stage 5b: formatOutput
   // Build prompt match reasons for the banner
   const promptMatchReasons: Record = {};
   for (const skill of loaded) {
+    if (promptPolicyRecallReasons[skill]) {
+      promptMatchReasons[skill] = promptPolicyRecallReasons[skill];
+      continue;
+    }
     const r = report.perSkillResults[skill];
     if (r?.reason) {
       promptMatchReasons[skill] = r.reason;
     }
   }
+  if (promptPlaybookBanner) {
+    parts.unshift(promptPlaybookBanner);
+  }
   return formatOutput(
     parts,
     matchedSkills,
diff --git a/hooks/src/verification-closure-capsule.mts b/hooks/src/verification-closure-capsule.mts
new file mode 100644
index 0000000..c4fd3bd
--- /dev/null
+++ b/hooks/src/verification-closure-capsule.mts
@@ -0,0 +1,229 @@
+/**
+ * Verification Closure Capsule: append-only JSONL receipt for every
+ * PostToolUse boundary observation.
+ *
+ * Each capsule captures the gate verdict, story-resolution method,
+ * exposure diagnosis, policy-resolution outcome, and current plan
+ * next action in one machine-readable object.
+ *
+ * Persistence contract:
+ *   - Capsule file: `/verification-closure-capsules.jsonl`
+ *   - One JSON object per line, appended atomically.
+ *   - Safe to read incrementally (tail -f compatible).
+ */
+
+import {
+  appendFileSync,
+  mkdirSync,
+  readFileSync,
+} from "node:fs";
+import { join } from "node:path";
+import {
+  createLogger,
+  logCaughtError,
+  type Logger,
+} from "./logger.mjs";
+import { traceDir } from "./routing-decision-trace.mjs";
+import type { SkillExposure } from "./routing-policy-ledger.mjs";
+import type {
+  PendingExposureMatchDiagnosis,
+  ResolutionGateEvaluation,
+} from "./verification-closure-diagnosis.mjs";
+
+// ---------------------------------------------------------------------------
+// Types
+// ---------------------------------------------------------------------------
+
+export interface VerificationClosureCapsule {
+  version: 1;
+  hook: "PostToolUse";
+  createdAt: string;
+  sessionId: string | null;
+  verificationId: string;
+  toolName: string;
+
+  observation: {
+    boundary: string;
+    signalStrength: string;
+    evidenceSource: string;
+    matchedPattern: string;
+    command: string;
+    inferredRoute: string | null;
+    matchedSuggestedAction: boolean;
+  };
+
+  storyResolution: {
+    resolvedStoryId: string | null;
+    method: "explicit-env" | "exact-route" | "active-story" | "none";
+    activeStoryId: string | null;
+    activeStoryKind: string | null;
+    activeStoryRoute: string | null;
+  };
+
+  gate: ResolutionGateEvaluation;
+
+  exposureDiagnosis: PendingExposureMatchDiagnosis | null;
+
+  resolution: {
+    attempted: boolean;
+    outcomeKind: "win" | "directive-win" | null;
+    resolvedCount: number;
+    resolvedExposureIds: string[];
+    candidateResolvedCount: number;
+    contextResolvedCount: number;
+  };
+
+  plan: {
+    activeStoryId: string | null;
+    satisfiedBoundaries: string[];
+    missingBoundaries: string[];
+    blockedReasons: string[];
+    primaryNextAction: {
+      action: string;
+      targetBoundary: string;
+      reason: string;
+    } | null;
+  };
+}
+
+// ---------------------------------------------------------------------------
+// Path helpers
+// ---------------------------------------------------------------------------
+
+export function verificationClosureCapsulePath(
+  sessionId: string | null,
+): string {
+  return join(traceDir(sessionId), "verification-closure-capsules.jsonl");
+}
+
+// ---------------------------------------------------------------------------
+// Builder (pure)
+// ---------------------------------------------------------------------------
+
+export function buildVerificationClosureCapsule(input: {
+  sessionId: string | null;
+  verificationId: string;
+  toolName: string;
+  createdAt?: string;
+  observation: VerificationClosureCapsule["observation"];
+  storyResolution: VerificationClosureCapsule["storyResolution"];
+  gate: ResolutionGateEvaluation;
+  exposureDiagnosis: PendingExposureMatchDiagnosis | null;
+  resolvedExposures: SkillExposure[];
+  plan: {
+    activeStoryId: string | null;
+    satisfiedBoundaries: Iterable;
+    missingBoundaries: string[];
+    blockedReasons: string[];
+    primaryNextAction: {
+      action: string;
+      targetBoundary: string;
+      reason: string;
+    } | null;
+  };
+}): VerificationClosureCapsule {
+  const outcomeKind: "win" | "directive-win" | null =
+    input.resolvedExposures.length === 0
+      ? null
+      : input.observation.matchedSuggestedAction
+        ? "directive-win"
+        : "win";
+
+  return {
+    version: 1,
+    hook: "PostToolUse",
+    createdAt: input.createdAt ?? new Date().toISOString(),
+    sessionId: input.sessionId,
+    verificationId: input.verificationId,
+    toolName: input.toolName,
+    observation: input.observation,
+    storyResolution: input.storyResolution,
+    gate: input.gate,
+    exposureDiagnosis: input.exposureDiagnosis,
+    resolution: {
+      attempted: input.gate.eligible,
+      outcomeKind,
+      resolvedCount: input.resolvedExposures.length,
+      resolvedExposureIds: input.resolvedExposures.map((e) => e.id),
+      candidateResolvedCount: input.resolvedExposures.filter(
+        (e) => e.attributionRole !== "context",
+      ).length,
+      contextResolvedCount: input.resolvedExposures.filter(
+        (e) => e.attributionRole === "context",
+      ).length,
+    },
+    plan: {
+      activeStoryId: input.plan.activeStoryId,
+      satisfiedBoundaries: Array.from(input.plan.satisfiedBoundaries).sort(),
+      missingBoundaries: [...input.plan.missingBoundaries],
+      blockedReasons: [...input.plan.blockedReasons],
+      primaryNextAction: input.plan.primaryNextAction ?? null,
+    },
+  };
+}
+
+// ---------------------------------------------------------------------------
+// Persistence (append-only JSONL)
+// ---------------------------------------------------------------------------
+
+export function persistVerificationClosureCapsule(
+  capsule: VerificationClosureCapsule,
+  logger?: Logger,
+): string {
+  const log = logger ?? createLogger();
+  const path = verificationClosureCapsulePath(capsule.sessionId);
+
+  try {
+    mkdirSync(traceDir(capsule.sessionId), { recursive: true });
+    appendFileSync(path, JSON.stringify(capsule) + "\n", "utf8");
+
+    log.summary("verification.closure_capsule_written", {
+      verificationId: capsule.verificationId,
+      sessionId: capsule.sessionId,
+      toolName: capsule.toolName,
+      boundary: capsule.observation.boundary,
+      path,
+    });
+  } catch (error) {
+    logCaughtError(
+      log,
+      "verification.closure_capsule_write_failed",
+      error,
+      {
+        verificationId: capsule.verificationId,
+        sessionId: capsule.sessionId,
+        path,
+      },
+    );
+  }
+
+  return path;
+}
+
+// ---------------------------------------------------------------------------
+// Readers
+// ---------------------------------------------------------------------------
+
+export function readVerificationClosureCapsules(
+  sessionId: string | null,
+): VerificationClosureCapsule[] {
+  try {
+    const raw = readFileSync(
+      verificationClosureCapsulePath(sessionId),
+      "utf8",
+    );
+    return raw
+      .split("\n")
+      .filter((line) => line.trim() !== "")
+      .map((line) => JSON.parse(line) as VerificationClosureCapsule);
+  } catch {
+    return [];
+  }
+}
+
+export function readLatestVerificationClosureCapsule(
+  sessionId: string | null,
+): VerificationClosureCapsule | null {
+  const all = readVerificationClosureCapsules(sessionId);
+  return all.length > 0 ? all[all.length - 1]! : null;
+}
diff --git a/hooks/src/verification-closure-diagnosis.mts b/hooks/src/verification-closure-diagnosis.mts
new file mode 100644
index 0000000..34c2260
--- /dev/null
+++ b/hooks/src/verification-closure-diagnosis.mts
@@ -0,0 +1,294 @@
+/**
+ * Verification Closure Diagnosis Engine
+ *
+ * Pure functions for diagnosing why a verification event did or did not
+ * resolve routing policy. Three concerns:
+ *
+ * 1. **Local verification URL inspection** — enriched locality check that
+ *    returns structured reasons instead of a bare boolean.
+ * 2. **Resolution gate evaluation** — determines eligibility with explicit
+ *    blocking reason codes for every failure path.
+ * 3. **Pending exposure match diagnosis** — explains zero-match outcomes
+ *    (route mismatch, story mismatch, missing scope, etc.).
+ *
+ * All functions are side-effect-free: they read from arguments or env vars
+ * and return deterministic, JSON-serializable results. No ledger mutation.
+ */
+
+import { loadSessionExposures, type SkillExposure } from "./routing-policy-ledger.mjs";
+import type { RoutingBoundary } from "./routing-policy.mjs";
+
+// ---------------------------------------------------------------------------
+// Constants
+// ---------------------------------------------------------------------------
+
+const LOCAL_DEV_HOSTS = new Set([
+  "localhost",
+  "127.0.0.1",
+  "0.0.0.0",
+  "::1",
+  "[::1]",
+]);
+
+// ---------------------------------------------------------------------------
+// Env helper
+// ---------------------------------------------------------------------------
+
+function envString(env: NodeJS.ProcessEnv, key: string): string | null {
+  const value = env[key];
+  return typeof value === "string" && value.trim() !== "" ? value.trim() : null;
+}
+
+// ---------------------------------------------------------------------------
+// Local Verification URL Inspection
+// ---------------------------------------------------------------------------
+
+export interface LocalVerificationInspection {
+  /** Whether this inspection applies (true for any URL-bearing tool). */
+  applicable: boolean;
+  /** Whether the URL could be parsed. */
+  parseable: boolean;
+  /** Whether the URL targets a local dev server, or null if unparseable. */
+  isLocal: boolean | null;
+  /** The observed host (including port), or null if unparseable. */
+  observedHost: string | null;
+  /** The VERCEL_PLUGIN_LOCAL_DEV_ORIGIN value, if set. */
+  configuredOrigin: string | null;
+  /** How locality was determined: loopback, configured-origin, or null. */
+  matchSource: "loopback" | "configured-origin" | null;
+}
+
+/**
+ * Inspect a URL for local-dev-server locality.
+ *
+ * Returns a structured inspection with explicit reasons instead of a
+ * bare boolean, so callers (and agents) can understand *why* a URL
+ * was classified as local or remote.
+ */
+export function inspectLocalVerificationUrl(
+  rawUrl: string,
+  env: NodeJS.ProcessEnv = process.env,
+): LocalVerificationInspection {
+  const configuredOrigin = envString(env, "VERCEL_PLUGIN_LOCAL_DEV_ORIGIN");
+
+  try {
+    const url = new URL(rawUrl);
+    const observedHost = url.host.toLowerCase();
+
+    if (url.protocol !== "http:" && url.protocol !== "https:") {
+      return {
+        applicable: true,
+        parseable: true,
+        isLocal: false,
+        observedHost,
+        configuredOrigin,
+        matchSource: null,
+      };
+    }
+
+    if (LOCAL_DEV_HOSTS.has(url.hostname.toLowerCase())) {
+      return {
+        applicable: true,
+        parseable: true,
+        isLocal: true,
+        observedHost,
+        configuredOrigin,
+        matchSource: "loopback",
+      };
+    }
+
+    if (configuredOrigin) {
+      try {
+        const configured = new URL(configuredOrigin);
+        if (configured.host.toLowerCase() === observedHost) {
+          return {
+            applicable: true,
+            parseable: true,
+            isLocal: true,
+            observedHost,
+            configuredOrigin,
+            matchSource: "configured-origin",
+          };
+        }
+      } catch {
+        // configured origin is itself unparseable — fall through to remote
+      }
+    }
+
+    return {
+      applicable: true,
+      parseable: true,
+      isLocal: false,
+      observedHost,
+      configuredOrigin,
+      matchSource: null,
+    };
+  } catch {
+    return {
+      applicable: true,
+      parseable: false,
+      isLocal: null,
+      observedHost: null,
+      configuredOrigin,
+      matchSource: null,
+    };
+  }
+}
+
+// ---------------------------------------------------------------------------
+// Resolution Gate Evaluation
+// ---------------------------------------------------------------------------
+
+export interface ResolutionGateEvaluation {
+  /** Whether the event is eligible to resolve routing policy. */
+  eligible: boolean;
+  /** Checks that passed (for observability). */
+  passedChecks: string[];
+  /** Reason codes that blocked resolution (empty when eligible). */
+  blockingReasonCodes: string[];
+  /** Locality inspection (applicable only for WebFetch). */
+  locality: LocalVerificationInspection;
+}
+
+/**
+ * Evaluate whether a verification event should resolve long-term routing
+ * policy outcomes. Returns structured gate results with explicit blocking
+ * reason codes for every failure path.
+ */
+export function evaluateResolutionGate(
+  event: {
+    boundary: RoutingBoundary | "unknown";
+    signalStrength: "strong" | "soft";
+    toolName: string;
+    command: string;
+  },
+  env: NodeJS.ProcessEnv = process.env,
+): ResolutionGateEvaluation {
+  const passedChecks: string[] = [];
+  const blockingReasonCodes: string[] = [];
+
+  // Check 1: known boundary
+  if (event.boundary === "unknown") {
+    blockingReasonCodes.push("unknown_boundary");
+  } else {
+    passedChecks.push("known_boundary");
+  }
+
+  // Check 2: strong signal
+  if (event.signalStrength !== "strong") {
+    blockingReasonCodes.push("soft_signal");
+  } else {
+    passedChecks.push("strong_signal");
+  }
+
+  // Check 3: WebFetch locality
+  let locality: LocalVerificationInspection = {
+    applicable: false,
+    parseable: true,
+    isLocal: null,
+    observedHost: null,
+    configuredOrigin: envString(env, "VERCEL_PLUGIN_LOCAL_DEV_ORIGIN"),
+    matchSource: null,
+  };
+
+  if (event.toolName === "WebFetch") {
+    locality = inspectLocalVerificationUrl(event.command, env);
+
+    if (!locality.parseable) {
+      blockingReasonCodes.push("invalid_web_fetch_url");
+    } else if (locality.isLocal !== true) {
+      blockingReasonCodes.push("remote_web_fetch");
+    } else {
+      passedChecks.push("local_verification_url");
+    }
+  }
+
+  return {
+    eligible: blockingReasonCodes.length === 0,
+    passedChecks,
+    blockingReasonCodes,
+    locality,
+  };
+}
+
+// ---------------------------------------------------------------------------
+// Pending Exposure
Match Diagnosis +// --------------------------------------------------------------------------- + +export interface PendingExposureMatchDiagnosis { + /** Total pending exposures across all boundaries in this session. */ + pendingTotal: number; + /** Pending exposures matching the target boundary. */ + pendingBoundaryCount: number; + /** Exact matches (same boundary + story + route). */ + exactMatchCount: number; + /** IDs of exact-match exposures. */ + exactMatchExposureIds: string[]; + /** IDs of exposures with same story but different route. */ + sameStoryDifferentRouteExposureIds: string[]; + /** IDs of exposures with same route but different story. */ + sameRouteDifferentStoryExposureIds: string[]; + /** Reason codes explaining why no exact match was found. */ + unresolvedReasonCodes: string[]; +} + +/** + * Diagnose why pending exposures did or did not match the observed + * boundary event. Returns structured match analysis so agents and + * humans can understand zero-match outcomes. + * + * Pure: reads exposures from the provided array or loads them from the + * session ledger, but never mutates ledger state. + */ +export function diagnosePendingExposureMatch(params: { + sessionId: string; + boundary: RoutingBoundary; + storyId: string | null; + route: string | null; + exposures?: SkillExposure[]; +}): PendingExposureMatchDiagnosis { + const exposures = + params.exposures ?? 
loadSessionExposures(params.sessionId); + + const pending = exposures.filter( + (e) => e.sessionId === params.sessionId && e.outcome === "pending", + ); + + const pendingBoundary = pending.filter( + (e) => e.targetBoundary === params.boundary, + ); + + const exact = pendingBoundary.filter( + (e) => e.storyId === params.storyId && e.route === params.route, + ); + + const sameStoryDifferentRoute = pendingBoundary.filter( + (e) => e.storyId === params.storyId && e.route !== params.route, + ); + + const sameRouteDifferentStory = pendingBoundary.filter( + (e) => e.route === params.route && e.storyId !== params.storyId, + ); + + const unresolvedReasonCodes: string[] = []; + + if (pendingBoundary.length === 0) { + unresolvedReasonCodes.push("no_pending_for_boundary"); + } else if (exact.length === 0) { + if (params.storyId === null) unresolvedReasonCodes.push("missing_story_scope"); + if (params.route === null) unresolvedReasonCodes.push("missing_route_scope"); + if (sameStoryDifferentRoute.length > 0) unresolvedReasonCodes.push("route_mismatch"); + if (sameRouteDifferentStory.length > 0) unresolvedReasonCodes.push("story_mismatch"); + if (unresolvedReasonCodes.length === 0) unresolvedReasonCodes.push("no_exact_pending_match"); + } + + return { + pendingTotal: pending.length, + pendingBoundaryCount: pendingBoundary.length, + exactMatchCount: exact.length, + exactMatchExposureIds: exact.map((e) => e.id), + sameStoryDifferentRouteExposureIds: sameStoryDifferentRoute.map((e) => e.id), + sameRouteDifferentStoryExposureIds: sameRouteDifferentStory.map((e) => e.id), + unresolvedReasonCodes, + }; +} diff --git a/hooks/src/verification-directive.mts b/hooks/src/verification-directive.mts new file mode 100644 index 0000000..81fac69 --- /dev/null +++ b/hooks/src/verification-directive.mts @@ -0,0 +1,192 @@ +/** + * Shared Verification Directive Contract + * + * Extracts the verification directive, env builder, and runtime state resolver + * from subagent-start-bootstrap so that 
top-level hooks and subagents consume + the same contract. The directive includes `route` alongside story/boundary/action. + * + * Key guarantees: + * - buildVerificationEnv(null) deterministically returns clearing values for all + * four env keys (STORY_ID, ROUTE, BOUNDARY, ACTION). + * - resolveVerificationRuntimeState is idempotent and safe to retry. + * - All state transitions emit structured log lines with sessionId context. + */ + +import { + computePlan, + formatVerificationBanner, + loadCachedPlanResult, + selectActiveStory, + type ComputePlanOptions, + type VerificationPlanResult, +} from "./verification-plan.mjs"; +import { createLogger, logCaughtError, type Logger } from "./logger.mjs"; + +// --------------------------------------------------------------------------- +// Directive contract +// --------------------------------------------------------------------------- + +export interface VerificationDirective { + version: 1; + storyId: string; + storyKind: string; + route: string | null; + missingBoundaries: string[]; + satisfiedBoundaries: string[]; + primaryNextAction: VerificationPlanResult["primaryNextAction"]; + blockedReasons: string[]; +} + +export interface VerificationRuntimeState { + plan: VerificationPlanResult | null; + directive: VerificationDirective | null; + banner: string | null; + env: Record<string, string>; +} + +// --------------------------------------------------------------------------- +// Directive builder +// --------------------------------------------------------------------------- + +export function buildVerificationDirective( + plan: VerificationPlanResult | null, +): VerificationDirective | null { + if (!plan?.hasStories || plan.stories.length === 0) return null; + + const story = selectActiveStory(plan); + if (!story) return null; + + return { + version: 1, + storyId: story.id, + storyKind: story.kind, + route: story.route, + missingBoundaries: [...plan.missingBoundaries], + satisfiedBoundaries: [...plan.satisfiedBoundaries], +
primaryNextAction: plan.primaryNextAction, + blockedReasons: [...plan.blockedReasons], + }; +} + +// --------------------------------------------------------------------------- +// Env builder — deterministic clearing when directive is null +// --------------------------------------------------------------------------- + +/** + * Build environment variables from a verification directive. + * When directive is null or has no primary action, returns empty-string + * clearing values for all four keys so stale env cannot bleed across + * tool calls. + */ +export function buildVerificationEnv( + directive: VerificationDirective | null, +): Record<string, string> { + if (!directive?.primaryNextAction) { + return { + VERCEL_PLUGIN_VERIFICATION_STORY_ID: "", + VERCEL_PLUGIN_VERIFICATION_ROUTE: "", + VERCEL_PLUGIN_VERIFICATION_BOUNDARY: "", + VERCEL_PLUGIN_VERIFICATION_ACTION: "", + }; + } + + return { + VERCEL_PLUGIN_VERIFICATION_STORY_ID: directive.storyId, + VERCEL_PLUGIN_VERIFICATION_ROUTE: directive.route ?? "", + VERCEL_PLUGIN_VERIFICATION_BOUNDARY: directive.primaryNextAction.targetBoundary, + VERCEL_PLUGIN_VERIFICATION_ACTION: directive.primaryNextAction.action, + }; +} + +// --------------------------------------------------------------------------- +// Runtime state resolver +// --------------------------------------------------------------------------- + +/** + * Resolve the full verification runtime state for a session. + * Tries cached plan first, falls back to fresh computation. + * Idempotent and safe to retry — no mutations.
+ * + * Emits structured log lines at each resolution checkpoint: + * - verification-directive.resolve-start + * - verification-directive.cache-hit / cache-miss + * - verification-directive.fresh-computed / fresh-empty + * - verification-directive.resolve-complete + * - verification-directive.resolve-failed (on error) + */ +export function resolveVerificationRuntimeState( + sessionId: string | null | undefined, + options?: ComputePlanOptions, + logger?: Logger, +): VerificationRuntimeState { + const log = logger ?? createLogger(); + + if (!sessionId) { + log.debug("verification-directive.resolve-start", { + sessionId: null, + reason: "no-session", + }); + return { + plan: null, + directive: null, + banner: null, + env: buildVerificationEnv(null), + }; + } + + log.debug("verification-directive.resolve-start", { sessionId }); + + try { + let plan = loadCachedPlanResult(sessionId, log); + if (plan?.hasStories) { + log.debug("verification-directive.cache-hit", { + sessionId, + storyCount: plan.stories.length, + }); + } else { + log.debug("verification-directive.cache-miss", { sessionId }); + plan = computePlan(sessionId, options, log); + } + + if (!plan?.hasStories) { + log.debug("verification-directive.fresh-empty", { sessionId }); + return { + plan: null, + directive: null, + banner: null, + env: buildVerificationEnv(null), + }; + } + + log.debug("verification-directive.fresh-computed", { + sessionId, + storyCount: plan.stories.length, + missingBoundaries: plan.missingBoundaries, + }); + + const directive = buildVerificationDirective(plan); + const env = buildVerificationEnv(directive); + const banner = formatVerificationBanner(plan); + + log.summary("verification-directive.resolve-complete", { + sessionId, + storyId: directive?.storyId ?? null, + route: directive?.route ?? 
null, + hasDirective: directive !== null, + hasBanner: banner !== null, + envCleared: !directive?.primaryNextAction, + }); + + return { plan, directive, banner, env }; + } catch (error) { + logCaughtError(log, "verification-directive.resolve-failed", error, { + sessionId, + }); + return { + plan: null, + directive: null, + banner: null, + env: buildVerificationEnv(null), + }; + } +} diff --git a/hooks/src/verification-ledger.mts b/hooks/src/verification-ledger.mts new file mode 100644 index 0000000..204618a --- /dev/null +++ b/hooks/src/verification-ledger.mts @@ -0,0 +1,965 @@ +/** + * Verification Ledger: append-only observation log with deterministic state derivation. + * + * Provides the state model for an evidence-backed verification planner. + * Observations are appended to a JSONL ledger; a compact derived state + * snapshot is recomputed deterministically from the ordered trace. + * + * All functions are pure (except persistence I/O) and idempotent: + * - Appending the same observation twice (by id) is a no-op. + * - Replaying the same ordered trace produces byte-for-byte identical state JSON. + * + * Persistence: JSONL ledger + compact JSON state under session temp storage. 
+ */ + +import { + appendFileSync, + mkdirSync, + readFileSync, + rmSync, + writeFileSync, +} from "node:fs"; +import { dirname, join } from "node:path"; +import { tmpdir } from "node:os"; +import { createHash } from "node:crypto"; +import { createLogger, logCaughtError, type Logger } from "./logger.mjs"; + +// --------------------------------------------------------------------------- +// Types +// --------------------------------------------------------------------------- + +export type VerificationBoundary = + | "uiRender" + | "clientRequest" + | "serverHandler" + | "environment"; + +export type VerificationStoryKind = + | "flow-verification" + | "stuck-investigation" + | "browser-only"; + +export interface VerificationObservationMeta { + matchedPattern?: string; + suggestedBoundary?: string | null; + suggestedAction?: string | null; + matchedSuggestedAction?: boolean; + toolName?: string; + signalStrength?: "strong" | "soft"; + evidenceSource?: + | "bash" + | "browser" + | "http" + | "log-read" + | "env-read" + | "file-read" + | "unknown"; + [key: string]: unknown; +} + +export interface VerificationObservation { + /** Unique observation id — used for dedup on append. */ + id: string; + /** ISO-8601 timestamp. */ + timestamp: string; + /** Source hook or subsystem that produced this observation. */ + source: "bash" | "prompt" | "edit" | "subagent"; + /** Classified verification boundary (null if not boundary-related). */ + boundary: VerificationBoundary | null; + /** Inferred route from recent edits or command URL. */ + route: string | null; + /** Story this observation belongs to (null = unattached). */ + storyId: string | null; + /** Redacted/truncated command or prompt excerpt. */ + summary: string; + /** Structured observation metadata from the verification signal classifier. */ + meta?: VerificationObservationMeta; +} + +export interface VerificationStory { + /** Stable story id — derived from kind + route. 
*/ + id: string; + /** Classification of the verification scenario. */ + kind: VerificationStoryKind; + /** Target route (may be null for global stories). */ + route: string | null; + /** Prompt excerpt that initiated this story. */ + promptExcerpt: string; + /** ISO-8601 timestamp of story creation. */ + createdAt: string; + /** ISO-8601 timestamp of last update. */ + updatedAt: string; + /** Skills already selected for this story. */ + requestedSkills: string[]; +} + +export interface VerificationNextAction { + /** Human-readable action description. */ + action: string; + /** Which boundary this action targets. */ + targetBoundary: VerificationBoundary; + /** Short explanation of why this action is suggested. */ + reason: string; +} + +export interface VerificationStoryState { + storyId: string; + storyKind: VerificationStoryKind; + route: string | null; + observationIds: string[]; + satisfiedBoundaries: VerificationBoundary[]; + missingBoundaries: VerificationBoundary[]; + recentRoutes: string[]; + primaryNextAction: VerificationNextAction | null; + blockedReasons: string[]; + lastObservedAt: string | null; +} + +export interface VerificationPlan { + /** Active stories (keyed by story id in the map, array in plan). */ + stories: VerificationStory[]; + /** All observations in append order. */ + observations: VerificationObservation[]; + /** Set of observation ids (for fast dedup). */ + observationIds: Set<string>; + /** Per-story derived state, keyed by story id. */ + storyStates: Record<string, VerificationStoryState>; + /** Active story id (see selectActiveStoryId: incomplete stories first, then most recently updated). */ + activeStoryId: string | null; + /** Boundaries that have been satisfied (active-story projection). */ + satisfiedBoundaries: Set<VerificationBoundary>; + /** Boundaries still missing evidence (active-story projection). */ + missingBoundaries: VerificationBoundary[]; + /** Most recent routes observed (active-story projection). */ + recentRoutes: string[]; + /** Primary next action (active-story projection).
*/ + primaryNextAction: VerificationNextAction | null; + /** Reasons why certain actions were blocked (active-story projection). */ + blockedReasons: string[]; +} + +// --------------------------------------------------------------------------- +// Constants +// --------------------------------------------------------------------------- + +const ALL_BOUNDARIES: VerificationBoundary[] = [ + "uiRender", + "clientRequest", + "serverHandler", + "environment", +]; + +const SAFE_SESSION_ID_RE = /^[a-zA-Z0-9_-]+$/; + +// --------------------------------------------------------------------------- +// Pure state derivation +// --------------------------------------------------------------------------- + +/** + * Resolve which story an observation belongs to. + * Prefers explicit storyId, then exact route match, then null. + */ +export function resolveObservationStoryId( + observation: VerificationObservation, + stories: VerificationStory[], +): string | null { + if (observation.storyId) return observation.storyId; + if (observation.route) { + const exactMatches = stories.filter((story) => story.route === observation.route); + if (exactMatches.length === 1) { + return exactMatches[0]!.id; + } + } + // Fallback: if exactly one story exists, attribute to it + if (stories.length === 1) { + return stories[0]!.id; + } + return null; +} + +/** + * Collect recent routes from observations, most recent first. + */ +export function collectRecentRoutes( + observations: VerificationObservation[], +): string[] { + const sorted = [...observations].sort( + (a, b) => Date.parse(b.timestamp) - Date.parse(a.timestamp), + ); + const seen = new Set(); + const routes: string[] = []; + for (const observation of sorted) { + if (!observation.route) continue; + if (seen.has(observation.route)) continue; + seen.add(observation.route); + routes.push(observation.route); + } + return routes; +} + +/** + * Derive per-story boundary state from observations. 
+ */ +export function deriveStoryStates( + observations: VerificationObservation[], + stories: VerificationStory[], + options?: { + agentBrowserAvailable?: boolean; + devServerLoopGuardHit?: boolean; + lastAttemptedAction?: string | null; + staleThresholdMs?: number; + }, +): Record<string, VerificationStoryState> { + const opts = { + agentBrowserAvailable: true, + devServerLoopGuardHit: false, + lastAttemptedAction: null as string | null, + staleThresholdMs: 5 * 60 * 1000, + ...options, + }; + + const states: Record<string, VerificationStoryState> = {}; + + // Initialize empty state for every story + for (const story of stories) { + states[story.id] = { + storyId: story.id, + storyKind: story.kind, + route: story.route, + observationIds: [], + satisfiedBoundaries: [], + missingBoundaries: [...ALL_BOUNDARIES], + recentRoutes: story.route ? [story.route] : [], + primaryNextAction: null, + blockedReasons: [], + lastObservedAt: null, + }; + } + + // Group observations by resolved story + for (const obs of observations) { + const resolvedStoryId = resolveObservationStoryId(obs, stories); + if (!resolvedStoryId || !states[resolvedStoryId]) continue; + + const state = states[resolvedStoryId]!; + state.observationIds.push(obs.id); + if (obs.boundary && !state.satisfiedBoundaries.includes(obs.boundary)) { + state.satisfiedBoundaries.push(obs.boundary); + } + if (obs.route && !state.recentRoutes.includes(obs.route)) { + state.recentRoutes.push(obs.route); + } + if (!state.lastObservedAt || Date.parse(obs.timestamp) > Date.parse(state.lastObservedAt)) { + state.lastObservedAt = obs.timestamp; + } + } + + // Compute missing boundaries and next action per story + for (const story of stories) { + const state = states[story.id]!; + const satisfiedSet = new Set(state.satisfiedBoundaries); + state.missingBoundaries = ALL_BOUNDARIES.filter((b) => !satisfiedSet.has(b)); + + const { primaryNextAction, blockedReasons } = computeNextAction( + state.missingBoundaries, + [story], + state.recentRoutes, + opts, + ); + state.primaryNextAction =
primaryNextAction; + state.blockedReasons = blockedReasons; + } + + return states; +} + +/** + * Select the active story id: prefers the story with the most missing boundaries, + * then the most recently updated, then the most recently created, with story id as the final tiebreaker. + */ +export function selectActiveStoryId( + stories: VerificationStory[], + storyStates: Record<string, VerificationStoryState>, +): string | null { + if (stories.length === 0) return null; + + // Sort: prefer stories with missing boundaries (incomplete first), + // then most recently updated + const sorted = [...stories].sort((a, b) => { + const stateA = storyStates[a.id]; + const stateB = storyStates[b.id]; + const missingA = stateA ? stateA.missingBoundaries.length : 0; + const missingB = stateB ? stateB.missingBoundaries.length : 0; + + // Incomplete stories first (more missing = higher priority) + if (missingA !== missingB) return missingB - missingA; + + // Then most recently updated + const updatedDiff = Date.parse(b.updatedAt) - Date.parse(a.updatedAt); + if (Number.isFinite(updatedDiff) && updatedDiff !== 0) return updatedDiff; + + const createdDiff = Date.parse(b.createdAt) - Date.parse(a.createdAt); + if (Number.isFinite(createdDiff) && createdDiff !== 0) return createdDiff; + + return a.id.localeCompare(b.id); + }); + + return sorted[0]!.id; +} + +/** + * Derive a VerificationPlan from ordered observations and stories. + * This is a pure function — same inputs always produce identical output. + * + * Top-level fields (satisfiedBoundaries, missingBoundaries, recentRoutes, + * primaryNextAction, blockedReasons) are the active-story projection.
+ */ +export function derivePlan( + observations: VerificationObservation[], + stories: VerificationStory[], + options?: { + agentBrowserAvailable?: boolean; + devServerLoopGuardHit?: boolean; + lastAttemptedAction?: string | null; + staleThresholdMs?: number; + }, +): VerificationPlan { + // Build dedup set + const observationIds = new Set<string>(); + const deduped: VerificationObservation[] = []; + for (const obs of observations) { + if (!observationIds.has(obs.id)) { + observationIds.add(obs.id); + deduped.push(obs); + } + } + + // Derive per-story state + const storyStates = deriveStoryStates(deduped, stories, options); + const activeStoryId = selectActiveStoryId(stories, storyStates); + + // Project active story state to top-level fields + const activeState = activeStoryId ? storyStates[activeStoryId] : null; + + const satisfiedBoundaries = new Set<VerificationBoundary>( + activeState ? activeState.satisfiedBoundaries : [], + ); + const missingBoundaries = activeState ? activeState.missingBoundaries : ( + stories.length > 0 ? [...ALL_BOUNDARIES] : [] + ); + const recentRoutes = activeState ? activeState.recentRoutes : []; + const primaryNextAction = activeState ? activeState.primaryNextAction : null; + const blockedReasons = activeState ? activeState.blockedReasons : []; + + return { + stories: [...stories], + observations: deduped, + observationIds, + storyStates, + activeStoryId, + satisfiedBoundaries, + missingBoundaries, + recentRoutes, + primaryNextAction, + blockedReasons, + }; +} + +/** + * Compute the single best next verification action.
+ */ +function computeNextAction( + missingBoundaries: VerificationBoundary[], + stories: VerificationStory[], + recentRoutes: string[], + opts: { + agentBrowserAvailable: boolean; + devServerLoopGuardHit: boolean; + lastAttemptedAction: string | null; + }, +): { primaryNextAction: VerificationNextAction | null; blockedReasons: string[] } { + const blockedReasons: string[] = []; + + if (stories.length === 0) { + return { primaryNextAction: null, blockedReasons }; + } + + if (missingBoundaries.length === 0) { + return { primaryNextAction: null, blockedReasons }; + } + + const route = recentRoutes[recentRoutes.length - 1] ?? null; + const routeSuffix = route ? ` ${route}` : ""; + + // Action builder for each boundary (PRIORITY_ORDER below decides which is tried first) + const ACTION_MAP: Record<VerificationBoundary, () => VerificationNextAction | null> = { + clientRequest: () => ({ + action: `curl http://localhost:3000${route ?? "/"}`, + targetBoundary: "clientRequest", + reason: "No HTTP request observation yet — verify the endpoint responds", + }), + serverHandler: () => ({ + action: `tail server logs${routeSuffix}`, + targetBoundary: "serverHandler", + reason: "No server-side observation yet — check logs for errors", + }), + uiRender: () => { + if (!opts.agentBrowserAvailable) { + blockedReasons.push("agent-browser unavailable — cannot emit browser-only action"); + return null; + } + if (opts.devServerLoopGuardHit) { + blockedReasons.push("dev-server loop guard hit — skipping browser verification"); + return null; + } + return { + action: `open${routeSuffix || " /"} in agent-browser`, + targetBoundary: "uiRender", + reason: "No UI render observation yet — visually verify the page", + }; + }, + environment: () => ({ + action: "inspect env for required vars", + targetBoundary: "environment", + reason: "No environment observation yet — check env vars are set", + }), + }; + + // Walk boundaries in priority order + const PRIORITY_ORDER: VerificationBoundary[] = [ + "clientRequest", + "serverHandler", + "uiRender", + "environment", + ]; + + for
(const boundary of PRIORITY_ORDER) { + if (!missingBoundaries.includes(boundary)) continue; + + const action = ACTION_MAP[boundary](); + if (action) { + // Suppress if this is the same as the last attempted action + if (opts.lastAttemptedAction && action.action === opts.lastAttemptedAction) { + blockedReasons.push( + `Suppressed repeat of last attempted action: ${opts.lastAttemptedAction}`, + ); + continue; + } + return { primaryNextAction: action, blockedReasons }; + } + } + + return { primaryNextAction: null, blockedReasons }; +} + +// --------------------------------------------------------------------------- +// Story helpers +// --------------------------------------------------------------------------- + +/** + * Compute a stable story id from kind + route. + */ +export function storyId(kind: VerificationStoryKind, route: string | null): string { + const input = `${kind}:${route ?? "*"}`; + return createHash("sha256").update(input).digest("hex").slice(0, 12); +} + +/** + * Create a new story or merge into an existing one. + * Returns the updated stories array (does not mutate input). + */ +export function upsertStory( + stories: VerificationStory[], + kind: VerificationStoryKind, + route: string | null, + promptExcerpt: string, + requestedSkills: string[], + now?: string, +): VerificationStory[] { + const id = storyId(kind, route); + const timestamp = now ?? new Date().toISOString(); + + const existing = stories.find((s) => s.id === id); + if (existing) { + // Merge: update timestamp, merge skills + const merged: VerificationStory = { + ...existing, + updatedAt: timestamp, + promptExcerpt: promptExcerpt || existing.promptExcerpt, + requestedSkills: Array.from( + new Set([...existing.requestedSkills, ...requestedSkills]), + ), + }; + return stories.map((s) => (s.id === id ? 
merged : s)); + } + + const newStory: VerificationStory = { + id, + kind, + route, + promptExcerpt, + createdAt: timestamp, + updatedAt: timestamp, + requestedSkills, + }; + return [...stories, newStory]; +} + +// --------------------------------------------------------------------------- +// Append (with dedup) +// --------------------------------------------------------------------------- + +/** + * Append an observation to an ordered list, deduplicating by id. + * Returns the new list (does not mutate input). + */ +export function appendObservation( + observations: VerificationObservation[], + observation: VerificationObservation, +): VerificationObservation[] { + if (observations.some((o) => o.id === observation.id)) { + return observations; // idempotent — same reference means no change + } + return [...observations, observation]; +} + +// --------------------------------------------------------------------------- +// Serialization (for persistence) +// --------------------------------------------------------------------------- + +export interface SerializedPlanStateV1 { + version: 1; + stories: VerificationStory[]; + observationIds: string[]; + satisfiedBoundaries: string[]; + missingBoundaries: string[]; + recentRoutes: string[]; + primaryNextAction: VerificationNextAction | null; + blockedReasons: string[]; +} + +export interface SerializedPlanStateV2 { + version: 2; + stories: VerificationStory[]; + activeStoryId: string | null; + storyStates: Array<{ + storyId: string; + storyKind: VerificationStoryKind; + route: string | null; + observationIds: string[]; + satisfiedBoundaries: VerificationBoundary[]; + missingBoundaries: VerificationBoundary[]; + recentRoutes: string[]; + primaryNextAction: VerificationNextAction | null; + blockedReasons: string[]; + lastObservedAt: string | null; + }>; + observationIds: string[]; + satisfiedBoundaries: VerificationBoundary[]; + missingBoundaries: VerificationBoundary[]; + recentRoutes: string[]; + primaryNextAction: 
VerificationNextAction | null; + blockedReasons: string[]; +} + +/** Union of all serialized state versions. */ +export type SerializedPlanState = SerializedPlanStateV1 | SerializedPlanStateV2; + +/** + * Normalize any serialized plan state to version 2. + * V1 state is upgraded by synthesizing one active-story entry from top-level fields. + */ +export function normalizeSerializedPlanState( + state: SerializedPlanState, +): SerializedPlanStateV2 { + if (state.version === 2) return state; + + // V1 → V2: synthesize active-story state from the flat top-level fields + const v1 = state as SerializedPlanStateV1; + + // Import selectPrimaryStory logic inline to avoid circular deps + const sorted = [...v1.stories].sort((a, b) => { + const updatedDiff = Date.parse(b.updatedAt) - Date.parse(a.updatedAt); + if (Number.isFinite(updatedDiff) && updatedDiff !== 0) return updatedDiff; + const createdDiff = Date.parse(b.createdAt) - Date.parse(a.createdAt); + if (Number.isFinite(createdDiff) && createdDiff !== 0) return createdDiff; + return a.id.localeCompare(b.id); + }); + const primaryStory = sorted[0] ?? null; + const activeStoryId = primaryStory?.id ?? 
null; + + const storyStates: SerializedPlanStateV2["storyStates"] = []; + if (primaryStory) { + storyStates.push({ + storyId: primaryStory.id, + storyKind: primaryStory.kind, + route: primaryStory.route, + observationIds: [...v1.observationIds], + satisfiedBoundaries: v1.satisfiedBoundaries as VerificationBoundary[], + missingBoundaries: v1.missingBoundaries as VerificationBoundary[], + recentRoutes: v1.recentRoutes, + primaryNextAction: v1.primaryNextAction, + blockedReasons: v1.blockedReasons, + lastObservedAt: null, + }); + } + + // Add empty entries for non-active stories + for (const story of v1.stories) { + if (story.id === activeStoryId) continue; + storyStates.push({ + storyId: story.id, + storyKind: story.kind, + route: story.route, + observationIds: [], + satisfiedBoundaries: [], + missingBoundaries: ALL_BOUNDARIES as VerificationBoundary[], + recentRoutes: story.route ? [story.route] : [], + primaryNextAction: null, + blockedReasons: [], + lastObservedAt: null, + }); + } + + return { + version: 2, + stories: v1.stories, + activeStoryId, + storyStates, + observationIds: v1.observationIds, + satisfiedBoundaries: v1.satisfiedBoundaries as VerificationBoundary[], + missingBoundaries: v1.missingBoundaries as VerificationBoundary[], + recentRoutes: v1.recentRoutes, + primaryNextAction: v1.primaryNextAction, + blockedReasons: v1.blockedReasons, + }; +} + +/** + * Serialize a VerificationPlan to a deterministic JSON string (version 2). + * Sets and arrays are sorted for byte-for-byte reproducibility. 
+ */ +export function serializePlanState(plan: VerificationPlan): string { + const storyStates: SerializedPlanStateV2["storyStates"] = []; + for (const story of plan.stories) { + const ss = plan.storyStates[story.id]; + if (ss) { + storyStates.push({ + storyId: ss.storyId, + storyKind: ss.storyKind, + route: ss.route, + observationIds: [...ss.observationIds].sort(), + satisfiedBoundaries: [...ss.satisfiedBoundaries].sort() as VerificationBoundary[], + missingBoundaries: [...ss.missingBoundaries].sort() as VerificationBoundary[], + recentRoutes: ss.recentRoutes, + primaryNextAction: ss.primaryNextAction, + blockedReasons: ss.blockedReasons, + lastObservedAt: ss.lastObservedAt, + }); + } + } + + const state: SerializedPlanStateV2 = { + version: 2, + stories: plan.stories, + activeStoryId: plan.activeStoryId, + storyStates, + observationIds: Array.from(plan.observationIds).sort(), + satisfiedBoundaries: Array.from(plan.satisfiedBoundaries).sort() as VerificationBoundary[], + missingBoundaries: [...plan.missingBoundaries].sort() as VerificationBoundary[], + recentRoutes: plan.recentRoutes, + primaryNextAction: plan.primaryNextAction, + blockedReasons: plan.blockedReasons, + }; + return JSON.stringify(state, null, 2); +} + +// --------------------------------------------------------------------------- +// Persistence (JSONL ledger + compact state) +// --------------------------------------------------------------------------- + +function sessionIdSegment(sessionId: string): string { + if (SAFE_SESSION_ID_RE.test(sessionId)) return sessionId; + return createHash("sha256").update(sessionId).digest("hex"); +} + +function ledgerDir(sessionId: string): string { + return join(tmpdir(), `vercel-plugin-${sessionIdSegment(sessionId)}-ledger`); +} + +export function ledgerPath(sessionId: string): string { + return join(ledgerDir(sessionId), "observations.jsonl"); +} + +export function storiesPath(sessionId: string): string { + return join(ledgerDir(sessionId), "stories.json"); +} 
+ +export function statePath(sessionId: string): string { + return join(ledgerDir(sessionId), "state.json"); +} + +/** + * Persist an observation to the session JSONL ledger. + * Idempotent — duplicate ids are skipped at derive time. + */ +export function persistObservation( + sessionId: string, + observation: VerificationObservation, + logger?: Logger, +): void { + const log = logger ?? createLogger(); + const dir = ledgerDir(sessionId); + try { + mkdirSync(dir, { recursive: true }); + const line = JSON.stringify(observation) + "\n"; + appendFileSync(ledgerPath(sessionId), line, "utf-8"); + log.summary("verification-ledger.observation_persisted", { + observationId: observation.id, + boundary: observation.boundary, + source: observation.source, + }); + } catch (error) { + logCaughtError(log, "verification-ledger.persist_observation_failed", error, { + sessionId, + observationId: observation.id, + }); + } +} + +/** + * Persist stories to the session storage. + */ +export function persistStories( + sessionId: string, + stories: VerificationStory[], + logger?: Logger, +): void { + const log = logger ?? createLogger(); + const dir = ledgerDir(sessionId); + try { + mkdirSync(dir, { recursive: true }); + writeFileSync(storiesPath(sessionId), JSON.stringify(stories, null, 2), "utf-8"); + log.summary("verification-ledger.stories_persisted", { + storyCount: stories.length, + }); + } catch (error) { + logCaughtError(log, "verification-ledger.persist_stories_failed", error, { + sessionId, + }); + } +} + +/** + * Persist derived plan state to the session snapshot file. + */ +export function persistPlanState( + sessionId: string, + plan: VerificationPlan, + logger?: Logger, +): void { + const log = logger ?? 
createLogger(); + const dir = ledgerDir(sessionId); + try { + mkdirSync(dir, { recursive: true }); + writeFileSync(statePath(sessionId), serializePlanState(plan), "utf-8"); + log.summary("verification-ledger.state_persisted", { + observationCount: plan.observations.length, + storyCount: plan.stories.length, + missingBoundaries: plan.missingBoundaries, + }); + } catch (error) { + logCaughtError(log, "verification-ledger.persist_state_failed", error, { + sessionId, + }); + } +} + +/** + * Load observations from the session JSONL ledger. + */ +export function loadObservations( + sessionId: string, + logger?: Logger, +): VerificationObservation[] { + const log = logger ?? createLogger(); + try { + const content = readFileSync(ledgerPath(sessionId), "utf-8"); + const lines = content.split("\n").filter((l) => l.trim() !== ""); + return lines.map((line) => JSON.parse(line) as VerificationObservation); + } catch (error) { + if ( + typeof error === "object" && + error !== null && + "code" in error && + (error as { code?: string }).code === "ENOENT" + ) { + return []; // no ledger yet + } + logCaughtError(log, "verification-ledger.load_observations_failed", error, { + sessionId, + }); + return []; + } +} + +/** + * Load stories from the session storage. + */ +export function loadStories( + sessionId: string, + logger?: Logger, +): VerificationStory[] { + const log = logger ?? createLogger(); + try { + const content = readFileSync(storiesPath(sessionId), "utf-8"); + return JSON.parse(content) as VerificationStory[]; + } catch (error) { + if ( + typeof error === "object" && + error !== null && + "code" in error && + (error as { code?: string }).code === "ENOENT" + ) { + return []; + } + logCaughtError(log, "verification-ledger.load_stories_failed", error, { + sessionId, + }); + return []; + } +} + +/** + * Load the derived plan state from the session snapshot. + * Always normalizes to V2 format for consumers. 
+ */ +export function loadPlanState( + sessionId: string, + logger?: Logger, +): SerializedPlanStateV2 | null { + const log = logger ?? createLogger(); + try { + const content = readFileSync(statePath(sessionId), "utf-8"); + const raw = JSON.parse(content) as SerializedPlanState; + const normalized = normalizeSerializedPlanState(raw); + if (raw.version !== normalized.version) { + log.summary("verification-ledger.state_normalized", { + sessionId, + fromVersion: raw.version, + toVersion: normalized.version, + }); + } + return normalized; + } catch (error) { + if ( + typeof error === "object" && + error !== null && + "code" in error && + (error as { code?: string }).code === "ENOENT" + ) { + return null; + } + logCaughtError(log, "verification-ledger.load_state_failed", error, { + sessionId, + }); + return null; + } +} + +// --------------------------------------------------------------------------- +// Full cycle: append → derive → persist +// --------------------------------------------------------------------------- + +/** + * Append an observation, re-derive state, and persist everything. + * Returns the updated plan. Idempotent by observation id. + */ +export function recordObservation( + sessionId: string, + observation: VerificationObservation, + options?: { + agentBrowserAvailable?: boolean; + devServerLoopGuardHit?: boolean; + lastAttemptedAction?: string | null; + }, + logger?: Logger, +): VerificationPlan { + const log = logger ?? createLogger(); + + // Load current state + const existingObservations = loadObservations(sessionId, log); + const stories = loadStories(sessionId, log); + + // Append (dedup by id) + const observations = appendObservation(existingObservations, observation); + + // Keep the append-only ledger idempotent by observation id. 
+ if (observations !== existingObservations) { + persistObservation(sessionId, observation, log); + } + + // Derive + const plan = derivePlan(observations, stories, options); + + // Persist state + persistPlanState(sessionId, plan, log); + + return plan; +} + +/** + * Create or update a story, re-derive state, and persist. + * Returns the updated plan. + */ +export function recordStory( + sessionId: string, + kind: VerificationStoryKind, + route: string | null, + promptExcerpt: string, + requestedSkills: string[], + options?: { + agentBrowserAvailable?: boolean; + devServerLoopGuardHit?: boolean; + lastAttemptedAction?: string | null; + }, + logger?: Logger, +): VerificationPlan { + const log = logger ?? createLogger(); + + const observations = loadObservations(sessionId, log); + let stories = loadStories(sessionId, log); + + // Upsert story + stories = upsertStory(stories, kind, route, promptExcerpt, requestedSkills); + + // Persist stories + persistStories(sessionId, stories, log); + + // Derive + const plan = derivePlan(observations, stories, options); + + // Persist state + persistPlanState(sessionId, plan, log); + + return plan; +} + +// --------------------------------------------------------------------------- +// Cleanup +// --------------------------------------------------------------------------- + +/** + * Remove all ledger artifacts for a session. + */ +export function removeLedgerArtifacts(sessionId: string, logger?: Logger): void { + const log = logger ?? 
createLogger(); + const dir = ledgerDir(sessionId); + try { + rmSync(dir, { recursive: true, force: true }); + log.summary("verification-ledger.artifacts_removed", { sessionId }); + } catch (error) { + logCaughtError(log, "verification-ledger.remove_artifacts_failed", error, { + sessionId, + }); + } +} diff --git a/hooks/src/verification-plan.mts b/hooks/src/verification-plan.mts new file mode 100644 index 0000000..c1d4a15 --- /dev/null +++ b/hooks/src/verification-plan.mts @@ -0,0 +1,415 @@ +/** + * Verification Plan: compute a single ranked next verification action + * from ledger state and surface it for hooks and CLI. + * + * This module is the bridge between the raw verification ledger + * (observations + stories) and the surfaces that consume the plan + * (PreToolUse banner, CLI command, subagent bootstrap). + * + * All public functions are pure or read-only — they load ledger state + * and derive a plan but never mutate it. + */ + +import { + type VerificationPlan, + type VerificationNextAction, + type VerificationStoryState, + type SerializedPlanStateV2, + derivePlan, + loadObservations, + loadStories, + loadPlanState, + serializePlanState, +} from "./verification-ledger.mjs"; +import { createLogger, type Logger } from "./logger.mjs"; + +// --------------------------------------------------------------------------- +// Public plan result (JSON-serializable, shared by CLI and hooks) +// --------------------------------------------------------------------------- + +export interface VerificationPlanStorySummary { + id: string; + kind: string; + route: string | null; + promptExcerpt: string; + createdAt: string; + updatedAt: string; +} + +export interface VerificationPlanStoryStateSummary { + storyId: string; + storyKind: string; + route: string | null; + observationIds: string[]; + satisfiedBoundaries: string[]; + missingBoundaries: string[]; + recentRoutes: string[]; + primaryNextAction: VerificationNextAction | null; + blockedReasons: string[]; + 
lastObservedAt: string | null; +} + +export interface VerificationPlanResult { + /** Whether any verification stories exist. */ + hasStories: boolean; + /** Active story id. */ + activeStoryId: string | null; + /** Active story summaries. */ + stories: VerificationPlanStorySummary[]; + /** Per-story state summaries. */ + storyStates: VerificationPlanStoryStateSummary[]; + /** Total observation count. */ + observationCount: number; + /** Boundaries with at least one observation (active-story projection). */ + satisfiedBoundaries: string[]; + /** Boundaries still missing evidence (active-story projection). */ + missingBoundaries: string[]; + /** Recent routes observed (active-story projection). */ + recentRoutes: string[]; + /** The single best next verification action (active-story projection). */ + primaryNextAction: VerificationNextAction | null; + /** Reasons certain actions were blocked (active-story projection). */ + blockedReasons: string[]; +} + +// --------------------------------------------------------------------------- +// Deterministic story selection +// --------------------------------------------------------------------------- + +/** + * Select the primary story from a list of summaries. + * Prefers the most recently updated story, breaking ties by createdAt + * (newest first), then by id (lexicographic ascending) for full determinism. 
+ */ +export function selectPrimaryStory( + stories: VerificationPlanStorySummary[], +): VerificationPlanStorySummary | null { + if (stories.length === 0) return null; + + return [...stories].sort((a, b) => { + const updatedDiff = Date.parse(b.updatedAt) - Date.parse(a.updatedAt); + if (Number.isFinite(updatedDiff) && updatedDiff !== 0) return updatedDiff; + + const createdDiff = Date.parse(b.createdAt) - Date.parse(a.createdAt); + if (Number.isFinite(createdDiff) && createdDiff !== 0) return createdDiff; + + return a.id.localeCompare(b.id); + })[0]; +} + +export function selectActiveStory( + result: Pick<VerificationPlanResult, "activeStoryId" | "stories">, +): VerificationPlanStorySummary | null { + if (result.activeStoryId) { + const activeStory = result.stories.find((story) => story.id === result.activeStoryId); + if (activeStory) return activeStory; + } + return selectPrimaryStory(result.stories); +} + +// --------------------------------------------------------------------------- +// Derive plan result from session state +// --------------------------------------------------------------------------- + +export interface ComputePlanOptions { + agentBrowserAvailable?: boolean; + devServerLoopGuardHit?: boolean; + lastAttemptedAction?: string | null; + staleThresholdMs?: number; +} + +/** + * Load ledger state for a session and derive the current plan. + * Returns a JSON-serializable result suitable for CLI and hook consumption. + */ +export function computePlan( + sessionId: string, + options?: ComputePlanOptions, + logger?: Logger, +): VerificationPlanResult { + const log = logger ??
createLogger(); + + const observations = loadObservations(sessionId, log); + const stories = loadStories(sessionId, log); + const plan = derivePlan(observations, stories, options); + + log.summary("verification-plan.computed", { + sessionId, + storyCount: stories.length, + observationCount: observations.length, + missingBoundaries: plan.missingBoundaries, + hasNextAction: plan.primaryNextAction !== null, + }); + + return planToResult(plan); +} + +/** + * Convert a VerificationPlan to a JSON-serializable result. + */ +export function planToResult(plan: VerificationPlan): VerificationPlanResult { + const storyStates: VerificationPlanStoryStateSummary[] = plan.stories.map((s) => { + const ss = plan.storyStates[s.id]; + if (!ss) { + return { + storyId: s.id, + storyKind: s.kind, + route: s.route, + observationIds: [], + satisfiedBoundaries: [], + missingBoundaries: [], + recentRoutes: [], + primaryNextAction: null, + blockedReasons: [], + lastObservedAt: null, + }; + } + return { + storyId: ss.storyId, + storyKind: ss.storyKind, + route: ss.route, + observationIds: ss.observationIds, + satisfiedBoundaries: [...ss.satisfiedBoundaries].sort(), + missingBoundaries: [...ss.missingBoundaries].sort(), + recentRoutes: ss.recentRoutes, + primaryNextAction: ss.primaryNextAction, + blockedReasons: ss.blockedReasons, + lastObservedAt: ss.lastObservedAt, + }; + }); + + return { + hasStories: plan.stories.length > 0, + activeStoryId: plan.activeStoryId, + stories: plan.stories.map((s) => ({ + id: s.id, + kind: s.kind, + route: s.route, + promptExcerpt: s.promptExcerpt, + createdAt: s.createdAt, + updatedAt: s.updatedAt, + })), + storyStates, + observationCount: plan.observations.length, + satisfiedBoundaries: Array.from(plan.satisfiedBoundaries).sort(), + missingBoundaries: [...plan.missingBoundaries].sort(), + recentRoutes: plan.recentRoutes, + primaryNextAction: plan.primaryNextAction, + blockedReasons: plan.blockedReasons, + }; +} + +/** + * Load the persisted plan state 
snapshot without re-deriving. + * Returns null if no state exists. Faster than computePlan when + * the caller only needs the last-persisted snapshot. + */ +export function loadCachedPlanResult( + sessionId: string, + logger?: Logger, +): VerificationPlanResult | null { + const log = logger ?? createLogger(); + const state = loadPlanState(sessionId, log); + if (!state) return null; + + const storyStates: VerificationPlanStoryStateSummary[] = (state.storyStates ?? []).map((ss) => ({ + storyId: ss.storyId, + storyKind: ss.storyKind, + route: ss.route, + observationIds: ss.observationIds, + satisfiedBoundaries: [...ss.satisfiedBoundaries].sort(), + missingBoundaries: [...ss.missingBoundaries].sort(), + recentRoutes: ss.recentRoutes, + primaryNextAction: ss.primaryNextAction, + blockedReasons: ss.blockedReasons, + lastObservedAt: ss.lastObservedAt, + })); + + return { + hasStories: state.stories.length > 0, + activeStoryId: state.activeStoryId ?? null, + stories: state.stories.map((s) => ({ + id: s.id, + kind: s.kind, + route: s.route, + promptExcerpt: s.promptExcerpt, + createdAt: s.createdAt, + updatedAt: s.updatedAt, + })), + storyStates, + observationCount: state.observationIds.length, + satisfiedBoundaries: [...state.satisfiedBoundaries].sort(), + missingBoundaries: [...state.missingBoundaries].sort(), + recentRoutes: state.recentRoutes, + primaryNextAction: state.primaryNextAction, + blockedReasons: state.blockedReasons, + }; +} + +// --------------------------------------------------------------------------- +// Loop snapshot (extends plan result with last-observation adherence) +// --------------------------------------------------------------------------- + +export interface VerificationLoopSnapshot extends VerificationPlanResult { + lastObservation: { + id: string; + boundary: string | null; + route: string | null; + matchedSuggestedAction: boolean | null; + suggestedBoundary: string | null; + suggestedAction: string | null; + } | null; +} + +/** + * Extend a 
VerificationPlan into a VerificationLoopSnapshot by extracting + * the most recent observation's adherence metadata. + */ +export function planToLoopSnapshot( + plan: VerificationPlan, +): VerificationLoopSnapshot { + const result = planToResult(plan); + const last = plan.observations[plan.observations.length - 1] ?? null; + + if (!last) { + return { + ...result, + lastObservation: null, + }; + } + + const meta = (last.meta ?? {}) as Record<string, unknown>; + + return { + ...result, + lastObservation: { + id: last.id, + boundary: last.boundary, + route: last.route, + matchedSuggestedAction: + typeof meta.matchedSuggestedAction === "boolean" + ? meta.matchedSuggestedAction + : null, + suggestedBoundary: + typeof meta.suggestedBoundary === "string" + ? meta.suggestedBoundary + : null, + suggestedAction: + typeof meta.suggestedAction === "string" + ? meta.suggestedAction + : null, + }, + }; +} + +// --------------------------------------------------------------------------- +// PreToolUse banner generation +// --------------------------------------------------------------------------- + +/** + * Format a compact verification banner for injection into PreToolUse additionalContext. + * Returns null if there's nothing to surface (no stories or all boundaries satisfied). + */ +export function formatVerificationBanner( + result: VerificationPlanResult, +): string | null { + if (!result.hasStories) return null; + if (!result.primaryNextAction && result.missingBoundaries.length === 0) return null; + + const lines: string[] = [""]; + lines.push("**[Verification Plan]**"); + + // Current story — use deterministic selection + const story = selectActiveStory(result); + if (story) { + const routePart = story.route ?
` (${story.route})` : ""; + lines.push(`Story: ${story.kind}${routePart} — "${story.promptExcerpt}"`); + } + + // Evidence summary + const satisfied = result.satisfiedBoundaries; + const missing = result.missingBoundaries; + if (satisfied.length > 0 || missing.length > 0) { + lines.push(`Evidence: ${satisfied.length}/4 boundaries satisfied [${satisfied.join(", ") || "none"}]`); + if (missing.length > 0) { + lines.push(`Missing: ${missing.join(", ")}`); + } + } + + // Next action + if (result.primaryNextAction) { + lines.push(`Next action: \`${result.primaryNextAction.action}\``); + lines.push(`Reason: ${result.primaryNextAction.reason}`); + } else if (result.blockedReasons.length > 0) { + lines.push(`Blocked: ${result.blockedReasons[0]}`); + } else { + lines.push("All verification boundaries satisfied."); + } + + lines.push(""); + return lines.join("\n"); +} + +// --------------------------------------------------------------------------- +// Human-readable CLI output +// --------------------------------------------------------------------------- + +/** + * Format a human-readable plan summary for terminal output. + * + * When multiple stories exist, highlights the active story and appends + * a compact summary of other stories with their progress. + */ +export function formatPlanHuman(result: VerificationPlanResult): string { + if (!result.hasStories) { + return "No verification stories active.\nNo observations recorded.\n"; + } + + const lines: string[] = []; + + // Active story header + const activeStory = selectActiveStory(result); + + if (activeStory) { + const routePart = activeStory.route ? 
` (${activeStory.route})` : ""; + lines.push(`Active story: ${activeStory.kind}${routePart}: "${activeStory.promptExcerpt}"`); + } + + // Evidence for active story + const satisfied = result.satisfiedBoundaries; + const missing = result.missingBoundaries; + lines.push(`Evidence: ${satisfied.length}/4 boundaries satisfied [${satisfied.join(", ") || "none"}]`); + if (missing.length > 0) { + lines.push(`Missing: ${missing.join(", ")}`); + } + + // Next action with reason + if (result.primaryNextAction) { + lines.push(`Next action: ${result.primaryNextAction.action}`); + lines.push(` Reason: ${result.primaryNextAction.reason}`); + } else if (result.blockedReasons.length > 0) { + lines.push("Next action: blocked"); + for (const reason of result.blockedReasons) { + lines.push(` - ${reason}`); + } + } else if (missing.length === 0) { + lines.push("All verification boundaries satisfied."); + } else { + lines.push("No next action available."); + } + + // Compact summary of other stories + const otherStories = result.stories.filter((s) => s.id !== (activeStory?.id ?? null)); + if (otherStories.length > 0) { + lines.push(""); + lines.push("Other stories:"); + for (const story of otherStories) { + const ss = result.storyStates?.find((st) => st.storyId === story.id); + const satisfiedCount = ss ? ss.satisfiedBoundaries.length : 0; + const routePart = story.route ? ` (${story.route})` : ""; + lines.push(` ${story.kind}${routePart} — ${satisfiedCount}/4 boundaries satisfied`); + } + } + + return lines.join("\n") + "\n"; +} diff --git a/hooks/src/verification-signal.mts b/hooks/src/verification-signal.mts new file mode 100644 index 0000000..20d1b35 --- /dev/null +++ b/hooks/src/verification-signal.mts @@ -0,0 +1,341 @@ +/** + * Normalized multi-tool verification-signal classifier. + * + * Unifies Bash boundary classification and non-Bash tool classification into + * a single deterministic entry point. 
Returns a NormalizedVerificationSignal + * for every tool call that constitutes verification evidence, or null for + * unsupported/irrelevant tool calls. + * + * Signal strength rules: + * - "strong": Bash HTTP/browser commands, WebFetch, registered browser/HTTP tools + * → resolves long-term routing policy outcomes + * - "soft": .env* reads, vercel.json reads, log reads + * → records observations but does NOT resolve routing policy + */ + +// --------------------------------------------------------------------------- +// Types +// --------------------------------------------------------------------------- + +export type VerificationEvidenceSource = + | "bash" + | "browser" + | "http" + | "log-read" + | "env-read" + | "file-read" + | "unknown"; + +export type VerificationSignalStrength = "strong" | "soft"; + +export type VerificationObservedBoundary = + | "uiRender" + | "clientRequest" + | "serverHandler" + | "environment" + | "unknown"; + +export interface NormalizedVerificationSignal { + boundary: VerificationObservedBoundary; + matchedPattern: string; + inferredRoute: string | null; + signalStrength: VerificationSignalStrength; + evidenceSource: VerificationEvidenceSource; + summary: string; + toolName: string; +} + +// --------------------------------------------------------------------------- +// Bash boundary patterns +// --------------------------------------------------------------------------- + +interface BoundaryPattern { + boundary: VerificationObservedBoundary; + pattern: RegExp; + label: string; + evidenceSource: VerificationEvidenceSource; + signalStrength: VerificationSignalStrength; +} + +const BASH_BOUNDARY_PATTERNS: BoundaryPattern[] = [ + // uiRender: browser/screenshot/playwright/puppeteer commands → strong + // More specific patterns first to avoid early generic matches + { boundary: "uiRender", pattern: /\bnpx\s+playwright\b/i, label: "playwright-cli", evidenceSource: "browser", signalStrength: "strong" }, + { boundary: "uiRender", pattern: 
/\bopen\s+https?:/i, label: "open-url", evidenceSource: "browser", signalStrength: "strong" }, + { boundary: "uiRender", pattern: /\b(open|launch|browse|screenshot|puppeteer|playwright|chromium|firefox|webkit)\b/i, label: "browser-tool", evidenceSource: "browser", signalStrength: "strong" }, + + // clientRequest: curl, wget, httpie → strong + { boundary: "clientRequest", pattern: /\b(curl|wget|http|httpie)\b/i, label: "http-client", evidenceSource: "bash", signalStrength: "strong" }, + { boundary: "clientRequest", pattern: /\bfetch\s*\(/i, label: "fetch-call", evidenceSource: "bash", signalStrength: "strong" }, + { boundary: "clientRequest", pattern: /\bnpx\s+undici\b/i, label: "undici-cli", evidenceSource: "bash", signalStrength: "strong" }, + + // serverHandler: log tailing, server inspection → strong (Bash observation of server state) + { boundary: "serverHandler", pattern: /\b(tail|less|cat)\b.*\.(log|out|err)\b/i, label: "log-tail", evidenceSource: "bash", signalStrength: "strong" }, + { boundary: "serverHandler", pattern: /\b(tail\s+-f|journalctl\s+-f)\b/i, label: "log-follow", evidenceSource: "bash", signalStrength: "strong" }, + { boundary: "serverHandler", pattern: /\blog(s)?\s/i, label: "log-command", evidenceSource: "bash", signalStrength: "strong" }, + { boundary: "serverHandler", pattern: /\b(vercel\s+logs|vercel\s+inspect)\b/i, label: "vercel-logs", evidenceSource: "bash", signalStrength: "strong" }, + { boundary: "serverHandler", pattern: /\b(lsof|netstat|ss)\s.*:(3000|3001|4000|5173|8080)\b/i, label: "port-inspect", evidenceSource: "bash", signalStrength: "strong" }, + + // environment: env reads, config inspection → strong (Bash env observation) + { boundary: "environment", pattern: /\b(printenv|env\b|echo\s+\$)/i, label: "env-read", evidenceSource: "bash", signalStrength: "strong" }, + { boundary: "environment", pattern: /\bvercel\s+env\b/i, label: "vercel-env", evidenceSource: "bash", signalStrength: "strong" }, + { boundary: "environment", 
pattern: /\bcat\b.*\.env\b/i, label: "dotenv-read", evidenceSource: "bash", signalStrength: "strong" }, + { boundary: "environment", pattern: /\bnode\s+-e\b.*process\.env\b/i, label: "node-env", evidenceSource: "bash", signalStrength: "strong" }, +]; + +// --------------------------------------------------------------------------- +// Registered browser/HTTP tool names (strong signals) +// --------------------------------------------------------------------------- + +const BROWSER_TOOLS = new Set([ + "agent_browser", + "agent-browser", + "mcp__browser__navigate", + "mcp__browser__screenshot", + "mcp__browser__click", + "mcp__puppeteer__navigate", + "mcp__puppeteer__screenshot", + "mcp__playwright__navigate", + "mcp__playwright__screenshot", +]); + +const HTTP_TOOLS = new Set([ + "WebFetch", + "mcp__fetch__fetch", + "mcp__http__request", + "mcp__http__get", + "mcp__http__post", +]); + +// --------------------------------------------------------------------------- +// URL route inference +// --------------------------------------------------------------------------- + +const URL_ROUTE_REGEX = /https?:\/\/[^/\s]+(\/([\w-]+(?:\/[\w-]+)*))/; + +function inferRouteFromUrl(url: string): string | null { + const match = URL_ROUTE_REGEX.exec(url); + return match?.[1] ?? null; +} + +// --------------------------------------------------------------------------- +// File path route inference +// --------------------------------------------------------------------------- + +const FILE_ROUTE_REGEX = /\b(?:app|pages|src\/pages|src\/app)\/([\w[\].-]+(?:\/[\w[\].-]+)*)/; + +function inferRouteFromFilePath(filePath: string): string | null { + const match = FILE_ROUTE_REGEX.exec(filePath); + if (!match) return null; + const route = "/" + match[1] + .replace(/\/page\.\w+$/, "") + .replace(/\/route\.\w+$/, "") + .replace(/\/layout\.\w+$/, "") + .replace(/\/loading\.\w+$/, "") + .replace(/\/error\.\w+$/, "") + .replace(/\[([^\]]+)\]/g, ":$1"); + return route === "/" ? 
"/" : route.replace(/\/$/, ""); +} + +// --------------------------------------------------------------------------- +// Main classifier +// --------------------------------------------------------------------------- + +/** + * Classify a tool call into a normalized verification signal. + * + * Returns a NormalizedVerificationSignal for any tool call that constitutes + * verification evidence, or null for unsupported/irrelevant calls. + * + * Classification is deterministic for the same toolName and toolInput. + */ +export function classifyVerificationSignal(input: { + toolName: string; + toolInput: Record<string, unknown>; + env?: NodeJS.ProcessEnv; +}): NormalizedVerificationSignal | null { + const { toolName, toolInput } = input; + + // --- Bash classification --- + if (toolName === "Bash") { + const command = String(toolInput.command || ""); + if (!command) return null; + + for (const bp of BASH_BOUNDARY_PATTERNS) { + if (bp.pattern.test(command)) { + const inferredRoute = inferRouteFromUrl(command); + return { + boundary: bp.boundary, + matchedPattern: bp.label, + inferredRoute, + signalStrength: bp.signalStrength, + evidenceSource: bp.evidenceSource, + summary: command.slice(0, 200), + toolName: "Bash", + }; + } + } + + // No boundary pattern matched + return null; + } + + // --- Registered browser tools → uiRender + strong --- + if (BROWSER_TOOLS.has(toolName)) { + const url = String(toolInput.url || toolInput.uri || ""); + return { + boundary: "uiRender", + matchedPattern: "browser-tool", + inferredRoute: url ? inferRouteFromUrl(url) : null, + signalStrength: "strong", + evidenceSource: "browser", + summary: url ?
url.slice(0, 200) : toolName, + toolName, + }; + } + + // --- Registered HTTP tools → clientRequest + strong --- + if (HTTP_TOOLS.has(toolName)) { + const url = String(toolInput.url || toolInput.uri || ""); + if (!url && toolName !== "WebFetch") { + // Non-WebFetch HTTP tools without a URL still classify as strong HTTP + return { + boundary: "clientRequest", + matchedPattern: "http-tool", + inferredRoute: null, + signalStrength: "strong", + evidenceSource: "http", + summary: toolName, + toolName, + }; + } + if (!url) return null; + + return { + boundary: "clientRequest", + matchedPattern: toolName === "WebFetch" ? "web-fetch" : "http-tool", + inferredRoute: inferRouteFromUrl(url), + signalStrength: "strong", + evidenceSource: "http", + summary: url.slice(0, 200), + toolName, + }; + } + + // --- Read tool --- + if (toolName === "Read") { + const filePath = String(toolInput.file_path || ""); + if (!filePath) return null; + + // .env files → environment + soft + if (/\.env(\.\w+)?$/.test(filePath)) { + return { + boundary: "environment", + matchedPattern: "env-file-read", + inferredRoute: null, + signalStrength: "soft", + evidenceSource: "env-read", + summary: filePath, + toolName: "Read", + }; + } + + // vercel.json, .vercel/project.json → environment + soft + if (/vercel\.json$/.test(filePath) || /\.vercel\/project\.json$/.test(filePath)) { + return { + boundary: "environment", + matchedPattern: "vercel-config-read", + inferredRoute: null, + signalStrength: "soft", + evidenceSource: "env-read", + summary: filePath, + toolName: "Read", + }; + } + + // Log files → serverHandler + soft + if (/\.(log|out|err)$/.test(filePath) || /vercel-logs/.test(filePath) || /\.next\/.*server.*\.log/.test(filePath)) { + return { + boundary: "serverHandler", + matchedPattern: "log-file-read", + inferredRoute: inferRouteFromFilePath(filePath), + signalStrength: "soft", + evidenceSource: "log-read", + summary: filePath, + toolName: "Read", + }; + } + + return null; + } + + // --- Grep 
tool --- + if (toolName === "Grep") { + const path = String(toolInput.path || ""); + + if (/\.(log|out|err)$/.test(path) || /logs?\//.test(path)) { + return { + boundary: "serverHandler", + matchedPattern: "log-grep", + inferredRoute: null, + signalStrength: "soft", + evidenceSource: "log-read", + summary: `grep ${toolInput.pattern || ""} in ${path}`.slice(0, 200), + toolName: "Grep", + }; + } + + if (/\.env/.test(path)) { + return { + boundary: "environment", + matchedPattern: "env-grep", + inferredRoute: null, + signalStrength: "soft", + evidenceSource: "env-read", + summary: `grep ${toolInput.pattern || ""} in ${path}`.slice(0, 200), + toolName: "Grep", + }; + } + + return null; + } + + // --- Glob tool --- + if (toolName === "Glob") { + const pattern = String(toolInput.pattern || ""); + + if (/\*\.(log|out|err)/.test(pattern) || /logs?\//.test(pattern)) { + return { + boundary: "serverHandler", + matchedPattern: "log-glob", + inferredRoute: null, + signalStrength: "soft", + evidenceSource: "log-read", + summary: `glob ${pattern}`.slice(0, 200), + toolName: "Glob", + }; + } + + if (/\.env/.test(pattern)) { + return { + boundary: "environment", + matchedPattern: "env-glob", + inferredRoute: null, + signalStrength: "soft", + evidenceSource: "env-read", + summary: `glob ${pattern}`.slice(0, 200), + toolName: "Glob", + }; + } + + return null; + } + + // Edit/Write are mutations, not observations + if (toolName === "Edit" || toolName === "Write") { + return null; + } + + // Unsupported tool + return null; +} diff --git a/hooks/subagent-start-bootstrap.mjs b/hooks/subagent-start-bootstrap.mjs index db47d9a..67e71f6 100755 --- a/hooks/subagent-start-bootstrap.mjs +++ b/hooks/subagent-start-bootstrap.mjs @@ -10,6 +10,15 @@ import { compilePromptSignals, matchPromptWithReason, normalizePromptText } from import { loadSkills } from "./pretooluse-skill-inject.mjs"; import { extractFrontmatter } from "./skill-map-frontmatter.mjs"; import { claimPendingLaunch } from 
"./subagent-state.mjs"; +import { + computePlan, + loadCachedPlanResult, + selectActiveStory +} from "./verification-plan.mjs"; +import { + buildVerificationDirective, + buildVerificationEnv +} from "./verification-directive.mjs"; var PLUGIN_ROOT = resolvePluginRoot(); var MINIMAL_BUDGET_BYTES = 1024; var LIGHT_BUDGET_BYTES = 3072; @@ -125,17 +134,116 @@ function resolveLikelySkillsFromPendingLaunch(sessionId, agentType, likelySkills return likelySkills; } } +function resolveVerificationPlan(sessionId) { + if (!sessionId) return null; + try { + const cached = loadCachedPlanResult(sessionId); + if (cached?.hasStories) { + log.debug("subagent-start-bootstrap:verification-plan-cached", { sessionId }); + return cached; + } + log.debug("subagent-start-bootstrap:verification-plan-cache-miss", { sessionId }); + } catch (error) { + logCaughtError(log, "subagent-start-bootstrap:verification-plan-cache-failed", error, { + sessionId + }); + } + try { + const fresh = computePlan(sessionId, { + agentBrowserAvailable: process.env.VERCEL_PLUGIN_AGENT_BROWSER_AVAILABLE !== "0", + lastAttemptedAction: process.env.VERCEL_PLUGIN_VERIFICATION_ACTION || null + }); + if (fresh.hasStories) { + log.debug("subagent-start-bootstrap:verification-plan-fresh", { sessionId }); + return fresh; + } + log.debug("subagent-start-bootstrap:verification-plan-empty", { sessionId }); + } catch (error) { + logCaughtError(log, "subagent-start-bootstrap:verification-plan-fresh-failed", error, { + sessionId + }); + } + return null; +} +function buildVerificationContextFromPlan(plan, category) { + if (!plan.hasStories || plan.stories.length === 0) return null; + const story = selectActiveStory(plan); + if (!story) return null; + const routePart = story.route ? 
` (${story.route})` : ""; + switch (category) { + case "minimal": { + return [ + ``, + `Verification story: ${story.kind}${routePart}`, + `` + ].join("\n"); + } + case "light": { + const lines = [ + ``, + `Verification story: ${story.kind}${routePart} \u2014 "${story.promptExcerpt}"` + ]; + if (plan.missingBoundaries.length > 0) { + lines.push(`Missing boundaries: ${plan.missingBoundaries.join(", ")}`); + } + if (plan.primaryNextAction) { + lines.push(`Candidate action: ${plan.primaryNextAction.action}`); + } + if (plan.blockedReasons.length > 0) { + lines.push(`Blocked: ${plan.blockedReasons[0]}`); + } + lines.push(``); + return lines.join("\n"); + } + case "standard": { + const lines = [ + ``, + `Verification story: ${story.kind}${routePart} \u2014 "${story.promptExcerpt}"`, + `Evidence: ${plan.satisfiedBoundaries.length}/4 boundaries [${plan.satisfiedBoundaries.join(", ") || "none"}]` + ]; + if (plan.missingBoundaries.length > 0) { + lines.push(`Missing: ${plan.missingBoundaries.join(", ")}`); + } + if (plan.primaryNextAction) { + lines.push(`Primary action: \`${plan.primaryNextAction.action}\``); + lines.push(`Reason: ${plan.primaryNextAction.reason}`); + } + if (plan.blockedReasons.length > 0) { + for (const reason of plan.blockedReasons) { + lines.push(`Blocked: ${reason}`); + } + } + if (plan.recentRoutes.length > 0) { + lines.push(`Recent routes: ${plan.recentRoutes.join(", ")}`); + } + lines.push(``); + return lines.join("\n"); + } + } +} +function buildVerificationContext(sessionId, category) { + const plan = resolveVerificationPlan(sessionId); + return plan ? buildVerificationContextFromPlan(plan, category) : null; +} function profileLine(agentType, likelySkills) { return "Vercel plugin active. Project likely uses: " + (likelySkills.length > 0 ? 
likelySkills.join(", ") : "unknown stack") + "."; } -function buildMinimalContext(agentType, likelySkills) { +function buildMinimalContext(agentType, likelySkills, sessionId) { const parts = []; parts.push(``); parts.push(profileLine(agentType, likelySkills)); + const verificationCtx = buildVerificationContext(sessionId, "minimal"); + if (verificationCtx) { + const verBytes = Buffer.byteLength(verificationCtx, "utf8"); + const currentBytes = Buffer.byteLength(parts.join("\n"), "utf8"); + if (currentBytes + verBytes + 50 <= MINIMAL_BUDGET_BYTES) { + parts.push(verificationCtx); + } + } parts.push(""); return parts.join("\n"); } -function buildLightContext(agentType, likelySkills, budgetBytes) { +function buildLightContext(agentType, likelySkills, budgetBytes, sessionId) { const parts = []; parts.push(``); parts.push(profileLine(agentType, likelySkills)); @@ -164,10 +272,18 @@ function buildLightContext(agentType, likelySkills, budgetBytes) { parts.push(constraint); usedBytes += lineBytes + 1; } + const verificationCtx = buildVerificationContext(sessionId, "light"); + if (verificationCtx) { + const verBytes = Buffer.byteLength(verificationCtx, "utf8"); + if (usedBytes + verBytes + 1 <= budgetBytes) { + parts.push(verificationCtx); + usedBytes += verBytes + 1; + } + } parts.push(""); return parts.join("\n"); } -function buildStandardContext(agentType, likelySkills, budgetBytes) { +function buildStandardContext(agentType, likelySkills, budgetBytes, sessionId) { const parts = []; parts.push(``); parts.push(profileLine(agentType, likelySkills)); @@ -201,6 +317,14 @@ ${summary} } } } + const verificationCtx = buildVerificationContext(sessionId, "standard"); + if (verificationCtx) { + const verBytes = Buffer.byteLength(verificationCtx, "utf8"); + if (usedBytes + verBytes + 1 <= budgetBytes) { + parts.push(verificationCtx); + usedBytes += verBytes + 1; + } + } parts.push(""); return parts.join("\n"); } @@ -224,13 +348,13 @@ function main() { let context; switch (category) { 
case "minimal": - context = buildMinimalContext(agentType, likelySkills); + context = buildMinimalContext(agentType, likelySkills, sessionId); break; case "light": - context = buildLightContext(agentType, likelySkills, maxBytes); + context = buildLightContext(agentType, likelySkills, maxBytes, sessionId); break; case "standard": - context = buildStandardContext(agentType, likelySkills, maxBytes); + context = buildStandardContext(agentType, likelySkills, maxBytes, sessionId); break; } if (Buffer.byteLength(context, "utf8") > maxBytes) { @@ -250,6 +374,9 @@ function main() { } const budgetUsed = Buffer.byteLength(context, "utf8"); const pendingLaunchMatched = likelySkills.length !== profilerLikelySkills.length || likelySkills.some((s) => !profilerLikelySkills.includes(s)); + const verificationPlan = resolveVerificationPlan(sessionId); + const verificationDirective = buildVerificationDirective(verificationPlan); + const verificationEnv = buildVerificationEnv(verificationDirective); log.summary("subagent-start-bootstrap:complete", { agent_id: agentId, agent_type: agentType, @@ -257,13 +384,16 @@ function main() { budget_used: budgetUsed, budget_max: maxBytes, budget_category: category, - pending_launch_matched: pendingLaunchMatched + pending_launch_matched: pendingLaunchMatched, + verification_directive: verificationDirective !== null, + verification_env_keys: Object.keys(verificationEnv) }); const output = { hookSpecificOutput: { hookEventName: "SubagentStart", additionalContext: context - } + }, + ...Object.keys(verificationEnv).length > 0 ? 
{ env: verificationEnv } : {} }; process.stdout.write(JSON.stringify(output)); process.exit(0); @@ -280,7 +410,13 @@ export { buildLightContext, buildMinimalContext, buildStandardContext, + buildVerificationContext, + buildVerificationContextFromPlan, + buildVerificationDirective, + buildVerificationEnv, getLikelySkills, main, - parseInput + parseInput, + resolveBudgetCategory, + resolveVerificationPlan }; diff --git a/hooks/user-prompt-submit-skill-inject.mjs b/hooks/user-prompt-submit-skill-inject.mjs index da1a029..83cc163 100755 --- a/hooks/user-prompt-submit-skill-inject.mjs +++ b/hooks/user-prompt-submit-skill-inject.mjs @@ -26,6 +26,27 @@ import { searchSkills, initializeLexicalIndex } from "./lexical-index.mjs"; import { analyzePrompt } from "./prompt-analysis.mjs"; import { createLogger, logDecision } from "./logger.mjs"; import { trackBaseEvents } from "./telemetry.mjs"; +import { loadCachedPlanResult } from "./verification-plan.mjs"; +import { resolvePromptVerificationBinding } from "./prompt-verification-binding.mjs"; +import { applyPolicyBoosts, applyRulebookBoosts } from "./routing-policy.mjs"; +import { + appendSkillExposure, + loadProjectRoutingPolicy +} from "./routing-policy-ledger.mjs"; +import { loadRulebook, rulebookPath } from "./learned-routing-rulebook.mjs"; +import { applyPromptPolicyRecall } from "./prompt-policy-recall.mjs"; +import { recallVerifiedCompanions } from "./companion-recall.mjs"; +import { recallVerifiedPlaybook } from "./playbook-recall.mjs"; +import { buildAttributionDecision } from "./routing-attribution.mjs"; +import { + appendRoutingDecisionTrace, + createDecisionId +} from "./routing-decision-trace.mjs"; +import { + buildDecisionCapsule, + buildDecisionCapsuleEnv, + persistDecisionCapsule +} from "./routing-decision-capsule.mjs"; var MAX_SKILLS = 2; var DEFAULT_INJECTION_BUDGET_BYTES = 8e3; var MIN_PROMPT_LENGTH = 10; @@ -678,8 +699,76 @@ function run() { durationMs: log.active ? 
log.elapsed() : void 0 }); } - const allMatched = Object.entries(report.perSkillResults).filter(([, r]) => r.matched).map(([skill]) => skill); - if (allMatched.length === 0) { + const promptPlan = sessionId ? loadCachedPlanResult(sessionId, log) : null; + const promptBinding = resolvePromptVerificationBinding({ plan: promptPlan }); + log.debug("prompt-verification-binding", { + source: promptBinding.source, + storyId: promptBinding.storyId, + targetBoundary: promptBinding.targetBoundary, + confidence: promptBinding.confidence, + reason: promptBinding.reason + }); + let matchedSkills = Object.entries(report.perSkillResults).filter(([, r]) => r.matched).map(([skill]) => skill); + const promptPolicy = cwd ? loadProjectRoutingPolicy(cwd) : null; + const promptPolicyRecallSynthetic = /* @__PURE__ */ new Set(); + const promptPolicyRecallReasons = {}; + if (promptPolicy && promptBinding.storyId && promptBinding.targetBoundary) { + const recall = applyPromptPolicyRecall({ + selectedSkills: report.selectedSkills, + matchedSkills, + seenSkills: dedupOff ? [] : parseSeenSkills(seenState), + maxSkills: MAX_SKILLS, + binding: { + storyId: promptBinding.storyId, + storyKind: promptBinding.storyKind, + route: promptBinding.route, + targetBoundary: promptBinding.targetBoundary + }, + policy: promptPolicy + }); + report.selectedSkills.length = 0; + report.selectedSkills.push(...recall.selectedSkills); + matchedSkills = recall.matchedSkills; + for (const skill of recall.syntheticSkills) { + promptPolicyRecallSynthetic.add(skill); + } + Object.assign(promptPolicyRecallReasons, recall.reasons); + if (recall.diagnosis) { + log.debug("prompt-policy-recall-lookup", { + requestedScenario: `UserPromptSubmit|${promptBinding.storyKind ?? "none"}|${promptBinding.targetBoundary ?? "none"}|Prompt|${promptBinding.route ?? 
"*"}`, + checkedScenarios: recall.diagnosis.checkedScenarios, + selectedBucket: recall.diagnosis.selectedBucket, + selectedSkills: recall.diagnosis.selected.map((c) => c.skill), + rejected: recall.diagnosis.rejected.map((c) => ({ + skill: c.skill, + scenario: c.scenario, + exposures: c.exposures, + successRate: c.successRate, + policyBoost: c.policyBoost, + excluded: c.excluded, + rejectedReason: c.rejectedReason + })), + hintCodes: recall.diagnosis.hints.map((h) => h.code) + }); + for (const candidate of recall.diagnosis.selected) { + log.debug("prompt-policy-recall-injected", { + skill: candidate.skill, + scenario: candidate.scenario, + exposures: candidate.exposures, + wins: candidate.wins, + directiveWins: candidate.directiveWins, + successRate: candidate.successRate, + policyBoost: candidate.policyBoost, + recallScore: candidate.recallScore + }); + } + } + } else if (cwd) { + log.debug("prompt-policy-recall-skipped", { + reason: !promptBinding.storyId ? "no_active_verification_story" : "no_target_boundary" + }); + } + if (matchedSkills.length === 0) { log.debug("prompt-analysis-issue", { issue: "no_prompt_matches", evaluatedSkills: Object.keys(report.perSkillResults), @@ -691,16 +780,206 @@ function run() { if (report.selectedSkills.length === 0) { log.debug("prompt-analysis-issue", { issue: "all_deduped", - matchedSkills: allMatched, + matchedSkills, seenSkills: report.dedupState.seenSkills, dedupStrategy: report.dedupState.strategy }); log.complete("all_deduped", { - matchedCount: allMatched.length, - dedupedCount: allMatched.length + matchedCount: matchedSkills.length, + dedupedCount: matchedSkills.length }, log.active ? 
timing : null); return formatEmptyOutput(platform, finalizePromptEnvUpdates(platform, promptEnvBefore)); } + const promptPolicyBoosted = []; + if (promptPolicy && report.selectedSkills.length > 0 && promptBinding.storyId && promptBinding.targetBoundary) { + const promptPolicyScenario = { + hook: "UserPromptSubmit", + storyKind: promptBinding.storyKind, + targetBoundary: promptBinding.targetBoundary, + toolName: "Prompt" + }; + const rankable = report.selectedSkills.map((skill) => { + const r = report.perSkillResults[skill]; + return { + skill, + priority: r?.score ?? 0, + effectivePriority: r?.score ?? 0 + }; + }); + const boosted = applyPolicyBoosts(rankable, promptPolicy, promptPolicyScenario); + boosted.sort( + (a, b) => b.effectivePriority - a.effectivePriority || a.skill.localeCompare(b.skill) + ); + report.selectedSkills.length = 0; + report.selectedSkills.push(...boosted.map((b) => b.skill)); + for (const b of boosted) { + if (b.policyBoost !== 0) { + promptPolicyBoosted.push({ + skill: b.skill, + boost: b.policyBoost, + reason: b.policyReason + }); + } + } + if (promptPolicyBoosted.length > 0) { + log.debug("prompt-policy-boosted", { + scenario: `${promptPolicyScenario.hook}|${promptPolicyScenario.storyKind ?? "none"}|${promptPolicyScenario.targetBoundary}|Prompt`, + boostedSkills: promptPolicyBoosted + }); + } + } else if (cwd && report.selectedSkills.length > 0) { + log.debug("prompt-policy-boost-skipped", { + reason: !promptBinding.storyId ? 
"no_active_verification_story" : "no_target_boundary" + }); + } + const promptRulebookBoosted = []; + if (cwd && report.selectedSkills.length > 0 && promptBinding.storyId && promptBinding.targetBoundary) { + const rbResult = loadRulebook(cwd); + if (rbResult.ok && rbResult.rulebook.rules.length > 0) { + const rbScenario = { + hook: "UserPromptSubmit", + storyKind: promptBinding.storyKind, + targetBoundary: promptBinding.targetBoundary, + toolName: "Prompt" + }; + const rbPath = rulebookPath(cwd); + const rankable = report.selectedSkills.map((skill) => { + const r = report.perSkillResults[skill]; + const pb = promptPolicyBoosted.find((p) => p.skill === skill); + return { + skill, + priority: r?.score ?? 0, + effectivePriority: (r?.score ?? 0) + (pb?.boost ?? 0), + policyBoost: pb?.boost ?? 0, + policyReason: pb?.reason ?? null + }; + }); + const withRulebook = applyRulebookBoosts(rankable, rbResult.rulebook, rbScenario, rbPath); + withRulebook.sort( + (a, b) => b.effectivePriority - a.effectivePriority || a.skill.localeCompare(b.skill) + ); + report.selectedSkills.length = 0; + report.selectedSkills.push(...withRulebook.map((r) => r.skill)); + for (const rb of withRulebook) { + if (rb.matchedRuleId) { + promptRulebookBoosted.push({ + skill: rb.skill, + matchedRuleId: rb.matchedRuleId, + ruleBoost: rb.ruleBoost, + ruleReason: rb.ruleReason ?? "", + rulebookPath: rb.rulebookPath ?? "" + }); + const pIdx = promptPolicyBoosted.findIndex((p) => p.skill === rb.skill); + if (pIdx !== -1) { + promptPolicyBoosted.splice(pIdx, 1); + } + } + } + if (promptRulebookBoosted.length > 0) { + log.debug("prompt-rulebook-boosted", { + scenario: `${rbScenario.hook}|${rbScenario.storyKind ?? 
"none"}|${rbScenario.targetBoundary}|Prompt`, + boostedSkills: promptRulebookBoosted + }); + } + } else if (!rbResult.ok) { + log.debug("prompt-rulebook-load-error", { code: rbResult.error.code, message: rbResult.error.message }); + } + } + const promptCompanionRecallReasons = {}; + const promptForceSummarySkills = /* @__PURE__ */ new Set(); + if (cwd && promptBinding.storyId && promptBinding.targetBoundary) { + const companionRecall = recallVerifiedCompanions({ + projectRoot: cwd, + scenario: { + hook: "UserPromptSubmit", + storyKind: promptBinding.storyKind, + targetBoundary: promptBinding.targetBoundary, + toolName: "Prompt", + routeScope: promptBinding.route ?? null + }, + candidateSkills: [...report.selectedSkills], + excludeSkills: /* @__PURE__ */ new Set([ + ...report.selectedSkills, + ...dedupOff ? [] : parseSeenSkills(seenState) + ]), + maxCompanions: 1 + }); + for (const recall of companionRecall.selected) { + const candidateIdx = report.selectedSkills.indexOf(recall.candidateSkill); + if (candidateIdx === -1) continue; + report.selectedSkills.splice(candidateIdx + 1, 0, recall.companionSkill); + matchedSkills.push(recall.companionSkill); + const seenSkills2 = dedupOff ? 
/* @__PURE__ */ new Set() : parseSeenSkills(seenState); + const alreadySeen = !dedupOff && seenSkills2.has(recall.companionSkill); + if (alreadySeen) { + promptForceSummarySkills.add(recall.companionSkill); + } + promptCompanionRecallReasons[recall.companionSkill] = { + trigger: "verified-companion", + reasonCode: "scenario-companion-rulebook" + }; + log.debug("prompt-companion-recall-injected", { + candidateSkill: recall.candidateSkill, + companionSkill: recall.companionSkill, + scenario: recall.scenario, + lift: recall.confidence, + summaryOnly: alreadySeen + }); + } + if (companionRecall.rejected.length > 0) { + log.debug("prompt-companion-recall-rejected", { + rejected: companionRecall.rejected + }); + } + } else if (cwd) { + log.debug("prompt-companion-recall-skipped", { + reason: !promptBinding.storyId ? "no_active_verification_story" : "no_target_boundary" + }); + } + const promptPlaybookRecallReasons = {}; + let promptPlaybookBanner = null; + const availablePlaybookSlots = Math.max(0, MAX_SKILLS - report.selectedSkills.length); + if (cwd && promptBinding.storyId && promptBinding.targetBoundary && availablePlaybookSlots > 0) { + const playbookRecall = recallVerifiedPlaybook({ + projectRoot: cwd, + scenario: { + hook: "UserPromptSubmit", + storyKind: promptBinding.storyKind, + targetBoundary: promptBinding.targetBoundary, + toolName: "Prompt", + routeScope: promptBinding.route ?? null + }, + candidateSkills: [...report.selectedSkills], + excludeSkills: /* @__PURE__ */ new Set([ + ...report.selectedSkills, + ...dedupOff ? 
[] : parseSeenSkills(seenState) + ]), + maxInsertedSkills: availablePlaybookSlots + }); + if (playbookRecall.selected) { + promptPlaybookBanner = playbookRecall.banner; + const anchorIdx = report.selectedSkills.indexOf(playbookRecall.selected.anchorSkill); + let insertOffset = 1; + for (const skill of playbookRecall.selected.insertedSkills) { + report.selectedSkills.splice(anchorIdx + insertOffset, 0, skill); + matchedSkills.push(skill); + const seenSkills2 = dedupOff ? /* @__PURE__ */ new Set() : parseSeenSkills(seenState); + if (!dedupOff && seenSkills2.has(skill)) { + promptForceSummarySkills.add(skill); + } + promptPlaybookRecallReasons[skill] = { + trigger: "verified-playbook", + reasonCode: "scenario-playbook-rulebook" + }; + insertOffset += 1; + } + log.debug("prompt-playbook-recall-injected", { + ruleId: playbookRecall.selected.ruleId, + anchorSkill: playbookRecall.selected.anchorSkill, + insertedSkills: playbookRecall.selected.insertedSkills + }); + } + } const tInject = log.active ? log.now() : 0; const injectedSkills = dedupOff ? /* @__PURE__ */ new Set() : parseSeenSkills(seenState); const injectResult = injectSkills(report.selectedSkills, { @@ -712,6 +991,7 @@ function run() { maxSkills: MAX_SKILLS, skillMap: skills.skillMap, logger: log, + forceSummarySkills: promptForceSummarySkills.size > 0 ? 
promptForceSummarySkills : void 0, platform }); if (log.active) timing.inject = Math.round(log.now() - tInject); @@ -722,7 +1002,53 @@ function run() { } const droppedByCap = [...injectResult.droppedByCap, ...report.droppedByCap]; const droppedByBudget = [...injectResult.droppedByBudget, ...report.droppedByBudget]; - const matchedSkills = allMatched; + let promptAttribution = null; + if (loaded.length > 0 && sessionId && promptBinding.storyId && promptBinding.targetBoundary) { + promptAttribution = buildAttributionDecision({ + sessionId, + hook: "UserPromptSubmit", + storyId: promptBinding.storyId, + route: promptBinding.route, + targetBoundary: promptBinding.targetBoundary, + loadedSkills: loaded, + preferredSkills: promptPolicyRecallSynthetic + }); + for (const skill of loaded) { + appendSkillExposure({ + id: `${sessionId}:prompt:${skill}:${Date.now()}`, + sessionId, + projectRoot: cwd, + storyId: promptBinding.storyId, + storyKind: promptBinding.storyKind, + route: promptBinding.route, + hook: "UserPromptSubmit", + toolName: "Prompt", + skill, + targetBoundary: promptBinding.targetBoundary, + exposureGroupId: promptAttribution.exposureGroupId, + attributionRole: skill === promptAttribution.candidateSkill ? "candidate" : "context", + candidateSkill: promptAttribution.candidateSkill, + createdAt: (/* @__PURE__ */ new Date()).toISOString(), + resolvedAt: null, + outcome: "pending" + }); + } + log.summary("routing-policy-exposures-recorded", { + hook: "UserPromptSubmit", + skills: loaded, + storyId: promptBinding.storyId, + storyKind: promptBinding.storyKind, + targetBoundary: promptBinding.targetBoundary, + candidateSkill: promptAttribution.candidateSkill, + exposureGroupId: promptAttribution.exposureGroupId + }); + } else if (loaded.length > 0 && sessionId) { + log.debug("routing-policy-exposures-skipped", { + hook: "UserPromptSubmit", + reason: !promptBinding.storyId ? 
"no active verification story" : "no target boundary", + skills: loaded + }); + } if (parts.length === 0) { log.complete("all_deduped", { matchedCount: matchedSkills.length, @@ -770,13 +1096,113 @@ function run() { } outputEnv = finalizePromptEnvUpdates(platform, promptEnvBefore); } + { + const traceTimestamp = (/* @__PURE__ */ new Date()).toISOString(); + const decisionId = createDecisionId({ + hook: "UserPromptSubmit", + sessionId, + toolName: "Prompt", + toolTarget: normalizedPrompt, + timestamp: traceTimestamp + }); + const promptTrace = { + version: 2, + decisionId, + sessionId, + hook: "UserPromptSubmit", + toolName: "Prompt", + toolTarget: normalizedPrompt, + timestamp: traceTimestamp, + primaryStory: { + id: promptBinding.storyId, + kind: promptBinding.storyKind, + storyRoute: promptBinding.route, + targetBoundary: promptBinding.targetBoundary + }, + observedRoute: null, + // UserPromptSubmit fires before execution; no observed route + policyScenario: promptBinding.storyId && promptBinding.targetBoundary ? `UserPromptSubmit|${promptBinding.storyKind ?? "none"}|${promptBinding.targetBoundary}|Prompt` : null, + matchedSkills, + injectedSkills: loaded, + skippedReasons: [ + ...promptBinding.storyId ? [] : ["no_active_verification_story"], + ...promptBinding.storyId && !promptBinding.targetBoundary ? ["no_target_boundary"] : [], + ...droppedByCap.map((skill) => `cap_exceeded:${skill}`), + ...droppedByBudget.map((skill) => `budget_exhausted:${skill}`) + ], + ranked: report.selectedSkills.map((skill) => { + const result = report.perSkillResults[skill]; + const policy = promptPolicyBoosted.find((p) => p.skill === skill); + const rb = promptRulebookBoosted.find((r) => r.skill === skill); + const companionReason = promptCompanionRecallReasons[skill]; + const playbookReason = promptPlaybookRecallReasons[skill]; + const synthetic = promptPolicyRecallSynthetic.has(skill) || Boolean(companionReason) || Boolean(playbookReason); + const baseScore = result?.score ?? 
0; + const effectiveBoost = rb ? rb.ruleBoost : policy?.boost ?? 0; + return { + skill, + basePriority: baseScore, + effectivePriority: baseScore + effectiveBoost, + pattern: playbookReason ? { type: playbookReason.trigger, value: playbookReason.reasonCode } : companionReason ? { type: companionReason.trigger, value: companionReason.reasonCode } : promptPolicyRecallSynthetic.has(skill) ? { type: "policy-recall", value: promptPolicyRecallReasons[skill] } : result?.reason ? { type: "prompt-signal", value: result.reason } : null, + profilerBoost: 0, + policyBoost: policy?.boost ?? 0, + policyReason: policy?.reason ?? null, + matchedRuleId: rb?.matchedRuleId ?? null, + ruleBoost: rb?.ruleBoost ?? 0, + ruleReason: rb?.ruleReason ?? null, + rulebookPath: rb?.rulebookPath ?? null, + summaryOnly: summaryOnly.includes(skill), + synthetic, + droppedReason: droppedByCap.includes(skill) ? "cap_exceeded" : droppedByBudget.includes(skill) ? "budget_exhausted" : null + }; + }), + verification: null, + causes: [], + edges: [] + }; + appendRoutingDecisionTrace(promptTrace); + const promptCapsule = buildDecisionCapsule({ + sessionId, + hook: "UserPromptSubmit", + createdAt: traceTimestamp, + toolName: "Prompt", + toolTarget: normalizedPrompt, + platform, + trace: promptTrace, + directive: null, + // UserPromptSubmit has no verification directive + attribution: promptAttribution ? { + exposureGroupId: promptAttribution.exposureGroupId, + candidateSkill: promptAttribution.candidateSkill, + loadedSkills: promptAttribution.loadedSkills + } : null, + env: outputEnv + }); + const promptCapsulePath = persistDecisionCapsule(promptCapsule, log); + const capsuleEnv = buildDecisionCapsuleEnv(promptCapsule, promptCapsulePath); + outputEnv = { ...outputEnv ?? 
{}, ...capsuleEnv };
+    log.summary("routing.decision_trace_written", {
+      decisionId,
+      hook: "UserPromptSubmit",
+      matchedSkills,
+      injectedSkills: loaded,
+      capsulePath: promptCapsulePath
+    });
+  }
   const promptMatchReasons = {};
   for (const skill of loaded) {
+    if (promptPolicyRecallReasons[skill]) {
+      promptMatchReasons[skill] = promptPolicyRecallReasons[skill];
+      continue;
+    }
     const r = report.perSkillResults[skill];
     if (r?.reason) {
       promptMatchReasons[skill] = r.reason;
     }
   }
+  if (promptPlaybookBanner) {
+    parts.unshift(promptPlaybookBanner);
+  }
   return formatOutput(
     parts,
     matchedSkills,
diff --git a/hooks/verification-closure-capsule.mjs b/hooks/verification-closure-capsule.mjs
new file mode 100644
index 0000000..50fa07d
--- /dev/null
+++ b/hooks/verification-closure-capsule.mjs
@@ -0,0 +1,98 @@
+// hooks/src/verification-closure-capsule.mts
+import {
+  appendFileSync,
+  mkdirSync,
+  readFileSync
+} from "fs";
+import { join } from "path";
+import {
+  createLogger,
+  logCaughtError
+} from "./logger.mjs";
+import { traceDir } from "./routing-decision-trace.mjs";
+function verificationClosureCapsulePath(sessionId) {
+  return join(traceDir(sessionId), "verification-closure-capsules.jsonl");
+}
+function buildVerificationClosureCapsule(input) {
+  const outcomeKind = input.resolvedExposures.length === 0 ? null : input.observation.matchedSuggestedAction ? "directive-win" : "win";
+  return {
+    version: 1,
+    hook: "PostToolUse",
+    createdAt: input.createdAt ?? (/* @__PURE__ */ new Date()).toISOString(),
+    sessionId: input.sessionId,
+    verificationId: input.verificationId,
+    toolName: input.toolName,
+    observation: input.observation,
+    storyResolution: input.storyResolution,
+    gate: input.gate,
+    exposureDiagnosis: input.exposureDiagnosis,
+    resolution: {
+      attempted: input.gate.eligible,
+      outcomeKind,
+      resolvedCount: input.resolvedExposures.length,
+      resolvedExposureIds: input.resolvedExposures.map((e) => e.id),
+      candidateResolvedCount: input.resolvedExposures.filter(
+        (e) => e.attributionRole !== "context"
+      ).length,
+      contextResolvedCount: input.resolvedExposures.filter(
+        (e) => e.attributionRole === "context"
+      ).length
+    },
+    plan: {
+      activeStoryId: input.plan.activeStoryId,
+      satisfiedBoundaries: Array.from(input.plan.satisfiedBoundaries).sort(),
+      missingBoundaries: [...input.plan.missingBoundaries],
+      blockedReasons: [...input.plan.blockedReasons],
+      primaryNextAction: input.plan.primaryNextAction ?? null
+    }
+  };
+}
+function persistVerificationClosureCapsule(capsule, logger) {
+  const log = logger ?? createLogger();
+  const path = verificationClosureCapsulePath(capsule.sessionId);
+  try {
+    mkdirSync(traceDir(capsule.sessionId), { recursive: true });
+    appendFileSync(path, JSON.stringify(capsule) + "\n", "utf8");
+    log.summary("verification.closure_capsule_written", {
+      verificationId: capsule.verificationId,
+      sessionId: capsule.sessionId,
+      toolName: capsule.toolName,
+      boundary: capsule.observation.boundary,
+      path
+    });
+  } catch (error) {
+    logCaughtError(
+      log,
+      "verification.closure_capsule_write_failed",
+      error,
+      {
+        verificationId: capsule.verificationId,
+        sessionId: capsule.sessionId,
+        path
+      }
+    );
+  }
+  return path;
+}
+function readVerificationClosureCapsules(sessionId) {
+  try {
+    const raw = readFileSync(
+      verificationClosureCapsulePath(sessionId),
+      "utf8"
+    );
+    return raw.split("\n").filter((line) => line.trim() !== "").map((line) => JSON.parse(line));
+  } catch {
+    return [];
+  }
+}
+function readLatestVerificationClosureCapsule(sessionId) {
+  const all = readVerificationClosureCapsules(sessionId);
+  return all.length > 0 ? all[all.length - 1] : null;
+}
+export {
+  buildVerificationClosureCapsule,
+  persistVerificationClosureCapsule,
+  readLatestVerificationClosureCapsule,
+  readVerificationClosureCapsules,
+  verificationClosureCapsulePath
+};
diff --git a/hooks/verification-closure-diagnosis.mjs b/hooks/verification-closure-diagnosis.mjs
new file mode 100644
index 0000000..93f4009
--- /dev/null
+++ b/hooks/verification-closure-diagnosis.mjs
@@ -0,0 +1,153 @@
+// hooks/src/verification-closure-diagnosis.mts
+import { loadSessionExposures } from "./routing-policy-ledger.mjs";
+var LOCAL_DEV_HOSTS = /* @__PURE__ */ new Set([
+  "localhost",
+  "127.0.0.1",
+  "0.0.0.0",
+  "::1",
+  "[::1]"
+]);
+function envString(env, key) {
+  const value = env[key];
+  return typeof value === "string" && value.trim() !== "" ? value.trim() : null;
+}
+function inspectLocalVerificationUrl(rawUrl, env = process.env) {
+  const configuredOrigin = envString(env, "VERCEL_PLUGIN_LOCAL_DEV_ORIGIN");
+  try {
+    const url = new URL(rawUrl);
+    const observedHost = url.host.toLowerCase();
+    if (url.protocol !== "http:" && url.protocol !== "https:") {
+      return {
+        applicable: true,
+        parseable: true,
+        isLocal: false,
+        observedHost,
+        configuredOrigin,
+        matchSource: null
+      };
+    }
+    if (LOCAL_DEV_HOSTS.has(url.hostname.toLowerCase())) {
+      return {
+        applicable: true,
+        parseable: true,
+        isLocal: true,
+        observedHost,
+        configuredOrigin,
+        matchSource: "loopback"
+      };
+    }
+    if (configuredOrigin) {
+      try {
+        const configured = new URL(configuredOrigin);
+        if (configured.host.toLowerCase() === observedHost) {
+          return {
+            applicable: true,
+            parseable: true,
+            isLocal: true,
+            observedHost,
+            configuredOrigin,
+            matchSource: "configured-origin"
+          };
+        }
+      } catch {
+      }
+    }
+    return {
+      applicable: true,
+      parseable: true,
+      isLocal: false,
+      observedHost,
+      configuredOrigin,
+      matchSource: null
+    };
+  } catch {
+    return {
+      applicable: true,
+      parseable: false,
+      isLocal: null,
+      observedHost: null,
+      configuredOrigin,
+      matchSource: null
+    };
+  }
+}
+function evaluateResolutionGate(event, env = process.env) {
+  const passedChecks = [];
+  const blockingReasonCodes = [];
+  if (event.boundary === "unknown") {
+    blockingReasonCodes.push("unknown_boundary");
+  } else {
+    passedChecks.push("known_boundary");
+  }
+  if (event.signalStrength !== "strong") {
+    blockingReasonCodes.push("soft_signal");
+  } else {
+    passedChecks.push("strong_signal");
+  }
+  let locality = {
+    applicable: false,
+    parseable: true,
+    isLocal: null,
+    observedHost: null,
+    configuredOrigin: envString(env, "VERCEL_PLUGIN_LOCAL_DEV_ORIGIN"),
+    matchSource: null
+  };
+  if (event.toolName === "WebFetch") {
+    locality = inspectLocalVerificationUrl(event.command, env);
+    if (!locality.parseable) {
+      blockingReasonCodes.push("invalid_web_fetch_url");
+    } else if (locality.isLocal !== true) {
+      blockingReasonCodes.push("remote_web_fetch");
+    } else {
+      passedChecks.push("local_verification_url");
+    }
+  }
+  return {
+    eligible: blockingReasonCodes.length === 0,
+    passedChecks,
+    blockingReasonCodes,
+    locality
+  };
+}
+function diagnosePendingExposureMatch(params) {
+  const exposures = params.exposures ?? loadSessionExposures(params.sessionId);
+  const pending = exposures.filter(
+    (e) => e.sessionId === params.sessionId && e.outcome === "pending"
+  );
+  const pendingBoundary = pending.filter(
+    (e) => e.targetBoundary === params.boundary
+  );
+  const exact = pendingBoundary.filter(
+    (e) => e.storyId === params.storyId && e.route === params.route
+  );
+  const sameStoryDifferentRoute = pendingBoundary.filter(
+    (e) => e.storyId === params.storyId && e.route !== params.route
+  );
+  const sameRouteDifferentStory = pendingBoundary.filter(
+    (e) => e.route === params.route && e.storyId !== params.storyId
+  );
+  const unresolvedReasonCodes = [];
+  if (pendingBoundary.length === 0) {
+    unresolvedReasonCodes.push("no_pending_for_boundary");
+  } else if (exact.length === 0) {
+    if (params.storyId === null) unresolvedReasonCodes.push("missing_story_scope");
+    if (params.route === null) unresolvedReasonCodes.push("missing_route_scope");
+    if (sameStoryDifferentRoute.length > 0) unresolvedReasonCodes.push("route_mismatch");
+    if (sameRouteDifferentStory.length > 0) unresolvedReasonCodes.push("story_mismatch");
+    if (unresolvedReasonCodes.length === 0) unresolvedReasonCodes.push("no_exact_pending_match");
+  }
+  return {
+    pendingTotal: pending.length,
+    pendingBoundaryCount: pendingBoundary.length,
+    exactMatchCount: exact.length,
+    exactMatchExposureIds: exact.map((e) => e.id),
+    sameStoryDifferentRouteExposureIds: sameStoryDifferentRoute.map((e) => e.id),
+    sameRouteDifferentStoryExposureIds: sameRouteDifferentStory.map((e) => e.id),
+    unresolvedReasonCodes
+  };
+}
+export
{ + diagnosePendingExposureMatch, + evaluateResolutionGate, + inspectLocalVerificationUrl +}; diff --git a/hooks/verification-directive.mjs b/hooks/verification-directive.mjs new file mode 100644 index 0000000..8c2fd43 --- /dev/null +++ b/hooks/verification-directive.mjs @@ -0,0 +1,108 @@ +// hooks/src/verification-directive.mts +import { + computePlan, + formatVerificationBanner, + loadCachedPlanResult, + selectActiveStory +} from "./verification-plan.mjs"; +import { createLogger, logCaughtError } from "./logger.mjs"; +function buildVerificationDirective(plan) { + if (!plan?.hasStories || plan.stories.length === 0) return null; + const story = selectActiveStory(plan); + if (!story) return null; + return { + version: 1, + storyId: story.id, + storyKind: story.kind, + route: story.route, + missingBoundaries: [...plan.missingBoundaries], + satisfiedBoundaries: [...plan.satisfiedBoundaries], + primaryNextAction: plan.primaryNextAction, + blockedReasons: [...plan.blockedReasons] + }; +} +function buildVerificationEnv(directive) { + if (!directive?.primaryNextAction) { + return { + VERCEL_PLUGIN_VERIFICATION_STORY_ID: "", + VERCEL_PLUGIN_VERIFICATION_ROUTE: "", + VERCEL_PLUGIN_VERIFICATION_BOUNDARY: "", + VERCEL_PLUGIN_VERIFICATION_ACTION: "" + }; + } + return { + VERCEL_PLUGIN_VERIFICATION_STORY_ID: directive.storyId, + VERCEL_PLUGIN_VERIFICATION_ROUTE: directive.route ?? "", + VERCEL_PLUGIN_VERIFICATION_BOUNDARY: directive.primaryNextAction.targetBoundary, + VERCEL_PLUGIN_VERIFICATION_ACTION: directive.primaryNextAction.action + }; +} +function resolveVerificationRuntimeState(sessionId, options, logger) { + const log = logger ?? 
createLogger(); + if (!sessionId) { + log.debug("verification-directive.resolve-start", { + sessionId: null, + reason: "no-session" + }); + return { + plan: null, + directive: null, + banner: null, + env: buildVerificationEnv(null) + }; + } + log.debug("verification-directive.resolve-start", { sessionId }); + try { + let plan = loadCachedPlanResult(sessionId, log); + if (plan?.hasStories) { + log.debug("verification-directive.cache-hit", { + sessionId, + storyCount: plan.stories.length + }); + } else { + log.debug("verification-directive.cache-miss", { sessionId }); + plan = computePlan(sessionId, options, log); + } + if (!plan?.hasStories) { + log.debug("verification-directive.fresh-empty", { sessionId }); + return { + plan: null, + directive: null, + banner: null, + env: buildVerificationEnv(null) + }; + } + log.debug("verification-directive.fresh-computed", { + sessionId, + storyCount: plan.stories.length, + missingBoundaries: plan.missingBoundaries + }); + const directive = buildVerificationDirective(plan); + const env = buildVerificationEnv(directive); + const banner = formatVerificationBanner(plan); + log.summary("verification-directive.resolve-complete", { + sessionId, + storyId: directive?.storyId ?? null, + route: directive?.route ?? 
null, + hasDirective: directive !== null, + hasBanner: banner !== null, + envCleared: !directive?.primaryNextAction + }); + return { plan, directive, banner, env }; + } catch (error) { + logCaughtError(log, "verification-directive.resolve-failed", error, { + sessionId + }); + return { + plan: null, + directive: null, + banner: null, + env: buildVerificationEnv(null) + }; + } +} +export { + buildVerificationDirective, + buildVerificationEnv, + resolveVerificationRuntimeState +}; diff --git a/hooks/verification-ledger.mjs b/hooks/verification-ledger.mjs new file mode 100644 index 0000000..76a2525 --- /dev/null +++ b/hooks/verification-ledger.mjs @@ -0,0 +1,514 @@ +// hooks/src/verification-ledger.mts +import { + appendFileSync, + mkdirSync, + readFileSync, + rmSync, + writeFileSync +} from "fs"; +import { join } from "path"; +import { tmpdir } from "os"; +import { createHash } from "crypto"; +import { createLogger, logCaughtError } from "./logger.mjs"; +var ALL_BOUNDARIES = [ + "uiRender", + "clientRequest", + "serverHandler", + "environment" +]; +var SAFE_SESSION_ID_RE = /^[a-zA-Z0-9_-]+$/; +function resolveObservationStoryId(observation, stories) { + if (observation.storyId) return observation.storyId; + if (observation.route) { + const exactMatches = stories.filter((story) => story.route === observation.route); + if (exactMatches.length === 1) { + return exactMatches[0].id; + } + } + if (stories.length === 1) { + return stories[0].id; + } + return null; +} +function collectRecentRoutes(observations) { + const sorted = [...observations].sort( + (a, b) => Date.parse(b.timestamp) - Date.parse(a.timestamp) + ); + const seen = /* @__PURE__ */ new Set(); + const routes = []; + for (const observation of sorted) { + if (!observation.route) continue; + if (seen.has(observation.route)) continue; + seen.add(observation.route); + routes.push(observation.route); + } + return routes; +} +function deriveStoryStates(observations, stories, options) { + const opts = { + 
agentBrowserAvailable: true, + devServerLoopGuardHit: false, + lastAttemptedAction: null, + staleThresholdMs: 5 * 60 * 1e3, + ...options + }; + const states = {}; + for (const story of stories) { + states[story.id] = { + storyId: story.id, + storyKind: story.kind, + route: story.route, + observationIds: [], + satisfiedBoundaries: [], + missingBoundaries: [...ALL_BOUNDARIES], + recentRoutes: story.route ? [story.route] : [], + primaryNextAction: null, + blockedReasons: [], + lastObservedAt: null + }; + } + for (const obs of observations) { + const resolvedStoryId = resolveObservationStoryId(obs, stories); + if (!resolvedStoryId || !states[resolvedStoryId]) continue; + const state = states[resolvedStoryId]; + state.observationIds.push(obs.id); + if (obs.boundary && !state.satisfiedBoundaries.includes(obs.boundary)) { + state.satisfiedBoundaries.push(obs.boundary); + } + if (obs.route && !state.recentRoutes.includes(obs.route)) { + state.recentRoutes.push(obs.route); + } + if (!state.lastObservedAt || Date.parse(obs.timestamp) > Date.parse(state.lastObservedAt)) { + state.lastObservedAt = obs.timestamp; + } + } + for (const story of stories) { + const state = states[story.id]; + const satisfiedSet = new Set(state.satisfiedBoundaries); + state.missingBoundaries = ALL_BOUNDARIES.filter((b) => !satisfiedSet.has(b)); + const { primaryNextAction, blockedReasons } = computeNextAction( + state.missingBoundaries, + [story], + state.recentRoutes, + opts + ); + state.primaryNextAction = primaryNextAction; + state.blockedReasons = blockedReasons; + } + return states; +} +function selectActiveStoryId(stories, storyStates) { + if (stories.length === 0) return null; + const sorted = [...stories].sort((a, b) => { + const stateA = storyStates[a.id]; + const stateB = storyStates[b.id]; + const missingA = stateA ? stateA.missingBoundaries.length : 0; + const missingB = stateB ? 
stateB.missingBoundaries.length : 0; + if (missingA !== missingB) return missingB - missingA; + const updatedDiff = Date.parse(b.updatedAt) - Date.parse(a.updatedAt); + if (Number.isFinite(updatedDiff) && updatedDiff !== 0) return updatedDiff; + const createdDiff = Date.parse(b.createdAt) - Date.parse(a.createdAt); + if (Number.isFinite(createdDiff) && createdDiff !== 0) return createdDiff; + return a.id.localeCompare(b.id); + }); + return sorted[0].id; +} +function derivePlan(observations, stories, options) { + const observationIds = /* @__PURE__ */ new Set(); + const deduped = []; + for (const obs of observations) { + if (!observationIds.has(obs.id)) { + observationIds.add(obs.id); + deduped.push(obs); + } + } + const storyStates = deriveStoryStates(deduped, stories, options); + const activeStoryId = selectActiveStoryId(stories, storyStates); + const activeState = activeStoryId ? storyStates[activeStoryId] : null; + const satisfiedBoundaries = new Set( + activeState ? activeState.satisfiedBoundaries : [] + ); + const missingBoundaries = activeState ? activeState.missingBoundaries : stories.length > 0 ? [...ALL_BOUNDARIES] : []; + const recentRoutes = activeState ? activeState.recentRoutes : []; + const primaryNextAction = activeState ? activeState.primaryNextAction : null; + const blockedReasons = activeState ? activeState.blockedReasons : []; + return { + stories: [...stories], + observations: deduped, + observationIds, + storyStates, + activeStoryId, + satisfiedBoundaries, + missingBoundaries, + recentRoutes, + primaryNextAction, + blockedReasons + }; +} +function computeNextAction(missingBoundaries, stories, recentRoutes, opts) { + const blockedReasons = []; + if (stories.length === 0) { + return { primaryNextAction: null, blockedReasons }; + } + if (missingBoundaries.length === 0) { + return { primaryNextAction: null, blockedReasons }; + } + const route = recentRoutes[recentRoutes.length - 1] ?? null; + const routeSuffix = route ? 
` ${route}` : ""; + const ACTION_MAP = { + clientRequest: () => ({ + action: `curl http://localhost:3000${route ?? "/"}`, + targetBoundary: "clientRequest", + reason: "No HTTP request observation yet \u2014 verify the endpoint responds" + }), + serverHandler: () => ({ + action: `tail server logs${routeSuffix}`, + targetBoundary: "serverHandler", + reason: "No server-side observation yet \u2014 check logs for errors" + }), + uiRender: () => { + if (!opts.agentBrowserAvailable) { + blockedReasons.push("agent-browser unavailable \u2014 cannot emit browser-only action"); + return null; + } + if (opts.devServerLoopGuardHit) { + blockedReasons.push("dev-server loop guard hit \u2014 skipping browser verification"); + return null; + } + return { + action: `open${routeSuffix || " /"} in agent-browser`, + targetBoundary: "uiRender", + reason: "No UI render observation yet \u2014 visually verify the page" + }; + }, + environment: () => ({ + action: "inspect env for required vars", + targetBoundary: "environment", + reason: "No environment observation yet \u2014 check env vars are set" + }) + }; + const PRIORITY_ORDER = [ + "clientRequest", + "serverHandler", + "uiRender", + "environment" + ]; + for (const boundary of PRIORITY_ORDER) { + if (!missingBoundaries.includes(boundary)) continue; + const action = ACTION_MAP[boundary](); + if (action) { + if (opts.lastAttemptedAction && action.action === opts.lastAttemptedAction) { + blockedReasons.push( + `Suppressed repeat of last attempted action: ${opts.lastAttemptedAction}` + ); + continue; + } + return { primaryNextAction: action, blockedReasons }; + } + } + return { primaryNextAction: null, blockedReasons }; +} +function storyId(kind, route) { + const input = `${kind}:${route ?? "*"}`; + return createHash("sha256").update(input).digest("hex").slice(0, 12); +} +function upsertStory(stories, kind, route, promptExcerpt, requestedSkills, now) { + const id = storyId(kind, route); + const timestamp = now ?? 
(/* @__PURE__ */ new Date()).toISOString(); + const existing = stories.find((s) => s.id === id); + if (existing) { + const merged = { + ...existing, + updatedAt: timestamp, + promptExcerpt: promptExcerpt || existing.promptExcerpt, + requestedSkills: Array.from( + /* @__PURE__ */ new Set([...existing.requestedSkills, ...requestedSkills]) + ) + }; + return stories.map((s) => s.id === id ? merged : s); + } + const newStory = { + id, + kind, + route, + promptExcerpt, + createdAt: timestamp, + updatedAt: timestamp, + requestedSkills + }; + return [...stories, newStory]; +} +function appendObservation(observations, observation) { + if (observations.some((o) => o.id === observation.id)) { + return observations; + } + return [...observations, observation]; +} +function normalizeSerializedPlanState(state) { + if (state.version === 2) return state; + const v1 = state; + const sorted = [...v1.stories].sort((a, b) => { + const updatedDiff = Date.parse(b.updatedAt) - Date.parse(a.updatedAt); + if (Number.isFinite(updatedDiff) && updatedDiff !== 0) return updatedDiff; + const createdDiff = Date.parse(b.createdAt) - Date.parse(a.createdAt); + if (Number.isFinite(createdDiff) && createdDiff !== 0) return createdDiff; + return a.id.localeCompare(b.id); + }); + const primaryStory = sorted[0] ?? null; + const activeStoryId = primaryStory?.id ?? 
null; + const storyStates = []; + if (primaryStory) { + storyStates.push({ + storyId: primaryStory.id, + storyKind: primaryStory.kind, + route: primaryStory.route, + observationIds: [...v1.observationIds], + satisfiedBoundaries: v1.satisfiedBoundaries, + missingBoundaries: v1.missingBoundaries, + recentRoutes: v1.recentRoutes, + primaryNextAction: v1.primaryNextAction, + blockedReasons: v1.blockedReasons, + lastObservedAt: null + }); + } + for (const story of v1.stories) { + if (story.id === activeStoryId) continue; + storyStates.push({ + storyId: story.id, + storyKind: story.kind, + route: story.route, + observationIds: [], + satisfiedBoundaries: [], + missingBoundaries: ALL_BOUNDARIES, + recentRoutes: story.route ? [story.route] : [], + primaryNextAction: null, + blockedReasons: [], + lastObservedAt: null + }); + } + return { + version: 2, + stories: v1.stories, + activeStoryId, + storyStates, + observationIds: v1.observationIds, + satisfiedBoundaries: v1.satisfiedBoundaries, + missingBoundaries: v1.missingBoundaries, + recentRoutes: v1.recentRoutes, + primaryNextAction: v1.primaryNextAction, + blockedReasons: v1.blockedReasons + }; +} +function serializePlanState(plan) { + const storyStates = []; + for (const story of plan.stories) { + const ss = plan.storyStates[story.id]; + if (ss) { + storyStates.push({ + storyId: ss.storyId, + storyKind: ss.storyKind, + route: ss.route, + observationIds: [...ss.observationIds].sort(), + satisfiedBoundaries: [...ss.satisfiedBoundaries].sort(), + missingBoundaries: [...ss.missingBoundaries].sort(), + recentRoutes: ss.recentRoutes, + primaryNextAction: ss.primaryNextAction, + blockedReasons: ss.blockedReasons, + lastObservedAt: ss.lastObservedAt + }); + } + } + const state = { + version: 2, + stories: plan.stories, + activeStoryId: plan.activeStoryId, + storyStates, + observationIds: Array.from(plan.observationIds).sort(), + satisfiedBoundaries: Array.from(plan.satisfiedBoundaries).sort(), + missingBoundaries: 
[...plan.missingBoundaries].sort(), + recentRoutes: plan.recentRoutes, + primaryNextAction: plan.primaryNextAction, + blockedReasons: plan.blockedReasons + }; + return JSON.stringify(state, null, 2); +} +function sessionIdSegment(sessionId) { + if (SAFE_SESSION_ID_RE.test(sessionId)) return sessionId; + return createHash("sha256").update(sessionId).digest("hex"); +} +function ledgerDir(sessionId) { + return join(tmpdir(), `vercel-plugin-${sessionIdSegment(sessionId)}-ledger`); +} +function ledgerPath(sessionId) { + return join(ledgerDir(sessionId), "observations.jsonl"); +} +function storiesPath(sessionId) { + return join(ledgerDir(sessionId), "stories.json"); +} +function statePath(sessionId) { + return join(ledgerDir(sessionId), "state.json"); +} +function persistObservation(sessionId, observation, logger) { + const log = logger ?? createLogger(); + const dir = ledgerDir(sessionId); + try { + mkdirSync(dir, { recursive: true }); + const line = JSON.stringify(observation) + "\n"; + appendFileSync(ledgerPath(sessionId), line, "utf-8"); + log.summary("verification-ledger.observation_persisted", { + observationId: observation.id, + boundary: observation.boundary, + source: observation.source + }); + } catch (error) { + logCaughtError(log, "verification-ledger.persist_observation_failed", error, { + sessionId, + observationId: observation.id + }); + } +} +function persistStories(sessionId, stories, logger) { + const log = logger ?? createLogger(); + const dir = ledgerDir(sessionId); + try { + mkdirSync(dir, { recursive: true }); + writeFileSync(storiesPath(sessionId), JSON.stringify(stories, null, 2), "utf-8"); + log.summary("verification-ledger.stories_persisted", { + storyCount: stories.length + }); + } catch (error) { + logCaughtError(log, "verification-ledger.persist_stories_failed", error, { + sessionId + }); + } +} +function persistPlanState(sessionId, plan, logger) { + const log = logger ?? 
createLogger(); + const dir = ledgerDir(sessionId); + try { + mkdirSync(dir, { recursive: true }); + writeFileSync(statePath(sessionId), serializePlanState(plan), "utf-8"); + log.summary("verification-ledger.state_persisted", { + observationCount: plan.observations.length, + storyCount: plan.stories.length, + missingBoundaries: plan.missingBoundaries + }); + } catch (error) { + logCaughtError(log, "verification-ledger.persist_state_failed", error, { + sessionId + }); + } +} +function loadObservations(sessionId, logger) { + const log = logger ?? createLogger(); + try { + const content = readFileSync(ledgerPath(sessionId), "utf-8"); + const lines = content.split("\n").filter((l) => l.trim() !== ""); + return lines.map((line) => JSON.parse(line)); + } catch (error) { + if (typeof error === "object" && error !== null && "code" in error && error.code === "ENOENT") { + return []; + } + logCaughtError(log, "verification-ledger.load_observations_failed", error, { + sessionId + }); + return []; + } +} +function loadStories(sessionId, logger) { + const log = logger ?? createLogger(); + try { + const content = readFileSync(storiesPath(sessionId), "utf-8"); + return JSON.parse(content); + } catch (error) { + if (typeof error === "object" && error !== null && "code" in error && error.code === "ENOENT") { + return []; + } + logCaughtError(log, "verification-ledger.load_stories_failed", error, { + sessionId + }); + return []; + } +} +function loadPlanState(sessionId, logger) { + const log = logger ?? 
createLogger(); + try { + const content = readFileSync(statePath(sessionId), "utf-8"); + const raw = JSON.parse(content); + const normalized = normalizeSerializedPlanState(raw); + if (raw.version !== normalized.version) { + log.summary("verification-ledger.state_normalized", { + sessionId, + fromVersion: raw.version, + toVersion: normalized.version + }); + } + return normalized; + } catch (error) { + if (typeof error === "object" && error !== null && "code" in error && error.code === "ENOENT") { + return null; + } + logCaughtError(log, "verification-ledger.load_state_failed", error, { + sessionId + }); + return null; + } +} +function recordObservation(sessionId, observation, options, logger) { + const log = logger ?? createLogger(); + const existingObservations = loadObservations(sessionId, log); + const stories = loadStories(sessionId, log); + const observations = appendObservation(existingObservations, observation); + if (observations !== existingObservations) { + persistObservation(sessionId, observation, log); + } + const plan = derivePlan(observations, stories, options); + persistPlanState(sessionId, plan, log); + return plan; +} +function recordStory(sessionId, kind, route, promptExcerpt, requestedSkills, options, logger) { + const log = logger ?? createLogger(); + const observations = loadObservations(sessionId, log); + let stories = loadStories(sessionId, log); + stories = upsertStory(stories, kind, route, promptExcerpt, requestedSkills); + persistStories(sessionId, stories, log); + const plan = derivePlan(observations, stories, options); + persistPlanState(sessionId, plan, log); + return plan; +} +function removeLedgerArtifacts(sessionId, logger) { + const log = logger ?? 
createLogger(); + const dir = ledgerDir(sessionId); + try { + rmSync(dir, { recursive: true, force: true }); + log.summary("verification-ledger.artifacts_removed", { sessionId }); + } catch (error) { + logCaughtError(log, "verification-ledger.remove_artifacts_failed", error, { + sessionId + }); + } +} +export { + appendObservation, + collectRecentRoutes, + derivePlan, + deriveStoryStates, + ledgerPath, + loadObservations, + loadPlanState, + loadStories, + normalizeSerializedPlanState, + persistObservation, + persistPlanState, + persistStories, + recordObservation, + recordStory, + removeLedgerArtifacts, + resolveObservationStoryId, + selectActiveStoryId, + serializePlanState, + statePath, + storiesPath, + storyId, + upsertStory +}; diff --git a/hooks/verification-plan.mjs b/hooks/verification-plan.mjs new file mode 100644 index 0000000..3aac9a2 --- /dev/null +++ b/hooks/verification-plan.mjs @@ -0,0 +1,228 @@ +// hooks/src/verification-plan.mts +import { + derivePlan, + loadObservations, + loadStories, + loadPlanState +} from "./verification-ledger.mjs"; +import { createLogger } from "./logger.mjs"; +function selectPrimaryStory(stories) { + if (stories.length === 0) return null; + return [...stories].sort((a, b) => { + const updatedDiff = Date.parse(b.updatedAt) - Date.parse(a.updatedAt); + if (Number.isFinite(updatedDiff) && updatedDiff !== 0) return updatedDiff; + const createdDiff = Date.parse(b.createdAt) - Date.parse(a.createdAt); + if (Number.isFinite(createdDiff) && createdDiff !== 0) return createdDiff; + return a.id.localeCompare(b.id); + })[0]; +} +function selectActiveStory(result) { + if (result.activeStoryId) { + const activeStory = result.stories.find((story) => story.id === result.activeStoryId); + if (activeStory) return activeStory; + } + return selectPrimaryStory(result.stories); +} +function computePlan(sessionId, options, logger) { + const log = logger ?? 
createLogger(); + const observations = loadObservations(sessionId, log); + const stories = loadStories(sessionId, log); + const plan = derivePlan(observations, stories, options); + log.summary("verification-plan.computed", { + sessionId, + storyCount: stories.length, + observationCount: observations.length, + missingBoundaries: plan.missingBoundaries, + hasNextAction: plan.primaryNextAction !== null + }); + return planToResult(plan); +} +function planToResult(plan) { + const storyStates = plan.stories.map((s) => { + const ss = plan.storyStates[s.id]; + if (!ss) { + return { + storyId: s.id, + storyKind: s.kind, + route: s.route, + observationIds: [], + satisfiedBoundaries: [], + missingBoundaries: [], + recentRoutes: [], + primaryNextAction: null, + blockedReasons: [], + lastObservedAt: null + }; + } + return { + storyId: ss.storyId, + storyKind: ss.storyKind, + route: ss.route, + observationIds: ss.observationIds, + satisfiedBoundaries: [...ss.satisfiedBoundaries].sort(), + missingBoundaries: [...ss.missingBoundaries].sort(), + recentRoutes: ss.recentRoutes, + primaryNextAction: ss.primaryNextAction, + blockedReasons: ss.blockedReasons, + lastObservedAt: ss.lastObservedAt + }; + }); + return { + hasStories: plan.stories.length > 0, + activeStoryId: plan.activeStoryId, + stories: plan.stories.map((s) => ({ + id: s.id, + kind: s.kind, + route: s.route, + promptExcerpt: s.promptExcerpt, + createdAt: s.createdAt, + updatedAt: s.updatedAt + })), + storyStates, + observationCount: plan.observations.length, + satisfiedBoundaries: Array.from(plan.satisfiedBoundaries).sort(), + missingBoundaries: [...plan.missingBoundaries].sort(), + recentRoutes: plan.recentRoutes, + primaryNextAction: plan.primaryNextAction, + blockedReasons: plan.blockedReasons + }; +} +function loadCachedPlanResult(sessionId, logger) { + const log = logger ?? createLogger(); + const state = loadPlanState(sessionId, log); + if (!state) return null; + const storyStates = (state.storyStates ?? 
[]).map((ss) => ({ + storyId: ss.storyId, + storyKind: ss.storyKind, + route: ss.route, + observationIds: ss.observationIds, + satisfiedBoundaries: [...ss.satisfiedBoundaries].sort(), + missingBoundaries: [...ss.missingBoundaries].sort(), + recentRoutes: ss.recentRoutes, + primaryNextAction: ss.primaryNextAction, + blockedReasons: ss.blockedReasons, + lastObservedAt: ss.lastObservedAt + })); + return { + hasStories: state.stories.length > 0, + activeStoryId: state.activeStoryId ?? null, + stories: state.stories.map((s) => ({ + id: s.id, + kind: s.kind, + route: s.route, + promptExcerpt: s.promptExcerpt, + createdAt: s.createdAt, + updatedAt: s.updatedAt + })), + storyStates, + observationCount: state.observationIds.length, + satisfiedBoundaries: [...state.satisfiedBoundaries].sort(), + missingBoundaries: [...state.missingBoundaries].sort(), + recentRoutes: state.recentRoutes, + primaryNextAction: state.primaryNextAction, + blockedReasons: state.blockedReasons + }; +} +function planToLoopSnapshot(plan) { + const result = planToResult(plan); + const last = plan.observations[plan.observations.length - 1] ?? null; + if (!last) { + return { + ...result, + lastObservation: null + }; + } + const meta = last.meta ?? {}; + return { + ...result, + lastObservation: { + id: last.id, + boundary: last.boundary, + route: last.route, + matchedSuggestedAction: typeof meta.matchedSuggestedAction === "boolean" ? meta.matchedSuggestedAction : null, + suggestedBoundary: typeof meta.suggestedBoundary === "string" ? meta.suggestedBoundary : null, + suggestedAction: typeof meta.suggestedAction === "string" ? meta.suggestedAction : null + } + }; +} +function formatVerificationBanner(result) { + if (!result.hasStories) return null; + if (!result.primaryNextAction && result.missingBoundaries.length === 0) return null; + const lines = [""]; + lines.push("**[Verification Plan]**"); + const story = selectActiveStory(result); + if (story) { + const routePart = story.route ? 
` (${story.route})` : ""; + lines.push(`Story: ${story.kind}${routePart} \u2014 "${story.promptExcerpt}"`); + } + const satisfied = result.satisfiedBoundaries; + const missing = result.missingBoundaries; + if (satisfied.length > 0 || missing.length > 0) { + lines.push(`Evidence: ${satisfied.length}/4 boundaries satisfied [${satisfied.join(", ") || "none"}]`); + if (missing.length > 0) { + lines.push(`Missing: ${missing.join(", ")}`); + } + } + if (result.primaryNextAction) { + lines.push(`Next action: \`${result.primaryNextAction.action}\``); + lines.push(`Reason: ${result.primaryNextAction.reason}`); + } else if (result.blockedReasons.length > 0) { + lines.push(`Blocked: ${result.blockedReasons[0]}`); + } else { + lines.push("All verification boundaries satisfied."); + } + lines.push(""); + return lines.join("\n"); +} +function formatPlanHuman(result) { + if (!result.hasStories) { + return "No verification stories active.\nNo observations recorded.\n"; + } + const lines = []; + const activeStory = selectActiveStory(result); + if (activeStory) { + const routePart = activeStory.route ? 
` (${activeStory.route})` : ""; + lines.push(`Active story: ${activeStory.kind}${routePart}: "${activeStory.promptExcerpt}"`); + } + const satisfied = result.satisfiedBoundaries; + const missing = result.missingBoundaries; + lines.push(`Evidence: ${satisfied.length}/4 boundaries satisfied [${satisfied.join(", ") || "none"}]`); + if (missing.length > 0) { + lines.push(`Missing: ${missing.join(", ")}`); + } + if (result.primaryNextAction) { + lines.push(`Next action: ${result.primaryNextAction.action}`); + lines.push(` Reason: ${result.primaryNextAction.reason}`); + } else if (result.blockedReasons.length > 0) { + lines.push("Next action: blocked"); + for (const reason of result.blockedReasons) { + lines.push(` - ${reason}`); + } + } else if (missing.length === 0) { + lines.push("All verification boundaries satisfied."); + } else { + lines.push("No next action available."); + } + const otherStories = result.stories.filter((s) => s.id !== (activeStory?.id ?? null)); + if (otherStories.length > 0) { + lines.push(""); + lines.push("Other stories:"); + for (const story of otherStories) { + const ss = result.storyStates?.find((st) => st.storyId === story.id); + const satisfiedCount = ss ? ss.satisfiedBoundaries.length : 0; + const routePart = story.route ? 
` (${story.route})` : ""; + lines.push(` ${story.kind}${routePart} \u2014 ${satisfiedCount}/4 boundaries satisfied`); + } + } + return lines.join("\n") + "\n"; +} +export { + computePlan, + formatPlanHuman, + formatVerificationBanner, + loadCachedPlanResult, + planToLoopSnapshot, + planToResult, + selectActiveStory, + selectPrimaryStory +}; diff --git a/hooks/verification-signal.mjs b/hooks/verification-signal.mjs new file mode 100644 index 0000000..e36dc40 --- /dev/null +++ b/hooks/verification-signal.mjs @@ -0,0 +1,208 @@ +// hooks/src/verification-signal.mts +var BASH_BOUNDARY_PATTERNS = [ + // uiRender: browser/screenshot/playwright/puppeteer commands → strong + // More specific patterns first to avoid early generic matches + { boundary: "uiRender", pattern: /\bnpx\s+playwright\b/i, label: "playwright-cli", evidenceSource: "browser", signalStrength: "strong" }, + { boundary: "uiRender", pattern: /\bopen\s+https?:/i, label: "open-url", evidenceSource: "browser", signalStrength: "strong" }, + { boundary: "uiRender", pattern: /\b(open|launch|browse|screenshot|puppeteer|playwright|chromium|firefox|webkit)\b/i, label: "browser-tool", evidenceSource: "browser", signalStrength: "strong" }, + // clientRequest: curl, wget, httpie → strong + { boundary: "clientRequest", pattern: /\b(curl|wget|http|httpie)\b/i, label: "http-client", evidenceSource: "bash", signalStrength: "strong" }, + { boundary: "clientRequest", pattern: /\bfetch\s*\(/i, label: "fetch-call", evidenceSource: "bash", signalStrength: "strong" }, + { boundary: "clientRequest", pattern: /\bnpx\s+undici\b/i, label: "undici-cli", evidenceSource: "bash", signalStrength: "strong" }, + // serverHandler: log tailing, server inspection → strong (Bash observation of server state) + { boundary: "serverHandler", pattern: /\b(tail|less|cat)\b.*\.(log|out|err)\b/i, label: "log-tail", evidenceSource: "bash", signalStrength: "strong" }, + { boundary: "serverHandler", pattern: /\b(tail\s+-f|journalctl\s+-f)\b/i, label: 
"log-follow", evidenceSource: "bash", signalStrength: "strong" }, + { boundary: "serverHandler", pattern: /\blog(s)?\s/i, label: "log-command", evidenceSource: "bash", signalStrength: "strong" }, + { boundary: "serverHandler", pattern: /\b(vercel\s+logs|vercel\s+inspect)\b/i, label: "vercel-logs", evidenceSource: "bash", signalStrength: "strong" }, + { boundary: "serverHandler", pattern: /\b(lsof|netstat|ss)\s.*:(3000|3001|4000|5173|8080)\b/i, label: "port-inspect", evidenceSource: "bash", signalStrength: "strong" }, + // environment: env reads, config inspection → strong (Bash env observation) + { boundary: "environment", pattern: /\b(printenv|env\b|echo\s+\$)/i, label: "env-read", evidenceSource: "bash", signalStrength: "strong" }, + { boundary: "environment", pattern: /\bvercel\s+env\b/i, label: "vercel-env", evidenceSource: "bash", signalStrength: "strong" }, + { boundary: "environment", pattern: /\bcat\b.*\.env\b/i, label: "dotenv-read", evidenceSource: "bash", signalStrength: "strong" }, + { boundary: "environment", pattern: /\bnode\s+-e\b.*process\.env\b/i, label: "node-env", evidenceSource: "bash", signalStrength: "strong" } +]; +var BROWSER_TOOLS = /* @__PURE__ */ new Set([ + "agent_browser", + "agent-browser", + "mcp__browser__navigate", + "mcp__browser__screenshot", + "mcp__browser__click", + "mcp__puppeteer__navigate", + "mcp__puppeteer__screenshot", + "mcp__playwright__navigate", + "mcp__playwright__screenshot" +]); +var HTTP_TOOLS = /* @__PURE__ */ new Set([ + "WebFetch", + "mcp__fetch__fetch", + "mcp__http__request", + "mcp__http__get", + "mcp__http__post" +]); +var URL_ROUTE_REGEX = /https?:\/\/[^/\s]+(\/([\w-]+(?:\/[\w-]+)*))/; +function inferRouteFromUrl(url) { + const match = URL_ROUTE_REGEX.exec(url); + return match?.[1] ?? 
null; +} +var FILE_ROUTE_REGEX = /\b(?:app|pages|src\/pages|src\/app)\/([\w[\].-]+(?:\/[\w[\].-]+)*)/; +function inferRouteFromFilePath(filePath) { + const match = FILE_ROUTE_REGEX.exec(filePath); + if (!match) return null; + const route = "/" + match[1].replace(/\/page\.\w+$/, "").replace(/\/route\.\w+$/, "").replace(/\/layout\.\w+$/, "").replace(/\/loading\.\w+$/, "").replace(/\/error\.\w+$/, "").replace(/\[([^\]]+)\]/g, ":$1"); + return route === "/" ? "/" : route.replace(/\/$/, ""); +} +function classifyVerificationSignal(input) { + const { toolName, toolInput } = input; + if (toolName === "Bash") { + const command = String(toolInput.command || ""); + if (!command) return null; + for (const bp of BASH_BOUNDARY_PATTERNS) { + if (bp.pattern.test(command)) { + const inferredRoute = inferRouteFromUrl(command); + return { + boundary: bp.boundary, + matchedPattern: bp.label, + inferredRoute, + signalStrength: bp.signalStrength, + evidenceSource: bp.evidenceSource, + summary: command.slice(0, 200), + toolName: "Bash" + }; + } + } + return null; + } + if (BROWSER_TOOLS.has(toolName)) { + const url = String(toolInput.url || toolInput.uri || ""); + return { + boundary: "uiRender", + matchedPattern: "browser-tool", + inferredRoute: url ? inferRouteFromUrl(url) : null, + signalStrength: "strong", + evidenceSource: "browser", + summary: url ? url.slice(0, 200) : toolName, + toolName + }; + } + if (HTTP_TOOLS.has(toolName)) { + const url = String(toolInput.url || toolInput.uri || ""); + if (!url && toolName !== "WebFetch") { + return { + boundary: "clientRequest", + matchedPattern: "http-tool", + inferredRoute: null, + signalStrength: "strong", + evidenceSource: "http", + summary: toolName, + toolName + }; + } + if (!url) return null; + return { + boundary: "clientRequest", + matchedPattern: toolName === "WebFetch" ? 
"web-fetch" : "http-tool", + inferredRoute: inferRouteFromUrl(url), + signalStrength: "strong", + evidenceSource: "http", + summary: url.slice(0, 200), + toolName + }; + } + if (toolName === "Read") { + const filePath = String(toolInput.file_path || ""); + if (!filePath) return null; + if (/\.env(\.\w+)?$/.test(filePath)) { + return { + boundary: "environment", + matchedPattern: "env-file-read", + inferredRoute: null, + signalStrength: "soft", + evidenceSource: "env-read", + summary: filePath, + toolName: "Read" + }; + } + if (/vercel\.json$/.test(filePath) || /\.vercel\/project\.json$/.test(filePath)) { + return { + boundary: "environment", + matchedPattern: "vercel-config-read", + inferredRoute: null, + signalStrength: "soft", + evidenceSource: "env-read", + summary: filePath, + toolName: "Read" + }; + } + if (/\.(log|out|err)$/.test(filePath) || /vercel-logs/.test(filePath) || /\.next\/.*server.*\.log/.test(filePath)) { + return { + boundary: "serverHandler", + matchedPattern: "log-file-read", + inferredRoute: inferRouteFromFilePath(filePath), + signalStrength: "soft", + evidenceSource: "log-read", + summary: filePath, + toolName: "Read" + }; + } + return null; + } + if (toolName === "Grep") { + const path = String(toolInput.path || ""); + if (/\.(log|out|err)$/.test(path) || /logs?\//.test(path)) { + return { + boundary: "serverHandler", + matchedPattern: "log-grep", + inferredRoute: null, + signalStrength: "soft", + evidenceSource: "log-read", + summary: `grep ${toolInput.pattern || ""} in ${path}`.slice(0, 200), + toolName: "Grep" + }; + } + if (/\.env/.test(path)) { + return { + boundary: "environment", + matchedPattern: "env-grep", + inferredRoute: null, + signalStrength: "soft", + evidenceSource: "env-read", + summary: `grep ${toolInput.pattern || ""} in ${path}`.slice(0, 200), + toolName: "Grep" + }; + } + return null; + } + if (toolName === "Glob") { + const pattern = String(toolInput.pattern || ""); + if (/\*\.(log|out|err)/.test(pattern) || 
/logs?\//.test(pattern)) { + return { + boundary: "serverHandler", + matchedPattern: "log-glob", + inferredRoute: null, + signalStrength: "soft", + evidenceSource: "log-read", + summary: `glob ${pattern}`.slice(0, 200), + toolName: "Glob" + }; + } + if (/\.env/.test(pattern)) { + return { + boundary: "environment", + matchedPattern: "env-glob", + inferredRoute: null, + signalStrength: "soft", + evidenceSource: "env-read", + summary: `glob ${pattern}`.slice(0, 200), + toolName: "Glob" + }; + } + return null; + } + if (toolName === "Edit" || toolName === "Write") { + return null; + } + return null; +} +export { + classifyVerificationSignal +}; diff --git a/package.json b/package.json index a2031f8..0811771 100644 --- a/package.json +++ b/package.json @@ -18,6 +18,7 @@ "typecheck": "tsc -p hooks/tsconfig.json --noEmit", "test": "bun run typecheck && bun test", "test:update-snapshots": "UPDATE_SNAPSHOTS=1 bun test tests/snapshot-runner.test.ts", + "learn": "bun run src/cli/index.ts learn", "playground:generate": "bun run .playground/generate-all.ts" }, "devDependencies": { diff --git a/scripts/build-manifest.ts b/scripts/build-manifest.ts index 74ad079..042043f 100644 --- a/scripts/build-manifest.ts +++ b/scripts/build-manifest.ts @@ -16,8 +16,13 @@ import { globToRegex, importPatternToRegex } from "../hooks/patterns.mjs"; import type { SkillEntry, ManifestSkill } from "../hooks/patterns.mjs"; import type { ChainToRule, ValidationRule } from "../hooks/skill-map-frontmatter.mjs"; import { loadValidatedSkillMap } from "../src/shared/skill-map-loader.ts"; +import { + EXCLUDED_SKILL_PATTERN, + filterExcludedSkillMap, + type SkillExclusion, +} from "../src/shared/skill-exclusion-policy.ts"; -export { buildManifest, writeManifestFile, synthesizeChainToFromValidate }; +export { buildManifest, writeManifestFile, synthesizeChainToFromValidate, EXCLUDED_SKILL_PATTERN }; const ROOT = resolve(import.meta.dir, ".."); const SKILLS_DIR = join(ROOT, "skills"); @@ -31,6 +36,7 @@ 
interface ManifestSkillWithBody extends ManifestSkill { interface Manifest { generatedAt: string; version: 2; + excludedSkills: SkillExclusion[]; skills: Record; } @@ -154,8 +160,22 @@ function buildManifest(skillsDir: string): { manifest: Manifest; warnings: strin allWarnings.push(...validation.warnings); } + // Filter out test-only / fake skills before building the runtime manifest + const allSkills = validation.normalizedSkillMap.skills as Record; + const { included: normalizedSkills, excluded: excludedSkillEntries } = + filterExcludedSkillMap(allSkills); + if (excludedSkillEntries.length > 0) { + console.error( + JSON.stringify({ + event: "skill_manifest_exclusions", + count: excludedSkillEntries.length, + skills: excludedSkillEntries.map((e) => e.slug), + reason: "test-only-pattern", + }), + ); + } + // Auto-synthesize chainTo from upgradeToSkill validate rules - const normalizedSkills = validation.normalizedSkillMap.skills as Record; const allSlugs = new Set(Object.keys(normalizedSkills)); const { count: synthCount, warnings: synthWarnings } = synthesizeChainToFromValidate(normalizedSkills, allSlugs); @@ -189,6 +209,7 @@ function buildManifest(skillsDir: string): { manifest: Manifest; warnings: strin const manifest: Manifest = { generatedAt: new Date().toISOString(), version: 2, + excludedSkills: excludedSkillEntries, skills, }; diff --git a/scripts/verify-playbook-transactionality.mjs b/scripts/verify-playbook-transactionality.mjs new file mode 100644 index 0000000..23e7f6c --- /dev/null +++ b/scripts/verify-playbook-transactionality.mjs @@ -0,0 +1,139 @@ +import assert from "node:assert/strict"; +import { + applyVerifiedPlaybookInsertion, + buildPlaybookExposureRoles, +} from "../hooks/pretooluse-skill-inject.mjs"; + +const banner = + "**[Verified Playbook]** verification → agent-browser-verify → investigation-mode"; + +// Case 1: applies new playbook steps +const applied = applyVerifiedPlaybookInsertion({ + rankedSkills: ["verification", "env-vars"], + 
matched: new Set(["verification", "env-vars"]), + injectedSkills: new Set(), + dedupOff: false, + forceSummarySkills: new Set(), + selection: { + anchorSkill: "verification", + insertedSkills: ["agent-browser-verify", "investigation-mode"], + banner, + }, +}); + +assert.equal(applied.applied, true); +assert.deepEqual(applied.rankedSkills, [ + "verification", + "agent-browser-verify", + "investigation-mode", + "env-vars", +]); +assert.deepEqual(applied.appliedOrderedSkills, [ + "verification", + "agent-browser-verify", + "investigation-mode", +]); +assert.deepEqual(applied.appliedInsertedSkills, [ + "agent-browser-verify", + "investigation-mode", +]); +assert.equal(applied.banner, banner); + +// Case 2: suppresses banner on noop (all inserted skills already present) +const noop = applyVerifiedPlaybookInsertion({ + rankedSkills: ["verification", "agent-browser-verify", "investigation-mode"], + matched: new Set([ + "verification", + "agent-browser-verify", + "investigation-mode", + ]), + injectedSkills: new Set(), + dedupOff: false, + forceSummarySkills: new Set(), + selection: { + anchorSkill: "verification", + insertedSkills: ["agent-browser-verify", "investigation-mode"], + banner, + }, +}); + +assert.equal(noop.applied, false); +assert.deepEqual(noop.appliedOrderedSkills, []); +assert.deepEqual(noop.appliedInsertedSkills, []); +assert.equal(noop.banner, null); + +// Case 3: builds candidate + context exposure roles +const roles = buildPlaybookExposureRoles([ + "verification", + "agent-browser-verify", + "investigation-mode", +]); + +assert.deepEqual(roles, [ + { + skill: "verification", + attributionRole: "candidate", + candidateSkill: "verification", + }, + { + skill: "agent-browser-verify", + attributionRole: "context", + candidateSkill: "verification", + }, + { + skill: "investigation-mode", + attributionRole: "context", + candidateSkill: "verification", + }, +]); + +// Case 4: anchor missing returns applied: false +const noAnchor = 
applyVerifiedPlaybookInsertion({ + rankedSkills: ["env-vars"], + matched: new Set(["env-vars"]), + injectedSkills: new Set(), + dedupOff: false, + forceSummarySkills: new Set(), + selection: { + anchorSkill: "verification", + insertedSkills: ["agent-browser-verify"], + banner, + }, +}); + +assert.equal(noAnchor.applied, false); +assert.deepEqual(noAnchor.appliedOrderedSkills, []); +assert.deepEqual(noAnchor.appliedInsertedSkills, []); +assert.equal(noAnchor.banner, null); + +// Case 5: null selection returns applied: false +const nullSelection = applyVerifiedPlaybookInsertion({ + rankedSkills: ["verification"], + matched: new Set(["verification"]), + injectedSkills: new Set(), + dedupOff: false, + forceSummarySkills: new Set(), + selection: null, +}); + +assert.equal(nullSelection.applied, false); +assert.deepEqual(nullSelection.appliedOrderedSkills, []); +assert.deepEqual(nullSelection.appliedInsertedSkills, []); +assert.equal(nullSelection.banner, null); + +console.log( + JSON.stringify( + { + ok: true, + cases: [ + "applies-new-playbook-steps", + "suppresses-banner-on-noop", + "builds-candidate-context-exposure-roles", + "anchor-missing-returns-noop", + "null-selection-returns-noop", + ], + }, + null, + 2, + ), +); diff --git a/skills/fake-banned-test-skill/SKILL.md b/skills/fake-banned-test-skill/SKILL.md new file mode 100644 index 0000000..6462ccb --- /dev/null +++ b/skills/fake-banned-test-skill/SKILL.md @@ -0,0 +1,10 @@ +--- +name: fake-banned-test-skill +description: test banned +--- + +# Test + +```bash +vercel logs drain ls +``` diff --git a/src/cli/explain.ts b/src/cli/explain.ts index 1d53491..7d28449 100644 --- a/src/cli/explain.ts +++ b/src/cli/explain.ts @@ -22,11 +22,19 @@ import { rankEntries, } from "../../hooks/patterns.mjs"; import { loadValidatedSkillMap } from "../shared/skill-map-loader.ts"; +import { filterExcludedSkillMap } from "../shared/skill-exclusion-policy.ts"; import { resolveVercelJsonSkills, isVercelJsonPath, VERCEL_JSON_SKILLS, } 
from "../../hooks/vercel-config.mjs"; +import { + applyPolicyBoosts, + type RoutingPolicyFile, +} from "../../hooks/src/routing-policy.mts"; +import { + loadProjectRoutingPolicy, +} from "../../hooks/src/routing-policy-ledger.mts"; const MAX_SKILLS = 3; const DEFAULT_INJECTION_BUDGET_BYTES = 12_000; @@ -45,6 +53,10 @@ export interface ExplainMatch { bodyBytes: number | null; /** Human-readable explanation of why the skill was dropped or how it was injected */ capReason: string; + /** Policy boost applied (0 when no policy data or below threshold) */ + policyBoost?: number; + /** Human-readable policy stats when policy data is present */ + policyReason?: string | null; } export interface ExplainCollision { @@ -78,6 +90,8 @@ export interface ExplainOptions { fileContent?: string; /** Explicit tool name (Read, Edit, Write, Bash) — overrides auto-detection */ toolName?: string; + /** Pre-loaded routing policy (loads from project tmpdir if not provided) */ + policyFile?: RoutingPolicyFile; } // --------------------------------------------------------------------------- @@ -138,7 +152,10 @@ export function explain(target: string, projectRoot: string, options?: ExplainOp throw new Error(`Skill map validation failed: ${validation.errors.join(", ")}`); } buildWarnings = buildDiagnostics; - skillMap = skills; + // Apply the same exclusion policy as the manifest build so excluded + // test-only skills never surface as live runtime candidates. + const { included } = filterExcludedSkillMap(skills); + skillMap = included; } const targetType = detectTargetType(target, opts.toolName); @@ -219,6 +236,28 @@ export function explain(target: string, projectRoot: string, options?: ExplainOp } } + // Policy boost: apply verified routing policy boosts + const policy = opts.policyFile ?? loadProjectRoutingPolicy(projectRoot); + const toolForPolicy = opts.toolName ?? (targetType === "bash" ? 
"Bash" : "Read"); + const policyScenario = { + hook: "PreToolUse" as const, + storyKind: null as string | null, + targetBoundary: null as null, + toolName: toolForPolicy as "Read" | "Edit" | "Write" | "Bash", + }; + const boostedEntries = applyPolicyBoosts(matchedEntries, policy, policyScenario); + + // Write back boosted effective priorities and track policy data + const policyData = new Map(); + for (const b of boostedEntries) { + if (b.policyBoost !== 0 || b.policyReason) { + policyData.set(b.skill, { boost: b.policyBoost, reason: b.policyReason }); + } + // Update matched entry with policy-adjusted effective priority + const orig = matchedEntries.find((e) => e.skill === b.skill); + if (orig) orig.effectivePriority = b.effectivePriority; + } + // Sort by effectivePriority DESC, then skill name ASC const rankedEntries = rankEntries(matchedEntries); @@ -228,7 +267,8 @@ export function explain(target: string, projectRoot: string, options?: ExplainOp // Build result with injection/cap/budget tracking const matches: ExplainMatch[] = rankedEntries.map((entry, idx) => { const plan = injectionPlan.get(entry.skill)!; - return { + const pd = policyData.get(entry.skill); + const match: ExplainMatch = { skill: entry.skill, priority: entry.priority, effectivePriority: entry.effectivePriority, @@ -240,6 +280,11 @@ export function explain(target: string, projectRoot: string, options?: ExplainOp bodyBytes: plan.bodyBytes, capReason: plan.capReason, }; + if (pd) { + match.policyBoost = pd.boost; + match.policyReason = pd.reason; + } + return match; }); // Detect collisions: skills at same priority competing for injection slots @@ -286,6 +331,10 @@ interface InjectionPlan { capReason: string; } +function explainSkillInvocationMessage(skill: string): string { + return `You must run the Skill(${skill}) tool.`; +} + function simulateInjection( rankedEntries: Array<{ skill: string }>, skillMap: Record, @@ -300,14 +349,14 @@ function simulateInjection( const skill = entry.skill; const 
skillPath = join(projectRoot, "skills", skill, "SKILL.md"); - // Read body size + // Read the on-disk body for informational reporting, but budget the same + // invocation string the runtime injector emits. let bodyBytes: number | null = null; let wrappedBytes = 0; try { const content = readFileSync(skillPath, "utf-8"); - const wrapped = `\n${content}\n`; - wrappedBytes = Buffer.byteLength(wrapped, "utf-8"); - bodyBytes = wrappedBytes; + bodyBytes = Buffer.byteLength(content, "utf-8"); + wrappedBytes = Buffer.byteLength(explainSkillInvocationMessage(skill), "utf-8"); } catch { // SKILL.md not found — would be skipped at runtime too result.set(skill, { mode: "droppedByCap", bodyBytes: null, capReason: "SKILL.md not found" }); @@ -325,8 +374,7 @@ function simulateInjection( // Try summary fallback const summary = skillMap[skill]?.summary; if (summary) { - const summaryWrapped = `\n${summary}\n`; - const summaryBytes = Buffer.byteLength(summaryWrapped, "utf-8"); + const summaryBytes = Buffer.byteLength(explainSkillInvocationMessage(skill), "utf-8"); if (usedBytes + summaryBytes <= budgetBytes) { result.set(skill, { mode: "summary", bodyBytes, capReason: `full body (${wrappedBytes}B) exceeds budget (${usedBytes}+${wrappedBytes} > ${budgetBytes}B); using summary (${summaryBytes}B)` }); loadedCount++; @@ -383,14 +431,27 @@ export function formatExplainResult(result: ExplainResult): string { else if (m.injectionMode === "droppedByBudget") status = "BUDGET"; else status = "CAPPED"; - const priStr = m.effectivePriority !== m.priority - ? `${m.effectivePriority} (base ${m.priority})` - : `${m.priority}`; + const policyDelta = m.policyBoost ?? 0; + const nonPolicyBase = m.effectivePriority - policyDelta; + let priStr: string; + if (policyDelta !== 0 && nonPolicyBase !== m.priority) { + // Both profiler/vercel.json and policy boosts active + priStr = `${m.effectivePriority} (base ${m.priority}, policy ${policyDelta > 0 ? 
"+" : ""}${policyDelta})`; + } else if (policyDelta !== 0) { + priStr = `${m.effectivePriority} (base ${m.priority}, policy ${policyDelta > 0 ? "+" : ""}${policyDelta})`; + } else if (m.effectivePriority !== m.priority) { + priStr = `${m.effectivePriority} (base ${m.priority})`; + } else { + priStr = `${m.priority}`; + } const bytesStr = m.bodyBytes != null ? ` (${m.bodyBytes} bytes)` : ""; lines.push(` [${status}] ${m.skill}${bytesStr}`); lines.push(` priority: ${priStr}`); lines.push(` pattern: ${m.matchedPattern} (${m.matchType})`); lines.push(` reason: ${m.capReason}`); + if (m.policyReason) { + lines.push(` policy: ${m.policyReason}`); + } } if (result.collisions.length > 0) { diff --git a/src/cli/index.ts b/src/cli/index.ts index 73dd212..025fbc4 100644 --- a/src/cli/index.ts +++ b/src/cli/index.ts @@ -7,10 +7,15 @@ * vercel-plugin explain --help */ -import { existsSync } from "node:fs"; +import { existsSync, readFileSync } from "node:fs"; import { resolve, join } from "node:path"; import { explain, formatExplainResult } from "./explain.ts"; import { doctor, formatDoctorResult } from "../commands/doctor.ts"; +import { runRoutingExplain } from "../commands/routing-explain.ts"; +import { runSessionExplain } from "../commands/session-explain.ts"; +import { runDecisionCat } from "../commands/decision-cat.ts"; +import { createEmptyRoutingPolicy, type RoutingPolicyFile } from "../../hooks/src/routing-policy.mts"; +import { runLearnCommand } from "./learn.ts"; function validateProjectRoot(projectRoot: string): void { const skillsDir = join(projectRoot, "skills"); @@ -28,6 +33,10 @@ function printUsage() { Commands: explain Show which skills match a file path or bash command + routing-explain Show the latest routing decision trace + session-explain Show manifest, routing, verification, and exposure state together + decision-cat Read and display a decision capsule artifact + learn Distill verified routing wins into learned rules doctor Run self-diagnosis checks on the 
plugin setup Options for explain: @@ -35,6 +44,32 @@ Options for explain: --project Project root (default: current plugin directory) --likely-skills s1,s2 Simulate session-start profiler boost (+5 priority) --budget Override injection byte budget (default: 12000) + --policy-file Load routing policy from a JSON file (default: project tmpdir) + --help, -h Show this help message + +Options for routing-explain: + --json Output machine-readable JSON + --session Session ID (reads traces from session trace dir) + --help, -h Show this help message + +Options for decision-cat: + --json Output machine-readable JSON + --help, -h Show this help message + +Options for learn: + --json Output machine-readable JSON + --write Write generated/learned-routing-rules.json + --project Project root (default: current plugin directory) + --session Scope to a single session ID + --min-support Minimum support threshold (default: 5) + --min-precision Minimum precision threshold (default: 0.8) + --min-lift Minimum lift threshold (default: 1.5) + --help, -h Show this help message + +Options for session-explain: + --json Output machine-readable JSON + --session Session ID + --project Project root (default: current plugin directory) --help, -h Show this help message Examples: @@ -53,6 +88,14 @@ const command = args[0]; if (command === "explain") { runExplain(args.slice(1)); +} else if (command === "routing-explain") { + runRoutingExplainCmd(args.slice(1)); +} else if (command === "session-explain") { + runSessionExplainCmd(args.slice(1)); +} else if (command === "decision-cat") { + runDecisionCatCmd(args.slice(1)); +} else if (command === "learn") { + runLearnCmd(args.slice(1)); } else if (command === "doctor") { runDoctor(args.slice(1)); } else { @@ -67,6 +110,7 @@ function runExplain(explainArgs: string[]) { let projectRoot = resolve(import.meta.dir, "../.."); let likelySkills: string | undefined; let budgetBytes: number | undefined; + let policyFilePath: string | undefined; for (let i = 0; i < 
explainArgs.length; i++) { const arg = explainArgs[i]; @@ -97,6 +141,13 @@ function runExplain(explainArgs: string[]) { console.error("Error: --budget must be a positive integer"); process.exit(1); } + } else if (arg === "--policy-file") { + i++; + if (i >= explainArgs.length) { + console.error("Error: --policy-file requires a file path"); + process.exit(1); + } + policyFilePath = resolve(explainArgs[i]); } else if (arg === "--help" || arg === "-h") { printUsage(); process.exit(0); @@ -117,8 +168,19 @@ function runExplain(explainArgs: string[]) { // Validate project path has skills/ validateProjectRoot(projectRoot); + // Load policy file if provided + let policyFile: RoutingPolicyFile | undefined; + if (policyFilePath) { + try { + policyFile = JSON.parse(readFileSync(policyFilePath, "utf-8")); + } catch { + console.error(`Error: could not read routing policy file at ${policyFilePath}`); + process.exit(2); + } + } + try { - const result = explain(target, projectRoot, { likelySkills, budgetBytes }); + const result = explain(target, projectRoot, { likelySkills, budgetBytes, policyFile }); if (jsonOutput) { console.log(JSON.stringify(result, null, 2)); @@ -175,3 +237,197 @@ function runDoctor(doctorArgs: string[]) { process.exit(2); } } + +function runRoutingExplainCmd(cmdArgs: string[]) { + let jsonOutput = false; + let sessionId: string | null = null; + + for (let i = 0; i < cmdArgs.length; i++) { + const arg = cmdArgs[i]; + if (arg === "--json") { + jsonOutput = true; + } else if (arg === "--session") { + i++; + if (i >= cmdArgs.length) { + console.error("Error: --session requires a session ID argument"); + process.exit(1); + } + sessionId = cmdArgs[i]; + } else if (arg === "--help" || arg === "-h") { + printUsage(); + process.exit(0); + } else { + console.error(`Error: unexpected argument "${arg}"`); + process.exit(1); + } + } + + try { + const output = runRoutingExplain(sessionId, jsonOutput); + console.log(output); + process.exit(0); + } catch (err: any) { + 
console.error(`Error: ${err.message}`); + process.exit(2); + } +} + +function runSessionExplainCmd(cmdArgs: string[]) { + let jsonOutput = false; + let sessionId: string | null = null; + let projectRoot = resolve(import.meta.dir, "../.."); + + for (let i = 0; i < cmdArgs.length; i++) { + const arg = cmdArgs[i]; + if (arg === "--json") { + jsonOutput = true; + } else if (arg === "--session") { + i++; + if (i >= cmdArgs.length) { + console.error("Error: --session requires a session ID argument"); + process.exit(1); + } + sessionId = cmdArgs[i]; + } else if (arg === "--project") { + i++; + if (i >= cmdArgs.length) { + console.error("Error: --project requires a path argument"); + process.exit(1); + } + projectRoot = resolve(cmdArgs[i]); + } else if (arg === "--help" || arg === "-h") { + printUsage(); + process.exit(0); + } else { + console.error(`Error: unexpected argument "${arg}"`); + process.exit(1); + } + } + + try { + const output = runSessionExplain(sessionId, projectRoot, jsonOutput); + console.log(output); + process.exit(0); + } catch (err: any) { + console.error(`Error: ${err.message}`); + process.exit(2); + } +} + +function runDecisionCatCmd(cmdArgs: string[]) { + let jsonOutput = false; + let artifactPath = ""; + + for (let i = 0; i < cmdArgs.length; i++) { + const arg = cmdArgs[i]; + if (arg === "--json") { + jsonOutput = true; + } else if (arg === "--help" || arg === "-h") { + printUsage(); + process.exit(0); + } else if (arg!.startsWith("-")) { + console.error(`Error: unexpected option "${arg}"`); + process.exit(1); + } else if (!artifactPath) { + artifactPath = resolve(arg!); + } else { + console.error(`Error: unexpected argument "${arg}"`); + process.exit(1); + } + } + + if (!artifactPath) { + console.error("Error: decision-cat requires an argument"); + process.exit(1); + } + + const { output, ok } = runDecisionCat(artifactPath, jsonOutput); + + if (ok) { + console.log(output); + process.exit(0); + } else { + // For JSON mode, output goes to stdout 
(structured failure); for text, stderr + if (jsonOutput) { + console.log(output); + } else { + console.error(output); + } + process.exit(2); + } +} + +function runLearnCmd(cmdArgs: string[]) { + let jsonOutput = false; + let writeOutput = false; + let projectRoot = resolve(import.meta.dir, "../.."); + let sessionId: string | undefined; + let minSupport: number | undefined; + let minPrecision: number | undefined; + let minLift: number | undefined; + + for (let i = 0; i < cmdArgs.length; i++) { + const arg = cmdArgs[i]; + if (arg === "--json") { + jsonOutput = true; + } else if (arg === "--write") { + writeOutput = true; + } else if (arg === "--project") { + i++; + if (i >= cmdArgs.length) { + console.error("Error: --project requires a path argument"); + process.exit(1); + } + projectRoot = resolve(cmdArgs[i]); + } else if (arg === "--session") { + i++; + if (i >= cmdArgs.length) { + console.error("Error: --session requires a session ID argument"); + process.exit(1); + } + sessionId = cmdArgs[i]; + } else if (arg === "--min-support") { + i++; + if (i >= cmdArgs.length) { + console.error("Error: --min-support requires a number"); + process.exit(1); + } + minSupport = Number(cmdArgs[i]); + } else if (arg === "--min-precision") { + i++; + if (i >= cmdArgs.length) { + console.error("Error: --min-precision requires a number"); + process.exit(1); + } + minPrecision = Number(cmdArgs[i]); + } else if (arg === "--min-lift") { + i++; + if (i >= cmdArgs.length) { + console.error("Error: --min-lift requires a number"); + process.exit(1); + } + minLift = Number(cmdArgs[i]); + } else if (arg === "--help" || arg === "-h") { + printUsage(); + process.exit(0); + } else { + console.error(`Error: unexpected argument "${arg}"`); + process.exit(1); + } + } + + runLearnCommand({ + project: projectRoot, + json: jsonOutput, + write: writeOutput, + session: sessionId, + minSupport, + minPrecision, + minLift, + }).then((code) => { + process.exit(code); + }).catch((err: any) => { + 
console.error(`Error: ${err.message}`); + process.exit(2); + }); +} diff --git a/src/cli/learn.ts b/src/cli/learn.ts new file mode 100644 index 0000000..37d5af0 --- /dev/null +++ b/src/cli/learn.ts @@ -0,0 +1,391 @@ +/** + * `vercel-plugin learn` — Distill verified routing wins into learned rules. + * + * Reads routing decision traces, exposure ledgers, and verification outcomes + * from session history, distills high-precision routing rules, replays them + * against historical traces to guard against regressions, and outputs or + * writes the result as a deterministic JSON artifact. + * + * Usage: + * vercel-plugin learn --project . --json + * vercel-plugin learn --project . --write + */ + +import { existsSync, writeFileSync, readdirSync } from "node:fs"; +import { resolve, join } from "node:path"; +import { tmpdir } from "node:os"; +import { readRoutingDecisionTrace } from "../../hooks/src/routing-decision-trace.mts"; +import { loadSessionExposures, loadProjectRoutingPolicy } from "../../hooks/src/routing-policy-ledger.mts"; +import { distillRulesFromTrace } from "../../hooks/src/rule-distillation.mts"; +import { distillCompanionRules } from "../../hooks/src/companion-distillation.mts"; +import { + companionRulebookPath, + saveCompanionRulebook, +} from "../../hooks/src/learned-companion-rulebook.mts"; +import { distillPlaybooks } from "../../hooks/src/playbook-distillation.mts"; +import { + createEmptyPlaybookRulebook, + playbookRulebookPath, + savePlaybookRulebook, +} from "../../hooks/src/learned-playbook-rulebook.mts"; +import type { LearnedRoutingRulesFile } from "../../hooks/src/rule-distillation.mts"; +import type { LearnedCompanionRulebook } from "../../hooks/src/learned-companion-rulebook.mts"; +import type { LearnedPlaybookRulebook } from "../../hooks/src/learned-playbook-rulebook.mts"; +import type { RoutingDecisionTrace } from "../../hooks/src/routing-decision-trace.mts"; +import type { SkillExposure } from "../../hooks/src/routing-policy-ledger.mts"; 
+ +// --------------------------------------------------------------------------- +// Types +// --------------------------------------------------------------------------- + +export interface LearnCommandOptions { + project?: string; + json?: boolean; + write?: boolean; + session?: string; + minSupport?: number; + minPrecision?: number; + minLift?: number; +} + +export interface LearnCommandOutput { + rules: LearnedRoutingRulesFile; + companions: LearnedCompanionRulebook; + companionPath: string; + playbooks: LearnedPlaybookRulebook; + playbookPath: string; +} + +// --------------------------------------------------------------------------- +// Session discovery +// --------------------------------------------------------------------------- + +/** + * Discover session IDs from tmpdir by scanning for trace directories and + * keeping only sessions whose exposure ledger belongs to the target project. + * Pattern: vercel-plugin--trace/ + */ +function discoverSessionIds(projectRoot: string): string[] { + const tmp = tmpdir(); + try { + const entries = readdirSync(tmp); + const ids: string[] = []; + for (const entry of entries) { + const match = entry.match(/^vercel-plugin-(.+)-trace$/); + if (!match || !match[1]) continue; + const sessionExposures = loadSessionExposures(match[1]); + if ( + sessionExposures.some((exposure) => exposure.projectRoot === projectRoot) + ) { + ids.push(match[1]); + } + } + return ids.sort(); + } catch { + return []; + } +} + +/** + * Load all traces, optionally scoped to a single session. 
+ */
+function loadTraces(
+  sessionId: string | null,
+  projectRoot: string,
+): RoutingDecisionTrace[] {
+  if (sessionId) {
+    return readRoutingDecisionTrace(sessionId);
+  }
+  // Aggregate across all discovered sessions
+  const sessionIds = discoverSessionIds(projectRoot);
+  const all: RoutingDecisionTrace[] = [];
+  for (const id of sessionIds) {
+    all.push(...readRoutingDecisionTrace(id));
+  }
+  return all;
+}
+
+/**
+ * Load all exposures, optionally scoped to a single session.
+ */
+function loadExposures(
+  sessionId: string | null,
+  projectRoot: string,
+): SkillExposure[] {
+  if (sessionId) {
+    return loadSessionExposures(sessionId);
+  }
+  const sessionIds = discoverSessionIds(projectRoot);
+  const all: SkillExposure[] = [];
+  for (const id of sessionIds) {
+    all.push(...loadSessionExposures(id));
+  }
+  return all;
+}
+
+// ---------------------------------------------------------------------------
+// Output path
+// ---------------------------------------------------------------------------
+
+export function learnedRulesPath(projectRoot: string): string {
+  return join(projectRoot, "generated", "learned-routing-rules.json");
+}
+
+// ---------------------------------------------------------------------------
+// Core command
+// ---------------------------------------------------------------------------
+
+export async function runLearnCommand(options: LearnCommandOptions): Promise<number> {
+  const projectRoot = resolve(options.project ?? ".");
+  const jsonOutput = options.json ?? false;
+  const writeOutput = options.write ?? false;
+  const sessionId = options.session ?? null;
+
+  // Validate project root
+  const skillsDir = join(projectRoot, "skills");
+  if (!existsSync(skillsDir)) {
+    const msg = `error: no skills/ directory found at ${projectRoot}`;
+    if (jsonOutput) {
+      console.log(JSON.stringify({ ok: false, error: msg }));
+    } else {
+      console.error(msg);
+    }
+    return 2;
+  }
+
+  // Load inputs
+  const traces = loadTraces(sessionId, projectRoot);
+  const exposures = loadExposures(sessionId, projectRoot);
+  const policy = loadProjectRoutingPolicy(projectRoot);
+
+  console.error(JSON.stringify({
+    event: "learn_inputs_loaded",
+    traceCount: traces.length,
+    exposureCount: exposures.length,
+    sessionScope: sessionId ?? "all",
+  }));
+
+  if (traces.length === 0) {
+    const result: LearnedRoutingRulesFile = {
+      version: 1,
+      generatedAt: new Date().toISOString(),
+      projectRoot,
+      rules: [],
+      replay: { baselineWins: 0, baselineDirectiveWins: 0, learnedWins: 0, learnedDirectiveWins: 0, deltaWins: 0, deltaDirectiveWins: 0, regressions: [] },
+      promotion: { accepted: true, errorCode: null, reason: "No traces to evaluate" },
+    };
+    const emptyCompanions = distillCompanionRules({
+      projectRoot,
+      traces: [],
+      exposures: [],
+    });
+    const emptyPlaybooks = createEmptyPlaybookRulebook(projectRoot);
+    const output: LearnCommandOutput = {
+      rules: result,
+      companions: emptyCompanions,
+      companionPath: companionRulebookPath(projectRoot),
+      playbooks: emptyPlaybooks,
+      playbookPath: playbookRulebookPath(projectRoot),
+    };
+    if (jsonOutput) {
+      console.log(JSON.stringify(output, null, 2));
+    } else {
+      console.error("No routing decision traces found. Run some sessions first.");
+      // Still emit human-readable summary for consistent output
+      console.log([
+        "Learned routing rules: 0",
+        " promoted: 0",
+        " candidate: 0",
+        " holdout-fail: 0",
+        "",
+        "Replay:",
+        " baseline wins: 0",
+        " baseline directive wins: 0",
+        " learned wins: 0",
+        " learned directive wins: 0",
+        " delta: 0",
+        " delta directive: 0",
+        " regressions: 0",
+        "",
+        "Companion rules: 0",
+        " promoted: 0",
+        "",
+        "Playbooks: 0",
+        " promoted: 0",
+      ].join("\n"));
+    }
+    if (writeOutput) {
+      const outPath = learnedRulesPath(projectRoot);
+      writeFileSync(outPath, JSON.stringify(result, null, 2) + "\n");
+      console.error(JSON.stringify({ event: "learn_written", path: outPath }));
+      saveCompanionRulebook(projectRoot, emptyCompanions);
+      console.error(JSON.stringify({ event: "learn_companion_written", path: companionRulebookPath(projectRoot) }));
+      savePlaybookRulebook(projectRoot, emptyPlaybooks);
+      console.error(JSON.stringify({ event: "learn_playbooks_written", path: playbookRulebookPath(projectRoot) }));
+    }
+    return 0;
+  }
+
+  // Distill single-skill rules
+  const result = distillRulesFromTrace({
+    projectRoot,
+    traces,
+    exposures,
+    policy,
+    minSupport: options.minSupport,
+    minPrecision: options.minPrecision,
+    minLift: options.minLift,
+  });
+
+  // Distill companion rules
+  const companionRulebook = distillCompanionRules({
+    projectRoot,
+    traces,
+    exposures,
+    minSupport: options.minSupport ?? 4,
+    minPrecision: options.minPrecision ?? 0.75,
+    minLift: options.minLift ?? 1.25,
+  });
+
+  // Distill playbook rules
+  const playbookRulebook = distillPlaybooks({
+    projectRoot,
+    exposures,
+    minSupport: options.minSupport ?? 3,
+    minPrecision: options.minPrecision ?? 0.75,
+    minLift: options.minLift ?? 1.25,
+    maxSkills: 3,
+  });
+
+  const promoted = result.rules.filter((r) => r.confidence === "promote").length;
+  const candidates = result.rules.filter((r) => r.confidence === "candidate").length;
+  const holdoutFail = result.rules.filter((r) => r.confidence === "holdout-fail").length;
+  const companionPromoted = companionRulebook.rules.filter((r) => r.confidence === "promote").length;
+  const companionHoldoutFail = companionRulebook.rules.filter((r) => r.confidence === "holdout-fail").length;
+  const playbookPromoted = playbookRulebook.rules.filter((r) => r.confidence === "promote").length;
+
+  console.error(JSON.stringify({
+    event: "learn_distill_complete",
+    ruleCount: result.rules.length,
+    promoted,
+    candidates,
+    holdoutFail,
+    replayDelta: result.replay.deltaWins,
+    regressions: result.replay.regressions.length,
+    companionRuleCount: companionRulebook.rules.length,
+    companionPromoted,
+    companionHoldoutFail,
+    playbookRuleCount: playbookRulebook.rules.length,
+    playbookPromoted,
+  }));
+
+  const output: LearnCommandOutput = {
+    rules: result,
+    companions: companionRulebook,
+    companionPath: companionRulebookPath(projectRoot),
+    playbooks: playbookRulebook,
+    playbookPath: playbookRulebookPath(projectRoot),
+  };
+
+  // Output
+  if (jsonOutput) {
+    console.log(JSON.stringify(output, null, 2));
+  } else {
+    // Human-readable summary
+    const lines: string[] = [
+      `Learned routing rules: ${result.rules.length}`,
+      ` promoted: ${promoted}`,
+      ` candidate: ${candidates}`,
+      ` holdout-fail: ${holdoutFail}`,
+      "",
+      `Replay:`,
+      ` baseline wins: ${result.replay.baselineWins}`,
+      ` baseline directive wins: ${result.replay.baselineDirectiveWins}`,
+      ` learned wins: ${result.replay.learnedWins}`,
+      ` learned directive wins: ${result.replay.learnedDirectiveWins}`,
+      ` delta: ${result.replay.deltaWins > 0 ? "+" : ""}${result.replay.deltaWins}`,
+      ` delta directive: ${result.replay.deltaDirectiveWins > 0 ? "+" : ""}${result.replay.deltaDirectiveWins}`,
+      ` regressions: ${result.replay.regressions.length}`,
+    ];
+
+    lines.push("");
+    lines.push(`Promotion: ${result.promotion.accepted ? "ACCEPTED" : "REJECTED"}`);
+    if (result.promotion.errorCode) {
+      lines.push(` error code: ${result.promotion.errorCode}`);
+    }
+    lines.push(` reason: ${result.promotion.reason}`);
+
+    if (result.replay.regressions.length > 0) {
+      lines.push("");
+      lines.push("Regression decision IDs:");
+      for (const id of result.replay.regressions) {
+        lines.push(` - ${id}`);
+      }
+    }
+
+    if (promoted > 0) {
+      lines.push("");
+      lines.push("Promoted rules:");
+      for (const rule of result.rules) {
+        if (rule.confidence !== "promote") continue;
+        lines.push(` ${rule.id} (${rule.kind}, precision=${rule.precision}, lift=${rule.lift}, support=${rule.support})`);
+      }
+    }
+
+    // Companion rules summary
+    lines.push("");
+    lines.push(`Companion rules: ${companionRulebook.rules.length}`);
+    lines.push(` promoted: ${companionPromoted}`);
+    lines.push(` holdout-fail: ${companionHoldoutFail}`);
+
+    if (companionPromoted > 0) {
+      lines.push("");
+      lines.push("Promoted companions:");
+      for (const rule of companionRulebook.rules) {
+        if (rule.confidence !== "promote") continue;
+        lines.push(` ${rule.candidateSkill} -> ${rule.companionSkill} (precision=${rule.precisionWithCompanion}, lift=${rule.liftVsCandidateAlone}, support=${rule.support})`);
+      }
+    }
+
+    // Playbook rules summary
+    lines.push("");
+    lines.push(`Playbooks: ${playbookRulebook.rules.length}`);
+    lines.push(` promoted: ${playbookPromoted}`);
+
+    if (playbookPromoted > 0) {
+      lines.push("");
+      lines.push("Promoted playbooks:");
+      for (const rule of playbookRulebook.rules) {
+        if (rule.confidence !== "promote") continue;
+        lines.push(` ${rule.orderedSkills.join(" → ")} (precision=${rule.precision}, lift=${rule.liftVsAnchorBaseline}, support=${rule.support})`);
+      }
+    }
+
+    console.log(lines.join("\n"));
+  }
+
+  // Write
+  if (writeOutput) {
+    const outPath = learnedRulesPath(projectRoot);
+    const payload = JSON.stringify(result, null, 2) + "\n";
+    writeFileSync(outPath, payload);
+    console.error(JSON.stringify({ event: "learn_written", path: outPath }));
+
+    saveCompanionRulebook(projectRoot, companionRulebook);
+    console.error(JSON.stringify({ event: "learn_companion_written", path: companionRulebookPath(projectRoot) }));
+
+    savePlaybookRulebook(projectRoot, playbookRulebook);
+    console.error(JSON.stringify({ event: "learn_playbooks_written", path: playbookRulebookPath(projectRoot) }));
+  }
+
+  // Non-zero exit if regressions detected
+  if (result.replay.regressions.length > 0) {
+    console.error(JSON.stringify({
+      event: "learn_regressions_detected",
+      count: result.replay.regressions.length,
+    }));
+    return 1;
+  }
+
+  return 0;
+}
diff --git a/src/commands/decision-cat.ts b/src/commands/decision-cat.ts
new file mode 100644
index 0000000..fd2dc98
--- /dev/null
+++ b/src/commands/decision-cat.ts
@@ -0,0 +1,67 @@
+import {
+  readDecisionCapsule,
+  type DecisionCapsuleV1,
+} from "../../hooks/src/routing-decision-capsule.mts";
+
+export interface DecisionCatResult {
+  ok: boolean;
+  capsule: DecisionCapsuleV1 | null;
+  error?: string;
+}
+
+export function runDecisionCat(
+  artifactPath: string,
+  json = false,
+): { output: string; ok: boolean } {
+  const capsule = readDecisionCapsule(artifactPath);
+
+  if (json) {
+    const result: DecisionCatResult = {
+      ok: capsule !== null,
+      capsule,
+      ...(capsule === null ? { error: `Cannot read capsule: ${artifactPath}` } : {}),
+    };
+    return { output: JSON.stringify(result, null, 2), ok: result.ok };
+  }
+
+  if (!capsule) {
+    return {
+      output: `Decision capsule not found: ${artifactPath}`,
+      ok: false,
+    };
+  }
+
+  return { output: formatDecisionCapsule(capsule), ok: true };
+}
+
+export function formatDecisionCapsule(capsule: DecisionCapsuleV1): string {
+  const lines: string[] = [
+    `Decision: ${capsule.decisionId}`,
+    `Hook: ${capsule.hook}`,
+    `Tool: ${capsule.input.toolName}`,
+    `Target: ${capsule.input.toolTarget}`,
+    `Story: ${capsule.activeStory.kind ?? "none"}${capsule.activeStory.route ? ` (${capsule.activeStory.route})` : ""}`,
+    `Injected: ${capsule.injectedSkills.join(", ") || "none"}`,
+    `Candidate: ${capsule.attribution?.candidateSkill ?? "none"}`,
+    `Rule: ${capsule.rulebookProvenance?.matchedRuleId ?? "none"}`,
+    ...(capsule.rulebookProvenance
+      ? [
+          `Rule Boost: ${capsule.rulebookProvenance.ruleBoost}`,
+          `Rule Reason: ${capsule.rulebookProvenance.ruleReason}`,
+          `Rulebook: ${capsule.rulebookProvenance.rulebookPath}`,
+        ]
+      : []),
+    `SHA256: ${capsule.sha256}`,
+  ];
+
+  if (capsule.issues.length > 0) {
+    lines.push("");
+    lines.push("Issues:");
+    for (const issue of capsule.issues) {
+      lines.push(` - [${issue.severity}] ${issue.code}: ${issue.message}`);
+      if (issue.action) lines.push(` action: ${issue.action}`);
+    }
+  }
+
+  return lines.join("\n");
+}
diff --git a/src/commands/doctor.ts b/src/commands/doctor.ts
index 66910e1..4515388 100644
--- a/src/commands/doctor.ts
+++ b/src/commands/doctor.ts
@@ -11,6 +11,7 @@
 import { existsSync, readFileSync, statSync, readdirSync } from "node:fs";
 import { join } from "node:path";
 import { loadValidatedSkillMap } from "../shared/skill-map-loader.ts";
+import { filterExcludedSkillMap } from "../shared/skill-exclusion-policy.ts";
 
 /** Maximum allowed timeout (seconds) for subagent hooks. */
 const SUBAGENT_HOOK_TIMEOUT_MAX = 5;
@@ -82,10 +83,13 @@ export function doctor(projectRoot: string): DoctorResult {
     }
   }
 
+  // Apply the same exclusion policy as the manifest build so test-only
+  // skills do not produce false manifest-parity errors.
+  const { included: filteredSkills } = filterExcludedSkillMap(loadedSkills);
   const liveSkills: Record<
     string,
     { priority: number; pathPatterns: string[]; bashPatterns: string[] }
-  > = loadedSkills;
+  > = filteredSkills;
 
   const liveSkillCount = Object.keys(liveSkills).length;
 
diff --git a/src/commands/routing-explain.ts b/src/commands/routing-explain.ts
new file mode 100644
index 0000000..b691490
--- /dev/null
+++ b/src/commands/routing-explain.ts
@@ -0,0 +1,131 @@
+/**
+ * `vercel-plugin routing-explain` — surfaces the latest routing decision
+ * from the flight recorder for humans and agents.
+ *
+ * Reads JSONL traces written by PreToolUse, UserPromptSubmit, and PostToolUse
+ * hooks, then formats the most recent decision as either structured JSON
+ * (for agent consumption) or human-readable text.
+ *
+ * JSON mode: { ok, decisionCount, latest }
+ * Text mode: decision id, hook, tool target, story context, injected skills,
+ * ranked candidates with effective priority and policy boost details.
+ */
+
+import { readRoutingDecisionTrace } from "../../hooks/src/routing-decision-trace.mts";
+import type { RoutingDecisionTrace } from "../../hooks/src/routing-decision-trace.mts";
+
+// ---------------------------------------------------------------------------
+// Result types (stable contract for agent consumers)
+// ---------------------------------------------------------------------------
+
+export interface RoutingExplainResult {
+  ok: boolean;
+  decisionCount: number;
+  latest: RoutingDecisionTrace | null;
+}
+
+// ---------------------------------------------------------------------------
+// Core logic
+// ---------------------------------------------------------------------------
+
+export function runRoutingExplain(
+  sessionId: string | null,
+  json = false,
+): string {
+  const traces = readRoutingDecisionTrace(sessionId);
+  const latest = traces[traces.length - 1] ?? null;
+
+  if (json) {
+    const result: RoutingExplainResult = {
+      ok: true,
+      decisionCount: traces.length,
+      latest,
+    };
+    return JSON.stringify(result, null, 2);
+  }
+
+  return formatRoutingExplainText(traces, latest);
+}
+
+// ---------------------------------------------------------------------------
+// Text formatting
+// ---------------------------------------------------------------------------
+
+function formatRoutingExplainText(
+  traces: RoutingDecisionTrace[],
+  latest: RoutingDecisionTrace | null,
+): string {
+  if (!latest) {
+    return "No routing decision traces found. Use `vercel-plugin session-explain --json` for cross-surface state.\n";
+  }
+
+  const lines: string[] = [
+    `Decision: ${latest.decisionId}`,
+    `Hook: ${latest.hook}`,
+    `Tool: ${latest.toolName}`,
+    `Target: ${latest.toolTarget}`,
+    `Story: ${latest.primaryStory.kind ?? "none"}${latest.primaryStory.storyRoute ? ` (${latest.primaryStory.storyRoute})` : ""}`,
+    `Injected: ${latest.injectedSkills.join(", ") || "none"}`,
+    `Total traces: ${traces.length}`,
+  ];
+
+  // Skipped reasons (undertrained, story-less, budget, cap)
+  if (latest.skippedReasons.length > 0) {
+    lines.push(`Skipped: ${latest.skippedReasons.join(", ")}`);
+  }
+
+  // Verification closure info
+  if (latest.verification) {
+    const v = latest.verification;
+    lines.push("");
+    lines.push("Verification:");
+    lines.push(` id: ${v.verificationId ?? "none"}`);
+    lines.push(` boundary: ${v.observedBoundary ?? "none"}`);
+    lines.push(` matched action: ${v.matchedSuggestedAction ?? "n/a"}`);
+  }
+
+  // Ranked candidates
+  if (latest.ranked.length > 0) {
+    lines.push("");
+    lines.push("Ranked:");
+    for (const r of latest.ranked) {
+      const parts = [
+        `effective=${r.effectivePriority}`,
+        `base=${r.basePriority}`,
+      ];
+      if (r.profilerBoost !== 0) parts.push(`profiler=+${r.profilerBoost}`);
+      if (r.policyBoost !== 0)
+        parts.push(
+          `policy=${r.policyBoost > 0 ? "+" : ""}${r.policyBoost}`,
+        );
+      if (r.droppedReason) parts.push(`dropped=${r.droppedReason}`);
+      if (r.summaryOnly) parts.push("summary-only");
+
+      lines.push(` - ${r.skill}: ${parts.join(", ")}`);
+      if (r.policyReason) {
+        lines.push(` reason: ${r.policyReason}`);
+      }
+    }
+  }
+
+  // Companion recall: detect ranked entries injected via verified-companion
+  const companions = latest.ranked.filter(
+    (r) => r.pattern?.type === "verified-companion",
+  );
+  if (companions.length > 0) {
+    lines.push("");
+    lines.push("Companions recalled:");
+    for (const c of companions) {
+      const suffix = c.summaryOnly ? " (summary-only)" : "";
+      lines.push(` - ${c.skill}${suffix}`);
+    }
+  }
+
+  // Policy scenario for diagnostic context
+  if (latest.policyScenario) {
+    lines.push("");
+    lines.push(`Policy scenario: ${latest.policyScenario}`);
+  }
+
+  return lines.join("\n") + "\n";
+}
diff --git a/src/commands/session-explain.ts b/src/commands/session-explain.ts
new file mode 100644
index 0000000..2ec9eb3
--- /dev/null
+++ b/src/commands/session-explain.ts
@@ -0,0 +1,594 @@
+/**
+ * `vercel-plugin session-explain` — unified control-plane snapshot.
+ *
+ * Merges manifest provenance, routing decision traces, verification plan
+ * state, and exposure outcomes into a single deterministic JSON/human output.
+ *
+ * JSON mode: stable additive-only contract for downstream agent consumers.
+ * Text mode: concise operator summary with actionable next steps.
+ */
+
+import { existsSync, readFileSync } from "node:fs";
+import { join } from "node:path";
+import { loadValidatedSkillMap } from "../shared/skill-map-loader.ts";
+import { filterExcludedSkillMap, type SkillExclusion } from "../shared/skill-exclusion-policy.ts";
+import { readRoutingDecisionTrace } from "../../hooks/src/routing-decision-trace.mts";
+import {
+  loadProjectRoutingPolicy,
+  loadSessionExposures,
+} from "../../hooks/src/routing-policy-ledger.mts";
+import {
+  computePlan,
+  loadCachedPlanResult,
+  selectActiveStory,
+  type VerificationPlanResult,
+} from "../../hooks/src/verification-plan.mts";
+import {
+  explainPolicyRecall,
+  parsePolicyScenario,
+  type PolicyRecallDiagnosis,
+  type RoutingDiagnosisHint,
+} from "../../hooks/src/routing-diagnosis.mts";
+import {
+  buildVerificationDirective,
+  buildVerificationEnv,
+  type VerificationDirective,
+} from "../../hooks/src/verification-directive.mts";
+
+// ---------------------------------------------------------------------------
+// Stable JSON contract (additive-only)
+// ---------------------------------------------------------------------------
+
+export interface SessionExplainDiagnosis {
+  severity: "info" | "warning" | "error";
+  code: string;
+  message: string;
+  hint?: string;
+}
+
+export interface SessionExplainDoctorRankedSkill {
+  skill: string;
+  basePriority: number;
+  effectivePriority: number;
+  policyBoost: number;
+  policyReason: string | null;
+  synthetic: boolean;
+  droppedReason: string | null;
+}
+
+export interface CompanionRecallDoctorEntry {
+  companionSkill: string;
+  candidateSkill: string | null;
+  patternType: string;
+  patternValue: string;
+  synthetic: boolean;
+  droppedReason: string | null;
+}
+
+export interface CompanionRecallDiagnosis {
+  detected: boolean;
+  entries: CompanionRecallDoctorEntry[];
+}
+
+export interface SessionExplainDoctorCause {
+  code: string;
+  stage: string;
+  skill: string;
+  synthetic: boolean;
+  scoreDelta: number;
+  message: string;
+  detail: Record<string, unknown>;
+}
+
+export interface SessionExplainDoctorEdge {
+  fromSkill: string;
+  toSkill: string;
+  relation: string;
+  code: string;
+  detail: Record<string, unknown>;
+}
+
+export interface SessionExplainDoctor {
+  latestDecisionId: string | null;
+  latestScenario: string | null;
+  latestRanked: SessionExplainDoctorRankedSkill[];
+  policyRecall: PolicyRecallDiagnosis | null;
+  companionRecall: CompanionRecallDiagnosis;
+  hints: RoutingDiagnosisHint[];
+}
+
+export interface SessionExplainResult {
+  ok: boolean;
+  sessionId: string | null;
+  manifest: {
+    generatedAt: string | null;
+    skillCount: number;
+    excludedSkills: SkillExclusion[];
+    parity: {
+      ok: boolean;
+      missingFromManifest: string[];
+      extraInManifest: string[];
+    };
+  };
+  routing: {
+    decisionCount: number;
+    latestDecisionId: string | null;
+    latestHook: string | null;
+    latestPolicyScenario: string | null;
+  };
+  verification: {
+    hasStories: boolean;
+    missingBoundaries: string[];
+    satisfiedBoundaries: string[];
+    primaryNextAction: VerificationPlanResult["primaryNextAction"];
+    directive: VerificationDirective | null;
+    env: Record<string, string>;
+  };
+  exposures: {
+    pending: number;
+    wins: number;
+    directiveWins: number;
+    staleMisses: number;
+    candidateWins: number;
+    contextWins: number;
+  };
+  diagnosis: SessionExplainDiagnosis[];
+  doctor: SessionExplainDoctor | null;
+}
+
+// ---------------------------------------------------------------------------
+// Core logic
+// ---------------------------------------------------------------------------
+
+function toRecord(value: unknown): Record<string, unknown> {
+  return typeof value === "object" && value !== null
+    ? (value as Record<string, unknown>)
+    : {};
+}
+
+function stringOrNull(value: unknown): string | null {
+  return typeof value === "string" && value.trim() !== "" ? value : null;
+}
+
+function numberOrZero(value: unknown): number {
+  return typeof value === "number" && Number.isFinite(value) ? value : 0;
+}
+
+function toRecordArray(value: unknown): Record<string, unknown>[] {
+  return Array.isArray(value)
+    ? value.map((item) => toRecord(item)).filter((item) => Object.keys(item).length > 0)
+    : [];
+}
+
+function readDecisionCauses(
+  trace: Record<string, unknown>,
+): SessionExplainDoctorCause[] {
+  return toRecordArray(trace.causes).map((cause) => ({
+    code: stringOrNull(cause.code) ?? "unknown",
+    stage: stringOrNull(cause.stage) ?? "unknown",
+    skill: stringOrNull(cause.skill) ?? "unknown",
+    synthetic: cause.synthetic === true,
+    scoreDelta: typeof cause.scoreDelta === "number" ? cause.scoreDelta : 0,
+    message: stringOrNull(cause.message) ?? "",
+    detail: toRecord(cause.detail),
+  }));
+}
+
+function readDecisionEdges(
+  trace: Record<string, unknown>,
+): SessionExplainDoctorEdge[] {
+  return toRecordArray(trace.edges).map((edge) => ({
+    fromSkill: stringOrNull(edge.fromSkill) ?? "unknown",
+    toSkill: stringOrNull(edge.toSkill) ?? "unknown",
+    relation: stringOrNull(edge.relation) ?? "unknown",
+    code: stringOrNull(edge.code) ?? "unknown",
+    detail: toRecord(edge.detail),
+  }));
+}
+
+function buildCompanionRecallDiagnosis(
+  trace: Record<string, unknown>,
+  rankedSource: unknown[],
+): CompanionRecallDiagnosis {
+  // Prefer explicit causes/edges from the causality system
+  const causes = readDecisionCauses(trace);
+  const edges = readDecisionEdges(trace);
+
+  const explicitEntries = causes
+    .filter((cause) => cause.code === "verified-companion")
+    .map((cause) => {
+      const edge = edges.find(
+        (item) =>
+          item.code === "verified-companion" &&
+          item.toSkill === cause.skill,
+      );
+      return {
+        companionSkill: cause.skill,
+        candidateSkill: edge
+          ? edge.fromSkill
+          : stringOrNull(cause.detail.candidateSkill),
+        patternType: cause.code,
+        patternValue: stringOrNull(cause.detail.scenario) ?? "scenario-companion-rulebook",
+        synthetic: cause.synthetic,
+        droppedReason: stringOrNull(cause.detail.droppedReason),
+      };
+    });
+
+  if (explicitEntries.length > 0) {
+    return { detected: true, entries: explicitEntries };
+  }
+
+  // Backward-compatible fallback for old traces without causes/edges
+  const fallbackEntries: CompanionRecallDoctorEntry[] = [];
+  for (const entry of rankedSource) {
+    const obj = toRecord(entry);
+    const pattern = toRecord(obj.pattern);
+    if (stringOrNull(pattern.type) !== "verified-companion") continue;
+    const skill = stringOrNull(obj.skill);
+    if (!skill) continue;
+
+    let candidateSkill: string | null = null;
+    const idx = rankedSource.indexOf(entry);
+    for (let i = idx - 1; i >= 0; i--) {
+      const prev = toRecord(rankedSource[i]);
+      const prevPattern = toRecord(prev.pattern);
+      if (stringOrNull(prevPattern.type) !== "verified-companion") {
+        candidateSkill = stringOrNull(prev.skill);
+        break;
+      }
+    }
+
+    fallbackEntries.push({
+      companionSkill: skill,
+      candidateSkill,
+      patternType: String(pattern.type),
+      patternValue: stringOrNull(pattern.value) ?? "scenario-companion-rulebook",
+      synthetic: obj.synthetic === true,
+      droppedReason: stringOrNull(obj.droppedReason),
+    });
+  }
+
+  return { detected: fallbackEntries.length > 0, entries: fallbackEntries };
+}
+
+function buildRoutingDoctor(
+  latestTrace: unknown,
+  plan: VerificationPlanResult,
+  projectRoot: string,
+): SessionExplainDoctor | null {
+  const trace = toRecord(latestTrace);
+  if (Object.keys(trace).length === 0) return null;
+
+  const rankedSource = Array.isArray(trace.ranked) ? trace.ranked : [];
+  const latestRanked = rankedSource
+    .map((entry) => {
+      const obj = toRecord(entry);
+      const skill = stringOrNull(obj.skill);
+      if (!skill) return null;
+      return {
+        skill,
+        basePriority: numberOrZero(obj.basePriority),
+        effectivePriority: numberOrZero(obj.effectivePriority),
+        policyBoost: numberOrZero(obj.policyBoost),
+        policyReason: stringOrNull(obj.policyReason),
+        synthetic: obj.synthetic === true,
+        droppedReason: stringOrNull(obj.droppedReason),
+      };
+    })
+    .filter(
+      (entry): entry is SessionExplainDoctorRankedSkill => entry !== null,
+    );
+
+  const latestScenario = stringOrNull(trace.policyScenario);
+  const parsedScenario = parsePolicyScenario(latestScenario);
+
+  const primaryStory = selectActiveStory(plan);
+  const primaryStoryRecord = toRecord(trace.primaryStory);
+  const routeScope =
+    stringOrNull(trace.observedRoute) ??
+    stringOrNull(primaryStoryRecord.storyRoute) ??
+    primaryStory?.route ??
+    null;
+
+  const scenario = parsedScenario
+    ? {
+        ...parsedScenario,
+        routeScope: parsedScenario.routeScope ?? routeScope,
+      }
+    : null;
+
+  const injectedSkills = Array.isArray(trace.injectedSkills)
+    ? trace.injectedSkills.map((skill) => String(skill))
+    : [];
+
+  const excludeSkills = new Set([
+    ...latestRanked.map((entry) => entry.skill),
+    ...injectedSkills,
+  ]);
+
+  const policy = loadProjectRoutingPolicy(projectRoot);
+  const policyRecall =
+    scenario &&
+    scenario.targetBoundary
+      ? explainPolicyRecall(policy, scenario, {
+          excludeSkills,
+          maxCandidates: 1,
+        })
+      : null;
+
+  // --- Companion recall extraction: prefer explicit causes/edges, fall back to ranked[] ---
+  const companionRecall = buildCompanionRecallDiagnosis(trace, rankedSource);
+
+  const hints: RoutingDiagnosisHint[] = [...(policyRecall?.hints ?? [])];
+
+  if (latestRanked.length === 0) {
+    hints.push({
+      severity: "warning",
+      code: "ROUTING_TRACE_MISSING_RANKED",
+      message:
+        "Latest routing trace has no ranked[] candidates",
+      hint: "Ensure PreToolUse/UserPromptSubmit persists ranked[] into the routing decision trace",
+    });
+  }
+
+  if (companionRecall.detected) {
+    for (const ce of companionRecall.entries) {
+      if (!ce.candidateSkill) {
+        hints.push({
+          severity: "warning",
+          code: "COMPANION_EDGE_MISSING",
+          message: `Companion-recalled skill ${ce.companionSkill} has no explicit candidateSkill in the trace`,
+          hint: "Write a verified-companion edge into the routing trace instead of inferring from ranked order",
+        });
+      }
+      if (!ce.synthetic) {
+        hints.push({
+          severity: "warning",
+          code: "COMPANION_RECALL_NOT_SYNTHETIC",
+          message: `Companion-recalled skill ${ce.companionSkill} is not marked synthetic in the routing trace`,
+          hint: "Companion-recalled skills must be synthetic to preserve causal attribution",
+        });
+      }
+    }
+  }
+
+  return {
+    latestDecisionId: stringOrNull(trace.decisionId),
+    latestScenario,
+    latestRanked,
+    policyRecall,
+    companionRecall,
+    hints,
+  };
+}
+
+export function runSessionExplain(
+  sessionId: string | null,
+  projectRoot: string,
+  json = false,
+): string {
+  const manifestPath = join(projectRoot, "generated", "skill-manifest.json");
+  const skillsDir = join(projectRoot, "skills");
+  const diagnosis: SessionExplainDiagnosis[] = [];
+
+  // --- Manifest ---
+  let generatedAt: string | null = null;
+  let manifestSkills: Record<string, unknown> = {};
+  let manifestExcludedSkills: SkillExclusion[] = [];
+
+  if (existsSync(manifestPath)) {
+    try {
+      const manifest = JSON.parse(readFileSync(manifestPath, "utf-8"));
+      generatedAt = manifest.generatedAt ?? null;
+      manifestSkills = manifest.skills ?? {};
+      manifestExcludedSkills = manifest.excludedSkills ?? [];
+    } catch (err: any) {
+      diagnosis.push({
+        severity: "error",
+        code: "MANIFEST_PARSE_FAILED",
+        message: `Failed to parse generated/skill-manifest.json: ${err.message}`,
+        hint: "Run `bun run build:manifest` to regenerate it",
+      });
+    }
+  } else {
+    diagnosis.push({
+      severity: "warning",
+      code: "MANIFEST_MISSING",
+      message: "No generated/skill-manifest.json found",
+      hint: "Run `bun run build:manifest`",
+    });
+  }
+
+  // --- Live scan with exclusion policy ---
+  let liveNames = new Set<string>();
+  let liveExcluded: SkillExclusion[] = [];
+
+  if (existsSync(skillsDir)) {
+    const live = loadValidatedSkillMap(skillsDir);
+    const filteredLive = filterExcludedSkillMap(live.skills);
+    liveNames = new Set(Object.keys(filteredLive.included));
+    liveExcluded = filteredLive.excluded;
+  }
+
+  // Use manifest exclusions if available, otherwise fall back to live scan
+  const excludedSkills = manifestExcludedSkills.length > 0
+    ? manifestExcludedSkills
+    : liveExcluded;
+
+  // Emit hard diagnosis when live exclusions exist but manifest reports none
+  if (liveExcluded.length > 0 && manifestExcludedSkills.length === 0) {
+    diagnosis.push({
+      severity: "error",
+      code: "MANIFEST_EXCLUSION_DRIFT",
+      message:
+        "Live exclusion policy found excluded skills, but generated/skill-manifest.json lists none.",
+      hint: "Run `bun run build:manifest` and commit the regenerated artifact.",
+    });
+  }
+
+  // Emit exclusion diagnosis for each excluded skill
+  for (const ex of excludedSkills) {
+    diagnosis.push({
+      severity: "info",
+      code: "SKILL_EXCLUDED_BY_POLICY",
+      message: `${ex.slug} is intentionally excluded from the runtime manifest`,
+      hint: "Rename the skill if it should ship at runtime",
+    });
+  }
+
+  // --- Manifest parity ---
+  const manifestNames = new Set(Object.keys(manifestSkills));
+  const missingFromManifest = [...liveNames].filter((s) => !manifestNames.has(s)).sort();
+  const extraInManifest = [...manifestNames].filter((s) => !liveNames.has(s)).sort();
+
+  // --- Routing traces ---
+  const traces = readRoutingDecisionTrace(sessionId);
+  const latest = traces[traces.length - 1] ?? null;
+
+  // --- Verification plan ---
+  const emptyPlan: VerificationPlanResult = {
+    hasStories: false,
+    activeStoryId: null,
+    stories: [],
+    storyStates: [],
+    observationCount: 0,
+    satisfiedBoundaries: [],
+    missingBoundaries: [],
+    recentRoutes: [],
+    primaryNextAction: null,
+    blockedReasons: [],
+  };
+
+  let plan: VerificationPlanResult;
+  if (sessionId) {
+    const cached = loadCachedPlanResult(sessionId);
+    plan = cached ?? computePlan(sessionId);
+  } else {
+    plan = emptyPlan;
+  }
+
+  const directive = buildVerificationDirective(plan);
+  const env = buildVerificationEnv(directive);
+
+  // --- Exposures ---
+  const exposures = sessionId ? loadSessionExposures(sessionId) : [];
+
+  // --- Routing doctor ---
+  const doctor = buildRoutingDoctor(latest, plan, projectRoot);
+
+  // --- Assemble result ---
+  const result: SessionExplainResult = {
+    ok: true,
+    sessionId,
+    manifest: {
+      generatedAt,
+      skillCount: Object.keys(manifestSkills).length,
+      excludedSkills,
+      parity: {
+        ok: missingFromManifest.length === 0 && extraInManifest.length === 0,
+        missingFromManifest,
+        extraInManifest,
+      },
+    },
+    routing: {
+      decisionCount: traces.length,
+      latestDecisionId: latest?.decisionId ?? null,
+      latestHook: latest?.hook ?? null,
+      latestPolicyScenario: latest?.policyScenario ?? null,
+    },
+    verification: {
+      hasStories: plan.hasStories,
+      missingBoundaries: [...plan.missingBoundaries],
+      satisfiedBoundaries: [...plan.satisfiedBoundaries],
+      primaryNextAction: plan.primaryNextAction,
+      directive,
+      env,
+    },
+    exposures: {
+      pending: exposures.filter((e) => e.outcome === "pending").length,
+      wins: exposures.filter((e) => e.outcome === "win").length,
+      directiveWins: exposures.filter((e) => e.outcome === "directive-win").length,
+      staleMisses: exposures.filter((e) => e.outcome === "stale-miss").length,
+      candidateWins: exposures.filter((e) =>
+        (e.outcome === "win" || e.outcome === "directive-win") &&
+        (e as any).attributionRole === "candidate"
+      ).length,
+      contextWins: exposures.filter((e) =>
+        (e.outcome === "win" || e.outcome === "directive-win") &&
+        (e as any).attributionRole === "context"
+      ).length,
+    },
+    diagnosis,
+    doctor,
+  };
+
+  if (json) return JSON.stringify(result, null, 2);
+  return formatSessionExplainText(result);
+}
+
+// ---------------------------------------------------------------------------
+// Text formatting
+// ---------------------------------------------------------------------------
+
+function formatSessionExplainText(result: SessionExplainResult): string {
+  const lines: string[] = [
+    `Session: ${result.sessionId ?? "none"}`,
+    `Manifest: ${result.manifest.skillCount} skills`,
+    `Excluded: ${result.manifest.excludedSkills.map((s) => s.slug).join(", ") || "none"}`,
+    `Parity: ${result.manifest.parity.ok ? "ok" : "drift detected"}`,
+    `Routing traces: ${result.routing.decisionCount}`,
+    `Latest hook: ${result.routing.latestHook ?? "none"}`,
+    `Verification stories: ${result.verification.hasStories ? "yes" : "no"}`,
+    result.verification.primaryNextAction
+      ? `Next action: ${result.verification.primaryNextAction.action}`
+      : "Next action: none",
+    `Pending exposures: ${result.exposures.pending}`,
+  ];
+
+  if (result.diagnosis.length > 0) {
+    lines.push("");
+    lines.push("Diagnosis:");
+    for (const d of result.diagnosis) {
+      lines.push(` [${d.severity}] ${d.code}: ${d.message}`);
+      if (d.hint) lines.push(` -> ${d.hint}`);
+    }
+  }
+
+  if (result.doctor) {
+    lines.push("");
+    lines.push("Routing doctor:");
+    lines.push(` Decision: ${result.doctor.latestDecisionId ?? "none"}`);
+    lines.push(` Scenario: ${result.doctor.latestScenario ?? "none"}`);
+    if (result.doctor.latestRanked.length > 0) {
+      const top = result.doctor.latestRanked
+        .slice(0, 3)
+        .map((entry) => `${entry.skill}=${entry.effectivePriority}`)
+        .join(", ");
+      lines.push(` Top ranked: ${top}`);
+    }
+    if (result.doctor.policyRecall) {
+      lines.push(
+        ` Recall bucket: ${result.doctor.policyRecall.selectedBucket ?? "none"}`,
+      );
+      lines.push(
+        ` Recall selected: ${
+          result.doctor.policyRecall.selected
+            .map((candidate) => candidate.skill)
+            .join(", ") || "none"
+        }`,
+      );
+    }
+    if (result.doctor.companionRecall.detected) {
+      const companions = result.doctor.companionRecall.entries
+        .map((e) => `${e.companionSkill}→${e.candidateSkill ?? "?"}`)
+        .join(", ");
+      lines.push(` Companion recall: ${companions}`);
+    }
+    for (const hint of result.doctor.hints) {
+      lines.push(` [${hint.severity}] ${hint.code}: ${hint.message}`);
+      if (hint.hint) lines.push(` -> ${hint.hint}`);
+    }
+  }
+
+  return lines.join("\n") + "\n";
+}
diff --git a/src/commands/verify-plan.ts b/src/commands/verify-plan.ts
new file mode 100644
index 0000000..f76fb3e
--- /dev/null
+++ b/src/commands/verify-plan.ts
@@ -0,0 +1,162 @@
+/**
+ * `vercel-plugin verify-plan` — inspect the current verification plan state.
+ *
+ * Reads the session ledger and derives (or loads cached) plan state.
+ * Exits 0 on success, non-zero only on actual command failure.
+ *
+ * Usage:
+ *   vercel-plugin verify-plan [--json] [--session ]
+ */
+
+import { tmpdir } from "node:os";
+import { readdirSync, statSync } from "node:fs";
+import {
+  computePlan,
+  formatPlanHuman,
+  planToLoopSnapshot,
+  type VerificationPlanResult,
+  type VerificationLoopSnapshot,
+  type ComputePlanOptions,
+} from "../../hooks/src/verification-plan.mts";
+import {
+  derivePlan,
+  loadObservations,
+  loadStories,
+} from "../../hooks/src/verification-ledger.mts";
+
+export interface VerifyPlanOptions {
+  sessionId?: string;
+  agentBrowserAvailable?: boolean;
+  devServerLoopGuardHit?: boolean;
+  lastAttemptedAction?: string | null;
+}
+
+/**
+ * Auto-detect the most recent session ledger directory.
+ * Returns null if none found.
+ */ +function detectSessionId(): string | null { + const tmp = tmpdir(); + let entries: string[]; + try { + entries = readdirSync(tmp); + } catch { + return null; + } + + const latestLedger = entries + .filter((e) => e.startsWith("vercel-plugin-") && e.endsWith("-ledger")) + .map((entry) => { + try { + const ledgerPath = `${tmp}/${entry}`; + const dirStat = statSync(ledgerPath); + let latestMtimeMs = dirStat.mtimeMs; + try { + for (const child of readdirSync(ledgerPath)) { + const childMtimeMs = statSync(`${ledgerPath}/${child}`).mtimeMs; + latestMtimeMs = Math.max(latestMtimeMs, childMtimeMs); + } + } catch {} + return { + entry, + mtimeMs: latestMtimeMs, + }; + } catch { + return null; + } + }) + .filter((entry): entry is { entry: string; mtimeMs: number } => entry !== null) + .sort((a, b) => b.mtimeMs - a.mtimeMs || a.entry.localeCompare(b.entry))[0]; + + if (!latestLedger) return null; + + // Extract session id from directory name: vercel-plugin--ledger + const match = latestLedger.entry.match(/^vercel-plugin-(.+)-ledger$/); + return match ? match[1] : null; +} + +/** + * Run the verify-plan command. 
+ */ +export function verifyPlan(options: VerifyPlanOptions = {}): VerificationPlanResult { + const sessionId = + options.sessionId || + process.env.CLAUDE_SESSION_ID || + detectSessionId(); + + if (!sessionId) { + return { + hasStories: false, + activeStoryId: null, + stories: [], + storyStates: [], + observationCount: 0, + satisfiedBoundaries: [], + missingBoundaries: [], + recentRoutes: [], + primaryNextAction: null, + blockedReasons: ["No active session found"], + }; + } + + const planOptions: ComputePlanOptions = {}; + if (options.agentBrowserAvailable !== undefined) { + planOptions.agentBrowserAvailable = options.agentBrowserAvailable; + } + if (options.devServerLoopGuardHit !== undefined) { + planOptions.devServerLoopGuardHit = options.devServerLoopGuardHit; + } + if (options.lastAttemptedAction !== undefined) { + planOptions.lastAttemptedAction = options.lastAttemptedAction; + } + + return computePlan(sessionId, planOptions); +} + +/** + * Return a VerificationLoopSnapshot including last-observation adherence metadata. + * Machine-readable API for downstream tooling and subagents. 
+ */ +export function verifyPlanSnapshot( + options: VerifyPlanOptions = {}, +): VerificationLoopSnapshot { + const sessionId = + options.sessionId || + process.env.CLAUDE_SESSION_ID || + detectSessionId(); + + if (!sessionId) { + return { + hasStories: false, + activeStoryId: null, + stories: [], + storyStates: [], + observationCount: 0, + satisfiedBoundaries: [], + missingBoundaries: [], + recentRoutes: [], + primaryNextAction: null, + blockedReasons: ["No active session found"], + lastObservation: null, + }; + } + + const planOptions: ComputePlanOptions = {}; + if (options.agentBrowserAvailable !== undefined) { + planOptions.agentBrowserAvailable = options.agentBrowserAvailable; + } + if (options.devServerLoopGuardHit !== undefined) { + planOptions.devServerLoopGuardHit = options.devServerLoopGuardHit; + } + if (options.lastAttemptedAction !== undefined) { + planOptions.lastAttemptedAction = options.lastAttemptedAction; + } + + const observations = loadObservations(sessionId); + const stories = loadStories(sessionId); + const plan = derivePlan(observations, stories, planOptions); + + return planToLoopSnapshot(plan); +} + +export { formatPlanHuman }; diff --git a/src/shared/skill-exclusion-policy.ts b/src/shared/skill-exclusion-policy.ts new file mode 100644 index 0000000..3977d83 --- /dev/null +++ b/src/shared/skill-exclusion-policy.ts @@ -0,0 +1,55 @@ +/** + * Unified skill exclusion policy. + * + * Single source of truth for which skills are test-only fixtures that must + * never appear in the runtime manifest or be surfaced as live candidates in + * CLI diagnostics. + * + * Consumers: scripts/build-manifest.ts, src/cli/explain.ts, src/commands/doctor.ts + */ + +/** + * Skills matching this pattern are test-only fixtures. The pattern matches + * slugs prefixed with "fake-" or suffixed with "-test-skill". 
+ */ +export const EXCLUDED_SKILL_PATTERN = /^fake-|-test-skill$/; + +export type SkillExclusionReason = "test-only-pattern"; + +export interface SkillExclusion { + slug: string; + reason: SkillExclusionReason; +} + +/** + * Check whether a single skill slug is excluded by policy. + * Returns the exclusion record or null if the skill is not excluded. + */ +export function getSkillExclusion(slug: string): SkillExclusion | null { + return EXCLUDED_SKILL_PATTERN.test(slug) + ? { slug, reason: "test-only-pattern" } + : null; +} + +/** + * Partition a skill map into included (runtime) and excluded (test-only) sets. + * Excluded entries are sorted by slug for deterministic output. + */ +export function filterExcludedSkillMap( + skills: Record, +): { included: Record; excluded: SkillExclusion[] } { + const included: Record = {}; + const excluded: SkillExclusion[] = []; + + for (const [slug, value] of Object.entries(skills)) { + const hit = getSkillExclusion(slug); + if (hit) { + excluded.push(hit); + continue; + } + included[slug] = value; + } + + excluded.sort((a, b) => a.slug.localeCompare(b.slug)); + return { included, excluded }; +} diff --git a/tests/build-skill-map.test.ts b/tests/build-skill-map.test.ts index da35969..c5fae34 100644 --- a/tests/build-skill-map.test.ts +++ b/tests/build-skill-map.test.ts @@ -81,10 +81,14 @@ describe("build-manifest.ts", () => { expect(Number.isNaN(Date.parse(manifest.generatedAt))).toBe(false); }); - test("manifest skill count matches skills/ directory", () => { + test("manifest skill count matches skills/ directory minus excluded test-only skills", () => { + const { EXCLUDED_SKILL_PATTERN } = require("../scripts/build-manifest.ts"); const manifest = readManifest(); - const expected = countSkillDirs(); - expect(Object.keys(manifest.skills).length).toBe(expected); + const allDirs = readdirSync(SKILLS_DIR).filter((d) => { + try { return existsSync(join(SKILLS_DIR, d, "SKILL.md")); } catch { return false; } + }); + const productionDirs 
= allDirs.filter((d) => !EXCLUDED_SKILL_PATTERN.test(d)); + expect(Object.keys(manifest.skills).length).toBe(productionDirs.length); }); test("each manifest skill has required fields", () => { @@ -136,6 +140,48 @@ describe("build-manifest.ts", () => { }); }); +// --------------------------------------------------------------------------- +// Manifest hygiene — test-only skills must not appear in runtime manifest +// --------------------------------------------------------------------------- + +describe("manifest hygiene: no test-only skills in runtime manifest", () => { + test("runtime manifest excludes skills matching EXCLUDED_SKILL_PATTERN", async () => { + const { EXCLUDED_SKILL_PATTERN } = await import("../scripts/build-manifest.ts"); + + // Rebuild to ensure fresh state + const { code } = await runBuild(); + expect(code).toBe(0); + + const manifest = readManifest(); + const manifestSlugs = Object.keys(manifest.skills); + const excluded = manifestSlugs.filter((s) => EXCLUDED_SKILL_PATTERN.test(s)); + + expect(excluded).toEqual([]); + }); + + test("fake-banned-test-skill is absent from runtime manifest", () => { + const manifest = readManifest(); + expect(manifest.skills).not.toHaveProperty("fake-banned-test-skill"); + }); + + test("fake-banned-test-skill still exists in skills/ directory (test fixture)", () => { + expect(existsSync(join(SKILLS_DIR, "fake-banned-test-skill", "SKILL.md"))).toBe(true); + }); + + test("EXCLUDED_SKILL_PATTERN matches expected slugs and rejects production slugs", async () => { + const { EXCLUDED_SKILL_PATTERN } = await import("../scripts/build-manifest.ts"); + + // Should match test-only patterns + expect(EXCLUDED_SKILL_PATTERN.test("fake-banned-test-skill")).toBe(true); + expect(EXCLUDED_SKILL_PATTERN.test("fake-something")).toBe(true); + + // Should NOT match production skills + expect(EXCLUDED_SKILL_PATTERN.test("nextjs")).toBe(false); + expect(EXCLUDED_SKILL_PATTERN.test("vercel-cli")).toBe(false); + 
expect(EXCLUDED_SKILL_PATTERN.test("ai-sdk")).toBe(false); + }); +}); + describe("manifest-backed hook loading", () => { test("hook uses manifest when present and still matches skills", async () => { // Ensure manifest exists @@ -201,7 +247,9 @@ describe("loadSkills pipeline stage", () => { expect(result).not.toBeNull(); expect(result.usedManifest).toBe(true); expect(Array.isArray(result.compiledSkills)).toBe(true); - expect(result.compiledSkills.length).toBe(countSkillDirs()); + // Manifest excludes test-only skills, so count should match manifest keys + const manifest = readManifest(); + expect(result.compiledSkills.length).toBe(Object.keys(manifest.skills).length); // Each compiled skill should have paired pattern+regex arrays for (const entry of result.compiledSkills) { diff --git a/tests/cli-explain.test.ts b/tests/cli-explain.test.ts index 9ec865f..34eafa0 100644 --- a/tests/cli-explain.test.ts +++ b/tests/cli-explain.test.ts @@ -1,6 +1,7 @@ -import { describe, test, expect, beforeAll } from "bun:test"; +import { describe, test, expect, beforeAll, afterAll } from "bun:test"; import { resolve, join } from "node:path"; -import { existsSync } from "node:fs"; +import { existsSync, writeFileSync, unlinkSync } from "node:fs"; +import { tmpdir } from "node:os"; const ROOT = resolve(import.meta.dir, ".."); const CLI = join(ROOT, "src", "cli", "index.ts"); @@ -169,9 +170,10 @@ describe("budget-aware injection", () => { test("tiny budget forces budget drops", async () => { const { stdout } = await runCli("explain", "vercel.json", "--json", "--budget", "100"); const result = JSON.parse(stdout); - // First skill always injected regardless of budget, rest should be budget-dropped + // The first match bypasses budget enforcement, but additional invocation strings + // can still fit if the remaining budget allows. 
const fullCount = result.matches.filter((m: any) => m.injectionMode === "full").length; - expect(fullCount).toBe(1); + expect(fullCount).toBeGreaterThanOrEqual(1); if (result.matches.length > 1) { expect(result.droppedByBudgetCount).toBeGreaterThan(0); } @@ -193,13 +195,15 @@ describe("profiler boost", () => { }); test("--likely-skills reorders ranking", async () => { - const { stdout: before } = await runCli("explain", "vercel.json", "--json"); - const { stdout: after } = await runCli("explain", "vercel.json", "--json", "--likely-skills", "vercel-cli"); + const { stdout: before } = await runCli("explain", "vercel deploy --prod", "--json"); + const { stdout: after } = await runCli("explain", "vercel deploy --prod", "--json", "--likely-skills", "vercel-cli"); const resultBefore = JSON.parse(before); const resultAfter = JSON.parse(after); - // Without boost, vercel-cli should not be first; with boost it should be - expect(resultAfter.matches[0].skill).toBe("vercel-cli"); - expect(resultBefore.matches[0].skill).not.toBe("vercel-cli"); + const beforeIndex = resultBefore.matches.findIndex((m: any) => m.skill === "vercel-cli"); + const afterIndex = resultAfter.matches.findIndex((m: any) => m.skill === "vercel-cli"); + expect(beforeIndex).toBeGreaterThan(-1); + expect(afterIndex).toBeGreaterThan(-1); + expect(afterIndex).toBeLessThan(beforeIndex); }); }); @@ -244,3 +248,152 @@ describe("cap behavior", () => { } }); }); + +// --------------------------------------------------------------------------- +// policy boost via --policy-file +// --------------------------------------------------------------------------- + +describe("policy boost", () => { + const policyPath = join(tmpdir(), `cli-explain-test-policy-${Date.now()}.json`); + + // Build a policy where routing-middleware has a high success rate + // under PreToolUse|none|none|Read scenario (which is what explain uses for file targets) + const policy = { + version: 1, + scenarios: { + "PreToolUse|none|none|Read": { + 
"routing-middleware": { + exposures: 10, + wins: 9, + directiveWins: 5, + staleMisses: 1, + lastUpdatedAt: "2026-03-27T04:00:00.000Z", + }, + }, + }, + }; + + beforeAll(() => { + writeFileSync(policyPath, JSON.stringify(policy)); + }); + + afterAll(() => { + try { unlinkSync(policyPath); } catch {} + }); + + test("--policy-file adds policyBoost to JSON output", async () => { + const { stdout, exitCode } = await runCli( + "explain", "middleware.ts", "--json", "--policy-file", policyPath, + ); + expect(exitCode).toBe(0); + const result = JSON.parse(stdout); + const rm = result.matches.find((m: any) => m.skill === "routing-middleware"); + expect(rm).toBeDefined(); + expect(rm.policyBoost).toBe(8); + expect(rm.policyReason).toContain("9 wins / 10 exposures"); + expect(rm.policyReason).toContain("5 directive wins"); + expect(rm.effectivePriority).toBe(rm.priority + 8); + }); + + test("human output shows policy boost in priority line", async () => { + const { stdout, exitCode } = await runCli( + "explain", "middleware.ts", "--policy-file", policyPath, + ); + expect(exitCode).toBe(0); + expect(stdout).toContain("policy +8"); + expect(stdout).toContain("policy:"); + }); + + test("human output shows policy reason line", async () => { + const { stdout } = await runCli( + "explain", "middleware.ts", "--policy-file", policyPath, + ); + expect(stdout).toContain("9 wins / 10 exposures"); + expect(stdout).toContain("5 directive wins"); + }); + + test("policy boost reorders ranking", async () => { + // Build a policy that boosts a normally low-priority skill + const boostPolicy = { + version: 1, + scenarios: { + "PreToolUse|none|none|Read": { + "vercel-cli": { + exposures: 5, + wins: 5, + directiveWins: 3, + staleMisses: 0, + lastUpdatedAt: "2026-03-27T04:00:00.000Z", + }, + }, + }, + }; + const boostPath = join(tmpdir(), `cli-explain-test-boost-${Date.now()}.json`); + writeFileSync(boostPath, JSON.stringify(boostPolicy)); + + try { + const { stdout: before } = await 
runCli("explain", "vercel.json", "--json"); + const { stdout: after } = await runCli("explain", "vercel.json", "--json", "--policy-file", boostPath); + const resultBefore = JSON.parse(before); + const resultAfter = JSON.parse(after); + const vcBefore = resultBefore.matches.find((m: any) => m.skill === "vercel-cli"); + const vcAfter = resultAfter.matches.find((m: any) => m.skill === "vercel-cli"); + expect(vcAfter.effectivePriority).toBeGreaterThan(vcBefore.effectivePriority); + expect(vcAfter.policyBoost).toBe(8); + } finally { + try { unlinkSync(boostPath); } catch {} + } + }); + + test("no policy boost when policy file has no matching scenario", async () => { + const emptyPolicy = { version: 1, scenarios: {} }; + const emptyPath = join(tmpdir(), `cli-explain-test-empty-${Date.now()}.json`); + writeFileSync(emptyPath, JSON.stringify(emptyPolicy)); + + try { + const { stdout } = await runCli( + "explain", "middleware.ts", "--json", "--policy-file", emptyPath, + ); + const result = JSON.parse(stdout); + const rm = result.matches.find((m: any) => m.skill === "routing-middleware"); + expect(rm).toBeDefined(); + // No policyBoost field when there's no data + expect(rm.policyBoost).toBeUndefined(); + expect(rm.effectivePriority).toBe(rm.priority); + } finally { + try { unlinkSync(emptyPath); } catch {} + } + }); + + test("negative policy boost reduces effective priority", async () => { + const negPolicy = { + version: 1, + scenarios: { + "PreToolUse|none|none|Read": { + "routing-middleware": { + exposures: 10, + wins: 1, + directiveWins: 0, + staleMisses: 9, + lastUpdatedAt: "2026-03-27T04:00:00.000Z", + }, + }, + }, + }; + const negPath = join(tmpdir(), `cli-explain-test-neg-${Date.now()}.json`); + writeFileSync(negPath, JSON.stringify(negPolicy)); + + try { + const { stdout } = await runCli( + "explain", "middleware.ts", "--json", "--policy-file", negPath, + ); + const result = JSON.parse(stdout); + const rm = result.matches.find((m: any) => m.skill === 
"routing-middleware"); + expect(rm).toBeDefined(); + expect(rm.policyBoost).toBe(-2); + expect(rm.effectivePriority).toBe(rm.priority - 2); + } finally { + try { unlinkSync(negPath); } catch {} + } + }); +}); diff --git a/tests/cli-learn.test.ts b/tests/cli-learn.test.ts new file mode 100644 index 0000000..6860d8c --- /dev/null +++ b/tests/cli-learn.test.ts @@ -0,0 +1,614 @@ +import { describe, test, expect, beforeEach, afterEach } from "bun:test"; +import { mkdirSync, writeFileSync, rmSync, existsSync, readFileSync } from "node:fs"; +import { join } from "node:path"; +import { tmpdir } from "node:os"; +import { runLearnCommand, learnedRulesPath } from "../src/cli/learn.ts"; + +// --------------------------------------------------------------------------- +// Fixtures +// --------------------------------------------------------------------------- + +const FIXED_TS = "2026-03-28T06:00:00.000Z"; +const TEST_SESSION = "test-learn-cli"; +const FOREIGN_SESSION = "test-learn-cli-foreign"; +let tempProjectCounter = 0; + +/** Minimal fixture project with a skills/ dir. */ +function makeTempProject(): string { + tempProjectCounter += 1; + const dir = join( + tmpdir(), + `vercel-plugin-learn-test-${Date.now()}-${tempProjectCounter}`, + ); + mkdirSync(join(dir, "skills"), { recursive: true }); + mkdirSync(join(dir, "generated"), { recursive: true }); + return dir; +} + +/** Write a JSONL trace file for a session. */ +function writeTraceFixture(sessionId: string, traces: object[]): void { + const traceDir = join(tmpdir(), `vercel-plugin-${sessionId}-trace`); + mkdirSync(traceDir, { recursive: true }); + const lines = traces.map((t) => JSON.stringify(t)).join("\n") + "\n"; + writeFileSync(join(traceDir, "routing-decision-trace.jsonl"), lines); +} + +/** Write an exposure JSONL file for a session. 
*/ +function writeExposureFixture(sessionId: string, exposures: object[]): void { + const path = join(tmpdir(), `vercel-plugin-${sessionId}-routing-exposures.jsonl`); + const lines = exposures.map((e) => JSON.stringify(e)).join("\n") + "\n"; + writeFileSync(path, lines); +} + +function makeTrace(overrides: Record = {}): Record { + return { + version: 2, + decisionId: "d1", + sessionId: TEST_SESSION, + hook: "PreToolUse", + toolName: "Read", + toolTarget: "/app/page.tsx", + timestamp: FIXED_TS, + primaryStory: { + id: "story-1", + kind: "feature", + storyRoute: "/app", + targetBoundary: "uiRender", + }, + observedRoute: "/app", + policyScenario: null, + matchedSkills: [], + injectedSkills: [], + skippedReasons: [], + ranked: [], + verification: null, + ...overrides, + }; +} + +function makeExposure(overrides: Record = {}): Record { + return { + id: "exp-1", + sessionId: TEST_SESSION, + projectRoot: "/test", + storyId: "story-1", + storyKind: "feature", + route: "/app", + hook: "PreToolUse", + toolName: "Read", + targetBoundary: "uiRender", + exposureGroupId: null, + attributionRole: "candidate", + candidateSkill: "next-config", + createdAt: FIXED_TS, + resolvedAt: FIXED_TS, + outcome: "win", + skill: "next-config", + ...overrides, + }; +} + +// --------------------------------------------------------------------------- +// Cleanup helpers +// --------------------------------------------------------------------------- + +let tempDirs: string[] = []; + +function trackDir(dir: string): string { + tempDirs.push(dir); + return dir; +} + +beforeEach(() => { + tempDirs = []; + tempProjectCounter = 0; +}); + +afterEach(() => { + for (const dir of tempDirs) { + try { + rmSync(dir, { recursive: true, force: true }); + } catch {} + } + // Clean up test session trace dir + try { + rmSync(join(tmpdir(), `vercel-plugin-${TEST_SESSION}-trace`), { recursive: true, force: true }); + } catch {} + try { + rmSync(join(tmpdir(), `vercel-plugin-${TEST_SESSION}-routing-exposures.jsonl`), 
{ force: true }); + } catch {} + try { + rmSync(join(tmpdir(), `vercel-plugin-${FOREIGN_SESSION}-trace`), { recursive: true, force: true }); + } catch {} + try { + rmSync(join(tmpdir(), `vercel-plugin-${FOREIGN_SESSION}-routing-exposures.jsonl`), { force: true }); + } catch {} +}); + +// --------------------------------------------------------------------------- +// Tests +// --------------------------------------------------------------------------- + +describe("runLearnCommand", () => { + test("returns exit code 0 with no traces", async () => { + const project = trackDir(makeTempProject()); + const code = await runLearnCommand({ project, session: TEST_SESSION }); + expect(code).toBe(0); + }); + + test("returns exit code 2 for missing project", async () => { + const code = await runLearnCommand({ project: "/nonexistent/path", json: true }); + expect(code).toBe(2); + }); + + test("--json outputs valid JSON to stdout", async () => { + const project = trackDir(makeTempProject()); + const logs: string[] = []; + const origLog = console.log; + console.log = (msg: string) => logs.push(msg); + try { + await runLearnCommand({ project, json: true, session: TEST_SESSION }); + } finally { + console.log = origLog; + } + + const stdout = logs.join("\n"); + const parsed = JSON.parse(stdout); + expect(parsed.rules.version).toBe(1); + expect(parsed.rules.rules).toEqual([]); + expect(parsed.rules.replay).toBeDefined(); + expect(parsed.rules.replay.regressions).toEqual([]); + }); + + test("--write creates generated/learned-routing-rules.json", async () => { + const project = trackDir(makeTempProject()); + const code = await runLearnCommand({ project, write: true, session: TEST_SESSION }); + expect(code).toBe(0); + + const outPath = learnedRulesPath(project); + expect(existsSync(outPath)).toBe(true); + + const content = JSON.parse(readFileSync(outPath, "utf-8")); + expect(content.version).toBe(1); + expect(content.projectRoot).toBe(project); + }); + + test("--json with traces 
produces rules in output", async () => { + const project = trackDir(makeTempProject()); + + // Write 6 winning traces (enough for candidate/promote) + const traces = Array.from({ length: 6 }, (_, i) => + makeTrace({ + decisionId: `d${i}`, + injectedSkills: ["next-config"], + ranked: [ + { + skill: "next-config", + basePriority: 6, + effectivePriority: 6, + pattern: { type: "path", value: "next.config.*" }, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + ], + }), + ); + writeTraceFixture(TEST_SESSION, traces); + + const exposures = [makeExposure({ skill: "next-config", outcome: "win" })]; + writeExposureFixture(TEST_SESSION, exposures); + + const logs: string[] = []; + const origLog = console.log; + console.log = (msg: string) => logs.push(msg); + try { + await runLearnCommand({ project, json: true, session: TEST_SESSION }); + } finally { + console.log = origLog; + } + + const stdout = logs.join("\n"); + const parsed = JSON.parse(stdout); + expect(parsed.rules.rules.length).toBeGreaterThanOrEqual(1); + expect(parsed.rules.replay).toBeDefined(); + }); + + test("--write still writes the artifact when replay may report regressions", async () => { + const project = trackDir(makeTempProject()); + + // Winning traces for skill-a (baseline wins), but promoted rule targets skill-b + const traces = Array.from({ length: 6 }, (_, i) => + makeTrace({ + decisionId: `d${i}`, + injectedSkills: ["skill-a"], + verification: { + verificationId: `v${i}`, + observedBoundary: "uiRender", + matchedSuggestedAction: true, + }, + ranked: [ + { + skill: "skill-b", + basePriority: 6, + effectivePriority: 6, + pattern: { type: "path", value: "b.*" }, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + ], + }), + ); + writeTraceFixture(TEST_SESSION, traces); + + // skill-b wins in exposure — it'll get promoted, but baseline wins used skill-a +
const exposures = [makeExposure({ skill: "skill-b", outcome: "win" })]; + writeExposureFixture(TEST_SESSION, exposures); + + const code = await runLearnCommand({ + project, + write: true, + session: TEST_SESSION, + }); + + // The distiller may or may not produce regressions depending on the exact + // scoring. If no rules get promoted, there are no regressions, exit 0. + // If rules get promoted and cause regressions, exit 1. + // Either outcome is valid — just verify the file was written. + const outPath = learnedRulesPath(project); + expect(existsSync(outPath)).toBe(true); + }); + + test("human-readable output includes summary lines", async () => { + const project = trackDir(makeTempProject()); + const logs: string[] = []; + const origLog = console.log; + console.log = (msg: string) => logs.push(msg); + try { + await runLearnCommand({ project, session: TEST_SESSION }); + } finally { + console.log = origLog; + } + + // With no traces, human output reports zero rules + const stdout = logs.join("\n"); + expect(stdout).toContain("Learned routing rules: 0"); + expect(stdout).toContain("promoted: 0"); + expect(stdout).toContain("baseline wins:"); + }); + + test("custom thresholds are passed through to distiller", async () => { + const project = trackDir(makeTempProject()); + + const traces = Array.from({ length: 3 }, (_, i) => + makeTrace({ + decisionId: `d${i}`, + injectedSkills: ["next-config"], + ranked: [ + { + skill: "next-config", + basePriority: 6, + effectivePriority: 6, + pattern: { type: "path", value: "next.config.*" }, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + ], + }), + ); + writeTraceFixture(TEST_SESSION, traces); + writeExposureFixture(TEST_SESSION, [makeExposure({ skill: "next-config", outcome: "win" })]); + + const logs: string[] = []; + const origLog = console.log; + console.log = (msg: string) => logs.push(msg); + try { + await runLearnCommand({ + project, + 
json: true, + session: TEST_SESSION, + minSupport: 2, + minPrecision: 0.5, + minLift: 1.0, + }); + } finally { + console.log = origLog; + } + + const stdout = logs.join("\n"); + const parsed = JSON.parse(stdout); + // With relaxed thresholds and 3 traces, should produce at least 1 rule + expect(parsed.rules.rules.length).toBeGreaterThanOrEqual(1); + }); + + test("auto-discovery excludes sessions from other projects", async () => { + const project = trackDir(makeTempProject()); + const otherProject = trackDir(makeTempProject()); + + writeTraceFixture(TEST_SESSION, [ + makeTrace({ + decisionId: "local-1", + injectedSkills: ["next-config"], + ranked: [ + { + skill: "next-config", + basePriority: 6, + effectivePriority: 6, + pattern: { type: "path", value: "next.config.*" }, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + ], + }), + ]); + writeExposureFixture(TEST_SESSION, [ + makeExposure({ projectRoot: project, skill: "next-config", outcome: "win" }), + ]); + + writeTraceFixture(FOREIGN_SESSION, [ + makeTrace({ + decisionId: "foreign-1", + sessionId: FOREIGN_SESSION, + injectedSkills: ["foreign-skill"], + ranked: [ + { + skill: "foreign-skill", + basePriority: 6, + effectivePriority: 6, + pattern: { type: "path", value: "foreign.*" }, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + ], + }), + ]); + writeExposureFixture(FOREIGN_SESSION, [ + makeExposure({ + sessionId: FOREIGN_SESSION, + projectRoot: otherProject, + skill: "foreign-skill", + candidateSkill: "foreign-skill", + outcome: "win", + }), + ]); + + const logs: string[] = []; + const origLog = console.log; + console.log = (msg: string) => logs.push(msg); + try { + await runLearnCommand({ project, json: true }); + } finally { + console.log = origLog; + } + + const parsed = JSON.parse(logs.join("\n")); + expect(parsed.rules.rules).toHaveLength(1); 
+ expect(parsed.rules.rules[0]?.skill).toBe("next-config"); + }); + + // --------------------------------------------------------------------------- + // --write vs dry-run behavior + // --------------------------------------------------------------------------- + + test("dry-run (no --write) does NOT create the artifact file", async () => { + const project = trackDir(makeTempProject()); + + const traces = Array.from({ length: 6 }, (_, i) => + makeTrace({ + decisionId: `d${i}`, + injectedSkills: ["next-config"], + ranked: [ + { + skill: "next-config", + basePriority: 6, + effectivePriority: 6, + pattern: { type: "path", value: "next.config.*" }, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + ], + }), + ); + writeTraceFixture(TEST_SESSION, traces); + writeExposureFixture(TEST_SESSION, [makeExposure({ skill: "next-config", outcome: "win" })]); + + const logs: string[] = []; + const origLog = console.log; + console.log = (msg: string) => logs.push(msg); + try { + await runLearnCommand({ project, json: true, session: TEST_SESSION }); + } finally { + console.log = origLog; + } + + // stdout has JSON, but no file was written + const stdout = logs.join("\n"); + expect(() => JSON.parse(stdout)).not.toThrow(); + expect(existsSync(learnedRulesPath(project))).toBe(false); + }); + + test("--write creates file while --json dry-run does not", async () => { + const projectWrite = trackDir(makeTempProject()); + const projectDry = trackDir(makeTempProject()); + + // Same session/traces for both + writeTraceFixture(TEST_SESSION, []); + writeExposureFixture(TEST_SESSION, []); + + await runLearnCommand({ project: projectWrite, write: true, session: TEST_SESSION }); + await runLearnCommand({ project: projectDry, json: true, session: TEST_SESSION }); + + expect(existsSync(learnedRulesPath(projectWrite))).toBe(true); + expect(existsSync(learnedRulesPath(projectDry))).toBe(false); + }); + + // 
--------------------------------------------------------------------------- + // Deterministic JSON output for --json + // --------------------------------------------------------------------------- + + test("--json output has deterministic key ordering across runs", async () => { + const project = trackDir(makeTempProject()); + + const traces = Array.from({ length: 6 }, (_, i) => + makeTrace({ + decisionId: `d${i}`, + injectedSkills: ["next-config"], + ranked: [ + { + skill: "next-config", + basePriority: 6, + effectivePriority: 6, + pattern: { type: "path", value: "next.config.*" }, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + ], + }), + ); + writeTraceFixture(TEST_SESSION, traces); + writeExposureFixture(TEST_SESSION, [makeExposure({ skill: "next-config", outcome: "win" })]); + + const capture = async () => { + const logs: string[] = []; + const origLog = console.log; + console.log = (msg: string) => logs.push(msg); + try { + await runLearnCommand({ project, json: true, session: TEST_SESSION }); + } finally { + console.log = origLog; + } + const parsed = JSON.parse(logs.join("\n")); + delete parsed.rules?.generatedAt; + delete parsed.companions?.generatedAt; + delete parsed.playbooks?.generatedAt; + return JSON.stringify(parsed); + }; + + const run1 = await capture(); + const run2 = await capture(); + expect(run1).toBe(run2); + }); + + // --------------------------------------------------------------------------- + // Regression exit code with properly constructed regression scenario + // --------------------------------------------------------------------------- + + test("exit code 1 when replay detects regressions from promoted rules", async () => { + const project = trackDir(makeTempProject()); + + // 8 verified traces: skill-a injected (baseline wins), skill-b ranked + const winTraces = Array.from({ length: 8 }, (_, i) => + makeTrace({ + decisionId: `d${i}`, + 
injectedSkills: ["skill-a"], + ranked: [ + { + skill: "skill-b", + basePriority: 6, + effectivePriority: 6, + pattern: { type: "path", value: "b.*" }, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + ], + verification: { + verificationId: `v${i}`, + observedBoundary: "uiRender", + matchedSuggestedAction: true, + }, + }), + ); + + // 8 unverified traces with skill-c to dilute scenario precision (lift > 1.5) + const dilutionTraces = Array.from({ length: 8 }, (_, i) => + makeTrace({ + decisionId: `dilute${i}`, + injectedSkills: ["skill-c"], + ranked: [ + { + skill: "skill-c", + basePriority: 6, + effectivePriority: 6, + pattern: { type: "path", value: "c.*" }, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + ], + }), + ); + + writeTraceFixture(TEST_SESSION, [...winTraces, ...dilutionTraces]); + writeExposureFixture(TEST_SESSION, [ + // skill-b: 8 candidate wins + ...Array.from({ length: 8 }, (_, i) => + makeExposure({ + id: `exp-b-${i}`, + skill: "skill-b", + candidateSkill: "skill-b", + attributionRole: "candidate", + outcome: "win", + }), + ), + // skill-c: 8 candidate stale-misses (scenario dilution) + ...Array.from({ length: 8 }, (_, i) => + makeExposure({ + id: `exp-c-${i}`, + skill: "skill-c", + candidateSkill: "skill-c", + attributionRole: "candidate", + outcome: "stale-miss", + }), + ), + ]); + + const code = await runLearnCommand({ + project, + json: true, + session: TEST_SESSION, + }); + + expect(code).toBe(1); + }); +}); + +describe("learnedRulesPath", () => { + test("returns correct path", () => { + const path = learnedRulesPath("/my/project"); + expect(path).toBe("/my/project/generated/learned-routing-rules.json"); + }); +}); diff --git a/tests/companion-distillation.test.ts b/tests/companion-distillation.test.ts new file mode 100644 index 0000000..e8c462b --- /dev/null +++ 
b/tests/companion-distillation.test.ts @@ -0,0 +1,681 @@ +import { describe, test, expect } from "bun:test"; +import { randomUUID } from "node:crypto"; +import type { SkillExposure } from "../hooks/src/routing-policy-ledger.mts"; +import type { RoutingDecisionTrace } from "../hooks/src/routing-decision-trace.mts"; +import { distillCompanionRules } from "../hooks/src/companion-distillation.mts"; + +// --------------------------------------------------------------------------- +// Fixtures +// --------------------------------------------------------------------------- + +const T0 = "2026-03-28T08:00:00.000Z"; +const PROJECT = "/test/project"; + +/** Create a minimal SkillExposure fixture. */ +function makeExposure( + overrides: Partial<SkillExposure> & { + exposureGroupId: string; + skill: string; + attributionRole: "candidate" | "context"; + outcome: SkillExposure["outcome"]; + }, +): SkillExposure { + return { + id: randomUUID(), + sessionId: "test-session", + projectRoot: PROJECT, + storyId: null, + storyKind: "flow-verification", + route: "/dashboard", + hook: "PreToolUse", + toolName: "Bash", + targetBoundary: "uiRender", + candidateSkill: null, + createdAt: T0, + resolvedAt: T0, + ...overrides, + }; +} + +/** + * Generate N exposure groups where candidate+companion both exist. + * Each group has one candidate and one context exposure sharing the same outcome.
+ */ +function makeGroupedExposures(params: { + count: number; + candidateSkill: string; + companionSkill: string; + outcome: SkillExposure["outcome"]; + hook?: SkillExposure["hook"]; + toolName?: SkillExposure["toolName"]; + storyKind?: string | null; + targetBoundary?: SkillExposure["targetBoundary"]; + route?: string | null; +}): SkillExposure[] { + const exposures: SkillExposure[] = []; + for (let i = 0; i < params.count; i++) { + const groupId = `g-${params.outcome}-${params.candidateSkill}-${params.companionSkill}-${i}`; + exposures.push( + makeExposure({ + exposureGroupId: groupId, + skill: params.candidateSkill, + attributionRole: "candidate", + candidateSkill: params.candidateSkill, + outcome: params.outcome, + hook: params.hook ?? "PreToolUse", + toolName: params.toolName ?? "Bash", + storyKind: params.storyKind ?? "flow-verification", + targetBoundary: params.targetBoundary ?? "uiRender", + route: params.route ?? "/dashboard", + }), + makeExposure({ + exposureGroupId: groupId, + skill: params.companionSkill, + attributionRole: "context", + candidateSkill: params.candidateSkill, + outcome: params.outcome, + hook: params.hook ?? "PreToolUse", + toolName: params.toolName ?? "Bash", + storyKind: params.storyKind ?? "flow-verification", + targetBoundary: params.targetBoundary ?? "uiRender", + route: params.route ?? "/dashboard", + }), + ); + } + return exposures; +} + +/** + * Generate N candidate-only exposure groups (no companion). 
+ */ +function makeSoloExposures(params: { + count: number; + candidateSkill: string; + outcome: SkillExposure["outcome"]; + hook?: SkillExposure["hook"]; + toolName?: SkillExposure["toolName"]; + storyKind?: string | null; + targetBoundary?: SkillExposure["targetBoundary"]; + route?: string | null; +}): SkillExposure[] { + const exposures: SkillExposure[] = []; + for (let i = 0; i < params.count; i++) { + const groupId = `g-solo-${params.outcome}-${params.candidateSkill}-${i}`; + exposures.push( + makeExposure({ + exposureGroupId: groupId, + skill: params.candidateSkill, + attributionRole: "candidate", + candidateSkill: params.candidateSkill, + outcome: params.outcome, + hook: params.hook ?? "PreToolUse", + toolName: params.toolName ?? "Bash", + storyKind: params.storyKind ?? "flow-verification", + targetBoundary: params.targetBoundary ?? "uiRender", + route: params.route ?? "/dashboard", + }), + ); + } + return exposures; +} + +const emptyTraces: RoutingDecisionTrace[] = []; + +// --------------------------------------------------------------------------- +// AC1: Promote when candidate+companion outperforms candidate-alone +// --------------------------------------------------------------------------- + +describe("AC1: promote when companion outperforms candidate-alone", () => { + test("emits promote rule with correct metrics when thresholds met", () => { + // 4 groups where candidate+companion both win + const withCompanion = makeGroupedExposures({ + count: 4, + candidateSkill: "verification", + companionSkill: "agent-browser-verify", + outcome: "win", + }); + // 4 solo groups where candidate alone wins only 50% + const soloWins = makeSoloExposures({ + count: 2, + candidateSkill: "verification", + outcome: "win", + }); + const soloLosses = makeSoloExposures({ + count: 2, + candidateSkill: "verification", + outcome: "stale-miss", + }); + + const result = distillCompanionRules({ + projectRoot: PROJECT, + traces: emptyTraces, + exposures: [...withCompanion, 
...soloWins, ...soloLosses], + generatedAt: T0, + }); + + expect(result.version).toBe(1); + expect(result.rules.length).toBe(1); + + const rule = result.rules[0]; + expect(rule.confidence).toBe("promote"); + expect(rule.candidateSkill).toBe("verification"); + expect(rule.companionSkill).toBe("agent-browser-verify"); + expect(rule.support).toBe(4); + expect(rule.winsWithCompanion).toBe(4); + expect(rule.precisionWithCompanion).toBe(1.0); + expect(rule.baselinePrecisionWithoutCompanion).toBe(0.5); + expect(rule.liftVsCandidateAlone).toBe(2.0); + expect(rule.staleMissDelta).toBeLessThanOrEqual(0.10); + expect(rule.promotedAt).toBe(T0); + expect(rule.reason).toContain("companion beats candidate-alone"); + }); + + test("scenario id matches expected pipe-delimited format", () => { + const exposures = [ + ...makeGroupedExposures({ + count: 4, + candidateSkill: "verification", + companionSkill: "agent-browser-verify", + outcome: "win", + }), + ...makeSoloExposures({ + count: 4, + candidateSkill: "verification", + outcome: "stale-miss", + }), + ]; + + const result = distillCompanionRules({ + projectRoot: PROJECT, + traces: emptyTraces, + exposures, + generatedAt: T0, + }); + + const rule = result.rules[0]; + expect(rule.scenario).toBe( + "PreToolUse|flow-verification|uiRender|Bash|/dashboard", + ); + expect(rule.id).toBe( + "PreToolUse|flow-verification|uiRender|Bash|/dashboard::verification->agent-browser-verify", + ); + }); + + test("sourceExposureGroupIds are sorted", () => { + const exposures = [ + ...makeGroupedExposures({ + count: 4, + candidateSkill: "verification", + companionSkill: "agent-browser-verify", + outcome: "win", + }), + ...makeSoloExposures({ + count: 4, + candidateSkill: "verification", + outcome: "stale-miss", + }), + ]; + + const result = distillCompanionRules({ + projectRoot: PROJECT, + traces: emptyTraces, + exposures, + generatedAt: T0, + }); + + const ids = result.rules[0].sourceExposureGroupIds; + const sorted = [...ids].sort(); + 
expect(ids).toEqual(sorted); + }); + + test("promotion summary counts promoted rules", () => { + // 4 companion wins + 4 solo (2 wins, 2 stale-miss → 50% baseline) + const exposures = [ + ...makeGroupedExposures({ + count: 4, + candidateSkill: "verification", + companionSkill: "agent-browser-verify", + outcome: "win", + }), + ...makeSoloExposures({ + count: 2, + candidateSkill: "verification", + outcome: "win", + }), + ...makeSoloExposures({ + count: 2, + candidateSkill: "verification", + outcome: "stale-miss", + }), + ]; + + const result = distillCompanionRules({ + projectRoot: PROJECT, + traces: emptyTraces, + exposures, + generatedAt: T0, + }); + + expect(result.promotion.accepted).toBe(true); + expect(result.promotion.reason).toBe("1 promoted companion rules"); + }); +}); + +// --------------------------------------------------------------------------- +// AC2: Sparse data below threshold emits no promoted rule +// --------------------------------------------------------------------------- + +describe("AC2: sparse data rejects promotion", () => { + test("support below threshold yields holdout-fail", () => { + // Only 3 groups (below default minSupport=4) + const exposures = [ + ...makeGroupedExposures({ + count: 3, + candidateSkill: "verification", + companionSkill: "agent-browser-verify", + outcome: "win", + }), + ...makeSoloExposures({ + count: 3, + candidateSkill: "verification", + outcome: "stale-miss", + }), + ]; + + const result = distillCompanionRules({ + projectRoot: PROJECT, + traces: emptyTraces, + exposures, + generatedAt: T0, + }); + + expect(result.rules.length).toBe(1); + expect(result.rules[0].confidence).toBe("holdout-fail"); + expect(result.rules[0].promotedAt).toBeNull(); + expect(result.rules[0].reason).toContain("insufficient"); + }); + + test("low precision below threshold yields holdout-fail", () => { + // 4 groups but only 2 wins (50% < 75%) + const wins = makeGroupedExposures({ + count: 2, + candidateSkill: "ai-sdk", + companionSkill: 
"ai-elements", + outcome: "win", + }); + const misses = makeGroupedExposures({ + count: 2, + candidateSkill: "ai-sdk", + companionSkill: "ai-elements", + outcome: "stale-miss", + }); + + const result = distillCompanionRules({ + projectRoot: PROJECT, + traces: emptyTraces, + exposures: [...wins, ...misses], + generatedAt: T0, + }); + + const rule = result.rules.find( + (r) => + r.candidateSkill === "ai-sdk" && r.companionSkill === "ai-elements", + ); + expect(rule).toBeDefined(); + expect(rule!.confidence).toBe("holdout-fail"); + expect(rule!.precisionWithCompanion).toBe(0.5); + }); + + test("empty exposures produce empty rulebook", () => { + const result = distillCompanionRules({ + projectRoot: PROJECT, + traces: emptyTraces, + exposures: [], + generatedAt: T0, + }); + + expect(result.rules).toEqual([]); + expect(result.promotion.reason).toBe("0 promoted companion rules"); + }); + + test("exposures without exposureGroupId are skipped", () => { + const exposures: SkillExposure[] = [ + makeExposure({ + exposureGroupId: null as unknown as string, + skill: "verification", + attributionRole: "candidate", + outcome: "win", + }), + ]; + + // Filter out since exposureGroupId is null + const result = distillCompanionRules({ + projectRoot: PROJECT, + traces: emptyTraces, + exposures, + generatedAt: T0, + }); + + expect(result.rules).toEqual([]); + }); +}); + +// --------------------------------------------------------------------------- +// AC3: Stale-miss delta exceeds threshold → reject +// --------------------------------------------------------------------------- + +describe("AC3: stale-miss delta rejects promotion", () => { + test("companion with high stale-miss rate is rejected", () => { + // 4 companion groups: 3 wins + 1 stale-miss → precision 0.75, but... 
+ const companionWins = makeGroupedExposures({ + count: 3, + candidateSkill: "verification", + companionSkill: "bad-companion", + outcome: "win", + }); + const companionStaleMisses = makeGroupedExposures({ + count: 1, + candidateSkill: "verification", + companionSkill: "bad-companion", + outcome: "stale-miss", + }); + // 4 solo groups: 3 wins + 1 pending (no stale-misses without companion) + const soloWins = makeSoloExposures({ + count: 3, + candidateSkill: "verification", + outcome: "win", + }); + const soloPending = makeSoloExposures({ + count: 1, + candidateSkill: "verification", + outcome: "pending", + }); + + const result = distillCompanionRules({ + projectRoot: PROJECT, + traces: emptyTraces, + exposures: [ + ...companionWins, + ...companionStaleMisses, + ...soloWins, + ...soloPending, + ], + generatedAt: T0, + maxStaleMissDelta: 0.10, + }); + + const rule = result.rules.find( + (r) => r.companionSkill === "bad-companion", + ); + expect(rule).toBeDefined(); + // staleMissDelta = 0.25 (companion) - 0.0 (solo) = 0.25 > 0.10 + expect(rule!.staleMissDelta).toBeGreaterThan(0.10); + expect(rule!.confidence).toBe("holdout-fail"); + }); + + test("companion within stale-miss threshold is promoted", () => { + // 5 companion groups: 4 wins + 1 stale-miss + const companionWins = makeGroupedExposures({ + count: 4, + candidateSkill: "verification", + companionSkill: "good-companion", + outcome: "win", + }); + const companionStale = makeGroupedExposures({ + count: 1, + candidateSkill: "verification", + companionSkill: "good-companion", + outcome: "stale-miss", + }); + // 5 solo groups: 2 wins + 2 stale-miss + 1 pending + // baseline stale rate = 2/5 = 0.4 + // companion stale rate = 1/5 = 0.2 + // delta = 0.2 - 0.4 = -0.2 (negative = improvement) + const soloWins = makeSoloExposures({ + count: 2, + candidateSkill: "verification", + outcome: "win", + }); + const soloStale = makeSoloExposures({ + count: 2, + candidateSkill: "verification", + outcome: "stale-miss", + }); + 
const soloPending = makeSoloExposures({ + count: 1, + candidateSkill: "verification", + outcome: "pending", + }); + + const result = distillCompanionRules({ + projectRoot: PROJECT, + traces: emptyTraces, + exposures: [ + ...companionWins, + ...companionStale, + ...soloWins, + ...soloStale, + ...soloPending, + ], + generatedAt: T0, + }); + + const rule = result.rules.find( + (r) => r.companionSkill === "good-companion", + ); + expect(rule).toBeDefined(); + expect(rule!.staleMissDelta).toBeLessThanOrEqual(0.10); + expect(rule!.confidence).toBe("promote"); + }); +}); + +// --------------------------------------------------------------------------- +// AC4: Does not change candidate-only policy credit semantics +// --------------------------------------------------------------------------- + +describe("AC4: reads grouped exposure fields only", () => { + test("only reads exposureGroupId, attributionRole, outcome, skill fields", () => { + // This is a structural test: distillation should work even with + // minimal grouped exposure data + const exposures = [ + ...makeGroupedExposures({ + count: 4, + candidateSkill: "verification", + companionSkill: "agent-browser-verify", + outcome: "win", + }), + ...makeSoloExposures({ + count: 4, + candidateSkill: "verification", + outcome: "stale-miss", + }), + ]; + + // Just verify it runs and produces a valid rulebook + const result = distillCompanionRules({ + projectRoot: PROJECT, + traces: emptyTraces, + exposures, + generatedAt: T0, + }); + + expect(result.version).toBe(1); + expect(result.projectRoot).toBe(PROJECT); + expect(result.generatedAt).toBe(T0); + expect(Array.isArray(result.rules)).toBe(true); + expect(result.replay).toBeDefined(); + expect(result.promotion).toBeDefined(); + }); + + test("groups without a candidate exposure are skipped", () => { + // Group with only context exposures — no candidate + const exposures: SkillExposure[] = [ + makeExposure({ + exposureGroupId: "g-orphan", + skill: "agent-browser-verify", + 
attributionRole: "context", + outcome: "win", + }), + makeExposure({ + exposureGroupId: "g-orphan", + skill: "another-skill", + attributionRole: "context", + outcome: "win", + }), + ]; + + const result = distillCompanionRules({ + projectRoot: PROJECT, + traces: emptyTraces, + exposures, + generatedAt: T0, + }); + + expect(result.rules).toEqual([]); + }); +}); + +// --------------------------------------------------------------------------- +// Metric rounding +// --------------------------------------------------------------------------- + +describe("deterministic rounding to 4 decimals", () => { + test("precision values are rounded to at most 4 decimal places", () => { + // With companion: 3 of 4 wins = 0.75 exactly; solo baseline: 1 of 3 wins = 0.3333... (forces rounding) + const exposures = [ + ...makeGroupedExposures({ + count: 3, + candidateSkill: "ai-sdk", + companionSkill: "ai-elements", + outcome: "win", + }), + ...makeGroupedExposures({ + count: 1, + candidateSkill: "ai-sdk", + companionSkill: "ai-elements", + outcome: "stale-miss", + }), + ...makeSoloExposures({ + count: 1, + candidateSkill: "ai-sdk", + outcome: "win", + }), + ...makeSoloExposures({ + count: 2, + candidateSkill: "ai-sdk", + outcome: "stale-miss", + }), + ]; + + const result = distillCompanionRules({ + projectRoot: PROJECT, + traces: emptyTraces, + exposures, + generatedAt: T0, + }); + + const rule = result.rules[0]; + // Check values have at most 4 decimal digits + expect(String(rule.precisionWithCompanion).split(".")[1]?.length ?? 0).toBeLessThanOrEqual(4); + expect(String(rule.baselinePrecisionWithoutCompanion).split(".")[1]?.length ?? 0).toBeLessThanOrEqual(4); + expect(String(rule.liftVsCandidateAlone).split(".")[1]?.length ?? 0).toBeLessThanOrEqual(4); + expect(String(rule.staleMissDelta).split(".")[1]?.length ??
0).toBeLessThanOrEqual(4); + }); +}); + +// --------------------------------------------------------------------------- +// Rule sorting +// --------------------------------------------------------------------------- + +describe("deterministic rule ordering", () => { + test("rules are sorted by scenario, candidateSkill, companionSkill", () => { + const exposures = [ + // Scenario B (alphabetically second) + ...makeGroupedExposures({ + count: 4, + candidateSkill: "z-skill", + companionSkill: "z-companion", + outcome: "win", + route: "/z-route", + }), + ...makeSoloExposures({ + count: 4, + candidateSkill: "z-skill", + outcome: "stale-miss", + route: "/z-route", + }), + // Scenario A (alphabetically first) + ...makeGroupedExposures({ + count: 4, + candidateSkill: "a-skill", + companionSkill: "a-companion", + outcome: "win", + route: "/a-route", + }), + ...makeSoloExposures({ + count: 4, + candidateSkill: "a-skill", + outcome: "stale-miss", + route: "/a-route", + }), + ]; + + const result = distillCompanionRules({ + projectRoot: PROJECT, + traces: emptyTraces, + exposures, + generatedAt: T0, + }); + + expect(result.rules.length).toBe(2); + expect(result.rules[0].candidateSkill).toBe("a-skill"); + expect(result.rules[1].candidateSkill).toBe("z-skill"); + }); +}); + +// --------------------------------------------------------------------------- +// Custom threshold overrides +// --------------------------------------------------------------------------- + +describe("custom threshold overrides", () => { + test("minSupport override allows lower support", () => { + // 2 companion wins + 2 solo (1 win, 1 stale-miss → 50% baseline) + const exposures = [ + ...makeGroupedExposures({ + count: 2, + candidateSkill: "verification", + companionSkill: "agent-browser-verify", + outcome: "win", + }), + ...makeSoloExposures({ + count: 1, + candidateSkill: "verification", + outcome: "win", + }), + ...makeSoloExposures({ + count: 1, + candidateSkill: "verification", + outcome: 
"stale-miss", + }), + ]; + + // Default would reject (support=2 < minSupport=4) + const defaultResult = distillCompanionRules({ + projectRoot: PROJECT, + traces: emptyTraces, + exposures, + generatedAt: T0, + }); + expect(defaultResult.rules[0].confidence).toBe("holdout-fail"); + + // With minSupport=2 should promote (precision=1.0, lift=2.0) + const customResult = distillCompanionRules({ + projectRoot: PROJECT, + traces: emptyTraces, + exposures, + generatedAt: T0, + minSupport: 2, + }); + expect(customResult.rules[0].confidence).toBe("promote"); + }); +}); diff --git a/tests/decision-cat-cli.test.ts b/tests/decision-cat-cli.test.ts new file mode 100644 index 0000000..805e1bc --- /dev/null +++ b/tests/decision-cat-cli.test.ts @@ -0,0 +1,262 @@ +import { afterEach, describe, expect, test } from "bun:test"; +import { mkdirSync, rmSync, writeFileSync } from "node:fs"; +import { join, resolve } from "node:path"; +import { tmpdir } from "node:os"; +import { + buildDecisionCapsule, + decisionCapsuleDir, + persistDecisionCapsule, + type DecisionCapsuleV1, +} from "../hooks/src/routing-decision-capsule.mts"; +import type { RoutingDecisionTrace } from "../hooks/src/routing-decision-trace.mts"; +import type { VerificationDirective } from "../hooks/src/verification-directive.mts"; +import { runDecisionCat, formatDecisionCapsule } from "../src/commands/decision-cat.ts"; + +const ROOT = resolve(import.meta.dir, ".."); +const CLI = join(ROOT, "src", "cli", "index.ts"); +const SESSION_ID = "decision-cat-test"; + +async function runCli( + ...args: string[] +): Promise<{ stdout: string; stderr: string; exitCode: number }> { + const proc = Bun.spawn(["bun", "run", CLI, ...args], { + cwd: ROOT, + stdout: "pipe", + stderr: "pipe", + env: { ...process.env, NO_COLOR: "1" }, + }); + const [stdout, stderr] = await Promise.all([ + new Response(proc.stdout).text(), + new Response(proc.stderr).text(), + ]); + const exitCode = await proc.exited; + return { stdout: stdout.trim(), stderr: 
stderr.trim(), exitCode }; +} + +function makeTrace(): RoutingDecisionTrace { + return { + version: 2, + decisionId: "test-decision-001", + sessionId: SESSION_ID, + hook: "PreToolUse", + toolName: "Read", + toolTarget: "app/page.tsx", + timestamp: "2026-03-28T02:30:00.000Z", + primaryStory: { + id: "story-1", + kind: "flow-verification", + storyRoute: "/settings", + targetBoundary: "uiRender", + }, + observedRoute: null, + policyScenario: "PreToolUse|flow-verification|uiRender|Read", + matchedSkills: ["nextjs", "react-best-practices"], + injectedSkills: ["nextjs"], + skippedReasons: [], + ranked: [ + { + skill: "nextjs", + basePriority: 7, + effectivePriority: 12, + pattern: { type: "suffix", value: "app/**/*.tsx" }, + profilerBoost: 5, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + ], + verification: { + verificationId: "verify-1", + observedBoundary: null, + matchedSuggestedAction: null, + }, + }; +} + +function makeDirective(): VerificationDirective { + return { + version: 1, + storyId: "story-1", + storyKind: "flow-verification", + route: "/settings", + missingBoundaries: ["uiRender"], + satisfiedBoundaries: ["clientRequest", "serverHandler"], + primaryNextAction: { + action: "open /settings in agent-browser", + targetBoundary: "uiRender", + reason: "No UI render observation yet", + }, + blockedReasons: [], + }; +} + +function makeCapsule(): DecisionCapsuleV1 { + return buildDecisionCapsule({ + sessionId: SESSION_ID, + hook: "PreToolUse", + createdAt: "2026-03-28T02:30:00.000Z", + toolName: "Read", + toolTarget: "app/page.tsx", + platform: "claude-code", + trace: makeTrace(), + directive: makeDirective(), + attribution: { + exposureGroupId: "group-1", + candidateSkill: "nextjs", + loadedSkills: ["nextjs"], + }, + reasons: { + nextjs: { trigger: "suffix", reasonCode: "pattern-match" }, + }, + env: { VERCEL_PLUGIN_VERIFICATION_ROUTE: "/settings" }, + }); +} + +afterEach(() => { + 
rmSync(decisionCapsuleDir(SESSION_ID), { recursive: true, force: true }); +}); + +// --------------------------------------------------------------------------- +// Unit: runDecisionCat +// --------------------------------------------------------------------------- + +describe("runDecisionCat", () => { + test("returns JSON with ok:true for valid capsule", () => { + const capsule = makeCapsule(); + const path = persistDecisionCapsule(capsule); + const { output, ok } = runDecisionCat(path, true); + const parsed = JSON.parse(output); + + expect(ok).toBe(true); + expect(parsed.ok).toBe(true); + expect(parsed.capsule.decisionId).toBe("test-decision-001"); + expect(parsed.capsule.sha256).toBe(capsule.sha256); + }); + + test("returns JSON with ok:false for missing file", () => { + const { output, ok } = runDecisionCat("/tmp/nonexistent-capsule.json", true); + const parsed = JSON.parse(output); + + expect(ok).toBe(false); + expect(parsed.ok).toBe(false); + expect(parsed.capsule).toBeNull(); + expect(parsed.error).toContain("nonexistent-capsule.json"); + }); + + test("returns human-readable text for valid capsule", () => { + const capsule = makeCapsule(); + const path = persistDecisionCapsule(capsule); + const { output, ok } = runDecisionCat(path, false); + + expect(ok).toBe(true); + expect(output).toContain("Decision: test-decision-001"); + expect(output).toContain("Hook: PreToolUse"); + expect(output).toContain("Tool: Read"); + expect(output).toContain("Target: app/page.tsx"); + expect(output).toContain("Story: flow-verification (/settings)"); + expect(output).toContain("Injected: nextjs"); + expect(output).toContain("Candidate: nextjs"); + expect(output).toContain("SHA256:"); + }); + + test("returns error text for missing file", () => { + const { output, ok } = runDecisionCat("/tmp/nonexistent-capsule.json", false); + + expect(ok).toBe(false); + expect(output).toContain("Decision capsule not found"); + }); +}); + +// 
--------------------------------------------------------------------------- +// Unit: formatDecisionCapsule +// --------------------------------------------------------------------------- + +describe("formatDecisionCapsule", () => { + test("includes issues section when issues exist", () => { + const capsule = makeCapsule(); + const text = formatDecisionCapsule(capsule); + + expect(text).toContain("Issues:"); + expect(text).toContain("machine_output_hidden_in_html_comment"); + }); + + test("shows 'none' for missing optional fields", () => { + const capsule = makeCapsule(); + capsule.injectedSkills = []; + capsule.attribution = null; + capsule.activeStory = { id: null, kind: null, route: null, targetBoundary: null }; + + const text = formatDecisionCapsule(capsule); + + expect(text).toContain("Injected: none"); + expect(text).toContain("Candidate: none"); + expect(text).toContain("Story: none"); + }); +}); + +// --------------------------------------------------------------------------- +// CLI integration: decision-cat +// --------------------------------------------------------------------------- + +describe("CLI decision-cat", () => { + test("--help prints usage with decision-cat", async () => { + const { stdout, exitCode } = await runCli("--help"); + expect(exitCode).toBe(0); + expect(stdout).toContain("decision-cat"); + }); + + test("decision-cat with no args exits 1", async () => { + const { stderr, exitCode } = await runCli("decision-cat"); + expect(exitCode).toBe(1); + expect(stderr).toContain("requires"); + }); + + test("decision-cat --json returns valid JSON for a persisted capsule", async () => { + const capsule = makeCapsule(); + const path = persistDecisionCapsule(capsule); + + const { stdout, exitCode } = await runCli("decision-cat", path, "--json"); + expect(exitCode).toBe(0); + + const parsed = JSON.parse(stdout); + expect(parsed.ok).toBe(true); + expect(parsed.capsule.decisionId).toBe("test-decision-001"); + 
expect(parsed.capsule.sha256).toBe(capsule.sha256); + }); + + test("decision-cat prints human summary for a persisted capsule", async () => { + const capsule = makeCapsule(); + const path = persistDecisionCapsule(capsule); + + const { stdout, exitCode } = await runCli("decision-cat", path); + expect(exitCode).toBe(0); + expect(stdout).toContain("Decision: test-decision-001"); + expect(stdout).toContain("Hook: PreToolUse"); + expect(stdout).toContain("Candidate: nextjs"); + expect(stdout).toContain("SHA256:"); + }); + + test("decision-cat --json returns ok:false for missing file", async () => { + const { stdout, exitCode } = await runCli( + "decision-cat", + "/tmp/no-such-capsule.json", + "--json", + ); + expect(exitCode).toBe(2); + + const parsed = JSON.parse(stdout); + expect(parsed.ok).toBe(false); + expect(parsed.capsule).toBeNull(); + }); + + test("decision-cat exits 2 for missing file (text mode)", async () => { + const { stderr, exitCode } = await runCli( + "decision-cat", + "/tmp/no-such-capsule.json", + ); + expect(exitCode).toBe(2); + expect(stderr).toContain("not found"); + }); +}); diff --git a/tests/hook-sync.test.ts b/tests/hook-sync.test.ts index 6a310e3..d7f997f 100644 --- a/tests/hook-sync.test.ts +++ b/tests/hook-sync.test.ts @@ -139,6 +139,69 @@ describe("vercel-config .mts/.mjs sync", () => { }); }); +// --------------------------------------------------------------------------- +// policy-recall module +// --------------------------------------------------------------------------- +describe("policy-recall .mts/.mjs sync", () => { + test("exported function names match", async () => { + const src = await load("hooks/src/policy-recall.mts"); + const compiled = await load("hooks/policy-recall.mjs"); + + const srcFns = Object.keys(src).filter((k) => typeof src[k] === "function").sort(); + const compiledFns = Object.keys(compiled).filter((k) => typeof compiled[k] === "function").sort(); + + expect(compiledFns).toEqual(srcFns); + }); + + 
test("selectPolicyRecallCandidates produces identical output", async () => { + const src = await load("hooks/src/policy-recall.mts"); + const compiled = await load("hooks/policy-recall.mjs"); + + const policy = { + version: 1, + scenarios: { + "Read|tsx|react-best-practices|/app": { + "react-best-practices": { + exposures: 5, + wins: 4, + directiveWins: 1, + staleMisses: 0, + }, + "nextjs": { + exposures: 2, + wins: 1, + directiveWins: 0, + staleMisses: 0, + }, + }, + }, + }; + const scenario = { + toolName: "Read", + fileType: "tsx", + skill: "react-best-practices", + routeScope: "/app", + }; + + const srcResult = src.selectPolicyRecallCandidates(policy, scenario); + const compiledResult = compiled.selectPolicyRecallCandidates(policy, scenario); + + expect(compiledResult).toEqual(srcResult); + }); + + test("empty policy returns empty candidates from both", async () => { + const src = await load("hooks/src/policy-recall.mts"); + const compiled = await load("hooks/policy-recall.mjs"); + + const empty = { version: 1, scenarios: {} }; + const scenario = { toolName: "Read", fileType: "ts", skill: "nextjs", routeScope: "/" }; + + expect(compiled.selectPolicyRecallCandidates(empty, scenario)).toEqual( + src.selectPolicyRecallCandidates(empty, scenario), + ); + }); +}); + // --------------------------------------------------------------------------- // logger module // --------------------------------------------------------------------------- diff --git a/tests/hooks-json-structural.test.ts b/tests/hooks-json-structural.test.ts index 2681513..7d9c571 100644 --- a/tests/hooks-json-structural.test.ts +++ b/tests/hooks-json-structural.test.ts @@ -41,6 +41,73 @@ describe("hooks.json SubagentStart", () => { }); }); +describe("hooks.json PostToolUse verification observer coverage", () => { + const postToolUseGroups = hooksJson.hooks.PostToolUse; + + test("verification observer is registered for Bash", () => { + const bashGroup = postToolUseGroups.find((g) => g.matcher === 
"Bash"); + expect(bashGroup).toBeDefined(); + const observerHook = bashGroup!.hooks.find((h) => + h.command.includes("posttooluse-verification-observe"), + ); + expect(observerHook).toBeDefined(); + expect(observerHook!.timeout).toBe(5); + }); + + test("verification observer is registered for non-Bash tools", () => { + const nonBashGroup = postToolUseGroups.find( + (g) => g.matcher.includes("Read") && g.matcher.includes("WebFetch"), + ); + expect(nonBashGroup).toBeDefined(); + expect(nonBashGroup!.matcher).toBe("Read|Edit|Write|Glob|Grep|WebFetch"); + + const observerHook = nonBashGroup!.hooks.find((h) => + h.command.includes("posttooluse-verification-observe"), + ); + expect(observerHook).toBeDefined(); + expect(observerHook!.timeout).toBe(5); + }); + + test("non-Bash observer group does NOT include shadcn-font-fix or bash-chain", () => { + const nonBashGroup = postToolUseGroups.find( + (g) => g.matcher.includes("Read") && g.matcher.includes("WebFetch"), + ); + expect(nonBashGroup).toBeDefined(); + const hasUnrelated = nonBashGroup!.hooks.some( + (h) => h.command.includes("shadcn-font-fix") || h.command.includes("bash-chain"), + ); + expect(hasUnrelated).toBe(false); + }); + + test("Bash-only hooks remain scoped to Bash matcher only", () => { + const bashGroup = postToolUseGroups.find((g) => g.matcher === "Bash"); + expect(bashGroup).toBeDefined(); + const hasShadcn = bashGroup!.hooks.some((h) => h.command.includes("shadcn-font-fix")); + const hasBashChain = bashGroup!.hooks.some((h) => h.command.includes("bash-chain")); + expect(hasShadcn).toBe(true); + expect(hasBashChain).toBe(true); + }); + + test("fixture matrix: every registered tool name reaches observer", () => { + // Build a map of tool_name -> whether observer is reachable + const toolNames = ["Bash", "Read", "Edit", "Write", "Glob", "Grep", "WebFetch"]; + const matrix: Record<string, boolean> = {}; + for (const tool of toolNames) { + const reachable = postToolUseGroups.some((g) => { + const matcherRegex = new
RegExp(`^(${g.matcher})$`); + return matcherRegex.test(tool) && g.hooks.some((h) => + h.command.includes("posttooluse-verification-observe"), + ); + }); + matrix[tool] = reachable; + } + // All tools must reach the observer + for (const tool of toolNames) { + expect(matrix[tool]).toBe(true); + } + }); +}); + describe("hooks.json SubagentStop", () => { const groups = hooksJson.hooks.SubagentStop; diff --git a/tests/learn-companion-cli.test.ts b/tests/learn-companion-cli.test.ts new file mode 100644 index 0000000..d4a4f49 --- /dev/null +++ b/tests/learn-companion-cli.test.ts @@ -0,0 +1,257 @@ +import { describe, test, expect, beforeEach, afterEach } from "bun:test"; +import { mkdirSync, writeFileSync, rmSync, existsSync } from "node:fs"; +import { join } from "node:path"; +import { tmpdir } from "node:os"; +import { runLearnCommand, learnedRulesPath, type LearnCommandOutput } from "../src/cli/learn.ts"; +import { companionRulebookPath } from "../hooks/src/learned-companion-rulebook.mts"; + +// --------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + +const TEST_DIR = join(tmpdir(), `vercel-plugin-learn-companion-test-${Date.now()}`); + +function setupTestProject(): string { + mkdirSync(join(TEST_DIR, "skills"), { recursive: true }); + mkdirSync(join(TEST_DIR, "generated"), { recursive: true }); + // Minimal skill for the learn command to find + const skillDir = join(TEST_DIR, "skills", "test-skill"); + mkdirSync(skillDir, { recursive: true }); + writeFileSync( + join(skillDir, "SKILL.md"), + [ + "---", + "name: test-skill", + 'description: "test"', + "metadata:", + " priority: 6", + "---", + "# Test skill body", + ].join("\n"), + ); + return TEST_DIR; +} + +// --------------------------------------------------------------------------- +// Lifecycle +// --------------------------------------------------------------------------- + +beforeEach(() => { + 
setupTestProject(); +}); + +afterEach(() => { + try { + rmSync(TEST_DIR, { recursive: true, force: true }); + } catch { + // ignore + } + // Clean companion rulebook artifact + try { + rmSync(companionRulebookPath(TEST_DIR), { force: true }); + } catch { + // ignore + } + // Clean learned rules artifact + try { + rmSync(learnedRulesPath(TEST_DIR), { force: true }); + } catch { + // ignore + } +}); + +// --------------------------------------------------------------------------- +// JSON output structure +// --------------------------------------------------------------------------- + +describe("learn command companion JSON output", () => { + test("returns JSON with rules, companions, and companionPath fields", async () => { + const logs: string[] = []; + const origLog = console.log; + const origErr = console.error; + console.log = (msg: string) => logs.push(msg); + console.error = () => {}; + + try { + const exitCode = await runLearnCommand({ + project: TEST_DIR, + json: true, + }); + + expect(exitCode).toBe(0); + expect(logs.length).toBeGreaterThan(0); + + const output: LearnCommandOutput = JSON.parse(logs.join("")); + + // Must have all three top-level fields + expect(output).toHaveProperty("rules"); + expect(output).toHaveProperty("companions"); + expect(output).toHaveProperty("companionPath"); + + // rules is the existing single-skill rulebook + expect(output.rules).toHaveProperty("version"); + expect(output.rules).toHaveProperty("rules"); + expect(output.rules).toHaveProperty("replay"); + expect(output.rules).toHaveProperty("promotion"); + + // companions is a companion rulebook + expect(output.companions.version).toBe(1); + expect(Array.isArray(output.companions.rules)).toBe(true); + expect(output.companions).toHaveProperty("replay"); + expect(output.companions).toHaveProperty("promotion"); + + // companionPath is deterministic + expect(output.companionPath).toBe(companionRulebookPath(TEST_DIR)); + } finally { + console.log = origLog; + console.error = 
origErr; + } + }); + + test("companions rulebook is empty when no traces exist", async () => { + const logs: string[] = []; + const origLog = console.log; + const origErr = console.error; + console.log = (msg: string) => logs.push(msg); + console.error = () => {}; + + try { + await runLearnCommand({ project: TEST_DIR, json: true }); + + const output: LearnCommandOutput = JSON.parse(logs.join("")); + expect(output.companions.rules).toEqual([]); + expect(output.companions.promotion.accepted).toBe(true); + } finally { + console.log = origLog; + console.error = origErr; + } + }); + + test("companionPath is deterministic for same project root", async () => { + const logs1: string[] = []; + const logs2: string[] = []; + const origLog = console.log; + const origErr = console.error; + + try { + console.log = (msg: string) => logs1.push(msg); + console.error = () => {}; + await runLearnCommand({ project: TEST_DIR, json: true }); + + console.log = (msg: string) => logs2.push(msg); + await runLearnCommand({ project: TEST_DIR, json: true }); + + const output1: LearnCommandOutput = JSON.parse(logs1.join("")); + const output2: LearnCommandOutput = JSON.parse(logs2.join("")); + + expect(output1.companionPath).toBe(output2.companionPath); + } finally { + console.log = origLog; + console.error = origErr; + } + }); +}); + +// --------------------------------------------------------------------------- +// Write mode +// --------------------------------------------------------------------------- + +describe("learn command companion write mode", () => { + test("persists companion rulebook to dedicated artifact path", async () => { + const origLog = console.log; + const origErr = console.error; + const stderrLogs: string[] = []; + console.log = () => {}; + console.error = (msg: string) => stderrLogs.push(msg); + + try { + const exitCode = await runLearnCommand({ + project: TEST_DIR, + write: true, + }); + + expect(exitCode).toBe(0); + + // Companion artifact should exist at its own path 
+ const cPath = companionRulebookPath(TEST_DIR); + expect(existsSync(cPath)).toBe(true); + + // Single-skill artifact should also exist + expect(existsSync(learnedRulesPath(TEST_DIR))).toBe(true); + + // Companion path should differ from single-skill path + expect(cPath).not.toBe(learnedRulesPath(TEST_DIR)); + + // Stderr should log companion write event + const companionWriteLog = stderrLogs.find((l) => l.includes("learn_companion_written")); + expect(companionWriteLog).toBeDefined(); + } finally { + console.log = origLog; + console.error = origErr; + } + }); + + test("does not write companion artifact when write mode is disabled", async () => { + const origLog = console.log; + const origErr = console.error; + console.log = () => {}; + console.error = () => {}; + + try { + await runLearnCommand({ project: TEST_DIR, json: true }); + + const cPath = companionRulebookPath(TEST_DIR); + expect(existsSync(cPath)).toBe(false); + } finally { + console.log = origLog; + console.error = origErr; + } + }); +}); + +// --------------------------------------------------------------------------- +// Human-readable output +// --------------------------------------------------------------------------- + +describe("learn command companion text output", () => { + test("includes companion rules summary in text mode", async () => { + const logs: string[] = []; + const origLog = console.log; + const origErr = console.error; + console.log = (msg: string) => logs.push(msg); + console.error = () => {}; + + try { + await runLearnCommand({ project: TEST_DIR }); + + const text = logs.join("\n"); + expect(text).toContain("Companion rules: 0"); + expect(text).toContain("promoted: 0"); + } finally { + console.log = origLog; + console.error = origErr; + } + }); + + test("text output is stable when no companion rules are promoted", async () => { + const logs1: string[] = []; + const logs2: string[] = []; + const origLog = console.log; + const origErr = console.error; + + try { + console.log = 
(msg: string) => logs1.push(msg); + console.error = () => {}; + await runLearnCommand({ project: TEST_DIR }); + + console.log = (msg: string) => logs2.push(msg); + await runLearnCommand({ project: TEST_DIR }); + + // Exact match — output is deterministic + expect(logs1.join("\n")).toBe(logs2.join("\n")); + } finally { + console.log = origLog; + console.error = origErr; + } + }); +}); diff --git a/tests/learn-playbook-cli.test.ts b/tests/learn-playbook-cli.test.ts new file mode 100644 index 0000000..35e85de --- /dev/null +++ b/tests/learn-playbook-cli.test.ts @@ -0,0 +1,339 @@ +import { describe, test, expect, beforeEach, afterEach } from "bun:test"; +import { mkdirSync, writeFileSync, rmSync, existsSync, readFileSync } from "node:fs"; +import { join } from "node:path"; +import { tmpdir } from "node:os"; +import { runLearnCommand, type LearnCommandOutput } from "../src/cli/learn.ts"; +import { playbookRulebookPath } from "../hooks/src/learned-playbook-rulebook.mts"; + +// --------------------------------------------------------------------------- +// Fixtures +// --------------------------------------------------------------------------- + +const FIXED_TS = "2026-03-28T06:00:00.000Z"; +const TEST_SESSION = "test-learn-playbook-cli"; +let tempProjectCounter = 0; + +function makeTempProject(): string { + tempProjectCounter += 1; + const dir = join( + tmpdir(), + `vercel-plugin-learn-playbook-test-${Date.now()}-${tempProjectCounter}`, + ); + mkdirSync(join(dir, "skills"), { recursive: true }); + mkdirSync(join(dir, "generated"), { recursive: true }); + return dir; +} + +function writeTraceFixture(sessionId: string, traces: object[]): void { + const traceDir = join(tmpdir(), `vercel-plugin-${sessionId}-trace`); + mkdirSync(traceDir, { recursive: true }); + const lines = traces.map((t) => JSON.stringify(t)).join("\n") + "\n"; + writeFileSync(join(traceDir, "routing-decision-trace.jsonl"), lines); +} + +function writeExposureFixture(sessionId: string, exposures: 
object[]): void { + const path = join(tmpdir(), `vercel-plugin-${sessionId}-routing-exposures.jsonl`); + const lines = exposures.map((e) => JSON.stringify(e)).join("\n") + "\n"; + writeFileSync(path, lines); +} + +function makeTrace(overrides: Record<string, unknown> = {}): Record<string, unknown> { + return { + version: 2, + decisionId: "d1", + sessionId: TEST_SESSION, + hook: "PreToolUse", + toolName: "Bash", + toolTarget: "npm test", + timestamp: FIXED_TS, + primaryStory: { + id: "story-1", + kind: "flow-verification", + storyRoute: "/settings", + targetBoundary: "clientRequest", + }, + observedRoute: "/settings", + policyScenario: null, + matchedSkills: [], + injectedSkills: [], + skippedReasons: [], + ranked: [], + verification: null, + ...overrides, + }; +} + +function makeExposure(overrides: Record<string, unknown> = {}): Record<string, unknown> { + return { + id: "exp-1", + sessionId: TEST_SESSION, + projectRoot: "/test", + storyId: "story-1", + storyKind: "flow-verification", + route: "/settings", + hook: "PreToolUse", + toolName: "Bash", + targetBoundary: "clientRequest", + exposureGroupId: null, + attributionRole: "candidate", + candidateSkill: "verification", + createdAt: FIXED_TS, + resolvedAt: FIXED_TS, + outcome: "win", + skill: "verification", + ...overrides, + }; +} + +/** Capture stdout from runLearnCommand. */ +async function captureJsonOutput(options: Parameters<typeof runLearnCommand>[0]): Promise<LearnCommandOutput> { + const logs: string[] = []; + const origLog = console.log; + console.log = (msg: string) => logs.push(msg); + try { + await runLearnCommand({ ...options, json: true }); + } finally { + console.log = origLog; + } + return JSON.parse(logs.join("\n")); +} + +/** Capture human-readable stdout from runLearnCommand.
*/ +async function captureTextOutput(options: Parameters<typeof runLearnCommand>[0]): Promise<string> { + const logs: string[] = []; + const origLog = console.log; + console.log = (msg: string) => logs.push(msg); + try { + await runLearnCommand(options); + } finally { + console.log = origLog; + } + return logs.join("\n"); +} + +// --------------------------------------------------------------------------- +// Cleanup +// --------------------------------------------------------------------------- + +let tempDirs: string[] = []; + +function trackDir(dir: string): string { + tempDirs.push(dir); + return dir; +} + +beforeEach(() => { + tempDirs = []; + tempProjectCounter = 0; +}); + +afterEach(() => { + for (const dir of tempDirs) { + try { rmSync(dir, { recursive: true, force: true }); } catch {} + } + try { rmSync(join(tmpdir(), `vercel-plugin-${TEST_SESSION}-trace`), { recursive: true, force: true }); } catch {} + try { rmSync(join(tmpdir(), `vercel-plugin-${TEST_SESSION}-routing-exposures.jsonl`), { force: true }); } catch {} +}); + +// --------------------------------------------------------------------------- +// Tests: --json output includes playbooks +// --------------------------------------------------------------------------- + +describe("learn --json playbook fields", () => { + test("no-trace path includes empty playbooks and playbookPath", async () => { + const project = trackDir(makeTempProject()); + const output = await captureJsonOutput({ project, session: TEST_SESSION }); + + expect(output.playbooks).toBeDefined(); + expect(output.playbooks.version).toBe(1); + expect(output.playbooks.rules).toEqual([]); + expect(output.playbookPath).toBe(playbookRulebookPath(project)); + }); + + test("normal distillation path includes playbooks and playbookPath", async () => { + const project = trackDir(makeTempProject()); + + // Need at least one trace so we enter the normal distillation branch + writeTraceFixture(TEST_SESSION, [ + makeTrace({ + decisionId: "d1", + injectedSkills: ["verification"],
+ ranked: [{ + skill: "verification", + basePriority: 6, + effectivePriority: 6, + pattern: { type: "bash", value: "npm test" }, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: false, + droppedReason: null, + }], + }), + ]); + writeExposureFixture(TEST_SESSION, [ + makeExposure({ skill: "verification", outcome: "win" }), + ]); + + const output = await captureJsonOutput({ project, session: TEST_SESSION }); + + expect(output.playbooks).toBeDefined(); + expect(output.playbooks.version).toBe(1); + expect(output.playbooks.projectRoot).toBe(project); + expect(Array.isArray(output.playbooks.rules)).toBe(true); + expect(output.playbookPath).toBe(playbookRulebookPath(project)); + }); + + test("playbook rulebook is versioned in JSON output", async () => { + const project = trackDir(makeTempProject()); + const output = await captureJsonOutput({ project, session: TEST_SESSION }); + + expect(output.playbooks.version).toBe(1); + expect(typeof output.playbooks.generatedAt).toBe("string"); + expect(output.playbooks.promotion).toBeDefined(); + expect(output.playbooks.replay).toBeDefined(); + }); +}); + +// --------------------------------------------------------------------------- +// Tests: --write persists playbook artifact +// --------------------------------------------------------------------------- + +describe("learn --write playbook persistence", () => { + test("--write persists generated/learned-playbooks.json (no traces)", async () => { + const project = trackDir(makeTempProject()); + const code = await runLearnCommand({ project, write: true, session: TEST_SESSION }); + expect(code).toBe(0); + + const path = playbookRulebookPath(project); + expect(existsSync(path)).toBe(true); + + const content = JSON.parse(readFileSync(path, "utf-8")); + expect(content.version).toBe(1); + expect(content.rules).toEqual([]); + }); + + test("--write persists generated/learned-playbooks.json (with traces)", async () => { + const project = 
trackDir(makeTempProject()); + + writeTraceFixture(TEST_SESSION, [ + makeTrace({ + decisionId: "d1", + injectedSkills: ["verification"], + ranked: [{ + skill: "verification", + basePriority: 6, + effectivePriority: 6, + pattern: { type: "bash", value: "npm test" }, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: false, + droppedReason: null, + }], + }), + ]); + writeExposureFixture(TEST_SESSION, [ + makeExposure({ skill: "verification", outcome: "win" }), + ]); + + const code = await runLearnCommand({ project, write: true, session: TEST_SESSION }); + expect(code).toBe(0); + + const path = playbookRulebookPath(project); + expect(existsSync(path)).toBe(true); + + const content = JSON.parse(readFileSync(path, "utf-8")); + expect(content.version).toBe(1); + expect(content.projectRoot).toBe(project); + }); + + test("--write emits playbook write event to stderr", async () => { + const project = trackDir(makeTempProject()); + const stderrLogs: string[] = []; + const origError = console.error; + console.error = (msg: string) => stderrLogs.push(msg); + try { + await runLearnCommand({ project, write: true, session: TEST_SESSION }); + } finally { + console.error = origError; + } + + const playbookEvent = stderrLogs.find((line) => { + try { + const parsed = JSON.parse(line); + return parsed.event === "learn_playbooks_written"; + } catch { + return false; + } + }); + expect(playbookEvent).toBeDefined(); + + const parsed = JSON.parse(playbookEvent!); + expect(parsed.path).toBe(playbookRulebookPath(project)); + }); + + test("dry-run does NOT create playbook artifact", async () => { + const project = trackDir(makeTempProject()); + await runLearnCommand({ project, json: true, session: TEST_SESSION }); + expect(existsSync(playbookRulebookPath(project))).toBe(false); + }); +}); + +// --------------------------------------------------------------------------- +// Tests: human-readable output includes playbook lines +// 
--------------------------------------------------------------------------- + +describe("learn human-readable playbook output", () => { + test("no-trace output includes Playbooks: 0", async () => { + const project = trackDir(makeTempProject()); + const text = await captureTextOutput({ project, session: TEST_SESSION }); + expect(text).toContain("Playbooks: 0"); + expect(text).toContain("promoted: 0"); + }); + + test("normal path output includes Playbooks line", async () => { + const project = trackDir(makeTempProject()); + + writeTraceFixture(TEST_SESSION, [ + makeTrace({ + decisionId: "d1", + injectedSkills: ["verification"], + ranked: [{ + skill: "verification", + basePriority: 6, + effectivePriority: 6, + pattern: { type: "bash", value: "npm test" }, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: false, + droppedReason: null, + }], + }), + ]); + writeExposureFixture(TEST_SESSION, [ + makeExposure({ skill: "verification", outcome: "win" }), + ]); + + const text = await captureTextOutput({ project, session: TEST_SESSION }); + expect(text).toContain("Playbooks:"); + }); +}); + +// --------------------------------------------------------------------------- +// Tests: playbookPath is deterministic +// --------------------------------------------------------------------------- + +describe("playbookPath determinism", () => { + test("playbookPath is deterministic across runs", async () => { + const project = trackDir(makeTempProject()); + const output1 = await captureJsonOutput({ project, session: TEST_SESSION }); + const output2 = await captureJsonOutput({ project, session: TEST_SESSION }); + expect(output1.playbookPath).toBe(output2.playbookPath); + expect(output1.playbookPath).toBe(join(project, "generated", "learned-playbooks.json")); + }); +}); diff --git a/tests/learned-companion-rulebook.test.ts b/tests/learned-companion-rulebook.test.ts new file mode 100644 index 0000000..32787fb --- /dev/null +++ 
b/tests/learned-companion-rulebook.test.ts @@ -0,0 +1,411 @@ +import { describe, test, expect, beforeEach, afterEach } from "bun:test"; +import { mkdirSync, rmSync, readFileSync, writeFileSync } from "node:fs"; +import { join } from "node:path"; +import { tmpdir } from "node:os"; +import { randomUUID } from "node:crypto"; +import { + type CompanionConfidence, + type LearnedCompanionRule, + type LearnedCompanionRulebook, + type CompanionRulebookErrorCode, + companionRulebookPath, + serializeCompanionRulebook, + loadCompanionRulebook, + saveCompanionRulebook, + createEmptyCompanionRulebook, +} from "../hooks/src/learned-companion-rulebook.mts"; + +// --------------------------------------------------------------------------- +// Fixtures +// --------------------------------------------------------------------------- + +const T0 = "2026-03-28T08:15:00.000Z"; +const PROJECT_ROOT = "/test/project"; +const SCENARIO_A = "PreToolUse|flow-verification|uiRender|Bash|/dashboard"; +const SCENARIO_B = "UserPromptSubmit|none|none|Prompt|*"; + +function makeRule( + overrides: Partial<LearnedCompanionRule> = {}, +): LearnedCompanionRule { + return { + id: `${SCENARIO_A}::verification->agent-browser-verify`, + scenario: SCENARIO_A, + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "uiRender", + toolName: "Bash", + routeScope: "/dashboard", + candidateSkill: "verification", + companionSkill: "agent-browser-verify", + support: 5, + winsWithCompanion: 4, + winsWithoutCompanion: 2, + directiveWinsWithCompanion: 1, + staleMissesWithCompanion: 0, + precisionWithCompanion: 0.8, + baselinePrecisionWithoutCompanion: 0.5, + liftVsCandidateAlone: 1.6, + staleMissDelta: 0, + confidence: "promote", + promotedAt: T0, + reason: "companion beats candidate-alone within same verified scenario", + sourceExposureGroupIds: ["g-1", "g-2", "g-3", "g-4", "g-5"], + ...overrides, + }; +} + +function makeRulebook( + overrides: Partial<LearnedCompanionRulebook> = {}, +): LearnedCompanionRulebook { + return { + version: 1, +
generatedAt: T0, + projectRoot: PROJECT_ROOT, + rules: [makeRule()], + replay: { + baselineWins: 0, + learnedWins: 0, + deltaWins: 0, + regressions: [], + }, + promotion: { + accepted: true, + errorCode: null, + reason: "1 promoted companion rules", + }, + ...overrides, + }; +} + +// --------------------------------------------------------------------------- +// Unique temp project root per test to avoid collisions +// --------------------------------------------------------------------------- + +let testProjectRoot: string; + +beforeEach(() => { + testProjectRoot = join(tmpdir(), `vercel-plugin-test-${randomUUID()}`); + mkdirSync(testProjectRoot, { recursive: true }); +}); + +afterEach(() => { + try { rmSync(companionRulebookPath(testProjectRoot)); } catch {} + try { rmSync(testProjectRoot, { recursive: true }); } catch {} +}); + +// --------------------------------------------------------------------------- +// AC1: Loading when absent returns empty rulebook +// --------------------------------------------------------------------------- + +describe("load absent companion rulebook", () => { + test("returns ok: true with version 1 empty rulebook", () => { + const result = loadCompanionRulebook(testProjectRoot); + expect(result.ok).toBe(true); + if (!result.ok) return; + expect(result.rulebook.version).toBe(1); + expect(result.rulebook.rules).toEqual([]); + }); + + test("empty rulebook has accepted promotion with empty reason", () => { + const result = loadCompanionRulebook(testProjectRoot); + expect(result.ok).toBe(true); + if (!result.ok) return; + expect(result.rulebook.promotion.accepted).toBe(true); + expect(result.rulebook.promotion.reason).toBe("empty rulebook"); + }); + + test("empty rulebook has zeroed replay stats", () => { + const result = loadCompanionRulebook(testProjectRoot); + expect(result.ok).toBe(true); + if (!result.ok) return; + expect(result.rulebook.replay).toEqual({ + baselineWins: 0, + learnedWins: 0, + deltaWins: 0, + regressions: [], + }); 
+ }); +}); + +// --------------------------------------------------------------------------- +// AC2: Atomic save and byte-for-byte round-trip +// --------------------------------------------------------------------------- + +describe("save and load round-trip", () => { + test("round-trip preserves all fields without loss", () => { + const rulebook = makeRulebook(); + saveCompanionRulebook(testProjectRoot, rulebook); + const result = loadCompanionRulebook(testProjectRoot); + expect(result.ok).toBe(true); + if (!result.ok) return; + expect(result.rulebook).toEqual(rulebook); + }); + + test("round-trip produces byte-identical JSON", () => { + const rulebook = makeRulebook(); + saveCompanionRulebook(testProjectRoot, rulebook); + const raw = readFileSync( + companionRulebookPath(testProjectRoot), + "utf-8", + ); + expect(raw).toBe(serializeCompanionRulebook(rulebook)); + }); + + test("save overwrites previous rulebook atomically", () => { + const v1 = makeRulebook({ + rules: [makeRule({ companionSkill: "first" })], + }); + saveCompanionRulebook(testProjectRoot, v1); + + const v2 = makeRulebook({ + rules: [makeRule({ companionSkill: "second" })], + }); + saveCompanionRulebook(testProjectRoot, v2); + + const result = loadCompanionRulebook(testProjectRoot); + expect(result.ok).toBe(true); + if (!result.ok) return; + expect(result.rulebook.rules.length).toBe(1); + expect(result.rulebook.rules[0].companionSkill).toBe("second"); + }); + + test("idempotent save — writing same rulebook twice yields identical file", () => { + const rulebook = makeRulebook(); + saveCompanionRulebook(testProjectRoot, rulebook); + const first = readFileSync( + companionRulebookPath(testProjectRoot), + "utf-8", + ); + + saveCompanionRulebook(testProjectRoot, rulebook); + const second = readFileSync( + companionRulebookPath(testProjectRoot), + "utf-8", + ); + + expect(first).toBe(second); + }); +}); + +// --------------------------------------------------------------------------- +// AC3: Invalid 
content returns structured error +// --------------------------------------------------------------------------- + +describe("error handling", () => { + test("COMPANION_RULEBOOK_READ_FAILED for non-JSON content", () => { + writeFileSync( + companionRulebookPath(testProjectRoot), + "not json {{{", + ); + + const result = loadCompanionRulebook(testProjectRoot); + expect(result.ok).toBe(false); + if (result.ok) return; + expect(result.error.code).toBe( + "COMPANION_RULEBOOK_READ_FAILED" satisfies CompanionRulebookErrorCode, + ); + }); + + test("COMPANION_RULEBOOK_VERSION_UNSUPPORTED for version 2", () => { + const bad = { + version: 2, + generatedAt: T0, + projectRoot: "/x", + rules: [], + replay: { baselineWins: 0, learnedWins: 0, deltaWins: 0, regressions: [] }, + promotion: { accepted: true, errorCode: null, reason: "test" }, + }; + writeFileSync( + companionRulebookPath(testProjectRoot), + JSON.stringify(bad), + ); + + const result = loadCompanionRulebook(testProjectRoot); + expect(result.ok).toBe(false); + if (result.ok) return; + expect(result.error.code).toBe( + "COMPANION_RULEBOOK_VERSION_UNSUPPORTED" satisfies CompanionRulebookErrorCode, + ); + expect(result.error.detail.supportedVersions).toEqual([1]); + }); + + test("COMPANION_RULEBOOK_SCHEMA_INVALID for missing rules array", () => { + const bad = { + version: 1, + generatedAt: T0, + projectRoot: "/x", + replay: {}, + promotion: {}, + }; + writeFileSync( + companionRulebookPath(testProjectRoot), + JSON.stringify(bad), + ); + + const result = loadCompanionRulebook(testProjectRoot); + expect(result.ok).toBe(false); + if (result.ok) return; + expect(result.error.code).toBe( + "COMPANION_RULEBOOK_SCHEMA_INVALID" satisfies CompanionRulebookErrorCode, + ); + expect(result.error.detail.field).toBe("rules"); + }); + + test("COMPANION_RULEBOOK_SCHEMA_INVALID for rule with invalid confidence", () => { + const bad = { + version: 1, + generatedAt: T0, + projectRoot: "/x", + rules: [{ + id: "test", + scenario: "test", + 
candidateSkill: "test", + companionSkill: "test", + reason: "test", + support: 1, + winsWithCompanion: 1, + winsWithoutCompanion: 0, + precisionWithCompanion: 1, + baselinePrecisionWithoutCompanion: 0, + liftVsCandidateAlone: 1, + staleMissDelta: 0, + confidence: "unknown-value", + }], + replay: { baselineWins: 0, learnedWins: 0, deltaWins: 0, regressions: [] }, + promotion: { accepted: true, errorCode: null, reason: "test" }, + }; + writeFileSync( + companionRulebookPath(testProjectRoot), + JSON.stringify(bad), + ); + + const result = loadCompanionRulebook(testProjectRoot); + expect(result.ok).toBe(false); + if (result.ok) return; + expect(result.error.code).toBe( + "COMPANION_RULEBOOK_SCHEMA_INVALID" satisfies CompanionRulebookErrorCode, + ); + expect(result.error.message).toContain("confidence"); + }); + + test("COMPANION_RULEBOOK_SCHEMA_INVALID for JSON array", () => { + writeFileSync(companionRulebookPath(testProjectRoot), "[]"); + + const result = loadCompanionRulebook(testProjectRoot); + expect(result.ok).toBe(false); + if (result.ok) return; + expect(result.error.code).toBe( + "COMPANION_RULEBOOK_SCHEMA_INVALID" satisfies CompanionRulebookErrorCode, + ); + }); + + test("errors have stable code, message, detail structure", () => { + writeFileSync(companionRulebookPath(testProjectRoot), "[]"); + + const result = loadCompanionRulebook(testProjectRoot); + expect(result.ok).toBe(false); + if (result.ok) return; + expect(typeof result.error.code).toBe("string"); + expect(typeof result.error.message).toBe("string"); + expect(typeof result.error.detail).toBe("object"); + }); +}); + +// --------------------------------------------------------------------------- +// Deterministic serialization +// --------------------------------------------------------------------------- + +describe("serializeCompanionRulebook", () => { + test("same input produces byte-identical output", () => { + const rulebook = makeRulebook(); + const first = serializeCompanionRulebook(rulebook); 
+ const second = serializeCompanionRulebook(rulebook); + expect(first).toBe(second); + }); + + test("rules are sorted by scenario, candidateSkill, companionSkill", () => { + const rulebook = makeRulebook({ + rules: [ + makeRule({ scenario: "Z", candidateSkill: "z", companionSkill: "z" }), + makeRule({ scenario: "A", candidateSkill: "a", companionSkill: "b" }), + makeRule({ scenario: "A", candidateSkill: "a", companionSkill: "a" }), + ], + }); + const serialized = serializeCompanionRulebook(rulebook); + const parsed = JSON.parse(serialized) as LearnedCompanionRulebook; + expect(parsed.rules[0].companionSkill).toBe("a"); + expect(parsed.rules[1].companionSkill).toBe("b"); + expect(parsed.rules[2].scenario).toBe("Z"); + }); + + test("serialization does not mutate original rules order", () => { + const rules = [ + makeRule({ scenario: "Z", candidateSkill: "z", companionSkill: "z" }), + makeRule({ scenario: "A", candidateSkill: "a", companionSkill: "a" }), + ]; + const rulebook = makeRulebook({ rules }); + serializeCompanionRulebook(rulebook); + expect(rulebook.rules[0].scenario).toBe("Z"); + expect(rulebook.rules[1].scenario).toBe("A"); + }); +}); + +// --------------------------------------------------------------------------- +// Path resolution +// --------------------------------------------------------------------------- + +describe("companionRulebookPath", () => { + test("uses learned-companions prefix (not routing-policy)", () => { + const path = companionRulebookPath("/test/project"); + expect(path).toContain("vercel-plugin-learned-companions-"); + expect(path).toEndWith(".json"); + expect(path).not.toContain("routing-policy"); + }); + + test("different project roots produce different paths", () => { + const a = companionRulebookPath("/project/a"); + const b = companionRulebookPath("/project/b"); + expect(a).not.toBe(b); + }); + + test("same project root produces same path", () => { + const a = companionRulebookPath("/project/same"); + const b = 
companionRulebookPath("/project/same"); + expect(a).toBe(b); + }); +}); + +// --------------------------------------------------------------------------- +// Factory helpers +// --------------------------------------------------------------------------- + +describe("createEmptyCompanionRulebook", () => { + test("has version 1, empty rules, accepted promotion", () => { + const rb = createEmptyCompanionRulebook(PROJECT_ROOT, T0); + expect(rb.version).toBe(1); + expect(rb.projectRoot).toBe(PROJECT_ROOT); + expect(rb.generatedAt).toBe(T0); + expect(rb.rules).toEqual([]); + expect(rb.promotion.accepted).toBe(true); + }); +}); + +// --------------------------------------------------------------------------- +// Type export tests +// --------------------------------------------------------------------------- + +describe("type exports", () => { + test("CompanionConfidence accepts all valid values", () => { + const a: CompanionConfidence = "candidate"; + const b: CompanionConfidence = "promote"; + const c: CompanionConfidence = "holdout-fail"; + expect(a).toBe("candidate"); + expect(b).toBe("promote"); + expect(c).toBe("holdout-fail"); + }); + + test("LearnedCompanionRulebook has version 1", () => { + const rulebook = makeRulebook(); + expect(rulebook.version).toBe(1); + }); +}); diff --git a/tests/learned-routing-rulebook.test.ts b/tests/learned-routing-rulebook.test.ts new file mode 100644 index 0000000..1470c95 --- /dev/null +++ b/tests/learned-routing-rulebook.test.ts @@ -0,0 +1,412 @@ +import { describe, test, expect, beforeEach, afterEach } from "bun:test"; +import { mkdirSync, rmSync, readFileSync, writeFileSync } from "node:fs"; +import { join } from "node:path"; +import { tmpdir } from "node:os"; +import { randomUUID } from "node:crypto"; +import { + type LearnedRuleAction, + type LearnedRoutingRuleEvidence, + type LearnedRoutingRule, + type LearnedRoutingRulebook, + type RulebookErrorCode, + rulebookPath, + serializeRulebook, + loadRulebook, + saveRulebook, + 
createEmptyRulebook, + createRule, + } from "../hooks/src/learned-routing-rulebook.mts"; + + // --------------------------------------------------------------------------- + // Fixtures + // --------------------------------------------------------------------------- + + const T0 = "2026-03-28T08:15:00.000Z"; + const SESSION_ID = "sess_test_rulebook"; + const SCENARIO_A = "PreToolUse|flow-verification|uiRender|Bash"; + const SCENARIO_B = "UserPromptSubmit|none|none|Prompt"; + + function makeEvidence( + overrides: Partial<LearnedRoutingRuleEvidence> = {}, + ): LearnedRoutingRuleEvidence { + return { + baselineWins: 4, + baselineDirectiveWins: 2, + learnedWins: 4, + learnedDirectiveWins: 2, + regressionCount: 0, + ...overrides, + }; + } + + function makeRule( + overrides: Partial<LearnedRoutingRule> = {}, + ): LearnedRoutingRule { + return { + id: `${SCENARIO_A}|agent-browser-verify`, + scenario: SCENARIO_A, + skill: "agent-browser-verify", + action: "promote", + boost: 8, + confidence: 0.93, + reason: "replay verified: no regressions, learned routing matched winning skill", + sourceSessionId: SESSION_ID, + promotedAt: T0, + evidence: makeEvidence(), + ...overrides, + }; + } + + function makeRulebook( + overrides: Partial<LearnedRoutingRulebook> = {}, + ): LearnedRoutingRulebook { + return { + version: 1, + createdAt: T0, + sessionId: SESSION_ID, + rules: [makeRule()], + ...overrides, + }; + } + + // --------------------------------------------------------------------------- + // Unique temp project root per test to avoid collisions + // --------------------------------------------------------------------------- + + let testProjectRoot: string; + + beforeEach(() => { + testProjectRoot = join(tmpdir(), `vercel-plugin-test-${randomUUID()}`); + mkdirSync(testProjectRoot, { recursive: true }); + }); + + afterEach(() => { + // Clean up the rulebook file + try { rmSync(rulebookPath(testProjectRoot)); } catch {} + try { rmSync(testProjectRoot, { recursive: true }); } catch {} + }); + + // 
--------------------------------------------------------------------------- +
Type export tests +// --------------------------------------------------------------------------- + +describe("type exports", () => { + test("LearnedRuleAction accepts promote and demote", () => { + const promote: LearnedRuleAction = "promote"; + const demote: LearnedRuleAction = "demote"; + expect(promote).toBe("promote"); + expect(demote).toBe("demote"); + }); + + test("LearnedRoutingRuleEvidence has all required fields", () => { + const evidence = makeEvidence(); + expect(typeof evidence.baselineWins).toBe("number"); + expect(typeof evidence.baselineDirectiveWins).toBe("number"); + expect(typeof evidence.learnedWins).toBe("number"); + expect(typeof evidence.learnedDirectiveWins).toBe("number"); + expect(typeof evidence.regressionCount).toBe("number"); + }); + + test("LearnedRoutingRulebook has version 1", () => { + const rulebook = makeRulebook(); + expect(rulebook.version).toBe(1); + }); +}); + +// --------------------------------------------------------------------------- +// Deterministic serialization +// --------------------------------------------------------------------------- + +describe("serializeRulebook", () => { + test("same input produces byte-identical output", () => { + const rulebook = makeRulebook(); + const first = serializeRulebook(rulebook); + const second = serializeRulebook(rulebook); + expect(first).toBe(second); + }); + + test("rules are sorted by scenario, skill, id", () => { + const rulebook = makeRulebook({ + rules: [ + makeRule({ id: "z|z", scenario: "Z", skill: "z" }), + makeRule({ id: "a|a", scenario: "A", skill: "a" }), + makeRule({ id: "a|b", scenario: "A", skill: "b" }), + ], + }); + const serialized = serializeRulebook(rulebook); + const parsed = JSON.parse(serialized) as LearnedRoutingRulebook; + expect(parsed.rules[0].id).toBe("a|a"); + expect(parsed.rules[1].id).toBe("a|b"); + expect(parsed.rules[2].id).toBe("z|z"); + }); + + test("serialization does not mutate original rules order", () => { + const rules = [ + makeRule({ id: 
"z|z", scenario: "Z", skill: "z" }), + makeRule({ id: "a|a", scenario: "A", skill: "a" }), + ]; + const rulebook = makeRulebook({ rules }); + serializeRulebook(rulebook); + expect(rulebook.rules[0].id).toBe("z|z"); + expect(rulebook.rules[1].id).toBe("a|a"); + }); +}); + +// --------------------------------------------------------------------------- +// Round-trip persistence +// --------------------------------------------------------------------------- + +describe("save and load", () => { + test("round-trip preserves all fields without loss", () => { + const rulebook = makeRulebook(); + saveRulebook(testProjectRoot, rulebook); + const result = loadRulebook(testProjectRoot); + expect(result.ok).toBe(true); + if (!result.ok) return; + expect(result.rulebook).toEqual(makeRulebook({ rules: [makeRule()] })); + }); + + test("round-trip produces byte-identical JSON", () => { + const rulebook = makeRulebook(); + saveRulebook(testProjectRoot, rulebook); + const raw = readFileSync(rulebookPath(testProjectRoot), "utf-8"); + expect(raw).toBe(serializeRulebook(rulebook)); + }); + + test("load returns empty rulebook when file does not exist", () => { + const result = loadRulebook(testProjectRoot); + expect(result.ok).toBe(true); + if (!result.ok) return; + expect(result.rulebook.rules).toEqual([]); + expect(result.rulebook.version).toBe(1); + }); + + test("save overwrites previous rulebook atomically", () => { + const v1 = makeRulebook({ rules: [makeRule({ skill: "first" })] }); + saveRulebook(testProjectRoot, v1); + + const v2 = makeRulebook({ rules: [makeRule({ skill: "second" })] }); + saveRulebook(testProjectRoot, v2); + + const result = loadRulebook(testProjectRoot); + expect(result.ok).toBe(true); + if (!result.ok) return; + expect(result.rulebook.rules.length).toBe(1); + expect(result.rulebook.rules[0].skill).toBe("second"); + }); + + test("idempotent save — writing same rulebook twice yields identical file", () => { + const rulebook = makeRulebook(); + 
saveRulebook(testProjectRoot, rulebook); + const first = readFileSync(rulebookPath(testProjectRoot), "utf-8"); + + saveRulebook(testProjectRoot, rulebook); + const second = readFileSync(rulebookPath(testProjectRoot), "utf-8"); + + expect(first).toBe(second); + }); +}); + +// --------------------------------------------------------------------------- +// Error handling +// --------------------------------------------------------------------------- + +describe("error handling", () => { + test("RULEBOOK_VERSION_UNSUPPORTED for version 2", () => { + const bad = { version: 2, createdAt: T0, sessionId: "x", rules: [] }; + writeFileSync(rulebookPath(testProjectRoot), JSON.stringify(bad)); + + const result = loadRulebook(testProjectRoot); + expect(result.ok).toBe(false); + if (result.ok) return; + expect(result.error.code).toBe("RULEBOOK_VERSION_UNSUPPORTED" satisfies RulebookErrorCode); + expect(result.error.detail.supportedVersions).toEqual([1]); + }); + + test("RULEBOOK_SCHEMA_INVALID for non-JSON content", () => { + writeFileSync(rulebookPath(testProjectRoot), "not json {{{"); + + const result = loadRulebook(testProjectRoot); + expect(result.ok).toBe(false); + if (result.ok) return; + expect(result.error.code).toBe("RULEBOOK_SCHEMA_INVALID" satisfies RulebookErrorCode); + }); + + test("RULEBOOK_SCHEMA_INVALID for missing rules array", () => { + const bad = { version: 1, createdAt: T0, sessionId: "x" }; + writeFileSync(rulebookPath(testProjectRoot), JSON.stringify(bad)); + + const result = loadRulebook(testProjectRoot); + expect(result.ok).toBe(false); + if (result.ok) return; + expect(result.error.code).toBe("RULEBOOK_SCHEMA_INVALID" satisfies RulebookErrorCode); + expect(result.error.detail.field).toBe("rules"); + }); + + test("RULEBOOK_SCHEMA_INVALID for rule with missing evidence", () => { + const bad = { + version: 1, + createdAt: T0, + sessionId: "x", + rules: [{ + id: "test", + scenario: "test", + skill: "test", + action: "promote", + boost: 8, + confidence: 0.9, 
+ reason: "test", + sourceSessionId: "x", + promotedAt: T0, + // evidence missing + }], + }; + writeFileSync(rulebookPath(testProjectRoot), JSON.stringify(bad)); + + const result = loadRulebook(testProjectRoot); + expect(result.ok).toBe(false); + if (result.ok) return; + expect(result.error.code).toBe("RULEBOOK_SCHEMA_INVALID" satisfies RulebookErrorCode); + expect(result.error.message).toContain("evidence"); + }); + + test("RULEBOOK_SCHEMA_INVALID for rule with invalid action", () => { + const bad = { + version: 1, + createdAt: T0, + sessionId: "x", + rules: [{ + id: "test", + scenario: "test", + skill: "test", + action: "investigate", // not a valid LearnedRuleAction + boost: 0, + confidence: 0.5, + reason: "test", + sourceSessionId: "x", + promotedAt: T0, + evidence: makeEvidence(), + }], + }; + writeFileSync(rulebookPath(testProjectRoot), JSON.stringify(bad)); + + const result = loadRulebook(testProjectRoot); + expect(result.ok).toBe(false); + if (result.ok) return; + expect(result.error.code).toBe("RULEBOOK_SCHEMA_INVALID" satisfies RulebookErrorCode); + expect(result.error.message).toContain("action"); + }); + + test("errors are structured with code, message, detail", () => { + writeFileSync(rulebookPath(testProjectRoot), "[]"); + + const result = loadRulebook(testProjectRoot); + expect(result.ok).toBe(false); + if (result.ok) return; + expect(typeof result.error.code).toBe("string"); + expect(typeof result.error.message).toBe("string"); + expect(typeof result.error.detail).toBe("object"); + }); +}); + +// --------------------------------------------------------------------------- +// Factory helpers +// --------------------------------------------------------------------------- + +describe("createRule", () => { + test("generates deterministic id from scenario and skill", () => { + const rule = createRule({ + scenario: SCENARIO_A, + skill: "agent-browser-verify", + action: "promote", + boost: 8, + confidence: 0.93, + reason: "test", + sourceSessionId: 
SESSION_ID, + promotedAt: T0, + evidence: makeEvidence(), + }); + expect(rule.id).toBe(`${SCENARIO_A}|agent-browser-verify`); + }); + + test("createEmptyRulebook has version 1 and empty rules", () => { + const rb = createEmptyRulebook(SESSION_ID, T0); + expect(rb.version).toBe(1); + expect(rb.sessionId).toBe(SESSION_ID); + expect(rb.createdAt).toBe(T0); + expect(rb.rules).toEqual([]); + }); +}); + +// --------------------------------------------------------------------------- +// Path resolution +// --------------------------------------------------------------------------- + +describe("rulebookPath", () => { + test("sits next to routing policy path", () => { + const path = rulebookPath("/test/project"); + expect(path).toContain("vercel-plugin-routing-policy-"); + expect(path).toEndWith("-rulebook.json"); + }); + + test("different project roots produce different paths", () => { + const a = rulebookPath("/project/a"); + const b = rulebookPath("/project/b"); + expect(a).not.toBe(b); + }); + + test("same project root produces same path", () => { + const a = rulebookPath("/project/same"); + const b = rulebookPath("/project/same"); + expect(a).toBe(b); + }); +}); + +// --------------------------------------------------------------------------- +// Canonical JSON contract example (from task spec) +// --------------------------------------------------------------------------- + +describe("canonical JSON contract", () => { + test("matches the specified contract shape", () => { + const rulebook: LearnedRoutingRulebook = { + version: 1, + createdAt: "2026-03-28T08:15:00.000Z", + sessionId: "sess_123", + rules: [ + { + id: "PreToolUse|flow-verification|uiRender|Bash|agent-browser-verify", + scenario: "PreToolUse|flow-verification|uiRender|Bash", + skill: "agent-browser-verify", + action: "promote", + boost: 8, + confidence: 0.93, + reason: "replay verified: no regressions, learned routing matched winning skill", + sourceSessionId: "sess_123", + promotedAt: 
"2026-03-28T08:15:00.000Z", + evidence: { + baselineWins: 4, + baselineDirectiveWins: 2, + learnedWins: 4, + learnedDirectiveWins: 2, + regressionCount: 0, + }, + }, + ], + }; + + // Round-trip through serialization + const json = serializeRulebook(rulebook); + const parsed = JSON.parse(json); + expect(parsed.version).toBe(1); + expect(parsed.rules).toHaveLength(1); + expect(parsed.rules[0].action).toBe("promote"); + expect(parsed.rules[0].evidence.regressionCount).toBe(0); + + // Persist and reload + saveRulebook(testProjectRoot, rulebook); + const result = loadRulebook(testProjectRoot); + expect(result.ok).toBe(true); + if (!result.ok) return; + expect(result.rulebook).toEqual(rulebook); + }); +}); diff --git a/tests/learned-rules-integration.test.ts b/tests/learned-rules-integration.test.ts new file mode 100644 index 0000000..6d3c9ff --- /dev/null +++ b/tests/learned-rules-integration.test.ts @@ -0,0 +1,727 @@ +import { describe, test, expect, beforeEach, afterEach } from "bun:test"; +import { mkdirSync, writeFileSync, rmSync, existsSync, readFileSync } from "node:fs"; +import { join } from "node:path"; +import { tmpdir } from "node:os"; +import { runLearnCommand, learnedRulesPath } from "../src/cli/learn.ts"; +import type { LearnedRoutingRulesFile } from "../hooks/src/rule-distillation.mts"; + +// --------------------------------------------------------------------------- +// Fixtures +// --------------------------------------------------------------------------- + +const FIXED_TS = "2026-03-28T06:00:00.000Z"; +const TEST_SESSION = "test-integration-learn"; + +function makeTempProject(): string { + const dir = join(tmpdir(), `vercel-plugin-integ-learn-${Date.now()}`); + mkdirSync(join(dir, "skills"), { recursive: true }); + mkdirSync(join(dir, "generated"), { recursive: true }); + return dir; +} + +function writeTraceFixture(sessionId: string, traces: object[]): void { + const traceDir = join(tmpdir(), `vercel-plugin-${sessionId}-trace`); + mkdirSync(traceDir, 
{ recursive: true }); + const lines = traces.map((t) => JSON.stringify(t)).join("\n") + "\n"; + writeFileSync(join(traceDir, "routing-decision-trace.jsonl"), lines); + } + + function writeExposureFixture(sessionId: string, exposures: object[]): void { + const path = join(tmpdir(), `vercel-plugin-${sessionId}-routing-exposures.jsonl`); + const lines = exposures.map((e) => JSON.stringify(e)).join("\n") + "\n"; + writeFileSync(path, lines); + } + + function makeTrace(overrides: Record<string, unknown> = {}): Record<string, unknown> { + return { + version: 2, + decisionId: "d1", + sessionId: TEST_SESSION, + hook: "PreToolUse", + toolName: "Read", + toolTarget: "/app/page.tsx", + timestamp: FIXED_TS, + primaryStory: { + id: "story-1", + kind: "feature", + storyRoute: "/app", + targetBoundary: "uiRender", + }, + observedRoute: "/app", + policyScenario: null, + matchedSkills: [], + injectedSkills: [], + skippedReasons: [], + ranked: [], + verification: null, + ...overrides, + }; + } + + function makeExposure(overrides: Record<string, unknown> = {}): Record<string, unknown> { + return { + id: "exp-1", + sessionId: TEST_SESSION, + projectRoot: "/test", + storyId: "story-1", + storyKind: "feature", + route: "/app", + hook: "PreToolUse", + toolName: "Read", + targetBoundary: "uiRender", + exposureGroupId: null, + attributionRole: "candidate", + candidateSkill: "next-config", + createdAt: FIXED_TS, + resolvedAt: FIXED_TS, + outcome: "win", + skill: "next-config", + ...overrides, + }; + } + + function makeRanked(skill: string, pattern?: { type: string; value: string }) { + return { + skill, + basePriority: 6, + effectivePriority: 6, + pattern: pattern ??
null, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: false, + droppedReason: null, + }; +} + +// --------------------------------------------------------------------------- +// Cleanup +// --------------------------------------------------------------------------- + +let tempDirs: string[] = []; + +beforeEach(() => { + tempDirs = []; +}); + +afterEach(() => { + for (const dir of tempDirs) { + try { rmSync(dir, { recursive: true, force: true }); } catch {} + } + try { + rmSync(join(tmpdir(), `vercel-plugin-${TEST_SESSION}-trace`), { recursive: true, force: true }); + } catch {} + try { + rmSync(join(tmpdir(), `vercel-plugin-${TEST_SESSION}-routing-exposures.jsonl`), { force: true }); + } catch {} +}); + +function trackDir(dir: string): string { + tempDirs.push(dir); + return dir; +} + +// --------------------------------------------------------------------------- +// Integration: end-to-end learn pipeline +// --------------------------------------------------------------------------- + +describe("learned-rules integration", () => { + test("end-to-end: distill → write → read produces valid artifact", async () => { + const project = trackDir(makeTempProject()); + + // 8 winning traces + 8 losing traces to create lift + const winTraces = Array.from({ length: 8 }, (_, i) => + makeTrace({ + decisionId: `win${i}`, + injectedSkills: ["next-config"], + ranked: [makeRanked("next-config", { type: "path", value: "next.config.*" })], + verification: { + verificationId: `v${i}`, + observedBoundary: "uiRender", + matchedSuggestedAction: true, + }, + }), + ); + const loseTraces = Array.from({ length: 8 }, (_, i) => + makeTrace({ + decisionId: `lose${i}`, + injectedSkills: ["tailwind"], + ranked: [makeRanked("tailwind", { type: "path", value: "tailwind.*" })], + }), + ); + + writeTraceFixture(TEST_SESSION, [...winTraces, ...loseTraces]); + writeExposureFixture(TEST_SESSION, [ + makeExposure({ skill: "next-config", outcome: "win" }), + 
makeExposure({ skill: "tailwind", outcome: "stale-miss" }), + ]); + + const code = await runLearnCommand({ + project, + write: true, + session: TEST_SESSION, + }); + + expect(code).toBe(0); + + const outPath = learnedRulesPath(project); + expect(existsSync(outPath)).toBe(true); + + const content: LearnedRoutingRulesFile = JSON.parse(readFileSync(outPath, "utf-8")); + expect(content.version).toBe(1); + expect(content.projectRoot).toBe(project); + expect(content.rules.length).toBeGreaterThanOrEqual(1); + expect(content.replay).toBeDefined(); + expect(content.replay.regressions).toEqual([]); + }); + + test("idempotent: running learn twice with same data produces identical artifacts", async () => { + const project = trackDir(makeTempProject()); + + const traces = Array.from({ length: 6 }, (_, i) => + makeTrace({ + decisionId: `d${i}`, + injectedSkills: ["next-config"], + ranked: [makeRanked("next-config", { type: "path", value: "next.config.*" })], + }), + ); + writeTraceFixture(TEST_SESSION, traces); + writeExposureFixture(TEST_SESSION, [makeExposure({ skill: "next-config", outcome: "win" })]); + + // Run 1 + const logs1: string[] = []; + const origLog = console.log; + console.log = (msg: string) => logs1.push(msg); + await runLearnCommand({ project, json: true, session: TEST_SESSION }); + console.log = origLog; + + // Run 2 + const logs2: string[] = []; + console.log = (msg: string) => logs2.push(msg); + await runLearnCommand({ project, json: true, session: TEST_SESSION }); + console.log = origLog; + + const json1 = JSON.parse(logs1.join("\n")); + const json2 = JSON.parse(logs2.join("\n")); + + // Strip generatedAt for comparison (timestamp changes between runs) + delete json1.generatedAt; + delete json2.generatedAt; + expect(JSON.stringify(json1)).toBe(JSON.stringify(json2)); + }); + + test("--json stdout contains only the JSON payload", async () => { + const project = trackDir(makeTempProject()); + + const logs: string[] = []; + const origLog = console.log; + 
console.log = (msg: string) => logs.push(msg); + try { + await runLearnCommand({ project, json: true, session: TEST_SESSION }); + } finally { + console.log = origLog; + } + + // stdout must be valid JSON (no extra lines) + const stdout = logs.join("\n"); + expect(() => JSON.parse(stdout)).not.toThrow(); + }); + + test("--write is atomic: file exists or doesn't, no partial writes", async () => { + const project = trackDir(makeTempProject()); + + await runLearnCommand({ project, write: true, session: TEST_SESSION }); + + const outPath = learnedRulesPath(project); + if (existsSync(outPath)) { + // If file exists, it must be valid JSON + const raw = readFileSync(outPath, "utf-8"); + expect(() => JSON.parse(raw)).not.toThrow(); + } + }); + + test("empty traces still produce valid artifact with --write", async () => { + const project = trackDir(makeTempProject()); + + await runLearnCommand({ project, write: true, session: TEST_SESSION }); + + const outPath = learnedRulesPath(project); + expect(existsSync(outPath)).toBe(true); + + const content: LearnedRoutingRulesFile = JSON.parse(readFileSync(outPath, "utf-8")); + expect(content.rules).toEqual([]); + expect(content.replay.regressions).toEqual([]); + expect(content.replay.baselineWins).toBe(0); + expect(content.replay.baselineDirectiveWins).toBe(0); + expect(content.replay.learnedWins).toBe(0); + expect(content.replay.learnedDirectiveWins).toBe(0); + expect(content.replay.deltaWins).toBe(0); + expect(content.replay.deltaDirectiveWins).toBe(0); + }); + + test("exit code reflects regression state", async () => { + const project = trackDir(makeTempProject()); + + // No traces = no regressions = exit 0 + const code = await runLearnCommand({ project, session: TEST_SESSION }); + expect(code).toBe(0); + }); + + // --------------------------------------------------------------------------- + // Exact-route promotion beating wildcard fallback + // --------------------------------------------------------------------------- + + 
test("exact-route rule beats wildcard fallback in distillation", async () => { + const project = trackDir(makeTempProject()); + + // 6 traces on exact route /dashboard with skill-a winning + const exactTraces = Array.from({ length: 6 }, (_, i) => + makeTrace({ + decisionId: `exact${i}`, + injectedSkills: ["skill-a"], + ranked: [makeRanked("skill-a", { type: "path", value: "app/dashboard/**" })], + primaryStory: { + id: "story-1", + kind: "feature", + storyRoute: "/dashboard", + targetBoundary: "uiRender", + }, + observedRoute: "/dashboard", + verification: { + verificationId: `ve${i}`, + observedBoundary: "uiRender", + matchedSuggestedAction: true, + }, + }), + ); + + // 6 traces on wildcard route * with skill-b winning + const wildcardTraces = Array.from({ length: 6 }, (_, i) => + makeTrace({ + decisionId: `wild${i}`, + injectedSkills: ["skill-b"], + ranked: [makeRanked("skill-b", { type: "path", value: "**/*.tsx" })], + primaryStory: { + id: "story-2", + kind: "feature", + storyRoute: "*", + targetBoundary: "uiRender", + }, + observedRoute: "*", + verification: { + verificationId: `vw${i}`, + observedBoundary: "uiRender", + matchedSuggestedAction: true, + }, + }), + ); + + writeTraceFixture(TEST_SESSION, [...exactTraces, ...wildcardTraces]); + writeExposureFixture(TEST_SESSION, [ + ...Array.from({ length: 6 }, (_, i) => + makeExposure({ + id: `exp-exact-${i}`, + skill: "skill-a", + candidateSkill: "skill-a", + route: "/dashboard", + outcome: "win", + }), + ), + ...Array.from({ length: 6 }, (_, i) => + makeExposure({ + id: `exp-wild-${i}`, + skill: "skill-b", + candidateSkill: "skill-b", + route: "*", + outcome: "win", + }), + ), + ]); + + const logs: string[] = []; + const origLog = console.log; + console.log = (msg: string) => logs.push(msg); + try { + await runLearnCommand({ project, json: true, session: TEST_SESSION }); + } finally { + console.log = origLog; + } + + const parsed: LearnedRoutingRulesFile = JSON.parse(logs.join("\n")); + // Both exact-route and 
wildcard should produce rules + expect(parsed.rules.length).toBeGreaterThanOrEqual(2); + + // Exact-route rule for skill-a exists with routeScope=/dashboard + const exactRule = parsed.rules.find( + (r) => r.skill === "skill-a" && r.scenario.routeScope === "/dashboard", + ); + expect(exactRule).toBeDefined(); + expect(exactRule!.support).toBe(6); + + // Wildcard rule for skill-b exists with routeScope=* + const wildRule = parsed.rules.find( + (r) => r.skill === "skill-b" && r.scenario.routeScope === "*", + ); + expect(wildRule).toBeDefined(); + expect(wildRule!.support).toBe(6); + + // Both rules are scoped to their own scenario — they don't interfere + expect(exactRule!.scenario.routeScope).not.toBe(wildRule!.scenario.routeScope); + }); + + // --------------------------------------------------------------------------- + // Candidate-only attribution: context skills don't get policy credit + // --------------------------------------------------------------------------- + + test("context-only attribution does not produce distilled rules", async () => { + const project = trackDir(makeTempProject()); + + // 8 traces where skill-candidate is the candidate and skill-context is context + const traces = Array.from({ length: 8 }, (_, i) => + makeTrace({ + decisionId: `d${i}`, + injectedSkills: ["skill-candidate", "skill-context"], + ranked: [ + makeRanked("skill-candidate", { type: "path", value: "next.config.*" }), + makeRanked("skill-context", { type: "path", value: "*.json" }), + ], + verification: { + verificationId: `v${i}`, + observedBoundary: "uiRender", + matchedSuggestedAction: true, + }, + }), + ); + + writeTraceFixture(TEST_SESSION, traces); + writeExposureFixture(TEST_SESSION, [ + // Candidate exposure — gets policy credit + ...Array.from({ length: 8 }, (_, i) => + makeExposure({ + id: `exp-cand-${i}`, + skill: "skill-candidate", + candidateSkill: "skill-candidate", + attributionRole: "candidate", + outcome: "win", + }), + ), + // Context exposure — does NOT get 
policy credit + ...Array.from({ length: 8 }, (_, i) => + makeExposure({ + id: `exp-ctx-${i}`, + skill: "skill-context", + candidateSkill: "skill-candidate", + attributionRole: "context", + outcome: "win", + }), + ), + ]); + + const logs: string[] = []; + const origLog = console.log; + console.log = (msg: string) => logs.push(msg); + try { + await runLearnCommand({ project, json: true, session: TEST_SESSION }); + } finally { + console.log = origLog; + } + + const parsed: LearnedRoutingRulesFile = JSON.parse(logs.join("\n")); + + // Candidate skill should produce a rule + const candidateRules = parsed.rules.filter((r) => r.skill === "skill-candidate"); + expect(candidateRules.length).toBeGreaterThanOrEqual(1); + + // Context skill must NOT produce any rules + const contextRules = parsed.rules.filter((r) => r.skill === "skill-context"); + expect(contextRules).toEqual([]); + }); + + // --------------------------------------------------------------------------- + // Replay rejection on regression: promoted rules downgraded + // --------------------------------------------------------------------------- + + test("replay rejection downgrades all promoted rules to holdout-fail", async () => { + const project = trackDir(makeTempProject()); + + // 8 verified traces: skill-a injected (baseline wins), skill-b ranked + // → skill-b gets candidate exposure with wins + const winTraces = Array.from({ length: 8 }, (_, i) => + makeTrace({ + decisionId: `win${i}`, + injectedSkills: ["skill-a"], + ranked: [makeRanked("skill-b", { type: "path", value: "b-pattern.*" })], + verification: { + verificationId: `v${i}`, + observedBoundary: "uiRender", + matchedSuggestedAction: true, + }, + }), + ); + + // 8 unverified traces with skill-c: dilute scenario precision so + // skill-b's lift > 1.5 (lift = 1.0 / 0.5 = 2.0) + const loseTraces = Array.from({ length: 8 }, (_, i) => + makeTrace({ + decisionId: `lose${i}`, + injectedSkills: ["skill-c"], + ranked: [makeRanked("skill-c", { type: "path", 
value: "c-pattern.*" })], + }), + ); + + writeTraceFixture(TEST_SESSION, [...winTraces, ...loseTraces]); + writeExposureFixture(TEST_SESSION, [ + // skill-b: 8 candidate wins (from the verified traces) + ...Array.from({ length: 8 }, (_, i) => + makeExposure({ + id: `exp-b-${i}`, + skill: "skill-b", + candidateSkill: "skill-b", + attributionRole: "candidate", + outcome: "win", + }), + ), + // skill-c: 8 candidate stale-misses (from unverified traces) + ...Array.from({ length: 8 }, (_, i) => + makeExposure({ + id: `exp-c-${i}`, + skill: "skill-c", + candidateSkill: "skill-c", + attributionRole: "candidate", + outcome: "stale-miss", + }), + ), + ]); + + const logs: string[] = []; + const origLog = console.log; + console.log = (msg: string) => logs.push(msg); + try { + const code = await runLearnCommand({ + project, + json: true, + session: TEST_SESSION, + }); + // Exit code 1 for regressions + expect(code).toBe(1); + } finally { + console.log = origLog; + } + + const parsed: LearnedRoutingRulesFile = JSON.parse(logs.join("\n")); + + // All rules should be downgraded to holdout-fail + for (const rule of parsed.rules) { + expect(rule.confidence).toBe("holdout-fail"); + expect(rule.promotedAt).toBeNull(); + } + + // Regressions array should be populated and sorted + expect(parsed.replay.regressions.length).toBeGreaterThan(0); + const sorted = [...parsed.replay.regressions].sort(); + expect(parsed.replay.regressions).toEqual(sorted); + }); + + // --------------------------------------------------------------------------- + // Deterministic JSON ordering + // --------------------------------------------------------------------------- + + test("promoted rules have deterministic JSON ordering: confidence → skill → id", async () => { + const project = trackDir(makeTempProject()); + + // Create traces for three different skills that all get promoted + const skills = ["zebra-skill", "alpha-skill", "middle-skill"]; + const allTraces: Record<string, unknown>[] = []; + const allExposures: 
Record<string, unknown>[]
= []; + + for (const skill of skills) { + for (let i = 0; i < 6; i++) { + allTraces.push( + makeTrace({ + decisionId: `${skill}-d${i}`, + injectedSkills: [skill], + ranked: [makeRanked(skill, { type: "path", value: `${skill}.*` })], + verification: { + verificationId: `v-${skill}-${i}`, + observedBoundary: "uiRender", + matchedSuggestedAction: true, + }, + }), + ); + allExposures.push( + makeExposure({ + id: `exp-${skill}-${i}`, + skill, + candidateSkill: skill, + attributionRole: "candidate", + outcome: "win", + }), + ); + } + } + + writeTraceFixture(TEST_SESSION, allTraces); + writeExposureFixture(TEST_SESSION, allExposures); + + const logs: string[] = []; + const origLog = console.log; + console.log = (msg: string) => logs.push(msg); + try { + await runLearnCommand({ project, json: true, session: TEST_SESSION }); + } finally { + console.log = origLog; + } + + const parsed: LearnedRoutingRulesFile = JSON.parse(logs.join("\n")); + + // Verify rules are sorted by confidence, then skill, then id + for (let i = 1; i < parsed.rules.length; i++) { + const prev = parsed.rules[i - 1]!; + const curr = parsed.rules[i]!; + const confidenceOrder: Record<string, number> = { + promote: 0, + candidate: 1, + "holdout-fail": 2, + }; + const co = + (confidenceOrder[prev.confidence] ?? 9) - + (confidenceOrder[curr.confidence] ??
9); + if (co === 0) { + const sk = prev.skill.localeCompare(curr.skill); + if (sk === 0) { + expect(prev.id.localeCompare(curr.id)).toBeLessThanOrEqual(0); + } else { + expect(sk).toBeLessThanOrEqual(0); + } + } else { + expect(co).toBeLessThanOrEqual(0); + } + } + + // Verify sourceDecisionIds within each rule are sorted + for (const rule of parsed.rules) { + const sorted = [...rule.sourceDecisionIds].sort(); + expect(rule.sourceDecisionIds).toEqual(sorted); + } + }); + + test("replay regression IDs are deterministically sorted", async () => { + const project = trackDir(makeTempProject()); + + // Multiple baseline wins with skill-a, but promoted rule for skill-b → regressions + // Use deliberately unsorted decision IDs + const decisionIds = ["z-dec", "a-dec", "m-dec", "b-dec"]; + const traces: Record[] = decisionIds.map((id) => + makeTrace({ + decisionId: id, + injectedSkills: ["skill-a"], + ranked: [makeRanked("skill-b", { type: "path", value: "b.*" })], + verification: { + verificationId: `v-${id}`, + observedBoundary: "uiRender", + matchedSuggestedAction: true, + }, + }), + ); + // Add extra non-regressing traces to reach support threshold (4 win + 4 lose = lift > 1) + for (let i = 0; i < 4; i++) { + traces.push( + makeTrace({ + decisionId: `extra-b-${i}`, + injectedSkills: ["skill-b"], + ranked: [makeRanked("skill-b", { type: "path", value: "b.*" })], + }), + ); + } + // Add losing traces for a different skill to dilute scenario precision + for (let i = 0; i < 8; i++) { + traces.push( + makeTrace({ + decisionId: `dilute-${i}`, + injectedSkills: ["skill-c"], + ranked: [makeRanked("skill-c", { type: "path", value: "c.*" })], + }), + ); + } + + writeTraceFixture(TEST_SESSION, traces); + writeExposureFixture(TEST_SESSION, [ + // skill-b: 8 candidate wins (4 verified + 4 unverified from ranked matching) + ...Array.from({ length: 8 }, (_, i) => + makeExposure({ + id: `exp-b-${i}`, + skill: "skill-b", + candidateSkill: "skill-b", + attributionRole: "candidate", + 
outcome: "win", + }), + ), + // skill-c: 8 candidate stale-misses (to dilute scenario precision) + ...Array.from({ length: 8 }, (_, i) => + makeExposure({ + id: `exp-c-${i}`, + skill: "skill-c", + candidateSkill: "skill-c", + attributionRole: "candidate", + outcome: "stale-miss", + }), + ), + ]); + + const logs: string[] = []; + const origLog = console.log; + console.log = (msg: string) => logs.push(msg); + try { + await runLearnCommand({ project, json: true, session: TEST_SESSION }); + } finally { + console.log = origLog; + } + + const parsed: LearnedRoutingRulesFile = JSON.parse(logs.join("\n")); + + // Regressions must be sorted alphabetically + const regressions = parsed.replay.regressions; + expect(regressions.length).toBeGreaterThan(0); + const sorted = [...regressions].sort(); + expect(regressions).toEqual(sorted); + }); + + // --------------------------------------------------------------------------- + // No eligible rules: traces exist but none meet thresholds + // --------------------------------------------------------------------------- + + test("traces with no eligible rules produce empty rules array", async () => { + const project = trackDir(makeTempProject()); + + // Only 2 traces — below default minSupport=5 + const traces = Array.from({ length: 2 }, (_, i) => + makeTrace({ + decisionId: `d${i}`, + injectedSkills: ["next-config"], + ranked: [makeRanked("next-config", { type: "path", value: "next.config.*" })], + }), + ); + writeTraceFixture(TEST_SESSION, traces); + writeExposureFixture(TEST_SESSION, [ + makeExposure({ skill: "next-config", outcome: "win" }), + ]); + + const logs: string[] = []; + const origLog = console.log; + console.log = (msg: string) => logs.push(msg); + try { + const code = await runLearnCommand({ project, json: true, session: TEST_SESSION }); + expect(code).toBe(0); + } finally { + console.log = origLog; + } + + const parsed: LearnedRoutingRulesFile = JSON.parse(logs.join("\n")); + + // Rules may exist but none should be 
promoted (support < 5) + const promoted = parsed.rules.filter((r) => r.confidence === "promote"); + expect(promoted).toEqual([]); + expect(parsed.replay.regressions).toEqual([]); + }); +}); diff --git a/tests/manifest-exclusion-parity.test.ts b/tests/manifest-exclusion-parity.test.ts new file mode 100644 index 0000000..1e4875f --- /dev/null +++ b/tests/manifest-exclusion-parity.test.ts @@ -0,0 +1,98 @@ +import { describe, test, expect } from "bun:test"; +import { join } from "node:path"; +import { existsSync } from "node:fs"; +import { buildManifest } from "../scripts/build-manifest.ts"; +import { loadValidatedSkillMap } from "../src/shared/skill-map-loader.ts"; +import { + filterExcludedSkillMap, + getSkillExclusion, + EXCLUDED_SKILL_PATTERN, +} from "../src/shared/skill-exclusion-policy.ts"; + +const SKILLS_DIR = join(import.meta.dir, "..", "skills"); + +describe("manifest exclusion parity", () => { + test("manifest excludedSkills matches live exclusion policy", () => { + const { manifest, errors } = buildManifest(SKILLS_DIR); + expect(errors).toEqual([]); + expect(manifest).not.toBeNull(); + + const live = loadValidatedSkillMap(SKILLS_DIR); + const filtered = filterExcludedSkillMap(live.skills); + + // Manifest exclusions must exactly match what the live policy produces + expect(manifest.excludedSkills).toEqual(filtered.excluded); + + // Excluded skills must be absent from manifest.skills + for (const ex of filtered.excluded) { + expect(Object.keys(manifest.skills)).not.toContain(ex.slug); + } + }); + + test("excluded skill slugs match the expected pattern", () => { + const live = loadValidatedSkillMap(SKILLS_DIR); + const filtered = filterExcludedSkillMap(live.skills); + + // Every excluded slug must match ^fake- or -test-skill$ + for (const ex of filtered.excluded) { + expect(ex.slug).toMatch(/^fake-|-test-skill$/); + expect(ex.reason).toBe("test-only-pattern"); + } + }); + + test("buildManifest returns no errors for the current skills directory", () => { + 
const { errors, warnings } = buildManifest(SKILLS_DIR); + expect(errors).toEqual([]); + // Warnings are acceptable, but no hard errors + }); + + // --------------------------------------------------------------------------- + // Explicit fixture verification after cleanup + // --------------------------------------------------------------------------- + + test("fake-banned-test-skill exists on disk but is excluded from manifest", () => { + const skillDir = join(SKILLS_DIR, "fake-banned-test-skill"); + expect(existsSync(skillDir)).toBe(true); + + const exclusion = getSkillExclusion("fake-banned-test-skill"); + expect(exclusion).not.toBeNull(); + expect(exclusion!.reason).toBe("test-only-pattern"); + + const { manifest } = buildManifest(SKILLS_DIR); + expect(Object.keys(manifest.skills)).not.toContain("fake-banned-test-skill"); + expect(manifest.excludedSkills.some((e: { slug: string }) => e.slug === "fake-banned-test-skill")).toBe(true); + }); + + test("fake-orphan-test-skill exists on disk but is excluded from manifest", () => { + const skillDir = join(SKILLS_DIR, "fake-orphan-test-skill"); + // This skill may or may not exist depending on fixture cleanup state + if (!existsSync(skillDir)) { + // If it's been cleaned up, verify it's not in the manifest at all + const { manifest } = buildManifest(SKILLS_DIR); + expect(Object.keys(manifest.skills)).not.toContain("fake-orphan-test-skill"); + return; + } + + const exclusion = getSkillExclusion("fake-orphan-test-skill"); + expect(exclusion).not.toBeNull(); + expect(exclusion!.reason).toBe("test-only-pattern"); + + const { manifest } = buildManifest(SKILLS_DIR); + expect(Object.keys(manifest.skills)).not.toContain("fake-orphan-test-skill"); + expect(manifest.excludedSkills.some((e: { slug: string }) => e.slug === "fake-orphan-test-skill")).toBe(true); + }); + + test("excluded skills array is sorted by slug for deterministic output", () => { + const { manifest } = buildManifest(SKILLS_DIR); + const slugs = 
manifest.excludedSkills.map((e: { slug: string }) => e.slug); + const sorted = [...slugs].sort(); + expect(slugs).toEqual(sorted); + }); + + test("production skills are never caught by the exclusion pattern", () => { + const { manifest } = buildManifest(SKILLS_DIR); + for (const slug of Object.keys(manifest.skills)) { + expect(EXCLUDED_SKILL_PATTERN.test(slug)).toBe(false); + } + }); +}); diff --git a/tests/playbook-distillation.test.ts b/tests/playbook-distillation.test.ts new file mode 100644 index 0000000..d0ce5b7 --- /dev/null +++ b/tests/playbook-distillation.test.ts @@ -0,0 +1,338 @@ +import { describe, test, expect } from "bun:test"; +import { distillPlaybooks } from "../hooks/src/playbook-distillation.mts"; +import { + playbookRulebookPath, + createEmptyPlaybookRulebook, + savePlaybookRulebook, + loadPlaybookRulebook, +} from "../hooks/src/learned-playbook-rulebook.mts"; +import type { SkillExposure } from "../hooks/src/routing-policy-ledger.mts"; +import { mkdtempSync } from "node:fs"; +import { join } from "node:path"; +import { tmpdir } from "node:os"; + +// --------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + +function makeExposure( + input: Partial<SkillExposure> & { + exposureGroupId: string; + skill: string; + outcome: SkillExposure["outcome"]; + attributionRole: SkillExposure["attributionRole"]; + }, +): SkillExposure { + return { + id: `${input.exposureGroupId}:${input.skill}`, + sessionId: "s1", + projectRoot: "/repo", + storyId: "story-1", + storyKind: "flow-verification", + route: "/settings", + hook: "PreToolUse", + toolName: "Bash", + skill: input.skill, + targetBoundary: "clientRequest", + exposureGroupId: input.exposureGroupId, + attributionRole: input.attributionRole, + candidateSkill: "verification", + createdAt: "2026-03-28T16:00:00.000Z", + resolvedAt: "2026-03-28T16:01:00.000Z", + outcome: input.outcome, + ...input, + }; +} + +// 
--------------------------------------------------------------------------- +// Rulebook persistence +// --------------------------------------------------------------------------- + +describe("learned-playbook-rulebook", () => { + test("playbookRulebookPath resolves to generated/learned-playbooks.json", () => { + expect(playbookRulebookPath("/repo")).toBe( + "/repo/generated/learned-playbooks.json", + ); + }); + + test("save and load round-trip a versioned rulebook with deterministic sorting", () => { + const projectRoot = mkdtempSync(join(tmpdir(), "vp-pb-")); + const rulebook = createEmptyPlaybookRulebook( + projectRoot, + "2026-03-28T16:00:00.000Z", + ); + rulebook.rules.push({ + id: "test-rule", + scenario: "PreToolUse|flow|clientRequest|Bash|*", + hook: "PreToolUse", + storyKind: "flow", + targetBoundary: "clientRequest", + toolName: "Bash", + routeScope: "*", + anchorSkill: "a", + orderedSkills: ["a", "b"], + support: 5, + wins: 4, + directiveWins: 1, + staleMisses: 0, + precision: 0.8, + baselinePrecisionWithoutPlaybook: 0.4, + liftVsAnchorBaseline: 2, + staleMissDelta: -0.1, + confidence: "promote", + promotedAt: "2026-03-28T16:00:00.000Z", + reason: "test", + sourceExposureGroupIds: ["g1", "g2"], + }); + + savePlaybookRulebook(projectRoot, rulebook); + const loaded = loadPlaybookRulebook(projectRoot); + expect(loaded.ok).toBe(true); + if (!loaded.ok) return; + expect(loaded.rulebook.version).toBe(1); + expect(loaded.rulebook.rules).toHaveLength(1); + expect(loaded.rulebook.rules[0].anchorSkill).toBe("a"); + expect(loaded.rulebook.rules[0].orderedSkills).toEqual(["a", "b"]); + }); + + test("loadPlaybookRulebook returns ENOENT for missing file", () => { + const result = loadPlaybookRulebook("/nonexistent/path"); + expect(result.ok).toBe(false); + if (result.ok) return; + expect(result.error.code).toBe("ENOENT"); + }); +}); + +// --------------------------------------------------------------------------- +// Distillation +// 
--------------------------------------------------------------------------- + +describe("distillPlaybooks", () => { + test("promotes an ordered sequence that beats the same anchor baseline", () => { + // 3 groups with the full playbook (verification → observability → routing-middleware), all wins + // 3 groups with anchor-only (verification), mixed outcomes (1 win, 2 stale-miss) + const exposures: SkillExposure[] = [ + // Group 1: full playbook, win + makeExposure({ exposureGroupId: "g1", skill: "verification", attributionRole: "candidate", outcome: "win" }), + makeExposure({ exposureGroupId: "g1", skill: "observability", attributionRole: "context", outcome: "win" }), + makeExposure({ exposureGroupId: "g1", skill: "routing-middleware", attributionRole: "context", outcome: "win" }), + // Group 2: full playbook, directive-win + makeExposure({ exposureGroupId: "g2", skill: "verification", attributionRole: "candidate", outcome: "directive-win" }), + makeExposure({ exposureGroupId: "g2", skill: "observability", attributionRole: "context", outcome: "directive-win" }), + makeExposure({ exposureGroupId: "g2", skill: "routing-middleware", attributionRole: "context", outcome: "directive-win" }), + // Group 3: full playbook, win + makeExposure({ exposureGroupId: "g3", skill: "verification", attributionRole: "candidate", outcome: "win" }), + makeExposure({ exposureGroupId: "g3", skill: "observability", attributionRole: "context", outcome: "win" }), + makeExposure({ exposureGroupId: "g3", skill: "routing-middleware", attributionRole: "context", outcome: "win" }), + // Group 4: anchor-only, stale-miss + makeExposure({ exposureGroupId: "g4", skill: "verification", attributionRole: "candidate", outcome: "stale-miss" }), + // Group 5: anchor-only, stale-miss + makeExposure({ exposureGroupId: "g5", skill: "verification", attributionRole: "candidate", outcome: "stale-miss" }), + // Group 6: anchor-only, win + makeExposure({ exposureGroupId: "g6", skill: "verification", 
attributionRole: "candidate", outcome: "win" }), + ]; + + const rulebook = distillPlaybooks({ + projectRoot: "/repo", + exposures, + generatedAt: "2026-03-28T16:10:00.000Z", + minSupport: 3, + minPrecision: 0.75, + minLift: 1.25, + maxSkills: 3, + }); + + const promoted = rulebook.rules.find((r) => r.confidence === "promote"); + expect(promoted).toBeDefined(); + expect(promoted?.anchorSkill).toBe("verification"); + expect(promoted?.orderedSkills).toEqual([ + "verification", + "observability", + "routing-middleware", + ]); + expect(promoted?.support).toBe(3); + expect(promoted?.wins).toBe(3); + expect(promoted?.confidence).toBe("promote"); + // Playbook precision: 3/3 = 1.0 + expect(promoted?.precision).toBe(1); + // Baseline: 1 win out of 3 anchor-only = 0.3333 + expect(promoted?.baselinePrecisionWithoutPlaybook).toBeCloseTo(0.3333, 3); + // Lift: 1.0 / 0.3333 = 3.0 + expect(promoted?.liftVsAnchorBaseline).toBeCloseTo(3, 1); + }); + + test("does not promote sequences below minSupport", () => { + const exposures: SkillExposure[] = [ + makeExposure({ exposureGroupId: "g1", skill: "a", attributionRole: "candidate", outcome: "win", candidateSkill: "a" }), + makeExposure({ exposureGroupId: "g1", skill: "b", attributionRole: "context", outcome: "win", candidateSkill: "a" }), + makeExposure({ exposureGroupId: "g2", skill: "a", attributionRole: "candidate", outcome: "win", candidateSkill: "a" }), + makeExposure({ exposureGroupId: "g2", skill: "b", attributionRole: "context", outcome: "win", candidateSkill: "a" }), + ]; + + const rulebook = distillPlaybooks({ + projectRoot: "/repo", + exposures, + minSupport: 3, + }); + + expect(rulebook.rules.every((r) => r.confidence === "holdout-fail")).toBe( + true, + ); + }); + + test("skips single-skill groups (no playbook possible)", () => { + const exposures: SkillExposure[] = [ + makeExposure({ exposureGroupId: "g1", skill: "only-one", attributionRole: "candidate", outcome: "win", candidateSkill: "only-one" }), + makeExposure({ 
exposureGroupId: "g2", skill: "only-one", attributionRole: "candidate", outcome: "win", candidateSkill: "only-one" }), + makeExposure({ exposureGroupId: "g3", skill: "only-one", attributionRole: "candidate", outcome: "win", candidateSkill: "only-one" }), + ]; + + const rulebook = distillPlaybooks({ + projectRoot: "/repo", + exposures, + minSupport: 3, + }); + + expect(rulebook.rules).toHaveLength(0); + }); + + test("skips exposures without exposureGroupId", () => { + const exposures: SkillExposure[] = [ + makeExposure({ exposureGroupId: "", skill: "a", attributionRole: "candidate", outcome: "win" }), + ]; + + const rulebook = distillPlaybooks({ + projectRoot: "/repo", + exposures, + }); + + expect(rulebook.rules).toHaveLength(0); + }); + + test("caps orderedSkills at maxSkills", () => { + const exposures: SkillExposure[] = []; + for (let i = 0; i < 4; i++) { + const gid = `g${i}`; + exposures.push( + makeExposure({ exposureGroupId: gid, skill: "a", attributionRole: "candidate", outcome: "win", candidateSkill: "a" }), + makeExposure({ exposureGroupId: gid, skill: "b", attributionRole: "context", outcome: "win", candidateSkill: "a" }), + makeExposure({ exposureGroupId: gid, skill: "c", attributionRole: "context", outcome: "win", candidateSkill: "a" }), + makeExposure({ exposureGroupId: gid, skill: "d", attributionRole: "context", outcome: "win", candidateSkill: "a" }), + ); + } + + const rulebook = distillPlaybooks({ + projectRoot: "/repo", + exposures, + maxSkills: 2, + minSupport: 3, + }); + + for (const rule of rulebook.rules) { + expect(rule.orderedSkills.length).toBeLessThanOrEqual(2); + } + }); + + test("deduplicates skills in orderedSkills", () => { + const exposures: SkillExposure[] = []; + for (let i = 0; i < 4; i++) { + const gid = `g${i}`; + exposures.push( + makeExposure({ exposureGroupId: gid, skill: "a", attributionRole: "candidate", outcome: "win", candidateSkill: "a" }), + makeExposure({ exposureGroupId: gid, skill: "a", attributionRole: "context", 
outcome: "win", candidateSkill: "a" }), + makeExposure({ exposureGroupId: gid, skill: "b", attributionRole: "context", outcome: "win", candidateSkill: "a" }), + ); + } + + const rulebook = distillPlaybooks({ + projectRoot: "/repo", + exposures, + minSupport: 3, + }); + + for (const rule of rulebook.rules) { + const unique = [...new Set(rule.orderedSkills)]; + expect(rule.orderedSkills).toEqual(unique); + } + }); + + test("rules are sorted deterministically", () => { + const exposures: SkillExposure[] = []; + // Two different scenarios + for (let i = 0; i < 4; i++) { + const gid = `gA${i}`; + exposures.push( + makeExposure({ exposureGroupId: gid, skill: "z-skill", attributionRole: "candidate", outcome: "win", candidateSkill: "z-skill", hook: "PreToolUse" }), + makeExposure({ exposureGroupId: gid, skill: "a-skill", attributionRole: "context", outcome: "win", candidateSkill: "z-skill", hook: "PreToolUse" }), + ); + } + for (let i = 0; i < 4; i++) { + const gid = `gB${i}`; + exposures.push( + makeExposure({ exposureGroupId: gid, skill: "a-anchor", attributionRole: "candidate", outcome: "win", candidateSkill: "a-anchor", hook: "PreToolUse" }), + makeExposure({ exposureGroupId: gid, skill: "b-step", attributionRole: "context", outcome: "win", candidateSkill: "a-anchor", hook: "PreToolUse" }), + ); + } + + const rulebook = distillPlaybooks({ + projectRoot: "/repo", + exposures, + minSupport: 3, + }); + + expect(rulebook.rules.length).toBeGreaterThanOrEqual(2); + // Rules should be sorted by scenario, then anchorSkill, then orderedSkills + for (let i = 1; i < rulebook.rules.length; i++) { + const prev = rulebook.rules[i - 1]; + const curr = rulebook.rules[i]; + const cmp = + prev.scenario.localeCompare(curr.scenario) || + prev.anchorSkill.localeCompare(curr.anchorSkill) || + prev.orderedSkills.join(">").localeCompare(curr.orderedSkills.join(">")); + expect(cmp).toBeLessThanOrEqual(0); + } + }); + + test("pending outcomes are ignored", () => { + const exposures: 
SkillExposure[] = []; + for (let i = 0; i < 4; i++) { + exposures.push( + makeExposure({ exposureGroupId: `g${i}`, skill: "a", attributionRole: "candidate", outcome: "pending", candidateSkill: "a" }), + makeExposure({ exposureGroupId: `g${i}`, skill: "b", attributionRole: "context", outcome: "pending", candidateSkill: "a" }), + ); + } + + const rulebook = distillPlaybooks({ + projectRoot: "/repo", + exposures, + minSupport: 3, + }); + + expect(rulebook.rules).toHaveLength(0); + }); + + test("stale-miss heavy playbook is not promoted", () => { + const exposures: SkillExposure[] = []; + // 3 playbook groups: all stale-miss + for (let i = 0; i < 3; i++) { + exposures.push( + makeExposure({ exposureGroupId: `g${i}`, skill: "a", attributionRole: "candidate", outcome: "stale-miss", candidateSkill: "a" }), + makeExposure({ exposureGroupId: `g${i}`, skill: "b", attributionRole: "context", outcome: "stale-miss", candidateSkill: "a" }), + ); + } + // 3 anchor-only groups: all wins (good baseline) + for (let i = 3; i < 6; i++) { + exposures.push( + makeExposure({ exposureGroupId: `g${i}`, skill: "a", attributionRole: "candidate", outcome: "win", candidateSkill: "a" }), + ); + } + + const rulebook = distillPlaybooks({ + projectRoot: "/repo", + exposures, + minSupport: 3, + }); + + expect(rulebook.rules.every((r) => r.confidence === "holdout-fail")).toBe(true); + }); +}); diff --git a/tests/playbook-recall.test.ts b/tests/playbook-recall.test.ts new file mode 100644 index 0000000..d5805a1 --- /dev/null +++ b/tests/playbook-recall.test.ts @@ -0,0 +1,222 @@ +import { describe, test, expect } from "bun:test"; +import { mkdtempSync } from "node:fs"; +import { join } from "node:path"; +import { tmpdir } from "node:os"; +import { + createEmptyPlaybookRulebook, + savePlaybookRulebook, +} from "../hooks/src/learned-playbook-rulebook.mts"; +import { recallVerifiedPlaybook } from "../hooks/src/playbook-recall.mts"; + +// 
--------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + +function makeProjectWithPlaybook() { + const projectRoot = mkdtempSync(join(tmpdir(), "vp-playbook-")); + const rulebook = createEmptyPlaybookRulebook( + projectRoot, + "2026-03-28T16:00:00.000Z", + ); + rulebook.rules.push({ + id: "PreToolUse|flow-verification|clientRequest|Bash|/settings::verification>observability>routing-middleware", + scenario: "PreToolUse|flow-verification|clientRequest|Bash|/settings", + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "clientRequest", + toolName: "Bash", + routeScope: "/settings", + anchorSkill: "verification", + orderedSkills: ["verification", "observability", "routing-middleware"], + support: 5, + wins: 4, + directiveWins: 1, + staleMisses: 0, + precision: 0.8, + baselinePrecisionWithoutPlaybook: 0.4, + liftVsAnchorBaseline: 2, + staleMissDelta: -0.2, + confidence: "promote", + promotedAt: "2026-03-28T16:00:00.000Z", + reason: + "verified ordered playbook beats same anchor without this exact sequence", + sourceExposureGroupIds: ["g1", "g2", "g3", "g4", "g5"], + }); + savePlaybookRulebook(projectRoot, rulebook); + return projectRoot; +} + +// --------------------------------------------------------------------------- +// Tests +// --------------------------------------------------------------------------- + +describe("recallVerifiedPlaybook", () => { + test("inserts missing ordered steps after the anchor skill", () => { + const projectRoot = makeProjectWithPlaybook(); + + const result = recallVerifiedPlaybook({ + projectRoot, + scenario: { + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "clientRequest", + toolName: "Bash", + routeScope: "/settings", + }, + candidateSkills: ["verification", "nextjs"], + excludeSkills: new Set(["verification"]), + maxInsertedSkills: 2, + }); + + 
expect(result.selected).not.toBeNull(); + expect(result.selected?.anchorSkill).toBe("verification"); + expect(result.selected?.insertedSkills).toEqual([ + "observability", + "routing-middleware", + ]); + expect(result.banner).toContain("Verified Playbook"); + expect(result.banner).toContain("verification"); + expect(result.banner).toContain("observability"); + }); + + test("respects maxInsertedSkills cap", () => { + const projectRoot = makeProjectWithPlaybook(); + + const result = recallVerifiedPlaybook({ + projectRoot, + scenario: { + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "clientRequest", + toolName: "Bash", + routeScope: "/settings", + }, + candidateSkills: ["verification"], + excludeSkills: new Set(["verification"]), + maxInsertedSkills: 1, + }); + + expect(result.selected).not.toBeNull(); + expect(result.selected?.insertedSkills).toHaveLength(1); + expect(result.selected?.insertedSkills[0]).toBe("observability"); + }); + + test("rejects when all playbook steps are excluded", () => { + const projectRoot = makeProjectWithPlaybook(); + + const result = recallVerifiedPlaybook({ + projectRoot, + scenario: { + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "clientRequest", + toolName: "Bash", + routeScope: "/settings", + }, + candidateSkills: ["verification"], + excludeSkills: new Set([ + "verification", + "observability", + "routing-middleware", + ]), + }); + + expect(result.selected).toBeNull(); + expect(result.rejected).toHaveLength(1); + expect(result.rejected[0].reason).toContain("already_present"); + }); + + test("returns null when anchor skill is not in candidateSkills", () => { + const projectRoot = makeProjectWithPlaybook(); + + const result = recallVerifiedPlaybook({ + projectRoot, + scenario: { + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "clientRequest", + toolName: "Bash", + routeScope: "/settings", + }, + candidateSkills: ["nextjs", "react"], + }); + + 
expect(result.selected).toBeNull(); + }); + + test("returns null when rulebook does not exist", () => { + const result = recallVerifiedPlaybook({ + projectRoot: "/nonexistent/path", + scenario: { + hook: "PreToolUse", + storyKind: "flow", + targetBoundary: "clientRequest", + toolName: "Bash", + }, + candidateSkills: ["verification"], + }); + + expect(result.selected).toBeNull(); + expect(result.banner).toBeNull(); + }); + + test("scenario mismatch returns null", () => { + const projectRoot = makeProjectWithPlaybook(); + + const result = recallVerifiedPlaybook({ + projectRoot, + scenario: { + hook: "UserPromptSubmit", + storyKind: "different-story", + targetBoundary: "environment", + toolName: "Prompt", + }, + candidateSkills: ["verification"], + }); + + expect(result.selected).toBeNull(); + }); + + test("holdout-fail rules are not recalled", () => { + const projectRoot = mkdtempSync(join(tmpdir(), "vp-pb-holdout-")); + const rulebook = createEmptyPlaybookRulebook(projectRoot); + rulebook.rules.push({ + id: "test::a>b", + scenario: "PreToolUse|flow|clientRequest|Bash|*", + hook: "PreToolUse", + storyKind: "flow", + targetBoundary: "clientRequest", + toolName: "Bash", + routeScope: "*", + anchorSkill: "a", + orderedSkills: ["a", "b"], + support: 2, + wins: 1, + directiveWins: 0, + staleMisses: 1, + precision: 0.5, + baselinePrecisionWithoutPlaybook: 0.5, + liftVsAnchorBaseline: 1, + staleMissDelta: 0, + confidence: "holdout-fail", + promotedAt: null, + reason: "insufficient support", + sourceExposureGroupIds: ["g1", "g2"], + }); + savePlaybookRulebook(projectRoot, rulebook); + + const result = recallVerifiedPlaybook({ + projectRoot, + scenario: { + hook: "PreToolUse", + storyKind: "flow", + targetBoundary: "clientRequest", + toolName: "Bash", + routeScope: "*", + }, + candidateSkills: ["a"], + }); + + expect(result.selected).toBeNull(); + }); +}); diff --git a/tests/policy-recall.test.ts b/tests/policy-recall.test.ts new file mode 100644 index 0000000..698de34 --- 
/dev/null +++ b/tests/policy-recall.test.ts @@ -0,0 +1,321 @@ +import { describe, expect, test } from "bun:test"; +import { selectPolicyRecallCandidates } from "../hooks/policy-recall.mjs"; +import type { RoutingPolicyFile } from "../hooks/routing-policy.mjs"; + +// --------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + +function makePolicy( + scenarios: Record< + string, + Record< + string, + { + exposures: number; + wins: number; + directiveWins: number; + staleMisses: number; + } + > + >, +): RoutingPolicyFile { + const out: RoutingPolicyFile = { version: 1, scenarios: {} }; + for (const [key, skills] of Object.entries(scenarios)) { + out.scenarios[key] = {}; + for (const [skill, stats] of Object.entries(skills)) { + out.scenarios[key][skill] = { + ...stats, + lastUpdatedAt: "2026-03-27T19:00:00.000Z", + }; + } + } + return out; +} + +const BASE_SCENARIO = { + hook: "PreToolUse" as const, + storyKind: "flow-verification", + targetBoundary: "clientRequest" as const, + toolName: "Bash" as const, + routeScope: "/settings", +}; + +// --------------------------------------------------------------------------- +// Core behavior +// --------------------------------------------------------------------------- + +describe("selectPolicyRecallCandidates", () => { + test("prefers exact-route policy before wildcard fallback", () => { + const policy = makePolicy({ + "PreToolUse|flow-verification|clientRequest|Bash|/settings": { + verification: { + exposures: 4, + wins: 4, + directiveWins: 2, + staleMisses: 0, + }, + }, + "PreToolUse|flow-verification|clientRequest|Bash|*": { + workflow: { + exposures: 8, + wins: 6, + directiveWins: 1, + staleMisses: 2, + }, + }, + }); + + const result = selectPolicyRecallCandidates(policy, BASE_SCENARIO); + expect(result.map((e) => e.skill)).toEqual(["verification"]); + expect(result[0]?.scenario).toBe( + 
"PreToolUse|flow-verification|clientRequest|Bash|/settings", + ); + }); + + test("falls back to wildcard when exact route has no qualified evidence", () => { + const policy = makePolicy({ + "PreToolUse|flow-verification|clientRequest|Bash|/settings": { + // Too few exposures to qualify (< 3) + verification: { + exposures: 1, + wins: 1, + directiveWins: 0, + staleMisses: 0, + }, + }, + "PreToolUse|flow-verification|clientRequest|Bash|*": { + workflow: { + exposures: 5, + wins: 4, + directiveWins: 1, + staleMisses: 1, + }, + }, + }); + + const result = selectPolicyRecallCandidates(policy, BASE_SCENARIO); + expect(result.map((e) => e.skill)).toEqual(["workflow"]); + expect(result[0]?.scenario).toBe( + "PreToolUse|flow-verification|clientRequest|Bash|*", + ); + }); + + test("returns empty when no bucket qualifies", () => { + const policy = makePolicy({ + "PreToolUse|flow-verification|clientRequest|Bash|/settings": { + verification: { + exposures: 1, + wins: 0, + directiveWins: 0, + staleMisses: 1, + }, + }, + "PreToolUse|flow-verification|clientRequest|Bash|*": { + workflow: { + exposures: 2, + wins: 1, + directiveWins: 0, + staleMisses: 1, + }, + }, + }); + + const result = selectPolicyRecallCandidates(policy, BASE_SCENARIO); + expect(result).toEqual([]); + }); + + test("excludes skills in excludeSkills set", () => { + const policy = makePolicy({ + "PreToolUse|flow-verification|clientRequest|Bash|/settings": { + verification: { + exposures: 5, + wins: 5, + directiveWins: 2, + staleMisses: 0, + }, + }, + }); + + const result = selectPolicyRecallCandidates(policy, BASE_SCENARIO, { + excludeSkills: new Set(["verification"]), + }); + expect(result).toEqual([]); + }); + + test("filters by minSuccessRate threshold", () => { + const policy = makePolicy({ + "PreToolUse|flow-verification|clientRequest|Bash|/settings": { + verification: { + exposures: 10, + wins: 4, + directiveWins: 0, + staleMisses: 6, + }, + }, + }); + + // 4/10 = 0.40, below default 0.65 + const result = 
selectPolicyRecallCandidates(policy, BASE_SCENARIO); + expect(result).toEqual([]); + }); + + test("filters by minBoost threshold", () => { + const policy = makePolicy({ + "PreToolUse|flow-verification|clientRequest|Bash|/settings": { + // 3 exposures, 2 wins → successRate ~0.67 → boost = 5 (qualifies) + // But if we raise minBoost to 6, should be excluded + skillA: { + exposures: 3, + wins: 2, + directiveWins: 0, + staleMisses: 1, + }, + }, + }); + + const withDefault = selectPolicyRecallCandidates(policy, BASE_SCENARIO); + expect(withDefault.length).toBe(1); + + const withHighMinBoost = selectPolicyRecallCandidates( + policy, + BASE_SCENARIO, + { minBoost: 6 }, + ); + expect(withHighMinBoost).toEqual([]); + }); + + test("returns maxCandidates candidates when multiple qualify", () => { + const policy = makePolicy({ + "PreToolUse|flow-verification|clientRequest|Bash|/settings": { + skillA: { + exposures: 5, + wins: 5, + directiveWins: 3, + staleMisses: 0, + }, + skillB: { + exposures: 4, + wins: 4, + directiveWins: 1, + staleMisses: 0, + }, + skillC: { + exposures: 6, + wins: 5, + directiveWins: 0, + staleMisses: 1, + }, + }, + }); + + // Default maxCandidates = 1 + const single = selectPolicyRecallCandidates(policy, BASE_SCENARIO); + expect(single.length).toBe(1); + + // maxCandidates = 2 + const two = selectPolicyRecallCandidates(policy, BASE_SCENARIO, { + maxCandidates: 2, + }); + expect(two.length).toBe(2); + }); + + test("tie-breaking is deterministic: recallScore > exposures > skill name", () => { + // Two skills with identical stats → tie-break on skill name (asc) + const policy = makePolicy({ + "PreToolUse|flow-verification|clientRequest|Bash|/settings": { + zeta: { + exposures: 5, + wins: 4, + directiveWins: 1, + staleMisses: 0, + }, + alpha: { + exposures: 5, + wins: 4, + directiveWins: 1, + staleMisses: 0, + }, + }, + }); + + const result = selectPolicyRecallCandidates(policy, BASE_SCENARIO, { + maxCandidates: 2, + }); + expect(result.map((e) => 
e.skill)).toEqual(["alpha", "zeta"]); + }); + + test("candidate includes all required machine-readable fields", () => { + const policy = makePolicy({ + "PreToolUse|flow-verification|clientRequest|Bash|/settings": { + verification: { + exposures: 6, + wins: 5, + directiveWins: 2, + staleMisses: 1, + }, + }, + }); + + const [candidate] = selectPolicyRecallCandidates(policy, BASE_SCENARIO); + expect(candidate).toBeDefined(); + expect(candidate!.skill).toBe("verification"); + expect(candidate!.scenario).toBe( + "PreToolUse|flow-verification|clientRequest|Bash|/settings", + ); + expect(candidate!.exposures).toBe(6); + expect(candidate!.wins).toBe(5); + expect(candidate!.directiveWins).toBe(2); + expect(candidate!.staleMisses).toBe(1); + expect(typeof candidate!.successRate).toBe("number"); + expect(typeof candidate!.policyBoost).toBe("number"); + expect(typeof candidate!.recallScore).toBe("number"); + expect(candidate!.successRate).toBeGreaterThan(0); + expect(candidate!.policyBoost).toBeGreaterThan(0); + expect(candidate!.recallScore).toBeGreaterThan(0); + }); + + test("falls back to legacy 4-part key when no route-keyed bucket exists", () => { + const policy = makePolicy({ + "PreToolUse|flow-verification|clientRequest|Bash": { + legacySkill: { + exposures: 5, + wins: 4, + directiveWins: 1, + staleMisses: 0, + }, + }, + }); + + const result = selectPolicyRecallCandidates(policy, BASE_SCENARIO); + expect(result.map((e) => e.skill)).toEqual(["legacySkill"]); + expect(result[0]?.scenario).toBe( + "PreToolUse|flow-verification|clientRequest|Bash", + ); + }); + + test("works with null routeScope (no route context)", () => { + const policy = makePolicy({ + "PreToolUse|flow-verification|clientRequest|Bash|*": { + workflow: { + exposures: 5, + wins: 4, + directiveWins: 1, + staleMisses: 0, + }, + }, + }); + + const result = selectPolicyRecallCandidates(policy, { + ...BASE_SCENARIO, + routeScope: null, + }); + expect(result.map((e) => e.skill)).toEqual(["workflow"]); + }); + + 
test("empty policy returns empty", () => { + const policy: RoutingPolicyFile = { version: 1, scenarios: {} }; + const result = selectPolicyRecallCandidates(policy, BASE_SCENARIO); + expect(result).toEqual([]); + }); +}); diff --git a/tests/posttooluse-verification-closure-capsule.test.ts b/tests/posttooluse-verification-closure-capsule.test.ts new file mode 100644 index 0000000..1c09853 --- /dev/null +++ b/tests/posttooluse-verification-closure-capsule.test.ts @@ -0,0 +1,265 @@ +import { afterEach, describe, expect, test } from "bun:test"; +import { rmSync, unlinkSync } from "node:fs"; +import { run } from "../hooks/src/posttooluse-verification-observe.mts"; +import { + recordStory, + removeLedgerArtifacts, + storyId as computeStoryId, +} from "../hooks/src/verification-ledger.mts"; +import { + appendSkillExposure, + sessionExposurePath, + type SkillExposure, +} from "../hooks/src/routing-policy-ledger.mts"; +import { + readLatestVerificationClosureCapsule, + readVerificationClosureCapsules, +} from "../hooks/src/verification-closure-capsule.mts"; +import { traceDir } from "../hooks/src/routing-decision-trace.mts"; +import { readRoutingDecisionTrace } from "../hooks/src/routing-decision-trace.mts"; + +const SESSION = "verification-closure-capsule-" + Date.now(); +const CREATED_AT = "2026-03-28T11:00:00.000Z"; + +function exposure( + id: string, + overrides: Partial<SkillExposure> = {}, +): SkillExposure { + return { + id, + sessionId: SESSION, + projectRoot: "/tmp/project", + storyId: null, + storyKind: "flow-verification", + route: null, + hook: "PreToolUse", + toolName: "Bash", + skill: "agent-browser-verify", + targetBoundary: "clientRequest", + exposureGroupId: "group-1", + attributionRole: "candidate", + candidateSkill: "agent-browser-verify", + createdAt: CREATED_AT, + resolvedAt: null, + outcome: "pending", + ...overrides, + }; +} + +afterEach(() => { + try { + removeLedgerArtifacts(SESSION); + } catch {} + try { + unlinkSync(sessionExposurePath(SESSION)); + } catch {} +
try { + rmSync(traceDir(SESSION), { recursive: true, force: true }); + } catch {} + delete process.env.VERCEL_PLUGIN_LOCAL_DEV_ORIGIN; + delete process.env.VERCEL_PLUGIN_VERIFICATION_STORY_ID; + delete process.env.VERCEL_PLUGIN_VERIFICATION_BOUNDARY; + delete process.env.VERCEL_PLUGIN_VERIFICATION_ACTION; + delete process.env.VERCEL_PLUGIN_VERIFICATION_ROUTE; +}); + +describe("verification closure capsule", () => { + test("records explicit gate failure for remote WebFetch", () => { + recordStory( + SESSION, + "flow-verification", + "/dashboard", + "remote fetch check", + [], + ); + + run( + JSON.stringify({ + tool_name: "WebFetch", + tool_input: { url: "https://example.com/dashboard" }, + session_id: SESSION, + }), + ); + + const capsule = readLatestVerificationClosureCapsule(SESSION); + expect(capsule).not.toBeNull(); + expect(capsule!.observation.boundary).toBe("clientRequest"); + expect(capsule!.gate.eligible).toBe(false); + expect(capsule!.gate.blockingReasonCodes).toContain("remote_web_fetch"); + expect(capsule!.resolution.attempted).toBe(false); + expect(capsule!.resolution.resolvedCount).toBe(0); + }); + + test("records route mismatch when local strong verification resolves nothing", () => { + recordStory( + SESSION, + "flow-verification", + "/settings", + "route mismatch check", + [], + ); + + const sid = computeStoryId("flow-verification", "/settings"); + appendSkillExposure( + exposure("exp-route-mismatch", { + storyId: sid, + route: "/settings", + }), + ); + + run( + JSON.stringify({ + tool_name: "Bash", + tool_input: { command: "curl http://localhost:3000/dashboard" }, + session_id: SESSION, + }), + ); + + const capsule = readLatestVerificationClosureCapsule(SESSION); + expect(capsule).not.toBeNull(); + expect(capsule!.gate.eligible).toBe(true); + expect(capsule!.exposureDiagnosis).not.toBeNull(); + expect(capsule!.exposureDiagnosis!.unresolvedReasonCodes).toContain( + "route_mismatch", + ); + expect(capsule!.resolution.resolvedCount).toBe(0); + }); + + 
test("capsule includes story resolution method", () => { + recordStory( + SESSION, + "flow-verification", + "/dashboard", + "method tracking", + [], + ); + + run( + JSON.stringify({ + tool_name: "Bash", + tool_input: { command: "curl http://localhost:3000/dashboard" }, + session_id: SESSION, + }), + ); + + const capsule = readLatestVerificationClosureCapsule(SESSION); + expect(capsule).not.toBeNull(); + expect(capsule!.storyResolution.method).toBe("exact-route"); + expect(capsule!.storyResolution.resolvedStoryId).toBe( + computeStoryId("flow-verification", "/dashboard"), + ); + }); + + test("capsule plan fields reflect active-story projection", () => { + recordStory( + SESSION, + "flow-verification", + "/dashboard", + "plan projection", + [], + ); + + run( + JSON.stringify({ + tool_name: "Bash", + tool_input: { command: "curl http://localhost:3000/dashboard" }, + session_id: SESSION, + }), + ); + + const capsule = readLatestVerificationClosureCapsule(SESSION); + expect(capsule).not.toBeNull(); + expect(capsule!.plan.activeStoryId).not.toBeNull(); + // After a clientRequest observation, that boundary should be satisfied + expect(capsule!.plan.satisfiedBoundaries).toContain("clientRequest"); + expect(capsule!.plan.missingBoundaries.length).toBeGreaterThan(0); + }); + + test("routing decision trace uses namespaced skip reasons", () => { + recordStory( + SESSION, + "flow-verification", + "/dashboard", + "trace reasons", + [], + ); + + run( + JSON.stringify({ + tool_name: "WebFetch", + tool_input: { url: "https://example.com/dashboard" }, + session_id: SESSION, + }), + ); + + const traces = readRoutingDecisionTrace(SESSION); + const trace = traces.find((t) => t.hook === "PostToolUse"); + expect(trace).toBeDefined(); + expect(trace!.skippedReasons).toContain("gate:remote_web_fetch"); + }); + + test("multiple observations produce multiple capsules in JSONL", () => { + recordStory( + SESSION, + "flow-verification", + "/dashboard", + "multi capsule", + [], + ); + + run( + 
JSON.stringify({ + tool_name: "Bash", + tool_input: { command: "curl http://localhost:3000/dashboard" }, + session_id: SESSION, + }), + ); + + run( + JSON.stringify({ + tool_name: "WebFetch", + tool_input: { url: "https://example.com/dashboard" }, + session_id: SESSION, + }), + ); + + const capsules = readVerificationClosureCapsules(SESSION); + expect(capsules.length).toBe(2); + expect(capsules[0]!.toolName).toBe("Bash"); + expect(capsules[1]!.toolName).toBe("WebFetch"); + }); + + test("successful resolution records win in capsule", () => { + recordStory( + SESSION, + "flow-verification", + "/dashboard", + "win test", + [], + ); + + const sid = computeStoryId("flow-verification", "/dashboard"); + appendSkillExposure( + exposure("exp-win", { + storyId: sid, + route: "/dashboard", + }), + ); + + run( + JSON.stringify({ + tool_name: "Bash", + tool_input: { command: "curl http://localhost:3000/dashboard" }, + session_id: SESSION, + }), + ); + + const capsule = readLatestVerificationClosureCapsule(SESSION); + expect(capsule).not.toBeNull(); + expect(capsule!.gate.eligible).toBe(true); + expect(capsule!.resolution.attempted).toBe(true); + expect(capsule!.resolution.resolvedCount).toBe(1); + expect(capsule!.resolution.outcomeKind).toBe("win"); + expect(capsule!.resolution.resolvedExposureIds).toContain("exp-win"); + }); +}); diff --git a/tests/posttooluse-verification-observe.test.ts b/tests/posttooluse-verification-observe.test.ts new file mode 100644 index 0000000..55ff9a4 --- /dev/null +++ b/tests/posttooluse-verification-observe.test.ts @@ -0,0 +1,1019 @@ +import { afterEach, describe, expect, test } from "bun:test"; +import { rmSync, unlinkSync } from "node:fs"; +import { + buildBoundaryEvent, + buildLedgerObservation, + classifyToolSignal, + isLocalVerificationUrl, + parseInput, + resolveObservedStoryId, + shouldResolveRoutingOutcome, + type VerificationBoundaryEvent, +} from "../hooks/src/posttooluse-verification-observe.mts"; +import { + recordObservation, + 
recordStory, + removeLedgerArtifacts, +} from "../hooks/src/verification-ledger.mts"; +import { storyId as computeStoryId } from "../hooks/src/verification-ledger.mts"; +import { verifyPlanSnapshot } from "../src/commands/verify-plan.ts"; +import { + readRoutingDecisionTrace, + createDecisionId, + traceDir, +} from "../hooks/src/routing-decision-trace.mts"; +import { + appendSkillExposure, + sessionExposurePath, + type SkillExposure, +} from "../hooks/src/routing-policy-ledger.mts"; +import { + inspectLocalVerificationUrl, + evaluateResolutionGate, + diagnosePendingExposureMatch, +} from "../hooks/src/verification-closure-diagnosis.mts"; + +describe("posttooluse verification closed loop", () => { + const sessionId = `verification-loop-${Date.now()}`; + + afterEach(() => { + removeLedgerArtifacts(sessionId); + }); + + test("buildLedgerObservation maps boundary event to observation shape", () => { + const event = buildBoundaryEvent({ + command: "curl http://localhost:3000/settings", + boundary: "clientRequest", + matchedPattern: "http-client", + inferredRoute: "/settings", + verificationId: "verif-shape-1", + timestamp: "2026-03-27T03:17:44.104Z", + }); + + const obs = buildLedgerObservation(event); + expect(obs.id).toBe("verif-shape-1"); + expect(obs.source).toBe("bash"); + expect(obs.boundary).toBe("clientRequest"); + expect(obs.route).toBe("/settings"); + expect(obs.meta?.matchedPattern).toBe("http-client"); + expect(obs.meta?.matchedSuggestedAction).toBe(false); + }); + + test("buildLedgerObservation nullifies unknown boundary", () => { + const event = buildBoundaryEvent({ + command: "ls", + boundary: "unknown", + matchedPattern: "none", + inferredRoute: null, + verificationId: "verif-unknown-1", + }); + + const obs = buildLedgerObservation(event); + expect(obs.boundary).toBeNull(); + }); + + test("records directive adherence and advances the plan", () => { + recordStory(sessionId, "flow-verification", "/settings", "save fails", []); + + const event = 
buildBoundaryEvent({ + command: "curl http://localhost:3000/settings", + boundary: "clientRequest", + matchedPattern: "http-client", + inferredRoute: "/settings", + verificationId: "verif-1", + timestamp: "2026-03-27T03:17:44.104Z", + env: { + ...process.env, + VERCEL_PLUGIN_VERIFICATION_BOUNDARY: "clientRequest", + VERCEL_PLUGIN_VERIFICATION_ACTION: + "curl http://localhost:3000/settings", + }, + }); + + expect(event.matchedSuggestedAction).toBe(true); + + const plan = recordObservation( + sessionId, + buildLedgerObservation(event), + { + lastAttemptedAction: "curl http://localhost:3000/settings", + }, + ); + + expect(Array.from(plan.satisfiedBoundaries)).toContain("clientRequest"); + + const snapshot = verifyPlanSnapshot({ sessionId }); + expect(snapshot.observationCount).toBe(1); + expect(snapshot.lastObservation?.matchedSuggestedAction).toBe(true); + expect(snapshot.lastObservation?.route).toBe("/settings"); + expect(snapshot.primaryNextAction?.targetBoundary).toBe("serverHandler"); + }); + + test("records divergence when the observed action does not match the suggestion", () => { + recordStory(sessionId, "flow-verification", "/settings", "save fails", []); + + const event = buildBoundaryEvent({ + command: "printenv", + boundary: "environment", + matchedPattern: "env-read", + inferredRoute: "/settings", + verificationId: "verif-2", + timestamp: "2026-03-27T03:17:45.104Z", + env: { + ...process.env, + VERCEL_PLUGIN_VERIFICATION_BOUNDARY: "clientRequest", + VERCEL_PLUGIN_VERIFICATION_ACTION: + "curl http://localhost:3000/settings", + }, + }); + + expect(event.matchedSuggestedAction).toBe(false); + + recordObservation( + sessionId, + buildLedgerObservation(event), + { + lastAttemptedAction: "curl http://localhost:3000/settings", + }, + ); + + const snapshot = verifyPlanSnapshot({ sessionId }); + expect(snapshot.lastObservation?.matchedSuggestedAction).toBe(false); + expect(snapshot.lastObservation?.boundary).toBe("environment"); + }); + + test("snapshot with no 
session returns empty with null lastObservation", () => { + const snapshot = verifyPlanSnapshot({ + sessionId: "nonexistent-session-" + Date.now(), + }); + expect(snapshot.hasStories).toBe(false); + expect(snapshot.lastObservation).toBeNull(); + expect(snapshot.observationCount).toBe(0); + }); + + describe("PostToolUse trace emission via run()", () => { + const traceSessionId = `trace-observe-${Date.now()}`; + + afterEach(() => { + removeLedgerArtifacts(traceSessionId); + try { rmSync(traceDir(traceSessionId), { recursive: true, force: true }); } catch {} + }); + + test("run() emits a PostToolUse trace with verification correlation", async () => { + const { run } = await import("../hooks/src/posttooluse-verification-observe.mts"); + + recordStory(traceSessionId, "flow-verification", "/settings", "test trace emit", []); + + const input = JSON.stringify({ + tool_name: "Bash", + tool_input: { command: "curl http://localhost:3000/settings" }, + session_id: traceSessionId, + }); + + run(input); + + const traces = readRoutingDecisionTrace(traceSessionId); + expect(traces.length).toBeGreaterThanOrEqual(1); + + const trace = traces.find((t) => t.hook === "PostToolUse"); + expect(trace).toBeDefined(); + expect(trace!.toolName).toBe("Bash"); + expect(trace!.verification).not.toBeNull(); + expect(trace!.verification!.verificationId).toBeTruthy(); + expect(trace!.verification!.observedBoundary).toBe("clientRequest"); + }); + + test("createDecisionId is deterministic for same inputs", () => { + const input = { + hook: "PostToolUse" as const, + sessionId: "sess-1", + toolName: "Bash", + toolTarget: "curl http://localhost:3000", + timestamp: "2026-03-27T04:00:00.000Z", + }; + const id1 = createDecisionId(input); + const id2 = createDecisionId(input); + expect(id1).toBe(id2); + expect(id1).toHaveLength(16); + }); + }); +}); + +// --------------------------------------------------------------------------- +// Multi-tool parseInput coverage +// 
--------------------------------------------------------------------------- + +describe("parseInput multi-tool support", () => { + test("parses Bash with command", () => { + const result = parseInput(JSON.stringify({ + tool_name: "Bash", + tool_input: { command: "curl http://localhost:3000" }, + session_id: "s1", + })); + expect(result).not.toBeNull(); + expect(result!.toolName).toBe("Bash"); + expect(result!.toolInput.command).toBe("curl http://localhost:3000"); + }); + + test("rejects Bash without command", () => { + const result = parseInput(JSON.stringify({ + tool_name: "Bash", + tool_input: {}, + })); + expect(result).toBeNull(); + }); + + test("parses Read tool", () => { + const result = parseInput(JSON.stringify({ + tool_name: "Read", + tool_input: { file_path: "/repo/.env.local" }, + session_id: "s2", + })); + expect(result).not.toBeNull(); + expect(result!.toolName).toBe("Read"); + }); + + test("parses WebFetch tool", () => { + const result = parseInput(JSON.stringify({ + tool_name: "WebFetch", + tool_input: { url: "https://example.com/api/health" }, + session_id: "s3", + })); + expect(result).not.toBeNull(); + expect(result!.toolName).toBe("WebFetch"); + }); + + test("parses Grep tool", () => { + const result = parseInput(JSON.stringify({ + tool_name: "Grep", + tool_input: { pattern: "ERROR", path: "/var/log/app.log" }, + })); + expect(result).not.toBeNull(); + expect(result!.toolName).toBe("Grep"); + }); + + test("parses Glob tool", () => { + const result = parseInput(JSON.stringify({ + tool_name: "Glob", + tool_input: { pattern: "**/*.log" }, + })); + expect(result).not.toBeNull(); + expect(result!.toolName).toBe("Glob"); + }); + + test("rejects unsupported tool names", () => { + const result = parseInput(JSON.stringify({ + tool_name: "Agent", + tool_input: {}, + })); + expect(result).toBeNull(); + }); + + test("rejects unknown tool names", () => { + const result = parseInput(JSON.stringify({ + tool_name: "SomeFutureTool", + tool_input: { data: "test" 
}, + })); + expect(result).toBeNull(); + }); + + test("returns {} without throwing for unsupported payloads via run()", async () => { + const { run } = await import("../hooks/src/posttooluse-verification-observe.mts"); + + // Unknown tool + expect(run(JSON.stringify({ tool_name: "UnknownTool", tool_input: {} }))).toBe("{}"); + + // Empty input + expect(run("")).toBe("{}"); + + // Invalid JSON + expect(run("not-json")).toBe("{}"); + }); +}); + +// --------------------------------------------------------------------------- +// classifyToolSignal coverage +// --------------------------------------------------------------------------- + +describe("classifyToolSignal", () => { + test("Read .env.local → environment + soft + env-read", () => { + const result = classifyToolSignal("Read", { file_path: "/repo/.env.local" }); + expect(result).not.toBeNull(); + expect(result!.boundary).toBe("environment"); + expect(result!.signalStrength).toBe("soft"); + expect(result!.evidenceSource).toBe("env-read"); + expect(result!.matchedPattern).toBe("env-file-read"); + }); + + test("Read vercel.json → environment + soft", () => { + const result = classifyToolSignal("Read", { file_path: "/repo/vercel.json" }); + expect(result).not.toBeNull(); + expect(result!.boundary).toBe("environment"); + expect(result!.matchedPattern).toBe("vercel-config-read"); + }); + + test("Read .vercel/project.json → environment + soft", () => { + const result = classifyToolSignal("Read", { file_path: "/repo/.vercel/project.json" }); + expect(result).not.toBeNull(); + expect(result!.boundary).toBe("environment"); + }); + + test("Read server.log → serverHandler + soft + log-read", () => { + const result = classifyToolSignal("Read", { file_path: "/repo/.next/server/app.log" }); + expect(result).not.toBeNull(); + expect(result!.boundary).toBe("serverHandler"); + expect(result!.signalStrength).toBe("soft"); + expect(result!.evidenceSource).toBe("log-read"); + }); + + test("Read generic file → null (no verification 
evidence)", () => { + const result = classifyToolSignal("Read", { file_path: "/repo/src/index.ts" }); + expect(result).toBeNull(); + }); + + test("WebFetch → clientRequest + strong + http", () => { + const result = classifyToolSignal("WebFetch", { url: "https://example.com/api/data" }); + expect(result).not.toBeNull(); + expect(result!.boundary).toBe("clientRequest"); + expect(result!.signalStrength).toBe("strong"); + expect(result!.evidenceSource).toBe("http"); + expect(result!.matchedPattern).toBe("web-fetch"); + }); + + test("WebFetch without url → null", () => { + const result = classifyToolSignal("WebFetch", {}); + expect(result).toBeNull(); + }); + + test("Grep in log file → serverHandler + soft", () => { + const result = classifyToolSignal("Grep", { pattern: "ERROR", path: "/var/log/app.log" }); + expect(result).not.toBeNull(); + expect(result!.boundary).toBe("serverHandler"); + expect(result!.signalStrength).toBe("soft"); + expect(result!.evidenceSource).toBe("log-read"); + }); + + test("Grep in .env → environment + soft", () => { + const result = classifyToolSignal("Grep", { pattern: "API_KEY", path: ".env" }); + expect(result).not.toBeNull(); + expect(result!.boundary).toBe("environment"); + expect(result!.evidenceSource).toBe("env-read"); + }); + + test("Grep in generic path → null", () => { + const result = classifyToolSignal("Grep", { pattern: "foo", path: "/repo/src" }); + expect(result).toBeNull(); + }); + + test("Glob for *.log → serverHandler + soft", () => { + const result = classifyToolSignal("Glob", { pattern: "**/*.log" }); + expect(result).not.toBeNull(); + expect(result!.boundary).toBe("serverHandler"); + expect(result!.signalStrength).toBe("soft"); + }); + + test("Glob for .env* → environment + soft", () => { + const result = classifyToolSignal("Glob", { pattern: ".env*" }); + expect(result).not.toBeNull(); + expect(result!.boundary).toBe("environment"); + }); + + test("Glob for generic pattern → null", () => { + const result = 
classifyToolSignal("Glob", { pattern: "**/*.ts" }); + expect(result).toBeNull(); + }); + + test("Edit → null (not verification evidence)", () => { + const result = classifyToolSignal("Edit", { file_path: "/repo/src/page.tsx" }); + expect(result).toBeNull(); + }); + + test("Write → null (not verification evidence)", () => { + const result = classifyToolSignal("Write", { file_path: "/repo/src/page.tsx" }); + expect(result).toBeNull(); + }); +}); + +// --------------------------------------------------------------------------- +// shouldResolveRoutingOutcome gating +// --------------------------------------------------------------------------- + +describe("shouldResolveRoutingOutcome", () => { + test("strong + known boundary + Bash → true", () => { + expect(shouldResolveRoutingOutcome({ boundary: "clientRequest", signalStrength: "strong", toolName: "Bash", command: "curl http://localhost:3000" })).toBe(true); + expect(shouldResolveRoutingOutcome({ boundary: "uiRender", signalStrength: "strong", toolName: "Bash", command: "open http://localhost:3000" })).toBe(true); + expect(shouldResolveRoutingOutcome({ boundary: "serverHandler", signalStrength: "strong", toolName: "Bash", command: "tail -f server.log" })).toBe(true); + expect(shouldResolveRoutingOutcome({ boundary: "environment", signalStrength: "strong", toolName: "Bash", command: "printenv" })).toBe(true); + }); + + test("soft + known boundary → false", () => { + expect(shouldResolveRoutingOutcome({ boundary: "environment", signalStrength: "soft", toolName: "Read", command: ".env" })).toBe(false); + expect(shouldResolveRoutingOutcome({ boundary: "serverHandler", signalStrength: "soft", toolName: "Grep", command: "grep ERROR app.log" })).toBe(false); + }); + + test("strong + unknown boundary → false", () => { + expect(shouldResolveRoutingOutcome({ boundary: "unknown", signalStrength: "strong", toolName: "Bash", command: "ls" })).toBe(false); + }); + + test("soft + unknown boundary → false", () => { + 
expect(shouldResolveRoutingOutcome({ boundary: "unknown", signalStrength: "soft", toolName: "Bash", command: "ls" })).toBe(false); + }); + + test("WebFetch strong signal does not resolve policy for external origin", () => { + expect(shouldResolveRoutingOutcome({ + boundary: "clientRequest", + signalStrength: "strong", + toolName: "WebFetch", + command: "https://example.com/settings", + })).toBe(false); + }); + + test("WebFetch strong signal resolves policy for localhost", () => { + expect(shouldResolveRoutingOutcome({ + boundary: "clientRequest", + signalStrength: "strong", + toolName: "WebFetch", + command: "http://localhost:3000/settings", + })).toBe(true); + }); + + test("WebFetch resolves for configured VERCEL_PLUGIN_LOCAL_DEV_ORIGIN", () => { + const env = { VERCEL_PLUGIN_LOCAL_DEV_ORIGIN: "http://myapp.test:4000" }; + expect(shouldResolveRoutingOutcome({ + boundary: "clientRequest", + signalStrength: "strong", + toolName: "WebFetch", + command: "http://myapp.test:4000/dashboard", + }, env)).toBe(true); + }); + + test("Bash curl strong signal still resolves policy regardless of URL", () => { + expect(shouldResolveRoutingOutcome({ + boundary: "clientRequest", + signalStrength: "strong", + toolName: "Bash", + command: "curl https://example.com/settings", + })).toBe(true); + }); +}); + +// --------------------------------------------------------------------------- +// isLocalVerificationUrl +// --------------------------------------------------------------------------- + +describe("isLocalVerificationUrl", () => { + test("localhost is local", () => { + expect(isLocalVerificationUrl("http://localhost:3000/settings")).toBe(true); + }); + + test("127.0.0.1 is local", () => { + expect(isLocalVerificationUrl("http://127.0.0.1:3000/api")).toBe(true); + }); + + test("::1 is local", () => { + expect(isLocalVerificationUrl("http://[::1]:3000/")).toBe(true); + }); + + test("0.0.0.0 is local", () => { + 
expect(isLocalVerificationUrl("http://0.0.0.0:5173/dashboard")).toBe(true); + }); + + test("external host is not local", () => { + expect(isLocalVerificationUrl("https://example.com/settings")).toBe(false); + }); + + test("configured origin matches", () => { + const env = { VERCEL_PLUGIN_LOCAL_DEV_ORIGIN: "http://myapp.local:4000" }; + expect(isLocalVerificationUrl("http://myapp.local:4000/settings", env)).toBe(true); + }); + + test("configured origin mismatch", () => { + const env = { VERCEL_PLUGIN_LOCAL_DEV_ORIGIN: "http://myapp.local:4000" }; + expect(isLocalVerificationUrl("http://other.local:4000/settings", env)).toBe(false); + }); + + test("non-http protocol returns false", () => { + expect(isLocalVerificationUrl("ftp://localhost:3000/")).toBe(false); + }); + + test("invalid URL returns false", () => { + expect(isLocalVerificationUrl("not-a-url")).toBe(false); + }); +}); + +// --------------------------------------------------------------------------- +// resolveObservedStoryId +// --------------------------------------------------------------------------- + +describe("resolveObservedStoryId", () => { + const plan = { + activeStoryId: "story-settings", + stories: [ + { id: "story-settings", route: "/settings" }, + { id: "story-dashboard", route: "/dashboard" }, + ], + }; + + test("observed route selects matching story instead of active story", () => { + expect(resolveObservedStoryId(plan, "/dashboard")).toBe("story-dashboard"); + }); + + test("observed route matching active story returns that story", () => { + expect(resolveObservedStoryId(plan, "/settings")).toBe("story-settings"); + }); + + test("null observed route falls back to activeStoryId", () => { + expect(resolveObservedStoryId(plan, null)).toBe("story-settings"); + }); + + test("unmatched observed route falls back to activeStoryId", () => { + expect(resolveObservedStoryId(plan, "/unknown")).toBe("story-settings"); + }); + + test("explicit env override takes precedence", () => { + const env = { 
VERCEL_PLUGIN_VERIFICATION_STORY_ID: "story-override" }; + expect(resolveObservedStoryId(plan, "/dashboard", env)).toBe("story-override"); + }); + + test("ambiguous route (multiple matches) falls back to activeStoryId", () => { + const ambiguousPlan = { + activeStoryId: "story-a", + stories: [ + { id: "story-a", route: "/shared" }, + { id: "story-b", route: "/shared" }, + ], + }; + expect(resolveObservedStoryId(ambiguousPlan, "/shared")).toBe("story-a"); + }); + + test("no stories and no active story returns null", () => { + expect(resolveObservedStoryId({ stories: [], activeStoryId: null }, "/dashboard")).toBeNull(); + }); +}); + +// --------------------------------------------------------------------------- +// buildBoundaryEvent and buildLedgerObservation with new fields +// --------------------------------------------------------------------------- + +describe("buildBoundaryEvent with signalStrength and evidenceSource", () => { + test("defaults to strong/bash/Bash when not specified", () => { + const event = buildBoundaryEvent({ + command: "curl http://localhost:3000", + boundary: "clientRequest", + matchedPattern: "http-client", + inferredRoute: "/", + verificationId: "v-1", + }); + expect(event.signalStrength).toBe("strong"); + expect(event.evidenceSource).toBe("bash"); + expect(event.toolName).toBe("Bash"); + }); + + test("propagates explicit soft/env-read/Read", () => { + const event = buildBoundaryEvent({ + command: "/repo/.env.local", + boundary: "environment", + matchedPattern: "env-file-read", + inferredRoute: null, + verificationId: "v-2", + signalStrength: "soft", + evidenceSource: "env-read", + toolName: "Read", + }); + expect(event.signalStrength).toBe("soft"); + expect(event.evidenceSource).toBe("env-read"); + expect(event.toolName).toBe("Read"); + }); + + test("ledger observation includes toolName and signalStrength in meta", () => { + const event = buildBoundaryEvent({ + command: "https://example.com/api", + boundary: "clientRequest", + 
matchedPattern: "web-fetch", + inferredRoute: null, + verificationId: "v-3", + signalStrength: "strong", + evidenceSource: "http", + toolName: "WebFetch", + }); + const obs = buildLedgerObservation(event); + expect(obs.meta?.toolName).toBe("WebFetch"); + expect(obs.meta?.signalStrength).toBe("strong"); + expect(obs.meta?.evidenceSource).toBe("http"); + }); +}); + +// --------------------------------------------------------------------------- +// Fixture matrix: tool_name -> observer_reached +// --------------------------------------------------------------------------- + +describe("fixture matrix: tool_name -> observer_reached", () => { + const toolPayloads: Record<string, Record<string, unknown>> = { + Bash: { command: "curl http://localhost:3000/dashboard" }, + Read: { file_path: "/repo/.env.local" }, + WebFetch: { url: "https://example.com/api" }, + Grep: { pattern: "ERROR", path: "/var/log/app.log" }, + Glob: { pattern: "**/*.log" }, + // These tools produce null from classifyToolSignal but parseInput accepts them + Edit: { file_path: "/repo/src/page.tsx" }, + Write: { file_path: "/repo/src/page.tsx" }, + }; + + for (const [toolName, toolInput] of Object.entries(toolPayloads)) { + test(`parseInput accepts ${toolName}`, () => { + const result = parseInput(JSON.stringify({ + tool_name: toolName, + tool_input: toolInput, + session_id: "test-session", + })); + expect(result).not.toBeNull(); + expect(result!.toolName).toBe(toolName); + }); + } + + test("run() returns {} for each tool without throwing", async () => { + const { run } = await import("../hooks/src/posttooluse-verification-observe.mts"); + for (const [toolName, toolInput] of Object.entries(toolPayloads)) { + const output = run(JSON.stringify({ + tool_name: toolName, + tool_input: toolInput, + })); + expect(output).toBe("{}"); + } + }); +}); + +// --------------------------------------------------------------------------- +// Verification Closure Diagnosis — inspectLocalVerificationUrl +//
--------------------------------------------------------------------------- + +describe("inspectLocalVerificationUrl", () => { + test("localhost returns loopback match", () => { + const result = inspectLocalVerificationUrl("http://localhost:3000/settings", {}); + expect(result.applicable).toBe(true); + expect(result.parseable).toBe(true); + expect(result.isLocal).toBe(true); + expect(result.matchSource).toBe("loopback"); + expect(result.observedHost).toBe("localhost:3000"); + }); + + test("127.0.0.1 returns loopback match", () => { + const result = inspectLocalVerificationUrl("http://127.0.0.1:4000/api", {}); + expect(result.isLocal).toBe(true); + expect(result.matchSource).toBe("loopback"); + }); + + test("[::1] returns loopback match", () => { + const result = inspectLocalVerificationUrl("http://[::1]:3000/", {}); + expect(result.isLocal).toBe(true); + expect(result.matchSource).toBe("loopback"); + }); + + test("external host returns non-local", () => { + const result = inspectLocalVerificationUrl("https://example.com/dashboard", {}); + expect(result.isLocal).toBe(false); + expect(result.matchSource).toBeNull(); + expect(result.observedHost).toBe("example.com"); + }); + + test("configured origin matches", () => { + const env = { VERCEL_PLUGIN_LOCAL_DEV_ORIGIN: "http://myapp.local:4000" }; + const result = inspectLocalVerificationUrl("http://myapp.local:4000/settings", env); + expect(result.isLocal).toBe(true); + expect(result.matchSource).toBe("configured-origin"); + expect(result.configuredOrigin).toBe("http://myapp.local:4000"); + }); + + test("non-http protocol returns non-local", () => { + const result = inspectLocalVerificationUrl("ftp://localhost:21/data", {}); + expect(result.parseable).toBe(true); + expect(result.isLocal).toBe(false); + }); + + test("invalid URL returns unparseable", () => { + const result = inspectLocalVerificationUrl("not-a-url", {}); + expect(result.parseable).toBe(false); + expect(result.isLocal).toBeNull(); + 
expect(result.observedHost).toBeNull(); + }); +}); + +// --------------------------------------------------------------------------- +// Verification Closure Diagnosis — evaluateResolutionGate +// --------------------------------------------------------------------------- + +describe("evaluateResolutionGate", () => { + test("strong signal + known boundary + Bash → eligible", () => { + const gate = evaluateResolutionGate({ + boundary: "clientRequest", + signalStrength: "strong", + toolName: "Bash", + command: "curl http://localhost:3000/settings", + }, {}); + expect(gate.eligible).toBe(true); + expect(gate.passedChecks).toContain("known_boundary"); + expect(gate.passedChecks).toContain("strong_signal"); + expect(gate.blockingReasonCodes).toHaveLength(0); + expect(gate.locality.applicable).toBe(false); + }); + + test("soft signal blocks with soft_signal code", () => { + const gate = evaluateResolutionGate({ + boundary: "environment", + signalStrength: "soft", + toolName: "Read", + command: ".env", + }, {}); + expect(gate.eligible).toBe(false); + expect(gate.blockingReasonCodes).toContain("soft_signal"); + expect(gate.passedChecks).toContain("known_boundary"); + }); + + test("unknown boundary blocks with unknown_boundary code", () => { + const gate = evaluateResolutionGate({ + boundary: "unknown", + signalStrength: "strong", + toolName: "Bash", + command: "ls", + }, {}); + expect(gate.eligible).toBe(false); + expect(gate.blockingReasonCodes).toContain("unknown_boundary"); + }); + + test("remote WebFetch blocks with remote_web_fetch code", () => { + const gate = evaluateResolutionGate({ + boundary: "clientRequest", + signalStrength: "strong", + toolName: "WebFetch", + command: "https://example.com/dashboard", + }, {}); + expect(gate.eligible).toBe(false); + expect(gate.passedChecks).toContain("known_boundary"); + expect(gate.passedChecks).toContain("strong_signal"); + expect(gate.blockingReasonCodes).toContain("remote_web_fetch"); + 
expect(gate.locality.applicable).toBe(true); + expect(gate.locality.isLocal).toBe(false); + expect(gate.locality.observedHost).toBe("example.com"); + }); + + test("local WebFetch is eligible", () => { + const gate = evaluateResolutionGate({ + boundary: "clientRequest", + signalStrength: "strong", + toolName: "WebFetch", + command: "http://localhost:3000/api/health", + }, {}); + expect(gate.eligible).toBe(true); + expect(gate.passedChecks).toContain("local_verification_url"); + expect(gate.locality.isLocal).toBe(true); + expect(gate.locality.matchSource).toBe("loopback"); + }); + + test("WebFetch with configured origin is eligible", () => { + const gate = evaluateResolutionGate({ + boundary: "clientRequest", + signalStrength: "strong", + toolName: "WebFetch", + command: "http://myapp.test:4000/dashboard", + }, { VERCEL_PLUGIN_LOCAL_DEV_ORIGIN: "http://myapp.test:4000" }); + expect(gate.eligible).toBe(true); + expect(gate.locality.matchSource).toBe("configured-origin"); + }); + + test("soft + unknown accumulates multiple blocking codes", () => { + const gate = evaluateResolutionGate({ + boundary: "unknown", + signalStrength: "soft", + toolName: "Bash", + command: "ls", + }, {}); + expect(gate.eligible).toBe(false); + expect(gate.blockingReasonCodes).toContain("unknown_boundary"); + expect(gate.blockingReasonCodes).toContain("soft_signal"); + expect(gate.blockingReasonCodes).toHaveLength(2); + }); +}); + +// --------------------------------------------------------------------------- +// Verification Closure Diagnosis — diagnosePendingExposureMatch +// --------------------------------------------------------------------------- + +describe("diagnosePendingExposureMatch", () => { + const SESSION = "diagnosis-test-" + Date.now(); + + function makeExposure(id: string, overrides: Partial = {}): SkillExposure { + return { + id, + sessionId: SESSION, + projectRoot: "/tmp/project", + storyId: null, + storyKind: "flow-verification", + route: null, + hook: "PreToolUse", + 
toolName: "Bash", + skill: "agent-browser-verify", + targetBoundary: "clientRequest", + exposureGroupId: "group-1", + attributionRole: "candidate", + candidateSkill: "agent-browser-verify", + createdAt: "2026-03-28T11:00:00.000Z", + resolvedAt: null, + outcome: "pending", + ...overrides, + }; + } + + test("exact match returns matched exposure IDs with no unresolved codes", () => { + const exposures = [ + makeExposure("exp-1", { storyId: "story-1", route: "/settings" }), + ]; + const result = diagnosePendingExposureMatch({ + sessionId: SESSION, + boundary: "clientRequest", + storyId: "story-1", + route: "/settings", + exposures, + }); + expect(result.exactMatchCount).toBe(1); + expect(result.exactMatchExposureIds).toEqual(["exp-1"]); + expect(result.unresolvedReasonCodes).toHaveLength(0); + }); + + test("route mismatch diagnosed when same story different route", () => { + const exposures = [ + makeExposure("exp-2", { storyId: "story-1", route: "/settings" }), + ]; + const result = diagnosePendingExposureMatch({ + sessionId: SESSION, + boundary: "clientRequest", + storyId: "story-1", + route: "/dashboard", + exposures, + }); + expect(result.exactMatchCount).toBe(0); + expect(result.unresolvedReasonCodes).toContain("route_mismatch"); + expect(result.sameStoryDifferentRouteExposureIds).toEqual(["exp-2"]); + }); + + test("story mismatch diagnosed when same route different story", () => { + const exposures = [ + makeExposure("exp-3", { storyId: "story-other", route: "/settings" }), + ]; + const result = diagnosePendingExposureMatch({ + sessionId: SESSION, + boundary: "clientRequest", + storyId: "story-1", + route: "/settings", + exposures, + }); + expect(result.exactMatchCount).toBe(0); + expect(result.unresolvedReasonCodes).toContain("story_mismatch"); + expect(result.sameRouteDifferentStoryExposureIds).toEqual(["exp-3"]); + }); + + test("missing story scope diagnosed when storyId is null", () => { + const exposures = [ + makeExposure("exp-4", { storyId: "story-1", 
route: "/settings" }), + ]; + const result = diagnosePendingExposureMatch({ + sessionId: SESSION, + boundary: "clientRequest", + storyId: null, + route: "/settings", + exposures, + }); + expect(result.unresolvedReasonCodes).toContain("missing_story_scope"); + expect(result.unresolvedReasonCodes).toContain("story_mismatch"); + }); + + test("missing route scope diagnosed when route is null", () => { + const exposures = [ + makeExposure("exp-5", { storyId: "story-1", route: "/settings" }), + ]; + const result = diagnosePendingExposureMatch({ + sessionId: SESSION, + boundary: "clientRequest", + storyId: "story-1", + route: null, + exposures, + }); + expect(result.unresolvedReasonCodes).toContain("missing_route_scope"); + expect(result.unresolvedReasonCodes).toContain("route_mismatch"); + }); + + test("no pending for boundary diagnosed when boundary doesn't match", () => { + const exposures = [ + makeExposure("exp-6", { + storyId: "story-1", + route: "/settings", + targetBoundary: "serverHandler", + }), + ]; + const result = diagnosePendingExposureMatch({ + sessionId: SESSION, + boundary: "clientRequest", + storyId: "story-1", + route: "/settings", + exposures, + }); + expect(result.pendingBoundaryCount).toBe(0); + expect(result.unresolvedReasonCodes).toContain("no_pending_for_boundary"); + }); + + test("already-resolved exposures are excluded from pending", () => { + const exposures = [ + makeExposure("exp-7", { + storyId: "story-1", + route: "/settings", + outcome: "win", + resolvedAt: "2026-03-28T12:00:00.000Z", + }), + ]; + const result = diagnosePendingExposureMatch({ + sessionId: SESSION, + boundary: "clientRequest", + storyId: "story-1", + route: "/settings", + exposures, + }); + expect(result.pendingTotal).toBe(0); + expect(result.pendingBoundaryCount).toBe(0); + expect(result.unresolvedReasonCodes).toContain("no_pending_for_boundary"); + }); + + test("no exact pending match as fallback when no specific reason applies", () => { + const exposures = [ + 
makeExposure("exp-8", { + storyId: "story-x", + route: "/other", + }), + ]; + const result = diagnosePendingExposureMatch({ + sessionId: SESSION, + boundary: "clientRequest", + storyId: "story-1", + route: "/settings", + exposures, + }); + expect(result.exactMatchCount).toBe(0); + expect(result.unresolvedReasonCodes).toContain("no_exact_pending_match"); + }); + + test("active-story fallback: pending on same boundary with null storyId matches null", () => { + const exposures = [ + makeExposure("exp-9", { storyId: null, route: "/settings" }), + ]; + const result = diagnosePendingExposureMatch({ + sessionId: SESSION, + boundary: "clientRequest", + storyId: null, + route: "/settings", + exposures, + }); + expect(result.exactMatchCount).toBe(1); + expect(result.exactMatchExposureIds).toEqual(["exp-9"]); + expect(result.unresolvedReasonCodes).toHaveLength(0); + }); + + test("ambiguous route: multiple stories on same route", () => { + const exposures = [ + makeExposure("exp-10a", { storyId: "story-a", route: "/shared" }), + makeExposure("exp-10b", { storyId: "story-b", route: "/shared" }), + ]; + const result = diagnosePendingExposureMatch({ + sessionId: SESSION, + boundary: "clientRequest", + storyId: "story-a", + route: "/shared", + exposures, + }); + expect(result.exactMatchCount).toBe(1); + expect(result.exactMatchExposureIds).toEqual(["exp-10a"]); + expect(result.sameRouteDifferentStoryExposureIds).toEqual(["exp-10b"]); + }); + + test("pendingTotal counts across all boundaries", () => { + const exposures = [ + makeExposure("exp-11a", { + storyId: "story-1", + route: "/settings", + targetBoundary: "clientRequest", + }), + makeExposure("exp-11b", { + storyId: "story-1", + route: "/settings", + targetBoundary: "serverHandler", + }), + ]; + const result = diagnosePendingExposureMatch({ + sessionId: SESSION, + boundary: "clientRequest", + storyId: "story-1", + route: "/settings", + exposures, + }); + expect(result.pendingTotal).toBe(2); + 
expect(result.pendingBoundaryCount).toBe(1); + expect(result.exactMatchCount).toBe(1); + }); +}); diff --git a/tests/posttooluse-verification-route-fallback.test.ts b/tests/posttooluse-verification-route-fallback.test.ts new file mode 100644 index 0000000..1be92b6 --- /dev/null +++ b/tests/posttooluse-verification-route-fallback.test.ts @@ -0,0 +1,271 @@ +import { describe, test, expect, beforeEach, afterEach } from "bun:test"; +import { unlinkSync, rmSync } from "node:fs"; +import { + resolveObservedRoute, + envString, +} from "../hooks/src/posttooluse-verification-observe.mts"; +import { + projectPolicyPath, + sessionExposurePath, + appendSkillExposure, + loadSessionExposures, + loadProjectRoutingPolicy, + resolveBoundaryOutcome, + type SkillExposure, +} from "../hooks/src/routing-policy-ledger.mts"; +import { + readRoutingDecisionTrace, + traceDir, +} from "../hooks/src/routing-decision-trace.mts"; + +// --------------------------------------------------------------------------- +// Fixtures +// --------------------------------------------------------------------------- + +const PROJECT_ROOT = "/tmp/test-project-route-fallback"; +const SESSION_ID = "route-fallback-test-" + Date.now(); +const T0 = "2026-03-27T05:00:00.000Z"; +const T1 = "2026-03-27T05:01:00.000Z"; + +function exposure(id: string, overrides: Partial = {}): SkillExposure { + return { + id, + sessionId: SESSION_ID, + projectRoot: PROJECT_ROOT, + storyId: "story-settings", + storyKind: "flow-verification", + route: "/settings", + hook: "PreToolUse", + toolName: "Bash", + skill: "agent-browser-verify", + targetBoundary: "clientRequest", + exposureGroupId: null, + attributionRole: "candidate", + candidateSkill: null, + createdAt: T0, + resolvedAt: null, + outcome: "pending", + ...overrides, + }; +} + +function cleanup() { + try { unlinkSync(projectPolicyPath(PROJECT_ROOT)); } catch {} + try { unlinkSync(sessionExposurePath(SESSION_ID)); } catch {} +} + +// 
--------------------------------------------------------------------------- +// Unit: resolveObservedRoute +// --------------------------------------------------------------------------- + +describe("resolveObservedRoute", () => { + test("returns inferred route when present", () => { + expect(resolveObservedRoute("/api/data", {})).toBe("/api/data"); + }); + + test("falls back to VERCEL_PLUGIN_VERIFICATION_ROUTE when inferred is null", () => { + const env = { VERCEL_PLUGIN_VERIFICATION_ROUTE: "/settings" } as NodeJS.ProcessEnv; + expect(resolveObservedRoute(null, env)).toBe("/settings"); + }); + + test("trims whitespace from directive env value", () => { + const env = { VERCEL_PLUGIN_VERIFICATION_ROUTE: " /settings " } as NodeJS.ProcessEnv; + expect(resolveObservedRoute(null, env)).toBe("/settings"); + }); + + test("returns null when both inferred and env are absent", () => { + expect(resolveObservedRoute(null, {})).toBeNull(); + }); + + test("returns null when env value is empty string", () => { + const env = { VERCEL_PLUGIN_VERIFICATION_ROUTE: "" } as NodeJS.ProcessEnv; + expect(resolveObservedRoute(null, env)).toBeNull(); + }); + + test("returns null when env value is whitespace-only", () => { + const env = { VERCEL_PLUGIN_VERIFICATION_ROUTE: " " } as NodeJS.ProcessEnv; + expect(resolveObservedRoute(null, env)).toBeNull(); + }); + + test("prefers inferred route over directive env", () => { + const env = { VERCEL_PLUGIN_VERIFICATION_ROUTE: "/dashboard" } as NodeJS.ProcessEnv; + expect(resolveObservedRoute("/api/data", env)).toBe("/api/data"); + }); +}); + +// --------------------------------------------------------------------------- +// Unit: envString +// --------------------------------------------------------------------------- + +describe("envString", () => { + test("returns trimmed value for non-empty env var", () => { + expect(envString({ FOO: " bar " } as NodeJS.ProcessEnv, "FOO")).toBe("bar"); + }); + + test("returns null for missing key", () => { + 
expect(envString({} as NodeJS.ProcessEnv, "MISSING")).toBeNull(); + }); + + test("returns null for empty string", () => { + expect(envString({ X: "" } as NodeJS.ProcessEnv, "X")).toBeNull(); + }); + + test("returns null for whitespace-only string", () => { + expect(envString({ X: " " } as NodeJS.ProcessEnv, "X")).toBeNull(); + }); + + test("returns null for tab-only string", () => { + expect(envString({ X: "\t\t" } as NodeJS.ProcessEnv, "X")).toBeNull(); + }); +}); + +// --------------------------------------------------------------------------- +// Integration: directive env enables route-scoped closure +// --------------------------------------------------------------------------- + +describe("directive route fallback closes route-scoped exposures", () => { + beforeEach(cleanup); + afterEach(cleanup); + + test("pending exposure resolves when command inference is null but directive env matches", () => { + appendSkillExposure(exposure("e1", { + storyId: "story-settings", + route: "/settings", + targetBoundary: "clientRequest", + createdAt: T0, + })); + + // Simulate what run() does: inferRoute returns null, but directive env has the route + const directiveRoute = resolveObservedRoute(null, { + VERCEL_PLUGIN_VERIFICATION_ROUTE: "/settings", + } as NodeJS.ProcessEnv); + + const directiveStoryId = envString( + { VERCEL_PLUGIN_VERIFICATION_STORY_ID: "story-settings" } as NodeJS.ProcessEnv, + "VERCEL_PLUGIN_VERIFICATION_STORY_ID", + ); + + const resolved = resolveBoundaryOutcome({ + sessionId: SESSION_ID, + boundary: "clientRequest", + matchedSuggestedAction: true, + storyId: directiveStoryId, + route: directiveRoute, + now: T1, + }); + + expect(resolved).toHaveLength(1); + expect(resolved[0].outcome).toBe("directive-win"); + expect(resolved[0].id).toBe("e1"); + + const policy = loadProjectRoutingPolicy(PROJECT_ROOT); + const stats = policy.scenarios["PreToolUse|flow-verification|clientRequest|Bash"]?.["agent-browser-verify"]; + expect(stats).toBeDefined(); + 
expect(stats!.directiveWins).toBe(1); + expect(stats!.wins).toBe(1); + }); + + test("win (not directive-win) when action does not match suggestion", () => { + appendSkillExposure(exposure("e2", { + storyId: "story-settings", + route: "/settings", + targetBoundary: "clientRequest", + createdAt: T0, + })); + + const directiveRoute = resolveObservedRoute(null, { + VERCEL_PLUGIN_VERIFICATION_ROUTE: "/settings", + } as NodeJS.ProcessEnv); + + const resolved = resolveBoundaryOutcome({ + sessionId: SESSION_ID, + boundary: "clientRequest", + matchedSuggestedAction: false, + storyId: "story-settings", + route: directiveRoute, + now: T1, + }); + + expect(resolved).toHaveLength(1); + expect(resolved[0].outcome).toBe("win"); + + const policy = loadProjectRoutingPolicy(PROJECT_ROOT); + const stats = policy.scenarios["PreToolUse|flow-verification|clientRequest|Bash"]?.["agent-browser-verify"]; + expect(stats!.directiveWins).toBe(0); + expect(stats!.wins).toBe(1); + }); +}); + +// --------------------------------------------------------------------------- +// Integration: run() with directive env (end-to-end through observer) +// --------------------------------------------------------------------------- + +describe("run() with directive env fallback", () => { + const RUN_SESSION = "run-directive-" + Date.now(); + + afterEach(() => { + try { unlinkSync(sessionExposurePath(RUN_SESSION)); } catch {} + try { rmSync(traceDir(RUN_SESSION), { recursive: true, force: true }); } catch {} + }); + + test("run() uses directive route when command has no route hint", async () => { + const { run } = await import("../hooks/src/posttooluse-verification-observe.mts"); + const { recordStory, removeLedgerArtifacts } = await import("../hooks/src/verification-ledger.mts"); + + const saved = { + VERCEL_PLUGIN_VERIFICATION_ROUTE: process.env.VERCEL_PLUGIN_VERIFICATION_ROUTE, + VERCEL_PLUGIN_VERIFICATION_STORY_ID: process.env.VERCEL_PLUGIN_VERIFICATION_STORY_ID, + VERCEL_PLUGIN_VERIFICATION_BOUNDARY: 
process.env.VERCEL_PLUGIN_VERIFICATION_BOUNDARY, + VERCEL_PLUGIN_VERIFICATION_ACTION: process.env.VERCEL_PLUGIN_VERIFICATION_ACTION, + }; + + try { + recordStory(RUN_SESSION, "flow-verification", "/settings", "directive fallback", []); + + // Set directive env + process.env.VERCEL_PLUGIN_VERIFICATION_ROUTE = "/settings"; + process.env.VERCEL_PLUGIN_VERIFICATION_STORY_ID = ""; + process.env.VERCEL_PLUGIN_VERIFICATION_BOUNDARY = "clientRequest"; + process.env.VERCEL_PLUGIN_VERIFICATION_ACTION = "curl $LOCAL_URL"; + + // Add exposure to close + appendSkillExposure({ + id: "run-e1", + sessionId: RUN_SESSION, + projectRoot: PROJECT_ROOT, + storyId: "flow-verification", + storyKind: "flow-verification", + route: "/settings", + hook: "PreToolUse", + toolName: "Bash", + skill: "agent-browser-verify", + targetBoundary: "clientRequest", + createdAt: T0, + resolvedAt: null, + outcome: "pending", + }); + + // Command that does NOT contain a route URL — forces directive fallback + const input = JSON.stringify({ + tool_name: "Bash", + tool_input: { command: "curl $LOCAL_URL" }, + session_id: RUN_SESSION, + }); + + run(input); + + // Verify trace has the directive-derived route + const traces = readRoutingDecisionTrace(RUN_SESSION); + const postTrace = traces.find((t) => t.hook === "PostToolUse"); + expect(postTrace).toBeDefined(); + expect(postTrace!.observedRoute).toBe("/settings"); + } finally { + // Restore env + for (const [k, v] of Object.entries(saved)) { + if (v === undefined) delete process.env[k]; + else process.env[k] = v; + } + removeLedgerArtifacts(RUN_SESSION); + } + }); +}); diff --git a/tests/pretooluse-companion-rulebook.test.ts b/tests/pretooluse-companion-rulebook.test.ts new file mode 100644 index 0000000..4223a75 --- /dev/null +++ b/tests/pretooluse-companion-rulebook.test.ts @@ -0,0 +1,333 @@ +import { describe, test, expect, beforeEach, afterEach } from "bun:test"; +import { writeFileSync, mkdirSync, rmSync, existsSync } from "node:fs"; +import { tmpdir } 
from "node:os"; +import { join } from "node:path"; +import { randomUUID } from "node:crypto"; +import { + companionRulebookPath, + saveCompanionRulebook, + createEmptyCompanionRulebook, + type LearnedCompanionRule, + type LearnedCompanionRulebook, +} from "../hooks/src/learned-companion-rulebook.mts"; +import { + recallVerifiedCompanions, + type CompanionRecallResult, +} from "../hooks/src/companion-recall.mts"; + +// --------------------------------------------------------------------------- +// Fixtures +// --------------------------------------------------------------------------- + +const T0 = "2026-03-28T08:00:00.000Z"; +const PROJECT = `/tmp/test-companion-pretool-${randomUUID()}`; +const SCENARIO = "PreToolUse|flow-verification|uiRender|Bash|/dashboard"; + +function makeRule( + overrides: Partial = {}, +): LearnedCompanionRule { + return { + id: `${SCENARIO}::verification->agent-browser-verify`, + scenario: SCENARIO, + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "uiRender", + toolName: "Bash", + routeScope: "/dashboard", + candidateSkill: "verification", + companionSkill: "agent-browser-verify", + support: 5, + winsWithCompanion: 4, + winsWithoutCompanion: 2, + directiveWinsWithCompanion: 1, + staleMissesWithCompanion: 0, + precisionWithCompanion: 0.8, + baselinePrecisionWithoutCompanion: 0.5, + liftVsCandidateAlone: 1.6, + staleMissDelta: 0, + confidence: "promote", + promotedAt: T0, + reason: "companion beats candidate-alone within same verified scenario", + sourceExposureGroupIds: ["g-1", "g-2", "g-3", "g-4", "g-5"], + ...overrides, + }; +} + +function makeRulebook( + rules: LearnedCompanionRule[] = [makeRule()], +): LearnedCompanionRulebook { + return { + version: 1, + generatedAt: T0, + projectRoot: PROJECT, + rules, + replay: { baselineWins: 0, learnedWins: 0, deltaWins: 0, regressions: [] }, + promotion: { + accepted: true, + errorCode: null, + reason: `${rules.filter((r) => r.confidence === "promote").length} promoted 
companion rules`, + }, + }; +} + +// --------------------------------------------------------------------------- +// Lifecycle +// --------------------------------------------------------------------------- + +beforeEach(() => { + // Ensure the project directory exists for path hashing + mkdirSync(PROJECT, { recursive: true }); +}); + +afterEach(() => { + // Clean up rulebook file + const path = companionRulebookPath(PROJECT); + try { rmSync(path); } catch {} + try { rmSync(PROJECT, { recursive: true }); } catch {} +}); + +// --------------------------------------------------------------------------- +// Tests +// --------------------------------------------------------------------------- + +describe("PreToolUse companion recall", () => { + test("recalls promoted companion after its candidate", () => { + saveCompanionRulebook(PROJECT, makeRulebook()); + + const result = recallVerifiedCompanions({ + projectRoot: PROJECT, + scenario: { + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "uiRender", + toolName: "Bash", + routeScope: "/dashboard", + }, + candidateSkills: ["verification"], + excludeSkills: new Set(), + maxCompanions: 1, + }); + + expect(result.selected).toHaveLength(1); + expect(result.selected[0].candidateSkill).toBe("verification"); + expect(result.selected[0].companionSkill).toBe("agent-browser-verify"); + expect(result.selected[0].confidence).toBe(1.6); + expect(result.rejected).toHaveLength(0); + }); + + test("rejects companion when it is in excludeSkills", () => { + saveCompanionRulebook(PROJECT, makeRulebook()); + + const result = recallVerifiedCompanions({ + projectRoot: PROJECT, + scenario: { + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "uiRender", + toolName: "Bash", + routeScope: "/dashboard", + }, + candidateSkills: ["verification"], + excludeSkills: new Set(["agent-browser-verify"]), + maxCompanions: 1, + }); + + expect(result.selected).toHaveLength(0); + 
expect(result.rejected).toHaveLength(1); + expect(result.rejected[0].rejectedReason).toBe("excluded"); + }); + + test("no-ops when rulebook artifact is missing", () => { + // Don't write any rulebook — file does not exist + const result = recallVerifiedCompanions({ + projectRoot: PROJECT, + scenario: { + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "uiRender", + toolName: "Bash", + routeScope: "/dashboard", + }, + candidateSkills: ["verification"], + excludeSkills: new Set(), + maxCompanions: 1, + }); + + // Empty rulebook returns no candidates + expect(result.selected).toHaveLength(0); + expect(result.rejected).toHaveLength(0); + }); + + test("no-ops when rulebook is invalid JSON", () => { + const path = companionRulebookPath(PROJECT); + writeFileSync(path, "not-json{{{"); + + const result = recallVerifiedCompanions({ + projectRoot: PROJECT, + scenario: { + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "uiRender", + toolName: "Bash", + routeScope: "/dashboard", + }, + candidateSkills: ["verification"], + excludeSkills: new Set(), + maxCompanions: 1, + }); + + expect(result.selected).toHaveLength(0); + }); + + test("skips holdout-fail rules", () => { + const rulebook = makeRulebook([ + makeRule({ confidence: "holdout-fail", promotedAt: null }), + ]); + saveCompanionRulebook(PROJECT, rulebook); + + const result = recallVerifiedCompanions({ + projectRoot: PROJECT, + scenario: { + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "uiRender", + toolName: "Bash", + routeScope: "/dashboard", + }, + candidateSkills: ["verification"], + excludeSkills: new Set(), + maxCompanions: 1, + }); + + expect(result.selected).toHaveLength(0); + }); + + test("respects maxCompanions cap", () => { + const rules = [ + makeRule({ companionSkill: "companion-a", id: `${SCENARIO}::verification->companion-a` }), + makeRule({ companionSkill: "companion-b", id: `${SCENARIO}::verification->companion-b` }), + ]; + 
saveCompanionRulebook(PROJECT, makeRulebook(rules)); + + const result = recallVerifiedCompanions({ + projectRoot: PROJECT, + scenario: { + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "uiRender", + toolName: "Bash", + routeScope: "/dashboard", + }, + candidateSkills: ["verification"], + excludeSkills: new Set(), + maxCompanions: 1, + }); + + expect(result.selected).toHaveLength(1); + }); + + test("selects companion with highest lift first", () => { + const rules = [ + makeRule({ + companionSkill: "low-lift", + liftVsCandidateAlone: 1.3, + id: `${SCENARIO}::verification->low-lift`, + }), + makeRule({ + companionSkill: "high-lift", + liftVsCandidateAlone: 2.0, + id: `${SCENARIO}::verification->high-lift`, + }), + ]; + saveCompanionRulebook(PROJECT, makeRulebook(rules)); + + const result = recallVerifiedCompanions({ + projectRoot: PROJECT, + scenario: { + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "uiRender", + toolName: "Bash", + routeScope: "/dashboard", + }, + candidateSkills: ["verification"], + excludeSkills: new Set(), + maxCompanions: 2, + }); + + expect(result.selected).toHaveLength(2); + expect(result.selected[0].companionSkill).toBe("high-lift"); + expect(result.selected[1].companionSkill).toBe("low-lift"); + }); + + test("records trigger and reasonCode for recalled companions", () => { + saveCompanionRulebook(PROJECT, makeRulebook()); + + const result = recallVerifiedCompanions({ + projectRoot: PROJECT, + scenario: { + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "uiRender", + toolName: "Bash", + routeScope: "/dashboard", + }, + candidateSkills: ["verification"], + excludeSkills: new Set(), + maxCompanions: 1, + }); + + expect(result.selected[0].reason).toBe( + "companion beats candidate-alone within same verified scenario", + ); + }); + + test("does not match when candidate skill is not in candidateSkills list", () => { + saveCompanionRulebook(PROJECT, makeRulebook()); + + 
const result = recallVerifiedCompanions({ + projectRoot: PROJECT, + scenario: { + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "uiRender", + toolName: "Bash", + routeScope: "/dashboard", + }, + candidateSkills: ["some-other-skill"], + excludeSkills: new Set(), + maxCompanions: 1, + }); + + expect(result.selected).toHaveLength(0); + }); + + test("falls back to wildcard scenario when exact route does not match", () => { + const wildcardScenario = "PreToolUse|flow-verification|uiRender|Bash|*"; + const rulebook = makeRulebook([ + makeRule({ + scenario: wildcardScenario, + routeScope: "*", + id: `${wildcardScenario}::verification->agent-browser-verify`, + }), + ]); + saveCompanionRulebook(PROJECT, rulebook); + + const result = recallVerifiedCompanions({ + projectRoot: PROJECT, + scenario: { + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "uiRender", + toolName: "Bash", + routeScope: "/some-other-route", + }, + candidateSkills: ["verification"], + excludeSkills: new Set(), + maxCompanions: 1, + }); + + expect(result.selected).toHaveLength(1); + expect(result.selected[0].companionSkill).toBe("agent-browser-verify"); + }); +}); diff --git a/tests/pretooluse-playbook-recall.test.ts b/tests/pretooluse-playbook-recall.test.ts new file mode 100644 index 0000000..0984c2e --- /dev/null +++ b/tests/pretooluse-playbook-recall.test.ts @@ -0,0 +1,285 @@ +import { describe, expect, test } from "bun:test"; +import { applyVerifiedPlaybookInsertion, buildPlaybookExposureRoles, formatOutput } from "../hooks/src/pretooluse-skill-inject.mts"; + +describe("applyVerifiedPlaybookInsertion", () => { + test("splices ordered steps after anchor and emits verified-playbook reasons", () => { + const result = applyVerifiedPlaybookInsertion({ + rankedSkills: ["verification", "vercel-functions"], + matched: new Set(["verification", "vercel-functions"]), + injectedSkills: new Set(["workflow"]), + dedupOff: false, + forceSummarySkills: new Set(), + 
selection: { + anchorSkill: "verification", + insertedSkills: ["workflow", "agent-browser-verify"], + banner: "[vercel-plugin] Verified playbook applied", + }, + }); + + expect(result.rankedSkills).toEqual([ + "verification", + "workflow", + "agent-browser-verify", + "vercel-functions", + ]); + expect(result.reasons.workflow).toEqual({ + trigger: "verified-playbook", + reasonCode: "scenario-playbook-rulebook", + }); + expect(result.reasons["agent-browser-verify"]).toEqual({ + trigger: "verified-playbook", + reasonCode: "scenario-playbook-rulebook", + }); + expect([...result.forceSummarySkills]).toEqual(["workflow"]); + expect(result.banner).toBe("[vercel-plugin] Verified playbook applied"); + }); + + test("no-ops when anchor skill is absent", () => { + const result = applyVerifiedPlaybookInsertion({ + rankedSkills: ["vercel-functions"], + matched: new Set(["vercel-functions"]), + injectedSkills: new Set(), + dedupOff: false, + forceSummarySkills: new Set(), + selection: { + anchorSkill: "verification", + insertedSkills: ["workflow"], + banner: "[vercel-plugin] Verified playbook applied", + }, + }); + + expect(result.rankedSkills).toEqual(["vercel-functions"]); + expect(result.reasons).toEqual({}); + expect(result.banner).toBeNull(); + }); + + test("no-ops when selection is null", () => { + const result = applyVerifiedPlaybookInsertion({ + rankedSkills: ["verification", "vercel-functions"], + matched: new Set(["verification", "vercel-functions"]), + injectedSkills: new Set(), + dedupOff: false, + forceSummarySkills: new Set(), + selection: null, + }); + + expect(result.rankedSkills).toEqual(["verification", "vercel-functions"]); + expect(result.reasons).toEqual({}); + expect(result.banner).toBeNull(); + }); + + test("skips inserted skills already present in rankedSkills", () => { + const result = applyVerifiedPlaybookInsertion({ + rankedSkills: ["verification", "workflow", "vercel-functions"], + matched: new Set(["verification", "workflow", "vercel-functions"]), + 
injectedSkills: new Set(), + dedupOff: false, + forceSummarySkills: new Set(), + selection: { + anchorSkill: "verification", + insertedSkills: ["workflow", "agent-browser-verify"], + banner: null, + }, + }); + + // "workflow" already present, only "agent-browser-verify" is inserted + expect(result.rankedSkills).toEqual([ + "verification", + "agent-browser-verify", + "workflow", + "vercel-functions", + ]); + expect(result.reasons.workflow).toBeUndefined(); + expect(result.reasons["agent-browser-verify"]).toEqual({ + trigger: "verified-playbook", + reasonCode: "scenario-playbook-rulebook", + }); + }); + + test("does not mark deduped skills as forceSummary when dedupOff is true", () => { + const result = applyVerifiedPlaybookInsertion({ + rankedSkills: ["verification", "vercel-functions"], + matched: new Set(["verification", "vercel-functions"]), + injectedSkills: new Set(["workflow"]), + dedupOff: true, + forceSummarySkills: new Set(), + selection: { + anchorSkill: "verification", + insertedSkills: ["workflow"], + banner: null, + }, + }); + + expect(result.rankedSkills).toEqual([ + "verification", + "workflow", + "vercel-functions", + ]); + expect(result.forceSummarySkills.size).toBe(0); + }); +}); + +describe("buildPlaybookExposureRoles", () => { + test("marks anchor as candidate and inserted steps as context", () => { + const roles = buildPlaybookExposureRoles([ + "verification", + "workflow", + "agent-browser-verify", + ]); + expect(roles).toEqual([ + { skill: "verification", attributionRole: "candidate", candidateSkill: "verification" }, + { skill: "workflow", attributionRole: "context", candidateSkill: "verification" }, + { skill: "agent-browser-verify", attributionRole: "context", candidateSkill: "verification" }, + ]); + }); + + test("returns single candidate for solo anchor", () => { + const roles = buildPlaybookExposureRoles(["verification"]); + expect(roles).toEqual([ + { skill: "verification", attributionRole: "candidate", candidateSkill: "verification" }, 
+ ]); + }); + + test("returns empty array for empty input", () => { + expect(buildPlaybookExposureRoles([])).toEqual([]); + }); + + test("filters out empty strings", () => { + const roles = buildPlaybookExposureRoles(["", "verification", "", "workflow"]); + expect(roles).toEqual([ + { skill: "verification", attributionRole: "candidate", candidateSkill: "verification" }, + { skill: "workflow", attributionRole: "context", candidateSkill: "verification" }, + ]); + }); +}); + +// --------------------------------------------------------------------------- +// Hook-layer integration: banner/reason contract through formatOutput +// --------------------------------------------------------------------------- + +describe("hook-layer playbook banner/reason contract", () => { + /** + * Helper: extract the skillInjection metadata object from formatOutput's + * HTML comment embedded in the additionalContext. + */ + function extractMetadata(output: string): Record<string, unknown> | null { + const parsed = JSON.parse(output); + const ctx: string = + parsed.hookSpecificOutput?.additionalContext ??
""; + const match = ctx.match(//); + if (!match) return null; + return JSON.parse(match[1]) as Record; + } + + test("playbook banner appears exactly once in additionalContext", () => { + const playbookApply = applyVerifiedPlaybookInsertion({ + rankedSkills: ["verification", "vercel-functions"], + matched: new Set(["verification", "vercel-functions"]), + injectedSkills: new Set(), + dedupOff: false, + forceSummarySkills: new Set(), + selection: { + anchorSkill: "verification", + insertedSkills: ["workflow"], + banner: "[vercel-plugin] Verified playbook applied", + }, + }); + + // Simulate run() wiring: prepend banner to parts + const parts = ["skill-body-placeholder"]; + if (playbookApply.banner) { + parts.unshift(playbookApply.banner); + } + + const output = formatOutput({ + parts, + matched: playbookApply.matched, + injectedSkills: ["verification", "workflow", "vercel-functions"], + droppedByCap: [], + toolName: "Bash", + toolTarget: "npm run dev", + reasons: playbookApply.reasons, + }); + + const parsed = JSON.parse(output); + const ctx: string = parsed.hookSpecificOutput?.additionalContext ?? 
""; + + // Banner appears exactly once + const bannerCount = ctx.split("[vercel-plugin] Verified playbook applied").length - 1; + expect(bannerCount).toBe(1); + }); + + test("metadata exposes trigger and reasonCode for each playbook-inserted skill", () => { + const playbookApply = applyVerifiedPlaybookInsertion({ + rankedSkills: ["verification", "vercel-functions"], + matched: new Set(["verification", "vercel-functions"]), + injectedSkills: new Set(), + dedupOff: false, + forceSummarySkills: new Set(), + selection: { + anchorSkill: "verification", + insertedSkills: ["workflow", "agent-browser-verify"], + banner: "[vercel-plugin] Verified playbook applied", + }, + }); + + const parts = ["body"]; + if (playbookApply.banner) parts.unshift(playbookApply.banner); + + const output = formatOutput({ + parts, + matched: playbookApply.matched, + injectedSkills: ["verification", "workflow", "agent-browser-verify", "vercel-functions"], + droppedByCap: [], + toolName: "Bash", + toolTarget: "npm run dev", + reasons: playbookApply.reasons, + }); + + const meta = extractMetadata(output); + expect(meta).not.toBeNull(); + const reasons = meta!.reasons as Record; + expect(reasons.workflow).toEqual({ + trigger: "verified-playbook", + reasonCode: "scenario-playbook-rulebook", + }); + expect(reasons["agent-browser-verify"]).toEqual({ + trigger: "verified-playbook", + reasonCode: "scenario-playbook-rulebook", + }); + }); + + test("no playbook reasons or banner in metadata when selection is null", () => { + const playbookApply = applyVerifiedPlaybookInsertion({ + rankedSkills: ["vercel-functions"], + matched: new Set(["vercel-functions"]), + injectedSkills: new Set(), + dedupOff: false, + forceSummarySkills: new Set(), + selection: null, + }); + + const parts = ["body"]; + // No banner to prepend + expect(playbookApply.banner).toBeNull(); + + const output = formatOutput({ + parts, + matched: playbookApply.matched, + injectedSkills: ["vercel-functions"], + droppedByCap: [], + toolName: 
"Read", + toolTarget: "src/app.tsx", + reasons: playbookApply.reasons, + }); + + const meta = extractMetadata(output); + expect(meta).not.toBeNull(); + // No reasons key when reasons is empty + expect(meta!.reasons).toBeUndefined(); + + const parsed = JSON.parse(output); + const ctx: string = parsed.hookSpecificOutput?.additionalContext ?? ""; + expect(ctx).not.toContain("Verified playbook"); + }); +}); diff --git a/tests/pretooluse-policy-recall.test.ts b/tests/pretooluse-policy-recall.test.ts new file mode 100644 index 0000000..3f3fa17 --- /dev/null +++ b/tests/pretooluse-policy-recall.test.ts @@ -0,0 +1,648 @@ +import { describe, test, expect, beforeEach, afterEach } from "bun:test"; +import { writeFileSync, mkdirSync, rmSync, unlinkSync } from "node:fs"; +import { join, resolve } from "node:path"; +import { + createEmptyRoutingPolicy, + type RoutingPolicyFile, +} from "../hooks/src/routing-policy.mts"; +import { + projectPolicyPath, + saveProjectRoutingPolicy, +} from "../hooks/src/routing-policy-ledger.mts"; +import { + statePath as verificationStatePath, +} from "../hooks/src/verification-ledger.mts"; +import { + readRoutingDecisionTrace, + traceDir, +} from "../hooks/src/routing-decision-trace.mts"; +import { + tryClaimSessionKey, + dedupClaimDirPath, +} from "../hooks/src/hook-env.mts"; + +// --------------------------------------------------------------------------- +// Constants +// --------------------------------------------------------------------------- + +const ROOT = resolve(import.meta.dirname, ".."); +const HOOK_SCRIPT = join(ROOT, "hooks", "pretooluse-skill-inject.mjs"); +const TEST_PROJECT = "/tmp/test-pretooluse-policy-recall-" + Date.now(); + +const T0 = "2026-03-27T19:00:00.000Z"; +const T1 = "2026-03-27T19:01:00.000Z"; + +// --------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + +/** Write a mock verification plan so 
loadCachedPlanResult returns a story. */ +function writeMockPlanState(sessionId: string, opts?: { + storyKind?: string; + route?: string | null; + targetBoundary?: string; +}): void { + const sp = verificationStatePath(sessionId); + mkdirSync(join(sp, ".."), { recursive: true }); + writeFileSync(sp, JSON.stringify({ + version: 1, + stories: [{ + id: "recall-story-1", + kind: opts?.storyKind ?? "flow-verification", + route: opts?.route ?? "/settings", + promptExcerpt: "test policy recall", + createdAt: T0, + updatedAt: T1, + requestedSkills: [], + }], + observationIds: [], + satisfiedBoundaries: [], + missingBoundaries: [opts?.targetBoundary ?? "clientRequest"], + recentRoutes: [], + primaryNextAction: { + targetBoundary: opts?.targetBoundary ?? "clientRequest", + suggestedAction: "curl http://localhost:3000/settings", + }, + blockedReasons: [], + })); +} + +function cleanupPlanState(sessionId: string): void { + const sp = verificationStatePath(sessionId); + try { rmSync(join(sp, ".."), { recursive: true, force: true }); } catch {} +} + +function cleanupPolicyFile(): void { + try { unlinkSync(projectPolicyPath(TEST_PROJECT)); } catch {} +} + +/** Build a policy with a strong verified winner for a given scenario key. */ +function buildStrongPolicy( + skill: string, + scenarioKey: string, + overrides?: Partial<{ exposures: number; wins: number; directiveWins: number; staleMisses: number }>, +): RoutingPolicyFile { + const policy = createEmptyRoutingPolicy(); + policy.scenarios[scenarioKey] = { + [skill]: { + exposures: overrides?.exposures ?? 5, + wins: overrides?.wins ?? 5, + directiveWins: overrides?.directiveWins ?? 2, + staleMisses: overrides?.staleMisses ?? 0, + lastUpdatedAt: T0, + }, + }; + return policy; +} + +/** Run PreToolUse hook as subprocess with cwd pointing to TEST_PROJECT. 
*/ +async function runHook( + toolName: string, + toolInput: Record<string, unknown>, + sessionId: string, + extraEnv?: Record<string, string>, +): Promise<{ code: number; stdout: string; stderr: string; parsed: Record<string, unknown> | null }> { + const payload = JSON.stringify({ + tool_name: toolName, + tool_input: toolInput, + session_id: sessionId, + cwd: TEST_PROJECT, + }); + const proc = Bun.spawn(["node", HOOK_SCRIPT], { + stdin: "pipe", + stdout: "pipe", + stderr: "pipe", + env: { + ...process.env, + VERCEL_PLUGIN_SEEN_SKILLS: "", + VERCEL_PLUGIN_LOG_LEVEL: "debug", + ...extraEnv, + }, + }); + proc.stdin.write(payload); + proc.stdin.end(); + const code = await proc.exited; + const stdout = await new Response(proc.stdout).text(); + const stderr = await new Response(proc.stderr).text(); + let parsed: Record<string, unknown> | null = null; + try { parsed = JSON.parse(stdout); } catch {} + return { code, stdout, stderr, parsed }; +} + +/** Extract skillInjection metadata from hook output. */ +function extractInjectionMeta(stdout: string): { + matchedSkills: string[]; + injectedSkills: string[]; + reasons: Record<string, { trigger: string; reasonCode: string }>; +} | null { + try { + const output = JSON.parse(stdout); + const ctx = output.hookSpecificOutput?.additionalContext || ""; + const match = ctx.match(//); + if (!match) return null; + return JSON.parse(match[1]); + } catch { + return null; + } +} + +/** Parse structured stderr log lines.
*/ +function parseLogLines(stderr: string): Array<Record<string, unknown>> { + return stderr + .split("\n") + .filter((l) => l.trim()) + .map((l) => { try { return JSON.parse(l); } catch { return null; } }) + .filter((o): o is Record<string, unknown> => o !== null); +} + +// --------------------------------------------------------------------------- +// Tests +// --------------------------------------------------------------------------- + +describe("pretooluse policy recall integration", () => { + let sessionId: string; + + beforeEach(() => { + sessionId = `recall-test-${Date.now()}-${Math.random().toString(36).slice(2)}`; + mkdirSync(TEST_PROJECT, { recursive: true }); + }); + + afterEach(() => { + cleanupPlanState(sessionId); + cleanupPolicyFile(); + try { rmSync(traceDir(sessionId), { recursive: true, force: true }); } catch {} + // Clean up file-based dedup claims + try { rmSync(dedupClaimDirPath(sessionId, "seen-skills"), { recursive: true, force: true }); } catch {} + }); + + test("recalls a verified winner when pattern matching misses it", async () => { + // Seed plan state with story route /settings and target boundary clientRequest + writeMockPlanState(sessionId, { + storyKind: "flow-verification", + route: "/settings", + targetBoundary: "clientRequest", + }); + + // Seed policy with strong exact-route bucket for "verification" skill + const policy = buildStrongPolicy( + "verification", + "PreToolUse|flow-verification|clientRequest|Bash|/settings", + ); + saveProjectRoutingPolicy(TEST_PROJECT, policy); + + // Run PreToolUse with a command that won't pattern-match "verification" + const result = await runHook( + "Bash", + { command: "echo hello" }, + sessionId, + ); + + expect(result.code).toBe(0); + + // Check debug logs for policy-recall-injected + const logs = parseLogLines(result.stderr); + const recallLog = logs.find((l) => l.event === "policy-recall-injected"); + expect(recallLog).toBeDefined(); + 
expect(recallLog!.skill).toBe("verification"); +
expect(recallLog!.scenario).toBe("PreToolUse|flow-verification|clientRequest|Bash|/settings"); + + // Check injection metadata + const meta = extractInjectionMeta(result.stdout); + if (meta) { + expect(meta.injectedSkills).toContain("verification"); + expect(meta.reasons?.verification?.trigger).toBe("policy-recall"); + expect(meta.reasons?.verification?.reasonCode).toBe("route-scoped-verified-policy-recall"); + } + }); + + test("recalled skill is NOT forced into summary-only mode (summary path is identical to full)", async () => { + writeMockPlanState(sessionId, { + storyKind: "flow-verification", + route: "/settings", + targetBoundary: "clientRequest", + }); + + const policy = buildStrongPolicy( + "verification", + "PreToolUse|flow-verification|clientRequest|Bash|/settings", + ); + saveProjectRoutingPolicy(TEST_PROJECT, policy); + + const result = await runHook( + "Bash", + { command: "echo hello" }, + sessionId, + ); + expect(result.code).toBe(0); + + // Recalled skill should NOT be marked summary-only since the summary and + // full payloads are identical (both use skillInvocationMessage). 
+ const traces = readRoutingDecisionTrace(sessionId); + if (traces.length > 0) { + const recallEntry = traces[0].ranked?.find( + (r: { skill: string }) => r.skill === "verification", + ); + if (recallEntry) { + expect(recallEntry.summaryOnly).toBe(false); + expect(recallEntry.synthetic).toBe(true); + } + } + }); + + test("decision trace marks recalled skill as synthetic with policy-recall reason", async () => { + writeMockPlanState(sessionId, { + storyKind: "flow-verification", + route: "/settings", + targetBoundary: "clientRequest", + }); + + const policy = buildStrongPolicy( + "verification", + "PreToolUse|flow-verification|clientRequest|Bash|/settings", + ); + saveProjectRoutingPolicy(TEST_PROJECT, policy); + + const result = await runHook( + "Bash", + { command: "echo hello" }, + sessionId, + ); + expect(result.code).toBe(0); + + const traces = readRoutingDecisionTrace(sessionId); + expect(traces.length).toBeGreaterThanOrEqual(1); + + const recallEntry = traces[0].ranked?.find( + (r: { skill: string }) => r.skill === "verification", + ); + expect(recallEntry).toBeDefined(); + expect(recallEntry!.synthetic).toBe(true); + expect(recallEntry!.pattern?.type).toBe("policy-recall"); + expect(recallEntry!.pattern?.value).toBe("route-scoped-verified-policy-recall"); + }); + + test("does not recall when no active verification story exists", async () => { + // No writeMockPlanState — no story + + const policy = buildStrongPolicy( + "verification", + "PreToolUse|flow-verification|clientRequest|Bash|/settings", + ); + saveProjectRoutingPolicy(TEST_PROJECT, policy); + + const result = await runHook( + "Bash", + { command: "echo hello" }, + sessionId, + ); + expect(result.code).toBe(0); + + const logs = parseLogLines(result.stderr); + const recallLog = logs.find((l) => l.event === "policy-recall-injected"); + expect(recallLog).toBeUndefined(); + + const skipLog = logs.find((l) => l.event === "policy-recall-skipped"); + expect(skipLog).toBeDefined(); + }); + + test("does not 
recall when target boundary is null", async () => { + // Plan state with no target boundary + const sp = verificationStatePath(sessionId); + mkdirSync(join(sp, ".."), { recursive: true }); + writeFileSync(sp, JSON.stringify({ + version: 1, + stories: [{ + id: "no-boundary-story", + kind: "flow-verification", + route: "/settings", + promptExcerpt: "test", + createdAt: T0, + updatedAt: T1, + requestedSkills: [], + }], + observationIds: [], + satisfiedBoundaries: [], + missingBoundaries: [], + recentRoutes: [], + primaryNextAction: { targetBoundary: null, suggestedAction: null }, + blockedReasons: [], + })); + + const policy = buildStrongPolicy( + "verification", + "PreToolUse|flow-verification|clientRequest|Bash|/settings", + ); + saveProjectRoutingPolicy(TEST_PROJECT, policy); + + const result = await runHook( + "Bash", + { command: "echo hello" }, + sessionId, + ); + expect(result.code).toBe(0); + + const logs = parseLogLines(result.stderr); + const recallLog = logs.find((l) => l.event === "policy-recall-injected"); + expect(recallLog).toBeUndefined(); + }); + + test("does not recall a skill that is already ranked via pattern matching", async () => { + writeMockPlanState(sessionId, { + storyKind: "flow-verification", + route: "/settings", + targetBoundary: "clientRequest", + }); + + // Build policy for "nextjs" which will be pattern-matched by next.config.ts + const policy = buildStrongPolicy( + "nextjs", + "PreToolUse|flow-verification|clientRequest|Read|/settings", + ); + saveProjectRoutingPolicy(TEST_PROJECT, policy); + + // Run with a file read that will pattern-match nextjs + const result = await runHook( + "Read", + { file_path: "next.config.ts" }, + sessionId, + ); + expect(result.code).toBe(0); + + // Should not appear as policy-recall since it was already matched via patterns + const logs = parseLogLines(result.stderr); + const recallLog = logs.find( + (l) => l.event === "policy-recall-injected" && l.skill === "nextjs", + ); + 
expect(recallLog).toBeUndefined(); + }); + + test("does not recall a skill that is already in injectedSkills (dedup)", async () => { + writeMockPlanState(sessionId, { + storyKind: "flow-verification", + route: "/settings", + targetBoundary: "clientRequest", + }); + + const policy = buildStrongPolicy( + "verification", + "PreToolUse|flow-verification|clientRequest|Bash|/settings", + ); + saveProjectRoutingPolicy(TEST_PROJECT, policy); + + // Claim verification via file-based dedup (the authoritative source when sessionId is present) + tryClaimSessionKey(sessionId, "seen-skills", "verification"); + + const result = await runHook( + "Bash", + { command: "echo hello" }, + sessionId, + ); + expect(result.code).toBe(0); + + const logs = parseLogLines(result.stderr); + const recallLog = logs.find((l) => l.event === "policy-recall-injected"); + expect(recallLog).toBeUndefined(); + }); + + test("respects existing cap and budget behavior", async () => { + writeMockPlanState(sessionId, { + storyKind: "flow-verification", + route: "/settings", + targetBoundary: "clientRequest", + }); + + const policy = buildStrongPolicy( + "verification", + "PreToolUse|flow-verification|clientRequest|Bash|/settings", + ); + saveProjectRoutingPolicy(TEST_PROJECT, policy); + + // Use a very small budget to force budget exhaustion + const result = await runHook( + "Bash", + { command: "echo hello" }, + sessionId, + { VERCEL_PLUGIN_INJECTION_BUDGET: "100" }, + ); + expect(result.code).toBe(0); + + // The recalled skill should be attempted but may be dropped by budget + // Key assertion: the hook does not crash and respects budget + const logs = parseLogLines(result.stderr); + const recallLog = logs.find((l) => l.event === "policy-recall-injected"); + // It should still attempt to inject + expect(recallLog).toBeDefined(); + }); + + test("falls back to wildcard route when exact route has no qualified evidence", async () => { + writeMockPlanState(sessionId, { + storyKind: "flow-verification", + 
route: "/settings", + targetBoundary: "clientRequest", + }); + + // No exact-route bucket; only wildcard + const policy = buildStrongPolicy( + "verification", + "PreToolUse|flow-verification|clientRequest|Bash|*", + ); + saveProjectRoutingPolicy(TEST_PROJECT, policy); + + const result = await runHook( + "Bash", + { command: "echo hello" }, + sessionId, + ); + expect(result.code).toBe(0); + + const logs = parseLogLines(result.stderr); + const recallLog = logs.find((l) => l.event === "policy-recall-injected"); + expect(recallLog).toBeDefined(); + expect(recallLog!.scenario).toBe("PreToolUse|flow-verification|clientRequest|Bash|*"); + }); + + test("recalled skill is inserted behind direct match, not at slot-1", async () => { + writeMockPlanState(sessionId, { + storyKind: "flow-verification", + route: "/settings", + targetBoundary: "clientRequest", + }); + + // Seed policy for "verification" which won't pattern-match "next.config.ts" + const policy = buildStrongPolicy( + "verification", + "PreToolUse|flow-verification|clientRequest|Read|/settings", + ); + saveProjectRoutingPolicy(TEST_PROJECT, policy); + + // Read next.config.ts — will pattern-match "nextjs" as a direct match + const result = await runHook( + "Read", + { file_path: "next.config.ts" }, + sessionId, + ); + expect(result.code).toBe(0); + + // Check injection metadata: direct match should be first, recalled second + const meta = extractInjectionMeta(result.stdout); + if (meta && meta.injectedSkills.length >= 2) { + // The direct pattern match should remain first + expect(meta.reasons?.[meta.injectedSkills[0]]?.trigger).not.toBe("policy-recall"); + // Verification should be present but not first + const verificationIdx = meta.injectedSkills.indexOf("verification"); + if (verificationIdx !== -1) { + expect(verificationIdx).toBeGreaterThan(0); + } + } + + // Check debug logs confirm insertionIndex > 0 + const logs = parseLogLines(result.stderr); + const recallLog = logs.find((l) => l.event === 
"policy-recall-injected" && l.skill === "verification"); + if (recallLog) { + expect(recallLog.insertionIndex).toBeGreaterThan(0); + } + }); + + test("recalled skill takes slot-1 when no direct matches exist", async () => { + writeMockPlanState(sessionId, { + storyKind: "flow-verification", + route: "/settings", + targetBoundary: "clientRequest", + }); + + const policy = buildStrongPolicy( + "verification", + "PreToolUse|flow-verification|clientRequest|Bash|/settings", + ); + saveProjectRoutingPolicy(TEST_PROJECT, policy); + + // echo hello won't pattern-match anything + const result = await runHook( + "Bash", + { command: "echo hello" }, + sessionId, + ); + expect(result.code).toBe(0); + + const logs = parseLogLines(result.stderr); + const recallLog = logs.find((l) => l.event === "policy-recall-injected" && l.skill === "verification"); + expect(recallLog).toBeDefined(); + expect(recallLog!.insertionIndex).toBe(0); + }); + + // --------------------------------------------------------------------------- + // Regression: policy recall parity with companion recall + // --------------------------------------------------------------------------- + + test("policy recall still owns attribution when both policy and companion recall are present", async () => { + writeMockPlanState(sessionId, { + storyKind: "flow-verification", + route: "/settings", + targetBoundary: "clientRequest", + }); + + // Strong policy for "verification" + const policy = buildStrongPolicy( + "verification", + "PreToolUse|flow-verification|clientRequest|Bash|/settings", + ); + saveProjectRoutingPolicy(TEST_PROJECT, policy); + + const result = await runHook( + "Bash", + { command: "echo hello" }, + sessionId, + ); + expect(result.code).toBe(0); + + // Policy recall should still be logged + const logs = parseLogLines(result.stderr); + const recallLog = logs.find((l) => l.event === "policy-recall-injected"); + expect(recallLog).toBeDefined(); + expect(recallLog!.skill).toBe("verification"); + + // 
Attribution candidate should be the policy-recalled skill or direct match, + // never a companion-only skill + const attributionLog = logs.find((l) => l.event === "companion-recall-attribution"); + if (attributionLog) { + // companionRecalledSkills should NOT include the attribution candidate + const companionRecalled = attributionLog.companionRecalledSkills as string[]; + expect(companionRecalled).not.toContain(attributionLog.causalCandidate); + } + }); + + test("companion recall never suppresses a stronger direct pattern match", async () => { + writeMockPlanState(sessionId, { + storyKind: "flow-verification", + route: "/settings", + targetBoundary: "clientRequest", + }); + + // Policy for "verification" — strong history + const policy = buildStrongPolicy( + "verification", + "PreToolUse|flow-verification|clientRequest|Read|/settings", + ); + saveProjectRoutingPolicy(TEST_PROJECT, policy); + + // Read next.config.ts — "nextjs" will be a direct pattern match + const result = await runHook( + "Read", + { file_path: "next.config.ts" }, + sessionId, + ); + expect(result.code).toBe(0); + + const meta = extractInjectionMeta(result.stdout); + if (meta && meta.injectedSkills.length > 0) { + // The first injected skill must be the direct pattern match, not a companion + const firstSkillReason = meta.reasons?.[meta.injectedSkills[0]]; + if (firstSkillReason) { + expect(firstSkillReason.trigger).not.toBe("verified-companion"); + } + } + + // Decision trace should show direct match before any companion entries + const traces = readRoutingDecisionTrace(sessionId); + if (traces.length > 0 && traces[0].ranked && traces[0].ranked.length > 0) { + const first = traces[0].ranked[0] as { pattern?: { type: string } }; + if (first.pattern) { + expect(first.pattern.type).not.toBe("verified-companion"); + } + } + }); + + test("at most one recalled skill in phase 1", async () => { + writeMockPlanState(sessionId, { + storyKind: "flow-verification", + route: "/settings", + targetBoundary: 
"clientRequest", + }); + + // Two strong skills in the same bucket + const policy = createEmptyRoutingPolicy(); + const key = "PreToolUse|flow-verification|clientRequest|Bash|/settings"; + policy.scenarios[key] = { + verification: { + exposures: 5, wins: 5, directiveWins: 2, staleMisses: 0, + lastUpdatedAt: T0, + }, + "agent-browser-verify": { + exposures: 5, wins: 5, directiveWins: 3, staleMisses: 0, + lastUpdatedAt: T0, + }, + }; + saveProjectRoutingPolicy(TEST_PROJECT, policy); + + const result = await runHook( + "Bash", + { command: "echo hello" }, + sessionId, + ); + expect(result.code).toBe(0); + + const logs = parseLogLines(result.stderr); + const recallLogs = logs.filter((l) => l.event === "policy-recall-injected"); + expect(recallLogs.length).toBeLessThanOrEqual(1); + }); +}); diff --git a/tests/pretooluse-routing-policy-integration.test.ts b/tests/pretooluse-routing-policy-integration.test.ts new file mode 100644 index 0000000..3db423a --- /dev/null +++ b/tests/pretooluse-routing-policy-integration.test.ts @@ -0,0 +1,1002 @@ +import { describe, test, expect, beforeEach, afterEach } from "bun:test"; +import { writeFileSync, unlinkSync, existsSync, readFileSync, mkdirSync, rmSync } from "node:fs"; +import { tmpdir } from "node:os"; +import { join, resolve } from "node:path"; +import { createHash } from "node:crypto"; +import { + deduplicateSkills, + type DeduplicateResult, +} from "../hooks/src/pretooluse-skill-inject.mts"; +import { + readRoutingDecisionTrace, + traceDir, +} from "../hooks/src/routing-decision-trace.mts"; +import { + createEmptyRoutingPolicy, + recordExposure, + recordOutcome, + type RoutingPolicyFile, +} from "../hooks/src/routing-policy.mts"; +import { + projectPolicyPath, + sessionExposurePath, + loadSessionExposures, + saveProjectRoutingPolicy, +} from "../hooks/src/routing-policy-ledger.mts"; +import { + statePath as verificationStatePath, +} from "../hooks/src/verification-ledger.mts"; +import { + saveRulebook, + rulebookPath, + 
createRule, + createEmptyRulebook, + type LearnedRoutingRulebook, +} from "../hooks/src/learned-routing-rulebook.mts"; +import type { CompiledSkillEntry } from "../hooks/src/patterns.mts"; + +// --------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + +const TEST_PROJECT = "/tmp/test-pretooluse-routing-policy-" + Date.now(); +const TEST_SESSION = "test-session-ptrp-" + Date.now(); + +const T0 = "2026-03-27T04:00:00.000Z"; +const T1 = "2026-03-27T04:01:00.000Z"; +const T2 = "2026-03-27T04:02:00.000Z"; +const T3 = "2026-03-27T04:03:00.000Z"; + +function makeEntry(skill: string, priority: number): CompiledSkillEntry { + return { + skill, + priority, + compiledPaths: [], + compiledBash: [], + compiledImports: [], + }; +} + +function cleanupPolicyFile(): void { + const path = projectPolicyPath(TEST_PROJECT); + try { unlinkSync(path); } catch {} +} + +function cleanupExposureFile(): void { + const path = sessionExposurePath(TEST_SESSION); + try { unlinkSync(path); } catch {} +} + +function cleanupRulebookFile(): void { + const path = rulebookPath(TEST_PROJECT); + try { unlinkSync(path); } catch {} +} + +/** Write a minimal mock verification plan state so loadCachedPlanResult returns a story. */ +function writeMockPlanState(sessionId: string, story?: { + id?: string; + kind?: string; + route?: string | null; + updatedAt?: string; +}): void { + const sp = verificationStatePath(sessionId); + mkdirSync(join(sp, ".."), { recursive: true }); + const s = { + id: story?.id ?? "test-story-1", + kind: story?.kind ?? "deployment", + route: story?.route ?? "/api/test", + promptExcerpt: "test prompt", + createdAt: T0, + updatedAt: story?.updatedAt ?? 
T1,
+    requestedSkills: [],
+  };
+  writeFileSync(sp, JSON.stringify({
+    version: 1,
+    stories: [s],
+    observationIds: [],
+    satisfiedBoundaries: [],
+    missingBoundaries: ["clientRequest"],
+    recentRoutes: [],
+    primaryNextAction: { targetBoundary: "clientRequest", suggestedAction: "curl test" },
+    blockedReasons: [],
+  }));
+}
+
+function cleanupMockPlanState(sessionId: string): void {
+  const sp = verificationStatePath(sessionId);
+  try { rmSync(join(sp, ".."), { recursive: true, force: true }); } catch {}
+}
+
+function buildPolicyWithHistory(
+  skill: string,
+  exposures: number,
+  wins: number,
+  directiveWins: number,
+  staleMisses: number,
+  scenarioKey?: string,
+): RoutingPolicyFile {
+  const policy = createEmptyRoutingPolicy();
+  const scenario = scenarioKey ?? "PreToolUse|deployment|clientRequest|Bash";
+  policy.scenarios[scenario] = {
+    [skill]: {
+      exposures,
+      wins,
+      directiveWins,
+      staleMisses,
+      lastUpdatedAt: T0,
+    },
+  };
+  return policy;
+}
+
+// ---------------------------------------------------------------------------
+// Setup / teardown
+// ---------------------------------------------------------------------------
+
+beforeEach(() => {
+  cleanupPolicyFile();
+  cleanupExposureFile();
+  cleanupRulebookFile();
+  writeMockPlanState(TEST_SESSION);
+});
+
+afterEach(() => {
+  cleanupPolicyFile();
+  cleanupExposureFile();
+  cleanupRulebookFile();
+  cleanupMockPlanState(TEST_SESSION);
+});
+
+// ---------------------------------------------------------------------------
+// Tests
+// ---------------------------------------------------------------------------
+
+describe("pretooluse routing-policy integration", () => {
+  test("DeduplicateResult includes policyBoosted array", () => {
+    const entries = [makeEntry("agent-browser-verify", 7), makeEntry("next-config", 6)];
+    const result = deduplicateSkills({
+      matchedEntries: entries,
+      matched: new Set(["agent-browser-verify", "next-config"]),
+      toolName: "Bash",
+      toolInput: { command: "next dev" },
+      injectedSkills: new Set(),
+      dedupOff: false,
+      cwd: TEST_PROJECT,
+      sessionId: TEST_SESSION,
+    });
+
+    expect(result).toHaveProperty("policyBoosted");
+    expect(Array.isArray(result.policyBoosted)).toBe(true);
+  });
+
+  test("policyBoosted is empty when no policy file exists", () => {
+    const entries = [makeEntry("agent-browser-verify", 7)];
+    const result = deduplicateSkills({
+      matchedEntries: entries,
+      matched: new Set(["agent-browser-verify"]),
+      toolName: "Bash",
+      toolInput: { command: "next dev" },
+      injectedSkills: new Set(),
+      dedupOff: false,
+      cwd: TEST_PROJECT,
+      sessionId: TEST_SESSION,
+    });
+
+    expect(result.policyBoosted).toEqual([]);
+  });
+
+  test("applies positive policy boost when skill has high success rate", () => {
+    // 5 exposures, 4 wins, 3 directive wins => successRate = (4 + 3*0.25)/5 = 0.95 => +8 boost
+    const policy = buildPolicyWithHistory("agent-browser-verify", 5, 4, 3, 0);
+    saveProjectRoutingPolicy(TEST_PROJECT, policy);
+
+    const entries = [makeEntry("agent-browser-verify", 7), makeEntry("next-config", 8)];
+    const result = deduplicateSkills({
+      matchedEntries: entries,
+      matched: new Set(["agent-browser-verify", "next-config"]),
+      toolName: "Bash",
+      toolInput: { command: "next dev" },
+      injectedSkills: new Set(),
+      dedupOff: false,
+      cwd: TEST_PROJECT,
+      sessionId: TEST_SESSION,
+    });
+
+    // agent-browser-verify should be boosted: 7 + 8 = 15 > next-config's 8
+    expect(result.policyBoosted.length).toBe(1);
+    expect(result.policyBoosted[0].skill).toBe("agent-browser-verify");
+    expect(result.policyBoosted[0].boost).toBe(8);
+
+    // Should be ranked first now despite lower base priority
+    expect(result.rankedSkills[0]).toBe("agent-browser-verify");
+  });
+
+  test("applies negative policy boost for low success rate", () => {
+    // 6 exposures, 0 wins => successRate = 0 < 0.15 => -2 boost
+    const policy = buildPolicyWithHistory("low-success-skill", 6, 0, 0, 5);
+    saveProjectRoutingPolicy(TEST_PROJECT, policy);
+
+    const entries = [makeEntry("low-success-skill", 7)];
+    const result = deduplicateSkills({
+      matchedEntries: entries,
+      matched: new Set(["low-success-skill"]),
+      toolName: "Bash",
+      toolInput: { command: "test" },
+      injectedSkills: new Set(),
+      dedupOff: false,
+      cwd: TEST_PROJECT,
+      sessionId: TEST_SESSION,
+    });
+
+    expect(result.policyBoosted.length).toBe(1);
+    expect(result.policyBoosted[0].boost).toBe(-2);
+  });
+
+  test("no boost applied when exposures below threshold", () => {
+    // 2 exposures, 2 wins => below min-sample threshold of 3
+    const policy = buildPolicyWithHistory("new-skill", 2, 2, 0, 0);
+    saveProjectRoutingPolicy(TEST_PROJECT, policy);
+
+    const entries = [makeEntry("new-skill", 7)];
+    const result = deduplicateSkills({
+      matchedEntries: entries,
+      matched: new Set(["new-skill"]),
+      toolName: "Bash",
+      toolInput: { command: "test" },
+      injectedSkills: new Set(),
+      dedupOff: false,
+      cwd: TEST_PROJECT,
+      sessionId: TEST_SESSION,
+    });
+
+    expect(result.policyBoosted).toEqual([]);
+  });
+
+  test("policy boost does not mutate persisted policy file", () => {
+    const policy = buildPolicyWithHistory("agent-browser-verify", 5, 4, 3, 0);
+    saveProjectRoutingPolicy(TEST_PROJECT, policy);
+
+    const before = readFileSync(projectPolicyPath(TEST_PROJECT), "utf-8");
+
+    deduplicateSkills({
+      matchedEntries: [makeEntry("agent-browser-verify", 7)],
+      matched: new Set(["agent-browser-verify"]),
+      toolName: "Bash",
+      toolInput: { command: "next dev" },
+      injectedSkills: new Set(),
+      dedupOff: false,
+      cwd: TEST_PROJECT,
+      sessionId: TEST_SESSION,
+    });
+
+    const after = readFileSync(projectPolicyPath(TEST_PROJECT), "utf-8");
+    expect(after).toBe(before);
+  });
+
+  test("deterministic ordering when policy scores tie", () => {
+    // Both skills get no boost (no policy data) — ranking is by base priority then skill name
+    const entries = [makeEntry("skill-b", 7), makeEntry("skill-a", 7)];
+    const result = deduplicateSkills({
+      matchedEntries: entries,
+      matched: new Set(["skill-b", "skill-a"]),
+      toolName: "Bash",
+      toolInput: { command: "test" },
+      injectedSkills: new Set(),
+      dedupOff: false,
+      cwd: TEST_PROJECT,
+      sessionId: TEST_SESSION,
+    });
+
+    // skill-a should come first (alphabetical tiebreak)
+    expect(result.rankedSkills).toEqual(["skill-a", "skill-b"]);
+  });
+
+  test("existing profiler and setup-mode boosts remain intact alongside policy boosts", () => {
+    // Policy boosts agent-browser-verify, profiler boosts next-config
+    const policy = buildPolicyWithHistory("agent-browser-verify", 5, 4, 3, 0);
+    saveProjectRoutingPolicy(TEST_PROJECT, policy);
+
+    const entries = [makeEntry("agent-browser-verify", 5), makeEntry("next-config", 5)];
+    const likelySkills = new Set(["next-config"]);
+    const result = deduplicateSkills({
+      matchedEntries: entries,
+      matched: new Set(["agent-browser-verify", "next-config"]),
+      toolName: "Bash",
+      toolInput: { command: "next dev" },
+      injectedSkills: new Set(),
+      dedupOff: false,
+      likelySkills,
+      cwd: TEST_PROJECT,
+      sessionId: TEST_SESSION,
+    });
+
+    // Both should be boosted
+    expect(result.profilerBoosted).toContain("next-config");
+    expect(result.policyBoosted.some((p) => p.skill === "agent-browser-verify")).toBe(true);
+
+    // agent-browser-verify: 5 + 8 (policy) = 13
+    // next-config: 5 + 5 (profiler) = 10
+    expect(result.rankedSkills[0]).toBe("agent-browser-verify");
+  });
+
+  test("policyBoosted contains reason string with scenario stats", () => {
+    const policy = buildPolicyWithHistory("agent-browser-verify", 5, 4, 3, 1);
+    saveProjectRoutingPolicy(TEST_PROJECT, policy);
+
+    const entries = [makeEntry("agent-browser-verify", 7)];
+    const result = deduplicateSkills({
+      matchedEntries: entries,
+      matched: new Set(["agent-browser-verify"]),
+      toolName: "Bash",
+      toolInput: { command: "next dev" },
+      injectedSkills: new Set(),
+      dedupOff: false,
+      cwd: TEST_PROJECT,
+      sessionId: TEST_SESSION,
+    });
+
+    expect(result.policyBoosted[0].reason).toContain("4 wins");
+    expect(result.policyBoosted[0].reason).toContain("5 exposures");
+    expect(result.policyBoosted[0].reason).toContain("3 directive wins");
+    expect(result.policyBoosted[0].reason).toContain("1 stale miss");
+  });
+
+  test("no cwd means no policy boost is applied", () => {
+    const policy = buildPolicyWithHistory("agent-browser-verify", 5, 4, 3, 0);
+    saveProjectRoutingPolicy(TEST_PROJECT, policy);
+
+    const entries = [makeEntry("agent-browser-verify", 7)];
+    const result = deduplicateSkills({
+      matchedEntries: entries,
+      matched: new Set(["agent-browser-verify"]),
+      toolName: "Bash",
+      toolInput: { command: "next dev" },
+      injectedSkills: new Set(),
+      dedupOff: false,
+      // no cwd
+    });
+
+    expect(result.policyBoosted).toEqual([]);
+  });
+
+  test("no policy boost when session has no active verification story", () => {
+    // Remove mock plan state so no story is found
+    cleanupMockPlanState(TEST_SESSION);
+
+    const policy = buildPolicyWithHistory("agent-browser-verify", 5, 4, 3, 0, "PreToolUse|none|none|Bash");
+    saveProjectRoutingPolicy(TEST_PROJECT, policy);
+
+    const entries = [makeEntry("agent-browser-verify", 7)];
+    const result = deduplicateSkills({
+      matchedEntries: entries,
+      matched: new Set(["agent-browser-verify"]),
+      toolName: "Bash",
+      toolInput: { command: "next dev" },
+      injectedSkills: new Set(),
+      dedupOff: false,
+      cwd: TEST_PROJECT,
+      sessionId: TEST_SESSION,
+    });
+
+    // No boosts — story gate prevents policy application
+    expect(result.policyBoosted).toEqual([]);
+  });
+
+  test("does not create none|none scenario keys for exposures", () => {
+    // Remove mock plan state so no story is found
+    cleanupMockPlanState(TEST_SESSION);
+
+    const entries = [makeEntry("next-config", 7)];
+    deduplicateSkills({
+      matchedEntries: entries,
+      matched: new Set(["next-config"]),
+      toolName: "Bash",
+      toolInput: { command: "next dev" },
+      injectedSkills: new Set(),
+      dedupOff: false,
+      cwd: TEST_PROJECT,
+      sessionId: TEST_SESSION,
+    });
+
+    // No exposures should be recorded when no story exists
+    const exposures = loadSessionExposures(TEST_SESSION);
+    const noneNone = exposures.filter(
+      (e) => e.storyId === null && e.storyKind === null,
+    );
+    expect(noneNone).toEqual([]);
+  });
+
+  test("uses selectPrimaryStory for deterministic story attribution", () => {
+    // Write plan state with two stories, the second one more recently updated
+    const sp = verificationStatePath(TEST_SESSION);
+    mkdirSync(join(sp, ".."), { recursive: true });
+    writeFileSync(sp, JSON.stringify({
+      version: 1,
+      stories: [
+        {
+          id: "story-older",
+          kind: "deployment",
+          route: "/api/old",
+          promptExcerpt: "old",
+          createdAt: T0,
+          updatedAt: T1,
+          requestedSkills: [],
+        },
+        {
+          id: "story-newer",
+          kind: "feature-investigation",
+          route: "/settings",
+          promptExcerpt: "new",
+          createdAt: T2,
+          updatedAt: T3,
+          requestedSkills: [],
+        },
+      ],
+      observationIds: [],
+      satisfiedBoundaries: [],
+      missingBoundaries: ["clientRequest"],
+      recentRoutes: [],
+      primaryNextAction: { targetBoundary: "clientRequest", suggestedAction: "curl test" },
+      blockedReasons: [],
+    }));
+
+    // Build policy matching the newer story's kind
+    const policy = buildPolicyWithHistory(
+      "agent-browser-verify", 5, 4, 3, 0,
+      "PreToolUse|feature-investigation|clientRequest|Bash",
+    );
+    saveProjectRoutingPolicy(TEST_PROJECT, policy);
+
+    const entries = [makeEntry("agent-browser-verify", 7), makeEntry("next-config", 8)];
+    const result = deduplicateSkills({
+      matchedEntries: entries,
+      matched: new Set(["agent-browser-verify", "next-config"]),
+      toolName: "Bash",
+      toolInput: { command: "next dev" },
+      injectedSkills: new Set(),
+      dedupOff: false,
+      cwd: TEST_PROJECT,
+      sessionId: TEST_SESSION,
+    });
+
+    // selectPrimaryStory should pick story-newer (most recently updated)
+    // and match the "feature-investigation" scenario key
+    expect(result.policyBoosted.length).toBe(1);
+    expect(result.policyBoosted[0].skill).toBe("agent-browser-verify");
+    expect(result.policyBoosted[0].boost).toBe(8);
+  });
+});
+
+// ---------------------------------------------------------------------------
+// Routing decision trace integration tests (PreToolUse)
+// ---------------------------------------------------------------------------
+
+const ROOT = resolve(import.meta.dirname, "..");
+const HOOK_SCRIPT = join(ROOT, "hooks", "pretooluse-skill-inject.mjs");
+
+/** Run PreToolUse hook as subprocess */
+async function runPreToolUseHook(
+  toolName: string,
+  toolInput: Record<string, unknown>,
+  env?: Record<string, string>,
+  sessionId?: string,
+): Promise<{ code: number; stdout: string; stderr: string }> {
+  const sid = sessionId ?? `trace-test-${Date.now()}-${Math.random().toString(36).slice(2)}`;
+  const payload = JSON.stringify({
+    tool_name: toolName,
+    tool_input: toolInput,
+    session_id: sid,
+    cwd: ROOT,
+  });
+  const proc = Bun.spawn(["node", HOOK_SCRIPT], {
+    stdin: "pipe",
+    stdout: "pipe",
+    stderr: "pipe",
+    env: {
+      ...process.env,
+      VERCEL_PLUGIN_SEEN_SKILLS: "",
+      VERCEL_PLUGIN_LOG_LEVEL: "summary",
+      ...env,
+    },
+  });
+  proc.stdin.write(payload);
+  proc.stdin.end();
+  const code = await proc.exited;
+  const stdout = await new Response(proc.stdout).text();
+  const stderr = await new Response(proc.stderr).text();
+  return { code, stdout, stderr };
+}
+
+describe("pretooluse routing decision trace", () => {
+  let traceSession: string;
+
+  beforeEach(() => {
+    traceSession = `trace-ptu-${Date.now()}-${Math.random().toString(36).slice(2)}`;
+  });
+
+  afterEach(() => {
+    try { rmSync(traceDir(traceSession), { recursive: true, force: true }); } catch {}
+    cleanupMockPlanState(traceSession);
+  });
+
+  test("emits exactly one trace per ranking/injection attempt", async () => {
+    const { code } = await runPreToolUseHook(
+      "Bash",
+      { command: "npx next dev" },
+      {},
+      traceSession,
+    );
+    expect(code).toBe(0);
+
+    const traces = readRoutingDecisionTrace(traceSession);
+    expect(traces).toHaveLength(1);
+    expect(traces[0].hook).toBe("PreToolUse");
+    expect(traces[0].version).toBe(2);
+    expect(traces[0].toolName).toBe("Bash");
+    expect(traces[0].sessionId).toBe(traceSession);
+    expect(traces[0].decisionId).toMatch(/^[0-9a-f]{16}$/);
+    expect(Array.isArray(traces[0].matchedSkills)).toBe(true);
+    expect(Array.isArray(traces[0].injectedSkills)).toBe(true);
+    expect(Array.isArray(traces[0].ranked)).toBe(true);
+  });
+
+  test("records no_active_verification_story when no story exists", async () => {
+    const { code } = await runPreToolUseHook(
+      "Bash",
+      { command: "npx next dev" },
+      {},
+      traceSession,
+    );
+    expect(code).toBe(0);
+
+    const traces = readRoutingDecisionTrace(traceSession);
+    expect(traces).toHaveLength(1);
+    expect(traces[0].skippedReasons).toContain("no_active_verification_story");
+    expect(traces[0].policyScenario).toBeNull();
+  });
+
+  test("records primaryStory and policyScenario when verification story exists", async () => {
+    writeMockPlanState(traceSession, {
+      id: "trace-story-1",
+      kind: "deployment",
+      route: "/api/test",
+    });
+
+    const { code } = await runPreToolUseHook(
+      "Bash",
+      { command: "npx next dev" },
+      {},
+      traceSession,
+    );
+    expect(code).toBe(0);
+
+    const traces = readRoutingDecisionTrace(traceSession);
+    expect(traces).toHaveLength(1);
+    expect(traces[0].primaryStory.id).toBe("trace-story-1");
+    expect(traces[0].primaryStory.kind).toBe("deployment");
+    expect(traces[0].policyScenario).toMatch(/^PreToolUse\|deployment\|/);
+    expect(traces[0].skippedReasons).not.toContain("no_active_verification_story");
+  });
+
+  test("does not emit synthetic none|none policyScenario without story", async () => {
+    const { code } = await runPreToolUseHook(
+      "Bash",
+      { command: "npx next dev" },
+      {},
+      traceSession,
+    );
+    expect(code).toBe(0);
+
+    const traces = readRoutingDecisionTrace(traceSession);
+    expect(traces).toHaveLength(1);
+    expect(traces[0].policyScenario).toBeNull();
+  });
+
+  test("ranked entries include droppedReason for cap/budget drops", async () => {
+    const { code } = await runPreToolUseHook(
+      "Read",
+      { file_path: "next.config.mjs" },
+      {
+        VERCEL_PLUGIN_INJECTION_BUDGET: "500",
+      },
+      traceSession,
+    );
+    expect(code).toBe(0);
+
+    const traces = readRoutingDecisionTrace(traceSession);
+    expect(traces).toHaveLength(1);
+
+    for (const reason of traces[0].skippedReasons) {
+      if (reason.startsWith("cap_exceeded:")) {
+        const skill = reason.replace("cap_exceeded:", "");
+        const ranked = traces[0].ranked.find((r) => r.skill === skill);
+        if (ranked) {
+          expect(ranked.droppedReason).toBe("cap_exceeded");
+        }
+      }
+      if (reason.startsWith("budget_exhausted:")) {
+        const skill = reason.replace("budget_exhausted:", "");
+        const ranked = traces[0].ranked.find((r) => r.skill === skill);
+        if (ranked) {
+          expect(ranked.droppedReason).toBe("budget_exhausted");
+        }
+      }
+    }
+  });
+
+  test("emits routing.decision_trace_written summary log", async () => {
+    const { code, stderr } = await runPreToolUseHook(
+      "Bash",
+      { command: "npx next dev" },
+      { VERCEL_PLUGIN_LOG_LEVEL: "summary" },
+      traceSession,
+    );
+    expect(code).toBe(0);
+
+    const logLines = stderr
+      .split("\n")
+      .filter((l) => l.trim())
+      .map((l) => { try { return JSON.parse(l); } catch { return null; } })
+      .filter((o): o is Record<string, unknown> => o !== null);
+
+    const traceLog = logLines.find(
+      (l) => l.event === "routing.decision_trace_written",
+    );
+    expect(traceLog).toBeDefined();
+    expect(traceLog!.hook).toBe("PreToolUse");
+    expect(traceLog!.decisionId).toMatch(/^[0-9a-f]{16}$/);
+  });
+});
+
+// ---------------------------------------------------------------------------
+// Manifest parity: manifest vs live-scan produce identical routing decisions
+// ---------------------------------------------------------------------------
+
+const MANIFEST_PATH = join(ROOT, "generated", "skill-manifest.json");
+
+/**
+ * Parse skillInjection metadata from hook stdout JSON.
+ * Returns a normalized comparison object suitable for deep equality.
+ */
+function parseSkillInjection(stdout: string): {
+  matchedSkills: string[];
+  injectedSkills: string[];
+  reasons: Record<string, unknown>;
+} | null {
+  try {
+    const output = JSON.parse(stdout);
+    const ctx = output.hookSpecificOutput?.additionalContext || "";
+    const siMatch = ctx.match(//);
+    if (!siMatch) return null;
+    const si = JSON.parse(siMatch[1]);
+    return {
+      matchedSkills: [...(si.matchedSkills ?? [])].sort(),
+      injectedSkills: [...(si.injectedSkills ?? [])].sort(),
+      reasons: si.reasons ?? {},
+    };
+  } catch {
+    return null;
+  }
+}
+
+describe("manifest vs live-scan parity", () => {
+  const paritySession = () =>
+    `parity-${Date.now()}-${Math.random().toString(36).slice(2)}`;
+
+  test("identical matched/injected skills and reasons for a Bash next-dev input", async () => {
+    const { renameSync } = await import("node:fs");
+    const backupPath = MANIFEST_PATH + ".bak";
+
+    // 1. Run with manifest
+    const sid1 = paritySession();
+    const withManifest = await runPreToolUseHook(
+      "Bash",
+      { command: "npx next dev" },
+      { VERCEL_PLUGIN_LOG_LEVEL: "off" },
+      sid1,
+    );
+    expect(withManifest.code).toBe(0);
+    const manifestResult = parseSkillInjection(withManifest.stdout);
+    expect(manifestResult).not.toBeNull();
+
+    // 2. Run without manifest (live scan)
+    renameSync(MANIFEST_PATH, backupPath);
+    try {
+      const sid2 = paritySession();
+      const withoutManifest = await runPreToolUseHook(
+        "Bash",
+        { command: "npx next dev" },
+        { VERCEL_PLUGIN_LOG_LEVEL: "off" },
+        sid2,
+      );
+      expect(withoutManifest.code).toBe(0);
+      const liveScanResult = parseSkillInjection(withoutManifest.stdout);
+      expect(liveScanResult).not.toBeNull();
+
+      // 3. Assert parity — matched skills, injected skills, and match reasons
+      const comparison = {
+        manifest: {
+          matchedSkills: manifestResult!.matchedSkills,
+          injectedSkills: manifestResult!.injectedSkills,
+          reasons: normalizeReasons(manifestResult!.reasons),
+        },
+        liveScan: {
+          matchedSkills: liveScanResult!.matchedSkills,
+          injectedSkills: liveScanResult!.injectedSkills,
+          reasons: normalizeReasons(liveScanResult!.reasons),
+        },
+      };
+
+      // Emit normalized comparison JSON for agent observability
+      console.error(JSON.stringify({
+        event: "parity.comparison",
+        tool: "Bash",
+        input: "npx next dev",
+        ...comparison,
+      }));
+
+      expect(comparison.manifest.matchedSkills).toEqual(comparison.liveScan.matchedSkills);
+      expect(comparison.manifest.injectedSkills).toEqual(comparison.liveScan.injectedSkills);
+      expect(comparison.manifest.reasons).toEqual(comparison.liveScan.reasons);
+    } finally {
+      renameSync(backupPath, MANIFEST_PATH);
+    }
+
+    // Cleanup trace dirs
+    try { rmSync(traceDir(sid1), { recursive: true, force: true }); } catch {}
+  });
+
+  test("identical matched/injected skills for a Read next.config input", async () => {
+    const { renameSync } = await import("node:fs");
+    const backupPath = MANIFEST_PATH + ".bak";
+
+    const sid1 = paritySession();
+    const withManifest = await runPreToolUseHook(
+      "Read",
+      { file_path: "next.config.ts" },
+      { VERCEL_PLUGIN_LOG_LEVEL: "off" },
+      sid1,
+    );
+    expect(withManifest.code).toBe(0);
+    const manifestResult = parseSkillInjection(withManifest.stdout);
+    expect(manifestResult).not.toBeNull();
+
+    renameSync(MANIFEST_PATH, backupPath);
+    try {
+      const sid2 = paritySession();
+      const withoutManifest = await runPreToolUseHook(
+        "Read",
+        { file_path: "next.config.ts" },
+        { VERCEL_PLUGIN_LOG_LEVEL: "off" },
+        sid2,
+      );
+      expect(withoutManifest.code).toBe(0);
+      const liveScanResult = parseSkillInjection(withoutManifest.stdout);
+      expect(liveScanResult).not.toBeNull();
+
+      const comparison = {
+        manifest: {
+          matchedSkills: manifestResult!.matchedSkills,
+          injectedSkills: manifestResult!.injectedSkills,
+        },
+        liveScan: {
+          matchedSkills: liveScanResult!.matchedSkills,
+          injectedSkills: liveScanResult!.injectedSkills,
+        },
+      };
+
+      console.error(JSON.stringify({
+        event: "parity.comparison",
+        tool: "Read",
+        input: "next.config.ts",
+        ...comparison,
+      }));
+
+      expect(comparison.manifest.matchedSkills).toEqual(comparison.liveScan.matchedSkills);
+      expect(comparison.manifest.injectedSkills).toEqual(comparison.liveScan.injectedSkills);
+    } finally {
+      renameSync(backupPath, MANIFEST_PATH);
+    }
+
+    try { rmSync(traceDir(sid1), { recursive: true, force: true }); } catch {}
+  });
+});
+
+// ---------------------------------------------------------------------------
+// Learned-routing-rulebook precedence tests
+// ---------------------------------------------------------------------------
+
+describe("pretooluse rulebook precedence", () => {
+  function makeRulebook(rules: Array<{
+    scenario: string;
+    skill: string;
+    boost: number;
+    action?: "promote" | "demote";
+    reason?: string;
+  }>): LearnedRoutingRulebook {
+    const rb = createEmptyRulebook("test-sess", T0);
+    for (const r of rules) {
+      rb.rules.push(createRule({
+        scenario: r.scenario,
+        skill: r.skill,
+        action: r.action ?? "promote",
+        boost: r.boost,
+        confidence: 0.9,
+        reason: r.reason ?? "replay verified: no regressions",
+        sourceSessionId: "test-sess",
+        promotedAt: T0,
+        evidence: {
+          baselineWins: 4,
+          baselineDirectiveWins: 2,
+          learnedWins: 4,
+          learnedDirectiveWins: 2,
+          regressionCount: 0,
+        },
+      }));
+    }
+    return rb;
+  }
+
+  test("DeduplicateResult includes rulebookBoosted array", () => {
+    const entries = [makeEntry("agent-browser-verify", 7)];
+    const result = deduplicateSkills({
+      matchedEntries: entries,
+      matched: new Set(["agent-browser-verify"]),
+      toolName: "Bash",
+      toolInput: { command: "next dev" },
+      injectedSkills: new Set(),
+      dedupOff: false,
+      cwd: TEST_PROJECT,
+      sessionId: TEST_SESSION,
+    });
+
+    expect(result).toHaveProperty("rulebookBoosted");
+    expect(Array.isArray(result.rulebookBoosted)).toBe(true);
+  });
+
+  test("rulebookBoosted is empty when no rulebook exists", () => {
+    const entries = [makeEntry("agent-browser-verify", 7)];
+    const result = deduplicateSkills({
+      matchedEntries: entries,
+      matched: new Set(["agent-browser-verify"]),
+      toolName: "Bash",
+      toolInput: { command: "next dev" },
+      injectedSkills: new Set(),
+      dedupOff: false,
+      cwd: TEST_PROJECT,
+      sessionId: TEST_SESSION,
+    });
+
+    expect(result.rulebookBoosted).toEqual([]);
+  });
+
+  test("rulebook boost takes precedence over stats-policy boost", () => {
+    // Set up stats-policy that would give +8
+    const policy = buildPolicyWithHistory("agent-browser-verify", 5, 4, 3, 0);
+    saveProjectRoutingPolicy(TEST_PROJECT, policy);
+
+    // Set up rulebook that gives +10
+    const rulebook = makeRulebook([{
+      scenario: "PreToolUse|deployment|clientRequest|Bash",
+      skill: "agent-browser-verify",
+      boost: 10,
+    }]);
+    saveRulebook(TEST_PROJECT, rulebook);
+
+    const entries = [makeEntry("agent-browser-verify", 5), makeEntry("next-config", 8)];
+    const result = deduplicateSkills({
+      matchedEntries: entries,
+      matched: new Set(["agent-browser-verify", "next-config"]),
+      toolName: "Bash",
+      toolInput: { command: "next dev" },
+      injectedSkills: new Set(),
+      dedupOff: false,
+      cwd: TEST_PROJECT,
+      sessionId: TEST_SESSION,
+    });
+
+    // Rulebook match should be present
+    expect(result.rulebookBoosted.length).toBe(1);
+    expect(result.rulebookBoosted[0].skill).toBe("agent-browser-verify");
+    expect(result.rulebookBoosted[0].ruleBoost).toBe(10);
+    expect(result.rulebookBoosted[0].matchedRuleId).toBe(
+      "PreToolUse|deployment|clientRequest|Bash|agent-browser-verify",
+    );
+
+    // Stats-policy should be suppressed for that skill (not double-boosted)
+    expect(result.policyBoosted.find((p) => p.skill === "agent-browser-verify")).toBeUndefined();
+
+    // Effective priority: 5 (base) + 10 (rule) = 15 > next-config's 8
+    expect(result.rankedSkills[0]).toBe("agent-browser-verify");
+  });
+
+  test("stats-policy boost still applies for skills without rulebook match", () => {
+    // Stats-policy for next-config: +8
+    const policy = buildPolicyWithHistory("next-config", 5, 4, 3, 0);
+    saveProjectRoutingPolicy(TEST_PROJECT, policy);
+
+    // Rulebook only has a rule for agent-browser-verify
+    const rulebook = makeRulebook([{
+      scenario: "PreToolUse|deployment|clientRequest|Bash",
+      skill: "agent-browser-verify",
+      boost: 3,
+    }]);
+    saveRulebook(TEST_PROJECT, rulebook);
+
+    const entries = [makeEntry("agent-browser-verify", 5), makeEntry("next-config", 5)];
+    const result = deduplicateSkills({
+      matchedEntries: entries,
+      matched: new Set(["agent-browser-verify", "next-config"]),
+      toolName: "Bash",
+      toolInput: { command: "next dev" },
+      injectedSkills: new Set(),
+      dedupOff: false,
+      cwd: TEST_PROJECT,
+      sessionId: TEST_SESSION,
+    });
+
+    // next-config should still get stats-policy boost (+8)
+    expect(result.policyBoosted.find((p) => p.skill === "next-config")?.boost).toBe(8);
+    // next-config: 5 + 8 = 13 > agent-browser-verify: 5 + 3 = 8
+    expect(result.rankedSkills[0]).toBe("next-config");
+  });
+
+  test("route-scoped rule only affects its intended scenario", () => {
+    // Rulebook rule scoped to deployment|clientRequest
+    const rulebook = makeRulebook([{
+      scenario: "PreToolUse|deployment|clientRequest|Bash",
+      skill: "agent-browser-verify",
+      boost: 10,
+    }]);
+    saveRulebook(TEST_PROJECT, rulebook);
+
+    // Test with a different story kind (feature instead of deployment; same clientRequest boundary)
+    cleanupMockPlanState(TEST_SESSION);
+    writeMockPlanState(TEST_SESSION, { kind: "feature" });
+
+    const entries = [makeEntry("agent-browser-verify", 5), makeEntry("next-config", 5)];
+    const result = deduplicateSkills({
+      matchedEntries: entries,
+      matched: new Set(["agent-browser-verify", "next-config"]),
+      toolName: "Bash",
+      toolInput: { command: "next dev" },
+      injectedSkills: new Set(),
+      dedupOff: false,
+      cwd: TEST_PROJECT,
+      sessionId: TEST_SESSION,
+    });
+
+    // No rulebook match — the rule is for deployment|clientRequest, not feature|clientRequest
+    expect(result.rulebookBoosted).toEqual([]);
+  });
+
+  test("empty rulebook has no effect", () => {
+    const rulebook = createEmptyRulebook("test-sess", T0);
+    saveRulebook(TEST_PROJECT, rulebook);
+
+    const entries = [makeEntry("agent-browser-verify", 7)];
+    const result = deduplicateSkills({
+      matchedEntries: entries,
+      matched: new Set(["agent-browser-verify"]),
+      toolName: "Bash",
+      toolInput: { command: "next dev" },
+      injectedSkills: new Set(),
+      dedupOff: false,
+      cwd: TEST_PROJECT,
+      sessionId: TEST_SESSION,
+    });
+
+    expect(result.rulebookBoosted).toEqual([]);
+  });
+});
+
+/** Sort reason keys for deterministic comparison */
+function normalizeReasons(
+  reasons: Record<string, unknown>,
+): Record<string, unknown> {
+  const sorted: Record<string, unknown> = {};
+  for (const key of Object.keys(reasons).sort()) {
+    sorted[key] = reasons[key];
+  }
+  return sorted;
+}
diff --git a/tests/pretooluse-skill-inject.test.ts b/tests/pretooluse-skill-inject.test.ts
index 5a635d1..8162926 100644
--- a/tests/pretooluse-skill-inject.test.ts
+++ b/tests/pretooluse-skill-inject.test.ts
@@ -16,6 +16,10 @@ const TEMP_HOOK_RUNTIME_MODULES = [
   "hook-env.mjs",
   "compat.mjs",
   "telemetry.mjs",
+  "routing-policy.mjs",
+  "routing-policy-ledger.mjs",
+  "verification-plan.mjs",
+  "verification-ledger.mjs",
 ] as const;
 
 function copyTempHookRuntime(
diff --git a/tests/pretooluse-verification-directive.test.ts b/tests/pretooluse-verification-directive.test.ts
new file mode 100644
index 0000000..bc8d8ab
--- /dev/null
+++ b/tests/pretooluse-verification-directive.test.ts
@@ -0,0 +1,280 @@
+import { describe, test, expect, beforeEach, afterEach } from "bun:test";
+import { writeFileSync, mkdirSync, rmSync } from "node:fs";
+import { join } from "node:path";
+import {
+  buildVerificationDirective,
+  buildVerificationEnv,
+  resolveVerificationRuntimeState,
+  type VerificationDirective,
+} from "../hooks/src/verification-directive.mts";
+import {
+  statePath as verificationStatePath,
+} from "../hooks/src/verification-ledger.mts";
+import type {
+  VerificationPlanResult,
+  VerificationPlanStorySummary,
+} from "../hooks/src/verification-plan.mts";
+
+// ---------------------------------------------------------------------------
+// Fixtures
+// ---------------------------------------------------------------------------
+
+const T0 = "2026-03-27T06:00:00.000Z";
+const T1 = "2026-03-27T06:01:00.000Z";
+const T2 = "2026-03-27T06:02:00.000Z";
+const T3 = "2026-03-27T06:03:00.000Z";
+
+function makeStory(overrides: Partial<VerificationPlanStorySummary> = {}): VerificationPlanStorySummary {
+  return {
+    id: "story-1",
+    kind: "flow-verification",
+    route: "/settings",
+    promptExcerpt: "verify settings flow",
+    createdAt: T0,
+    updatedAt: T1,
+    ...overrides,
+  };
+}
+
+function makePlan(overrides: Partial<VerificationPlanResult> = {}): VerificationPlanResult {
+  return {
+    hasStories: true,
+    stories: [makeStory()],
+    observationCount: 1,
+    satisfiedBoundaries: ["serverHandler"],
+    missingBoundaries: ["clientRequest", "uiRender", "environment"],
+    recentRoutes: ["/settings"],
+    primaryNextAction: {
+      targetBoundary: "clientRequest",
+      action: "curl http://localhost:3000/settings",
+      reason: "No HTTP request observation yet",
+    },
+    blockedReasons: [],
+    ...overrides,
+  };
+}
+
+function sessionId(): string {
+  return `directive-test-${Date.now()}-${Math.random().toString(36).slice(2, 8)}`;
+}
+
+function writeMockPlanState(sid: string, plan: VerificationPlanResult): void {
+  const sp = verificationStatePath(sid);
+  mkdirSync(join(sp, ".."), { recursive: true });
+  writeFileSync(sp, JSON.stringify({
+    version: 1,
+    stories: plan.stories,
+    observationIds: [],
+    satisfiedBoundaries: plan.satisfiedBoundaries,
+    missingBoundaries: plan.missingBoundaries,
+    recentRoutes: plan.recentRoutes,
+    primaryNextAction: plan.primaryNextAction,
+    blockedReasons: plan.blockedReasons,
+  }));
+}
+
+function cleanupPlanState(sid: string): void {
+  const sp = verificationStatePath(sid);
+  try { rmSync(join(sp, ".."), { recursive: true, force: true }); } catch {}
+}
+
+// ---------------------------------------------------------------------------
+// Unit: buildVerificationDirective
+// ---------------------------------------------------------------------------
+
+describe("buildVerificationDirective", () => {
+  test("returns null for null plan", () => {
+    expect(buildVerificationDirective(null)).toBeNull();
+  });
+
+  test("returns null for plan with no stories", () => {
+    expect(buildVerificationDirective(makePlan({ hasStories: false, stories: [] }))).toBeNull();
+  });
+
+  test("returns null for plan with hasStories true but empty array", () => {
+    expect(buildVerificationDirective(makePlan({ stories: [] }))).toBeNull();
+  });
+
+  test("builds directive from plan with route and primaryNextAction", () => {
+    const directive = buildVerificationDirective(makePlan());
+    expect(directive).not.toBeNull();
+    expect(directive!.version).toBe(1);
+    expect(directive!.storyId).toBe("story-1");
+    expect(directive!.storyKind).toBe("flow-verification");
+    expect(directive!.route).toBe("/settings");
+    expect(directive!.missingBoundaries).toEqual(["clientRequest", "uiRender", "environment"]);
+    expect(directive!.satisfiedBoundaries).toEqual(["serverHandler"]);
+    expect(directive!.primaryNextAction).toEqual({
+      targetBoundary: "clientRequest",
+      action: "curl http://localhost:3000/settings",
+      reason: "No HTTP request observation yet",
+    });
+    expect(directive!.blockedReasons).toEqual([]);
+  });
+
+  test("selects most recently updated story when multiple exist", () => {
+    const plan = makePlan({
+      stories: [
+        makeStory({ id: "old", updatedAt: T0 }),
+        makeStory({ id: "newer", updatedAt: T3, route: "/dashboard" }),
+      ],
+    });
+    const directive = buildVerificationDirective(plan);
+    expect(directive!.storyId).toBe("newer");
+    expect(directive!.route).toBe("/dashboard");
+  });
+
+  test("directive with null route preserves null", () => {
+    const plan = makePlan({ stories: [makeStory({ route: null })] });
+    const directive = buildVerificationDirective(plan);
+    expect(directive!.route).toBeNull();
+  });
+});
+
+// ---------------------------------------------------------------------------
+// Unit: buildVerificationEnv
+// ---------------------------------------------------------------------------
+
+describe("buildVerificationEnv", () => {
+  const CLEARING_KEYS = [
+    "VERCEL_PLUGIN_VERIFICATION_STORY_ID",
+    "VERCEL_PLUGIN_VERIFICATION_ROUTE",
+    "VERCEL_PLUGIN_VERIFICATION_BOUNDARY",
+    "VERCEL_PLUGIN_VERIFICATION_ACTION",
+  ];
+
+  test("returns clearing values (empty strings) for null directive", () => {
+    const env = buildVerificationEnv(null);
+    for (const key of CLEARING_KEYS) {
+      expect(env[key]).toBe("");
+    }
+  });
+
+  test("returns clearing values when directive has no primaryNextAction", () => {
+    const directive: VerificationDirective = {
+      version: 1,
+      storyId: "story-1",
+      storyKind: "flow-verification",
+      route: "/settings",
+      missingBoundaries: [],
+      satisfiedBoundaries: ["clientRequest"],
+      primaryNextAction: null,
+      blockedReasons: [],
+    };
+    const env = buildVerificationEnv(directive);
+    for (const key of CLEARING_KEYS) {
+      expect(env[key]).toBe("");
+    }
+  });
+
+  test("exports all four directive env keys for active directive", () => {
+    const directive = buildVerificationDirective(makePlan())!;
+    const env = buildVerificationEnv(directive);
+    expect(env.VERCEL_PLUGIN_VERIFICATION_STORY_ID).toBe("story-1");
+    expect(env.VERCEL_PLUGIN_VERIFICATION_ROUTE).toBe("/settings");
+    expect(env.VERCEL_PLUGIN_VERIFICATION_BOUNDARY).toBe("clientRequest");
+    expect(env.VERCEL_PLUGIN_VERIFICATION_ACTION).toBe("curl http://localhost:3000/settings");
+  });
+
+  test("exports empty string for null route in active directive", () => {
+    const plan = makePlan({ stories: [makeStory({ route: null })] });
+    const directive = buildVerificationDirective(plan)!;
+    const env = buildVerificationEnv(directive);
+    expect(env.VERCEL_PLUGIN_VERIFICATION_ROUTE).toBe("");
+  });
+});
+
+// ---------------------------------------------------------------------------
+// Integration: resolveVerificationRuntimeState
+// ---------------------------------------------------------------------------
+
+describe("resolveVerificationRuntimeState", () => {
+  let sid: string;
+
+  beforeEach(() => {
+    sid = sessionId();
+  });
+
+  afterEach(() => {
+    cleanupPlanState(sid);
+  });
+
+  test("returns clearing env and nulls for null sessionId", () => {
+    const state = resolveVerificationRuntimeState(null);
+    expect(state.plan).toBeNull();
+    expect(state.directive).toBeNull();
+    expect(state.banner).toBeNull();
+    expect(state.env.VERCEL_PLUGIN_VERIFICATION_STORY_ID).toBe("");
+    expect(state.env.VERCEL_PLUGIN_VERIFICATION_ROUTE).toBe("");
+    expect(state.env.VERCEL_PLUGIN_VERIFICATION_BOUNDARY).toBe("");
+    expect(state.env.VERCEL_PLUGIN_VERIFICATION_ACTION).toBe("");
+  });
+
+  test("returns clearing env when session has no stories", () => {
+    const state = resolveVerificationRuntimeState(sid);
+    expect(state.plan).toBeNull();
+    expect(state.directive).toBeNull();
+    expect(state.banner).toBeNull();
+    expect(state.env.VERCEL_PLUGIN_VERIFICATION_ACTION).toBe("");
+  });
+
+  test("exports banner and directive env for routed story", () => {
+    writeMockPlanState(sid, makePlan());
+
+    const state = resolveVerificationRuntimeState(sid);
+    expect(state.plan).not.toBeNull();
+    expect(state.directive).not.toBeNull();
+    expect(state.directive!.storyId).toBe("story-1");
+    expect(state.directive!.route).toBe("/settings");
+
+    // Banner contains verification plan marker
+    expect(state.banner).not.toBeNull();
+    expect(state.banner).toContain("Verification Plan");
+
+    // Env exports all four keys
+    expect(state.env.VERCEL_PLUGIN_VERIFICATION_STORY_ID).toBe("story-1");
+    expect(state.env.VERCEL_PLUGIN_VERIFICATION_ROUTE).toBe("/settings");
+    expect(state.env.VERCEL_PLUGIN_VERIFICATION_BOUNDARY).toBe("clientRequest");
+    expect(state.env.VERCEL_PLUGIN_VERIFICATION_ACTION).toBe("curl http://localhost:3000/settings");
+  });
+
+  test("is idempotent — same session returns identical state", () => {
+    writeMockPlanState(sid, makePlan());
+
+    const first = resolveVerificationRuntimeState(sid);
+    const second = resolveVerificationRuntimeState(sid);
+
+    expect(first.env).toEqual(second.env);
+    expect(first.directive).toEqual(second.directive);
+    expect(first.banner).toEqual(second.banner);
+  });
+
+  test("banner is null when all boundaries are satisfied", () => {
+    const plan = makePlan({
+      missingBoundaries: [],
+      satisfiedBoundaries: ["clientRequest", "serverHandler", "uiRender", "environment"],
+      primaryNextAction: null,
+    });
+    writeMockPlanState(sid, plan);
+
+    const state = resolveVerificationRuntimeState(sid);
+    // Directive exists but has no action, so env is clearing
+    expect(state.env.VERCEL_PLUGIN_VERIFICATION_ACTION).toBe("");
+  });
+
+  test("survives errors gracefully and returns clearing state", () => {
+    // Pass a session ID that has a corrupt state file
+    const corruptSid = sessionId();
+    const sp = verificationStatePath(corruptSid);
+    mkdirSync(join(sp, ".."), { recursive: true });
+    writeFileSync(sp, "{{not json}}");
+
+    try {
+      const state = resolveVerificationRuntimeState(corruptSid);
+      expect(state.plan).toBeNull();
+      expect(state.env.VERCEL_PLUGIN_VERIFICATION_STORY_ID).toBe("");
+    } finally {
+      cleanupPlanState(corruptSid);
+    }
+  });
+});
diff --git a/tests/prompt-policy-recall.test.ts b/tests/prompt-policy-recall.test.ts
new file mode 100644
index 0000000..0e1dd18
--- /dev/null
+++ b/tests/prompt-policy-recall.test.ts
@@ -0,0 +1,221 @@
+import { describe, expect, test } from "bun:test";
+import {
+  createEmptyRoutingPolicy,
+  recordExposure,
+  recordOutcome,
+} from "../hooks/src/routing-policy.mts";
+import { applyPromptPolicyRecall } from "../hooks/src/prompt-policy-recall.mts";
+
+function buildWinningPolicy(skill: string) {
+  const policy = createEmptyRoutingPolicy();
+  const scenario = {
+    hook: "UserPromptSubmit" as const,
+    storyKind: "flow-verification",
+    targetBoundary: "clientRequest" as const,
+    toolName: "Prompt" as const,
+    routeScope: "/settings",
+    skill,
+  };
+  for (let i = 0; i < 5; i += 1) {
+    recordExposure(policy, {
+      ...scenario,
+      now: `2026-03-28T00:00:0${i}.000Z`,
+    });
+    recordOutcome(policy, {
+      ...scenario,
+      outcome: "win",
+      now: `2026-03-28T00:10:0${i}.000Z`,
+    });
+  }
+  return policy;
+}
+
+function buildOrderedWinningPolicy(skills: string[]) {
+  const policy = createEmptyRoutingPolicy();
+  for (const [idx, skill] of skills.entries()) {
+    const scenario = {
+      hook: "UserPromptSubmit" as const,
+      storyKind: "flow-verification",
+      targetBoundary: "clientRequest" as const,
+      toolName: "Prompt" as const,
+      routeScope: "/settings",
+      skill,
+    };
+    for (let i = 0; i < 5; i += 1) {
+      recordExposure(policy, {
+        ...scenario,
+        now: `2026-03-28T00:${String(idx).padStart(2, "0")}:${i}0.000Z`,
+      });
+      recordOutcome(policy, {
+        ...scenario,
+        outcome: "win",
+        now: `2026-03-28T00:${String(idx).padStart(2, "0")}:${i}5.000Z`,
+      });
+    }
+  }
+  return policy;
+}
+
+describe("applyPromptPolicyRecall", () => {
+  test("recalls a verified winner when prompt matching found nothing", () => {
+    const policy =
buildWinningPolicy("verification"); + const result = applyPromptPolicyRecall({ + selectedSkills: [], + matchedSkills: [], + seenSkills: [], + maxSkills: 2, + binding: { + storyId: "story-1", + storyKind: "flow-verification", + route: "/settings", + targetBoundary: "clientRequest", + }, + policy, + }); + expect(result.selectedSkills).toEqual(["verification"]); + expect(result.matchedSkills).toEqual(["verification"]); + expect(result.syntheticSkills).toEqual(["verification"]); + expect(result.reasons["verification"]).toContain( + "route-scoped verified policy recall", + ); + }); + + test("inserts the recalled skill into slot 2 without displacing the explicit prompt match", () => { + const policy = buildWinningPolicy("verification"); + const result = applyPromptPolicyRecall({ + selectedSkills: ["investigation-mode"], + matchedSkills: ["investigation-mode"], + seenSkills: [], + maxSkills: 2, + binding: { + storyId: "story-1", + storyKind: "flow-verification", + route: "/settings", + targetBoundary: "clientRequest", + }, + policy, + }); + expect(result.selectedSkills).toEqual([ + "investigation-mode", + "verification", + ]); + expect(result.syntheticSkills).toEqual(["verification"]); + }); + + test("does not recall a skill that is already seen", () => { + const policy = buildWinningPolicy("verification"); + const result = applyPromptPolicyRecall({ + selectedSkills: [], + matchedSkills: [], + seenSkills: ["verification"], + maxSkills: 2, + binding: { + storyId: "story-1", + storyKind: "flow-verification", + route: "/settings", + targetBoundary: "clientRequest", + }, + policy, + }); + expect(result.selectedSkills).toEqual([]); + expect(result.syntheticSkills).toEqual([]); + }); + + test("preserves diagnosis order when multiple recalled skills are inserted", () => { + const policy = buildOrderedWinningPolicy(["verification", "investigation"]); + const result = applyPromptPolicyRecall({ + selectedSkills: ["explicit"], + matchedSkills: ["explicit"], + seenSkills: [], + 
maxSkills: 3, + binding: { + storyId: "story-1", + storyKind: "flow-verification", + route: "/settings", + targetBoundary: "clientRequest", + }, + policy, + }); + const recalledOrder = + result.diagnosis?.selected.map((candidate) => candidate.skill) ?? []; + expect(result.selectedSkills).toEqual(["explicit", ...recalledOrder]); + expect(result.syntheticSkills).toEqual(recalledOrder); + }); + + test("returns unchanged when no storyId", () => { + const policy = buildWinningPolicy("verification"); + const result = applyPromptPolicyRecall({ + selectedSkills: ["existing"], + matchedSkills: ["existing"], + maxSkills: 2, + binding: { + storyId: null, + storyKind: null, + route: null, + targetBoundary: "clientRequest", + }, + policy, + }); + expect(result.selectedSkills).toEqual(["existing"]); + expect(result.syntheticSkills).toEqual([]); + expect(result.diagnosis).toBeNull(); + }); + + test("returns unchanged when no targetBoundary", () => { + const policy = buildWinningPolicy("verification"); + const result = applyPromptPolicyRecall({ + selectedSkills: [], + matchedSkills: [], + maxSkills: 2, + binding: { + storyId: "story-1", + storyKind: "flow-verification", + route: "/settings", + targetBoundary: null, + }, + policy, + }); + expect(result.selectedSkills).toEqual([]); + expect(result.syntheticSkills).toEqual([]); + expect(result.diagnosis).toBeNull(); + }); + + test("does not recall when all slots are full", () => { + const policy = buildWinningPolicy("verification"); + const result = applyPromptPolicyRecall({ + selectedSkills: ["a", "b"], + matchedSkills: ["a", "b"], + maxSkills: 2, + binding: { + storyId: "story-1", + storyKind: "flow-verification", + route: "/settings", + targetBoundary: "clientRequest", + }, + policy, + }); + expect(result.selectedSkills).toEqual(["a", "b"]); + expect(result.syntheticSkills).toEqual([]); + expect(result.diagnosis).toBeNull(); + }); + + test("does not mutate caller arrays", () => { + const policy = 
buildWinningPolicy("verification"); + const selected = ["existing"]; + const matched = ["existing"]; + applyPromptPolicyRecall({ + selectedSkills: selected, + matchedSkills: matched, + maxSkills: 3, + binding: { + storyId: "story-1", + storyKind: "flow-verification", + route: "/settings", + targetBoundary: "clientRequest", + }, + policy, + }); + expect(selected).toEqual(["existing"]); + expect(matched).toEqual(["existing"]); + }); +}); diff --git a/tests/prompt-verification-binding-closure.test.ts b/tests/prompt-verification-binding-closure.test.ts new file mode 100644 index 0000000..c3a13c4 --- /dev/null +++ b/tests/prompt-verification-binding-closure.test.ts @@ -0,0 +1,96 @@ +import { describe, expect, test } from "bun:test"; +import { mkdtempSync, rmSync, unlinkSync } from "node:fs"; +import { tmpdir } from "node:os"; +import { join } from "node:path"; +import { + appendSkillExposure, + resolveBoundaryOutcome, + sessionExposurePath, +} from "../hooks/src/routing-policy-ledger.mjs"; + +describe("prompt verification binding closure", () => { + test("bound prompt exposure resolves on matching posttool boundary", () => { + const projectRoot = mkdtempSync(join(tmpdir(), "vercel-plugin-binding-")); + const sessionId = `sess-${Date.now()}`; + + appendSkillExposure({ + id: `${sessionId}:prompt:verification:1`, + sessionId, + projectRoot, + storyId: "story-1", + storyKind: "flow-verification", + route: "/settings", + hook: "UserPromptSubmit", + toolName: "Prompt", + skill: "verification", + targetBoundary: "serverHandler", + exposureGroupId: "group-1", + attributionRole: "candidate", + candidateSkill: "verification", + createdAt: "2026-03-28T00:00:00.000Z", + resolvedAt: null, + outcome: "pending", + }); + + const resolved = resolveBoundaryOutcome({ + sessionId, + boundary: "serverHandler", + matchedSuggestedAction: true, + storyId: "story-1", + route: "/settings", + now: "2026-03-28T00:01:00.000Z", + }); + + expect(resolved).toHaveLength(1); + 
expect(resolved[0]?.outcome).toBe("directive-win"); + + // Cleanup + try { + unlinkSync(sessionExposurePath(sessionId)); + } catch {} + rmSync(projectRoot, { recursive: true, force: true }); + }); + + test("unbound prompt exposure (null boundary) does not resolve", () => { + const projectRoot = mkdtempSync(join(tmpdir(), "vercel-plugin-binding-")); + const sessionId = `sess-unbound-${Date.now()}`; + + appendSkillExposure({ + id: `${sessionId}:prompt:verification:1`, + sessionId, + projectRoot, + storyId: "story-1", + storyKind: "flow-verification", + route: "/settings", + hook: "UserPromptSubmit", + toolName: "Prompt", + skill: "verification", + targetBoundary: null, + exposureGroupId: "group-1", + attributionRole: "candidate", + candidateSkill: "verification", + createdAt: "2026-03-28T00:00:00.000Z", + resolvedAt: null, + outcome: "pending", + }); + + const resolved = resolveBoundaryOutcome({ + sessionId, + boundary: "serverHandler", + matchedSuggestedAction: true, + storyId: "story-1", + route: "/settings", + now: "2026-03-28T00:01:00.000Z", + }); + + // null targetBoundary can never match "serverHandler" — this proves + // the binding is required for resolution + expect(resolved).toHaveLength(0); + + // Cleanup + try { + unlinkSync(sessionExposurePath(sessionId)); + } catch {} + rmSync(projectRoot, { recursive: true, force: true }); + }); +}); diff --git a/tests/prompt-verification-binding.test.ts b/tests/prompt-verification-binding.test.ts new file mode 100644 index 0000000..bb25e32 --- /dev/null +++ b/tests/prompt-verification-binding.test.ts @@ -0,0 +1,114 @@ +import { describe, expect, test } from "bun:test"; +import { resolvePromptVerificationBinding } from "../hooks/src/prompt-verification-binding.mjs"; + +describe("resolvePromptVerificationBinding", () => { + test("binds to active plan next action boundary", () => { + const binding = resolvePromptVerificationBinding({ + plan: { + hasStories: true, + activeStoryId: "story-1", + stories: [ + { + id: 
"story-1", + kind: "flow-verification", + route: "/settings", + promptExcerpt: "save fails", + createdAt: "2026-03-28T00:00:00.000Z", + updatedAt: "2026-03-28T00:00:00.000Z", + }, + ], + storyStates: [], + observationCount: 1, + satisfiedBoundaries: ["clientRequest"], + missingBoundaries: ["environment", "serverHandler", "uiRender"], + recentRoutes: ["/settings"], + primaryNextAction: { + action: "tail server logs /settings", + targetBoundary: "serverHandler", + reason: "No server-side observation yet", + }, + blockedReasons: [], + }, + }); + + expect(binding).toEqual({ + targetBoundary: "serverHandler", + storyId: "story-1", + storyKind: "flow-verification", + route: "/settings", + source: "active-plan", + confidence: 1, + reason: "active verification plan predicted serverHandler", + }); + }); + + test("returns no binding when there is no next boundary", () => { + const binding = resolvePromptVerificationBinding({ + plan: { + hasStories: true, + activeStoryId: "story-1", + stories: [ + { + id: "story-1", + kind: "flow-verification", + route: "/settings", + promptExcerpt: "save fails", + createdAt: "2026-03-28T00:00:00.000Z", + updatedAt: "2026-03-28T00:00:00.000Z", + }, + ], + storyStates: [], + observationCount: 4, + satisfiedBoundaries: [ + "clientRequest", + "environment", + "serverHandler", + "uiRender", + ], + missingBoundaries: [], + recentRoutes: ["/settings"], + primaryNextAction: null, + blockedReasons: [], + }, + }); + + expect(binding.targetBoundary).toBeNull(); + expect(binding.source).toBe("none"); + expect(binding.storyId).toBe("story-1"); + expect(binding.reason).toBe( + "active verification story exists but no primary next boundary is available", + ); + }); + + test("returns no binding when plan is null", () => { + const binding = resolvePromptVerificationBinding({ plan: null }); + + expect(binding.targetBoundary).toBeNull(); + expect(binding.source).toBe("none"); + expect(binding.storyId).toBeNull(); + expect(binding.confidence).toBe(0); + 
expect(binding.reason).toBe("no active verification story"); + }); + + test("returns no binding when plan has no stories", () => { + const binding = resolvePromptVerificationBinding({ + plan: { + hasStories: false, + activeStoryId: null, + stories: [], + storyStates: [], + observationCount: 0, + satisfiedBoundaries: [], + missingBoundaries: [], + recentRoutes: [], + primaryNextAction: null, + blockedReasons: [], + }, + }); + + expect(binding.targetBoundary).toBeNull(); + expect(binding.source).toBe("none"); + expect(binding.storyId).toBeNull(); + expect(binding.reason).toBe("no active verification story"); + }); +}); diff --git a/tests/routing-attribution.test.ts b/tests/routing-attribution.test.ts new file mode 100644 index 0000000..2739a17 --- /dev/null +++ b/tests/routing-attribution.test.ts @@ -0,0 +1,462 @@ +import { describe, test, expect, beforeEach, afterEach } from "bun:test"; +import { unlinkSync } from "node:fs"; +import { + chooseAttributedSkill, + buildAttributionDecision, +} from "../hooks/src/routing-attribution.mts"; +import { + projectPolicyPath, + sessionExposurePath, + appendSkillExposure, + loadSessionExposures, + loadProjectRoutingPolicy, + resolveBoundaryOutcome, + finalizeStaleExposures, + type SkillExposure, +} from "../hooks/src/routing-policy-ledger.mts"; +import { + distillRulesFromTrace, + type LearnedRoutingRulesFile, +} from "../hooks/src/rule-distillation.mts"; +import type { RoutingDecisionTrace } from "../hooks/src/routing-decision-trace.mts"; + +// --------------------------------------------------------------------------- +// Fixtures +// --------------------------------------------------------------------------- + +const PROJECT_ROOT = "/tmp/test-attribution-project"; +const SESSION_ID = "attribution-test-" + Date.now(); + +const T0 = "2026-03-27T05:00:00.000Z"; +const T1 = "2026-03-27T05:01:00.000Z"; +const T2 = "2026-03-27T05:02:00.000Z"; +const T3 = "2026-03-27T05:03:00.000Z"; +const T_END = "2026-03-27T05:30:00.000Z"; + 
+function exposure( + id: string, + overrides: Partial = {}, +): SkillExposure { + return { + id, + sessionId: SESSION_ID, + projectRoot: PROJECT_ROOT, + storyId: "story-1", + storyKind: "flow-verification", + route: "/dashboard", + hook: "PreToolUse", + toolName: "Bash", + skill: "agent-browser-verify", + targetBoundary: "uiRender", + exposureGroupId: null, + attributionRole: "candidate", + candidateSkill: null, + createdAt: T0, + resolvedAt: null, + outcome: "pending", + ...overrides, + }; +} + +function cleanupFiles() { + try { unlinkSync(projectPolicyPath(PROJECT_ROOT)); } catch {} + try { unlinkSync(sessionExposurePath(SESSION_ID)); } catch {} +} + +// --------------------------------------------------------------------------- +// chooseAttributedSkill +// --------------------------------------------------------------------------- + +describe("chooseAttributedSkill", () => { + test("returns null for empty batch", () => { + expect(chooseAttributedSkill([])).toBeNull(); + }); + + test("returns first loaded skill when no preferred", () => { + expect(chooseAttributedSkill(["a", "b", "c"])).toBe("a"); + }); + + test("prefers a skill in preferredSkills set", () => { + expect( + chooseAttributedSkill( + ["verification", "agent-browser-verify"], + ["verification"], + ), + ).toBe("verification"); + }); + + test("falls back to first when no preferred match", () => { + expect( + chooseAttributedSkill(["a", "b"], ["z"]), + ).toBe("a"); + }); + + test("returns first preferred match in load order", () => { + expect( + chooseAttributedSkill( + ["x", "y", "z"], + ["z", "y"], + ), + ).toBe("y"); + }); +}); + +// --------------------------------------------------------------------------- +// buildAttributionDecision +// --------------------------------------------------------------------------- + +describe("buildAttributionDecision", () => { + test("produces stable exposureGroupId segments", () => { + const decision = buildAttributionDecision({ + sessionId: "sess-1", + hook: 
"PreToolUse", + storyId: "story-1", + route: "/settings", + targetBoundary: "clientRequest", + loadedSkills: ["verification", "agent-browser-verify"], + now: "2026-03-27T05:00:00.000Z", + }); + + expect(decision.exposureGroupId).toBe( + "sess-1:PreToolUse:story-1:/settings:clientRequest:2026-03-27T05:00:00.000Z", + ); + expect(decision.candidateSkill).toBe("verification"); + expect(decision.loadedSkills).toEqual(["verification", "agent-browser-verify"]); + }); + + test("null storyId/route/boundary become placeholder segments", () => { + const decision = buildAttributionDecision({ + sessionId: "sess-1", + hook: "UserPromptSubmit", + storyId: null, + route: null, + targetBoundary: null, + loadedSkills: ["next-config"], + now: "2026-03-27T05:00:00.000Z", + }); + + expect(decision.exposureGroupId).toContain("none:*:none"); + expect(decision.candidateSkill).toBe("next-config"); + }); + + test("preferredSkills overrides load-order selection", () => { + const decision = buildAttributionDecision({ + sessionId: "sess-1", + hook: "PreToolUse", + storyId: null, + route: null, + targetBoundary: null, + loadedSkills: ["a", "b", "c"], + preferredSkills: ["c"], + now: "2026-03-27T05:00:00.000Z", + }); + + expect(decision.candidateSkill).toBe("c"); + }); +}); + +// --------------------------------------------------------------------------- +// Candidate-vs-context policy gating (critical acceptance test) +// --------------------------------------------------------------------------- + +describe("candidate-vs-context policy gating", () => { + beforeEach(cleanupFiles); + afterEach(cleanupFiles); + + test("two skills in same group: only candidate updates policy on win", () => { + const groupId = "group-1"; + + // Candidate: agent-browser-verify + appendSkillExposure(exposure("e1", { + skill: "agent-browser-verify", + exposureGroupId: groupId, + attributionRole: "candidate", + candidateSkill: "agent-browser-verify", + createdAt: T0, + })); + + // Context: verification + 
appendSkillExposure(exposure("e2", { + skill: "verification", + exposureGroupId: groupId, + attributionRole: "context", + candidateSkill: "agent-browser-verify", + createdAt: T1, + })); + + // Both get resolved (outcome set on both) + const resolved = resolveBoundaryOutcome({ + sessionId: SESSION_ID, + boundary: "uiRender", + matchedSuggestedAction: false, + storyId: "story-1", + route: "/dashboard", + now: T2, + }); + + // Both exposures are resolved in the ledger + expect(resolved).toHaveLength(2); + expect(resolved.every((e) => e.outcome === "win")).toBe(true); + + // Full history is preserved in session JSONL + const all = loadSessionExposures(SESSION_ID); + expect(all).toHaveLength(2); + expect(all.every((e) => e.outcome === "win")).toBe(true); + + // BUT only the candidate's policy stats are updated + const policy = loadProjectRoutingPolicy(PROJECT_ROOT); + const scenario = "PreToolUse|flow-verification|uiRender|Bash"; + + const candidateStats = policy.scenarios[scenario]?.["agent-browser-verify"]; + expect(candidateStats).toBeDefined(); + expect(candidateStats!.wins).toBe(1); + expect(candidateStats!.exposures).toBe(1); + + // Context skill should have NO policy entry + const contextStats = policy.scenarios[scenario]?.["verification"]; + expect(contextStats).toBeUndefined(); + }); + + test("context exposure records to JSONL but not to policy on append", () => { + appendSkillExposure(exposure("ctx-1", { + skill: "helper-skill", + attributionRole: "context", + candidateSkill: "main-skill", + exposureGroupId: "group-2", + createdAt: T0, + })); + + // JSONL has the exposure + const all = loadSessionExposures(SESSION_ID); + expect(all).toHaveLength(1); + expect(all[0].skill).toBe("helper-skill"); + + // Policy does NOT have an exposure count for helper-skill + const policy = loadProjectRoutingPolicy(PROJECT_ROOT); + const scenario = "PreToolUse|flow-verification|uiRender|Bash"; + expect(policy.scenarios[scenario]?.["helper-skill"]).toBeUndefined(); + }); + + 
test("stale-miss finalization only updates policy for candidate", () => { + appendSkillExposure(exposure("stale-cand", { + skill: "candidate-skill", + attributionRole: "candidate", + candidateSkill: "candidate-skill", + exposureGroupId: "group-stale", + createdAt: T0, + })); + + appendSkillExposure(exposure("stale-ctx", { + skill: "context-skill", + attributionRole: "context", + candidateSkill: "candidate-skill", + exposureGroupId: "group-stale", + createdAt: T1, + })); + + const stale = finalizeStaleExposures(SESSION_ID, T_END); + + // Both exposures are marked stale in the ledger + expect(stale).toHaveLength(2); + expect(stale.every((e) => e.outcome === "stale-miss")).toBe(true); + + // Only candidate updates policy + const policy = loadProjectRoutingPolicy(PROJECT_ROOT); + const scenario = "PreToolUse|flow-verification|uiRender|Bash"; + + const candStats = policy.scenarios[scenario]?.["candidate-skill"]; + expect(candStats).toBeDefined(); + expect(candStats!.staleMisses).toBe(1); + + const ctxStats = policy.scenarios[scenario]?.["context-skill"]; + expect(ctxStats).toBeUndefined(); + }); + + test("legacy rows without attributionRole default to candidate behavior", () => { + // Simulate a legacy exposure (no attribution fields) + const legacyExposure: SkillExposure = { + id: "legacy-1", + sessionId: SESSION_ID, + projectRoot: PROJECT_ROOT, + storyId: "story-1", + storyKind: "flow-verification", + route: "/dashboard", + hook: "PreToolUse", + toolName: "Bash", + skill: "legacy-skill", + targetBoundary: "uiRender", + exposureGroupId: null, + attributionRole: undefined as any, // Simulate missing field + candidateSkill: null, + createdAt: T0, + resolvedAt: null, + outcome: "pending", + }; + + appendSkillExposure(legacyExposure); + + // Should still update policy (backward compat) + const policy = loadProjectRoutingPolicy(PROJECT_ROOT); + const scenario = "PreToolUse|flow-verification|uiRender|Bash"; + 
expect(policy.scenarios[scenario]?.["legacy-skill"]).toBeDefined(); + expect(policy.scenarios[scenario]!["legacy-skill"]!.exposures).toBe(1); + }); + + // --------------------------------------------------------------------------- + // Distillation-level attribution: context-only skills produce no rules + // --------------------------------------------------------------------------- + + test("distillation pipeline: candidate-only attribution produces rules, context does not", () => { + const DISTILL_TS = "2026-03-28T06:00:00.000Z"; + + // Build traces with both candidate and context skills ranked + const traces: RoutingDecisionTrace[] = Array.from({ length: 8 }, (_, i) => ({ + version: 2 as const, + decisionId: `distill-attr-${i}`, + sessionId: SESSION_ID, + hook: "PreToolUse" as const, + toolName: "Read" as const, + toolTarget: "/app/page.tsx", + timestamp: DISTILL_TS, + primaryStory: { + id: "story-1", + kind: "feature", + storyRoute: "/app", + targetBoundary: "uiRender", + }, + observedRoute: "/app", + policyScenario: null, + matchedSkills: ["main-skill", "helper-skill"], + injectedSkills: ["main-skill", "helper-skill"], + skippedReasons: [], + ranked: [ + { + skill: "main-skill", + basePriority: 6, + effectivePriority: 6, + pattern: { type: "path", value: "app/**" }, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + { + skill: "helper-skill", + basePriority: 4, + effectivePriority: 4, + pattern: { type: "path", value: "**/*.tsx" }, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + ], + verification: { + verificationId: `v-attr-${i}`, + observedBoundary: "uiRender", + matchedSuggestedAction: true, + }, + })); + + // Candidate exposures for main-skill + const candidateExposures: SkillExposure[] = Array.from({ length: 8 }, (_, i) => ({ + id: `cand-exp-${i}`, + sessionId: SESSION_ID, + projectRoot: 
PROJECT_ROOT, + storyId: "story-1", + storyKind: "feature", + route: "/app", + hook: "PreToolUse" as const, + toolName: "Read" as const, + skill: "main-skill", + targetBoundary: "uiRender", + exposureGroupId: `group-${i}`, + attributionRole: "candidate" as const, + candidateSkill: "main-skill", + createdAt: DISTILL_TS, + resolvedAt: DISTILL_TS, + outcome: "win" as const, + })); + + // Context exposures for helper-skill + const contextExposures: SkillExposure[] = Array.from({ length: 8 }, (_, i) => ({ + id: `ctx-exp-${i}`, + sessionId: SESSION_ID, + projectRoot: PROJECT_ROOT, + storyId: "story-1", + storyKind: "feature", + route: "/app", + hook: "PreToolUse" as const, + toolName: "Read" as const, + skill: "helper-skill", + targetBoundary: "uiRender", + exposureGroupId: `group-${i}`, + attributionRole: "context" as const, + candidateSkill: "main-skill", + createdAt: DISTILL_TS, + resolvedAt: DISTILL_TS, + outcome: "win" as const, + })); + + const result: LearnedRoutingRulesFile = distillRulesFromTrace({ + projectRoot: PROJECT_ROOT, + traces, + exposures: [...candidateExposures, ...contextExposures], + policy: { scenarios: {} }, + generatedAt: DISTILL_TS, + }); + + // candidate main-skill should have rules + const mainRules = result.rules.filter((r) => r.skill === "main-skill"); + expect(mainRules.length).toBeGreaterThanOrEqual(1); + + // context helper-skill should have ZERO rules + const helperRules = result.rules.filter((r) => r.skill === "helper-skill"); + expect(helperRules).toEqual([]); + }); + + test("directive-win only credits candidate in policy", () => { + appendSkillExposure(exposure("dw-cand", { + skill: "verification", + attributionRole: "candidate", + candidateSkill: "verification", + exposureGroupId: "group-dw", + createdAt: T0, + })); + + appendSkillExposure(exposure("dw-ctx", { + skill: "agent-browser-verify", + attributionRole: "context", + candidateSkill: "verification", + exposureGroupId: "group-dw", + createdAt: T1, + })); + + const resolved = 
resolveBoundaryOutcome({ + sessionId: SESSION_ID, + boundary: "uiRender", + matchedSuggestedAction: true, + storyId: "story-1", + route: "/dashboard", + now: T3, + }); + + expect(resolved).toHaveLength(2); + expect(resolved.every((e) => e.outcome === "directive-win")).toBe(true); + + const policy = loadProjectRoutingPolicy(PROJECT_ROOT); + const scenario = "PreToolUse|flow-verification|uiRender|Bash"; + + // Candidate gets wins + directiveWins + const candStats = policy.scenarios[scenario]?.["verification"]; + expect(candStats!.wins).toBe(1); + expect(candStats!.directiveWins).toBe(1); + + // Context gets nothing in policy + expect(policy.scenarios[scenario]?.["agent-browser-verify"]).toBeUndefined(); + }); +}); diff --git a/tests/routing-decision-capsule.test.ts b/tests/routing-decision-capsule.test.ts new file mode 100644 index 0000000..18f56bd --- /dev/null +++ b/tests/routing-decision-capsule.test.ts @@ -0,0 +1,426 @@ +import { afterEach, describe, expect, test } from "bun:test"; +import { rmSync } from "node:fs"; +import { + buildDecisionCapsule, + buildDecisionCapsuleEnv, + decisionCapsuleDir, + decisionCapsulePath, + persistDecisionCapsule, + readDecisionCapsule, +} from "../hooks/src/routing-decision-capsule.mts"; +import type { RoutingDecisionTrace } from "../hooks/src/routing-decision-trace.mts"; +import type { VerificationDirective } from "../hooks/src/verification-directive.mts"; + +const SESSION_ID = "decision-capsule-test"; + +afterEach(() => { + rmSync(decisionCapsuleDir(SESSION_ID), { recursive: true, force: true }); +}); + +function makeTrace( + overrides?: Partial, +): RoutingDecisionTrace { + return { + version: 2, + decisionId: "abc123def4567890", + sessionId: SESSION_ID, + hook: "PreToolUse", + toolName: "Read", + toolTarget: "app/page.tsx", + timestamp: "2026-03-28T02:30:00.000Z", + primaryStory: { + id: "story-1", + kind: "flow-verification", + storyRoute: "/settings", + targetBoundary: "uiRender", + }, + observedRoute: null, + 
policyScenario: "PreToolUse|flow-verification|uiRender|Read", + matchedSkills: ["nextjs", "react-best-practices"], + injectedSkills: ["nextjs"], + skippedReasons: [], + ranked: [ + { + skill: "nextjs", + basePriority: 7, + effectivePriority: 12, + pattern: { type: "suffix", value: "app/**/*.tsx" }, + profilerBoost: 5, + policyBoost: 0, + policyReason: null, + matchedRuleId: null, + ruleBoost: 0, + ruleReason: null, + rulebookPath: null, + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + ], + verification: { + verificationId: "verify-1", + observedBoundary: null, + matchedSuggestedAction: null, + }, + ...overrides, + }; +} + +function makeDirective( + overrides?: Partial, +): VerificationDirective { + return { + version: 1, + storyId: "story-1", + storyKind: "flow-verification", + route: "/settings", + missingBoundaries: ["uiRender"], + satisfiedBoundaries: ["clientRequest", "serverHandler"], + primaryNextAction: { + action: "open /settings in agent-browser", + targetBoundary: "uiRender", + reason: "No UI render observation yet", + }, + blockedReasons: [], + ...overrides, + }; +} + +describe("routing decision capsule", () => { + test("buildDecisionCapsule returns v1 payload with stable sha256", () => { + const trace = makeTrace(); + const directive = makeDirective(); + + const capsule = buildDecisionCapsule({ + sessionId: SESSION_ID, + hook: "PreToolUse", + createdAt: trace.timestamp, + toolName: trace.toolName, + toolTarget: trace.toolTarget, + platform: "claude-code", + trace, + directive, + attribution: { + exposureGroupId: "group-1", + candidateSkill: "nextjs", + loadedSkills: ["nextjs"], + }, + reasons: { + nextjs: { trigger: "suffix", reasonCode: "pattern-match" }, + }, + env: { VERCEL_PLUGIN_VERIFICATION_ROUTE: "/settings" }, + }); + + expect(capsule.type).toBe("routing.decision-capsule/v1"); + expect(capsule.version).toBe(1); + expect(capsule.decisionId).toBe("abc123def4567890"); + expect(capsule.hook).toBe("PreToolUse"); + 
expect(capsule.input.platform).toBe("claude-code"); + expect(capsule.activeStory.id).toBe("story-1"); + expect(capsule.injectedSkills).toEqual(["nextjs"]); + expect(capsule.sha256).toBeString(); + expect(capsule.sha256).toHaveLength(64); + }); + + test("identical inputs produce identical sha256", () => { + const trace = makeTrace(); + const directive = makeDirective(); + const args = { + sessionId: SESSION_ID, + hook: "PreToolUse" as const, + createdAt: trace.timestamp, + toolName: trace.toolName, + toolTarget: trace.toolTarget, + platform: "claude-code", + trace, + directive, + }; + + const a = buildDecisionCapsule(args); + const b = buildDecisionCapsule(args); + expect(a.sha256).toBe(b.sha256); + }); + + test("different inputs produce different sha256", () => { + const trace1 = makeTrace({ decisionId: "id-1" }); + const trace2 = makeTrace({ decisionId: "id-2" }); + + const a = buildDecisionCapsule({ + sessionId: SESSION_ID, + hook: "PreToolUse", + createdAt: trace1.timestamp, + toolName: "Read", + toolTarget: "a.tsx", + trace: trace1, + directive: null, + }); + const b = buildDecisionCapsule({ + sessionId: SESSION_ID, + hook: "PreToolUse", + createdAt: trace2.timestamp, + toolName: "Read", + toolTarget: "a.tsx", + trace: trace2, + directive: null, + }); + expect(a.sha256).not.toBe(b.sha256); + }); + + test("persist and read round-trip", () => { + const trace = makeTrace(); + const directive = makeDirective(); + + const capsule = buildDecisionCapsule({ + sessionId: SESSION_ID, + hook: "PreToolUse", + createdAt: trace.timestamp, + toolName: trace.toolName, + toolTarget: trace.toolTarget, + platform: "claude-code", + trace, + directive, + attribution: { + exposureGroupId: "group-1", + candidateSkill: "nextjs", + loadedSkills: ["nextjs"], + }, + reasons: { + nextjs: { trigger: "suffix", reasonCode: "pattern-match" }, + }, + env: { VERCEL_PLUGIN_VERIFICATION_ROUTE: "/settings" }, + }); + + const artifactPath = persistDecisionCapsule(capsule); + const loaded = 
readDecisionCapsule(artifactPath); + + expect(loaded).not.toBeNull(); + expect(loaded!.decisionId).toBe(capsule.decisionId); + expect(loaded!.sha256).toBe(capsule.sha256); + expect(loaded!.type).toBe("routing.decision-capsule/v1"); + expect(loaded!.activeStory).toEqual(capsule.activeStory); + expect(loaded!.attribution).toEqual(capsule.attribution); + }); + + test("buildDecisionCapsuleEnv returns correct env vars", () => { + const trace = makeTrace(); + const capsule = buildDecisionCapsule({ + sessionId: SESSION_ID, + hook: "PreToolUse", + createdAt: trace.timestamp, + toolName: "Read", + toolTarget: "app/page.tsx", + trace, + directive: null, + }); + + const artifactPath = persistDecisionCapsule(capsule); + const env = buildDecisionCapsuleEnv(capsule, artifactPath); + + expect(env.VERCEL_PLUGIN_DECISION_ID).toBe(capsule.decisionId); + expect(env.VERCEL_PLUGIN_DECISION_PATH).toBe(artifactPath); + expect(env.VERCEL_PLUGIN_DECISION_SHA256).toBe(capsule.sha256); + }); + + test("readDecisionCapsule returns null for missing file", () => { + const result = readDecisionCapsule("/nonexistent/path.json"); + expect(result).toBeNull(); + }); + + test("decisionCapsulePath is session-scoped", () => { + const pathA = decisionCapsulePath("session-a", "dec-1"); + const pathB = decisionCapsulePath("session-b", "dec-1"); + expect(pathA).not.toBe(pathB); + expect(pathA).toContain("session-a"); + expect(pathB).toContain("session-b"); + }); + + test("unsafe session IDs are hashed", () => { + const path = decisionCapsulePath("../../etc/passwd", "dec-1"); + expect(path).not.toContain("../../"); + expect(path).toContain("-capsules/dec-1.json"); + }); + + test("null session uses no-session segment", () => { + const path = decisionCapsulePath(null, "dec-1"); + expect(path).toContain("no-session"); + }); + + test("unknown platform defaults correctly", () => { + const trace = makeTrace(); + const capsule = buildDecisionCapsule({ + sessionId: SESSION_ID, + hook: "PreToolUse", + createdAt: 
trace.timestamp, + toolName: "Read", + toolTarget: "a.tsx", + platform: "vscode", + trace, + directive: null, + }); + expect(capsule.input.platform).toBe("unknown"); + }); + + test("issues include no_active_verification_story when story id is null", () => { + const trace = makeTrace({ + primaryStory: { + id: null, + kind: null, + storyRoute: null, + targetBoundary: null, + }, + }); + const capsule = buildDecisionCapsule({ + sessionId: SESSION_ID, + hook: "PreToolUse", + createdAt: trace.timestamp, + toolName: "Read", + toolTarget: "a.tsx", + trace, + directive: null, + }); + const codes = capsule.issues.map((i) => i.code); + expect(codes).toContain("no_active_verification_story"); + }); + + test("issues include budget_exhausted when skippedReasons has budget entry", () => { + const trace = makeTrace({ + skippedReasons: ["budget_exhausted:tailwindcss"], + }); + const capsule = buildDecisionCapsule({ + sessionId: SESSION_ID, + hook: "PreToolUse", + createdAt: trace.timestamp, + toolName: "Read", + toolTarget: "a.tsx", + trace, + directive: makeDirective(), + }); + const codes = capsule.issues.map((i) => i.code); + expect(codes).toContain("budget_exhausted"); + }); + + test("issues include verification_blocked when directive has blocked reasons", () => { + const trace = makeTrace(); + const directive = makeDirective({ + blockedReasons: ["missing browser agent"], + }); + const capsule = buildDecisionCapsule({ + sessionId: SESSION_ID, + hook: "PreToolUse", + createdAt: trace.timestamp, + toolName: "Read", + toolTarget: "a.tsx", + trace, + directive, + }); + const codes = capsule.issues.map((i) => i.code); + expect(codes).toContain("verification_blocked"); + }); + + test("rulebookProvenance is null when no rule fires", () => { + const trace = makeTrace(); + const capsule = buildDecisionCapsule({ + sessionId: SESSION_ID, + hook: "PreToolUse", + createdAt: trace.timestamp, + toolName: "Read", + toolTarget: "app/page.tsx", + trace, + directive: null, + }); + 
expect(capsule.rulebookProvenance).toBeNull(); + }); + + test("rulebookProvenance is populated when a ranked entry has a matched rule", () => { + const trace = makeTrace({ + ranked: [ + { + skill: "agent-browser-verify", + basePriority: 6, + effectivePriority: 14, + pattern: { type: "bash", value: "vercel dev" }, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + matchedRuleId: "PreToolUse|flow-verification|uiRender|Bash|agent-browser-verify", + ruleBoost: 8, + ruleReason: "replay verified: no regressions, learned routing matched winning skill", + rulebookPath: "/tmp/vercel-plugin-routing-policy-abc-rulebook.json", + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + ], + }); + const capsule = buildDecisionCapsule({ + sessionId: SESSION_ID, + hook: "PreToolUse", + createdAt: trace.timestamp, + toolName: "Bash", + toolTarget: "vercel dev", + trace, + directive: makeDirective(), + }); + expect(capsule.rulebookProvenance).not.toBeNull(); + expect(capsule.rulebookProvenance!.matchedRuleId).toBe( + "PreToolUse|flow-verification|uiRender|Bash|agent-browser-verify", + ); + expect(capsule.rulebookProvenance!.ruleBoost).toBe(8); + expect(capsule.rulebookProvenance!.ruleReason).toBe( + "replay verified: no regressions, learned routing matched winning skill", + ); + expect(capsule.rulebookProvenance!.rulebookPath).toBe( + "/tmp/vercel-plugin-routing-policy-abc-rulebook.json", + ); + }); + + test("rulebookProvenance round-trips through persist and read", () => { + const trace = makeTrace({ + ranked: [ + { + skill: "agent-browser-verify", + basePriority: 6, + effectivePriority: 14, + pattern: { type: "bash", value: "vercel dev" }, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + matchedRuleId: "PreToolUse|flow-verification|uiRender|Bash|agent-browser-verify", + ruleBoost: 8, + ruleReason: "replay verified", + rulebookPath: "/tmp/rulebook.json", + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + ], + }); + const 
capsule = buildDecisionCapsule({ + sessionId: SESSION_ID, + hook: "PreToolUse", + createdAt: trace.timestamp, + toolName: "Bash", + toolTarget: "vercel dev", + trace, + directive: null, + }); + const artifactPath = persistDecisionCapsule(capsule); + const loaded = readDecisionCapsule(artifactPath); + expect(loaded!.rulebookProvenance).toEqual(capsule.rulebookProvenance); + }); + + test("PostToolUse hook omits machine_output_hidden_in_html_comment issue", () => { + const trace = makeTrace({ hook: "PostToolUse" }); + const capsule = buildDecisionCapsule({ + sessionId: SESSION_ID, + hook: "PostToolUse", + createdAt: trace.timestamp, + toolName: "Write", + toolTarget: "a.tsx", + trace, + directive: makeDirective(), + }); + const codes = capsule.issues.map((i) => i.code); + expect(codes).not.toContain("machine_output_hidden_in_html_comment"); + }); +}); diff --git a/tests/routing-decision-trace.test.ts b/tests/routing-decision-trace.test.ts new file mode 100644 index 0000000..276aa42 --- /dev/null +++ b/tests/routing-decision-trace.test.ts @@ -0,0 +1,1114 @@ +import { describe, test, expect, beforeEach, afterEach } from "bun:test"; +import { rmSync, existsSync, readFileSync, mkdirSync, appendFileSync } from "node:fs"; +import { tmpdir } from "node:os"; +import { createHash } from "node:crypto"; +import { + appendRoutingDecisionTrace, + readRoutingDecisionTrace, + createDecisionId, + traceDir, + tracePath, + type RoutingDecisionTrace, + type DecisionHook, +} from "../hooks/src/routing-decision-trace.mts"; +import { + createDecisionCausality, + addCause, + addEdge, + causesForSkill, + type RoutingDecisionCause, + type RoutingDecisionEdge, +} from "../hooks/src/routing-decision-causality.mts"; + +// --------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + +const TEST_SESSION = "test-session-rdt-" + Date.now(); + +function makeTrace( + overrides: Partial<RoutingDecisionTrace> = {},
+): RoutingDecisionTrace { + return { + version: 2, + decisionId: "deadbeef01234567", + sessionId: TEST_SESSION, + hook: "PreToolUse", + toolName: "Bash", + toolTarget: "npm run dev", + timestamp: "2026-03-27T08:00:00.000Z", + primaryStory: { + id: "story-1", + kind: "flow-verification", + storyRoute: "/settings", + targetBoundary: "uiRender", + }, + observedRoute: null, + policyScenario: "PreToolUse|flow-verification|uiRender|Bash", + matchedSkills: ["agent-browser-verify"], + injectedSkills: ["agent-browser-verify"], + skippedReasons: [], + ranked: [ + { + skill: "agent-browser-verify", + basePriority: 7, + effectivePriority: 15, + pattern: { type: "bashPattern", value: "dev server" }, + profilerBoost: 0, + policyBoost: 8, + policyReason: "4/5 wins", + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + ], + verification: null, + causes: [], + edges: [], + ...overrides, + }; +} + +function cleanup() { + try { + rmSync(traceDir(TEST_SESSION), { recursive: true, force: true }); + } catch {} + try { + rmSync(traceDir(null), { recursive: true, force: true }); + } catch {} + try { + rmSync(traceDir("unsafe/session:id"), { recursive: true, force: true }); + } catch {} +} + +// --------------------------------------------------------------------------- +// Tests +// --------------------------------------------------------------------------- + +describe("routing-decision-trace", () => { + beforeEach(cleanup); + afterEach(cleanup); + + // ------------------------------------------------------------------------- + // Path helpers + // ------------------------------------------------------------------------- + + describe("traceDir / tracePath", () => { + test("uses sessionId directly for safe IDs", () => { + const dir = traceDir("my-session-123"); + expect(dir).toBe(`${tmpdir()}/vercel-plugin-my-session-123-trace`); + }); + + test("hashes unsafe session IDs", () => { + const dir = traceDir("unsafe/session:id"); + const hash = createHash("sha256") + 
.update("unsafe/session:id") + .digest("hex"); + expect(dir).toBe(`${tmpdir()}/vercel-plugin-${hash}-trace`); + }); + + test("uses 'no-session' for null sessionId", () => { + const dir = traceDir(null); + expect(dir).toBe(`${tmpdir()}/vercel-plugin-no-session-trace`); + }); + + test("tracePath ends with routing-decision-trace.jsonl", () => { + const path = tracePath(TEST_SESSION); + expect(path).toEndWith("/routing-decision-trace.jsonl"); + expect(path).toContain(TEST_SESSION); + }); + }); + + // ------------------------------------------------------------------------- + // createDecisionId + // ------------------------------------------------------------------------- + + describe("createDecisionId", () => { + test("returns 16-character hex string", () => { + const id = createDecisionId({ + hook: "PreToolUse", + sessionId: "sess-1", + toolName: "Bash", + toolTarget: "npm run dev", + timestamp: "2026-03-27T08:00:00.000Z", + }); + expect(id).toHaveLength(16); + expect(id).toMatch(/^[0-9a-f]{16}$/); + }); + + test("deterministic for identical inputs", () => { + const input = { + hook: "PreToolUse" as DecisionHook, + sessionId: "sess-1", + toolName: "Bash", + toolTarget: "npm run dev", + timestamp: "2026-03-27T08:00:00.000Z", + }; + const id1 = createDecisionId(input); + const id2 = createDecisionId(input); + expect(id1).toBe(id2); + }); + + test("changes when hook changes", () => { + const base = { + sessionId: "sess-1", + toolName: "Bash", + toolTarget: "npm run dev", + timestamp: "2026-03-27T08:00:00.000Z", + }; + const a = createDecisionId({ ...base, hook: "PreToolUse" }); + const b = createDecisionId({ ...base, hook: "PostToolUse" }); + expect(a).not.toBe(b); + }); + + test("changes when sessionId changes", () => { + const base = { + hook: "PreToolUse" as DecisionHook, + toolName: "Bash", + toolTarget: "npm run dev", + timestamp: "2026-03-27T08:00:00.000Z", + }; + const a = createDecisionId({ ...base, sessionId: "sess-1" }); + const b = createDecisionId({ ...base, 
sessionId: "sess-2" }); + expect(a).not.toBe(b); + }); + + test("changes when toolName changes", () => { + const base = { + hook: "PreToolUse" as DecisionHook, + sessionId: "sess-1", + toolTarget: "npm run dev", + timestamp: "2026-03-27T08:00:00.000Z", + }; + const a = createDecisionId({ ...base, toolName: "Bash" }); + const b = createDecisionId({ ...base, toolName: "Read" }); + expect(a).not.toBe(b); + }); + + test("changes when toolTarget changes", () => { + const base = { + hook: "PreToolUse" as DecisionHook, + sessionId: "sess-1", + toolName: "Bash", + timestamp: "2026-03-27T08:00:00.000Z", + }; + const a = createDecisionId({ ...base, toolTarget: "npm run dev" }); + const b = createDecisionId({ ...base, toolTarget: "npm run build" }); + expect(a).not.toBe(b); + }); + + test("changes when timestamp changes", () => { + const base = { + hook: "PreToolUse" as DecisionHook, + sessionId: "sess-1", + toolName: "Bash", + toolTarget: "npm run dev", + }; + const a = createDecisionId({ + ...base, + timestamp: "2026-03-27T08:00:00.000Z", + }); + const b = createDecisionId({ + ...base, + timestamp: "2026-03-27T08:01:00.000Z", + }); + expect(a).not.toBe(b); + }); + + test("treats null sessionId as empty string", () => { + const base = { + hook: "PreToolUse" as DecisionHook, + toolName: "Bash", + toolTarget: "npm run dev", + timestamp: "2026-03-27T08:00:00.000Z", + }; + const a = createDecisionId({ ...base, sessionId: null }); + const b = createDecisionId({ ...base, sessionId: null }); + expect(a).toBe(b); + + const c = createDecisionId({ ...base, sessionId: "real-session" }); + expect(a).not.toBe(c); + }); + }); + + // ------------------------------------------------------------------------- + // Append + Read round-trip + // ------------------------------------------------------------------------- + + describe("appendRoutingDecisionTrace / readRoutingDecisionTrace", () => { + test("single trace round-trip", () => { + const trace = makeTrace(); + 
appendRoutingDecisionTrace(trace); + + const traces = readRoutingDecisionTrace(TEST_SESSION); + expect(traces).toHaveLength(1); + expect(traces[0]).toEqual(trace); + }); + + test("multiple traces appended in order", () => { + const t1 = makeTrace({ + decisionId: "aaaa000000000001", + timestamp: "2026-03-27T08:00:00.000Z", + }); + const t2 = makeTrace({ + decisionId: "aaaa000000000002", + timestamp: "2026-03-27T08:01:00.000Z", + hook: "UserPromptSubmit", + toolName: "Prompt", + toolTarget: "deploy my app", + }); + const t3 = makeTrace({ + decisionId: "aaaa000000000003", + timestamp: "2026-03-27T08:02:00.000Z", + hook: "PostToolUse", + observedRoute: "/dashboard", + verification: { + verificationId: "verif-1", + observedBoundary: "uiRender", + matchedSuggestedAction: true, + }, + }); + + appendRoutingDecisionTrace(t1); + appendRoutingDecisionTrace(t2); + appendRoutingDecisionTrace(t3); + + const traces = readRoutingDecisionTrace(TEST_SESSION); + expect(traces).toHaveLength(3); + expect(traces[0].decisionId).toBe("aaaa000000000001"); + expect(traces[1].decisionId).toBe("aaaa000000000002"); + expect(traces[2].decisionId).toBe("aaaa000000000003"); + expect(traces[0]).toEqual(t1); + expect(traces[1]).toEqual(t2); + expect(traces[2]).toEqual(t3); + }); + + test("returns [] for non-existent session trace", () => { + const traces = readRoutingDecisionTrace("nonexistent-session-xyz"); + expect(traces).toEqual([]); + }); + + test("returns [] for null sessionId with no trace file", () => { + const traces = readRoutingDecisionTrace(null); + expect(traces).toEqual([]); + }); + + test("handles null sessionId traces", () => { + const trace = makeTrace({ sessionId: null }); + appendRoutingDecisionTrace(trace); + + const traces = readRoutingDecisionTrace(null); + expect(traces).toHaveLength(1); + expect(traces[0].sessionId).toBeNull(); + }); + + test("JSONL file has one JSON object per line", () => { + appendRoutingDecisionTrace(makeTrace({ decisionId: "line1-id-0000001" })); + 
appendRoutingDecisionTrace(makeTrace({ decisionId: "line2-id-0000002" })); + + const raw = readFileSync(tracePath(TEST_SESSION), "utf8"); + const lines = raw.split("\n").filter((l) => l.trim() !== ""); + expect(lines).toHaveLength(2); + + // Each line is valid JSON + const parsed1 = JSON.parse(lines[0]); + const parsed2 = JSON.parse(lines[1]); + expect(parsed1.decisionId).toBe("line1-id-0000001"); + expect(parsed2.decisionId).toBe("line2-id-0000002"); + }); + + test("creates trace directory if it does not exist", () => { + // Cleanup ensures dir doesn't exist + expect(existsSync(traceDir(TEST_SESSION))).toBe(false); + + appendRoutingDecisionTrace(makeTrace()); + expect(existsSync(traceDir(TEST_SESSION))).toBe(true); + }); + + test("preserves all v2 trace fields", () => { + const trace = makeTrace({ + policyScenario: "PreToolUse|flow-verification|uiRender|Bash", + observedRoute: "/dashboard", + skippedReasons: [ + "no_active_verification_story", + "cap_exceeded:some-skill", + ], + ranked: [ + { + skill: "agent-browser-verify", + basePriority: 7, + effectivePriority: 15, + pattern: { type: "bashPattern", value: "dev" }, + profilerBoost: 5, + policyBoost: 8, + policyReason: "4/5 wins", + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + { + skill: "verification", + basePriority: 6, + effectivePriority: 6, + pattern: null, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: true, + synthetic: true, + droppedReason: "budget_exhausted", + }, + ], + verification: { + verificationId: "verif-abc", + observedBoundary: "uiRender", + matchedSuggestedAction: true, + }, + }); + + appendRoutingDecisionTrace(trace); + const [read] = readRoutingDecisionTrace(TEST_SESSION); + + expect(read.version).toBe(2); + expect(read.primaryStory.id).toBe("story-1"); + expect(read.primaryStory.kind).toBe("flow-verification"); + expect(read.primaryStory.storyRoute).toBe("/settings"); + expect(read.primaryStory.targetBoundary).toBe("uiRender"); + 
expect(read.observedRoute).toBe("/dashboard"); + expect(read.policyScenario).toBe( + "PreToolUse|flow-verification|uiRender|Bash", + ); + expect(read.skippedReasons).toEqual([ + "no_active_verification_story", + "cap_exceeded:some-skill", + ]); + expect(read.ranked).toHaveLength(2); + expect(read.ranked[0].policyBoost).toBe(8); + expect(read.ranked[0].synthetic).toBe(false); + expect(read.ranked[1].droppedReason).toBe("budget_exhausted"); + expect(read.ranked[1].synthetic).toBe(true); + expect(read.verification?.verificationId).toBe("verif-abc"); + expect(read.verification?.matchedSuggestedAction).toBe(true); + }); + + test("v2 storyRoute and observedRoute are independent", () => { + const trace = makeTrace({ + hook: "PostToolUse", + primaryStory: { + id: "story-1", + kind: "flow-verification", + storyRoute: "/settings", + targetBoundary: "uiRender", + }, + observedRoute: "/api/users", + }); + + appendRoutingDecisionTrace(trace); + const [read] = readRoutingDecisionTrace(TEST_SESSION); + + expect(read.primaryStory.storyRoute).toBe("/settings"); + expect(read.observedRoute).toBe("/api/users"); + }); + + test("idempotent: appending same trace twice yields two records", () => { + const trace = makeTrace(); + appendRoutingDecisionTrace(trace); + appendRoutingDecisionTrace(trace); + + const traces = readRoutingDecisionTrace(TEST_SESSION); + expect(traces).toHaveLength(2); + expect(traces[0]).toEqual(traces[1]); + }); + }); + + // ------------------------------------------------------------------------- + // V1 backward compatibility + // ------------------------------------------------------------------------- + + describe("v1 backward compatibility", () => { + test("v1 traces are normalized to v2 on read", () => { + // Write a raw v1 trace directly to the JSONL file + const v1Trace = { + version: 1, + decisionId: "v1-trace-0000001", + sessionId: TEST_SESSION, + hook: "PreToolUse", + toolName: "Bash", + toolTarget: "npm run dev", + timestamp: "2026-03-27T08:00:00.000Z", 
+ primaryStory: { + id: "story-1", + kind: "flow-verification", + route: "/settings", + targetBoundary: "uiRender", + }, + policyScenario: "PreToolUse|flow-verification|uiRender|Bash", + matchedSkills: ["agent-browser-verify"], + injectedSkills: ["agent-browser-verify"], + skippedReasons: [], + ranked: [], + verification: null, + }; + + mkdirSync(traceDir(TEST_SESSION), { recursive: true }); + appendFileSync( + tracePath(TEST_SESSION), + JSON.stringify(v1Trace) + "\n", + "utf8", + ); + + const traces = readRoutingDecisionTrace(TEST_SESSION); + expect(traces).toHaveLength(1); + const read = traces[0]; + + // Normalized to v2 + expect(read.version).toBe(2); + expect(read.primaryStory.storyRoute).toBe("/settings"); + expect(read.observedRoute).toBe("/settings"); // best-effort from v1 route + expect((read.primaryStory as any).route).toBeUndefined(); + }); + + test("mixed v1 and v2 traces are all normalized to v2", () => { + const v1Trace = { + version: 1, + decisionId: "v1-trace-0000001", + sessionId: TEST_SESSION, + hook: "PreToolUse", + toolName: "Bash", + toolTarget: "npm run dev", + timestamp: "2026-03-27T08:00:00.000Z", + primaryStory: { + id: "story-1", + kind: "flow-verification", + route: "/old-route", + targetBoundary: "uiRender", + }, + policyScenario: null, + matchedSkills: [], + injectedSkills: [], + skippedReasons: [], + ranked: [], + verification: null, + }; + + const v2Trace = makeTrace({ + decisionId: "v2-trace-0000002", + primaryStory: { + id: "story-2", + kind: "flow-verification", + storyRoute: "/new-route", + targetBoundary: "clientRequest", + }, + observedRoute: "/api/data", + }); + + mkdirSync(traceDir(TEST_SESSION), { recursive: true }); + appendFileSync( + tracePath(TEST_SESSION), + JSON.stringify(v1Trace) + "\n", + "utf8", + ); + appendFileSync( + tracePath(TEST_SESSION), + JSON.stringify(v2Trace) + "\n", + "utf8", + ); + + const traces = readRoutingDecisionTrace(TEST_SESSION); + expect(traces).toHaveLength(2); + 
expect(traces[0].version).toBe(2); + expect(traces[1].version).toBe(2); + expect(traces[0].primaryStory.storyRoute).toBe("/old-route"); + expect(traces[1].primaryStory.storyRoute).toBe("/new-route"); + expect(traces[1].observedRoute).toBe("/api/data"); + }); + + test("v1 trace with null route normalizes correctly", () => { + const v1Trace = { + version: 1, + decisionId: "v1-null-route", + sessionId: TEST_SESSION, + hook: "UserPromptSubmit", + toolName: "Prompt", + toolTarget: "deploy", + timestamp: "2026-03-27T08:00:00.000Z", + primaryStory: { + id: null, + kind: null, + route: null, + targetBoundary: null, + }, + policyScenario: null, + matchedSkills: [], + injectedSkills: [], + skippedReasons: ["no_active_verification_story"], + ranked: [], + verification: null, + }; + + mkdirSync(traceDir(TEST_SESSION), { recursive: true }); + appendFileSync( + tracePath(TEST_SESSION), + JSON.stringify(v1Trace) + "\n", + "utf8", + ); + + const [read] = readRoutingDecisionTrace(TEST_SESSION); + expect(read.version).toBe(2); + expect(read.primaryStory.storyRoute).toBeNull(); + expect(read.observedRoute).toBeNull(); + }); + }); + + // ------------------------------------------------------------------------- + // Synthetic injection tracking + // ------------------------------------------------------------------------- + + describe("synthetic injection tracking", () => { + test("synthetic flag distinguishes pattern-matched from synthetic injections", () => { + const trace = makeTrace({ + ranked: [ + { + skill: "agent-browser-verify", + basePriority: 7, + effectivePriority: 15, + pattern: { type: "bashPattern", value: "dev server" }, + profilerBoost: 0, + policyBoost: 8, + policyReason: "4/5 wins", + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + { + skill: "verification", + basePriority: 0, + effectivePriority: 0, + pattern: { type: "dev-server-companion", value: "dev-server-co-inject" }, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + 
summaryOnly: false, + synthetic: true, + droppedReason: null, + }, + { + skill: "react-best-practices", + basePriority: 0, + effectivePriority: 0, + pattern: { type: "tsx-edit-threshold", value: "tsx-review-trigger" }, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: true, + droppedReason: "cap_exceeded", + }, + ], + }); + + appendRoutingDecisionTrace(trace); + const [read] = readRoutingDecisionTrace(TEST_SESSION); + + const patternMatched = read.ranked.filter((r) => !r.synthetic); + const synthetic = read.ranked.filter((r) => r.synthetic); + + expect(patternMatched).toHaveLength(1); + expect(patternMatched[0].skill).toBe("agent-browser-verify"); + expect(synthetic).toHaveLength(2); + expect(synthetic.map((s) => s.skill).sort()).toEqual([ + "react-best-practices", + "verification", + ]); + }); + + test("one trace line reconstructs final injected set plus dropped candidates", () => { + const trace = makeTrace({ + injectedSkills: ["agent-browser-verify", "verification"], + ranked: [ + { + skill: "agent-browser-verify", + basePriority: 7, + effectivePriority: 15, + pattern: { type: "bashPattern", value: "dev" }, + profilerBoost: 0, + policyBoost: 8, + policyReason: "4/5 wins", + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + { + skill: "verification", + basePriority: 0, + effectivePriority: 0, + pattern: null, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: true, + droppedReason: null, + }, + { + skill: "react-best-practices", + basePriority: 6, + effectivePriority: 6, + pattern: null, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: true, + droppedReason: "cap_exceeded", + }, + { + skill: "nextjs-basics", + basePriority: 5, + effectivePriority: 5, + pattern: { type: "pathPattern", value: "**/*.tsx" }, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: false, + 
droppedReason: "deduped", + }, + ], + }); + + appendRoutingDecisionTrace(trace); + const [read] = readRoutingDecisionTrace(TEST_SESSION); + + // Reconstruct final injected set from ranked + const injected = read.ranked.filter((r) => r.droppedReason === null); + expect(injected.map((r) => r.skill).sort()).toEqual( + [...read.injectedSkills].sort(), + ); + + // Reconstruct dropped candidates + const dropped = read.ranked.filter((r) => r.droppedReason !== null); + expect(dropped).toHaveLength(2); + expect(dropped.find((r) => r.skill === "react-best-practices")?.droppedReason).toBe("cap_exceeded"); + expect(dropped.find((r) => r.skill === "nextjs-basics")?.droppedReason).toBe("deduped"); + }); + }); + + // ------------------------------------------------------------------------- + // Unsafe session IDs + // ------------------------------------------------------------------------- + + describe("unsafe session IDs", () => { + test("session with slashes is hashed for path safety", () => { + const unsafeSession = "unsafe/session:id"; + const trace = makeTrace({ sessionId: unsafeSession }); + appendRoutingDecisionTrace(trace); + + const traces = readRoutingDecisionTrace(unsafeSession); + expect(traces).toHaveLength(1); + expect(traces[0].sessionId).toBe(unsafeSession); + + // The directory should use the hashed name + const dir = traceDir(unsafeSession); + const hash = createHash("sha256") + .update(unsafeSession) + .digest("hex"); + expect(dir).toContain(hash); + }); + }); + + // ------------------------------------------------------------------------- + // Causality: causes and edges round-trip + // ------------------------------------------------------------------------- + + describe("causality round-trip", () => { + test("traces with causes and edges round-trip through JSONL", () => { + const causes: RoutingDecisionCause[] = [ + { + code: "pattern-match", + stage: "match", + skill: "agent-browser-verify", + synthetic: false, + scoreDelta: 0, + message: "Matched 
bashPattern pattern", + detail: { matchType: "bashPattern", pattern: "dev server" }, + }, + { + code: "policy-recall", + stage: "rank", + skill: "agent-browser-verify", + synthetic: true, + scoreDelta: 0, + message: "Recalled historically verified skill", + detail: { scenario: "PreToolUse|bugfix|uiRender|Bash|/settings", wins: 4 }, + }, + { + code: "verified-companion", + stage: "rank", + skill: "verification", + synthetic: true, + scoreDelta: 0, + message: "Inserted learned companion after agent-browser-verify", + detail: { candidateSkill: "agent-browser-verify", confidence: 0.93 }, + }, + ]; + const edges: RoutingDecisionEdge[] = [ + { + fromSkill: "agent-browser-verify", + toSkill: "verification", + relation: "companion-of", + code: "verified-companion", + detail: { confidence: 0.93, scenario: "PreToolUse|bugfix|uiRender|Bash|/settings" }, + }, + ]; + const trace = makeTrace({ causes, edges }); + appendRoutingDecisionTrace(trace); + + const [read] = readRoutingDecisionTrace(TEST_SESSION); + expect(read.causes).toEqual(causes); + expect(read.edges).toEqual(edges); + }); + + test("old v2 traces without causes/edges get empty arrays on read", () => { + // Simulate a v2 trace written before the causality feature + const rawTrace = { + version: 2, + decisionId: "pre-causality-0001", + sessionId: TEST_SESSION, + hook: "PreToolUse", + toolName: "Bash", + toolTarget: "npm run dev", + timestamp: "2026-03-27T08:00:00.000Z", + primaryStory: { id: null, kind: null, storyRoute: null, targetBoundary: null }, + observedRoute: null, + policyScenario: null, + matchedSkills: [], + injectedSkills: [], + skippedReasons: [], + ranked: [], + verification: null, + // no causes or edges field + }; + + mkdirSync(traceDir(TEST_SESSION), { recursive: true }); + appendFileSync( + tracePath(TEST_SESSION), + JSON.stringify(rawTrace) + "\n", + "utf8", + ); + + const [read] = readRoutingDecisionTrace(TEST_SESSION); + expect(read.causes).toEqual([]); + expect(read.edges).toEqual([]); + }); + + 
test("v1 traces get empty causes and edges on normalization", () => { + const v1Trace = { + version: 1, + decisionId: "v1-causality-test", + sessionId: TEST_SESSION, + hook: "PreToolUse", + toolName: "Bash", + toolTarget: "npm run dev", + timestamp: "2026-03-27T08:00:00.000Z", + primaryStory: { id: null, kind: null, route: null, targetBoundary: null }, + policyScenario: null, + matchedSkills: [], + injectedSkills: [], + skippedReasons: [], + ranked: [], + verification: null, + }; + + mkdirSync(traceDir(TEST_SESSION), { recursive: true }); + appendFileSync( + tracePath(TEST_SESSION), + JSON.stringify(v1Trace) + "\n", + "utf8", + ); + + const [read] = readRoutingDecisionTrace(TEST_SESSION); + expect(read.version).toBe(2); + expect(read.causes).toEqual([]); + expect(read.edges).toEqual([]); + }); + }); + + // ------------------------------------------------------------------------- + // Causality helpers: deterministic sorting + // ------------------------------------------------------------------------- + + describe("routing-decision-causality", () => { + test("createDecisionCausality returns empty causes and edges", () => { + const store = createDecisionCausality(); + expect(store.causes).toEqual([]); + expect(store.edges).toEqual([]); + }); + + test("addCause sorts detail keys deterministically", () => { + const store = createDecisionCausality(); + addCause(store, { + code: "pattern-match", + stage: "match", + skill: "nextjs-basics", + synthetic: false, + scoreDelta: 0, + message: "Matched pathPattern", + detail: { zebra: 1, alpha: 2, middle: 3 }, + }); + + const keys = Object.keys(store.causes[0].detail); + expect(keys).toEqual(["alpha", "middle", "zebra"]); + }); + + test("addCause sorts nested detail objects", () => { + const store = createDecisionCausality(); + addCause(store, { + code: "policy-boost", + stage: "rank", + skill: "agent-browser-verify", + synthetic: false, + scoreDelta: 8, + message: "Policy boost", + detail: { z: { b: 1, a: 2 }, a: { d: 3, c: 4 
} }, + }); + + const detail = store.causes[0].detail; + expect(Object.keys(detail)).toEqual(["a", "z"]); + expect(Object.keys(detail.a as Record<string, unknown>)).toEqual(["c", "d"]); + expect(Object.keys(detail.z as Record<string, unknown>)).toEqual(["a", "b"]); + }); + + test("causes are sorted by (skill, stage, code, message) regardless of insertion order", () => { + const store = createDecisionCausality(); + + // Insert out of sorted order ("verification" before "agent-browser-verify") + addCause(store, { + code: "verified-companion", + stage: "rank", + skill: "verification", + synthetic: true, + scoreDelta: 0, + message: "Companion", + detail: {}, + }); + addCause(store, { + code: "pattern-match", + stage: "match", + skill: "agent-browser-verify", + synthetic: false, + scoreDelta: 0, + message: "Pattern match", + detail: {}, + }); + addCause(store, { + code: "policy-boost", + stage: "rank", + skill: "agent-browser-verify", + synthetic: false, + scoreDelta: 8, + message: "Policy boost", + detail: {}, + }); + + // Sorted by skill first, then stage, code, message + expect(store.causes[0].skill).toBe("agent-browser-verify"); + expect(store.causes[0].code).toBe("pattern-match"); + expect(store.causes[1].skill).toBe("agent-browser-verify"); + expect(store.causes[1].code).toBe("policy-boost"); + expect(store.causes[2].skill).toBe("verification"); + }); + + test("edges are sorted by (fromSkill, toSkill, relation, code) regardless of insertion order", () => { + const store = createDecisionCausality(); + + addEdge(store, { + fromSkill: "nextjs-basics", + toSkill: "verification", + relation: "companion-of", + code: "verified-companion", + detail: {}, + }); + addEdge(store, { + fromSkill: "agent-browser-verify", + toSkill: "verification", + relation: "companion-of", + code: "verified-companion", + detail: {}, + }); + + expect(store.edges[0].fromSkill).toBe("agent-browser-verify"); + expect(store.edges[1].fromSkill).toBe("nextjs-basics"); + }); + + test("addEdge sorts detail keys deterministically", () => { + const store =
createDecisionCausality(); + addEdge(store, { + fromSkill: "a", + toSkill: "b", + relation: "companion-of", + code: "test", + detail: { scenario: "x", confidence: 0.9, alpha: true }, + }); + + const keys = Object.keys(store.edges[0].detail); + expect(keys).toEqual(["alpha", "confidence", "scenario"]); + }); + + test("causesForSkill filters by skill name", () => { + const store = createDecisionCausality(); + addCause(store, { + code: "pattern-match", + stage: "match", + skill: "agent-browser-verify", + synthetic: false, + scoreDelta: 0, + message: "Pattern", + detail: {}, + }); + addCause(store, { + code: "policy-boost", + stage: "rank", + skill: "agent-browser-verify", + synthetic: false, + scoreDelta: 8, + message: "Boost", + detail: {}, + }); + addCause(store, { + code: "verified-companion", + stage: "rank", + skill: "verification", + synthetic: true, + scoreDelta: 0, + message: "Companion", + detail: {}, + }); + + const abvCauses = causesForSkill(store, "agent-browser-verify"); + expect(abvCauses).toHaveLength(2); + expect(abvCauses.every((c) => c.skill === "agent-browser-verify")).toBe(true); + + const verifCauses = causesForSkill(store, "verification"); + expect(verifCauses).toHaveLength(1); + expect(verifCauses[0].code).toBe("verified-companion"); + + const noneCauses = causesForSkill(store, "nonexistent"); + expect(noneCauses).toEqual([]); + }); + + test("deterministic serialization: same causes in different order produce identical JSON", () => { + const storeA = createDecisionCausality(); + const storeB = createDecisionCausality(); + + const cause1: RoutingDecisionCause = { + code: "verified-companion", + stage: "rank", + skill: "verification", + synthetic: true, + scoreDelta: 0, + message: "Companion", + detail: { candidateSkill: "agent-browser-verify", confidence: 0.93 }, + }; + const cause2: RoutingDecisionCause = { + code: "pattern-match", + stage: "match", + skill: "agent-browser-verify", + synthetic: false, + scoreDelta: 0, + message: "Pattern match", 
+ detail: { matchType: "bashPattern", pattern: "dev" }, + }; + + // Insert in opposite orders + addCause(storeA, cause1); + addCause(storeA, cause2); + + addCause(storeB, cause2); + addCause(storeB, cause1); + + expect(JSON.stringify(storeA.causes)).toBe(JSON.stringify(storeB.causes)); + }); + + test("deterministic serialization: same edges in different order produce identical JSON", () => { + const storeA = createDecisionCausality(); + const storeB = createDecisionCausality(); + + const edge1: RoutingDecisionEdge = { + fromSkill: "nextjs-basics", + toSkill: "verification", + relation: "companion-of", + code: "verified-companion", + detail: { scenario: "test" }, + }; + const edge2: RoutingDecisionEdge = { + fromSkill: "agent-browser-verify", + toSkill: "verification", + relation: "companion-of", + code: "verified-companion", + detail: { scenario: "test" }, + }; + + addEdge(storeA, edge1); + addEdge(storeA, edge2); + + addEdge(storeB, edge2); + addEdge(storeB, edge1); + + expect(JSON.stringify(storeA.edges)).toBe(JSON.stringify(storeB.edges)); + }); + + test("causality persists through trace round-trip with deterministic ordering", () => { + const store = createDecisionCausality(); + + // Insert causes in reverse order + addCause(store, { + code: "dropped-cap", + stage: "inject", + skill: "react-best-practices", + synthetic: false, + scoreDelta: 0, + message: "Dropped because max skill cap was exceeded", + detail: { maxSkills: 3 }, + }); + addCause(store, { + code: "pattern-match", + stage: "match", + skill: "agent-browser-verify", + synthetic: false, + scoreDelta: 0, + message: "Matched bashPattern", + detail: { pattern: "dev", matchType: "bashPattern" }, + }); + + addEdge(store, { + fromSkill: "agent-browser-verify", + toSkill: "verification", + relation: "companion-of", + code: "verified-companion", + detail: { confidence: 0.93 }, + }); + + const trace = makeTrace({ + causes: store.causes, + edges: store.edges, + }); + appendRoutingDecisionTrace(trace); + + const 
[read] = readRoutingDecisionTrace(TEST_SESSION); + + // Verify deterministic order persists through serialization + expect(read.causes[0].skill).toBe("agent-browser-verify"); + expect(read.causes[0].code).toBe("pattern-match"); + expect(read.causes[1].skill).toBe("react-best-practices"); + expect(read.causes[1].code).toBe("dropped-cap"); + + expect(read.edges).toHaveLength(1); + expect(read.edges[0].fromSkill).toBe("agent-browser-verify"); + expect(read.edges[0].toSkill).toBe("verification"); + + // Detail keys are sorted + expect(Object.keys(read.causes[0].detail)).toEqual(["matchType", "pattern"]); + }); + }); +}); diff --git a/tests/routing-diagnosis.test.ts b/tests/routing-diagnosis.test.ts new file mode 100644 index 0000000..2c7d6e3 --- /dev/null +++ b/tests/routing-diagnosis.test.ts @@ -0,0 +1,356 @@ +import { describe, expect, test } from "bun:test"; +import { + createEmptyRoutingPolicy, + type RoutingPolicyFile, +} from "../hooks/src/routing-policy.mts"; +import { + explainPolicyRecall, + parsePolicyScenario, +} from "../hooks/src/routing-diagnosis.mts"; + +const T0 = "2026-03-27T22:53:34.623Z"; + +function put( + policy: RoutingPolicyFile, + scenario: string, + skill: string, + exposures: number, + wins: number, + directiveWins: number, + staleMisses: number, +): void { + policy.scenarios[scenario] ??= {}; + policy.scenarios[scenario][skill] = { + exposures, + wins, + directiveWins, + staleMisses, + lastUpdatedAt: T0, + }; +} + +describe("routing-diagnosis", () => { + test("parsePolicyScenario parses legacy and route-aware keys", () => { + expect( + parsePolicyScenario( + "PreToolUse|flow-verification|clientRequest|Bash|/settings", + ), + ).toEqual({ + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "clientRequest", + toolName: "Bash", + routeScope: "/settings", + }); + + expect( + parsePolicyScenario("UserPromptSubmit|deployment|none|Prompt"), + ).toEqual({ + hook: "UserPromptSubmit", + storyKind: "deployment", + targetBoundary: 
null, + toolName: "Prompt", + routeScope: null, + }); + }); + + test("parsePolicyScenario returns null for invalid inputs", () => { + expect(parsePolicyScenario(null)).toBeNull(); + expect(parsePolicyScenario("")).toBeNull(); + expect(parsePolicyScenario("too|few")).toBeNull(); + expect(parsePolicyScenario("Invalid|x|y|z")).toBeNull(); + }); + + test("exact route bucket wins over wildcard and legacy buckets", () => { + const policy = createEmptyRoutingPolicy(); + + put( + policy, + "PreToolUse|flow-verification|clientRequest|Bash|/settings", + "verification", + 4, + 3, + 1, + 0, + ); + put( + policy, + "PreToolUse|flow-verification|clientRequest|Bash|*", + "observability", + 8, + 8, + 0, + 0, + ); + put( + policy, + "PreToolUse|flow-verification|clientRequest|Bash", + "workflow", + 8, + 8, + 0, + 0, + ); + + const diagnosis = explainPolicyRecall( + policy, + { + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "clientRequest", + toolName: "Bash", + routeScope: "/settings", + }, + { excludeSkills: new Set(), maxCandidates: 1 }, + ); + + expect(diagnosis.selectedBucket).toBe( + "PreToolUse|flow-verification|clientRequest|Bash|/settings", + ); + expect(diagnosis.selected.map((c) => c.skill)).toEqual([ + "verification", + ]); + expect( + diagnosis.rejected.some((c) => + c.rejectedReason?.startsWith("shadowed_by_selected_bucket:"), + ), + ).toBe(true); + }); + + test("diagnosis emits exposure remediation when sample size is too small", () => { + const policy = createEmptyRoutingPolicy(); + + put( + policy, + "PreToolUse|flow-verification|clientRequest|Bash|/settings", + "verification", + 2, + 2, + 0, + 0, + ); + + const diagnosis = explainPolicyRecall( + policy, + { + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "clientRequest", + toolName: "Bash", + routeScope: "/settings", + }, + { excludeSkills: new Set(), maxCandidates: 1 }, + ); + + expect(diagnosis.selected).toEqual([]); + expect( + diagnosis.hints.find( + (h) => 
h.code === "POLICY_RECALL_NEEDS_EXPOSURES", + ), + ).toMatchObject({ + action: { + type: "collect_more_exposures", + skill: "verification", + scenario: + "PreToolUse|flow-verification|clientRequest|Bash|/settings", + remainingExposures: 1, + }, + }); + }); + + test("diagnosis emits already-present hint when a qualifying candidate is excluded", () => { + const policy = createEmptyRoutingPolicy(); + + put( + policy, + "PreToolUse|flow-verification|clientRequest|Bash|/settings", + "verification", + 5, + 5, + 0, + 0, + ); + + const diagnosis = explainPolicyRecall( + policy, + { + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "clientRequest", + toolName: "Bash", + routeScope: "/settings", + }, + { excludeSkills: new Set(["verification"]), maxCandidates: 1 }, + ); + + expect(diagnosis.selected).toEqual([]); + expect( + diagnosis.hints.find( + (h) => h.code === "POLICY_RECALL_ALREADY_PRESENT", + ), + ).toMatchObject({ + action: { + type: "candidate_already_present", + skill: "verification", + }, + }); + }); + + test("no target boundary returns ineligible diagnosis", () => { + const policy = createEmptyRoutingPolicy(); + + const diagnosis = explainPolicyRecall(policy, { + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: null, + toolName: "Bash", + routeScope: "/settings", + }); + + expect(diagnosis.eligible).toBe(false); + expect(diagnosis.skipReason).toBe("no_target_boundary"); + expect(diagnosis.checkedScenarios).toEqual([]); + expect(diagnosis.selected).toEqual([]); + expect(diagnosis.rejected).toEqual([]); + }); + + test("empty policy emits no-history hint", () => { + const policy = createEmptyRoutingPolicy(); + + const diagnosis = explainPolicyRecall( + policy, + { + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "clientRequest", + toolName: "Bash", + routeScope: "/settings", + }, + { excludeSkills: new Set(), maxCandidates: 1 }, + ); + + expect(diagnosis.eligible).toBe(true); + 
expect(diagnosis.selected).toEqual([]); + expect( + diagnosis.hints.find( + (h) => h.code === "POLICY_RECALL_NO_HISTORY", + ), + ).toBeDefined(); + }); + + test("wildcard bucket selection emits seed-exact-route hint", () => { + const policy = createEmptyRoutingPolicy(); + + put( + policy, + "PreToolUse|flow-verification|clientRequest|Bash|*", + "observability", + 5, + 5, + 0, + 0, + ); + + const diagnosis = explainPolicyRecall( + policy, + { + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "clientRequest", + toolName: "Bash", + routeScope: "/settings", + }, + { excludeSkills: new Set(), maxCandidates: 1 }, + ); + + expect(diagnosis.selectedBucket).toBe( + "PreToolUse|flow-verification|clientRequest|Bash|*", + ); + expect( + diagnosis.hints.find( + (h) => h.code === "POLICY_RECALL_USING_WILDCARD_ROUTE", + ), + ).toBeDefined(); + }); + + test("low success rate emits appropriate hint", () => { + const policy = createEmptyRoutingPolicy(); + + put( + policy, + "PreToolUse|flow-verification|clientRequest|Bash|/settings", + "verification", + 10, + 3, + 0, + 4, + ); + + const diagnosis = explainPolicyRecall( + policy, + { + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "clientRequest", + toolName: "Bash", + routeScope: "/settings", + }, + { excludeSkills: new Set(), maxCandidates: 1 }, + ); + + expect(diagnosis.selected).toEqual([]); + expect( + diagnosis.hints.find( + (h) => h.code === "POLICY_RECALL_LOW_SUCCESS_RATE", + ), + ).toMatchObject({ + action: { + type: "improve_success_rate", + skill: "verification", + }, + }); + }); + + test("precedence hint emitted when lower-priority bucket is shadowed", () => { + const policy = createEmptyRoutingPolicy(); + + // Exact route bucket with a qualifying skill + put( + policy, + "PreToolUse|flow-verification|clientRequest|Bash|/settings", + "verification", + 4, + 3, + 1, + 0, + ); + // Legacy bucket with a qualifying skill — should be shadowed + put( + policy, + 
"PreToolUse|flow-verification|clientRequest|Bash", + "workflow", + 8, + 8, + 0, + 0, + ); + + const diagnosis = explainPolicyRecall( + policy, + { + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "clientRequest", + toolName: "Bash", + routeScope: "/settings", + }, + { excludeSkills: new Set(), maxCandidates: 1 }, + ); + + expect( + diagnosis.hints.find( + (h) => h.code === "POLICY_RECALL_PRECEDENCE_APPLIED", + ), + ).toBeDefined(); + }); +}); diff --git a/tests/routing-explain.test.ts b/tests/routing-explain.test.ts new file mode 100644 index 0000000..a8740f9 --- /dev/null +++ b/tests/routing-explain.test.ts @@ -0,0 +1,501 @@ +import { describe, test, expect, beforeEach, afterEach } from "bun:test"; +import { rmSync } from "node:fs"; +import { + appendRoutingDecisionTrace, + traceDir, + type RoutingDecisionTrace, +} from "../hooks/src/routing-decision-trace.mts"; +import { + runRoutingExplain, + type RoutingExplainResult, +} from "../src/commands/routing-explain.ts"; + +// --------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + +const TEST_SESSION = "test-session-rexplain-" + Date.now(); + +function makeTrace( + overrides: Partial<RoutingDecisionTrace> = {}, +): RoutingDecisionTrace { + return { + version: 1, + decisionId: "deadbeef01234567", + sessionId: TEST_SESSION, + hook: "PreToolUse", + toolName: "Bash", + toolTarget: "npm run dev", + timestamp: "2026-03-27T08:00:00.000Z", + primaryStory: { + id: "story-1", + kind: "flow-verification", + route: "/settings", + targetBoundary: "uiRender", + }, + policyScenario: "PreToolUse|flow-verification|uiRender|Bash", + matchedSkills: ["agent-browser-verify"], + injectedSkills: ["agent-browser-verify"], + skippedReasons: [], + ranked: [ + { + skill: "agent-browser-verify", + basePriority: 7, + effectivePriority: 15, + pattern: { type: "bashPattern", value: "dev server" }, + profilerBoost: 0, + policyBoost:
8, + policyReason: "4/5 wins", + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + ], + verification: null, + ...overrides, + }; +} + +// --------------------------------------------------------------------------- +// Cleanup +// --------------------------------------------------------------------------- + +afterEach(() => { + try { + rmSync(traceDir(TEST_SESSION), { recursive: true, force: true }); + } catch { + // ignore + } +}); + +// --------------------------------------------------------------------------- +// JSON mode +// --------------------------------------------------------------------------- + +describe("routing-explain JSON mode", () => { + test("returns parseable JSON with ok, decisionCount, latest when traces exist", () => { + const trace = makeTrace(); + appendRoutingDecisionTrace(trace); + + const output = runRoutingExplain(TEST_SESSION, true); + const result: RoutingExplainResult = JSON.parse(output); + + expect(result.ok).toBe(true); + expect(result.decisionCount).toBe(1); + expect(result.latest).not.toBeNull(); + expect(result.latest!.decisionId).toBe("deadbeef01234567"); + expect(result.latest!.hook).toBe("PreToolUse"); + expect(result.latest!.injectedSkills).toEqual(["agent-browser-verify"]); + }); + + test("returns latest trace when multiple exist", () => { + appendRoutingDecisionTrace( + makeTrace({ decisionId: "aaaa000000000000", timestamp: "2026-03-27T08:00:00.000Z" }), + ); + appendRoutingDecisionTrace( + makeTrace({ decisionId: "bbbb000000000000", timestamp: "2026-03-27T09:00:00.000Z" }), + ); + + const output = runRoutingExplain(TEST_SESSION, true); + const result: RoutingExplainResult = JSON.parse(output); + + expect(result.decisionCount).toBe(2); + expect(result.latest!.decisionId).toBe("bbbb000000000000"); + }); + + test("returns clean result when no traces exist", () => { + const output = runRoutingExplain(TEST_SESSION, true); + const result: RoutingExplainResult = JSON.parse(output); + + 
expect(result.ok).toBe(true); + expect(result.decisionCount).toBe(0); + expect(result.latest).toBeNull(); + }); + + test("returns clean result for null session", () => { + const output = runRoutingExplain(null, true); + const result: RoutingExplainResult = JSON.parse(output); + + expect(result.ok).toBe(true); + expect(result.decisionCount).toBeGreaterThanOrEqual(0); + }); +}); + +// --------------------------------------------------------------------------- +// Text mode +// --------------------------------------------------------------------------- + +describe("routing-explain text mode", () => { + test("prints decision id, hook, tool target, story, injected skills", () => { + appendRoutingDecisionTrace(makeTrace()); + + const output = runRoutingExplain(TEST_SESSION, false); + + expect(output).toContain("Decision: deadbeef01234567"); + expect(output).toContain("Hook: PreToolUse"); + expect(output).toContain("Tool: Bash"); + expect(output).toContain("Target: npm run dev"); + expect(output).toContain("Story: flow-verification (/settings)"); + expect(output).toContain("Injected: agent-browser-verify"); + }); + + test("prints ranked candidates with effective priority and policy boost", () => { + appendRoutingDecisionTrace(makeTrace()); + + const output = runRoutingExplain(TEST_SESSION, false); + + expect(output).toContain("Ranked:"); + expect(output).toContain("agent-browser-verify"); + expect(output).toContain("effective=15"); + expect(output).toContain("base=7"); + expect(output).toContain("policy=+8"); + expect(output).toContain("4/5 wins"); + }); + + test("prints policy scenario", () => { + appendRoutingDecisionTrace(makeTrace()); + + const output = runRoutingExplain(TEST_SESSION, false); + + expect(output).toContain("Policy scenario: PreToolUse|flow-verification|uiRender|Bash"); + }); + + test("returns clean non-throwing result when no traces exist", () => { + const output = runRoutingExplain(TEST_SESSION, false); + + expect(output).toContain("No routing decision 
traces found."); + expect(output).toContain("session-explain --json"); + }); + + test("prints skipped reasons for story-less routing", () => { + appendRoutingDecisionTrace( + makeTrace({ + primaryStory: { id: null, kind: null, route: null, targetBoundary: null }, + policyScenario: null, + skippedReasons: ["no_active_verification_story"], + }), + ); + + const output = runRoutingExplain(TEST_SESSION, false); + + expect(output).toContain("Skipped: no_active_verification_story"); + expect(output).toContain("Story: none"); + }); + + test("prints skipped reasons for budget and cap drops", () => { + appendRoutingDecisionTrace( + makeTrace({ + skippedReasons: [ + "cap_exceeded:some-skill", + "budget_exhausted:another-skill", + ], + ranked: [ + { + skill: "some-skill", + basePriority: 6, + effectivePriority: 6, + pattern: null, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: false, + droppedReason: "cap_exceeded", + }, + { + skill: "another-skill", + basePriority: 5, + effectivePriority: 5, + pattern: null, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: false, + droppedReason: "budget_exhausted", + }, + ], + }), + ); + + const output = runRoutingExplain(TEST_SESSION, false); + + expect(output).toContain("cap_exceeded:some-skill"); + expect(output).toContain("budget_exhausted:another-skill"); + expect(output).toContain("dropped=cap_exceeded"); + expect(output).toContain("dropped=budget_exhausted"); + }); + + test("prints verification closure info for PostToolUse traces", () => { + appendRoutingDecisionTrace( + makeTrace({ + hook: "PostToolUse", + verification: { + verificationId: "verif-abc", + observedBoundary: "uiRender", + matchedSuggestedAction: true, + }, + }), + ); + + const output = runRoutingExplain(TEST_SESSION, false); + + expect(output).toContain("Verification:"); + expect(output).toContain("id: verif-abc"); + expect(output).toContain("boundary: uiRender"); + 
expect(output).toContain("matched action: true"); + }); + + test("prints profiler boost when present", () => { + appendRoutingDecisionTrace( + makeTrace({ + ranked: [ + { + skill: "agent-browser-verify", + basePriority: 7, + effectivePriority: 20, + pattern: { type: "bashPattern", value: "dev server" }, + profilerBoost: 5, + policyBoost: 8, + policyReason: "4/5 wins", + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + ], + }), + ); + + const output = runRoutingExplain(TEST_SESSION, false); + + expect(output).toContain("profiler=+5"); + }); + + test("prints undertrained policy as zero boost without reason", () => { + appendRoutingDecisionTrace( + makeTrace({ + ranked: [ + { + skill: "some-skill", + basePriority: 6, + effectivePriority: 6, + pattern: { type: "pathPattern", value: "**/*.ts" }, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + ], + }), + ); + + const output = runRoutingExplain(TEST_SESSION, false); + + expect(output).toContain("effective=6"); + expect(output).toContain("base=6"); + // No policy line when boost is 0 + expect(output).not.toContain("policy="); + }); +}); + +// --------------------------------------------------------------------------- +// Diagnostic sufficiency +// --------------------------------------------------------------------------- + +describe("routing-explain diagnostic completeness", () => { + test("undertrained routing is distinguishable from story-less routing", () => { + // Undertrained: has story but policy boost is 0 + appendRoutingDecisionTrace( + makeTrace({ + decisionId: "undertrained-0001", + skippedReasons: [], + ranked: [ + { + skill: "some-skill", + basePriority: 6, + effectivePriority: 6, + pattern: null, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + ], + }), + ); + + const undertrainedOutput = runRoutingExplain(TEST_SESSION, 
false); + + // Clean up for next trace + rmSync(traceDir(TEST_SESSION), { recursive: true, force: true }); + + // Story-less: no active verification story + appendRoutingDecisionTrace( + makeTrace({ + decisionId: "storyless-00001", + primaryStory: { id: null, kind: null, route: null, targetBoundary: null }, + policyScenario: null, + skippedReasons: ["no_active_verification_story"], + }), + ); + + const storylessOutput = runRoutingExplain(TEST_SESSION, false); + + // These two outputs must be distinguishable + expect(storylessOutput).toContain("no_active_verification_story"); + expect(undertrainedOutput).not.toContain("no_active_verification_story"); + expect(undertrainedOutput).toContain("Story: flow-verification"); + expect(storylessOutput).toContain("Story: none"); + }); + + test("drop-by-budget and drop-by-cap are distinguishable in output", () => { + appendRoutingDecisionTrace( + makeTrace({ + skippedReasons: ["cap_exceeded:skill-a", "budget_exhausted:skill-b"], + }), + ); + + const output = runRoutingExplain(TEST_SESSION, false); + + expect(output).toContain("cap_exceeded:skill-a"); + expect(output).toContain("budget_exhausted:skill-b"); + }); +}); + +// --------------------------------------------------------------------------- +// Companion recall in routing-explain +// --------------------------------------------------------------------------- + +describe("routing-explain companion recall", () => { + test("text mode shows recalled companions from trace", () => { + appendRoutingDecisionTrace( + makeTrace({ + ranked: [ + { + skill: "verification", + basePriority: 7, + effectivePriority: 15, + pattern: { type: "bashPattern", value: "dev server" }, + profilerBoost: 0, + policyBoost: 8, + policyReason: "4/5 wins", + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + { + skill: "agent-browser-verify", + basePriority: 0, + effectivePriority: 0, + pattern: { type: "verified-companion", value: "scenario-companion-rulebook" }, + profilerBoost: 0, + 
policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: true, + droppedReason: null, + }, + ], + }), + ); + + const output = runRoutingExplain(TEST_SESSION, false); + + expect(output).toContain("Companions recalled:"); + expect(output).toContain("agent-browser-verify"); + }); + + test("text mode shows summary-only tag for dedup-bypassed companions", () => { + appendRoutingDecisionTrace( + makeTrace({ + ranked: [ + { + skill: "verification", + basePriority: 7, + effectivePriority: 7, + pattern: { type: "bashPattern", value: "npm run dev" }, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + { + skill: "agent-browser-verify", + basePriority: 0, + effectivePriority: 0, + pattern: { type: "verified-companion", value: "scenario-companion-rulebook" }, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: true, + synthetic: true, + droppedReason: null, + }, + ], + }), + ); + + const output = runRoutingExplain(TEST_SESSION, false); + + expect(output).toContain("Companions recalled:"); + expect(output).toContain("agent-browser-verify (summary-only)"); + }); + + test("text mode omits companions section when none recalled", () => { + appendRoutingDecisionTrace(makeTrace()); + + const output = runRoutingExplain(TEST_SESSION, false); + + expect(output).not.toContain("Companions recalled:"); + }); + + test("JSON mode includes companion entries in ranked array", () => { + appendRoutingDecisionTrace( + makeTrace({ + ranked: [ + { + skill: "verification", + basePriority: 7, + effectivePriority: 7, + pattern: null, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + { + skill: "agent-browser-verify", + basePriority: 0, + effectivePriority: 0, + pattern: { type: "verified-companion", value: "scenario-companion-rulebook" }, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + 
summaryOnly: false, + synthetic: true, + droppedReason: null, + }, + ], + }), + ); + + const output = runRoutingExplain(TEST_SESSION, true); + const result: RoutingExplainResult = JSON.parse(output); + + expect(result.latest).not.toBeNull(); + const companionEntry = result.latest!.ranked.find( + (r) => r.pattern?.type === "verified-companion", + ); + expect(companionEntry).toBeDefined(); + expect(companionEntry!.skill).toBe("agent-browser-verify"); + expect(companionEntry!.synthetic).toBe(true); + }); +}); diff --git a/tests/routing-policy-compiler.test.ts b/tests/routing-policy-compiler.test.ts new file mode 100644 index 0000000..36c6598 --- /dev/null +++ b/tests/routing-policy-compiler.test.ts @@ -0,0 +1,1093 @@ +import { describe, test, expect } from "bun:test"; +import { + compilePolicyPatch, + applyPolicyPatch, + evaluatePromotionGate, + type PolicyPatchReport, + type PolicyPatchEntry, + type PromotionArtifact, + type PromotionGateResult, +} from "../hooks/src/routing-policy-compiler.mts"; +import { + createEmptyRoutingPolicy, + recordExposure, + recordOutcome, + derivePolicyBoost, + applyRulebookBoosts, + type RoutingPolicyFile, +} from "../hooks/src/routing-policy.mts"; +import type { + RoutingReplayReport, + RoutingRecommendation, +} from "../hooks/src/routing-replay.mts"; +import type { ReplayResult } from "../hooks/src/rule-distillation.mts"; + +// --------------------------------------------------------------------------- +// Fixtures +// --------------------------------------------------------------------------- + +const T0 = "2026-03-27T04:00:00.000Z"; +const T1 = "2026-03-27T04:01:00.000Z"; + +const SCENARIO_A = "PreToolUse|flow-verification|uiRender|Bash"; +const SCENARIO_B = "UserPromptSubmit|none|none|Prompt"; + +function makeReport( + overrides: Partial<RoutingReplayReport> = {}, +): RoutingReplayReport { + return { + version: 1, + sessionId: "test-session-compiler", + traceCount: 10, + scenarioCount: 1, + scenarios: [], + recommendations: [], + ...overrides, + }; +}
+ +function makeRec( + overrides: Partial<RoutingRecommendation> = {}, +): RoutingRecommendation { + return { + scenario: SCENARIO_A, + skill: "agent-browser-verify", + action: "promote", + suggestedBoost: 8, + confidence: 0.99, + reason: "4/4 wins in " + SCENARIO_A, + ...overrides, + }; +} + +// Helper: build a policy with stats that produce a known boost +function policyWithBoost( + scenario: string, + skill: string, + targetBoost: number, +): RoutingPolicyFile { + const policy = createEmptyRoutingPolicy(); + const base = { + hook: "PreToolUse" as const, + storyKind: scenario.split("|")[1] === "none" ? null : scenario.split("|")[1], + targetBoundary: + scenario.split("|")[2] === "none" + ? null + : (scenario.split("|")[2] as "uiRender"), + toolName: scenario.split("|")[3] as "Bash", + }; + + if (targetBoost === 8) { + // 5 exposures, 5 wins → rate 1.0 → boost 8 + for (let i = 0; i < 5; i++) { + recordExposure(policy, { ...base, skill, now: T0 }); + recordOutcome(policy, { ...base, skill, outcome: "win", now: T0 }); + } + } else if (targetBoost === 5) { + // 10 exposures, 7 wins → rate 0.70 → boost 5 + for (let i = 0; i < 10; i++) { + recordExposure(policy, { ...base, skill, now: T0 }); + } + for (let i = 0; i < 7; i++) { + recordOutcome(policy, { ...base, skill, outcome: "win", now: T0 }); + } + } else if (targetBoost === 2) { + // 4 exposures, 2 wins → rate 0.50 → boost 2 + for (let i = 0; i < 4; i++) { + recordExposure(policy, { ...base, skill, now: T0 }); + } + for (let i = 0; i < 2; i++) { + recordOutcome(policy, { ...base, skill, outcome: "win", now: T0 }); + } + } else if (targetBoost === -2) { + // 10 exposures, 1 win → rate 0.10 → boost -2 + for (let i = 0; i < 10; i++) { + recordExposure(policy, { ...base, skill, now: T0 }); + } + recordOutcome(policy, { ...base, skill, outcome: "win", now: T0 }); + } + // targetBoost 0 → empty policy + return policy; +} + +// --------------------------------------------------------------------------- +// Tests +//
--------------------------------------------------------------------------- + +describe("routing-policy-compiler", () => { + // ------------------------------------------------------------------------- + // Pure function contract + // ------------------------------------------------------------------------- + + describe("compilePolicyPatch is a pure function", () => { + test("does not mutate the input policy", () => { + const policy = createEmptyRoutingPolicy(); + const policySnapshot = JSON.stringify(policy); + const report = makeReport({ + recommendations: [makeRec({ action: "promote" })], + }); + + compilePolicyPatch(policy, report); + expect(JSON.stringify(policy)).toBe(policySnapshot); + }); + + test("does not mutate the input report", () => { + const policy = createEmptyRoutingPolicy(); + const report = makeReport({ + recommendations: [makeRec({ action: "promote" })], + }); + const reportSnapshot = JSON.stringify(report); + + compilePolicyPatch(policy, report); + expect(JSON.stringify(report)).toBe(reportSnapshot); + }); + + test("returns version 1 with all required fields", () => { + const patch = compilePolicyPatch( + createEmptyRoutingPolicy(), + makeReport(), + ); + expect(patch.version).toBe(1); + expect(typeof patch.sessionId).toBe("string"); + expect(typeof patch.patchCount).toBe("number"); + expect(Array.isArray(patch.entries)).toBe(true); + }); + }); + + // ------------------------------------------------------------------------- + // Promote case + // ------------------------------------------------------------------------- + + describe("promote", () => { + test("emits promote when policy has no existing boost", () => { + const policy = createEmptyRoutingPolicy(); + const report = makeReport({ + recommendations: [makeRec({ action: "promote", suggestedBoost: 8 })], + }); + + const patch = compilePolicyPatch(policy, report); + expect(patch.patchCount).toBe(1); + expect(patch.entries[0].action).toBe("promote"); + 
expect(patch.entries[0].currentBoost).toBe(0); + expect(patch.entries[0].proposedBoost).toBe(8); + expect(patch.entries[0].delta).toBe(8); + }); + + test("emits promote when current boost is lower than proposed", () => { + const policy = policyWithBoost(SCENARIO_A, "agent-browser-verify", 2); + const report = makeReport({ + recommendations: [makeRec({ action: "promote", suggestedBoost: 8 })], + }); + + const patch = compilePolicyPatch(policy, report); + expect(patch.patchCount).toBe(1); + expect(patch.entries[0].action).toBe("promote"); + expect(patch.entries[0].currentBoost).toBe(2); + expect(patch.entries[0].proposedBoost).toBe(8); + expect(patch.entries[0].delta).toBe(6); + }); + + test("no-op when policy already has boost 8 for promote recommendation", () => { + const policy = policyWithBoost(SCENARIO_A, "agent-browser-verify", 8); + const report = makeReport({ + recommendations: [makeRec({ action: "promote", suggestedBoost: 8 })], + }); + + const patch = compilePolicyPatch(policy, report); + // delta is 0, action is not investigate → filtered out + expect(patch.patchCount).toBe(0); + }); + }); + + // ------------------------------------------------------------------------- + // Demote case + // ------------------------------------------------------------------------- + + describe("demote", () => { + test("emits demote when policy has no existing boost", () => { + const policy = createEmptyRoutingPolicy(); + const report = makeReport({ + recommendations: [ + makeRec({ + action: "demote", + suggestedBoost: -2, + confidence: 1.0, + reason: "0/6 wins", + }), + ], + }); + + const patch = compilePolicyPatch(policy, report); + expect(patch.patchCount).toBe(1); + expect(patch.entries[0].action).toBe("demote"); + expect(patch.entries[0].currentBoost).toBe(0); + expect(patch.entries[0].proposedBoost).toBe(-2); + expect(patch.entries[0].delta).toBe(-2); + }); + + test("emits demote when current boost is higher than proposed", () => { + const policy = 
policyWithBoost(SCENARIO_A, "agent-browser-verify", 5); + const report = makeReport({ + recommendations: [ + makeRec({ + action: "demote", + suggestedBoost: -2, + confidence: 0.95, + reason: "0/7 wins", + }), + ], + }); + + const patch = compilePolicyPatch(policy, report); + expect(patch.patchCount).toBe(1); + expect(patch.entries[0].action).toBe("demote"); + expect(patch.entries[0].currentBoost).toBe(5); + expect(patch.entries[0].proposedBoost).toBe(-2); + expect(patch.entries[0].delta).toBe(-7); + }); + + test("no-op when policy already at -2 for demote recommendation", () => { + const policy = policyWithBoost(SCENARIO_A, "agent-browser-verify", -2); + const report = makeReport({ + recommendations: [ + makeRec({ + action: "demote", + suggestedBoost: -2, + confidence: 1.0, + reason: "0/10 wins", + }), + ], + }); + + const patch = compilePolicyPatch(policy, report); + expect(patch.patchCount).toBe(0); + }); + }); + + // ------------------------------------------------------------------------- + // Investigate case + // ------------------------------------------------------------------------- + + describe("investigate", () => { + test("always emits investigate entry even when delta is 0", () => { + const policy = createEmptyRoutingPolicy(); + const report = makeReport({ + recommendations: [ + makeRec({ + action: "investigate", + suggestedBoost: 0, + confidence: 0.5, + reason: "2/4 mixed results", + }), + ], + }); + + const patch = compilePolicyPatch(policy, report); + expect(patch.patchCount).toBe(1); + expect(patch.entries[0].action).toBe("investigate"); + expect(patch.entries[0].proposedBoost).toBe(0); + expect(patch.entries[0].delta).toBe(0); + }); + + test("investigate with non-zero current boost still shows as investigate", () => { + const policy = policyWithBoost(SCENARIO_A, "agent-browser-verify", 5); + const report = makeReport({ + recommendations: [ + makeRec({ + action: "investigate", + suggestedBoost: 0, + confidence: 0.45, + reason: "3/7 mixed results", 
+ }), + ], + }); + + const patch = compilePolicyPatch(policy, report); + expect(patch.patchCount).toBe(1); + // Even though delta is -5, investigate action is preserved + expect(patch.entries[0].action).toBe("investigate"); + expect(patch.entries[0].currentBoost).toBe(5); + expect(patch.entries[0].delta).toBe(-5); + }); + }); + + // ------------------------------------------------------------------------- + // No-op case (empty recommendations) + // ------------------------------------------------------------------------- + + describe("no-op", () => { + test("empty patch for report with no recommendations", () => { + const patch = compilePolicyPatch( + createEmptyRoutingPolicy(), + makeReport({ recommendations: [] }), + ); + expect(patch.patchCount).toBe(0); + expect(patch.entries).toEqual([]); + }); + + test("filters out recommendations where current boost matches proposed", () => { + const policy = policyWithBoost(SCENARIO_A, "agent-browser-verify", 8); + const report = makeReport({ + recommendations: [makeRec({ action: "promote", suggestedBoost: 8 })], + }); + + const patch = compilePolicyPatch(policy, report); + expect(patch.patchCount).toBe(0); + }); + }); + + // ------------------------------------------------------------------------- + // Deterministic ordering + // ------------------------------------------------------------------------- + + describe("deterministic patch ordering", () => { + test("entries are sorted by scenario asc, skill asc", () => { + const report = makeReport({ + recommendations: [ + makeRec({ scenario: SCENARIO_B, skill: "z-skill", action: "promote" }), + makeRec({ scenario: SCENARIO_A, skill: "b-skill", action: "promote" }), + makeRec({ scenario: SCENARIO_A, skill: "a-skill", action: "promote" }), + makeRec({ scenario: SCENARIO_B, skill: "a-skill", action: "promote" }), + ], + }); + + const patch = compilePolicyPatch(createEmptyRoutingPolicy(), report); + + const keys = patch.entries.map((e) => `${e.scenario}|${e.skill}`); + const 
sorted = [...keys].sort(); + expect(keys).toEqual(sorted); + }); + + test("produces identical JSON for identical input (deterministic)", () => { + const policy = createEmptyRoutingPolicy(); + const report = makeReport({ + recommendations: [ + makeRec({ scenario: SCENARIO_A, skill: "skill-x", action: "promote" }), + makeRec({ + scenario: SCENARIO_B, + skill: "skill-y", + action: "demote", + suggestedBoost: -2, + }), + ], + }); + + const patch1 = compilePolicyPatch(policy, report); + const patch2 = compilePolicyPatch(policy, report); + + expect(JSON.stringify(patch1)).toBe(JSON.stringify(patch2)); + }); + }); + + // ------------------------------------------------------------------------- + // derivePolicyBoost alignment + // ------------------------------------------------------------------------- + + describe("reuses derivePolicyBoost thresholds", () => { + test("promote maps to boost +8 (same as derivePolicyBoost >=80%)", () => { + const patch = compilePolicyPatch( + createEmptyRoutingPolicy(), + makeReport({ + recommendations: [makeRec({ action: "promote" })], + }), + ); + expect(patch.entries[0].proposedBoost).toBe(8); + }); + + test("demote maps to boost -2 (same as derivePolicyBoost <15%)", () => { + const patch = compilePolicyPatch( + createEmptyRoutingPolicy(), + makeReport({ + recommendations: [ + makeRec({ action: "demote", suggestedBoost: -2 }), + ], + }), + ); + expect(patch.entries[0].proposedBoost).toBe(-2); + }); + + test("investigate maps to boost 0 (same as derivePolicyBoost no-change zone)", () => { + const patch = compilePolicyPatch( + createEmptyRoutingPolicy(), + makeReport({ + recommendations: [ + makeRec({ action: "investigate", suggestedBoost: 0 }), + ], + }), + ); + expect(patch.entries[0].proposedBoost).toBe(0); + }); + }); + + // ------------------------------------------------------------------------- + // Multi-scenario, multi-skill fixture + // ------------------------------------------------------------------------- + + 
describe("complex fixture: multi-scenario multi-skill", () => { + test("handles promote + demote + investigate in one report", () => { + const policy = policyWithBoost(SCENARIO_A, "skill-stable", 8); + const report = makeReport({ + sessionId: "complex-fixture", + recommendations: [ + // promote from 0 → 8 + makeRec({ + scenario: SCENARIO_A, + skill: "skill-new", + action: "promote", + suggestedBoost: 8, + confidence: 0.95, + }), + // already at 8 → no-op (filtered) + makeRec({ + scenario: SCENARIO_A, + skill: "skill-stable", + action: "promote", + suggestedBoost: 8, + confidence: 0.99, + }), + // demote from 0 → -2 + makeRec({ + scenario: SCENARIO_B, + skill: "skill-bad", + action: "demote", + suggestedBoost: -2, + confidence: 1.0, + }), + // investigate from 0 → 0 + makeRec({ + scenario: SCENARIO_B, + skill: "skill-mixed", + action: "investigate", + suggestedBoost: 0, + confidence: 0.5, + }), + ], + }); + + const patch = compilePolicyPatch(policy, report); + + // skill-stable filtered (no-op), 3 entries remain + expect(patch.patchCount).toBe(3); + expect(patch.sessionId).toBe("complex-fixture"); + + const actions = patch.entries.map((e) => e.action); + expect(actions).toContain("promote"); + expect(actions).toContain("demote"); + expect(actions).toContain("investigate"); + + // Verify ordering + for (let i = 1; i < patch.entries.length; i++) { + const cmp = + patch.entries[i - 1].scenario.localeCompare( + patch.entries[i].scenario, + ) || + patch.entries[i - 1].skill.localeCompare(patch.entries[i].skill); + expect(cmp).toBeLessThanOrEqual(0); + } + }); + }); + + // ------------------------------------------------------------------------- + // Apply path + // ------------------------------------------------------------------------- + + describe("applyPolicyPatch", () => { + test("promote: produces PromotionArtifact with boost 8", () => { + const patch: PolicyPatchReport = { + version: 1, + sessionId: "apply-test", + patchCount: 1, + entries: [ + { + scenario: 
SCENARIO_A, + skill: "skill-a", + action: "promote", + currentBoost: 0, + proposedBoost: 8, + delta: 8, + confidence: 0.99, + reason: "test", + }, + ], + }; + + const artifact = applyPolicyPatch(patch, T1); + expect(artifact.applied).toBe(1); + expect(artifact.rules).toHaveLength(1); + expect(artifact.rules[0].action).toBe("promote"); + expect(artifact.rules[0].boost).toBe(8); + expect(artifact.rules[0].skill).toBe("skill-a"); + expect(artifact.rules[0].scenario).toBe(SCENARIO_A); + }); + + test("demote: produces PromotionArtifact with positive boost magnitude", () => { + const patch: PolicyPatchReport = { + version: 1, + sessionId: "apply-test", + patchCount: 1, + entries: [ + { + scenario: SCENARIO_A, + skill: "skill-b", + action: "demote", + currentBoost: 0, + proposedBoost: -2, + delta: -2, + confidence: 1.0, + reason: "test", + }, + ], + }; + + const artifact = applyPolicyPatch(patch, T1); + expect(artifact.applied).toBe(1); + expect(artifact.rules).toHaveLength(1); + expect(artifact.rules[0].action).toBe("demote"); + expect(artifact.rules[0].boost).toBe(2); + }); + + test("compiler-produced demote rule lowers runtime priority", () => { + const patch: PolicyPatchReport = { + version: 1, + sessionId: "apply-test", + patchCount: 1, + entries: [ + { + scenario: SCENARIO_A, + skill: "skill-b", + action: "demote", + currentBoost: 0, + proposedBoost: -2, + delta: -2, + confidence: 1.0, + reason: "test", + }, + ], + }; + + const artifact = applyPolicyPatch(patch, T1); + const gate = evaluatePromotionGate({ + artifact, + replay: { + baselineWins: 1, + baselineDirectiveWins: 1, + learnedWins: 1, + learnedDirectiveWins: 1, + deltaWins: 0, + deltaDirectiveWins: 0, + regressions: [], + }, + }); + + expect(gate.accepted).toBe(true); + if (!gate.rulebook) return; + + const boosted = applyRulebookBoosts( + [{ + skill: "skill-b", + priority: 8, + effectivePriority: 8, + policyBoost: 0, + policyReason: null, + }], + gate.rulebook, + { + hook: "PreToolUse", + storyKind: 
"flow-verification", + targetBoundary: "uiRender", + toolName: "Bash", + }, + "/tmp/test-rulebook.json", + ); + + expect(boosted[0].ruleBoost).toBe(-2); + expect(boosted[0].effectivePriority).toBe(6); + }); + + test("investigate: skipped, not included in rules", () => { + const patch: PolicyPatchReport = { + version: 1, + sessionId: "apply-test", + patchCount: 1, + entries: [ + { + scenario: SCENARIO_A, + skill: "skill-c", + action: "investigate", + currentBoost: 0, + proposedBoost: 0, + delta: 0, + confidence: 0.5, + reason: "test", + }, + ], + }; + + const artifact = applyPolicyPatch(patch, T1); + expect(artifact.applied).toBe(0); + expect(artifact.rules).toHaveLength(0); + }); + + test("no-op: skipped, not included in rules", () => { + const patch: PolicyPatchReport = { + version: 1, + sessionId: "apply-test", + patchCount: 1, + entries: [ + { + scenario: SCENARIO_A, + skill: "skill-d", + action: "no-op", + currentBoost: 8, + proposedBoost: 8, + delta: 0, + confidence: 0.99, + reason: "test", + }, + ], + }; + + const artifact = applyPolicyPatch(patch, T1); + expect(artifact.applied).toBe(0); + expect(artifact.rules).toHaveLength(0); + }); + + test("idempotent: applying same patch twice produces identical artifacts", () => { + const patch: PolicyPatchReport = { + version: 1, + sessionId: "idempotent-test", + patchCount: 1, + entries: [ + { + scenario: SCENARIO_A, + skill: "skill-idem", + action: "promote", + currentBoost: 0, + proposedBoost: 8, + delta: 8, + confidence: 0.99, + reason: "test", + }, + ], + }; + + const artifact1 = applyPolicyPatch(patch, T1); + const artifact2 = applyPolicyPatch(patch, T1); + + expect(JSON.stringify(artifact1)).toBe(JSON.stringify(artifact2)); + }); + + test("sets promotedAt to provided timestamp", () => { + const patch: PolicyPatchReport = { + version: 1, + sessionId: "ts-test", + patchCount: 1, + entries: [ + { + scenario: SCENARIO_A, + skill: "skill-ts", + action: "promote", + currentBoost: 0, + proposedBoost: 8, + delta: 8, + 
confidence: 0.99, + reason: "test", + }, + ], + }; + + const artifact = applyPolicyPatch(patch, T1); + expect(artifact.promotedAt).toBe(T1); + }); + + test("does not mutate any RoutingPolicyFile (evidence preservation)", () => { + const policy = policyWithBoost(SCENARIO_A, "skill-evidence", 2); + const policySnapshot = JSON.stringify(policy); + + const patch: PolicyPatchReport = { + version: 1, + sessionId: "evidence-test", + patchCount: 1, + entries: [ + { + scenario: SCENARIO_A, + skill: "skill-evidence", + action: "promote", + currentBoost: 2, + proposedBoost: 8, + delta: 6, + confidence: 0.99, + reason: "test", + }, + ], + }; + + // applyPolicyPatch no longer takes a policy — it cannot mutate one + const artifact = applyPolicyPatch(patch, T1); + expect(artifact.applied).toBe(1); + + // Policy remains completely untouched + expect(JSON.stringify(policy)).toBe(policySnapshot); + }); + + test("repeated application does not inflate counters or corrupt evidence", () => { + const patch: PolicyPatchReport = { + version: 1, + sessionId: "no-corruption-test", + patchCount: 2, + entries: [ + { + scenario: SCENARIO_A, + skill: "skill-x", + action: "promote", + currentBoost: 0, + proposedBoost: 8, + delta: 8, + confidence: 0.99, + reason: "test", + }, + { + scenario: SCENARIO_B, + skill: "skill-y", + action: "demote", + currentBoost: 5, + proposedBoost: -2, + delta: -7, + confidence: 1.0, + reason: "test", + }, + ], + }; + + // Apply 10 times — artifact is always the same + const artifacts: PromotionArtifact[] = []; + for (let i = 0; i < 10; i++) { + artifacts.push(applyPolicyPatch(patch, T1)); + } + + const first = JSON.stringify(artifacts[0]); + for (const a of artifacts) { + expect(JSON.stringify(a)).toBe(first); + } + expect(artifacts[0].applied).toBe(2); + expect(artifacts[0].rules).toHaveLength(2); + }); + + test("preserves confidence and reason in PromotedRule", () => { + const patch: PolicyPatchReport = { + version: 1, + sessionId: "passthrough-test", + patchCount: 
1, + entries: [ + { + scenario: SCENARIO_A, + skill: "skill-pass", + action: "promote", + currentBoost: 0, + proposedBoost: 8, + delta: 8, + confidence: 0.73, + reason: "4/5 wins in scenario", + }, + ], + }; + + const artifact = applyPolicyPatch(patch, T1); + expect(artifact.rules[0].confidence).toBe(0.73); + expect(artifact.rules[0].reason).toBe("4/5 wins in scenario"); + }); + + test("version and sessionId are set correctly on artifact", () => { + const patch: PolicyPatchReport = { + version: 1, + sessionId: "meta-test", + patchCount: 0, + entries: [], + }; + + const artifact = applyPolicyPatch(patch, T1); + expect(artifact.version).toBe(1); + expect(artifact.sessionId).toBe("meta-test"); + expect(artifact.promotedAt).toBe(T1); + expect(artifact.applied).toBe(0); + expect(artifact.rules).toEqual([]); + }); + }); + + // ------------------------------------------------------------------------- + // Confidence passthrough + // ------------------------------------------------------------------------- + + test("preserves confidence from recommendation in patch entry", () => { + const patch = compilePolicyPatch( + createEmptyRoutingPolicy(), + makeReport({ + recommendations: [ + makeRec({ confidence: 0.73, action: "promote" }), + ], + }), + ); + + expect(patch.entries[0].confidence).toBe(0.73); + }); + + // ------------------------------------------------------------------------- + // Reason passthrough + // ------------------------------------------------------------------------- + + test("preserves reason from recommendation in patch entry", () => { + const reason = "4/5 wins in " + SCENARIO_A; + const patch = compilePolicyPatch( + createEmptyRoutingPolicy(), + makeReport({ + recommendations: [makeRec({ reason, action: "promote" })], + }), + ); + + expect(patch.entries[0].reason).toBe(reason); + }); +}); + +// --------------------------------------------------------------------------- +// Promotion gate +// 
--------------------------------------------------------------------------- + +describe("evaluatePromotionGate", () => { + function makeArtifact( + overrides: Partial<PromotionArtifact> = {}, + ): PromotionArtifact { + return { + version: 1, + sessionId: "gate-test", + promotedAt: T1, + applied: 1, + rules: [ + { + scenario: SCENARIO_A, + skill: "agent-browser-verify", + action: "promote", + boost: 8, + confidence: 0.95, + reason: "4/4 wins", + }, + ], + ...overrides, + }; + } + + function makeReplay(overrides: Partial<ReplayResult> = {}): ReplayResult { + return { + baselineWins: 4, + baselineDirectiveWins: 2, + learnedWins: 4, + learnedDirectiveWins: 2, + deltaWins: 0, + deltaDirectiveWins: 0, + regressions: [], + ...overrides, + }; + } + + // ----------------------------------------------------------------------- + // Acceptance + // ----------------------------------------------------------------------- + + test("accepts when no regressions and learnedWins >= baselineWins", () => { + const result = evaluatePromotionGate({ + artifact: makeArtifact(), + replay: makeReplay(), + }); + expect(result.accepted).toBe(true); + expect(result.errorCode).toBeNull(); + expect(result.rulebook).not.toBeNull(); + expect(result.rulebook!.rules).toHaveLength(1); + expect(result.rulebook!.rules[0].skill).toBe("agent-browser-verify"); + expect(result.rulebook!.rules[0].action).toBe("promote"); + }); + + test("accepted rulebook has deterministic rule IDs", () => { + const result = evaluatePromotionGate({ + artifact: makeArtifact(), + replay: makeReplay(), + }); + expect(result.rulebook!.rules[0].id).toBe( + `${SCENARIO_A}|agent-browser-verify`, + ); + }); + + test("accepted rulebook evidence matches replay", () => { + const replay = makeReplay({ + baselineWins: 6, + baselineDirectiveWins: 3, + learnedWins: 7, + learnedDirectiveWins: 4, + }); + const result = evaluatePromotionGate({ + artifact: makeArtifact(), + replay, + }); + const evidence = result.rulebook!.rules[0].evidence; + 
expect(evidence.baselineWins).toBe(6); + expect(evidence.baselineDirectiveWins).toBe(3); + expect(evidence.learnedWins).toBe(7); + expect(evidence.learnedDirectiveWins).toBe(4); + expect(evidence.regressionCount).toBe(0); + }); + + test("accepted rulebook sessionId and createdAt from artifact", () => { + const result = evaluatePromotionGate({ + artifact: makeArtifact({ sessionId: "my-session", promotedAt: T0 }), + replay: makeReplay(), + }); + expect(result.rulebook!.sessionId).toBe("my-session"); + expect(result.rulebook!.createdAt).toBe(T0); + }); + + test("accepts when learnedWins > baselineWins (improvement)", () => { + const result = evaluatePromotionGate({ + artifact: makeArtifact(), + replay: makeReplay({ learnedWins: 6, baselineWins: 4, deltaWins: 2 }), + }); + expect(result.accepted).toBe(true); + }); + + // ----------------------------------------------------------------------- + // Rejection: regressions + // ----------------------------------------------------------------------- + + test("rejects when regressions > 0", () => { + const result = evaluatePromotionGate({ + artifact: makeArtifact(), + replay: makeReplay({ regressions: ["d1", "d2"] }), + }); + expect(result.accepted).toBe(false); + expect(result.errorCode).toBe("RULEBOOK_PROMOTION_REJECTED_REGRESSION"); + expect(result.rulebook).toBeNull(); + expect(result.reason).toContain("regression"); + }); + + test("rejects with single regression", () => { + const result = evaluatePromotionGate({ + artifact: makeArtifact(), + replay: makeReplay({ regressions: ["d1"] }), + }); + expect(result.accepted).toBe(false); + expect(result.errorCode).toBe("RULEBOOK_PROMOTION_REJECTED_REGRESSION"); + }); + + // ----------------------------------------------------------------------- + // Rejection: learnedWins < baselineWins + // ----------------------------------------------------------------------- + + test("rejects when learnedWins < baselineWins", () => { + const result = evaluatePromotionGate({ + artifact: 
makeArtifact(), + replay: makeReplay({ + baselineWins: 5, + learnedWins: 3, + deltaWins: -2, + regressions: [], + }), + }); + expect(result.accepted).toBe(false); + expect(result.errorCode).toBe("RULEBOOK_PROMOTION_REJECTED_REGRESSION"); + expect(result.rulebook).toBeNull(); + expect(result.reason).toContain("learned wins"); + expect(result.reason).toContain("baseline wins"); + }); + + // ----------------------------------------------------------------------- + // Pure function / determinism + // ----------------------------------------------------------------------- + + test("same inputs produce identical output", () => { + const artifact = makeArtifact(); + const replay = makeReplay(); + const r1 = evaluatePromotionGate({ artifact, replay }); + const r2 = evaluatePromotionGate({ artifact, replay }); + expect(JSON.stringify(r1)).toBe(JSON.stringify(r2)); + }); + + test("does not mutate the input artifact", () => { + const artifact = makeArtifact(); + const snapshot = JSON.stringify(artifact); + evaluatePromotionGate({ artifact, replay: makeReplay() }); + expect(JSON.stringify(artifact)).toBe(snapshot); + }); + + // ----------------------------------------------------------------------- + // Deterministic ordering + // ----------------------------------------------------------------------- + + test("multi-rule accepted rulebook has deterministic ordering", () => { + const artifact = makeArtifact({ + rules: [ + { scenario: SCENARIO_B, skill: "z-skill", action: "promote", boost: 8, confidence: 0.9, reason: "test" }, + { scenario: SCENARIO_A, skill: "b-skill", action: "promote", boost: 8, confidence: 0.9, reason: "test" }, + { scenario: SCENARIO_A, skill: "a-skill", action: "promote", boost: 8, confidence: 0.9, reason: "test" }, + ], + applied: 3, + }); + const result = evaluatePromotionGate({ + artifact, + replay: makeReplay(), + }); + // Rules should be in the order they came from the artifact; + // serialization via serializeRulebook handles final ordering + 
expect(result.rulebook!.rules).toHaveLength(3); + }); + + // ----------------------------------------------------------------------- + // replay is always returned + // ----------------------------------------------------------------------- + + test("replay is returned in both accepted and rejected results", () => { + const replay = makeReplay({ baselineWins: 10, learnedWins: 10 }); + const accepted = evaluatePromotionGate({ + artifact: makeArtifact(), + replay, + }); + expect(accepted.replay).toBe(replay); + + const rejectedReplay = makeReplay({ regressions: ["d1"] }); + const rejected = evaluatePromotionGate({ + artifact: makeArtifact(), + replay: rejectedReplay, + }); + expect(rejected.replay).toBe(rejectedReplay); + }); + + // ----------------------------------------------------------------------- + // Empty artifact + // ----------------------------------------------------------------------- + + test("empty artifact accepted produces empty rulebook", () => { + const result = evaluatePromotionGate({ + artifact: makeArtifact({ rules: [], applied: 0 }), + replay: makeReplay({ baselineWins: 0, learnedWins: 0 }), + }); + expect(result.accepted).toBe(true); + expect(result.rulebook!.rules).toHaveLength(0); + }); + + test("accepted demote rulebook preserves positive stored magnitude", () => { + const result = evaluatePromotionGate({ + artifact: makeArtifact({ + applied: 1, + rules: [{ + scenario: SCENARIO_A, + skill: "agent-browser-verify", + action: "demote", + boost: 2, + confidence: 0.95, + reason: "1/10 wins", + }], + }), + replay: makeReplay(), + }); + + expect(result.accepted).toBe(true); + expect(result.rulebook!.rules[0].action).toBe("demote"); + expect(result.rulebook!.rules[0].boost).toBe(2); + }); +}); diff --git a/tests/routing-policy-directive-win-closure.test.ts b/tests/routing-policy-directive-win-closure.test.ts new file mode 100644 index 0000000..32744b0 --- /dev/null +++ b/tests/routing-policy-directive-win-closure.test.ts @@ -0,0 +1,245 @@ +import 
{ describe, test, expect, beforeEach, afterEach } from "bun:test"; +import { unlinkSync } from "node:fs"; +import { + projectPolicyPath, + sessionExposurePath, + appendSkillExposure, + loadSessionExposures, + loadProjectRoutingPolicy, + resolveBoundaryOutcome, + type SkillExposure, +} from "../hooks/src/routing-policy-ledger.mts"; + +// --------------------------------------------------------------------------- +// Fixtures — deterministic timestamps, no wall-clock dependence +// --------------------------------------------------------------------------- + +const PROJECT_ROOT = "/tmp/test-directive-win-closure"; +const SESSION_ID = "directive-win-closure-" + Date.now(); + +const T0 = "2026-03-27T07:00:00.000Z"; +const T1 = "2026-03-27T07:01:00.000Z"; +const T2 = "2026-03-27T07:02:00.000Z"; + +function exposure(id: string, overrides: Partial<SkillExposure> = {}): SkillExposure { + return { + id, + sessionId: SESSION_ID, + projectRoot: PROJECT_ROOT, + storyId: "story-settings", + storyKind: "flow-verification", + route: "/settings", + hook: "PreToolUse", + toolName: "Bash", + skill: "agent-browser-verify", + targetBoundary: "clientRequest", + exposureGroupId: null, + attributionRole: "candidate", + candidateSkill: null, + createdAt: T0, + resolvedAt: null, + outcome: "pending", + ...overrides, + }; +} + +function cleanup() { + try { unlinkSync(projectPolicyPath(PROJECT_ROOT)); } catch {} + try { unlinkSync(sessionExposurePath(SESSION_ID)); } catch {} +} + +// --------------------------------------------------------------------------- +// Tests +// --------------------------------------------------------------------------- + +describe("directive-win closure", () => { + beforeEach(cleanup); + afterEach(cleanup); + + test("pending route-scoped exposure closes as directive-win when matchedSuggestedAction is true", () => { + appendSkillExposure(exposure("dw-1", { createdAt: T0 })); + + const resolved = resolveBoundaryOutcome({ + sessionId: SESSION_ID, + boundary: "clientRequest", + 
matchedSuggestedAction: true, + storyId: "story-settings", + route: "/settings", + now: T1, + }); + + expect(resolved).toHaveLength(1); + expect(resolved[0].outcome).toBe("directive-win"); + expect(resolved[0].resolvedAt).toBe(T1); + expect(resolved[0].id).toBe("dw-1"); + }); + + test("directive-win increments both wins and directiveWins in persisted policy", () => { + appendSkillExposure(exposure("dw-2", { createdAt: T0 })); + + resolveBoundaryOutcome({ + sessionId: SESSION_ID, + boundary: "clientRequest", + matchedSuggestedAction: true, + storyId: "story-settings", + route: "/settings", + now: T1, + }); + + const policy = loadProjectRoutingPolicy(PROJECT_ROOT); + const stats = policy.scenarios["PreToolUse|flow-verification|clientRequest|Bash"]?.["agent-browser-verify"]; + expect(stats).toBeDefined(); + expect(stats!.directiveWins).toBe(1); + expect(stats!.wins).toBe(1); + expect(stats!.exposures).toBe(1); + }); + + test("multiple directive-wins accumulate deterministically", () => { + // Append three independent pending exposures + appendSkillExposure(exposure("dw-a", { createdAt: T0 })); + appendSkillExposure(exposure("dw-b", { createdAt: T0 })); + appendSkillExposure(exposure("dw-c", { createdAt: T0 })); + + const resolved = resolveBoundaryOutcome({ + sessionId: SESSION_ID, + boundary: "clientRequest", + matchedSuggestedAction: true, + storyId: "story-settings", + route: "/settings", + now: T1, + }); + + expect(resolved).toHaveLength(3); + resolved.forEach((e) => expect(e.outcome).toBe("directive-win")); + + const policy = loadProjectRoutingPolicy(PROJECT_ROOT); + const stats = policy.scenarios["PreToolUse|flow-verification|clientRequest|Bash"]?.["agent-browser-verify"]; + expect(stats!.directiveWins).toBe(3); + expect(stats!.wins).toBe(3); + }); + + test("win (not directive-win) when matchedSuggestedAction is false", () => { + appendSkillExposure(exposure("win-1", { createdAt: T0 })); + + const resolved = resolveBoundaryOutcome({ + sessionId: SESSION_ID, + 
boundary: "clientRequest", + matchedSuggestedAction: false, + storyId: "story-settings", + route: "/settings", + now: T1, + }); + + expect(resolved).toHaveLength(1); + expect(resolved[0].outcome).toBe("win"); + + const policy = loadProjectRoutingPolicy(PROJECT_ROOT); + const stats = policy.scenarios["PreToolUse|flow-verification|clientRequest|Bash"]?.["agent-browser-verify"]; + expect(stats!.wins).toBe(1); + expect(stats!.directiveWins).toBe(0); + }); + + test("route mismatch prevents closure — strict scoping", () => { + appendSkillExposure(exposure("scope-1", { route: "/dashboard", createdAt: T0 })); + + const resolved = resolveBoundaryOutcome({ + sessionId: SESSION_ID, + boundary: "clientRequest", + matchedSuggestedAction: true, + storyId: "story-settings", + route: "/settings", + now: T1, + }); + + expect(resolved).toHaveLength(0); + + // Exposure remains pending in session ledger + const exposures = loadSessionExposures(SESSION_ID); + expect(exposures[0].outcome).toBe("pending"); + }); + + test("storyId mismatch prevents closure", () => { + appendSkillExposure(exposure("story-mismatch", { storyId: "other-story", createdAt: T0 })); + + const resolved = resolveBoundaryOutcome({ + sessionId: SESSION_ID, + boundary: "clientRequest", + matchedSuggestedAction: true, + storyId: "story-settings", + route: "/settings", + now: T1, + }); + + expect(resolved).toHaveLength(0); + }); + + test("already resolved exposure is not re-resolved", () => { + appendSkillExposure(exposure("already-done", { + createdAt: T0, + outcome: "win", + resolvedAt: T1, + })); + + const resolved = resolveBoundaryOutcome({ + sessionId: SESSION_ID, + boundary: "clientRequest", + matchedSuggestedAction: true, + storyId: "story-settings", + route: "/settings", + now: T2, + }); + + expect(resolved).toHaveLength(0); + }); + + test("idempotent — resolving again after all are closed yields empty", () => { + appendSkillExposure(exposure("idem-1", { createdAt: T0 })); + + const first = 
resolveBoundaryOutcome({ + sessionId: SESSION_ID, + boundary: "clientRequest", + matchedSuggestedAction: true, + storyId: "story-settings", + route: "/settings", + now: T1, + }); + expect(first).toHaveLength(1); + + const second = resolveBoundaryOutcome({ + sessionId: SESSION_ID, + boundary: "clientRequest", + matchedSuggestedAction: true, + storyId: "story-settings", + route: "/settings", + now: T2, + }); + expect(second).toHaveLength(0); + }); + + test("null route in exposure only matches null observed route (strict null matching)", () => { + appendSkillExposure(exposure("null-route", { route: null, createdAt: T0 })); + + // Non-null route should NOT match the null-route exposure + const mismatch = resolveBoundaryOutcome({ + sessionId: SESSION_ID, + boundary: "clientRequest", + matchedSuggestedAction: true, + storyId: "story-settings", + route: "/settings", + now: T1, + }); + expect(mismatch).toHaveLength(0); + + // Null route should match + const match = resolveBoundaryOutcome({ + sessionId: SESSION_ID, + boundary: "clientRequest", + matchedSuggestedAction: true, + storyId: "story-settings", + route: null, + now: T2, + }); + expect(match).toHaveLength(1); + expect(match[0].outcome).toBe("directive-win"); + }); +}); diff --git a/tests/routing-policy-ledger.test.ts b/tests/routing-policy-ledger.test.ts new file mode 100644 index 0000000..c604846 --- /dev/null +++ b/tests/routing-policy-ledger.test.ts @@ -0,0 +1,1016 @@ +import { describe, test, expect, beforeEach, afterEach } from "bun:test"; +import { mkdirSync, readFileSync, rmSync, writeFileSync, existsSync, unlinkSync } from "node:fs"; +import { tmpdir } from "node:os"; +import { createHash } from "node:crypto"; +import { + projectPolicyPath, + sessionExposurePath, + loadProjectRoutingPolicy, + saveProjectRoutingPolicy, + appendSkillExposure, + loadSessionExposures, + resolveBoundaryOutcome, + finalizeStaleExposures, + type SkillExposure, +} from "../hooks/src/routing-policy-ledger.mts"; +import { 
createEmptyRoutingPolicy } from "../hooks/src/routing-policy.mts"; + +// --------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + +const TEST_PROJECT = "/tmp/test-project-routing-policy-ledger"; +const TEST_SESSION = "test-session-rpl-" + Date.now(); + +const T0 = "2026-03-27T04:00:00.000Z"; +const T1 = "2026-03-27T04:01:00.000Z"; +const T2 = "2026-03-27T04:02:00.000Z"; +const T3 = "2026-03-27T04:03:00.000Z"; +const T4 = "2026-03-27T04:04:00.000Z"; + +function makeExposure(overrides: Partial<SkillExposure> = {}): SkillExposure { + return { + id: `${TEST_SESSION}:test-skill:${Date.now()}`, + sessionId: TEST_SESSION, + projectRoot: TEST_PROJECT, + storyId: "story-1", + storyKind: "flow-verification", + route: "/dashboard", + hook: "PreToolUse", + toolName: "Bash", + skill: "agent-browser-verify", + targetBoundary: "uiRender", + exposureGroupId: null, + attributionRole: "candidate", + candidateSkill: null, + createdAt: T0, + resolvedAt: null, + outcome: "pending", + ...overrides, + }; +} + +function cleanupFiles() { + const policyPath = projectPolicyPath(TEST_PROJECT); + const exposurePath = sessionExposurePath(TEST_SESSION); + try { unlinkSync(policyPath); } catch {} + try { unlinkSync(exposurePath); } catch {} +} + +// --------------------------------------------------------------------------- +// Tests +// --------------------------------------------------------------------------- + +describe("routing-policy-ledger", () => { + beforeEach(cleanupFiles); + afterEach(cleanupFiles); + + describe("projectPolicyPath", () => { + test("uses sha256 of projectRoot in tmpdir", () => { + const path = projectPolicyPath(TEST_PROJECT); + const hash = createHash("sha256").update(TEST_PROJECT).digest("hex"); + expect(path).toBe(`${tmpdir()}/vercel-plugin-routing-policy-${hash}.json`); + }); + + test("different projects produce different paths", () => { + const p1 = 
projectPolicyPath("/project-a"); + const p2 = projectPolicyPath("/project-b"); + expect(p1).not.toBe(p2); + }); + }); + + describe("sessionExposurePath", () => { + test("uses sessionId in tmpdir for safe IDs", () => { + const path = sessionExposurePath(TEST_SESSION); + expect(path).toBe(`${tmpdir()}/vercel-plugin-${TEST_SESSION}-routing-exposures.jsonl`); + }); + + test("hashes unsafe session IDs containing / or :", () => { + const unsafeId = "abc/def:ghi"; + const path = sessionExposurePath(unsafeId); + const hash = createHash("sha256").update(unsafeId).digest("hex"); + expect(path).toBe(`${tmpdir()}/vercel-plugin-${hash}-routing-exposures.jsonl`); + expect(path).not.toContain("abc/def:ghi"); + // The only slashes should be from the tmpdir prefix + const segment = path.replace(`${tmpdir()}/`, ""); + expect(segment).not.toContain("/"); + expect(segment).not.toContain(":"); + }); + }); + + describe("loadProjectRoutingPolicy / saveProjectRoutingPolicy", () => { + test("returns empty policy when no file exists", () => { + const policy = loadProjectRoutingPolicy(TEST_PROJECT); + expect(policy.version).toBe(1); + expect(policy.scenarios).toEqual({}); + }); + + test("round-trips a policy through save/load", () => { + const policy = createEmptyRoutingPolicy(); + policy.scenarios["PreToolUse|flow-verification|uiRender|Bash"] = { + "agent-browser-verify": { + exposures: 5, + wins: 4, + directiveWins: 3, + staleMisses: 1, + lastUpdatedAt: T0, + }, + }; + + saveProjectRoutingPolicy(TEST_PROJECT, policy); + const loaded = loadProjectRoutingPolicy(TEST_PROJECT); + + expect(loaded.version).toBe(1); + expect(loaded.scenarios["PreToolUse|flow-verification|uiRender|Bash"]["agent-browser-verify"]).toEqual({ + exposures: 5, + wins: 4, + directiveWins: 3, + staleMisses: 1, + lastUpdatedAt: T0, + }); + }); + + test("returns empty policy for corrupt file", () => { + const path = projectPolicyPath(TEST_PROJECT); + writeFileSync(path, "not-json"); + const policy = 
loadProjectRoutingPolicy(TEST_PROJECT); + expect(policy.version).toBe(1); + expect(policy.scenarios).toEqual({}); + }); + }); + + describe("appendSkillExposure / loadSessionExposures", () => { + test("appends and loads exposures from JSONL", () => { + const e1 = makeExposure({ id: "e1", createdAt: T0 }); + const e2 = makeExposure({ id: "e2", skill: "vercel-deploy", createdAt: T1 }); + + appendSkillExposure(e1); + appendSkillExposure(e2); + + const loaded = loadSessionExposures(TEST_SESSION); + expect(loaded).toHaveLength(2); + expect(loaded[0].id).toBe("e1"); + expect(loaded[1].id).toBe("e2"); + expect(loaded[1].skill).toBe("vercel-deploy"); + + const policy = loadProjectRoutingPolicy(TEST_PROJECT); + const scenario = policy.scenarios["PreToolUse|flow-verification|uiRender|Bash"]; + expect(scenario?.["agent-browser-verify"]?.exposures).toBe(1); + expect(scenario?.["vercel-deploy"]?.exposures).toBe(1); + }); + + test("returns empty array for nonexistent session", () => { + const loaded = loadSessionExposures("no-such-session"); + expect(loaded).toEqual([]); + }); + }); + + describe("resolveBoundaryOutcome", () => { + test("resolves pending exposures matching boundary, story, and route as win", () => { + appendSkillExposure(makeExposure({ id: "e1", createdAt: T0 })); + appendSkillExposure(makeExposure({ id: "e2", createdAt: T1 })); + + const resolved = resolveBoundaryOutcome({ + sessionId: TEST_SESSION, + boundary: "uiRender", + matchedSuggestedAction: false, + storyId: "story-1", + route: "/dashboard", + now: T2, + }); + + expect(resolved).toHaveLength(2); + expect(resolved[0].outcome).toBe("win"); + expect(resolved[0].resolvedAt).toBe(T2); + expect(resolved[1].outcome).toBe("win"); + + // Verify ledger is updated + const reloaded = loadSessionExposures(TEST_SESSION); + expect(reloaded.every((e) => e.outcome === "win")).toBe(true); + }); + + test("resolves as directive-win when matchedSuggestedAction is true", () => { + appendSkillExposure(makeExposure({ id: "e1", 
createdAt: T0 })); + + const resolved = resolveBoundaryOutcome({ + sessionId: TEST_SESSION, + boundary: "uiRender", + matchedSuggestedAction: true, + storyId: "story-1", + route: "/dashboard", + now: T2, + }); + + expect(resolved).toHaveLength(1); + expect(resolved[0].outcome).toBe("directive-win"); + }); + + test("does not resolve exposures with different boundary", () => { + appendSkillExposure(makeExposure({ + id: "e1", + targetBoundary: "clientRequest", + createdAt: T0, + })); + + const resolved = resolveBoundaryOutcome({ + sessionId: TEST_SESSION, + boundary: "uiRender", + matchedSuggestedAction: false, + storyId: "story-1", + route: "/dashboard", + now: T2, + }); + + expect(resolved).toHaveLength(0); + + const reloaded = loadSessionExposures(TEST_SESSION); + expect(reloaded[0].outcome).toBe("pending"); + }); + + test("does not re-resolve already resolved exposures", () => { + appendSkillExposure(makeExposure({ + id: "e1", + outcome: "win", + resolvedAt: T1, + createdAt: T0, + })); + + const resolved = resolveBoundaryOutcome({ + sessionId: TEST_SESSION, + boundary: "uiRender", + matchedSuggestedAction: false, + storyId: "story-1", + route: "/dashboard", + now: T2, + }); + + expect(resolved).toHaveLength(0); + }); + + test("updates project policy with resolved outcomes", () => { + appendSkillExposure(makeExposure({ id: "e1", createdAt: T0 })); + + resolveBoundaryOutcome({ + sessionId: TEST_SESSION, + boundary: "uiRender", + matchedSuggestedAction: true, + storyId: "story-1", + route: "/dashboard", + now: T2, + }); + + const policy = loadProjectRoutingPolicy(TEST_PROJECT); + const stats = policy.scenarios["PreToolUse|flow-verification|uiRender|Bash"]?.["agent-browser-verify"]; + expect(stats).toBeDefined(); + expect(stats!.exposures).toBe(1); + expect(stats!.wins).toBe(1); + expect(stats!.directiveWins).toBe(1); + }); + + test("returns empty array when no pending exposures exist", () => { + const resolved = resolveBoundaryOutcome({ + sessionId: TEST_SESSION, + 
boundary: "uiRender", + matchedSuggestedAction: false, + now: T2, + }); + + expect(resolved).toEqual([]); + }); + }); + + describe("finalizeStaleExposures", () => { + test("converts remaining pending exposures to stale-miss", () => { + appendSkillExposure(makeExposure({ id: "e1", createdAt: T0 })); + appendSkillExposure(makeExposure({ + id: "e2", + targetBoundary: "clientRequest", + createdAt: T1, + })); + + const stale = finalizeStaleExposures(TEST_SESSION, T3); + + expect(stale).toHaveLength(2); + expect(stale[0].outcome).toBe("stale-miss"); + expect(stale[0].resolvedAt).toBe(T3); + expect(stale[1].outcome).toBe("stale-miss"); + + // Verify ledger + const reloaded = loadSessionExposures(TEST_SESSION); + expect(reloaded.every((e) => e.outcome === "stale-miss")).toBe(true); + }); + + test("does not finalize already resolved exposures", () => { + appendSkillExposure(makeExposure({ + id: "e1", + outcome: "win", + resolvedAt: T1, + createdAt: T0, + })); + appendSkillExposure(makeExposure({ + id: "e2", + createdAt: T1, + })); + + const stale = finalizeStaleExposures(TEST_SESSION, T3); + + expect(stale).toHaveLength(1); + expect(stale[0].id).toBe("e2"); + + const reloaded = loadSessionExposures(TEST_SESSION); + expect(reloaded[0].outcome).toBe("win"); + expect(reloaded[1].outcome).toBe("stale-miss"); + }); + + test("updates project policy with stale-miss outcomes", () => { + appendSkillExposure(makeExposure({ id: "e1", createdAt: T0 })); + + finalizeStaleExposures(TEST_SESSION, T3); + + const policy = loadProjectRoutingPolicy(TEST_PROJECT); + const stats = policy.scenarios["PreToolUse|flow-verification|uiRender|Bash"]?.["agent-browser-verify"]; + expect(stats).toBeDefined(); + expect(stats!.exposures).toBe(1); + expect(stats!.staleMisses).toBe(1); + expect(stats!.wins).toBe(0); + }); + + test("returns empty array when no pending exposures exist", () => { + const stale = finalizeStaleExposures(TEST_SESSION, T3); + expect(stale).toEqual([]); + }); + }); + + 
describe("story/route-scoped resolution", () => { + test("resolves only exposures matching the observed storyId", () => { + appendSkillExposure(makeExposure({ + id: "story1-e1", + storyId: "story-1", + route: "/settings", + createdAt: T0, + })); + appendSkillExposure(makeExposure({ + id: "story2-e1", + storyId: "story-2", + route: "/settings", + createdAt: T1, + })); + + const resolved = resolveBoundaryOutcome({ + sessionId: TEST_SESSION, + boundary: "uiRender", + matchedSuggestedAction: false, + storyId: "story-1", + route: "/settings", + now: T2, + }); + + expect(resolved).toHaveLength(1); + expect(resolved[0].id).toBe("story1-e1"); + expect(resolved[0].outcome).toBe("win"); + + // story-2 exposure remains pending + const all = loadSessionExposures(TEST_SESSION); + const story2 = all.find((e) => e.id === "story2-e1"); + expect(story2!.outcome).toBe("pending"); + }); + + test("resolves only exposures matching the observed route", () => { + appendSkillExposure(makeExposure({ + id: "settings-e1", + storyId: "story-1", + route: "/settings", + createdAt: T0, + })); + appendSkillExposure(makeExposure({ + id: "dashboard-e1", + storyId: "story-1", + route: "/dashboard", + createdAt: T1, + })); + + const resolved = resolveBoundaryOutcome({ + sessionId: TEST_SESSION, + boundary: "uiRender", + matchedSuggestedAction: false, + storyId: "story-1", + route: "/settings", + now: T2, + }); + + expect(resolved).toHaveLength(1); + expect(resolved[0].id).toBe("settings-e1"); + + const all = loadSessionExposures(TEST_SESSION); + expect(all.find((e) => e.id === "dashboard-e1")!.outcome).toBe("pending"); + }); + + test("resolves only exposures matching both storyId and route", () => { + appendSkillExposure(makeExposure({ + id: "match-e1", + storyId: "story-1", + route: "/settings", + createdAt: T0, + })); + appendSkillExposure(makeExposure({ + id: "wrong-story", + storyId: "story-2", + route: "/settings", + createdAt: T1, + })); + appendSkillExposure(makeExposure({ + id: "wrong-route", 
+        storyId: "story-1",
+        route: "/dashboard",
+        createdAt: T2,
+      }));
+
+      const resolved = resolveBoundaryOutcome({
+        sessionId: TEST_SESSION,
+        boundary: "uiRender",
+        matchedSuggestedAction: false,
+        storyId: "story-1",
+        route: "/settings",
+        now: T3,
+      });
+
+      expect(resolved).toHaveLength(1);
+      expect(resolved[0].id).toBe("match-e1");
+
+      const all = loadSessionExposures(TEST_SESSION);
+      expect(all.find((e) => e.id === "wrong-story")!.outcome).toBe("pending");
+      expect(all.find((e) => e.id === "wrong-route")!.outcome).toBe("pending");
+    });
+
+    test("null observed route/storyId only resolves exposures with null route/storyId (strict matching)", () => {
+      // Exposures with specific routes should NOT be resolved by a null observed route
+      appendSkillExposure(makeExposure({
+        id: "specific-route-e1",
+        storyId: "story-1",
+        route: "/settings",
+        createdAt: T0,
+      }));
+      appendSkillExposure(makeExposure({
+        id: "specific-route-e2",
+        storyId: "story-2",
+        route: "/dashboard",
+        createdAt: T1,
+      }));
+
+      const resolved = resolveBoundaryOutcome({
+        sessionId: TEST_SESSION,
+        boundary: "uiRender",
+        matchedSuggestedAction: false,
+        now: T2,
+      });
+
+      // Strict null matching: null route/storyId does NOT match non-null exposures
+      expect(resolved).toHaveLength(0);
+
+      // All remain pending
+      const all = loadSessionExposures(TEST_SESSION);
+      expect(all.every((e) => e.outcome === "pending")).toBe(true);
+    });
+
+    test("null observed route/storyId resolves exposures that also have null route/storyId", () => {
+      appendSkillExposure(makeExposure({
+        id: "null-route-e1",
+        storyId: null,
+        route: null,
+        createdAt: T0,
+      }));
+      appendSkillExposure(makeExposure({
+        id: "specific-route-e1",
+        storyId: "story-1",
+        route: "/settings",
+        createdAt: T1,
+      }));
+
+      const resolved = resolveBoundaryOutcome({
+        sessionId: TEST_SESSION,
+        boundary: "uiRender",
+        matchedSuggestedAction: false,
+        now: T2,
+      });
+
+      // Only the null-route exposure is resolved
+      expect(resolved).toHaveLength(1);
+      expect(resolved[0].id).toBe("null-route-e1");
+
+      // The specific-route exposure remains pending
+      const all = loadSessionExposures(TEST_SESSION);
+      expect(all.find((e) => e.id === "specific-route-e1")!.outcome).toBe("pending");
+    });
+  });
+
+  describe("unsafe session ID round-trip", () => {
+    const UNSAFE_SESSION = "abc/def:ghi";
+
+    afterEach(() => {
+      try { unlinkSync(sessionExposurePath(UNSAFE_SESSION)); } catch {}
+      try { unlinkSync(projectPolicyPath(TEST_PROJECT)); } catch {}
+    });
+
+    test("append, load, resolve, and finalize all work with unsafe session IDs", () => {
+      const e1 = makeExposure({
+        id: "unsafe-e1",
+        sessionId: UNSAFE_SESSION,
+        targetBoundary: "clientRequest",
+        createdAt: T0,
+      });
+      const e2 = makeExposure({
+        id: "unsafe-e2",
+        sessionId: UNSAFE_SESSION,
+        targetBoundary: "uiRender",
+        createdAt: T1,
+      });
+
+      // Append should not throw
+      appendSkillExposure(e1);
+      appendSkillExposure(e2);
+
+      // Load should return both
+      const loaded = loadSessionExposures(UNSAFE_SESSION);
+      expect(loaded).toHaveLength(2);
+
+      // Resolve clientRequest (must match storyId/route from makeExposure defaults)
+      const resolved = resolveBoundaryOutcome({
+        sessionId: UNSAFE_SESSION,
+        boundary: "clientRequest",
+        matchedSuggestedAction: false,
+        storyId: "story-1",
+        route: "/dashboard",
+        now: T2,
+      });
+      expect(resolved).toHaveLength(1);
+      expect(resolved[0].id).toBe("unsafe-e1");
+
+      // Finalize remaining
+      const stale = finalizeStaleExposures(UNSAFE_SESSION, T3);
+      expect(stale).toHaveLength(1);
+      expect(stale[0].id).toBe("unsafe-e2");
+      expect(stale[0].outcome).toBe("stale-miss");
+
+      // Verify the file path doesn't contain unsafe characters
+      const path = sessionExposurePath(UNSAFE_SESSION);
+      const segment = path.replace(`${tmpdir()}/`, "");
+      expect(segment).not.toContain("/");
+      expect(segment).not.toContain(":");
+    });
+  });
+
+  describe("null-route attribution guardrails", () => {
+    test("null observed route does not over-credit exposures with specific routes", () => {
+      appendSkillExposure(makeExposure({
+        id: "route-a",
+        route: "/settings",
+        storyId: "story-1",
+        createdAt: T0,
+      }));
+      appendSkillExposure(makeExposure({
+        id: "route-b",
+        route: "/dashboard",
+        storyId: "story-1",
+        createdAt: T1,
+      }));
+      appendSkillExposure(makeExposure({
+        id: "route-c",
+        route: "/api/users",
+        storyId: "story-1",
+        createdAt: T2,
+      }));
+
+      // Observed route is null — should NOT resolve any of these
+      const resolved = resolveBoundaryOutcome({
+        sessionId: TEST_SESSION,
+        boundary: "uiRender",
+        matchedSuggestedAction: false,
+        storyId: "story-1",
+        route: null,
+        now: T3,
+      });
+
+      expect(resolved).toHaveLength(0);
+
+      // All remain pending
+      const all = loadSessionExposures(TEST_SESSION);
+      expect(all.every((e) => e.outcome === "pending")).toBe(true);
+    });
+
+    test("null observed storyId does not over-credit exposures with specific storyIds", () => {
+      appendSkillExposure(makeExposure({
+        id: "story-a",
+        storyId: "story-1",
+        route: "/settings",
+        createdAt: T0,
+      }));
+      appendSkillExposure(makeExposure({
+        id: "story-b",
+        storyId: "story-2",
+        route: "/settings",
+        createdAt: T1,
+      }));
+
+      // Observed storyId is null — should NOT resolve any
+      const resolved = resolveBoundaryOutcome({
+        sessionId: TEST_SESSION,
+        boundary: "uiRender",
+        matchedSuggestedAction: false,
+        storyId: null,
+        route: "/settings",
+        now: T3,
+      });
+
+      expect(resolved).toHaveLength(0);
+    });
+
+    test("mixed null and non-null: only exact matches resolve", () => {
+      // Exposure with null route
+      appendSkillExposure(makeExposure({
+        id: "null-route",
+        storyId: "story-1",
+        route: null,
+        createdAt: T0,
+      }));
+      // Exposure with specific route
+      appendSkillExposure(makeExposure({
+        id: "specific-route",
+        storyId: "story-1",
+        route: "/settings",
+        createdAt: T1,
+      }));
+
+      // Resolve with null route — only matches null-route exposure
+      const resolved = resolveBoundaryOutcome({
+        sessionId: TEST_SESSION,
+        boundary: "uiRender",
+        matchedSuggestedAction: false,
+        storyId: "story-1",
+        route: null,
+        now: T2,
+      });
+
+      expect(resolved).toHaveLength(1);
+      expect(resolved[0].id).toBe("null-route");
+
+      // specific-route remains pending
+      const all = loadSessionExposures(TEST_SESSION);
+      expect(all.find((e) => e.id === "specific-route")!.outcome).toBe("pending");
+    });
+  });
+
+  describe("outcome distinguishability", () => {
+    test("directive-win and plain win are persisted distinctly in exposures", () => {
+      appendSkillExposure(makeExposure({
+        id: "directive-e1",
+        storyId: "story-1",
+        route: "/a",
+        createdAt: T0,
+      }));
+      appendSkillExposure(makeExposure({
+        id: "plain-e1",
+        storyId: "story-1",
+        route: "/b",
+        createdAt: T1,
+      }));
+
+      // Directive win for /a
+      resolveBoundaryOutcome({
+        sessionId: TEST_SESSION,
+        boundary: "uiRender",
+        matchedSuggestedAction: true,
+        storyId: "story-1",
+        route: "/a",
+        now: T2,
+      });
+
+      // Plain win for /b
+      resolveBoundaryOutcome({
+        sessionId: TEST_SESSION,
+        boundary: "uiRender",
+        matchedSuggestedAction: false,
+        storyId: "story-1",
+        route: "/b",
+        now: T3,
+      });
+
+      const all = loadSessionExposures(TEST_SESSION);
+      expect(all.find((e) => e.id === "directive-e1")!.outcome).toBe("directive-win");
+      expect(all.find((e) => e.id === "plain-e1")!.outcome).toBe("win");
+    });
+
+    test("directive-win, win, and stale-miss coexist in the same session ledger", () => {
+      appendSkillExposure(makeExposure({
+        id: "dw-e1",
+        storyId: "story-1",
+        route: "/a",
+        targetBoundary: "uiRender",
+        createdAt: T0,
+      }));
+      appendSkillExposure(makeExposure({
+        id: "w-e1",
+        storyId: "story-1",
+        route: "/b",
+        targetBoundary: "uiRender",
+        createdAt: T1,
+      }));
+      appendSkillExposure(makeExposure({
+        id: "sm-e1",
+        storyId: "story-1",
+        route: "/c",
+        targetBoundary: "clientRequest",
+        createdAt: T2,
+      }));
+
+      // Directive win
+      resolveBoundaryOutcome({
+        sessionId: TEST_SESSION,
+        boundary: "uiRender",
+        matchedSuggestedAction: true,
+        storyId: "story-1",
+        route: "/a",
+        now: T3,
+      });
+
+      // Plain win
+      resolveBoundaryOutcome({
+        sessionId: TEST_SESSION,
+        boundary: "uiRender",
+        matchedSuggestedAction: false,
+        storyId: "story-1",
+        route: "/b",
+        now: T3,
+      });
+
+      // Stale-miss the rest
+      finalizeStaleExposures(TEST_SESSION, T4);
+
+      const all = loadSessionExposures(TEST_SESSION);
+      const outcomes = all.map((e) => ({ id: e.id, outcome: e.outcome }));
+      expect(outcomes).toEqual([
+        { id: "dw-e1", outcome: "directive-win" },
+        { id: "w-e1", outcome: "win" },
+        { id: "sm-e1", outcome: "stale-miss" },
+      ]);
+    });
+
+    test("policy correctly distinguishes directive-win from plain win counts", () => {
+      appendSkillExposure(makeExposure({
+        id: "dw",
+        storyId: "story-1",
+        route: "/a",
+        createdAt: T0,
+      }));
+      appendSkillExposure(makeExposure({
+        id: "pw",
+        storyId: "story-1",
+        route: "/b",
+        createdAt: T1,
+      }));
+
+      resolveBoundaryOutcome({
+        sessionId: TEST_SESSION,
+        boundary: "uiRender",
+        matchedSuggestedAction: true,
+        storyId: "story-1",
+        route: "/a",
+        now: T2,
+      });
+      resolveBoundaryOutcome({
+        sessionId: TEST_SESSION,
+        boundary: "uiRender",
+        matchedSuggestedAction: false,
+        storyId: "story-1",
+        route: "/b",
+        now: T3,
+      });
+
+      const policy = loadProjectRoutingPolicy(TEST_PROJECT);
+      const stats = policy.scenarios["PreToolUse|flow-verification|uiRender|Bash"]?.["agent-browser-verify"];
+      expect(stats!.wins).toBe(2);
+      expect(stats!.directiveWins).toBe(1);
+    });
+  });
+
+  describe("stale-miss finalization determinism", () => {
+    test("repeated finalization calls produce identical results", () => {
+      appendSkillExposure(makeExposure({
+        id: "det-e1",
+        storyId: "s1",
+        route: "/x",
+        createdAt: T0,
+      }));
+      appendSkillExposure(makeExposure({
+        id: "det-e2",
+        storyId: "s1",
+        route: "/y",
+        createdAt: T1,
+      }));
+
+      const first = finalizeStaleExposures(TEST_SESSION, T3);
+      expect(first).toHaveLength(2);
+      expect(first.every((e) => e.outcome === "stale-miss")).toBe(true);
+
+      // Second call should be a no-op
+      const second = finalizeStaleExposures(TEST_SESSION, T4);
+      expect(second).toHaveLength(0);
+
+      // Ledger is identical after both calls
+      const all = loadSessionExposures(TEST_SESSION);
+      expect(all).toHaveLength(2);
+      expect(all[0].resolvedAt).toBe(T3);
+      expect(all[1].resolvedAt).toBe(T3);
+    });
+  });
+
+  describe("strict null matching regression — paired unresolved/resolved", () => {
+    test("pending exposure stays unresolved when observed storyId and route are both null", () => {
+      appendSkillExposure(makeExposure({
+        id: "fallback-e1",
+        storyId: "story-fb",
+        route: "/settings",
+        targetBoundary: "clientRequest",
+        createdAt: T0,
+      }));
+
+      // Attempt resolution with null storyId and null route
+      const resolved = resolveBoundaryOutcome({
+        sessionId: TEST_SESSION,
+        boundary: "clientRequest",
+        matchedSuggestedAction: false,
+        storyId: null,
+        route: null,
+        now: T1,
+      });
+
+      // Strict null matching: null does not match non-null — exposure stays pending
+      expect(resolved).toHaveLength(0);
+
+      const all = loadSessionExposures(TEST_SESSION);
+      expect(all).toHaveLength(1);
+      expect(all[0].id).toBe("fallback-e1");
+      expect(all[0].outcome).toBe("pending");
+      expect(all[0].resolvedAt).toBeNull();
+
+      // Policy should have exposure counted but zero wins
+      const policy = loadProjectRoutingPolicy(TEST_PROJECT);
+      const stats = policy.scenarios["PreToolUse|flow-verification|clientRequest|Bash"]?.["agent-browser-verify"];
+      expect(stats).toBeDefined();
+      expect(stats!.exposures).toBe(1);
+      expect(stats!.wins).toBe(0);
+      expect(stats!.directiveWins).toBe(0);
+    });
+
+    test("same exposure resolves once exact storyId and route are supplied", () => {
+      // Seed the same exposure as the paired test above
+      appendSkillExposure(makeExposure({
+        id: "fallback-e1",
+        storyId: "story-fb",
+        route: "/settings",
+        targetBoundary: "clientRequest",
+        createdAt: T0,
+      }));
+
+      // First: null attempt — should fail
+      const attempt1 = resolveBoundaryOutcome({
+        sessionId: TEST_SESSION,
+        boundary: "clientRequest",
+        matchedSuggestedAction: false,
+        storyId: null,
+        route: null,
+        now: T1,
+      });
+      expect(attempt1).toHaveLength(0);
+
+      // Second: exact values — should succeed
+      const attempt2 = resolveBoundaryOutcome({
+        sessionId: TEST_SESSION,
+        boundary: "clientRequest",
+        matchedSuggestedAction: true,
+        storyId: "story-fb",
+        route: "/settings",
+        now: T2,
+      });
+
+      expect(attempt2).toHaveLength(1);
+      expect(attempt2[0].id).toBe("fallback-e1");
+      expect(attempt2[0].outcome).toBe("directive-win");
+      expect(attempt2[0].resolvedAt).toBe(T2);
+
+      // Verify ledger is updated
+      const all = loadSessionExposures(TEST_SESSION);
+      expect(all).toHaveLength(1);
+      expect(all[0].outcome).toBe("directive-win");
+
+      // Policy stats should reflect exactly 1 exposure, 1 win, 1 directiveWin
+      const policy = loadProjectRoutingPolicy(TEST_PROJECT);
+      const stats = policy.scenarios["PreToolUse|flow-verification|clientRequest|Bash"]?.["agent-browser-verify"];
+      expect(stats).toBeDefined();
+      expect(stats!.exposures).toBe(1);
+      expect(stats!.wins).toBe(1);
+      expect(stats!.directiveWins).toBe(1);
+      expect(stats!.staleMisses).toBe(0);
+    });
+
+    test("multiple exposures: null resolution leaves all pending, exact resolution is selective", () => {
+      appendSkillExposure(makeExposure({
+        id: "multi-e1",
+        storyId: "story-m",
+        route: "/api/data",
+        targetBoundary: "clientRequest",
+        createdAt: T0,
+      }));
+      appendSkillExposure(makeExposure({
+        id: "multi-e2",
+        storyId: "story-m",
+        route: "/api/users",
+        targetBoundary: "clientRequest",
+        createdAt: T1,
+      }));
+      appendSkillExposure(makeExposure({
+        id: "multi-e3",
+        storyId: null,
+        route: null,
+        targetBoundary: "clientRequest",
+        createdAt: T2,
+      }));
+
+      // Null resolution: only multi-e3 (null/null) should resolve
+      const nullResolved = resolveBoundaryOutcome({
+        sessionId: TEST_SESSION,
+        boundary: "clientRequest",
+        matchedSuggestedAction: false,
+        storyId: null,
+        route: null,
+        now: T3,
+      });
+      expect(nullResolved).toHaveLength(1);
+      expect(nullResolved[0].id).toBe("multi-e3");
+
+      // Exact resolution for multi-e1
+      const exactResolved = resolveBoundaryOutcome({
+        sessionId: TEST_SESSION,
+        boundary: "clientRequest",
+        matchedSuggestedAction: false,
+        storyId: "story-m",
+        route: "/api/data",
+        now: T4,
+      });
+      expect(exactResolved).toHaveLength(1);
+      expect(exactResolved[0].id).toBe("multi-e1");
+
+      // multi-e2 remains pending
+      const all = loadSessionExposures(TEST_SESSION);
+      const e2 = all.find((e) => e.id === "multi-e2");
+      expect(e2!.outcome).toBe("pending");
+
+      // Policy: 3 exposures, 2 wins, 0 directiveWins
+      const policy = loadProjectRoutingPolicy(TEST_PROJECT);
+      const stats = policy.scenarios["PreToolUse|flow-verification|clientRequest|Bash"]?.["agent-browser-verify"];
+      expect(stats!.exposures).toBe(3);
+      expect(stats!.wins).toBe(2);
+      expect(stats!.directiveWins).toBe(0);
+    });
+  });
+
+  describe("idempotency", () => {
+    test("resolveBoundaryOutcome is safe to call twice", () => {
+      appendSkillExposure(makeExposure({ id: "e1", createdAt: T0 }));
+
+      resolveBoundaryOutcome({
+        sessionId: TEST_SESSION,
+        boundary: "uiRender",
+        matchedSuggestedAction: false,
+        storyId: "story-1",
+        route: "/dashboard",
+        now: T2,
+      });
+
+      // Second call should find no pending exposures
+      const second = resolveBoundaryOutcome({
+        sessionId: TEST_SESSION,
+        boundary: "uiRender",
+        matchedSuggestedAction: false,
+        storyId: "story-1",
+        route: "/dashboard",
+        now: T3,
+      });
+
+      expect(second).toHaveLength(0);
+
+      // Policy should still have exactly 1 win
+      const policy = loadProjectRoutingPolicy(TEST_PROJECT);
+      const stats = policy.scenarios["PreToolUse|flow-verification|uiRender|Bash"]?.["agent-browser-verify"];
+      expect(stats!.wins).toBe(1);
+    });
+
+    test("finalizeStaleExposures is safe to call twice", () => {
+      appendSkillExposure(makeExposure({ id: "e1", createdAt: T0 }));
+
+      finalizeStaleExposures(TEST_SESSION, T2);
+      const second = finalizeStaleExposures(TEST_SESSION, T3);
+
+      expect(second).toHaveLength(0);
+    });
+  });
+});
diff --git a/tests/routing-policy.test.ts b/tests/routing-policy.test.ts
new file mode 100644
index 0000000..29b8e8e
--- /dev/null
+++ b/tests/routing-policy.test.ts
@@ -0,0 +1,631 @@
+import { describe, test, expect } from "bun:test";
+import {
+  createEmptyRoutingPolicy,
+  scenarioKey,
+  scenarioKeyWithRoute,
+  scenarioKeyCandidates,
+  computePolicySuccessRate,
+  lookupPolicyStats,
+  ensureScenario,
+  recordExposure,
+  recordOutcome,
+  derivePolicyBoost,
+  applyPolicyBoosts,
+  type RoutingPolicyFile,
+  type RoutingPolicyScenario,
+  type RoutingPolicyStats,
+} from "../hooks/src/routing-policy.mts";
+
+// Fixed ISO timestamps for deterministic tests
+const T0 = "2026-03-27T04:00:00.000Z";
+const T1 = "2026-03-27T04:01:00.000Z";
+const T2 = "2026-03-27T04:02:00.000Z";
+const T3 = "2026-03-27T04:03:00.000Z";
+
+const BASE_SCENARIO: RoutingPolicyScenario = {
+  hook: "PreToolUse",
+  storyKind: "flow-verification",
+  targetBoundary: "uiRender",
+  toolName: "Bash",
+};
+
+describe("routing-policy core", () => {
+  describe("createEmptyRoutingPolicy", () => {
+    test("returns version 1 with empty scenarios", () => {
+      const policy = createEmptyRoutingPolicy();
+      expect(policy.version).toBe(1);
+      expect(policy.scenarios).toEqual({});
+    });
+  });
+
+  describe("scenarioKey", () => {
+    test("joins fields with pipe delimiters", () => {
+      expect(scenarioKey(BASE_SCENARIO)).toBe(
+        "PreToolUse|flow-verification|uiRender|Bash",
+      );
+    });
+
+    test("uses 'none' for null storyKind and targetBoundary", () => {
+      expect(
+        scenarioKey({
+          hook: "UserPromptSubmit",
+          storyKind: null,
+          targetBoundary: null,
+          toolName: "Prompt",
+        }),
+      ).toBe("UserPromptSubmit|none|none|Prompt");
+    });
+  });
+
+  describe("ensureScenario", () => {
+    test("creates scenario and skill slot when absent", () => {
+      const policy = createEmptyRoutingPolicy();
+      const stats = ensureScenario(policy, "s1", "skill-a", T0);
+      expect(stats).toEqual({
+        exposures: 0,
+        wins: 0,
+        directiveWins: 0,
+        staleMisses: 0,
+        lastUpdatedAt: T0,
+      });
+      expect(policy.scenarios["s1"]["skill-a"]).toBe(stats);
+    });
+
+    test("returns existing slot without overwriting", () => {
+      const policy = createEmptyRoutingPolicy();
+      const first = ensureScenario(policy, "s1", "skill-a", T0);
+      first.exposures = 5;
+      const second = ensureScenario(policy, "s1", "skill-a", T1);
+      expect(second.exposures).toBe(5);
+      expect(second).toBe(first);
+    });
+  });
+
+  describe("recordExposure", () => {
+    test("increments exposures and updates timestamp", () => {
+      const policy = createEmptyRoutingPolicy();
+      recordExposure(policy, { ...BASE_SCENARIO, skill: "agent-browser-verify", now: T0 });
+      recordExposure(policy, { ...BASE_SCENARIO, skill: "agent-browser-verify", now: T1 });
+
+      const key = scenarioKey(BASE_SCENARIO);
+      const stats = policy.scenarios[key]["agent-browser-verify"];
+      expect(stats.exposures).toBe(2);
+      expect(stats.wins).toBe(0);
+      expect(stats.lastUpdatedAt).toBe(T1);
+    });
+
+    test("returns the same policy object (mutation)", () => {
+      const policy = createEmptyRoutingPolicy();
+      const result = recordExposure(policy, { ...BASE_SCENARIO, skill: "x", now: T0 });
+      expect(result).toBe(policy);
+    });
+  });
+
+  describe("recordOutcome", () => {
+    test("win increments wins only", () => {
+      const policy = createEmptyRoutingPolicy();
+      recordOutcome(policy, {
+        ...BASE_SCENARIO,
+        skill: "s",
+        outcome: "win",
+        now: T0,
+      });
+      const stats = policy.scenarios[scenarioKey(BASE_SCENARIO)]["s"];
+      expect(stats.wins).toBe(1);
+      expect(stats.directiveWins).toBe(0);
+      expect(stats.staleMisses).toBe(0);
+    });
+
+    test("directive-win increments both wins and directiveWins", () => {
+      const policy = createEmptyRoutingPolicy();
+      recordOutcome(policy, {
+        ...BASE_SCENARIO,
+        skill: "s",
+        outcome: "directive-win",
+        now: T0,
+      });
+      const stats = policy.scenarios[scenarioKey(BASE_SCENARIO)]["s"];
+      expect(stats.wins).toBe(1);
+      expect(stats.directiveWins).toBe(1);
+    });
+
+    test("stale-miss increments staleMisses only", () => {
+      const policy = createEmptyRoutingPolicy();
+      recordOutcome(policy, {
+        ...BASE_SCENARIO,
+        skill: "s",
+        outcome: "stale-miss",
+        now: T0,
+      });
+      const stats = policy.scenarios[scenarioKey(BASE_SCENARIO)]["s"];
+      expect(stats.wins).toBe(0);
+      expect(stats.staleMisses).toBe(1);
+    });
+  });
+
+  describe("derivePolicyBoost", () => {
+    test("returns 0 for undefined stats", () => {
+      expect(derivePolicyBoost(undefined)).toBe(0);
+    });
+
+    test("returns 0 when exposures < 3", () => {
+      expect(
+        derivePolicyBoost({
+          exposures: 2,
+          wins: 2,
+          directiveWins: 2,
+          staleMisses: 0,
+          lastUpdatedAt: T0,
+        }),
+      ).toBe(0);
+    });
+
+    test("returns 8 for high success rate (>= 80%)", () => {
+      expect(
+        derivePolicyBoost({
+          exposures: 5,
+          wins: 4,
+          directiveWins: 3,
+          staleMisses: 1,
+          lastUpdatedAt: T0,
+        }),
+      ).toBe(8);
+    });
+
+    test("returns 5 for good success rate (>= 65%)", () => {
+      // 10 exposures, 7 wins, 0 directive → rate = 0.70 → boost 5
+      expect(
+        derivePolicyBoost({
+          exposures: 10,
+          wins: 7,
+          directiveWins: 0,
+          staleMisses: 3,
+          lastUpdatedAt: T0,
+        }),
+      ).toBe(5);
+    });
+
+    test("returns 2 for moderate success rate (>= 40%)", () => {
+      // 4 exposures, 2 wins, 0 directive → weightedWins=2, rate=0.50 → boost 2
+      expect(
+        derivePolicyBoost({
+          exposures: 4,
+          wins: 2,
+          directiveWins: 0,
+          staleMisses: 2,
+          lastUpdatedAt: T0,
+        }),
+      ).toBe(2);
+    });
+
+    test("returns -2 for low success rate with enough exposures", () => {
+      expect(
+        derivePolicyBoost({
+          exposures: 10,
+          wins: 1,
+          directiveWins: 0,
+          staleMisses: 9,
+          lastUpdatedAt: T0,
+        }),
+      ).toBe(-2);
+    });
+
+    test("returns 0 for middling success rate (not enough for boost, not low enough for penalty)", () => {
+      // 5 exposures, 1 win, 0 directive → rate = 0.20 → not < 0.15, not >= 0.40
+      expect(
+        derivePolicyBoost({
+          exposures: 5,
+          wins: 1,
+          directiveWins: 0,
+          staleMisses: 4,
+          lastUpdatedAt: T0,
+        }),
+      ).toBe(0);
+    });
+  });
+
+  describe("applyPolicyBoosts", () => {
+    test("adds policyBoost and policyReason to each entry", () => {
+      const policy = createEmptyRoutingPolicy();
+      const entries = [{ skill: "some-skill", priority: 6 }];
+      const result = applyPolicyBoosts(entries, policy, BASE_SCENARIO);
+
+      expect(result).toHaveLength(1);
+      expect(result[0].policyBoost).toBe(0);
+      expect(result[0].policyReason).toBeNull();
+      expect(result[0].effectivePriority).toBe(6);
+    });
+
+    test("uses effectivePriority as base when present", () => {
+      const policy = createEmptyRoutingPolicy();
+      const entries = [{ skill: "s", priority: 5, effectivePriority: 10 }];
+      const result = applyPolicyBoosts(entries, policy, BASE_SCENARIO);
+      expect(result[0].effectivePriority).toBe(10);
+    });
+
+    test("does not mutate original entries", () => {
+      const policy = createEmptyRoutingPolicy();
+      const original = { skill: "s", priority: 5 };
+      applyPolicyBoosts([original], policy, BASE_SCENARIO);
+      expect(original).toEqual({ skill: "s", priority: 5 });
+    });
+  });
+
+  describe("acceptance scenario: 3 exposures + 1 directive-win → boost 2, effective 9", () => {
+    test("produces expected boost and effective priority", () => {
+      const policy = createEmptyRoutingPolicy();
+
+      // 3 exposures
+      recordExposure(policy, {
+        ...BASE_SCENARIO,
+        skill: "agent-browser-verify",
+        now: T0,
+      });
+      recordExposure(policy, {
+        ...BASE_SCENARIO,
+        skill: "agent-browser-verify",
+        now: T1,
+      });
+      recordExposure(policy, {
+        ...BASE_SCENARIO,
+        skill: "agent-browser-verify",
+        now: T2,
+      });
+
+      // 1 directive-win
+      recordOutcome(policy, {
+        ...BASE_SCENARIO,
+        skill: "agent-browser-verify",
+        outcome: "directive-win",
+        now: T3,
+      });
+
+      // Verify raw stats
+      const key = scenarioKey(BASE_SCENARIO);
+      const stats = policy.scenarios[key]["agent-browser-verify"];
+      expect(stats.exposures).toBe(3);
+      expect(stats.wins).toBe(1);
+      expect(stats.directiveWins).toBe(1);
+
+      // weightedWins = 1 + 1*0.25 = 1.25, rate = 1.25/3 ≈ 0.417 → boost 2 (>= 0.40)
+      const boost = derivePolicyBoost(stats);
+      expect(boost).toBe(2);
+
+      const boosted = applyPolicyBoosts(
+        [{ skill: "agent-browser-verify", priority: 7 }],
+        policy,
+        BASE_SCENARIO,
+      );
+
+      expect(boosted[0].policyBoost).toBe(2);
+      expect(boosted[0].effectivePriority).toBe(9);
+      expect(boosted[0].policyReason).toContain("1 wins / 3 exposures");
+      expect(boosted[0].policyReason).toContain("1 directive wins");
+    });
+  });
+
+  // ---------------------------------------------------------------------------
+  // Route-scoped policy (routeScope dimension)
+  // ---------------------------------------------------------------------------
+
+  describe("scenarioKeyWithRoute", () => {
+    test("appends routeScope as fifth pipe segment", () => {
+      expect(
+        scenarioKeyWithRoute({ ...BASE_SCENARIO, routeScope: "/settings" }),
+      ).toBe("PreToolUse|flow-verification|uiRender|Bash|/settings");
+    });
+
+    test("defaults to wildcard when routeScope is null", () => {
+      expect(
+        scenarioKeyWithRoute({ ...BASE_SCENARIO, routeScope: null }),
+      ).toBe("PreToolUse|flow-verification|uiRender|Bash|*");
+    });
+
+    test("defaults to wildcard when routeScope is undefined", () => {
+      expect(scenarioKeyWithRoute(BASE_SCENARIO)).toBe(
+        "PreToolUse|flow-verification|uiRender|Bash|*",
+      );
+    });
+  });
+
+  describe("scenarioKeyCandidates", () => {
+    test("exact route → wildcard → legacy for non-null routeScope", () => {
+      const input = { ...BASE_SCENARIO, routeScope: "/settings" };
+      expect(scenarioKeyCandidates(input)).toEqual([
+        "PreToolUse|flow-verification|uiRender|Bash|/settings",
+        "PreToolUse|flow-verification|uiRender|Bash|*",
+        "PreToolUse|flow-verification|uiRender|Bash",
+      ]);
+    });
+
+    test("wildcard → legacy for null routeScope", () => {
+      expect(scenarioKeyCandidates(BASE_SCENARIO)).toEqual([
+        "PreToolUse|flow-verification|uiRender|Bash|*",
+        "PreToolUse|flow-verification|uiRender|Bash",
+      ]);
+    });
+
+    test("wildcard → legacy for explicit '*' routeScope (deduped)", () => {
+      const input = { ...BASE_SCENARIO, routeScope: "*" };
+      expect(scenarioKeyCandidates(input)).toEqual([
+        "PreToolUse|flow-verification|uiRender|Bash|*",
+        "PreToolUse|flow-verification|uiRender|Bash",
+      ]);
+    });
+  });
+
+  describe("computePolicySuccessRate", () => {
+    test("returns weighted success rate", () => {
+      const rate = computePolicySuccessRate({
+        exposures: 10,
+        wins: 7,
+        directiveWins: 2,
+        staleMisses: 3,
+        lastUpdatedAt: T0,
+      });
+      // weightedWins = 7 + 2*0.25 = 7.5, rate = 7.5/10 = 0.75
+      expect(rate).toBeCloseTo(0.75);
+    });
+
+    test("returns 0 for zero exposures", () => {
+      expect(
+        computePolicySuccessRate({
+          exposures: 0,
+          wins: 0,
+          directiveWins: 0,
+          staleMisses: 0,
+          lastUpdatedAt: T0,
+        }),
+      ).toBe(0);
+    });
+  });
+
+  describe("lookupPolicyStats", () => {
+    test("prefers exact route over wildcard", () => {
+      const policy: RoutingPolicyFile = {
+        version: 1,
+        scenarios: {
+          "PreToolUse|flow-verification|uiRender|Bash|/settings": {
+            verification: {
+              exposures: 5,
+              wins: 4,
+              directiveWins: 2,
+              staleMisses: 1,
+              lastUpdatedAt: T0,
+            },
+          },
+          "PreToolUse|flow-verification|uiRender|Bash|*": {
+            verification: {
+              exposures: 10,
+              wins: 6,
+              directiveWins: 1,
+              staleMisses: 4,
+              lastUpdatedAt: T0,
+            },
+          },
+        },
+      };
+
+      const { scenario, stats } = lookupPolicyStats(
+        policy,
+        { ...BASE_SCENARIO, routeScope: "/settings" },
+        "verification",
+      );
+      expect(scenario).toBe("PreToolUse|flow-verification|uiRender|Bash|/settings");
+      expect(stats!.exposures).toBe(5);
+    });
+
+    test("falls back to wildcard when exact route absent", () => {
+      const policy: RoutingPolicyFile = {
+        version: 1,
+        scenarios: {
+          "PreToolUse|flow-verification|uiRender|Bash|*": {
+            verification: {
+              exposures: 8,
+              wins: 6,
+              directiveWins: 1,
+              staleMisses: 2,
+              lastUpdatedAt: T0,
+            },
+          },
+        },
+      };
+
+      const { scenario, stats } = lookupPolicyStats(
+        policy,
+        { ...BASE_SCENARIO, 
routeScope: "/settings" }, + "verification", + ); + expect(scenario).toBe("PreToolUse|flow-verification|uiRender|Bash|*"); + expect(stats!.exposures).toBe(8); + }); + + test("falls back to legacy key when no route keys exist", () => { + const policy: RoutingPolicyFile = { + version: 1, + scenarios: { + "PreToolUse|flow-verification|uiRender|Bash": { + verification: { + exposures: 3, + wins: 2, + directiveWins: 0, + staleMisses: 1, + lastUpdatedAt: T0, + }, + }, + }, + }; + + const { scenario, stats } = lookupPolicyStats( + policy, + { ...BASE_SCENARIO, routeScope: "/settings" }, + "verification", + ); + expect(scenario).toBe("PreToolUse|flow-verification|uiRender|Bash"); + expect(stats!.exposures).toBe(3); + }); + + test("returns null scenario for missing skill", () => { + const policy = createEmptyRoutingPolicy(); + const { scenario, stats } = lookupPolicyStats( + policy, + BASE_SCENARIO, + "nonexistent", + ); + expect(scenario).toBeNull(); + expect(stats).toBeUndefined(); + }); + }); + + describe("route-scoped recordExposure", () => { + test("writes to exact route, wildcard, and legacy keys", () => { + const policy = createEmptyRoutingPolicy(); + recordExposure(policy, { + ...BASE_SCENARIO, + routeScope: "/settings", + skill: "verification", + now: T0, + }); + + const keys = Object.keys(policy.scenarios); + expect(keys).toContain("PreToolUse|flow-verification|uiRender|Bash|/settings"); + expect(keys).toContain("PreToolUse|flow-verification|uiRender|Bash|*"); + expect(keys).toContain("PreToolUse|flow-verification|uiRender|Bash"); + + // All three have the same exposure count + for (const key of keys) { + expect(policy.scenarios[key]["verification"].exposures).toBe(1); + } + }); + + test("null routeScope writes to wildcard and legacy only", () => { + const policy = createEmptyRoutingPolicy(); + recordExposure(policy, { + ...BASE_SCENARIO, + routeScope: null, + skill: "verification", + now: T0, + }); + + const keys = Object.keys(policy.scenarios); + 
expect(keys).toHaveLength(2); + expect(keys).toContain("PreToolUse|flow-verification|uiRender|Bash|*"); + expect(keys).toContain("PreToolUse|flow-verification|uiRender|Bash"); + }); + }); + + describe("route-scoped recordOutcome", () => { + test("writes outcome to all candidate keys", () => { + const policy = createEmptyRoutingPolicy(); + recordOutcome(policy, { + ...BASE_SCENARIO, + routeScope: "/dashboard", + skill: "verification", + outcome: "win", + now: T0, + }); + + for (const key of [ + "PreToolUse|flow-verification|uiRender|Bash|/dashboard", + "PreToolUse|flow-verification|uiRender|Bash|*", + "PreToolUse|flow-verification|uiRender|Bash", + ]) { + const stats = policy.scenarios[key]["verification"]; + expect(stats.wins).toBe(1); + expect(stats.directiveWins).toBe(0); + } + }); + }); + + describe("route-scoped applyPolicyBoosts", () => { + test("prefers exact-route stats for boost calculation", () => { + const policy: RoutingPolicyFile = { + version: 1, + scenarios: { + "PreToolUse|flow-verification|uiRender|Bash|/settings": { + verification: { + exposures: 5, + wins: 5, + directiveWins: 3, + staleMisses: 0, + lastUpdatedAt: T0, + }, + }, + "PreToolUse|flow-verification|uiRender|Bash|*": { + verification: { + exposures: 10, + wins: 3, + directiveWins: 0, + staleMisses: 7, + lastUpdatedAt: T0, + }, + }, + }, + }; + + const result = applyPolicyBoosts( + [{ skill: "verification", priority: 6 }], + policy, + { ...BASE_SCENARIO, routeScope: "/settings" }, + ); + + // Exact route: weightedWins = 5 + 3*0.25 = 5.75, rate = 5.75/5 > 0.80 → boost 8 + expect(result[0].policyBoost).toBe(8); + expect(result[0].effectivePriority).toBe(14); + expect(result[0].policyReason).toContain("|/settings"); + }); + + test("falls back to wildcard when exact route has no data for skill", () => { + const policy: RoutingPolicyFile = { + version: 1, + scenarios: { + "PreToolUse|flow-verification|uiRender|Bash|*": { + verification: { + exposures: 5, + wins: 4, + directiveWins: 2, + staleMisses: 1, + 
lastUpdatedAt: T0, + }, + }, + }, + }; + + const result = applyPolicyBoosts( + [{ skill: "verification", priority: 6 }], + policy, + { ...BASE_SCENARIO, routeScope: "/settings" }, + ); + + // Falls back to wildcard: weightedWins = 4 + 2*0.25 = 4.5, rate = 4.5/5 = 0.90 → boost 8 + expect(result[0].policyBoost).toBe(8); + expect(result[0].policyReason).toContain("|*"); + }); + + test("backward-compatible: no routeScope still works with legacy keys", () => { + const policy: RoutingPolicyFile = { + version: 1, + scenarios: { + "PreToolUse|flow-verification|uiRender|Bash": { + verification: { + exposures: 5, + wins: 4, + directiveWins: 2, + staleMisses: 1, + lastUpdatedAt: T0, + }, + }, + }, + }; + + const result = applyPolicyBoosts( + [{ skill: "verification", priority: 6 }], + policy, + BASE_SCENARIO, + ); + + expect(result[0].policyBoost).toBe(8); + expect(result[0].policyReason).toContain("PreToolUse|flow-verification|uiRender|Bash:"); + }); + }); +}); diff --git a/tests/routing-replay.test.ts b/tests/routing-replay.test.ts new file mode 100644 index 0000000..1c65184 --- /dev/null +++ b/tests/routing-replay.test.ts @@ -0,0 +1,554 @@ +import { describe, test, expect, beforeEach, afterEach } from "bun:test"; +import { rmSync, unlinkSync } from "node:fs"; +import { + appendRoutingDecisionTrace, + traceDir, + type RoutingDecisionTrace, +} from "../hooks/src/routing-decision-trace.mts"; +import { + appendSkillExposure, + sessionExposurePath, + type SkillExposure, +} from "../hooks/src/routing-policy-ledger.mts"; +import { + replayRoutingSession, + type RoutingReplayReport, +} from "../hooks/src/routing-replay.mts"; + +// --------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + +const TEST_SESSION = "test-session-replay-" + Date.now(); +const TEST_PROJECT = "/tmp/test-project-replay"; + +const T0 = "2026-03-27T04:00:00.000Z"; +const T1 = "2026-03-27T04:01:00.000Z"; +const T2 = 
"2026-03-27T04:02:00.000Z"; +const T3 = "2026-03-27T04:03:00.000Z"; +const T4 = "2026-03-27T04:04:00.000Z"; +const T5 = "2026-03-27T04:05:00.000Z"; + +function makeTrace( + overrides: Partial<RoutingDecisionTrace> = {}, +): RoutingDecisionTrace { + return { + version: 2, + decisionId: "replay-test-" + Math.random().toString(36).slice(2, 10), + sessionId: TEST_SESSION, + hook: "PreToolUse", + toolName: "Bash", + toolTarget: "npm run dev", + timestamp: T0, + primaryStory: { + id: "story-1", + kind: "flow-verification", + storyRoute: "/dashboard", + targetBoundary: "uiRender", + }, + observedRoute: null, + policyScenario: "PreToolUse|flow-verification|uiRender|Bash", + matchedSkills: ["agent-browser-verify"], + injectedSkills: ["agent-browser-verify"], + skippedReasons: [], + ranked: [ + { + skill: "agent-browser-verify", + basePriority: 7, + effectivePriority: 15, + pattern: { type: "bashPattern", value: "dev server" }, + profilerBoost: 0, + policyBoost: 8, + policyReason: "4/5 wins", + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + ], + verification: null, + ...overrides, + }; +} + +function makeExposure(overrides: Partial<SkillExposure> = {}): SkillExposure { + return { + id: `${TEST_SESSION}:test-skill:${Date.now()}-${Math.random()}`, + sessionId: TEST_SESSION, + projectRoot: TEST_PROJECT, + storyId: "story-1", + storyKind: "flow-verification", + route: "/dashboard", + hook: "PreToolUse", + toolName: "Bash", + skill: "agent-browser-verify", + targetBoundary: "uiRender", + exposureGroupId: null, + attributionRole: "candidate", + candidateSkill: null, + createdAt: T0, + resolvedAt: null, + outcome: "pending", + ...overrides, + }; +} + +function cleanup() { + try { + rmSync(traceDir(TEST_SESSION), { recursive: true, force: true }); + } catch {} + try { + unlinkSync(sessionExposurePath(TEST_SESSION)); + } catch {} +} + +// --------------------------------------------------------------------------- +// Tests +// 
--------------------------------------------------------------------------- + +describe("routing-replay", () => { + beforeEach(cleanup); + afterEach(cleanup); + + // ------------------------------------------------------------------------- + // Empty session + // ------------------------------------------------------------------------- + + test("returns empty report for session with no traces or exposures", () => { + const report = replayRoutingSession(TEST_SESSION); + expect(report.version).toBe(1); + expect(report.sessionId).toBe(TEST_SESSION); + expect(report.traceCount).toBe(0); + expect(report.scenarioCount).toBe(0); + expect(report.scenarios).toEqual([]); + expect(report.recommendations).toEqual([]); + }); + + // ------------------------------------------------------------------------- + // Determinism — byte-for-byte identical output + // ------------------------------------------------------------------------- + + test("produces identical JSON for identical input (deterministic)", () => { + // Write traces + appendRoutingDecisionTrace(makeTrace({ timestamp: T0 })); + appendRoutingDecisionTrace(makeTrace({ timestamp: T1 })); + + // Write exposures with wins + appendSkillExposure( + makeExposure({ createdAt: T0, resolvedAt: T1, outcome: "win" }), + ); + appendSkillExposure( + makeExposure({ createdAt: T1, resolvedAt: T2, outcome: "win" }), + ); + + const report1 = replayRoutingSession(TEST_SESSION); + const report2 = replayRoutingSession(TEST_SESSION); + + expect(JSON.stringify(report1)).toBe(JSON.stringify(report2)); + }); + + // ------------------------------------------------------------------------- + // Scenario grouping + // ------------------------------------------------------------------------- + + test("groups exposures by scenario key", () => { + appendRoutingDecisionTrace(makeTrace()); + + // Two different scenarios + appendSkillExposure( + makeExposure({ + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "uiRender", + 
toolName: "Bash", + outcome: "win", + resolvedAt: T1, + }), + ); + appendSkillExposure( + makeExposure({ + hook: "UserPromptSubmit", + storyKind: "none", + targetBoundary: null, + toolName: "Read", + skill: "next-config", + outcome: "win", + resolvedAt: T2, + }), + ); + + const report = replayRoutingSession(TEST_SESSION); + + // Should have at least 2 scenarios (one from trace, others from exposures) + expect(report.scenarioCount).toBeGreaterThanOrEqual(2); + + // Scenarios must be sorted lexicographically + const names = report.scenarios.map((s) => s.scenario); + const sorted = [...names].sort(); + expect(names).toEqual(sorted); + }); + + // ------------------------------------------------------------------------- + // Win / directive-win / stale-miss accounting + // ------------------------------------------------------------------------- + + test("counts wins, directive-wins, and stale-misses correctly", () => { + const scenario = "PreToolUse|flow-verification|uiRender|Bash"; + appendRoutingDecisionTrace( + makeTrace({ policyScenario: scenario }), + ); + + // 2 plain wins + appendSkillExposure(makeExposure({ outcome: "win", resolvedAt: T1 })); + appendSkillExposure(makeExposure({ outcome: "win", resolvedAt: T2 })); + + // 1 directive-win (also counts as a win) + appendSkillExposure( + makeExposure({ outcome: "directive-win", resolvedAt: T3 }), + ); + + // 1 stale-miss + appendSkillExposure( + makeExposure({ outcome: "stale-miss", resolvedAt: T4 }), + ); + + // 1 pending (only exposure count) + appendSkillExposure(makeExposure({ outcome: "pending" })); + + const report = replayRoutingSession(TEST_SESSION); + const s = report.scenarios.find((sc) => sc.scenario === scenario); + + expect(s).toBeDefined(); + expect(s!.exposures).toBe(5); + expect(s!.wins).toBe(3); // 2 win + 1 directive-win + expect(s!.directiveWins).toBe(1); + expect(s!.staleMisses).toBe(1); + }); + + // ------------------------------------------------------------------------- + // Null-route 
attribution (strict scoping) + // ------------------------------------------------------------------------- + + test("separates null-route and non-null-route into distinct scenarios", () => { + // Exposure with route + appendSkillExposure( + makeExposure({ + route: "/dashboard", + storyKind: "flow-verification", + targetBoundary: "uiRender", + outcome: "win", + resolvedAt: T1, + }), + ); + + // Exposure with null route — different storyKind produces different scenario + appendSkillExposure( + makeExposure({ + route: null, + storyKind: "none", + targetBoundary: null, + outcome: "stale-miss", + resolvedAt: T2, + }), + ); + + const report = replayRoutingSession(TEST_SESSION); + + // The two exposures should land in different scenarios because + // buildScenarioKey uses storyKind and targetBoundary + const withBoundary = report.scenarios.find( + (s) => s.scenario === "PreToolUse|flow-verification|uiRender|Bash", + ); + const withoutBoundary = report.scenarios.find( + (s) => s.scenario === "PreToolUse|none|none|Bash", + ); + + expect(withBoundary).toBeDefined(); + expect(withBoundary!.wins).toBe(1); + expect(withBoundary!.staleMisses).toBe(0); + + expect(withoutBoundary).toBeDefined(); + expect(withoutBoundary!.wins).toBe(0); + expect(withoutBoundary!.staleMisses).toBe(1); + }); + + // ------------------------------------------------------------------------- + // Promote recommendation + // ------------------------------------------------------------------------- + + test("recommends promote for high success rate (>=80%, >=3 exposures)", () => { + // 4 wins out of 4 exposures + for (let i = 0; i < 4; i++) { + appendSkillExposure( + makeExposure({ outcome: "win", resolvedAt: T1 }), + ); + } + + const report = replayRoutingSession(TEST_SESSION); + const promo = report.recommendations.find((r) => r.action === "promote"); + + expect(promo).toBeDefined(); + expect(promo!.skill).toBe("agent-browser-verify"); + expect(promo!.suggestedBoost).toBe(8); + 
expect(promo!.confidence).toBeGreaterThanOrEqual(0.99); + }); + + // ------------------------------------------------------------------------- + // Demote recommendation + // ------------------------------------------------------------------------- + + test("recommends demote for low success rate (<15%, >=5 exposures)", () => { + // 0 wins out of 6 exposures (all stale-miss) + for (let i = 0; i < 6; i++) { + appendSkillExposure( + makeExposure({ outcome: "stale-miss", resolvedAt: T1 }), + ); + } + + const report = replayRoutingSession(TEST_SESSION); + const demote = report.recommendations.find((r) => r.action === "demote"); + + expect(demote).toBeDefined(); + expect(demote!.skill).toBe("agent-browser-verify"); + expect(demote!.suggestedBoost).toBe(-2); + expect(demote!.confidence).toBe(1); + }); + + // ------------------------------------------------------------------------- + // Investigate recommendation + // ------------------------------------------------------------------------- + + test("recommends investigate for mixed results (40-65%, >=3 exposures)", () => { + // 2 wins out of 4 exposures = 50% + appendSkillExposure(makeExposure({ outcome: "win", resolvedAt: T1 })); + appendSkillExposure(makeExposure({ outcome: "win", resolvedAt: T2 })); + appendSkillExposure( + makeExposure({ outcome: "stale-miss", resolvedAt: T3 }), + ); + appendSkillExposure( + makeExposure({ outcome: "stale-miss", resolvedAt: T4 }), + ); + + const report = replayRoutingSession(TEST_SESSION); + const inv = report.recommendations.find((r) => r.action === "investigate"); + + expect(inv).toBeDefined(); + expect(inv!.skill).toBe("agent-browser-verify"); + expect(inv!.suggestedBoost).toBe(0); + expect(inv!.confidence).toBe(0.5); + }); + + // ------------------------------------------------------------------------- + // No recommendation in dead zone + // ------------------------------------------------------------------------- + + test("produces no recommendation for insufficient data", () 
=> { + // Only 2 exposures — below all thresholds + appendSkillExposure(makeExposure({ outcome: "win", resolvedAt: T1 })); + appendSkillExposure( + makeExposure({ outcome: "stale-miss", resolvedAt: T2 }), + ); + + const report = replayRoutingSession(TEST_SESSION); + expect(report.recommendations).toEqual([]); + }); + + // ------------------------------------------------------------------------- + // Directive-win vs plain win tracking + // ------------------------------------------------------------------------- + + test("tracks directive-wins separately within win count", () => { + appendSkillExposure(makeExposure({ outcome: "win", resolvedAt: T1 })); + appendSkillExposure( + makeExposure({ outcome: "directive-win", resolvedAt: T2 }), + ); + appendSkillExposure( + makeExposure({ outcome: "directive-win", resolvedAt: T3 }), + ); + + const report = replayRoutingSession(TEST_SESSION); + const s = report.scenarios[0]; + + expect(s.wins).toBe(3); + expect(s.directiveWins).toBe(2); + expect(s.topSkills[0].wins).toBe(3); + expect(s.topSkills[0].directiveWins).toBe(2); + }); + + // ------------------------------------------------------------------------- + // Stable skill ordering within scenario + // ------------------------------------------------------------------------- + + test("sorts skills by wins desc, exposures desc, name asc", () => { + // skill-b: 3 wins out of 3 + for (let i = 0; i < 3; i++) { + appendSkillExposure( + makeExposure({ skill: "skill-b", outcome: "win", resolvedAt: T1 }), + ); + } + + // skill-a: 2 wins out of 4 + appendSkillExposure( + makeExposure({ skill: "skill-a", outcome: "win", resolvedAt: T1 }), + ); + appendSkillExposure( + makeExposure({ skill: "skill-a", outcome: "win", resolvedAt: T2 }), + ); + appendSkillExposure( + makeExposure({ skill: "skill-a", outcome: "stale-miss", resolvedAt: T3 }), + ); + appendSkillExposure( + makeExposure({ skill: "skill-a", outcome: "stale-miss", resolvedAt: T4 }), + ); + + // skill-c: 2 wins out of 2 (same 
wins as skill-a, fewer exposures) + appendSkillExposure( + makeExposure({ skill: "skill-c", outcome: "win", resolvedAt: T1 }), + ); + appendSkillExposure( + makeExposure({ skill: "skill-c", outcome: "win", resolvedAt: T2 }), + ); + + const report = replayRoutingSession(TEST_SESSION); + const skills = report.scenarios[0].topSkills.map((s) => s.skill); + + // skill-b (3 wins) > skill-a (2 wins, 4 exp) > skill-c (2 wins, 2 exp) + expect(skills).toEqual(["skill-b", "skill-a", "skill-c"]); + }); + + // ------------------------------------------------------------------------- + // Recommendation stable ordering + // ------------------------------------------------------------------------- + + test("sorts recommendations by scenario asc then skill asc", () => { + // Two scenarios with promotable skills + for (let i = 0; i < 4; i++) { + appendSkillExposure( + makeExposure({ + hook: "UserPromptSubmit", + storyKind: "none", + targetBoundary: null, + toolName: "Read", + skill: "next-config", + outcome: "win", + resolvedAt: T1, + }), + ); + } + for (let i = 0; i < 4; i++) { + appendSkillExposure( + makeExposure({ + skill: "agent-browser-verify", + outcome: "win", + resolvedAt: T1, + }), + ); + } + + const report = replayRoutingSession(TEST_SESSION); + const recs = report.recommendations; + + expect(recs.length).toBeGreaterThanOrEqual(2); + + // Verify stable sort + for (let i = 1; i < recs.length; i++) { + const cmp = + recs[i - 1].scenario.localeCompare(recs[i].scenario) || + recs[i - 1].skill.localeCompare(recs[i].skill); + expect(cmp).toBeLessThanOrEqual(0); + } + }); + + // ------------------------------------------------------------------------- + // Traces without exposures produce empty scenarios + // ------------------------------------------------------------------------- + + test("traces seed scenario keys even with no matching exposures", () => { + appendRoutingDecisionTrace( + makeTrace({ + policyScenario: "PreToolUse|flow-verification|uiRender|Bash", + }), + ); + 
+ const report = replayRoutingSession(TEST_SESSION); + + expect(report.traceCount).toBe(1); + expect(report.scenarioCount).toBe(1); + expect(report.scenarios[0].exposures).toBe(0); + expect(report.scenarios[0].topSkills).toEqual([]); + }); + + // ------------------------------------------------------------------------- + // Multi-skill per scenario + // ------------------------------------------------------------------------- + + test("tracks multiple skills within the same scenario independently", () => { + appendSkillExposure( + makeExposure({ skill: "alpha", outcome: "win", resolvedAt: T1 }), + ); + appendSkillExposure( + makeExposure({ skill: "alpha", outcome: "stale-miss", resolvedAt: T2 }), + ); + appendSkillExposure( + makeExposure({ skill: "beta", outcome: "win", resolvedAt: T3 }), + ); + + const report = replayRoutingSession(TEST_SESSION); + const s = report.scenarios[0]; + + expect(s.topSkills.length).toBe(2); + + const alpha = s.topSkills.find((sk) => sk.skill === "alpha"); + const beta = s.topSkills.find((sk) => sk.skill === "beta"); + + expect(alpha).toBeDefined(); + expect(alpha!.exposures).toBe(2); + expect(alpha!.wins).toBe(1); + + expect(beta).toBeDefined(); + expect(beta!.exposures).toBe(1); + expect(beta!.wins).toBe(1); + }); + + // ------------------------------------------------------------------------- + // Synthetic injection fidelity — traces with synthetic markers + // ------------------------------------------------------------------------- + + test("includes traces with synthetic ranked skills in trace count", () => { + appendRoutingDecisionTrace( + makeTrace({ + ranked: [ + { + skill: "react-best-practices", + basePriority: 6, + effectivePriority: 6, + pattern: null, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: true, + droppedReason: null, + }, + ], + }), + ); + + const report = replayRoutingSession(TEST_SESSION); + expect(report.traceCount).toBe(1); + }); + + // 
------------------------------------------------------------------------- + // Report version and structure + // ------------------------------------------------------------------------- + + test("report has version 1 and all required fields", () => { + const report = replayRoutingSession(TEST_SESSION); + + expect(report.version).toBe(1); + expect(report.sessionId).toBe(TEST_SESSION); + expect(typeof report.traceCount).toBe("number"); + expect(typeof report.scenarioCount).toBe("number"); + expect(Array.isArray(report.scenarios)).toBe(true); + expect(Array.isArray(report.recommendations)).toBe(true); + }); +}); diff --git a/tests/rule-distillation.test.ts b/tests/rule-distillation.test.ts new file mode 100644 index 0000000..d9ec157 --- /dev/null +++ b/tests/rule-distillation.test.ts @@ -0,0 +1,706 @@ +import { describe, test, expect } from "bun:test"; +import { + computeRuleLift, + classifyRuleConfidence, + distillRulesFromTrace, + replayLearnedRules, +} from "../hooks/src/rule-distillation.mts"; +import type { + LearnedRoutingRulesFile, + LearnedRoutingRule, + DistillRulesParams, +} from "../hooks/src/rule-distillation.mts"; +import type { RoutingDecisionTrace, RankedSkillTrace } from "../hooks/src/routing-decision-trace.mts"; +import type { SkillExposure } from "../hooks/src/routing-policy-ledger.mts"; +import type { RoutingPolicyFile } from "../hooks/src/routing-policy.mts"; +import { createEmptyRoutingPolicy } from "../hooks/src/routing-policy.mts"; + +// --------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + +const FIXED_TS = "2026-03-28T06:00:00.000Z"; + +function makeTrace(overrides: Partial<RoutingDecisionTrace> & { decisionId: string }): RoutingDecisionTrace { + return { + version: 2, + sessionId: "sess-1", + hook: "PreToolUse", + toolName: "Read", + toolTarget: "/app/page.tsx", + timestamp: FIXED_TS, + primaryStory: { + id: "story-1", + kind: "feature", + 
storyRoute: "/app", + targetBoundary: "uiRender", + }, + observedRoute: "/app", + policyScenario: null, + matchedSkills: [], + injectedSkills: [], + skippedReasons: [], + ranked: [], + verification: null, + ...overrides, + }; +} + +function makeRanked(overrides: Partial<RankedSkillTrace> & { skill: string }): RankedSkillTrace { + return { + basePriority: 6, + effectivePriority: 6, + pattern: null, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: false, + droppedReason: null, + ...overrides, + }; +} + +function makeExposure(overrides: Partial<SkillExposure> & { skill: string }): SkillExposure { + return { + id: `exp-${overrides.skill}-${Date.now()}`, + sessionId: "sess-1", + projectRoot: "/test", + storyId: "story-1", + storyKind: "feature", + route: "/app", + hook: "PreToolUse", + toolName: "Read", + targetBoundary: "uiRender", + exposureGroupId: null, + attributionRole: "candidate", + candidateSkill: overrides.skill, + createdAt: FIXED_TS, + resolvedAt: FIXED_TS, + outcome: "win", + ...overrides, + }; +} + +function makeDistillParams(overrides: Partial<DistillRulesParams> = {}): DistillRulesParams { + return { + projectRoot: "/test/project", + traces: [], + exposures: [], + policy: createEmptyRoutingPolicy(), + generatedAt: FIXED_TS, + ...overrides, + }; +} + +// --------------------------------------------------------------------------- +// computeRuleLift +// --------------------------------------------------------------------------- + +describe("computeRuleLift", () => { + test("returns rulePrecision when scenarioPrecision is 0", () => { + const lift = computeRuleLift({ + wins: 4, + support: 5, + scenarioWins: 0, + scenarioExposures: 0, + }); + expect(lift).toBe(0.8); // 4/5 + }); + + test("computes ratio of rule precision to scenario precision", () => { + const lift = computeRuleLift({ + wins: 4, + support: 5, + scenarioWins: 10, + scenarioExposures: 25, + }); + // rulePrecision = 4/5 = 0.8, scenarioPrecision = 10/25 = 0.4 + expect(lift).toBe(2.0); + }); + + 
test("returns 1.0 when rule matches scenario precision", () => { + const lift = computeRuleLift({ + wins: 3, + support: 10, + scenarioWins: 6, + scenarioExposures: 20, + }); + // 0.3 / 0.3 = 1.0 + expect(lift).toBe(1.0); + }); + + test("handles zero support gracefully", () => { + const lift = computeRuleLift({ + wins: 0, + support: 0, + scenarioWins: 5, + scenarioExposures: 10, + }); + expect(lift).toBe(0); + }); +}); + +// --------------------------------------------------------------------------- +// classifyRuleConfidence +// --------------------------------------------------------------------------- + +describe("classifyRuleConfidence", () => { + test("returns holdout-fail when regressions > 0", () => { + expect( + classifyRuleConfidence({ support: 10, precision: 0.9, lift: 2.0, regressions: 1 }), + ).toBe("holdout-fail"); + }); + + test("returns promote when all thresholds met", () => { + expect( + classifyRuleConfidence({ support: 5, precision: 0.8, lift: 1.5, regressions: 0 }), + ).toBe("promote"); + }); + + test("returns candidate when intermediate thresholds met", () => { + expect( + classifyRuleConfidence({ support: 3, precision: 0.65, lift: 1.1, regressions: 0 }), + ).toBe("candidate"); + }); + + test("returns holdout-fail when below candidate thresholds", () => { + expect( + classifyRuleConfidence({ support: 2, precision: 0.5, lift: 1.0, regressions: 0 }), + ).toBe("holdout-fail"); + }); + + test("promote requires all three thresholds simultaneously", () => { + // High precision and lift but low support + expect( + classifyRuleConfidence({ support: 4, precision: 0.9, lift: 2.0, regressions: 0 }), + ).toBe("candidate"); + // High support and lift but low precision + expect( + classifyRuleConfidence({ support: 10, precision: 0.7, lift: 2.0, regressions: 0 }), + ).toBe("candidate"); + // High support and precision but low lift + expect( + classifyRuleConfidence({ support: 10, precision: 0.9, lift: 1.0, regressions: 0 }), + ).toBe("holdout-fail"); + }); + + 
test("regressions override even excellent metrics", () => { + expect( + classifyRuleConfidence({ support: 100, precision: 1.0, lift: 5.0, regressions: 1 }), + ).toBe("holdout-fail"); + }); +}); + +// --------------------------------------------------------------------------- +// distillRulesFromTrace — determinism +// --------------------------------------------------------------------------- + +describe("distillRulesFromTrace", () => { + test("returns valid LearnedRoutingRulesFile shape with empty inputs", () => { + const result = distillRulesFromTrace(makeDistillParams()); + expect(result.version).toBe(1); + expect(result.generatedAt).toBe(FIXED_TS); + expect(result.projectRoot).toBe("/test/project"); + expect(result.rules).toEqual([]); + expect(result.replay).toEqual({ + baselineWins: 0, + baselineDirectiveWins: 0, + learnedWins: 0, + learnedDirectiveWins: 0, + deltaWins: 0, + deltaDirectiveWins: 0, + regressions: [], + }); + }); + + test("identical inputs produce byte-for-byte identical JSON", () => { + const traces = [ + makeTrace({ + decisionId: "d1", + injectedSkills: ["next-config"], + ranked: [makeRanked({ skill: "next-config", pattern: { type: "path", value: "next.config.*" } })], + }), + makeTrace({ + decisionId: "d2", + injectedSkills: ["next-config"], + ranked: [makeRanked({ skill: "next-config", pattern: { type: "path", value: "next.config.*" } })], + }), + ]; + const exposures = [ + makeExposure({ skill: "next-config", outcome: "win" }), + ]; + + const params = makeDistillParams({ traces, exposures }); + const result1 = distillRulesFromTrace(params); + const result2 = distillRulesFromTrace(params); + + expect(JSON.stringify(result1)).toBe(JSON.stringify(result2)); + }); + + test("does not promote rules below support threshold", () => { + // Only 2 traces — below default minSupport of 5 + const traces = [ + makeTrace({ + decisionId: "d1", + injectedSkills: ["next-config"], + ranked: [makeRanked({ skill: "next-config" })], + }), + makeTrace({ + 
decisionId: "d2", + injectedSkills: ["next-config"], + ranked: [makeRanked({ skill: "next-config" })], + }), + ]; + const exposures = [ + makeExposure({ skill: "next-config", outcome: "win" }), + ]; + + const result = distillRulesFromTrace(makeDistillParams({ traces, exposures })); + for (const rule of result.rules) { + expect(rule.confidence).not.toBe("promote"); + } + }); + + test("promotes rules meeting all thresholds", () => { + // next-config wins 6/6 (precision=1.0), but we also add losing traces + // for a different skill in the same scenario so lift > 1. + const rankedWin = [makeRanked({ skill: "next-config", pattern: { type: "path", value: "next.config.*" } })]; + const rankedLose = [makeRanked({ skill: "tailwind", pattern: { type: "path", value: "tailwind.*" } })]; + + const winTraces = Array.from({ length: 6 }, (_, i) => + makeTrace({ + decisionId: `win${i}`, + injectedSkills: ["next-config"], + ranked: rankedWin, + }), + ); + // 6 losing traces in the same scenario bring scenario precision down + const loseTraces = Array.from({ length: 6 }, (_, i) => + makeTrace({ + decisionId: `lose${i}`, + injectedSkills: ["tailwind"], + ranked: rankedLose, + }), + ); + + const exposures = [ + makeExposure({ skill: "next-config", outcome: "win" }), + makeExposure({ skill: "tailwind", outcome: "stale-miss" }), + ]; + + const result = distillRulesFromTrace( + makeDistillParams({ traces: [...winTraces, ...loseTraces], exposures }), + ); + const promoted = result.rules.filter((r) => r.confidence === "promote"); + expect(promoted.length).toBeGreaterThanOrEqual(1); + for (const rule of promoted) { + expect(rule.promotedAt).toBe(FIXED_TS); + expect(rule.support).toBeGreaterThanOrEqual(5); + expect(rule.precision).toBeGreaterThanOrEqual(0.8); + } + }); + + test("skips dropped ranked skills", () => { + const traces = [ + makeTrace({ + decisionId: "d1", + injectedSkills: [], + ranked: [makeRanked({ skill: "next-config", droppedReason: "deduped" })], + }), + ]; + const exposures 
= [makeExposure({ skill: "next-config" })]; + + const result = distillRulesFromTrace(makeDistillParams({ traces, exposures })); + expect(result.rules.length).toBe(0); + }); + + test("skips context-role exposures (only candidate attribution)", () => { + const traces = [ + makeTrace({ + decisionId: "d1", + injectedSkills: ["next-config"], + ranked: [makeRanked({ skill: "next-config" })], + }), + ]; + const exposures = [ + makeExposure({ skill: "next-config", attributionRole: "context" }), + ]; + + const result = distillRulesFromTrace(makeDistillParams({ traces, exposures })); + expect(result.rules.length).toBe(0); + }); + + test("tracks directive wins separately", () => { + const ranked = [makeRanked({ skill: "next-config" })]; + const traces = Array.from({ length: 6 }, (_, i) => + makeTrace({ decisionId: `d${i}`, injectedSkills: ["next-config"], ranked }), + ); + const exposures = [ + makeExposure({ skill: "next-config", outcome: "directive-win" }), + ]; + + const result = distillRulesFromTrace(makeDistillParams({ traces, exposures })); + expect(result.rules.length).toBeGreaterThanOrEqual(1); + const rule = result.rules[0]!; + expect(rule.directiveWins).toBeGreaterThan(0); + expect(rule.wins).toBeGreaterThanOrEqual(rule.directiveWins); + }); + + test("counts stale misses correctly", () => { + const ranked = [makeRanked({ skill: "tailwind" })]; + const traces = Array.from({ length: 4 }, (_, i) => + makeTrace({ decisionId: `d${i}`, injectedSkills: ["tailwind"], ranked }), + ); + const exposures = [ + makeExposure({ skill: "tailwind", outcome: "stale-miss" }), + ]; + + const result = distillRulesFromTrace(makeDistillParams({ traces, exposures })); + expect(result.rules.length).toBeGreaterThanOrEqual(1); + const rule = result.rules[0]!; + expect(rule.staleMisses).toBeGreaterThan(0); + expect(rule.wins).toBe(0); + }); + + test("sorts rules deterministically: promote > candidate > holdout-fail, then skill, then id", () => { + // Create two skills with different confidence 
levels + const rankedA = [makeRanked({ skill: "a-skill", pattern: { type: "path", value: "a.*" } })]; + const rankedB = [makeRanked({ skill: "b-skill", pattern: { type: "path", value: "b.*" } })]; + + // a-skill: 6 wins (promote-level) + const tracesA = Array.from({ length: 6 }, (_, i) => + makeTrace({ + decisionId: `a${i}`, + injectedSkills: ["a-skill"], + ranked: rankedA, + }), + ); + // b-skill: 3 wins (candidate-level) + const tracesB = Array.from({ length: 3 }, (_, i) => + makeTrace({ + decisionId: `b${i}`, + injectedSkills: ["b-skill"], + ranked: rankedB, + }), + ); + + const exposures = [ + makeExposure({ skill: "a-skill", outcome: "win" }), + makeExposure({ skill: "b-skill", outcome: "win" }), + ]; + + const result = distillRulesFromTrace( + makeDistillParams({ + traces: [...tracesA, ...tracesB], + exposures, + }), + ); + + // Promoted rules should come first + const confidences = result.rules.map((r) => r.confidence); + const promoteIdx = confidences.indexOf("promote"); + const candidateIdx = confidences.indexOf("candidate"); + if (promoteIdx !== -1 && candidateIdx !== -1) { + expect(promoteIdx).toBeLessThan(candidateIdx); + } + }); + + test("sourceDecisionIds are sorted for determinism", () => { + const ranked = [makeRanked({ skill: "next-config" })]; + const traces = ["z-id", "a-id", "m-id"].map((id) => + makeTrace({ decisionId: id, injectedSkills: ["next-config"], ranked }), + ); + const exposures = [makeExposure({ skill: "next-config", outcome: "win" })]; + + const result = distillRulesFromTrace(makeDistillParams({ traces, exposures })); + expect(result.rules.length).toBeGreaterThanOrEqual(1); + const ids = result.rules[0]!.sourceDecisionIds; + expect(ids).toEqual([...ids].sort()); + }); + + test("respects custom minSupport/minPrecision/minLift", () => { + const ranked = [makeRanked({ skill: "next-config" })]; + // 3 traces, all wins + const traces = Array.from({ length: 3 }, (_, i) => + makeTrace({ decisionId: `d${i}`, injectedSkills: ["next-config"], 
ranked }), + ); + const exposures = [makeExposure({ skill: "next-config", outcome: "win" })]; + + // With default thresholds — not enough support for promote + const strict = distillRulesFromTrace(makeDistillParams({ traces, exposures })); + expect(strict.rules.every((r) => r.confidence !== "promote")).toBe(true); + + // With relaxed thresholds — should promote + const relaxed = distillRulesFromTrace( + makeDistillParams({ traces, exposures, minSupport: 2, minPrecision: 0.5, minLift: 1.0 }), + ); + // Note: classifyRuleConfidence still has its own thresholds, so this tests the path coverage + expect(relaxed.rules.length).toBeGreaterThanOrEqual(1); + }); +}); + +// --------------------------------------------------------------------------- +// replayLearnedRules +// --------------------------------------------------------------------------- + +describe("replayLearnedRules", () => { + test("returns zeros with empty inputs", () => { + const result = replayLearnedRules({ traces: [], rules: [] }); + expect(result).toEqual({ + baselineWins: 0, + baselineDirectiveWins: 0, + learnedWins: 0, + learnedDirectiveWins: 0, + deltaWins: 0, + deltaDirectiveWins: 0, + regressions: [], + }); + }); + + test("baseline wins carry through when no learned rules apply", () => { + const traces = [ + makeTrace({ + decisionId: "d1", + injectedSkills: ["next-config"], + verification: { + verificationId: "v1", + observedBoundary: "uiRender", + matchedSuggestedAction: true, + }, + }), + ]; + + const result = replayLearnedRules({ traces, rules: [] }); + expect(result.baselineWins).toBe(1); + expect(result.learnedWins).toBe(1); + expect(result.deltaWins).toBe(0); + expect(result.regressions).toEqual([]); + }); + + test("counts learned wins when promoted rules overlap with injected skills", () => { + const traces = [ + makeTrace({ + decisionId: "d1", + injectedSkills: ["next-config"], + verification: { + verificationId: "v1", + observedBoundary: "uiRender", + matchedSuggestedAction: true, + }, + 
}), + ]; + + const rules: LearnedRoutingRule[] = [ + { + id: "pathPattern:next-config:next.config.*", + skill: "next-config", + kind: "pathPattern", + value: "next.config.*", + scenario: { + hook: "PreToolUse", + storyKind: "feature", + targetBoundary: "uiRender", + toolName: "Read", + routeScope: "/app", + }, + support: 10, + wins: 9, + directiveWins: 3, + staleMisses: 0, + precision: 0.9, + lift: 2.0, + sourceDecisionIds: ["d1"], + confidence: "promote", + promotedAt: FIXED_TS, + }, + ]; + + const result = replayLearnedRules({ traces, rules }); + expect(result.baselineWins).toBe(1); + expect(result.learnedWins).toBe(1); + expect(result.deltaWins).toBe(0); + }); + + test("does not count non-promoted rules", () => { + const traces = [ + makeTrace({ + decisionId: "d1", + injectedSkills: ["next-config"], + verification: null, + }), + ]; + + const rules: LearnedRoutingRule[] = [ + { + id: "test-rule", + skill: "next-config", + kind: "pathPattern", + value: "next.config.*", + scenario: { + hook: "PreToolUse", + storyKind: "feature", + targetBoundary: "uiRender", + toolName: "Read", + routeScope: "/app", + }, + support: 3, + wins: 2, + directiveWins: 0, + staleMisses: 0, + precision: 0.67, + lift: 1.2, + sourceDecisionIds: [], + confidence: "candidate", // Not promoted + promotedAt: null, + }, + ]; + + const result = replayLearnedRules({ traces, rules }); + expect(result.baselineWins).toBe(0); + expect(result.learnedWins).toBe(0); + }); + + test("regressions list is sorted for determinism", () => { + const traces = [ + makeTrace({ + decisionId: "z-trace", + injectedSkills: ["skill-a"], + verification: { + verificationId: "v1", + observedBoundary: "uiRender", + matchedSuggestedAction: true, + }, + }), + makeTrace({ + decisionId: "a-trace", + injectedSkills: ["skill-a"], + verification: { + verificationId: "v2", + observedBoundary: "uiRender", + matchedSuggestedAction: true, + }, + }), + ]; + + // Promoted rule for different skill — won't cover skill-a traces + const 
rules: LearnedRoutingRule[] = [ + { + id: "test-rule", + skill: "other-skill", + kind: "pathPattern", + value: "other.*", + scenario: { + hook: "PreToolUse", + storyKind: "feature", + targetBoundary: "uiRender", + toolName: "Read", + routeScope: "/app", + }, + support: 10, + wins: 9, + directiveWins: 0, + staleMisses: 0, + precision: 0.9, + lift: 2.0, + sourceDecisionIds: [], + confidence: "promote", + promotedAt: FIXED_TS, + }, + ]; + + const result = replayLearnedRules({ traces, rules }); + if (result.regressions.length > 1) { + expect(result.regressions).toEqual([...result.regressions].sort()); + } + }); +}); + +// --------------------------------------------------------------------------- +// Integration: distill + replay pipeline +// --------------------------------------------------------------------------- + +describe("distill + replay integration", () => { + test("replay regressions downgrade promoted rules to holdout-fail", () => { + // Create traces where a skill wins in baseline + const ranked = [makeRanked({ skill: "next-config" })]; + const traces = Array.from({ length: 6 }, (_, i) => + makeTrace({ + decisionId: `d${i}`, + injectedSkills: ["next-config"], + ranked, + verification: { + verificationId: `v${i}`, + observedBoundary: "uiRender", + matchedSuggestedAction: true, + }, + }), + ); + + // But exposure says stale-miss — rule will have low precision, won't promote + const exposures = [makeExposure({ skill: "next-config", outcome: "stale-miss" })]; + + const result = distillRulesFromTrace(makeDistillParams({ traces, exposures })); + // No rule should be promoted because precision is 0 (no wins) + for (const rule of result.rules) { + expect(rule.confidence).not.toBe("promote"); + } + }); + + test("end-to-end: winning skill gets promoted with sufficient evidence", () => { + const rankedWin = [ + makeRanked({ + skill: "next-config", + pattern: { type: "path", value: "next.config.*" }, + }), + ]; + const rankedLose = [ + makeRanked({ + skill: 
"tailwind", + pattern: { type: "path", value: "tailwind.*" }, + }), + ]; + + // 8 winning traces for next-config + const winTraces = Array.from({ length: 8 }, (_, i) => + makeTrace({ + decisionId: `win${i}`, + injectedSkills: ["next-config"], + ranked: rankedWin, + verification: { + verificationId: `v${i}`, + observedBoundary: "uiRender", + matchedSuggestedAction: true, + }, + }), + ); + // 8 losing traces for tailwind in same scenario — creates lift > 1 + const loseTraces = Array.from({ length: 8 }, (_, i) => + makeTrace({ + decisionId: `lose${i}`, + injectedSkills: ["tailwind"], + ranked: rankedLose, + }), + ); + + const exposures = [ + makeExposure({ skill: "next-config", outcome: "win" }), + makeExposure({ skill: "tailwind", outcome: "stale-miss" }), + ]; + + const result = distillRulesFromTrace( + makeDistillParams({ traces: [...winTraces, ...loseTraces], exposures }), + ); + + expect(result.version).toBe(1); + expect(result.rules.length).toBeGreaterThanOrEqual(1); + + const promoted = result.rules.filter((r) => r.confidence === "promote"); + expect(promoted.length).toBeGreaterThanOrEqual(1); + expect(promoted[0]!.skill).toBe("next-config"); + expect(promoted[0]!.precision).toBeGreaterThanOrEqual(0.8); + expect(promoted[0]!.lift).toBeGreaterThanOrEqual(1.0); + expect(promoted[0]!.promotedAt).toBe(FIXED_TS); + expect(result.replay.regressions).toEqual([]); + }); +}); diff --git a/tests/rule-replay.test.ts b/tests/rule-replay.test.ts new file mode 100644 index 0000000..e4f0d10 --- /dev/null +++ b/tests/rule-replay.test.ts @@ -0,0 +1,662 @@ +import { describe, test, expect } from "bun:test"; +import { replayLearnedRules } from "../hooks/src/rule-replay.mts"; +import type { ReplayResult } from "../hooks/src/rule-replay.mts"; +import type { LearnedRoutingRule } from "../hooks/src/rule-distillation.mts"; +import type { + RoutingDecisionTrace, + RankedSkillTrace, +} from "../hooks/src/routing-decision-trace.mts"; + +// 
--------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + +const FIXED_TS = "2026-03-28T06:00:00.000Z"; + +function makeTrace( + overrides: Partial & { decisionId: string }, +): RoutingDecisionTrace { + return { + version: 2, + sessionId: "sess-1", + hook: "PreToolUse", + toolName: "Read", + toolTarget: "/app/page.tsx", + timestamp: FIXED_TS, + primaryStory: { + id: "story-1", + kind: "feature", + storyRoute: "/app", + targetBoundary: "uiRender", + }, + observedRoute: "/app", + policyScenario: null, + matchedSkills: [], + injectedSkills: [], + skippedReasons: [], + ranked: [], + verification: null, + ...overrides, + }; +} + +function makeRule( + overrides: Partial & { id: string; skill: string }, +): LearnedRoutingRule { + return { + kind: "pathPattern", + value: "*.tsx", + scenario: { + hook: "PreToolUse", + storyKind: "feature", + targetBoundary: "uiRender", + toolName: "Read", + routeScope: "/app", + }, + support: 10, + wins: 9, + directiveWins: 0, + staleMisses: 0, + precision: 0.9, + lift: 2.0, + sourceDecisionIds: [], + confidence: "promote", + promotedAt: FIXED_TS, + ...overrides, + }; +} + +/** Verified trace with matchedSuggestedAction === true (directive-adherent win). */ +function verifiedTrace( + decisionId: string, + injectedSkills: string[], +): RoutingDecisionTrace { + return makeTrace({ + decisionId, + injectedSkills, + verification: { + verificationId: `v-${decisionId}`, + observedBoundary: "uiRender", + matchedSuggestedAction: true, + }, + }); +} + +/** Verified trace with matchedSuggestedAction === false (verified success, not directive-adherent). 
*/ +function verifiedNonDirectiveTrace( + decisionId: string, + injectedSkills: string[], +): RoutingDecisionTrace { + return makeTrace({ + decisionId, + injectedSkills, + verification: { + verificationId: `v-${decisionId}`, + observedBoundary: "uiRender", + matchedSuggestedAction: false, + }, + }); +} + +function unverifiedTrace( + decisionId: string, + injectedSkills: string[], +): RoutingDecisionTrace { + return makeTrace({ decisionId, injectedSkills, verification: null }); +} + +// --------------------------------------------------------------------------- +// Empty / trivial inputs +// --------------------------------------------------------------------------- + +describe("replayLearnedRules — empty inputs", () => { + test("returns zeros with no traces and no rules", () => { + const result = replayLearnedRules({ traces: [], rules: [] }); + expect(result).toEqual({ + baselineWins: 0, + baselineDirectiveWins: 0, + learnedWins: 0, + learnedDirectiveWins: 0, + deltaWins: 0, + deltaDirectiveWins: 0, + regressions: [], + }); + }); + + test("returns zeros with traces but no rules", () => { + const result = replayLearnedRules({ + traces: [unverifiedTrace("d1", ["next-config"])], + rules: [], + }); + expect(result).toEqual({ + baselineWins: 0, + baselineDirectiveWins: 0, + learnedWins: 0, + learnedDirectiveWins: 0, + deltaWins: 0, + deltaDirectiveWins: 0, + regressions: [], + }); + }); + + test("returns zeros with rules but no traces", () => { + const result = replayLearnedRules({ + traces: [], + rules: [makeRule({ id: "r1", skill: "next-config" })], + }); + expect(result).toEqual({ + baselineWins: 0, + baselineDirectiveWins: 0, + learnedWins: 0, + learnedDirectiveWins: 0, + deltaWins: 0, + deltaDirectiveWins: 0, + regressions: [], + }); + }); +}); + +// --------------------------------------------------------------------------- +// Baseline carry-through (no promoted rules for scenario) +// --------------------------------------------------------------------------- + 
+describe("replayLearnedRules — baseline carry-through", () => { + test("baseline wins carry through when no promoted rules apply", () => { + const result = replayLearnedRules({ + traces: [verifiedTrace("d1", ["next-config"])], + rules: [], + }); + expect(result.baselineWins).toBe(1); + expect(result.baselineDirectiveWins).toBe(1); + expect(result.learnedWins).toBe(1); + expect(result.learnedDirectiveWins).toBe(1); + expect(result.deltaWins).toBe(0); + expect(result.deltaDirectiveWins).toBe(0); + expect(result.regressions).toEqual([]); + }); + + test("multiple baseline wins carry through independently", () => { + const result = replayLearnedRules({ + traces: [ + verifiedTrace("d1", ["next-config"]), + verifiedTrace("d2", ["tailwind"]), + unverifiedTrace("d3", ["react"]), + ], + rules: [], + }); + expect(result.baselineWins).toBe(2); + expect(result.baselineDirectiveWins).toBe(2); + expect(result.learnedWins).toBe(2); + expect(result.learnedDirectiveWins).toBe(2); + expect(result.deltaWins).toBe(0); + expect(result.regressions).toEqual([]); + }); + + test("non-promoted rules do not affect replay", () => { + const result = replayLearnedRules({ + traces: [verifiedTrace("d1", ["next-config"])], + rules: [ + makeRule({ id: "r1", skill: "next-config", confidence: "candidate", promotedAt: null }), + makeRule({ id: "r2", skill: "next-config", confidence: "holdout-fail", promotedAt: null }), + ], + }); + // candidate and holdout-fail rules are ignored → baseline carries through + expect(result.baselineWins).toBe(1); + expect(result.learnedWins).toBe(1); + expect(result.regressions).toEqual([]); + }); +}); + +// --------------------------------------------------------------------------- +// Verified success vs directive adherence +// --------------------------------------------------------------------------- + +describe("replayLearnedRules — verified success vs directive adherence", () => { + test("verified non-directive trace counts as baseline win but not directive win", 
() => { + const result = replayLearnedRules({ + traces: [verifiedNonDirectiveTrace("d1", ["next-config"])], + rules: [], + }); + expect(result.baselineWins).toBe(1); + expect(result.baselineDirectiveWins).toBe(0); + expect(result.learnedWins).toBe(1); + expect(result.learnedDirectiveWins).toBe(0); + }); + + test("mix of directive and non-directive verified traces counted separately", () => { + const result = replayLearnedRules({ + traces: [ + verifiedTrace("d1", ["skill-a"]), // directive-adherent + verifiedNonDirectiveTrace("d2", ["skill-b"]), // verified but not directive + unverifiedTrace("d3", ["skill-c"]), // not verified + ], + rules: [], + }); + expect(result.baselineWins).toBe(2); + expect(result.baselineDirectiveWins).toBe(1); + expect(result.learnedWins).toBe(2); + expect(result.learnedDirectiveWins).toBe(1); + }); + + test("non-directive verified trace triggers regression when promoted rule misses", () => { + const result = replayLearnedRules({ + traces: [verifiedNonDirectiveTrace("d1", ["skill-a"])], + rules: [makeRule({ id: "r1", skill: "skill-b" })], + }); + expect(result.baselineWins).toBe(1); + expect(result.baselineDirectiveWins).toBe(0); + expect(result.regressions).toEqual(["d1"]); + }); + + test("promoted rule covering non-directive verified trace is a learned win", () => { + const result = replayLearnedRules({ + traces: [verifiedNonDirectiveTrace("d1", ["next-config"])], + rules: [makeRule({ id: "r1", skill: "next-config" })], + }); + expect(result.baselineWins).toBe(1); + expect(result.baselineDirectiveWins).toBe(0); + expect(result.learnedWins).toBe(1); + expect(result.learnedDirectiveWins).toBe(0); + expect(result.regressions).toEqual([]); + }); + + test("directive adherence tracked through to learned wins", () => { + const result = replayLearnedRules({ + traces: [ + verifiedTrace("d1", ["skill-a"]), // directive + verifiedNonDirectiveTrace("d2", ["skill-a"]), // non-directive + ], + rules: [makeRule({ id: "r1", skill: "skill-a" })], + }); + 
expect(result.baselineWins).toBe(2); + expect(result.baselineDirectiveWins).toBe(1); + expect(result.learnedWins).toBe(2); + expect(result.learnedDirectiveWins).toBe(1); + expect(result.deltaWins).toBe(0); + expect(result.deltaDirectiveWins).toBe(0); + }); + + test("regression rejects rules that reduce verified success even if directive wins would increase", () => { + // d1: non-directive verified win with skill-a (counts as baseline win) + // d2: unverified trace with skill-b (promoted rule covers it → learned win) + // Promoted rule is skill-b, which doesn't cover d1 → regression on d1 + const result = replayLearnedRules({ + traces: [ + verifiedNonDirectiveTrace("d1", ["skill-a"]), + makeTrace({ + decisionId: "d2", + injectedSkills: ["skill-b"], + verification: null, + }), + ], + rules: [makeRule({ id: "r1", skill: "skill-b" })], + }); + expect(result.baselineWins).toBe(1); + expect(result.regressions).toEqual(["d1"]); + }); +}); + +// --------------------------------------------------------------------------- +// Improvement cases (learned rules add wins) +// --------------------------------------------------------------------------- + +describe("replayLearnedRules — improvements", () => { + test("learned rules that overlap with injected skills count as wins", () => { + const result = replayLearnedRules({ + traces: [verifiedTrace("d1", ["next-config"])], + rules: [makeRule({ id: "r1", skill: "next-config" })], + }); + expect(result.baselineWins).toBe(1); + expect(result.learnedWins).toBe(1); + expect(result.deltaWins).toBe(0); + expect(result.regressions).toEqual([]); + }); + + test("learned overlap on unverified trace counts as learned win", () => { + // Trace has no baseline win (no verification), but promoted rule overlaps + const result = replayLearnedRules({ + traces: [ + makeTrace({ + decisionId: "d1", + injectedSkills: ["next-config"], + verification: null, + }), + ], + rules: [makeRule({ id: "r1", skill: "next-config" })], + }); + 
expect(result.baselineWins).toBe(0); + expect(result.learnedWins).toBe(1); + expect(result.deltaWins).toBe(1); + expect(result.regressions).toEqual([]); + }); + + test("positive delta when learned rules cover unverified traces", () => { + const result = replayLearnedRules({ + traces: [ + verifiedTrace("d1", ["next-config"]), + // d2: no verification but promoted rule covers it + makeTrace({ + decisionId: "d2", + injectedSkills: ["next-config"], + verification: null, + }), + ], + rules: [makeRule({ id: "r1", skill: "next-config" })], + }); + expect(result.baselineWins).toBe(1); + expect(result.learnedWins).toBe(2); + expect(result.deltaWins).toBe(1); + expect(result.regressions).toEqual([]); + }); +}); + +// --------------------------------------------------------------------------- +// Regression cases +// --------------------------------------------------------------------------- + +describe("replayLearnedRules — regressions", () => { + test("detects regression when promoted rule does not cover baseline winner", () => { + const result = replayLearnedRules({ + traces: [verifiedTrace("d1", ["next-config"])], + // Promoted rule covers a DIFFERENT skill + rules: [makeRule({ id: "r1", skill: "tailwind" })], + }); + expect(result.baselineWins).toBe(1); + expect(result.learnedWins).toBe(0); + expect(result.deltaWins).toBe(-1); + expect(result.regressions).toEqual(["d1"]); + }); + + test("multiple regressions are all captured", () => { + const result = replayLearnedRules({ + traces: [ + verifiedTrace("d1", ["skill-a"]), + verifiedTrace("d2", ["skill-a"]), + verifiedTrace("d3", ["skill-a"]), + ], + rules: [makeRule({ id: "r1", skill: "skill-b" })], + }); + expect(result.baselineWins).toBe(3); + expect(result.regressions.length).toBe(3); + expect(result.regressions).toEqual(["d1", "d2", "d3"]); + }); + + test("regressions block all promotions (zero promoted rules downstream)", () => { + const result = replayLearnedRules({ + traces: [verifiedTrace("d1", ["skill-a"])], + 
rules: [makeRule({ id: "r1", skill: "skill-b" })], + }); + // Any caller of replayLearnedRules should check: if regressions.length > 0, + // demote all promoted rules to holdout-fail + expect(result.regressions.length).toBeGreaterThan(0); + }); + + test("mixed: some traces regress, some improve", () => { + const result = replayLearnedRules({ + traces: [ + // d1: baseline win with skill-a, but promoted rule is skill-b → regression + verifiedTrace("d1", ["skill-a"]), + // d2: no baseline win, promoted rule covers injected skill → improvement + makeTrace({ + decisionId: "d2", + injectedSkills: ["skill-b"], + verification: null, + }), + ], + rules: [makeRule({ id: "r1", skill: "skill-b" })], + }); + expect(result.baselineWins).toBe(1); + expect(result.learnedWins).toBe(1); // only d2 + expect(result.deltaWins).toBe(0); + expect(result.regressions).toEqual(["d1"]); + }); + + test("regression not triggered when promoted skill matches injected skill", () => { + // Same skill promoted as was injected and verified — no regression + const result = replayLearnedRules({ + traces: [verifiedTrace("d1", ["next-config"])], + rules: [makeRule({ id: "r1", skill: "next-config" })], + }); + expect(result.regressions).toEqual([]); + }); + + test("learnedWins < baselineWins when promoted rules miss verified wins", () => { + // 3 baseline wins, but promoted rule covers a different skill → 0 learned wins + const result = replayLearnedRules({ + traces: [ + verifiedTrace("d1", ["skill-a"]), + verifiedTrace("d2", ["skill-a"]), + verifiedTrace("d3", ["skill-a"]), + ], + rules: [makeRule({ id: "r1", skill: "skill-b" })], + }); + expect(result.baselineWins).toBe(3); + expect(result.learnedWins).toBe(0); + expect(result.learnedWins).toBeLessThan(result.baselineWins); + expect(result.regressions.length).toBe(3); + }); + + test("learnedWins equals baselineWins when promoted rules cover all wins", () => { + const result = replayLearnedRules({ + traces: [ + verifiedTrace("d1", ["skill-a"]), + 
verifiedTrace("d2", ["skill-a"]), + ], + rules: [makeRule({ id: "r1", skill: "skill-a" })], + }); + expect(result.baselineWins).toBe(2); + expect(result.learnedWins).toBe(2); + expect(result.learnedWins).toBeGreaterThanOrEqual(result.baselineWins); + expect(result.regressions).toEqual([]); + }); +}); + +// --------------------------------------------------------------------------- +// Determinism +// --------------------------------------------------------------------------- + +describe("replayLearnedRules — determinism", () => { + test("identical inputs produce identical output", () => { + const traces = [ + verifiedTrace("d1", ["next-config"]), + verifiedTrace("d2", ["tailwind"]), + unverifiedTrace("d3", ["react"]), + ]; + const rules = [makeRule({ id: "r1", skill: "next-config" })]; + + const r1 = replayLearnedRules({ traces, rules }); + const r2 = replayLearnedRules({ traces, rules }); + expect(JSON.stringify(r1)).toBe(JSON.stringify(r2)); + }); + + test("regression IDs are sorted regardless of trace order", () => { + const traces = [ + verifiedTrace("z-trace", ["skill-a"]), + verifiedTrace("m-trace", ["skill-a"]), + verifiedTrace("a-trace", ["skill-a"]), + ]; + const rules = [makeRule({ id: "r1", skill: "different-skill" })]; + + const result = replayLearnedRules({ traces, rules }); + expect(result.regressions).toEqual(["a-trace", "m-trace", "z-trace"]); + }); + + test("output is stable across repeated runs with shuffled traces", () => { + const base = [ + verifiedTrace("d3", ["skill-a"]), + verifiedTrace("d1", ["skill-a"]), + verifiedTrace("d2", ["skill-a"]), + ]; + const rules = [makeRule({ id: "r1", skill: "other" })]; + + const r1 = replayLearnedRules({ traces: base, rules }); + // Shuffle order + const r2 = replayLearnedRules({ + traces: [base[1]!, base[2]!, base[0]!], + rules, + }); + + // Counts are the same + expect(r1.baselineWins).toBe(r2.baselineWins); + expect(r1.baselineDirectiveWins).toBe(r2.baselineDirectiveWins); + 
expect(r1.learnedWins).toBe(r2.learnedWins); + expect(r1.learnedDirectiveWins).toBe(r2.learnedDirectiveWins); + expect(r1.deltaWins).toBe(r2.deltaWins); + expect(r1.deltaDirectiveWins).toBe(r2.deltaDirectiveWins); + // Regressions sorted identically + expect(r1.regressions).toEqual(r2.regressions); + }); +}); + +// --------------------------------------------------------------------------- +// Scenario scoping +// --------------------------------------------------------------------------- + +describe("replayLearnedRules — scenario scoping", () => { + test("rules only apply to matching scenario", () => { + // Rule targets PreToolUse/feature/uiRender/Read//app + const rule = makeRule({ id: "r1", skill: "other-skill" }); + + // Trace in a DIFFERENT scenario (different hook) + const trace = verifiedTrace("d1", ["skill-a"]); + // Override to put in different scenario + const diffScenarioTrace: RoutingDecisionTrace = { + ...trace, + hook: "UserPromptSubmit", + }; + + const result = replayLearnedRules({ + traces: [diffScenarioTrace], + rules: [rule], + }); + // No promoted rules match this scenario → baseline carries through + expect(result.baselineWins).toBe(1); + expect(result.learnedWins).toBe(1); + expect(result.regressions).toEqual([]); + }); + + test("same skill in different scenarios are independent", () => { + const ruleA = makeRule({ + id: "r1", + skill: "skill-b", + scenario: { + hook: "PreToolUse", + storyKind: "feature", + targetBoundary: "uiRender", + toolName: "Read", + routeScope: "/app", + }, + }); + const ruleB = makeRule({ + id: "r2", + skill: "skill-a", + scenario: { + hook: "PreToolUse", + storyKind: "bugfix", + targetBoundary: "serverHandler", + toolName: "Bash", + routeScope: "/api", + }, + }); + + // Trace in scenario A — skill-a wins baseline but promoted is skill-b → regression + const traceA = verifiedTrace("d1", ["skill-a"]); + + // Trace in scenario B — skill-a is promoted and injected → no regression + const traceB: RoutingDecisionTrace = { + 
...verifiedTrace("d2", ["skill-a"]), + hook: "PreToolUse", + toolName: "Bash", + primaryStory: { + id: "story-2", + kind: "bugfix", + storyRoute: "/api", + targetBoundary: "serverHandler", + }, + }; + + const result = replayLearnedRules({ + traces: [traceA, traceB], + rules: [ruleA, ruleB], + }); + expect(result.baselineWins).toBe(2); + expect(result.learnedWins).toBe(1); // only traceB + expect(result.regressions).toEqual(["d1"]); + }); +}); + +// --------------------------------------------------------------------------- +// Edge cases +// --------------------------------------------------------------------------- + +describe("replayLearnedRules — edge cases", () => { + test("trace with empty injectedSkills and verification is not a baseline win", () => { + const trace = makeTrace({ + decisionId: "d1", + injectedSkills: [], + verification: { + verificationId: "v1", + observedBoundary: "uiRender", + matchedSuggestedAction: true, + }, + }); + const result = replayLearnedRules({ traces: [trace], rules: [] }); + expect(result.baselineWins).toBe(0); + expect(result.baselineDirectiveWins).toBe(0); + }); + + test("trace with verification false is a baseline win but not a directive win", () => { + const trace = makeTrace({ + decisionId: "d1", + injectedSkills: ["next-config"], + verification: { + verificationId: "v1", + observedBoundary: "uiRender", + matchedSuggestedAction: false, + }, + }); + const result = replayLearnedRules({ traces: [trace], rules: [] }); + expect(result.baselineWins).toBe(1); + expect(result.baselineDirectiveWins).toBe(0); + }); + + test("pending verification placeholder is not a baseline win", () => { + const trace = makeTrace({ + decisionId: "d1", + injectedSkills: ["next-config"], + verification: { + verificationId: "v1", + observedBoundary: null, + matchedSuggestedAction: null, + }, + }); + const result = replayLearnedRules({ traces: [trace], rules: [] }); + expect(result.baselineWins).toBe(0); + expect(result.baselineDirectiveWins).toBe(0); + 
}); + + test("multiple promoted rules for same scenario are unioned", () => { + const result = replayLearnedRules({ + traces: [ + verifiedTrace("d1", ["skill-a"]), + verifiedTrace("d2", ["skill-b"]), + ], + rules: [ + makeRule({ id: "r1", skill: "skill-a" }), + makeRule({ id: "r2", skill: "skill-b" }), + ], + }); + expect(result.baselineWins).toBe(2); + expect(result.learnedWins).toBe(2); + expect(result.regressions).toEqual([]); + }); + + test("trace with multiple injected skills: one overlaps promoted → no regression", () => { + const result = replayLearnedRules({ + traces: [verifiedTrace("d1", ["skill-a", "skill-b"])], + rules: [makeRule({ id: "r1", skill: "skill-b" })], + }); + expect(result.regressions).toEqual([]); + expect(result.learnedWins).toBe(1); + }); +}); diff --git a/tests/session-explain.test.ts b/tests/session-explain.test.ts new file mode 100644 index 0000000..9cf4d15 --- /dev/null +++ b/tests/session-explain.test.ts @@ -0,0 +1,869 @@ +import { describe, test, expect, beforeEach, afterEach } from "bun:test"; +import { existsSync, mkdirSync, rmSync, writeFileSync } from "node:fs"; +import { join } from "node:path"; +import { tmpdir } from "node:os"; +import { + appendRoutingDecisionTrace, + traceDir, + type RoutingDecisionTrace, +} from "../hooks/src/routing-decision-trace.mts"; +import { + runSessionExplain, + type SessionExplainResult, +} from "../src/commands/session-explain.ts"; + +// --------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + +const ROOT = join(import.meta.dir, ".."); +const TEST_SESSION = "test-session-explain-" + Date.now(); + +function makeTrace( + overrides: Partial = {}, +): RoutingDecisionTrace { + return { + version: 2, + decisionId: "deadbeef01234567", + sessionId: TEST_SESSION, + hook: "PreToolUse", + toolName: "Bash", + toolTarget: "npm run dev", + timestamp: "2026-03-27T08:00:00.000Z", + primaryStory: { + 
id: "story-1", + kind: "flow-verification", + storyRoute: "/settings", + targetBoundary: "uiRender", + }, + observedRoute: null, + policyScenario: "PreToolUse|flow-verification|uiRender|Bash", + matchedSkills: ["agent-browser-verify"], + injectedSkills: ["agent-browser-verify"], + skippedReasons: [], + ranked: [ + { + skill: "agent-browser-verify", + basePriority: 7, + effectivePriority: 15, + pattern: { type: "bashPattern", value: "dev server" }, + profilerBoost: 0, + policyBoost: 8, + policyReason: "4/5 wins", + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + ], + verification: null, + ...overrides, + }; +} + +// --------------------------------------------------------------------------- +// Cleanup +// --------------------------------------------------------------------------- + +afterEach(() => { + try { + rmSync(traceDir(TEST_SESSION), { recursive: true, force: true }); + } catch { + // ignore + } +}); + +// --------------------------------------------------------------------------- +// Core JSON contract tests +// --------------------------------------------------------------------------- + +describe("session-explain JSON mode", () => { + test("reports excluded test-only skills instead of treating them as parity failures", () => { + // The project root has skills/ including fake-banned-test-skill. + // session-explain should report it as excluded, not as a parity failure. 
+ const output = runSessionExplain(TEST_SESSION, ROOT, true); + const result: SessionExplainResult = JSON.parse(output); + + expect(result.ok).toBe(true); + expect(result.manifest.excludedSkills).toEqual( + expect.arrayContaining([ + { slug: "fake-banned-test-skill", reason: "test-only-pattern" }, + ]), + ); + // Excluded skills should NOT appear as parity drift + expect(result.manifest.parity.ok).toBe(true); + expect(result.manifest.parity.missingFromManifest).not.toContain("fake-banned-test-skill"); + }); + + test("includes latest routing decision id and hook when traces exist", () => { + const trace = makeTrace(); + appendRoutingDecisionTrace(trace); + + const output = runSessionExplain(TEST_SESSION, ROOT, true); + const result: SessionExplainResult = JSON.parse(output); + + expect(result.routing.decisionCount).toBe(1); + expect(result.routing.latestDecisionId).toBe("deadbeef01234567"); + expect(result.routing.latestHook).toBe("PreToolUse"); + expect(result.routing.latestPolicyScenario).toBe( + "PreToolUse|flow-verification|uiRender|Bash", + ); + }); + + test("includes verification directive env when a plan has a primaryNextAction", () => { + // Without an active session with stories, env should contain clearing values + const output = runSessionExplain(TEST_SESSION, ROOT, true); + const result: SessionExplainResult = JSON.parse(output); + + // Verification env always present with the four canonical keys + expect(result.verification.env).toHaveProperty("VERCEL_PLUGIN_VERIFICATION_STORY_ID"); + expect(result.verification.env).toHaveProperty("VERCEL_PLUGIN_VERIFICATION_ROUTE"); + expect(result.verification.env).toHaveProperty("VERCEL_PLUGIN_VERIFICATION_BOUNDARY"); + expect(result.verification.env).toHaveProperty("VERCEL_PLUGIN_VERIFICATION_ACTION"); + }); + + test("returns actionable warning when manifest is missing", () => { + // Use a temp dir with skills/ but no generated/skill-manifest.json + const tempRoot = join(tmpdir(), 
`session-explain-test-${Date.now()}`); + const tempSkills = join(tempRoot, "skills", "dummy-skill"); + mkdirSync(tempSkills, { recursive: true }); + writeFileSync( + join(tempSkills, "SKILL.md"), + `--- +name: dummy-skill +description: test +metadata: + priority: 5 +--- +# Dummy +`, + ); + + try { + const output = runSessionExplain(null, tempRoot, true); + const result: SessionExplainResult = JSON.parse(output); + + expect(result.diagnosis.some((d) => d.code === "MANIFEST_MISSING")).toBe(true); + const diag = result.diagnosis.find((d) => d.code === "MANIFEST_MISSING")!; + expect(diag.severity).toBe("warning"); + expect(diag.hint).toContain("build:manifest"); + } finally { + rmSync(tempRoot, { recursive: true, force: true }); + } + }); + + test("returns actionable error when manifest is malformed", () => { + const tempRoot = join(tmpdir(), `session-explain-bad-manifest-${Date.now()}`); + const generatedDir = join(tempRoot, "generated"); + const tempSkills = join(tempRoot, "skills", "dummy-skill"); + mkdirSync(generatedDir, { recursive: true }); + mkdirSync(tempSkills, { recursive: true }); + writeFileSync(join(generatedDir, "skill-manifest.json"), "{ not-json"); + writeFileSync( + join(tempSkills, "SKILL.md"), + `--- +name: dummy-skill +description: test +metadata: + priority: 5 +--- +# Dummy +`, + ); + + try { + const output = runSessionExplain(null, tempRoot, true); + const result: SessionExplainResult = JSON.parse(output); + + expect(result.diagnosis.some((d) => d.code === "MANIFEST_PARSE_FAILED")).toBe(true); + const diag = result.diagnosis.find((d) => d.code === "MANIFEST_PARSE_FAILED")!; + expect(diag.severity).toBe("error"); + expect(diag.hint).toContain("build:manifest"); + } finally { + rmSync(tempRoot, { recursive: true, force: true }); + } + }); + + test("does not surface fake-banned-test-skill as a live runtime candidate", () => { + const output = runSessionExplain(TEST_SESSION, ROOT, true); + const result: SessionExplainResult = JSON.parse(output); + + 
// The skill count should not include excluded skills + const manifestSkillNames = Object.keys( + JSON.parse( + require("node:fs").readFileSync( + join(ROOT, "generated", "skill-manifest.json"), + "utf-8", + ), + ).skills, + ); + expect(manifestSkillNames).not.toContain("fake-banned-test-skill"); + + // Parity should be ok (excluded skill doesn't cause drift) + expect(result.manifest.parity.ok).toBe(true); + }); +}); + +// --------------------------------------------------------------------------- +// Text mode +// --------------------------------------------------------------------------- + +describe("session-explain text mode", () => { + test("prints session id, manifest count, routing traces, and verification status", () => { + const output = runSessionExplain(TEST_SESSION, ROOT, false); + + expect(output).toContain(`Session: ${TEST_SESSION}`); + expect(output).toContain("Manifest:"); + expect(output).toContain("skills"); + expect(output).toContain("Routing traces:"); + expect(output).toContain("Verification stories:"); + }); + + test("includes excluded skills in text output", () => { + const output = runSessionExplain(TEST_SESSION, ROOT, false); + + expect(output).toContain("Excluded:"); + expect(output).toContain("fake-banned-test-skill"); + }); +}); + +// --------------------------------------------------------------------------- +// Exposure aggregation +// --------------------------------------------------------------------------- + +describe("session-explain exposure aggregation", () => { + test("reports zero exposures for unknown session", () => { + const output = runSessionExplain("nonexistent-session-" + Date.now(), ROOT, true); + const result: SessionExplainResult = JSON.parse(output); + + expect(result.exposures.pending).toBe(0); + expect(result.exposures.wins).toBe(0); + expect(result.exposures.directiveWins).toBe(0); + expect(result.exposures.staleMisses).toBe(0); + expect(result.exposures.candidateWins).toBe(0); + 
expect(result.exposures.contextWins).toBe(0); + }); +}); + +// --------------------------------------------------------------------------- +// Manifest exclusion drift diagnosis +// --------------------------------------------------------------------------- + +describe("session-explain manifest exclusion drift", () => { + test("emits MANIFEST_EXCLUSION_DRIFT when live exclusions exist but manifest has none", () => { + // Create a temp root with an excluded skill but a manifest with excludedSkills: [] + const tempRoot = join(tmpdir(), `session-explain-drift-${Date.now()}`); + const generatedDir = join(tempRoot, "generated"); + const tempSkills = join(tempRoot, "skills", "fake-drift-skill"); + mkdirSync(generatedDir, { recursive: true }); + mkdirSync(tempSkills, { recursive: true }); + writeFileSync( + join(generatedDir, "skill-manifest.json"), + JSON.stringify({ + generatedAt: "2026-03-28T00:00:00.000Z", + version: 2, + excludedSkills: [], + skills: {}, + }), + ); + writeFileSync( + join(tempSkills, "SKILL.md"), + `--- +name: fake-drift-skill +description: "Fixture that triggers exclusion drift" +metadata: + priority: 1 +--- +# Fake Drift Skill +`, + ); + + try { + const output = runSessionExplain(null, tempRoot, true); + const result: SessionExplainResult = JSON.parse(output); + + const drift = result.diagnosis.find( + (d) => d.code === "MANIFEST_EXCLUSION_DRIFT", + ); + expect(drift).toBeDefined(); + expect(drift!.severity).toBe("error"); + expect(drift!.hint).toContain("build:manifest"); + } finally { + rmSync(tempRoot, { recursive: true, force: true }); + } + }); + + test("does NOT emit MANIFEST_EXCLUSION_DRIFT when manifest exclusions are in sync", () => { + // Use the real project root — manifest should be in sync after rebuild + const output = runSessionExplain(null, ROOT, true); + const result: SessionExplainResult = JSON.parse(output); + + const drift = result.diagnosis.find( + (d) => d.code === "MANIFEST_EXCLUSION_DRIFT", + ); + 
expect(drift).toBeUndefined(); + }); +}); + +// --------------------------------------------------------------------------- +// Null session +// --------------------------------------------------------------------------- + +describe("session-explain null session", () => { + test("returns valid result with null sessionId", () => { + const output = runSessionExplain(null, ROOT, true); + const result: SessionExplainResult = JSON.parse(output); + + expect(result.ok).toBe(true); + expect(result.sessionId).toBeNull(); + expect(result.verification.hasStories).toBe(false); + }); +}); + +// --------------------------------------------------------------------------- +// Routing doctor contract +// --------------------------------------------------------------------------- + +describe("session-explain doctor contract", () => { + afterEach(() => { + try { + rmSync(traceDir(TEST_SESSION), { recursive: true, force: true }); + } catch { + // ignore + } + }); + + test(".doctor exists and contains expected structure when a trace is present", () => { + const trace = makeTrace(); + appendRoutingDecisionTrace(trace); + + const output = runSessionExplain(TEST_SESSION, ROOT, true); + const result: SessionExplainResult = JSON.parse(output); + + expect(result.doctor).not.toBeNull(); + expect(result.doctor!.latestDecisionId).toBe("deadbeef01234567"); + expect(result.doctor!.latestScenario).toBe("PreToolUse|flow-verification|uiRender|Bash"); + expect(result.doctor!.latestRanked).toBeArray(); + expect(result.doctor!.latestRanked.length).toBeGreaterThan(0); + expect(result.doctor!.latestRanked[0].skill).toBe("agent-browser-verify"); + expect(result.doctor!.hints).toBeArray(); + }); + + test(".doctor.policyRecall.checkedScenarios is an array when scenario has targetBoundary", () => { + // Construct a trace with a 5-part policy scenario (includes route scope) + const trace = makeTrace({ + policyScenario: "PreToolUse|flow-verification|clientRequest|Bash|/settings", + primaryStory: { + id: 
"story-1", + kind: "flow-verification", + storyRoute: "/settings", + targetBoundary: "clientRequest", + }, + }); + appendRoutingDecisionTrace(trace); + + const output = runSessionExplain(TEST_SESSION, ROOT, true); + const result: SessionExplainResult = JSON.parse(output); + + expect(result.doctor).not.toBeNull(); + expect(result.doctor!.policyRecall).not.toBeNull(); + expect(result.doctor!.policyRecall!.checkedScenarios).toBeArray(); + // checkedScenarios should contain at least the exact route key + expect(result.doctor!.policyRecall!.checkedScenarios.length).toBeGreaterThan(0); + for (const bucket of result.doctor!.policyRecall!.checkedScenarios) { + expect(bucket).toHaveProperty("scenario"); + expect(bucket).toHaveProperty("skillCount"); + expect(bucket).toHaveProperty("qualifiedCount"); + expect(bucket).toHaveProperty("selected"); + } + }); + + test(".doctor.hints[].action is machine-readable when present", () => { + // With no routing policy history and a valid scenario, we should get a NO_HISTORY hint + const trace = makeTrace({ + policyScenario: "PreToolUse|flow-verification|clientRequest|Bash|/settings", + primaryStory: { + id: "story-1", + kind: "flow-verification", + storyRoute: "/settings", + targetBoundary: "clientRequest", + }, + }); + appendRoutingDecisionTrace(trace); + + const output = runSessionExplain(TEST_SESSION, ROOT, true); + const result: SessionExplainResult = JSON.parse(output); + + expect(result.doctor).not.toBeNull(); + // Every hint with an action must have a machine-readable action.type + for (const hint of result.doctor!.hints) { + expect(hint).toHaveProperty("severity"); + expect(hint).toHaveProperty("code"); + expect(hint).toHaveProperty("message"); + if (hint.action) { + expect(typeof hint.action.type).toBe("string"); + expect(hint.action.type.length).toBeGreaterThan(0); + } + } + }); + + test(".doctor.companionRecall detects verified-companion synthetic entries", () => { + const trace = makeTrace({ + ranked: [ + { + skill: 
"agent-browser-verify", + basePriority: 7, + effectivePriority: 15, + pattern: { type: "bashPattern", value: "dev server" }, + profilerBoost: 0, + policyBoost: 8, + policyReason: "4/5 wins", + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + { + skill: "verification", + basePriority: 0, + effectivePriority: 0, + pattern: { type: "verified-companion", value: "scenario-companion-rulebook" }, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: true, + droppedReason: null, + }, + ], + }); + appendRoutingDecisionTrace(trace); + + const output = runSessionExplain(TEST_SESSION, ROOT, true); + const result: SessionExplainResult = JSON.parse(output); + + expect(result.doctor).not.toBeNull(); + expect(result.doctor!.companionRecall.detected).toBe(true); + expect(result.doctor!.companionRecall.entries).toHaveLength(1); + + const entry = result.doctor!.companionRecall.entries[0]; + expect(entry.companionSkill).toBe("verification"); + expect(entry.candidateSkill).toBe("agent-browser-verify"); + expect(entry.patternType).toBe("verified-companion"); + expect(entry.patternValue).toBe("scenario-companion-rulebook"); + expect(entry.synthetic).toBe(true); + expect(entry.droppedReason).toBeNull(); + }); + + test(".doctor.companionRecall.detected is false when no companion entries exist", () => { + const trace = makeTrace(); // default trace has no companion entries + appendRoutingDecisionTrace(trace); + + const output = runSessionExplain(TEST_SESSION, ROOT, true); + const result: SessionExplainResult = JSON.parse(output); + + expect(result.doctor).not.toBeNull(); + expect(result.doctor!.companionRecall.detected).toBe(false); + expect(result.doctor!.companionRecall.entries).toHaveLength(0); + }); + + test(".doctor emits COMPANION_RECALL_NOT_SYNTHETIC hint for non-synthetic companion", () => { + const trace = makeTrace({ + ranked: [ + { + skill: "agent-browser-verify", + basePriority: 7, + effectivePriority: 15, + pattern: 
{ type: "bashPattern", value: "dev server" }, + profilerBoost: 0, + policyBoost: 8, + policyReason: "4/5 wins", + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + { + skill: "verification", + basePriority: 0, + effectivePriority: 0, + pattern: { type: "verified-companion", value: "scenario-companion-rulebook" }, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: false, // BUG: should be true + droppedReason: null, + }, + ], + }); + appendRoutingDecisionTrace(trace); + + const output = runSessionExplain(TEST_SESSION, ROOT, true); + const result: SessionExplainResult = JSON.parse(output); + + expect(result.doctor).not.toBeNull(); + const hint = result.doctor!.hints.find( + (h) => h.code === "COMPANION_RECALL_NOT_SYNTHETIC", + ); + expect(hint).toBeDefined(); + expect(hint!.severity).toBe("warning"); + expect(hint!.message).toContain("verification"); + }); + + test(".doctor.companionRecall coexists with policyRecall without interference", () => { + const trace = makeTrace({ + policyScenario: "PreToolUse|flow-verification|clientRequest|Bash|/settings", + primaryStory: { + id: "story-1", + kind: "flow-verification", + storyRoute: "/settings", + targetBoundary: "clientRequest", + }, + ranked: [ + { + skill: "agent-browser-verify", + basePriority: 7, + effectivePriority: 15, + pattern: { type: "policy-recall", value: "route-scoped-verified-policy-recall" }, + profilerBoost: 0, + policyBoost: 8, + policyReason: "4/5 wins", + summaryOnly: false, + synthetic: true, + droppedReason: null, + }, + { + skill: "verification", + basePriority: 0, + effectivePriority: 0, + pattern: { type: "verified-companion", value: "scenario-companion-rulebook" }, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: true, + droppedReason: null, + }, + ], + }); + appendRoutingDecisionTrace(trace); + + const output = runSessionExplain(TEST_SESSION, ROOT, true); + const result: 
SessionExplainResult = JSON.parse(output); + + expect(result.doctor).not.toBeNull(); + // Policy recall should still have its own diagnosis + expect(result.doctor!.policyRecall).not.toBeNull(); + // Companion recall should be independently tracked + expect(result.doctor!.companionRecall.detected).toBe(true); + expect(result.doctor!.companionRecall.entries[0].companionSkill).toBe("verification"); + // Both should appear in latestRanked + expect(result.doctor!.latestRanked).toHaveLength(2); + }); + + test("explicit causality: two companions after one candidate resolve correctly", () => { + const trace = makeTrace({ + ranked: [ + { + skill: "agent-browser-verify", + basePriority: 7, + effectivePriority: 15, + pattern: null, + profilerBoost: 0, + policyBoost: 8, + policyReason: "4/5 wins", + summaryOnly: false, + synthetic: true, + droppedReason: null, + }, + { + skill: "verification", + basePriority: 0, + effectivePriority: 0, + pattern: null, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: true, + droppedReason: null, + }, + { + skill: "observability", + basePriority: 0, + effectivePriority: 0, + pattern: null, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: true, + droppedReason: null, + }, + ], + causes: [ + { + code: "verified-companion", + stage: "rank", + skill: "verification", + synthetic: true, + scoreDelta: 0, + message: "Inserted learned companion after agent-browser-verify", + detail: { + candidateSkill: "agent-browser-verify", + scenario: "PreToolUse|bugfix|uiRender|Bash|/settings", + }, + }, + { + code: "verified-companion", + stage: "rank", + skill: "observability", + synthetic: true, + scoreDelta: 0, + message: "Inserted learned companion after agent-browser-verify", + detail: { + candidateSkill: "agent-browser-verify", + scenario: "PreToolUse|bugfix|uiRender|Bash|/settings", + }, + }, + ], + edges: [ + { + fromSkill: "agent-browser-verify", + toSkill: 
"verification", + relation: "companion-of", + code: "verified-companion", + detail: { scenario: "PreToolUse|bugfix|uiRender|Bash|/settings" }, + }, + { + fromSkill: "agent-browser-verify", + toSkill: "observability", + relation: "companion-of", + code: "verified-companion", + detail: { scenario: "PreToolUse|bugfix|uiRender|Bash|/settings" }, + }, + ], + } as any); + appendRoutingDecisionTrace(trace); + + const output = runSessionExplain(TEST_SESSION, ROOT, true); + const result: SessionExplainResult = JSON.parse(output); + + expect(result.doctor).not.toBeNull(); + expect(result.doctor!.companionRecall.detected).toBe(true); + expect(result.doctor!.companionRecall.entries).toHaveLength(2); + + const verification = result.doctor!.companionRecall.entries.find( + (e) => e.companionSkill === "verification", + )!; + expect(verification.candidateSkill).toBe("agent-browser-verify"); + expect(verification.synthetic).toBe(true); + + const observability = result.doctor!.companionRecall.entries.find( + (e) => e.companionSkill === "observability", + )!; + expect(observability.candidateSkill).toBe("agent-browser-verify"); + expect(observability.synthetic).toBe(true); + + // No COMPANION_EDGE_MISSING hints since all edges are present + const edgeMissing = result.doctor!.hints.filter( + (h) => h.code === "COMPANION_EDGE_MISSING", + ); + expect(edgeMissing).toHaveLength(0); + }); + + test("explicit causality: companion moved away from candidate resolves via edge", () => { + // ranked order: candidate, unrelated, companion — edge still resolves correctly + const trace = makeTrace({ + ranked: [ + { + skill: "agent-browser-verify", + basePriority: 7, + effectivePriority: 15, + pattern: null, + profilerBoost: 0, + policyBoost: 8, + policyReason: "4/5 wins", + summaryOnly: false, + synthetic: true, + droppedReason: null, + }, + { + skill: "next-app-router", + basePriority: 6, + effectivePriority: 6, + pattern: { type: "pathPattern", value: "app/**" }, + profilerBoost: 0, + policyBoost: 
0, + policyReason: null, + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + { + skill: "verification", + basePriority: 0, + effectivePriority: 0, + pattern: null, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: true, + droppedReason: null, + }, + ], + causes: [ + { + code: "verified-companion", + stage: "rank", + skill: "verification", + synthetic: true, + scoreDelta: 0, + message: "Inserted learned companion after agent-browser-verify", + detail: { + candidateSkill: "agent-browser-verify", + scenario: "PreToolUse|bugfix|uiRender|Bash|/settings", + }, + }, + ], + edges: [ + { + fromSkill: "agent-browser-verify", + toSkill: "verification", + relation: "companion-of", + code: "verified-companion", + detail: { scenario: "PreToolUse|bugfix|uiRender|Bash|/settings" }, + }, + ], + } as any); + appendRoutingDecisionTrace(trace); + + const output = runSessionExplain(TEST_SESSION, ROOT, true); + const result: SessionExplainResult = JSON.parse(output); + + expect(result.doctor!.companionRecall.detected).toBe(true); + const entry = result.doctor!.companionRecall.entries[0]; + // Edge-based resolution should find the correct candidate even though + // next-app-router sits between them in ranked order + expect(entry.candidateSkill).toBe("agent-browser-verify"); + expect(entry.companionSkill).toBe("verification"); + }); + + test("fallback: old traces without causes/edges resolve via ranked order", () => { + // Old-style trace: no causes or edges, only ranked[] with pattern metadata + const trace = makeTrace({ + ranked: [ + { + skill: "agent-browser-verify", + basePriority: 7, + effectivePriority: 15, + pattern: { type: "bashPattern", value: "dev server" }, + profilerBoost: 0, + policyBoost: 8, + policyReason: "4/5 wins", + summaryOnly: false, + synthetic: false, + droppedReason: null, + }, + { + skill: "verification", + basePriority: 0, + effectivePriority: 0, + pattern: { type: "verified-companion", value: 
"scenario-companion-rulebook" }, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: true, + droppedReason: null, + }, + ], + // Explicitly no causes/edges — simulates pre-causality trace + }); + // Remove causes/edges from the trace before appending + const rawTrace = { ...trace } as any; + delete rawTrace.causes; + delete rawTrace.edges; + appendRoutingDecisionTrace(rawTrace); + + const output = runSessionExplain(TEST_SESSION, ROOT, true); + const result: SessionExplainResult = JSON.parse(output); + + expect(result.doctor!.companionRecall.detected).toBe(true); + expect(result.doctor!.companionRecall.entries).toHaveLength(1); + + const entry = result.doctor!.companionRecall.entries[0]; + expect(entry.companionSkill).toBe("verification"); + // Fallback: candidate inferred from preceding ranked entry + expect(entry.candidateSkill).toBe("agent-browser-verify"); + expect(entry.patternType).toBe("verified-companion"); + }); + + test("COMPANION_EDGE_MISSING when companion has cause but no edge", () => { + const trace = makeTrace({ + ranked: [ + { + skill: "agent-browser-verify", + basePriority: 7, + effectivePriority: 15, + pattern: null, + profilerBoost: 0, + policyBoost: 8, + policyReason: "4/5 wins", + summaryOnly: false, + synthetic: true, + droppedReason: null, + }, + { + skill: "verification", + basePriority: 0, + effectivePriority: 0, + pattern: null, + profilerBoost: 0, + policyBoost: 0, + policyReason: null, + summaryOnly: false, + synthetic: true, + droppedReason: null, + }, + ], + causes: [ + { + code: "verified-companion", + stage: "rank", + skill: "verification", + synthetic: true, + scoreDelta: 0, + message: "Inserted learned companion after agent-browser-verify", + detail: { + // No candidateSkill in detail either + scenario: "PreToolUse|bugfix|uiRender|Bash|/settings", + }, + }, + ], + edges: [], // No edges — should trigger COMPANION_EDGE_MISSING + } as any); + appendRoutingDecisionTrace(trace); + + const 
output = runSessionExplain(TEST_SESSION, ROOT, true); + const result: SessionExplainResult = JSON.parse(output); + + expect(result.doctor!.companionRecall.detected).toBe(true); + const entry = result.doctor!.companionRecall.entries[0]; + // No edge and no detail.candidateSkill → null + expect(entry.candidateSkill).toBeNull(); + + const hint = result.doctor!.hints.find( + (h) => h.code === "COMPANION_EDGE_MISSING", + ); + expect(hint).toBeDefined(); + expect(hint!.severity).toBe("warning"); + expect(hint!.message).toContain("verification"); + }); + + test(".doctor is null when no traces exist", () => { + const output = runSessionExplain(TEST_SESSION, ROOT, true); + const result: SessionExplainResult = JSON.parse(output); + + expect(result.doctor).toBeNull(); + }); +}); diff --git a/tests/skill-exclusion-policy.test.ts b/tests/skill-exclusion-policy.test.ts new file mode 100644 index 0000000..ae2edd6 --- /dev/null +++ b/tests/skill-exclusion-policy.test.ts @@ -0,0 +1,157 @@ +import { describe, test, expect } from "bun:test"; +import { resolve, join } from "node:path"; +import { readFileSync, existsSync } from "node:fs"; + +import { + EXCLUDED_SKILL_PATTERN, + getSkillExclusion, + filterExcludedSkillMap, +} from "../src/shared/skill-exclusion-policy.ts"; + +const ROOT = resolve(import.meta.dirname, ".."); +const MANIFEST_PATH = join(ROOT, "generated", "skill-manifest.json"); +const CLI = join(ROOT, "src", "cli", "index.ts"); + +function readManifest(): any { + return JSON.parse(readFileSync(MANIFEST_PATH, "utf-8")); +} + +async function runCli( + ...args: string[] +): Promise<{ stdout: string; stderr: string; exitCode: number }> { + const proc = Bun.spawn(["bun", "run", CLI, ...args], { + cwd: ROOT, + stdout: "pipe", + stderr: "pipe", + env: { ...process.env, NO_COLOR: "1" }, + }); + const [stdout, stderr] = await Promise.all([ + new Response(proc.stdout).text(), + new Response(proc.stderr).text(), + ]); + const exitCode = await proc.exited; + return { stdout: 
stdout.trim(), stderr: stderr.trim(), exitCode }; +} + +// --------------------------------------------------------------------------- +// Unit tests for the shared policy module +// --------------------------------------------------------------------------- + +describe("skill-exclusion-policy", () => { + test("EXCLUDED_SKILL_PATTERN matches test-only slugs", () => { + expect(EXCLUDED_SKILL_PATTERN.test("fake-banned-test-skill")).toBe(true); + expect(EXCLUDED_SKILL_PATTERN.test("fake-something")).toBe(true); + }); + + test("EXCLUDED_SKILL_PATTERN rejects production slugs", () => { + expect(EXCLUDED_SKILL_PATTERN.test("nextjs")).toBe(false); + expect(EXCLUDED_SKILL_PATTERN.test("vercel-cli")).toBe(false); + expect(EXCLUDED_SKILL_PATTERN.test("ai-sdk")).toBe(false); + }); + + test("getSkillExclusion returns exclusion record for test slugs", () => { + const result = getSkillExclusion("fake-banned-test-skill"); + expect(result).toEqual({ + slug: "fake-banned-test-skill", + reason: "test-only-pattern", + }); + }); + + test("getSkillExclusion returns null for production slugs", () => { + expect(getSkillExclusion("nextjs")).toBeNull(); + expect(getSkillExclusion("vercel-cli")).toBeNull(); + }); + + test("filterExcludedSkillMap partitions correctly", () => { + const input = { + nextjs: { priority: 6 }, + "fake-banned-test-skill": { priority: 1 }, + "ai-sdk": { priority: 5 }, + }; + + const { included, excluded } = filterExcludedSkillMap(input); + + expect(Object.keys(included)).toEqual(["nextjs", "ai-sdk"]); + expect(included).not.toHaveProperty("fake-banned-test-skill"); + expect(excluded).toEqual([ + { slug: "fake-banned-test-skill", reason: "test-only-pattern" }, + ]); + }); + + test("filterExcludedSkillMap sorts excluded entries by slug", () => { + const input = { + "fake-z": { priority: 1 }, + "fake-a": { priority: 1 }, + nextjs: { priority: 6 }, + }; + + const { excluded } = filterExcludedSkillMap(input); + expect(excluded.map((e) => e.slug)).toEqual(["fake-a", 
"fake-z"]); + }); +}); + +// --------------------------------------------------------------------------- +// Manifest integration: excludedSkills provenance +// --------------------------------------------------------------------------- + +describe("manifest excludedSkills provenance", () => { + test("manifest contains excludedSkills array with provenance", () => { + const manifest = readManifest(); + expect(manifest.excludedSkills).toEqual([ + { slug: "fake-banned-test-skill", reason: "test-only-pattern" }, + ]); + }); + + test("excluded skills are absent from manifest.skills", () => { + const manifest = readManifest(); + expect(manifest.skills).not.toHaveProperty("fake-banned-test-skill"); + }); + + test("test fixture still exists on disk", () => { + expect( + existsSync(join(ROOT, "skills", "fake-banned-test-skill", "SKILL.md")), + ).toBe(true); + }); +}); + +// --------------------------------------------------------------------------- +// CLI explain: excluded skills never surface as runtime candidates +// --------------------------------------------------------------------------- + +describe("explain excludes test-only skills", () => { + test("fake-banned-test-skill does not appear as a match in JSON mode", async () => { + // fake-banned-test-skill has pathPatterns — but it must not surface + const { stdout, exitCode } = await runCli( + "explain", + "some-test-file.ts", + "--json", + ); + expect(exitCode).toBe(0); + const result = JSON.parse(stdout); + const matchedSlugs = (result.matches ?? 
[]).map( + (m: any) => m.skill, + ); + expect(matchedSlugs).not.toContain("fake-banned-test-skill"); + }); +}); + +// --------------------------------------------------------------------------- +// Doctor: excluded skills do not cause false parity errors +// --------------------------------------------------------------------------- + +describe("doctor respects exclusion policy", () => { + test("doctor does not report fake-banned-test-skill as a parity error", async () => { + const { stdout, exitCode } = await runCli("doctor", "--json"); + // doctor exits 0 or 1 depending on other issues, but check the + // parity-related issues specifically + const result = JSON.parse(stdout); + const parityIssues = (result.issues ?? []).filter( + (i: any) => i.check === "manifest-parity", + ); + const mentionsFake = parityIssues.some( + (i: any) => + i.message.includes("fake-banned-test-skill"), + ); + expect(mentionsFake).toBe(false); + }); +}); diff --git a/tests/snapshots/inject-crons.snap b/tests/snapshots/inject-crons.snap index be64640..41c2a1f 100644 --- a/tests/snapshots/inject-crons.snap +++ b/tests/snapshots/inject-crons.snap @@ -14,10 +14,6 @@ "vercel-functions" ], "summaryOnly": [], - "droppedByCap": [ - "deployments-cicd", - "routing-middleware" - ], "droppedByBudget": [], "reasons": { "cron-jobs": { diff --git a/tests/snapshots/inject-functions.snap b/tests/snapshots/inject-functions.snap index 3f9475c..1cee7b6 100644 --- a/tests/snapshots/inject-functions.snap +++ b/tests/snapshots/inject-functions.snap @@ -14,10 +14,6 @@ "cron-jobs" ], "summaryOnly": [], - "droppedByCap": [ - "deployments-cicd", - "routing-middleware" - ], "droppedByBudget": [], "reasons": { "vercel-functions": { diff --git a/tests/snapshots/inject-headers.snap b/tests/snapshots/inject-headers.snap index 2514fc0..e50c9e1 100644 --- a/tests/snapshots/inject-headers.snap +++ b/tests/snapshots/inject-headers.snap @@ -14,10 +14,6 @@ "vercel-functions" ], "summaryOnly": [], - "droppedByCap": [ - 
"cron-jobs", - "deployments-cicd" - ], "droppedByBudget": [], "reasons": { "routing-middleware": { diff --git a/tests/snapshots/inject-mixed.snap b/tests/snapshots/inject-mixed.snap index 10aa75f..a816499 100644 --- a/tests/snapshots/inject-mixed.snap +++ b/tests/snapshots/inject-mixed.snap @@ -14,10 +14,6 @@ "deployments-cicd" ], "summaryOnly": [], - "droppedByCap": [ - "routing-middleware", - "vercel-cli" - ], "droppedByBudget": [], "reasons": { "vercel-functions": { diff --git a/tests/snapshots/inject-redirects.snap b/tests/snapshots/inject-redirects.snap index 77dcb55..009f117 100644 --- a/tests/snapshots/inject-redirects.snap +++ b/tests/snapshots/inject-redirects.snap @@ -14,10 +14,6 @@ "vercel-functions" ], "summaryOnly": [], - "droppedByCap": [ - "cron-jobs", - "deployments-cicd" - ], "droppedByBudget": [], "reasons": { "routing-middleware": { diff --git a/tests/snapshots/inject-rewrites.snap b/tests/snapshots/inject-rewrites.snap index 3a8c908..4452e9b 100644 --- a/tests/snapshots/inject-rewrites.snap +++ b/tests/snapshots/inject-rewrites.snap @@ -14,10 +14,6 @@ "vercel-functions" ], "summaryOnly": [], - "droppedByCap": [ - "cron-jobs", - "deployments-cicd" - ], "droppedByBudget": [], "reasons": { "routing-middleware": { diff --git a/tests/subagent-start-context.test.ts b/tests/subagent-start-context.test.ts index 23ebb3e..1809674 100644 --- a/tests/subagent-start-context.test.ts +++ b/tests/subagent-start-context.test.ts @@ -1,8 +1,30 @@ -import { describe, test, expect, beforeEach } from "bun:test"; +import { describe, test, expect, beforeEach, afterEach } from "bun:test"; import { mkdtempSync, writeFileSync, rmSync } from "node:fs"; import { join, resolve } from "node:path"; import { tmpdir } from "node:os"; import { appendPendingLaunch, type PendingLaunch } from "../hooks/src/subagent-state.mts"; +import { + recordStory, + recordObservation, + loadObservations, + loadStories, + derivePlan, + persistPlanState, + type VerificationObservation, + type 
VerificationBoundary, +} from "../hooks/src/verification-ledger.mts"; +import { computePlan, selectPrimaryStory, type VerificationPlanResult } from "../hooks/src/verification-plan.mts"; +import { + buildVerificationContext, + buildVerificationContextFromPlan, + buildVerificationDirective, + buildVerificationEnv, + resolveBudgetCategory, +} from "../hooks/src/subagent-start-bootstrap.mts"; +import { + envString, + resolveObservedRoute, +} from "../hooks/src/posttooluse-verification-observe.mts"; const ROOT = resolve(import.meta.dirname, ".."); const HOOK_SCRIPT = join(ROOT, "hooks", "subagent-start-bootstrap.mjs"); @@ -330,3 +352,605 @@ describe("subagent-start-context: profile cache and fallback", () => { expect(stdout.trim()).toBe(""); }); }); + +// --------------------------------------------------------------------------- +// Verification context scoping helpers +// --------------------------------------------------------------------------- + +const T0 = "2026-03-26T12:00:00.000Z"; + +function makeObs( + id: string, + boundary: VerificationBoundary, + opts?: Partial<VerificationObservation>, +): VerificationObservation { + return { + id, + timestamp: T0, + source: "bash", + boundary, + route: null, + storyId: null, + summary: `obs-${id}`, + ...opts, + }; +} + +let verificationSessionId: string; + +// --------------------------------------------------------------------------- +// Verification context: unit tests for buildVerificationContext +// --------------------------------------------------------------------------- + +describe("subagent-start-context: verification context scoping", () => { + beforeEach(() => { + verificationSessionId = `subagent-ver-${Date.now()}-${Math.random().toString(36).slice(2)}`; + }); + + afterEach(() => { + try { + rmSync(join(tmpdir(), `vercel-plugin-${verificationSessionId}-ledger`), { recursive: true, force: true }); + } catch {} + }); + + test("returns null when no verification plan exists", () => { + const ctx = 
buildVerificationContext(verificationSessionId, "minimal"); + expect(ctx).toBeNull(); + }); + + test("returns null when no session id provided", () => { + const ctx = buildVerificationContext(undefined, "standard"); + expect(ctx).toBeNull(); + }); + + test("Explore agent gets minimal verification context (story + route only)", () => { + recordStory(verificationSessionId, "flow-verification", "/settings", "settings page loads but save fails", ["verification"]); + recordObservation(verificationSessionId, makeObs("v1", "clientRequest", { route: "/settings" })); + + const ctx = buildVerificationContext(verificationSessionId, "minimal"); + expect(ctx).not.toBeNull(); + expect(ctx!).toContain('scope="minimal"'); + expect(ctx!).toContain("flow-verification"); + expect(ctx!).toContain("/settings"); + // Minimal should NOT include missing boundaries or actions + expect(ctx!).not.toContain("Missing boundaries"); + expect(ctx!).not.toContain("Primary action"); + }); + + test("Plan agent gets light verification context (story + missing boundaries + candidate)", () => { + recordStory(verificationSessionId, "flow-verification", "/settings", "settings page loads but save fails", ["verification"]); + recordObservation(verificationSessionId, makeObs("v1", "clientRequest", { route: "/settings" })); + + const ctx = buildVerificationContext(verificationSessionId, "light"); + expect(ctx).not.toBeNull(); + expect(ctx!).toContain('scope="light"'); + expect(ctx!).toContain("flow-verification"); + expect(ctx!).toContain("/settings"); + expect(ctx!).toContain("Missing boundaries:"); + expect(ctx!).toContain("Candidate action:"); + }); + + test("general-purpose agent gets standard verification context (full evidence + action)", () => { + recordStory(verificationSessionId, "flow-verification", "/settings", "settings page loads but save fails", ["verification"]); + recordObservation(verificationSessionId, makeObs("v1", "clientRequest", { route: "/settings" })); + 
recordObservation(verificationSessionId, makeObs("v2", "serverHandler", { route: "/settings" })); + + const ctx = buildVerificationContext(verificationSessionId, "standard"); + expect(ctx).not.toBeNull(); + expect(ctx!).toContain('scope="standard"'); + expect(ctx!).toContain("flow-verification"); + expect(ctx!).toContain("Evidence: 2/4 boundaries"); + expect(ctx!).toContain("Missing:"); + expect(ctx!).toContain("Primary action:"); + expect(ctx!).toContain("Reason:"); + }); + + test("standard context includes recent routes", () => { + recordStory(verificationSessionId, "flow-verification", "/settings", "test", []); + recordObservation(verificationSessionId, makeObs("v1", "clientRequest", { route: "/settings" })); + recordObservation(verificationSessionId, makeObs("v2", "serverHandler", { route: "/dashboard" })); + + const ctx = buildVerificationContext(verificationSessionId, "standard"); + expect(ctx).not.toBeNull(); + expect(ctx!).toContain("Recent routes:"); + expect(ctx!).toContain("/settings"); + expect(ctx!).toContain("/dashboard"); + }); + + test("light context includes blocked reasons", () => { + recordStory(verificationSessionId, "flow-verification", null, "test", []); + recordObservation(verificationSessionId, makeObs("v1", "clientRequest")); + recordObservation(verificationSessionId, makeObs("v2", "serverHandler")); + recordObservation(verificationSessionId, makeObs("v3", "environment")); + // Need to force browser unavailability — the cached plan was derived with defaults. + // Re-derive with browser unavailable so the cached state reflects it. 
+ const obs = loadObservations(verificationSessionId); + const stories = loadStories(verificationSessionId); + const plan = derivePlan(obs, stories, { agentBrowserAvailable: false }); + persistPlanState(verificationSessionId, plan); + + const ctx = buildVerificationContext(verificationSessionId, "light"); + expect(ctx).not.toBeNull(); + expect(ctx!).toContain("Blocked:"); + }); +}); + +// --------------------------------------------------------------------------- +// Verification context: fixture scenarios +// --------------------------------------------------------------------------- + +describe("subagent-start-context: verification fixtures", () => { + beforeEach(() => { + verificationSessionId = `subagent-fix-${Date.now()}-${Math.random().toString(36).slice(2)}`; + }); + + afterEach(() => { + try { + rmSync(join(tmpdir(), `vercel-plugin-${verificationSessionId}-ledger`), { recursive: true, force: true }); + } catch {} + }); + + test("settings page loads but save fails — scoped context per agent type", () => { + recordStory(verificationSessionId, "flow-verification", "/settings", "settings page loads but save fails", ["verification"]); + recordObservation(verificationSessionId, makeObs("f1", "clientRequest", { route: "/settings" })); + + const minimal = buildVerificationContext(verificationSessionId, "minimal"); + const light = buildVerificationContext(verificationSessionId, "light"); + const standard = buildVerificationContext(verificationSessionId, "standard"); + + // All should mention the story + expect(minimal).toContain("flow-verification"); + expect(light).toContain("flow-verification"); + expect(standard).toContain("flow-verification"); + + // Light and standard should have missing boundaries + expect(light).toContain("Missing boundaries:"); + expect(standard).toContain("Missing:"); + + // Standard should have evidence count + expect(standard).toContain("Evidence: 1/4 boundaries"); + }); + + test("blank page on dashboard — all agent types get context", () 
=> { + recordStory(verificationSessionId, "browser-only", "/dashboard", "blank page on dashboard", ["agent-browser-verify"]); + + const minimal = buildVerificationContext(verificationSessionId, "minimal"); + const light = buildVerificationContext(verificationSessionId, "light"); + const standard = buildVerificationContext(verificationSessionId, "standard"); + + expect(minimal).toContain("browser-only"); + expect(minimal).toContain("/dashboard"); + expect(light).toContain("blank page on dashboard"); + expect(standard).toContain("blank page on dashboard"); + expect(standard).toContain("Evidence: 0/4 boundaries"); + }); + + test("env inspection — environment boundary satisfied", () => { + recordStory(verificationSessionId, "stuck-investigation", null, "env vars missing", []); + recordObservation(verificationSessionId, makeObs("e1", "environment", { summary: "printenv" })); + + const standard = buildVerificationContext(verificationSessionId, "standard"); + expect(standard).toContain("environment"); + expect(standard).toContain("Evidence: 1/4 boundaries"); + }); +}); + +// --------------------------------------------------------------------------- +// Evidence isolation between sibling subagents +// --------------------------------------------------------------------------- + +describe("subagent-start-context: evidence isolation", () => { + let siblingSession: string; + + beforeEach(() => { + siblingSession = `sibling-${Date.now()}-${Math.random().toString(36).slice(2)}`; + }); + + afterEach(() => { + try { + rmSync(join(tmpdir(), `vercel-plugin-${siblingSession}-ledger`), { recursive: true, force: true }); + } catch {} + }); + + test("sibling agents do not overwrite each other's verification state", () => { + // Parent session creates a story + recordStory(siblingSession, "flow-verification", "/settings", "settings broken", ["verification"]); + + // Subagent A records an observation + recordObservation(siblingSession, makeObs("agent-a-obs", "clientRequest", { + route: 
"/settings", + source: "subagent", + meta: { agentId: "explore-1" }, + })); + + // Subagent B records a different observation + recordObservation(siblingSession, makeObs("agent-b-obs", "serverHandler", { + route: "/settings", + source: "subagent", + meta: { agentId: "plan-1" }, + })); + + // Both observations should be visible to the parent session + const obs = loadObservations(siblingSession); + expect(obs).toHaveLength(2); + expect(obs.find((o: VerificationObservation) => o.id === "agent-a-obs")).toBeTruthy(); + expect(obs.find((o: VerificationObservation) => o.id === "agent-b-obs")).toBeTruthy(); + + // Plan should reflect both observations + const plan = computePlan(siblingSession); + expect(plan.satisfiedBoundaries).toContain("clientRequest"); + expect(plan.satisfiedBoundaries).toContain("serverHandler"); + expect(plan.observationCount).toBe(2); + }); + + test("observations from multiple subagents are idempotent by id", () => { + recordStory(siblingSession, "flow-verification", null, "test", []); + + // Both agents try to record the same observation (e.g., race condition) + recordObservation(siblingSession, makeObs("shared-obs", "clientRequest", { + source: "subagent", + meta: { agentId: "explore-1" }, + })); + recordObservation(siblingSession, makeObs("shared-obs", "clientRequest", { + source: "subagent", + meta: { agentId: "plan-1" }, + })); + + const plan = computePlan(siblingSession); + // Should only count once despite two appends + expect(plan.observationCount).toBe(1); + }); + + test("deterministic planner output across repeated runs", () => { + recordStory(siblingSession, "flow-verification", "/settings", "settings page loads but save fails", ["verification"]); + recordObservation(siblingSession, makeObs("det-1", "clientRequest", { route: "/settings" })); + recordObservation(siblingSession, makeObs("det-2", "serverHandler", { route: "/settings" })); + + const plan1 = computePlan(siblingSession); + const plan2 = computePlan(siblingSession); + const 
plan3 = computePlan(siblingSession); + + expect(JSON.stringify(plan1, null, 2)).toBe(JSON.stringify(plan2, null, 2)); + expect(JSON.stringify(plan2, null, 2)).toBe(JSON.stringify(plan3, null, 2)); + }); +}); + +// --------------------------------------------------------------------------- +// buildVerificationContextFromPlan: deterministic story selection +// --------------------------------------------------------------------------- + +describe("subagent-start-context: buildVerificationContextFromPlan", () => { + test("uses primary story (most recently updated) for standard agents", () => { + const plan: VerificationPlanResult = { + hasStories: true, + stories: [ + { + id: "older", + kind: "flow-verification", + route: "/older", + promptExcerpt: "older prompt", + createdAt: "2026-03-27T00:00:00.000Z", + updatedAt: "2026-03-27T00:00:00.000Z", + }, + { + id: "newer", + kind: "flow-verification", + route: "/settings", + promptExcerpt: "verify settings flow", + createdAt: "2026-03-27T00:01:00.000Z", + updatedAt: "2026-03-27T00:02:00.000Z", + }, + ], + observationCount: 1, + satisfiedBoundaries: ["serverHandler"], + missingBoundaries: ["clientRequest", "environment", "uiRender"], + recentRoutes: ["/settings"], + primaryNextAction: { + action: "curl /settings", + targetBoundary: "clientRequest", + reason: "No HTTP request observation yet — verify the endpoint responds", + }, + blockedReasons: [], + }; + + const context = buildVerificationContextFromPlan(plan, "standard"); + expect(context).toContain("Verification story: flow-verification (/settings)"); + expect(context).toContain("Primary action:"); + expect(context).not.toContain("(/older)"); + }); + + test("returns null for plan with no stories", () => { + const plan: VerificationPlanResult = { + hasStories: false, + stories: [], + observationCount: 0, + satisfiedBoundaries: [], + missingBoundaries: [], + recentRoutes: [], + primaryNextAction: null, + blockedReasons: [], + }; + + 
expect(buildVerificationContextFromPlan(plan, "standard")).toBeNull(); + }); + + test("minimal scope contains only story kind and route", () => { + const plan: VerificationPlanResult = { + hasStories: true, + stories: [{ + id: "s1", + kind: "browser-only", + route: "/dashboard", + promptExcerpt: "blank page", + createdAt: T0, + updatedAt: T0, + }], + observationCount: 0, + satisfiedBoundaries: [], + missingBoundaries: ["clientRequest", "environment", "serverHandler", "uiRender"], + recentRoutes: [], + primaryNextAction: { action: "curl /dashboard", targetBoundary: "clientRequest", reason: "test" }, + blockedReasons: [], + }; + + const ctx = buildVerificationContextFromPlan(plan, "minimal"); + expect(ctx).toContain("browser-only"); + expect(ctx).toContain("/dashboard"); + expect(ctx).not.toContain("Missing"); + expect(ctx).not.toContain("Primary action"); + }); +}); + +// --------------------------------------------------------------------------- +// Verification directive and env +// --------------------------------------------------------------------------- + +describe("subagent-start-context: verification directive", () => { + test("buildVerificationDirective returns null for empty plan", () => { + expect(buildVerificationDirective(null)).toBeNull(); + expect(buildVerificationDirective({ + hasStories: false, + stories: [], + observationCount: 0, + satisfiedBoundaries: [], + missingBoundaries: [], + recentRoutes: [], + primaryNextAction: null, + blockedReasons: [], + })).toBeNull(); + }); + + test("buildVerificationDirective selects primary story", () => { + const plan: VerificationPlanResult = { + hasStories: true, + stories: [ + { id: "old", kind: "flow-verification", route: "/old", promptExcerpt: "old", createdAt: T0, updatedAt: T0 }, + { id: "new", kind: "flow-verification", route: "/new", promptExcerpt: "new", createdAt: "2026-03-27T01:00:00.000Z", updatedAt: "2026-03-27T01:00:00.000Z" }, + ], + observationCount: 1, + satisfiedBoundaries: ["serverHandler"], + 
missingBoundaries: ["clientRequest"], + recentRoutes: [], + primaryNextAction: { action: "curl /new", targetBoundary: "clientRequest", reason: "test" }, + blockedReasons: [], + }; + + const directive = buildVerificationDirective(plan); + expect(directive).not.toBeNull(); + expect(directive!.version).toBe(1); + expect(directive!.storyId).toBe("new"); + expect(directive!.route).toBe("/new"); + expect(directive!.primaryNextAction?.action).toBe("curl /new"); + }); + + test("buildVerificationEnv returns env vars from directive", () => { + const directive = { + version: 1 as const, + storyId: "abc123", + storyKind: "flow-verification", + route: "/settings", + missingBoundaries: ["clientRequest"], + satisfiedBoundaries: ["serverHandler"], + primaryNextAction: { + action: "curl /settings", + targetBoundary: "clientRequest", + reason: "test", + }, + blockedReasons: [], + }; + + const env = buildVerificationEnv(directive); + expect(env.VERCEL_PLUGIN_VERIFICATION_STORY_ID).toBe("abc123"); + expect(env.VERCEL_PLUGIN_VERIFICATION_ROUTE).toBe("/settings"); + expect(env.VERCEL_PLUGIN_VERIFICATION_BOUNDARY).toBe("clientRequest"); + expect(env.VERCEL_PLUGIN_VERIFICATION_ACTION).toBe("curl /settings"); + }); + + test("buildVerificationEnv returns empty when no next action", () => { + const directive = { + version: 1 as const, + storyId: "abc", + storyKind: "flow-verification", + route: null, + missingBoundaries: [], + satisfiedBoundaries: [], + primaryNextAction: null, + blockedReasons: [], + }; + + expect(buildVerificationEnv(directive)).toEqual({ + VERCEL_PLUGIN_VERIFICATION_STORY_ID: "", + VERCEL_PLUGIN_VERIFICATION_ROUTE: "", + VERCEL_PLUGIN_VERIFICATION_BOUNDARY: "", + VERCEL_PLUGIN_VERIFICATION_ACTION: "", + }); + }); +}); + +// --------------------------------------------------------------------------- +// Directive env contract: all four keys present +// --------------------------------------------------------------------------- + +const CONTRACT_KEYS = [ + 
"VERCEL_PLUGIN_VERIFICATION_STORY_ID", + "VERCEL_PLUGIN_VERIFICATION_ROUTE", + "VERCEL_PLUGIN_VERIFICATION_BOUNDARY", + "VERCEL_PLUGIN_VERIFICATION_ACTION", +] as const; + +describe("subagent-start-context: directive env contract stability", () => { + test("active directive emits exactly the four contract keys", () => { + const directive = { + version: 1 as const, + storyId: "s1", + storyKind: "flow-verification", + route: "/api/test", + missingBoundaries: ["clientRequest"], + satisfiedBoundaries: ["serverHandler"], + primaryNextAction: { + action: "curl /api/test", + targetBoundary: "clientRequest", + reason: "verify endpoint", + }, + blockedReasons: [], + }; + + const env = buildVerificationEnv(directive); + const keys = Object.keys(env).sort(); + expect(keys).toEqual([...CONTRACT_KEYS].sort()); + for (const k of CONTRACT_KEYS) { + expect(typeof env[k]).toBe("string"); + } + }); + + test("null directive emits exactly the four contract keys with empty-string clearing values", () => { + const env = buildVerificationEnv(null); + const keys = Object.keys(env).sort(); + expect(keys).toEqual([...CONTRACT_KEYS].sort()); + for (const k of CONTRACT_KEYS) { + expect(env[k]).toBe(""); + } + }); + + test("directive without primaryNextAction emits clearing values for all four keys", () => { + const plan: VerificationPlanResult = { + hasStories: true, + stories: [{ + id: "s1", + kind: "flow-verification", + route: "/test", + promptExcerpt: "test", + createdAt: T0, + updatedAt: T0, + }], + observationCount: 0, + satisfiedBoundaries: [], + missingBoundaries: [], + recentRoutes: [], + primaryNextAction: null, + blockedReasons: [], + }; + + const directive = buildVerificationDirective(plan); + expect(directive).not.toBeNull(); + const env = buildVerificationEnv(directive); + const keys = Object.keys(env).sort(); + expect(keys).toEqual([...CONTRACT_KEYS].sort()); + // All clearing because primaryNextAction is null + for (const k of CONTRACT_KEYS) { + expect(env[k]).toBe(""); + } + 
}); +}); + +// --------------------------------------------------------------------------- +// Cross-hook roundtrip: SubagentStart emits → PostToolUse consumes +// --------------------------------------------------------------------------- + +describe("subagent-start-context: SubagentStart → PostToolUse roundtrip", () => { + test("PostToolUse envString reads what SubagentStart buildVerificationEnv writes — no translation needed", () => { + const directive = { + version: 1 as const, + storyId: "roundtrip-story", + storyKind: "flow-verification", + route: "/settings", + missingBoundaries: ["clientRequest"], + satisfiedBoundaries: ["serverHandler"], + primaryNextAction: { + action: "curl http://localhost:3000/settings", + targetBoundary: "clientRequest", + reason: "test", + }, + blockedReasons: [], + }; + + // SubagentStart produces these env vars + const emittedEnv = buildVerificationEnv(directive); + + // Simulate PostToolUse reading them via envString + const fakeEnv = emittedEnv as unknown as NodeJS.ProcessEnv; + expect(envString(fakeEnv, "VERCEL_PLUGIN_VERIFICATION_STORY_ID")).toBe("roundtrip-story"); + expect(envString(fakeEnv, "VERCEL_PLUGIN_VERIFICATION_ROUTE")).toBe("/settings"); + expect(envString(fakeEnv, "VERCEL_PLUGIN_VERIFICATION_BOUNDARY")).toBe("clientRequest"); + expect(envString(fakeEnv, "VERCEL_PLUGIN_VERIFICATION_ACTION")).toBe("curl http://localhost:3000/settings"); + }); + + test("PostToolUse resolveObservedRoute consumes directive route when inference is null", () => { + const directive = { + version: 1 as const, + storyId: "story-1", + storyKind: "flow-verification", + route: "/dashboard", + missingBoundaries: ["clientRequest"], + satisfiedBoundaries: [], + primaryNextAction: { + action: "curl http://localhost:3000/dashboard", + targetBoundary: "clientRequest", + reason: "test", + }, + blockedReasons: [], + }; + + const emittedEnv = buildVerificationEnv(directive); + const fakeEnv = emittedEnv as unknown as NodeJS.ProcessEnv; + + // Inference 
null → falls back to directive route + expect(resolveObservedRoute(null, fakeEnv)).toBe("/dashboard"); + // Inference present → prefers inference + expect(resolveObservedRoute("/api/data", fakeEnv)).toBe("/api/data"); + }); + + test("clearing env (null directive) produces null from PostToolUse envString", () => { + const emittedEnv = buildVerificationEnv(null); + const fakeEnv = emittedEnv as unknown as NodeJS.ProcessEnv; + + // All keys exist but are empty → envString returns null + for (const k of CONTRACT_KEYS) { + expect(envString(fakeEnv, k)).toBeNull(); + } + // resolveObservedRoute also returns null + expect(resolveObservedRoute(null, fakeEnv)).toBeNull(); + }); + + test("emitted env is JSON-serializable for agent inspection", () => { + const directive = { + version: 1 as const, + storyId: "json-test", + storyKind: "flow-verification", + route: "/api", + missingBoundaries: [], + satisfiedBoundaries: [], + primaryNextAction: { + action: "curl /api", + targetBoundary: "clientRequest", + reason: "test", + }, + blockedReasons: [], + }; + + const emittedEnv = buildVerificationEnv(directive); + const serialized = JSON.stringify(emittedEnv); + const deserialized = JSON.parse(serialized); + expect(deserialized).toEqual(emittedEnv); + }); +}); diff --git a/tests/user-prompt-companion-rulebook.test.ts b/tests/user-prompt-companion-rulebook.test.ts new file mode 100644 index 0000000..9a8c18e --- /dev/null +++ b/tests/user-prompt-companion-rulebook.test.ts @@ -0,0 +1,244 @@ +import { describe, test, expect, beforeEach, afterEach } from "bun:test"; +import { writeFileSync, mkdirSync, rmSync } from "node:fs"; +import { randomUUID } from "node:crypto"; +import { + companionRulebookPath, + saveCompanionRulebook, + type LearnedCompanionRule, + type LearnedCompanionRulebook, +} from "../hooks/src/learned-companion-rulebook.mts"; +import { + recallVerifiedCompanions, + type CompanionRecallResult, +} from "../hooks/src/companion-recall.mts"; + +// 
--------------------------------------------------------------------------- +// Fixtures +// --------------------------------------------------------------------------- + +const T0 = "2026-03-28T08:00:00.000Z"; +const PROJECT = `/tmp/test-companion-prompt-${randomUUID()}`; +const SCENARIO = "UserPromptSubmit|flow-verification|uiRender|Prompt|*"; + +function makeRule( + overrides: Partial<LearnedCompanionRule> = {}, +): LearnedCompanionRule { + return { + id: `${SCENARIO}::ai-sdk->ai-elements`, + scenario: SCENARIO, + hook: "UserPromptSubmit", + storyKind: "flow-verification", + targetBoundary: "uiRender", + toolName: "Prompt", + routeScope: "*", + candidateSkill: "ai-sdk", + companionSkill: "ai-elements", + support: 6, + winsWithCompanion: 5, + winsWithoutCompanion: 2, + directiveWinsWithCompanion: 2, + staleMissesWithCompanion: 0, + precisionWithCompanion: 0.8333, + baselinePrecisionWithoutCompanion: 0.5, + liftVsCandidateAlone: 1.6667, + staleMissDelta: 0, + confidence: "promote", + promotedAt: T0, + reason: "companion beats candidate-alone within same verified scenario", + sourceExposureGroupIds: ["g-1", "g-2", "g-3", "g-4", "g-5", "g-6"], + ...overrides, + }; +} + +function makeRulebook( + rules: LearnedCompanionRule[] = [makeRule()], +): LearnedCompanionRulebook { + return { + version: 1, + generatedAt: T0, + projectRoot: PROJECT, + rules, + replay: { baselineWins: 0, learnedWins: 0, deltaWins: 0, regressions: [] }, + promotion: { + accepted: true, + errorCode: null, + reason: `${rules.filter((r) => r.confidence === "promote").length} promoted companion rules`, + }, + }; +} + +// --------------------------------------------------------------------------- +// Lifecycle +// --------------------------------------------------------------------------- + +beforeEach(() => { + mkdirSync(PROJECT, { recursive: true }); +}); + +afterEach(() => { + const path = companionRulebookPath(PROJECT); + try { rmSync(path); } catch {} + try { rmSync(PROJECT, { recursive: true }); } catch {} +}); + +// 
--------------------------------------------------------------------------- +// Tests +// --------------------------------------------------------------------------- + +describe("UserPromptSubmit companion recall", () => { + test("recalls promoted companion for UserPromptSubmit hook", () => { + saveCompanionRulebook(PROJECT, makeRulebook()); + + const result = recallVerifiedCompanions({ + projectRoot: PROJECT, + scenario: { + hook: "UserPromptSubmit", + storyKind: "flow-verification", + targetBoundary: "uiRender", + toolName: "Prompt", + routeScope: null, + }, + candidateSkills: ["ai-sdk"], + excludeSkills: new Set(), + maxCompanions: 1, + }); + + expect(result.selected).toHaveLength(1); + expect(result.selected[0].candidateSkill).toBe("ai-sdk"); + expect(result.selected[0].companionSkill).toBe("ai-elements"); + expect(result.selected[0].confidence).toBeCloseTo(1.6667, 3); + }); + + test("rejects when companion is in excludeSkills", () => { + saveCompanionRulebook(PROJECT, makeRulebook()); + + const result = recallVerifiedCompanions({ + projectRoot: PROJECT, + scenario: { + hook: "UserPromptSubmit", + storyKind: "flow-verification", + targetBoundary: "uiRender", + toolName: "Prompt", + routeScope: null, + }, + candidateSkills: ["ai-sdk"], + excludeSkills: new Set(["ai-elements"]), + maxCompanions: 1, + }); + + expect(result.selected).toHaveLength(0); + expect(result.rejected).toHaveLength(1); + expect(result.rejected[0].rejectedReason).toBe("excluded"); + }); + + test("no-ops when no promoted rules exist", () => { + const rulebook = makeRulebook([ + makeRule({ confidence: "holdout-fail", promotedAt: null }), + ]); + saveCompanionRulebook(PROJECT, rulebook); + + const result = recallVerifiedCompanions({ + projectRoot: PROJECT, + scenario: { + hook: "UserPromptSubmit", + storyKind: "flow-verification", + targetBoundary: "uiRender", + toolName: "Prompt", + routeScope: null, + }, + candidateSkills: ["ai-sdk"], + excludeSkills: new Set(), + maxCompanions: 1, + }); + + 
expect(result.selected).toHaveLength(0); + }); + + test("no-ops when rulebook is missing", () => { + const result = recallVerifiedCompanions({ + projectRoot: PROJECT, + scenario: { + hook: "UserPromptSubmit", + storyKind: "flow-verification", + targetBoundary: "uiRender", + toolName: "Prompt", + routeScope: null, + }, + candidateSkills: ["ai-sdk"], + excludeSkills: new Set(), + maxCompanions: 1, + }); + + expect(result.selected).toHaveLength(0); + }); + + test("symmetric behavior: trigger and reasonCode match PreToolUse contract", () => { + saveCompanionRulebook(PROJECT, makeRulebook()); + + const result = recallVerifiedCompanions({ + projectRoot: PROJECT, + scenario: { + hook: "UserPromptSubmit", + storyKind: "flow-verification", + targetBoundary: "uiRender", + toolName: "Prompt", + routeScope: null, + }, + candidateSkills: ["ai-sdk"], + excludeSkills: new Set(), + maxCompanions: 1, + }); + + // The caller is responsible for setting trigger/reasonCode, but the + // recall module returns the reason from the rule for traceability + expect(result.selected[0].reason).toBe( + "companion beats candidate-alone within same verified scenario", + ); + expect(result.selected[0].scenario).toBe(SCENARIO); + }); + + test("does not duplicate companion already in candidateSkills", () => { + saveCompanionRulebook(PROJECT, makeRulebook()); + + // Companion is already a candidate — should be in excludeSkills + const result = recallVerifiedCompanions({ + projectRoot: PROJECT, + scenario: { + hook: "UserPromptSubmit", + storyKind: "flow-verification", + targetBoundary: "uiRender", + toolName: "Prompt", + routeScope: null, + }, + candidateSkills: ["ai-sdk"], + excludeSkills: new Set(["ai-sdk", "ai-elements"]), + maxCompanions: 1, + }); + + expect(result.selected).toHaveLength(0); + expect(result.rejected).toHaveLength(1); + }); + + test("checks multiple scenario candidates in fallback order", () => { + // Rule is for wildcard scenario — should match via fallback + 
saveCompanionRulebook(PROJECT, makeRulebook()); + + const result = recallVerifiedCompanions({ + projectRoot: PROJECT, + scenario: { + hook: "UserPromptSubmit", + storyKind: "flow-verification", + targetBoundary: "uiRender", + toolName: "Prompt", + routeScope: "/specific-route", + }, + candidateSkills: ["ai-sdk"], + excludeSkills: new Set(), + maxCompanions: 1, + }); + + // Wildcard rule matches via fallback + expect(result.selected).toHaveLength(1); + expect(result.checkedScenarios.length).toBeGreaterThanOrEqual(2); + }); +}); diff --git a/tests/user-prompt-routing-policy-integration.test.ts b/tests/user-prompt-routing-policy-integration.test.ts new file mode 100644 index 0000000..61d7605 --- /dev/null +++ b/tests/user-prompt-routing-policy-integration.test.ts @@ -0,0 +1,737 @@ +import { describe, test, expect, beforeEach, afterEach } from "bun:test"; +import { writeFileSync, unlinkSync, readFileSync, mkdirSync, rmSync } from "node:fs"; +import { join, resolve } from "node:path"; +import { + createEmptyRoutingPolicy, + applyPolicyBoosts, + applyRulebookBoosts, + type RoutingPolicyFile, + type RoutingPolicyScenario, +} from "../hooks/src/routing-policy.mts"; +import { + saveRulebook, + rulebookPath, + createRule, + createEmptyRulebook, + type LearnedRoutingRulebook, +} from "../hooks/src/learned-routing-rulebook.mts"; +import { + projectPolicyPath, + sessionExposurePath, + loadProjectRoutingPolicy, + saveProjectRoutingPolicy, + appendSkillExposure, + loadSessionExposures, + type SkillExposure, +} from "../hooks/src/routing-policy-ledger.mts"; +import { + statePath as verificationStatePath, +} from "../hooks/src/verification-ledger.mts"; +import { + readRoutingDecisionTrace, + traceDir, +} from "../hooks/src/routing-decision-trace.mts"; + +// --------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + +const TEST_PROJECT = 
"/tmp/test-user-prompt-routing-policy-" + Date.now(); +const TEST_SESSION = "test-session-uprp-" + Date.now(); + +const T0 = "2026-03-27T04:00:00.000Z"; +const T1 = "2026-03-27T04:01:00.000Z"; + +function cleanupPolicyFile(): void { + try { unlinkSync(projectPolicyPath(TEST_PROJECT)); } catch {} +} + +function cleanupExposureFile(): void { + try { unlinkSync(sessionExposurePath(TEST_SESSION)); } catch {} +} + +function cleanupRulebookFile(): void { + try { unlinkSync(rulebookPath(TEST_PROJECT)); } catch {} +} + +/** Write a minimal mock verification plan state for the session. */ +function writeMockPlanState(sessionId: string, story?: { + id?: string; + kind?: string; + route?: string | null; + targetBoundary?: string | null; +}): void { + const sp = verificationStatePath(sessionId); + mkdirSync(join(sp, ".."), { recursive: true }); + const s = { + id: story?.id ?? "test-prompt-story", + kind: story?.kind ?? "deployment", + route: story?.route ?? "/api/test", + promptExcerpt: "test prompt", + createdAt: T0, + updatedAt: T1, + requestedSkills: [], + }; + const tb = story?.targetBoundary ?? null; + writeFileSync(sp, JSON.stringify({ + version: 1, + stories: [s], + observationIds: [], + satisfiedBoundaries: [], + missingBoundaries: [], + recentRoutes: [], + primaryNextAction: tb + ? 
{ action: "verify boundary", targetBoundary: tb, reason: "test" } + : null, + blockedReasons: [], + })); +} + +function cleanupMockPlanState(sessionId: string): void { + const sp = verificationStatePath(sessionId); + try { rmSync(join(sp, ".."), { recursive: true, force: true }); } catch {} +} + +function buildPromptPolicy( + skill: string, + exposures: number, + wins: number, + directiveWins: number, + staleMisses: number, +): RoutingPolicyFile { + const policy = createEmptyRoutingPolicy(); + const scenario = "UserPromptSubmit|none|none|Prompt"; + policy.scenarios[scenario] = { + [skill]: { + exposures, + wins, + directiveWins, + staleMisses, + lastUpdatedAt: T0, + }, + }; + return policy; +} + +// --------------------------------------------------------------------------- +// Setup / teardown +// --------------------------------------------------------------------------- + +beforeEach(() => { + cleanupPolicyFile(); + cleanupExposureFile(); + cleanupRulebookFile(); +}); + +afterEach(() => { + cleanupPolicyFile(); + cleanupExposureFile(); + cleanupRulebookFile(); + cleanupMockPlanState(TEST_SESSION); +}); + +// --------------------------------------------------------------------------- +// Tests +// --------------------------------------------------------------------------- + +describe("user-prompt-submit routing-policy integration", () => { + describe("applyPolicyBoosts with UserPromptSubmit scenario", () => { + const PROMPT_SCENARIO: RoutingPolicyScenario = { + hook: "UserPromptSubmit", + storyKind: null, + targetBoundary: null, + toolName: "Prompt", + }; + + test("applies boost to prompt-matched skills with sufficient history", () => { + const policy = buildPromptPolicy("next-config", 5, 4, 2, 0); + saveProjectRoutingPolicy(TEST_PROJECT, policy); + const loaded = loadProjectRoutingPolicy(TEST_PROJECT); + + const entries = [ + { skill: "next-config", priority: 8, effectivePriority: 8 }, + { skill: "deployment", priority: 10, effectivePriority: 10 }, + ]; + + 
const boosted = applyPolicyBoosts(entries, loaded, PROMPT_SCENARIO); + + // next-config: (4 + 2*0.25)/5 = 0.9 => +8 boost => 16 + expect(boosted.find((b) => b.skill === "next-config")!.policyBoost).toBe(8); + expect(boosted.find((b) => b.skill === "next-config")!.effectivePriority).toBe(16); + // deployment: no data => 0 boost + expect(boosted.find((b) => b.skill === "deployment")!.policyBoost).toBe(0); + }); + + test("re-orders selected skills by boosted effective priority", () => { + const policy = buildPromptPolicy("low-base-skill", 5, 4, 2, 0); + saveProjectRoutingPolicy(TEST_PROJECT, policy); + const loaded = loadProjectRoutingPolicy(TEST_PROJECT); + + const entries = [ + { skill: "high-base-skill", priority: 12, effectivePriority: 12 }, + { skill: "low-base-skill", priority: 6, effectivePriority: 6 }, + ]; + + const boosted = applyPolicyBoosts(entries, loaded, PROMPT_SCENARIO); + + // Sort as the injector would: by effectivePriority desc, then skill name asc + boosted.sort((a, b) => + b.effectivePriority - a.effectivePriority || a.skill.localeCompare(b.skill), + ); + + // low-base-skill: 6 + 8 = 14 > high-base-skill: 12 + expect(boosted[0].skill).toBe("low-base-skill"); + expect(boosted[1].skill).toBe("high-base-skill"); + }); + + test("no boost when policy file missing", () => { + const loaded = loadProjectRoutingPolicy(TEST_PROJECT); + const entries = [ + { skill: "next-config", priority: 8, effectivePriority: 8 }, + ]; + + const boosted = applyPolicyBoosts(entries, loaded, PROMPT_SCENARIO); + expect(boosted[0].policyBoost).toBe(0); + expect(boosted[0].effectivePriority).toBe(8); + }); + + test("negative boost for skill with many exposures but low wins", () => { + const policy = buildPromptPolicy("bad-skill", 8, 0, 0, 7); + saveProjectRoutingPolicy(TEST_PROJECT, policy); + const loaded = loadProjectRoutingPolicy(TEST_PROJECT); + + const entries = [ + { skill: "bad-skill", priority: 7, effectivePriority: 7 }, + ]; + + const boosted = 
applyPolicyBoosts(entries, loaded, PROMPT_SCENARIO); + expect(boosted[0].policyBoost).toBe(-2); + expect(boosted[0].effectivePriority).toBe(5); + }); + }); + + describe("exposure recording for UserPromptSubmit", () => { + test("appends pending exposure with hook=UserPromptSubmit and toolName=Prompt", () => { + const exposure: SkillExposure = { + id: `${TEST_SESSION}:prompt:next-config:1`, + sessionId: TEST_SESSION, + projectRoot: TEST_PROJECT, + storyId: null, + storyKind: null, + route: null, + hook: "UserPromptSubmit", + toolName: "Prompt", + skill: "next-config", + targetBoundary: null, + exposureGroupId: null, + attributionRole: "candidate", + candidateSkill: null, + createdAt: T0, + resolvedAt: null, + outcome: "pending", + }; + + appendSkillExposure(exposure); + + const exposures = loadSessionExposures(TEST_SESSION); + expect(exposures.length).toBe(1); + expect(exposures[0].hook).toBe("UserPromptSubmit"); + expect(exposures[0].toolName).toBe("Prompt"); + expect(exposures[0].skill).toBe("next-config"); + expect(exposures[0].outcome).toBe("pending"); + }); + + test("records exposures only for injected skills not candidates", () => { + // Simulate: 3 matched, but only 2 injected (cap of MAX_SKILLS=2) + const injected = ["skill-a", "skill-b"]; + for (const skill of injected) { + appendSkillExposure({ + id: `${TEST_SESSION}:prompt:${skill}:${Date.now()}`, + sessionId: TEST_SESSION, + projectRoot: TEST_PROJECT, + storyId: null, + storyKind: null, + route: null, + hook: "UserPromptSubmit", + toolName: "Prompt", + skill, + targetBoundary: null, + createdAt: T0, + resolvedAt: null, + outcome: "pending", + }); + } + + const exposures = loadSessionExposures(TEST_SESSION); + expect(exposures.length).toBe(2); + expect(exposures.map((e) => e.skill).sort()).toEqual(["skill-a", "skill-b"]); + // skill-c (matched but not injected) should not have an exposure + }); + + test("policy file is not mutated during boost application", () => { + const policy = 
buildPromptPolicy("next-config", 5, 4, 2, 0); + saveProjectRoutingPolicy(TEST_PROJECT, policy); + + const before = readFileSync(projectPolicyPath(TEST_PROJECT), "utf-8"); + + const loaded = loadProjectRoutingPolicy(TEST_PROJECT); + applyPolicyBoosts( + [{ skill: "next-config", priority: 8, effectivePriority: 8 }], + loaded, + { + hook: "UserPromptSubmit", + storyKind: null, + targetBoundary: null, + toolName: "Prompt", + }, + ); + + const after = readFileSync(projectPolicyPath(TEST_PROJECT), "utf-8"); + expect(after).toBe(before); + }); + }); + + describe("deterministic ordering with policy ties", () => { + test("skills with same boosted priority sort by name ascending", () => { + const entries = [ + { skill: "z-skill", priority: 8, effectivePriority: 8 }, + { skill: "a-skill", priority: 8, effectivePriority: 8 }, + ]; + + const loaded = loadProjectRoutingPolicy(TEST_PROJECT); + const boosted = applyPolicyBoosts(entries, loaded, { + hook: "UserPromptSubmit", + storyKind: null, + targetBoundary: null, + toolName: "Prompt", + }); + + boosted.sort((a, b) => + b.effectivePriority - a.effectivePriority || a.skill.localeCompare(b.skill), + ); + + expect(boosted[0].skill).toBe("a-skill"); + expect(boosted[1].skill).toBe("z-skill"); + }); + }); + + describe("evidence scoping — story gate", () => { + test("exposure recording requires active verification story", () => { + // No mock plan state → exposureStory will be null → no exposure written + // Simulate what the hook does: check for story before writing + const exposurePlan = null; // loadCachedPlanResult returns null + const exposureStory = null; + + // Directly verify: if we attempt to record an exposure without a story, + // the hook code now skips it. We verify by writing exposures only with story. 
+ if (exposureStory) { + appendSkillExposure({ + id: `${TEST_SESSION}:prompt:next-config:1`, + sessionId: TEST_SESSION, + projectRoot: TEST_PROJECT, + storyId: null, + storyKind: null, + route: null, + hook: "UserPromptSubmit", + toolName: "Prompt", + skill: "next-config", + targetBoundary: null, + createdAt: T0, + resolvedAt: null, + outcome: "pending", + }); + } + + const exposures = loadSessionExposures(TEST_SESSION); + expect(exposures).toEqual([]); + }); + + test("exposure recording proceeds with active verification story", () => { + writeMockPlanState(TEST_SESSION); + + // Simulate what the hook does: story found → record exposure with story fields + appendSkillExposure({ + id: `${TEST_SESSION}:prompt:next-config:1`, + sessionId: TEST_SESSION, + projectRoot: TEST_PROJECT, + storyId: "test-prompt-story", + storyKind: "deployment", + route: "/api/test", + hook: "UserPromptSubmit", + toolName: "Prompt", + skill: "next-config", + targetBoundary: null, + createdAt: T0, + resolvedAt: null, + outcome: "pending", + }); + + const exposures = loadSessionExposures(TEST_SESSION); + expect(exposures.length).toBe(1); + expect(exposures[0].storyId).toBe("test-prompt-story"); + expect(exposures[0].storyKind).toBe("deployment"); + }); + + test("no none|none scenario keys created when no story exists", () => { + // No plan state → no story → no exposures + const exposures = loadSessionExposures(TEST_SESSION); + const noneNone = exposures.filter( + (e) => e.storyId === null && e.storyKind === null, + ); + expect(noneNone).toEqual([]); + }); + }); +}); + +// --------------------------------------------------------------------------- +// Routing decision trace integration tests (UserPromptSubmit) +// --------------------------------------------------------------------------- + +const ROOT = resolve(import.meta.dirname, ".."); +const HOOK_SCRIPT = join(ROOT, "hooks", "user-prompt-submit-skill-inject.mjs"); + +/** Run UserPromptSubmit hook as subprocess */ +async function 
runPromptHook( + prompt: string, + env?: Record<string, string>, + sessionId?: string, +): Promise<{ code: number; stdout: string; stderr: string }> { + const sid = sessionId ?? `trace-test-${Date.now()}-${Math.random().toString(36).slice(2)}`; + const payload = JSON.stringify({ + prompt, + session_id: sid, + cwd: ROOT, + hook_event_name: "UserPromptSubmit", + }); + const proc = Bun.spawn(["node", HOOK_SCRIPT], { + stdin: "pipe", + stdout: "pipe", + stderr: "pipe", + env: { + ...process.env, + VERCEL_PLUGIN_SEEN_SKILLS: "", + VERCEL_PLUGIN_LOG_LEVEL: "summary", + ...env, + }, + }); + proc.stdin.write(payload); + proc.stdin.end(); + const code = await proc.exited; + const stdout = await new Response(proc.stdout).text(); + const stderr = await new Response(proc.stderr).text(); + return { code, stdout, stderr }; +} + +describe("user-prompt-submit routing decision trace", () => { + let traceSession: string; + + beforeEach(() => { + traceSession = `trace-ups-${Date.now()}-${Math.random().toString(36).slice(2)}`; + }); + + afterEach(() => { + try { rmSync(traceDir(traceSession), { recursive: true, force: true }); } catch {} + cleanupMockPlanState(traceSession); + }); + + test("emits exactly one trace per prompt injection attempt", async () => { + const { code } = await runPromptHook( + "I want to deploy my Next.js application to Vercel production", + {}, + traceSession, + ); + expect(code).toBe(0); + + const traces = readRoutingDecisionTrace(traceSession); + expect(traces).toHaveLength(1); + expect(traces[0].hook).toBe("UserPromptSubmit"); + expect(traces[0].version).toBe(2); + expect(traces[0].toolName).toBe("Prompt"); + expect(traces[0].sessionId).toBe(traceSession); + expect(traces[0].decisionId).toMatch(/^[0-9a-f]{16}$/); + expect(Array.isArray(traces[0].matchedSkills)).toBe(true); + expect(Array.isArray(traces[0].injectedSkills)).toBe(true); + expect(Array.isArray(traces[0].ranked)).toBe(true); + }); + + test("records no_active_verification_story when no story exists", async () =>
{ + const { code } = await runPromptHook( + "I want to deploy my Next.js application to Vercel production", + {}, + traceSession, + ); + expect(code).toBe(0); + + const traces = readRoutingDecisionTrace(traceSession); + expect(traces).toHaveLength(1); + expect(traces[0].skippedReasons).toContain("no_active_verification_story"); + expect(traces[0].policyScenario).toBeNull(); + }); + + test("records primaryStory and policyScenario when story exists", async () => { + writeMockPlanState(traceSession, { + id: "prompt-trace-story", + kind: "feature-investigation", + route: "/settings", + targetBoundary: "serverHandler", + }); + + const { code } = await runPromptHook( + "I want to deploy my Next.js application to Vercel production", + {}, + traceSession, + ); + expect(code).toBe(0); + + const traces = readRoutingDecisionTrace(traceSession); + expect(traces).toHaveLength(1); + expect(traces[0].primaryStory.id).toBe("prompt-trace-story"); + expect(traces[0].primaryStory.kind).toBe("feature-investigation"); + expect(traces[0].policyScenario).toBe("UserPromptSubmit|feature-investigation|serverHandler|Prompt"); + expect(traces[0].skippedReasons).not.toContain("no_active_verification_story"); + }); + + test("does not emit synthetic none|none policyScenario without story", async () => { + const { code } = await runPromptHook( + "I want to deploy my Next.js application to Vercel production", + {}, + traceSession, + ); + expect(code).toBe(0); + + const traces = readRoutingDecisionTrace(traceSession); + expect(traces).toHaveLength(1); + expect(traces[0].policyScenario).toBeNull(); + }); + + test("emits routing.decision_trace_written summary log", async () => { + const { code, stderr } = await runPromptHook( + "I want to deploy my Next.js application to Vercel production", + { VERCEL_PLUGIN_LOG_LEVEL: "summary" }, + traceSession, + ); + expect(code).toBe(0); + + const logLines = stderr + .split("\n") + .filter((l) => l.trim()) + .map((l) => { try { return JSON.parse(l); } catch { 
return null; } }) + .filter((o): o is Record<string, unknown> => o !== null); + + const traceLog = logLines.find( + (l) => l.event === "routing.decision_trace_written", + ); + expect(traceLog).toBeDefined(); + expect(traceLog!.hook).toBe("UserPromptSubmit"); + expect(traceLog!.decisionId).toMatch(/^[0-9a-f]{16}$/); + }); + + test("ranked entries surface cap/budget drops in both skippedReasons and droppedReason", async () => { + const { code } = await runPromptHook( + "I want to deploy my Next.js application to Vercel production", + { + VERCEL_PLUGIN_PROMPT_INJECTION_BUDGET: "200", // Very low budget + }, + traceSession, + ); + expect(code).toBe(0); + + const traces = readRoutingDecisionTrace(traceSession); + expect(traces).toHaveLength(1); + + for (const reason of traces[0].skippedReasons) { + if (reason.startsWith("cap_exceeded:")) { + const skill = reason.replace("cap_exceeded:", ""); + const ranked = traces[0].ranked.find((r) => r.skill === skill); + if (ranked) { + expect(ranked.droppedReason).toBe("cap_exceeded"); + } + } + if (reason.startsWith("budget_exhausted:")) { + const skill = reason.replace("budget_exhausted:", ""); + const ranked = traces[0].ranked.find((r) => r.skill === skill); + if (ranked) { + expect(ranked.droppedReason).toBe("budget_exhausted"); + } + } + } + }); +}); + +// --------------------------------------------------------------------------- +// Learned-routing-rulebook precedence tests for UserPromptSubmit +// --------------------------------------------------------------------------- + +describe("user-prompt-submit rulebook precedence", () => { + const PROMPT_SCENARIO: RoutingPolicyScenario = { + hook: "UserPromptSubmit", + storyKind: null, + targetBoundary: null, + toolName: "Prompt", + }; + + function makeRulebook(rules: Array<{ + scenario: string; + skill: string; + boost: number; + action?: "promote" | "demote"; + }>): LearnedRoutingRulebook { + const rb = createEmptyRulebook("test-sess", T0); + for (const r of rules) { + rb.rules.push(createRule({
+ scenario: r.scenario, + skill: r.skill, + action: r.action ?? "promote", + boost: r.boost, + confidence: 0.9, + reason: "replay verified: no regressions", + sourceSessionId: "test-sess", + promotedAt: T0, + evidence: { + baselineWins: 4, + baselineDirectiveWins: 2, + learnedWins: 4, + learnedDirectiveWins: 2, + regressionCount: 0, + }, + })); + } + return rb; + } + + test("rulebook boost replaces stats-policy boost for matching skill", () => { + // Stats-policy: +8 for next-config + const policy = buildPromptPolicy("next-config", 5, 4, 2, 0); + saveProjectRoutingPolicy(TEST_PROJECT, policy); + const loaded = loadProjectRoutingPolicy(TEST_PROJECT); + + // First apply stats-policy + const entries = [ + { skill: "next-config", priority: 8, effectivePriority: 8 }, + { skill: "deployment", priority: 10, effectivePriority: 10 }, + ]; + const boosted = applyPolicyBoosts(entries, loaded, PROMPT_SCENARIO); + + // Verify stats-policy gave +8 + const nextConfigStats = boosted.find((b) => b.skill === "next-config")!; + expect(nextConfigStats.policyBoost).toBe(8); + + // Now apply rulebook — rule gives +5 + const rulebook = makeRulebook([{ + scenario: "UserPromptSubmit|none|none|Prompt", + skill: "next-config", + boost: 5, + }]); + + const withRulebook = applyRulebookBoosts( + boosted, + rulebook, + PROMPT_SCENARIO, + "/tmp/test-rulebook.json", + ); + + const nextConfigRule = withRulebook.find((b) => b.skill === "next-config")!; + // Rulebook should replace stats-policy: base=8, ruleBoost=5, policyBoost suppressed + expect(nextConfigRule.matchedRuleId).toBe("UserPromptSubmit|none|none|Prompt|next-config"); + expect(nextConfigRule.ruleBoost).toBe(5); + expect(nextConfigRule.policyBoost).toBe(0); // suppressed + expect(nextConfigRule.effectivePriority).toBe(13); // 8 + 5 (not 8 + 8 + 5) + + // deployment should be unchanged + const deployment = withRulebook.find((b) => b.skill === "deployment")!; + expect(deployment.matchedRuleId).toBeNull(); + 
expect(deployment.ruleBoost).toBe(0); + expect(deployment.policyBoost).toBe(0); // no stats-policy either + }); + + test("route-scoped rulebook rule does not leak to other routes", () => { + const rulebook = makeRulebook([{ + scenario: "UserPromptSubmit|deployment|clientRequest|Prompt", + skill: "next-config", + boost: 10, + }]); + + // Scenario with different storyKind — should NOT match + const differentScenario: RoutingPolicyScenario = { + hook: "UserPromptSubmit", + storyKind: "feature", + targetBoundary: "uiRender", + toolName: "Prompt", + }; + + const entries = [ + { + skill: "next-config", + priority: 8, + effectivePriority: 8, + policyBoost: 0, + policyReason: null, + }, + ]; + + const withRulebook = applyRulebookBoosts( + entries, + rulebook, + differentScenario, + "/tmp/test-rulebook.json", + ); + + expect(withRulebook[0].matchedRuleId).toBeNull(); + expect(withRulebook[0].ruleBoost).toBe(0); + expect(withRulebook[0].effectivePriority).toBe(8); // unchanged + }); + + test("demote action produces negative boost", () => { + const rulebook = makeRulebook([{ + scenario: "UserPromptSubmit|none|none|Prompt", + skill: "next-config", + boost: 3, + action: "demote", + }]); + + const entries = [ + { + skill: "next-config", + priority: 8, + effectivePriority: 8, + policyBoost: 0, + policyReason: null, + }, + ]; + + const withRulebook = applyRulebookBoosts( + entries, + rulebook, + PROMPT_SCENARIO, + "/tmp/test-rulebook.json", + ); + + expect(withRulebook[0].ruleBoost).toBe(-3); + expect(withRulebook[0].effectivePriority).toBe(5); // 8 - 3 + }); + + test("trace ranked entries include rulebook fields with null defaults", () => { + const entries = [ + { + skill: "next-config", + priority: 8, + effectivePriority: 8, + policyBoost: 0, + policyReason: null, + }, + ]; + + const rulebook = createEmptyRulebook("test-sess", T0); + const withRulebook = applyRulebookBoosts( + entries, + rulebook, + PROMPT_SCENARIO, + "/tmp/test-rulebook.json", + ); + + 
expect(withRulebook[0].matchedRuleId).toBeNull(); + expect(withRulebook[0].ruleBoost).toBe(0); + expect(withRulebook[0].ruleReason).toBeNull(); + expect(withRulebook[0].rulebookPath).toBeNull(); + }); +}); diff --git a/tests/user-prompt-submit-skill-inject.test.ts b/tests/user-prompt-submit-skill-inject.test.ts index c91e742..bcaa5a9 100644 --- a/tests/user-prompt-submit-skill-inject.test.ts +++ b/tests/user-prompt-submit-skill-inject.test.ts @@ -1,6 +1,7 @@ import { describe, test, expect, beforeEach } from "bun:test"; -import { existsSync } from "node:fs"; +import { existsSync, readdirSync, readFileSync } from "node:fs"; import { join, resolve } from "node:path"; +import { decisionCapsuleDir } from "../hooks/src/routing-decision-capsule.mts"; const ROOT = resolve(import.meta.dirname, ".."); const HOOK_SCRIPT = join(ROOT, "hooks", "user-prompt-submit-skill-inject.mjs"); @@ -133,3 +134,32 @@ describe("lexical prompt matching (VERCEL_PLUGIN_LEXICAL_PROMPT)", () => { } }); }); + +// --------------------------------------------------------------------------- +// Decision capsule rulebookProvenance integration tests +// --------------------------------------------------------------------------- + +describe("decision capsule rulebookProvenance", () => { + test("persisted capsule has rulebookProvenance: null when no rulebook exists", async () => { + const { code, stdout } = await runHook( + "I want to deploy my app to production", + { VERCEL_PLUGIN_LEXICAL_PROMPT: "1", VERCEL_PLUGIN_SEEN_SKILLS: "" }, + ); + expect(code).toBe(0); + const result = JSON.parse(stdout); + // Hook must have matched and injected at least one skill + if (!result.hookSpecificOutput) return; + const meta = extractSkillInjection(result.hookSpecificOutput); + if (!meta || meta.injectedSkills.length === 0) return; + + // Find the persisted capsule for this session + const capsuleDir = decisionCapsuleDir(testSession); + if (!existsSync(capsuleDir)) return; + const capsuleFiles = 
readdirSync(capsuleDir).filter((f) => f.endsWith(".json")); + expect(capsuleFiles.length).toBeGreaterThanOrEqual(1); + + const capsule = JSON.parse(readFileSync(join(capsuleDir, capsuleFiles[0]), "utf-8")); + expect(capsule).toHaveProperty("rulebookProvenance"); + expect(capsule.rulebookProvenance).toBeNull(); + }); +}); diff --git a/tests/verification-ledger.test.ts b/tests/verification-ledger.test.ts new file mode 100644 index 0000000..54006e4 --- /dev/null +++ b/tests/verification-ledger.test.ts @@ -0,0 +1,980 @@ +import { describe, test, expect, beforeEach, afterEach } from "bun:test"; +import { existsSync, mkdtempSync, readFileSync, rmSync } from "node:fs"; +import { tmpdir } from "node:os"; +import { join } from "node:path"; +import { + type VerificationBoundary, + type VerificationObservation, + type VerificationStory, + type VerificationStoryKind, + type VerificationPlan, + type SerializedPlanStateV1, + appendObservation, + derivePlan, + deriveStoryStates, + selectActiveStoryId, + resolveObservationStoryId, + collectRecentRoutes, + normalizeSerializedPlanState, + serializePlanState, + upsertStory, + storyId, + persistObservation, + persistStories, + persistPlanState, + loadObservations, + loadStories, + loadPlanState, + recordObservation, + recordStory, + ledgerPath, + storiesPath, + statePath, +} from "../hooks/src/verification-ledger.mts"; + +// --------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + +const T0 = "2026-03-26T12:00:00.000Z"; +const T1 = "2026-03-26T12:01:00.000Z"; +const T2 = "2026-03-26T12:02:00.000Z"; +const T3 = "2026-03-26T12:03:00.000Z"; + +function makeObs( + id: string, + boundary: VerificationBoundary | null, + opts?: Partial<VerificationObservation>, +): VerificationObservation { + return { + id, + timestamp: T0, + source: "bash", + boundary, + route: null, + storyId: null, + summary: `obs-${id}`, + ...opts, + }; +} + +function makeStory( +
kind: VerificationStoryKind, + route: string | null = null, +): VerificationStory { + return { + id: storyId(kind, route), + kind, + route, + promptExcerpt: "test prompt", + createdAt: T0, + updatedAt: T0, + requestedSkills: [], + }; +} + +// Use a unique session id per test to avoid collisions +let testSessionId: string; + +beforeEach(() => { + testSessionId = `test-ledger-${Date.now()}-${Math.random().toString(36).slice(2, 8)}`; +}); + +afterEach(() => { + // Clean up temp files + try { + const dir = join(tmpdir(), `vercel-plugin-${testSessionId}-ledger`); + rmSync(dir, { recursive: true, force: true }); + } catch { + // ignore + } +}); + +// --------------------------------------------------------------------------- +// Type definitions +// --------------------------------------------------------------------------- + +describe("verification-ledger types", () => { + test("VerificationBoundary covers all four boundary types", () => { + const boundaries: VerificationBoundary[] = [ + "uiRender", + "clientRequest", + "serverHandler", + "environment", + ]; + expect(boundaries).toHaveLength(4); + }); + + test("VerificationStoryKind covers all three kinds", () => { + const kinds: VerificationStoryKind[] = [ + "flow-verification", + "stuck-investigation", + "browser-only", + ]; + expect(kinds).toHaveLength(3); + }); + + test("VerificationObservation has required fields", () => { + const obs = makeObs("obs-1", "clientRequest"); + expect(obs.id).toBe("obs-1"); + expect(obs.boundary).toBe("clientRequest"); + expect(obs.source).toBe("bash"); + expect(obs.timestamp).toBe(T0); + }); + + test("VerificationStory has required fields", () => { + const story = makeStory("flow-verification", "/settings"); + expect(story.kind).toBe("flow-verification"); + expect(story.route).toBe("/settings"); + expect(story.id).toBeTruthy(); + }); +}); + +// --------------------------------------------------------------------------- +// Append dedup +// 
--------------------------------------------------------------------------- + +describe("appendObservation", () => { + test("appends a new observation", () => { + const obs = makeObs("a", "clientRequest"); + const result = appendObservation([], obs); + expect(result).toHaveLength(1); + expect(result[0].id).toBe("a"); + }); + + test("duplicate id is a no-op", () => { + const obs = makeObs("a", "clientRequest"); + const list = [obs]; + const result = appendObservation(list, obs); + expect(result).toBe(list); // same reference — no change + }); + + test("different ids are both appended", () => { + const a = makeObs("a", "clientRequest"); + const b = makeObs("b", "serverHandler"); + let list = appendObservation([], a); + list = appendObservation(list, b); + expect(list).toHaveLength(2); + }); + + test("does not mutate the input array", () => { + const original = [makeObs("a", "clientRequest")]; + const copy = [...original]; + appendObservation(original, makeObs("b", "serverHandler")); + expect(original).toEqual(copy); + }); +}); + +// --------------------------------------------------------------------------- +// Story upsert +// --------------------------------------------------------------------------- + +describe("upsertStory", () => { + test("creates a new story when none exists", () => { + const result = upsertStory([], "flow-verification", "/settings", "test", ["verification"], T0); + expect(result).toHaveLength(1); + expect(result[0].kind).toBe("flow-verification"); + expect(result[0].route).toBe("/settings"); + }); + + test("merges into existing story with same kind+route", () => { + const initial = upsertStory([], "flow-verification", "/settings", "first prompt", ["skill-a"], T0); + const result = upsertStory(initial, "flow-verification", "/settings", "second prompt", ["skill-b"], T1); + expect(result).toHaveLength(1); + expect(result[0].requestedSkills).toEqual(["skill-a", "skill-b"]); + expect(result[0].updatedAt).toBe(T1); + 
expect(result[0].promptExcerpt).toBe("second prompt"); + }); + + test("different kind creates a separate story", () => { + let stories = upsertStory([], "flow-verification", "/settings", "a", [], T0); + stories = upsertStory(stories, "stuck-investigation", "/settings", "b", [], T0); + expect(stories).toHaveLength(2); + }); + + test("different route creates a separate story", () => { + let stories = upsertStory([], "flow-verification", "/settings", "a", [], T0); + stories = upsertStory(stories, "flow-verification", "/dashboard", "b", [], T0); + expect(stories).toHaveLength(2); + }); + + test("does not mutate the input array", () => { + const original = upsertStory([], "flow-verification", "/", "x", [], T0); + const copy = [...original]; + upsertStory(original, "flow-verification", "/", "y", [], T1); + expect(original).toEqual(copy); + }); +}); + +// --------------------------------------------------------------------------- +// storyId determinism +// --------------------------------------------------------------------------- + +describe("storyId", () => { + test("same kind+route produces same id", () => { + expect(storyId("flow-verification", "/settings")).toBe( + storyId("flow-verification", "/settings"), + ); + }); + + test("different kind produces different id", () => { + expect(storyId("flow-verification", "/settings")).not.toBe( + storyId("stuck-investigation", "/settings"), + ); + }); + + test("null route uses wildcard", () => { + expect(storyId("flow-verification", null)).toBe( + storyId("flow-verification", null), + ); + }); +}); + +// --------------------------------------------------------------------------- +// derivePlan +// --------------------------------------------------------------------------- + +describe("derivePlan", () => { + test("empty inputs produce empty plan", () => { + const plan = derivePlan([], []); + expect(plan.observations).toHaveLength(0); + expect(plan.stories).toHaveLength(0); + expect(plan.missingBoundaries).toHaveLength(0); + 
expect(plan.primaryNextAction).toBeNull(); + }); + + test("deduplicates observations by id", () => { + const obs = makeObs("a", "clientRequest"); + const plan = derivePlan([obs, obs, obs], [makeStory("flow-verification")]); + expect(plan.observations).toHaveLength(1); + expect(plan.observationIds.size).toBe(1); + }); + + test("tracks satisfied boundaries", () => { + const obs = [ + makeObs("a", "clientRequest"), + makeObs("b", "serverHandler"), + ]; + const plan = derivePlan(obs, [makeStory("flow-verification")]); + expect(plan.satisfiedBoundaries.has("clientRequest")).toBe(true); + expect(plan.satisfiedBoundaries.has("serverHandler")).toBe(true); + expect(plan.satisfiedBoundaries.has("uiRender")).toBe(false); + }); + + test("computes missing boundaries when story exists", () => { + const obs = [makeObs("a", "clientRequest")]; + const plan = derivePlan(obs, [makeStory("flow-verification")]); + expect(plan.missingBoundaries).toContain("serverHandler"); + expect(plan.missingBoundaries).toContain("uiRender"); + expect(plan.missingBoundaries).toContain("environment"); + expect(plan.missingBoundaries).not.toContain("clientRequest"); + }); + + test("no missing boundaries without a story", () => { + const obs = [makeObs("a", "clientRequest")]; + const plan = derivePlan(obs, []); + expect(plan.missingBoundaries).toHaveLength(0); + }); + + test("all boundaries satisfied yields no next action", () => { + const obs = [ + makeObs("a", "clientRequest"), + makeObs("b", "serverHandler"), + makeObs("c", "uiRender"), + makeObs("d", "environment"), + ]; + const plan = derivePlan(obs, [makeStory("flow-verification")]); + expect(plan.missingBoundaries).toHaveLength(0); + expect(plan.primaryNextAction).toBeNull(); + }); + + test("emits next action for first missing boundary", () => { + const plan = derivePlan([], [makeStory("flow-verification")]); + expect(plan.primaryNextAction).not.toBeNull(); + expect(plan.primaryNextAction!.targetBoundary).toBe("clientRequest"); + }); + + 
test("collects recent routes from observations", () => { + const obs = [ + makeObs("a", "clientRequest", { route: "/settings" }), + makeObs("b", "serverHandler", { route: "/dashboard" }), + ]; + const plan = derivePlan(obs, [makeStory("flow-verification")]); + expect(plan.recentRoutes).toContain("/settings"); + expect(plan.recentRoutes).toContain("/dashboard"); + }); + + test("suppresses uiRender action when agent-browser unavailable", () => { + const obs = [ + makeObs("a", "clientRequest"), + makeObs("b", "serverHandler"), + makeObs("c", "environment"), + ]; + const plan = derivePlan(obs, [makeStory("flow-verification")], { + agentBrowserAvailable: false, + }); + expect(plan.primaryNextAction).toBeNull(); + expect(plan.blockedReasons.length).toBeGreaterThan(0); + expect(plan.blockedReasons[0]).toContain("agent-browser unavailable"); + }); + + test("suppresses uiRender action when dev-server loop guard hit", () => { + const obs = [ + makeObs("a", "clientRequest"), + makeObs("b", "serverHandler"), + makeObs("c", "environment"), + ]; + const plan = derivePlan(obs, [makeStory("flow-verification")], { + devServerLoopGuardHit: true, + }); + expect(plan.primaryNextAction).toBeNull(); + expect(plan.blockedReasons.some((r) => r.includes("loop guard"))).toBe(true); + }); + + test("suppresses repeat of last attempted action", () => { + const plan = derivePlan([], [makeStory("flow-verification")], { + lastAttemptedAction: "curl http://localhost:3000/", + }); + // clientRequest was the top priority, but it matches lastAttemptedAction + // so it should move to the next boundary + expect( + plan.primaryNextAction === null || + plan.primaryNextAction.targetBoundary !== "clientRequest", + ).toBe(true); + expect(plan.blockedReasons.some((r) => r.includes("Suppressed repeat"))).toBe(true); + }); +}); + +// --------------------------------------------------------------------------- +// Deterministic serialization +// 
--------------------------------------------------------------------------- + +describe("serializePlanState", () => { + test("same plan produces identical JSON", () => { + const obs = [ + makeObs("b", "serverHandler", { route: "/dashboard" }), + makeObs("a", "clientRequest", { route: "/settings" }), + ]; + const stories = [makeStory("flow-verification", "/settings")]; + + const plan1 = derivePlan(obs, stories); + const plan2 = derivePlan(obs, stories); + + const json1 = serializePlanState(plan1); + const json2 = serializePlanState(plan2); + expect(json1).toBe(json2); + }); + + test("serialized state is valid JSON with version field", () => { + const plan = derivePlan([], []); + const json = serializePlanState(plan); + const parsed = JSON.parse(json); + expect(parsed.version).toBe(2); + expect(Array.isArray(parsed.observationIds)).toBe(true); + expect(Array.isArray(parsed.satisfiedBoundaries)).toBe(true); + }); + + test("observation ids are sorted in serialized output", () => { + const obs = [ + makeObs("z", "clientRequest"), + makeObs("a", "serverHandler"), + makeObs("m", "environment"), + ]; + const plan = derivePlan(obs, [makeStory("flow-verification")]); + const parsed = JSON.parse(serializePlanState(plan)); + expect(parsed.observationIds).toEqual(["a", "m", "z"]); + }); +}); + +// --------------------------------------------------------------------------- +// Replay determinism +// --------------------------------------------------------------------------- + +describe("replay determinism", () => { + test("replaying same ordered trace produces byte-for-byte equivalent state", () => { + const trace: VerificationObservation[] = [ + makeObs("obs-1", "clientRequest", { route: "/settings", timestamp: T0 }), + makeObs("obs-2", "serverHandler", { route: "/settings", timestamp: T1 }), + makeObs("obs-3", "environment", { timestamp: T2 }), + ]; + const stories = [makeStory("flow-verification", "/settings")]; + + // Replay 1 + const plan1 = derivePlan(trace, stories); + 
const state1 = serializePlanState(plan1); + + // Replay 2 (same trace) + const plan2 = derivePlan(trace, stories); + const state2 = serializePlanState(plan2); + + expect(state1).toBe(state2); + }); + + test("replaying trace with duplicates produces same state as without", () => { + const obs1 = makeObs("obs-1", "clientRequest", { timestamp: T0 }); + const obs2 = makeObs("obs-2", "serverHandler", { timestamp: T1 }); + const stories = [makeStory("flow-verification")]; + + const planClean = derivePlan([obs1, obs2], stories); + const planDuped = derivePlan([obs1, obs2, obs1, obs2, obs1], stories); + + expect(serializePlanState(planClean)).toBe(serializePlanState(planDuped)); + }); +}); + +// --------------------------------------------------------------------------- +// JSONL persistence +// --------------------------------------------------------------------------- + +describe("JSONL persistence", () => { + test("persistObservation writes JSONL line", () => { + const obs = makeObs("persist-1", "clientRequest"); + persistObservation(testSessionId, obs); + + const content = readFileSync(ledgerPath(testSessionId), "utf-8"); + const lines = content.trim().split("\n"); + expect(lines).toHaveLength(1); + expect(JSON.parse(lines[0]).id).toBe("persist-1"); + }); + + test("multiple observations append as separate lines", () => { + persistObservation(testSessionId, makeObs("p-1", "clientRequest")); + persistObservation(testSessionId, makeObs("p-2", "serverHandler")); + persistObservation(testSessionId, makeObs("p-3", "environment")); + + const content = readFileSync(ledgerPath(testSessionId), "utf-8"); + const lines = content.trim().split("\n"); + expect(lines).toHaveLength(3); + }); + + test("loadObservations reads back persisted observations", () => { + persistObservation(testSessionId, makeObs("load-1", "clientRequest")); + persistObservation(testSessionId, makeObs("load-2", "serverHandler")); + + const loaded = loadObservations(testSessionId); + 
expect(loaded).toHaveLength(2); + expect(loaded[0].id).toBe("load-1"); + expect(loaded[1].id).toBe("load-2"); + }); + + test("loadObservations returns empty for nonexistent session", () => { + const loaded = loadObservations("nonexistent-session-xyz"); + expect(loaded).toHaveLength(0); + }); +}); + +// --------------------------------------------------------------------------- +// Stories persistence +// --------------------------------------------------------------------------- + +describe("stories persistence", () => { + test("persistStories and loadStories round-trip", () => { + const stories = [makeStory("flow-verification", "/settings")]; + persistStories(testSessionId, stories); + + const loaded = loadStories(testSessionId); + expect(loaded).toHaveLength(1); + expect(loaded[0].kind).toBe("flow-verification"); + expect(loaded[0].route).toBe("/settings"); + }); + + test("loadStories returns empty for nonexistent session", () => { + const loaded = loadStories("nonexistent-session-xyz"); + expect(loaded).toHaveLength(0); + }); +}); + +// --------------------------------------------------------------------------- +// Plan state persistence +// --------------------------------------------------------------------------- + +describe("plan state persistence", () => { + test("persistPlanState and loadPlanState round-trip", () => { + const plan = derivePlan( + [makeObs("s-1", "clientRequest")], + [makeStory("flow-verification")], + ); + persistPlanState(testSessionId, plan); + + const loaded = loadPlanState(testSessionId); + expect(loaded).not.toBeNull(); + expect(loaded!.version).toBe(2); + expect(loaded!.observationIds).toContain("s-1"); + expect(loaded!.satisfiedBoundaries).toContain("clientRequest"); + }); + + test("loadPlanState returns null for nonexistent session", () => { + const loaded = loadPlanState("nonexistent-session-xyz"); + expect(loaded).toBeNull(); + }); +}); + +// --------------------------------------------------------------------------- +// 
recordObservation full cycle +// --------------------------------------------------------------------------- + +describe("recordObservation", () => { + test("full cycle: append → derive → persist", () => { + // Create a story first + recordStory( + testSessionId, + "flow-verification", + "/settings", + "settings page loads but save fails", + ["verification"], + ); + + // Record observations + const plan1 = recordObservation(testSessionId, makeObs("r-1", "clientRequest", { + route: "/settings", + summary: "curl http://localhost:3000/settings", + })); + expect(plan1.observations).toHaveLength(1); + expect(plan1.satisfiedBoundaries.has("clientRequest")).toBe(true); + + const plan2 = recordObservation(testSessionId, makeObs("r-2", "serverHandler", { + route: "/settings", + summary: "vercel logs", + })); + expect(plan2.observations).toHaveLength(2); + expect(plan2.satisfiedBoundaries.has("serverHandler")).toBe(true); + + // Verify persisted state matches derived state + const persistedState = loadPlanState(testSessionId); + expect(persistedState).not.toBeNull(); + expect(persistedState!.observationIds).toContain("r-1"); + expect(persistedState!.observationIds).toContain("r-2"); + }); + + test("idempotent on duplicate observation id", () => { + recordStory(testSessionId, "flow-verification", null, "test", []); + + const obs = makeObs("dup-1", "clientRequest"); + recordObservation(testSessionId, obs); + const plan = recordObservation(testSessionId, obs); + + // Derive deduplicates — only one observation with this id + expect(plan.observations.filter((o) => o.id === "dup-1")).toHaveLength(1); + }); + + test("duplicate observation id does not append a second ledger line", () => { + recordStory(testSessionId, "flow-verification", "/settings", "test", []); + + const obs = makeObs("dup-ledger-1", "clientRequest", { route: "/settings" }); + recordObservation(testSessionId, obs); + recordObservation(testSessionId, obs); + + const content = readFileSync(ledgerPath(testSessionId), 
"utf-8"); + expect(content.trim().split("\n")).toHaveLength(1); + }); +}); + +// --------------------------------------------------------------------------- +// recordStory full cycle +// --------------------------------------------------------------------------- + +describe("recordStory", () => { + test("creates story and derives plan", () => { + const plan = recordStory( + testSessionId, + "flow-verification", + "/settings", + "settings page loads but save fails", + ["verification"], + ); + expect(plan.stories).toHaveLength(1); + expect(plan.stories[0].kind).toBe("flow-verification"); + expect(plan.missingBoundaries).toHaveLength(4); // all boundaries missing + expect(plan.primaryNextAction).not.toBeNull(); + }); + + test("merges repeated story creation", () => { + recordStory(testSessionId, "flow-verification", "/settings", "first", ["skill-a"]); + const plan = recordStory(testSessionId, "flow-verification", "/settings", "second", ["skill-b"]); + + expect(plan.stories).toHaveLength(1); + expect(plan.stories[0].requestedSkills).toContain("skill-a"); + expect(plan.stories[0].requestedSkills).toContain("skill-b"); + }); +}); + +// --------------------------------------------------------------------------- +// Bounded reads +// --------------------------------------------------------------------------- + +describe("bounded reads", () => { + test("recent state reads only from session-specific ledger files", () => { + // Create data in one session + recordStory(testSessionId, "flow-verification", null, "test", []); + recordObservation(testSessionId, makeObs("bounded-1", "clientRequest")); + + // A different session should see nothing + const otherSession = `other-${testSessionId}`; + const otherObs = loadObservations(otherSession); + const otherStories = loadStories(otherSession); + expect(otherObs).toHaveLength(0); + expect(otherStories).toHaveLength(0); + + // Clean up other session dir (may not exist) + try { + rmSync(join(tmpdir(), 
`vercel-plugin-${otherSession}-ledger`), { recursive: true, force: true }); + } catch { /* ignore */ } + }); +}); + +// --------------------------------------------------------------------------- +// Story-scoped state derivation +// --------------------------------------------------------------------------- + +describe("resolveObservationStoryId", () => { + test("returns explicit storyId when set", () => { + const stories = [makeStory("flow-verification", "/settings")]; + const obs = makeObs("a", "clientRequest", { + storyId: "explicit-id", + route: "/settings", + }); + expect(resolveObservationStoryId(obs, stories)).toBe("explicit-id"); + }); + + test("resolves by exact route match when storyId is null", () => { + const stories = [makeStory("flow-verification", "/settings")]; + const obs = makeObs("a", "clientRequest", { route: "/settings" }); + expect(resolveObservationStoryId(obs, stories)).toBe(stories[0].id); + }); + + test("returns null when multiple stories share the same route", () => { + const stories = [ + makeStory("flow-verification", "/settings"), + { ...makeStory("stuck-investigation", "/other"), route: "/settings", id: "other-id" }, + ]; + const obs = makeObs("a", "clientRequest", { route: "/settings" }); + // Two stories with /settings route — ambiguous + expect(resolveObservationStoryId(obs, stories)).toBeNull(); + }); + + test("returns null when no route and no storyId with multiple stories", () => { + const stories = [ + makeStory("flow-verification", "/settings"), + makeStory("flow-verification", "/dashboard"), + ]; + const obs = makeObs("a", "clientRequest"); + expect(resolveObservationStoryId(obs, stories)).toBeNull(); + }); + + test("falls back to single story when no route and no storyId", () => { + const stories = [makeStory("flow-verification", "/settings")]; + const obs = makeObs("a", "clientRequest"); + expect(resolveObservationStoryId(obs, stories)).toBe(stories[0].id); + }); +}); + +describe("collectRecentRoutes", () => { + 
test("returns routes in most-recent-first order", () => { + const obs = [ + makeObs("a", "clientRequest", { route: "/settings", timestamp: T0 }), + makeObs("b", "serverHandler", { route: "/dashboard", timestamp: T2 }), + makeObs("c", "environment", { route: "/settings", timestamp: T3 }), + ]; + const routes = collectRecentRoutes(obs); + expect(routes).toEqual(["/settings", "/dashboard"]); + }); + + test("skips observations without routes", () => { + const obs = [ + makeObs("a", "environment", { timestamp: T0 }), + makeObs("b", "clientRequest", { route: "/api", timestamp: T1 }), + ]; + expect(collectRecentRoutes(obs)).toEqual(["/api"]); + }); +}); + +describe("deriveStoryStates", () => { + test("initializes empty state for stories with no observations", () => { + const stories = [makeStory("flow-verification", "/settings")]; + const states = deriveStoryStates([], stories); + const state = states[stories[0].id]; + expect(state).toBeDefined(); + expect(state!.satisfiedBoundaries).toEqual([]); + expect(state!.missingBoundaries).toHaveLength(4); + expect(state!.lastObservedAt).toBeNull(); + }); + + test("groups observations into correct stories by route", () => { + const settingsStory = makeStory("flow-verification", "/settings"); + const dashStory = makeStory("flow-verification", "/dashboard"); + const stories = [settingsStory, dashStory]; + + const obs = [ + makeObs("a", "clientRequest", { route: "/settings", timestamp: T0 }), + makeObs("b", "serverHandler", { route: "/dashboard", timestamp: T1 }), + ]; + + const states = deriveStoryStates(obs, stories); + expect(states[settingsStory.id]!.satisfiedBoundaries).toEqual(["clientRequest"]); + expect(states[dashStory.id]!.satisfiedBoundaries).toEqual(["serverHandler"]); + }); + + test("uses explicit storyId over route inference", () => { + const settingsStory = makeStory("flow-verification", "/settings"); + const dashStory = makeStory("flow-verification", "/dashboard"); + const stories = [settingsStory, dashStory]; + + 
const obs = [ + makeObs("a", "clientRequest", { + route: "/settings", + storyId: dashStory.id, // explicitly assigned to dashboard story + timestamp: T0, + }), + ]; + + const states = deriveStoryStates(obs, stories); + expect(states[settingsStory.id]!.satisfiedBoundaries).toEqual([]); + expect(states[dashStory.id]!.satisfiedBoundaries).toEqual(["clientRequest"]); + }); + + test("computes per-story next action", () => { + const settingsStory = makeStory("flow-verification", "/settings"); + const dashStory = makeStory("flow-verification", "/dashboard"); + const stories = [settingsStory, dashStory]; + + const obs = [ + makeObs("a", "clientRequest", { route: "/settings", timestamp: T0 }), + ]; + + const states = deriveStoryStates(obs, stories); + // Settings: clientRequest satisfied → next should be serverHandler + expect(states[settingsStory.id]!.primaryNextAction?.targetBoundary).toBe("serverHandler"); + // Dashboard: nothing satisfied → next should be clientRequest + expect(states[dashStory.id]!.primaryNextAction?.targetBoundary).toBe("clientRequest"); + }); +}); + +describe("selectActiveStoryId", () => { + test("selects story with most missing boundaries", () => { + const settingsStory = makeStory("flow-verification", "/settings"); + const dashStory = makeStory("flow-verification", "/dashboard"); + const stories = [settingsStory, dashStory]; + + const obs = [ + makeObs("a", "clientRequest", { route: "/settings", timestamp: T0 }), + makeObs("b", "serverHandler", { route: "/settings", timestamp: T1 }), + ]; + + const states = deriveStoryStates(obs, stories); + // Dashboard has 4 missing, settings has 2 → dashboard selected + expect(selectActiveStoryId(stories, states)).toBe(dashStory.id); + }); + + test("returns null for empty stories", () => { + expect(selectActiveStoryId([], {})).toBeNull(); + }); +}); + +// --------------------------------------------------------------------------- +// Story-scoped derivePlan (active-story projection) +// 
--------------------------------------------------------------------------- + +describe("derivePlan story scoping", () => { + test("top-level fields reflect active story, not session-global evidence", () => { + const settingsStory = makeStory("flow-verification", "/settings"); + const dashStory = makeStory("flow-verification", "/dashboard"); + const stories = [settingsStory, dashStory]; + + // Only settings has observations + const obs = [ + makeObs("a", "clientRequest", { route: "/settings", timestamp: T0 }), + makeObs("b", "serverHandler", { route: "/settings", timestamp: T1 }), + ]; + + const plan = derivePlan(obs, stories); + + // Dashboard is active (more missing boundaries) + expect(plan.activeStoryId).toBe(dashStory.id); + // Top-level shows dashboard's state — no satisfied boundaries + expect(plan.satisfiedBoundaries.size).toBe(0); + expect(plan.missingBoundaries).toHaveLength(4); + expect(plan.primaryNextAction?.targetBoundary).toBe("clientRequest"); + expect(plan.primaryNextAction?.action).toContain("/dashboard"); + }); + + test("storyStates are available for all stories", () => { + const settingsStory = makeStory("flow-verification", "/settings"); + const dashStory = makeStory("flow-verification", "/dashboard"); + const stories = [settingsStory, dashStory]; + + const obs = [ + makeObs("a", "clientRequest", { route: "/settings", timestamp: T0 }), + ]; + + const plan = derivePlan(obs, stories); + expect(Object.keys(plan.storyStates)).toHaveLength(2); + expect(plan.storyStates[settingsStory.id]!.satisfiedBoundaries).toContain("clientRequest"); + expect(plan.storyStates[dashStory.id]!.satisfiedBoundaries).toEqual([]); + }); + + test("primaryNextAction is scoped to the active story, not session-global evidence", () => { + // This is the contamination bug test from the acceptance criteria + recordStory(testSessionId, "flow-verification", "/settings", "settings broken", ["verification"]); + recordObservation(testSessionId, makeObs("obs-settings-client", 
"clientRequest", { + route: "/settings", + storyId: storyId("flow-verification", "/settings"), + timestamp: T0, + })); + + recordStory(testSessionId, "flow-verification", "/dashboard", "dashboard broken", ["verification"]); + + const plan = derivePlan( + loadObservations(testSessionId), + loadStories(testSessionId), + ); + + // Dashboard should be active (more missing boundaries) + expect(plan.activeStoryId).toBe(storyId("flow-verification", "/dashboard")); + // Next action should target dashboard, not be influenced by settings evidence + expect(plan.primaryNextAction?.targetBoundary).toBe("clientRequest"); + expect(plan.primaryNextAction?.action).toContain("/dashboard"); + }); +}); + +// --------------------------------------------------------------------------- +// V1 → V2 state normalization +// --------------------------------------------------------------------------- + +describe("normalizeSerializedPlanState", () => { + test("passes through V2 state unchanged", () => { + const v2 = JSON.parse(serializePlanState(derivePlan([], []))); + const normalized = normalizeSerializedPlanState(v2); + expect(normalized.version).toBe(2); + expect(normalized).toEqual(v2); + }); + + test("upgrades V1 state to V2 without data loss", () => { + const v1: SerializedPlanStateV1 = { + version: 1, + stories: [makeStory("flow-verification", "/settings")], + observationIds: ["obs-1", "obs-2"], + satisfiedBoundaries: ["clientRequest"], + missingBoundaries: ["serverHandler", "uiRender", "environment"], + recentRoutes: ["/settings"], + primaryNextAction: { + action: "tail server logs /settings", + targetBoundary: "serverHandler", + reason: "No server-side observation yet — check logs for errors", + }, + blockedReasons: [], + }; + + const normalized = normalizeSerializedPlanState(v1); + expect(normalized.version).toBe(2); + expect(normalized.activeStoryId).toBe(v1.stories[0].id); + expect(normalized.storyStates).toHaveLength(1); + + // Top-level fields preserved + 
expect(normalized.observationIds).toEqual(v1.observationIds); + expect(normalized.satisfiedBoundaries).toEqual(v1.satisfiedBoundaries); + expect(normalized.missingBoundaries).toEqual(v1.missingBoundaries); + expect(normalized.primaryNextAction).toEqual(v1.primaryNextAction); + + // Active story gets the old top-level data + const activeState = normalized.storyStates[0]; + expect(activeState.storyId).toBe(v1.stories[0].id); + expect(activeState.observationIds).toEqual(v1.observationIds); + expect(activeState.satisfiedBoundaries).toEqual(v1.satisfiedBoundaries); + }); + + test("V1 with multiple stories: non-active get empty state", () => { + const story1 = makeStory("flow-verification", "/settings"); + const story2 = { ...makeStory("flow-verification", "/dashboard"), updatedAt: T1 }; + + const v1: SerializedPlanStateV1 = { + version: 1, + stories: [story1, story2], + observationIds: ["obs-1"], + satisfiedBoundaries: ["clientRequest"], + missingBoundaries: ["serverHandler", "uiRender", "environment"], + recentRoutes: ["/settings"], + primaryNextAction: null, + blockedReasons: [], + }; + + const normalized = normalizeSerializedPlanState(v1); + // story2 is more recent → should be active + expect(normalized.activeStoryId).toBe(story2.id); + expect(normalized.storyStates).toHaveLength(2); + + const activeState = normalized.storyStates.find((s) => s.storyId === story2.id); + const inactiveState = normalized.storyStates.find((s) => s.storyId === story1.id); + expect(activeState!.observationIds).toEqual(["obs-1"]); + expect(inactiveState!.observationIds).toEqual([]); + }); + + test("V1 with no stories normalizes cleanly", () => { + const v1: SerializedPlanStateV1 = { + version: 1, + stories: [], + observationIds: [], + satisfiedBoundaries: [], + missingBoundaries: [], + recentRoutes: [], + primaryNextAction: null, + blockedReasons: [], + }; + + const normalized = normalizeSerializedPlanState(v1); + expect(normalized.version).toBe(2); + 
expect(normalized.activeStoryId).toBeNull(); + expect(normalized.storyStates).toHaveLength(0); + }); +}); + +// --------------------------------------------------------------------------- +// Serialization round-trip (V2) +// --------------------------------------------------------------------------- + +describe("serializePlanState V2", () => { + test("round-trip: derive → serialize → parse preserves story states", () => { + const stories = [ + makeStory("flow-verification", "/settings"), + makeStory("flow-verification", "/dashboard"), + ]; + const obs = [ + makeObs("a", "clientRequest", { route: "/settings", timestamp: T0 }), + ]; + + const plan = derivePlan(obs, stories); + const json = serializePlanState(plan); + const parsed = JSON.parse(json); + + expect(parsed.version).toBe(2); + expect(parsed.activeStoryId).toBe(plan.activeStoryId); + expect(parsed.storyStates).toHaveLength(2); + expect(parsed.storyStates.find((s: any) => s.storyId === stories[0].id) + .satisfiedBoundaries).toContain("clientRequest"); + }); + + test("top-level fields equal active story projection after round-trip", () => { + const stories = [ + makeStory("flow-verification", "/settings"), + makeStory("flow-verification", "/dashboard"), + ]; + const obs = [ + makeObs("a", "clientRequest", { route: "/settings", timestamp: T0 }), + ]; + + const plan = derivePlan(obs, stories); + const json = serializePlanState(plan); + const parsed = JSON.parse(json); + + const activeState = parsed.storyStates.find( + (s: any) => s.storyId === parsed.activeStoryId, + ); + + expect(parsed.satisfiedBoundaries).toEqual( + [...activeState.satisfiedBoundaries].sort(), + ); + expect(parsed.missingBoundaries).toEqual( + [...activeState.missingBoundaries].sort(), + ); + expect(parsed.primaryNextAction).toEqual(activeState.primaryNextAction); + }); +}); diff --git a/tests/verification-observe-env-fallback.test.ts b/tests/verification-observe-env-fallback.test.ts new file mode 100644 index 0000000..5829408 --- /dev/null 
+++ b/tests/verification-observe-env-fallback.test.ts @@ -0,0 +1,68 @@ +import { describe, test, expect } from "bun:test"; +import { + buildLedgerObservation, + resolveObservedRoute, + type VerificationBoundaryEvent, +} from "../hooks/src/posttooluse-verification-observe.mts"; + +const EVENT: VerificationBoundaryEvent = { + event: "verification.boundary_observed", + boundary: "clientRequest", + verificationId: "verify-1", + command: "curl http://localhost:3000/settings", + matchedPattern: "http-client", + inferredRoute: "/settings", + timestamp: "2026-03-28T00:00:00.000Z", + suggestedBoundary: "clientRequest", + suggestedAction: "curl http://localhost:3000/settings", + matchedSuggestedAction: true, +}; + +describe("verification observer env fallback", () => { + test("buildLedgerObservation copies storyId from directive env", () => { + const obs = buildLedgerObservation(EVENT, { + VERCEL_PLUGIN_VERIFICATION_STORY_ID: "story-123", + } as NodeJS.ProcessEnv); + expect(obs.storyId).toBe("story-123"); + expect(obs.route).toBe("/settings"); + expect(obs.boundary).toBe("clientRequest"); + }); + + test("buildLedgerObservation returns null storyId when env is empty", () => { + const obs = buildLedgerObservation(EVENT, {} as NodeJS.ProcessEnv); + expect(obs.storyId).toBeNull(); + }); + + test("buildLedgerObservation trims whitespace-only storyId to null", () => { + const obs = buildLedgerObservation(EVENT, { + VERCEL_PLUGIN_VERIFICATION_STORY_ID: " ", + } as NodeJS.ProcessEnv); + expect(obs.storyId).toBeNull(); + }); + + test("resolveObservedRoute falls back to directive route env", () => { + const route = resolveObservedRoute(null, { + VERCEL_PLUGIN_VERIFICATION_ROUTE: "/settings", + } as NodeJS.ProcessEnv); + expect(route).toBe("/settings"); + }); + + test("resolveObservedRoute prefers inferred route", () => { + const route = resolveObservedRoute("/dashboard", { + VERCEL_PLUGIN_VERIFICATION_ROUTE: "/settings", + } as NodeJS.ProcessEnv); + expect(route).toBe("/dashboard"); + 
}); + + test("resolveObservedRoute returns null when both sources are absent", () => { + const route = resolveObservedRoute(null, {} as NodeJS.ProcessEnv); + expect(route).toBeNull(); + }); + + test("resolveObservedRoute trims directive route", () => { + const route = resolveObservedRoute(null, { + VERCEL_PLUGIN_VERIFICATION_ROUTE: " /api/users ", + } as NodeJS.ProcessEnv); + expect(route).toBe("/api/users"); + }); +}); diff --git a/tests/verification-observe-integration.test.ts b/tests/verification-observe-integration.test.ts new file mode 100644 index 0000000..3253a78 --- /dev/null +++ b/tests/verification-observe-integration.test.ts @@ -0,0 +1,895 @@ +import { describe, test, expect, beforeEach, afterEach } from "bun:test"; +import { rmSync } from "node:fs"; +import { tmpdir } from "node:os"; +import { join, resolve } from "node:path"; +import { + buildBoundaryEvent, + classifyBoundary, + envString, + inferRoute, + parseInput, + redactCommand, + resolveObservedRoute, + run, +} from "../hooks/src/posttooluse-verification-observe.mts"; +import { + loadObservations, + loadStories, + loadPlanState, + recordObservation, + recordStory, +} from "../hooks/src/verification-ledger.mts"; +import type { VerificationObservation } from "../hooks/src/verification-ledger.mts"; +import { + appendSkillExposure, + loadProjectRoutingPolicy, + loadSessionExposures, + type SkillExposure, +} from "../hooks/src/routing-policy-ledger.mts"; +import { scenarioKey } from "../hooks/src/routing-policy.mts"; + +// --------------------------------------------------------------------------- +// Stderr capture helper for structured log assertions +// --------------------------------------------------------------------------- + +interface CapturedLogLine { + event: string; + [key: string]: unknown; +} + +function captureStderr(fn: () => void): CapturedLogLine[] { + const lines: CapturedLogLine[] = []; + const origWrite = process.stderr.write; + process.stderr.write = function (chunk: string | 
Uint8Array, ...args: unknown[]): boolean { + const text = typeof chunk === "string" ? chunk : Buffer.from(chunk).toString(); + for (const line of text.split("\n").filter(Boolean)) { + try { + lines.push(JSON.parse(line)); + } catch { /* ignore non-JSON */ } + } + return true; + } as typeof process.stderr.write; + try { + fn(); + } finally { + process.stderr.write = origWrite; + } + return lines; +} + +// --------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + +const ROOT = resolve(import.meta.dirname, ".."); +const T0 = "2026-03-26T12:00:00.000Z"; + +let testSessionId: string; + +beforeEach(() => { + testSessionId = `test-observe-${Date.now()}-${Math.random().toString(36).slice(2, 8)}`; +}); + +afterEach(() => { + try { + rmSync(join(tmpdir(), `vercel-plugin-${testSessionId}-ledger`), { recursive: true, force: true }); + } catch { /* ignore */ } +}); + +function makeStdinPayload(command: string, sessionId?: string): string { + return JSON.stringify({ + tool_name: "Bash", + tool_input: { command }, + session_id: sessionId ?? 
testSessionId, + cwd: ROOT, + }); +} + +function makeObs( + id: string, + boundary: "uiRender" | "clientRequest" | "serverHandler" | "environment", + opts?: Partial<VerificationObservation>, +): VerificationObservation { + return { + id, + timestamp: T0, + source: "bash", + boundary, + route: null, + storyId: null, + summary: `obs-${id}`, + ...opts, + }; +} + +// --------------------------------------------------------------------------- +// classifyBoundary +// --------------------------------------------------------------------------- + +describe("classifyBoundary for verification observations", () => { + test("pnpm dev records unknown (launch, not a boundary observation itself)", () => { + // pnpm dev is a dev server launch; it doesn't directly match a boundary + // unless it includes browser/curl/log/env patterns + const result = classifyBoundary("pnpm dev"); + // dev server launch does not match any specific boundary pattern + expect(result.boundary).toBe("unknown"); + }); + + test("curl http://localhost:3000/settings records clientRequest", () => { + const result = classifyBoundary("curl http://localhost:3000/settings"); + expect(result.boundary).toBe("clientRequest"); + expect(result.matchedPattern).toBe("http-client"); + }); + + test("wget http://localhost:3000/api/users records clientRequest", () => { + const result = classifyBoundary("wget http://localhost:3000/api/users"); + expect(result.boundary).toBe("clientRequest"); + }); + + test("vercel logs records serverHandler", () => { + const result = classifyBoundary("vercel logs"); + expect(result.boundary).toBe("serverHandler"); + expect(result.matchedPattern).toBe("vercel-logs"); + }); + + test("tail -f server.log records serverHandler", () => { + const result = classifyBoundary("tail -f server.log"); + expect(result.boundary).toBe("serverHandler"); + }); + + test("printenv records environment", () => { + const result = classifyBoundary("printenv"); + expect(result.boundary).toBe("environment"); + }); + + test("vercel env pull
records environment", () => { + const result = classifyBoundary("vercel env pull"); + expect(result.boundary).toBe("environment"); + }); + + test("cat .env.local records environment", () => { + const result = classifyBoundary("cat .env.local"); + expect(result.boundary).toBe("environment"); + }); + + test("open https://localhost:3000/ records uiRender", () => { + const result = classifyBoundary("open https://localhost:3000/"); + expect(result.boundary).toBe("uiRender"); + }); + + test("npx playwright test records uiRender", () => { + const result = classifyBoundary("npx playwright test"); + expect(result.boundary).toBe("uiRender"); + }); +}); + +// --------------------------------------------------------------------------- +// inferRoute +// --------------------------------------------------------------------------- + +describe("inferRoute", () => { + test("recent edits win over URL-derived routes", () => { + const route = inferRoute( + "curl http://localhost:3000/api/data", + "app/settings/page.tsx", + ); + expect(route).toBe("/settings"); + }); + + test("URL route is fallback when no recent edits", () => { + const route = inferRoute("curl http://localhost:3000/settings"); + expect(route).toBe("/settings"); + }); + + test("preserves explicit null when neither source is reliable", () => { + const route = inferRoute("echo hello"); + expect(route).toBeNull(); + }); + + test("strips Next.js file suffixes from edit paths", () => { + const route = inferRoute("ls", "app/dashboard/page.tsx"); + expect(route).toBe("/dashboard"); + }); + + test("converts dynamic segments to param notation", () => { + const route = inferRoute("ls", "app/users/[id]/page.tsx"); + expect(route).toBe("/users/:id"); + }); +}); + +// --------------------------------------------------------------------------- +// redactCommand +// --------------------------------------------------------------------------- + +describe("redactCommand", () => { + test("redacts --token flag values", () => { + const 
result = redactCommand("vercel --token skt_abc123xyz"); + expect(result).toContain("[REDACTED]"); + expect(result).not.toContain("skt_abc123xyz"); + }); + + test("redacts --password flag values", () => { + const result = redactCommand("mysql --password mysecretpass"); + expect(result).toContain("[REDACTED]"); + }); + + test("truncates long commands to 200 chars plus suffix", () => { + const longCmd = "echo " + "x".repeat(300); + const result = redactCommand(longCmd); + // redactCommand slices to 200 then appends "…[truncated]" suffix + expect(result.length).toBeLessThanOrEqual(200 + "…[truncated]".length); + expect(result).toContain("[truncated]"); + }); + + test("preserves safe commands unchanged", () => { + const cmd = "curl http://localhost:3000/settings"; + expect(redactCommand(cmd)).toBe(cmd); + }); +}); + +// --------------------------------------------------------------------------- +// parseInput +// --------------------------------------------------------------------------- + +describe("parseInput", () => { + test("parses valid Bash tool input", () => { + const result = parseInput(makeStdinPayload("curl http://localhost:3000")); + expect(result).not.toBeNull(); + expect(result!.toolName).toBe("Bash"); + expect(result!.toolInput.command).toBe("curl http://localhost:3000"); + expect(result!.sessionId).toBe(testSessionId); + }); + + test("accepts Read tool (multi-tool support)", () => { + const payload = JSON.stringify({ + tool_name: "Read", + tool_input: { file_path: "/foo" }, + }); + const result = parseInput(payload); + expect(result).not.toBeNull(); + expect(result!.toolName).toBe("Read"); + }); + + test("rejects unsupported tool names", () => { + const payload = JSON.stringify({ + tool_name: "Agent", + tool_input: {}, + }); + expect(parseInput(payload)).toBeNull(); + }); + + test("returns null for empty input", () => { + expect(parseInput("")).toBeNull(); + }); +}); + +// --------------------------------------------------------------------------- +// 
Ledger integration: observations persist through full cycle +// --------------------------------------------------------------------------- + +describe("observation ledger integration", () => { + test("pnpm dev trace does not record observation (unknown boundary)", () => { + // pnpm dev → unknown → not recorded + const { boundary } = classifyBoundary("pnpm dev"); + expect(boundary).toBe("unknown"); + // Only record if boundary is not unknown + const before = loadObservations(testSessionId); + expect(before).toHaveLength(0); + }); + + test("curl http://localhost:3000/settings records clientRequest with route /settings", () => { + recordStory(testSessionId, "flow-verification", "/settings", "test prompt", []); + const obs = makeObs("curl-test", "clientRequest", { + route: "/settings", + summary: "curl http://localhost:3000/settings", + }); + const plan = recordObservation(testSessionId, obs); + expect(plan.satisfiedBoundaries.has("clientRequest")).toBe(true); + expect(plan.recentRoutes).toContain("/settings"); + }); + + test("vercel logs records serverHandler", () => { + recordStory(testSessionId, "flow-verification", null, "test", []); + const obs = makeObs("logs-test", "serverHandler", { + summary: "vercel logs", + }); + const plan = recordObservation(testSessionId, obs); + expect(plan.satisfiedBoundaries.has("serverHandler")).toBe(true); + }); + + test("printenv records environment", () => { + recordStory(testSessionId, "flow-verification", null, "test", []); + const obs = makeObs("env-test", "environment", { + summary: "printenv", + }); + const plan = recordObservation(testSessionId, obs); + expect(plan.satisfiedBoundaries.has("environment")).toBe(true); + }); + + test("full bash trace sequence builds up boundaries", () => { + recordStory(testSessionId, "flow-verification", "/settings", "settings page loads but save fails", ["verification"]); + + // Simulate: curl → vercel logs → printenv + recordObservation(testSessionId, makeObs("trace-1", "clientRequest", { + 
route: "/settings", + summary: "curl http://localhost:3000/settings", + })); + recordObservation(testSessionId, makeObs("trace-2", "serverHandler", { + summary: "vercel logs", + })); + const finalPlan = recordObservation(testSessionId, makeObs("trace-3", "environment", { + summary: "printenv", + })); + + expect(finalPlan.observations).toHaveLength(3); + expect(finalPlan.satisfiedBoundaries.has("clientRequest")).toBe(true); + expect(finalPlan.satisfiedBoundaries.has("serverHandler")).toBe(true); + expect(finalPlan.satisfiedBoundaries.has("environment")).toBe(true); + // uiRender still missing + expect(finalPlan.missingBoundaries).toContain("uiRender"); + expect(finalPlan.missingBoundaries).not.toContain("clientRequest"); + }); + + test("observation ids are stable for dedup retries", () => { + recordStory(testSessionId, "flow-verification", null, "test", []); + const obs = makeObs("stable-id", "clientRequest"); + recordObservation(testSessionId, obs); + const plan = recordObservation(testSessionId, obs); // retry + expect(plan.observations.filter((o) => o.id === "stable-id")).toHaveLength(1); + }); +}); + +// --------------------------------------------------------------------------- +// Story creation from prompt +// --------------------------------------------------------------------------- + +describe("verification story from prompt", () => { + test("flow-verification story creation before any bash command", () => { + const plan = recordStory( + testSessionId, + "flow-verification", + "/settings", + "settings page loads but save fails", + ["verification"], + ); + expect(plan.stories).toHaveLength(1); + expect(plan.stories[0].kind).toBe("flow-verification"); + expect(plan.stories[0].route).toBe("/settings"); + expect(plan.missingBoundaries).toHaveLength(4); // all missing initially + }); + + test("stuck-investigation story creation", () => { + const plan = recordStory( + testSessionId, + "stuck-investigation", + null, + "the page is stuck loading", + 
["investigation-mode"], + ); + expect(plan.stories).toHaveLength(1); + expect(plan.stories[0].kind).toBe("stuck-investigation"); + }); + + test("browser-only story creation", () => { + const plan = recordStory( + testSessionId, + "browser-only", + "/dashboard", + "blank page on dashboard", + ["agent-browser-verify", "investigation-mode"], + ); + expect(plan.stories).toHaveLength(1); + expect(plan.stories[0].kind).toBe("browser-only"); + }); + + test("repeated similar troubleshooting prompts merge into one active story", () => { + recordStory(testSessionId, "flow-verification", "/settings", "settings page loads but save fails", ["verification"]); + const plan = recordStory(testSessionId, "flow-verification", "/settings", "the settings page still fails on save", ["workflow"]); + + expect(plan.stories).toHaveLength(1); // merged, not duplicated + expect(plan.stories[0].requestedSkills).toContain("verification"); + expect(plan.stories[0].requestedSkills).toContain("workflow"); + expect(plan.stories[0].promptExcerpt).toBe("the settings page still fails on save"); + }); + + test("different routes create separate stories", () => { + recordStory(testSessionId, "flow-verification", "/settings", "settings broken", []); + const plan = recordStory(testSessionId, "flow-verification", "/dashboard", "dashboard broken", []); + expect(plan.stories).toHaveLength(2); + }); +}); + +// --------------------------------------------------------------------------- +// buildBoundaryEvent +// --------------------------------------------------------------------------- + +describe("buildBoundaryEvent", () => { + test("redacts secrets and marks suggested matches", () => { + const event = buildBoundaryEvent({ + command: "curl -H 'Authorization: Bearer sk-secret-value' http://localhost:3000/settings", + boundary: "clientRequest", + matchedPattern: "http-client", + inferredRoute: "/settings", + verificationId: "verification-1", + timestamp: "2026-03-27T00:00:00.000Z", + env: { + 
VERCEL_PLUGIN_VERIFICATION_BOUNDARY: "clientRequest", + VERCEL_PLUGIN_VERIFICATION_ACTION: "curl http://localhost:3000/settings", + } as NodeJS.ProcessEnv, + }); + + expect(event.command).toContain("[REDACTED]"); + expect(event.command).not.toContain("sk-secret-value"); + expect(event.suggestedBoundary).toBe("clientRequest"); + expect(event.matchedSuggestedAction).toBe(true); + }); + + test("matchedSuggestedAction is false when boundaries differ", () => { + const event = buildBoundaryEvent({ + command: "curl http://localhost:3000/api", + boundary: "clientRequest", + matchedPattern: "http-client", + inferredRoute: "/api", + verificationId: "v2", + timestamp: "2026-03-27T00:00:00.000Z", + env: { + VERCEL_PLUGIN_VERIFICATION_BOUNDARY: "serverHandler", + } as NodeJS.ProcessEnv, + }); + + expect(event.matchedSuggestedAction).toBe(false); + }); + + test("handles missing env vars gracefully", () => { + const event = buildBoundaryEvent({ + command: "curl http://localhost:3000/test", + boundary: "clientRequest", + matchedPattern: "http-client", + inferredRoute: "/test", + verificationId: "v3", + timestamp: "2026-03-27T00:00:00.000Z", + env: {} as NodeJS.ProcessEnv, + }); + + expect(event.suggestedBoundary).toBeNull(); + expect(event.suggestedAction).toBeNull(); + expect(event.matchedSuggestedAction).toBe(false); + }); + + test("truncates command to 200 characters", () => { + const longCommand = "curl " + "x".repeat(300); + const event = buildBoundaryEvent({ + command: longCommand, + boundary: "clientRequest", + matchedPattern: "http-client", + inferredRoute: null, + verificationId: "v4", + env: {} as NodeJS.ProcessEnv, + }); + + expect(event.command.length).toBeLessThanOrEqual(200); + }); +}); + +// --------------------------------------------------------------------------- +// envString helper +// --------------------------------------------------------------------------- + +describe("envString", () => { + test("returns trimmed value for non-empty strings", () => { + const 
env = { FOO: " bar " } as unknown as NodeJS.ProcessEnv; + expect(envString(env, "FOO")).toBe("bar"); + }); + + test("returns null for blank strings", () => { + const env = { FOO: " " } as unknown as NodeJS.ProcessEnv; + expect(envString(env, "FOO")).toBeNull(); + }); + + test("returns null for empty string", () => { + const env = { FOO: "" } as unknown as NodeJS.ProcessEnv; + expect(envString(env, "FOO")).toBeNull(); + }); + + test("returns null for missing key", () => { + const env = {} as NodeJS.ProcessEnv; + expect(envString(env, "MISSING")).toBeNull(); + }); + + test("returns null for undefined value", () => { + const env = { FOO: undefined } as unknown as NodeJS.ProcessEnv; + expect(envString(env, "FOO")).toBeNull(); + }); +}); + +// --------------------------------------------------------------------------- +// resolveObservedRoute +// --------------------------------------------------------------------------- + +describe("resolveObservedRoute", () => { + test("returns VERCEL_PLUGIN_VERIFICATION_ROUTE when inferred is null", () => { + const env = { VERCEL_PLUGIN_VERIFICATION_ROUTE: "/settings" } as unknown as NodeJS.ProcessEnv; + expect(resolveObservedRoute(null, env)).toBe("/settings"); + }); + + test("prefers inferred route over env fallback", () => { + const env = { VERCEL_PLUGIN_VERIFICATION_ROUTE: "/fallback" } as unknown as NodeJS.ProcessEnv; + expect(resolveObservedRoute("/real", env)).toBe("/real"); + }); + + test("returns null when both inferred and env are absent", () => { + const env = {} as NodeJS.ProcessEnv; + expect(resolveObservedRoute(null, env)).toBeNull(); + }); +}); + +// --------------------------------------------------------------------------- +// Directive-env fallback closes the verified routing loop E2E +// --------------------------------------------------------------------------- + +describe("directive-env fallback closes the routing policy loop", () => { + const projectRoot = ROOT; + + function makeExposure( + sessionId: string, + 
overrides?: Partial<SkillExposure>, + ): SkillExposure { + return { + id: `exp-${Date.now()}-${Math.random().toString(36).slice(2, 6)}`, + sessionId, + projectRoot, + storyId: "story-1", + storyKind: "flow-verification", + route: "/settings", + hook: "PreToolUse", + toolName: "Bash", + skill: "verification", + targetBoundary: "clientRequest", + exposureGroupId: null, + attributionRole: "candidate", + candidateSkill: null, + createdAt: T0, + resolvedAt: null, + outcome: "pending", + ...overrides, + }; + } + + test("observer resolves pending exposure as directive-win via env fallback", () => { + // --- Checkpoint 1: Seed a pending exposure --- + const exposure = makeExposure(testSessionId); + appendSkillExposure(exposure); + + const seededExposures = loadSessionExposures(testSessionId); + expect(seededExposures).toHaveLength(1); + expect(seededExposures[0].outcome).toBe("pending"); + expect(seededExposures[0].storyId).toBe("story-1"); + expect(seededExposures[0].route).toBe("/settings"); + expect(seededExposures[0].targetBoundary).toBe("clientRequest"); + + // --- Checkpoint 2: No verification story is recorded --- + // We deliberately do NOT create a matching story in the verification ledger. + // Instead, we rely on directive env fallback for story/route resolution.
+ + // --- Checkpoint 3: Set directive env vars (simulating subagent bootstrap handoff) --- + const savedEnv = { + VERCEL_PLUGIN_VERIFICATION_STORY_ID: process.env.VERCEL_PLUGIN_VERIFICATION_STORY_ID, + VERCEL_PLUGIN_VERIFICATION_ROUTE: process.env.VERCEL_PLUGIN_VERIFICATION_ROUTE, + VERCEL_PLUGIN_VERIFICATION_BOUNDARY: process.env.VERCEL_PLUGIN_VERIFICATION_BOUNDARY, + VERCEL_PLUGIN_VERIFICATION_ACTION: process.env.VERCEL_PLUGIN_VERIFICATION_ACTION, + VERCEL_PLUGIN_LOG_LEVEL: process.env.VERCEL_PLUGIN_LOG_LEVEL, + }; + + process.env.VERCEL_PLUGIN_VERIFICATION_STORY_ID = "story-1"; + process.env.VERCEL_PLUGIN_VERIFICATION_ROUTE = "/settings"; + process.env.VERCEL_PLUGIN_VERIFICATION_BOUNDARY = "clientRequest"; + process.env.VERCEL_PLUGIN_VERIFICATION_ACTION = "curl http://localhost:3000/settings"; + process.env.VERCEL_PLUGIN_LOG_LEVEL = "off"; + + try { + // --- Checkpoint 4: Run the PostToolUse observer with a matching Bash payload --- + const stdinPayload = makeStdinPayload( + "curl http://localhost:3000/settings", + testSessionId, + ); + const output = run(stdinPayload); + expect(output).toBe("{}"); + + // --- Checkpoint 5: Assert the exposure resolved as directive-win --- + const resolvedExposures = loadSessionExposures(testSessionId); + const resolved = resolvedExposures.filter((e) => e.outcome !== "pending"); + expect(resolved).toHaveLength(1); + expect(resolved[0].outcome).toBe("directive-win"); + expect(resolved[0].resolvedAt).not.toBeNull(); + expect(resolved[0].skill).toBe("verification"); + + // --- Checkpoint 6: Assert project routing policy incremented wins and directiveWins --- + const policy = loadProjectRoutingPolicy(projectRoot); + const scenario = scenarioKey({ + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "clientRequest", + toolName: "Bash", + }); + const stats = policy.scenarios[scenario]?.["verification"]; + expect(stats).toBeDefined(); + expect(stats!.wins).toBeGreaterThanOrEqual(1); + 
expect(stats!.directiveWins).toBeGreaterThanOrEqual(1); + } finally { + // Restore env + for (const [key, val] of Object.entries(savedEnv)) { + if (val === undefined) delete process.env[key]; + else process.env[key] = val; + } + } + }); + + test("exposure remains unresolved when directive env is absent and no story matches", () => { + const exposure = makeExposure(testSessionId, { + storyId: "story-orphan", + route: "/orphan", + }); + appendSkillExposure(exposure); + + const savedEnv = { + VERCEL_PLUGIN_VERIFICATION_STORY_ID: process.env.VERCEL_PLUGIN_VERIFICATION_STORY_ID, + VERCEL_PLUGIN_VERIFICATION_ROUTE: process.env.VERCEL_PLUGIN_VERIFICATION_ROUTE, + VERCEL_PLUGIN_VERIFICATION_BOUNDARY: process.env.VERCEL_PLUGIN_VERIFICATION_BOUNDARY, + VERCEL_PLUGIN_VERIFICATION_ACTION: process.env.VERCEL_PLUGIN_VERIFICATION_ACTION, + VERCEL_PLUGIN_LOG_LEVEL: process.env.VERCEL_PLUGIN_LOG_LEVEL, + }; + + // Clear all directive env — the observer has no story context + delete process.env.VERCEL_PLUGIN_VERIFICATION_STORY_ID; + delete process.env.VERCEL_PLUGIN_VERIFICATION_ROUTE; + delete process.env.VERCEL_PLUGIN_VERIFICATION_BOUNDARY; + delete process.env.VERCEL_PLUGIN_VERIFICATION_ACTION; + process.env.VERCEL_PLUGIN_LOG_LEVEL = "off"; + + try { + const stdinPayload = makeStdinPayload( + "curl http://localhost:3000/settings", + testSessionId, + ); + run(stdinPayload); + + // The exposure has storyId="story-orphan" and route="/orphan", + // but the observer inferred route="/settings" and storyId=null. + // Strict null matching prevents resolution. 
+ const exposures = loadSessionExposures(testSessionId); + expect(exposures).toHaveLength(1); + expect(exposures[0].outcome).toBe("pending"); + } finally { + for (const [key, val] of Object.entries(savedEnv)) { + if (val === undefined) delete process.env[key]; + else process.env[key] = val; + } + } + }); +}); + +// --------------------------------------------------------------------------- +// Story-scoped observation isolation (E2E) +// --------------------------------------------------------------------------- + +describe("story-scoped observation isolation", () => { + test("observation for /settings under story-settings does not redirect /dashboard active story", async () => { + const { storyId } = await import("../hooks/src/verification-ledger.mts"); + + // Create /settings story, record an observation under it + recordStory(testSessionId, "flow-verification", "/settings", "settings broken", ["verification"]); + recordObservation(testSessionId, makeObs("iso-settings-1", "clientRequest", { + route: "/settings", + storyId: storyId("flow-verification", "/settings"), + timestamp: "2026-03-27T22:00:00.000Z", + summary: "curl http://localhost:3000/settings", + })); + + // Create newer /dashboard story (should become active) + recordStory(testSessionId, "flow-verification", "/dashboard", "dashboard broken", ["verification"]); + + // Load the plan state and verify isolation + const state = loadPlanState(testSessionId); + expect(state).not.toBeNull(); + expect(state!.activeStoryId).toBe(storyId("flow-verification", "/dashboard")); + + // Top-level projection is the active story (/dashboard) which has no observations + expect(state!.satisfiedBoundaries).toHaveLength(0); + expect(state!.missingBoundaries).toHaveLength(4); + + // The /settings story state should contain the observation + const settingsState = state!.storyStates.find((s) => s.route === "/settings"); + expect(settingsState).toBeDefined(); + 
expect(settingsState!.satisfiedBoundaries).toContain("clientRequest"); + + // The /dashboard story state should be clean + const dashboardState = state!.storyStates.find((s) => s.route === "/dashboard"); + expect(dashboardState).toBeDefined(); + expect(dashboardState!.satisfiedBoundaries).toHaveLength(0); + }); + + test("run() with VERCEL_PLUGIN_VERIFICATION_STORY_ID persists storyId on observation", () => { + const savedEnv = { + VERCEL_PLUGIN_VERIFICATION_STORY_ID: process.env.VERCEL_PLUGIN_VERIFICATION_STORY_ID, + VERCEL_PLUGIN_LOG_LEVEL: process.env.VERCEL_PLUGIN_LOG_LEVEL, + }; + + const storyIdValue = "explicit-story-123"; + process.env.VERCEL_PLUGIN_VERIFICATION_STORY_ID = storyIdValue; + process.env.VERCEL_PLUGIN_LOG_LEVEL = "off"; + + try { + recordStory(testSessionId, "flow-verification", "/settings", "test story binding", []); + + const stdinPayload = makeStdinPayload( + "curl http://localhost:3000/settings", + testSessionId, + ); + run(stdinPayload); + + const observations = loadObservations(testSessionId); + const obs = observations.find((o) => o.route === "/settings"); + expect(obs).toBeDefined(); + expect(obs!.storyId).toBe(storyIdValue); + } finally { + for (const [key, val] of Object.entries(savedEnv)) { + if (val === undefined) delete process.env[key]; + else process.env[key] = val; + } + } + }); +}); + +// --------------------------------------------------------------------------- +// Resolution-gate telemetry: structured log assertions +// --------------------------------------------------------------------------- + +describe("verification.routing-policy-resolution-gate telemetry", () => { + const ENV_KEYS = [ + "VERCEL_PLUGIN_VERIFICATION_STORY_ID", + "VERCEL_PLUGIN_VERIFICATION_ROUTE", + "VERCEL_PLUGIN_VERIFICATION_BOUNDARY", + "VERCEL_PLUGIN_VERIFICATION_ACTION", + "VERCEL_PLUGIN_LOG_LEVEL", + "VERCEL_PLUGIN_LOCAL_DEV_ORIGIN", + ] as const; + + let savedEnv: Record<string, string | undefined>; + + beforeEach(() => { + savedEnv = {}; + for (const key of ENV_KEYS) { +
savedEnv[key] = process.env[key]; + } + }); + + afterEach(() => { + for (const [key, val] of Object.entries(savedEnv)) { + if (val === undefined) delete process.env[key]; + else process.env[key] = val; + } + try { + rmSync(join(tmpdir(), `vercel-plugin-${testSessionId}-ledger`), { recursive: true, force: true }); + } catch { /* ignore */ } + }); + + function makeToolPayload( + toolName: string, + toolInput: Record<string, unknown>, + sessionId?: string, + ): string { + return JSON.stringify({ + tool_name: toolName, + tool_input: toolInput, + session_id: sessionId ?? testSessionId, + cwd: ROOT, + }); + } + + test("eligible Bash curl emits resolution-gate with resolutionEligible: true and full payload shape", () => { + process.env.VERCEL_PLUGIN_LOG_LEVEL = "summary"; + process.env.VERCEL_PLUGIN_VERIFICATION_STORY_ID = "story-settings"; + process.env.VERCEL_PLUGIN_VERIFICATION_ROUTE = "/settings"; + process.env.VERCEL_PLUGIN_VERIFICATION_BOUNDARY = "clientRequest"; + + recordStory(testSessionId, "flow-verification", "/settings", "test gate telemetry", []); + + const payload = makeStdinPayload("curl http://localhost:3000/settings", testSessionId); + const logs = captureStderr(() => run(payload)); + + const gate = logs.find((l) => l.event === "verification.routing-policy-resolution-gate"); + expect(gate).toBeDefined(); + + // Required fields per acceptance criteria + expect(gate!.verificationId).toBeString(); + expect((gate!.verificationId as string).length).toBeGreaterThan(0); + expect(gate!.toolName).toBe("Bash"); + expect(gate!.boundary).toBe("clientRequest"); + expect(gate!.inferredRoute).toBe("/settings"); + expect(gate!.resolvedStoryId).toBe("story-settings"); + expect(gate!.resolutionEligible).toBe(true); + + // Eligible path must NOT include a reason field + expect(gate!.reason).toBeUndefined(); + }); + + test("external WebFetch emits resolution-gate with resolutionEligible: false and blockingReasonCodes containing remote_web_fetch", () => { + process.env.VERCEL_PLUGIN_LOG_LEVEL = "summary"; +
delete process.env.VERCEL_PLUGIN_VERIFICATION_STORY_ID; + delete process.env.VERCEL_PLUGIN_VERIFICATION_ROUTE; + delete process.env.VERCEL_PLUGIN_VERIFICATION_BOUNDARY; + + recordStory(testSessionId, "flow-verification", "/settings", "test external webfetch gate", []); + + const payload = makeToolPayload("WebFetch", { url: "https://example.com/settings" }); + const logs = captureStderr(() => run(payload)); + + const gate = logs.find((l) => l.event === "verification.routing-policy-resolution-gate"); + expect(gate).toBeDefined(); + + expect(gate!.verificationId).toBeString(); + expect(gate!.toolName).toBe("WebFetch"); + expect(gate!.boundary).toBe("clientRequest"); + expect(gate!.inferredRoute).toBe("/settings"); + expect(gate!.resolutionEligible).toBe(false); + expect(gate!.blockingReasonCodes).toContain("remote_web_fetch"); + }); + + test("local WebFetch emits resolution-gate with resolutionEligible: true", () => { + process.env.VERCEL_PLUGIN_LOG_LEVEL = "summary"; + process.env.VERCEL_PLUGIN_VERIFICATION_STORY_ID = "story-dashboard"; + + recordStory(testSessionId, "flow-verification", "/dashboard", "test local webfetch gate", []); + + const payload = makeToolPayload("WebFetch", { url: "http://localhost:3000/dashboard" }); + const logs = captureStderr(() => run(payload)); + + const gate = logs.find((l) => l.event === "verification.routing-policy-resolution-gate"); + expect(gate).toBeDefined(); + + expect(gate!.toolName).toBe("WebFetch"); + expect(gate!.boundary).toBe("clientRequest"); + expect(gate!.inferredRoute).toBe("/dashboard"); + expect(gate!.resolvedStoryId).toBe("story-dashboard"); + expect(gate!.resolutionEligible).toBe(true); + expect(gate!.reason).toBeUndefined(); + }); + + test("soft signal (Read .env) emits resolution-gate with resolutionEligible: false and blockingReasonCodes containing soft_signal", () => { + process.env.VERCEL_PLUGIN_LOG_LEVEL = "summary"; + delete process.env.VERCEL_PLUGIN_VERIFICATION_STORY_ID; + + recordStory(testSessionId,
"flow-verification", null, "test soft signal gate", []); + + const payload = makeToolPayload("Read", { file_path: "/repo/.env.local" }); + const logs = captureStderr(() => run(payload)); + + const gate = logs.find((l) => l.event === "verification.routing-policy-resolution-gate"); + expect(gate).toBeDefined(); + + expect(gate!.toolName).toBe("Read"); + expect(gate!.boundary).toBe("environment"); + expect(gate!.resolutionEligible).toBe(false); + expect(gate!.blockingReasonCodes).toContain("soft_signal"); + }); + + test("resolvedStoryId reflects route-matched story, not merely active story", () => { + process.env.VERCEL_PLUGIN_LOG_LEVEL = "summary"; + delete process.env.VERCEL_PLUGIN_VERIFICATION_STORY_ID; + + // Create two stories: /settings first, then /dashboard (the newer story becomes active) + recordStory(testSessionId, "flow-verification", "/settings", "settings flow", []); + recordStory(testSessionId, "flow-verification", "/dashboard", "dashboard flow", []); + + // Bash curl targets /settings, so it should resolve to the /settings story, not the active /dashboard story + const payload = makeStdinPayload("curl http://localhost:3000/settings", testSessionId); + const logs = captureStderr(() => run(payload)); + + const gate = logs.find((l) => l.event === "verification.routing-policy-resolution-gate"); + expect(gate).toBeDefined(); + expect(gate!.inferredRoute).toBe("/settings"); + // resolvedStoryId should match the /settings story, not the active /dashboard one + expect(gate!.resolvedStoryId).toBeString(); + expect(gate!.resolutionEligible).toBe(true); + }); +}); diff --git a/tests/verification-plan.test.ts b/tests/verification-plan.test.ts new file mode 100644 index 0000000..880c8ee --- /dev/null +++ b/tests/verification-plan.test.ts @@ -0,0 +1,703 @@ +import { describe, test, expect, beforeEach, afterEach } from "bun:test"; +import { rmSync } from "node:fs"; +import { tmpdir } from "node:os"; +import { join } from "node:path"; +import { + type VerificationBoundary, + type
VerificationObservation, + type VerificationStoryKind, + derivePlan, + recordObservation, + recordStory, + storyId, +} from "../hooks/src/verification-ledger.mts"; +import { + computePlan, + planToResult, + loadCachedPlanResult, + formatVerificationBanner, + formatPlanHuman, + selectPrimaryStory, + type VerificationPlanResult, + type VerificationPlanStorySummary, +} from "../hooks/src/verification-plan.mts"; + +// --------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + +const T0 = "2026-03-26T12:00:00.000Z"; +const T1 = "2026-03-26T12:01:00.000Z"; + +function makeObs( + id: string, + boundary: VerificationBoundary | null, + opts?: Partial<VerificationObservation>, +): VerificationObservation { + return { + id, + timestamp: T0, + source: "bash", + boundary, + route: null, + storyId: null, + summary: `obs-${id}`, + ...opts, + }; +} + +let testSessionId: string; + +beforeEach(() => { + testSessionId = `test-plan-${Date.now()}-${Math.random().toString(36).slice(2, 8)}`; +}); + +afterEach(() => { + try { + rmSync(join(tmpdir(), `vercel-plugin-${testSessionId}-ledger`), { recursive: true, force: true }); + } catch { /* ignore */ } +}); + +// --------------------------------------------------------------------------- +// computePlan +// --------------------------------------------------------------------------- + +describe("computePlan", () => { + test("returns empty result for new session", () => { + const result = computePlan(testSessionId); + expect(result.hasStories).toBe(false); + expect(result.stories).toHaveLength(0); + expect(result.observationCount).toBe(0); + expect(result.primaryNextAction).toBeNull(); + }); + + test("returns plan with story and observations", () => { + recordStory(testSessionId, "flow-verification", "/settings", "settings page loads but save fails", ["verification"]); + recordObservation(testSessionId, makeObs("obs-1", "clientRequest", { route: "/settings"
})); + recordObservation(testSessionId, makeObs("obs-2", "serverHandler", { route: "/settings" })); + + const result = computePlan(testSessionId); + expect(result.hasStories).toBe(true); + expect(result.stories).toHaveLength(1); + expect(result.stories[0].kind).toBe("flow-verification"); + expect(result.observationCount).toBe(2); + expect(result.satisfiedBoundaries).toContain("clientRequest"); + expect(result.satisfiedBoundaries).toContain("serverHandler"); + expect(result.missingBoundaries).toContain("uiRender"); + expect(result.missingBoundaries).toContain("environment"); + }); + + test("next action is first missing boundary in priority order", () => { + recordStory(testSessionId, "flow-verification", null, "test", []); + const result = computePlan(testSessionId); + expect(result.primaryNextAction).not.toBeNull(); + expect(result.primaryNextAction!.targetBoundary).toBe("clientRequest"); + }); + + test("suppresses browser action when agent-browser unavailable", () => { + recordStory(testSessionId, "flow-verification", null, "test", []); + recordObservation(testSessionId, makeObs("a", "clientRequest")); + recordObservation(testSessionId, makeObs("b", "serverHandler")); + recordObservation(testSessionId, makeObs("c", "environment")); + + const result = computePlan(testSessionId, { agentBrowserAvailable: false }); + expect(result.primaryNextAction).toBeNull(); + expect(result.blockedReasons.some((r) => r.includes("agent-browser"))).toBe(true); + }); + + test("suppresses browser action when loop guard hit", () => { + recordStory(testSessionId, "flow-verification", null, "test", []); + recordObservation(testSessionId, makeObs("a", "clientRequest")); + recordObservation(testSessionId, makeObs("b", "serverHandler")); + recordObservation(testSessionId, makeObs("c", "environment")); + + const result = computePlan(testSessionId, { devServerLoopGuardHit: true }); + expect(result.primaryNextAction).toBeNull(); + expect(result.blockedReasons.some((r) => r.includes("loop 
guard"))).toBe(true); + }); + + test("deterministic for same fixture state", () => { + recordStory(testSessionId, "flow-verification", "/settings", "settings page loads but save fails", ["verification"]); + recordObservation(testSessionId, makeObs("obs-1", "clientRequest", { route: "/settings" })); + + const result1 = computePlan(testSessionId); + const result2 = computePlan(testSessionId); + expect(JSON.stringify(result1)).toBe(JSON.stringify(result2)); + }); +}); + +// --------------------------------------------------------------------------- +// planToResult +// --------------------------------------------------------------------------- + +describe("planToResult", () => { + test("converts plan to serializable result", () => { + const plan = derivePlan( + [makeObs("a", "clientRequest", { route: "/settings" })], + [{ id: storyId("flow-verification", "/settings"), kind: "flow-verification", route: "/settings", promptExcerpt: "test", createdAt: T0, updatedAt: T0, requestedSkills: [] }], + ); + const result = planToResult(plan); + expect(result.hasStories).toBe(true); + expect(result.observationCount).toBe(1); + expect(result.satisfiedBoundaries).toContain("clientRequest"); + expect(Array.isArray(result.missingBoundaries)).toBe(true); + expect(Array.isArray(result.recentRoutes)).toBe(true); + }); + + test("sorts boundaries in result", () => { + const plan = derivePlan( + [ + makeObs("a", "serverHandler"), + makeObs("b", "clientRequest"), + ], + [{ id: storyId("flow-verification", null), kind: "flow-verification", route: null, promptExcerpt: "test", createdAt: T0, updatedAt: T0, requestedSkills: [] }], + ); + const result = planToResult(plan); + // satisfiedBoundaries should be sorted + expect(result.satisfiedBoundaries).toEqual([...result.satisfiedBoundaries].sort()); + expect(result.missingBoundaries).toEqual([...result.missingBoundaries].sort()); + }); +}); + +// --------------------------------------------------------------------------- +// loadCachedPlanResult 
+// --------------------------------------------------------------------------- + +describe("loadCachedPlanResult", () => { + test("returns null for nonexistent session", () => { + expect(loadCachedPlanResult("nonexistent-session-xyz")).toBeNull(); + }); + + test("returns cached result after recordObservation", () => { + recordStory(testSessionId, "flow-verification", null, "test", []); + recordObservation(testSessionId, makeObs("cached-1", "clientRequest")); + + const result = loadCachedPlanResult(testSessionId); + expect(result).not.toBeNull(); + expect(result!.hasStories).toBe(true); + expect(result!.observationCount).toBe(1); + }); +}); + +// --------------------------------------------------------------------------- +// formatVerificationBanner +// --------------------------------------------------------------------------- + +describe("formatVerificationBanner", () => { + test("returns null when no stories", () => { + const result: VerificationPlanResult = { + hasStories: false, + stories: [], + observationCount: 0, + satisfiedBoundaries: [], + missingBoundaries: [], + recentRoutes: [], + primaryNextAction: null, + blockedReasons: [], + }; + expect(formatVerificationBanner(result)).toBeNull(); + }); + + test("returns null when all boundaries satisfied and no next action", () => { + const result: VerificationPlanResult = { + hasStories: true, + stories: [{ id: "abc", kind: "flow-verification", route: "/settings", promptExcerpt: "test", createdAt: T0, updatedAt: T0 }], + observationCount: 4, + satisfiedBoundaries: ["clientRequest", "environment", "serverHandler", "uiRender"], + missingBoundaries: [], + recentRoutes: ["/settings"], + primaryNextAction: null, + blockedReasons: [], + }; + expect(formatVerificationBanner(result)).toBeNull(); + }); + + test("includes story, evidence, and next action", () => { + const result: VerificationPlanResult = { + hasStories: true, + stories: [{ id: "abc", kind: "flow-verification", route: "/settings", promptExcerpt: "save 
fails", createdAt: T0, updatedAt: T0 }], + observationCount: 1, + satisfiedBoundaries: ["clientRequest"], + missingBoundaries: ["environment", "serverHandler", "uiRender"], + recentRoutes: ["/settings"], + primaryNextAction: { + action: "tail server logs /settings", + targetBoundary: "serverHandler", + reason: "No server-side observation yet — check logs for errors", + }, + blockedReasons: [], + }; + const banner = formatVerificationBanner(result); + expect(banner).not.toBeNull(); + expect(banner).toContain(""); + expect(banner).toContain("flow-verification"); + expect(banner).toContain("/settings"); + expect(banner).toContain("save fails"); + expect(banner).toContain("1/4 boundaries satisfied"); + expect(banner).toContain("tail server logs"); + expect(banner).toContain(""); + }); + + test("shows blocked reason when no next action possible", () => { + const result: VerificationPlanResult = { + hasStories: true, + stories: [{ id: "abc", kind: "browser-only", route: null, promptExcerpt: "blank page", createdAt: T0, updatedAt: T0 }], + observationCount: 3, + satisfiedBoundaries: ["clientRequest", "environment", "serverHandler"], + missingBoundaries: ["uiRender"], + recentRoutes: [], + primaryNextAction: null, + blockedReasons: ["agent-browser unavailable — cannot emit browser-only action"], + }; + const banner = formatVerificationBanner(result); + expect(banner).not.toBeNull(); + expect(banner).toContain("Blocked:"); + expect(banner).toContain("agent-browser unavailable"); + }); +}); + +// --------------------------------------------------------------------------- +// formatPlanHuman +// --------------------------------------------------------------------------- + +describe("formatPlanHuman", () => { + test("shows no stories message", () => { + const result: VerificationPlanResult = { + hasStories: false, + stories: [], + observationCount: 0, + satisfiedBoundaries: [], + missingBoundaries: [], + recentRoutes: [], + primaryNextAction: null, + blockedReasons: [], + }; + 
const output = formatPlanHuman(result); + expect(output).toContain("No verification stories"); + }); + + test("shows full plan details", () => { + const result: VerificationPlanResult = { + hasStories: true, + activeStoryId: "abc", + stories: [{ id: "abc", kind: "flow-verification", route: "/settings", promptExcerpt: "save fails", createdAt: T0, updatedAt: T0 }], + storyStates: [{ + storyId: "abc", + storyKind: "flow-verification", + route: "/settings", + observationIds: [], + satisfiedBoundaries: ["clientRequest", "serverHandler"], + missingBoundaries: ["environment", "uiRender"], + recentRoutes: ["/settings"], + primaryNextAction: { + action: "open /settings in agent-browser", + targetBoundary: "uiRender", + reason: "No UI render observation yet", + }, + blockedReasons: [], + lastObservedAt: null, + }], + observationCount: 2, + satisfiedBoundaries: ["clientRequest", "serverHandler"], + missingBoundaries: ["environment", "uiRender"], + recentRoutes: ["/settings"], + primaryNextAction: { + action: "open /settings in agent-browser", + targetBoundary: "uiRender", + reason: "No UI render observation yet", + }, + blockedReasons: [], + }; + const output = formatPlanHuman(result); + expect(output).toContain("Active story:"); + expect(output).toContain("flow-verification"); + expect(output).toContain("/settings"); + expect(output).toContain("2/4 boundaries satisfied"); + expect(output).toContain("Next action:"); + expect(output).toContain("open /settings in agent-browser"); + expect(output).toContain("Reason:"); + }); + + test("shows blocked reasons", () => { + const result: VerificationPlanResult = { + hasStories: true, + stories: [{ id: "abc", kind: "stuck-investigation", route: null, promptExcerpt: "hangs", createdAt: T0, updatedAt: T0 }], + observationCount: 3, + satisfiedBoundaries: ["clientRequest", "environment", "serverHandler"], + missingBoundaries: ["uiRender"], + recentRoutes: [], + primaryNextAction: null, + blockedReasons: ["agent-browser unavailable", 
"dev-server loop guard hit"], + }; + const output = formatPlanHuman(result); + expect(output).toContain("Next action: blocked"); + expect(output).toContain("agent-browser unavailable"); + expect(output).toContain("dev-server loop guard hit"); + }); + + test("shows all satisfied message", () => { + const result: VerificationPlanResult = { + hasStories: true, + activeStoryId: "abc", + stories: [{ id: "abc", kind: "flow-verification", route: null, promptExcerpt: "test", createdAt: T0, updatedAt: T0 }], + storyStates: [], + observationCount: 4, + satisfiedBoundaries: ["clientRequest", "environment", "serverHandler", "uiRender"], + missingBoundaries: [], + recentRoutes: [], + primaryNextAction: null, + blockedReasons: [], + }; + const output = formatPlanHuman(result); + expect(output).toContain("All verification boundaries satisfied"); + }); +}); + +// --------------------------------------------------------------------------- +// Fixture-based deterministic snapshots +// --------------------------------------------------------------------------- + +describe("deterministic fixture snapshots", () => { + test("settings page loads but save fails", () => { + recordStory(testSessionId, "flow-verification", "/settings", "settings page loads but save fails", ["verification"]); + recordObservation(testSessionId, makeObs("f1-1", "clientRequest", { route: "/settings", summary: "curl http://localhost:3000/settings" })); + recordObservation(testSessionId, makeObs("f1-2", "serverHandler", { route: "/settings", summary: "vercel logs" })); + + const result1 = computePlan(testSessionId); + const result2 = computePlan(testSessionId); + expect(JSON.stringify(result1, null, 2)).toBe(JSON.stringify(result2, null, 2)); + + expect(result1.primaryNextAction).not.toBeNull(); + expect(result1.missingBoundaries).toContain("uiRender"); + expect(result1.missingBoundaries).toContain("environment"); + }); + + test("blank page on dashboard", () => { + recordStory(testSessionId, "browser-only", 
"/dashboard", "blank page on dashboard", ["agent-browser-verify"]); + + const result = computePlan(testSessionId); + expect(result.hasStories).toBe(true); + expect(result.missingBoundaries).toHaveLength(4); + expect(result.primaryNextAction!.targetBoundary).toBe("clientRequest"); + }); + + test("bash trace: pnpm dev -> curl -> vercel logs", () => { + recordStory(testSessionId, "flow-verification", "/settings", "test", []); + recordObservation(testSessionId, makeObs("t1", "environment", { summary: "pnpm dev" })); + recordObservation(testSessionId, makeObs("t2", "clientRequest", { route: "/settings", summary: "curl /settings" })); + recordObservation(testSessionId, makeObs("t3", "serverHandler", { route: "/settings", summary: "vercel logs" })); + + const result = computePlan(testSessionId); + expect(result.satisfiedBoundaries).toContain("environment"); + expect(result.satisfiedBoundaries).toContain("clientRequest"); + expect(result.satisfiedBoundaries).toContain("serverHandler"); + expect(result.missingBoundaries).toEqual(["uiRender"]); + }); + + test("env trace: vercel env pull / printenv", () => { + recordStory(testSessionId, "stuck-investigation", null, "env vars missing", []); + recordObservation(testSessionId, makeObs("e1", "environment", { summary: "vercel env pull" })); + recordObservation(testSessionId, makeObs("e2", "environment", { summary: "printenv" })); + + const result = computePlan(testSessionId); + expect(result.satisfiedBoundaries).toContain("environment"); + expect(result.missingBoundaries).not.toContain("environment"); + }); + + test("unavailable browser case", () => { + recordStory(testSessionId, "flow-verification", null, "test", []); + recordObservation(testSessionId, makeObs("b1", "clientRequest")); + recordObservation(testSessionId, makeObs("b2", "serverHandler")); + recordObservation(testSessionId, makeObs("b3", "environment")); + + const result = computePlan(testSessionId, { agentBrowserAvailable: false }); + 
expect(result.primaryNextAction).toBeNull(); + expect(result.blockedReasons).toHaveLength(1); + expect(result.blockedReasons[0]).toContain("agent-browser unavailable"); + }); + + test("repeated launch hitting loop guard", () => { + recordStory(testSessionId, "flow-verification", null, "test", []); + recordObservation(testSessionId, makeObs("lg1", "clientRequest")); + recordObservation(testSessionId, makeObs("lg2", "serverHandler")); + recordObservation(testSessionId, makeObs("lg3", "environment")); + + const result = computePlan(testSessionId, { devServerLoopGuardHit: true }); + expect(result.primaryNextAction).toBeNull(); + expect(result.blockedReasons.some((r) => r.includes("loop guard"))).toBe(true); + }); +}); + +// --------------------------------------------------------------------------- +// No regressions to troubleshooting routing +// --------------------------------------------------------------------------- + +describe("no regressions", () => { + test("computePlan does not throw on missing session", () => { + expect(() => computePlan("nonexistent-session")).not.toThrow(); + }); + + test("planToResult handles empty plan", () => { + const plan = derivePlan([], []); + const result = planToResult(plan); + expect(result.hasStories).toBe(false); + expect(result.primaryNextAction).toBeNull(); + }); + + test("formatVerificationBanner handles result with empty stories gracefully", () => { + const result: VerificationPlanResult = { + hasStories: true, + stories: [], + observationCount: 0, + satisfiedBoundaries: [], + missingBoundaries: ["clientRequest"], + recentRoutes: [], + primaryNextAction: { action: "curl /", targetBoundary: "clientRequest", reason: "test" }, + blockedReasons: [], + }; + const banner = formatVerificationBanner(result); + expect(banner).not.toBeNull(); + expect(banner).toContain("Next action:"); + }); +}); + +// --------------------------------------------------------------------------- +// selectPrimaryStory +// 
--------------------------------------------------------------------------- + +describe("selectPrimaryStory", () => { + test("returns null for empty array", () => { + expect(selectPrimaryStory([])).toBeNull(); + }); + + test("returns the only story when single element", () => { + const story: VerificationPlanStorySummary = { + id: "only", + kind: "flow-verification", + route: "/settings", + promptExcerpt: "test", + createdAt: T0, + updatedAt: T0, + }; + expect(selectPrimaryStory([story])?.id).toBe("only"); + }); + + test("prefers the most recently updated story", () => { + const result = selectPrimaryStory([ + { + id: "older", + kind: "flow-verification", + route: "/older", + promptExcerpt: "older", + createdAt: "2026-03-27T00:00:00.000Z", + updatedAt: "2026-03-27T00:00:00.000Z", + }, + { + id: "newer", + kind: "flow-verification", + route: "/settings", + promptExcerpt: "newer", + createdAt: "2026-03-27T00:01:00.000Z", + updatedAt: "2026-03-27T00:02:00.000Z", + }, + ]); + + expect(result?.id).toBe("newer"); + }); + + test("breaks updatedAt ties with createdAt", () => { + const result = selectPrimaryStory([ + { + id: "created-first", + kind: "flow-verification", + route: "/a", + promptExcerpt: "a", + createdAt: "2026-03-27T00:00:00.000Z", + updatedAt: "2026-03-27T01:00:00.000Z", + }, + { + id: "created-later", + kind: "flow-verification", + route: "/b", + promptExcerpt: "b", + createdAt: "2026-03-27T00:30:00.000Z", + updatedAt: "2026-03-27T01:00:00.000Z", + }, + ]); + + expect(result?.id).toBe("created-later"); + }); + + test("breaks full tie with id (lexicographic ascending)", () => { + const result = selectPrimaryStory([ + { + id: "beta", + kind: "flow-verification", + route: null, + promptExcerpt: "beta", + createdAt: T0, + updatedAt: T0, + }, + { + id: "alpha", + kind: "flow-verification", + route: null, + promptExcerpt: "alpha", + createdAt: T0, + updatedAt: T0, + }, + ]); + + expect(result?.id).toBe("alpha"); + }); + + test("planToResult includes createdAt and 
updatedAt in stories", () => { + const plan = derivePlan( + [makeObs("a", "clientRequest")], + [{ + id: storyId("flow-verification", "/settings"), + kind: "flow-verification", + route: "/settings", + promptExcerpt: "test", + createdAt: T0, + updatedAt: T1, + requestedSkills: [], + }], + ); + const result = planToResult(plan); + expect(result.stories[0].createdAt).toBe(T0); + expect(result.stories[0].updatedAt).toBe(T1); + }); +}); + +// --------------------------------------------------------------------------- +// Story-scoped contamination regression tests +// --------------------------------------------------------------------------- + +describe("story-scoped contamination prevention", () => { + test("primaryNextAction is scoped to the active story, not session-global evidence", () => { + // Record a /settings story and satisfy clientRequest for it + recordStory(testSessionId, "flow-verification", "/settings", "settings broken", ["verification"]); + recordObservation(testSessionId, makeObs("obs-settings-client", "clientRequest", { + route: "/settings", + storyId: storyId("flow-verification", "/settings"), + timestamp: "2026-03-27T22:00:00.000Z", + summary: "curl http://localhost:3000/settings", + })); + + // Record a newer /dashboard story (no observations yet) + recordStory(testSessionId, "flow-verification", "/dashboard", "dashboard broken", ["verification"]); + + const result = computePlan(testSessionId); + + // Active story should be /dashboard (more missing boundaries → selectActiveStoryId) + expect(result.activeStoryId).toBe(storyId("flow-verification", "/dashboard")); + + // The /settings clientRequest observation must NOT bleed into /dashboard's projection + expect(result.primaryNextAction).not.toBeNull(); + expect(result.primaryNextAction!.targetBoundary).toBe("clientRequest"); + expect(result.primaryNextAction!.action).toContain("/dashboard"); + + // Verify per-story state isolation + const settingsState = result.storyStates.find((s) => s.route === 
"/settings"); + const dashboardState = result.storyStates.find((s) => s.route === "/dashboard"); + + expect(settingsState).toBeDefined(); + expect(settingsState!.satisfiedBoundaries).toContain("clientRequest"); + expect(settingsState!.observationIds).toContain("obs-settings-client"); + + expect(dashboardState).toBeDefined(); + expect(dashboardState!.satisfiedBoundaries).toHaveLength(0); + expect(dashboardState!.observationIds).toHaveLength(0); + }); + + test("route-scoped policy recall uses the active story boundary, not a stale story", () => { + // Record a /settings story and satisfy serverHandler for it + recordStory(testSessionId, "flow-verification", "/settings", "settings broken", ["verification"]); + recordObservation(testSessionId, makeObs("obs-settings-server", "serverHandler", { + route: "/settings", + storyId: storyId("flow-verification", "/settings"), + timestamp: "2026-03-27T22:00:00.000Z", + summary: "vercel logs", + })); + + // Record a newer /dashboard story (no observations yet) + recordStory(testSessionId, "flow-verification", "/dashboard", "dashboard broken", ["verification"]); + + const result = computePlan(testSessionId); + + // Active story is /dashboard — should have all 4 boundaries missing + expect(result.activeStoryId).toBe(storyId("flow-verification", "/dashboard")); + expect(result.missingBoundaries).toHaveLength(4); + expect(result.primaryNextAction!.targetBoundary).toBe("clientRequest"); + + // /settings should show serverHandler satisfied, not bleeding + const settingsState = result.storyStates.find((s) => s.route === "/settings"); + expect(settingsState!.satisfiedBoundaries).toContain("serverHandler"); + expect(settingsState!.missingBoundaries).not.toContain("serverHandler"); + }); + + test("observation with explicit storyId does not attach to route-matched story", () => { + // Two stories with different routes + recordStory(testSessionId, "flow-verification", "/settings", "settings broken", []); + recordStory(testSessionId, 
"flow-verification", "/dashboard", "dashboard broken", []); + + // Observation explicitly tagged for /settings story even though route says /dashboard + recordObservation(testSessionId, makeObs("obs-explicit", "clientRequest", { + route: "/dashboard", + storyId: storyId("flow-verification", "/settings"), + summary: "curl http://localhost:3000/dashboard", + })); + + const result = computePlan(testSessionId); + + // The observation should be attributed to /settings (explicit storyId wins) + const settingsState = result.storyStates.find((s) => s.route === "/settings"); + const dashboardState = result.storyStates.find((s) => s.route === "/dashboard"); + + expect(settingsState!.observationIds).toContain("obs-explicit"); + expect(settingsState!.satisfiedBoundaries).toContain("clientRequest"); + + expect(dashboardState!.observationIds).toHaveLength(0); + expect(dashboardState!.satisfiedBoundaries).toHaveLength(0); + }); + + test("buildLedgerObservation persists storyId from env", async () => { + const { buildBoundaryEvent, buildLedgerObservation } = await import("../hooks/src/posttooluse-verification-observe.mts"); + + const event = buildBoundaryEvent({ + command: "curl http://localhost:3000/settings", + boundary: "clientRequest", + matchedPattern: "http-client", + inferredRoute: "/settings", + verificationId: "v-story-env-1", + timestamp: "2026-03-27T23:00:00.000Z", + env: {} as NodeJS.ProcessEnv, + }); + + const observation = buildLedgerObservation(event, { + VERCEL_PLUGIN_VERIFICATION_STORY_ID: "story-settings", + } as NodeJS.ProcessEnv); + + expect(observation.storyId).toBe("story-settings"); + expect(observation.route).toBe("/settings"); + expect(observation.boundary).toBe("clientRequest"); + }); + + test("buildLedgerObservation storyId is null when env is absent", async () => { + const { buildBoundaryEvent, buildLedgerObservation } = await import("../hooks/src/posttooluse-verification-observe.mts"); + + const event = buildBoundaryEvent({ + command: "curl 
http://localhost:3000/settings", + boundary: "clientRequest", + matchedPattern: "http-client", + inferredRoute: "/settings", + verificationId: "v-story-env-2", + timestamp: "2026-03-27T23:00:00.000Z", + env: {} as NodeJS.ProcessEnv, + }); + + const observation = buildLedgerObservation(event, {} as NodeJS.ProcessEnv); + expect(observation.storyId).toBeNull(); + }); +}); diff --git a/tests/verification-routing-policy-closure.test.ts b/tests/verification-routing-policy-closure.test.ts new file mode 100644 index 0000000..eeb54cd --- /dev/null +++ b/tests/verification-routing-policy-closure.test.ts @@ -0,0 +1,1457 @@ +import { describe, test, expect, beforeEach, afterEach } from "bun:test"; +import { unlinkSync, rmSync } from "node:fs"; +import { + projectPolicyPath, + sessionExposurePath, + appendSkillExposure, + loadSessionExposures, + loadProjectRoutingPolicy, + resolveBoundaryOutcome, + finalizeStaleExposures, + type SkillExposure, +} from "../hooks/src/routing-policy-ledger.mts"; +import { storyId as computeStoryId } from "../hooks/src/verification-ledger.mts"; +import { + applyPolicyBoosts, + derivePolicyBoost, +} from "../hooks/src/routing-policy.mts"; +import { + readRoutingDecisionTrace, + traceDir, +} from "../hooks/src/routing-decision-trace.mts"; + +// --------------------------------------------------------------------------- +// Fixtures +// --------------------------------------------------------------------------- + +const PROJECT_ROOT = "/tmp/test-project-closure"; +const SESSION_ID = "closure-test-session-" + Date.now(); + +const T0 = "2026-03-27T04:00:00.000Z"; +const T1 = "2026-03-27T04:01:00.000Z"; +const T2 = "2026-03-27T04:02:00.000Z"; +const T3 = "2026-03-27T04:03:00.000Z"; +const T4 = "2026-03-27T04:04:00.000Z"; +const T5 = "2026-03-27T04:05:00.000Z"; +const T_END = "2026-03-27T04:30:00.000Z"; + +function exposure(id: string, overrides: Partial<SkillExposure> = {}): SkillExposure { + return { + id, + sessionId: SESSION_ID, + projectRoot: PROJECT_ROOT, + 
storyId: "story-1", + storyKind: "flow-verification", + route: "/dashboard", + hook: "PreToolUse", + toolName: "Bash", + skill: "agent-browser-verify", + targetBoundary: "uiRender", + exposureGroupId: null, + attributionRole: "candidate", + candidateSkill: null, + createdAt: T0, + resolvedAt: null, + outcome: "pending", + ...overrides, + }; +} + +function cleanupFiles() { + try { unlinkSync(projectPolicyPath(PROJECT_ROOT)); } catch {} + try { unlinkSync(sessionExposurePath(SESSION_ID)); } catch {} +} + +// --------------------------------------------------------------------------- +// Tests +// --------------------------------------------------------------------------- + +describe("verification → routing-policy closure", () => { + beforeEach(cleanupFiles); + afterEach(cleanupFiles); + + describe("acceptance: skill injection → boundary observation → policy update", () => { + test("uiRender boundary win increments project policy wins", () => { + // Simulate: agent-browser-verify injected while target boundary = uiRender + appendSkillExposure(exposure("e1", { createdAt: T0 })); + appendSkillExposure(exposure("e2", { createdAt: T1 })); + appendSkillExposure(exposure("e3", { createdAt: T2 })); + + // Simulate: observer sees a uiRender boundary match (scoped to story + route) + const resolved = resolveBoundaryOutcome({ + sessionId: SESSION_ID, + boundary: "uiRender", + matchedSuggestedAction: false, + storyId: "story-1", + route: "/dashboard", + now: T3, + }); + + expect(resolved).toHaveLength(3); + resolved.forEach((e) => expect(e.outcome).toBe("win")); + + // Verify project policy + const policy = loadProjectRoutingPolicy(PROJECT_ROOT); + const stats = policy.scenarios["PreToolUse|flow-verification|uiRender|Bash"]?.["agent-browser-verify"]; + expect(stats).toBeDefined(); + expect(stats!.wins).toBe(3); + expect(stats!.directiveWins).toBe(0); + }); + + test("directive-win increments both wins and directiveWins", () => { + appendSkillExposure(exposure("e1", { createdAt: 
T0 })); + + const resolved = resolveBoundaryOutcome({ + sessionId: SESSION_ID, + boundary: "uiRender", + matchedSuggestedAction: true, + storyId: "story-1", + route: "/dashboard", + now: T3, + }); + + expect(resolved).toHaveLength(1); + expect(resolved[0].outcome).toBe("directive-win"); + + const policy = loadProjectRoutingPolicy(PROJECT_ROOT); + const stats = policy.scenarios["PreToolUse|flow-verification|uiRender|Bash"]?.["agent-browser-verify"]; + expect(stats!.wins).toBe(1); + expect(stats!.directiveWins).toBe(1); + }); + + test("stale-miss on session end for unresolved exposures", () => { + appendSkillExposure(exposure("e1", { createdAt: T0 })); + appendSkillExposure(exposure("e2", { targetBoundary: "clientRequest", createdAt: T1 })); + + // Resolve only uiRender (scoped to story + route) + resolveBoundaryOutcome({ + sessionId: SESSION_ID, + boundary: "uiRender", + matchedSuggestedAction: false, + storyId: "story-1", + route: "/dashboard", + now: T3, + }); + + // Session end: finalize remaining + const stale = finalizeStaleExposures(SESSION_ID, T_END); + + expect(stale).toHaveLength(1); + expect(stale[0].id).toBe("e2"); + expect(stale[0].outcome).toBe("stale-miss"); + + // Policy should have 1 stale-miss for the clientRequest scenario + const policy = loadProjectRoutingPolicy(PROJECT_ROOT); + const crStats = policy.scenarios["PreToolUse|flow-verification|clientRequest|Bash"]?.["agent-browser-verify"]; + expect(crStats).toBeDefined(); + expect(crStats!.staleMisses).toBe(1); + expect(crStats!.wins).toBe(0); + }); + }); + + describe("end-to-end: exposures → outcomes → policy boosts", () => { + test("5 exposures with 4 wins produces policy boost of 8", () => { + // Record 5 exposures + for (let i = 0; i < 5; i++) { + appendSkillExposure(exposure(`e${i}`, { createdAt: `2026-03-27T04:0${i}:00.000Z` })); + } + + // Resolve the exposures as wins. + // Note: all 5 exposures target uiRender, so this single call + // resolves every one of them (the fixture is reset below).
resolveBoundaryOutcome({ + sessionId: SESSION_ID, + boundary: "uiRender", + matchedSuggestedAction: false, + storyId: "story-1", + route: "/dashboard", + now: T5, + }); + + // All 5 get resolved as wins (they all had targetBoundary=uiRender). + // Reset and re-record with one exposure on a different boundary + // so the uiRender scenario gets exactly 4 wins out of 5 recorded exposures. + cleanupFiles(); + + for (let i = 0; i < 4; i++) { + appendSkillExposure(exposure(`e${i}`, { createdAt: `2026-03-27T04:0${i}:00.000Z` })); + } + appendSkillExposure(exposure("e4", { + targetBoundary: "clientRequest", + createdAt: "2026-03-27T04:04:00.000Z", + })); + + // Resolve 4 uiRender wins (scoped to story + route) + resolveBoundaryOutcome({ + sessionId: SESSION_ID, + boundary: "uiRender", + matchedSuggestedAction: false, + storyId: "story-1", + route: "/dashboard", + now: T5, + }); + + // Finalize the remaining 1 as stale + finalizeStaleExposures(SESSION_ID, T_END); + + const policy = loadProjectRoutingPolicy(PROJECT_ROOT); + const uiStats = policy.scenarios["PreToolUse|flow-verification|uiRender|Bash"]?.["agent-browser-verify"]; + expect(uiStats!.wins).toBe(4); + expect(uiStats!.exposures).toBe(4); + + const simulatedStats = policy.scenarios["PreToolUse|flow-verification|uiRender|Bash"]!["agent-browser-verify"]; + + // uiRender scenario: 4 wins / 4 exposures (e4 went stale under clientRequest) → boost 8 + expect(derivePolicyBoost(simulatedStats)).toBe(8); + + const boosted = applyPolicyBoosts( + [{ skill: "agent-browser-verify", priority: 7 }], + { + version: 1, + scenarios: { + "PreToolUse|flow-verification|uiRender|Bash": { + "agent-browser-verify": simulatedStats, + }, + }, + }, + { + hook: "PreToolUse", + storyKind: "flow-verification", + targetBoundary: "uiRender", + toolName: "Bash", + }, + ); + + expect(boosted[0].policyBoost).toBe(8); + expect(boosted[0].effectivePriority).toBe(15); + }); + }); + + describe("story/route-scoped resolution in closure", () => { + test("verification for /settings does not resolve /dashboard exposures", () => {
appendSkillExposure(exposure("settings-1", {
+        storyId: "story-1",
+        route: "/settings",
+        targetBoundary: "clientRequest",
+        createdAt: T0,
+      }));
+      appendSkillExposure(exposure("dashboard-1", {
+        storyId: "story-1",
+        route: "/dashboard",
+        targetBoundary: "clientRequest",
+        createdAt: T1,
+      }));
+
+      const resolved = resolveBoundaryOutcome({
+        sessionId: SESSION_ID,
+        boundary: "clientRequest",
+        matchedSuggestedAction: true,
+        storyId: "story-1",
+        route: "/settings",
+        now: T3,
+      });
+
+      expect(resolved).toHaveLength(1);
+      expect(resolved[0].id).toBe("settings-1");
+      expect(resolved[0].outcome).toBe("directive-win");
+
+      // /dashboard exposure remains pending
+      const all = loadSessionExposures(SESSION_ID);
+      expect(all.find((e) => e.id === "dashboard-1")!.outcome).toBe("pending");
+
+      // Finalize: /dashboard becomes stale-miss
+      const stale = finalizeStaleExposures(SESSION_ID, T_END);
+      expect(stale).toHaveLength(1);
+      expect(stale[0].id).toBe("dashboard-1");
+      expect(stale[0].outcome).toBe("stale-miss");
+    });
+
+    test("cross-story observation does not over-credit unrelated exposures", () => {
+      appendSkillExposure(exposure("s1-e1", {
+        storyId: "story-1",
+        route: "/settings",
+        targetBoundary: "clientRequest",
+        createdAt: T0,
+      }));
+      appendSkillExposure(exposure("s2-e1", {
+        storyId: "story-2",
+        route: "/dashboard",
+        targetBoundary: "clientRequest",
+        createdAt: T1,
+      }));
+
+      // Observation scoped to story-1 + /settings
+      const resolved = resolveBoundaryOutcome({
+        sessionId: SESSION_ID,
+        boundary: "clientRequest",
+        matchedSuggestedAction: false,
+        storyId: "story-1",
+        route: "/settings",
+        now: T3,
+      });
+
+      expect(resolved).toHaveLength(1);
+      expect(resolved[0].id).toBe("s1-e1");
+
+      // story-2's exposure is unaffected
+      const all = loadSessionExposures(SESSION_ID);
+      expect(all.find((e) => e.id === "s2-e1")!.outcome).toBe("pending");
+    });
+  });
+
+  describe("mixed boundaries", () => {
+    test("resolves only matching boundary exposures", () => {
+      appendSkillExposure(exposure("ui-1", { targetBoundary: "uiRender", createdAt: T0 }));
+      appendSkillExposure(exposure("cr-1", { targetBoundary: "clientRequest", createdAt: T1 }));
+      appendSkillExposure(exposure("sh-1", { targetBoundary: "serverHandler", createdAt: T2 }));
+
+      // Resolve only clientRequest (scoped to story + route)
+      const resolved = resolveBoundaryOutcome({
+        sessionId: SESSION_ID,
+        boundary: "clientRequest",
+        matchedSuggestedAction: true,
+        storyId: "story-1",
+        route: "/dashboard",
+        now: T3,
+      });
+
+      expect(resolved).toHaveLength(1);
+      expect(resolved[0].id).toBe("cr-1");
+      expect(resolved[0].outcome).toBe("directive-win");
+
+      // Others still pending
+      const all = loadSessionExposures(SESSION_ID);
+      expect(all.find((e) => e.id === "ui-1")!.outcome).toBe("pending");
+      expect(all.find((e) => e.id === "sh-1")!.outcome).toBe("pending");
+
+      // Finalize stale
+      const stale = finalizeStaleExposures(SESSION_ID, T_END);
+      expect(stale).toHaveLength(2);
+      expect(stale.every((e) => e.outcome === "stale-miss")).toBe(true);
+    });
+  });
+
+  describe("multi-skill exposure tracking", () => {
+    test("resolves different skills for the same boundary independently", () => {
+      appendSkillExposure(exposure("e1", {
+        skill: "agent-browser-verify",
+        createdAt: T0,
+      }));
+      appendSkillExposure(exposure("e2", {
+        skill: "vercel-deploy",
+        targetBoundary: "uiRender",
+        createdAt: T1,
+      }));
+
+      const resolved = resolveBoundaryOutcome({
+        sessionId: SESSION_ID,
+        boundary: "uiRender",
+        matchedSuggestedAction: false,
+        storyId: "story-1",
+        route: "/dashboard",
+        now: T3,
+      });
+
+      expect(resolved).toHaveLength(2);
+      const skills = resolved.map((e) => e.skill).sort();
+      expect(skills).toEqual(["agent-browser-verify", "vercel-deploy"]);
+    });
+  });
+
+  describe("UserPromptSubmit exposures", () => {
+    test("tracks prompt-based exposures with null targetBoundary", () => {
+      appendSkillExposure(exposure("p1", {
+        hook: "UserPromptSubmit",
+        toolName: "Prompt",
+        targetBoundary: null,
+        createdAt: T0,
+      }));
+
+      // These can only be finalized as stale (no boundary to match)
+      const stale = finalizeStaleExposures(SESSION_ID, T_END);
+      expect(stale).toHaveLength(1);
+      expect(stale[0].outcome).toBe("stale-miss");
+
+      const policy = loadProjectRoutingPolicy(PROJECT_ROOT);
+      const stats = policy.scenarios["UserPromptSubmit|flow-verification|none|Prompt"]?.["agent-browser-verify"];
+      expect(stats).toBeDefined();
+      expect(stats!.staleMisses).toBe(1);
+    });
+  });
+
+  describe("null-route attribution in closure", () => {
+    test("null inferred route does not over-credit route-specific exposures", () => {
+      // Exposure scoped to /dashboard
+      appendSkillExposure(exposure("scoped-e1", {
+        storyId: "story-1",
+        route: "/dashboard",
+        targetBoundary: "clientRequest",
+        createdAt: T0,
+      }));
+
+      // Resolution with null route (e.g., no route inferrable from command)
+      const resolved = resolveBoundaryOutcome({
+        sessionId: SESSION_ID,
+        boundary: "clientRequest",
+        matchedSuggestedAction: false,
+        storyId: "story-1",
+        route: null,
+        now: T3,
+      });
+
+      // Should NOT resolve: exposure has route="/dashboard", observed route is null
+      expect(resolved).toHaveLength(0);
+
+      const all = loadSessionExposures(SESSION_ID);
+      expect(all[0].outcome).toBe("pending");
+    });
+
+    test("null-route exposures ARE resolved by null-route observations", () => {
+      // Exposure with null route (e.g., from UserPromptSubmit)
+      appendSkillExposure(exposure("null-route-e1", {
+        storyId: null,
+        route: null,
+        targetBoundary: "clientRequest",
+        createdAt: T0,
+      }));
+
+      const resolved = resolveBoundaryOutcome({
+        sessionId: SESSION_ID,
+        boundary: "clientRequest",
+        matchedSuggestedAction: true,
+        storyId: null,
+        route: null,
+        now: T3,
+      });
+
+      expect(resolved).toHaveLength(1);
+      expect(resolved[0].outcome).toBe("directive-win");
+    });
+  });
+
+  describe("route-scoped policy bucket persistence", () => {
+    test("one exact-route exposure/outcome cycle writes exact-route, wildcard, and legacy buckets", () => {
+      appendSkillExposure(exposure("route-bucket-e1", {
+        route: "/settings",
+        targetBoundary: "clientRequest",
+        createdAt: T0,
+      }));
+
+      resolveBoundaryOutcome({
+        sessionId: SESSION_ID,
+        boundary: "clientRequest",
+        matchedSuggestedAction: false,
+        storyId: "story-1",
+        route: "/settings",
+        now: T1,
+      });
+
+      const policy = loadProjectRoutingPolicy(PROJECT_ROOT);
+
+      // Exact-route bucket
+      const exactKey = "PreToolUse|flow-verification|clientRequest|Bash|/settings";
+      const exactStats = policy.scenarios[exactKey]?.["agent-browser-verify"];
+      expect(exactStats).toBeDefined();
+      expect(exactStats!.exposures).toBe(1);
+      expect(exactStats!.wins).toBe(1);
+
+      // Wildcard-route bucket
+      const wildcardKey = "PreToolUse|flow-verification|clientRequest|Bash|*";
+      const wildcardStats = policy.scenarios[wildcardKey]?.["agent-browser-verify"];
+      expect(wildcardStats).toBeDefined();
+      expect(wildcardStats!.exposures).toBe(1);
+      expect(wildcardStats!.wins).toBe(1);
+
+      // Legacy 4-part bucket
+      const legacyKey = "PreToolUse|flow-verification|clientRequest|Bash";
+      const legacyStats = policy.scenarios[legacyKey]?.["agent-browser-verify"];
+      expect(legacyStats).toBeDefined();
+      expect(legacyStats!.exposures).toBe(1);
+      expect(legacyStats!.wins).toBe(1);
+    });
+
+    test("/settings outcomes do not over-credit /dashboard exposures in policy", () => {
+      // Expose on /dashboard
+      appendSkillExposure(exposure("dash-policy-e1", {
+        route: "/dashboard",
+        targetBoundary: "clientRequest",
+        createdAt: T0,
+      }));
+
+      // Expose on /settings
+      appendSkillExposure(exposure("settings-policy-e1", {
+        route: "/settings",
+        targetBoundary: "clientRequest",
+        createdAt: T1,
+      }));
+
+      // Resolve only /settings as a win
+      resolveBoundaryOutcome({
+        sessionId: SESSION_ID,
+        boundary: "clientRequest",
+        matchedSuggestedAction: true,
+        storyId: "story-1",
+        route: "/settings",
+        now: T2,
+      });
+
+      const policy = loadProjectRoutingPolicy(PROJECT_ROOT);
+
+      // /settings exact-route bucket has the win
+      const settingsKey = "PreToolUse|flow-verification|clientRequest|Bash|/settings";
+      const settingsStats = policy.scenarios[settingsKey]?.["agent-browser-verify"];
+      expect(settingsStats!.wins).toBe(1);
+      expect(settingsStats!.directiveWins).toBe(1);
+
+      // /dashboard exact-route bucket has only the exposure, no win
+      const dashKey = "PreToolUse|flow-verification|clientRequest|Bash|/dashboard";
+      const dashStats = policy.scenarios[dashKey]?.["agent-browser-verify"];
+      expect(dashStats).toBeDefined();
+      expect(dashStats!.exposures).toBe(1);
+      expect(dashStats!.wins).toBe(0);
+      expect(dashStats!.directiveWins).toBe(0);
+
+      // Wildcard and legacy buckets see both exposures but only the /settings win
+      const wildcardKey = "PreToolUse|flow-verification|clientRequest|Bash|*";
+      const wildcardStats = policy.scenarios[wildcardKey]?.["agent-browser-verify"];
+      expect(wildcardStats!.exposures).toBe(2);
+      expect(wildcardStats!.wins).toBe(1);
+
+      const legacyKey = "PreToolUse|flow-verification|clientRequest|Bash";
+      const legacyStats = policy.scenarios[legacyKey]?.["agent-browser-verify"];
+      expect(legacyStats!.exposures).toBe(2);
+      expect(legacyStats!.wins).toBe(1);
+    });
+
+    test("stale-miss finalization writes route-scoped policy for each exposure route", () => {
+      appendSkillExposure(exposure("stale-dash-e1", {
+        route: "/dashboard",
+        targetBoundary: "clientRequest",
+        createdAt: T0,
+      }));
+      appendSkillExposure(exposure("stale-settings-e1", {
+        route: "/settings",
+        targetBoundary: "clientRequest",
+        createdAt: T1,
+      }));
+
+      finalizeStaleExposures(SESSION_ID, T_END);
+
+      const policy = loadProjectRoutingPolicy(PROJECT_ROOT);
+
+      // Each route's exact bucket gets its own stale-miss
+      const dashKey = "PreToolUse|flow-verification|clientRequest|Bash|/dashboard";
+      expect(policy.scenarios[dashKey]?.["agent-browser-verify"]?.staleMisses).toBe(1);
+
+      const settingsKey = "PreToolUse|flow-verification|clientRequest|Bash|/settings";
+      expect(policy.scenarios[settingsKey]?.["agent-browser-verify"]?.staleMisses).toBe(1);
+
+      // Wildcard accumulates both
+      const wildcardKey = "PreToolUse|flow-verification|clientRequest|Bash|*";
+      expect(policy.scenarios[wildcardKey]?.["agent-browser-verify"]?.staleMisses).toBe(2);
+    });
+  });
+
+  describe("soft signal gating: plan state updated, routing policy untouched", () => {
+    const SOFT_SESSION = "soft-signal-closure-" + Date.now();
+
+    afterEach(() => {
+      try { unlinkSync(projectPolicyPath(PROJECT_ROOT)); } catch {}
+      try { unlinkSync(sessionExposurePath(SOFT_SESSION)); } catch {}
+    });
+
+    test("Read .env.local records observation but does not resolve routing-policy wins", async () => {
+      const { run } = await import("../hooks/src/posttooluse-verification-observe.mts");
+      const { recordStory, removeLedgerArtifacts, loadObservations } = await import("../hooks/src/verification-ledger.mts");
+
+      try {
+        recordStory(SOFT_SESSION, "flow-verification", "/settings", "env check", []);
+
+        // Add a pending exposure so we can verify it stays pending
+        appendSkillExposure(exposure("soft-e1", {
+          sessionId: SOFT_SESSION,
+          targetBoundary: "environment",
+          route: "/settings",
+          createdAt: T0,
+        }));
+
+        const input = JSON.stringify({
+          tool_name: "Read",
+          tool_input: { file_path: "/repo/.env.local" },
+          session_id: SOFT_SESSION,
+        });
+
+        run(input);
+
+        // Observation was recorded in the ledger (plan state updated)
+        const observations = loadObservations(SOFT_SESSION);
+        expect(observations.length).toBeGreaterThanOrEqual(1);
+        const envObs = observations.find((o) => o.meta?.evidenceSource === "env-read");
+        expect(envObs).toBeDefined();
+        expect(envObs!.boundary).toBe("environment");
+        expect(envObs!.meta?.signalStrength).toBe("soft");
+        expect(envObs!.meta?.toolName).toBe("Read");
+
+        // Routing policy was NOT updated — exposure remains pending
+        const exposures = loadSessionExposures(SOFT_SESSION);
+        const pending = exposures.filter((e) => e.outcome === "pending");
+        expect(pending).toHaveLength(1);
+        expect(pending[0].id).toBe("soft-e1");
+
+        // Project policy has no wins
+        const policy = loadProjectRoutingPolicy(PROJECT_ROOT);
+        const scenarioKey = "PreToolUse|flow-verification|environment|Bash|/settings";
+        const stats = policy.scenarios[scenarioKey]?.["agent-browser-verify"];
+        // stats may exist from exposure recording, but wins should be 0
+        if (stats) {
+          expect(stats.wins).toBe(0);
+          expect(stats.directiveWins).toBe(0);
+        }
+      } finally {
+        removeLedgerArtifacts(SOFT_SESSION);
+      }
+    });
+
+    test("Read server.log records observation but does not resolve routing-policy wins", async () => {
+      const { run } = await import("../hooks/src/posttooluse-verification-observe.mts");
+      const { recordStory, removeLedgerArtifacts, loadObservations } = await import("../hooks/src/verification-ledger.mts");
+
+      try {
+        recordStory(SOFT_SESSION, "flow-verification", "/dashboard", "log check", []);
+
+        appendSkillExposure(exposure("soft-log-e1", {
+          sessionId: SOFT_SESSION,
+          targetBoundary: "serverHandler",
+          route: "/dashboard",
+          createdAt: T0,
+        }));
+
+        const input = JSON.stringify({
+          tool_name: "Read",
+          tool_input: { file_path: "/repo/.next/server/app.log" },
+          session_id: SOFT_SESSION,
+        });
+
+        run(input);
+
+        // Observation recorded
+        const observations = loadObservations(SOFT_SESSION);
+        const logObs = observations.find((o) => o.meta?.evidenceSource === "log-read");
+        expect(logObs).toBeDefined();
+        expect(logObs!.boundary).toBe("serverHandler");
+        expect(logObs!.meta?.signalStrength).toBe("soft");
+
+        // Exposure stays pending — soft signal did not resolve policy
+        const exposures = loadSessionExposures(SOFT_SESSION);
+        expect(exposures.filter((e) => e.outcome === "pending")).toHaveLength(1);
+      } finally {
+        removeLedgerArtifacts(SOFT_SESSION);
+      }
+    });
+
+    test("Bash curl (strong) DOES resolve routing-policy wins — contrast with soft", async () => {
+      const { run } = await import("../hooks/src/posttooluse-verification-observe.mts");
+      const { recordStory, removeLedgerArtifacts } = await import("../hooks/src/verification-ledger.mts");
+      const { storyId: computeStoryId } = await import("../hooks/src/verification-ledger.mts");
+
+      try {
+        recordStory(SOFT_SESSION, "flow-verification", "/dashboard", "api check", []);
+
+        // Use the real computed story ID so it matches what the observer resolves
+        const realStoryId = computeStoryId("flow-verification", "/dashboard");
+
+        appendSkillExposure(exposure("strong-bash-e1", {
+          sessionId: SOFT_SESSION,
+          targetBoundary: "clientRequest",
+          storyId: realStoryId,
+          route: "/dashboard",
+          createdAt: T0,
+        }));
+
+        const input = JSON.stringify({
+          tool_name: "Bash",
+          tool_input: { command: "curl http://localhost:3000/dashboard" },
+          session_id: SOFT_SESSION,
+        });
+
+        run(input);
+
+        // Bash curl is strong → exposure should be resolved
+        const exposures = loadSessionExposures(SOFT_SESSION);
+        const resolved = exposures.filter((e) => e.outcome === "win" || e.outcome === "directive-win");
+        expect(resolved.length).toBeGreaterThanOrEqual(1);
+      } finally {
+        removeLedgerArtifacts(SOFT_SESSION);
+      }
+    });
+
+    test("finalizeStaleExposures converts unresolved soft-signal exposures to stale-miss", async () => {
+      const { run } = await import("../hooks/src/posttooluse-verification-observe.mts");
+      const { recordStory, removeLedgerArtifacts } = await import("../hooks/src/verification-ledger.mts");
+
+      try {
+        recordStory(SOFT_SESSION, "flow-verification", "/settings", "stale check", []);
+
+        appendSkillExposure(exposure("stale-soft-e1", {
+          sessionId: SOFT_SESSION,
+          targetBoundary: "environment",
+          route: "/settings",
+          createdAt: T0,
+        }));
+
+        // Soft signal — does NOT resolve policy
+        run(JSON.stringify({
+          tool_name: "Read",
+          tool_input: { file_path: "/repo/.env.local" },
+          session_id: SOFT_SESSION,
+        }));
+
+        // Session end: pending exposure becomes stale-miss
+        const stale = finalizeStaleExposures(SOFT_SESSION, T_END);
+        expect(stale).toHaveLength(1);
+        expect(stale[0].id).toBe("stale-soft-e1");
+        expect(stale[0].outcome).toBe("stale-miss");
+      } finally {
+        removeLedgerArtifacts(SOFT_SESSION);
+      }
+    });
+  });
+
+  describe("PostToolUse closure traces", () => {
+    const TRACE_SESSION = "closure-trace-test-" + Date.now();
+
+    afterEach(() => {
+      try { rmSync(traceDir(TRACE_SESSION), { recursive: true, force: true }); } catch {}
+    });
+
+    test("boundary observation with session writes PostToolUse routing decision trace", async () => {
+      const { run } = await import("../hooks/src/posttooluse-verification-observe.mts");
+      const { recordStory, removeLedgerArtifacts } = await import("../hooks/src/verification-ledger.mts");
+
+      try {
+        recordStory(TRACE_SESSION, "flow-verification", "/settings", "test trace", []);
+
+        const input = JSON.stringify({
+          tool_name: "Bash",
+          tool_input: { command: "curl http://localhost:3000/settings" },
+          session_id: TRACE_SESSION,
+        });
+
+        run(input);
+
+        const traces = readRoutingDecisionTrace(TRACE_SESSION);
+        expect(traces.length).toBeGreaterThanOrEqual(1);
+
+        const postTrace = traces.find((t) => t.hook === "PostToolUse");
+        expect(postTrace).toBeDefined();
+        expect(postTrace!.version).toBe(2);
+        expect(postTrace!.hook).toBe("PostToolUse");
+        expect(postTrace!.toolName).toBe("Bash");
+        expect(postTrace!.verification).not.toBeNull();
+        expect(postTrace!.verification!.verificationId).toBeTruthy();
+        expect(postTrace!.verification!.observedBoundary).toBe("clientRequest");
+        expect(typeof postTrace!.verification!.matchedSuggestedAction).toBe("boolean");
+
+        // PostToolUse traces never fabricate ranking data
+        expect(postTrace!.matchedSkills).toEqual([]);
+        expect(postTrace!.injectedSkills).toEqual([]);
+        expect(postTrace!.ranked).toEqual([]);
+      } finally {
+        removeLedgerArtifacts(TRACE_SESSION);
+      }
+    });
+
+    test("closure trace and routing-policy-resolved share correlation data", async () => {
+      const { run } = await import("../hooks/src/posttooluse-verification-observe.mts");
+      const { recordStory, removeLedgerArtifacts } = await import("../hooks/src/verification-ledger.mts");
+
+      try {
+        recordStory(TRACE_SESSION, "flow-verification", "/dashboard", "correlation test", []);
+
+        // Add an exposure so routing-policy-resolved fires
+        appendSkillExposure(exposure("corr-1", {
+          sessionId: TRACE_SESSION,
+          targetBoundary: "uiRender",
+          route: "/dashboard",
+          createdAt: T0,
+        }));
+
+        const input = JSON.stringify({
+          tool_name: "Bash",
+          tool_input: { command: "open http://localhost:3000/dashboard" },
+          session_id: TRACE_SESSION,
+        });
+
+        run(input);
+
+        const traces = readRoutingDecisionTrace(TRACE_SESSION);
+        const postTrace = traces.find((t) => t.hook === "PostToolUse");
+        expect(postTrace).toBeDefined();
+
+        // Trace carries the same story identity as policy resolution
+        expect(postTrace!.primaryStory.kind).toBe("flow-verification");
+        expect(postTrace!.primaryStory.storyRoute).not.toBeNull();
+        expect(postTrace!.verification!.verificationId).toBeTruthy();
+        expect(postTrace!.verification!.observedBoundary).toBe("uiRender");
+
+        // policyScenario should be set when primary story exists
+        expect(postTrace!.policyScenario).toMatch(/^PostToolUse\|flow-verification\|/);
+      } finally {
+        removeLedgerArtifacts(TRACE_SESSION);
+      }
+    });
+
+    test("trace without active story includes no_active_verification_story skip reason", async () => {
+      const { run } = await import("../hooks/src/posttooluse-verification-observe.mts");
+      const { removeLedgerArtifacts } = await import("../hooks/src/verification-ledger.mts");
+
+      // Use a session with no stories recorded
+      const noStorySession = "no-story-trace-" + Date.now();
+      try {
+        const input = JSON.stringify({
+          tool_name: "Bash",
+          tool_input: { command: "curl http://localhost:3000/api/test" },
+          session_id: noStorySession,
+        });
+
+        run(input);
+
+        const traces = readRoutingDecisionTrace(noStorySession);
+        const postTrace = traces.find((t) => t.hook === "PostToolUse");
+        expect(postTrace).toBeDefined();
+        expect(postTrace!.skippedReasons).toContain("no_active_verification_story");
+        expect(postTrace!.policyScenario).toBeNull();
+        expect(postTrace!.primaryStory.id).toBeNull();
+      } finally {
+        removeLedgerArtifacts(noStorySession);
+        try { rmSync(traceDir(noStorySession), { recursive: true, force: true }); } catch {}
+      }
+    });
+  });
+
+  // ---------------------------------------------------------------------------
+  // E2E signal fusion: multi-tool, strong/soft gating, route-scoped resolution
+  // ---------------------------------------------------------------------------
+
+  describe("signal fusion E2E: multi-tool verification closure", () => {
+    const FUSION_SESSION = "signal-fusion-e2e-" + Date.now();
+
+    afterEach(() => {
+      try { unlinkSync(projectPolicyPath(PROJECT_ROOT)); } catch {}
+      try { unlinkSync(sessionExposurePath(FUSION_SESSION)); } catch {}
+      try { rmSync(traceDir(FUSION_SESSION), { recursive: true, force: true }); } catch {}
+    });
+
+    test("Bash curl records clientRequest strong and resolves the correct pending exposure", async () => {
+      const { run } = await import("../hooks/src/posttooluse-verification-observe.mts");
+      const { recordStory, removeLedgerArtifacts, loadObservations } = await import("../hooks/src/verification-ledger.mts");
+      const { storyId: computeStoryId } = await import("../hooks/src/verification-ledger.mts");
+
+      try {
+        recordStory(FUSION_SESSION, "flow-verification", "/dashboard", "dashboard verify", []);
+        const realStoryId = computeStoryId("flow-verification", "/dashboard");
+
+        appendSkillExposure(exposure("fusion-curl-e1", {
+          sessionId: FUSION_SESSION,
+          targetBoundary: "clientRequest",
+          storyId: realStoryId,
+          route: "/dashboard",
+          createdAt: T0,
+        }));
+
+        run(JSON.stringify({
+          tool_name: "Bash",
+          tool_input: { command: "curl http://localhost:3000/dashboard" },
+          session_id: FUSION_SESSION,
+        }));
+
+        // Observation was recorded
+        const observations = loadObservations(FUSION_SESSION);
+        const curlObs = observations.find((o) => o.meta?.matchedPattern === "http-client");
+        expect(curlObs).toBeDefined();
+        expect(curlObs!.boundary).toBe("clientRequest");
+        expect(curlObs!.meta?.signalStrength).toBe("strong");
+        expect(curlObs!.meta?.evidenceSource).toBe("bash");
+
+        // Strong signal → exposure resolved
+        const exposures = loadSessionExposures(FUSION_SESSION);
+        const resolved = exposures.filter((e) => e.outcome === "win" || e.outcome === "directive-win");
+        expect(resolved.length).toBeGreaterThanOrEqual(1);
+        expect(resolved[0].id).toBe("fusion-curl-e1");
+      } finally {
+        removeLedgerArtifacts(FUSION_SESSION);
+      }
+    });
+
+    test(".env.local read records environment soft, affects plan state, does NOT resolve routing-policy", async () => {
+      const { run } = await import("../hooks/src/posttooluse-verification-observe.mts");
+      const { recordStory, removeLedgerArtifacts, loadObservations } = await import("../hooks/src/verification-ledger.mts");
+
+      try {
+        recordStory(FUSION_SESSION, "flow-verification", "/settings", "env check", []);
+
+        // Seed a pending exposure for environment boundary
+        appendSkillExposure(exposure("fusion-env-e1", {
+          sessionId: FUSION_SESSION,
+          targetBoundary: "environment",
+          route: "/settings",
+          createdAt: T0,
+        }));
+
+        run(JSON.stringify({
+          tool_name: "Read",
+          tool_input: { file_path: "/repo/.env.local" },
+          session_id: FUSION_SESSION,
+        }));
+
+        // Observation recorded in ledger (plan state affected)
+        const observations = loadObservations(FUSION_SESSION);
+        const envObs = observations.find((o) => o.meta?.evidenceSource === "env-read");
+        expect(envObs).toBeDefined();
+        expect(envObs!.boundary).toBe("environment");
+        expect(envObs!.meta?.signalStrength).toBe("soft");
+        expect(envObs!.meta?.toolName).toBe("Read");
+
+        // Routing policy NOT updated — exposure stays pending
+        const exposures = loadSessionExposures(FUSION_SESSION);
+        expect(exposures.filter((e) => e.outcome === "pending")).toHaveLength(1);
+        expect(exposures[0].id).toBe("fusion-env-e1");
+
+        // Project policy has zero wins
+        const policy = loadProjectRoutingPolicy(PROJECT_ROOT);
+        const scenarioKey = "PreToolUse|flow-verification|environment|Bash|/settings";
+        const stats = policy.scenarios[scenarioKey]?.["agent-browser-verify"];
+        if (stats) {
+          expect(stats.wins).toBe(0);
+          expect(stats.directiveWins).toBe(0);
+        }
+      } finally {
+        removeLedgerArtifacts(FUSION_SESSION);
+      }
+    });
+
+    test("server log read records serverHandler soft, affects plan state only, does NOT resolve routing-policy", async () => {
+      const { run } = await import("../hooks/src/posttooluse-verification-observe.mts");
+      const { recordStory, removeLedgerArtifacts, loadObservations } = await import("../hooks/src/verification-ledger.mts");
+
+      try {
+        recordStory(FUSION_SESSION, "flow-verification", "/dashboard", "log inspect", []);
+
+        appendSkillExposure(exposure("fusion-log-e1", {
+          sessionId: FUSION_SESSION,
+          targetBoundary: "serverHandler",
+          route: "/dashboard",
+          createdAt: T0,
+        }));
+
+        run(JSON.stringify({
+          tool_name: "Read",
+          tool_input: { file_path: "/repo/.next/server/app.log" },
+          session_id: FUSION_SESSION,
+        }));
+
+        // Observation recorded (plan state affected)
+        const observations = loadObservations(FUSION_SESSION);
+        const logObs = observations.find((o) => o.meta?.evidenceSource === "log-read");
+        expect(logObs).toBeDefined();
+        expect(logObs!.boundary).toBe("serverHandler");
+        expect(logObs!.meta?.signalStrength).toBe("soft");
+
+        // Routing policy NOT updated — exposure stays pending
+        const exposures = loadSessionExposures(FUSION_SESSION);
+        expect(exposures.filter((e) => e.outcome === "pending")).toHaveLength(1);
+        expect(exposures[0].id).toBe("fusion-log-e1");
+      } finally {
+        removeLedgerArtifacts(FUSION_SESSION);
+      }
+    });
+
+    test("Bash browser command records uiRender strong and resolves only matching story/route", async () => {
+      const { run } = await import("../hooks/src/posttooluse-verification-observe.mts");
+      const { recordStory, removeLedgerArtifacts, loadObservations } = await import("../hooks/src/verification-ledger.mts");
+      const { storyId: computeStoryId } = await import("../hooks/src/verification-ledger.mts");
+
+      try {
+        recordStory(FUSION_SESSION, "flow-verification", "/dashboard", "browser verify", []);
+        const realStoryId = computeStoryId("flow-verification", "/dashboard");
+
+        // Exposure on /dashboard (uiRender)
+        appendSkillExposure(exposure("fusion-browser-dash", {
+          sessionId: FUSION_SESSION,
+          targetBoundary: "uiRender",
+          storyId: realStoryId,
+          route: "/dashboard",
+          createdAt: T0,
+        }));
+
+        // Exposure on /settings (uiRender) — different route, should NOT be resolved
+        appendSkillExposure(exposure("fusion-browser-settings", {
+          sessionId: FUSION_SESSION,
+          targetBoundary: "uiRender",
+          storyId: realStoryId,
+          route: "/settings",
+          createdAt: T1,
+        }));
+
+        run(JSON.stringify({
+          tool_name: "Bash",
+          tool_input: { command: "open http://localhost:3000/dashboard" },
+          session_id: FUSION_SESSION,
+        }));
+
+        // Observation is uiRender + strong
+        const observations = loadObservations(FUSION_SESSION);
+        const browserObs = observations.find((o) => o.boundary === "uiRender");
+        expect(browserObs).toBeDefined();
+        expect(browserObs!.meta?.signalStrength).toBe("strong");
+        expect(browserObs!.meta?.evidenceSource).toBe("browser");
+
+        // Only /dashboard exposure resolved; /settings stays pending
+        const exposures = loadSessionExposures(FUSION_SESSION);
+        const dashExposure = exposures.find((e) => e.id === "fusion-browser-dash");
+        const settingsExposure = exposures.find((e) => e.id === "fusion-browser-settings");
+        expect(dashExposure!.outcome).toBe("win");
+        expect(settingsExposure!.outcome).toBe("pending");
+      } finally {
+        removeLedgerArtifacts(FUSION_SESSION);
+      }
+    });
+
+    test("route mismatch resolves nothing; finalizeStaleExposures converts unresolved to stale-miss", async () => {
+      const { run } = await import("../hooks/src/posttooluse-verification-observe.mts");
+      const { recordStory, removeLedgerArtifacts } = await import("../hooks/src/verification-ledger.mts");
+      const { storyId: computeStoryId } = await import("../hooks/src/verification-ledger.mts");
+
+      try {
+        recordStory(FUSION_SESSION, "flow-verification", "/settings", "route mismatch", []);
+        const realStoryId = computeStoryId("flow-verification", "/settings");
+
+        // Exposure targeting /settings clientRequest
+        appendSkillExposure(exposure("fusion-mismatch-e1", {
+          sessionId: FUSION_SESSION,
+          targetBoundary: "clientRequest",
+          storyId: realStoryId,
+          route: "/settings",
+          createdAt: T0,
+        }));
+
+        // Observer sees curl /dashboard — route mismatch with /settings exposure
+        run(JSON.stringify({
+          tool_name: "Bash",
+          tool_input: { command: "curl http://localhost:3000/dashboard" },
+          session_id: FUSION_SESSION,
+        }));
+
+        // Exposure still pending (route mismatch: /dashboard observation vs /settings exposure)
+        const exposures = loadSessionExposures(FUSION_SESSION);
+        expect(exposures).toHaveLength(1);
+        expect(exposures[0].outcome).toBe("pending");
+
+        // Session end: finalize converts to stale-miss
+        const stale = finalizeStaleExposures(FUSION_SESSION, T_END);
+        expect(stale).toHaveLength(1);
+        expect(stale[0].id).toBe("fusion-mismatch-e1");
+        expect(stale[0].outcome).toBe("stale-miss");
+        expect(stale[0].resolvedAt).toBe(T_END);
+
+        // Project policy reflects stale-miss, not a win
+        const policy = loadProjectRoutingPolicy(PROJECT_ROOT);
+        const key = "PreToolUse|flow-verification|clientRequest|Bash|/settings";
+        const stats = policy.scenarios[key]?.["agent-browser-verify"];
+        expect(stats).toBeDefined();
+        expect(stats!.staleMisses).toBe(1);
+        expect(stats!.wins).toBe(0);
+      } finally {
+        removeLedgerArtifacts(FUSION_SESSION);
+      }
+    });
+
+    test("full signal fusion: mixed strong/soft tools in one session, only strong resolves policy", async () => {
+      const { run } = await import("../hooks/src/posttooluse-verification-observe.mts");
+      const { recordStory, removeLedgerArtifacts, loadObservations } = await import("../hooks/src/verification-ledger.mts");
+      const { storyId: computeStoryId } = await import("../hooks/src/verification-ledger.mts");
+
+      try {
+        recordStory(FUSION_SESSION, "flow-verification", "/dashboard", "fusion test", []);
+        const realStoryId = computeStoryId("flow-verification", "/dashboard");
+
+        // Four exposures: one per boundary
+        appendSkillExposure(exposure("fusion-all-cr", {
+          sessionId: FUSION_SESSION,
+          targetBoundary: "clientRequest",
+          storyId: realStoryId,
+          route: "/dashboard",
+          createdAt: T0,
+        }));
+        appendSkillExposure(exposure("fusion-all-env", {
+          sessionId: FUSION_SESSION,
+          targetBoundary: "environment",
+          storyId: realStoryId,
+          route: "/dashboard",
+          createdAt: T1,
+        }));
+        appendSkillExposure(exposure("fusion-all-sh", {
+          sessionId: FUSION_SESSION,
+          targetBoundary: "serverHandler",
+          storyId: realStoryId,
+          route: "/dashboard",
+          createdAt: T2,
+        }));
+        appendSkillExposure(exposure("fusion-all-ui", {
+          sessionId: FUSION_SESSION,
+          targetBoundary: "uiRender",
+          storyId: realStoryId,
+          route: "/dashboard",
+          createdAt: T3,
+        }));
+
+        // Step 1: Soft env read — records observation, does NOT resolve policy
+        run(JSON.stringify({
+          tool_name: "Read",
+          tool_input: { file_path: "/repo/.env.local" },
+          session_id: FUSION_SESSION,
+        }));
+
+        // Step 2: Soft log read — records observation, does NOT resolve policy
+        run(JSON.stringify({
+          tool_name: "Read",
+          tool_input: { file_path: "/repo/.next/server/app.log" },
+          session_id: FUSION_SESSION,
+        }));
+
+        // After soft signals: all 4 exposures still pending
+        let exposures = loadSessionExposures(FUSION_SESSION);
+        expect(exposures.filter((e) => e.outcome === "pending")).toHaveLength(4);
+
+        // Step 3: Strong curl — resolves clientRequest exposure only
+        run(JSON.stringify({
+          tool_name: "Bash",
+          tool_input: { command: "curl http://localhost:3000/dashboard" },
+          session_id: FUSION_SESSION,
+        }));
+
+        exposures = loadSessionExposures(FUSION_SESSION);
+        expect(exposures.find((e) => e.id === "fusion-all-cr")!.outcome).toBe("win");
+        expect(exposures.find((e) => e.id === "fusion-all-env")!.outcome).toBe("pending");
+        expect(exposures.find((e) => e.id === "fusion-all-sh")!.outcome).toBe("pending");
+        expect(exposures.find((e) => e.id === "fusion-all-ui")!.outcome).toBe("pending");
+
+        // Step 4: Strong browser — resolves uiRender exposure only
+        run(JSON.stringify({
+          tool_name: "Bash",
+          tool_input: { command: "open http://localhost:3000/dashboard" },
+          session_id: FUSION_SESSION,
+        }));
+
+        exposures = loadSessionExposures(FUSION_SESSION);
+        expect(exposures.find((e) => e.id === "fusion-all-ui")!.outcome).toBe("win");
+
+        // env and serverHandler exposures still pending (soft signals didn't resolve them)
+        expect(exposures.find((e) => e.id === "fusion-all-env")!.outcome).toBe("pending");
+        expect(exposures.find((e) => e.id === "fusion-all-sh")!.outcome).toBe("pending");
+
+        // Verify observations were all recorded
+        const observations = loadObservations(FUSION_SESSION);
+        expect(observations.length).toBeGreaterThanOrEqual(4);
+
+        // Boundaries observed: env, serverHandler, clientRequest, uiRender
+        const boundaries = new Set(observations.map((o) => o.boundary));
+        expect(boundaries.has("environment")).toBe(true);
+        expect(boundaries.has("serverHandler")).toBe(true);
+        expect(boundaries.has("clientRequest")).toBe(true);
+        expect(boundaries.has("uiRender")).toBe(true);
+
+        // Finalize: remaining 2 soft-only exposures become stale-miss
+        const stale = finalizeStaleExposures(FUSION_SESSION, T_END);
+        expect(stale).toHaveLength(2);
+        expect(stale.every((e) => e.outcome === "stale-miss")).toBe(true);
+        const staleIds = stale.map((e) => e.id).sort();
+        expect(staleIds).toEqual(["fusion-all-env", "fusion-all-sh"]);
+
+        // Final policy state: clientRequest and uiRender have wins, others have stale-misses
+        const policy = loadProjectRoutingPolicy(PROJECT_ROOT);
+        const crKey = "PreToolUse|flow-verification|clientRequest|Bash|/dashboard";
+        expect(policy.scenarios[crKey]?.["agent-browser-verify"]?.wins).toBe(1);
+        const uiKey = "PreToolUse|flow-verification|uiRender|Bash|/dashboard";
+        expect(policy.scenarios[uiKey]?.["agent-browser-verify"]?.wins).toBe(1);
+        const envKey = "PreToolUse|flow-verification|environment|Bash|/dashboard";
+        expect(policy.scenarios[envKey]?.["agent-browser-verify"]?.staleMisses).toBe(1);
+        const shKey = "PreToolUse|flow-verification|serverHandler|Bash|/dashboard";
+        expect(policy.scenarios[shKey]?.["agent-browser-verify"]?.staleMisses).toBe(1);
+      } finally {
+        removeLedgerArtifacts(FUSION_SESSION);
+      }
+    });
+  });
+
+  // ---------------------------------------------------------------------------
+  // Regression: companion recall does not distort cap/budget/attribution closure
+  // ---------------------------------------------------------------------------
+
+  describe("companion recall parity guards", () => {
+    afterEach(cleanupFiles);
+
+    test("companion-recalled context exposure does not steal candidate attribution from direct match", () => {
+      // Direct match candidate exposure
+      appendSkillExposure(exposure("comp-parity-e1", {
+        skill: "agent-browser-verify",
+        attributionRole: "candidate",
+        candidateSkill: null,
+        targetBoundary: "uiRender",
+        createdAt: T0,
+      }));
+
+      // Companion-recalled context exposure for the same story
+      appendSkillExposure(exposure("comp-parity-e2", {
+        skill: "verification",
+        attributionRole: "context",
+        candidateSkill: "agent-browser-verify",
+        targetBoundary: "uiRender",
+        createdAt: T1,
+      }));
+
+      // Resolve boundary — both exposures win
+      const resolved = resolveBoundaryOutcome({
+        sessionId: SESSION_ID,
+        boundary: "uiRender",
+        matchedSuggestedAction: false,
+        storyId: "story-1",
+        route: "/dashboard",
+        now: T3,
+      });
+
+      expect(resolved).toHaveLength(2);
+      expect(resolved.every((e) => e.outcome === "win")).toBe(true);
+
+      // Policy should credit ONLY the candidate — context exposures must NOT
+      // affect the routing policy (shouldAffectPolicy gates on attributionRole)
+      const policy = loadProjectRoutingPolicy(PROJECT_ROOT);
+      const key = "PreToolUse|flow-verification|uiRender|Bash|/dashboard";
+      const candidateStats = policy.scenarios[key]?.["agent-browser-verify"];
+      expect(candidateStats).toBeDefined();
+      expect(candidateStats!.wins).toBeGreaterThanOrEqual(1);
+
+      // Context companion must NOT appear in policy scenarios
+      const contextStats = policy.scenarios[key]?.["verification"];
+      expect(contextStats).toBeUndefined();
+    });
+
+    test("policy boost from companion context wins does not exceed direct candidate boost", () => {
+      // Build policy where direct candidate has strong history and companion has mild history
+      const policy = loadProjectRoutingPolicy(PROJECT_ROOT);
+      const key = "PreToolUse|flow-verification|uiRender|Bash|/dashboard";
+      if (!policy.scenarios[key]) policy.scenarios[key] = {};
+      policy.scenarios[key]["agent-browser-verify"] = {
+        exposures: 10,
+        wins: 8,
+        directiveWins: 3,
+        staleMisses: 1,
+        lastUpdatedAt: T0,
+      };
+      policy.scenarios[key]["verification"] = {
+        exposures: 3,
+        wins: 2,
+        directiveWins: 0,
+        staleMisses: 1,
+        lastUpdatedAt: T0,
+      };
+
+      const { saveProjectRoutingPolicy: savePRP } = require("../hooks/src/routing-policy-ledger.mts");
+      savePRP(PROJECT_ROOT, policy);
+
+      const reloaded = loadProjectRoutingPolicy(PROJECT_ROOT);
+
+      // Derive boosts and verify the candidate always gets a higher boost
+      const candidateBoost = derivePolicyBoost(reloaded.scenarios[key]!["agent-browser-verify"]);
+      const companionBoost = derivePolicyBoost(reloaded.scenarios[key]!["verification"]);
+
+      expect(candidateBoost).toBeGreaterThan(companionBoost);
+    });
+
+    test("stale-miss finalization applies only to candidate exposures, not companion context", () => {
+      // Candidate exposure
+      appendSkillExposure(exposure("stale-comp-e1", {
+        skill: "agent-browser-verify",
+        attributionRole: "candidate",
+        targetBoundary: "uiRender",
+        createdAt: T0,
+      }));
+
+      // Companion context exposure
+      appendSkillExposure(exposure("stale-comp-e2", {
+        skill: "verification",
+        attributionRole: "context",
+        candidateSkill: "agent-browser-verify",
+        targetBoundary: "uiRender",
+        createdAt: T1,
+      }));
+
+      // Finalize without boundary resolution
+      const stale = finalizeStaleExposures(SESSION_ID, T_END);
+      expect(stale).toHaveLength(2);
+      expect(stale.every((e) => e.outcome === "stale-miss")).toBe(true);
+
+      // Only candidate exposure should affect routing policy;
+      // context companion is excluded by shouldAffectPolicy
+      const policy = loadProjectRoutingPolicy(PROJECT_ROOT);
+      const key = "PreToolUse|flow-verification|uiRender|Bash|/dashboard";
+      expect(policy.scenarios[key]?.["agent-browser-verify"]?.staleMisses).toBe(1);
+      // Context companion must NOT appear in policy scenarios
+      expect(policy.scenarios[key]?.["verification"]).toBeUndefined();
+    });
+  });
+
+  // ---------------------------------------------------------------------------
+  // Playbook credit-safety: only anchor skill accumulates policy wins
+  // ---------------------------------------------------------------------------
+
+  describe("playbook credit-safe exposure attribution", () => {
+    const PLAYBOOK_SESSION = "playbook-policy-test-" + Date.now();
+
+    afterEach(() => {
+      try { unlinkSync(projectPolicyPath(PROJECT_ROOT)); } catch {}
+      try { unlinkSync(sessionExposurePath(PLAYBOOK_SESSION)); } catch {}
+    });
+
+    test("verified playbook credits only the anchor skill to project policy", () => {
+      // Anchor skill: "verification" (candidate)
+      appendSkillExposure(exposure("pb-anchor", {
+        sessionId: PLAYBOOK_SESSION,
+        skill: "verification",
+        attributionRole: "candidate",
+        candidateSkill: "verification",
+        exposureGroupId: "playbook-group-1",
+        targetBoundary: "clientRequest",
+        storyId: "story-1",
+
route: "/settings", + createdAt: T0, + })); + + // Inserted playbook step 1: "workflow" (context) + appendSkillExposure(exposure("pb-step1", { + sessionId: PLAYBOOK_SESSION, + skill: "workflow", + attributionRole: "context", + candidateSkill: "verification", + exposureGroupId: "playbook-group-1", + targetBoundary: "clientRequest", + storyId: "story-1", + route: "/settings", + createdAt: T1, + })); + + // Inserted playbook step 2: "agent-browser-verify" (context) + appendSkillExposure(exposure("pb-step2", { + sessionId: PLAYBOOK_SESSION, + skill: "agent-browser-verify", + attributionRole: "context", + candidateSkill: "verification", + exposureGroupId: "playbook-group-1", + targetBoundary: "clientRequest", + storyId: "story-1", + route: "/settings", + createdAt: T2, + })); + + // Resolve the boundary — all three exposures match + const resolved = resolveBoundaryOutcome({ + sessionId: PLAYBOOK_SESSION, + boundary: "clientRequest", + matchedSuggestedAction: false, + storyId: "story-1", + route: "/settings", + now: T3, + }); + + expect(resolved).toHaveLength(3); + expect(resolved.every((e) => e.outcome === "win")).toBe(true); + + // Project policy: only the anchor skill ("verification") gets policy credit + const policy = loadProjectRoutingPolicy(PROJECT_ROOT); + const scenarioKey = "PreToolUse|flow-verification|clientRequest|Bash|/settings"; + const anchorStats = policy.scenarios[scenarioKey]?.["verification"]; + expect(anchorStats).toBeDefined(); + expect(anchorStats!.wins).toBe(1); + + // Inserted playbook steps must NOT appear in project policy + const step1Stats = policy.scenarios[scenarioKey]?.["workflow"]; + expect(step1Stats).toBeUndefined(); + + const step2Stats = policy.scenarios[scenarioKey]?.["agent-browser-verify"]; + // agent-browser-verify should have no wins from this playbook batch + // (it may have exposure count from appendSkillExposure but no wins) + if (step2Stats) { + expect(step2Stats.wins).toBe(0); + } + }); + + test("playbook context steps are 
persisted in session ledger for inspection", () => { + appendSkillExposure(exposure("pb-ledger-anchor", { + sessionId: PLAYBOOK_SESSION, + skill: "verification", + attributionRole: "candidate", + candidateSkill: "verification", + exposureGroupId: "playbook-group-2", + targetBoundary: "clientRequest", + storyId: "story-1", + route: "/settings", + createdAt: T0, + })); + + appendSkillExposure(exposure("pb-ledger-step", { + sessionId: PLAYBOOK_SESSION, + skill: "workflow", + attributionRole: "context", + candidateSkill: "verification", + exposureGroupId: "playbook-group-2", + targetBoundary: "clientRequest", + storyId: "story-1", + route: "/settings", + createdAt: T1, + })); + + // Both are in the session ledger + const all = loadSessionExposures(PLAYBOOK_SESSION); + expect(all).toHaveLength(2); + expect(all.find((e) => e.id === "pb-ledger-anchor")!.attributionRole).toBe("candidate"); + expect(all.find((e) => e.id === "pb-ledger-step")!.attributionRole).toBe("context"); + expect(all.find((e) => e.id === "pb-ledger-step")!.candidateSkill).toBe("verification"); + }); + + test("stale-miss finalization for playbook batch credits only anchor", () => { + appendSkillExposure(exposure("pb-stale-anchor", { + sessionId: PLAYBOOK_SESSION, + skill: "verification", + attributionRole: "candidate", + candidateSkill: "verification", + exposureGroupId: "playbook-group-3", + targetBoundary: "clientRequest", + storyId: "story-1", + route: "/settings", + createdAt: T0, + })); + + appendSkillExposure(exposure("pb-stale-step", { + sessionId: PLAYBOOK_SESSION, + skill: "workflow", + attributionRole: "context", + candidateSkill: "verification", + exposureGroupId: "playbook-group-3", + targetBoundary: "clientRequest", + storyId: "story-1", + route: "/settings", + createdAt: T1, + })); + + // Session end — no boundary resolution + const stale = finalizeStaleExposures(PLAYBOOK_SESSION, T_END); + expect(stale).toHaveLength(2); + expect(stale.every((e) => e.outcome === "stale-miss")).toBe(true); 
+ + // Only anchor's stale-miss affects project policy + const policy = loadProjectRoutingPolicy(PROJECT_ROOT); + const key = "PreToolUse|flow-verification|clientRequest|Bash|/settings"; + expect(policy.scenarios[key]?.["verification"]?.staleMisses).toBe(1); + expect(policy.scenarios[key]?.["workflow"]).toBeUndefined(); + }); + }); +}); diff --git a/tests/verification-signal.test.ts b/tests/verification-signal.test.ts new file mode 100644 index 0000000..6267696 --- /dev/null +++ b/tests/verification-signal.test.ts @@ -0,0 +1,453 @@ +import { describe, expect, test } from "bun:test"; +import { + classifyVerificationSignal, + type NormalizedVerificationSignal, +} from "../hooks/src/verification-signal.mts"; + +// --------------------------------------------------------------------------- +// Helper: assert classification result shape +// --------------------------------------------------------------------------- + +function expectSignal( + result: NormalizedVerificationSignal | null, + expected: Partial<NormalizedVerificationSignal>, +): void { + expect(result).not.toBeNull(); + for (const [key, value] of Object.entries(expected)) { + expect((result as Record<string, unknown>)[key]).toBe(value); + } +} + +// --------------------------------------------------------------------------- +// Bash strong signals +// --------------------------------------------------------------------------- + +describe("classifyVerificationSignal — Bash strong signals", () => { + test("curl http request → clientRequest + strong + bash", () => { + const result = classifyVerificationSignal({ + toolName: "Bash", + toolInput: { command: "curl http://localhost:3000/dashboard" }, + }); + expectSignal(result, { + boundary: "clientRequest", + matchedPattern: "http-client", + signalStrength: "strong", + evidenceSource: "bash", + toolName: "Bash", + inferredRoute: "/dashboard", + }); + }); + + test("wget request → clientRequest + strong", () => { + const result = classifyVerificationSignal({ + toolName: "Bash", + toolInput: { command: "wget
http://localhost:3000/api/users" }, + }); + expectSignal(result, { + boundary: "clientRequest", + signalStrength: "strong", + evidenceSource: "bash", + inferredRoute: "/api/users", + }); + }); + + test("playwright command → uiRender + strong + browser", () => { + const result = classifyVerificationSignal({ + toolName: "Bash", + toolInput: { command: "npx playwright test" }, + }); + expectSignal(result, { + boundary: "uiRender", + matchedPattern: "playwright-cli", + signalStrength: "strong", + evidenceSource: "browser", + toolName: "Bash", + }); + }); + + test("open browser URL → uiRender + strong", () => { + const result = classifyVerificationSignal({ + toolName: "Bash", + toolInput: { command: "open https://localhost:3000/settings" }, + }); + expectSignal(result, { + boundary: "uiRender", + signalStrength: "strong", + evidenceSource: "browser", + }); + }); + + test("vercel logs → serverHandler + strong", () => { + const result = classifyVerificationSignal({ + toolName: "Bash", + toolInput: { command: "vercel logs --follow" }, + }); + expectSignal(result, { + boundary: "serverHandler", + signalStrength: "strong", + }); + }); + + test("printenv → environment + strong", () => { + const result = classifyVerificationSignal({ + toolName: "Bash", + toolInput: { command: "printenv DATABASE_URL" }, + }); + expectSignal(result, { + boundary: "environment", + signalStrength: "strong", + evidenceSource: "bash", + }); + }); + + test("Bash with empty command → null", () => { + const result = classifyVerificationSignal({ + toolName: "Bash", + toolInput: { command: "" }, + }); + expect(result).toBeNull(); + }); + + test("Bash with unrecognized command → null", () => { + const result = classifyVerificationSignal({ + toolName: "Bash", + toolInput: { command: "ls -la" }, + }); + expect(result).toBeNull(); + }); +}); + +// --------------------------------------------------------------------------- +// Browser tool strong signals +// 
--------------------------------------------------------------------------- + +describe("classifyVerificationSignal — browser tool strong signals", () => { + test("agent_browser → uiRender + strong + browser", () => { + const result = classifyVerificationSignal({ + toolName: "agent_browser", + toolInput: { url: "http://localhost:3000/profile" }, + }); + expectSignal(result, { + boundary: "uiRender", + matchedPattern: "browser-tool", + signalStrength: "strong", + evidenceSource: "browser", + toolName: "agent_browser", + inferredRoute: "/profile", + }); + }); + + test("mcp__browser__screenshot → uiRender + strong (no URL)", () => { + const result = classifyVerificationSignal({ + toolName: "mcp__browser__screenshot", + toolInput: {}, + }); + expectSignal(result, { + boundary: "uiRender", + signalStrength: "strong", + evidenceSource: "browser", + toolName: "mcp__browser__screenshot", + }); + expect(result!.inferredRoute).toBeNull(); + expect(result!.summary).toBe("mcp__browser__screenshot"); + }); + + test("mcp__playwright__navigate → uiRender + strong with route", () => { + const result = classifyVerificationSignal({ + toolName: "mcp__playwright__navigate", + toolInput: { url: "http://localhost:3000/settings/account" }, + }); + expectSignal(result, { + boundary: "uiRender", + signalStrength: "strong", + inferredRoute: "/settings/account", + }); + }); +}); + +// --------------------------------------------------------------------------- +// HTTP tool strong signals +// --------------------------------------------------------------------------- + +describe("classifyVerificationSignal — HTTP tool strong signals", () => { + test("WebFetch → clientRequest + strong + http", () => { + const result = classifyVerificationSignal({ + toolName: "WebFetch", + toolInput: { url: "https://example.com/api/data" }, + }); + expectSignal(result, { + boundary: "clientRequest", + matchedPattern: "web-fetch", + signalStrength: "strong", + evidenceSource: "http", + toolName: "WebFetch", + 
inferredRoute: "/api/data", + }); + }); + + test("WebFetch without URL → null", () => { + const result = classifyVerificationSignal({ + toolName: "WebFetch", + toolInput: {}, + }); + expect(result).toBeNull(); + }); + + test("mcp__fetch__fetch → clientRequest + strong + http", () => { + const result = classifyVerificationSignal({ + toolName: "mcp__fetch__fetch", + toolInput: { url: "http://localhost:3000/api/health" }, + }); + expectSignal(result, { + boundary: "clientRequest", + matchedPattern: "http-tool", + signalStrength: "strong", + evidenceSource: "http", + toolName: "mcp__fetch__fetch", + inferredRoute: "/api/health", + }); + }); + + test("mcp__http__post without URL → still strong http", () => { + const result = classifyVerificationSignal({ + toolName: "mcp__http__post", + toolInput: { body: '{"key":"val"}' }, + }); + expectSignal(result, { + boundary: "clientRequest", + matchedPattern: "http-tool", + signalStrength: "strong", + evidenceSource: "http", + }); + expect(result!.inferredRoute).toBeNull(); + }); +}); + +// --------------------------------------------------------------------------- +// Soft signals — env reads +// --------------------------------------------------------------------------- + +describe("classifyVerificationSignal — env-read soft signals", () => { + test("Read .env.local → environment + soft + env-read", () => { + const result = classifyVerificationSignal({ + toolName: "Read", + toolInput: { file_path: "/repo/.env.local" }, + }); + expectSignal(result, { + boundary: "environment", + matchedPattern: "env-file-read", + signalStrength: "soft", + evidenceSource: "env-read", + toolName: "Read", + }); + expect(result!.inferredRoute).toBeNull(); + }); + + test("Read vercel.json → environment + soft", () => { + const result = classifyVerificationSignal({ + toolName: "Read", + toolInput: { file_path: "/repo/vercel.json" }, + }); + expectSignal(result, { + boundary: "environment", + matchedPattern: "vercel-config-read", + signalStrength: 
"soft", + evidenceSource: "env-read", + }); + }); + + test("Read .vercel/project.json → environment + soft", () => { + const result = classifyVerificationSignal({ + toolName: "Read", + toolInput: { file_path: "/repo/.vercel/project.json" }, + }); + expectSignal(result, { + boundary: "environment", + signalStrength: "soft", + }); + }); + + test("Grep in .env → environment + soft", () => { + const result = classifyVerificationSignal({ + toolName: "Grep", + toolInput: { pattern: "API_KEY", path: ".env" }, + }); + expectSignal(result, { + boundary: "environment", + matchedPattern: "env-grep", + signalStrength: "soft", + evidenceSource: "env-read", + }); + }); + + test("Glob for .env* → environment + soft", () => { + const result = classifyVerificationSignal({ + toolName: "Glob", + toolInput: { pattern: ".env*" }, + }); + expectSignal(result, { + boundary: "environment", + signalStrength: "soft", + evidenceSource: "env-read", + }); + }); +}); + +// --------------------------------------------------------------------------- +// Soft signals — log reads +// --------------------------------------------------------------------------- + +describe("classifyVerificationSignal — log-read soft signals", () => { + test("Read server.log → serverHandler + soft + log-read", () => { + const result = classifyVerificationSignal({ + toolName: "Read", + toolInput: { file_path: "/repo/.next/server/app.log" }, + }); + expectSignal(result, { + boundary: "serverHandler", + matchedPattern: "log-file-read", + signalStrength: "soft", + evidenceSource: "log-read", + toolName: "Read", + }); + }); + + test("Grep in log directory → serverHandler + soft", () => { + const result = classifyVerificationSignal({ + toolName: "Grep", + toolInput: { pattern: "ERROR", path: "/var/log/app.log" }, + }); + expectSignal(result, { + boundary: "serverHandler", + matchedPattern: "log-grep", + signalStrength: "soft", + evidenceSource: "log-read", + }); + }); + + test("Glob for *.log → serverHandler + soft", () => { 
+ const result = classifyVerificationSignal({ + toolName: "Glob", + toolInput: { pattern: "**/*.log" }, + }); + expectSignal(result, { + boundary: "serverHandler", + signalStrength: "soft", + evidenceSource: "log-read", + }); + }); +}); + +// --------------------------------------------------------------------------- +// Unsupported / null cases +// --------------------------------------------------------------------------- + +describe("classifyVerificationSignal — unsupported evidence", () => { + test("Read generic .ts file → null", () => { + const result = classifyVerificationSignal({ + toolName: "Read", + toolInput: { file_path: "/repo/src/index.ts" }, + }); + expect(result).toBeNull(); + }); + + test("Edit → null (mutations, not observations)", () => { + const result = classifyVerificationSignal({ + toolName: "Edit", + toolInput: { file_path: "/repo/src/page.tsx" }, + }); + expect(result).toBeNull(); + }); + + test("Write → null (mutations, not observations)", () => { + const result = classifyVerificationSignal({ + toolName: "Write", + toolInput: { file_path: "/repo/src/page.tsx" }, + }); + expect(result).toBeNull(); + }); + + test("Unknown tool → null", () => { + const result = classifyVerificationSignal({ + toolName: "SomeFutureTool", + toolInput: { data: "test" }, + }); + expect(result).toBeNull(); + }); + + test("Grep in generic path → null", () => { + const result = classifyVerificationSignal({ + toolName: "Grep", + toolInput: { pattern: "foo", path: "/repo/src" }, + }); + expect(result).toBeNull(); + }); + + test("Glob for *.ts → null", () => { + const result = classifyVerificationSignal({ + toolName: "Glob", + toolInput: { pattern: "**/*.ts" }, + }); + expect(result).toBeNull(); + }); +}); + +// --------------------------------------------------------------------------- +// Determinism +// --------------------------------------------------------------------------- + +describe("classifyVerificationSignal — determinism", () => { + test("same input always 
produces same output", () => { + const input = { + toolName: "Bash", + toolInput: { command: "curl http://localhost:3000/dashboard" }, + }; + const r1 = classifyVerificationSignal(input); + const r2 = classifyVerificationSignal(input); + expect(r1).toEqual(r2); + }); + + test("same null result for same unsupported input", () => { + const input = { toolName: "Edit", toolInput: { file_path: "/repo/x.ts" } }; + expect(classifyVerificationSignal(input)).toBeNull(); + expect(classifyVerificationSignal(input)).toBeNull(); + }); +}); + +// --------------------------------------------------------------------------- +// Expected outputs from task spec +// --------------------------------------------------------------------------- + +describe("classifyVerificationSignal — spec examples", () => { + test("curl http://localhost:3000/dashboard → spec output", () => { + const result = classifyVerificationSignal({ + toolName: "Bash", + toolInput: { command: "curl http://localhost:3000/dashboard" }, + }); + expect(result).toEqual({ + boundary: "clientRequest", + matchedPattern: "http-client", + inferredRoute: "/dashboard", + signalStrength: "strong", + evidenceSource: "bash", + summary: "curl http://localhost:3000/dashboard", + toolName: "Bash", + }); + }); + + test("Read .env.local → spec output", () => { + const result = classifyVerificationSignal({ + toolName: "Read", + toolInput: { file_path: "/repo/.env.local" }, + }); + expect(result).toEqual({ + boundary: "environment", + matchedPattern: "env-file-read", + inferredRoute: null, + signalStrength: "soft", + evidenceSource: "env-read", + summary: "/repo/.env.local", + toolName: "Read", + }); + }); +}); diff --git a/tests/verify-plan-cli.test.ts b/tests/verify-plan-cli.test.ts new file mode 100644 index 0000000..3d63734 --- /dev/null +++ b/tests/verify-plan-cli.test.ts @@ -0,0 +1,296 @@ +import { describe, test, expect, beforeEach, afterEach } from "bun:test"; +import { rmSync } from "node:fs"; +import { tmpdir } from "node:os"; 
+import { join, resolve } from "node:path"; +import { + recordObservation, + recordStory, + storyId, + type VerificationObservation, + type VerificationBoundary, +} from "../hooks/src/verification-ledger.mts"; +import { verifyPlan, formatPlanHuman } from "../src/commands/verify-plan.ts"; + +// --------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + +const T0 = "2026-03-26T12:00:00.000Z"; + +function makeObs( + id: string, + boundary: VerificationBoundary | null, + opts?: Partial<VerificationObservation>, +): VerificationObservation { + return { + id, + timestamp: T0, + source: "bash", + boundary, + route: null, + storyId: null, + summary: `obs-${id}`, + ...opts, + }; +} + +let testSessionId: string; + +beforeEach(() => { + testSessionId = `test-cli-${Date.now()}-${Math.random().toString(36).slice(2, 8)}`; +}); + +afterEach(() => { + try { + rmSync(join(tmpdir(), `vercel-plugin-${testSessionId}-ledger`), { recursive: true, force: true }); + } catch { /* ignore */ } +}); + +// --------------------------------------------------------------------------- +// verifyPlan command +// --------------------------------------------------------------------------- + +describe("verifyPlan command", () => { + test("returns empty result for nonexistent session", () => { + const result = verifyPlan({ sessionId: "nonexistent-session-xyz" }); + expect(result.hasStories).toBe(false); + expect(result.observationCount).toBe(0); + }); + + test("returns plan for session with data", () => { + recordStory(testSessionId, "flow-verification", "/settings", "settings fails", ["verification"]); + recordObservation(testSessionId, makeObs("c1", "clientRequest", { route: "/settings" })); + + const result = verifyPlan({ sessionId: testSessionId }); + expect(result.hasStories).toBe(true); + expect(result.stories).toHaveLength(1); + expect(result.observationCount).toBe(1); +
expect(result.satisfiedBoundaries).toContain("clientRequest"); + expect(result.primaryNextAction).not.toBeNull(); + }); + + test("respects agentBrowserAvailable option", () => { + recordStory(testSessionId, "flow-verification", null, "test", []); + recordObservation(testSessionId, makeObs("a", "clientRequest")); + recordObservation(testSessionId, makeObs("b", "serverHandler")); + recordObservation(testSessionId, makeObs("c", "environment")); + + const result = verifyPlan({ + sessionId: testSessionId, + agentBrowserAvailable: false, + }); + expect(result.primaryNextAction).toBeNull(); + expect(result.blockedReasons.some((r) => r.includes("agent-browser"))).toBe(true); + }); + + test("respects devServerLoopGuardHit option", () => { + recordStory(testSessionId, "flow-verification", null, "test", []); + recordObservation(testSessionId, makeObs("a", "clientRequest")); + recordObservation(testSessionId, makeObs("b", "serverHandler")); + recordObservation(testSessionId, makeObs("c", "environment")); + + const result = verifyPlan({ + sessionId: testSessionId, + devServerLoopGuardHit: true, + }); + expect(result.primaryNextAction).toBeNull(); + expect(result.blockedReasons.some((r) => r.includes("loop guard"))).toBe(true); + }); + + test("returns stable JSON for same fixture state", () => { + recordStory(testSessionId, "flow-verification", "/settings", "test", []); + recordObservation(testSessionId, makeObs("s1", "clientRequest", { route: "/settings" })); + recordObservation(testSessionId, makeObs("s2", "serverHandler", { route: "/settings" })); + + const r1 = verifyPlan({ sessionId: testSessionId }); + const r2 = verifyPlan({ sessionId: testSessionId }); + expect(JSON.stringify(r1, null, 2)).toBe(JSON.stringify(r2, null, 2)); + }); + + test("exits zero — does not throw on valid execution", () => { + expect(() => verifyPlan({ sessionId: testSessionId })).not.toThrow(); + }); + + test("auto-detects the most recently updated session ledger", () => { + const olderSessionId = 
`${testSessionId}-zzz-older`; + const newerSessionId = `${testSessionId}-aaa-newer`; + const previousSessionId = process.env.CLAUDE_SESSION_ID; + + try { + recordStory(olderSessionId, "flow-verification", "/older", "older session", []); + recordObservation(olderSessionId, makeObs("older-1", "clientRequest", { route: "/older" })); + + recordStory(newerSessionId, "flow-verification", "/newer", "newer session", []); + recordObservation(newerSessionId, makeObs("newer-1", "serverHandler", { route: "/newer" })); + + delete process.env.CLAUDE_SESSION_ID; + + const result = verifyPlan(); + expect(result.hasStories).toBe(true); + expect(result.stories[0]?.route).toBe("/newer"); + } finally { + if (previousSessionId === undefined) { + delete process.env.CLAUDE_SESSION_ID; + } else { + process.env.CLAUDE_SESSION_ID = previousSessionId; + } + rmSync(join(tmpdir(), `vercel-plugin-${olderSessionId}-ledger`), { recursive: true, force: true }); + rmSync(join(tmpdir(), `vercel-plugin-${newerSessionId}-ledger`), { recursive: true, force: true }); + } + }); +}); + +// --------------------------------------------------------------------------- +// JSON output stability +// --------------------------------------------------------------------------- + +describe("JSON output stability", () => { + test("result has expected shape", () => { + recordStory(testSessionId, "stuck-investigation", null, "hangs on load", []); + recordObservation(testSessionId, makeObs("j1", "environment", { summary: "printenv" })); + + const result = verifyPlan({ sessionId: testSessionId }); + + expect(typeof result.hasStories).toBe("boolean"); + expect(Array.isArray(result.stories)).toBe(true); + expect(typeof result.observationCount).toBe("number"); + expect(Array.isArray(result.satisfiedBoundaries)).toBe(true); + expect(Array.isArray(result.missingBoundaries)).toBe(true); + expect(Array.isArray(result.recentRoutes)).toBe(true); + expect(Array.isArray(result.blockedReasons)).toBe(true); + // primaryNextAction is 
either object or null + expect(result.primaryNextAction === null || typeof result.primaryNextAction === "object").toBe(true); + }); + + test("result is JSON-serializable", () => { + recordStory(testSessionId, "flow-verification", "/", "test", ["skill-a"]); + const result = verifyPlan({ sessionId: testSessionId }); + + const json = JSON.stringify(result); + const parsed = JSON.parse(json); + expect(parsed.hasStories).toBe(true); + }); +}); + +// --------------------------------------------------------------------------- +// Human output format +// --------------------------------------------------------------------------- + +describe("human output format", () => { + test("human output matches JSON data", () => { + recordStory(testSessionId, "flow-verification", "/api/save", "save endpoint fails", ["verification"]); + recordObservation(testSessionId, makeObs("h1", "clientRequest", { route: "/api/save" })); + + const result = verifyPlan({ sessionId: testSessionId }); + const human = formatPlanHuman(result); + + // Human output should reflect the same data + expect(human).toContain("flow-verification"); + expect(human).toContain("/api/save"); + if (result.primaryNextAction) { + expect(human).toContain(result.primaryNextAction.action); + } + }); + + test("human output includes active story details, reason, and other stories summary", () => { + recordStory(testSessionId, "flow-verification", "/settings", "settings broken", ["verification"]); + recordObservation(testSessionId, makeObs("h-s1", "clientRequest", { + route: "/settings", + storyId: storyId("flow-verification", "/settings"), + })); + recordObservation(testSessionId, makeObs("h-s2", "serverHandler", { + route: "/settings", + storyId: storyId("flow-verification", "/settings"), + })); + + recordStory(testSessionId, "flow-verification", "/dashboard", "dashboard broken", ["verification"]); + + const result = verifyPlan({ sessionId: testSessionId }); + const human = formatPlanHuman(result); + + // Active story header 
+ expect(human).toContain("Active story:"); + expect(human).toContain("/dashboard"); + expect(human).toContain("dashboard broken"); + + // Evidence for active story (dashboard has 0 boundaries) + expect(human).toContain("Evidence: 0/4 boundaries satisfied"); + + // Next action with reason + expect(human).toContain("Next action:"); + expect(human).toContain("Reason:"); + + // Other stories compact summary + expect(human).toContain("Other stories:"); + expect(human).toContain("/settings"); + expect(human).toContain("2/4 boundaries satisfied"); + }); +}); + +// --------------------------------------------------------------------------- +// CLI JSON: activeStoryId and storyStates equivalence +// --------------------------------------------------------------------------- + +describe("CLI JSON active-story equivalence", () => { + test("JSON output includes activeStoryId and storyStates array", () => { + recordStory(testSessionId, "flow-verification", "/dashboard", "dashboard save fails", ["verification"]); + + const result = verifyPlan({ sessionId: testSessionId }); + + expect(result.activeStoryId).toBe(storyId("flow-verification", "/dashboard")); + expect(Array.isArray(result.storyStates)).toBe(true); + expect(result.storyStates.length).toBe(1); + expect(result.storyStates[0].storyId).toBe(result.activeStoryId); + }); + + test("active storyStates entry matches top-level fields exactly", () => { + recordStory(testSessionId, "flow-verification", "/settings", "settings broken", ["verification"]); + recordObservation(testSessionId, makeObs("eq-1", "clientRequest", { + route: "/settings", + storyId: storyId("flow-verification", "/settings"), + })); + + recordStory(testSessionId, "flow-verification", "/dashboard", "dashboard broken", ["verification"]); + + const result = verifyPlan({ sessionId: testSessionId }); + + // Active story is /dashboard (more missing boundaries) + expect(result.activeStoryId).toBe(storyId("flow-verification", "/dashboard")); + + const activeState = 
result.storyStates.find((s) => s.storyId === result.activeStoryId); + expect(activeState).toBeDefined(); + + // The active entry's fields must match the top-level projection exactly + expect([...activeState!.satisfiedBoundaries].sort()).toEqual([...result.satisfiedBoundaries].sort()); + expect([...activeState!.missingBoundaries].sort()).toEqual([...result.missingBoundaries].sort()); + expect(activeState!.recentRoutes).toEqual(result.recentRoutes); + expect(JSON.stringify(activeState!.primaryNextAction)).toBe(JSON.stringify(result.primaryNextAction)); + expect(activeState!.blockedReasons).toEqual(result.blockedReasons); + }); + + test("multi-story JSON preserves isolation between active and non-active stories", () => { + recordStory(testSessionId, "flow-verification", "/settings", "settings broken", ["verification"]); + recordObservation(testSessionId, makeObs("ms-1", "clientRequest", { + route: "/settings", + storyId: storyId("flow-verification", "/settings"), + })); + recordObservation(testSessionId, makeObs("ms-2", "serverHandler", { + route: "/settings", + storyId: storyId("flow-verification", "/settings"), + })); + + recordStory(testSessionId, "flow-verification", "/dashboard", "dashboard broken", ["verification"]); + + const result = verifyPlan({ sessionId: testSessionId }); + + // Active story (/dashboard) has 0 satisfied, 4 missing + expect(result.satisfiedBoundaries).toHaveLength(0); + expect(result.missingBoundaries).toHaveLength(4); + + // Non-active story (/settings) has 2 satisfied + const settingsState = result.storyStates.find((s) => s.route === "/settings"); + expect(settingsState!.satisfiedBoundaries).toContain("clientRequest"); + expect(settingsState!.satisfiedBoundaries).toContain("serverHandler"); + expect(settingsState!.satisfiedBoundaries).toHaveLength(2); + }); +});