30 commits
043f6af
feat: add verification planning workflow
johnlindquist Mar 27, 2026
5e6bf6d
feat: add verification directive handoff
johnlindquist Mar 27, 2026
c311be9
feat: close verification feedback loop
johnlindquist Mar 27, 2026
b20fb3b
feat: add verified routing policy
johnlindquist Mar 27, 2026
c9640cb
fix: scope routing policy evidence
johnlindquist Mar 27, 2026
5e546b0
ploop: iteration 3 checkpoint
johnlindquist Mar 27, 2026
743a015
feat: harden routing replay learning
johnlindquist Mar 27, 2026
83f032c
feat: unify verification directive flow
johnlindquist Mar 27, 2026
8b1e288
test: harden verification routing parity
johnlindquist Mar 27, 2026
10e08bc
feat: unify session diagnostics and skill exclusions
johnlindquist Mar 27, 2026
154bedc
feat: add route-scoped policy recall
johnlindquist Mar 27, 2026
3bbedb4
fix: harden route-scoped policy recall
johnlindquist Mar 27, 2026
c43be00
feat: explain routing recall decisions
johnlindquist Mar 27, 2026
575d359
feat: scope verification state by story
johnlindquist Mar 28, 2026
b4d0c0a
feat: attribute routing policy credit precisely
johnlindquist Mar 28, 2026
8dbd783
ploop: iteration 3 checkpoint
johnlindquist Mar 28, 2026
e03f8b9
feat: bind prompt routing to verification boundaries
johnlindquist Mar 28, 2026
56bf864
feat: add verified prompt policy recall
johnlindquist Mar 28, 2026
64b609c
feat: add verified rule learning workflow
johnlindquist Mar 28, 2026
7eaf6ca
feat: separate learned promotions from policy evidence
johnlindquist Mar 28, 2026
4cc886d
feat: persist learned routing rulebooks
johnlindquist Mar 28, 2026
d4fbb08
feat: broaden verification signal observation
johnlindquist Mar 28, 2026
1130b3d
feat: gate verification policy on local provenance
johnlindquist Mar 28, 2026
8da3ba0
feat: add verification closure diagnostics
johnlindquist Mar 28, 2026
dfb42ef
feat: learn and recall verified skill companions
johnlindquist Mar 28, 2026
bb7f861
feat: diagnose companion recall routing
johnlindquist Mar 28, 2026
0ceb348
feat: add routing decision causality
johnlindquist Mar 28, 2026
035a615
feat: add verified playbook recall
johnlindquist Mar 28, 2026
7618898
fix: harden verified playbook attribution
johnlindquist Mar 28, 2026
25757fc
fix: gate verified playbook side effects
johnlindquist Mar 28, 2026
56 changes: 56 additions & 0 deletions docs/02-injection-pipeline.md
Original file line number Diff line number Diff line change
@@ -30,6 +30,7 @@ This document explains how vercel-plugin decides **which skills to inject**, **w
- [Vercel.json Key-Aware Routing](#verceljson-key-aware-routing)
- [Profiler Boost](#profiler-boost)
- [Setup Mode Routing](#setup-mode-routing)
- [Route-Scoped Verified Policy Recall](#route-scoped-verified-policy-recall)
- [Unified Ranker](#unified-ranker)
7. [Dedup State Machine](#dedup-state-machine)
- [Three State Sources](#three-state-sources)
@@ -481,6 +482,61 @@ effectivePriority = base + vercelJsonAdjustment + profilerBoost

When the project is greenfield (`VERCEL_PLUGIN_GREENFIELD=true`), the `bootstrap` skill gets a massive priority boost of **+50**, ensuring it's always injected first. If `bootstrap` didn't naturally match the tool call, it's synthetically added to the match set.

### Route-Scoped Verified Policy Recall

**Source**: `hooks/src/policy-recall.mts` → `selectPolicyRecallCandidates()`, integrated in `pretooluse-skill-inject.mts` at Stage 4.95

After all pattern-matched skills are ranked and before injection, the hook checks whether an **active verification story** with a non-null `targetBoundary` exists. If so, it queries the project's routing policy for historically winning skills that pattern matching missed.

**Preconditions** (all must be true):
1. `cwd` and `sessionId` are available
2. An active verification story exists (via `loadCachedPlanResult` → `selectPrimaryStory`)
3. `primaryNextAction.targetBoundary` is non-null

**Lookup precedence** (first qualifying bucket wins — no cross-bucket merging):
1. **Exact route** — e.g. `PreToolUse|flow-verification|clientRequest|Bash|/settings`
2. **Wildcard route** — e.g. `PreToolUse|flow-verification|clientRequest|Bash|*`
3. **Legacy 4-part key** — e.g. `PreToolUse|flow-verification|clientRequest|Bash`
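
The precedence list above can be sketched as a small key generator. This is a minimal illustration, not the plugin's actual API: the `RouteScenario` field names, the function name `scenarioKeyCandidatesSketch`, and the choice to skip the exact-route key when no route is known are all assumptions.

```typescript
// Hypothetical scenario shape; the real type in policy-recall.mts may differ.
interface RouteScenario {
  hook: string;           // e.g. "PreToolUse"
  storyKind: string;      // e.g. "flow-verification" (field name is an assumption)
  targetBoundary: string; // e.g. "clientRequest"
  toolName: string;       // e.g. "Bash"
  route?: string | null;  // e.g. "/settings"
}

// Returns candidate keys in lookup order: exact route, wildcard route, legacy 4-part key.
function scenarioKeyCandidatesSketch(s: RouteScenario): string[] {
  const legacy = [s.hook, s.storyKind, s.targetBoundary, s.toolName].join("|");
  const keys: string[] = [];
  // Assumption: without a concrete route there is no exact-route key to check.
  if (s.route && s.route !== "*") keys.push(`${legacy}|${s.route}`);
  keys.push(`${legacy}|*`, legacy);
  return keys;
}
```

For the `/settings` example, this yields the three keys listed above, in the same order.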

**Qualification thresholds** (same conservatism as `derivePolicyBoost`):
- Minimum 3 exposures
- Minimum 65% success rate (weighted: `directiveWins` count at 0.25×)
- Minimum +2 policy boost

**Tie-breaking** is deterministic: `recallScore` DESC → `exposures` DESC → skill name ASC (lexicographic).

**Insertion behavior** — recalled skills are **bounded second-order candidates**, not slot-1 overrides:
- When direct pattern matches exist: recalled skill inserts at index 1 (behind the top direct match)
- When no direct matches exist: recalled skill takes index 0
- At most 1 recalled skill per PreToolUse invocation (`maxCandidates: 1`)
- Skills already in `rankedSkills` or `injectedSkills` (dedup) are excluded
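
A minimal sketch of the insertion rules above, assuming `rankedSkills` is an ordered array of skill names and `injectedSkills` is a set. The function name `insertRecalledSkill` and the return shape are hypothetical, and the skill names in the usage note are placeholders.

```typescript
// Splices one recalled skill behind the top direct match, or at index 0 when
// nothing matched. Returns a copy; the input array is left untouched.
function insertRecalledSkill(
  rankedSkills: readonly string[],
  injectedSkills: ReadonlySet<string>,
  recalled: string,
): { ranked: string[]; insertionIndex: number | null } {
  // Skills already ranked or already injected (dedup) are excluded.
  if (rankedSkills.includes(recalled) || injectedSkills.has(recalled)) {
    return { ranked: [...rankedSkills], insertionIndex: null };
  }
  const insertionIndex = rankedSkills.length > 0 ? 1 : 0;
  const ranked = [...rankedSkills];
  ranked.splice(insertionIndex, 0, recalled);
  return { ranked, insertionIndex };
}
```

With direct matches `["skill-a", "skill-b"]`, recalling `"verification"` gives `["skill-a", "verification", "skill-b"]`. With no direct matches, it gives `["verification"]`.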

**How recall differs from ordinary policy boosts**:
- Policy boosts adjust `effectivePriority` of already-matched skills — they only amplify what pattern matching found
- Policy recall **injects a skill that pattern matching missed entirely**, based on historical verification evidence
- Recalled skills are marked `synthetic: true` in the routing decision trace
- Recalled skills use `trigger: "policy-recall"` and `reasonCode: "route-scoped-verified-policy-recall"` in injection metadata
- Recalled skills are NOT forced to summary-only mode (the summary and full payloads are identical via `skillInvocationMessage`)

**Trace output**: Recalled candidates appear in `ranked[]` with:
```json
{
  "skill": "verification",
  "synthetic": true,
  "pattern": {
    "type": "policy-recall",
    "value": "route-scoped-verified-policy-recall"
  }
}
```

**When recall is skipped**, a `policy-recall-skipped` log line is emitted with reason `no_active_verification_story` or `no_target_boundary`.
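
The gate that produces those two skip reasons might look roughly like the sketch below. It assumes `cwd` and `sessionId` have already been checked and that the story has already been loaded. `StorySketch`, `RecallGate`, and `checkRecallPreconditions` are hypothetical names, not the hook's real types.

```typescript
type SkipReason = "no_active_verification_story" | "no_target_boundary";

// Hypothetical, trimmed-down view of a verification story.
interface StorySketch {
  primaryNextAction?: { targetBoundary: string | null } | null;
}

type RecallGate =
  | { run: true; targetBoundary: string }
  | { run: false; reason: SkipReason };

// Maps the documented preconditions onto the documented skip reasons.
function checkRecallPreconditions(story: StorySketch | null): RecallGate {
  if (!story) return { run: false, reason: "no_active_verification_story" };
  const boundary = story.primaryNextAction?.targetBoundary ?? null;
  if (boundary === null) return { run: false, reason: "no_target_boundary" };
  return { run: true, targetBoundary: boundary };
}
```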

**Observability**:
- `policy-recall-lookup` is emitted before any recalled skill is inserted
- It includes `requestedScenario`, `checkedScenarios[]`, `selectedBucket`, `selectedSkills[]`, `rejected[]`, and `hintCodes[]`
- This is the canonical machine-readable explanation for why route-scoped recall did or did not fire

### Unified Ranker

**Source**: `patterns.mts:rankEntries()`
75 changes: 75 additions & 0 deletions docs/06-runtime-internals.md
@@ -28,6 +28,7 @@ This document covers implementation details that go beyond the pipeline overview
- [Profiler Boost (+5)](#profiler-boost-5)
- [Vercel.json Key Routing (±10)](#verceljson-key-routing-10)
- [Special-Case Boosts](#special-case-boosts)
- [Route-Scoped Verified Policy Recall](#route-scoped-verified-policy-recall)
- [Ranking Function](#ranking-function)
- [Budget Enforcement](#budget-enforcement)
- [Prompt Signal Scoring](#prompt-signal-scoring)
@@ -507,6 +508,7 @@ Every matched skill receives an **effective priority** computed from its base pr
| Setup-mode bootstrap | **+50** | `bootstrap` skill | Greenfield or ≥3 bootstrap hints |
| TSX review trigger | **+40** | `react-best-practices` | After N `.tsx` edits (default 3) |
| Dev-server verify | **+45** | `agent-browser-verify` | Dev server command detected |
| Policy recall | *splice at idx 1 (idx 0 if no direct matches)* | Any verified skill | Active story + target boundary + policy evidence |

### Base Priority Range (4–8)

@@ -556,6 +558,79 @@ If the skill's associated key **exists** in `vercel.json` → +10. If the skill
- **Dev-server verify (+45)**: On `npm run dev`, `next dev`, `vercel dev`, etc., injects `agent-browser-verify` + `verification` companion. Capped at 2 injections per session (loop guard).
- **Vercel env help**: One-time injection when `vercel env add/update/pull` commands are detected.

### Route-Scoped Verified Policy Recall

**Source**: `hooks/src/policy-recall.mts` → `selectPolicyRecallCandidates()`

Policy recall is a **post-ranking injection stage** (Stage 4.95) that fires between ranking and skill body loading. It is fundamentally different from policy boosts:

| Aspect | Policy Boost | Policy Recall |
|--------|-------------|---------------|
| Input | Skill already matched by patterns | Skill **not** matched by patterns |
| Effect | Adjusts `effectivePriority` | Splices skill into `rankedSkills` array |
| Trace field | `policyBoost` (number) | `synthetic: true`, `pattern.type: "policy-recall"` |
| Reason code | `"policy-boost"` | `"route-scoped-verified-policy-recall"` |
| Trigger | Always (when policy data exists) | Only when active verification story + target boundary exist |

**Selector algorithm** (`selectPolicyRecallCandidates`):

1. Generate scenario key candidates via `scenarioKeyCandidates()` — exact route, wildcard (`*`), legacy 4-part key
2. For each candidate key (in precedence order), look up the policy bucket
3. Filter entries: `exposures >= 3`, `successRate >= 0.65`, `policyBoost >= 2`, not in `excludeSkills`
4. Sort: `recallScore` DESC → `exposures` DESC → `skill` ASC
5. Return first qualifying bucket's top `maxCandidates` entries (default 1) — no cross-bucket merging
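
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the body of `selectPolicyRecallCandidates`: it assumes each bucket entry already carries precomputed `successRate`, `policyBoost`, and `recallScore` values, and that a bucket "qualifies" when at least one entry passes the filter. The type and function names are hypothetical.

```typescript
// Hypothetical entry shape with derived values precomputed.
interface PolicyEntrySketch {
  skill: string;
  exposures: number;
  successRate: number;
  policyBoost: number;
  recallScore: number;
}

type PolicyBuckets = Record<string, PolicyEntrySketch[]>;

function selectRecallSketch(
  buckets: PolicyBuckets,
  candidateKeys: readonly string[], // already in precedence order
  excludeSkills: ReadonlySet<string>,
  maxCandidates = 1,
): { selectedBucket: string | null; selected: PolicyEntrySketch[] } {
  for (const key of candidateKeys) {
    const qualified = (buckets[key] ?? [])
      .filter(
        (e) =>
          e.exposures >= 3 &&
          e.successRate >= 0.65 &&
          e.policyBoost >= 2 &&
          !excludeSkills.has(e.skill),
      )
      // filter() returned a fresh array, so sorting does not mutate the bucket.
      .sort(
        (a, b) =>
          b.recallScore - a.recallScore ||
          b.exposures - a.exposures ||
          (a.skill < b.skill ? -1 : a.skill > b.skill ? 1 : 0),
      );
    // First qualifying bucket wins; later buckets are never merged in.
    if (qualified.length > 0) {
      return { selectedBucket: key, selected: qualified.slice(0, maxCandidates) };
    }
  }
  return { selectedBucket: null, selected: [] };
}
```

Because the loop returns as soon as one bucket yields a qualifying entry, a weak exact-route bucket falls through to the wildcard bucket, but a strong one hides everything behind it.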

**Recall score formula**:
```
recallScore = derivePolicyBoost(stats) × 1000
            + round(successRate × 100) × 10
            + directiveWins × 5
            + wins
            − staleMisses
```

Where `successRate = (wins + directiveWins × 0.25) / max(exposures, 1)`.
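
As a worked illustration of the formula, the sketch below computes the score from raw stats. Since the internals of `derivePolicyBoost` are not described here, the boost is passed in as a plain number. `RecallStats`, `weightedSuccessRate`, and `recallScoreSketch` are hypothetical names.

```typescript
interface RecallStats {
  exposures: number;
  wins: number;
  directiveWins: number;
  staleMisses: number;
}

// successRate = (wins + directiveWins × 0.25) / max(exposures, 1)
function weightedSuccessRate(s: RecallStats): number {
  return (s.wins + s.directiveWins * 0.25) / Math.max(s.exposures, 1);
}

// policyBoost stands in for derivePolicyBoost(stats), which is not reproduced here.
function recallScoreSketch(s: RecallStats, policyBoost: number): number {
  return (
    policyBoost * 1000 +
    Math.round(weightedSuccessRate(s) * 100) * 10 +
    s.directiveWins * 5 +
    s.wins -
    s.staleMisses
  );
}
```

For example, with `exposures = 5`, `wins = 4`, `directiveWins = 2`, `staleMisses = 1`, and an assumed boost of 8, the success rate is (4 + 0.5) / 5 = 0.9 and the score is 8000 + 900 + 10 + 4 − 1 = 8913. The boost term carries the largest weight, so the remaining terms mostly act as tie-breakers between skills with equal boosts.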

**Insertion semantics**: The recalled skill is spliced at `index = rankedSkills.length > 0 ? 1 : 0`, ensuring it never preempts the strongest direct match. It then flows through normal budget enforcement and cap logic.

**Synthetic trace marking**: All recalled skills are added to the `syntheticSkills` set and appear in the routing decision trace with:
```json
{
  "skill": "<name>",
  "synthetic": true,
  "pattern": { "type": "policy-recall", "value": "route-scoped-verified-policy-recall" },
  "summaryOnly": false
}
```

**Log events**:
- `policy-recall-injected` (debug): Emitted per recalled skill with `skill`, `scenario`, `insertionIndex`, `exposures`, `wins`, `directiveWins`, `successRate`, `policyBoost`, `recallScore`
- `policy-recall-skipped` (debug): Emitted when preconditions fail, with `reason`: `"no_active_verification_story"` or `"no_target_boundary"`
- `policy-recall-lookup` (debug): Emitted before any recalled skill is inserted, with `requestedScenario`, `checkedScenarios[]`, `selectedBucket`, `selectedSkills[]`, `rejected[]`, and `hintCodes[]`

### Routing Doctor (`session-explain --json`)

`session-explain` includes an additive `doctor` object that explains the latest routing decision without changing routing behavior.

```json
{
  "doctor": {
    "latestDecisionId": "abc123",
    "latestScenario": "PreToolUse|flow-verification|clientRequest|Bash|/settings",
    "latestRanked": [],
    "policyRecall": {
      "selectedBucket": "PreToolUse|flow-verification|clientRequest|Bash|/settings",
      "selected": [],
      "rejected": [],
      "hints": []
    },
    "hints": []
  }
}
```

The contract is additive-only and intended for downstream agents, CI diagnostics, and local operator debugging.
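
A downstream consumer could read the `doctor` object along these lines. This is a hedged sketch: the element types of `latestRanked`, `selected`, `rejected`, and `hints` are not specified in this document, so they are typed as `unknown[]`, and treating `selectedBucket` as nullable is an assumption. `DoctorSketch` and `summarizeDoctor` are hypothetical names.

```typescript
// Mirrors the example payload; array element shapes are left opaque.
interface DoctorSketch {
  latestDecisionId: string;
  latestScenario: string;
  latestRanked: unknown[];
  policyRecall: {
    selectedBucket: string | null; // nullability is an assumption
    selected: unknown[];
    rejected: unknown[];
    hints: unknown[];
  };
  hints: unknown[];
}

// Turns `session-explain --json` output into a one-line summary.
function summarizeDoctor(jsonText: string): string {
  const parsed = JSON.parse(jsonText) as { doctor?: DoctorSketch };
  const doctor = parsed.doctor;
  // The contract is additive, so tolerate output without the object.
  if (!doctor) return "no doctor object";
  const recall = doctor.policyRecall;
  const fired = recall.selected.length > 0;
  return (
    `${doctor.latestScenario}: recall ${fired ? "fired" : "did not fire"} ` +
    `(bucket=${recall.selectedBucket ?? "none"}, rejected=${recall.rejected.length})`
  );
}
```

Reading only the fields it needs and ignoring everything else keeps such a consumer compatible as the additive contract grows.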

### Ranking Function

**Source**: `hooks/src/patterns.mts` → `rankEntries()`
61 changes: 61 additions & 0 deletions docs/skill-injection.md
@@ -764,3 +764,64 @@ This catches `cookies()` calls without `await`, but skips client components (whi
| `VERCEL_PLUGIN_LOG_LEVEL` | `off` | — | `off` / `summary` / `debug` / `trace` |
| `VERCEL_PLUGIN_HOOK_DEDUP` | — | — | `off` to disable dedup entirely |
| `VERCEL_PLUGIN_AUDIT_LOG_FILE` | — | — | Audit log path or `off` |

---

## Learned Routing Rulebook & Capsule Provenance

When the routing-policy compiler promotes verified rules into a **Learned Routing Rulebook**, the ranking pipeline can apply per-rule boosts at injection time. Every decision capsule records which rule (if any) fired via the `rulebookProvenance` field, so downstream consumers never need to re-derive ranking state.

### Canonical Rulebook JSON

```json
{
  "version": 1,
  "createdAt": "2026-03-28T08:15:00.000Z",
  "sessionId": "sess_123",
  "rules": [
    {
      "id": "PreToolUse|flow-verification|uiRender|Bash|agent-browser-verify",
      "scenario": "PreToolUse|flow-verification|uiRender|Bash",
      "skill": "agent-browser-verify",
      "action": "promote",
      "boost": 8,
      "confidence": 0.93,
      "reason": "replay verified: no regressions, learned routing matched winning skill",
      "sourceSessionId": "sess_123",
      "promotedAt": "2026-03-28T08:15:00.000Z",
      "evidence": {
        "baselineWins": 4,
        "baselineDirectiveWins": 2,
        "learnedWins": 4,
        "learnedDirectiveWins": 2,
        "regressionCount": 0
      }
    }
  ]
}
```

### Decision Capsule Provenance

When a rulebook rule fires, the capsule includes:

```json
{
  "rulebookProvenance": {
    "matchedRuleId": "PreToolUse|flow-verification|uiRender|Bash|agent-browser-verify",
    "ruleBoost": 8,
    "ruleReason": "replay verified: no regressions, learned routing matched winning skill",
    "rulebookPath": "/tmp/vercel-plugin-routing-policy-<hash>-rulebook.json"
  }
}
```

When no rule fires, the field is `null`:

```json
{
  "rulebookProvenance": null
}
```

Each ranked entry in the capsule's `ranked` array also carries the per-skill fields `matchedRuleId`, `ruleBoost`, `ruleReason`, and `rulebookPath` for full traceability.
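
To show how the rulebook and the provenance fields relate, here is a minimal sketch that derives a `rulebookProvenance` value from a rulebook. Matching a rule by exact `scenario` and `skill` equality, and considering only `"promote"` rules, are assumptions made for illustration. The helper name `provenanceFor` and the example path in the usage note are hypothetical.

```typescript
// Subset of the rulebook fields shown above.
interface RulebookRule {
  id: string;
  scenario: string;
  skill: string;
  action: string;
  boost: number;
  reason: string;
}

interface Rulebook {
  version: number;
  rules: RulebookRule[];
}

interface RulebookProvenance {
  matchedRuleId: string;
  ruleBoost: number;
  ruleReason: string;
  rulebookPath: string;
}

// Returns the provenance for the first matching rule, or null when no rule fires.
function provenanceFor(
  rulebook: Rulebook,
  rulebookPath: string,
  scenario: string,
  skill: string,
): RulebookProvenance | null {
  // Assumption: a rule fires on exact scenario + skill equality.
  const rule = rulebook.rules.find(
    (r) => r.action === "promote" && r.scenario === scenario && r.skill === skill,
  );
  if (!rule) return null;
  return {
    matchedRuleId: rule.id,
    ruleBoost: rule.boost,
    ruleReason: rule.reason,
    rulebookPath,
  };
}
```

Called with the canonical rulebook above, the scenario `PreToolUse|flow-verification|uiRender|Bash`, and the skill `agent-browser-verify`, this returns an object with `ruleBoost: 8`. Any other scenario or skill returns `null`, which matches the "no rule fires" shape.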
2 changes: 1 addition & 1 deletion generated/build-from-skills.manifest.json
@@ -1,6 +1,6 @@
 {
   "version": 1,
-  "generatedAt": "2026-03-23T18:09:18.986Z",
+  "generatedAt": "2026-03-28T19:42:43.133Z",
   "templates": [
     {
       "template": "agents/ai-architect.md.tmpl",
2 changes: 1 addition & 1 deletion generated/skill-catalog.md
@@ -1,7 +1,7 @@
 # Skill Catalog
 
 > Auto-generated by `scripts/generate-catalog.ts` — do not edit manually.
-> Generated: 2026-03-23T18:07:35.703Z
+> Generated: 2026-03-28T00:24:28.407Z
> Skills: 39

## Table of Contents
8 changes: 7 additions & 1 deletion generated/skill-manifest.json
@@ -1,6 +1,12 @@
 {
-  "generatedAt": "2026-03-23T18:09:21.758Z",
+  "generatedAt": "2026-03-28T19:42:43.059Z",
   "version": 2,
+  "excludedSkills": [
+    {
+      "slug": "fake-banned-test-skill",
+      "reason": "test-only-pattern"
+    }
+  ],
   "skills": {
     "vercel-agent": {
       "priority": 4,
33 changes: 33 additions & 0 deletions hooks/cli-routing-replay.mjs
@@ -0,0 +1,33 @@
// hooks/src/cli-routing-replay.mts
import { replayRoutingSession } from "./routing-replay.mjs";
import { createLogger } from "./logger.mjs";
var log = createLogger();
var sessionId = process.argv[2];
if (!sessionId) {
  log.summary("cli_error", { reason: "missing_session_id" });
  process.stderr.write(
    JSON.stringify({
      ok: false,
      error: "missing_session_id",
      usage: "node cli-routing-replay.mjs <sessionId>"
    }) + "\n"
  );
  process.exit(1);
}
try {
  const report = replayRoutingSession(sessionId);
  log.summary("cli_complete", {
    sessionId,
    traceCount: report.traceCount,
    scenarioCount: report.scenarioCount,
    recommendationCount: report.recommendations.length
  });
  process.stdout.write(JSON.stringify(report, null, 2) + "\n");
} catch (err) {
  const message = err instanceof Error ? err.message : String(err);
  log.summary("cli_error", { reason: "replay_failed", message });
  process.stderr.write(
    JSON.stringify({ ok: false, error: "replay_failed", message }) + "\n"
  );
  process.exit(2);
}