mongrel-intelligence · zbigniewsobiecki · May 9, 2026 · May 8, 2026 · May 8, 2026 · May 9, 2026
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -107,7 +107,7 @@ cascade projects credentials-set <id> --key GITHUB_TOKEN_REVIEWER --value ghp_..
 ## Agent triggers
 
 Trigger format is category-prefixed: `{category}:{event}`
-(e.g. `pm:status-changed`, `scm:check-suite-success`, `alerting:issue-created`).
+(e.g. `pm:status-changed`, `scm:check-suite-success`, `alerting:issue-alert`).
 
 Configs live in the `agent_trigger_configs` table. Manage via:
 

diff --git a/README.md b/README.md
@@ -30,7 +30,7 @@ docker compose exec dashboard node dist/tools/create-admin-user.mjs \
   --email admin@example.com --password changeme --name "Admin"
 ```
 
-Open **http://localhost:3001** and log in with your admin credentials.
+Open **http://localhost:3001** and log in with your admin credentials. The router listens on **http://localhost:3000** for provider webhooks.
 
 For the full setup walkthrough — projects, credentials, webhooks, and triggers — see **[Getting Started](./docs/getting-started.md)**.
 
@@ -51,9 +51,8 @@ For the full setup walkthrough — projects, credentials, webhooks, and triggers
 
 ## 🏗️ Architecture
 
-<p align="center">
-  <img src="docs/architecture.jpg" alt="CASCADE architecture diagram" />
-</p>
+> The architecture diagram source lives at [`docs/architecture.d2`](./docs/architecture.d2).
+> Render it locally with the [D2 CLI](https://d2lang.com/): `d2 docs/architecture.d2 docs/architecture.svg`.
 
 Cascade runs as three independent services:
 
@@ -152,15 +151,15 @@ All project-level credentials (GitHub tokens, PM keys, LLM API keys) are stored
 
 **Dual-persona GitHub model** — Cascade uses two separate GitHub bot accounts per project (implementer and reviewer) to prevent feedback loops. The implementer writes code and creates PRs; the reviewer reviews and approves them.
 
-**Trigger system** — Events from Trello, JIRA, Linear, and GitHub webhooks are matched against registered `TriggerHandler` instances. Triggers are configured per-project in the database.
+**Trigger system** — Events from Trello, JIRA, Linear, GitHub, and Sentry webhooks are matched against registered `TriggerHandler` instances. Triggers are configured per-project in the database. Event names are category-prefixed, for example `pm:status-changed`, `scm:check-suite-success`, and `alerting:issue-alert`.
 
 **Agent engines** — Agents run through a shared execution lifecycle with a pluggable engine registry. Default engine is `claude-code` (Anthropic Claude Code SDK). Alternatives: `llmist` (supports OpenRouter, Anthropic, OpenAI), `codex` (OpenAI Codex CLI), `opencode` (OpenCode server).
 
 **Credential management** — All secrets are stored in the `project_credentials` table, scoped to a project. Optional AES-256-GCM encryption via `CREDENTIAL_MASTER_KEY`.
 
 **`.cascade/` directory** — Each target repository can include a `.cascade/` directory with hooks that control how the agent sets up the project, lints after edits, and runs tests. See **[`.cascade/` Directory Guide](./docs/cascade-directory.md)**.
 
-**Observable subprocesses** — `cascade-tools` streams child stdout/stderr live to the parent's stderr so LLM-driven agents can see progress as it happens, emits 30-second heartbeats during silent stretches, and enforces both idle-silence and wall-clock timeouts with SIGTERM→SIGKILL escalation across the full process tree. See [spec 013](./docs/specs/013-subprocess-output-streaming.md).
+**Observable subprocesses** — `cascade-tools` streams child stdout/stderr live to the parent's stderr so LLM-driven agents can see progress as it happens, emits 30-second heartbeats during silent stretches, and enforces both idle-silence and wall-clock timeouts with SIGTERM→SIGKILL escalation across the full process tree. See [spec 013](./docs/specs/013-subprocess-output-streaming.md.done).
 
 For deeper documentation on all of these topics, see [CLAUDE.md](./CLAUDE.md).
 

diff --git a/bin/cascade-tools.js b/bin/cascade-tools.js
@@ -1,4 +1,27 @@
 #!/usr/bin/env node
+
+// cascade-tools' stdout is reserved for the JSON envelope agents parse. The
+// worker process at `src/backends/llmist/index.ts` sets
+// `LLMIST_LOG_FILE=<engineLogPath>` AND `LLMIST_LOG_TEE='true'` so its OWN
+// logger tees to both the engine log file AND stdout. Both env vars are in
+// the subprocess allowlist (`src/utils/cascadeEnv.ts`) and pass through to
+// the bash subprocess that runs cascade-tools — making the cascade-tools
+// logger ALSO tee to stdout, polluting the agent's tool-result channel with
+// DEBUG/INFO + ANSI escapes (62% of cascade-tools calls in the 2026-05-09
+// prod corpus). Strip the inherited tee BEFORE the singleton logger is
+// constructed by the bootstrap import below. With LLMIST_LOG_FILE still set,
+// every log line — including the load-bearing `[image-pipeline]
+// work-item-fetch summary` per spec 016 — lands in the engine log the worker
+// collects, so operator observability via `cascade runs logs <runId>` is
+// preserved.
+delete process.env.LLMIST_LOG_TEE;
+// Standalone CLI runs (no LLMIST_LOG_FILE inherited): redirect to /dev/null
+// so dev runs stay envelope-only too. Override for debugging:
+// `LLMIST_LOG_FILE=/tmp/x.log cascade-tools ...`.
+if (!process.env.LLMIST_LOG_FILE) {
+	process.env.LLMIST_LOG_FILE = '/dev/null';
+}
+
 import { readFileSync } from 'node:fs';
 import { dirname, resolve } from 'node:path';
 import { fileURLToPath } from 'node:url';
@@ -24,6 +47,25 @@ pjson.oclif = {
 		globPatterns: ['**/*.js', '!**/dashboard/**', '!**/_shared/**', '!base.js', '!bootstrap.js'],
 	},
 	topicSeparator: ' ',
+	// Explicit topic summaries. Without this block oclif borrows each topic's
+	// description from its FIRST command (see node_modules/@oclif/core
+	// /lib/config/config.js — the line `this._topics.set(name, { description:
+	// c.summary || c.description, name })`). That made bare `cascade-tools
+	// --help` show "pm  Add a checklist with items to a work item..." — a
+	// specific gadget's description leaking into the topic line. Agents reading
+	// bare --help to map the surface got a misleading frame (seen in the 2026-05-09
+	// prod corpus). One truthful sentence per topic.
+	topics: {
+		pm: {
+			description:
+				'Read and write PM work items, comments, and checklists across Trello/JIRA/Linear.',
+		},
+		scm: {
+			description: 'Interact with GitHub PRs: create, review, comment, fetch diffs and CI logs.',
+		},
+		alerting: { description: 'Inspect Sentry alerting issues and events.' },
+		session: { description: 'End the agent session. Exclusive terminal call.' },
+	},
 };
 
 const config = await Config.load({ root, pjson });
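
Review note: the env handling added at the top of `bin/cascade-tools.js` can also be read as a pure function, which may make the two rules easier to unit-test. The sketch below is an assumption for illustration only; `scrubLoggerEnv` and the `Env` type are hypothetical names and are not part of this patch.

```typescript
// Hypothetical helper (not in the patch): the same two env rules as a pure function.
type Env = Record<string, string | undefined>;

function scrubLoggerEnv(env: Env): Env {
	const next: Env = { ...env };
	// Rule 1: drop the inherited tee so the logger never writes to stdout.
	delete next.LLMIST_LOG_TEE;
	// Rule 2: standalone runs have no log file, so send log lines to /dev/null.
	if (!next.LLMIST_LOG_FILE) {
		next.LLMIST_LOG_FILE = '/dev/null';
	}
	return next;
}
```

With an inherited `LLMIST_LOG_FILE` the path is kept and only the tee flag is removed; with an empty environment the result points at `/dev/null`.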

diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
@@ -100,7 +100,7 @@ sequenceDiagram
 
 **YAML-based agent definitions** — Agents are defined declaratively in YAML files specifying identity, capabilities, triggers, prompts, and lifecycle hooks. Definitions resolve via three tiers: in-memory cache, database, then YAML files on disk.
 
-**AsyncLocalStorage credential scoping** — Provider clients (GitHub, Trello, JIRA) use Node.js `AsyncLocalStorage` to scope credentials per-request, preventing cross-request credential leakage.
+**AsyncLocalStorage credential scoping** — Provider clients (GitHub, Trello, JIRA, Linear) and PM dispatch scopes use Node.js `AsyncLocalStorage` to scope credentials and the active PM provider context per request, preventing cross-request credential leakage.
 
 ## Directory Map
 

diff --git a/docs/architecture.d2 b/docs/architecture.d2
@@ -15,12 +15,21 @@ SCM: {
 PM: {
   trello
   jira
+  linear
   style: {
     shadow: true
     fill: Orange
   }
 }
 
+ALERTING: {
+  sentry
+  style: {
+    shadow: true
+    fill: Red
+  }
+}
+
 CASCADE: {
   router
   api
@@ -68,8 +77,9 @@ client.cli <-> CASCADE.api
 
 SCM -> CASCADE.router: webhook triggers
 PM -> CASCADE.router: webhook triggers
+ALERTING -> CASCADE.router: webhook triggers
 
 CASCADE.worker manager -> PM: updates, comments
+CASCADE.worker manager -> ALERTING: issue + event reads
 CASCADE.worker manager -> SCM: PRs, reviews, comments
 SCM -> CASCADE.worker manager: repo + PR contents
-
diff --git a/docs/architecture/01-services.md b/docs/architecture/01-services.md
@@ -96,7 +96,7 @@ The router passes job data to workers via Docker container env vars:
 |----------|---------|
 | `JOB_ID` | Unique job identifier |
 | `JOB_TYPE` | `trello`, `github`, `jira`, `linear`, `sentry`, `manual-run`, `retry-run`, `debug-analysis` |
-| `JOB_DATA` | JSON-encoded job payload |
+| `JOB_DATA` | JSON-encoded job payload; GitHub jobs include `mergeabilityRecheckAttempt` in this payload for deferred re-checks |
 | `CASCADE_CREDENTIAL_KEYS` | Comma-separated list of credential env var names |
 | Individual credential vars | Pre-loaded project credentials (e.g., `GITHUB_TOKEN_IMPLEMENTER`) |
 
@@ -132,7 +132,7 @@ The security scrub in step 8 prevents agent engines (which execute arbitrary LLM
 ### Dispatch flow
 
 `dispatchJob()` switches on the job type:
-- **Webhook jobs** (`trello`, `github`, `jira`, `sentry`) — call the provider-specific webhook processor, which re-runs trigger dispatch and executes the matched agent
+- **Webhook jobs** (`trello`, `github`, `jira`, `linear`, `sentry`) — call the provider-specific webhook processor, which re-runs trigger dispatch and executes the matched agent
 - **Dashboard jobs** (`manual-run`, `retry-run`, `debug-analysis`) — call `processDashboardJob()`, which loads project config and invokes the appropriate runner
 
 ## Dashboard

diff --git a/docs/architecture/02-webhook-pipeline.md b/docs/architecture/02-webhook-pipeline.md
@@ -118,7 +118,7 @@ flowchart TD
 4. **Self-check** — Adapter's `isSelfAuthored()` detects bot's own actions (loop prevention)
 5. **Reaction** — Fire-and-forget emoji reaction on the source event
 6. **Resolve config** — Look up project by platform identifier (board ID, repo, etc.)
-7. **Dispatch triggers** — Within credential scope, call `TriggerRegistry.dispatch()` to find matching agent
+7. **Dispatch triggers** — Within credential scope, call `TriggerRegistry.dispatch()` to find a matching agent. PM router adapters also wrap dispatch in `withPMScopeForDispatch(fullProject, dispatch)` so shared PM gates can resolve the active provider.
 8. **Concurrency** — Check work-item lock (`work-item-lock.ts`) and agent-type concurrency (`agent-type-lock.ts`)
 9. **Ack comment** — Post an acknowledgment comment to the work item or PR
 10. **Build job** — Package trigger result + payload + ack info into a `CascadeJob`
@@ -130,10 +130,11 @@ flowchart TD
 | Mechanism | File | Purpose |
 |-----------|------|---------|
 | Action dedup | `action-dedup.ts` | Prevent processing same webhook delivery twice |
-| Work-item lock | `work-item-lock.ts` | Prevent concurrent agents on the same card/issue |
+| Work-item lock | `work-item-lock.ts` | Prevent duplicate same-agent runs on the same card/issue |
 | Agent-type lock | `agent-type-lock.ts` | Configurable `max_concurrency` per agent type per project |
+| Lock-state classifier | `lock-state-classifier.ts` | Explains blocked webhooks as queued, awaiting worker slot, or wedged lock |
 
-All locks are in-memory with TTL expiry. They are conservative (enqueue-time only) — the worker performs its own verification before executing.
+All locks are in-memory with TTL expiry. Work-item locks are scoped by `(projectId, workItemId, agentType)`: duplicate runs of the same agent are blocked, but different agent types can run concurrently on the same work item. When a lock rejects a webhook, logs distinguish `Awaiting worker slot` from `Work item locked (no active dispatch)`; the latter is a wedged-lock canary and is captured to Sentry.
 
 ## Signature Verification
 

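Review note: the revised lock wording in `02-webhook-pipeline.md` describes an in-memory lock with TTL expiry, keyed by `(projectId, workItemId, agentType)`. A minimal sketch of that scoping follows. It is an assumption, not the contents of `work-item-lock.ts`: the class name, method names, and the explicit `now` parameter are hypothetical and chosen only to keep the example deterministic.

```typescript
// Hypothetical in-memory lock; illustrates the key scoping only.
interface LockKey {
	projectId: string;
	workItemId: string;
	agentType: string;
}

class WorkItemLocks {
	private readonly expiries = new Map<string, number>();
	private readonly ttlMs: number;

	constructor(ttlMs: number) {
		this.ttlMs = ttlMs;
	}

	// All three fields take part in the key, so agent types do not block each other.
	private keyOf(k: LockKey): string {
		return JSON.stringify([k.projectId, k.workItemId, k.agentType]);
	}

	// Returns true when the lock was acquired, false when a live lock already exists.
	tryAcquire(k: LockKey, now: number): boolean {
		const key = this.keyOf(k);
		const expiresAt = this.expiries.get(key);
		if (expiresAt !== undefined && expiresAt > now) {
			return false;
		}
		this.expiries.set(key, now + this.ttlMs);
		return true;
	}

	release(k: LockKey): void {
		this.expiries.delete(this.keyOf(k));
	}
}
```

Under this sketch a second run of the same agent type on the same card is rejected until the TTL passes or the lock is released, while a different agent type on that card acquires its own lock immediately.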
diff --git a/docs/architecture/03-trigger-system.md b/docs/architecture/03-trigger-system.md
@@ -57,8 +57,11 @@ interface TriggerResult {
   prNumber?: number;
   prUrl?: string;
   prTitle?: string;
-  waitForChecks?: boolean;         // Poll CI before starting
   onBlocked?: () => void;          // Cleanup if job can't be enqueued
+  deferredRecheck?: {
+    delayMs: number;
+    coalesceKey: string;
+  };                               // Schedule a bare delayed re-dispatch
 }
 ```
 
@@ -135,7 +138,11 @@ function registerBuiltInTriggers(registry: TriggerRegistry): void {
 Triggers use category-prefixed events: `{category}:{event-name}`
 - `pm:status-changed`, `pm:label-added`
 - `scm:check-suite-success`, `scm:pr-review-submitted`, `scm:review-requested`
-- `alerting:issue-created`, `alerting:metric-alert`
+- `alerting:issue-alert`, `alerting:metric-alert`
+
+### Deferred re-checks
+
+Handlers that cannot make a final decision yet can return `deferredRecheck: { delayMs, coalesceKey }` with `agentType: null`. The router schedules a coalesced delayed BullMQ job and exits without spawning an agent. GitHub mergeability checks use this path; the worker recognizes re-check jobs via `mergeabilityRecheckAttempt` and captures a Sentry diagnostic if the second pass still cannot resolve state.
 
 ### Config resolution
 
@@ -166,26 +173,30 @@ Each trigger in a YAML agent definition can declare a `contextPipeline` — an o
 
 `src/triggers/shared/agent-execution.ts`
 
-After a trigger matches, the shared execution layer handles the agent lifecycle:
+After a trigger matches, the shared execution layer handles the agent lifecycle. `runAgentExecutionPipeline()` is intentionally a thin facade: it keeps the source-compatible call signature used by PM, GitHub, Sentry, and manual paths, while delegating each execution concern to helper modules under `src/triggers/shared/`.
 
 ```mermaid
 flowchart TD
-    A[Trigger matched] --> B[PM lifecycle: prepareForAgent]
-    B --> C[Check budget]
-    C -->|Over budget| D[Post budget warning, skip]
-    C -->|Within budget| E[Resolve agent definition]
-    E --> F[Set credential scope]
+    A[Trigger matched] --> B[Guard and context setup]
+    B --> C[Validation and budget preflight]
+    C -->|Blocked| D[Notify PM/callbacks and stop]
+    C -->|Allowed| E[Persist work-item and PR links]
+    E --> F[PM lifecycle: prepareForAgent]
     F --> G[Run agent via engine]
-    G -->|Success| H[PM lifecycle: handleSuccess]
-    G -->|Failure| I[PM lifecycle: handleFailure]
-    H --> J[Trigger debug analysis if configured]
-    I --> J
+    G --> H[Post-run side effects]
+    H --> I[PM lifecycle cleanup and success/failure]
+    I --> J[Source callbacks]
+    J --> K[Follow-up dispatch]
+    K --> L[Auto-debug if eligible]
 ```
 
 This includes:
-- PM lifecycle management (move card to "In Progress", post labels)
-- Budget checking (`workItemBudgetUsd`)
-- Credential scoping via `withCredentials()`
-- Agent execution via `runAgent()` (see [05-engine-backends](./05-engine-backends.md))
-- Post-run lifecycle (move card to "In Review", link PR, sync checklists)
-- Debug analysis triggering on failure
+- Context setup in `agent-execution-runtime.ts`: build the `PMLifecycleManager`, load agent lifecycle hooks, and re-resolve `workItemId` from PR links when a webhook arrived before the DB mapping existed.
+- Validation and lifecycle preflight in `agent-execution-lifecycle.ts`: validate PM/SCM integrations, notify PM/callbacks on validation failure, check `workItemBudgetUsd`, and run `prepareForAgent`.
+- Work-item and PR traceability in `agent-work-items.ts`: create/update work-item records, maintain PR/work-item links before and after execution, fetch PR titles, and backfill run PR numbers.
+- Agent execution in `agent-execution-runtime.ts`: call `runAgent()` with the resolved input plus project, config, and remaining budget.
+- Post-run PM behavior in `agent-pm-summary.ts` and `agent-execution-lifecycle.ts`: post review/output summaries to the PM work item, handle artifacts, post budget warnings, clean up processing state, and call `handleSuccess` or `handleFailure`.
+- Follow-up dispatch in `agent-execution-followups.ts`: dispatch review after a successful implementation PR once CI is passing and the review dedup key is claimed, and chain backlog-manager after a successful splitting run when the auto label/capacity checks allow it.
+- Auto-debug in `agent-auto-debug.ts`: fire-and-forget debug analysis for eligible failed or timed-out runs after callbacks and follow-up dispatch complete.
+
+Credential scoping still happens before the facade runs. PM webhook handling enters provider credentials and PM provider scope before dispatch; GitHub and Sentry use `webhook-execution.ts` / `credential-scope.ts` to inject LLM keys, PM credentials, PM provider scope, and GitHub persona tokens as needed.
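
Review note: the "Deferred re-checks" section added to `03-trigger-system.md` says a handler can return `deferredRecheck: { delayMs, coalesceKey }` with `agentType: null`, and that the router then schedules a coalesced delayed job instead of spawning an agent. The sketch below shows one way that decision could be modelled. It is an assumption: `planDispatch`, the `Plan` union, and the first-request-wins coalescing via a `Set` are hypothetical, and the real router uses BullMQ rather than an in-memory set.

```typescript
// Hypothetical model of the router decision; not the actual router code.
interface DeferredRecheck {
	delayMs: number;
	coalesceKey: string;
}

interface TriggerResultSketch {
	agentType: string | null;
	deferredRecheck?: DeferredRecheck;
}

type Plan =
	| { kind: 'spawn-agent'; agentType: string }
	| { kind: 'schedule-recheck'; delayMs: number; coalesceKey: string }
	| { kind: 'coalesced'; coalesceKey: string }
	| { kind: 'skip' };

// `pending` holds coalesce keys that already have a delayed job scheduled.
function planDispatch(result: TriggerResultSketch, pending: Set<string>): Plan {
	if (result.agentType !== null) {
		return { kind: 'spawn-agent', agentType: result.agentType };
	}
	const recheck = result.deferredRecheck;
	if (!recheck) {
		return { kind: 'skip' };
	}
	if (pending.has(recheck.coalesceKey)) {
		// A delayed job for this key already exists, so do not schedule another.
		return { kind: 'coalesced', coalesceKey: recheck.coalesceKey };
	}
	pending.add(recheck.coalesceKey);
	return { kind: 'schedule-recheck', delayMs: recheck.delayMs, coalesceKey: recheck.coalesceKey };
}
```

Two webhooks that return the same `coalesceKey` in quick succession would produce one scheduled re-check and one coalesced no-op under this model; whether the real implementation keeps the first or the latest request is not stated in the docs.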
diff --git a/docs/architecture/04-agent-system.md b/docs/architecture/04-agent-system.md
@@ -117,7 +117,7 @@ Key functions:
 | `respond-to-planning-comment` | fs, session, pm | Implementer | `pm:comment-mention` |
 | `backlog-manager` | fs, session, pm, scm:read | Implementer | `pm:status-changed` (backlog, merged) |
 | `resolve-conflicts` | fs, shell, session, scm | Implementer | `scm:pr-conflict-detected` |
-| `alerting` | fs, shell, session, alerting, scm | Implementer | `alerting:issue-created`, `alerting:metric-alert` |
+| `alerting` | fs, shell, session, alerting, scm | Implementer | `alerting:issue-alert`, `alerting:metric-alert` |
 | `debug` | fs, session, pm | Implementer | `internal:debug-analysis` |
 
 ## Capabilities