Merged
3 changes: 3 additions & 0 deletions .gitignore
@@ -50,3 +50,6 @@ test-results/

# CASCADE context files (temporary pre-fetched data)
.cascade/context/

# CASCADE friction report sidecar (runtime artifact — must not be committed or staged)
.cascade/friction-reports.jsonl
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,8 @@ All notable user-visible changes to CASCADE are documented here. The format is l

### Documentation

- **Friction reporting is now documented for operators and provider contributors.** Architecture docs cover the optional PM Friction slot (`lists.friction` for Trello, `statuses.friction` for JIRA/Linear), `ReportFriction`, and `cascade-tools pm report-friction --details-file -`. The integration guide explains that friction reports use existing provider `createWorkItem` plus optional `moveWorkItem`, so providers do not need a new adapter method or a DB-backed friction index. Resilience docs describe the JSONL sidecar/outbox retry path, missing-slot behavior, and non-blocking drain failures. See Trello card [Rvv7VVd5](https://trello.com/c/69ff6af3bc5c526cc5faa2d4).

- **Trigger architecture docs now describe the migrated trigger contracts.** Added guidance for canonical `TRIGGER_EVENTS`, shared PM/GitHub result builders, first-match dispatch, structured skip vs bare `null`, no-agent results, deferred bare-job re-checks, router outcome decision reasons, PM coalescing, capacity scope, dispatch failure compensation, and wedged-lock diagnostics. Migration note for future trigger contributors: new handlers should import event constants, use the shared builders, return structured skips for claimed-but-non-dispatched events, and reserve bare `null` for "continue to later handlers." See Trello card [qUbPtALY](https://trello.com/c/69fe2a950699baaf91688a5b).

### Fixed
2 changes: 2 additions & 0 deletions CLAUDE.md
@@ -127,6 +127,8 @@ Some triggers take params (e.g. `review` + `scm:check-suite-success` accepts `{"

**Worker exit diagnostics** — when a worker container exits non-zero, the router calls `container.inspect()` *before* AutoRemove reaps it and stamps the run record's `error` field with a structured, grep-stable string: `Worker crashed with exit code N · OOMKilled=<true|false> · reason="<State.Error>"`. The `OOMKilled=true` marker is the definitive cgroup-OOM signal (per Docker's own `State.OOMKilled`); a 137 exit *without* `OOMKilled=true` means the kill came from inside the container or from a non-cgroup signal — *not* memory. The `[WorkerManager] Resolved spawn settings` log emitted at every spawn includes both `projectWatchdogTimeoutMs` and `globalWorkerTimeoutMs` so post-mortems can confirm whether the per-project override actually won. See `src/router/active-workers.ts:formatCrashReason` for the format and `tests/unit/router/container-manager-diagnostics.test.ts` for regression pins.
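
As a rough illustration of why a grep-stable string is useful, the sketch below parses the format quoted above and separates a cgroup OOM from other exits. `parseCrashReason`, `CrashInfo`, and `CRASH_PATTERN` are hypothetical names for this example and are not taken from `src/router/active-workers.ts`.

```typescript
// Hypothetical parser for the crash string format quoted above.
interface CrashInfo {
  exitCode: number;
  oomKilled: boolean;
  reason: string;
}

// Matches: Worker crashed with exit code N · OOMKilled=<true|false> · reason="<State.Error>"
const CRASH_PATTERN =
  /^Worker crashed with exit code (\d+) · OOMKilled=(true|false) · reason="(.*)"$/;

function parseCrashReason(error: string): CrashInfo | null {
  const match = CRASH_PATTERN.exec(error);
  if (!match) return null; // not a worker-crash record
  return {
    exitCode: Number(match[1]),
    oomKilled: match[2] === 'true',
    reason: match[3] ?? '',
  };
}

// Only the OOMKilled marker counts as a memory kill; exit code 137 alone does not.
function isCgroupOom(info: CrashInfo): boolean {
  return info.oomKilled;
}
```

A record with exit code 137 and `OOMKilled=false` parses successfully but `isCgroupOom` returns `false`, which mirrors the rule stated in the paragraph.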

**Friction reporting** — agents with `pm:friction` can call `cascade-tools pm report-friction` / `ReportFriction` for incidental tooling, environment, permission, dependency, test, PM-data, or SCM-data papercuts. Configure the optional Friction slot in the PM wizard's Status Mapping step: Trello uses `lists.friction`; JIRA and Linear use `statuses.friction`. The feature does not add provider adapter methods or a DB-backed friction index — it materializes a normal PM work item through existing `createWorkItem` plus optional `moveWorkItem`. Reports are first written to `CASCADE_FRICTION_SIDECAR_PATH` as a JSONL outbox, then filed immediately when possible; backend drain retries pending reports after the engine returns, including ordinary failures. Missing friction slot returns a non-fatal `friction_slot_missing`/`queued_slot_missing` result, and drain failures log/capture Sentry under `friction_sidecar_drain_failed` without failing an otherwise successful run.

**Dispatch failure semantics** — spec 015 (verified live in prod via the ucho/MNG-350 incident on 2026-04-26):

- **Capacity miss waits, never throws.** When the dispatcher pulls a job and the worker pool is at `maxWorkers`, it `await`s a slot via the in-process slot-waiter (default `slotWaitTimeoutMs` = 5min). The slot is conceptually held by the running container — `slotReleased()` is called once per cleanup from `cleanupWorker`, never from the dispatcher.
2 changes: 1 addition & 1 deletion docs/architecture/04-agent-system.md
@@ -135,7 +135,7 @@ const CAPABILITIES = [
// Built-in (always available)
'fs:read', 'fs:write', 'shell:exec', 'session:ctrl',
// PM integration
-  'pm:read', 'pm:write', 'pm:checklist',
+  'pm:read', 'pm:write', 'pm:checklist', 'pm:friction',
// SCM integration
'scm:read', 'scm:ci-logs', 'scm:comment', 'scm:review', 'scm:pr',
// Alerting integration
18 changes: 16 additions & 2 deletions docs/architecture/07-gadgets.md
@@ -54,7 +54,7 @@ All file gadgets validate paths against allowed directories (working directory +

Todos are stored in `.claude/todos.json` within the repo working directory.

-### PM (`pm:read`, `pm:write`, `pm:checklist`)
+### PM (`pm:read`, `pm:write`, `pm:checklist`, `pm:friction`)

| Gadget | Capability | Purpose |
|--------|-----------|---------|
@@ -67,9 +67,23 @@ Todos are stored in `.claude/todos.json` within the repo working directory.
| `AddChecklist` | `pm:write` | Add checklist to work item |
| `PMUpdateChecklistItem` | `pm:checklist` | Update checklist item status |
| `PMDeleteChecklistItem` | `pm:checklist` | Delete checklist item |
| `ReportFriction` | `pm:friction` | Queue and file incidental friction reports |

PM gadgets use the active `PMProvider` from `AsyncLocalStorage` context, making them provider-agnostic.

`ReportFriction` is intentionally narrower than general PM write access. It lets agents file incidental papercuts in tooling, environment, permissions, dependencies, tests, PM data, or SCM data without exposing `CreateWorkItem` / `MoveWorkItem` directly. The CLI form is:

```bash
cascade-tools pm report-friction \
--summary "Typecheck requires undocumented Redis env var" \
--category environment \
--severity medium \
--whileDoing "Running pre-PR verification" \
--details-file -
```

`--details-file -` reads Markdown details from stdin; use it for multi-line reproduction notes or shell output. The command always appends a queued event to the friction sidecar before it tries to create the PM work item, so a failed immediate write can be retried by the backend drain.
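
The append-before-file ordering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the gadget's implementation: the event shape, the `filed` result status, and the names `reportFriction` and `appendEvent` are hypothetical, the sidecar is modeled as an in-memory array of JSONL lines, and `createWorkItem` is a synchronous stand-in for the provider call.

```typescript
// Hypothetical event shape; the real sidecar schema is not shown here.
type FrictionEvent =
  | { type: 'queued'; id: string; summary: string }
  | { type: 'filed'; id: string; workItemId: string };

type ReportResult =
  | { status: 'filed'; workItemId: string } // hypothetical success status
  | { status: 'queued_for_retry' };

// `sidecar` stands in for the JSONL file: one JSON document per entry.
function appendEvent(sidecar: string[], event: FrictionEvent): void {
  sidecar.push(JSON.stringify(event));
}

function reportFriction(
  sidecar: string[],
  report: { id: string; summary: string },
  createWorkItem: (summary: string) => string, // returns the work item ID, may throw
): ReportResult {
  // 1. Record the report first so a later drain can retry it.
  appendEvent(sidecar, { type: 'queued', id: report.id, summary: report.summary });
  try {
    // 2. Attempt the immediate PM write.
    const workItemId = createWorkItem(report.summary);
    // 3. Mark it filed so the drain skips it.
    appendEvent(sidecar, { type: 'filed', id: report.id, workItemId });
    return { status: 'filed', workItemId };
  } catch {
    // The queued event stays in the sidecar for the backend drain.
    return { status: 'queued_for_retry' };
  }
}
```

Because the queued event is written before the PM call, a thrown error leaves exactly one unmatched `queued` line behind, which is what makes the later retry possible.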

### SCM (`scm:read`, `scm:ci-logs`, `scm:comment`, `scm:review`, `scm:pr`)

| Gadget | Capability | Purpose |
@@ -101,7 +115,7 @@ Native-tool engines cannot invoke gadget classes directly (they run as subproces

| Category | Commands | Example |
|----------|----------|---------|
-| PM | `cascade-tools pm read-work-item`, `list-work-items`, `update-work-item`, etc. | `cascade-tools pm read-work-item --workItemId abc123` |
+| PM | `cascade-tools pm read-work-item`, `list-work-items`, `update-work-item`, `report-friction`, etc. | `cascade-tools pm report-friction --summary "Missing setup hint" --details-file - --category tooling --severity medium` |
| SCM | `cascade-tools scm get-pr-details`, `get-pr-diff`, `post-pr-comment`, etc. | `cascade-tools scm get-pr-details --prNumber 42` |
| Alerting | `cascade-tools alerting get-alerting-issue`, `list-alerting-events`, etc. | `cascade-tools alerting get-alerting-issue --organizationId acme --issueId 12345` |
| Session | `cascade-tools session finish` | `cascade-tools session finish --comment "Created PR and verified checks"` |
12 changes: 12 additions & 0 deletions docs/architecture/08-config-credentials.md
@@ -55,6 +55,18 @@ interface ProjectConfig {
}
```

### PM workflow slots

PM provider config maps CASCADE lifecycle concepts onto provider-native lists or statuses. The friction-reporting slot is optional but recognized consistently across providers:

| Provider | Config key | Meaning |
|---|---|---|
| Trello | `lists.friction` | Trello list ID where friction report cards are created and left |
| JIRA | `statuses.friction` | JIRA status name/ID applied after the issue is created in `projectKey` |
| Linear | `statuses.friction` | Linear workflow state UUID applied after the issue is created in `teamId` |

If the slot is not configured, `ReportFriction` records the report in the sidecar and returns a non-fatal `queued_slot_missing` result with operator guidance. No run should fail solely because the friction slot is missing.
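
A minimal sketch of how the per-provider slot lookup might look, assuming simplified config shapes. Only the `lists.friction` / `statuses.friction` keys and the `queued_slot_missing` status come from the text above; `PMProviderConfig`, `SlotLookup`, `lookupFrictionSlot`, and the `ready` status are hypothetical.

```typescript
// Hypothetical, simplified provider config shapes.
type PMProviderConfig =
  | { type: 'trello'; lists: { friction?: string } }
  | { type: 'jira'; projectKey: string; statuses: { friction?: string } }
  | { type: 'linear'; teamId: string; statuses: { friction?: string } };

type SlotLookup =
  | { status: 'ready'; slot: string }
  | { status: 'queued_slot_missing'; guidance: string };

function lookupFrictionSlot(pm: PMProviderConfig): SlotLookup {
  // Trello keys the slot under `lists`; JIRA and Linear key it under `statuses`.
  const slot = pm.type === 'trello' ? pm.lists.friction : pm.statuses.friction;
  if (!slot) {
    // Non-fatal: the report stays in the sidecar and the run continues.
    return {
      status: 'queued_slot_missing',
      guidance: "Configure the Friction row in the PM wizard's Status Mapping step.",
    };
  }
  return { status: 'ready', slot };
}
```

Returning a result object instead of throwing keeps the missing-slot case from failing the run, in line with the rule in the paragraph above.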

`maxInFlightItems` is enforced at two points: (a) the `backlog-manager` chain
gates (won't auto-pull from BACKLOG when at capacity) and (b) the PM
`status-changed` triggers (won't fire `implementation` when a card is moved
12 changes: 12 additions & 0 deletions docs/architecture/10-resilience.md
@@ -67,6 +67,18 @@ Rate limits are enforced by the LLMist SDK for `sdk`-archetype engines. Native-t

## Retry Strategy

### Friction report outbox

`ReportFriction` uses a JSONL sidecar as a small outbox so incidental agent issues do not block the main run. The gadget appends a queued event to `CASCADE_FRICTION_SIDECAR_PATH` before attempting PM materialization. Native-tool engines receive that path in their environment; in-process engines get the same value through session state.

On successful immediate materialization, the gadget appends a filed event with the PM work item ID/URL. If the immediate PM write fails, the gadget returns `queued_for_retry` and the agent should keep working unless the underlying issue is a real blocker.

The backend adapter drains pending sidecar events after the engine returns, including ordinary engine failures. Drain behavior is deliberately non-blocking:

- A missing `lists.friction` / `statuses.friction` slot produces a skipped report with reason `friction_slot_missing`; operators should configure the Friction row in the PM wizard's Status Mapping step.
- A PM API failure during drain logs a warning and captures Sentry with `source=friction_sidecar_drain_failed`, but it does not change a successful run into a failed run.
- After drain, the sidecar is compacted/cleaned so filed reports are not retried indefinitely.
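
The drain rules above can be sketched as a pure function over the sidecar lines. This is an illustration under assumptions, not the backend adapter's code: the event shape and the names `drainSidecar` and `DrainOutcome` are hypothetical, `createWorkItem` is a synchronous stand-in, and keeping skipped or failed reports in the compacted output is an assumption about what compaction preserves.

```typescript
// Hypothetical event shape for sidecar lines.
type SidecarEvent =
  | { type: 'queued'; id: string; summary: string }
  | { type: 'filed'; id: string; workItemId: string };

interface DrainOutcome {
  filed: string[];
  skipped: { id: string; reason: 'friction_slot_missing' }[];
  failed: string[];
  remaining: string[]; // compacted sidecar lines still awaiting a retry
}

function drainSidecar(
  lines: string[],
  frictionSlot: string | undefined,
  createWorkItem: (summary: string, slot: string) => string,
): DrainOutcome {
  const events = lines
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line) as SidecarEvent);
  // Reports that already have a filed event must not be retried.
  const done = new Set(events.filter((e) => e.type === 'filed').map((e) => e.id));
  const outcome: DrainOutcome = { filed: [], skipped: [], failed: [], remaining: [] };

  for (const event of events) {
    if (event.type !== 'queued' || done.has(event.id)) continue;
    if (frictionSlot === undefined) {
      outcome.skipped.push({ id: event.id, reason: 'friction_slot_missing' });
      outcome.remaining.push(JSON.stringify(event));
      continue;
    }
    try {
      createWorkItem(event.summary, frictionSlot);
      done.add(event.id);
      outcome.filed.push(event.id);
    } catch {
      // Non-blocking: record the failure, never rethrow into the run result.
      outcome.failed.push(event.id);
      outcome.remaining.push(JSON.stringify(event));
    }
  }
  return outcome;
}
```

The function never throws on a PM failure, so a caller can log `outcome.failed` and report it to Sentry without changing the status of the run.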

### Dispatch retries

The router queues `cascade-jobs` and `cascade-dashboard-jobs` with `attempts: 4` and exponential backoff. Dispatch errors before a worker container starts are classified in `src/router/dispatch-error-classifier.ts`:
1 change: 1 addition & 0 deletions src/agents/capabilities/index.ts
@@ -9,6 +9,7 @@
* pm → pm:read → ReadWorkItem, ListWorkItems
* pm:write → CreateWorkItem, UpdateWorkItem, PostComment
* pm:checklist → PMUpdateChecklistItem, PMDeleteChecklistItem
* pm:friction → ReportFriction
*
* scm → scm:read → GetPRDetails, GetPRDiff, GetPRChecks
* scm:comment → PostPRComment, UpdatePRComment
9 changes: 9 additions & 0 deletions src/agents/capabilities/registry.ts
@@ -33,6 +33,7 @@ export const CAPABILITIES = [
'pm:read',
'pm:write',
'pm:checklist',
'pm:friction',

// SCM integration capabilities
'scm:read',
@@ -143,6 +144,14 @@ export const CAPABILITY_REGISTRY: Record<Capability, CapabilityDefinition> = {
cliToolNames: [],
},

'pm:friction': {
integration: 'pm',
description: 'Report incidental PM-backed friction without general PM write access',
gadgetNames: ['ReportFriction'],
sdkToolNames: [],
cliToolNames: [],
},

// -------------------------------------------------------------------------
// SCM integration capabilities
// -------------------------------------------------------------------------
4 changes: 4 additions & 0 deletions src/agents/capabilities/resolver.ts
@@ -30,6 +30,7 @@ import {
PMUpdateChecklistItem,
PostComment,
ReadWorkItem,
ReportFriction,
UpdateWorkItem,
} from '../../gadgets/pm/index.js';
import { ReadFile } from '../../gadgets/ReadFile.js';
@@ -111,6 +112,9 @@ const GADGET_CONSTRUCTORS: Record<string, new () => any> = {
PMUpdateChecklistItem,
PMDeleteChecklistItem,

// pm:friction
ReportFriction,

// scm:read
GetPRDetails,
GetPRDiff,
1 change: 1 addition & 0 deletions src/agents/definitions/alerting.yaml
@@ -20,6 +20,7 @@ capabilities:
optional:
- pm:read
- pm:write
- pm:friction

# Supported triggers for this agent
triggers:
1 change: 1 addition & 0 deletions src/agents/definitions/backlog-manager.yaml
@@ -17,6 +17,7 @@ capabilities:
- session:ctrl
- pm:read
- pm:write
- pm:friction
optional: []

# Supported triggers for this agent.
5 changes: 3 additions & 2 deletions src/agents/definitions/debug.yaml
@@ -9,16 +9,17 @@ integrations:
required: []
optional: [pm]

-# Read-only FS access for log analysis, full PM access for creating debug cards.
+# FS/session access always available; PM access gated on configured integration.
 capabilities:
   required:
     - fs:read
     - shell:exec
     - session:ctrl
+  optional:
     - pm:read
     - pm:write
     - pm:checklist
-  optional: []
+    - pm:friction

# Debug agent is triggered manually or via internal attachment upload detection.
# No external event-based triggers are configured.
1 change: 1 addition & 0 deletions src/agents/definitions/implementation.yaml
@@ -20,6 +20,7 @@ capabilities:
- pm:read
- pm:write
- pm:checklist
- pm:friction
- scm:pr
optional: []

1 change: 1 addition & 0 deletions src/agents/definitions/planning.yaml
@@ -18,6 +18,7 @@ capabilities:
- session:ctrl
- pm:read
- pm:write
- pm:friction
optional: []

# Supported triggers for this agent
17 changes: 11 additions & 6 deletions src/agents/definitions/profiles.ts
@@ -41,7 +41,7 @@ export type { AgentCapabilities } from './schema.js';

export interface AgentProfile {
/** Filter the full set of tool manifests down to what this agent needs */
-  filterTools(allTools: ToolManifest[]): ToolManifest[];
+  filterTools(allTools: ToolManifest[], integrationChecker?: IntegrationChecker): ToolManifest[];
/** Engine-neutral capabilities used to derive native tools inside each engine */
allCapabilities: Capability[];
/** Whether this profile needs the GitHub client for context fetching */
@@ -166,9 +166,6 @@ function resolveContextPipeline(
function buildProfileFromDefinition(def: AgentDefinition, agentType: string): AgentProfile {
const allCapabilities = getAllCapabilities(def.capabilities);

-  // Derive tool names from capabilities for filtering
-  const gadgetNames = getGadgetNamesFromCapabilities(allCapabilities);
-
// Get gadget options from strategies
const gadgetOptions = def.strategies.gadgetOptions;

@@ -195,9 +192,17 @@ function buildProfileFromDefinition(def: AgentDefinition, agentType: string): Ag
const lifecycle = resolveLifecycleHooks(def);

const profile: AgentProfile = {
-    filterTools: (allTools: ToolManifest[]) => {
+    filterTools: (allTools: ToolManifest[], integrationChecker?: IntegrationChecker) => {
+      const effectiveCaps = integrationChecker
+        ? resolveEffectiveCapabilities(
+            def.capabilities.required,
+            def.capabilities.optional,
+            integrationChecker,
+          )
+        : allCapabilities;
+
       // Filter tools by the gadget names derived from capabilities
-      const nameSet = new Set(gadgetNames);
+      const nameSet = new Set(getGadgetNamesFromCapabilities(effectiveCaps));
return allTools.filter((t) => nameSet.has(t.name));
},
allCapabilities,
1 change: 1 addition & 0 deletions src/agents/definitions/resolve-conflicts.yaml
@@ -22,6 +22,7 @@ capabilities:
- pm:read
- pm:write
- pm:checklist
- pm:friction

# Supported triggers for this agent
triggers:
1 change: 1 addition & 0 deletions src/agents/definitions/respond-to-ci.yaml
@@ -23,6 +23,7 @@ capabilities:
- pm:read
- pm:write
- pm:checklist
- pm:friction

# Supported triggers for this agent
triggers:
1 change: 1 addition & 0 deletions src/agents/definitions/respond-to-planning-comment.yaml
@@ -18,6 +18,7 @@ capabilities:
- pm:read
- pm:write
- pm:checklist
- pm:friction
optional: []

# Supported triggers for this agent
1 change: 1 addition & 0 deletions src/agents/definitions/respond-to-pr-comment.yaml
@@ -21,6 +21,7 @@ capabilities:
optional:
- pm:read
- pm:write
- pm:friction

# Supported triggers for this agent
triggers:
1 change: 1 addition & 0 deletions src/agents/definitions/respond-to-review.yaml
@@ -22,6 +22,7 @@ capabilities:
optional:
- pm:read
- pm:write
- pm:friction

# Supported triggers for this agent
triggers:
1 change: 1 addition & 0 deletions src/agents/definitions/review.yaml
@@ -21,6 +21,7 @@ capabilities:
- scm:comment
optional:
- pm:read
- pm:friction

# Supported triggers for this agent
triggers:
1 change: 1 addition & 0 deletions src/agents/definitions/splitting.yaml
@@ -19,6 +19,7 @@ capabilities:
- pm:read
- pm:write
- pm:checklist
- pm:friction
optional: []

# Supported triggers for this agent
2 changes: 2 additions & 0 deletions src/agents/definitions/toolManifests.ts
@@ -19,6 +19,7 @@ import {
pmUpdateChecklistItemDef,
postCommentDef,
readWorkItemDef,
reportFrictionDef,
updateWorkItemDef,
} from '../../gadgets/pm/definitions.js';
import { finishDef } from '../../gadgets/session/definitions.js';
@@ -35,6 +36,7 @@ const ALL_DEFINITIONS = [
postCommentDef,
updateWorkItemDef,
createWorkItemDef,
reportFrictionDef,
listWorkItemsDef,
addChecklistDef,
moveWorkItemDef,
24 changes: 24 additions & 0 deletions src/agents/shared/builderFactory.ts
@@ -11,6 +11,7 @@ import { getIterationTrailingMessage } from '../../config/hintConfig.js';
import { getRateLimitForModel } from '../../config/rateLimits.js';
import { getRetryConfig } from '../../config/retryConfig.js';
import { initSessionState, type SessionHooks, setReadOnlyFs } from '../../gadgets/sessionState.js';
import type { ProjectConfig } from '../../types/index.js';
import type { LLMCallLogger } from '../../utils/llmLogging.js';
import type { IProgressMonitor } from '../contracts/index.js';
import { getAgentCapabilities } from '../definitions/index.js';
@@ -55,6 +56,21 @@ export interface CreateBuilderOptions {
workItemUrl?: string;
/** Work item display title for PR ↔ work item enrichment. Passed to session state. */
workItemTitle?: string;
/** JSONL outbox path for incidental friction reports. Passed to session state. */
frictionSidecarPath?: string;
/** PR number for the current execution. Passed to session state for in-process gadget fallback. */
prNumber?: number;
/** PR URL for the current execution. Passed to session state for in-process gadget fallback. */
prUrl?: string;
/** PR title for the current execution. Passed to session state for in-process gadget fallback. */
prTitle?: string;
/** Full project config. Stored in session state so in-process gadgets (e.g. LLMist ReportFriction)
* can access project context without relying on process.env (projectSecrets are not exported
* to env for in-process runs). */
project?: ProjectConfig;
/** Engine identifier (e.g. 'llmist'). Stored in session state so in-process gadgets
* (e.g. ReportFriction) can read it without CASCADE_ENGINE_LABEL in process.env. */
engineLabel?: string;
/** Resolved SCM hook flags for finish validation (requiresPR, requiresReview, etc.) */
hooks?: SessionHooks;
}
@@ -95,10 +111,18 @@ export async function createConfiguredBuilder(options: CreateBuilderOptions): Pr
baseBranch: options.baseBranch,
prBranch: options.prBranch,
projectId: options.projectId,
project: options.project,
workItemId: options.workItemId,
hooks: options.hooks,
workItemUrl: options.workItemUrl,
workItemTitle: options.workItemTitle,
frictionSidecarPath: options.frictionSidecarPath,
runId: options.runId,
prNumber: options.prNumber,
prUrl: options.prUrl,
prTitle: options.prTitle,
engineLabel: options.engineLabel,
model: options.model,
initialHeadSha,
});

21 changes: 21 additions & 0 deletions src/agents/shared/frictionGuidance.ts
@@ -0,0 +1,21 @@
import type { Capability } from '../capabilities/index.js';

export const FRICTION_REPORTING_GUIDANCE = `## Friction Reporting

When the ReportFriction tool is available, use it only for incidental papercuts in the environment, tooling, repository setup, documentation, or developer workflow that make the work harder than it should be.

Do not report core task difficulty, expected debugging effort, product ambiguity that belongs in the current work item, or issues you can resolve directly as part of the assigned task.

Keep working after reporting friction unless the issue blocks progress. If blocked, report the friction with concrete context and then explain the blocker in your final response.`;

export function shouldAppendFrictionGuidance(capabilities: readonly Capability[]): boolean {
return capabilities.includes('pm:friction');
}

export function appendFrictionGuidance(
systemPrompt: string,
capabilities: readonly Capability[],
): string {
if (!shouldAppendFrictionGuidance(capabilities)) return systemPrompt;
return `${systemPrompt.trimEnd()}\n\n${FRICTION_REPORTING_GUIDANCE}`;
}
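
A short usage sketch of the helper above, written so it runs on its own. The `Capability` union and the guidance string here are trimmed stand-ins (assumptions for the example), while the function body follows the logic shown in `frictionGuidance.ts`.

```typescript
// Stand-ins so the sketch is self-contained; the real module imports Capability
// and defines the full guidance text.
type Capability = 'fs:read' | 'pm:read' | 'pm:friction';
const FRICTION_REPORTING_GUIDANCE = '## Friction Reporting\n\n(guidance text elided)';

function appendFrictionGuidance(
  systemPrompt: string,
  capabilities: readonly Capability[],
): string {
  // Prompts for agents without pm:friction are returned untouched.
  if (!capabilities.includes('pm:friction')) return systemPrompt;
  // Trailing whitespace is trimmed so exactly one blank line precedes the section.
  return `${systemPrompt.trimEnd()}\n\n${FRICTION_REPORTING_GUIDANCE}`;
}

const withFriction = appendFrictionGuidance('You are a reviewer.\n\n', ['fs:read', 'pm:friction']);
const withoutFriction = appendFrictionGuidance('You are a reader.\n', ['fs:read']);
```

Note that the no-capability path returns the prompt byte-for-byte, including any trailing newline, so callers can apply the helper unconditionally.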