mongrel-intelligence · zbigniewsobiecki · May 19, 2026 · May 18, 2026 · May 18, 2026 · May 19, 2026
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -136,6 +136,16 @@ jobs:
       - name: Build worker image
         run: docker build -f Dockerfile.worker -t cascade-worker:ci-check .
 
+      # MNG-1055: the worker image must include the baseline native-session
+      # runtime (Python shim + Playwright Chromium cache + the env var the
+      # native-tool env filter forwards to agent shells). The smoke script
+      # below fails loudly if any of those are missing, which keeps the worker
+      # image aligned with the contract documented in
+      # docs/architecture/05-engine-backends.md and surfaced to agents in
+      # the native-tool prompts.
+      - name: Smoke-test worker runtime tools
+        run: WORKER_IMAGE=cascade-worker:ci-check tests/docker/worker-runtime-tools/run-test.sh
+
   enforce-dev-to-main:
     runs-on: ubuntu-latest
     if: github.event_name == 'pull_request' && github.base_ref == 'main'

diff --git a/.github/workflows/deploy-dev.yml b/.github/workflows/deploy-dev.yml
@@ -39,13 +39,22 @@ jobs:
           docker run --rm ${{ env.ROUTER_IMAGE }}:dev-${{ github.sha }} \
             node --check dist/router/index.js
 
-      - name: Build and push worker image
+      - name: Build worker image
         run: |
           docker build \
             --label org.opencontainers.image.revision=${{ github.sha }} \
             -f Dockerfile.worker \
             -t ${{ env.WORKER_IMAGE }}:dev \
             -t ${{ env.WORKER_IMAGE }}:dev-${{ github.sha }} .
+
+      # MNG-1055: gate the dev worker push on the same runtime smoke test
+      # that CI runs, so dev never publishes an image lacking python,
+      # Playwright Chromium, or a working PLAYWRIGHT_BROWSERS_PATH.
+      - name: Smoke-test worker runtime tools
+        run: WORKER_IMAGE=${{ env.WORKER_IMAGE }}:dev-${{ github.sha }} tests/docker/worker-runtime-tools/run-test.sh
+
+      - name: Push worker image
+        run: |
           docker push ${{ env.WORKER_IMAGE }}:dev
           docker push ${{ env.WORKER_IMAGE }}:dev-${{ github.sha }}
 

diff --git a/.github/workflows/deploy.yml b/.github/workflows/deploy.yml
@@ -39,14 +39,25 @@ jobs:
           docker run --rm ${{ env.ROUTER_IMAGE }}:${{ github.sha }} \
             node --check dist/router/index.js
 
-      - name: Build and push worker image
+      - name: Build worker image
         run: |
           docker build \
             --label org.opencontainers.image.revision=${{ github.sha }} \
             --no-cache \
             -f Dockerfile.worker \
             -t ${{ env.WORKER_IMAGE }}:latest \
             -t ${{ env.WORKER_IMAGE }}:${{ github.sha }} .
+
+      # MNG-1055: gate the production worker push on the same runtime smoke
+      # test that CI runs. A worker image without `python`, Playwright Chromium,
+      # or a working `PLAYWRIGHT_BROWSERS_PATH` would break agents the
+      # moment it reached production, so we fail before publishing the
+      # `:latest` / SHA tags.
+      - name: Smoke-test worker runtime tools
+        run: WORKER_IMAGE=${{ env.WORKER_IMAGE }}:${{ github.sha }} tests/docker/worker-runtime-tools/run-test.sh
+
+      - name: Push worker image
+        run: |
           docker push ${{ env.WORKER_IMAGE }}:latest
           docker push ${{ env.WORKER_IMAGE }}:${{ github.sha }}
 

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,6 +4,16 @@ All notable user-visible changes to CASCADE are documented here. The format is l
 
 ## Unreleased
 
+### Fixed
+
+- **`cascade-tools` multiline text and large diff I/O are now hardened against shell-quoting footguns and stdout truncation** ([MNG-1059](https://linear.app/issue/MNG-1059)). The shared CLI factory at `src/gadgets/shared/cli/params.ts` now rejects invocations that pass `--*-file -` for two or more file-input flags (e.g. `--body-file - --comments-file -`) before any `readFileSync(0, ...)` call, because stdin (fd 0) can only be drained once per process; the previous behavior silently truncated one of the two agent payloads. The rejection emits a structured `flag-parse` error envelope (`error.flag: "body-file,comments-file"`, `hint: "Pass at most one --*-file -; for the others, write the payload to a temp file and pass --<flag>-file <path>."`) so agents can self-correct on the next attempt. Mixing one stdin consumer with direct file paths still works: `--body-file - --comments-file /tmp/comments.json` and `--body-file /tmp/body.md --comments-file -` both behave as before. The native-tool system prompt now renders a "cascade-tools shell-safety rules" section that documents the one-stdin-consumer invariant and provides safe heredoc / temp-file patterns for one and two payloads. The prompt renderer also suppresses inline `--body '...'` / `--text '...'` examples whose content contains backticks, code fences, `$(...)`, or newlines when a file-input companion is declared, steering the agent toward the safer `--*-file <path>` form instead. File-input flag descriptions for `--body-file`, `--text-file`, `--description-file`, `--details-file`, and `--comments-file` now explicitly call out markdown, multiline, and backtick-containing content. Closes [MNG-908](https://linear.app/mongrel/issue/MNG-908), [MNG-910](https://linear.app/mongrel/issue/MNG-910), [MNG-917](https://linear.app/mongrel/issue/MNG-917), [MNG-1046](https://linear.app/mongrel/issue/MNG-1046).
+
+- **`cascade-tools scm get-pr-diff` gains an `--outputFile <path>` escape hatch for large diffs and one-line JSON patches** ([MNG-1045](https://linear.app/mongrel/issue/MNG-1045)). When `--outputFile` is set, the full multiline Markdown diff is written to the requested path on disk and stdout returns a compact JSON summary `{outputFile, fileCount, bytes, pathFilter}` instead of the raw payload — sidestepping terminal-truncation issues with hundreds-of-kilobytes one-line JSON diffs. Default behavior is preserved: without `--outputFile`, `get-pr-diff` returns the formatted Markdown directly. The review-agent skipped-files guidance now points operators at this form (`cascade-tools scm get-pr-diff --prNumber <N> --path <path> --outputFile /tmp/pr-diff.md`) when a file would otherwise truncate. The `--outputFile` flag is declared as `cliOnly: true` on the underlying ToolDefinition so it appears in the CLI + agent-facing manifest but is excluded from the SDK Gadget Zod schema (gadgets return strings in-process and cannot deliver a file path back through that contract).
+
+### Changed
+
+- **Worker image now ships a Python shim and a shared Playwright Chromium cache as native-session baseline tools** ([MNG-1055](https://linear.app/issue/MNG-1055)). `Dockerfile.worker` installs `python3` + `python-is-python3` so both `python` and `python3` resolve to the same Debian-owned interpreter (closing the friction cluster MNG-887/897/926/934/947/957/973/1010/1024/1033/1039/1044). It also installs a pinned `@playwright/test` plus Chromium with browser dependencies into `/ms-playwright` (closing MNG-998/1048), then makes that cache owned by the runtime `node` user so project `.cascade/setup.sh` scripts pinned to a different Playwright revision can install the missing Chromium revision into the inherited `$PLAYWRIGHT_BROWSERS_PATH`. The native-tool env filter (`src/backends/shared/envFilter.ts`) now allowlists `PLAYWRIGHT_BROWSERS_PATH` as an exact match so the cache is reachable from every native-tool engine subprocess; the broader `PLAYWRIGHT_*` prefix is intentionally left out to preserve the defense-in-depth env-allowlist posture. A new Docker smoke script (`tests/docker/worker-runtime-tools/run-test.sh`) validates `python`, `python3`, `python -c 'import json'`, the Playwright Chromium launch path, the env var, and node-user write access to the cache, and is wired into CI (`docker-build-check`) and both deploy workflows (`deploy.yml` / `deploy-dev.yml`) **before** the worker image is pushed, so a broken baseline cannot reach `:latest` / `:dev` tags. The native-tool system prompt (`src/backends/shared/nativeToolPrompts.ts`) exposes the guaranteed tools to agents under a new "Guaranteed runtime tools" section, and the engine-backends architecture doc, README, and Getting Started prerequisites describe the runtime baseline (including the image-size implication).
+
 ### Documentation
 
 - **Friction reporting is now documented for operators and provider contributors.** Architecture docs cover the optional PM Friction slot (`lists.friction` for Trello, `statuses.friction` for JIRA/Linear), `ReportFriction`, and `cascade-tools pm report-friction --details-file -`. The integration guide explains that friction reports use existing provider `createWorkItem` plus optional `moveWorkItem`, so providers do not need a new adapter method or a DB-backed friction index. Resilience docs describe the JSONL sidecar/outbox retry path, missing-slot behavior, and non-blocking drain failures. See Trello card [Rvv7VVd5](https://trello.com/c/69ff6af3bc5c526cc5faa2d4).
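
The one-stdin-consumer rejection described in the MNG-1059 changelog entry can be sketched roughly as follows. This is a minimal illustration, not the code in `src/gadgets/shared/cli/params.ts`: the function name `findStdinConsumerConflict`, the `FlagParseEnvelope` type, and the exact placement of `hint` in the envelope are assumptions made for the example.

```typescript
// Sketch only: the envelope layout and all names below are assumptions,
// not the actual code in src/gadgets/shared/cli/params.ts.
interface FlagParseEnvelope {
  error: { type: "flag-parse"; flag: string };
  hint: string;
}

const STDIN_HINT =
  "Pass at most one --*-file -; for the others, write the payload to a temp file and pass --<flag>-file <path>.";

// `flags` maps parsed flag names (without leading dashes) to their raw values.
function findStdinConsumerConflict(
  flags: Record<string, string | undefined>,
): FlagParseEnvelope | null {
  // A flag consumes stdin when it is a `*-file` flag whose value is "-".
  const stdinFlags = Object.keys(flags).filter(
    (name) => name.endsWith("-file") && flags[name] === "-",
  );
  // Zero or one stdin consumer is fine; fd 0 is read at most once.
  if (stdinFlags.length < 2) return null;
  return {
    error: { type: "flag-parse", flag: stdinFlags.join(",") },
    hint: STDIN_HINT,
  };
}
```

Running the check before any read of fd 0 means a conflicting invocation fails without consuming the payload, so the agent can retry with one payload moved to a temp file.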

diff --git a/CLAUDE.md b/CLAUDE.md
@@ -148,10 +148,12 @@ The wedged-lock canary should never fire under normal operation. Its presence in
 
 Review agent receives a **compact per-file diff context**, not full file contents. Each changed file is a `### <file> (<status>, +N -M)` section with a unified diff hunk. Budget: `REVIEW_DIFF_CONTEXT_TOKEN_LIMIT` = 200k tokens, per-file cap 10%.
 
-GitHub's changed-file API is used for file enumeration and change counts, but compact patch bodies come from the checked-out PR workspace via `git diff origin/<base>...HEAD`. Files that can't fit or can't be locally verified (deleted, binary/no text patch, local diff failure/empty patch, oversized patch, or budget exhausted) are injected as `SKIPPED FILES` with instructions to fetch on demand via `cascade-tools scm get-pr-diff --prNumber <N> --path <path>`, `Read`, or `Grep`.
+GitHub's changed-file API is used for file enumeration and change counts, but compact patch bodies come from the checked-out PR workspace via `git diff origin/<base>...HEAD`. Files that can't fit or can't be locally verified (deleted, binary/no text patch, local diff failure/empty patch, oversized patch, or budget exhausted) are injected as `SKIPPED FILES` with instructions to fetch on demand via `cascade-tools scm get-pr-diff --prNumber <N> --path <path>`, `Read`, or `Grep`. Large or one-line JSON diffs that would truncate stdout should add `--outputFile /tmp/pr-diff.md` — the CLI writes the full multiline Markdown payload to disk and returns a compact `{outputFile, fileCount, bytes, pathFilter}` summary on stdout (MNG-1059 / MNG-1045).
 
 When review output misses something, check the `PR context prepared` log entry for `included` / `skipped` / `skipReasons`, `patchSources`, `totalDiffTokens`, `perFileTokenCap`, and `localGitMismatches` to confirm whether the file was visible to the agent and whether GitHub's API patch differed from the local patch. Also check context offload logs if the diff context was written under `.cascade/context/`.
 
+**cascade-tools shell-safety contract** (MNG-1059). cascade-tools commands that accept markdown/multiline payloads (`--body`, `--text`, `--description`, `--details`, `--comments`) declare a `--*-file <path>` companion via `cli.fileInputAlternatives`. The system prompt instructs agents to prefer the file form when content contains backticks, code fences, `$(...)`, or newlines: single quotes no longer protect those tokens once the command is wrapped in another `bash -c` layer, and newlines break argv parsing. The shared CLI factory at `src/gadgets/shared/cli/params.ts:rejectMultipleStdinConsumers` enforces the single-stdin-consumer invariant: only one `--*-file -` per command. Passing two stdin consumers (e.g. `--body-file - --comments-file -`) is rejected *before* any `readFileSync(0, ...)` call with a structured `flag-parse` envelope carrying `error.flag: "body-file,comments-file"` and a hint to write one payload to a temp file. The native-tool system prompt also renders a "cascade-tools shell-safety rules" section with safe heredoc / temp-file patterns. Prompt example rendering suppresses inline `--body '...'` examples for shell-sensitive content (backticks / `$(...)` / newlines) when a file-input companion exists, steering agents toward the safer `--*-file <path>` form.
+
 ## Engines
 
 Default engine: `claude-code`. Alternatives: `codex`, `opencode`.
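
The example-suppression rule in the shell-safety contract above amounts to a small predicate over the example content. The sketch below shows one way it could look; `isShellSensitive` and `chooseExampleForm` are hypothetical names, and the check assumes code fences are backtick fences, which may be narrower or broader than what the prompt renderer actually tests.

```typescript
// Sketch only: function names are hypothetical, not the prompt renderer's API.

// True when the content holds a backtick (which also covers triple-backtick
// fences), a "$(" command-substitution opener, or a newline.
function isShellSensitive(content: string): boolean {
  return content.includes("`") || content.includes("$(") || content.includes("\n");
}

type ExampleForm = "inline" | "file";

// Render the file form only when the content is risky AND the flag declares
// a --*-file companion; otherwise the inline example is kept.
function chooseExampleForm(content: string, hasFileCompanion: boolean): ExampleForm {
  return hasFileCompanion && isShellSensitive(content) ? "file" : "inline";
}
```

Keeping the inline form when no file companion exists mirrors the condition in the text: suppression applies only "when a file-input companion exists".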

diff --git a/Dockerfile.worker b/Dockerfile.worker
@@ -24,6 +24,13 @@ LABEL cascade.managed=true
 RUN npm install -g pnpm --force
 
 # Install system packages needed by agent runtime
+#
+# `python3` plus `python-is-python3` provide a Debian-owned Python shim so
+# both `python` and `python3` work predictably for `python -c 'import json'`
+# and other lightweight scripting that agents reach for. Package ownership is
+# load-bearing — manual `ln -s python3 python` symlinks drift on base-image
+# changes; the apt package survives those cleanly. See MNG-1055 (and the
+# friction cluster MNG-887/897/926/934/947/957/973/1010/1024/1033/1039/1044).
 RUN apt-get update && apt-get install -y \
     ca-certificates \
     curl \
@@ -36,14 +43,19 @@ RUN apt-get update && apt-get install -y \
     postgresql-client \
     procps \
     psutils \
+    python3 \
+    python-is-python3 \
     redis-tools \
     ripgrep \
     ruby \
     sudo \
     tmux \
     unzip \
     && rm -rf /var/lib/apt/lists/* \
-    && ln -s $(which fdfind) /usr/local/bin/fd
+    && ln -s $(which fdfind) /usr/local/bin/fd \
+    && python3 --version \
+    && python --version \
+    && python -c 'import json; print(json.dumps({"ok": True}))'
 
 # Configure tmux to keep panes alive after command exits
 # This allows capturing output and exit code from fast-exiting commands
@@ -65,6 +77,32 @@ RUN npm install -g \
     @openai/codex@0.125.0 \
     opencode-ai@1.14.25
 
+# Playwright browser cache.
+#
+# Workers regularly review PRs that include UI/Playwright tests, and review
+# agents run `playwright test` or launch Chromium via Playwright in shell
+# sessions. Without a baseline browser cache they hit
+# `browserType.launch: Executable doesn't exist` and waste budget trying to
+# install Chromium themselves (often into `~/.cache/ms-playwright` for the
+# wrong user, or while offline). See friction cluster MNG-998 / MNG-1048.
+#
+# The cache lives at a stable non-home path so:
+#   - Chromium binaries are owned by the runtime `node` user and readable by all
+#     users.
+#   - Project setup scripts that need a different Playwright revision can write
+#     the missing browser revision into the same cache instead of failing on a
+#     root-owned directory.
+#   - The `USER node` switch below does not invalidate or duplicate the cache.
+#   - `PLAYWRIGHT_BROWSERS_PATH` can be allowlisted by the native-tool env
+#     filter (`src/backends/shared/envFilter.ts`) and forwarded to agent
+#     subprocesses without leaking any other Playwright config.
+ENV PLAYWRIGHT_BROWSERS_PATH=/ms-playwright
+RUN npm install -g @playwright/test@1.49.1 \
+    && PLAYWRIGHT_BROWSERS_PATH=/ms-playwright \
+       npx --yes playwright@1.49.1 install --with-deps chromium \
+    && chown -R node:node /ms-playwright \
+    && chmod -R u+rwX,go+rX /ms-playwright
+
 # Switch to non-root user for running workers.
 # Claude Code CLI refuses --dangerously-skip-permissions when running as root.
 # The node user (uid 1000) is pre-created by the Node.js base image and matches

diff --git a/README.md b/README.md
@@ -133,7 +133,7 @@ The included `docker-compose.yml` runs all services with a single command. Worke
 |-------|-----------|---------|
 | Dashboard + Frontend | `Dockerfile.selfhosted` | API server + web UI (combined) |
 | Router | `Dockerfile.router` | Webhook receiver, worker orchestration |
-| Worker | `Dockerfile.worker` | Full agent runtime (clones repos, runs AI) |
+| Worker | `Dockerfile.worker` | Full agent runtime (clones repos, runs AI). Ships a baseline native-session toolchain (`python`/`python3`, `jq`, `rg`, `fd`, `git`, `tmux`, `cascade-tools`) and a shared Playwright Chromium cache at `$PLAYWRIGHT_BROWSERS_PATH=/ms-playwright`. See [engine-backends](./docs/architecture/05-engine-backends.md#worker-image-runtime-baseline-mng-1055). |
 
 **Required production environment variables:**
 

diff --git a/docs/architecture/05-engine-backends.md b/docs/architecture/05-engine-backends.md
@@ -151,4 +151,23 @@ All LLM requests and responses are logged to the `agent_run_llm_calls` table, tr
 - Duration
 - Tool calls made
 
+## Worker-image runtime baseline (MNG-1055)
+
+Native-tool engines do not provision their own shell environment — they execute against whatever the worker image ships. CASCADE intentionally bakes a fixed baseline into `Dockerfile.worker` so agents can rely on these tools without per-project `.cascade/setup.sh` workarounds:
+
+| Tool | Where it lives | Notes |
+|---|---|---|
+| `python` / `python3` | apt `python3` + `python-is-python3` | Both names resolve to the same Debian-owned Python 3. Use either for `python -c 'import json'` etc.; do not `pip install` at runtime. |
+| `jq`, `rg`, `fd`, `git`, `tmux`, `cascade-tools`, `ast-grep` (`sg`) | apt + curl + npm install in the worker image | Prefer these over hand-rolled equivalents in shell commands. |
+| Playwright Chromium | `npm install -g @playwright/test@<pin> && playwright install --with-deps chromium` | Browser cache lives at `$PLAYWRIGHT_BROWSERS_PATH` (`/ms-playwright`), readable and writable by the unprivileged `node` user. |
+| Agent engine CLIs | `@anthropic-ai/claude-code`, `@openai/codex`, `opencode-ai` | All pinned versions. |
+
+**Env propagation.** Native-tool engines sanitize subprocess env via `src/backends/shared/envFilter.ts`. `PLAYWRIGHT_BROWSERS_PATH` is allowlisted as an exact match so the baked-in cache is reachable from agent shells; the broader `PLAYWRIGHT_*` prefix is intentionally not allowed, preserving the defense-in-depth posture for the rest of Playwright's env surface.
+
+**Smoke coverage.** Every build path validates the baseline via `tests/docker/worker-runtime-tools/run-test.sh`. CI (`.github/workflows/ci.yml` → `docker-build-check`) runs the script against `cascade-worker:ci-check` after `docker build`. The deploy workflows (`.github/workflows/deploy{,-dev}.yml`) run the same script against the freshly built worker image **before** pushing `:latest` / `:dev` / SHA tags, so a regression that would break agents in production blocks the publish step.
+
+**Image size.** Chromium and its system dependencies significantly increase the worker image size. We pin one Chromium revision and one `@playwright/test` version; target repositories that need a materially different revision can run their normal `npx playwright install chromium` flow in `.cascade/setup.sh`, which, running as the runtime `node` user, writes the missing revision into the same `$PLAYWRIGHT_BROWSERS_PATH` cache. Other browsers (Firefox, WebKit) are intentionally not installed.
+
+**Agent visibility.** The list of guaranteed tools is surfaced to agents in the native-tool system prompt (`src/backends/shared/nativeToolPrompts.ts`, "Guaranteed runtime tools" section). Adding a new tool to the worker image should always be paired with an update to that section so agents reach for the new capability instead of working around its absence.
+
 For further details on adding a new engine, see [`docs/adding-engines.md`](../adding-engines.md).
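
The exact-match allowlisting described under "Env propagation" can be illustrated with a small filter. This is a sketch under stated assumptions, not the contents of `src/backends/shared/envFilter.ts`: the `filterEnv` name, the split into exact and prefix lists, and every allowlist entry other than `PLAYWRIGHT_BROWSERS_PATH` are hypothetical.

```typescript
// Sketch only: names and all entries except PLAYWRIGHT_BROWSERS_PATH are
// assumptions, not the actual contents of src/backends/shared/envFilter.ts.
const EXACT_ALLOWLIST = new Set<string>(["PATH", "HOME", "PLAYWRIGHT_BROWSERS_PATH"]);
const PREFIX_ALLOWLIST: string[] = ["LC_"];

// Returns a copy of `env` that keeps only allowlisted variables.
function filterEnv(env: Record<string, string | undefined>): Record<string, string> {
  const filtered: Record<string, string> = {};
  for (const key of Object.keys(env)) {
    const value = env[key];
    if (value === undefined) continue;
    // PLAYWRIGHT_BROWSERS_PATH passes by exact name; no PLAYWRIGHT_ prefix
    // rule exists, so other Playwright variables are dropped.
    const allowed =
      EXACT_ALLOWLIST.has(key) || PREFIX_ALLOWLIST.some((prefix) => key.startsWith(prefix));
    if (allowed) filtered[key] = value;
  }
  return filtered;
}
```

Listing the variable by exact name rather than adding a `PLAYWRIGHT_` prefix rule keeps the rest of Playwright's configuration surface out of agent subprocesses, which is the defense-in-depth trade-off the section describes.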