Add Cloudflare Pages docs preview with /preview-docs slash command#3028
Conversation
…ommand PRs from repo admins that touch docs auto-deploy a mkdocs preview to Cloudflare Pages; for everyone else, an admin or maintainer can comment `/preview-docs` to trigger one. A sticky PR comment carries the preview link, and a companion workflow deletes the deployments when the PR closes. Because mkdocs executes Python from the PR (mkdocstrings imports src/mcp), the build is gated by a permission check on the triggering actor and runs in a separate job with no secrets — the static site is handed to the Cloudflare deploy step via an artifact so PR code never shares a runner with the API token. The pull_request_target trigger is suppressed in zizmor with this rationale. :house: Remote-Dev: homespace
There was a problem hiding this comment.
4 issues found across 3 files
Reply with feedback, questions, or to request a fix.
Fix all with cubic | Re-trigger cubic
- Disable the uv Actions cache in the build job: pull_request_target runs share the base-branch cache scope, so a cache populated while untrusted PR code ran could poison later trusted workflows. Mirrors the posture in publish-pypi.yml. - Fix concurrency: workflow-level concurrency is evaluated before job if-conditions, so any PR comment was cancelling in-flight previews. Only runs that actually produce a preview now share a group; unrelated comment runs fall through to a unique run_id group. - Replace .github/zizmor.yml with inline ignore comments anchored to the pull_request_target lines (matching the existing pattern in claude.yml). - Raise the cleanup pagination cap to 200 pages. :house: Remote-Dev: homespace
There was a problem hiding this comment.
1 issue found across 3 files (changes from recent commits).
Reply with feedback, questions, or to request a fix.
Fix all with cubic | Re-trigger cubic
The upsert predicate matched any bot, so a comment from a different bot that quoted the marker string would have been overwritten with the preview table instead of the workflow's own comment. :house: Remote-Dev: homespace
| deleted=0 | ||
| for id in $ids; do | ||
| echo "Deleting deployment $id" | ||
| curl -fsS -X DELETE -H "Authorization: Bearer $CF_API_TOKEN" "$base/$id?force=true" > /dev/null | ||
| deleted=$((deleted + 1)) | ||
| done |
There was a problem hiding this comment.
🟡 In the cleanup loop, each curl -fsS -X DELETE runs under set -euo pipefail with no per-id error tolerance, so a single failed DELETE (transient 5xx, rate limit, or an already-deleted deployment returning 404) aborts the step and skips every remaining id — leaving the rest of the PR's preview deployments stale, which is the leak this workflow exists to prevent. A one-line || echo "failed to delete $id" per id (optionally tracking failures and exiting non-zero at the end) keeps the batch resilient while still surfacing the error.
Extended reasoning...
The bug. The delete loop in docs-preview-cleanup.yml runs:
for id in $ids; do
echo "Deleting deployment $id"
curl -fsS -X DELETE -H "Authorization: Bearer $CF_API_TOKEN" "$base/$id?force=true" > /dev/null
deleted=$((deleted + 1))
doneunder set -euo pipefail. Because of -f, curl exits non-zero on any HTTP ≥ 400 (and on transient network errors), and set -e then aborts the whole step mid-loop. Every id after the failing one in $ids is never deleted.
Concrete walk-through. Suppose a long-lived PR accumulated 8 preview deployments and $ids collects all 8. (1) The PR closes and the workflow fires once. (2) DELETE succeeds for ids 1–2. (3) The DELETE for id 3 hits a transient Cloudflare 5xx or a 429 — curl -f exits non-zero. (4) set -e terminates the step immediately; ids 4–8 are never attempted. (5) The workflow only triggers on pull_request_target: closed, so there is no automatic retry — the remaining deployments stay live on Cloudflare indefinitely unless someone notices the red run and manually re-runs it from the Actions UI.
Why nothing else prevents it. The script already shows care about partial-iteration hazards (the collect-then-delete comment to avoid pagination shifts, the 200-page cap raised after review), but the delete phase has no equivalent protection: there is no || true, no retry, and no per-id error accounting.
Addressing the counterargument. It's fair that fail-fast with a visible red run is a defensible design and that a manual re-run will re-list and delete the leftovers, so the original "no retry path" framing overstates it slightly — recovery exists, it's just manual and easy to miss on a closed PR. It's also true that the trigger needs an actual API failure on a specific DELETE, which is uncommon, and the consequence is the same low-impact class already discussed on the pagination-cap thread (a stale preview; nothing breaks). That's why this is a nit rather than a blocker. Still, a workflow whose entire purpose is unattended cleanup ideally shouldn't let one bad response abandon the rest of the batch when the hardening costs one line.
Fix. Tolerate per-id failures while still reporting them, e.g.:
failed=0
for id in $ids; do
echo "Deleting deployment $id"
if curl -fsS -X DELETE -H "Authorization: Bearer $CF_API_TOKEN" "$base/$id?force=true" > /dev/null; then
deleted=$((deleted + 1))
else
echo "::warning::failed to delete deployment $id"
failed=$((failed + 1))
fi
done
echo "Deleted $deleted deployment(s) for $BRANCH ($failed failed)."
[ "$failed" -eq 0 ]This deletes everything it can, still fails the run (visibly) if anything was left behind, and a manual re-run then only has the genuinely stuck ids to deal with.
Summary
Adds per-PR documentation previews on Cloudflare Pages so reviewers can see the rendered mkdocs site before merge.
docs/**,docs_src/**,mkdocs.ymlorpyproject.tomlbuild and deploy automatically on open and on every push./preview-docsto trigger a preview. Anyone else gets a polite rejection.Security
mkdocs executes Python from the PR (mkdocstrings imports
src/mcp,!!python/name:directives inmkdocs.yml), so the design treats the build as untrusted code:authorizejob checks the triggering actor's permission before any PR code is fetched. The auto-path keys onevent.sender.login(the pusher), not the PR opener, so a non-admin pushing to an admin's branch does not get an auto-build.buildjob runs PR code with no secrets and uploads the staticsite/as an artifact. Thedeployjob downloads the artifact and runswrangler pages deploy— PR code never shares a runner with the Cloudflare token..github/zizmor.ymldocuments and suppresses thedangerous-triggersfinding for these two workflows.Setup required before this takes effect
mcp-python-sdk-docs.CLOUDFLARE_API_TOKEN(scope: Account → Cloudflare Pages → Edit) andCLOUDFLARE_ACCOUNT_ID.CLOUDFLARE_PAGES_PROJECT= the project name from step 1.pull_request_targetruns the base branch's workflow, so this only takes effect once merged tomain.Test plan
actionlintclean on both new workflowszizmor .github/workflows/exits 0uv sync --frozen --group docs && uv run --frozen --no-sync mkdocs buildproducessite/index.html+site/api/locallypr-<n>.<project>.pages.devlink/preview-docsfrom a non-admin account on a fork PR → rejection reply/preview-docsfrom an admin → preview appearspr-<n>are deleted