Skip to content

docs: CI performance and warm Docker CI research#613

Open
paddymul wants to merge 252 commits intomainfrom
docs/ci-research
Open

docs: CI performance and warm Docker CI research#613
paddymul wants to merge 252 commits intomainfrom
docs/ci-research

Conversation

@paddymul
Copy link
Collaborator

@paddymul paddymul commented Mar 1, 2026

Summary

  • CI-performance.md: Analysis of current Depot CI — latency breakdown, runner tier comparison (2/4/8 CPU), per-job timing, path-gated optimization proposals
  • warm-docker-ci.md: Research into replacing Depot with a persistent Hetzner server running warm Docker containers — framework comparison, Dockerfile structure, sidecar pattern, CPU contention analysis, Hetzner Cloud vs Dedicated, provisioning automation

Context

Research/brainstorming docs, no code changes. Captures findings for future reference when implementing a faster CI setup.

🤖 Generated with Claude Code

@github-actions
Copy link

github-actions bot commented Mar 1, 2026

📦 TestPyPI package published

pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.12.12.dev22741332667

or with uv:

uv pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.12.12.dev22741332667

MCP server for Claude Code

claude mcp add buckaroo-table -- uvx --from "buckaroo[mcp]==0.12.12.dev22741332667" --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo-table

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 18a7fbd4de

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +150 to +152
pytest -vv tests/unit/ &
(cd packages/buckaroo-js-core && pnpm test) &
wait

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Propagate background test failures in trigger script

This script backgrounds both test commands and then calls wait with no job IDs; in Bash, help wait states that when no IDs are given it waits for all children and returns status 0, so a failing pytest or pnpm test can still produce an overall success status. In the CI-trigger context here, that creates false-green runs and can let broken commits appear healthy.

Useful? React with 👍 / 👎.


# 1. Activate rescue system (~5s API call)
curl -s -u "$AUTH" "$API/boot/$SERVER_NUM/rescue" \
-d "os=linux&authorized_key[]=$SSH_FINGERPRINT"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Define SSH key variable before invoking Robot rescue API

The rebuild script uses authorized_key[]=$SSH_FINGERPRINT but never initializes SSH_FINGERPRINT, so running the snippet as written sends an empty key and the later SSH wait loops cannot authenticate to the rescue system. This makes the documented wipe/reprovision flow fail unless callers add hidden external setup.

Useful? React with 👍 / 👎.

- Pin uv/node/pnpm versions (don't track releases, bump when needed)
- Bump Node 20 → 22 LTS
- Add HETZNER_SERVER_ID/IP to .env.example
- Add development verification section (how Claude tests each script locally)
- Add monitoring & alerting section (health endpoint, systemd watchdog, disk hygiene, dead man's switch)
- Expand testing & ongoing verification (Depot as canary, deprecation criteria)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
paddymul and others added 3 commits March 1, 2026 13:55
Adds ci/hetzner/ with everything needed to run CI on a persistent CCX33:

- Dockerfile: Ubuntu 24.04, uv 0.6.6, Python 3.11-3.14, Node 22 LTS,
  pnpm 9.10.0, all deps pre-installed, Playwright chromium
- docker-compose.yml: warm sidecar container (sleep infinity), bind-mounts
  repo + logs, named volume for Playwright browsers
- webhook.py: Flask on :9000, HMAC-SHA256, per-branch cancellation via
  pkill, /health + /logs/<sha> endpoints, systemd watchdog
- run-ci.sh: 5-phase orchestrator (parallel lint+test-js+test-py-3.13 →
  build-wheel → sequential py 3.11/3.12/3.14 → parallel mcp+smoke →
  sequential playwright) with lockfile-aware dep skipping
- lib/status.sh: GitHub commit status API helpers
- lib/lockcheck.sh: SHA256 lockfile comparison, rebuilds deps only on change
- cloud-init.yml: one-shot CCX33 provisioning
- .env.example: template for required secrets

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add lib/status.sh (GitHub commit status API) and lib/lockcheck.sh
(lockfile hash comparison for warm dep skipping). Unblock them from
the lib/ gitignore rule which was intended for Python venv dirs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove owner:ci:ci from write_files (ci user doesn't exist yet at that stage)
- Fix echo runcmd entry with colon causing YAML dict parse error
- status.sh: skip GitHub API calls gracefully when GITHUB_TOKEN unset

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…it branch fix

- Add build-essential + libffi-dev + libssl-dev so cffi can compile
- cloud-init: clone --branch main (not default), add safe.directory

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e unused import

- Dockerfile: git config --system safe.directory /repo so git checkout works
  inside the container (bind-mount owned by ci on host, root in container)
- test_playwright_jupyter.sh: add --allow-root so JupyterLab starts as root
- webhook.py: remove unused import signal

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… SHA

Dockerfile COPYs ci/hetzner/run-ci.sh and lib/ into /opt/ci-runner/.
run-ci.sh sources lib from CI_RUNNER_DIR (/opt/ci-runner/) instead of
/repo/ci/hetzner/lib/, so they survive `git checkout <sha>` even when
the SHA has no ci/hetzner/ directory (e.g. commits on main branch).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
job_lint_python was running uv sync --dev --no-install-project on the 3.13
venv, which strips --all-extras packages (e.g. pl-series-hash) because
optional extras require the project to be installed. This ran in parallel
with job_test_python_3.13, causing a race condition that randomly removed
pl-series-hash from the venv before tests ran.

ruff is already installed in the venv from the image build — no sync needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
JupyterLab refuses to start as root without --allow-root. Rather than
patching every test script, bake c.ServerApp.allow_root = True into
/root/.jupyter/jupyter_lab_config.py in the image.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- mp_timeout tests: forkserver subprocess spawn takes >1s in Docker (timeout)
- test_server_killed_on_parent_death: SIGKILL propagation differs in containers
- Python 3.14.0a5: segfaults on pytest startup (CPython pre-release bug)

All three disabled with a note to revisit once timing/stability is known.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents all 9 bugs fixed during bringup, known Docker-incompatible
tests (disabled), and final timing: 8m59s wall time, all jobs passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each version has its own venv at /opt/venvs/3.11-3.14 — no shared
state, safe to run concurrently. Saves ~70-80s wall time on CCX33.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Run 7 (warm, sequential Phase 3): 8m23s
Run 8 (warm, parallel Phase 3): 7m21s — saves 1m07s

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All 5 jobs bind to distinct ports (6006/8701/2718/8765/8889) — no
port conflicts. Redirect PLAYWRIGHT_HTML_OUTPUT_DIR per job to avoid
playwright-report/ write collisions. Expected saving: ~3m.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- marimo/wasm-marimo: set UV_PROJECT_ENVIRONMENT=/opt/venvs/3.13 so
  `uv run marimo` uses the pre-synced venv instead of racing to create
  /repo/.venv from scratch concurrently
- playwright-jupyter: use isolated /tmp/ci-jupyter-$$ venv so it
  doesn't pip-reinstall into the shared 3.13 venv while marimo reads it

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
paddymul and others added 30 commits March 4, 2026 16:02
3+3 A/B test: pw-jupyter 35-37s with or without renice.
Failures are unrelated (flaky pytest timing, b2b pw-jupyter timeout).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Force-install pytest-xdist after uv sync so `-n 4 --dist load`
  works even on old commits that don't have it in their lockfile.
- Wipe packages/node_modules in rebuild_deps before pnpm install
  so switching between commits with different pnpm-lock.yaml files
  doesn't leave a corrupted/mixed node_modules state.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Stale symlinks in packages/js/node_modules/ and packages/buckaroo-js-core/node_modules/
point to old .pnpm paths after lockfile change, causing pnpm to attempt concurrent
recreation -> ENOTEMPTY race between build-wheel and test-js.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
build-js uses --store-dir /opt/pnpm-store, updating .modules.yaml storeDir.
full_build.sh's pnpm run commands have no --store-dir, so pnpm sees a store
mismatch and re-links node_modules concurrently with test-js reading it.

Exporting npm_config_store_dir makes all pnpm commands inherit the same
store, eliminating the race condition.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Old commits don't have tests/unit/server/test_mcp_uvx_install.py.
pytest exits 5 (no tests collected) which we treated as failure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All 50 commits fail (expected: old code + new tests). Infrastructure
stable after 4 b2b fixes: pnpm store-dir mismatch, xdist missing,
node_modules ENOTEMPTY race, test-mcp-wheel false positive.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Exp 57: P<9 always times out (120s). Stagger has zero effect on pass
rate. P=9 failures are all test-python-3.13 timing flake under B2B load.
STAGGER=0 is safe to use.

Exp 62: pytest workers=8 saves 3s but triggers timing flake. Not worth it.

Exp 64: tsgo/vitest — test-js drops from ~4s to 2s, no regressions.
Branch ready to merge on clean run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fetches ci.log from server, animates job bars building up over time.
Uses uv inline deps (matplotlib, pillow) — no install needed.

Usage: uv run ci/hetzner/ci-gantt.py [SHA] [SHA2] [--run N]

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Brighter colors: #00e676 green, #ff5252 red, #ffd740 amber
- Full job names (no abbreviation), wider left margin (2.2in)
- Vertical gate lines: sky blue = JS built, purple = Wheel built
- Full redraw per frame to avoid stale line positions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Comparisons now stack vertically (old on top, new on bottom)
- SHA:label syntax for descriptive titles instead of git hashes
- Explicit identical xticks on all panels so grid columns align
- Fixed output path (ci-gantt-latest.gif) overwrites previous output
- x labels only on bottom panel when stacking

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Converts from animated GIF to static JPEG. Wide bar area (13in),
compact rows (0.26in), gate lines for JS/Wheel built, SHA:label CLI
syntax for human-readable titles.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Jobs now ordered by average start time across all displayed runs,
with JOB_ORDER as a stable tiebreaker within each wave. This groups
wave-0 (lint/build-js/warmup/pytest), wave-1 (test-js/build-wheel),
and wave-2 (playwright/smoke/mcp) naturally without hardcoding.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
run-ci.sh: use >> with # RUN marker so multiple runs preserve all data;
add iowait as 4th column (ts busy total iowait).

ci-gantt.py: parse per-run segments, pick segment closest to t0,
extract iowait as orange overlay line alongside cpu% (blue).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
tests/conftest.py: autouse fixture gives each test its own in-memory
SQLiteExecutorLog and SQLiteFileCache, preventing xdist workers from
contending on ~/.buckaroo/*.sqlite.

sqlite_log.py / sqlite_file_cache.py: enable WAL journal mode +
NORMAL synchronous + 30s timeout on file-based connections, so any
remaining cross-process access (e.g. MultiprocessingExecutor
subprocesses) waits rather than immediately failing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mark tests with hard wall-clock assertions as timing_dependent.
job_test_python now runs two parallel pytest invocations:
  - timing_dependent: nice -15, --dist no (single process, high priority)
  - regular: nice +19, -n 4 (parallel workers, low priority)

This gives timing-sensitive tests CPU priority over the bulk suite,
reducing flakes from scheduler contention during parallel CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
playwright-server starts 'python -m buckaroo.server --port 8701' via
Playwright's webServer config. That process was never in the ci_pkill
list, so it survived between CI runs. Next run found 8701 occupied and
failed immediately (reuseExistingServer=false in CI mode).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…gger between them)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Venv was rebuilt from scratch every run (rm -rf + uv venv + uv pip install).
Now cached at /opt/venvs/mcp-test keyed by wheel SHA256 — warm runs skip
the ~6s install step entirely.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
12 tests with 1 worker ran serially at ~3s each = 37s.
Both spec files (marimo.spec.ts + theme-screenshots-marimo.spec.ts)
only read from the shared marimo server — safe to parallelize.
Expected: ~21s (7-test file dominates over 5-test file).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…warmup

Only playwright-jupyter needs jupyter-warmup. All other wheel-dependent
jobs (test-mcp-wheel, playwright-marimo, playwright-server, smoke-test,
playwright-wasm-marimo, test-python-3.11/12/14) were blocked waiting
~7s for warmup to finish. Now they launch as soon as the wheel is built.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ter wheel

test-js doesn't need the built wheel — move it to wave 0 alongside lint.
test-python-3.11 moved to t0 to fill idle CPU during build-js/wheel phases.
test-python-3.12 and 3.14 deferred 10s after wheel to reduce peak contention
during the playwright/marimo/server burst.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously: wait for warmup (~10s) → then install wheel (~2s) → start pw-jupyter.
Now: start wheel install in background as soon as wheel is built and venv path
is written (~t=4s). By the time warmup finishes, install is already done.
Saves ~2s off playwright-jupyter start time.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When --local is set, all commands run directly (no SSH wrapper).
Allows running the stress test inside tmux on the server itself
so it survives network disconnects.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… killer

- pytest -m timing_dependent exits 5 (no tests collected) on old commits
  that predate the mark — treat exit code 5 as success
- fuser is not installed in the container, so fuser -k silently did nothing.
  Replace with kill_port() using /proc/net/tcp inode lookup. Fixes lingering
  marimo (2718), buckaroo-server (8701), storybook (6006) between runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…mmits

- Add port 8765 (wasm-marimo HTTP server) to kill_port loop
- Add npx serve to ci_pkill list
- Replace fuser in Jupyter port cleanup (not in container)
- Add playwright.config.*.ts and test_playwright_server.sh to
  create-merge-commits.sh OVERLAY_PATHS so synth commits get
  current reuseExistingServer logic

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New TEST_SHA=031c787e includes playwright.config.*.ts and
test_playwright_server.sh in the overlay. Updated SAFE_COMMITS SHAs
and fixed comment reference.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant