docs: CI performance and warm Docker CI research by paddymul · Pull Request #613 · buckaroo-data/buckaroo

paddymul · 2026-03-01T17:51:09Z

Summary

CI-performance.md: Analysis of current Depot CI — latency breakdown, runner tier comparison (2/4/8 CPU), per-job timing, path-gated optimization proposals
warm-docker-ci.md: Research into replacing Depot with a persistent Hetzner server running warm Docker containers — framework comparison, Dockerfile structure, sidecar pattern, CPU contention analysis, Hetzner Cloud vs Dedicated, provisioning automation

Context

Research/brainstorming docs, no code changes. Captures findings for future reference when implementing a faster CI setup.

🤖 Generated with Claude Code

github-actions · 2026-03-01T17:52:52Z

📦 TestPyPI package published

pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.12.12.dev22741332667

or with uv:

uv pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.12.12.dev22741332667

MCP server for Claude Code

claude mcp add buckaroo-table -- uvx --from "buckaroo[mcp]==0.12.12.dev22741332667" --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo-table

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 18a7fbd4de

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-01T17:53:48Z

docs/llm/research/warm-docker-ci.md

+  pytest -vv tests/unit/ &
+  (cd packages/buckaroo-js-core && pnpm test) &
+  wait


Propagate background test failures in trigger script

This script backgrounds both test commands and then calls wait with no job IDs; in Bash, help wait states that when no IDs are given it waits for all children and returns status 0, so a failing pytest or pnpm test can still produce an overall success status. In the CI-trigger context here, that creates false-green runs and can let broken commits appear healthy.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-03-01T17:53:48Z

docs/llm/research/warm-docker-ci.md

+
+# 1. Activate rescue system (~5s API call)
+curl -s -u "$AUTH" "$API/boot/$SERVER_NUM/rescue" \
+  -d "os=linux&authorized_key[]=$SSH_FINGERPRINT"


Define SSH key variable before invoking Robot rescue API

The rebuild script uses authorized_key[]=$SSH_FINGERPRINT but never initializes SSH_FINGERPRINT, so running the snippet as written sends an empty key and the later SSH wait loops cannot authenticate to the rescue system. This makes the documented wipe/reprovision flow fail unless callers add hidden external setup.

Useful? React with 👍 / 👎.

- Pin uv/node/pnpm versions (don't track releases, bump when needed) - Bump Node 20 → 22 LTS - Add HETZNER_SERVER_ID/IP to .env.example - Add development verification section (how Claude tests each script locally) - Add monitoring & alerting section (health endpoint, systemd watchdog, disk hygiene, dead man's switch) - Expand testing & ongoing verification (Depot as canary, deprecation criteria) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Adds ci/hetzner/ with everything needed to run CI on a persistent CCX33: - Dockerfile: Ubuntu 24.04, uv 0.6.6, Python 3.11-3.14, Node 22 LTS, pnpm 9.10.0, all deps pre-installed, Playwright chromium - docker-compose.yml: warm sidecar container (sleep infinity), bind-mounts repo + logs, named volume for Playwright browsers - webhook.py: Flask on :9000, HMAC-SHA256, per-branch cancellation via pkill, /health + /logs/<sha> endpoints, systemd watchdog - run-ci.sh: 5-phase orchestrator (parallel lint+test-js+test-py-3.13 → build-wheel → sequential py 3.11/3.12/3.14 → parallel mcp+smoke → sequential playwright) with lockfile-aware dep skipping - lib/status.sh: GitHub commit status API helpers - lib/lockcheck.sh: SHA256 lockfile comparison, rebuilds deps only on change - cloud-init.yml: one-shot CCX33 provisioning - .env.example: template for required secrets Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add lib/status.sh (GitHub commit status API) and lib/lockcheck.sh (lockfile hash comparison for warm dep skipping). Unblock them from the lib/ gitignore rule which was intended for Python venv dirs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Remove owner:ci:ci from write_files (ci user doesn't exist yet at that stage) - Fix echo runcmd entry with colon causing YAML dict parse error - status.sh: skip GitHub API calls gracefully when GITHUB_TOKEN unset Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…it branch fix - Add build-essential + libffi-dev + libssl-dev so cffi can compile - cloud-init: clone --branch main (not default), add safe.directory Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…e unused import - Dockerfile: git config --system safe.directory /repo so git checkout works inside the container (bind-mount owned by ci on host, root in container) - test_playwright_jupyter.sh: add --allow-root so JupyterLab starts as root - webhook.py: remove unused import signal Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… SHA Dockerfile COPYs ci/hetzner/run-ci.sh and lib/ into /opt/ci-runner/. run-ci.sh sources lib from CI_RUNNER_DIR (/opt/ci-runner/) instead of /repo/ci/hetzner/lib/, so they survive `git checkout <sha>` even when the SHA has no ci/hetzner/ directory (e.g. commits on main branch). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

job_lint_python was running uv sync --dev --no-install-project on the 3.13 venv, which strips --all-extras packages (e.g. pl-series-hash) because optional extras require the project to be installed. This ran in parallel with job_test_python_3.13, causing a race condition that randomly removed pl-series-hash from the venv before tests ran. ruff is already installed in the venv from the image build — no sync needed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

JupyterLab refuses to start as root without --allow-root. Rather than patching every test script, bake c.ServerApp.allow_root = True into /root/.jupyter/jupyter_lab_config.py in the image. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- mp_timeout tests: forkserver subprocess spawn takes >1s in Docker (timeout) - test_server_killed_on_parent_death: SIGKILL propagation differs in containers - Python 3.14.0a5: segfaults on pytest startup (CPython pre-release bug) All three disabled with a note to revisit once timing/stability is known. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Documents all 9 bugs fixed during bringup, known Docker-incompatible tests (disabled), and final timing: 8m59s wall time, all jobs passing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Each version has its own venv at /opt/venvs/3.11-3.14 — no shared state, safe to run concurrently. Saves ~70-80s wall time on CCX33. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Run 7 (warm, sequential Phase 3): 8m23s Run 8 (warm, parallel Phase 3): 7m21s — saves 1m07s Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

All 5 jobs bind to distinct ports (6006/8701/2718/8765/8889) — no port conflicts. Redirect PLAYWRIGHT_HTML_OUTPUT_DIR per job to avoid playwright-report/ write collisions. Expected saving: ~3m. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- marimo/wasm-marimo: set UV_PROJECT_ENVIRONMENT=/opt/venvs/3.13 so `uv run marimo` uses the pre-synced venv instead of racing to create /repo/.venv from scratch concurrently - playwright-jupyter: use isolated /tmp/ci-jupyter-$$ venv so it doesn't pip-reinstall into the shared 3.13 venv while marimo reads it Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

3+3 A/B test: pw-jupyter 35-37s with or without renice. Failures are unrelated (flaky pytest timing, b2b pw-jupyter timeout). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Force-install pytest-xdist after uv sync so `-n 4 --dist load` works even on old commits that don't have it in their lockfile. - Wipe packages/node_modules in rebuild_deps before pnpm install so switching between commits with different pnpm-lock.yaml files doesn't leave a corrupted/mixed node_modules state. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Stale symlinks in packages/js/node_modules/ and packages/buckaroo-js-core/node_modules/ point to old .pnpm paths after lockfile change, causing pnpm to attempt concurrent recreation -> ENOTEMPTY race between build-wheel and test-js. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

build-js uses --store-dir /opt/pnpm-store, updating .modules.yaml storeDir. full_build.sh's pnpm run commands have no --store-dir, so pnpm sees a store mismatch and re-links node_modules concurrently with test-js reading it. Exporting npm_config_store_dir makes all pnpm commands inherit the same store, eliminating the race condition. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Old commits don't have tests/unit/server/test_mcp_uvx_install.py. pytest exits 5 (no tests collected) which we treated as failure. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

All 50 commits fail (expected: old code + new tests). Infrastructure stable after 4 b2b fixes: pnpm store-dir mismatch, xdist missing, node_modules ENOTEMPTY race, test-mcp-wheel false positive. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Exp 57: P<9 always times out (120s). Stagger has zero effect on pass rate. P=9 failures are all test-python-3.13 timing flake under B2B load. STAGGER=0 is safe to use. Exp 62: pytest workers=8 saves 3s but triggers timing flake. Not worth it. Exp 64: tsgo/vitest — test-js drops from ~4s to 2s, no regressions. Branch ready to merge on clean run. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Fetches ci.log from server, animates job bars building up over time. Uses uv inline deps (matplotlib, pillow) — no install needed. Usage: uv run ci/hetzner/ci-gantt.py [SHA] [SHA2] [--run N] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Brighter colors: #00e676 green, #ff5252 red, #ffd740 amber - Full job names (no abbreviation), wider left margin (2.2in) - Vertical gate lines: sky blue = JS built, purple = Wheel built - Full redraw per frame to avoid stale line positions Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Comparisons now stack vertically (old on top, new on bottom) - SHA:label syntax for descriptive titles instead of git hashes - Explicit identical xticks on all panels so grid columns align - Fixed output path (ci-gantt-latest.gif) overwrites previous output - x labels only on bottom panel when stacking Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Converts from animated GIF to static JPEG. Wide bar area (13in), compact rows (0.26in), gate lines for JS/Wheel built, SHA:label CLI syntax for human-readable titles. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Jobs now ordered by average start time across all displayed runs, with JOB_ORDER as a stable tiebreaker within each wave. This groups wave-0 (lint/build-js/warmup/pytest), wave-1 (test-js/build-wheel), and wave-2 (playwright/smoke/mcp) naturally without hardcoding. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

run-ci.sh: use >> with # RUN marker so multiple runs preserve all data; add iowait as 4th column (ts busy total iowait). ci-gantt.py: parse per-run segments, pick segment closest to t0, extract iowait as orange overlay line alongside cpu% (blue). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

tests/conftest.py: autouse fixture gives each test its own in-memory SQLiteExecutorLog and SQLiteFileCache, preventing xdist workers from contending on ~/.buckaroo/*.sqlite. sqlite_log.py / sqlite_file_cache.py: enable WAL journal mode + NORMAL synchronous + 30s timeout on file-based connections, so any remaining cross-process access (e.g. MultiprocessingExecutor subprocesses) waits rather than immediately failing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Mark tests with hard wall-clock assertions as timing_dependent. job_test_python now runs two parallel pytest invocations: - timing_dependent: nice -15, --dist no (single process, high priority) - regular: nice +19, -n 4 (parallel workers, low priority) This gives timing-sensitive tests CPU priority over the bulk suite, reducing flakes from scheduler contention during parallel CI. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

playwright-server starts 'python -m buckaroo.server --port 8701' via Playwright's webServer config. That process was never in the ci_pkill list, so it survived between CI runs. Next run found 8701 occupied and failed immediately (reuseExistingServer=false in CI mode). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…gger between them) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Venv was rebuilt from scratch every run (rm -rf + uv venv + uv pip install). Now cached at /opt/venvs/mcp-test keyed by wheel SHA256 — warm runs skip the ~6s install step entirely. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

12 tests with 1 worker ran serially at ~3s each = 37s. Both spec files (marimo.spec.ts + theme-screenshots-marimo.spec.ts) only read from the shared marimo server — safe to parallelize. Expected: ~21s (7-test file dominates over 5-test file). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…warmup Only playwright-jupyter needs jupyter-warmup. All other wheel-dependent jobs (test-mcp-wheel, playwright-marimo, playwright-server, smoke-test, playwright-wasm-marimo, test-python-3.11/12/14) were blocked waiting ~7s for warmup to finish. Now they launch as soon as the wheel is built. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ter wheel test-js doesn't need the built wheel — move it to wave 0 alongside lint. test-python-3.11 moved to t0 to fill idle CPU during build-js/wheel phases. test-python-3.12 and 3.14 deferred 10s after wheel to reduce peak contention during the playwright/marimo/server burst. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Previously: wait for warmup (~10s) → then install wheel (~2s) → start pw-jupyter. Now: start wheel install in background as soon as wheel is built and venv path is written (~t=4s). By the time warmup finishes, install is already done. Saves ~2s off playwright-jupyter start time. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

When --local is set, all commands run directly (no SSH wrapper). Allows running the stress test inside tmux on the server itself so it survives network disconnects. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… killer - pytest -m timing_dependent exits 5 (no tests collected) on old commits that predate the mark — treat exit code 5 as success - fuser is not installed in the container, so fuser -k silently did nothing. Replace with kill_port() using /proc/net/tcp inode lookup. Fixes lingering marimo (2718), buckaroo-server (8701), storybook (6006) between runs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…mmits - Add port 8765 (wasm-marimo HTTP server) to kill_port loop - Add npx serve to ci_pkill list - Replace fuser in Jupyter port cleanup (not in container) - Add playwright.config.*.ts and test_playwright_server.sh to create-merge-commits.sh OVERLAY_PATHS so synth commits get current reuseExistingServer logic Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

New TEST_SHA=031c787e includes playwright.config.*.ts and test_playwright_server.sh in the overlay. Updated SAFE_COMMITS SHAs and fixed comment reference. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

paddymul temporarily deployed to testpypi March 1, 2026 17:52 — with GitHub Actions Inactive

chatgpt-codex-connector bot reviewed Mar 1, 2026

View reviewed changes

paddymul temporarily deployed to testpypi March 1, 2026 18:30 — with GitHub Actions Inactive

paddymul and others added 3 commits March 1, 2026 13:55

paddymul had a problem deploying to testpypi March 1, 2026 19:16 — with GitHub Actions Error

fix: Dockerfile needs build-essential for cffi/cryptography; cloud-in…

5ee2550

…it branch fix - Add build-essential + libffi-dev + libssl-dev so cffi can compile - cloud-init: clone --branch main (not default), add safe.directory Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

paddymul temporarily deployed to testpypi March 1, 2026 19:19 — with GitHub Actions Inactive

paddymul temporarily deployed to testpypi March 1, 2026 19:43 — with GitHub Actions Inactive

paddymul temporarily deployed to testpypi March 1, 2026 20:12 — with GitHub Actions Inactive

paddymul temporarily deployed to testpypi March 1, 2026 20:37 — with GitHub Actions Inactive

paddymul temporarily deployed to testpypi March 1, 2026 20:58 — with GitHub Actions Inactive

paddymul temporarily deployed to testpypi March 1, 2026 21:01 — with GitHub Actions Inactive

docs: update hetzner-ci-bringup with final clean run results

a373b9b

Documents all 9 bugs fixed during bringup, known Docker-incompatible tests (disabled), and final timing: 8m59s wall time, all jobs passing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

paddymul temporarily deployed to testpypi March 2, 2026 02:26 — with GitHub Actions Inactive

perf: parallelize Phase 3 Python tests (3.11/3.12/3.14)

f05e4d7

Each version has its own venv at /opt/venvs/3.11-3.14 — no shared state, safe to run concurrently. Saves ~70-80s wall time on CCX33. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

paddymul temporarily deployed to testpypi March 2, 2026 02:44 — with GitHub Actions Inactive

docs: add warm cache and parallel Phase 3 timing results

1773af1

Run 7 (warm, sequential Phase 3): 8m23s Run 8 (warm, parallel Phase 3): 7m21s — saves 1m07s Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

paddymul temporarily deployed to testpypi March 2, 2026 02:55 — with GitHub Actions Inactive

paddymul temporarily deployed to testpypi March 2, 2026 03:46 — with GitHub Actions Inactive

paddymul and others added 30 commits March 4, 2026 16:02

docs: Exp 60 results — renice has no effect on 16C

6ddd4c0

3+3 A/B test: pw-jupyter 35-37s with or without renice. Failures are unrelated (flaky pytest timing, b2b pw-jupyter timeout). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: skip test-mcp-wheel on commits that predate MCP

8f13a52

Old commits don't have tests/unit/server/test_mcp_uvx_install.py. pytest exits 5 (no tests collected) which we treated as failure. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: mcp skip check on correct file

c8787af

feat: ci-gantt static JPEG output with compact wide layout

fcdc9ec

Converts from animated GIF to static JPEG. Wide bar area (13in), compact rows (0.26in), gate lines for JS/Wheel built, SHA:label CLI syntax for human-readable titles. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: red y-axis labels for failing jobs in gantt chart

804e4b9

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: ruff E702 semicolon style in ci-gantt.py

35d8a78

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

perf: run playwright-marimo and playwright-server in parallel (no sta…

71b36ff

…gger between them) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: update TEST_SHA to 031c787 for synth commit regeneration

669af98

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: re-bake safe set synth commits with playwright configs in overlay

10cab69

New TEST_SHA=031c787e includes playwright.config.*.ts and test_playwright_server.sh in the overlay. Updated SAFE_COMMITS SHAs and fixed comment reference. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: CI performance and warm Docker CI research#613

docs: CI performance and warm Docker CI research#613
paddymul wants to merge 252 commits intomainfrom
docs/ci-research

paddymul commented Mar 1, 2026

Uh oh!

github-actions bot commented Mar 1, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 1, 2026

Uh oh!

chatgpt-codex-connector bot Mar 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

paddymul commented Mar 1, 2026

Summary

Context

Uh oh!

github-actions bot commented Mar 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

📦 TestPyPI package published

MCP server for Claude Code

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 1, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Mar 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions bot commented Mar 1, 2026 •

edited

Loading