Skip to content

bird-agents: cloud OTF reference encode + per-task debug artefacts (DEV-1470)#3

Merged
ZmeiGorynych merged 10 commits into
mainfrom
egor/dev-1470-download-the-slayer_models_otf-models-back-from-cloud
May 26, 2026
Merged

bird-agents: cloud OTF reference encode + per-task debug artefacts (DEV-1470)#3
ZmeiGorynych merged 10 commits into
mainfrom
egor/dev-1470-download-the-slayer_models_otf-models-back-from-cloud

Conversation

@ZmeiGorynych
Copy link
Copy Markdown
Member

@ZmeiGorynych ZmeiGorynych commented May 26, 2026

Summary

  • Move LLM-driven per-DB _build_reference to the cloud and ship results back to the laptop's warm cache (paths.slayer_models_otf_root()) for reuse by future runs.
  • Per-task agent session logs + HARD-8 SLayer scratch storage upload to per-run debug bundles alongside the existing row+log writes.
  • Driver, worker, and merger all use a shared per-DB fcntl.flock so the in-cloud encoder, the optional seed download, and the laptop-side merge cannot interleave on the same (reference_root, db).

What changed

Driver (cloud/driver.py)

  • _check_slayer_setup_present for otf_encode + on-the-fly now requires _cache_fp.txt (deterministic stage 1, stays local) and treats _reference_fp.txt as an OPTIONAL upload seed (cloud encodes any db without a seed).
  • _build_job_args + _build_resubmit_args sort --instance-ids by (selected_database, instance_id) so same-db tasks dispatch adjacently — same-actor encoding becomes the common case, encode race becomes rare.
  • fetch(run_id) calls post_run_merge.merge_post_run_into_warm_cache after _collation.collate; merge report surfaced as metrics["merge_report"].

Worker (cloud/ray_app.py)

  • download_slayer_setup iterates a per-combo artifact list. REQUIRED artifacts: atomic tmp + os.rename + .download_complete marker (empty prefix fatal). OPTIONAL seed: no-op on empty prefix; non-empty → per-DB fcntl.flock-guarded file-by-file merge into existing root (newest-mtime-wins) + .optional_seed_download_complete marker so actor restart never clobbers a cloud-built reference.
  • _run_one_in_actor calls the upload-back triple (debug bundle → setup-encoder sessions → OTF reference delta) after row/log writes, in a try/except that logs to stderr and swallows.
  • Per-actor uploaded_dbs: set[str] + initial_seed_fp_by_db: dict[str, str] snapshot drives upload_otf_reference_delta's eligibility — failed uploads stay eligible on later tasks.

New: cloud/upload_back.py

  • upload_per_task_debug — ships /tmp/bird_interact_slayer_otf/<iid>/ to runs/<run_id>/sessions/<iid>/attempt-<n>/.
  • upload_per_task_setup_sessions — ships each _setup_sessions/<db>/ whose any file mtime ≥ task_start_ts to runs/<run_id>/sessions/_setup_sessions/<db>/.
  • upload_otf_reference_delta — for each <db>/ whose _reference_fp.txt differs from the actor's seed snapshot AND isn't in uploaded_dbs, uploads the WHOLE subtree to runs/<run_id>/post_run/slayer_models_otf/<hostname>-<pid>/<db>/. Upload order: bulk content → _source_mtimes.json sidecar → _upload_complete LAST.

New: cloud/post_run_merge.py

  • Walks <run_dir>/post_run/slayer_models_otf/<shard>/<db>/ across all shards. For each db, builds a per-file pick map (max source_mtime across shards). Skips shards missing _upload_complete or _source_mtimes.json.
  • Under per-DB fcntl.flock on <reference_root>/<db>.build.lock (shared with the build/seed locks): unlink local _reference_fp.txt FIRST → per-file atomic tmp + os.replace → write _reference_fp.txt LAST (also atomic). Preserves "marker present ⇒ content complete" against concurrent readers.

slayer_otf/reference_build.py

  • _build_reference wrapped in the same cross-process fcntl.flock so two Ray actor processes building the same DB on one VM serialise instead of racing _commit_reference's rmtree+rename.

Codex round-1 findings folded in

H1 (merger marker invariant), H2 (per-actor retry tracker), H3 (optional seed restart-safe), H4 (cross-process build flock), M1 (_upload_complete marker), M2 (empty optional prefix no-op), M3 (restart-safe optional download), M5/M6 (concurrency + upload-loss tests). M4 (mid-task crash loses /tmp/.../<iid>/) accepted as documented limitation. L1 (cross-shard mixing safe per fingerprint invariant) documented in module docstring.

Known limitation

Pre-completion losses (actor crash, VM preemption) lose /tmp/bird_interact_slayer_otf/<iid>/ for the in-flight task. The per-task row still lands; the debug bundle for that single task is gone. Per-task upload-back covers the common case; periodic upload was discussed and rejected as not worth the complexity.

Test plan

  • tests/cloud/test_upload_back.py (15 tests) — per-helper unit coverage, swallow semantics, retry-after-failure (H2/M6), upload ordering (M1), sidecar mtime fidelity
  • tests/cloud/test_post_run_merge.py (12 tests) — mtime-wins per file, marker invariant (H1/M3), shard-completeness (M1), per-file atomicity, cross-process lock (H1/H4), docstring fingerprint invariant (L1), report shape (M4)
  • tests/cloud/test_driver.py additions — cache-required + reference-optional contract, partial seeds, exact (db, iid) sort (L1), fetch post-merge hook
  • tests/cloud/test_ray_app.py additions — multi-artifact download, optional seed file-merge + marker + cross-process flock (H3/M1/H4), upload-back wiring + row-before-upload ordering + swallow, post-download seed snapshot timing (H2/H3)
  • tests/test_slayer_otf_reference_build.py addition — cross-process _build_reference flock (H4)
  • Full non-integration suite: 1078 passed, 92 skipped, 0 failed
  • Cloud smoke: bird-interact-cloud submit --framework pydantic_ai_otf_encode --slayer-setup on-the-fly ... (manual, separate)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • On-the-fly workflows: deterministic cache is required; reference seeds are optional with per-DB optional-seed handling.
    • Post-run warm-cache promotion: per-run shards merged into the global warm cache (newest-source-mtime wins).
    • Best-effort per-task upload-back (debug, setup sessions, OTF reference deltas) and deterministic grouping of instance IDs by database.
    • Image build/tagging now accepts audited_gold via explicit build-context.
  • Bug Fixes

    • Stronger cross-process locking and atomic marker semantics to avoid race/race-recovery issues.
  • Tests

    • Expanded suites covering merges, downloads, uploads, ordering, concurrency, image hashing, and CLI wiring.
  • Documentation

    • Worktree-safety note for audited_gold and build-context usage.

Review Change Stack

…EV-1470)

Move LLM-driven `_build_reference` to the cloud and ship results back to the
laptop's warm cache for reuse. Per-task agent session logs + HARD-8 scratch
storage now upload to per-run debug bundles alongside the row+log writes.

Driver: `_check_slayer_setup_present` for otf_encode+on-the-fly now requires
`_cache_fp.txt` (deterministic) and treats `_reference_fp.txt` as an OPTIONAL
upload seed. `_build_job_args` / `_build_resubmit_args` sort `--instance-ids`
by (selected_database, instance_id) so same-db tasks dispatch adjacently and
the cross-actor encode race is rare. `fetch(run_id)` calls the new
`post_run_merge.merge_post_run_into_warm_cache` after collation.

Worker `download_slayer_setup` iterates a per-combo artifact list: REQUIRED
artifacts use the existing atomic rename + `.download_complete` marker (empty
prefix fatal); OPTIONAL `slayer_models_otf` seed no-ops on empty prefix,
file-merges into existing roots under per-DB `fcntl.flock` on non-empty, and
is gated by `.optional_seed_download_complete` so an actor restart doesn't
clobber a cloud-built reference. `_run_one_in_actor` fires the upload-back
triple (debug bundle, setup-encoder sessions, OTF reference delta) after the
row+log write, swallowing exceptions.

New `cloud/upload_back.py` ships per-task scratch + the per-DB OTF reference
shards (per-actor `<hostname>-<pid>` shard, bulk content → `_source_mtimes.json`
sidecar → `_upload_complete` LAST). Per-actor `uploaded_dbs` + initial-seed
fingerprint snapshot drive eligibility — failed uploads stay eligible for
retry on later tasks.

New `cloud/post_run_merge.py` promotes shards into the global warm cache:
per-file newest-source-mtime-wins, under per-DB `fcntl.flock` (shared with
`_build_reference`'s build lock), unlinking the local `_reference_fp.txt`
FIRST and writing it LAST (both via atomic `tmp+os.replace`) so the
marker-present-⇒-content-complete invariant holds against concurrent readers.
Shards missing `_upload_complete` or `_source_mtimes.json` are ignored.

`reference_build._build_reference` now takes the same cross-process flock so
two Ray actor processes building the same DB on one VM serialise instead of
racing `_commit_reference`'s rmtree+rename.

Known limitation (documented): pre-completion losses (actor crash, VM
preemption) lose `/tmp/bird_interact_slayer_otf/<iid>/` for the in-flight
task. Per-task row still lands; debug bundle for that single task is gone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 26, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds DEV-1470 on-the-fly upload-back and warm-cache merge: required-vs-optional SLAYER artifact semantics, deterministic instance grouping for submit/resubmit, actor-side best-effort per-task uploads, per-DB flocked optional-seed download/merge, post-run per-db shard merge with newest-mtime-wins, and audited_gold BuildKit input handling for image builds.

Changes

OTF Upload-Back and Post-Run Merge

Layer / File(s) Summary
SLayer artifact contract and validation
src/bird_interact_agents/cloud/driver.py, tests/cloud/test_driver.py
_slayer_uploads_for() defines required vs optional artifact roots; _check_slayer_setup_present() validates required artifacts (cache) and treats reference seeds as optional.
Deterministic instance grouping & submit wiring
src/bird_interact_agents/cloud/driver.py, tests/cloud/test_driver.py
_instance_ids_sorted_by_db() groups --instance-ids by (selected_database, instance_id) for submit and resubmit; submit() passes audited_gold_root into image.build_and_push().
Image hashing & BuildKit audited_gold input
src/bird_interact_agents/cloud/image.py, Dockerfile.cloud, tests/cloud/test_image.py, src/bird_interact_agents/cloud/cli.py, tests/cloud/test_cli.py
data_hash/image_tag/build_and_push accept audited_gold_root, include audited_gold bytes in DATA-layer hash, omit audited_gold from dirty-worktree checks, and build_and_push wires --build-context audited-gold=<path>; Dockerfile.cloud now COPYs from audited-gold context; CLI forwards audited_gold_root to image APIs.
Artifact download and optional seed merge
src/bird_interact_agents/cloud/ray_app.py, tests/cloud/test_ray_app.py
Multi-artifact combo resolution; required artifacts downloaded atomically with .download_complete; optional seed prefixes merged per-file under per-db fcntl locks using don’t-clobber semantics and per-file atomic replace; optional seeds marked with .optional_seed_download_complete.
Upload-back triple: debug, setup sessions, and OTF reference
src/bird_interact_agents/cloud/upload_back.py, src/bird_interact_agents/cloud/ray_app.py, tests/cloud/test_upload_back.py
upload_per_task_debug() uploads per-IID work dir; upload_per_task_setup_sessions() uploads DB setup sessions with mtime filter; upload_otf_reference_delta() promotes per-DB OTF subtrees when fingerprint differs, uploading content → _source_mtimes.json_upload_complete and updating uploaded_dbs only on success. All are best-effort and run after row/log writes.
Actor integration & upload ordering
src/bird_interact_agents/cloud/ray_app.py, tests/cloud/test_ray_app.py
Actors snapshot initial_seed_fp_by_db after download, maintain uploaded_dbs per-actor, and call upload-back helpers after writes but before log cleanup; upload failures are swallowed so tasks still complete.
Post-run warm-cache merge orchestration
src/bird_interact_agents/cloud/post_run_merge.py, src/bird_interact_agents/cloud/driver.py, tests/cloud/test_post_run_merge.py
merge_post_run_into_warm_cache() scans run shards, ignores incomplete shards, selects newest source_mtime per relative path across shards, acquires per-db flock, unlinks local _reference_fp.txt (if present) before replacing files, atomically replaces only when cloud mtime is newer, and writes marker last; fetch() calls merge after collation and records merge report in metrics.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • MotleyAI/bird-agents#2: Shares prior DEV-1468 slayer marker and per-DB build locking work that this PR extends.

Poem

🐰 I hopped through shards and mtimes bright,

Pushed back seeds into the warm-cache night.
Locks kept each DB tidy and sane,
Markers wrote last — no partial pain.
A rabbit’s merge: atomic and light!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 65.73% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: cloud OTF reference encoding and per-task debug artifact uploads (DEV-1470), which are the core features implemented across multiple new modules and changes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch egor/dev-1470-download-the-slayer_models_otf-models-back-from-cloud

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/bird_interact_agents/cloud/driver.py (1)

221-229: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't treat _reference_fp.txt as a sufficient OTF-seed presence check.

reference_build._load_reference_entry() now rejects a marker-only reference because _kb_rows.json is mandatory. With the current check, a corrupt slayer_models_otf/<db>/ still gets uploaded as an optional seed, and the cloud reuse path will fail on that seed instead of just rebuilding it lazily.

Suggested fix
 def _artifact_present(root: Path, db: str, artifact: str) -> bool:
     db_dir = root / db
     if artifact == "slayer_models":
         return db_dir.is_dir() and any(db_dir.iterdir())
-    marker = "_cache_fp.txt" if artifact == "slayer_otf_cache" else "_reference_fp.txt"
-    return (db_dir / marker).is_file()
+    if artifact == "slayer_otf_cache":
+        return (db_dir / "_cache_fp.txt").is_file()
+    return (
+        (db_dir / "_reference_fp.txt").is_file()
+        and (db_dir / "_kb_rows.json").is_file()
+    )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/bird_interact_agents/cloud/driver.py` around lines 221 - 229, The
presence check currently treats any marker file as sufficient for OTF seeds,
which lets marker-only/corrupt OTF dirs pass; update _artifact_present to
require both the marker and the mandatory "_kb_rows.json" for OTF-style
artifacts (e.g., when artifact == "slayer_otf_cache") while keeping the existing
non-empty-dir check for "slayer_models"; specifically, in _artifact_present use
the existing marker variable but return True only if (db_dir / marker).is_file()
AND (db_dir / "_kb_rows.json").is_file() for OTF artifacts.
🧹 Nitpick comments (1)
tests/cloud/test_post_run_merge.py (1)

400-418: 💤 Low value

Remove unused run_dir from the worker function signature.

The run_dir variable is unpacked but never used in _spawn_merge_holding_lock. Since this function only needs to acquire the per-DB lock (not perform an actual merge), the parameter is unnecessary.

🧹 Suggested cleanup
 def _spawn_merge_holding_lock(args):
     """Worker that acquires the per-DB build lock (shared with the merger
     under the H4 resolution) and holds it for `hold_s`."""
-    run_dir, reference_root, db, hold_s, sentinel_path = args
+    _run_dir, reference_root, db, hold_s, sentinel_path = args
     import fcntl

Or remove it from the caller as well:

-    args=((run_dir, ref_root, "db_a", 2.0, str(sentinel)),),
+    args=((ref_root, "db_a", 2.0, str(sentinel)),),
-    run_dir, reference_root, db, hold_s, sentinel_path = args
+    reference_root, db, hold_s, sentinel_path = args
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/cloud/test_post_run_merge.py` around lines 400 - 418, The function
_spawn_merge_holding_lock currently unpacks a unused run_dir parameter; remove
run_dir from the function signature and from its tuple unpacking inside
_spawn_merge_holding_lock, and update any callers/tests that construct/pass the
argument so they only pass (reference_root, db, hold_s, sentinel_path) (or
adjust the caller to stop including run_dir in the args tuple). Ensure the
lock_path logic and use of fcntl, Path(sentinel_path).write_text, and time.sleep
remain unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/bird_interact_agents/cloud/ray_app.py`:
- Around line 203-207: The merge currently writes a temp file (tmp_dst) and
os.replace() moves it into place, which gives dst the temp file's mtime
(download time); fix by capturing the source mtime (src_mtime =
src.stat().st_mtime) and after os.replace(tmp_dst, dst) call os.utime(dst,
(src_mtime, src_mtime)) so the replaced file preserves the original source
modification time; update the block around tmp_dst.write_bytes(...) and
os.replace(...) to set src_mtime and call os.utime on dst.

In `@src/bird_interact_agents/cloud/upload_back.py`:
- Around line 203-236: The upload loop currently runs for every slayer job and
can re-upload stale OTF files; add a strict guard so this code only runs for the
OTF-encode flow by checking cfg.get("run_name") == "pydantic_ai_otf_encode" (in
addition to cfg.get("query_mode") == "slayer") at the top of
upload_otf_reference_delta (or the surrounding scope), i.e. return early unless
both conditions hold; keep the rest of the logic (refs: cfg,
initial_seed_fp_by_db, slayer_models_otf_root, _upload_one_db_shard,
uploaded_dbs) unchanged so only true encode runs can mark/upload shards.

---

Outside diff comments:
In `@src/bird_interact_agents/cloud/driver.py`:
- Around line 221-229: The presence check currently treats any marker file as
sufficient for OTF seeds, which lets marker-only/corrupt OTF dirs pass; update
_artifact_present to require both the marker and the mandatory "_kb_rows.json"
for OTF-style artifacts (e.g., when artifact == "slayer_otf_cache") while
keeping the existing non-empty-dir check for "slayer_models"; specifically, in
_artifact_present use the existing marker variable but return True only if
(db_dir / marker).is_file() AND (db_dir / "_kb_rows.json").is_file() for OTF
artifacts.

---

Nitpick comments:
In `@tests/cloud/test_post_run_merge.py`:
- Around line 400-418: The function _spawn_merge_holding_lock currently unpacks
a unused run_dir parameter; remove run_dir from the function signature and from
its tuple unpacking inside _spawn_merge_holding_lock, and update any
callers/tests that construct/pass the argument so they only pass
(reference_root, db, hold_s, sentinel_path) (or adjust the caller to stop
including run_dir in the args tuple). Ensure the lock_path logic and use of
fcntl, Path(sentinel_path).write_text, and time.sleep remain unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 460d6417-817d-4cc7-9c62-9f3c1e92dbe3

📥 Commits

Reviewing files that changed from the base of the PR and between 7349e96 and 55ca611.

📒 Files selected for processing (10)
  • src/bird_interact_agents/cloud/driver.py
  • src/bird_interact_agents/cloud/post_run_merge.py
  • src/bird_interact_agents/cloud/ray_app.py
  • src/bird_interact_agents/cloud/upload_back.py
  • src/bird_interact_agents/slayer_otf/reference_build.py
  • tests/cloud/test_driver.py
  • tests/cloud/test_post_run_merge.py
  • tests/cloud/test_ray_app.py
  • tests/cloud/test_upload_back.py
  • tests/test_slayer_otf_reference_build.py

Comment thread src/bird_interact_agents/cloud/ray_app.py
Comment thread src/bird_interact_agents/cloud/upload_back.py Outdated
…ild (DEV-1470)

Five fixes from Codex + CodeRabbit on PR #3, plus the related worktree-safety
change `audited_gold/` needed to actually run the smoke from a worktree.

**Codex Cdx-1 / CodeRabbit CR-1 — preserve source_mtime across atomic replace.**
`post_run_merge._atomic_replace` / `_write_marker_atomic` and
`ray_app._download_optional_seed` all used `os.replace(tmp, dst)` of a
freshly-written tmp file, so `dst` inherited the laptop's fetch time, not the
source's actual mtime. A later fetch of a genuinely-newer cloud reference
would then be wrongly skipped because the comparison reads
`dst.stat().st_mtime`. Fix: `os.utime(tmp, (source_mtime, source_mtime))`
BEFORE the rename so dst inherits the source's mtime. Regression tests in
`test_post_run_merge.py` (per-file mtime preservation + two-fetch
newest-wins scenario) and `test_ray_app.py` (optional seed mtime).

**CodeRabbit CR-2 — gate `upload_otf_reference_delta` on otf_encode+on-the-fly.**
The helper returned only on `query_mode \!= "slayer"`. For recursive+slayer
or pre-encoded+slayer, `initial_seed_fp_by_db` is `{}`, so any stale
`slayer_models_otf/<db>/` on a shared worker FS would pass the fp-mismatch
check and pollute the laptop warm cache. Widened the guard to require
`framework == "pydantic_ai_otf_encode"` AND `slayer_setup == "on-the-fly"`.
Regression test covers all three non-eligible combos.

**Codex Cdx-2 — re-check the marker INSIDE `_build_reference_inside_lock`.**
Two Ray actor processes can both pass `ensure_db_reference`'s asyncio marker
check, then block on the cross-process flock; once the lock is released both
would re-enter `_build_reference` and BOTH run the full LLM encode (the
asyncio Lock only serializes within ONE process). Added the marker re-check
at the top of `_build_reference_inside_lock` — if the peer just committed,
load via `_load_reference_entry` and skip the encoder entirely. Regression
test spawns a peer process that commits the reference while the main
asyncio task is blocked on the flock; the test asserts `build_encoder` is
never invoked.

**CodeRabbit CR-3 nitpick** — renamed the unused `run_dir` in
`_spawn_merge_holding_lock` to `_run_dir`.

**Worktree-safety: ship `audited_gold/` via `--build-context`** (out-of-band:
caught by attempting the smoke from a worktree). `audited_gold/` is
gitignored and lives in the MAIN checkout, so the worktree's
`COPY audited_gold/ ...` in `Dockerfile.cloud` failed with
`"/audited_gold": not found`. Mirrored how `mini-interact` is wired:
`build_and_push` now passes `--build-context audited-gold=<paths.audited_gold_root()>`
and the Dockerfile uses `COPY --from=audited-gold ...`. `image.data_hash` /
`image.image_tag` take `audited_gold_root` as an explicit positional param
so the hash key is stable regardless of where the dir lives on the host.
Dropped `audited_gold/` from `_dirty_image_input_paths` (it's gitignored, so
it could never appear in `git status --porcelain` anyway, and the dir is
now content-hashed via the explicit root). 4 new regression tests in
`test_image.py` (worktree-vs-main hash equivalence; stale-worktree-subdir
ignored; `--build-context audited-gold` arg; dirty-prefix no longer
includes audited_gold). Added a worktree-safety note to project CLAUDE.md.

Full non-integration suite: 1087 passed (was 1078 pre-PR + 5 fix regressions
+ 4 worktree-safety regressions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/bird_interact_agents/slayer_otf/reference_build.py`:
- Around line 565-581: The current check uses the combined rebuild condition
(`force or target.exists()`) earlier and here only inspects `force` before
reusing a peer-committed reference, which causes a race where a stale markerless
directory (target.exists() true) can still trigger a rebuild even after a peer
committed; update ensure_db_reference()/calling logic so the decision to skip
reuse is based only on an explicit user intent flag (`force`) not on
`target.exists()`, and modify this block (around marker/_MARKER check in
reference_build.py) to re-check the marker file and return
_load_reference_entry(target).setup_results when marker.is_file() and not force,
ensuring that existence of target without marker does not suppress peer-commit
reuse — keep the separate semantics for "explicit user rebuild" (force) vs
"pre-existing target".

In `@tests/cloud/test_ray_app.py`:
- Around line 1616-1663: The test only checks that os.utime was called on a tmp
'.seed-' file but doesn't assert the final file mtime; modify
test_optional_seed_merge_preserves_src_mtime to (1) choose a deterministic
expected_src_mtime (e.g. expected_src_mtime = 1_600_000_000.0) and ensure the
spy_utime will be invoked with that value for the tmp file, (2) after calling
ray_app.download_slayer_setup(RUN_ID, cfg, client=client) locate the final
destination file ref_root / "db_a" / "models" / "x.yaml" and assert its
stat().st_mtime == expected_src_mtime, and (3) keep the existing assertion that
os.utime was called on the tmp '.seed-' path (using utime_calls) but use the
recorded/expected_src_mtime for the final st_mtime check so the test verifies
the dst actually inherited the source mtime.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ed0d17e3-0a72-4faf-a144-970043bb380b

📥 Commits

Reviewing files that changed from the base of the PR and between 55ca611 and e3b41b7.

📒 Files selected for processing (13)
  • CLAUDE.md
  • Dockerfile.cloud
  • src/bird_interact_agents/cloud/driver.py
  • src/bird_interact_agents/cloud/image.py
  • src/bird_interact_agents/cloud/post_run_merge.py
  • src/bird_interact_agents/cloud/ray_app.py
  • src/bird_interact_agents/cloud/upload_back.py
  • src/bird_interact_agents/slayer_otf/reference_build.py
  • tests/cloud/test_image.py
  • tests/cloud/test_post_run_merge.py
  • tests/cloud/test_ray_app.py
  • tests/cloud/test_upload_back.py
  • tests/test_slayer_otf_reference_build.py

Comment thread src/bird_interact_agents/slayer_otf/reference_build.py
Comment thread tests/cloud/test_ray_app.py Outdated
Three Codex-round-2 findings, all major. One of them (Cdx2-1) was the
reason the smoke had been silently failing — the round-1 image-tag
signature change broke a second production call site I missed.

**Cdx2-1 — `cli.py build` subcommand still used the old `image_tag`/
`build_and_push` signature.** `driver.submit` was updated for the
`audited_gold_root` parameter; `cloud/cli.py:131-137` was not. Running
`bird-interact-cloud build` (or `submit` via the build subcommand path)
raised `TypeError: image_tag() missing 1 required positional argument:
'audited_gold_root'`. Updated the subcommand to pass
`paths.audited_gold_root()` to both calls. New regression test in
`tests/cloud/test_cli.py::test_build_subcommand_passes_audited_gold_root_to_image_tag`
mocks both image helpers and asserts the arg is plumbed through.

**Cdx2-2 — Optional seed merge: drop the broken mtime-wins, use "don't
clobber if dst exists".** The round-1 fix preserved `src_mtime` across the
`os.replace`, but the source mtime captured at download time is the local
write time (gcs.download_prefix writes blobs to local tmp files without
preserving any GCS-side metadata), not the original upload-time mtime.
So a stale seed on a restarted actor could still wrongly overwrite newer
cloud-built local references. Correct semantics for `_download_optional_seed`:
if dst already exists (cloud-built reference from a prior actor, OR a
crashed mid-merge), KEEP it. The cross-process per-DB flock + the
`.optional_seed_download_complete` marker make this race-safe.
Replaced the now-misleading `test_optional_seed_merge_preserves_src_mtime`
with `test_download_slayer_setup_optional_seed_does_not_clobber_local`.

**Cdx2-3 — Merger downgraded the local marker when cloud lost the
comparison.** The round-1 merger unlinked the local `_reference_fp.txt`
FIRST, then compared cloud marker mtime against `local_marker.stat()` —
which raised `FileNotFoundError` and defaulted to `0.0`. Older cloud
markers therefore always "won" the comparison, leaving newer local content
under a stale fingerprint (invariant 'marker present ⇒ content complete'
violated). Refactored `_merge_one_db` to: (1) capture local marker mtime +
bytes BEFORE any unlink, (2) precompute all per-file decisions, (3) skip
the whole touch when nothing changes, (4) on any actual change unlink the
marker, do per-file atomic replaces, then either write the cloud marker
(if it won) OR restore the local marker (if cloud lost). Two new
regression tests in `test_post_run_merge.py` cover the older-cloud
no-downgrade case and the mixed-win case where the marker must be
restored after a partial content overwrite.

Full non-integration suite: 1089 passed (was 1087 + 2 new regressions —
one existing optional-seed test was REPLACED to match the new
"don't clobber" contract, so net is +2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
src/bird_interact_agents/cloud/ray_app.py (1)

196-214: ⚡ Quick win

Update _download_optional_seed's docstring to match this new rule.

This block now implements "don't clobber if dst already exists", but the function docstring above still says newest-source-mtime-wins. Leaving both contracts in the same function will mislead the next refactor.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/bird_interact_agents/cloud/ray_app.py` around lines 196 - 214, Update the
_download_optional_seed function docstring to reflect the new "don't clobber if
dst already exists" semantics instead of "newest-source-mtime-wins"; mention
that the function now checks dst.exists() and skips writing when present, uses
an atomic tmp_dst (tmp_dst = dst.parent / f".{dst.name}.seed-{os.getpid()}")
with tmp+os.replace for crash safety, and relies on the cross-process flock and
the .optional_seed_download_complete marker to ensure safe
concurrent/partial-download behavior.
tests/cloud/test_ray_app.py (1)

1592-1611: ⚡ Quick win

Also assert that an existing local _reference_fp.txt is preserved.

This test only proves the "don't clobber" rule for content files. _download_optional_seed applies the same branch to _reference_fp.txt, and overwriting that marker would change completeness and upload-back behavior without tripping this regression.

Suggested test tightening
     store[f"{rpfx}/db_a/_reference_fp.txt"] = b"seed-fp"
     store[f"{rpfx}/db_a/models/x.yaml"] = b"SEED-CONTENT\n"
     store[f"{rpfx}/db_a/models/y.yaml"] = b"SEED-Y\n"

     # Local already has a cloud-built x.yaml (from a prior actor on this
     # VM). MUST NOT be overwritten by the seed.
     local_db = ref_root / "db_a"
     (local_db / "models").mkdir(parents=True)
     (local_db / "models" / "x.yaml").write_bytes(b"CLOUD-BUILT\n")
+    (local_db / "_reference_fp.txt").write_bytes(b"local-fp")
@@
     assert (local_db / "models" / "x.yaml").read_bytes() == b"CLOUD-BUILT\n", (
         "seed merged on top of an existing local file — must not clobber"
     )
+    assert (local_db / "_reference_fp.txt").read_bytes() == b"local-fp", (
+        "seed marker clobbered an existing local marker"
+    )
     assert (local_db / "models" / "y.yaml").read_bytes() == b"SEED-Y\n", (
         "seed file absent locally must still be downloaded"
     )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/cloud/test_ray_app.py` around lines 1592 - 1611, Add a check that an
existing local _reference_fp.txt is not overwritten by the seed: before calling
download_slayer_setup(RUN_ID, cfg, client=client) create a local marker file at
ref_root / "db_a" / "_reference_fp.txt" with a distinct byte string (e.g.,
b"CLOUD-FP"), then after download assert that (ref_root / "db_a" /
"_reference_fp.txt").read_bytes() still equals that original byte string;
reference the existing symbols rpfx, local_db, ref_root, download_slayer_setup,
and _download_optional_seed to locate the relevant setup and assertion spot.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/bird_interact_agents/cloud/ray_app.py`:
- Around line 196-214: Update the _download_optional_seed function docstring to
reflect the new "don't clobber if dst already exists" semantics instead of
"newest-source-mtime-wins"; mention that the function now checks dst.exists()
and skips writing when present, uses an atomic tmp_dst (tmp_dst = dst.parent /
f".{dst.name}.seed-{os.getpid()}") with tmp+os.replace for crash safety, and
relies on the cross-process flock and the .optional_seed_download_complete
marker to ensure safe concurrent/partial-download behavior.

In `@tests/cloud/test_ray_app.py`:
- Around line 1592-1611: Add a check that an existing local _reference_fp.txt is
not overwritten by the seed: before calling download_slayer_setup(RUN_ID, cfg,
client=client) create a local marker file at ref_root / "db_a" /
"_reference_fp.txt" with a distinct byte string (e.g., b"CLOUD-FP"), then after
download assert that (ref_root / "db_a" / "_reference_fp.txt").read_bytes()
still equals that original byte string; reference the existing symbols rpfx,
local_db, ref_root, download_slayer_setup, and _download_optional_seed to locate
the relevant setup and assertion spot.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bb330c90-2aba-4229-9ef9-79963ad62f6b

📥 Commits

Reviewing files that changed from the base of the PR and between e3b41b7 and dba12b3.

📒 Files selected for processing (6)
  • src/bird_interact_agents/cloud/cli.py
  • src/bird_interact_agents/cloud/post_run_merge.py
  • src/bird_interact_agents/cloud/ray_app.py
  • tests/cloud/test_cli.py
  • tests/cloud/test_post_run_merge.py
  • tests/cloud/test_ray_app.py

Two issues from CodeRabbit's review of e3b41b7 / dba12b3.

**CR r2 — peer-marker reuse path was disabled by stale markerless scrap.**
`ensure_db_reference` passed `force=force or target.exists()` to
`_build_reference`, so any local markerless dir set `force=True` and
caused `_build_reference_inside_lock`'s `if not force and marker.is_file()`
peer-reuse to be bypassed. Concretely: an actor with stale leftover scrap
who lost the cross-process flock race would, on lock acquisition, observe
the peer's freshly-committed reference but still rmtree+rebuild it.
Split the two concerns: `ensure_db_reference` now passes `force=force`
(user intent only), and `_commit_reference` handles markerless-scrap on
its own — it rmtrees a markerless target before the atomic rename, regardless
of `force`. Result: a peer's commit is honored under both `force=False`
(reuse) and `force=True` (no markerless rmtree of a marked dir), completing
the H4 cross-process safety story. New regression test
`test_ensure_db_reference_reuses_peer_commit_when_markerless_scrap_exists`
spawns a peer that commits while the main task waits on flock + asserts
the build_encoder is never invoked.

**CR r2 nitpick — `_download_optional_seed` docstring stale.** The Codex
r2 fix moved the optional seed merge from mtime-wins to "don't clobber
if dst exists", but the docstring still claimed "newest-source-mtime-wins".
Rewrote the docstring to reflect the new contract + the rationale for
abandoning mtime comparison (`src.stat().st_mtime` is download-time, not
upload-time; gcs.download_prefix doesn't preserve metadata).

Full non-integration suite: 1090 passed (was 1089 + 1 new regression).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/test_slayer_otf_reference_build.py (1)

880-890: 💤 Low value

Minor: mutable class attribute in test helper.

index_rows: list[dict] = [] is a mutable class attribute. While harmless here (each class is defined fresh inside the test function and the list is never mutated), using a property or instance attribute would be more defensive.

 class _RunOne:
-    index_rows: list[dict] = []
+    `@property`
+    def index_rows(self) -> list[dict]:
+        return []

This is a low-priority nitpick — the current code works correctly for the test's purpose.

Also applies to: 792-803, 992-1002

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_slayer_otf_reference_build.py` around lines 880 - 890, The class
_RunOne defines a mutable class attribute index_rows: list[dict] = [], which
should be an instance attribute to avoid shared mutable state; change it to
initialize self.index_rows = [] inside an __init__ (or make it a `@property` that
returns a new list) and update any code that references _RunOne.index_rows to
use the instance attribute; apply the same change to the other test helper
_RunOne definitions mentioned in the review so each test gets its own list
instance.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/test_slayer_otf_reference_build.py`:
- Around line 880-890: The class _RunOne defines a mutable class attribute
index_rows: list[dict] = [], which should be an instance attribute to avoid
shared mutable state; change it to initialize self.index_rows = [] inside an
__init__ (or make it a `@property` that returns a new list) and update any code
that references _RunOne.index_rows to use the instance attribute; apply the same
change to the other test helper _RunOne definitions mentioned in the review so
each test gets its own list instance.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ed3aadda-4242-4a1f-8295-c9edfbe4a05e

📥 Commits

Reviewing files that changed from the base of the PR and between dba12b3 and 911b5d2.

📒 Files selected for processing (3)
  • src/bird_interact_agents/cloud/ray_app.py
  • src/bird_interact_agents/slayer_otf/reference_build.py
  • tests/test_slayer_otf_reference_build.py

**Codex r3 — seed-fp snapshot must come from GCS, not local disk.**
`_snapshot_initial_seed_fps()` walked `paths.slayer_models_otf_root()` to
build the per-DB seed-fingerprint dict that drives
`upload_otf_reference_delta`'s skip-or-upload decision. But the OTF reference
root is SHARED across actor processes on a VM, so a REPLACEMENT actor (after
a peer died mid-task before its post-task upload-back ran) would see the
peer's cloud-built reference on disk and record its fingerprint as
"initial seed". Then `upload_otf_reference_delta` would skip the upload
because the fingerprint appears unchanged — losing the only post-run shard
for that DB, so `fetch()` couldn't merge it into the laptop warm cache.

Fix: snapshot AUTHORITATIVELY from GCS. The seed prefix
`runs/<run_id>/slayer_setup/slayer_models_otf/` is set by the driver at
submit time, BEFORE any actor starts, and is immutable for the run. Any
local fingerprint NOT in this snapshot is genuine cloud work that must be
uploaded back. Empty dict on any error (conservative: upload everything
cloud-built, never lose data).

Two regression tests in `test_ray_app.py`:
- `test_actor_captures_uploaded_dbs_and_initial_seed_fps_from_gcs`
  populates GCS (not disk) and asserts the snapshot reads from it.
- `test_actor_seed_snapshot_ignores_local_only_cloud_built_refs` sets up
  the exact Codex scenario: GCS seed has db_a; local disk ALSO has db_b
  from a peer's crashed task; snapshot must contain only db_a, and
  `upload_otf_reference_delta` must successfully upload db_b.

**CR r3 nitpick** — Replaced the `index_rows: list[dict] = []` class
attribute with `self.index_rows = []` in `__init__` across all three
`_RunOne` test helpers, defensive against the mutable-default-shared trap
(harmless in practice but easy to fix).

Full non-integration suite: 1091 passed.

Smoke validation: `bird-interact-cloud submit --framework pydantic_ai
--query-mode raw --instance-ids alien_1,alien_2 --detach` succeeded as
run `20260526t1008-pydanticai-raw-90e564` — confirms the worktree-safety
fix, the CLI signature fix (Cdx2-1), and the universal touched paths
(db-sort, fetch hook, image build, cluster lifecycle).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/bird_interact_agents/cloud/ray_app.py`:
- Around line 679-681: The continuation line for the assignment to
self.initial_seed_fp_by_db inside WorkerActor.__init__ is mis-indented causing
an IndentationError; locate the assignment that calls _snapshot_initial_seed_fps
(symbol: _snapshot_initial_seed_fps, target attribute: initial_seed_fp_by_db)
and re-indent the wrapped call so the continuation line aligns with the start of
the assignment expression inside __init__ (use the same indentation level as
other statements in the method, e.g., move from 8 spaces to the matching
12-space indentation used for inner statements) so the call is syntactically
valid.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ea2b24b4-7b04-4dd3-afc0-1d009c91351d

📥 Commits

Reviewing files that changed from the base of the PR and between 911b5d2 and f98e841.

📒 Files selected for processing (3)
  • src/bird_interact_agents/cloud/ray_app.py
  • tests/cloud/test_ray_app.py
  • tests/test_slayer_otf_reference_build.py

Comment thread src/bird_interact_agents/cloud/ray_app.py Outdated
ZmeiGorynych and others added 5 commits May 26, 2026 12:44
…-1470)

CR r4 flagged this as a critical SyntaxError but that claim is incorrect:
Python suppresses indentation rules for continuation lines inside parens, so
the misaligned `_snapshot_initial_seed_fps(...)` call parsed and ran fine —
module imports cleanly, `_build_actor_class()` builds `WorkerActor`, and the
full 1091-test suite passes against the misaligned source.

The realignment is still a strict readability improvement (PEP 8 alignment),
so applying it. Side-effect of a previous `replace_all` Edit that didn't
account for the deeper nesting level inside the lazily-defined `WorkerActor`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
**Codex r5 — merger could produce a mixed-fingerprint reference tree.**
`_merge_one_db` decided each content file and the marker independently by
mtime. A newer cloud marker could win while one or more cloud payload files
LOST to newer local files (e.g. user touched a file post-fetch), and
vice-versa. Result: a reference tree marked complete for fp-X but
containing files actually generated under fp-Y. Downstream readers
(`_reuse_reference`, `cache.ensure_db_cache`) trust
"marker present ⇒ content matches fingerprint" and would silently use the
wrong reference.

Fix: whole-db rule keyed off the marker. If the cloud's chosen
`_reference_fp.txt` source_mtime is newer than the local marker's, the
ENTIRE cloud reference wins as a unit — every picked file (content + marker)
is atomic-replaced even if some local files have newer individual mtimes.
Otherwise NOTHING is touched. Cross-shard per-file mtime-wins is preserved
(still safe under the L1 invariant: all shards in one run share the same
fp). No cloud marker for a db → that db's cloud content cannot be trusted
(no fp to bind it to); skip entirely.

Two regression tests in `test_post_run_merge.py`:
- `test_cloud_marker_loses_no_files_touched_even_if_content_newer` —
  cloud marker older than local marker; cloud x.yaml has newer mtime but
  MUST NOT be replaced (would produce mixed fp).
- `test_cloud_marker_wins_replaces_all_files_even_when_some_content_older` —
  cloud marker newer; local x.yaml has deceptively-recent mtime (e.g.
  manual touch); MUST be overwritten so the new fp marker doesn't sit
  on top of stale content from a different fp.

Replaces the round-2 `test_partial_cloud_win_restores_local_marker_when_marker_loses`,
which exercised the per-file mixed scenario that is now impossible by
construction. Updated `test_per_file_atomic_replace_leaves_no_partial`
to include a cloud marker (the whole-db rule short-circuits without one
before reaching the per-file replace step).

Module docstring expanded to document the two-level merge rule.

Full non-integration suite: 1092 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two Codex r6 findings, same class of bug + missing observability.

**Cdx6-1 major — whole-db cloud-wins left stale local files behind.**
The r5 whole-db merger replaced cloud's picked files atomically but left
ANY local file not in the cloud's picks in place. So a `<db>/` ending up
with the cloud's `current.yaml` + marker (new fp) PLUS an old
`legacy.yaml` only present under the prior local fp — same class of
mixed-fingerprint bug as r5 caught for content vs marker, just on the
file-presence axis. The new fp marker then sits on top of files from
the prior fp, silently breaking the marker-binds-content invariant
downstream readers depend on.

Fix: when cloud wins, after unlinking the local marker (preserves the
"marker absent ⇒ rebuild" reader contract during the merge window),
delete every local file under `<db>/` that's not in the cloud's picks +
{`_reference_fp.txt`}, then sweep empty subdirs bottom-up. Per-file
atomic replaces + marker-last write happen unchanged afterwards.

Two regression tests:
- `test_cloud_wins_deletes_stale_local_files_not_in_picks` — local
  has `legacy.yaml` (only in fp-LOCAL), cloud has only `current.yaml` +
  marker (fp-CLOUD). After merge, `legacy.yaml` MUST be gone.
- `test_cloud_wins_removes_empty_subdirs_left_after_stale_purge` —
  when a stale file's parent subdir has no replacement under cloud,
  the empty subdir is swept too.

**Cdx6-2 minor — merge_report invisible after fetch.**
`_collation.collate()` writes `eval.json` and returns `metrics`;
`fetch()` then mutates `metrics["merge_report"] = ...` and returns it,
but the CLI discarded the return value. Result: ignored shards and merge
counts were visible only to direct Python callers — post-run merge
failures un-auditable.

Fix:
- `merge_post_run_into_warm_cache` writes `<run_dir>/merge_report.json`
  directly (best-effort, swallows OSError).
- CLI's `fetch` subcommand now keeps the return value and prints a
  summary line per merged db + a count of ignored shards.

Regression test `test_merge_report_persisted_to_run_dir` asserts the
on-disk file matches the returned report and carries both axes.

Full non-integration suite: 1095 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
**Codex r7 minor — partial-shard failures silently vanished from the report.**
The merger marked a shard as ignored ONLY when EVERY db slice under it was
incomplete. If one actor uploaded db_a fully but died mid-upload for db_b,
the merger would land db_a, silently skip db_b (no other shard covers it),
omit the shard from `ignored_shards` (db_a's slice IS complete), and the
user had no signal that db_b's warm-cache artifact was lost.

Fix: add `skipped_dbs: [{db, reason}]` to the report for any db that
appeared in shards but had no complete slice anywhere. CLI's `fetch`
subcommand now prints a SKIPPED line per such db so the user sees the loss
at fetch time. On-disk `merge_report.json` carries the field too for audit.

Regression test `test_partial_shard_dbs_appear_in_skipped_dbs`: same shard
has db_a (complete) + db_b (no `_upload_complete` marker). db_a merged,
db_b in `skipped_dbs`, shard NOT in `ignored_shards`.

Existing tests unaffected — they use `set(report.keys()) >= {...}` for
shape checks, and report-content assertions key off specific entries.

Full non-integration suite: 1096 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…smoke recipe (DEV-1470)

Two notes added after the DEV-1470 PR loop surfaced both pitfalls:

**Polling for cloud-run terminal state.** A bash polling loop around
`bird-interact-cloud list | awk '{print $4}' | grep done` silently never
terminates because `$4` is `done/total`, not the state column (`$3`).
Burned half an hour mid-smoke. The canonical primitive `driver.wait_until_done`
already exists (`submit` uses it on the non-detached path) — documented the
one-liner that invokes it for detached / pre-existing runs.

**otf_encode end-to-end smoke recipe.** Validates the DEV-1470 round-trip
(local deterministic cache → cloud LLM encode → upload-back → fetch +
merge into local warm cache). Documents the four steps: nuke local
artefacts, rebuild only the cache locally (REQUIRED by submit fail-fast),
submit `--framework pydantic_ai_otf_encode`, then fetch. Last validated
on smoke `20260526t1140-pydanti-slayer-6aad37` — 14 files merged into
`slayer_models_otf/households/`, 0 skipped/ignored.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ZmeiGorynych ZmeiGorynych merged commit 33f30ac into main May 26, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant