bird-agents: cloud OTF reference encode + per-task debug artefacts (DEV-1470)#3
Conversation
…EV-1470) Move LLM-driven `_build_reference` to the cloud and ship results back to the laptop's warm cache for reuse. Per-task agent session logs + HARD-8 scratch storage now upload to per-run debug bundles alongside the row+log writes. Driver: `_check_slayer_setup_present` for otf_encode+on-the-fly now requires `_cache_fp.txt` (deterministic) and treats `_reference_fp.txt` as an OPTIONAL upload seed. `_build_job_args` / `_build_resubmit_args` sort `--instance-ids` by (selected_database, instance_id) so same-db tasks dispatch adjacently and the cross-actor encode race is rare. `fetch(run_id)` calls the new `post_run_merge.merge_post_run_into_warm_cache` after collation. Worker `download_slayer_setup` iterates a per-combo artifact list: REQUIRED artifacts use the existing atomic rename + `.download_complete` marker (empty prefix fatal); OPTIONAL `slayer_models_otf` seed no-ops on empty prefix, file-merges into existing roots under per-DB `fcntl.flock` on non-empty, and is gated by `.optional_seed_download_complete` so an actor restart doesn't clobber a cloud-built reference. `_run_one_in_actor` fires the upload-back triple (debug bundle, setup-encoder sessions, OTF reference delta) after the row+log write, swallowing exceptions. New `cloud/upload_back.py` ships per-task scratch + the per-DB OTF reference shards (per-actor `<hostname>-<pid>` shard, bulk content → `_source_mtimes.json` sidecar → `_upload_complete` LAST). Per-actor `uploaded_dbs` + initial-seed fingerprint snapshot drive eligibility — failed uploads stay eligible for retry on later tasks. New `cloud/post_run_merge.py` promotes shards into the global warm cache: per-file newest-source-mtime-wins, under per-DB `fcntl.flock` (shared with `_build_reference`'s build lock), unlinking the local `_reference_fp.txt` FIRST and writing it LAST (both via atomic `tmp+os.replace`) so the marker-present-⇒-content-complete invariant holds against concurrent readers. Shards missing `_upload_complete` or `_source_mtimes.json` are ignored. `reference_build._build_reference` now takes the same cross-process flock so two Ray actor processes building the same DB on one VM serialise instead of racing `_commit_reference`'s rmtree+rename. Known limitation (documented): pre-completion losses (actor crash, VM preemption) lose `/tmp/bird_interact_slayer_otf/<iid>/` for the in-flight task. Per-task row still lands; debug bundle for that single task is gone. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds DEV-1470 on-the-fly upload-back and warm-cache merge: required-vs-optional SLAYER artifact semantics, deterministic instance grouping for submit/resubmit, actor-side best-effort per-task uploads, per-DB flocked optional-seed download/merge, post-run per-db shard merge with newest-mtime-wins, and audited_gold BuildKit input handling for image builds. ChangesOTF Upload-Back and Post-Run Merge
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/bird_interact_agents/cloud/driver.py (1)
221-229:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon't treat
_reference_fp.txtas a sufficient OTF-seed presence check.
reference_build._load_reference_entry()now rejects a marker-only reference because_kb_rows.jsonis mandatory. With the current check, a corruptslayer_models_otf/<db>/still gets uploaded as an optional seed, and the cloud reuse path will fail on that seed instead of just rebuilding it lazily.Suggested fix
def _artifact_present(root: Path, db: str, artifact: str) -> bool: db_dir = root / db if artifact == "slayer_models": return db_dir.is_dir() and any(db_dir.iterdir()) - marker = "_cache_fp.txt" if artifact == "slayer_otf_cache" else "_reference_fp.txt" - return (db_dir / marker).is_file() + if artifact == "slayer_otf_cache": + return (db_dir / "_cache_fp.txt").is_file() + return ( + (db_dir / "_reference_fp.txt").is_file() + and (db_dir / "_kb_rows.json").is_file() + )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/bird_interact_agents/cloud/driver.py` around lines 221 - 229, The presence check currently treats any marker file as sufficient for OTF seeds, which lets marker-only/corrupt OTF dirs pass; update _artifact_present to require both the marker and the mandatory "_kb_rows.json" for OTF-style artifacts (e.g., when artifact == "slayer_otf_cache") while keeping the existing non-empty-dir check for "slayer_models"; specifically, in _artifact_present use the existing marker variable but return True only if (db_dir / marker).is_file() AND (db_dir / "_kb_rows.json").is_file() for OTF artifacts.
🧹 Nitpick comments (1)
tests/cloud/test_post_run_merge.py (1)
400-418: 💤 Low valueRemove unused
run_dirfrom the worker function signature.The
run_dirvariable is unpacked but never used in_spawn_merge_holding_lock. Since this function only needs to acquire the per-DB lock (not perform an actual merge), the parameter is unnecessary.🧹 Suggested cleanup
def _spawn_merge_holding_lock(args): """Worker that acquires the per-DB build lock (shared with the merger under the H4 resolution) and holds it for `hold_s`.""" - run_dir, reference_root, db, hold_s, sentinel_path = args + _run_dir, reference_root, db, hold_s, sentinel_path = args import fcntlOr remove it from the caller as well:
- args=((run_dir, ref_root, "db_a", 2.0, str(sentinel)),), + args=((ref_root, "db_a", 2.0, str(sentinel)),),- run_dir, reference_root, db, hold_s, sentinel_path = args + reference_root, db, hold_s, sentinel_path = args🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/cloud/test_post_run_merge.py` around lines 400 - 418, The function _spawn_merge_holding_lock currently unpacks a unused run_dir parameter; remove run_dir from the function signature and from its tuple unpacking inside _spawn_merge_holding_lock, and update any callers/tests that construct/pass the argument so they only pass (reference_root, db, hold_s, sentinel_path) (or adjust the caller to stop including run_dir in the args tuple). Ensure the lock_path logic and use of fcntl, Path(sentinel_path).write_text, and time.sleep remain unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/bird_interact_agents/cloud/ray_app.py`:
- Around line 203-207: The merge currently writes a temp file (tmp_dst) and
os.replace() moves it into place, which gives dst the temp file's mtime
(download time); fix by capturing the source mtime (src_mtime =
src.stat().st_mtime) and after os.replace(tmp_dst, dst) call os.utime(dst,
(src_mtime, src_mtime)) so the replaced file preserves the original source
modification time; update the block around tmp_dst.write_bytes(...) and
os.replace(...) to set src_mtime and call os.utime on dst.
In `@src/bird_interact_agents/cloud/upload_back.py`:
- Around line 203-236: The upload loop currently runs for every slayer job and
can re-upload stale OTF files; add a strict guard so this code only runs for the
OTF-encode flow by checking cfg.get("run_name") == "pydantic_ai_otf_encode" (in
addition to cfg.get("query_mode") == "slayer") at the top of
upload_otf_reference_delta (or the surrounding scope), i.e. return early unless
both conditions hold; keep the rest of the logic (refs: cfg,
initial_seed_fp_by_db, slayer_models_otf_root, _upload_one_db_shard,
uploaded_dbs) unchanged so only true encode runs can mark/upload shards.
---
Outside diff comments:
In `@src/bird_interact_agents/cloud/driver.py`:
- Around line 221-229: The presence check currently treats any marker file as
sufficient for OTF seeds, which lets marker-only/corrupt OTF dirs pass; update
_artifact_present to require both the marker and the mandatory "_kb_rows.json"
for OTF-style artifacts (e.g., when artifact == "slayer_otf_cache") while
keeping the existing non-empty-dir check for "slayer_models"; specifically, in
_artifact_present use the existing marker variable but return True only if
(db_dir / marker).is_file() AND (db_dir / "_kb_rows.json").is_file() for OTF
artifacts.
---
Nitpick comments:
In `@tests/cloud/test_post_run_merge.py`:
- Around line 400-418: The function _spawn_merge_holding_lock currently unpacks
a unused run_dir parameter; remove run_dir from the function signature and from
its tuple unpacking inside _spawn_merge_holding_lock, and update any
callers/tests that construct/pass the argument so they only pass
(reference_root, db, hold_s, sentinel_path) (or adjust the caller to stop
including run_dir in the args tuple). Ensure the lock_path logic and use of
fcntl, Path(sentinel_path).write_text, and time.sleep remain unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 460d6417-817d-4cc7-9c62-9f3c1e92dbe3
📒 Files selected for processing (10)
src/bird_interact_agents/cloud/driver.pysrc/bird_interact_agents/cloud/post_run_merge.pysrc/bird_interact_agents/cloud/ray_app.pysrc/bird_interact_agents/cloud/upload_back.pysrc/bird_interact_agents/slayer_otf/reference_build.pytests/cloud/test_driver.pytests/cloud/test_post_run_merge.pytests/cloud/test_ray_app.pytests/cloud/test_upload_back.pytests/test_slayer_otf_reference_build.py
…ild (DEV-1470) Five fixes from Codex + CodeRabbit on PR #3, plus the related worktree-safety change `audited_gold/` needed to actually run the smoke from a worktree. **Codex Cdx-1 / CodeRabbit CR-1 — preserve source_mtime across atomic replace.** `post_run_merge._atomic_replace` / `_write_marker_atomic` and `ray_app._download_optional_seed` all used `os.replace(tmp, dst)` of a freshly-written tmp file, so `dst` inherited the laptop's fetch time, not the source's actual mtime. A later fetch of a genuinely-newer cloud reference would then be wrongly skipped because the comparison reads `dst.stat().st_mtime`. Fix: `os.utime(tmp, (source_mtime, source_mtime))` BEFORE the rename so dst inherits the source's mtime. Regression tests in `test_post_run_merge.py` (per-file mtime preservation + two-fetch newest-wins scenario) and `test_ray_app.py` (optional seed mtime). **CodeRabbit CR-2 — gate `upload_otf_reference_delta` on otf_encode+on-the-fly.** The helper returned only on `query_mode \!= "slayer"`. For recursive+slayer or pre-encoded+slayer, `initial_seed_fp_by_db` is `{}`, so any stale `slayer_models_otf/<db>/` on a shared worker FS would pass the fp-mismatch check and pollute the laptop warm cache. Widened the guard to require `framework == "pydantic_ai_otf_encode"` AND `slayer_setup == "on-the-fly"`. Regression test covers all three non-eligible combos. **Codex Cdx-2 — re-check the marker INSIDE `_build_reference_inside_lock`.** Two Ray actor processes can both pass `ensure_db_reference`'s asyncio marker check, then block on the cross-process flock; once the lock is released both would re-enter `_build_reference` and BOTH run the full LLM encode (the asyncio Lock only serializes within ONE process). Added the marker re-check at the top of `_build_reference_inside_lock` — if the peer just committed, load via `_load_reference_entry` and skip the encoder entirely. Regression test spawns a peer process that commits the reference while the main asyncio task is blocked on the flock; the test asserts `build_encoder` is never invoked. **CodeRabbit CR-3 nitpick** — renamed the unused `run_dir` in `_spawn_merge_holding_lock` to `_run_dir`. **Worktree-safety: ship `audited_gold/` via `--build-context`** (out-of-band: caught by attempting the smoke from a worktree). `audited_gold/` is gitignored and lives in the MAIN checkout, so the worktree's `COPY audited_gold/ ...` in `Dockerfile.cloud` failed with `"/audited_gold": not found`. Mirrored how `mini-interact` is wired: `build_and_push` now passes `--build-context audited-gold=<paths.audited_gold_root()>` and the Dockerfile uses `COPY --from=audited-gold ...`. `image.data_hash` / `image.image_tag` take `audited_gold_root` as an explicit positional param so the hash key is stable regardless of where the dir lives on the host. Dropped `audited_gold/` from `_dirty_image_input_paths` (it's gitignored, so it could never appear in `git status --porcelain` anyway, and the dir is now content-hashed via the explicit root). 4 new regression tests in `test_image.py` (worktree-vs-main hash equivalence; stale-worktree-subdir ignored; `--build-context audited-gold` arg; dirty-prefix no longer includes audited_gold). Added a worktree-safety note to project CLAUDE.md. Full non-integration suite: 1087 passed (was 1078 pre-PR + 5 fix regressions + 4 worktree-safety regressions). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/bird_interact_agents/slayer_otf/reference_build.py`:
- Around line 565-581: The current check uses the combined rebuild condition
(`force or target.exists()`) earlier and here only inspects `force` before
reusing a peer-committed reference, which causes a race where a stale markerless
directory (target.exists() true) can still trigger a rebuild even after a peer
committed; update ensure_db_reference()/calling logic so the decision to skip
reuse is based only on an explicit user intent flag (`force`) not on
`target.exists()`, and modify this block (around marker/_MARKER check in
reference_build.py) to re-check the marker file and return
_load_reference_entry(target).setup_results when marker.is_file() and not force,
ensuring that existence of target without marker does not suppress peer-commit
reuse — keep the separate semantics for "explicit user rebuild" (force) vs
"pre-existing target".
In `@tests/cloud/test_ray_app.py`:
- Around line 1616-1663: The test only checks that os.utime was called on a tmp
'.seed-' file but doesn't assert the final file mtime; modify
test_optional_seed_merge_preserves_src_mtime to (1) choose a deterministic
expected_src_mtime (e.g. expected_src_mtime = 1_600_000_000.0) and ensure the
spy_utime will be invoked with that value for the tmp file, (2) after calling
ray_app.download_slayer_setup(RUN_ID, cfg, client=client) locate the final
destination file ref_root / "db_a" / "models" / "x.yaml" and assert its
stat().st_mtime == expected_src_mtime, and (3) keep the existing assertion that
os.utime was called on the tmp '.seed-' path (using utime_calls) but use the
recorded/expected_src_mtime for the final st_mtime check so the test verifies
the dst actually inherited the source mtime.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ed0d17e3-0a72-4faf-a144-970043bb380b
📒 Files selected for processing (13)
CLAUDE.mdDockerfile.cloudsrc/bird_interact_agents/cloud/driver.pysrc/bird_interact_agents/cloud/image.pysrc/bird_interact_agents/cloud/post_run_merge.pysrc/bird_interact_agents/cloud/ray_app.pysrc/bird_interact_agents/cloud/upload_back.pysrc/bird_interact_agents/slayer_otf/reference_build.pytests/cloud/test_image.pytests/cloud/test_post_run_merge.pytests/cloud/test_ray_app.pytests/cloud/test_upload_back.pytests/test_slayer_otf_reference_build.py
Three Codex-round-2 findings, all major. One of them (Cdx2-1) was the reason the smoke had been silently failing — the round-1 image-tag signature change broke a second production call site I missed. **Cdx2-1 — `cli.py build` subcommand still used the old `image_tag`/ `build_and_push` signature.** `driver.submit` was updated for the `audited_gold_root` parameter; `cloud/cli.py:131-137` was not. Running `bird-interact-cloud build` (or `submit` via the build subcommand path) raised `TypeError: image_tag() missing 1 required positional argument: 'audited_gold_root'`. Updated the subcommand to pass `paths.audited_gold_root()` to both calls. New regression test in `tests/cloud/test_cli.py::test_build_subcommand_passes_audited_gold_root_to_image_tag` mocks both image helpers and asserts the arg is plumbed through. **Cdx2-2 — Optional seed merge: drop the broken mtime-wins, use "don't clobber if dst exists".** The round-1 fix preserved `src_mtime` across the `os.replace`, but the source mtime captured at download time is the local write time (gcs.download_prefix writes blobs to local tmp files without preserving any GCS-side metadata), not the original upload-time mtime. So a stale seed on a restarted actor could still wrongly overwrite newer cloud-built local references. Correct semantics for `_download_optional_seed`: if dst already exists (cloud-built reference from a prior actor, OR a crashed mid-merge), KEEP it. The cross-process per-DB flock + the `.optional_seed_download_complete` marker make this race-safe. Replaced the now-misleading `test_optional_seed_merge_preserves_src_mtime` with `test_download_slayer_setup_optional_seed_does_not_clobber_local`. **Cdx2-3 — Merger downgraded the local marker when cloud lost the comparison.** The round-1 merger unlinked the local `_reference_fp.txt` FIRST, then compared cloud marker mtime against `local_marker.stat()` — which raised `FileNotFoundError` and defaulted to `0.0`. Older cloud markers therefore always "won" the comparison, leaving newer local content under a stale fingerprint (invariant 'marker present ⇒ content complete' violated). Refactored `_merge_one_db` to: (1) capture local marker mtime + bytes BEFORE any unlink, (2) precompute all per-file decisions, (3) skip the whole touch when nothing changes, (4) on any actual change unlink the marker, do per-file atomic replaces, then either write the cloud marker (if it won) OR restore the local marker (if cloud lost). Two new regression tests in `test_post_run_merge.py` cover the older-cloud no-downgrade case and the mixed-win case where the marker must be restored after a partial content overwrite. Full non-integration suite: 1089 passed (was 1087 + 2 new regressions — one existing optional-seed test was REPLACED to match the new "don't clobber" contract, so net is +2). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (2)
src/bird_interact_agents/cloud/ray_app.py (1)
196-214: ⚡ Quick winUpdate
_download_optional_seed's docstring to match this new rule.This block now implements "don't clobber if
dstalready exists", but the function docstring above still saysnewest-source-mtime-wins. Leaving both contracts in the same function will mislead the next refactor.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/bird_interact_agents/cloud/ray_app.py` around lines 196 - 214, Update the _download_optional_seed function docstring to reflect the new "don't clobber if dst already exists" semantics instead of "newest-source-mtime-wins"; mention that the function now checks dst.exists() and skips writing when present, uses an atomic tmp_dst (tmp_dst = dst.parent / f".{dst.name}.seed-{os.getpid()}") with tmp+os.replace for crash safety, and relies on the cross-process flock and the .optional_seed_download_complete marker to ensure safe concurrent/partial-download behavior.tests/cloud/test_ray_app.py (1)
1592-1611: ⚡ Quick winAlso assert that an existing local
_reference_fp.txtis preserved.This test only proves the "don't clobber" rule for content files.
_download_optional_seedapplies the same branch to_reference_fp.txt, and overwriting that marker would change completeness and upload-back behavior without tripping this regression.Suggested test tightening
store[f"{rpfx}/db_a/_reference_fp.txt"] = b"seed-fp" store[f"{rpfx}/db_a/models/x.yaml"] = b"SEED-CONTENT\n" store[f"{rpfx}/db_a/models/y.yaml"] = b"SEED-Y\n" # Local already has a cloud-built x.yaml (from a prior actor on this # VM). MUST NOT be overwritten by the seed. local_db = ref_root / "db_a" (local_db / "models").mkdir(parents=True) (local_db / "models" / "x.yaml").write_bytes(b"CLOUD-BUILT\n") + (local_db / "_reference_fp.txt").write_bytes(b"local-fp") @@ assert (local_db / "models" / "x.yaml").read_bytes() == b"CLOUD-BUILT\n", ( "seed merged on top of an existing local file — must not clobber" ) + assert (local_db / "_reference_fp.txt").read_bytes() == b"local-fp", ( + "seed marker clobbered an existing local marker" + ) assert (local_db / "models" / "y.yaml").read_bytes() == b"SEED-Y\n", ( "seed file absent locally must still be downloaded" )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/cloud/test_ray_app.py` around lines 1592 - 1611, Add a check that an existing local _reference_fp.txt is not overwritten by the seed: before calling download_slayer_setup(RUN_ID, cfg, client=client) create a local marker file at ref_root / "db_a" / "_reference_fp.txt" with a distinct byte string (e.g., b"CLOUD-FP"), then after download assert that (ref_root / "db_a" / "_reference_fp.txt").read_bytes() still equals that original byte string; reference the existing symbols rpfx, local_db, ref_root, download_slayer_setup, and _download_optional_seed to locate the relevant setup and assertion spot.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/bird_interact_agents/cloud/ray_app.py`:
- Around line 196-214: Update the _download_optional_seed function docstring to
reflect the new "don't clobber if dst already exists" semantics instead of
"newest-source-mtime-wins"; mention that the function now checks dst.exists()
and skips writing when present, uses an atomic tmp_dst (tmp_dst = dst.parent /
f".{dst.name}.seed-{os.getpid()}") with tmp+os.replace for crash safety, and
relies on the cross-process flock and the .optional_seed_download_complete
marker to ensure safe concurrent/partial-download behavior.
In `@tests/cloud/test_ray_app.py`:
- Around line 1592-1611: Add a check that an existing local _reference_fp.txt is
not overwritten by the seed: before calling download_slayer_setup(RUN_ID, cfg,
client=client) create a local marker file at ref_root / "db_a" /
"_reference_fp.txt" with a distinct byte string (e.g., b"CLOUD-FP"), then after
download assert that (ref_root / "db_a" / "_reference_fp.txt").read_bytes()
still equals that original byte string; reference the existing symbols rpfx,
local_db, ref_root, download_slayer_setup, and _download_optional_seed to locate
the relevant setup and assertion spot.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bb330c90-2aba-4229-9ef9-79963ad62f6b
📒 Files selected for processing (6)
src/bird_interact_agents/cloud/cli.pysrc/bird_interact_agents/cloud/post_run_merge.pysrc/bird_interact_agents/cloud/ray_app.pytests/cloud/test_cli.pytests/cloud/test_post_run_merge.pytests/cloud/test_ray_app.py
Two issues from CodeRabbit's review of e3b41b7 / dba12b3. **CR r2 — peer-marker reuse path was disabled by stale markerless scrap.** `ensure_db_reference` passed `force=force or target.exists()` to `_build_reference`, so any local markerless dir set `force=True` and caused `_build_reference_inside_lock`'s `if not force and marker.is_file()` peer-reuse to be bypassed. Concretely: an actor with stale leftover scrap who lost the cross-process flock race would, on lock acquisition, observe the peer's freshly-committed reference but still rmtree+rebuild it. Split the two concerns: `ensure_db_reference` now passes `force=force` (user intent only), and `_commit_reference` handles markerless-scrap on its own — it rmtrees a markerless target before the atomic rename, regardless of `force`. Result: a peer's commit is honored under both `force=False` (reuse) and `force=True` (no markerless rmtree of a marked dir), completing the H4 cross-process safety story. New regression test `test_ensure_db_reference_reuses_peer_commit_when_markerless_scrap_exists` spawns a peer that commits while the main task waits on flock + asserts the build_encoder is never invoked. **CR r2 nitpick — `_download_optional_seed` docstring stale.** The Codex r2 fix moved the optional seed merge from mtime-wins to "don't clobber if dst exists", but the docstring still claimed "newest-source-mtime-wins". Rewrote the docstring to reflect the new contract + the rationale for abandoning mtime comparison (`src.stat().st_mtime` is download-time, not upload-time; gcs.download_prefix doesn't preserve metadata). Full non-integration suite: 1090 passed (was 1089 + 1 new regression). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/test_slayer_otf_reference_build.py (1)
880-890: 💤 Low valueMinor: mutable class attribute in test helper.
index_rows: list[dict] = []is a mutable class attribute. While harmless here (each class is defined fresh inside the test function and the list is never mutated), using a property or instance attribute would be more defensive.class _RunOne: - index_rows: list[dict] = [] + `@property` + def index_rows(self) -> list[dict]: + return []This is a low-priority nitpick — the current code works correctly for the test's purpose.
Also applies to: 792-803, 992-1002
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_slayer_otf_reference_build.py` around lines 880 - 890, The class _RunOne defines a mutable class attribute index_rows: list[dict] = [], which should be an instance attribute to avoid shared mutable state; change it to initialize self.index_rows = [] inside an __init__ (or make it a `@property` that returns a new list) and update any code that references _RunOne.index_rows to use the instance attribute; apply the same change to the other test helper _RunOne definitions mentioned in the review so each test gets its own list instance.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/test_slayer_otf_reference_build.py`:
- Around line 880-890: The class _RunOne defines a mutable class attribute
index_rows: list[dict] = [], which should be an instance attribute to avoid
shared mutable state; change it to initialize self.index_rows = [] inside an
__init__ (or make it a `@property` that returns a new list) and update any code
that references _RunOne.index_rows to use the instance attribute; apply the same
change to the other test helper _RunOne definitions mentioned in the review so
each test gets its own list instance.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ed3aadda-4242-4a1f-8295-c9edfbe4a05e
📒 Files selected for processing (3)
src/bird_interact_agents/cloud/ray_app.pysrc/bird_interact_agents/slayer_otf/reference_build.pytests/test_slayer_otf_reference_build.py
**Codex r3 — seed-fp snapshot must come from GCS, not local disk.** `_snapshot_initial_seed_fps()` walked `paths.slayer_models_otf_root()` to build the per-DB seed-fingerprint dict that drives `upload_otf_reference_delta`'s skip-or-upload decision. But the OTF reference root is SHARED across actor processes on a VM, so a REPLACEMENT actor (after a peer died mid-task before its post-task upload-back ran) would see the peer's cloud-built reference on disk and record its fingerprint as "initial seed". Then `upload_otf_reference_delta` would skip the upload because the fingerprint appears unchanged — losing the only post-run shard for that DB, so `fetch()` couldn't merge it into the laptop warm cache. Fix: snapshot AUTHORITATIVELY from GCS. The seed prefix `runs/<run_id>/slayer_setup/slayer_models_otf/` is set by the driver at submit time, BEFORE any actor starts, and is immutable for the run. Any local fingerprint NOT in this snapshot is genuine cloud work that must be uploaded back. Empty dict on any error (conservative: upload everything cloud-built, never lose data). Two regression tests in `test_ray_app.py`: - `test_actor_captures_uploaded_dbs_and_initial_seed_fps_from_gcs` populates GCS (not disk) and asserts the snapshot reads from it. - `test_actor_seed_snapshot_ignores_local_only_cloud_built_refs` sets up the exact Codex scenario: GCS seed has db_a; local disk ALSO has db_b from a peer's crashed task; snapshot must contain only db_a, and `upload_otf_reference_delta` must successfully upload db_b. **CR r3 nitpick** — Replaced the `index_rows: list[dict] = []` class attribute with `self.index_rows = []` in `__init__` across all three `_RunOne` test helpers, defensive against the mutable-default-shared trap (harmless in practice but easy to fix). Full non-integration suite: 1091 passed. Smoke validation: `bird-interact-cloud submit --framework pydantic_ai --query-mode raw --instance-ids alien_1,alien_2 --detach` succeeded as run `20260526t1008-pydanticai-raw-90e564` — confirms the worktree-safety fix, the CLI signature fix (Cdx2-1), and the universal touched paths (db-sort, fetch hook, image build, cluster lifecycle). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/bird_interact_agents/cloud/ray_app.py`:
- Around line 679-681: The continuation line for the assignment to
self.initial_seed_fp_by_db inside WorkerActor.__init__ is mis-indented causing
an IndentationError; locate the assignment that calls _snapshot_initial_seed_fps
(symbol: _snapshot_initial_seed_fps, target attribute: initial_seed_fp_by_db)
and re-indent the wrapped call so the continuation line aligns with the start of
the assignment expression inside __init__ (use the same indentation level as
other statements in the method, e.g., move from 8 spaces to the matching
12-space indentation used for inner statements) so the call is syntactically
valid.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ea2b24b4-7b04-4dd3-afc0-1d009c91351d
📒 Files selected for processing (3)
src/bird_interact_agents/cloud/ray_app.pytests/cloud/test_ray_app.pytests/test_slayer_otf_reference_build.py
…-1470) CR r4 flagged this as a critical SyntaxError but that claim is incorrect: Python suppresses indentation rules for continuation lines inside parens, so the misaligned `_snapshot_initial_seed_fps(...)` call parsed and ran fine — module imports cleanly, `_build_actor_class()` builds `WorkerActor`, and the full 1091-test suite passes against the misaligned source. The realignment is still a strict readability improvement (PEP 8 alignment), so applying it. Side-effect of a previous `replace_all` Edit that didn't account for the deeper nesting level inside the lazily-defined `WorkerActor`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
**Codex r5 — merger could produce a mixed-fingerprint reference tree.** `_merge_one_db` decided each content file and the marker independently by mtime. A newer cloud marker could win while one or more cloud payload files LOST to newer local files (e.g. user touched a file post-fetch), and vice-versa. Result: a reference tree marked complete for fp-X but containing files actually generated under fp-Y. Downstream readers (`_reuse_reference`, `cache.ensure_db_cache`) trust "marker present ⇒ content matches fingerprint" and would silently use the wrong reference. Fix: whole-db rule keyed off the marker. If the cloud's chosen `_reference_fp.txt` source_mtime is newer than the local marker's, the ENTIRE cloud reference wins as a unit — every picked file (content + marker) is atomic-replaced even if some local files have newer individual mtimes. Otherwise NOTHING is touched. Cross-shard per-file mtime-wins is preserved (still safe under the L1 invariant: all shards in one run share the same fp). No cloud marker for a db → that db's cloud content cannot be trusted (no fp to bind it to); skip entirely. Two regression tests in `test_post_run_merge.py`: - `test_cloud_marker_loses_no_files_touched_even_if_content_newer` — cloud marker older than local marker; cloud x.yaml has newer mtime but MUST NOT be replaced (would produce mixed fp). - `test_cloud_marker_wins_replaces_all_files_even_when_some_content_older` — cloud marker newer; local x.yaml has deceptively-recent mtime (e.g. manual touch); MUST be overwritten so the new fp marker doesn't sit on top of stale content from a different fp. Replaces the round-2 `test_partial_cloud_win_restores_local_marker_when_marker_loses`, which exercised the per-file mixed scenario that is now impossible by construction. Updated `test_per_file_atomic_replace_leaves_no_partial` to include a cloud marker (the whole-db rule short-circuits without one before reaching the per-file replace step). Module docstring expanded to document the two-level merge rule. Full non-integration suite: 1092 passed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two Codex r6 findings, same class of bug + missing observability.
**Cdx6-1 major — whole-db cloud-wins left stale local files behind.**
The r5 whole-db merger replaced cloud's picked files atomically but left
ANY local file not in the cloud's picks in place. So a `<db>/` ending up
with the cloud's `current.yaml` + marker (new fp) PLUS an old
`legacy.yaml` only present under the prior local fp — same class of
mixed-fingerprint bug as r5 caught for content vs marker, just on the
file-presence axis. The new fp marker then sits on top of files from
the prior fp, silently breaking the marker-binds-content invariant
downstream readers depend on.
Fix: when cloud wins, after unlinking the local marker (preserves the
"marker absent ⇒ rebuild" reader contract during the merge window),
delete every local file under `<db>/` that's not in the cloud's picks +
{`_reference_fp.txt`}, then sweep empty subdirs bottom-up. Per-file
atomic replaces + marker-last write happen unchanged afterwards.
Two regression tests:
- `test_cloud_wins_deletes_stale_local_files_not_in_picks` — local
has `legacy.yaml` (only in fp-LOCAL), cloud has only `current.yaml` +
marker (fp-CLOUD). After merge, `legacy.yaml` MUST be gone.
- `test_cloud_wins_removes_empty_subdirs_left_after_stale_purge` —
when a stale file's parent subdir has no replacement under cloud,
the empty subdir is swept too.
**Cdx6-2 minor — merge_report invisible after fetch.**
`_collation.collate()` writes `eval.json` and returns `metrics`;
`fetch()` then mutates `metrics["merge_report"] = ...` and returns it,
but the CLI discarded the return value. Result: ignored shards and merge
counts were visible only to direct Python callers — post-run merge
failures un-auditable.
Fix:
- `merge_post_run_into_warm_cache` writes `<run_dir>/merge_report.json`
directly (best-effort, swallows OSError).
- CLI's `fetch` subcommand now keeps the return value and prints a
summary line per merged db + a count of ignored shards.
Regression test `test_merge_report_persisted_to_run_dir` asserts the
on-disk file matches the returned report and carries both axes.
Full non-integration suite: 1095 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
**Codex r7 minor — partial-shard failures silently vanished from the report.**
The merger marked a shard as ignored ONLY when EVERY db slice under it was
incomplete. If one actor uploaded db_a fully but died mid-upload for db_b,
the merger would land db_a, silently skip db_b (no other shard covers it),
omit the shard from `ignored_shards` (db_a's slice IS complete), and the
user had no signal that db_b's warm-cache artifact was lost.
Fix: add `skipped_dbs: [{db, reason}]` to the report for any db that
appeared in shards but had no complete slice anywhere. CLI's `fetch`
subcommand now prints a SKIPPED line per such db so the user sees the loss
at fetch time. On-disk `merge_report.json` carries the field too for audit.
Regression test `test_partial_shard_dbs_appear_in_skipped_dbs`: same shard
has db_a (complete) + db_b (no `_upload_complete` marker). db_a merged,
db_b in `skipped_dbs`, shard NOT in `ignored_shards`.
Existing tests unaffected — they use `set(report.keys()) >= {...}` for
shape checks, and report-content assertions key off specific entries.
Full non-integration suite: 1096 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…smoke recipe (DEV-1470)
Two notes added after the DEV-1470 PR loop surfaced both pitfalls:
**Polling for cloud-run terminal state.** A bash polling loop around
`bird-interact-cloud list | awk '{print $4}' | grep done` silently never
terminates because `$4` is `done/total`, not the state column (`$3`).
Burned half an hour mid-smoke. The canonical primitive `driver.wait_until_done`
already exists (`submit` uses it on the non-detached path) — documented the
one-liner that invokes it for detached / pre-existing runs.
**otf_encode end-to-end smoke recipe.** Validates the DEV-1470 round-trip
(local deterministic cache → cloud LLM encode → upload-back → fetch +
merge into local warm cache). Documents the four steps: nuke local
artefacts, rebuild only the cache locally (REQUIRED by submit fail-fast),
submit `--framework pydantic_ai_otf_encode`, then fetch. Last validated
on smoke `20260526t1140-pydanti-slayer-6aad37` — 14 files merged into
`slayer_models_otf/households/`, 0 skipped/ignored.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
_build_referenceto the cloud and ship results back to the laptop's warm cache (paths.slayer_models_otf_root()) for reuse by future runs.fcntl.flockso the in-cloud encoder, the optional seed download, and the laptop-side merge cannot interleave on the same(reference_root, db).What changed
Driver (
cloud/driver.py)_check_slayer_setup_presentforotf_encode + on-the-flynow requires_cache_fp.txt(deterministic stage 1, stays local) and treats_reference_fp.txtas an OPTIONAL upload seed (cloud encodes any db without a seed)._build_job_args+_build_resubmit_argssort--instance-idsby(selected_database, instance_id)so same-db tasks dispatch adjacently — same-actor encoding becomes the common case, encode race becomes rare.fetch(run_id)callspost_run_merge.merge_post_run_into_warm_cacheafter_collation.collate; merge report surfaced asmetrics["merge_report"].Worker (
cloud/ray_app.py)download_slayer_setupiterates a per-combo artifact list. REQUIRED artifacts: atomictmp + os.rename+.download_completemarker (empty prefix fatal). OPTIONAL seed: no-op on empty prefix; non-empty → per-DBfcntl.flock-guarded file-by-file merge into existing root (newest-mtime-wins) +.optional_seed_download_completemarker so actor restart never clobbers a cloud-built reference._run_one_in_actorcalls the upload-back triple (debug bundle → setup-encoder sessions → OTF reference delta) after row/log writes, in a try/except that logs to stderr and swallows.uploaded_dbs: set[str]+initial_seed_fp_by_db: dict[str, str]snapshot drivesupload_otf_reference_delta's eligibility — failed uploads stay eligible on later tasks.New:
cloud/upload_back.pyupload_per_task_debug— ships/tmp/bird_interact_slayer_otf/<iid>/toruns/<run_id>/sessions/<iid>/attempt-<n>/.upload_per_task_setup_sessions— ships each_setup_sessions/<db>/whose any file mtime ≥task_start_tstoruns/<run_id>/sessions/_setup_sessions/<db>/.upload_otf_reference_delta— for each<db>/whose_reference_fp.txtdiffers from the actor's seed snapshot AND isn't inuploaded_dbs, uploads the WHOLE subtree toruns/<run_id>/post_run/slayer_models_otf/<hostname>-<pid>/<db>/. Upload order: bulk content →_source_mtimes.jsonsidecar →_upload_completeLAST.New:
cloud/post_run_merge.py<run_dir>/post_run/slayer_models_otf/<shard>/<db>/across all shards. For each db, builds a per-file pick map (max source_mtime across shards). Skips shards missing_upload_completeor_source_mtimes.json.fcntl.flockon<reference_root>/<db>.build.lock(shared with the build/seed locks): unlink local_reference_fp.txtFIRST → per-file atomictmp + os.replace→ write_reference_fp.txtLAST (also atomic). Preserves "marker present ⇒ content complete" against concurrent readers.slayer_otf/reference_build.py_build_referencewrapped in the same cross-processfcntl.flockso two Ray actor processes building the same DB on one VM serialise instead of racing_commit_reference's rmtree+rename.Codex round-1 findings folded in
H1 (merger marker invariant), H2 (per-actor retry tracker), H3 (optional seed restart-safe), H4 (cross-process build flock), M1 (
_upload_completemarker), M2 (empty optional prefix no-op), M3 (restart-safe optional download), M5/M6 (concurrency + upload-loss tests). M4 (mid-task crash loses/tmp/.../<iid>/) accepted as documented limitation. L1 (cross-shard mixing safe per fingerprint invariant) documented in module docstring.Known limitation
Pre-completion losses (actor crash, VM preemption) lose
/tmp/bird_interact_slayer_otf/<iid>/for the in-flight task. The per-task row still lands; the debug bundle for that single task is gone. Per-task upload-back covers the common case; periodic upload was discussed and rejected as not worth the complexity.Test plan
tests/cloud/test_upload_back.py(15 tests) — per-helper unit coverage, swallow semantics, retry-after-failure (H2/M6), upload ordering (M1), sidecar mtime fidelitytests/cloud/test_post_run_merge.py(12 tests) — mtime-wins per file, marker invariant (H1/M3), shard-completeness (M1), per-file atomicity, cross-process lock (H1/H4), docstring fingerprint invariant (L1), report shape (M4)tests/cloud/test_driver.pyadditions — cache-required + reference-optional contract, partial seeds, exact(db, iid)sort (L1), fetch post-merge hooktests/cloud/test_ray_app.pyadditions — multi-artifact download, optional seed file-merge + marker + cross-process flock (H3/M1/H4), upload-back wiring + row-before-upload ordering + swallow, post-download seed snapshot timing (H2/H3)tests/test_slayer_otf_reference_build.pyaddition — cross-process_build_referenceflock (H4)bird-interact-cloud submit --framework pydantic_ai_otf_encode --slayer-setup on-the-fly ...(manual, separate)🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Documentation