Skip to content

fix: bound brain_store busy deferral#516

Merged
EtanHey merged 15 commits into
mainfrom
fix/brain-store-busy-defer
Jun 19, 2026
Merged

fix: bound brain_store busy deferral#516
EtanHey merged 15 commits into
mainfrom
fix/brain-store-busy-defer

Conversation

@EtanHey

@EtanHey EtanHey commented Jun 19, 2026

Copy link
Copy Markdown
Owner

Summary

  • bound the MCP brain_store write retry path with BRAINLAYER_STORE_BUSY_BUDGET_MS, defaulting to 3000ms
  • temporarily lowers and restores the active store connection busy_timeout to the remaining store budget before deferring through the existing queue fallback
  • adds regression coverage for prompt deferral and pending-store replay

Test plan

  • RED: pytest tests/test_store_handler.py::test_store_busy_budget_defers_promptly_and_pending_flush_replays -q failed before implementation with elapsed 0.324s > 0.22s
  • ruff check src/brainlayer/mcp/store_handler.py tests/test_store_handler.py
  • ruff format --check src/brainlayer/mcp/store_handler.py tests/test_store_handler.py
  • pytest tests/test_store_handler.py tests/test_write_queue.py::TestStoreRetryOnLock tests/test_write_queue.py::TestFlushPendingStores -q
  • BRAINLAYER_PREPUSH=1 ./scripts/run_tests.sh passed: 2972 passed, 9 skipped, 61 deselected, 1 xfailed; MCP registration, isolated eval/hooks, bun, and regression shell tests passed

Note: local CodeRabbit pre-commit review hit the free OSS rate limit, so PR-level reviewer bots are requested on the PR.


Note

Medium Risk
Changes core write/defer paths and singleton VectorStore initialization under lock contention; mis-tuned budgets could increase DEFERRED stores, but behavior is covered by extensive regression tests.

Overview
MCP brain_store now caps how long it waits on SQLite contention using BRAINLAYER_STORE_BUSY_BUDGET_MS (default 3s). Before that budget is exhausted, writes refresh busy_timeout per statement (via exec traces), cold VectorStore init can time out on the singleton lock, and store_memory is called with retry_on_busy=False so the handler owns backoff. When the budget runs out, the existing DEFERRED queue / pending-stores.jsonl path still applies—but validation runs first, so invalid requests are not queued on busy errors. Supersede updates share the same deadline-scoped timeout wrapper.

store_memory adds optional busy_deadline / retry_on_busy so outer callers can enforce a single contention window.

Drain bootstraps schema via VectorStore when replaying store_memory on a missing/empty DB, retries once on “no such table: chunks”, treats WriterInUseError as busy (preserving queue files), and applies supersedes even when replay hits an already-reserved chunk id.

VectorStore write paths honor thread-local deadline-aware busy timeouts during init and restore default timeouts after successful open.

Reviewed by Cursor Bugbot for commit c048857. Bugbot is set up for automated code reviews on this repo. Configure here.

Note

Bound brain_store busy deferral with deadline-aware timeouts across store and drain paths

  • Introduces a busy budget (default 3000ms, configurable via BRAINLAYER_STORE_BUSY_BUDGET_MS) that caps total wait time across VectorStore init, write transactions, and supersede updates in store_handler.py.
  • Adds _temporary_store_busy_timeout context manager and per-statement exec trace hooks to refresh busy_timeout before each SQL statement and restore original settings on exit.
  • Extends store_memory in store.py with busy_deadline and retry_on_busy parameters so internal retry loops honor the caller's deadline instead of sleeping independently.
  • Bootstraps drain DB schema when store events are present in drain.py; preserves the queued batch on WriterInUseError and retries once on missing chunks table.
  • Risk: callers that previously waited indefinitely on busy SQLite or VectorStore lock contention will now receive BusyError once the budget is exhausted.

Macroscope summarized c048857.

Summary by CodeRabbit

  • New Features

    • Added configurable store busy budget (BRAINLAYER_STORE_BUSY_BUDGET_MS) to bound how long store operations wait under contention.
  • Improvements

    • Enhanced deadline-aware busy handling for storing and superseding chunks.
    • Improved drain reliability by initializing missing DB schema without losing queued JSONL files on busy/“missing chunks” failures.
    • Refined supersede application so updates and superseded vector cleanup stay consistent.
  • Tests

    • Expanded coverage for busy-budget behavior, lock contention, queue preservation, and supersede correctness during drain replay.

@EtanHey

EtanHey commented Jun 19, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR introduces deadline-bounded APSW busy-timeout handling across vector_store.py, store.py, and mcp/store_handler.py using thread-local state, exec-trace hooks, and context managers. store_memory gains busy_deadline/retry_on_busy parameters; the MCP layer adds a configurable busy budget, _validate_store_request, and wraps all write operations including supersede_chunk with the deadline context. drain.py refactors inline supersede DB/vector updates into _apply_store_supersedes and adds queue-preserving DB schema initialization that preserves batch files on busy errors.

Changes

Busy-budget deadline system, supersedes refactor, and queue-preserving drain

Layer / File(s) Summary
Deadline-aware busy-timeout core in VectorStore
src/brainlayer/vector_store.py
Adds thread-local write-deadline state, a sentinel, deadline helper functions, an APSW exec-trace hook that refreshes busy-timeout before each statement, a temporary_write_busy_timeout context manager for overrides, and updates _init_db_with_retry / _init_db to check deadline expiration and restore defaults on success.
store_memory deadline parameters and retry helpers
src/brainlayer/store.py
Adds _MAX_APSW_BUSY_TIMEOUT_MS, _set_busy_timeout_for_deadline, and _sleep_before_busy_retry; extends store_memory with busy_deadline and retry_on_busy parameters; applies deadline-derived timeout before each transaction and makes BusyError retry conditional in both the main and re-embedding loops.
MCP busy-budget constants, _get_vector_store timeout, and validation
src/brainlayer/mcp/_shared.py, src/brainlayer/mcp/store_handler.py
Defines BRAINLAYER_STORE_BUSY_BUDGET_MS and deadline helpers; changes _get_vector_store to accept a timeout parameter, use conditional lock acquisition with acquire(timeout=...), and raise apsw.BusyError on timeout; adds _get_store_vector_store(deadline) and _validate_store_request for centralized input validation.
_temporary_store_busy_timeout and _store_memory_with_retries rework
src/brainlayer/mcp/store_handler.py
Adds _temporary_store_busy_timeout context manager that installs/restores exec-trace and busy-timeout on the APSW connection; replaces _store_memory_with_retries with a deadline-aware version that wraps store_memory, computes remaining budget, and raises instead of sleeping when the budget is exhausted or lock errors are not retryable.
_store orchestration: validation, deadline wiring, supersede timeout
src/brainlayer/mcp/store_handler.py
Updates _store to call _validate_store_request in both paths; computes a deadline from the busy budget, sets temporary_write_busy_timeout_ms, and passes the deadline into _get_store_vector_store and _store_memory_with_retries; wraps supersede_chunk inside _temporary_store_busy_timeout.
Drain supersedes refactoring, schema bootstrap, and WriterInUseError handling
src/brainlayer/drain.py
Extracts inline supersede DB/vector updates into _apply_store_supersedes helper; derives supersedes once and calls the helper in both existing-content and insert/merge branches. Adds _ensure_drain_db_schema, _db_needs_initial_schema, and _ensure_drain_db_schema_preserving_queue to initialize schema while preserving queue files on busy errors. Introduces _events_include_store and _is_missing_chunks_error helpers; extends _is_busy_error to treat WriterInUseError as busy. Refactors burn_drain_once and drain_once to pre-check for store events, ensure schema initialization before transaction start, preserve queues on busy failures, and retry schema init on missing-chunks errors during processing.
Tests: busy-budget, contention, queue fallback, and supersedes
tests/test_arbitration.py, tests/test_store_handler.py
Adds regression test verifying drain_once applies superseded_by for already-present chunks. Adds ~750 lines of new async tests covering input validation before deferral, deferred write path with durable pending-queue persistence, concurrent retry timeout accounting, MCP budget bounding BEGIN IMMEDIATE, cold VectorStore init respecting budget and restoring timeouts, DDL deadline refresh, singleton init-lock wait, outer retry disabling internal sleep, post-store supersede bounded by budget, rollback cleanup after budget expiry, chunk-id pre-assignment stability, queued metadata preservation, writer-pidfile contention converted to queue, and arbitrated mode queued events.
Tests: queue drain with schema bootstrap resilience
tests/test_write_queue.py
Updates imports to include burn_drain_once; adds three new test cases for store event queue replay: schema creation before replay, drain_once preserves queue on WriterInUseError, and burn_drain_once reports correct failure counts while preserving queue and logging "batch preserved".

Sequence Diagram(s)

sequenceDiagram
  participant MCP as MCP Client
  participant _store
  participant _validate_store_request
  participant _get_store_vector_store
  participant _store_memory_with_retries
  participant _temporary_store_busy_timeout as _temporary_store_busy_timeout ctx
  participant store_memory
  participant supersede_chunk

  MCP->>_store: store request
  _store->>_validate_store_request: validate(content, type, ...)
  _validate_store_request-->>_store: ok or raise
  _store->>_store: compute deadline from busy budget
  _store->>_get_store_vector_store: get(deadline)
  _get_store_vector_store-->>_store: VectorStore or BusyError
  _store->>_store_memory_with_retries: call(store, content, deadline)
  _store_memory_with_retries->>_temporary_store_busy_timeout: enter(remaining_ms)
  _temporary_store_busy_timeout->>store_memory: call(busy_deadline, retry_on_busy=False)
  alt BusyError within budget
    store_memory-->>_store_memory_with_retries: BusyError
    _store_memory_with_retries-->>_store: defer to pending queue
  else success
    store_memory-->>_store_memory_with_retries: chunk_id
    _store_memory_with_retries-->>_store: chunk_id
    _store->>_temporary_store_busy_timeout: enter for supersede
    _temporary_store_busy_timeout->>supersede_chunk: call(old_id, new_id)
    supersede_chunk-->>_store: done
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~85 minutes

Possibly related PRs

  • EtanHey/brainlayer#45: Both PRs modify src/brainlayer/vector_store.py busy-timeout and APSW hook behavior for SQLite lock/busy contention handling.
  • EtanHey/brainlayer#282: This PR's drain queue-preserving schema bootstrap and _apply_store_supersedes refactoring directly extend the arbitration queue drain introduced in that PR.
  • EtanHey/brainlayer#65: Both PRs modify src/brainlayer/mcp/store_handler.py to handle APSW busy/locked conditions via retry/defer/queue logic around store_memory and supersede_chunk.

Poem

🐇 A deadline ticks, the lock must yield,
Exec-traces guard each SQL field.
Supersedes now share a helper neat,
The busy budget keeps time's beat.
Queue preserved when locks contend —
The rabbit hops till all threads blend! 🗄️✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 18.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix: bound brain_store busy deferral' is specific, concise, and directly describes the main change: bounding write retries with a configurable timeout budget to prevent indefinite blocking under SQLite contention.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/brain-store-busy-defer

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@EtanHey

EtanHey commented Jun 19, 2026

Copy link
Copy Markdown
Owner Author

@codex review

@EtanHey

EtanHey commented Jun 19, 2026

Copy link
Copy Markdown
Owner Author

@cursor @BugBot review

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@cursor

cursor Bot commented Jun 19, 2026

Copy link
Copy Markdown

You need to increase your spend limit or enable usage-based billing to run background agents. Go to Cursor

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Comment thread src/brainlayer/mcp/store_handler.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 53b2fa6c54

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/brainlayer/mcp/store_handler.py Outdated
Comment on lines +627 to +629
_set_connection_busy_timeout(conn, remaining_ms)
try:
return store_memory(**kwargs)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Bound the inner store retry loop too

When a real SQLite writer holds the lock, this timeout is not a total budget because store_memory has its own 5-attempt apsw.BusyError retry loop around BEGIN IMMEDIATE and sleeps between attempts. Setting the connection timeout to the full remaining budget before calling it lets one outer attempt spend that timeout on each inner try; with the default 3000ms budget, brain_store can wait roughly 15s plus sleeps before it queues, instead of deferring after ~3s. This still leaves MCP store calls blocked under the DB-lock scenario this change is meant to bound.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in fa6a1ba. The MCP path now passes a monotonic busy_deadline into store_memory, and store_memory refreshes the SQLite busy_timeout before each internal BEGIN IMMEDIATE retry while refusing sleeps that would exceed the same deadline. Added a regression that failed at ~0.42s before the fix with an 80ms budget and now defers inside the budget.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@EtanHey

EtanHey commented Jun 19, 2026

Copy link
Copy Markdown
Owner Author

@cursor @BugBot review

@EtanHey

EtanHey commented Jun 19, 2026

Copy link
Copy Markdown
Owner Author

@codex review

@cursor

cursor Bot commented Jun 19, 2026

Copy link
Copy Markdown

You need to increase your spend limit or enable usage-based billing to run background agents. Go to Cursor

Comment thread src/brainlayer/mcp/store_handler.py

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b26be70a87

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/brainlayer/mcp/store_handler.py Outdated
Comment on lines +646 to +647
with _temporary_store_busy_timeout(conn, remaining_ms):
return store_memory(**kwargs)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Enforce the busy budget across inner retries

When a real writer lock is held, passing the full remaining_ms into the connection here does not bound the whole store attempt because store_memory catches apsw.BusyError and retries five BEGIN IMMEDIATEs internally, each with the same busy timeout plus sleeps. That means a configured 100ms budget can take roughly 2s before queueing, and the default 3000ms budget can block the MCP call for ~16s instead of deferring promptly, so the new busy-budget contract is not actually enforced under the lock-contention case this change targets.

Useful? React with 👍 / 👎.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@EtanHey

EtanHey commented Jun 19, 2026

Copy link
Copy Markdown
Owner Author

@codex review

@EtanHey

EtanHey commented Jun 19, 2026

Copy link
Copy Markdown
Owner Author

@cursor @BugBot review

@EtanHey

EtanHey commented Jun 19, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai review

@cursor

cursor Bot commented Jun 19, 2026

Copy link
Copy Markdown

You need to increase your spend limit or enable usage-based billing to run background agents. Go to Cursor

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fa6a1ba876

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/brainlayer/mcp/store_handler.py Outdated
Comment on lines +647 to +648
budget_ms = _store_busy_budget_ms()
deadline = time.monotonic() + (budget_ms / 1000)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Start the store budget before opening the VectorStore

When the MCP process is cold and another connection already holds the SQLite write/schema lock, _store calls _get_vector_store() before entering this helper, so VectorStore initialization can still sit on its own 10s+ busy timeout/retry path before BRAINLAYER_STORE_BUSY_BUDGET_MS starts. In that context brain_store will not defer within the configured budget even though the write would eventually be queued; the budget needs to cover the initial DB open/init path as well as store_memory().

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f0f50d9. The store path now starts a monotonic busy deadline before cold VectorStore acquisition, scopes new write connections to the remaining store budget, and VectorStore init stops retrying once the same deadline is exhausted. Added a regression that failed before the fix by retrying cold init past a 100ms store budget and now queues promptly.

Comment thread src/brainlayer/store.py

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium

if pending_reembed is not None and embed_fn is not None:

If the re-embed transaction loop (lines 380–400) raises — which is now more likely since _set_busy_timeout_for_deadline (line 384) and _sleep_before_busy_retry (line 396) both raise BusyError when busy_deadline is exceeded — the cache invalidation at lines 402–407 never executes, even though the main chunk write already committed at line 363. This leaves _hybrid_cache, _audit_recursion_count_cache, and _checkpoint_count_cache stale after a successful write. Move the cache invalidation block before the re-embed loop so it always runs after the main transaction commits.

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @src/brainlayer/store.py around line 377:

If the re-embed transaction loop (lines 380–400) raises — which is now more likely since `_set_busy_timeout_for_deadline` (line 384) and `_sleep_before_busy_retry` (line 396) both raise `BusyError` when `busy_deadline` is exceeded — the cache invalidation at lines 402–407 never executes, even though the main chunk write already committed at line 363. This leaves `_hybrid_cache`, `_audit_recursion_count_cache`, and `_checkpoint_count_cache` stale after a successful write. Move the cache invalidation block before the re-embed loop so it always runs after the main transaction commits.

Evidence trail:
src/brainlayer/store.py lines 56-62 (_busy_deadline_timeout_ms raises apsw.BusyError), lines 65-69 (_set_busy_timeout_for_deadline calls _busy_deadline_timeout_ms), lines 72-78 (_sleep_before_busy_retry raises apsw.BusyError), lines 363-364 (main transaction commits), lines 377-400 (re-embed loop that can propagate BusyError), lines 402-407 (cache invalidation that is skipped on exception). Commit: REVIEWED_COMMIT.

@EtanHey

EtanHey commented Jun 19, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai review

@EtanHey

EtanHey commented Jun 19, 2026

Copy link
Copy Markdown
Owner Author

@cursor @BugBot review

@EtanHey

EtanHey commented Jun 19, 2026

Copy link
Copy Markdown
Owner Author

@codex review

@cursor

cursor Bot commented Jun 19, 2026

Copy link
Copy Markdown

You need to increase your spend limit or enable usage-based billing to run background agents. Go to Cursor

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0fa85210c3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

original_exec_trace = _connection_exec_trace(conn)

def refresh_timeout(cursor, statement, bindings):
_set_connection_busy_timeout(conn, _remaining_store_busy_budget_ms(deadline))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Let rollback bypass the expired busy trace

When this exec trace is installed, _remaining_store_busy_budget_ms() raises before every SQL statement once the store budget is exhausted. If store_memory() has already entered BEGIN IMMEDIATE and a later statement/commit hits busy until the deadline, its except apsw.BusyError cleanup (ROLLBACK, and similar savepoint cleanup in dedupe helpers) will also run through this trace and be aborted before it executes, leaving the shared MCP write connection inside an open transaction. Under exactly that contention path, subsequent brain_store/write calls can remain poisoned rather than cleanly deferring; the trace should allow transaction-control cleanup or be disabled before rollback.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 62bb380. The temporary store busy exec trace now lets rollback/release cleanup statements pass through even after the busy-budget deadline has expired, while normal statements still refresh the connection busy_timeout from the same deadline. Added test_store_busy_budget_allows_expired_rollback_cleanup covering ROLLBACK, ROLLBACK TO, and RELEASE after expiration.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@EtanHey

EtanHey commented Jun 19, 2026

Copy link
Copy Markdown
Owner Author

Fixed the Codex P1 rollback-cleanup finding in 62bb380. Local verification on 62bb380: focused regression failed before the fix and passed after; tests/test_store_handler.py passed 22 tests; ruff check/format passed for the touched files; uncached ./scripts/run_tests.sh passed with 3021 passed, 9 skipped, 77 deselected, 1 xfailed; push pre-push gate passed with 2991 passed, 9 skipped, 61 deselected, 1 xfailed; MCP registration 3 passed; isolated eval/hook routing 40 passed; bun suite passed; test_fts5_determinism.sh passed.

@EtanHey

EtanHey commented Jun 19, 2026

Copy link
Copy Markdown
Owner Author

@codex review

@EtanHey

EtanHey commented Jun 19, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai review

@EtanHey

EtanHey commented Jun 19, 2026

Copy link
Copy Markdown
Owner Author

@cursor @BugBot review

@cursor

cursor Bot commented Jun 19, 2026

Copy link
Copy Markdown

You need to increase your spend limit or enable usage-based billing to run background agents. Go to Cursor

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 62bb380889

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +823 to +824
with temporary_write_busy_timeout_ms(_remaining_store_busy_budget_ms(deadline), deadline=deadline):
store = _get_store_vector_store(deadline)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid deferring before the DB schema exists

When the database is fresh, this applies the store busy budget to VectorStore construction, so a low budget or slow first-run migration can return a DEFERRED receipt before the chunks table has ever been created. The queued file is then not replayable by drain_once, which opens APSW directly and fails on the missing chunks table; I reproduced this with BRAINLAYER_STORE_BUSY_BUDGET_MS=1 on a fresh DB, leaving the queue file preserved forever. Consider letting schema initialization complete outside this budget or making the drain initialize the schema before applying queued stores.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e940230. The drain path now initializes the BrainLayer schema before applying queued store events when the DB file is missing/empty, and also retries after a store replay hits no such table: chunks, so stores queued before first-run schema creation can replay instead of staying preserved forever. Added test_drain_store_event_initializes_missing_schema, which starts with a fresh DB path and verifies drain_once creates schema, applies the queued store, and removes the queue file.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@EtanHey

EtanHey commented Jun 19, 2026

Copy link
Copy Markdown
Owner Author

Verification update for e940230:

  • Red case: pytest tests/test_write_queue.py::TestFlushPendingStores::test_drain_store_event_initializes_missing_schema -q failed before the drain schema-init fix (assert 0 == 1).
  • Focused checks after the fix passed: pytest tests/test_hot_currentness_benchmark.py -q, pytest tests/test_write_queue.py::TestFlushPendingStores -q, pytest tests/test_arbitration.py -q, and ruff check src/brainlayer/drain.py tests/test_write_queue.py && ruff format --check src/brainlayer/drain.py tests/test_write_queue.py.
  • Full local gate passed: ./scripts/run_tests.sh completed with 3022 passed, 9 skipped, 77 deselected, 1 xfailed.
  • Pre-push gate passed while pushing e940230: pytest unit suite completed with 2992 passed, 9 skipped, 61 deselected, 1 xfailed; MCP registration, isolated eval/hook routing, bun, and FTS determinism checks also passed.
  • Local coderabbit review --agent was bounded with timeout 180 and timed out before returning findings; PR-level CodeRabbit/Bugbot/Codex/Macroscope review is requested below.

@codex review
@cursor @BugBot re-review
@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown

@EtanHey Thanks for the detailed verification update on e940230. The drain schema-init fix is well-documented — RED case confirmed before the fix (assert 0 == 1), GREEN after, with the full local gate at 3022 passed and the pre-push gate at 2992 passed. Kicking off the incremental review now.

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit e940230. Configure here.

Comment thread src/brainlayer/vector_store.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e9402305d9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/brainlayer/drain.py
def _ensure_drain_db_schema(db_path: Path) -> None:
from .vector_store import VectorStore

store = VectorStore(db_path)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve queued stores when schema bootstrap is contended

When a store batch is drained against a missing or empty DB while another BrainLayer process already owns the VectorStore writer pidfile, this VectorStore(db_path) construction raises WriterInUseError. The helper is called from drain_once/burn_drain_once before their existing failure-preserving retry blocks, and that exception is not treated as an APSW busy error, so the drain daemon exits instead of leaving the JSONL file queued for a later retry; a first-run/startup race can strand queued brain_store events until the daemon is restarted.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in c048857. drain_once and burn_drain_once now treat WriterInUseError as busy-like schema bootstrap contention and preserve queued store batches instead of letting the exception escape. The same preserve path is used for missing-chunks schema bootstrap retries. Added drain and burn-drain regressions that patch schema initialization to raise WriterInUseError and assert the queue file remains for retry.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@EtanHey

EtanHey commented Jun 19, 2026

Copy link
Copy Markdown
Owner Author

Review-fix update for c048857:

  • Fixed Bugbot medium Init trace blocks rollback cleanup by allowing rollback/release cleanup SQL through the VectorStore init deadline trace after expiry.
  • Fixed Codex P2 Preserve queued stores when schema bootstrap is contended by preserving drain/burn-drain store batches when schema bootstrap hits WriterInUseError.
  • Red/green: the three new regressions failed before the fix and pass now.
  • Focused checks passed: pytest tests/test_store_handler.py -q (23 passed), pytest tests/test_write_queue.py -q (47 passed), pytest tests/test_arbitration.py -q (21 passed), pytest tests/test_hot_currentness_benchmark.py -q (8 passed), and ruff check/format on touched files.
  • Local coderabbit review --agent was bounded with timeout 180 and timed out while reviewing.
  • Pre-push gate passed while pushing c048857: 2995 selected unit tests passed, 9 skipped, 61 deselected, 1 xfailed; MCP registration passed; isolated eval/hook routing passed; bun suite passed; FTS determinism regression passed.

@codex review
@cursor @BugBot re-review
@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown

@EtanHey Thanks for the thorough verification update on c048857. Both fixes are well-documented:

  • Init trace blocks rollback cleanup: Allowing rollback/release cleanup SQL through the VectorStore init deadline trace after expiry is the right approach — expiry should gate new work, not block necessary cleanup.
  • Preserve queued stores on contended schema bootstrap: Preserving drain/burn-drain batches when WriterInUseError is hit prevents data loss under contention, which is exactly the right behavior.

RED/GREEN confirmation on all three new regressions before and after the fix is solid. Kicking off the incremental review now.

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/brainlayer/drain.py`:
- Around line 1226-1233: The current code at the
`_ensure_drain_db_schema_preserving_queue` call only breaks from the inner retry
loop when schema bootstrap is busy, allowing the outer file loop to continue
processing other queue files. This reorders queued writes since the current
store file remains queued while later files are processed. Instead of breaking
from the inner loop, break from the outer loop (or return if applicable) to
match the behavior at line 1161 in the pre-transaction path, ensuring the store
file is fully handled before advancing to process other files.

In `@tests/test_write_queue.py`:
- Around line 323-355: The test
test_drain_preserves_store_queue_when_schema_bootstrap_writer_in_use currently
only verifies that the queued file exists after the drain operation but does not
verify that its content is unchanged. Capture the original JSONL content of the
queued file after writing it with queued.write_text, then after the drain_once
call completes, add an assertion that compares the current content of the queued
file with the captured original content to ensure the payload itself was not
modified or truncated.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5c0b6578-91f4-4929-8bb2-baecc3eeffe7

📥 Commits

Reviewing files that changed from the base of the PR and between 62bb380 and c048857.

📒 Files selected for processing (4)
  • src/brainlayer/drain.py
  • src/brainlayer/vector_store.py
  • tests/test_store_handler.py
  • tests/test_write_queue.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.13)
  • GitHub Check: Macroscope - Correctness Check
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests

Files:

  • tests/test_write_queue.py
  • src/brainlayer/drain.py
  • src/brainlayer/vector_store.py
  • tests/test_store_handler.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Run tests with pytest and use BRAINLAYER_PREPUSH=1 for pre-push validation

Files:

  • tests/test_write_queue.py
  • tests/test_store_handler.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Run linting with ruff check src/ && ruff format src/ on all Python source files

Files:

  • src/brainlayer/drain.py
  • src/brainlayer/vector_store.py
src/brainlayer/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Resolve database path using paths.py:get_db_path() with env var override; canonical path is ~/.local/share/brainlayer/brainlayer.db

Files:

  • src/brainlayer/drain.py
  • src/brainlayer/vector_store.py
src/brainlayer/**/*vector_store*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Drop FTS triggers before bulk deletes on chunks table; recreate after completion

Files:

  • src/brainlayer/vector_store.py
🪛 ast-grep (0.43.0)
tests/test_write_queue.py

[info] 296-305: use jsonify instead of json.dumps for JSON output
Context: json.dumps(
{
"kind": "store_memory",
"chunk_id": "manual-fresh-schema",
"content": "fresh queued store should create schema before replay",
"memory_type": "note",
"project": "test",
"created_at": "2026-06-19T08:54:00Z",
}
)
Note: Security best practice.

(use-jsonify)


[info] 334-342: use jsonify instead of json.dumps for JSON output
Context: json.dumps(
{
"kind": "store_memory",
"chunk_id": "manual-schema-contended",
"content": "store queue should survive contended schema bootstrap",
"memory_type": "note",
"project": "test",
}
)
Note: Security best practice.

(use-jsonify)


[info] 368-376: use jsonify instead of json.dumps for JSON output
Context: json.dumps(
{
"kind": "store_memory",
"chunk_id": "manual-schema-contended",
"content": "burn drain should preserve contended schema bootstrap",
"memory_type": "note",
"project": "test",
}
)
Note: Security best practice.

(use-jsonify)

🔇 Additional comments (6)
src/brainlayer/vector_store.py (3)

124-140: LGTM!

Also applies to: 656-670, 676-678


117-121: Keep commit-like RELEASE statements deadline-bound.

Line 121 treats every RELEASE as cleanup, but RELEASE SAVEPOINT can be the commit path for a savepoint. Skipping _refresh_write_busy_timeout_for_deadline() there can leave a stale busy timeout and overrun the store budget. Only exempt rollback cleanup, or track rollback-cleanup state before exempting RELEASE.

As per coding guidelines, **/*.py: "Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior."

Suggested narrowing
 def _is_write_busy_cleanup_statement(statement) -> bool:
     if not isinstance(statement, str):
         return False
     sql = statement.lstrip().upper()
-    return sql.startswith("ROLLBACK") or sql.startswith("RELEASE")
+    return sql.startswith("ROLLBACK")

632-653: Close failed init connections before backing off.

If _init_db() opens self.conn and then raises a retryable APSW error, this path sleeps/retries without closing the failed connection or removing its deadline trace. That can keep locks/file descriptors alive across the backoff and make the next init attempt contend with its own abandoned connection.

As per coding guidelines, **/*.py: "Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior."

Suggested cleanup before retry
             except apsw.Error as e:
                 if not _is_retryable_init_error(e):
                     raise
                 last_err = e
+                failed_conn = getattr(self, "conn", None)
+                if failed_conn is not None:
+                    _restore_write_busy_deadline_trace(
+                        failed_conn,
+                        getattr(self, "_init_write_busy_deadline_trace", _NO_EXEC_TRACE),
+                    )
+                    self._init_write_busy_deadline_trace = _NO_EXEC_TRACE
+                    try:
+                        failed_conn.close()
+                    except apsw.Error:
+                        pass
+                    self.conn = None
                 delay = min(self._INIT_BASE_DELAY * (2**attempt), self._INIT_MAX_DELAY)
src/brainlayer/drain.py (1)

177-207: LGTM!

Also applies to: 348-353, 807-810, 1031-1090, 1155-1161

tests/test_store_handler.py (1)

507-558: LGTM!

tests/test_write_queue.py (1)

20-20: LGTM!

Also applies to: 287-321

Comment thread src/brainlayer/drain.py
Comment on lines +1226 to +1233
if not _ensure_drain_db_schema_preserving_queue(
db_path,
log_path,
context=f"drain failed for {path.name}",
force=True,
):
break
continue

@coderabbitai coderabbitai Bot Jun 19, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Stop draining after preserving a schema-blocked store file.

When forced schema bootstrap is busy, Line 1232 only exits the retry loop. The outer file loop then advances, so later queue files can be applied/unlinked while this store file remains queued, reordering queued writes. Return or break the outer loop here, matching the pre-transaction path at Line 1161.

As per coding guidelines, **/*.py: “Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior.”

Suggested stop-on-preserve fix
                         if not _ensure_drain_db_schema_preserving_queue(
                             db_path,
                             log_path,
                             context=f"drain failed for {path.name}",
                             force=True,
                         ):
-                            break
+                            return drained
                         continue
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if not _ensure_drain_db_schema_preserving_queue(
db_path,
log_path,
context=f"drain failed for {path.name}",
force=True,
):
break
continue
if not _ensure_drain_db_schema_preserving_queue(
db_path,
log_path,
context=f"drain failed for {path.name}",
force=True,
):
return drained
continue
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/brainlayer/drain.py` around lines 1226 - 1233, The current code at the
`_ensure_drain_db_schema_preserving_queue` call only breaks from the inner retry
loop when schema bootstrap is busy, allowing the outer file loop to continue
processing other queue files. This reorders queued writes since the current
store file remains queued while later files are processed. Instead of breaking
from the inner loop, break from the outer loop (or return if applicable) to
match the behavior at line 1161 in the pre-transaction path, ensuring the store
file is fully handled before advancing to process other files.

Source: Coding guidelines

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2ca90cc. drain_once now sets a stop flag when forced schema bootstrap is busy after a missing-chunks failure, breaks the outer file loop, and leaves later queue files untouched until the first preserved store can replay. Added test_drain_stops_after_forced_schema_bootstrap_writer_in_use, which failed before the fix by advancing to the second file and now verifies both queued JSONL payloads are unchanged.

Comment thread tests/test_write_queue.py

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c048857219

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/brainlayer/drain.py
Comment on lines +1231 to +1232
):
break

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Stop draining after contended schema bootstrap

When the DB file already exists but lacks chunks, a store file reaches this forced schema-bootstrap path; if bootstrap is busy, this break exits only the retry loop, so drain_once continues to later queue files while preserving the current one. A subsequent file can then initialize the schema and be applied/deleted before the earlier preserved store, violating queue order and potentially applying supersedes/dedupe against the wrong prior state. This should stop the outer drain pass just like the pre-transaction schema-busy path above.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2ca90cc. drain_once now stops the outer file loop when forced schema bootstrap is busy after a missing-chunks failure, so later queue files are not applied or unlinked while the first store file remains queued. Added test_drain_stops_after_forced_schema_bootstrap_writer_in_use, which failed before the fix by calling schema bootstrap for the second file and now verifies both queued payloads are unchanged.

Comment thread src/brainlayer/drain.py
@EtanHey EtanHey merged commit 6a89a92 into main Jun 19, 2026
@EtanHey

EtanHey commented Jun 19, 2026

Copy link
Copy Markdown
Owner Author

BL7 latest update after review fixes:

Red/green for 2ca90cc:

  • test_drain_stops_after_forced_schema_bootstrap_writer_in_use failed before the fix because drain_once advanced to the second queue file after forced schema bootstrap contention; now it passes and asserts both queued JSONL payloads are unchanged.
  • test_burn_drain_missing_chunks_retry_counts_events_once failed before the fix with applied_events == 3; now it passes with 2 committed events.

Local verification:

  • pytest tests/test_write_queue.py::TestFlushPendingStores::test_drain_stops_after_forced_schema_bootstrap_writer_in_use tests/test_write_queue.py::TestFlushPendingStores::test_burn_drain_missing_chunks_retry_counts_events_once -q -> 2 passed.
  • ruff check src/brainlayer/drain.py tests/test_write_queue.py -> passed.
  • ruff format --check src/brainlayer/drain.py tests/test_write_queue.py -> 2 files already formatted.
  • Local CodeRabbit agent timed out at the 180s cap without returning findings.
  • Pre-push gate passed: 2997 passed, 9 skipped, 61 deselected, 1 xfailed, 103 warnings, then MCP registration 3 passed, isolated eval/hook routing 40 passed, Bun 1 pass, and FTS determinism passed. The worker exclusion regression also passed.

@codex review
@cursor @BugBot re-review
@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown

@EtanHey Thanks for the detailed verification update on 2ca90cc. Both fixes are well-documented:

  • drain_once stops after forced schema bootstrap contention: Correct behavior — when schema bootstrap hits WriterInUseError, drain_once should halt rather than advance to the next queue file, preserving both payloads intact. The assertion that both queued JSONL payloads are unchanged confirms the queue-preservation invariant holds.
  • burn_drain_once counts events once on missing-chunks retry: applied_events == 2 (committed) rather than 3 is the right outcome — events should not be double-counted across a schema-init retry within the same batch.

Ruff clean on both touched files and the full pre-push gate at 2997 passed with all ancillary checks (MCP registration, eval/hook routing, bun, FTS determinism, worker exclusion regression) is solid coverage. Kicking off the incremental review now.

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6a89a922ee

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/brainlayer/drain.py
context=f"drain failed for {path.name}",
force=True,
):
break

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Stop draining after forced schema bootstrap is busy

When the DB file is non-empty but still lacks chunks, the first store file reaches this forced schema bootstrap path; if another writer owns schema init, this break exits only the retry loop, so the outer for path in files continues and attempts later queue files in the same drain run. That violates FIFO/write-safety under schema contention and can repeatedly open the DB/log failures for subsequent payloads instead of preserving the queue until the writer releases; return or break the outer drain loop here.

Useful? React with 👍 / 👎.

Comment thread src/brainlayer/drain.py
Comment on lines +1058 to +1059
applied = _apply_event(conn, event)
result.applied_events += 1

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 Badge Defer burn-drain counters until commit succeeds

This increments the public applied_events count inside the transaction before the new missing-chunks retry path can roll it back. If a batch contains an event that returns before a later store raises no such table: chunks, the forced schema retry applies the batch again and the result/log over-count rolled-back work (for example two queued lines can report applied_events=3 after one retry). Accumulate counts locally per attempt and add them to result only after COMMIT succeeds.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant