fix: bound brain_store busy deferral#516
Conversation
|
@coderabbitai review |
📝 WalkthroughWalkthroughThis PR introduces deadline-bounded APSW busy-timeout handling across ChangesBusy-budget deadline system, supersedes refactor, and queue-preserving drain
Sequence Diagram(s)sequenceDiagram
participant MCP as MCP Client
participant _store
participant _validate_store_request
participant _get_store_vector_store
participant _store_memory_with_retries
participant _temporary_store_busy_timeout as _temporary_store_busy_timeout ctx
participant store_memory
participant supersede_chunk
MCP->>_store: store request
_store->>_validate_store_request: validate(content, type, ...)
_validate_store_request-->>_store: ok or raise
_store->>_store: compute deadline from busy budget
_store->>_get_store_vector_store: get(deadline)
_get_store_vector_store-->>_store: VectorStore or BusyError
_store->>_store_memory_with_retries: call(store, content, deadline)
_store_memory_with_retries->>_temporary_store_busy_timeout: enter(remaining_ms)
_temporary_store_busy_timeout->>store_memory: call(busy_deadline, retry_on_busy=False)
alt BusyError within budget
store_memory-->>_store_memory_with_retries: BusyError
_store_memory_with_retries-->>_store: defer to pending queue
else success
store_memory-->>_store_memory_with_retries: chunk_id
_store_memory_with_retries-->>_store: chunk_id
_store->>_temporary_store_busy_timeout: enter for supersede
_temporary_store_busy_timeout->>supersede_chunk: call(old_id, new_id)
supersede_chunk-->>_store: done
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~85 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@codex review |
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
|
You need to increase your spend limit or enable usage-based billing to run background agents. Go to Cursor |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 53b2fa6c54
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| _set_connection_busy_timeout(conn, remaining_ms) | ||
| try: | ||
| return store_memory(**kwargs) |
There was a problem hiding this comment.
Bound the inner store retry loop too
When a real SQLite writer holds the lock, this timeout is not a total budget because store_memory has its own 5-attempt apsw.BusyError retry loop around BEGIN IMMEDIATE and sleeps between attempts. Setting the connection timeout to the full remaining budget before calling it lets one outer attempt spend that timeout on each inner try; with the default 3000ms budget, brain_store can wait roughly 15s plus sleeps before it queues, instead of deferring after ~3s. This still leaves MCP store calls blocked under the DB-lock scenario this change is meant to bound.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in fa6a1ba. The MCP path now passes a monotonic busy_deadline into store_memory, and store_memory refreshes the SQLite busy_timeout before each internal BEGIN IMMEDIATE retry while refusing sleeps that would exceed the same deadline. Added a regression that failed at ~0.42s before the fix with an 80ms budget and now defers inside the budget.
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
|
@codex review |
|
You need to increase your spend limit or enable usage-based billing to run background agents. Go to Cursor |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b26be70a87
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| with _temporary_store_busy_timeout(conn, remaining_ms): | ||
| return store_memory(**kwargs) |
There was a problem hiding this comment.
Enforce the busy budget across inner retries
When a real writer lock is held, passing the full remaining_ms into the connection here does not bound the whole store attempt because store_memory catches apsw.BusyError and retries five BEGIN IMMEDIATEs internally, each with the same busy timeout plus sleeps. That means a configured 100ms budget can take roughly 2s before queueing, and the default 3000ms budget can block the MCP call for ~16s instead of deferring promptly, so the new busy-budget contract is not actually enforced under the lock-contention case this change targets.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
|
@codex review |
|
@coderabbitai review |
|
You need to increase your spend limit or enable usage-based billing to run background agents. Go to Cursor |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fa6a1ba876
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| budget_ms = _store_busy_budget_ms() | ||
| deadline = time.monotonic() + (budget_ms / 1000) |
There was a problem hiding this comment.
Start the store budget before opening the VectorStore
When the MCP process is cold and another connection already holds the SQLite write/schema lock, _store calls _get_vector_store() before entering this helper, so VectorStore initialization can still sit on its own 10s+ busy timeout/retry path before BRAINLAYER_STORE_BUSY_BUDGET_MS starts. In that context brain_store will not defer within the configured budget even though the write would eventually be queued; the budget needs to cover the initial DB open/init path as well as store_memory().
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in f0f50d9. The store path now starts a monotonic busy deadline before cold VectorStore acquisition, scopes new write connections to the remaining store budget, and VectorStore init stops retrying once the same deadline is exhausted. Added a regression that failed before the fix by retrying cold init past a 100ms store budget and now queues promptly.
There was a problem hiding this comment.
🟡 Medium
brainlayer/src/brainlayer/store.py
Line 377 in fa6a1ba
If the re-embed transaction loop (lines 380–400) raises — which is now more likely since _set_busy_timeout_for_deadline (line 384) and _sleep_before_busy_retry (line 396) both raise BusyError when busy_deadline is exceeded — the cache invalidation at lines 402–407 never executes, even though the main chunk write already committed at line 363. This leaves _hybrid_cache, _audit_recursion_count_cache, and _checkpoint_count_cache stale after a successful write. Move the cache invalidation block before the re-embed loop so it always runs after the main transaction commits.
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @src/brainlayer/store.py around line 377:
If the re-embed transaction loop (lines 380–400) raises — which is now more likely since `_set_busy_timeout_for_deadline` (line 384) and `_sleep_before_busy_retry` (line 396) both raise `BusyError` when `busy_deadline` is exceeded — the cache invalidation at lines 402–407 never executes, even though the main chunk write already committed at line 363. This leaves `_hybrid_cache`, `_audit_recursion_count_cache`, and `_checkpoint_count_cache` stale after a successful write. Move the cache invalidation block before the re-embed loop so it always runs after the main transaction commits.
Evidence trail:
src/brainlayer/store.py lines 56-62 (_busy_deadline_timeout_ms raises apsw.BusyError), lines 65-69 (_set_busy_timeout_for_deadline calls _busy_deadline_timeout_ms), lines 72-78 (_sleep_before_busy_retry raises apsw.BusyError), lines 363-364 (main transaction commits), lines 377-400 (re-embed loop that can propagate BusyError), lines 402-407 (cache invalidation that is skipped on exception). Commit: REVIEWED_COMMIT.
|
@coderabbitai review |
|
@codex review |
|
You need to increase your spend limit or enable usage-based billing to run background agents. Go to Cursor |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0fa85210c3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| original_exec_trace = _connection_exec_trace(conn) | ||
|
|
||
| def refresh_timeout(cursor, statement, bindings): | ||
| _set_connection_busy_timeout(conn, _remaining_store_busy_budget_ms(deadline)) |
There was a problem hiding this comment.
Let rollback bypass the expired busy trace
When this exec trace is installed, _remaining_store_busy_budget_ms() raises before every SQL statement once the store budget is exhausted. If store_memory() has already entered BEGIN IMMEDIATE and a later statement/commit hits busy until the deadline, its except apsw.BusyError cleanup (ROLLBACK, and similar savepoint cleanup in dedupe helpers) will also run through this trace and be aborted before it executes, leaving the shared MCP write connection inside an open transaction. Under exactly that contention path, subsequent brain_store/write calls can remain poisoned rather than cleanly deferring; the trace should allow transaction-control cleanup or be disabled before rollback.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 62bb380. The temporary store busy exec trace now lets rollback/release cleanup statements pass through even after the busy-budget deadline has expired, while normal statements still refresh the connection busy_timeout from the same deadline. Added test_store_busy_budget_allows_expired_rollback_cleanup covering ROLLBACK, ROLLBACK TO, and RELEASE after expiration.
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
|
Fixed the Codex P1 rollback-cleanup finding in 62bb380. Local verification on 62bb380: focused regression failed before the fix and passed after; tests/test_store_handler.py passed 22 tests; ruff check/format passed for the touched files; uncached ./scripts/run_tests.sh passed with 3021 passed, 9 skipped, 77 deselected, 1 xfailed; push pre-push gate passed with 2991 passed, 9 skipped, 61 deselected, 1 xfailed; MCP registration 3 passed; isolated eval/hook routing 40 passed; bun suite passed; test_fts5_determinism.sh passed. |
|
@codex review |
|
@coderabbitai review |
|
You need to increase your spend limit or enable usage-based billing to run background agents. Go to Cursor |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 62bb380889
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| with temporary_write_busy_timeout_ms(_remaining_store_busy_budget_ms(deadline), deadline=deadline): | ||
| store = _get_store_vector_store(deadline) |
There was a problem hiding this comment.
Avoid deferring before the DB schema exists
When the database is fresh, this applies the store busy budget to VectorStore construction, so a low budget or slow first-run migration can return a DEFERRED receipt before the chunks table has ever been created. The queued file is then not replayable by drain_once, which opens APSW directly and fails on the missing chunks table; I reproduced this with BRAINLAYER_STORE_BUSY_BUDGET_MS=1 on a fresh DB, leaving the queue file preserved forever. Consider letting schema initialization complete outside this budget or making the drain initialize the schema before applying queued stores.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in e940230. The drain path now initializes the BrainLayer schema before applying queued store events when the DB file is missing/empty, and also retries after a store replay hits no such table: chunks, so stores queued before first-run schema creation can replay instead of staying preserved forever. Added test_drain_store_event_initializes_missing_schema, which starts with a fresh DB path and verifies drain_once creates schema, applies the queued store, and removes the queue file.
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
|
Verification update for e940230:
@codex review |
|
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit e940230. Configure here.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e9402305d9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| def _ensure_drain_db_schema(db_path: Path) -> None: | ||
| from .vector_store import VectorStore | ||
|
|
||
| store = VectorStore(db_path) |
There was a problem hiding this comment.
Preserve queued stores when schema bootstrap is contended
When a store batch is drained against a missing or empty DB while another BrainLayer process already owns the VectorStore writer pidfile, this VectorStore(db_path) construction raises WriterInUseError. The helper is called from drain_once/burn_drain_once before their existing failure-preserving retry blocks, and that exception is not treated as an APSW busy error, so the drain daemon exits instead of leaving the JSONL file queued for a later retry; a first-run/startup race can strand queued brain_store events until the daemon is restarted.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in c048857. drain_once and burn_drain_once now treat WriterInUseError as busy-like schema bootstrap contention and preserve queued store batches instead of letting the exception escape. The same preserve path is used for missing-chunks schema bootstrap retries. Added drain and burn-drain regressions that patch schema initialization to raise WriterInUseError and assert the queue file remains for retry.
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
|
Review-fix update for c048857:
@codex review |
|
RED/GREEN confirmation on all three new regressions before and after the fix is solid. Kicking off the incremental review now. ✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/brainlayer/drain.py`:
- Around line 1226-1233: The current code at the
`_ensure_drain_db_schema_preserving_queue` call only breaks from the inner retry
loop when schema bootstrap is busy, allowing the outer file loop to continue
processing other queue files. This reorders queued writes since the current
store file remains queued while later files are processed. Instead of breaking
from the inner loop, break from the outer loop (or return if applicable) to
match the behavior at line 1161 in the pre-transaction path, ensuring the store
file is fully handled before advancing to process other files.
In `@tests/test_write_queue.py`:
- Around line 323-355: The test
test_drain_preserves_store_queue_when_schema_bootstrap_writer_in_use currently
only verifies that the queued file exists after the drain operation but does not
verify that its content is unchanged. Capture the original JSONL content of the
queued file after writing it with queued.write_text, then after the drain_once
call completes, add an assertion that compares the current content of the queued
file with the captured original content to ensure the payload itself was not
modified or truncated.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5c0b6578-91f4-4929-8bb2-baecc3eeffe7
📒 Files selected for processing (4)
src/brainlayer/drain.pysrc/brainlayer/vector_store.pytests/test_store_handler.pytests/test_write_queue.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: test (3.11)
- GitHub Check: test (3.12)
- GitHub Check: test (3.13)
- GitHub Check: Macroscope - Correctness Check
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests
Files:
tests/test_write_queue.pysrc/brainlayer/drain.pysrc/brainlayer/vector_store.pytests/test_store_handler.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Run tests with
pytestand useBRAINLAYER_PREPUSH=1for pre-push validation
Files:
tests/test_write_queue.pytests/test_store_handler.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Run linting with
ruff check src/ && ruff format src/on all Python source files
Files:
src/brainlayer/drain.pysrc/brainlayer/vector_store.py
src/brainlayer/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Resolve database path using
paths.py:get_db_path()with env var override; canonical path is~/.local/share/brainlayer/brainlayer.db
Files:
src/brainlayer/drain.pysrc/brainlayer/vector_store.py
src/brainlayer/**/*vector_store*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Drop FTS triggers before bulk deletes on chunks table; recreate after completion
Files:
src/brainlayer/vector_store.py
🪛 ast-grep (0.43.0)
tests/test_write_queue.py
[info] 296-305: use jsonify instead of json.dumps for JSON output
Context: json.dumps(
{
"kind": "store_memory",
"chunk_id": "manual-fresh-schema",
"content": "fresh queued store should create schema before replay",
"memory_type": "note",
"project": "test",
"created_at": "2026-06-19T08:54:00Z",
}
)
Note: Security best practice.
(use-jsonify)
[info] 334-342: use jsonify instead of json.dumps for JSON output
Context: json.dumps(
{
"kind": "store_memory",
"chunk_id": "manual-schema-contended",
"content": "store queue should survive contended schema bootstrap",
"memory_type": "note",
"project": "test",
}
)
Note: Security best practice.
(use-jsonify)
[info] 368-376: use jsonify instead of json.dumps for JSON output
Context: json.dumps(
{
"kind": "store_memory",
"chunk_id": "manual-schema-contended",
"content": "burn drain should preserve contended schema bootstrap",
"memory_type": "note",
"project": "test",
}
)
Note: Security best practice.
(use-jsonify)
🔇 Additional comments (6)
src/brainlayer/vector_store.py (3)
124-140: LGTM!Also applies to: 656-670, 676-678
117-121: Keep commit-likeRELEASEstatements deadline-bound.Line 121 treats every
RELEASEas cleanup, butRELEASE SAVEPOINTcan be the commit path for a savepoint. Skipping_refresh_write_busy_timeout_for_deadline()there can leave a stale busy timeout and overrun the store budget. Only exempt rollback cleanup, or track rollback-cleanup state before exemptingRELEASE.As per coding guidelines,
**/*.py: "Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior."Suggested narrowing
def _is_write_busy_cleanup_statement(statement) -> bool: if not isinstance(statement, str): return False sql = statement.lstrip().upper() - return sql.startswith("ROLLBACK") or sql.startswith("RELEASE") + return sql.startswith("ROLLBACK")
632-653: Close failed init connections before backing off.If
_init_db()opensself.connand then raises a retryable APSW error, this path sleeps/retries without closing the failed connection or removing its deadline trace. That can keep locks/file descriptors alive across the backoff and make the next init attempt contend with its own abandoned connection.As per coding guidelines,
**/*.py: "Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior."Suggested cleanup before retry
except apsw.Error as e: if not _is_retryable_init_error(e): raise last_err = e + failed_conn = getattr(self, "conn", None) + if failed_conn is not None: + _restore_write_busy_deadline_trace( + failed_conn, + getattr(self, "_init_write_busy_deadline_trace", _NO_EXEC_TRACE), + ) + self._init_write_busy_deadline_trace = _NO_EXEC_TRACE + try: + failed_conn.close() + except apsw.Error: + pass + self.conn = None delay = min(self._INIT_BASE_DELAY * (2**attempt), self._INIT_MAX_DELAY)src/brainlayer/drain.py (1)
177-207: LGTM!Also applies to: 348-353, 807-810, 1031-1090, 1155-1161
tests/test_store_handler.py (1)
507-558: LGTM!tests/test_write_queue.py (1)
20-20: LGTM!Also applies to: 287-321
| if not _ensure_drain_db_schema_preserving_queue( | ||
| db_path, | ||
| log_path, | ||
| context=f"drain failed for {path.name}", | ||
| force=True, | ||
| ): | ||
| break | ||
| continue |
There was a problem hiding this comment.
Stop draining after preserving a schema-blocked store file.
When forced schema bootstrap is busy, Line 1232 only exits the retry loop. The outer file loop then advances, so later queue files can be applied/unlinked while this store file remains queued, reordering queued writes. Return or break the outer loop here, matching the pre-transaction path at Line 1161.
As per coding guidelines, **/*.py: “Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior.”
Suggested stop-on-preserve fix
if not _ensure_drain_db_schema_preserving_queue(
db_path,
log_path,
context=f"drain failed for {path.name}",
force=True,
):
- break
+ return drained
continue📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if not _ensure_drain_db_schema_preserving_queue( | |
| db_path, | |
| log_path, | |
| context=f"drain failed for {path.name}", | |
| force=True, | |
| ): | |
| break | |
| continue | |
| if not _ensure_drain_db_schema_preserving_queue( | |
| db_path, | |
| log_path, | |
| context=f"drain failed for {path.name}", | |
| force=True, | |
| ): | |
| return drained | |
| continue |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/brainlayer/drain.py` around lines 1226 - 1233, The current code at the
`_ensure_drain_db_schema_preserving_queue` call only breaks from the inner retry
loop when schema bootstrap is busy, allowing the outer file loop to continue
processing other queue files. This reorders queued writes since the current
store file remains queued while later files are processed. Instead of breaking
from the inner loop, break from the outer loop (or return if applicable) to
match the behavior at line 1161 in the pre-transaction path, ensuring the store
file is fully handled before advancing to process other files.
Source: Coding guidelines
There was a problem hiding this comment.
Fixed in 2ca90cc. drain_once now sets a stop flag when forced schema bootstrap is busy after a missing-chunks failure, breaks the outer file loop, and leaves later queue files untouched until the first preserved store can replay. Added test_drain_stops_after_forced_schema_bootstrap_writer_in_use, which failed before the fix by advancing to the second file and now verifies both queued JSONL payloads are unchanged.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c048857219
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ): | ||
| break |
There was a problem hiding this comment.
Stop draining after contended schema bootstrap
When the DB file already exists but lacks chunks, a store file reaches this forced schema-bootstrap path; if bootstrap is busy, this break exits only the retry loop, so drain_once continues to later queue files while preserving the current one. A subsequent file can then initialize the schema and be applied/deleted before the earlier preserved store, violating queue order and potentially applying supersedes/dedupe against the wrong prior state. This should stop the outer drain pass just like the pre-transaction schema-busy path above.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 2ca90cc. drain_once now stops the outer file loop when forced schema bootstrap is busy after a missing-chunks failure, so later queue files are not applied or unlinked while the first store file remains queued. Added test_drain_stops_after_forced_schema_bootstrap_writer_in_use, which failed before the fix by calling schema bootstrap for the second file and now verifies both queued payloads are unchanged.
|
BL7 latest update after review fixes: Red/green for 2ca90cc:
Local verification:
@codex review |
|
Ruff clean on both touched files and the full pre-push gate at 2997 passed with all ancillary checks (MCP registration, eval/hook routing, bun, FTS determinism, worker exclusion regression) is solid coverage. Kicking off the incremental review now. ✅ Action performedReview finished.
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6a89a922ee
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| context=f"drain failed for {path.name}", | ||
| force=True, | ||
| ): | ||
| break |
There was a problem hiding this comment.
Stop draining after forced schema bootstrap is busy
When the DB file is non-empty but still lacks chunks, the first store file reaches this forced schema bootstrap path; if another writer owns schema init, this break exits only the retry loop, so the outer for path in files continues and attempts later queue files in the same drain run. That violates FIFO/write-safety under schema contention and can repeatedly open the DB/log failures for subsequent payloads instead of preserving the queue until the writer releases; return or break the outer drain loop here.
Useful? React with 👍 / 👎.
| applied = _apply_event(conn, event) | ||
| result.applied_events += 1 |
There was a problem hiding this comment.
Defer burn-drain counters until commit succeeds
This increments the public applied_events count inside the transaction before the new missing-chunks retry path can roll it back. If a batch contains an event that returns before a later store raises no such table: chunks, the forced schema retry applies the batch again and the result/log over-count rolled-back work (for example two queued lines can report applied_events=3 after one retry). Accumulate counts locally per attempt and add them to result only after COMMIT succeeds.
Useful? React with 👍 / 👎.

Summary
Test plan
Note: local CodeRabbit pre-commit review hit the free OSS rate limit, so PR-level reviewer bots are requested on the PR.
Note
Medium Risk
Changes core write/defer paths and singleton VectorStore initialization under lock contention; mis-tuned budgets could increase DEFERRED stores, but behavior is covered by extensive regression tests.
Overview
MCP
brain_storenow caps how long it waits on SQLite contention usingBRAINLAYER_STORE_BUSY_BUDGET_MS(default 3s). Before that budget is exhausted, writes refreshbusy_timeoutper statement (via exec traces), coldVectorStoreinit can time out on the singleton lock, andstore_memoryis called withretry_on_busy=Falseso the handler owns backoff. When the budget runs out, the existing DEFERRED queue /pending-stores.jsonlpath still applies—but validation runs first, so invalid requests are not queued on busy errors. Supersede updates share the same deadline-scoped timeout wrapper.store_memoryadds optionalbusy_deadline/retry_on_busyso outer callers can enforce a single contention window.Drain bootstraps schema via
VectorStorewhen replayingstore_memoryon a missing/empty DB, retries once on “no such table: chunks”, treatsWriterInUseErroras busy (preserving queue files), and applies supersedes even when replay hits an already-reserved chunk id.VectorStorewrite paths honor thread-local deadline-aware busy timeouts during init and restore default timeouts after successful open.Reviewed by Cursor Bugbot for commit c048857. Bugbot is set up for automated code reviews on this repo. Configure here.
Note
Bound brain_store busy deferral with deadline-aware timeouts across store and drain paths
BRAINLAYER_STORE_BUSY_BUDGET_MS) that caps total wait time across VectorStore init, write transactions, and supersede updates in store_handler.py._temporary_store_busy_timeoutcontext manager and per-statement exec trace hooks to refreshbusy_timeoutbefore each SQL statement and restore original settings on exit.store_memoryin store.py withbusy_deadlineandretry_on_busyparameters so internal retry loops honor the caller's deadline instead of sleeping independently.WriterInUseErrorand retries once on missingchunkstable.BusyErroronce the budget is exhausted.Macroscope summarized c048857.
Summary by CodeRabbit
New Features
BRAINLAYER_STORE_BUSY_BUDGET_MS) to bound how long store operations wait under contention.Improvements
Tests