fix: throttle health-check self-heal #521
@codex review

@coderabbitai review
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit e76d16a.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e76d16af84
config=config,
command_runner=command_runner,
)
if result.missing_vectors is not None or heal_failures:
Preserve missing-vector state on count failures
When SQLite counting fails (for example during a transient DB lock), `result.missing_vectors` is `None` but `_apply_heals` still returns a nonempty map for the `missing_embeddings_count_failed` issue. This new `or heal_failures` branch then rewrites the state file without `missing_vectors`/`stalled_ticks`, so the next successful tick has no previous baseline and cannot detect climbing or stalled embeddings. Please skip this write or merge the existing progress fields when the count is unavailable.
Fixed in commit 83e094b: the state write now merges from the loaded state first, then refreshes `heal_failures`/`ts` and only overwrites `missing_vectors`/`stalled_ticks` when a fresh count is available. Added regression coverage for count failure plus heal-counter persistence.
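The merge order described in the reply can be sketched as below. This is a minimal illustration, not the code in `health_check.py`: `merge_health_state` and its `progress` tuple are hypothetical, and only the state keys (`heal_failures`, `ts`, `missing_vectors`, `stalled_ticks`) come from the thread.

```python
from typing import Any, Dict, Optional, Tuple


def merge_health_state(
    previous: Dict[str, Any],
    heal_failures: Dict[str, int],
    progress: Optional[Tuple[int, int]],
    now: float,
) -> Dict[str, Any]:
    """Build the next health-check state payload (hypothetical helper).

    `progress` is (missing_vectors, stalled_ticks) when the SQLite count
    succeeded, or None when it failed (e.g. a transient DB lock).
    """
    # Start from the loaded state so fields this tick cannot refresh survive.
    merged = dict(previous)
    # Heal counters and the timestamp are refreshed on every write.
    merged["heal_failures"] = dict(heal_failures)
    merged["ts"] = now
    # Only replace the embedding-progress baseline with a fresh count.
    if progress is not None:
        missing_vectors, stalled_ticks = progress
        merged["missing_vectors"] = missing_vectors
        merged["stalled_ticks"] = stalled_ticks
    return merged


# Count failed this tick: the heal counter advances, the baseline is kept.
previous_state = {"missing_vectors": 120, "stalled_ticks": 1, "heal_failures": {}, "ts": 100.0}
after_failure = merge_health_state(
    previous_state, {"missing_embeddings_count_failed": 1}, None, now=400.0
)
# Count succeeded on the next tick: the baseline is refreshed.
after_success = merge_health_state(after_failure, {}, (118, 0), now=700.0)
```

Because the failed-count tick keeps `missing_vectors` and `stalled_ticks`, the following successful tick still has a baseline to compare against when checking for climbing or stalled embeddings.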

Summary
Gate `--heal` kickstarts behind consecutive failures for the same issue, persisted in the existing health-check state file.

Test plan
- `pytest tests/test_stability_health_check.py::test_backlog_batch_zero_alarms_but_waits_until_repeated_failure_to_kickstart_hotlane -q` (failed before implementation, then passed)
- `pytest tests/test_stability_health_check.py tests/test_launchd_hygiene.py -q`
- `ruff check src/brainlayer/health_check.py tests/test_stability_health_check.py`
- `ruff format --check src/brainlayer/health_check.py tests/test_stability_health_check.py`
- `coderabbit review --agent` (local, findings: 0)
- `BRAINLAYER_PREPUSH=1 bash scripts/run_tests.sh` (passed after rerunning one transient CLI p95 performance failure; final run: 3011 passed, 9 skipped, 61 deselected, 1 xfailed; MCP registration, isolated eval/hook routing, bun, and regression shell passed)

Note
Medium Risk
Changes when automated launchd restarts fire for live BrainBar/hotlane services; mis-tuned thresholds could delay recovery, though defaults only add one check cycle (~5 minutes).
Overview
Health-check `--heal` no longer kickstarts launchd on the first sighting of an issue. Each issue code gets a consecutive-failure counter in the existing health-check state file (`heal_failures`); kickstart runs only when that count reaches a threshold (default 2, overridable via `BRAINLAYER_HEAL_MIN_CONSECUTIVE_FAILURES` or `HealthCheckConfig.heal_min_consecutive_failures`).

Alarms, exit behavior, and which labels map to which issues are unchanged. When a heal actually runs, stderr logs `heal action` with the label, issue code, and consecutive-failure count.

Tests were updated for two-run throttling (hotlane backlog zero, brainbar canary failure), heal logging, env override, and immediate heal when the threshold is set to 1.
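The per-issue counter and threshold gate described in the overview can be sketched as follows. This is a hypothetical illustration: `update_heal_failures`, `heals_to_run`, the issue code `hotlane_backlog_zero`, and the label `com.example.hotlane` are invented names, and the exact stderr message format is assumed (only its fields come from the PR). It also assumes a heal keeps firing on every cycle while the count stays at or above the threshold, which the PR does not state.

```python
import sys
from typing import Dict, Iterable, List, Tuple


def update_heal_failures(previous: Dict[str, int], active_issues: Iterable[str]) -> Dict[str, int]:
    """Bump the counter for each issue seen this cycle; drop cleared issues."""
    # Issues absent from this cycle are not carried over, so their count resets.
    return {code: previous.get(code, 0) + 1 for code in set(active_issues)}


def heals_to_run(
    failures: Dict[str, int],
    issue_labels: Dict[str, str],
    min_consecutive: int = 2,
) -> List[Tuple[str, str, int]]:
    """Return (label, issue_code, count) for issues that reached the threshold."""
    actions: List[Tuple[str, str, int]] = []
    for code, count in sorted(failures.items()):
        label = issue_labels.get(code)
        if label is None or count < min_consecutive:
            continue  # alarm only; no launchd kickstart yet
        # Log fields follow the PR description; the message format is assumed.
        print(f"heal action label={label} issue={code} count={count}", file=sys.stderr)
        actions.append((label, code, count))
    return actions


# Hypothetical issue code and launchd label, for illustration only.
labels = {"hotlane_backlog_zero": "com.example.hotlane"}
counts = update_heal_failures({}, ["hotlane_backlog_zero"])
first_cycle = heals_to_run(counts, labels)   # first sighting: no kickstart
counts = update_heal_failures(counts, ["hotlane_backlog_zero"])
second_cycle = heals_to_run(counts, labels)  # second consecutive failure: heal
```

With the default threshold of 2 and a check roughly every 5 minutes, the first kickstart lands one cycle later than before; passing `min_consecutive=1` restores the immediate heal that the updated tests also cover.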
Reviewed by Cursor Bugbot for commit e76d16a.
Note
Throttle health-check self-heal by requiring consecutive failures before kickstart
- Self-heal kickstarts only run once the consecutive-failure threshold (`heal_min_consecutive_failures`, default 2) is reached per issue code, preventing spurious restarts on transient issues.
- Per-issue failure counts are persisted in the state file under `heal_failures` and incremented each check cycle; counts reset implicitly when the issue clears.
- The `BRAINLAYER_HEAL_MIN_CONSECUTIVE_FAILURES` env var overrides the default threshold via `_env_int` in `health_check.py`.

Macroscope summarized 83e094b.
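A sketch of how an env-backed threshold default might look, under stated assumptions: `env_int` is a hypothetical stand-in for the PR's `_env_int` (whose real signature and validation rules are not shown), `HealthCheckConfigSketch` is not the real `HealthCheckConfig`, and the precedence (explicit argument, then env var, then 2) is assumed rather than confirmed by the PR.

```python
import os
from dataclasses import dataclass, field

ENV_VAR = "BRAINLAYER_HEAL_MIN_CONSECUTIVE_FAILURES"


def env_int(name: str, default: int, minimum: int = 1) -> int:
    """Parse an integer env var, falling back to `default` on missing/bad values."""
    raw = os.environ.get(name, "").strip()
    if not raw:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    # Hypothetical validation rule: values below the floor use the default.
    return value if value >= minimum else default


@dataclass
class HealthCheckConfigSketch:
    # Assumed precedence: explicit argument > env var > built-in default of 2.
    # default_factory runs at construction time, so env changes are picked up.
    heal_min_consecutive_failures: int = field(
        default_factory=lambda: env_int(ENV_VAR, 2)
    )
```

Using `default_factory` rather than a plain default means the environment is read each time a config is built, not once at import, which keeps tests that set the env var per case straightforward.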