feat(ai-service): export Prometheus metrics for circuit breaker (closes #134) #194
Merged
kilodesodiq-arch merged 1 commit on Jun 23, 2026
ChainForgee#134)

Adds three Prometheus instruments updated on every circuit breaker transition, so unhealthy providers surface in dashboards rather than from user-visible degradation:

- Gauge circuit_breaker_state (0=CLOSED, 1=HALF_OPEN, 2=OPEN)
- Counter circuit_breaker_failure_count_total (cumulative failures)
- Histogram circuit_breaker_recovery_time_seconds (OPEN -> HALF_OPEN lag)

Metrics are emitted inside the same lock that guards state, so the exported values can never diverge from the underlying state. Failure count tracks cumulative failures since the last reset, matching the Prometheus counter contract documented in the issue. The initial state is published in __init__ so every instantiated breaker appears in the gauge even before any traffic flows.
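The instruments and the lock-guarded emission described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the SketchBreaker class, its method names, the threshold and timeout parameters, the injectable clock, the private registry, and the use of a breaker_name label on all three instruments are assumptions made for the example.

```python
import threading
import time

from prometheus_client import CollectorRegistry, Counter, Gauge, Histogram

# Numeric encoding taken from the commit message.
CIRCUIT_STATE_CLOSED, CIRCUIT_STATE_HALF_OPEN, CIRCUIT_STATE_OPEN = 0, 1, 2

# Private registry (assumption) so the sketch cannot collide with real metrics.
registry = CollectorRegistry()
STATE = Gauge("circuit_breaker_state", "0=CLOSED, 1=HALF_OPEN, 2=OPEN",
              ["breaker_name"], registry=registry)
FAILURES = Counter("circuit_breaker_failure_count_total", "Cumulative failures",
                   ["breaker_name"], registry=registry)
RECOVERY = Histogram("circuit_breaker_recovery_time_seconds",
                     "Lag between OPEN and HALF_OPEN",
                     ["breaker_name"], registry=registry)


class SketchBreaker:
    """Hypothetical, simplified breaker; not the CircuitBreaker from the PR."""

    def __init__(self, name, failure_threshold=3, recovery_timeout=30.0,
                 clock=time.monotonic):
        self.name = name
        self._threshold = failure_threshold
        self._timeout = recovery_timeout
        self._clock = clock
        self._lock = threading.Lock()
        self._failures = 0
        self._opened_at = None
        self._state = CIRCUIT_STATE_CLOSED
        # Publish the initial state so the breaker is visible before any traffic.
        STATE.labels(breaker_name=name).set(self._state)

    def _set_state(self, new_state):
        # Caller must hold self._lock, so state and gauge always change together.
        self._state = new_state
        STATE.labels(breaker_name=self.name).set(new_state)

    def record_failure(self):
        with self._lock:
            FAILURES.labels(breaker_name=self.name).inc()
            self._failures += 1
            if (self._state != CIRCUIT_STATE_OPEN
                    and self._failures >= self._threshold):
                self._opened_at = self._clock()
                self._set_state(CIRCUIT_STATE_OPEN)

    def try_half_open(self):
        """Move OPEN -> HALF_OPEN once the recovery timeout has elapsed."""
        with self._lock:
            if self._state != CIRCUIT_STATE_OPEN:
                return False
            elapsed = self._clock() - self._opened_at
            if elapsed < self._timeout:
                return False
            # Record how long the breaker stayed OPEN before probing again.
            RECOVERY.labels(breaker_name=self.name).observe(elapsed)
            self._set_state(CIRCUIT_STATE_HALF_OPEN)
            return True
```

Because every gauge update happens while the lock is held, a scrape can lag a transition but should never report a state the breaker was not actually in.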
cc @gbengaeben @CodeMayor, recent contributors to …
What
Closes #134 — adds Prometheus metrics for the AI service circuit breaker so unhealthy provider trips are observable in dashboards rather than from user-visible degradation.
Changes
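- Gauge circuit_breaker_state labelled by breaker_name (0=CLOSED, 1=HALF_OPEN, 2=OPEN).
- Counter circuit_breaker_failure_count_total (cumulative failures).
- Histogram circuit_breaker_recovery_time_seconds (OPEN -> HALF_OPEN lag).
- Metrics are emitted inside the same _lock that already guards breaker state, so the exported values can never diverge from the underlying state.
- The initial state is published in __init__ so every instantiated breaker appears in the gauge even before any traffic flows.
- Adds CIRCUIT_STATE_CLOSED|HALF_OPEN|OPEN constants and a set_circuit_state(...) helper to keep call sites readable.
Tests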
- New TestCircuitBreakerMetrics class with five focused tests.
- The tests in tests/test_circuit_breaker.py pass locally in ~1.5s.
Compatibility
- Existing CircuitBreaker callers are untouched.
- Uses the existing metrics.py registry; no new dependencies, no new registry.
Notes
- circuit_breaker_failure_count_total is cumulative since the last reset, matching Prometheus counter semantics. Operators should compare rate over a window rather than the absolute value.
- The recovery-time histogram uses the prometheus_client default buckets. If finer-grained slices are needed later, pass buckets=[1, 5, 10, 30, 60, 120].
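To illustrate why the first note recommends a rate over a window, here is a small sketch of how a per-second failure rate can be derived from raw counter samples while tolerating a reset. In practice this would normally be done in PromQL with something like rate(circuit_breaker_failure_count_total[5m]); the counter_rate helper below is hypothetical and deliberately simpler than what Prometheus does (it performs no extrapolation at the window edges).

```python
def counter_rate(samples):
    """Per-second increase over (timestamp_seconds, value) samples.

    Hypothetical helper: a simplified stand-in for a windowed rate,
    shown only to explain why absolute counter values are misleading.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so the whole
        # current value counts as new failures since the reset.
        increase += cur - prev if cur >= prev else cur
    window = samples[-1][0] - samples[0][0]
    return increase / window if window > 0 else 0.0


# Three scrapes 30 seconds apart; the process restarted before the last one.
scrapes = [(0, 100), (30, 106), (60, 2)]
failure_rate = counter_rate(scrapes)
```

The absolute value falls from 106 to 2 across the restart, yet the windowed rate still reflects the 8 failures that actually happened over the 60 seconds.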