feat(brain): ADR-130 service split — SSE proxy, worker, internal queue#319
Merged
…gination fallback (ADR-130)

Three fixes for recurring pi.ruv.io outages:

1. SSE connection limiter (max 50): prevents MCP reconnect storms from exhausting Cloud Run concurrency slots. Tracks the active count with an AtomicUsize and rejects excess connections with 429.
2. Pipeline optimize rate limiter: max 1 concurrent request with a 30s cooldown. Prevents the scheduler thundering herd from CPU-saturating the instance.
3. Firestore pagination offset fallback: when page tokens go stale after an OOM restart (400 Bad Request), switches to offset-based pagination to load all documents instead of stopping at the first batch.

Also adds /v1/ready, a lightweight probe (zero-cost, no state access) for Cloud Run health checks. ADR-130 documents the full decoupling architecture (SSE service split).

Co-Authored-By: claude-flow <ruv@ruv.net>
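The SSE connection limiter from the first fix can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names `SseLimiter`, `SsePermit`, and `admit_status` are hypothetical, and the only details taken from the commit message are the `AtomicUsize` counter, the fixed cap, and the 429 rejection.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical limiter: caps concurrent SSE connections with an atomic counter.
#[derive(Clone)]
pub struct SseLimiter {
    active: Arc<AtomicUsize>,
    max: usize,
}

/// Hypothetical RAII permit: dropping it releases the connection slot.
pub struct SsePermit {
    active: Arc<AtomicUsize>,
}

impl SseLimiter {
    pub fn new(max: usize) -> Self {
        Self { active: Arc::new(AtomicUsize::new(0)), max }
    }

    /// Try to claim a slot. Returns `None` once `max` connections are active.
    /// The compare-exchange loop ensures the counter never exceeds `max`,
    /// even when many reconnects race each other.
    pub fn try_acquire(&self) -> Option<SsePermit> {
        let mut current = self.active.load(Ordering::Acquire);
        loop {
            if current >= self.max {
                return None;
            }
            match self.active.compare_exchange_weak(
                current,
                current + 1,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Some(SsePermit { active: Arc::clone(&self.active) }),
                Err(observed) => current = observed,
            }
        }
    }

    /// Number of connections currently holding a permit.
    pub fn active(&self) -> usize {
        self.active.load(Ordering::Acquire)
    }
}

impl Drop for SsePermit {
    fn drop(&mut self) {
        // Release the slot when the SSE stream ends or the client disconnects.
        self.active.fetch_sub(1, Ordering::AcqRel);
    }
}

/// Status code for a new SSE request: 200 if admitted, 429 if over the cap.
pub fn admit_status(permit: &Option<SsePermit>) -> u16 {
    if permit.is_some() { 200 } else { 429 }
}
```

In a real handler the permit would presumably be held for the lifetime of the stream, so that a dropped connection frees its slot without any explicit cleanup path.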
…al queue

Implements full MCP SSE decoupling to eliminate recurring outages:

1. ruvbrain-sse: Thin SSE proxy (308 lines) that manages MCP connections independently from the API. Max 200 concurrent SSE, forwards JSON-RPC to the API, polls /internal/queue/drain for responses. No business logic.
2. ruvbrain-worker: Batch worker binary (202 lines) for Cloud Run Jobs. Runs scheduler actions (train, drift, transfer, graph, cleanup, attractor) with direct Firestore access. Runs once and exits.
3. Internal queue endpoints on the API:
   - POST /internal/queue/push (forward JSON-RPC to session)
   - GET /internal/queue/drain (poll for responses)
   - POST /internal/session/create (register session)
   - DELETE /internal/session/:id (cleanup)
4. Deploy infrastructure:
   - Dockerfile.sse, Dockerfile.worker
   - cloudbuild-sse.yaml, cloudbuild-worker.yaml
   - scripts/deploy_brain_services.sh [api|sse|worker|all]

Architecture:
SSE (500 concurrency, 512MB) → API (80 concurrency, 4GB) ← Worker (Cloud Run Job, 4GB)

Co-Authored-By: claude-flow <ruv@ruv.net>
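To make the session and queue endpoints above more concrete, here is a minimal in-memory sketch of the state they might operate on. It is an assumption-laden illustration: the type `SessionQueues`, its method names, and the choice of one FIFO of response strings per session are all hypothetical, and the actual handlers in this PR may store and route messages differently.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::Mutex;

/// Hypothetical store behind the /internal/* endpoints:
/// one FIFO of pending JSON-RPC responses per session id.
#[derive(Default)]
pub struct SessionQueues {
    inner: Mutex<HashMap<String, VecDeque<String>>>,
}

impl SessionQueues {
    /// Roughly what POST /internal/session/create could do: register a session.
    pub fn create(&self, id: &str) {
        let mut map = self.inner.lock().unwrap();
        map.entry(id.to_string()).or_default();
    }

    /// What the API might do after handling a request received via
    /// POST /internal/queue/push: queue the response for that session.
    /// Returns false if the session is unknown (assumed behaviour).
    pub fn enqueue_response(&self, id: &str, msg: String) -> bool {
        let mut map = self.inner.lock().unwrap();
        match map.get_mut(id) {
            Some(queue) => {
                queue.push_back(msg);
                true
            }
            None => false,
        }
    }

    /// Roughly what GET /internal/queue/drain could do: hand every pending
    /// response to the polling SSE proxy, in FIFO order, and empty the queue.
    pub fn drain(&self, id: &str) -> Vec<String> {
        let mut map = self.inner.lock().unwrap();
        match map.get_mut(id) {
            Some(queue) => queue.drain(..).collect(),
            None => Vec::new(),
        }
    }

    /// Roughly what DELETE /internal/session/:id could do: drop the session.
    /// Returns true if the session existed.
    pub fn remove(&self, id: &str) -> bool {
        self.inner.lock().unwrap().remove(id).is_some()
    }
}
```

Under this design the proxy never needs to understand the messages it carries; it only moves opaque strings between the SSE stream and the API, which matches the "no business logic" description.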
Summary
Full implementation of ADR-130: decouples MCP SSE transport from the brain API to eliminate recurring outages (4 incidents in 2 days).
- `ruvbrain-sse` (308 lines): Thin SSE proxy managing MCP connections independently. 500 concurrency, 512MB, so SSE storms cannot starve the API.
- `ruvbrain-worker` (202 lines): Batch worker for Cloud Run Jobs. Runs scheduler actions (train, drift, transfer, graph rebuild) and exits. No HTTP server.
- `/internal/queue/push`, `/internal/queue/drain`, `/internal/session/create`, `/internal/session/:id` for SSE↔API communication.
- `scripts/deploy_brain_services.sh [api|sse|worker|all]`.

Architecture
SSE (500 concurrency, 512MB) → API (80 concurrency, 4GB) ← Worker (Cloud Run Job, 4GB)
Test plan
- `cargo check -p mcp-brain-server`: all 3 binaries compile
- `cargo test -p mcp-brain-server`: 143 passed (3 pre-existing failures)
- `./scripts/deploy_brain_services.sh sse`
- `./scripts/deploy_brain_services.sh worker`

🤖 Generated with claude-flow