
feat(brain): ADR-130 service split — SSE proxy, worker, internal queue#319

Merged
ruvnet merged 2 commits into main from fix/adr-130-brain-server-stability
Mar 30, 2026

Conversation


@ruvnet ruvnet commented Mar 30, 2026

Summary

Full implementation of ADR-130: decouples MCP SSE transport from the brain API to eliminate recurring outages (4 incidents in 2 days).

  • ruvbrain-sse (308 lines): Thin SSE proxy managing MCP connections independently. 500 concurrency, 512MB — SSE storms can never starve the API.
  • ruvbrain-worker (202 lines): Batch worker for Cloud Run Jobs. Runs scheduler actions (train, drift, transfer, graph rebuild) and exits. No HTTP server.
  • Internal queue endpoints: /internal/queue/push, /internal/queue/drain, /internal/session/create, /internal/session/:id for SSE↔API communication.
  • Deploy infrastructure: Dockerfiles, Cloud Build configs, scripts/deploy_brain_services.sh [api|sse|worker|all].
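The push/drain semantics behind those internal endpoints can be modeled as a per-session response queue. This is a hypothetical, stdlib-only sketch of that logic — the names and types are illustrative, not the actual handlers, which wrap something like this behind HTTP routes:

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical in-memory model of the /internal/* queue semantics.
struct SessionQueues {
    queues: HashMap<String, VecDeque<String>>,
}

impl SessionQueues {
    fn new() -> Self {
        Self { queues: HashMap::new() }
    }

    /// POST /internal/session/create — register a session for the SSE proxy.
    fn create(&mut self, id: &str) {
        self.queues.entry(id.to_string()).or_default();
    }

    /// POST /internal/queue/push — enqueue a JSON-RPC message for a session.
    /// Returns false for unknown sessions (the API would answer 404).
    fn push(&mut self, id: &str, msg: String) -> bool {
        match self.queues.get_mut(id) {
            Some(q) => {
                q.push_back(msg);
                true
            }
            None => false,
        }
    }

    /// GET /internal/queue/drain — the SSE proxy polls pending responses.
    fn drain(&mut self, id: &str) -> Vec<String> {
        self.queues
            .get_mut(id)
            .map(|q| q.drain(..).collect())
            .unwrap_or_default()
    }

    /// DELETE /internal/session/:id — clean up on client disconnect.
    fn delete(&mut self, id: &str) {
        self.queues.remove(id);
    }
}

fn main() {
    let mut qs = SessionQueues::new();
    qs.create("s1");
    assert!(qs.push("s1", r#"{"jsonrpc":"2.0","id":1}"#.to_string()));
    assert_eq!(qs.drain("s1").len(), 1);
    qs.delete("s1");
    assert!(!qs.push("s1", "late".to_string())); // deleted session rejects pushes
    println!("queue model ok");
}
```

The key property is that the SSE proxy never holds API state: it only pushes requests in and polls responses out, so either side can restart independently.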

Architecture

MCP Clients ──SSE──▶ ruvbrain-sse (500 concurrency, 512MB)
                         │ forwards JSON-RPC
                         ▼
REST/Health ──────▶ ruvbrain-api (80 concurrency, 4GB)
                    /internal/* queue endpoints

Schedulers ───────▶ ruvbrain-worker (Cloud Run Job, 4GB)
                    direct Firestore, runs and exits

Test plan

  • cargo check -p mcp-brain-server — all 3 binaries compile
  • cargo test -p mcp-brain-server — 143 passed (3 pre-existing failures)
  • CI builds pass
  • Deploy SSE service: ./scripts/deploy_brain_services.sh sse
  • Deploy worker: ./scripts/deploy_brain_services.sh worker
  • Verify SSE clients connect through proxy
  • Verify scheduler jobs run via worker

🤖 Generated with claude-flow

ruvnet added 2 commits March 30, 2026 11:59
…gination fallback (ADR-130)

Three fixes for recurring pi.ruv.io outages:

1. SSE connection limiter (max 50) — prevents MCP reconnect storms from
   exhausting Cloud Run concurrency slots. Tracks active count with
   AtomicUsize, rejects excess with 429.

2. Pipeline optimize rate limiter — max 1 concurrent request with 30s
   cooldown. Prevents scheduler thundering herd from CPU-saturating
   the instance.

3. Firestore pagination offset fallback — when page tokens go stale
   after OOM restart (400 Bad Request), switches to offset-based
   pagination to load all documents instead of stopping at first batch.
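The fallback in item 3 can be sketched as follows. Since the Firestore client isn't shown here, the two fetch paths are abstracted as closures; the names and error shape are assumptions, not the real API:

```rust
/// Hypothetical page-fetch error: a stale page token (400) or anything else.
#[derive(Debug)]
enum PageError {
    StaleToken,
    Other(String),
}

/// Load all documents: try token-based pagination first, and if the token
/// goes stale (e.g. after an OOM restart), fall back to offset pagination
/// starting from the documents already loaded.
fn load_all<F, G>(mut by_token: F, mut by_offset: G) -> Result<Vec<u64>, PageError>
where
    F: FnMut(Option<String>) -> Result<(Vec<u64>, Option<String>), PageError>,
    G: FnMut(usize) -> Result<Vec<u64>, PageError>,
{
    let mut docs = Vec::new();
    let mut token = None;
    loop {
        match by_token(token.take()) {
            Ok((batch, next)) => {
                docs.extend(batch);
                match next {
                    Some(t) => token = Some(t),
                    None => return Ok(docs),
                }
            }
            // Stale page token: switch to offset-based pagination so we
            // still load everything instead of stopping at the first batch.
            Err(PageError::StaleToken) => loop {
                let batch = by_offset(docs.len())?;
                if batch.is_empty() {
                    return Ok(docs);
                }
                docs.extend(batch);
            },
            Err(e) => return Err(e),
        }
    }
}

fn main() {
    let mut calls = 0;
    let all = load_all(
        |_token| {
            calls += 1;
            if calls == 1 {
                Ok((vec![0, 1, 2], Some("page2".to_string())))
            } else {
                // Simulate the token going stale after a restart
                Err(PageError::StaleToken)
            }
        },
        |offset| Ok((offset as u64..6).take(3).collect()),
    )
    .unwrap();
    assert_eq!(all, vec![0, 1, 2, 3, 4, 5]);
    println!("loaded {} docs", all.len());
}
```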

Also adds /v1/ready lightweight probe (zero-cost, no state access)
for Cloud Run health checks.
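The counter-based cap in item 1 might look like this — a sketch with illustrative names, not the actual code:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Sketch of the SSE connection limiter: an atomic counter with a hard cap.
/// Callers that fail to acquire a slot would be answered with HTTP 429.
struct ConnLimiter {
    active: AtomicUsize,
    max: usize,
}

impl ConnLimiter {
    fn new(max: usize) -> Self {
        Self { active: AtomicUsize::new(0), max }
    }

    /// Claim a slot; returns false once the cap is reached.
    fn try_acquire(&self) -> bool {
        let mut cur = self.active.load(Ordering::Acquire);
        loop {
            if cur >= self.max {
                return false;
            }
            match self.active.compare_exchange(
                cur, cur + 1, Ordering::AcqRel, Ordering::Acquire,
            ) {
                Ok(_) => return true,
                Err(now) => cur = now, // raced with another connection; retry
            }
        }
    }

    /// Release a slot when the SSE stream closes.
    fn release(&self) {
        self.active.fetch_sub(1, Ordering::AcqRel);
    }
}

fn main() {
    let limiter = ConnLimiter::new(50);
    assert!(limiter.try_acquire());
    limiter.release();
    println!("limiter ok");
}
```

A compare-exchange loop (rather than a plain `fetch_add` followed by a check) keeps the count exact under a reconnect storm, which is precisely when the cap matters.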

ADR-130 documents the full decoupling architecture (SSE service split).
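The optimize gate in item 2 (one request in flight, 30s cooldown) can be sketched along these lines, again with hypothetical names:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Sketch of the pipeline-optimize rate limiter: at most one request in
/// flight, plus a cooldown after each run. Rejected callers would see 429.
struct OptimizeGate {
    in_flight: AtomicBool,
    last_finished: Mutex<Option<Instant>>,
    cooldown: Duration,
}

impl OptimizeGate {
    fn new(cooldown: Duration) -> Self {
        Self {
            in_flight: AtomicBool::new(false),
            last_finished: Mutex::new(None),
            cooldown,
        }
    }

    /// Admit a request only if nothing is running and the cooldown elapsed.
    fn try_enter(&self) -> bool {
        if let Some(t) = *self.last_finished.lock().unwrap() {
            if t.elapsed() < self.cooldown {
                return false; // still cooling down
            }
        }
        self.in_flight
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    /// Mark the run finished and start the cooldown window.
    fn exit(&self) {
        *self.last_finished.lock().unwrap() = Some(Instant::now());
        self.in_flight.store(false, Ordering::Release);
    }
}

fn main() {
    let gate = OptimizeGate::new(Duration::from_secs(30));
    assert!(gate.try_enter());  // first request admitted
    assert!(!gate.try_enter()); // concurrent request rejected
    gate.exit();
    assert!(!gate.try_enter()); // within the 30s cooldown
    println!("gate ok");
}
```

The cooldown is what breaks the thundering herd: even if several schedulers fire together, at most one optimize run starts per window.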

Co-Authored-By: claude-flow <ruv@ruv.net>
…al queue

Implements full MCP SSE decoupling to eliminate recurring outages:

1. ruvbrain-sse: Thin SSE proxy (308 lines) that manages MCP connections
   independently from the API. Max 200 concurrent SSE, forwards JSON-RPC
   to the API, polls /internal/queue/drain for responses. No business logic.

2. ruvbrain-worker: Batch worker binary (202 lines) for Cloud Run Jobs.
   Runs scheduler actions (train, drift, transfer, graph, cleanup, attractor)
   with direct Firestore access. Runs once and exits.

3. Internal queue endpoints on the API:
   - POST /internal/queue/push (forward JSON-RPC to session)
   - GET /internal/queue/drain (poll for responses)
   - POST /internal/session/create (register session)
   - DELETE /internal/session/:id (cleanup)

4. Deploy infrastructure:
   - Dockerfile.sse, Dockerfile.worker
   - cloudbuild-sse.yaml, cloudbuild-worker.yaml
   - scripts/deploy_brain_services.sh [api|sse|worker|all]
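A plausible skeleton for the deploy script's `[api|sse|worker|all]` dispatch — hypothetical, since the real script would also run the Cloud Build / Cloud Run commands:

```shell
#!/usr/bin/env sh
# Hypothetical dispatch skeleton for scripts/deploy_brain_services.sh.
# Only the argument handling is shown; the echo lines stand in for the
# actual build-and-deploy steps.
deploy() {
    case "${1:-all}" in
        api|sse|worker)
            echo "deploying ruvbrain-$1"
            ;;
        all)
            for svc in api sse worker; do
                echo "deploying ruvbrain-$svc"
            done
            ;;
        *)
            echo "usage: deploy_brain_services.sh [api|sse|worker|all]" >&2
            return 1
            ;;
    esac
}

deploy "${1:-all}"
```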

Architecture: SSE (500 concurrency, 512MB) → API (80 concurrency, 4GB) ← Worker (Cloud Run Job, 4GB)

Co-Authored-By: claude-flow <ruv@ruv.net>
@ruvnet ruvnet merged commit 3f603e2 into main Mar 30, 2026
7 checks passed