Labels: enhancement (New feature or request)
Summary
The ruvbrain Cloud Run service currently stores MCP SSE sessions in a process-local DashMap (line 182 of routes.rs). This forces max-instances=1 to prevent session-not-found 404s when Cloud Run routes requests across instances. To restore horizontal scaling, sessions must move to a shared external store.
Current State
```rust
// crates/mcp-brain-server/src/routes.rs:182-183
let sessions: Arc<dashmap::DashMap<String, tokio::sync::mpsc::Sender<String>>> =
    Arc::new(dashmap::DashMap::new());
```

- `max-instances: 1` (set in the fix for #312, "bug(mcp/sse): /messages?sessionId= returns 404 — all MCP tool calls non-functional on pi.ruv.io", to prevent session routing failures)
- `containerConcurrency: 20` (20 concurrent SSE + POST requests per instance)
- Session lifetime: process memory — lost on instance restart or scale-down
- Impact: cannot scale beyond 1 instance; all MCP clients share a single 2-CPU, 2 GB instance
Proposed Solution: Memorystore for Redis
Google Cloud Memorystore provides a managed Redis instance that all Cloud Run instances can share.
GCloud Resources Required
```sh
# 1. Create Memorystore Redis instance (Basic tier, 1 GB, us-central1)
gcloud redis instances create ruvbrain-sessions \
  --size=1 \
  --region=us-central1 \
  --tier=basic \
  --redis-version=redis_7_2 \
  --network=default \
  --connect-mode=private-service-access \
  --project=ruv-dev

# 2. Create VPC connector for Cloud Run → Redis
gcloud compute networks vpc-access connectors create ruvbrain-connector \
  --network=default \
  --region=us-central1 \
  --range=10.8.0.0/28 \
  --project=ruv-dev

# 3. Update ruvbrain service with VPC connector and restore scaling
gcloud run services update ruvbrain \
  --region=us-central1 --project=ruv-dev \
  --vpc-connector=ruvbrain-connector \
  --max-instances=10 \
  --set-env-vars="REDIS_HOST=<redis-ip>,REDIS_PORT=6379"

# 4. Store the Redis host as a secret
gcloud secrets create REDIS_HOST --data-file=- <<< "<redis-ip>"
```

Estimated Monthly Cost
| Resource | Cost |
|---|---|
| Memorystore Basic 1 GB (us-central1) | ~$35/month |
| VPC connector (f1-micro, min 2 instances) | ~$7/month |
| **Total** | **~$42/month** |
Code Changes
1. Add redis dependency to mcp-brain-server
```toml
# crates/mcp-brain-server/Cargo.toml
redis = { version = "0.25", features = ["tokio-comp", "connection-manager"] }
```

2. Replace DashMap with a Redis-backed session store
```rust
// New: RedisSessionStore
use anyhow::Result;
use dashmap::DashMap;
use tokio::sync::mpsc::Sender;

pub struct RedisSessionStore {
    pool: redis::aio::ConnectionManager,
    local_senders: DashMap<String, Sender<String>>,
}

impl RedisSessionStore {
    pub async fn new(redis_url: &str) -> Result<Self> {
        let client = redis::Client::open(redis_url)?;
        let pool = redis::aio::ConnectionManager::new(client).await?;
        Ok(Self { pool, local_senders: DashMap::new() })
    }

    pub async fn register(&self, session_id: &str, sender: Sender<String>) {
        // Store the sender locally (channels are not serializable).
        self.local_senders.insert(session_id.to_string(), sender);
        // Record session existence in Redis with a TTL.
        let mut conn = self.pool.clone();
        let _: () = redis::cmd("SET")
            .arg(format!("mcp:session:{session_id}"))
            .arg("1")
            .arg("EX").arg(3600) // 1-hour TTL
            .query_async(&mut conn)
            .await
            .unwrap_or(());
    }

    pub async fn get(&self, session_id: &str) -> Option<Sender<String>> {
        // Check the local map first (session lives on this instance).
        if let Some(s) = self.local_senders.get(session_id) {
            return Some(s.clone());
        }
        // The session may exist in Redis but belong to another instance.
        // We cannot forward to a remote channel, so return None and let
        // the client reconnect. Log the miss for observability.
        None
    }
}
```

3. Update messages_handler
```rust
async fn messages_handler(...) -> StatusCode {
    let sender = match state.sessions.get(&query.session_id).await {
        Some(s) => s,
        None => {
            // Log the session miss for monitoring.
            tracing::warn!(
                session_id = %query.session_id,
                "Session not found (may be on another instance)"
            );
            return StatusCode::NOT_FOUND;
        }
    };
    // ... rest unchanged
}
```

Migration Plan
- Provision Memorystore Redis + VPC connector
- Add `redis` dependency to `mcp-brain-server`
- Implement `RedisSessionStore` with a local sender cache + Redis existence tracking
- Add session TTL (1 hour) and cleanup
- Update the Cloud Run service: add VPC connector, set `REDIS_HOST` env var
- Restore `max-instances=10`
- Test: SSE connect on instance A, POST on instance B → verify the session is found
- Monitor: track the session-miss rate in logs
Alternative: Streamable HTTP Transport
MCP protocol revisions from 2025 onward support the Streamable HTTP transport, which can operate statelessly (no server-side sessions). This would eliminate the session problem entirely. However, Claude Code currently uses the SSE transport, so this is a longer-term migration.
Context
- Fixed in #312 ("bug(mcp/sse): /messages?sessionId= returns 404 — all MCP tool calls non-functional on pi.ruv.io") by setting `max-instances=1`
- Sessions stored at `crates/mcp-brain-server/src/routes.rs:182`
- Service: `ruvbrain` in `us-central1` (2 CPU, 2 GB RAM)
- 37 MCP tools depend on SSE sessions
🤖 Generated with claude-flow