feat(mcp)!: v2.0.0 — Code Mode (workerd sidecar, 10-tool surface)#11

Draft
pratikbin wants to merge 11 commits into main from feat/code-mode-v2

@pratikbin pratikbin commented Apr 25, 2026

Summary

CreateOS MCP v2.0.0. Collapses 87 long-tail tool definitions into 3 code-mode tools backed by a workerd V8 sandbox sidecar, while keeping 7 native fast-path tools for hot operations.

Status: Draft. Blocked on autogen-backend-v2#451 (private) — see Cross-repo dependency below.
Rollout tracker: #12.

Tool surface (v2)

Native (7):  GetQuotas, GetSupportedProjectTypes, CheckProjectUniqueName,
             CreateProject, UploadDeploymentBase64Files, GetDeployment,
             CancelDeployment

Code Mode (3):
  search(code)    — read-only OpenAPI-spec introspection in a JS sandbox
  execute(code)   — typed CreateOS API calls in a JS sandbox
  pollJob(jobId)  — drain long-running execute() jobs

The full ~100-endpoint CreateOS API is reachable via execute. The model writes a single async arrow function that may chain multiple typed api.<group>.<operationId>(args) calls inside one sandbox boundary. Long-running ops (>90 s) auto-upgrade to a job; the model loops pollJob until done. Hard wall cap 600 s.
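As a rough illustration of the shape described above, the snippet below shows the kind of function a model might pass to execute. The `api` object is a local stub so the sketch runs on its own; the group and operation names (`projects.listProjects`, `deployments.getLatestDeployment`) are hypothetical and not taken from the CreateOS spec.

```javascript
// Stub standing in for the sandbox-provided `api` object.
// Assumption: the group and operationId names are invented for illustration.
const api = {
  projects: {
    listProjects: async () => [{ id: "p1", name: "demo" }],
  },
  deployments: {
    getLatestDeployment: async ({ projectId }) => ({ projectId, status: "running" }),
  },
};

// A single async arrow function that chains several typed calls
// inside one sandbox run and returns a compact result.
const userCode = async () => {
  const projects = await api.projects.listProjects();
  const statuses = [];
  for (const p of projects) {
    const d = await api.deployments.getLatestDeployment({ projectId: p.id });
    statuses.push({ project: p.name, status: d.status });
  }
  return statuses;
};
```

Returning only the fields the caller needs keeps the result well under the 64 KB result cap.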

Token cost of the 10-tool descriptor surface: 896 tokens (gated at 1500).

Architecture

Two-process pod:

                ┌──────────────────────────────┐
   MCP client ──┤ mcp container (Go)           │
                │   - 7 native tools           │
                │   - 3 code-mode tools        │ ── HTTP /run /poll /health ──┐
                │   - /metrics /readyz         │                               │
                └──────────────────────────────┘                               │
                                                                              ▼
                                                     ┌─────────────────────────────────┐
                                                     │ workerd container (V8 sandbox)  │
                                                     │   dispatcher worker (:8787)      │
                                                     │     - spec bootstrap on boot    │
                                                     │     - dynamic worker loader     │
                                                     │       (one isolate per /run)    │
                                                     │   api-proxy worker               │
                                                     │     - only egress to backend    │
                                                     └─────────────────────────────────┘
                                                                              │
                                                                              ▼
                                                                       backend API

Comms strictly over loopback. The api-proxy worker is the only thing in the pod with backend egress.

Sandbox guarantees

  • Each search/execute runs in a fresh V8 isolate spawned via the workerd Dynamic Worker Loader.
  • No filesystem, no env vars, no ambient fetch. Only the host.js ESM module exposes spec, api, console, sleep.
  • Auth headers (X-Api-Key, X-Access-Token for bearer) are forwarded by the dispatcher to api-proxy from the trusted MCP request context — user code cannot inject or override them.
  • api.raw(method, path, opts) requires server-relative paths. Absolute URLs, protocol-relative //, and dot-segments (raw or percent-encoded) are rejected to prevent SSRF / BACKEND_URL prefix escape.
  • Response headers are allowlisted; Set-Cookie and internal x-* headers are stripped before reaching user code.

Limits

search timeout              5 s
execute sync window         90 s
execute wall cap            600 s
Concurrent execute jobs     50 (429 beyond)
Concurrent search isolates  50
Job store size              500 (LRU evict on cap)
Job TTL after terminal      30 min
User code                   1 MB
Result                      64 KB (oversized truncated with __truncated)
sleep(ms) cap               60 s per call
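The result cap can be pictured with a small sketch like the one below. It assumes the cap is measured on the UTF-8 size of the JSON-serialised value; the `bytes` field, the preview length, and the function name `capResult` are assumptions for illustration, not the dispatcher's actual code.

```javascript
// Assumed constant mirroring the "Result 64 KB" limit.
const MAX_RESULT_BYTES = 64 * 1024;

// Return the value unchanged when it fits; otherwise return a marker
// object with a short preview of the serialised payload.
// Assumption: `bytes` and the 256-character preview are illustrative.
function capResult(value, maxBytes = MAX_RESULT_BYTES, previewChars = 256) {
  const json = JSON.stringify(value) ?? "null"; // undefined serialises to nothing
  const bytes = new TextEncoder().encode(json).length;
  if (bytes <= maxBytes) return value;
  return { __truncated: true, bytes, preview: json.slice(0, previewChars) };
}
```

A caller that sees `__truncated` would normally re-run the code with a narrower projection rather than try to parse the preview.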

Discovery surface (for clients)

  • initialize.serverInfo.name → "CreateOS MCP (Code Mode v2)"
  • tools/list → 10 tools; the 3 code-mode tools carry _meta = {mode:"code", sandbox:"workerd", version:"v2"}
  • resources/list
    • code-mode://intro (workflow + auth + sandbox + limits)
    • code-mode://api-shape (typed accessor + ApiError contract + discovery snippet)
  • prompts/list
    • code-mode/deploy-example (single execute that deploys + waits, with pollJob fallback)
    • code-mode/api-discovery (4 search snippets that walk from keyword to operationId)
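A client or orchestrator could use the `_meta` marker listed above to separate Code Mode tools from native ones. The sketch below assumes a tools/list payload trimmed to the fields it needs; `codeModeTools` is a hypothetical helper name.

```javascript
// Example tools/list payload, reduced to name and _meta.
const toolsList = {
  tools: [
    { name: "GetQuotas" },
    { name: "search", _meta: { mode: "code", sandbox: "workerd", version: "v2" } },
    { name: "execute", _meta: { mode: "code", sandbox: "workerd", version: "v2" } },
    { name: "pollJob", _meta: { mode: "code", sandbox: "workerd", version: "v2" } },
  ],
};

// Pick out the tools that advertise themselves as Code Mode.
function codeModeTools(list) {
  return list.tools.filter((t) => t._meta?.mode === "code").map((t) => t.name);
}
```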

Cross-repo dependency

This PR depends on autogen-backend-v2#451 (private repo). That PR adds a CI gate asserting every route in the OpenAPI spec carries an operationId. The MCP sidecar codegens its typed SDK from those operationIds.

Soft dependency — operations missing operationId fall through to api.raw(...), so v2 functions even before the backend PR merges. The hard requirement is for the typed surface to be complete in production. MCP v2 must not ship to prod until the backend PR is merged.

Backend ops tracker: NodeOps-app/autogen-backend-v2#452 (private).

Breaking changes

  • 87 native MCP tools removed. Long-tail API surface is now reachable only via search/execute/pollJob.
  • Server name in initialize.serverInfo.name changed from "MCP Server" to "CreateOS MCP (Code Mode v2)".
  • Server version bumped to 2.0.0.

Native tools retained (7)

GetQuotas, GetSupportedProjectTypes, CheckProjectUniqueName, CreateProject, UploadDeploymentBase64Files, GetDeployment, CancelDeployment.

Operational

  • New container: workerd sidecar built from codemode/workerd/Dockerfile (oven/bun base, non-root user, --experimental workerd flag for the Dynamic Worker Loader).
  • k8s/pod-v2.yaml — two-container pod manifest. MCP container reads config from CREATEOS_MCP_CONFIG env (mounted ConfigMap at /etc/createos).
  • mcp /readyz — 200 only when workerd /health responds 200.
  • mcp /metrics — Prom: codemode_run_duration_seconds{mode,outcome}, codemode_jobs_running, codemode_jobstore_size, codemode_api_call_duration_seconds.
  • WORKERD_URL env (default http://127.0.0.1:8787) tells the MCP binary where the sidecar lives.

CI

  • .github/workflows/token-budget.yml: bun scripts/count-tokens.ts asserts the 10-tool descriptor surface stays ≤1500 tokens (currently 896).

Test plan

  • go build ./... clean
  • go test ./codemode/... passes (15 unit tests)
  • bun test in codemode/workerd/ passes (28 tests across dispatcher, spec-loader, host, api-proxy, api-sdk-builder)
  • CODEMODE_INTEGRATION=1 go test -tags integration passes (3 end-to-end tests, real workerd subprocess, mock backend)
  • bun scripts/count-tokens.ts — 896/1500
  • Manual smoke: search returns spec paths, execute calls real backend with caller auth, pollJob returns terminal envelopes, /readyz reflects workerd /health, /metrics reachable
  • Staging deploy via k8s/pod-v2.yaml
  • 1-week internal team soak on staging
  • Blue-green canary to prod (5% → 25% → 50% → 100%)

Files

  • codemode/ — Go pkg (client, handler, auth, health, metrics, discovery)
  • codemode/workerd/ — workerd config (Cap'n Proto), dispatcher.js, host.js, api-proxy.js, api-sdk-builder.js, spec-loader.js, Dockerfile, bun tests
  • server.go — registers 7 native + 3 code-mode tools + 2 resources + 2 prompts
  • main.go — workerd health probe, /readyz, /metrics, codemode auth-header middleware
  • Dockerfile — distroless/static, nonroot
  • k8s/pod-v2.yaml — pod manifest
  • scripts/{count-tokens,smoke,soak}.ts — CI gate + load tests
  • README.md — Code Mode section + 10-tool surface

Stand up the workerd sidecar that backs Code Mode v2:

- workerd config (Cap'n Proto): dispatcher worker on :8787, api-proxy
  worker with private-network globalOutbound, workerLoader binding for
  Dynamic Worker Loader (experimental flag required).
- spec bootstrap: dispatcher fetches /api-docs/openapi.yaml on first
  request via api-proxy bootstrap mode, resolves $refs and strips
  description/example fields, caches in module scope.
- /run search mode: each call gets a fresh V8 isolate via the worker
  loader. Isolate imports a generated `host.js` exposing `spec`,
  `console`, `sleep` (no network, no api in this phase). 5s timeout,
  1MB code cap, structured error envelope.
- spec-loader is bundled with `bun build` so workerd can embed yaml
  inline (workerd cannot resolve node_modules).
- Go client + Search MCP tool registered alongside existing 94 tools.
  HealthMonitor goroutine probes /health; main wires /readyz to expose
  workerd readiness.
- bun unit tests for dispatcher + spec-loader; Go unit tests for
  client + handler; integration test forks real workerd against an
  httptest backend serving a fixture spec.
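The spec bootstrap step (resolve $refs, strip description/example) might look roughly like the sketch below. It is not the shipped spec-loader: it assumes only same-document "#/..." pointers, leaves cyclic or dangling refs in place, and naively drops any key named `description` or `example`, including schema properties that happen to carry those names.

```javascript
const STRIPPED_KEYS = new Set(["description", "example"]);

// Follow a local JSON pointer such as "#/components/schemas/Project".
function lookup(root, ref) {
  return ref
    .slice(2)
    .split("/")
    .map((s) => s.replace(/~1/g, "/").replace(/~0/g, "~"))
    .reduce((node, key) => node?.[key], root);
}

// Return a copy of `node` with local $refs inlined and noisy keys removed.
function compact(node, root, active = new Set()) {
  if (Array.isArray(node)) return node.map((n) => compact(n, root, active));
  if (node === null || typeof node !== "object") return node;
  if (typeof node.$ref === "string" && node.$ref.startsWith("#/")) {
    if (active.has(node.$ref)) return { $ref: node.$ref }; // cycle: keep the pointer
    const target = lookup(root, node.$ref);
    if (target === undefined) return { $ref: node.$ref }; // dangling: keep the pointer
    active.add(node.$ref);
    const resolved = compact(target, root, active);
    active.delete(node.$ref);
    return resolved;
  }
  const out = {};
  for (const [key, value] of Object.entries(node)) {
    if (!STRIPPED_KEYS.has(key)) out[key] = compact(value, root, active);
  }
  return out;
}
```

Calling `compact(spec, spec)` once and caching the result matches the "caches in module scope" idea: the cost is paid on the first request only.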

Backend (autogen-backend-v2) operationId gate landed in
feat/openapi-operationids; required for Phase 2 SDK codegen.

- api-proxy: authenticated mode forwards X-Api-Key/Bearer, builds
  query strings, 12s upstream timeout, structured network/timeout
  envelopes.
- api-sdk-builder: emits ESM module per call from in-memory spec.
  Generates `api.<group>.<operationId>(args)` keyed by tag, with
  `api.raw(method, path, opts)` escape hatch. Throws ApiError on 4xx/5xx
  carrying status + body + path.
- host.js: drops `api` export; `wrapMainModule` now declares run(callback)
  entrypoint that builds api inside the run scope so the RPC stub stays
  alive (avoids "RPC stub used after being disposed" disposal race).
- dispatcher: passes the api callback to entry.run() instead of a
  separate setApi() RPC.
- Go: AuthCtx context plumbing (WithAuthHeaders / AuthFromContext),
  Execute MCP tool with auth forwarding, registered in server.go.
- Tests: bun (api-proxy / host / sdk-builder), go unit (auth + execute
  forwards auth), integration (execute happy + auth error).
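To make the api-sdk-builder bullet more concrete, here is a minimal sketch of building `api.<group>.<operationId>(args)` from an in-memory spec. It differs from the description above in one deliberate way: it builds the object directly instead of emitting ESM source. The `call` transport, the `args.query` / `args.body` convention, and the "default" fallback group are assumptions.

```javascript
class ApiError extends Error {
  constructor(status, body, path) {
    super(`API ${status} on ${path}`);
    this.name = "ApiError";
    this.status = status;
    this.body = body;
    this.path = path;
  }
}

// `call` is a hypothetical transport: ({ method, path, query, body }) => Promise<{ status, body }>.
function buildApi(spec, call) {
  const raw = async (method, path, opts = {}) => {
    const res = await call({ method, path, ...opts });
    if (res.status >= 400) throw new ApiError(res.status, res.body, path);
    return res.body;
  };
  const api = { raw };
  for (const [template, item] of Object.entries(spec.paths ?? {})) {
    for (const [method, op] of Object.entries(item)) {
      // Operations without an operationId stay reachable through api.raw only.
      if (op === null || typeof op !== "object" || !op.operationId) continue;
      const group = op.tags?.[0] ?? "default";
      api[group] ??= {};
      api[group][op.operationId] = (args = {}) => {
        // Fill {param} placeholders in the path template from args.
        const path = template.replace(/\{(\w+)\}/g, (_, k) => encodeURIComponent(args[k]));
        return raw(method.toUpperCase(), path, { query: args.query, body: args.body });
      };
    }
  }
  return api;
}
```

Keying by the first tag keeps the accessor shallow; an operation with several tags would need a tie-break rule that this sketch does not attempt.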
Long-running execute calls auto-upgrade to a job after the 90s sync
window; the model loops pollJob(jobId) until the job completes or hits
the 600s wall cap.

- dispatcher: jobStore (Map keyed by `j_${uuid}`) tracks each execute
  isolate. Each /run execute registers a job, races run vs sync window;
  if the run finishes inside 90s the response is the terminal envelope,
  otherwise it returns {status: 'running', jobId}. /poll blocks up to
  another 90s using awaitJobUntil() against the job's waiter set.
- Wall cap: startIsolateRun races against a 600s timer; on expiry the
  job is finalised with errorKind: 'timeout'.
- TTL: terminal jobs are evicted 30 min after finishedAt.
- Result cap: 64 KB serialised; oversized payloads return {__truncated,
  preview}.
- /health now reports jobsRunning + jobStoreSize.
- Go: Client.Poll + PollJob MCP tool registered in server.go.
- Tests: Go unit (poll happy + jobMissing, handler missing arg).
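The race between a run and the sync window can be sketched as below. This is a simplified stand-in for the dispatcher logic: it uses a counter instead of a uuid for the job id, omits the wall cap and TTL, and the status names ("done", "error") and the `jobMissing` envelope shape are assumptions.

```javascript
const jobs = new Map(); // jobId -> { status, result?, error?, done: Promise }
let nextJobNumber = 0;

// Wait for a job up to windowMs; shared by the first response and later polls.
async function waitForJob(jobId, windowMs) {
  const job = jobs.get(jobId);
  if (!job) return { status: "error", errorKind: "jobMissing" };
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(resolve, windowMs, "timeout");
  });
  const winner = await Promise.race([job.done, timeout]);
  clearTimeout(timer);
  if (winner === "timeout") return { status: "running", jobId };
  return { status: job.status, result: job.result, error: job.error };
}

// Register a job, start the run, and answer within the sync window.
function startExecute(run, syncWindowMs) {
  const jobId = `j_${++nextJobNumber}`;
  const job = { status: "running" };
  job.done = run().then(
    (result) => Object.assign(job, { status: "done", result }),
    (error) => Object.assign(job, { status: "error", error: String(error) }),
  );
  jobs.set(jobId, job);
  return waitForJob(jobId, syncWindowMs);
}
```

From the client side the loop is simply: call execute, and while the envelope says `running`, call pollJob with the returned `jobId`.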
BREAKING CHANGE: 87 native MCP tools removed. Long-tail API surface is
now reachable via search/execute/pollJob (Code Mode v2).

Native tools retained (7):
  GetQuotas, GetSupportedProjectTypes, CheckProjectUniqueName,
  CreateProject, UploadDeploymentBase64Files, GetDeployment,
  CancelDeployment

Code Mode tools (3):
  search(code), execute(code), pollJob(jobId)

Phase 4 changes:
- server.go: trimmed to 7 native + 3 code-mode AddTool calls. Version
  bumped to 2.0.0.
- mcptools/, handlers/: 87 long-tail files deleted from each (174 total).
- token-budget gate: scripts/count-tokens.ts asserts the 10-tool
  descriptor surface stays under 1500 tokens (currently 476).
- Prometheus metrics: codemode_run_duration_seconds (per mode/outcome),
  codemode_jobs_running, codemode_jobstore_size,
  codemode_api_call_duration_seconds. Exposed at /metrics in HTTP mode.
- HealthMonitor pulls jobsRunning + jobStoreSize from workerd /health
  and exports them as gauges.
- Hardened Dockerfile (distroless/nonroot) + workerd Dockerfile
  (non-root debian) + k8s/pod-v2.yaml (read-only rootfs, dropped caps,
  RuntimeDefault seccomp, mcp /readyz + workerd /health probes).
- Smoke + soak scripts for staging verification.
- docs/runbook-v2.md: ops runbook (health, common failures, rollback,
  capacity).
- README: 85+ → 10 tools; new Code Mode section with example.

api-proxy:
- buildUrl rejects non-relative paths (must start with single '/'),
  blocking SSRF / credential exfil via api.raw("GET","https://attacker").
- sanitizeRequestHeaders allowlists user-supplied headers; auth headers
  spread last so X-Api-Key/Authorization cannot be overridden by user
  code or smuggled by ApiCall.
- sanitizeResponseHeaders allowlists relayed response headers; strips
  Set-Cookie and any internal x-* headers before they reach the sandbox.
- await req.json() now wrapped in try/catch — malformed bodies return
  400, not 500.
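The response-side allowlist can be sketched as follows. The header names in the set are an assumption (the actual list in api-proxy.js is not shown in this PR text); the point is that anything not explicitly listed, including Set-Cookie and internal x-* headers, never reaches user code.

```javascript
// Assumed allowlist for illustration only.
const SAFE_RESPONSE_HEADERS = new Set(["content-type", "content-length", "etag", "retry-after"]);

// Accepts any iterable of [name, value] pairs, such as a fetch Headers object.
function sanitizeResponseHeaders(headers) {
  const out = {};
  for (const [name, value] of headers) {
    const key = name.toLowerCase();
    if (SAFE_RESPONSE_HEADERS.has(key)) out[key] = value;
  }
  return out;
}
```

An allowlist fails closed: a new internal header added by the backend later is dropped by default instead of leaking.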

dispatcher:
- Wall-timeout (600s) attempts best-effort dispose of the runaway
  worker's RPC entrypoint + worker stub. Comment documents the workerd
  limitation: there is no isolate.terminate() yet, so the V8 isolate
  continues until the Worker Loader's LRU evicts it. Disposing the stub
  ensures any further api-proxy calls from the runaway code surface as
  ProxyError instead of being silently delivered.

integration_test:
- startWorkerd cancel() now Wait()s after Kill() and sleeps 100ms so the
  fixed loopback port + pid table are released before the next test.

bun tests added: SSRF block, // path block, header override, response
header strip, malformed body. 18/18 pass.

Review item 4 (waiter cleanup deletes wake not resolve) verified
already correct in shipped dispatcher.js:189; plan file had the bug,
implementation transcribed correctly.

Critical:
- Auth was never reaching workerd. Execute() called AuthFromContext but
  no middleware populated the ctx. Added AuthFromRequest that reads
  mcp.CallToolRequest.Header (matching the native handlers/request.go
  pattern) and falls back to AuthFromContext. Added a thin HTTP
  middleware (codemodeAuthHeadersMiddleware) that mirrors incoming
  X-Api-Key / Authorization into ctx via WithAuthHeaders so AuthFromContext
  works as a backup path even on transports that don't surface request
  headers on the MCP request struct.
- startIsolateRun now disposes the entry+worker stubs in a finally
  block on EVERY call (success, user error, timeout) — not just wall
  timeout. Search timeout now also flows through startIsolateRun's
  timer instead of an outer Promise.race that left the isolate alive
  and undisposed. Prevents CPU-bound user code from accumulating
  cached workers.
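The dispose-on-every-path fix for startIsolateRun described above follows a familiar try/finally pattern, sketched here. `loadWorker`, `getEntrypoint`, and `dispose()` are hypothetical stand-ins for the Worker Loader objects, not the real workerd API; only the control flow is the point.

```javascript
// Run user code with a single timer and always release the stubs.
async function startIsolateRun(loadWorker, userCallback, timeoutMs) {
  let worker;
  let entry;
  let timer;
  try {
    worker = await loadWorker();
    entry = worker.getEntrypoint();
    const timeout = new Promise((_, reject) => {
      timer = setTimeout(() => {
        reject(Object.assign(new Error("wall timeout"), { errorKind: "timeout" }));
      }, timeoutMs);
    });
    return await Promise.race([entry.run(userCallback), timeout]);
  } finally {
    clearTimeout(timer);
    // Reached on success, user error, and timeout alike.
    entry?.dispose?.();
    worker?.dispose?.();
  }
}
```

Keeping the timer inside the same function is what lets the timeout path share the cleanup, which an outer Promise.race cannot do.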

Warnings:
- Bearer header now forwarded as X-Access-Token to match native client
  (helpers/httpclient.go uses X-Access-Token; sending Authorization
  Bearer would silently fail on Bearer-authenticated callers).
  X-Access-Token added to SAFE_REQUEST_HEADERS allowlist.
- buildUrl now preserves any path prefix on BACKEND_URL. Previously a
  BACKEND_URL like "https://host/api" would be stripped to
  "https://host/v1/..." for authenticated calls (bootstrap concatenated
  raw, masking the bug). Bootstrap path now also routes through
  buildUrl so both modes behave the same.
- Pod manifest mounts a ConfigMap at /etc/createos and sets
  CREATEOS_MCP_CONFIG so the binary's --config flag honours it. main.go
  flag default now reads CREATEOS_MCP_CONFIG. Pod will no longer crash
  on startup with "config.yaml not found".

Info:
- host.js wrapMainModule moves `userFn = (${userCode})` inside the
  try/catch so syntax errors surface as user-code errors, not infra.
- soak.ts pre-claims work units so TOTAL is never overshot, wraps fetch
  in try/catch, and reports succeeded/failed/errored separately.

Tests: 20/20 bun (added X-Access-Token, base-path preservation,
bootstrap prefix), Go auth_test (added AuthFromRequest cases:
X-Api-Key header, Bearer header, ctx fallback), 3/3 integration still
pass.

Critical:
- authMiddleware: any Authorization header containing a space (e.g.
  "Basic abc") used to satisfy the gate because the missing-credentials
  branch only checked for empty Authorization. Now we parse the header,
  require scheme = "Bearer" + non-empty token, reject everything else
  with 401.
- api-proxy: removed x-access-token from SAFE_REQUEST_HEADERS allowlist.
  It was added when X-Access-Token replaced Authorization for bearer
  forwarding, but allowlisting it lets user code inject
  headers: {"X-Access-Token": "stolen"} for callers that authenticated
  via API key (where buildAuthHeaders never sets X-Access-Token, so the
  injected value survives). All credential headers are now populated
  exclusively from the trusted authCtx.
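The credential-smuggling fix above comes down to two rules: credential headers are never on the user allowlist, and trusted auth is merged last. A minimal sketch, in which the allowlist contents and the `authCtx` field names (`apiKey`, `bearer`) are assumptions:

```javascript
// Assumed allowlist of headers user code may set; no credential headers.
const SAFE_REQUEST_HEADERS = new Set(["accept", "content-type", "if-none-match"]);

// authCtx comes from the trusted MCP request, never from user code.
function buildAuthHeaders(authCtx) {
  if (authCtx.apiKey) return { "x-api-key": authCtx.apiKey };
  if (authCtx.bearer) return { "x-access-token": authCtx.bearer };
  return {};
}

function buildRequestHeaders(userHeaders = {}, authCtx = {}) {
  const safe = {};
  for (const [name, value] of Object.entries(userHeaders)) {
    const key = name.toLowerCase();
    if (SAFE_REQUEST_HEADERS.has(key)) safe[key] = String(value);
  }
  // Auth is spread last so nothing supplied by user code can shadow it.
  return { ...safe, ...buildAuthHeaders(authCtx) };
}
```

With an API-key caller, a user-supplied X-Access-Token is dropped by the allowlist, which is the case the spread order alone did not cover.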

Warnings:
- jobStore + isolate concurrency caps. New constants:
    MAX_RUNNING_JOBS = 50      (concurrent execute jobs)
    MAX_JOB_STORE_TOTAL = 500  (running + finished-but-not-yet-evicted)
    MAX_RUNNING_SEARCHES = 50  (concurrent search isolates)
  /run returns 429 with errorKind: "capacity" when a limit is hit.
  Bounds memory growth and isolate accumulation when CPU-bound user
  code runs to wall_timeout (workerd cannot synchronously terminate the
  V8 isolate; LRU eviction lags).
- Search path now decrements its in-flight counter in a finally block.
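The admission checks can be sketched with the constants listed above. The `counters` object, the response shape, and the helper names `admit` and `runSearch` are assumptions for illustration.

```javascript
const MAX_RUNNING_JOBS = 50;
const MAX_JOB_STORE_TOTAL = 500;
const MAX_RUNNING_SEARCHES = 50;

// Decide whether a /run request may start, given current counters.
function admit(mode, counters) {
  const full =
    mode === "search"
      ? counters.runningSearches >= MAX_RUNNING_SEARCHES
      : counters.runningJobs >= MAX_RUNNING_JOBS || counters.jobStoreSize >= MAX_JOB_STORE_TOTAL;
  return full ? { status: 429, body: { errorKind: "capacity" } } : { status: 200 };
}

let runningSearches = 0;

// Track in-flight searches; the counter is released even if fn throws.
async function runSearch(fn) {
  const verdict = admit("search", { runningSearches });
  if (verdict.status !== 200) return verdict;
  runningSearches++;
  try {
    return { status: 200, body: await fn() };
  } finally {
    runningSearches--;
  }
}
```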

Tests:
- bun: added credential-smuggling test (X-Access-Token, Authorization,
  X-Api-Key, Cookie all stripped from user-supplied headers); 21/21
  pass.
- go: existing tests still pass; integration 3/3 pass.

Critical:
- buildUrl rejects "." and ".." path segments. URL normalisation collapses
  "/api/../admin" to "/admin", which would let user code escape any
  BACKEND_URL path prefix and reach unrelated routes (e.g. /admin from
  a /api-prefixed deployment).

Warnings:
- jobStore total cap (500) now evicts the oldest finished job (LRU on
  finishedAt) before falling back to 429. Previously a burst of 500
  fast successful executes would 429 every subsequent request for the
  next 30 minutes, even with zero running jobs.
- api-proxy now reads response body once via resp.text() and parses
  JSON in-process. resp.json() consumes the body, so the previous
  resp.text() fallback would throw "body already used" if the JSON
  parse failed mid-stream.
- stdio transport now reads CREATEOS_API_KEY / CREATEOS_BEARER env
  vars when neither request header nor ctx headers are present.
  AuthFromEnv exposed for tests; AuthFromRequest cascades:
  request header → ctx → env.
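The job-store eviction rule from the first warning above can be sketched as a small helper. The job record shape (`status`, `finishedAt`) and the name `makeRoom` are assumptions; the behaviour shown is "evict the oldest finished job, and only refuse when every job is still running".

```javascript
// jobs: Map<jobId, { status: "running" | "done" | "error", finishedAt?: number }>
// Returns true when a new job may be stored, false when the caller should answer 429.
function makeRoom(jobs, maxTotal) {
  if (jobs.size < maxTotal) return true;
  let oldestId = null;
  let oldestAt = Infinity;
  for (const [id, job] of jobs) {
    if (job.status !== "running" && job.finishedAt < oldestAt) {
      oldestId = id;
      oldestAt = job.finishedAt;
    }
  }
  if (oldestId === null) return false; // nothing finished to evict
  jobs.delete(oldestId);
  return true;
}
```

A linear scan is fine at a cap of 500 entries; a larger store would probably want an ordered structure instead.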

Info:
- host.js wrapMainModule comment was misleading: a syntax error in
  userCode fails the dynamic ESM module compile BEFORE run() executes,
  so the inner try cannot catch it. Comment now describes the real
  flow, and dispatcher.js startIsolateRun catches SyntaxError out of
  the loader/run promise chain and reclassifies it as kind=SyntaxError
  / userCode so the model sees actionable feedback instead of
  errorKind=infra.

Tests:
- bun: dot-segment escape (/../admin), middle dot-segment
  (/v1/foo/../../admin), invalid-JSON-with-json-content-type.
  24/24 pass.
- go: TestAuthFromRequest_FallsBackToEnv, TestAuthFromEnv_None.

Path traversal via percent-encoded dot-segments still bypassed the
BACKEND_URL prefix check. RFC 3986 §2.3 treats "%2e" as equivalent to
".", so URL normalisation collapsed "/api/%2e%2e/admin" to "/admin"
before any request-time check ran.

buildUrl now decodeURIComponent()s each path segment before checking
for "." / ".." and rejects invalid percent-encoding outright.

Tests added: %2e%2e, %2E%2e (mixed case), %2e. (partial), %ZZ
(invalid encoding). 28/28 bun pass.
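Putting the path rules from this and the earlier commits together, a buildUrl-style check might look like the sketch below. It is not the shipped function: the error messages are invented, and the backslash rejection is an extra precaution in this sketch that the PR text does not mention.

```javascript
// Validate a server-relative path and join it onto backendUrl,
// preserving any path prefix such as "/api".
function buildUrl(backendUrl, path) {
  if (typeof path !== "string" || !path.startsWith("/") || path.startsWith("//")) {
    throw new Error("path must be server-relative");
  }
  // Assumption: backslashes are rejected because URL parsers may treat them as "/".
  if (path.includes("\\")) throw new Error("backslashes are not allowed");
  const pathname = path.split(/[?#]/)[0];
  for (const segment of pathname.split("/")) {
    let decoded;
    try {
      decoded = decodeURIComponent(segment);
    } catch {
      throw new Error("invalid percent-encoding in path");
    }
    if (decoded === "." || decoded === "..") throw new Error("dot-segments are not allowed");
  }
  // Plain concatenation avoids URL resolution, which would drop the prefix.
  return backendUrl.replace(/\/+$/, "") + path;
}
```

Checking segments before any URL parsing matters here, because by the time a parser has normalised the path the dot-segments are already gone.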
Make Code Mode self-describing so clients/agents discover the pattern
on connect without external docs.

(d) Server name bumped to "CreateOS MCP (Code Mode v2)" — visible in
    initialize.serverInfo.name.

(a) Richer tool descriptions for search / execute / pollJob: api shape
    (api.<group>.<operationId>, api.raw escape hatch), ApiError
    contract (status/body/path), result envelope (sync/async/error),
    explicit limits, inline examples.

(e) tools[].Meta = {mode:"code", sandbox:"workerd", version:"v2"} on
    each Code Mode tool. Lets orchestrators flag Code Mode tools
    programmatically.

(b) Resources (pull-on-demand, zero cost until requested):
      code-mode://intro       — workflow + auth + sandbox + limits
      code-mode://api-shape   — api proxy + ApiError contract +
                                operationId discovery snippet

(c) Prompts:
      code-mode/deploy-example — single execute() that deploys + waits,
                                 with pollJob fallback
      code-mode/api-discovery  — 4 search() snippets that walk you
                                 from keyword to operationId

Token surface: 476 -> 896 tokens, well under the 1500 gate.
count-tokens.ts updated to match shipped descriptions.

The v2 ops runbook lives in internal docs only; not appropriate for
the public createos-mcp repo. README v2 section + PR body cover the
user-facing surface.
@pratikbin pratikbin mentioned this pull request Apr 25, 2026
26 tasks