docs(rfc-003): canonical MCP implementation blueprint (omnigraph-mcp crate)#182
docs(rfc-003): canonical MCP implementation blueprint (omnigraph-mcp crate)#182ragnorc wants to merge 2 commits into
Conversation
…crate) Brings RFC-003 to the canonical from-scratch spec for the in-server MCP surface: a dedicated omnigraph-mcp transport crate + McpBackend trait (server -> mcp dep to avoid a Cargo cycle), the verified rmcp 1.7 design (conformance MUSTs for free), backend reuse (do_* / run_query / run_mutate / authorize / api), the 13-tool catalog + Cedar mapping, ParamKind -> JSON Schema, deny-masking + isError split, two resources, per-graph routing (graphs_list stays REST-only), auth + client-compat matrix + OAuth fast-follow, tests, and locked decisions. Build the implementation on this branch from blueprint sections B.1-B.13. The spec is informed by the #157 reference implementation.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3009564437
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| - A generic `struct McpService<B>` implements rmcp's `ServerHandler`, delegating each method to `B` after extracting `&Parts` from `ctx.extensions` once (missing → `McpError::internal_error`). `get_info → backend.server_info()`; `initialize`/`ping` use rmcp's defaults. | ||
| - `pub fn mcp_router<B: McpBackend>(backend: B, body_limit: usize) -> axum::Router`: | ||
| - `let svc = StreamableHttpService::new(move || Ok(McpService::new(backend.clone())), Arc::new(LocalSessionManager::default()), config)` with `config = StreamableHttpServerConfig::default().with_stateful_mode(false).with_json_response(true)` (`StreamableHttpServerConfig` is `#[non_exhaustive]` — build from `Default`, mutate via the `with_*` setters; keep the loopback `allowed_hosts` default). |
There was a problem hiding this comment.
Configure allowed hosts for non-loopback MCP deployments
In any remote or cloud deployment, keeping rmcp's default allowed_hosts makes /mcp reject valid clients before bearer auth, because rmcp 1.7 defaults Host validation to loopback authorities only (localhost, 127.0.0.1, ::1). This RFC explicitly targets remotely reachable MCP clients, so the blueprint should require a server-configured allowed-host allowlist (or a documented reverse-proxy Host validation setup) instead of preserving the loopback default.
Useful? React with 👍 / 👎.
| - **Ad-hoc `query`/`mutate` always exposed, Cedar-only**; no `mcp.allow_adhoc`. [Open Q3 → resolved.] | ||
| - **`query`/`mutate` ids only**, no `read`/`change` aliases. [Open Q7 → resolved.] | ||
| - **Per-graph `/mcp` routing**; `graphs_list`/`omnigraph://graphs` **dropped from MCP** (graph discovery via REST `GET /graphs`); no server-scoped MCP tools. [Open Q5 → resolved.] |
There was a problem hiding this comment.
Reconcile the stale rollout with locked decisions
These locked decisions conflict with the still-active Rollout and Testing sections below, which continue to tell implementers to ship graphs_list, read/change aliases, a server-level /mcp, and an mcp.allow_adhoc switch. Since the status says the blueprint only supersedes the older §5 sketches, someone following the Rollout section would build surface area this canonical section says was dropped; please update those later sections or explicitly mark them historical too.
Useful? React with 👍 / 👎.
Fold the valid #157 review findings into the blueprint as by-design fixes so a from-scratch build cannot reintroduce them: - B.3.1: host/origin policy derived from the server bind + config (loopback -> loopback hosts; non-loopback -> configured public host, else Host-allowlist off with bearer + Origin as the controls). Closes the loopback-only-default footgun that 403s every remote client before bearer auth. - B.6/B.7: stored-query params nested under a `params` object (mirrors POST /queries/{name}), so the branch/snapshot knobs cannot collide with a query param; mutation tools omit `snapshot` + additionalProperties:false, so mutation-against-a-snapshot is unrepresentable. Vector `dim` made intrinsic to the kind, killing the unwrap_or(0) zero-length-array schema. - B.8: one canonical classify(Result<_, ApiError>) for tool errors (5xx -> JSON-RPC, 4xx/409 -> isError), no second mapper to drift; list visibility uses the call-path default-branch authorization, not a branch:None probe that hid branch-scoped-grant tools. - Reconciled the stale Summary count and the section-5-era Rollout/Testing sections (banners pointing to B.12/B.4/B.13).
|
Addressed the valid review findings as correct-by-construction fixes (commit
The "missing |
| **Reuse, never reinvent.** First factor **10 thin `do_*` fns** out of the inline `server_*` HTTP handlers (each is `authorize_request(...) → engine call → DTO`) so REST and MCP dispatch one path: `do_snapshot`, `do_schema_get`, `do_branches_list`, `do_commits_list`, `do_commit_show`, `do_ingest`, `do_branch_create`, `do_branch_delete`, `do_branch_merge`, `do_schema_apply`. (No `do_graphs_list` — `graphs_list` is not an MCP tool, see B.10; `server_graphs_list` stays inline for `GET /graphs`.) Land that as a behavior-neutral refactor commit first (it keeps the REST handlers as thin wrappers; all server tests stay green). Then reuse as-is: `run_query` / `run_mutate` (already decoupled from request bodies), `authorize` → `Authz { Allowed, Denied(msg) }` (with `Err` reserved for operational 401/500), `api::query_catalog_entry` / `ParamKind` / `read_output`, `ApiError` (add `pub(crate) status_code()` + `message_str()` accessors for the error classifier). Mount in `build_app`: `.merge(mcp::mcp_router(state))` inside the `per_graph_protected` group, where the server's thin `mcp::mcp_router(state) = omnigraph_mcp::mcp_router(OmnigraphMcpBackend::new(state), INGEST_REQUEST_BODY_LIMIT_BYTES)`. | ||
|
|
||
| Represent the built-ins as a `Builtin` enum (one variant per tool; `descriptor` / `gate` / `call` as match arms) — lower liability than ~14 unit structs + `dyn` + `async-trait` per tool. Stored-query tools are a sibling populator over `handle.queries`. | ||
|
|
There was a problem hiding this comment.
B.4 call site missing
hosts: McpHostPolicy argument
The mcp_router function defined in B.3 has the signature pub fn mcp_router<B: McpBackend>(backend: B, body_limit: usize, hosts: McpHostPolicy) -> axum::Router (three parameters), but the mounting example in B.4 only passes two: omnigraph_mcp::mcp_router(OmnigraphMcpBackend::new(state), INGEST_REQUEST_BODY_LIMIT_BYTES). The hosts: McpHostPolicy argument is absent. An implementer following B.4 verbatim would get a Rust compile error at that call site. The call should be updated to pass the McpHostPolicy computed at startup per B.3.1 — for example omnigraph_mcp::mcp_router(OmnigraphMcpBackend::new(state), INGEST_REQUEST_BODY_LIMIT_BYTES, McpHostPolicy::from_bind_config(&state.config)).
Summary
Updates RFC-003 into the authoritative, build-from-scratch spec for the in-server MCP surface, with a dedicated
omnigraph-mcpcrate from the start. Docs-only (one file); no code.A reference implementation shipped in #157 (it proved rmcp 1.7 integrates on edition 2024, the auth-extension passthrough works, the conformance MUSTs are met, and the surface splits cleanly into a transport crate + a server backend). This RFC folds those learnings into a clean spec so the surface can be implemented properly from
mainwith the crate baked in — not retrofitted.What's in the new "Implementation Blueprint" (B.1–B.13)
A self-contained, ordered build guide:
crates/omnigraph-mcp(rmcp Streamable-HTTP transport + anMcpBackendtrait + rmcp re-exports);omnigraph-serverdepends on it and implements the trait. The dependency must goserver → mcp(amcp → serverdep cycles at the package level:server-bin → mcp → server-lib). The trait inverts the direction, so the crate never names an omnigraph type.&http::request::Partspassthrough (how the server-injectedResolvedActor/GraphHandlereach the backend), and the verified rmcp config that givesGET→405,Origin→403, andMCP-Protocol-Versionvalidation for free.do_*fns; reuserun_query/run_mutate/authorize/api::query_catalog_entry/ApiError.ParamKind → JSON Schema, deny-masking +isErrorsplit, stored-query double-gate, two resources./mcp;graphs_list/omnigraph://graphsdropped from MCP (graph discovery stays REST-only viaGET /graphs); explicitly rules out a single flat/mcpwithgraph_id-in-call (breaks per-graph stored-query listing + isolation).cargo tree | grep rmcpconfirms rmcp is not a direct server dep), and the locked decisions (resolves the Open Questions).Housekeeping in the same edit
McpTooltrait became aBuiltinenum + theMcpBackendcrate trait).Relationship to #157
#157 is the working reference implementation (in-server MCP without the crate split). This RFC is the spec for the clean version; the implementation will be built on a fresh branch from
mainfollowing B.1–B.13, cribbing from #157.Note
Low Risk
Docs-only change to a single RFC file; no runtime code or configuration is modified.
Overview
RFC-003 moves from Proposed to the canonical spec for a clean in-server MCP reimplementation from
main, citing #157 as the reference proof and pointing implementers at the new Implementation Blueprint (B.1–B.13) instead of the original §5 sketches.The blueprint adds an ordered build guide: split
crates/omnigraph-mcp(rmcp Streamable HTTP +McpBackend+ re-exports) fromomnigraph-server(implements the trait;server → mcponly),&http::request::Partsextension passthrough for auth/graph context, deployment-derived Host/Origin policy (not rmcp’s loopback default),do_*refactor + reuse of existing invoke/authorize/catalog paths, 13 built-in tools + Cedar mapping, stored-query JSON Schema and list/call deny-masking, per-graph/mcpwithgraphs_list/omnigraph://graphsdropped from MCP, auth client matrix, and B.12 tests/verification.Housekeeping in the same doc: Summary and stdio-package relationship text updated (v0.6.1 / 16 tools vs in-server 13); Reconciliation labeled pre-MCP history; §5, Testing, Rollout, and Open Questions marked superseded or resolved via B.13 (per-query
invoke_queryand OAuth remain deferred).Reviewed by Cursor Bugbot for commit 2ade97f. Bugbot is set up for automated code reviews on this repo. Configure here.
Greptile Summary
Promotes RFC-003 from "Proposed" to the canonical, authoritative build spec for a clean
omnigraph-mcpcrate implementation, folding in every verified decision from the reference work in #157. One concrete spec inconsistency was found.omnigraph-mcptransport +omnigraph-serverbackend), theMcpBackendtrait seam using&http::request::Partsfor auth passthrough, host/origin policy derived from deployment context, the 13-tool Cedar-mapped catalog, stored-query projection,classifyerror mapping, and a full test/verification plan.mcp_routercall site passes only two arguments while B.3 defines the function with three (backend,body_limit,hosts: McpHostPolicy); following B.4 literally would produce a compile error.Confidence Score: 4/5
Documentation-only change; safe to merge with the B.4 call-site inconsistency corrected before the implementation branch begins.
The only changed file is an RFC markdown document with no runtime impact. The spec is internally consistent except for one call site in B.4 that omits the required
hosts: McpHostPolicyargument that B.3 defines — an implementer following that example verbatim would hit a compile error at the first build. No code, migrations, or configs are modified.docs/dev/rfc-003-mcp-server-surface.md — the B.4 mcp_router call site needs the hosts argument added before implementation begins.
Important Files Changed
Sequence Diagram
sequenceDiagram participant Client as MCP Client participant Middleware as Auth Middleware participant McpCrate as omnigraph-mcp McpService participant Backend as OmnigraphMcpBackend participant Cedar as Cedar Policy Engine participant Engine as Graph Engine Client->>Middleware: "POST /graphs/{id}/mcp Authorization: Bearer" Middleware->>McpCrate: "Extensions: ResolvedActor + Arc<GraphHandle>" McpCrate->>Backend: call_tool(parts, name, args) Backend->>Cedar: authorize(PolicyAction, actor, branch) Cedar-->>Backend: "Allowed | Denied" alt Denied Backend-->>McpCrate: -32602 unknown tool else Allowed Backend->>Engine: "do_* / run_query" Engine-->>Backend: "Ok(result) | Err(ApiError)" Backend->>Backend: classify(result) Backend-->>McpCrate: "CallToolResult | McpError" end McpCrate-->>Client: JSON-RPC responseReviews (2): Last reviewed commit: "docs(rfc-003): correct-by-construction f..." | Re-trigger Greptile