diff --git a/noom-mcp-server/DATABRICKS_APPS_SPEC.md b/noom-mcp-server/DATABRICKS_APPS_SPEC.md new file mode 100644 index 00000000..b047ea12 --- /dev/null +++ b/noom-mcp-server/DATABRICKS_APPS_SPEC.md @@ -0,0 +1,239 @@ +# Spec: Noom MCP Server — Databricks Apps Hosting + +## Goal + +Deploy the Noom-governed MCP server as a shared Databricks App so engineers connect to +a central hosted endpoint instead of running a local process. Governance controls +(SP-based SQL execution, per-user query tagging, tool allowlist) must remain identical +to the local mode. + +## Context + +### Current architecture (local mode) + +Each engineer runs `uv run python run.py` as a subprocess on their laptop. The MCP +client (Cursor / Claude Desktop) connects via **stdio**. At startup: + +1. `ensure_oauth_authenticated()` opens a browser OAuth flow to identify the user. +2. `patch_sql_executor()` monkey-patches `SQLExecutor` so all SQL runs as a governed + Service Principal (SP), with `mcp_user:` appended to `system.query.history`. +3. `apply_tool_allowlist()` removes unapproved tools from the FastMCP instance. +4. `mcp.run()` starts the stdio loop. + +The user identity (`mcp_user:`) is resolved once at startup from the local +OAuth cache and is effectively process-scoped (one process = one user). 
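To make the process-scoped assumption concrete, here is a minimal sketch (not code from this repository) of what happens when identity lives in a process-wide variable and two requests interleave in one event loop. `process_user`, `request_user`, and `handle` are hypothetical names chosen for illustration; only the standard-library `asyncio` and `contextvars` calls are real.

```python
import asyncio
import contextvars

# Hypothetical names for illustration; none of these come from the server code.
process_user = "unknown"  # process-scoped: a single value shared by every task
request_user: contextvars.ContextVar[str] = contextvars.ContextVar(
    "request_user", default="unknown"
)


async def handle(user: str) -> tuple[str, str]:
    """Simulate one request: record the identity, yield, then read it back."""
    global process_user
    process_user = user
    request_user.set(user)
    await asyncio.sleep(0)  # yield so the other request runs and overwrites the global
    return process_user, request_user.get()


async def main() -> list[tuple[str, str]]:
    # gather() wraps each coroutine in a Task, and each Task gets its own context copy.
    return await asyncio.gather(
        handle("alice@example.com"), handle("bob@example.com")
    )


results = asyncio.run(main())
for from_global, from_contextvar in results:
    print(f"global={from_global} contextvar={from_contextvar}")
```

With one user per process the global is harmless, which is why local mode can resolve identity once at startup. Once requests share a process, the first request reads back the second request's identity from the global, while the `ContextVar` value stays tied to the task that set it.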
+ +### Why hosting changes things + +| Concern | Local | Hosted | +|---|---|---| +| Transport | stdio | Streamable HTTP (`POST /mcp`) | +| User identity | Local OAuth cache, resolved at startup | HTTP header `X-Forwarded-User` injected per-request by Databricks Apps auth proxy | +| Auth gate | Browser OAuth flow | Databricks Apps handles OAuth — no browser needed | +| SP credential fetch | User's OAuth token fetches from secret scope | App's service principal fetches from secret scope | +| Concurrency | 1 user per process | Many users share one process → identity must be request-scoped | + +### How the upstream builder app handles packaging + +`databricks-builder-app/scripts/deploy.sh` copies `databricks_tools_core` and +`databricks_mcp_server` source into a staging `packages/` directory at deploy time, +then sets `PYTHONPATH=/app/python/source_code/packages` in `app.yaml`. The packages +are never checked into this fork. The same approach applies here — the deploy script +for `noom-mcp-server` will bundle the upstream packages at deploy time. + +### Why not Databricks AI Gateway + +AI Gateway is for LLM inference traffic (OpenAI-compatible format). It cannot proxy +MCP protocol (Streamable HTTP / SSE). Databricks Apps is the right product: it hosts +arbitrary Python web servers and provides a built-in OAuth proxy that injects the +authenticated user's identity as `X-Forwarded-User`. + +--- + +## Design decisions + +**1. Separate `hosting/` from `customization/`** + +`customization/` = monkey-patches applied to the upstream server at import time. +`hosting/` = HTTP serving infrastructure for the Databricks Apps deployment. + +These are different concerns. `customization/` is unchanged in meaning; `hosting/` +is new and scoped only to the hosted entrypoint. + +**2. Per-request identity via `ContextVar`** + +In local mode, identity is process-scoped (one user, resolved at startup). In hosted +mode, many users share one process. 
A `contextvars.ContextVar` is set per-request +from `X-Forwarded-User` before any tool logic runs. `get_mcp_user_identity()` in +`sql_executor_patch.py` reads from this ContextVar when hosted. + +**3. Minimal diff to `customization/`** + +Only two files in `customization/` change: +- `auth_guard_patch.py`: skip browser OAuth when `DATABRICKS_APPS_HOSTED=1`. +- `sql_executor_patch.py`: `get_mcp_user_identity()` reads from ContextVar (falls + back to existing logic for local mode). + +All other patches (`version_check`, `tool_allowlist`, `sql_timeout`, SP client +override, warehouse pin) are unchanged. + +**4. `run.py` is untouched** + +Local dev continues to work exactly as today. The hosted entrypoint is a new file +(`run_app.py`) that uses the `hosting/` layer. + +--- + +## Target file structure + +``` +noom-mcp-server/ + customization/ # unchanged purpose — patches to upstream + patches.py + auth_guard_patch.py # +2 lines: bypass browser flow when DATABRICKS_APPS_HOSTED=1 + sql_executor_patch.py # get_mcp_user_identity() reads ContextVar when hosted + sql_timeout_patch.py + tool_allowlist_patch.py + version_check.py + + hosting/ # new — Databricks Apps serving layer + __init__.py + request_identity.py # ContextVar[str] + set_user_from_request(headers) + middleware.py # ASGI middleware: extracts X-Forwarded-User, sets ContextVar + + run.py # local dev entrypoint — unchanged + run_app.py # hosted entrypoint: apply_all_patches → allowlist → + # timeout patch → mount hosting/ middleware → mcp.run(transport="streamable-http") + app.yaml # Databricks Apps manifest + scripts/ + deploy.sh # stage packages/ → workspace import → apps deploy +``` + +--- + +## Changes in detail + +### `hosting/request_identity.py` + +```python +import contextvars + +_current_user: contextvars.ContextVar[str] = contextvars.ContextVar("mcp_user", default="unknown") + +def set_user_from_request(headers: dict) -> None: + email = headers.get("x-forwarded-user", "unknown") + 
_current_user.set(email) + +def get_current_mcp_user() -> str: + return _current_user.get() +``` + +### `hosting/middleware.py` + +ASGI middleware that runs before every MCP request. Extracts `X-Forwarded-User` from +the request headers and calls `set_user_from_request()` to populate the ContextVar. +Non-HTTP scopes (lifespan, websocket) pass through unchanged. + +### `customization/auth_guard_patch.py` + +```python +def ensure_oauth_authenticated() -> None: + if os.environ.get("DATABRICKS_APPS_HOSTED"): + logger.info("Running in Databricks Apps — skipping browser OAuth (identity from request headers)") + return + # existing browser flow unchanged + ... +``` + +### `customization/sql_executor_patch.py` — `get_mcp_user_identity()` + +```python +def get_mcp_user_identity() -> str: + # Hosted mode: identity arrives per-request via X-Forwarded-User + if os.environ.get("DATABRICKS_APPS_HOSTED"): + from hosting.request_identity import get_current_mcp_user + return get_current_mcp_user() + # Local mode: unchanged + from databricks_tools_core.auth import get_current_username + username = get_current_username() + if username: + return username + client_id = os.environ.get("DATABRICKS_CLIENT_ID") + return f"sp:{client_id}" if client_id else "unknown" +``` + +### `run_app.py` + +Same patch sequence as `run.py` (version check → patch SQL executor → OAuth guard → +allowlist → timeout ceiling), then mounts the identity middleware and runs with +Streamable HTTP transport: + +```python +os.environ["DATABRICKS_APPS_HOSTED"] = "1" +# ... apply_all_patches(), allowlist, timeout ... 
+from hosting.middleware import IdentityMiddleware +mcp_asgi = mcp.http_app(path="/mcp", stateless_http=True) +app = IdentityMiddleware(mcp_asgi) +# run via uvicorn externally (app.yaml command) +``` + +### `app.yaml` + +```yaml +command: + - uvicorn + - run_app:app + - --host + - "0.0.0.0" + - --port + - "$DATABRICKS_APP_PORT" + +env: + - name: DATABRICKS_APPS_HOSTED + value: "1" + - name: DATABRICKS_HOST + value: "https://noom-prod.cloud.databricks.com" + - name: DATABRICKS_MCP_SQL_HOST + value: "https://noom-prod.cloud.databricks.com" + - name: DATABRICKS_WAREHOUSE_ID + valueFrom: + secretScope: dbrix_mcp_secret + secretKey: warehouse-id + - name: PYTHONPATH + value: "/app/python/source_code/packages" +``` + +### `scripts/deploy.sh` + +Mirrors the `databricks-builder-app/scripts/deploy.sh` pattern: + +1. Stage `customization/`, `hosting/`, `run_app.py`, `app.yaml` into a temp dir. +2. Copy upstream packages: + ```bash + cp -r ../databricks-tools-core/databricks_tools_core/ staging/packages/databricks_tools_core/ + cp -r ../databricks-mcp-server/databricks_mcp_server/ staging/packages/databricks_mcp_server/ + ``` +3. `databricks workspace import-dir staging/ $WORKSPACE_PATH --overwrite` +4. `databricks apps deploy $APP_NAME --source-code-path $WORKSPACE_PATH` + +--- + +## Permissions required (admin setup) + +The Databricks App's service principal (auto-created by `databricks apps create`) needs: + +- **READ** on secret scope `dbrix_mcp_secret` — to fetch SQL SP credentials and + warehouse ID. +- **Can use** on the pinned SQL warehouse. + +No change to the SQL SP itself; it already has the permissions needed for query +execution. + +--- + +## Out of scope + +- MCP client configuration (Cursor / Claude Desktop remote MCP setup) — separate doc. +- Multi-workspace routing — hosted app targets prod only, same as local mode. +- The upstream `databricks-builder-app` MCP gateway — that is a different product with + a UI; this spec is for a standalone MCP-only App. 
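The request-scoped identity flow this spec describes can be exercised without a running server. The following is a self-contained sketch under stated assumptions: `IdentityMiddleware` and `_current_user` mirror the names used in this spec but are re-declared here rather than imported, and `fake_mcp_app`, `seen`, and `request` are hypothetical stand-ins for the FastMCP app and its tool code.

```python
import asyncio
import contextvars
from typing import Optional

# Re-declared for the sketch; mirrors the spec's names but is not the shipped module.
_current_user: contextvars.ContextVar[str] = contextvars.ContextVar(
    "mcp_user", default="unknown"
)


class IdentityMiddleware:
    """ASGI middleware sketch: copy X-Forwarded-User into the ContextVar."""

    def __init__(self, app) -> None:
        self.app = app

    async def __call__(self, scope, receive, send) -> None:
        if scope["type"] == "http":
            # ASGI delivers headers as (bytes, bytes) pairs with lowercase names.
            headers = {
                k.decode("latin-1"): v.decode("latin-1")
                for k, v in scope.get("headers", [])
            }
            _current_user.set(headers.get("x-forwarded-user", "unknown"))
        await self.app(scope, receive, send)


seen: list = []


async def fake_mcp_app(scope, receive, send) -> None:
    """Hypothetical downstream app: records the identity visible to tool code."""
    seen.append(_current_user.get())


async def request(user: Optional[bytes]) -> None:
    """Drive one fake HTTP request through the middleware."""
    headers = [(b"x-forwarded-user", user)] if user else []
    scope = {"type": "http", "headers": headers}
    await IdentityMiddleware(fake_mcp_app)(scope, None, None)


async def main() -> None:
    # One request with the proxy header, one without it.
    await asyncio.gather(request(b"alice@example.com"), request(None))


asyncio.run(main())
print(seen)
```

The second request carries no header and therefore resolves to `"unknown"` rather than inheriting the first request's identity, which is the isolation property the hosted design relies on.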
diff --git a/noom-mcp-server/customization/auth_guard_patch.py b/noom-mcp-server/customization/auth_guard_patch.py index 08e50115..4081a57f 100644 --- a/noom-mcp-server/customization/auth_guard_patch.py +++ b/noom-mcp-server/customization/auth_guard_patch.py @@ -21,6 +21,13 @@ def ensure_oauth_authenticated() -> None: + if os.environ.get("DATABRICKS_APPS_HOSTED"): + logger.info( + "Running in Databricks Apps — skipping browser OAuth " + "(user identity arrives per-request via X-Forwarded-User)" + ) + return + """Authenticate the calling user via OAuth, opening a browser if needed. Execution: diff --git a/noom-mcp-server/customization/patches.py b/noom-mcp-server/customization/patches.py index b5a66dca..3fb6de38 100644 --- a/noom-mcp-server/customization/patches.py +++ b/noom-mcp-server/customization/patches.py @@ -15,6 +15,9 @@ from customization.auth_guard_patch import ( ensure_oauth_authenticated as ensure_oauth_authenticated, ) # re-export +from customization.sql_executor_patch import ( + patch_get_best_warehouse as patch_get_best_warehouse, +) # re-export from customization.sql_executor_patch import patch_sql_executor as patch_sql_executor # re-export from customization.version_check import ( # re-export UpstreamChangedError as UpstreamChangedError, @@ -41,6 +44,7 @@ def apply_all_patches() -> None: systems with instructions to run 'databricks auth login'. """ check_upstream_version() + patch_get_best_warehouse() patch_sql_executor() ensure_oauth_authenticated() logger.info("All Noom MCP governance patches applied successfully") diff --git a/noom-mcp-server/customization/sql_executor_patch.py b/noom-mcp-server/customization/sql_executor_patch.py index dbc4a274..2ed72df5 100644 --- a/noom-mcp-server/customization/sql_executor_patch.py +++ b/noom-mcp-server/customization/sql_executor_patch.py @@ -188,17 +188,23 @@ def get_sql_warehouse_id() -> str: def get_mcp_user_identity() -> str: """Return the calling user's identity string for SQL query tagging. 
- Resolution order: + Hosted mode (DATABRICKS_APPS_HOSTED=1): + Reads from the per-request ContextVar populated by IdentityMiddleware + from the X-Forwarded-User header. Never touches OAuth credentials. + + Local mode: - OAuth browser / CLI users → their email address (from current_user.me()) - OAuth M2M service accounts → "sp:" - Unresolvable → "unknown" - Must be called while the user's own credentials are still in context - (i.e. before switching to the SQL SP client inside an executor). - Returns: Identity string, never None. """ + if os.environ.get("DATABRICKS_APPS_HOSTED"): + from hosting.request_identity import get_current_mcp_user + + return get_current_mcp_user() + from databricks_tools_core.auth import get_current_username username = get_current_username() @@ -217,6 +223,31 @@ def get_mcp_user_identity() -> str: # --------------------------------------------------------------------------- +def patch_get_best_warehouse() -> None: + """Patch get_best_warehouse in the sql module to return the configured warehouse. + + In hosted mode the app SP has no warehouse access, so the live + warehouses.list() call returns empty and execute_sql raises before + SQLExecutor is ever constructed. Since DATABRICKS_WAREHOUSE_ID is always + set in hosted mode, warehouse discovery is unnecessary. + """ + import databricks_tools_core.sql.sql as _sql_module + + if getattr(_sql_module.get_best_warehouse, "_noom_patched", False): + logger.debug("get_best_warehouse already patched — skipping") + return + + def _patched_get_best_warehouse() -> str: + return get_sql_warehouse_id() + + _patched_get_best_warehouse._noom_patched = True + _sql_module.get_best_warehouse = _patched_get_best_warehouse + logger.info( + "get_best_warehouse patched: returns configured warehouse %s", + get_sql_warehouse_id(), + ) + + def patch_sql_executor() -> None: """Patch SQLExecutor to enforce SP client and inject user identity tags. 
diff --git a/noom-mcp-server/docs/remote_mcp_note.md b/noom-mcp-server/docs/remote_mcp_note.md new file mode 100644 index 00000000..b4cd1109 --- /dev/null +++ b/noom-mcp-server/docs/remote_mcp_note.md @@ -0,0 +1,91 @@ +# Databricks Apps — Remote MCP Notes + +## Status (as of 2026-06-04) + +| Item | Status | +|---|---| +| Hosted entrypoint (`run_app.py`) | Done | +| Deploy script (`scripts/deploy.sh`) | Done — supports `--env dev\|prod` | +| App deployed to dev | Done — `mcp-noom-dev` on `noom-dev.cloud.databricks.com` | +| App SP excluded from governance cleanup | **Pending** — must be added to ignore list | +| App SP granted READ on `dbrix_mcp_secret` | Done (dev: `fe62b38f-c398-43bc-a8dc-191e0974a2e2`) | +| Warehouse CAN_USE grant for app SP | **Pending** | +| Authorization mode set to "on behalf of SP" | **Pending** | +| MCP client config docs | Deferred — experimentation still in progress | + +--- + +## Platform facts + +| Fact | Impact | +|---|---| +| OAuth proxy is automatic | All requests are authenticated via Databricks SSO before reaching the app. `X-Forwarded-User: ` is injected per-request — trusted, not user-supplied. | +| Each app gets an auto-created SP | SP is provisioned on `apps create`. Cannot be replaced with an existing SP. | +| SP is tied to the app, not the deployment | `apps deploy` reuses the same SP. Only `apps delete` + `apps create` produces a new SP. | +| `app.yaml` supports only `command` and `env` | No `resources` section in standalone mode — permission grants are always manual. | +| Authorization mode is UI-only | "On behalf of service principal" vs "on behalf of user" cannot be set via CLI or `app.yaml`. Must be configured in the Databricks UI once after first deploy. | + +## What we need to do + +### 1. Exclude the app SP from the governance cleanup script + +Noom's SP governance framework will delete the app's SP if it isn't excluded, breaking the app. 
+After every `apps create`, add the SP's application ID (UUID printed by `deploy.sh` Step 6) to the ignore list. + +### 2. Grant the app SP READ on `dbrix_mcp_secret` + +The app SP needs this to fetch the SQL SP credentials at startup: + +```bash +databricks secrets put-acl dbrix_mcp_secret READ --profile +``` + +The SP application ID is the UUID shown in `deploy.sh` Step 6 output. + +### 3. Set authorization mode in the UI (first deploy only) + +Go to: `https:///apps//authorization` +Select: **On behalf of service principal** + +This setting persists across redeployments unless the app is deleted. + +## If the SP gets deleted + +The existing app cannot be recovered, because an app's SP cannot be replaced. Delete and recreate the app: + +```bash +databricks apps delete --profile +bash scripts/deploy.sh --env +``` + +Then redo steps 1–3 above with the new SP's application ID. + +## Test script + +`test-mcp-databricks-dev.sh` (repo root) sends raw MCP requests to the hosted app using your local Databricks OAuth token. + +```bash +# List tools +bash test-mcp-databricks-dev.sh + +# Run a SQL query +bash test-mcp-databricks-dev.sh noom-databricks-dev execute_sql "SELECT 1 AS test" + +# List warehouses +bash test-mcp-databricks-dev.sh noom-databricks-dev list_warehouses +``` + +The script uses `databricks auth token -p ` and passes it as `Authorization: Bearer ` to the MCP endpoint. + +--- + +## Environments + +| Env | Profile | Host | Warehouse | +|---|---|---|---| +| dev | `dev` | `noom-dev.cloud.databricks.com` | `12ce469e5394ac8b` | +| prod | `prod` | `noom-prod.cloud.databricks.com` | `575c0a43969584a4` | + +```bash +bash scripts/deploy.sh mcp-noom-dev --env dev +bash scripts/deploy.sh mcp-noom-prod --env prod +``` diff --git a/noom-mcp-server/hosting/__init__.py b/noom-mcp-server/hosting/__init__.py new file mode 100644 index 00000000..d20de5eb --- /dev/null +++ b/noom-mcp-server/hosting/__init__.py @@ -0,0 +1,3 @@ +# Databricks Apps hosting layer — HTTP serving infrastructure for the hosted deployment. 
+# This package is only used by run_app.py (the hosted entrypoint). +# Local development uses run.py (stdio) and does not import this package. diff --git a/noom-mcp-server/hosting/middleware.py b/noom-mcp-server/hosting/middleware.py new file mode 100644 index 00000000..558dbe4e --- /dev/null +++ b/noom-mcp-server/hosting/middleware.py @@ -0,0 +1,31 @@ +"""ASGI middleware for the hosted MCP server. + +IdentityMiddleware runs before every HTTP request. It extracts the +X-Forwarded-User header (set by the Databricks Apps OAuth proxy) and +populates the per-request ContextVar so sql_executor_patch can tag every +SQL statement with the calling user's identity. + +Non-HTTP ASGI scopes (lifespan, websocket) pass through unchanged so that +the FastMCP session manager starts and shuts down correctly. +""" + +import logging + +from hosting.request_identity import set_user_from_request + +logger = logging.getLogger(__name__) + + +class IdentityMiddleware: + """ASGI middleware: extract X-Forwarded-User and set per-request identity.""" + + def __init__(self, app) -> None: + self.app = app + + async def __call__(self, scope, receive, send) -> None: + if scope["type"] == "http": + headers = { + k.decode("latin-1"): v.decode("latin-1") for k, v in scope.get("headers", []) + } + set_user_from_request(headers) + await self.app(scope, receive, send) diff --git a/noom-mcp-server/hosting/request_identity.py b/noom-mcp-server/hosting/request_identity.py new file mode 100644 index 00000000..82a8b17d --- /dev/null +++ b/noom-mcp-server/hosting/request_identity.py @@ -0,0 +1,44 @@ +"""Per-request user identity for the hosted MCP server. + +In local mode, identity is resolved once at startup from the OAuth cache +(process-scoped: one user per process). + +In hosted mode, many users share one process. Identity arrives per-request +via the X-Forwarded-User header injected by the Databricks Apps auth proxy. 
+A ContextVar holds the current request's identity so sql_executor_patch can +read it without any cross-request leakage. + +Usage: + # In middleware, before the request handler runs: + set_user_from_request(headers) + + # In sql_executor_patch.get_mcp_user_identity(): + return get_current_mcp_user() +""" + +import contextvars +import logging + +logger = logging.getLogger(__name__) + +_current_user: contextvars.ContextVar[str] = contextvars.ContextVar("mcp_user", default="unknown") + + +def set_user_from_request(headers: dict) -> None: + """Extract X-Forwarded-User from headers and store in the ContextVar. + + Args: + headers: Dict of lowercase header names → values (as decoded strings). + """ + email = headers.get("x-forwarded-user", "unknown") + _current_user.set(email) + logger.debug("Request identity: %s", email) + + +def get_current_mcp_user() -> str: + """Return the identity set for the current request context. + + Returns 'unknown' if called outside a request context or if + X-Forwarded-User was absent from the request headers. + """ + return _current_user.get() diff --git a/noom-mcp-server/requirements-app.txt b/noom-mcp-server/requirements-app.txt new file mode 100644 index 00000000..f69ef518 --- /dev/null +++ b/noom-mcp-server/requirements-app.txt @@ -0,0 +1,99 @@ +# Dependencies for the hosted Databricks Apps deployment. +# Generated from: uv export --no-hashes --no-dev (excluding local editable packages) +# databricks-tools-core and databricks-mcp-server are bundled via PYTHONPATH in deploy.sh. 
+aiofile==3.9.0 ; python_full_version < '3.11' +aiofile==3.11.1 ; python_full_version >= '3.11' +annotated-types==0.7.0 +anyio==4.13.0 +attrs==26.1.0 +authlib==1.7.2 +backports-tarfile==1.2.0 ; python_full_version < '3.12' +beartype==0.22.9 +cachetools==7.1.3 +caio==0.9.25 +certifi==2026.5.20 +cffi==2.0.0 ; platform_python_implementation != 'PyPy' +chardet==7.4.3 +charset-normalizer==3.4.7 +click==8.3.3 +colorama==0.4.6 +cryptography==48.0.0 +cyclopts==4.14.1 +databricks-sdk==0.110.0 +diff-cover==10.2.0 +dnspython==2.8.0 +docstring-parser==0.18.0 +email-validator==2.3.0 +exceptiongroup==1.3.1 +fastmcp==3.3.1 +fastmcp-slim==3.3.1 +google-auth==2.53.0 +griffelib==2.0.2 +h11==0.16.0 +httpcore==1.0.9 +httpx==0.28.1 +httpx-sse==0.4.3 +idna==3.15 +importlib-metadata==9.0.0 ; python_full_version < '3.12' +jaraco-classes==3.4.0 +jaraco-context==6.1.2 +jaraco-functools==4.5.0 +jeepney==0.9.0 ; sys_platform == 'linux' +jinja2==3.1.6 +joserfc==1.6.5 +jsonref==1.1.0 +jsonschema==4.26.0 +jsonschema-path==0.5.0 +jsonschema-specifications==2025.9.1 +keyring==25.7.0 +markdown-it-py==4.2.0 +markupsafe==3.0.3 +mcp==1.27.1 +mdurl==0.1.2 +more-itertools==11.0.2 +openapi-pydantic==0.5.1 +opentelemetry-api==1.42.0 +packaging==26.2 +pathable==0.6.0 +pathspec==1.1.1 +platformdirs==4.9.6 +pluggy==1.6.0 +plutoprint==0.19.0 +protobuf==6.33.6 +py-key-value-aio==0.4.4 +pyasn1==0.6.3 +pyasn1-modules==0.4.2 +pycparser==3.0 ; implementation_name != 'PyPy' and platform_python_implementation != 'PyPy' +pydantic==2.13.4 +pydantic-core==2.46.4 +pydantic-settings==2.14.1 +pygments==2.20.0 +pyjwt==2.12.1 +pyperclip==1.11.0 +python-dotenv==1.2.2 +python-multipart==0.0.29 +pywin32==311 ; sys_platform == 'win32' +pywin32-ctypes==0.2.3 ; sys_platform == 'win32' +pyyaml==6.0.3 +referencing==0.37.0 +regex==2026.5.9 +requests==2.34.2 +rich==15.0.0 +rich-rst==2.0.1 +rpds-py==0.30.0 +secretstorage==3.5.0 ; sys_platform == 'linux' +sqlfluff==4.2.1 +sqlglot==30.8.0 +sse-starlette==3.4.4 +starlette==1.0.0 
+tblib==3.2.2 +tomli==2.4.1 ; python_full_version < '3.11' +tqdm==4.67.3 +typing-extensions==4.15.0 +typing-inspection==0.4.2 +uncalled-for==0.3.2 +urllib3==2.7.0 +uvicorn==0.47.0 +watchfiles==1.2.0 +websockets==16.0 +zipp==4.1.0 ; python_full_version < '3.12' diff --git a/noom-mcp-server/run_app.py b/noom-mcp-server/run_app.py new file mode 100644 index 00000000..b3ea5828 --- /dev/null +++ b/noom-mcp-server/run_app.py @@ -0,0 +1,79 @@ +"""Hosted entrypoint for Databricks Apps deployment. + +Exposes an ASGI 'app' object that uvicorn serves. Transport: Streamable HTTP +at POST /mcp. User identity arrives per-request via X-Forwarded-User injected +by the Databricks Apps OAuth proxy — no browser flow required. + +Local development uses run.py (stdio transport). This file is for the hosted +deployment only. + +Start (via app.yaml): + uvicorn run_app:app --host 0.0.0.0 --port $DATABRICKS_APP_PORT + +Required env vars (set in app.yaml): + DATABRICKS_APPS_HOSTED Set to "1" — signals hosted mode to patches. + DATABRICKS_HOST Workspace URL (used by SDK for identity/secrets). + DATABRICKS_MCP_SQL_HOST Workspace URL for SQL execution. + DATABRICKS_WAREHOUSE_ID Warehouse all queries are forced to run on. + PYTHONPATH Must include the packages/ directory. +""" + +import logging +import os +import sys + +# Signal hosted mode before any patch code loads. +os.environ["DATABRICKS_APPS_HOSTED"] = "1" + +logging.basicConfig( + level=logging.INFO, + format="%(asctime)s %(levelname)s %(name)s: %(message)s", + stream=sys.stderr, +) +logger = logging.getLogger("customization") + +# --------------------------------------------------------------------------- +# Step 1: Apply governance patches. 
+# --------------------------------------------------------------------------- + +from customization.patches import apply_all_patches, UpstreamChangedError # noqa: E402 + +try: + apply_all_patches() +except UpstreamChangedError as exc: + logger.error("UPSTREAM VERSION CHANGED — server will not start.\n%s", exc) + sys.exit(2) +except RuntimeError as exc: + logger.error("Governance check failed — server will not start: %s", exc) + sys.exit(1) + +# --------------------------------------------------------------------------- +# Step 2: Import upstream server (registers all tools). +# --------------------------------------------------------------------------- + +from databricks_mcp_server.server import mcp # noqa: E402 + +# --------------------------------------------------------------------------- +# Step 3: Apply tool allowlist and timeout ceiling. +# --------------------------------------------------------------------------- + +from customization.tool_allowlist_patch import apply_tool_allowlist # noqa: E402 +from customization.sql_timeout_patch import apply_sql_timeout_ceiling # noqa: E402 + +apply_tool_allowlist(mcp) +apply_sql_timeout_ceiling(mcp) + +# --------------------------------------------------------------------------- +# Step 4: Build the ASGI app. +# +# mcp.http_app() returns a Streamable HTTP ASGI app. IdentityMiddleware wraps +# it to extract X-Forwarded-User per request before any tool logic runs. +# Non-HTTP scopes (lifespan) pass through so FastMCP starts cleanly. 
+# --------------------------------------------------------------------------- + +from hosting.middleware import IdentityMiddleware # noqa: E402 + +mcp_asgi = mcp.http_app(path="/mcp", stateless_http=True) +app = IdentityMiddleware(mcp_asgi) + +logger.info("Noom MCP server (hosted) ready — POST /mcp") diff --git a/noom-mcp-server/scripts/deploy.sh b/noom-mcp-server/scripts/deploy.sh new file mode 100755 index 00000000..8ec9fd13 --- /dev/null +++ b/noom-mcp-server/scripts/deploy.sh @@ -0,0 +1,235 @@ +#!/bin/bash +# Deploy the Noom MCP server to Databricks Apps. +# +# Usage: +# bash scripts/deploy.sh --env ENV [--profile PROFILE] [--warehouse-id ID] +# +# Environments (--env): +# dev profile=dev, host=noom-dev.cloud.databricks.com, warehouse=12ce469e5394ac8b +# prod profile=prod, host=noom-prod.cloud.databricks.com, warehouse=575c0a43969584a4 +# +# --profile and --warehouse-id override the env defaults when provided. +# +# Example: +# bash scripts/deploy.sh mcp-noom-dev --env dev +# bash scripts/deploy.sh mcp-noom-prod --env prod --warehouse-id + +set -e + +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +BLUE='\033[0;34m' +NC='\033[0m' + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +PROJECT_DIR="$(dirname "$SCRIPT_DIR")" +REPO_ROOT="$(dirname "$PROJECT_DIR")" + +APP_NAME="" +ENV="" +PROFILE="" +WAREHOUSE_ID="" + +while [[ $# -gt 0 ]]; do + case $1 in + --env) ENV="$2"; shift 2 ;; + --profile) PROFILE="$2"; shift 2 ;; + --warehouse-id) WAREHOUSE_ID="$2"; shift 2 ;; + -*) echo -e "${RED}Unknown option: $1${NC}"; exit 1 ;; + *) + if [ -z "$APP_NAME" ]; then APP_NAME="$1"; else echo -e "${RED}Unexpected arg: $1${NC}"; exit 1; fi + shift ;; + esac +done + +if [ -z "$APP_NAME" ]; then + echo -e "${RED}Error: app name required.${NC}" + echo "Usage: bash scripts/deploy.sh --env ENV [--profile PROFILE] [--warehouse-id ID]" + exit 1 +fi + +if [ -z "$ENV" ]; then + echo -e "${RED}Error: --env is required (dev or prod).${NC}" + echo "Usage: bash 
scripts/deploy.sh --env ENV [--profile PROFILE] [--warehouse-id ID]" + exit 1 +fi + +# Apply env defaults, then let explicit flags override. +case "$ENV" in + dev) + PROFILE="${PROFILE:-dev}" + WAREHOUSE_ID="${WAREHOUSE_ID:-12ce469e5394ac8b}" + ;; + prod) + PROFILE="${PROFILE:-prod}" + WAREHOUSE_ID="${WAREHOUSE_ID:-575c0a43969584a4}" + ;; + *) + echo -e "${RED}Error: unknown env '${ENV}'. Valid values: dev, prod.${NC}" + exit 1 + ;; +esac + +CLI_ARGS="--profile $PROFILE" +STAGING_DIR="/tmp/${APP_NAME}-deploy" + +echo -e "${BLUE}╔══════════════════════════════════════════════╗${NC}" +echo -e "${BLUE}║ Noom MCP Server — Databricks Apps Deploy ║${NC}" +echo -e "${BLUE}╚══════════════════════════════════════════════╝${NC}" +echo "" +echo -e " App: ${GREEN}${APP_NAME}${NC}" +echo -e " Profile: ${PROFILE}" +echo -e " Warehouse: ${WAREHOUSE_ID}" +echo "" + +# ── Step 1: Auth check ────────────────────────────────────────────────────── +echo -e "${YELLOW}[1/6] Checking auth...${NC}" +if ! databricks auth describe $CLI_ARGS &>/dev/null; then + echo -e "${RED}Not authenticated. 
Run: databricks auth login --profile ${PROFILE}${NC}"; exit 1 +fi + +WORKSPACE_HOST=$(databricks auth describe $CLI_ARGS 2>/dev/null | grep "^Host:" | awk '{print $2}') +CURRENT_USER=$(databricks current-user me $CLI_ARGS --output json 2>/dev/null \ + | python3 -c "import sys,json; print(json.load(sys.stdin).get('userName',''))") + +echo -e " Workspace: ${WORKSPACE_HOST}" +echo -e " User: ${CURRENT_USER}" + +WORKSPACE_PATH="/Workspace/Users/${CURRENT_USER}/apps/${APP_NAME}" +echo -e " Deploy to: ${WORKSPACE_PATH}" +echo "" + +# ── Step 2: Staging ───────────────────────────────────────────────────────── +echo -e "${YELLOW}[2/6] Staging deployment package...${NC}" +rm -rf "$STAGING_DIR" && mkdir -p "$STAGING_DIR" + +# Noom's customization and hosting layers +cp -r "$PROJECT_DIR/customization" "$STAGING_DIR/" +cp -r "$PROJECT_DIR/hosting" "$STAGING_DIR/" +cp "$PROJECT_DIR/run_app.py" "$STAGING_DIR/" +cp "$PROJECT_DIR/requirements-app.txt" "$STAGING_DIR/requirements.txt" + +# Bundle upstream packages so the App can import them via PYTHONPATH +echo " Bundling databricks-tools-core..." +mkdir -p "$STAGING_DIR/packages/databricks_tools_core" +cp -r "$REPO_ROOT/databricks-tools-core/databricks_tools_core/"* \ + "$STAGING_DIR/packages/databricks_tools_core/" + +echo " Bundling databricks-mcp-server..." 
+mkdir -p "$STAGING_DIR/packages/databricks_mcp_server" +cp -r "$REPO_ROOT/databricks-mcp-server/databricks_mcp_server/"* \ + "$STAGING_DIR/packages/databricks_mcp_server/" + +# VERSION file — identity.py walks upward from packages/databricks_tools_core/ looking for it +cp "$REPO_ROOT/VERSION" "$STAGING_DIR/" + +# Strip pyc files +find "$STAGING_DIR" -name "__pycache__" -type d -exec rm -rf {} + 2>/dev/null || true +find "$STAGING_DIR" -name "*.pyc" -delete 2>/dev/null || true + +# Generate app.yaml +cat > "$STAGING_DIR/app.yaml" << APPYAML +command: + - uvicorn + - run_app:app + - --host + - "0.0.0.0" + - --port + - "\$DATABRICKS_APP_PORT" + +env: + - name: DATABRICKS_APPS_HOSTED + value: "1" + - name: DATABRICKS_HOST + value: "${WORKSPACE_HOST}" + - name: DATABRICKS_MCP_SQL_HOST + value: "${WORKSPACE_HOST}" + - name: DATABRICKS_WAREHOUSE_ID + value: "${WAREHOUSE_ID}" + - name: PYTHONPATH + value: "/app/python/source_code/packages" +APPYAML + +echo -e " ${GREEN}✓${NC} Staged to ${STAGING_DIR}" +echo "" + +# ── Step 3: Create app ─────────────────────────────────────────────────────── +echo -e "${YELLOW}[3/6] Ensuring app exists...${NC}" +if databricks apps get "$APP_NAME" $CLI_ARGS &>/dev/null; then + echo -e " ${GREEN}✓${NC} App '${APP_NAME}' already exists" +else + echo " Creating '${APP_NAME}'..." 
+ databricks apps create "$APP_NAME" $CLI_ARGS + echo -e " ${GREEN}✓${NC} Created" +fi +echo "" + +# ── Step 4: Upload ─────────────────────────────────────────────────────────── +echo -e "${YELLOW}[4/6] Uploading to workspace...${NC}" +databricks workspace import-dir "$STAGING_DIR" "$WORKSPACE_PATH" --overwrite $CLI_ARGS +echo -e " ${GREEN}✓${NC} Uploaded" +echo "" + +# ── Step 5: Deploy ─────────────────────────────────────────────────────────── +echo -e "${YELLOW}[5/6] Deploying app...${NC}" +DEPLOY_OUT=$(databricks apps deploy "$APP_NAME" --source-code-path "$WORKSPACE_PATH" $CLI_ARGS 2>&1) +echo "$DEPLOY_OUT" + +APP_URL=$(databricks apps get "$APP_NAME" $CLI_ARGS --output json 2>/dev/null \ + | python3 -c "import sys,json; print(json.load(sys.stdin).get('url',''))") + +if echo "$DEPLOY_OUT" | python3 -c "import sys,json; d=json.load(sys.stdin); sys.exit(0 if d.get('status',{}).get('state')=='SUCCEEDED' else 1)" 2>/dev/null; then + echo "" + echo -e "${GREEN}╔══════════════════════════════════════════════╗${NC}" + echo -e "${GREEN}║ Deployment successful! ║${NC}" + echo -e "${GREEN}╚══════════════════════════════════════════════╝${NC}" + echo "" + echo -e " App URL: ${GREEN}${APP_URL}${NC}" + echo -e " MCP endpoint: ${GREEN}${APP_URL}/mcp${NC}" + echo "" +else + echo "" + echo -e "${RED}Deployment may have failed. Check logs:${NC}" + echo -e " databricks apps logs ${APP_NAME} ${CLI_ARGS}" + exit 1 +fi + +# ── Step 6: Grant permissions to app SP ────────────────────────────────────── +echo -e "${YELLOW}[6/6] Granting permissions to app SP...${NC}" + +# SP is assigned asynchronously after first deploy — retry for up to 30s. 
+APP_SP="" +APP_SP_NAME="" +for i in $(seq 1 6); do + APP_INFO=$(databricks apps get "$APP_NAME" $CLI_ARGS --output json 2>/dev/null) + APP_SP=$(echo "$APP_INFO" | python3 -c "import sys,json; d=json.load(sys.stdin); v=d.get('service_principal_client_id',''); print(v if v else '')") + APP_SP_NAME=$(echo "$APP_INFO" | python3 -c "import sys,json; print(json.load(sys.stdin).get('service_principal_name',''))") + [ -n "$APP_SP" ] && break + echo " Waiting for SP assignment (attempt $i/6)..." + sleep 5 +done + +echo "" +echo -e " App service principal: ${GREEN}${APP_SP_NAME}${NC}" +echo -e " SP application ID: ${GREEN}${APP_SP}${NC}" +echo "" + +if [ -z "$APP_SP" ]; then + echo -e "${RED}Could not determine app SP after retries. Grant manually:${NC}" + # put-acl takes SCOPE PRINCIPAL PERMISSION; the principal is the app SP's application (client) ID. + echo -e " databricks secrets put-acl dbrix_mcp_secret <app-sp-application-id> READ --profile ${PROFILE}" +else + echo " Granting READ on secret scope 'dbrix_mcp_secret'..." + databricks secrets put-acl dbrix_mcp_secret "$APP_SP" READ $CLI_ARGS + echo -e " ${GREEN}✓${NC} Secret scope access granted" + + echo " Setting authorization mode to 'on behalf of service principal' (user_api_scopes: sql)..." 
+ TOKEN=$(databricks auth token $CLI_ARGS --output json | python3 -c "import sys,json; print(json.load(sys.stdin)['access_token'])") + # -f makes curl exit non-zero on HTTP errors, so success is reported only if the PATCH was accepted. + if curl -sf -X PATCH "${WORKSPACE_HOST}/api/2.0/apps/${APP_NAME}" \ + -H "Authorization: Bearer ${TOKEN}" \ + -H "Content-Type: application/json" \ + -d '{"user_api_scopes": ["sql"]}' > /dev/null; then + echo -e " ${GREEN}✓${NC} Authorization mode set to 'on behalf of service principal'" + else + echo -e " ${RED}Could not set user_api_scopes; update the app's authorization settings manually.${NC}" + fi +fi + +echo "" diff --git a/test-mcp-databricks-dev.sh b/test-mcp-databricks-dev.sh new file mode 100755 index 00000000..1a9ab573 --- /dev/null +++ b/test-mcp-databricks-dev.sh @@ -0,0 +1,42 @@ +#!/usr/bin/env bash +set -euo pipefail + +PROFILE="${1:-noom-databricks-dev}" +METHOD="${2:-tools/list}" +SQL="${3:-SELECT COUNT(*) AS row_count FROM common_data.dim_date}" +MCP_URL="https://mcp-noom-dev-638571477831686.aws.databricksapps.com/mcp" + +TOKEN=$(databricks auth token -p "$PROFILE" --output json | jq -r .access_token) + +PAYLOAD=$(python3 -c " +import json, sys +method, sql = sys.argv[1], sys.argv[2] +if method == 'initialize': + body = {'jsonrpc':'2.0','id':1,'method':'initialize','params':{'protocolVersion':'2024-11-05','capabilities':{},'clientInfo':{'name':'test','version':'1.0'}}} +elif method == 'tools/list': + body = {'jsonrpc':'2.0','id':2,'method':'tools/list','params':{}} +elif method == 'execute_sql': + body = {'jsonrpc':'2.0','id':3,'method':'tools/call','params':{'name':'execute_sql','arguments':{'sql_query':sql}}} +elif method == 'list_warehouses': + body = {'jsonrpc':'2.0','id':4,'method':'tools/call','params':{'name':'manage_warehouse','arguments':{'action':'list'}}} +else: + print(f'Unknown method: {method}', file=sys.stderr) + sys.exit(1) +print(json.dumps(body)) +" "$METHOD" "$SQL") + +echo ">>> Request: $PAYLOAD" +echo "" + +RESPONSE=$(curl -s -w "\n\nHTTP_STATUS:%{http_code}" -X POST "$MCP_URL" \ + -H "Content-Type: application/json" \ + -H "Accept: application/json, text/event-stream" \ + -H "Authorization: Bearer $TOKEN" \ + --data 
"$PAYLOAD") + +HTTP_STATUS=$(echo "$RESPONSE" | grep "HTTP_STATUS:" | cut -d: -f2) +# Extract JSON from SSE data: lines format +BODY=$(echo "$RESPONSE" | grep "^data:" | sed 's/^data: //' | head -1) + +echo "<<< HTTP Status: $HTTP_STATUS" +echo "$BODY" | python3 -m json.tool 2>/dev/null || echo "<<< Raw: $BODY"