
CoCo v2 -- AI Cohort Copilot for Healthcare RWD

Author: debu-sinha (debusinha2009@gmail.com)

CoCo is a natural-language cohort-building assistant for real-world healthcare data. Ask for a cohort ("Type 2 diabetes patients on metformin with recent labs") and CoCo identifies clinical codes, generates and validates SQL against a Databricks warehouse, executes the query, and synthesizes a response with sample rows and clinical context.

This repo is a turnkey reference implementation. Clone it, deploy the bundle, run the setup job, and you'll have a working Cohort Copilot in your own Databricks workspace in about 30 minutes. Multiple users can deploy to the same workspace without collisions. Each user gets their own namespaced resources.

Quick start

Prerequisites

You'll need these before you start:

  • A Databricks workspace with Unity Catalog and Model Serving
  • An existing serverless SQL warehouse (the setup job doesn't create one). To find the warehouse ID: SQL -> SQL Warehouses -> click your warehouse -> copy the 16-char hex ID from the URL (e.g. https://<workspace>/sql/warehouses/abc123def4567890 -> warehouse_id is abc123def4567890). If you don't have one, ask your workspace admin to create a serverless warehouse (Small is enough for cohort queries).
  • An existing Unity Catalog catalog where you have CREATE SCHEMA permission. The setup job can't create catalogs on Default Storage workspaces. Run the preflight script (below) to find one you can use.
  • A databricks-claude-sonnet-4-6 (or equivalent) FMAPI endpoint available in your workspace
  • Databricks CLI installed and a profile configured (databricks auth login)
  • Python 3.10+ locally
  • MLflow Managed Prompt Registry enabled (Preview feature). The agent loads prompt instructions from the registry, and the optimizer writes tuned prompts back. Enable it under Settings > Preview features > "Managed MLflow Prompt Registry".

Note on OBO / user authorization: CoCo does not use on-behalf-of user tokens. Every data access is made by the app's service principal. Its permissions come from typed resource bindings (SQL warehouse CAN_USE, agent endpoint CAN_QUERY, Lakebase CAN_CONNECT_AND_CREATE). Earlier versions required the Apps OBO preview feature. The current version does not. The app deploys cleanly on workspaces without that flag enabled.

Optional (the setup job creates these if you have permission):

  • Lakebase instance (for chat session persistence)
  • Vector Search endpoint (for clinical knowledge RAG)

Step 0: Preflight check (MANDATORY - every deployer runs this)

Who runs it: every person who will deploy CoCo - you, your team, your platform admins. Run it against your own CLI profile so it probes your permissions. Admin-only preflights are not enough. The setup job runs as the deployer's identity, so the deployer is the only one whose permissions matter.

When to run it: after you have the CLI profile configured (databricks auth login) and before any databricks bundle command. It takes about 60 seconds.

python scripts/preflight_check.py \
  -p PROFILE \
  --warehouse-id WH_ID \
  --catalog CATALOG \
  --unique-id YOUR_ID

The script actively exercises each permission rather than just listing APIs. It probes:

  • authentication + warehouse access + Claude endpoint
  • CAN_QUERY on the Claude endpoint (1-token smoke test)
  • CREATE SCHEMA on your catalog (creates + drops a probe schema - see the sketch after this list)
  • MLflow Managed Prompt Registry preview flag (registers + deletes a probe prompt)
  • config/default.yaml llm.endpoint matches what is actually deployed (catches a claude-sonnet-4-5 vs 4-6 endpoint-name mismatch)
  • Lakebase, Vector Search, Databricks Apps API reachability
  • Existing coco resources that would collide with your unique_id
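
For illustration, here is a minimal sketch of the CREATE SCHEMA probe, assuming the databricks-sdk package; the real checks live in scripts/preflight_check.py and are more thorough:

```python
# Hypothetical sketch of the CREATE SCHEMA probe; the real implementation
# lives in scripts/preflight_check.py. Assumes databricks-sdk is installed.
import uuid

from databricks.sdk import WorkspaceClient


def probe_create_schema(profile: str, catalog: str) -> bool:
    """Exercise CREATE SCHEMA by creating and immediately dropping a probe schema."""
    w = WorkspaceClient(profile=profile)
    probe = f"coco_preflight_{uuid.uuid4().hex[:8]}"
    try:
        w.schemas.create(name=probe, catalog_name=catalog)
        w.schemas.delete(full_name=f"{catalog}.{probe}")
        return True
    except Exception as exc:  # permission errors surface here
        print(f"FAIL: CREATE SCHEMA on {catalog}: {exc}")
        return False
```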

What to do based on the result:

| Summary line | What to do |
|---|---|
| Results: N passed, 0 failed, 0 warnings | Proceed to Step 1 below. |
| ... 0 failed, 1+ warnings | Read each WARN line. Most are handled gracefully by setup, but verify before running a 25-minute job. |
| ... 1+ failed | Stop. Each FAIL line has a specific fix hint. If it is a permission issue, send docs/PERMISSIONS.md to your admin and re-run preflight after the grant. |

If no catalog passes the CREATE SCHEMA probe:

  • Ask your workspace admin to run: GRANT USE CATALOG, CREATE SCHEMA ON CATALOG <catalog> TO `<your-user@example.com>`
  • Or create a catalog you own: UI -> Catalog -> + -> Create a new catalog (not available on all workspaces - Default Storage workspaces block this for non-admins)

Deploy in 3 commands

# 1. Clone
git clone https://github.com/debu-sinha/coco-reference.git
cd coco-reference

# 2. Deploy the bundle
#    YOUR_ID   = your initials or short name (namespaces all resources)
#    WH_ID     = your serverless SQL warehouse id (hex string)
#    CATALOG   = Unity Catalog name you have CREATE SCHEMA access on
#    PROFILE   = your Databricks CLI profile name
databricks bundle deploy -t demo -p PROFILE \
  --var unique_id=YOUR_ID \
  --var warehouse_id=WH_ID \
  --var catalog=CATALOG

# 3. Run the setup job (provisions everything: ~25-40 min)
databricks bundle run setup_workspace -t demo -p PROFILE \
  --var unique_id=YOUR_ID \
  --var warehouse_id=WH_ID \
  --var catalog=CATALOG

When the job finishes, it'll print the app URL. Open it in a browser, create a thread, and ask a cohort question.

What the setup job provisions

All resources are namespaced by unique_id so multiple users can deploy to the same workspace:

| Resource | Name pattern | What it does |
|---|---|---|
| UC schema | cohort_builder_<id> | Tables, volumes, registered model (setup Step 2) |
| Delta tables | patients, diagnoses, prescriptions, procedures, claims, suppliers | 10k synthetic patients via src/coco/data_generator/ with clinically realistic correlations |
| Vector Search endpoint | coco-vs-<id> | Per-user; hosts the Delta Sync index (docs) |
| Vector Search index | coco_knowledge_idx | Hybrid BM25 + BGE embeddings on knowledge_chunks (setup Step 5) |
| Lakebase instance | coco-lb-<id> | Managed Postgres for session state (Lakebase docs) |
| Lakebase database | coco | Schema coco_sessions with threads, messages, runs, feedback tables (schema) |
| MLflow Prompt Registry | <catalog>.cohort_builder_<id>.{cohort_query,clinical_codes,sql_generator,response_synthesizer} | 3-part UC names with @production alias (loader) |
| Model Serving endpoint | coco-agent-<id> | dspy.ReAct agent via Mosaic AI Agent Framework (deploy) |
| UC registered model | <catalog>.cohort_builder_<id>.coco_agent_<id> | Versioned agent model (deploy.py) |
| Databricks App | coco-<id> | FastAPI + HTMX chat UI, SP-only auth via X-Forwarded-Email (auth.py) |
| MLflow experiment | /Users/<email>/coco-agent | Per-user traces, runs, model artifacts. No fallback: setup errors loudly if COCO_MLFLOW_EXPERIMENT is unset, so traces can never silently land in a shared experiment. |

Minimal mode

Pass --var minimal=true to bundle run setup_workspace and the Vector Search index creation step is skipped. The VS endpoint is still provisioned (cheap, idempotent), and Lakebase still runs. The agent deploys either way - its retrieve_knowledge tool just returns empty results when the index isn't populated.

databricks bundle run setup_workspace -t demo -p PROFILE \
  --var unique_id=YOUR_ID --var warehouse_id=WH_ID --var catalog=CATALOG \
  --var minimal=true

Use this for quick learning deploys, workspaces where VS index creation is slow or restricted, or cost-sensitive test runs. Full Lakebase-skip (for workspaces without Lakebase) is planned but not tested yet - see CHANGELOG.md.

Multi-user isolation

Two users deploying to the same workspace with different unique_id values get completely separate resources:

# User A (initials: al)
databricks bundle deploy -t demo -p PROFILE --var unique_id=al --var warehouse_id=WH_ID --var catalog=my_catalog

# User B (initials: mj)
databricks bundle deploy -t demo -p PROFILE --var unique_id=mj --var warehouse_id=WH_ID --var catalog=my_catalog

User A gets coco-agent-al, coco-al app, cohort_builder_al schema. User B gets coco-agent-mj, coco-mj app, cohort_builder_mj schema. Zero collisions. This works through the entire pipeline including agent deployment, model registration, and app creation.

Teardown

To tear down everything your user created (app, serving endpoint, Lakebase instance, UC schema with tables and volumes, prompts, registered model, per-user MLflow experiment):

databricks bundle run teardown_workspace -t demo -p PROFILE \
  --var unique_id=YOUR_ID \
  --var catalog=CATALOG

Shared resources stay intact by default. The teardown never drops the UC catalog or the shared Vector Search endpoint unless you explicitly pass --var delete_catalog=YES or --var delete_vs_endpoint=YES. Two users running teardown at the same time touch disjoint namespaced resources and cannot interfere with each other. The script is idempotent, so a partial teardown is safe to re-run.

Cleaning up multiple deploys at once. If you deployed with several unique_id values (for example, dev, then later ds), one teardown invocation only handles the namespace matching the current vars. Flip the notebook widget scan_all_my_deploys to YES to scan the workspace for every CoCo app, serving endpoint, Lakebase instance, and cohort_builder_* schema whose creator/owner is the current user. Matches are merged with the widget-driven list and torn down together. Run it from the notebook UI (the CLI bundle run command uses the bundle's base_parameters, which keep scan_all_my_deploys=NO).
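
To illustrate what the scan does, here is a hedged sketch (field names are assumptions; the actual logic lives in notebooks/99_teardown.py):

```python
# Hedged sketch of scan_all_my_deploys; the real logic is in notebooks/99_teardown.py.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient(profile="PROFILE")
me = w.current_user.me().user_name

# CoCo serving endpoints created by the current user.
my_endpoints = [
    ep.name for ep in w.serving_endpoints.list()
    if ep.name and ep.name.startswith("coco-agent-") and ep.creator == me
]

# cohort_builder_* schemas the current user owns in the target catalog.
my_schemas = [
    s.full_name for s in w.schemas.list(catalog_name="CATALOG")
    if s.name.startswith("cohort_builder_") and s.owner == me
]

print("Would tear down:", my_endpoints + my_schemas)
```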

Troubleshooting

| Problem | Cause | Fix |
|---|---|---|
| Only serverless compute is supported | Workspace is serverless-only | Already handled (jobs use serverless environments) |
| Metastore storage root URL doesn't exist | Default Storage workspace | Pass --var catalog=<existing> with a catalog you already have access to |
| Cannot create Lakebase instance | No Lakebase permission | Ask an admin to create the instance, then pass --var lakebase_instance=<name>. CU_1 is plenty. |
| token passthrough feature is not enabled | Apps OBO not turned on | Workspace admin: Settings > Preview features > enable "User authorization for Databricks Apps" |
| typing_extensions / deprecated import error | Serverless sys.path issue | Already handled in the notebook. If you see it, redeploy the bundle to pick up the fix. |
| cannot import name 'agents' from 'databricks' | Serverless namespace collision | Already handled. The post-restart cell extends databricks.__path__. |
| AiGatewayConfig import error | Old system databricks-sdk | Already handled. The sys.path fix prioritizes pip-installed packages. |
| Bad model name: please specify all three levels | Config env vars not set | Already handled. The notebook sets COCO_* env vars from widget values before deploy. |
| Agent returns empty results | Config missing warehouse_id | The deploy resolves config at log_model time. Re-deploy the agent. |
| error downloading Terraform: openpgp: key expired on databricks bundle deploy | An older Databricks CLI ships an expired GPG signing key for its embedded Terraform download | Install Terraform 1.9.8 locally (download the terraform_1.9.8_<os>_<arch>.zip for your platform from releases.hashicorp.com) and point the bundle at the binary via DATABRICKS_TF_EXEC_PATH |

For the full permissions checklist, see docs/PERMISSIONS.md.

Known issues

Things that work today but might bite you in specific conditions. Check CHANGELOG.md for the list of what's validated end-to-end.

  • Lakebase credential rotation. Tokens are minted on demand with a ~1h TTL and the pool rotates at 55 minutes. If Databricks changes the TTL, the pool will hit auth errors on the next query. We don't poll or log credential health. See src/coco/app/sessions/lakebase.py.
  • Prompt Registry flag can be disabled after deploy. The preflight script catches it before setup, but nothing checks it at request time. If an admin flips the flag off post-deploy, load_prompt falls back to bundled DEFAULTS silently (a sketch of this fallback follows the list). See src/coco/agent/prompts/__init__.py.
  • Agent endpoint cold starts. Scale-to-zero is on by default. First request after idle can take 30-60 seconds while the container warms. The app doesn't queue or show a dedicated "waking up" state - the user sees the standard "agent is thinking" spinner.
  • LLM-as-judge scorers use asyncio.run. response_relevance_scorer and phi_leak_scorer in src/coco/observability/scorers.py spin up an event loop per call. They worked in our validation environment, but treat them as lower-confidence than the code scorers.
  • Vector Search index takes a few minutes to go live after setup_workspace creates it. The agent's retrieve_knowledge tool returns empty results until the index finishes syncing. Setup waits but does not gate on index-ready.
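
The fallback behavior looks roughly like this (a hedged sketch assuming MLflow 3.x; names differ in the actual src/coco/agent/prompts/__init__.py):

```python
# Hedged sketch of the silent DEFAULTS fallback; the real loader is
# src/coco/agent/prompts/__init__.py and its names differ.
import mlflow.genai

DEFAULTS = {"cohort_query": "You are a cohort-building assistant..."}  # bundled defaults


def load_prompt(name: str, uc_prefix: str) -> str:
    try:
        # e.g. prompts:/<catalog>.cohort_builder_<id>.cohort_query@production
        return mlflow.genai.load_prompt(f"prompts:/{uc_prefix}.{name}@production").template
    except Exception:
        # Registry unreachable or the preview flag was flipped off:
        # fall back silently -- nothing logs the downgrade (known issue above).
        return DEFAULTS[name]
```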

Architecture

CoCo request flow

The agent uses dspy.ReAct with native tool calling. Claude decides which tools to call (inspect_schema, identify_clinical_codes, generate_sql, execute_sql, retrieve_knowledge) based on tool definitions derived from Python function docstrings. No keyword-matched planner, no separate planning prompt. The model IS the planner. Every tool call is decorated with @mlflow.trace for full observability.
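
In sketch form (assuming dspy and mlflow; the real agent in src/coco/agent/responses_agent.py differs in detail):

```python
# Minimal sketch of the agent wiring -- an assumption-laden reduction of
# src/coco/agent/responses_agent.py, not the actual implementation.
import dspy
import mlflow


@mlflow.trace
def execute_sql(query: str) -> str:
    """Run a validated read-only SQL query against the warehouse and return sample rows."""
    return "stub: rows come from the Statement Execution API client in src/coco/sql/"


@mlflow.trace
def retrieve_knowledge(topic: str) -> str:
    """Look up clinical context in the Vector Search index."""
    return "stub: chunks come from the coco_knowledge_idx index"


# Claude derives the tool schema from the function docstrings above;
# there is no separate planning prompt.
agent = dspy.ReAct("question -> answer", tools=[execute_sql, retrieve_knowledge])
```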

For the detailed architecture writeup with gotchas and code pointers, see docs/design/apps-mosaic-ai-agent-reference.md.

Prompt management and optimization

CoCo uses MLflow Prompt Registry for all DSPy signature instructions. The setup notebook registers 4 default prompts under 3-part UC names: <catalog>.cohort_builder_<id>.{cohort_query,clinical_codes,sql_generator,response_synthesizer}. The agent loads them at runtime via the @production alias, so you can update prompts without redeploying.

The automatic optimization loop:

User asks a cohort question
  -> Agent answers (prompt loaded from MLflow Prompt Registry at @production)
  -> User clicks thumbs up or thumbs down (stored in Lakebase)
  -> Weekly job (notebooks/03_optimize_dspy.py) queries thumbs-up pairs
  -> mlflow.genai.optimize_prompts with GepaPromptOptimizer runs evolutionary
     search over instruction variations against a Correctness scorer
  -> New prompt version registered, production alias flipped to it
  -> Agent picks up the new prompt on the next request (no redeploy)

To enable auto-optimization: collect 2+ thumbs-up interactions (the workshop-demo default) or 10+ for production use, then unpause the CoCo DSPy Optimization job in the Workflows UI. It runs weekly on Sunday at 2am UTC. Override min_examples via the job widget to raise the threshold for production.

Always gate promotions on evaluation. GEPA can overfit on small feedback samples and regress on held-out scenarios. The recommended flow is: run the optimize job, run the eval job against the new version, compare metrics against the baseline, and only keep the @production alias on the new version if metrics improved. To roll back, call mlflow.genai.set_prompt_alias(name=..., version=<prev>, alias="production"). The alias flip takes effect on the next request, no redeploy.

To manually edit a prompt: go to the MLflow Prompt Registry in the Databricks UI, find <catalog>.cohort_builder_<id>.cohort_query, and create a new version with the production alias.
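
The same flow from code looks roughly like this (a hedged sketch assuming MLflow 3.x; the catalog/schema names are placeholders):

```python
# Hedged sketch: register an edited prompt version and flip @production.
# Catalog/schema names below are placeholders.
import mlflow.genai

name = "main.cohort_builder_al.cohort_query"

new_version = mlflow.genai.register_prompt(
    name=name,
    template="You are a cohort-building assistant. Question: {{question}}",
)

# The agent picks this up on the next request -- no redeploy.
mlflow.genai.set_prompt_alias(name=name, alias="production", version=new_version.version)
```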

Running tests locally

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

pytest -m unit                  # fast, no external deps (~65 tests)
pytest -m integration           # mocked Databricks services
ruff check src tests
ruff format src tests

Running the app locally (mocked backend)

export COCO_CONFIG_PATH=config/default.yaml
export DATABRICKS_HOST=https://example.cloud.databricks.com
export COCO_WAREHOUSE_ID=your_warehouse_id
export COCO_CATALOG_NAME=coco_demo
export COCO_SCHEMA_NAME=cohort_builder
export COCO_AGENT_ENDPOINT_NAME=coco-agent
cd src/coco/app && uvicorn main:app --reload

The app should come up at http://localhost:8000. SSE streaming won't work without a deployed agent endpoint (COCO_AGENT_ENDPOINT_URL).

Configuration

config/default.yaml is the single config file. Environment variables are interpolated via ${VAR} tokens (a minimal interpolation sketch follows the table). Key env vars:

| Env var | What it does | Set by |
|---|---|---|
| DATABRICKS_HOST | Workspace URL | Auto-injected in Databricks Apps |
| COCO_WAREHOUSE_ID | SQL warehouse for queries | App resource binding |
| COCO_AGENT_ENDPOINT_URL | Agent serving endpoint URL | App resource binding |
| COCO_AGENT_ENDPOINT_NAME | Agent serving endpoint name | Setup notebook widget |
| COCO_CATALOG_NAME | Unity Catalog name | App env var / notebook widget |
| COCO_SCHEMA_NAME | Schema name | App env var / notebook widget |
| COCO_LAKEBASE_INSTANCE | Lakebase instance name | Setup notebook widget |
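
A minimal sketch of how the ${VAR} interpolation could work (an assumption about src/coco/config.py, not its actual code):

```python
# Assumed sketch of ${VAR} interpolation; the actual loader is src/coco/config.py.
import os
import re

import yaml

_VAR = re.compile(r"\$\{(\w+)\}")


def load_config(path: str = "config/default.yaml") -> dict:
    raw = open(path).read()
    # Substitute each ${VAR} token with the env var's value (empty string if unset).
    return yaml.safe_load(_VAR.sub(lambda m: os.environ.get(m.group(1), ""), raw))
```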

Key docs

| Doc | What it covers |
|---|---|
| docs/PERMISSIONS.md | Every permission needed for an end-to-end deploy |
| docs/design/apps-mosaic-ai-agent-reference.md | Full architecture with gotchas (Lakebase, token rotation, code_paths, planner) |
| docs/examples/ | Working Python snippets for calling FMAPI from DSPy |
| docs/cost-attribution/ | Cost tracking queries, tagging policy, warehouse setup template |
| docs/WORKSHOP_PREREQS.md | Workshop-specific prerequisites checklist |

Knowledge base: demo vs production

The markdown files in src/coco/knowledge/ make up the demo knowledge base that gets chunked, embedded, and indexed in Vector Search during setup (a chunking sketch follows). The agent's retrieve_knowledge tool queries this index.
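
A hypothetical sketch of the chunking step (chunk size and ID scheme are assumptions; the real pipeline runs inside notebooks/00_setup_workspace.py):

```python
# Hypothetical chunker for the demo knowledge base; sizes and keys are assumptions.
from pathlib import Path


def chunk_markdown(doc_dir: str = "src/coco/knowledge", max_chars: int = 1500):
    """Yield (chunk_id, source, text) rows destined for the knowledge_chunks table."""
    for md in sorted(Path(doc_dir).glob("*.md")):
        chunk, idx = "", 0
        # Naive split on paragraph boundaries, packed up to max_chars per chunk.
        for para in md.read_text().split("\n\n"):
            if chunk and len(chunk) + len(para) > max_chars:
                yield f"{md.stem}-{idx}", md.name, chunk.strip()
                chunk, idx = "", idx + 1
            chunk += para + "\n\n"
        if chunk.strip():
            yield f"{md.stem}-{idx}", md.name, chunk.strip()
```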

This is a demo shortcut, not production practice. In production:

| Concern | Demo approach | Production approach |
|---|---|---|
| Schema knowledge | Hand-written markdown per table | Auto-generate from UC column comments on a schedule |
| Clinical rules | Prose in clinical_rules.md | A clinical_reference table in UC that the clinical team maintains |
| Unstructured docs | Static files in git | DLT pipeline from source systems, Delta table with CDC, auto-syncing VS index |
| Multi-domain | One set of files for one schema | Each domain gets its own UC schema + VS index |

The right long-term pattern:

  1. Put knowledge in UC column comments, not separate files:

    COMMENT ON COLUMN diagnoses.icd10_code IS 'ICD-10-CM code. E11.* = Type 2 diabetes.';

    The agent's inspect_schema tool already reads column metadata.

  2. Use a reference table for clinical rules, not markdown:

    CREATE TABLE clinical_reference (
        condition STRING, icd10_pattern STRING, first_line_drugs ARRAY<STRING>,
        source STRING, last_reviewed DATE, reviewed_by STRING
    );
  3. Use a managed VS pipeline for unstructured docs (a hedged SDK sketch follows):

    Source docs (PDFs, wikis) -> DLT pipeline -> Delta table (CDC) -> auto-syncing VS index
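
Creating such an auto-syncing index via the SDK looks roughly like this (a hedged sketch; all table, endpoint, and embedding-endpoint names are placeholders, not CoCo's actual setup code):

```python
# Hedged sketch of creating a Delta Sync index with databricks-sdk;
# every name here is a placeholder.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.vectorsearch import (
    DeltaSyncVectorIndexSpecRequest,
    EmbeddingSourceColumn,
    PipelineType,
    VectorIndexType,
)

w = WorkspaceClient(profile="PROFILE")
w.vector_search_indexes.create_index(
    name="my_catalog.my_schema.docs_idx",
    endpoint_name="my-vs-endpoint",
    primary_key="chunk_id",
    index_type=VectorIndexType.DELTA_SYNC,
    delta_sync_index_spec=DeltaSyncVectorIndexSpecRequest(
        source_table="my_catalog.my_schema.doc_chunks",
        pipeline_type=PipelineType.TRIGGERED,
        embedding_source_columns=[
            EmbeddingSourceColumn(
                name="chunk_text",
                embedding_model_endpoint_name="databricks-bge-large-en",
            )
        ],
    ),
)
```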
    

Extending CoCo

After the initial deploy, Claude Code is the recommended path for day-2 changes. Extension points:

  • Tools: add a function to src/coco/agent/responses_agent.py and pass it to dspy.ReAct(tools=[...])
  • Prompts: edit in MLflow Prompt Registry UI (no redeploy needed) or update defaults in src/coco/agent/prompts/__init__.py
  • DSPy signatures: edit src/coco/agent/signatures.py for field changes, run notebooks/03_optimize_dspy.py for GEPA instruction tuning against thumbs-up feedback
  • Guardrails: add schemas to config.guardrails.allowed_schemas in config/default.yaml (a sketch of the check follows this list)
  • UI: edit templates in src/coco/app/templates/, CSS in src/coco/app/static/styles.css
  • Cost tracking: customize queries in docs/cost-attribution/queries/
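
As a sketch of the kind of check involved (the real validation lives in src/coco/agent/guardrails.py and is more thorough):

```python
# Hedged sketch of a read-only + schema-allowlist SQL guardrail; the real
# checks are in src/coco/agent/guardrails.py.
import re

ALLOWED_SCHEMAS = {"cohort_builder_al"}  # from config.guardrails.allowed_schemas

_WRITE = re.compile(r"\b(INSERT|UPDATE|DELETE|MERGE|DROP|ALTER|CREATE|TRUNCATE)\b", re.I)
_TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.I)


def validate_sql(sql: str) -> None:
    """Raise if the query writes data or touches a schema outside the allowlist."""
    if _WRITE.search(sql):
        raise ValueError("only read-only queries are allowed")
    for ref in _TABLE_REF.findall(sql):
        parts = ref.split(".")
        if len(parts) >= 2 and parts[-2] not in ALLOWED_SCHEMAS:
            raise ValueError(f"schema {parts[-2]!r} is not in the allowlist")
```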

Repo structure

coco-reference/
  app.yaml                     Databricks App manifest
  databricks.yml               DABs bundle definition (jobs, variables, targets)
  config/default.yaml          Runtime config (env var interpolated)
  requirements.txt             App container pip deps
  pyproject.toml               Package metadata + dev deps
  scripts/
    preflight_check.py         Pre-deploy workspace permission checker
  src/coco/
    agent/                     The dspy.ReAct agent + tools + deploy
      prompts/                 MLflow Prompt Registry integration + defaults
      signatures.py            DSPy signature definitions (typed I/O contracts)
      responses_agent.py       Main agent: ReAct loop, tool functions
      deploy.py                Model logging, UC registration, endpoint deploy
      guardrails.py            SQL validation (read-only, schema allowlist)
      tools/                   Tool implementations (SQL, VS, schema, codes)
    app/                       FastAPI + HTMX + SSE chat UI
      sessions/                Lakebase CRUD (threads, messages, feedback)
    config.py                  Config loader
    gateway/                   LLM gateway client
    sql/                       Statement Execution API client
    data_generator/            Synthetic patient data generator
    knowledge/                 Clinical knowledge markdown docs (demo)
  notebooks/
    00_setup_workspace.py      Full provisioning notebook (auto-namespaces per user)
    02_evaluate.py             Scenario-based evaluation against the live agent endpoint
    03_optimize_dspy.py        GEPA prompt optimization from Lakebase thumbs-up feedback
    99_teardown.py             Removes every resource the setup created, per-user namespace
  docs/
    PERMISSIONS.md             Required permissions
    design/                    Architecture reference doc
    examples/                  FMAPI + DSPy snippets
    cost-attribution/          Cost tracking queries + policy
  tests/                       Unit + integration tests
