(WIP) Migrate DAIS agents to Apps AI Gateway #70
Open
djliden wants to merge 3 commits into
Reconciles the app-based agent migration on this branch with the gateway-routing intent from demo/dais-2026 (fe14776): the Refund and Complaint agents stay deployed as Databricks Apps AND route every LLM call through Unity AI Gateway. Verified end-to-end on the sandbox `all` target (25/25 tasks; streams produced 75k+ rows; live + 8-way concurrent calls all 200).

Gateway routing (always-on, no model-serving fallback)
- New AI_GATEWAY_ENDPOINT_NAME job param (all target), distinct from LLM_MODEL: LLM_MODEL stays the FM name for generators/support; the gateway endpoint name is sent verbatim as the request `model` to <host>/ai-gateway/mlflow/v1. Default databricks-claude-sonnet-4-5 so existing deploys keep working; override per governed endpoint.
- Both apps/*/agent.py read GATEWAY_ENDPOINT_NAME from the env (with a transitional LLM_MODEL fallback); static app.yaml + the deploy stages inject it.

Correctness
- Complaint agent: configure DSPy once at import, then apply a fresh-token LM per request via dspy.context() instead of dspy.configure(). This avoids the AgentServer worker-thread thread-affinity error under concurrency.
- Refund agent's ChatDatabricks(use_ai_gateway=True) already refreshes the bearer per request via DatabricksOpenAI's BearerAuth (no rebuild needed).

App-name + warehouse drift fixes (--var catalog vs --params CATALOG)
- New OPS_WAREHOUSE_NAME, REFUND_AGENT_APP_NAME, COMPLAINT_AGENT_APP_NAME params baked from ${var.catalog} at deploy time.
- utils/agent_app_client.resolve_agent_app_name() prefers the baked param, re-sanitises, falls back to deriving; threaded through the stream jobs, eval stages, agent stages, and ops dashboard.

Runbook + docs
- SETUP.ipynb step 5 (create gateway endpoint; CAN_QUERY to each agent App SP, not account users) and MLflow.ipynb gateway demo beats, rewritten for the App architecture.
- README + AGENTS: --var vs --params drift table and the gateway/LLM_MODEL distinction.
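For reviewers, the gateway-name lookup described under "Gateway routing" can be sketched roughly as follows. This is an illustration, not the code in apps/*/agent.py: the helper names are hypothetical, the in-code default is an assumption (the description only states the default for the job param), and `host` is assumed to already include the scheme.

```python
import os

# Assumption: mirrors the job-param default named in the description.
DEFAULT_GATEWAY_ENDPOINT = "databricks-claude-sonnet-4-5"


def resolve_gateway_endpoint(env=None):
    """Prefer GATEWAY_ENDPOINT_NAME, then the transitional LLM_MODEL fallback."""
    env = os.environ if env is None else env
    for key in ("GATEWAY_ENDPOINT_NAME", "LLM_MODEL"):
        value = (env.get(key) or "").strip()
        if value:
            return value
    return DEFAULT_GATEWAY_ENDPOINT


def gateway_base_url(host):
    """Build the gateway base URL; `host` is assumed to include https://."""
    return host.rstrip("/") + "/ai-gateway/mlflow/v1"


def build_chat_payload(endpoint_name, messages):
    """The endpoint name is passed through verbatim as the request `model`."""
    return {"model": endpoint_name, "messages": list(messages)}
```

With only LLM_MODEL set, `resolve_gateway_endpoint` returns that value, which is the transitional behaviour the bullet describes; once GATEWAY_ENDPOINT_NAME is injected it takes precedence.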
Also folds in the app-server simplification already in the working tree (start_server.py uses AgentServer's native /responses; ops dashboard output extraction tightened); both are exercised by the end-to-end run.

Co-authored-by: Isaac
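The app-name resolution order from the drift-fix section (prefer the baked param, re-sanitise, fall back to deriving) might look something like the sketch below. It does not reproduce utils/agent_app_client.resolve_agent_app_name(): the function names, the `<catalog>-<suffix>` derivation, and the sanitisation rule (lowercase letters, digits, and hyphens only) are all assumptions made for illustration.

```python
import re


def sanitize_app_name(raw):
    """Assumed rule: lowercase, and collapse anything else into single hyphens."""
    name = re.sub(r"[^a-z0-9-]+", "-", raw.lower())
    name = re.sub(r"-{2,}", "-", name)
    return name.strip("-")


def resolve_app_name_sketch(baked, catalog, suffix):
    """Prefer the baked job param; otherwise derive a name from the catalog."""
    if baked and baked.strip():
        # Re-sanitise even the baked value so callers always get a valid name.
        return sanitize_app_name(baked.strip())
    # Hypothetical derivation scheme: "<catalog>-<suffix>".
    return sanitize_app_name(f"{catalog}-{suffix}")
```

Re-sanitising the baked value keeps the result stable even if the param was produced from a catalog name containing underscores or uppercase characters.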
Summary
Validation
Not run