diff --git a/docs/architecture/adr-unify-foundry-agent-config.md b/docs/architecture/adr-unify-foundry-agent-config.md new file mode 100644 index 00000000000..c5ad604db37 --- /dev/null +++ b/docs/architecture/adr-unify-foundry-agent-config.md @@ -0,0 +1,140 @@ +# ADR: Unify Foundry agent configuration in `azure.yaml` + +**Status:** Proposed + +**Date:** 2026-06-08 + +### Context + +The `azure.ai.agents` extension currently models a hosted Foundry agent across +**three files**: `azure.yaml`, `agent.yaml` (an `AgentDefinition`), and +`agent.manifest.yaml` (an `AgentManifest`). This creates several problems: + +1. **Three files, overlapping data.** The agent name appears in three places, + container resources in two, the model deployment name in three. Two + templating syntaxes (`{{param}}` and `${ENV}`) overlap. Issue + [#7901](https://github.com/Azure/azure-dev/issues/7901) is a symptom — `init` + fails when run in a directory that already contains the manifest. +2. **Scope conflation.** `services..config` mixes agent-scoped concerns + (container resources, env, startup command) with project-scoped resources + (model deployments, connections, toolboxes). +3. **No sharing across agents.** Project-scoped resources are nested under a + single agent, so a second agent that wants the same toolbox must redeclare + it. There is nowhere to say "this toolbox belongs to the project." +4. **Divergent tooling.** The Foundry Toolkit for VS Code parses `agent.yaml` + directly while `azd ai agent` works off an `AgentManifest`; the experiences + diverge. +5. **The manifest layer carries no weight.** `agent.manifest.yaml` was designed + for an agent catalog that was never built. Its templating isn't paying for + itself, and abstracting concrete values with `${ENV_VAR}` blurs the line + between a definition and a template. + +We want `azure.yaml` to be the single source of truth: it describes **what +exists in the Foundry project** and **how the agent runs**; agent code defines +**what the agent does**; the azd environment carries deployment-target values; +Bicep is opt-in for developers who need full IaC reproducibility. + +### Decision + +Consolidate all hosted-agent configuration into `azure.yaml` and split it by +scope across two host kinds. + +1. **`host: azure.ai.agent`** describes the agent runtime only. Its `config:` + block maps to the Foundry create-agent API: `kind`, `description`, + `metadata`, `protocols`, `container.resources`, `env`, `startupCommand`. The + project-scoped fields (`deployments`, `resources`, `toolConnections`, + `toolboxes`, `connections`) are removed from this schema. +2. **`host: azure.ai.project`** (new) owns all project-scoped Foundry + data-plane state that can't be modeled in ARM/Bicep — today toolboxes, + connections, and model deployments; future additions (eval datasets, vector + indexes, knowledge sources) go here too (`additionalProperties: true`). It is + a service **without source code**: no `project:` directory, no build, no + artifact. One `azure.ai.project` service per Foundry project. +3. **Agents reference the project via the existing `uses:` field** + (`uses: [foundry-project]`). This is azd's existing inter-service dependency + primitive on `ServiceConfig` — it orders deploys and injects the + dependency's outputs as env vars. **No `dependsOn`, no core schema change.** +4. **Container mode reuses the existing `docker:` block** (`docker.path`, + `docker.remoteBuild`) to build the image from a Dockerfile. An + already-published image is referenced via the existing top-level `image:` + field on `ServiceConfig` (e.g. `myregistry.azurecr.io/my-agent:v1`) — no + build, no new field. No new top-level `dockerfile` field — that would + conflict with the existing `DockerProjectOptions` on `ServiceConfig`. +5. **Code-deploy mode uses a typed `runtime: { stack, version }` block**, + following the existing azure.yaml runtime precedent — not a bare string. +6. **Deploy source is explicit and exactly one of three.** `image:` present → + use the existing pre-built image; `docker:` present → build the image from a + Dockerfile; `runtime:` present → zip the project for code-deploy. Zero or + more than one present → validation error. No silent defaults. +7. **`container.resources` (`cpu`/`memory`) applies to every deploy mode**, not + just container mode. The Foundry create-agent API carries `cpu` and `memory` + for both code-deploy and container/image agents; when omitted the extension + applies defaults (today `cpu: "1"`, `memory: "2Gi"`). So `container.resources` + stays in the agent `config:` regardless of whether `image:`, `docker:`, or + `runtime:` is used. +8. **`${VAR}` env-var expansion** uses the same syntax `azure.yaml` already + supports. The extension performs the expansion inside `config:` blocks + (the core framework does not expand `config:`). +9. **`init` populates `config:`** the way it currently populates `agent.yaml`, + and stops emitting `agent.yaml` / `agent.manifest.yaml`. A deprecation window + keeps reading the old files (with a warning + telemetry) before removal. + +**Phasing (locked):** + +- **Phase 1** — Consolidate files. Retire `agent.yaml` / `agent.manifest.yaml`, + move their content into `azure.yaml`, and ship the new `azure.ai.project` + service target. Provisioning continues to use existing Bicep/ARM. +- **Phase 2** — Bicep-less provisioning: the extension carries built-in Bicep + templates and generates ARM in memory during `azd provision` (similar to + `azd compose`), with opt-in eject to Bicep on disk. Gated on a separate RFC + (not yet filed) and on a real multi-agent sample that needs shared resources. + +### Open Questions (to resolve before locking the schema) + +This ADR records the agreed direction; the following provision/deploy boundary +questions (raised in the issue's framework review) must be answered before the +`azure.ai.project` schema is finalized: + +1. **What exactly does `azure.ai.project` Deploy create vs. what stays in + provision/Bicep?** Toolboxes are clearly data-plane (Deploy). Model + deployments and connections are ambiguous — today deployments are serialized + to env vars and consumed by Bicep, and connection creation involves + ARM-level resources plus credential handling. We need a clear inventory of + the provision (ARM) vs. deploy (Foundry API) split. +2. **Idempotency and state.** On repeated `azd deploy`, does the project target + diff declared config against existing Foundry state and apply incremental + changes, or recreate from scratch? How are removals handled (a toolbox + deleted from config — does Deploy delete it from Foundry)? Relates to + [#8350](https://github.com/Azure/azure-dev/issues/8350) and + [#8349](https://github.com/Azure/azure-dev/issues/8349). +3. **Error recovery across the `uses` chain.** If `azure.ai.project` Deploy + fails partway (toolboxes created, a connection fails), what is the recovery + story for downstream agents that depend on it? +4. **Service-level `runtime:` typing.** The typed `runtime: { stack, version }` + block exists in the schema today only under the compose `appServiceResource` + definition, **not under `services.`**, and `ServiceConfig` has no + `Runtime` field. Phase 1 can capture it via `AdditionalProperties` (untyped, + not validated by the core schema), or core can promote `runtime` to a + first-class service field. Decide whether the typing/validation cost + justifies a core change. + +### Proposed `azure.yaml` shape + +The proposed end-state of `azure.yaml` is captured as standalone, illustrative +examples alongside this ADR, one self-contained file per scenario, under +[`azure-yaml-examples/`](./azure-yaml-examples/): + +| Scenario | File | Deploy source | +|---|---|---| +| New project, zip/code-deploy | [`code-deploy.yaml`](./azure-yaml-examples/code-deploy.yaml) | `runtime:` | +| New project, build from Dockerfile | [`container-build.yaml`](./azure-yaml-examples/container-build.yaml) | `docker:` | +| New project, existing pre-built image (ACR) | [`existing-image.yaml`](./azure-yaml-examples/existing-image.yaml) | `image:` | +| Existing project, reuse existing model | [`brownfield-existing-model.yaml`](./azure-yaml-examples/brownfield-existing-model.yaml) | `runtime:` | +| Existing project, create new model | [`brownfield-new-model.yaml`](./azure-yaml-examples/brownfield-new-model.yaml) | `runtime:` | + +Each file pairs a single `host: azure.ai.project` service (project-scoped +data-plane resources — model deployments, connections, toolboxes) with a +`host: azure.ai.agent` service that references it via `uses:`. The three deploy +sources (`runtime:` / `docker:` / `image:`) are mutually exclusive; exactly one +is required. The brownfield files use the `resourceId:` field introduced by the +Bicep-less RFC [#8065](https://github.com/Azure/azure-dev/issues/8065) (Phase 2). diff --git a/docs/architecture/azure-yaml-examples/brownfield-existing-model.yaml b/docs/architecture/azure-yaml-examples/brownfield-existing-model.yaml new file mode 100644 index 00000000000..87cdb3f57f0 --- /dev/null +++ b/docs/architecture/azure-yaml-examples/brownfield-existing-model.yaml @@ -0,0 +1,38 @@ +# Scenario: Brownfield — existing Foundry project, reuse an existing model. +# +# NOTE: the `resourceId:` field is introduced by the Bicep-less provisioning RFC +# (https://github.com/Azure/azure-dev/issues/8065), i.e. Phase 2. Its presence +# flips the project service to "existing project" mode: the Foundry project ARM +# resource is omitted from the synthesized Bicep and only referenced. Until +# Phase 2 lands, brownfield is handled via the USE_EXISTING_AI_PROJECT / +# AZURE_AI_PROJECT_ID azd environment variables. Illustrative, not runnable. + +name: my-foundry-agents + +services: + foundry-project: + host: azure.ai.project + resourceId: ${AZURE_AI_PROJECT_ID} # presence -> existing project mode + config: + # No deployments: — the model already exists in the project; the agent + # only references its deployment name below. + toolboxes: + agent-toolbox: + tools: + - { type: web_search } + + my-agent: + project: src/my-agent + host: azure.ai.agent + uses: [foundry-project] + runtime: + stack: python + version: "3.13" + config: + kind: hosted + description: A basic agent hosted by Foundry. + protocols: + - { protocol: responses, version: 1.0.0 } + env: + AZURE_AI_MODEL_DEPLOYMENT_NAME: gpt-4.1-mini # reference only + startupCommand: python main.py diff --git a/docs/architecture/azure-yaml-examples/brownfield-new-model.yaml b/docs/architecture/azure-yaml-examples/brownfield-new-model.yaml new file mode 100644 index 00000000000..87059044a58 --- /dev/null +++ b/docs/architecture/azure-yaml-examples/brownfield-new-model.yaml @@ -0,0 +1,44 @@ +# Scenario: Brownfield — existing Foundry project, create a NEW model on it. +# +# NOTE: the `resourceId:` field is introduced by the Bicep-less provisioning RFC +# (https://github.com/Azure/azure-dev/issues/8065), i.e. Phase 2. `resourceId:` +# references the existing project (not created), but the deployment declared +# under `config:` is still synthesized (ARM-backed) against that project. +# Illustrative end-state, not a runnable template. + +name: my-foundry-agents + +services: + foundry-project: + host: azure.ai.project + resourceId: ${AZURE_AI_PROJECT_ID} # existing project (not created) + config: + deployments: # but this model IS created by azd + - name: gpt-4.1-mini2 + model: + format: OpenAI + name: gpt-4.1-mini + version: "2025-04-14" + sku: + name: Standard + capacity: 10 + toolboxes: + agent-toolbox: + tools: + - { type: web_search } + + my-agent: + project: src/my-agent + host: azure.ai.agent + uses: [foundry-project] + runtime: + stack: python + version: "3.13" + config: + kind: hosted + description: A basic agent hosted by Foundry. + protocols: + - { protocol: responses, version: 1.0.0 } + env: + AZURE_AI_MODEL_DEPLOYMENT_NAME: gpt-4.1-mini2 + startupCommand: python main.py diff --git a/docs/architecture/azure-yaml-examples/code-deploy.yaml b/docs/architecture/azure-yaml-examples/code-deploy.yaml new file mode 100644 index 00000000000..7e1faa07924 --- /dev/null +++ b/docs/architecture/azure-yaml-examples/code-deploy.yaml @@ -0,0 +1,59 @@ +# Scenario: Code-deploy (zip) — new Foundry project. +# +# A `runtime:` block is present, so azd zips the project directory and Foundry +# schedules it on the appropriate managed base image. Illustrative end-state, +# not a runnable template. + +name: my-foundry-agents + +services: + # Project-scoped service: all Foundry data-plane resources in one place. + # No source directory, no build artifact — pure declarative state. + foundry-project: + host: azure.ai.project + config: + deployments: + - name: gpt-4.1-mini + model: + format: OpenAI + name: gpt-4.1-mini + version: "2025-04-14" + sku: + name: GlobalBatch + capacity: 10 + connections: + - name: github-mcp-conn + category: CustomKeys + target: https://api.githubcopilot.com/mcp + authType: ApiKey + toolboxes: + agent-toolbox: + tools: + - { type: web_search } + - { type: code_interpreter } + - { type: mcp, connection: ${GITHUB_MCP_CONN} } + + # Agent-scoped service: the runtime. References the project via `uses`. + my-agent: + project: src/my-agent + host: azure.ai.agent + uses: [foundry-project] + # Deploy source: runtime present -> zip/code-deploy. + runtime: + stack: python + version: "3.13" + config: + kind: hosted + description: A basic agent hosted by Foundry. + protocols: + - { protocol: responses, version: 1.0.0 } + # cpu/memory apply to code-deploy too; defaults cpu:"1", memory:"2Gi". + container: + resources: { cpu: "0.5", memory: 1Gi } + env: + AZURE_AI_MODEL_DEPLOYMENT_NAME: gpt-4.1-mini + startupCommand: python main.py + +# No infra: block needed by default. azd provision uses built-in Bicep templates +# internally (like azd compose). Opt in to Bicep on disk via `azd infra gen` or +# equivalent (separate RFC: https://github.com/Azure/azure-dev/issues/8065). diff --git a/docs/architecture/azure-yaml-examples/container-build.yaml b/docs/architecture/azure-yaml-examples/container-build.yaml new file mode 100644 index 00000000000..26da48298ca --- /dev/null +++ b/docs/architecture/azure-yaml-examples/container-build.yaml @@ -0,0 +1,48 @@ +# Scenario: Container build from Dockerfile — new Foundry project. +# +# A `docker:` block is present, so azd builds and pushes the image via the +# existing docker.path / docker.remoteBuild fields. Illustrative end-state, not +# a runnable template. + +name: my-foundry-agents + +services: + foundry-project: + host: azure.ai.project + config: + deployments: + - name: gpt-4.1-mini + model: + format: OpenAI + name: gpt-4.1-mini + version: "2025-04-14" + sku: + name: GlobalBatch + capacity: 10 + toolboxes: + agent-toolbox: + tools: + - { type: web_search } + - { type: code_interpreter } + + my-agent: + project: src/my-agent + host: azure.ai.agent + uses: [foundry-project] + # Deploy source: docker present -> build the image from a Dockerfile. + docker: + path: Dockerfile + remoteBuild: true + config: + kind: hosted + description: A basic agent hosted by Foundry. + protocols: + - { protocol: responses, version: 1.0.0 } + container: + resources: { cpu: "0.25", memory: 0.5Gi } + env: + AZURE_AI_MODEL_DEPLOYMENT_NAME: gpt-4.1-mini + +# No infra: block needed by default. azd provision uses built-in Bicep templates +# internally (like azd compose). Opt in to Bicep on disk via `azd infra gen` or +# equivalent (separate RFC: https://github.com/Azure/azure-dev/issues/8065). diff --git a/docs/architecture/azure-yaml-examples/existing-image.yaml b/docs/architecture/azure-yaml-examples/existing-image.yaml new file mode 100644 index 00000000000..4c7d38eb2fe --- /dev/null +++ b/docs/architecture/azure-yaml-examples/existing-image.yaml @@ -0,0 +1,45 @@ +# Scenario: Existing pre-built image (e.g. ACR) — new Foundry project. +# +# An `image:` reference is present (no build, no zip). azd skips package/publish +# and Foundry pulls the already-published image directly. The top-level `image:` +# field is the existing ServiceConfig field (also used by containerapp), and it +# supports ${VAR} expansion. Illustrative end-state, not a runnable template. + +name: my-foundry-agents + +services: + foundry-project: + host: azure.ai.project + config: + deployments: + - name: gpt-4.1-mini + model: + format: OpenAI + name: gpt-4.1-mini + version: "2025-04-14" + sku: + name: GlobalBatch + capacity: 10 + toolboxes: + agent-toolbox: + tools: + - { type: web_search } + + my-agent: + host: azure.ai.agent + uses: [foundry-project] + # Deploy source: image present -> use the existing pre-built image as-is. + image: myregistry.azurecr.io/my-agent:v1 + config: + kind: hosted + description: A basic agent hosted by Foundry. + protocols: + - { protocol: responses, version: 1.0.0 } + container: + resources: { cpu: "0.25", memory: 0.5Gi } + env: + AZURE_AI_MODEL_DEPLOYMENT_NAME: gpt-4.1-mini + +# No infra: block needed by default. azd provision uses built-in Bicep templates +# internally (like azd compose). Opt in to Bicep on disk via `azd infra gen` or +# equivalent (separate RFC: https://github.com/Azure/azure-dev/issues/8065).