diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
index a6414162..b28acd13 100644
--- a/.claude-plugin/marketplace.json
+++ b/.claude-plugin/marketplace.json
@@ -13,7 +13,7 @@
       "name": "agentops-accelerator",
       "source": "../../plugins/agentops",
       "description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Toolkit and Microsoft Foundry agents.",
-      "version": "0.3.20",
+      "version": "0.3.21",
       "keywords": [
         "agentops",
         "evaluation",
diff --git a/.github/plugin/marketplace.json b/.github/plugin/marketplace.json
index a6414162..b28acd13 100644
--- a/.github/plugin/marketplace.json
+++ b/.github/plugin/marketplace.json
@@ -13,7 +13,7 @@
       "name": "agentops-accelerator",
       "source": "../../plugins/agentops",
       "description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Toolkit and Microsoft Foundry agents.",
-      "version": "0.3.20",
+      "version": "0.3.21",
       "keywords": [
         "agentops",
         "evaluation",
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 590823e2..7c786b00 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,29 @@ This format follows [Keep a Changelog](https://keepachangelog.com/) and adheres
 
 ## [Unreleased]
 
+## [0.3.21] - 2026-06-12
+
+### Changed
+- **`agentops-workflow` skill now verifies OIDC tenant, branch upstream
+  tracking, and trace-sampling RBAC before wiring CI.** The packaged skill
+  instructs agents to treat `AZURE_TENANT_ID` as the tenant that owns the Entra
+  app registration / federated credential (not the subscription tenant), to set
+  and verify the local trunk branch upstream (`git branch -vv` must show
+  `[origin/main]`), and to grant **Reader** on Application Insights (and its
+  backing Log Analytics workspace) to the Foundry project managed identity for
+  trace-to-dataset flows.
+
+### Docs
+- **Prompt-agent, hosted-agent, and end-to-end tutorials hardened end to end.**
+  OIDC setup calls out the app-registration tenant; observability steps require
+  App Insights Reader for trace sampling and cover workspace-backed App Insights;
+  the telemetry step queries `gen_ai.evaluation` results from `AppEvents`
+  (table-safe, no hard-coded dates); the evidence step explains expected
+  production-telemetry criticals and where the Doctor thresholds live
+  (`.agentops/agent.yaml`); and the Cockpit step is now a concrete walkthrough
+  (exact `http://127.0.0.1:8090` URL, read-only note, per-section checks, and
+  azd-env switching instead of a non-existent URL switch).
+
 ## [0.3.20] - 2026-06-10
 
 ### Changed
diff --git a/docs/ci-github-actions.md b/docs/ci-github-actions.md
index 0274ad55..0dde1f91 100644
--- a/docs/ci-github-actions.md
+++ b/docs/ci-github-actions.md
@@ -119,6 +119,12 @@ In Settings → Secrets and variables → Actions → **Variables**, add:
 | `AZURE_OPENAI_DEPLOYMENT` | Model deployment used by local evaluators and AgentOps cloud eval judges |
 | `APPLICATIONINSIGHTS_CONNECTION_STRING` | Optional fallback when the Foundry project's App Insights connection cannot be auto-discovered |
 
+Set `AZURE_TENANT_ID` to the tenant that owns the app registration / federated
+credential used by `AZURE_CLIENT_ID`. Do not use a subscription
+`managedByTenants` tenant id unless the app registration and federated
+credential are also visible in that tenant; otherwise `azure/login` can fail at
+token issuance before AgentOps starts.
+
 Then on the Azure side, configure Workload Identity Federation (federated
 credentials) on the app registration so it can be assumed from GitHub Actions
 runs. See
diff --git a/docs/tutorial-end-to-end.md b/docs/tutorial-end-to-end.md
index e5475a1d..e9b4bb73 100644
--- a/docs/tutorial-end-to-end.md
+++ b/docs/tutorial-end-to-end.md
@@ -122,7 +122,7 @@ prompts.
 | Azure CLI is installed and `az login` succeeds with the tenant that owns the Foundry project. | AgentOps, Foundry SDK calls, Doctor, Cockpit, and CI setup all need the same Azure identity context. |
 | You have the Foundry project endpoint and can create or publish one Travel Agent target. | The target is either `travel-agent:` for prompt agents or an HTTP endpoint for hosted agents. |
 | You have a chat-capable Azure OpenAI deployment, for example `gpt-4o-mini`. | Local evals and CI variables need a judge model for evaluator calls. |
-| Application Insights is connected to the Foundry project or agent runtime, or you can create/attach it. | Foundry Traces, Operate metrics/Ask AI when available, Azure Monitor, Doctor, Cockpit, and evidence links need telemetry. |
+| Application Insights is connected to the Foundry project or agent runtime, or you can create/attach it. For Foundry trace-to-dataset flows, you can also grant Reader on App Insights and its backing Log Analytics workspace to the Foundry project managed identity. | Foundry Traces, Operate metrics/Ask AI when available, trace sampling, Azure Monitor, Doctor, Cockpit, and evidence links need telemetry. |
 | You can deploy or expose any hosted endpoint that CI will call. | `localhost` works for local eval; remote CI needs a reachable HTTPS URL. |
 | You can push to the tutorial GitHub repository and run GitHub Actions or Azure Pipelines. | PR and environment workflows only run after the repo is published. |
 | GitHub CLI is authenticated with `gh auth login` if you use GitHub PR commands while testing CI. | The regression and release-gate steps are smoother when repo, PR, and Actions access are already confirmed.
|
@@ -524,8 +524,9 @@ environment variable or equivalent Azure DevOps pipeline variable, verify the
 OIDC principal has **both** Foundry User access on the dev Foundry project
 **and** Cognitive Services OpenAI User access on the underlying Azure AI
 Services account that hosts the evaluator model (both are required — without
-the OpenAI User role, every cloud eval metric returns null), and show me the
-plan before changing GitHub or Azure.
+the OpenAI User role, every cloud eval metric returns null), verify
+AZURE_TENANT_ID is the tenant that owns the Entra app registration and its
+federated credential, and show me the plan before changing GitHub or Azure.
 ```
 
 That value is not an `agentops init` answer. It tells the Foundry cloud eval
@@ -877,6 +878,13 @@ may not have live traffic, scheduled workflows may not have history, and trace
 regression candidates may not exist yet. That is useful tutorial feedback, not a
 failure of Doctor.
 
+If production telemetry *does* carry enough live traffic to trip latency or
+error criticals, those are honest signals — not tutorial noise. The thresholds
+that decide critical-vs-warning live in `.agentops/agent.yaml`
+(`checks.latency.p95_threshold_seconds`, `checks.errors.rate_threshold`) and are
+separate from the `agentops.yaml` eval-gate thresholds; raise them only if you
+deliberately want to relax the production gate for a demo.
+
 ## 10. Run Foundry red-team scans
 
 Red-team scans are a Foundry capability. Run them from Foundry Observability /
@@ -952,16 +960,31 @@ reviews and accepts them.
 agentops cockpit --workspace .
 ```
 
-Use Cockpit as the local command center:
+Cockpit starts a read-only local web server and prints
+`http://127.0.0.1:8090`. Open that URL in your browser; press `Ctrl+C` in
+the terminal to stop it. It reflects the **active azd environment**
+(`sandbox`, from `defaultEnvironment` in `.azure/config.json`) — there is no
+URL switch.
To inspect `dev`, stop Cockpit, point the active env at `dev`
+(set `defaultEnvironment: dev` in `.azure/config.json`, or export
+`AZURE_ENV_NAME=dev`), then rerun the command.
 
-- Foundry connection and deep links;
-- Microsoft Foundry eval or AgentOps local eval gate status;
-- Doctor findings;
-- release evidence;
-- local eval history;
-- production telemetry snapshot;
-- CI/CD workflow status;
-- next actions.
+Read the page top to bottom and confirm each card:
+
+| Section | What to confirm |
+|---|---|
+| **Foundry connection** | The Foundry project and tenant resolve, and the agent identity matches your `agentops.yaml` target. |
+| **Open in Foundry** | The deep-links open your project in the correct tenant. |
+| **Observability readiness** | Trace setup / sampling status from the latest Doctor analysis. |
+| **AgentOps Doctor** | The same finding rollup from the Doctor / evidence-pack step (criticals first, then warnings). |
+| **Local eval history** | Your `agentops eval run` baseline and regression reruns appear. |
+| **Quality metrics** | Evaluator score trends from your runs. |
+| **Production telemetry** | App Insights latency / error snapshot (or a clear "no live traffic" state in a fresh workspace). |
+| **CI/CD Pipelines** | The workflows you generated are listed. |
+| **Next actions** | The prioritized backlog Cockpit derives from the open findings. |
+
+Cockpit does not run checks or mutate anything — it renders the latest
+`results.json`, Doctor report, and evidence pack you already produced, and
+links out to Foundry / Azure Monitor for live runtime data.
 
 ## Completion checklist
 
diff --git a/docs/tutorial-hosted-agent-quickstart.md b/docs/tutorial-hosted-agent-quickstart.md
index d68b9063..f51461bd 100644
--- a/docs/tutorial-hosted-agent-quickstart.md
+++ b/docs/tutorial-hosted-agent-quickstart.md
@@ -786,12 +786,13 @@ hosted-agent project.
 Create or connect the GitHub repo if needed, set AGENTOPS_AGENT_ENDPOINT in the
 `dev` environment to the deployed HTTPS endpoint, wire Azure OIDC and required
-Actions variables in the `dev` environment, and set any required endpoint token
-as a secret. The PR gate uses --doctor-gate critical so the workflow blocks on
-critical Doctor findings (regressions or other strict signals). Do not add
-scheduled Doctor, QA, or production workflows yet. Show me the plan before
-changing GitHub or Azure, and call out anything that needs owner/admin
-permission.
+Actions variables in the `dev` environment, verify AZURE_TENANT_ID is the tenant
+that owns the Entra app registration and its federated credential, and set any
+required endpoint token as a secret. The PR gate uses --doctor-gate critical so
+the workflow blocks on critical Doctor findings (regressions or other strict
+signals). Do not add scheduled Doctor, QA, or production workflows yet. Show me
+the plan before changing GitHub or Azure, and call out anything that needs
+owner/admin permission.
 ```
 
 Open both Doctor outputs. The report explains the findings; the evidence pack
@@ -800,6 +801,13 @@ In a fresh tutorial workspace, warnings about production telemetry, CI history,
 regression history are expected and useful: they show what remains before this
 local endpoint becomes an operated service.
 
+If production telemetry *does* carry enough live traffic to trip latency or
+error criticals, those are honest signals. The thresholds that decide
+critical-vs-warning live in `.agentops/agent.yaml`
+(`checks.latency.p95_threshold_seconds`, `checks.errors.rate_threshold`) and are
+separate from the `agentops.yaml` eval-gate thresholds; raise them only if you
+deliberately want to relax the production gate for a demo.
+
 If you later want a separate cadence outside PRs, generate the optional Doctor
 workflow with `agentops workflow generate --kinds doctor --force`.
 
@@ -816,8 +824,32 @@ look self-contained inside AgentOps.
 agentops cockpit --workspace .
 ```
 
-Cockpit shows the endpoint readiness, eval history, Doctor findings, telemetry
-status, release evidence, CI/CD, and next actions.
+Cockpit starts a read-only local web server and prints
+`http://127.0.0.1:8090` (this is the Cockpit UI port, not your agent's
+`:8000`). Open that URL in your browser; press `Ctrl+C` in the terminal to
+stop it. It reflects the **active azd environment** (`sandbox`, from
+`defaultEnvironment` in `.azure/config.json`) — there is no URL switch. To
+inspect `dev`, stop Cockpit, point the active env at `dev` (set
+`defaultEnvironment: dev` in `.azure/config.json`, or export
+`AZURE_ENV_NAME=dev`), then rerun the command.
+
+Read the page top to bottom and confirm each card:
+
+| Section | What to confirm |
+|---|---|
+| **Foundry connection** | The Foundry project / tenant resolve, and the agent is your hosted endpoint URL. |
+| **Open in Foundry** | The deep-links open your project in the correct tenant. |
+| **Observability readiness** | Trace setup / sampling status from the latest Doctor analysis. |
+| **AgentOps Doctor** | The same finding rollup from the Doctor / evidence-pack step (criticals first, then warnings). |
+| **Local eval history** | Your `agentops eval run` baseline, regressed, and fixed reruns appear. |
+| **Quality metrics** | Evaluator score trends from your runs. |
+| **Production telemetry** | App Insights latency / error snapshot for the `travel-agent.chat` operation (or a "no live traffic" state in a fresh workspace). |
+| **CI/CD Pipelines** | The PR and dev deploy workflows you generated are listed. |
+| **Next actions** | The prioritized backlog Cockpit derives from the open findings. |
+
+Cockpit does not run checks or mutate anything — it renders the latest
+`results.json`, Doctor report, and evidence pack you already produced, and
+links out to Foundry / Azure Monitor for live runtime data.
 ## Success criteria
 
diff --git a/docs/tutorial-prompt-agent-quickstart.md b/docs/tutorial-prompt-agent-quickstart.md
index 38e0f7a5..905975f4 100644
--- a/docs/tutorial-prompt-agent-quickstart.md
+++ b/docs/tutorial-prompt-agent-quickstart.md
@@ -57,7 +57,7 @@ permission prompts.
 | You can create **two** Foundry projects in the same Azure subscription (or have two existing projects you can use). | The tutorial uses a sandbox project for authoring and experimentation plus a shared dev project for the PR gate. You only need to publish the agent in sandbox — CI auto-bootstraps it in dev (and later qa / prod). |
 | You can publish a prompt agent in the **sandbox** Foundry project. | The tutorial seeds `travel-agent:2` only in sandbox (Foundry portal typically numbers the first published version `:2`, not `:1`). Dev / qa / prod start empty; the prompt-agent deploy workflow creates the first version in those projects automatically using `prompt_agent_bootstrap` defaults plus `prompt_file`. |
 | The **same model deployment name** (for example `gpt-4o-mini`) exists in every Foundry project you plan to deploy to. | `prompt_agent_bootstrap.model` is a single value reused for every environment. If dev does not have that deployment, the first auto-bootstrap fails. |
-| You can create or attach Application Insights for at least the dev Foundry project. | Foundry Traces, the Operate dashboard, Doctor, and Cockpit need telemetry to tell the observability story. Sandbox observability is optional. |
+| You can create or attach Application Insights for at least the dev Foundry project, and can grant Reader to the dev project's managed identity on that App Insights resource and its backing Log Analytics workspace when workspace-based. | Foundry Traces, the Operate dashboard, trace-to-dataset generation, Doctor, and Cockpit need telemetry to tell the observability story. Sandbox observability is optional. |
 | You can push to the tutorial GitHub repository and run GitHub Actions.
| The PR gate only runs after the repo is pushed. |
 | GitHub CLI is authenticated with `gh auth login` if you use the PR commands in this tutorial. | The regression step opens PRs and sends the reader directly to the workflow run. |
 | You can create a GitHub environment named `dev` and add Actions variables/secrets. | The generated workflow uses that environment for Azure auth and the dev Foundry project endpoint. |
@@ -336,6 +336,12 @@ For each project, please:
   uses a single bootstrap model value for every environment.
 - Attach or create an Application Insights resource for telemetry, starting
   with the dev project.
+- Grant or verify **Reader** on that Application Insights resource to the
+  **managed identity of the `travel-agent-dev` Foundry project**. Foundry's
+  trace-to-dataset flow runs as the project identity when it reads traces; the
+  Operate dashboard may still render for my signed-in user even when this
+  project identity permission is missing. If Application Insights is
+  workspace-based, also grant Reader on the backing Log Analytics workspace.
 - Grant or verify `Foundry User` access for my signed-in user on the parent
   Foundry / AI Services account so I can build agents in the Foundry UI. Some
   portal screens still call this role `Azure AI User`.
@@ -610,7 +616,7 @@ build the prompt agent. One of two things will be true:
 
 | What you see | What it means | What to do |
 |---|---|---|
-| An `appinsights` row with category `AppInsights` | The resource exists and is connected to the dev project. Auto-discovery will pick it up. | **You are done.** Skip the rest of this subsection and continue to section 9. |
+| An `appinsights` row with category `AppInsights` | The resource exists and is connected to the dev project. Auto-discovery will pick it up. | Continue with the trace-to-dataset access check below. |
 | No App Insights row in **Connected resources** | The resource was not connected in step 3.
| Click **Add connection**, connect or create an Application Insights resource for the dev project, or paste a connection string manually. |
 
 **If Connected resources does not show App Insights**, the fastest fix is
@@ -620,6 +626,28 @@ in the same resource group as the dev project.
 Once an `appinsights` row appears under **Connected resources**, you can again
 skip the manual env variable — auto-discovery will pick it up.
 
+**Also verify trace-to-dataset access now.** For the step 18
+trace-sampling flow, the **managed identity of the `travel-agent-dev`
+Foundry project** needs **Reader** on the connected Application Insights
+resource. If the App Insights component is workspace-based, grant the same
+Reader role on the backing Log Analytics workspace too. This is separate from
+your signed-in user's portal access and separate from GitHub OIDC. If you
+connected App Insights manually, open the Application Insights resource in
+Azure Portal → **Access control (IAM)** and add:
+
+| Field | Value |
+|---|---|
+| **Role** | Reader |
+| **Assign access to** | Managed identity |
+| **Managed identity** | `travel-agent-dev` Foundry project |
+
+Then open the Application Insights resource → **Properties** and check
+**Workspace Resource ID**. If it points to a Log Analytics workspace, open that
+workspace and repeat the same **Reader** assignment for the `travel-agent-dev`
+managed identity.
+
+Wait a few minutes for RBAC propagation before creating a dataset from traces.
+
 **Only if you specifically want to override which resource telemetry goes to**
 (advanced case, e.g. you have a dedicated observability resource group), grab
 the connection string and paste it into
@@ -1080,11 +1108,13 @@ You have two ways to wire up ASSERT — pick whichever fits your workflow.
 
 If you installed the AgentOps coding-agent skills in step 4 (`agentops skills
 install`), the `agentops-governance` skill knows the full
-recipe. In Copilot Chat (or Claude Code), say:
+recipe.
In Copilot Chat (or Claude Code), paste this prompt:
 
-> Use the `agentops-governance` skill to scaffold ASSERT for this workspace.
-> Target the `gpt-4o-mini` deployment, cover prompt_injection / pii_leak /
-> jailbreak, 5 cases per dimension.
+```text
+Use the agentops-governance skill to scaffold ASSERT for this workspace.
+Target the gpt-4o-mini deployment, cover prompt_injection / pii_leak /
+jailbreak, 5 cases per dimension.
+```
 
 Copilot will install `assert-ai`, create `./assert/eval_config.yaml`, and
 append the `assert:` block to `agentops.yaml` for you. Skip to **Run it
@@ -1145,8 +1175,13 @@ Same pattern: Copilot can do it, or you can run the commands yourself.
 
 #### Option A — Ask Copilot
 
-> Use the `agentops-governance` skill to scaffold the Red Team runner.
-> Target `gpt-4o-mini`, fail when attack success rate exceeds 20%.
+Paste this prompt into Copilot Chat (or Claude Code):
+
+```text
+Use the agentops-governance skill to scaffold the Red Team runner for this
+workspace. Target the gpt-4o-mini deployment, fail when attack success rate
+exceeds 20%.
+```
 
 #### Option B — Run the commands yourself
 
@@ -1313,14 +1348,17 @@ project. This may be a brand-new folder with no Git repo or GitHub remote
 yet.
 
 Keep the scope to the PR gate and dev deploy only: create or connect the
-GitHub repo if needed, wire Azure OIDC and required Actions
-variables/secrets, create only the `dev` environment, verify the OIDC
-principal has **both** Foundry User access on the **dev** Foundry project
-**and** Cognitive Services OpenAI User on the underlying Azure AI Services
-account that hosts the evaluator model (both roles are required — without
-the OpenAI User role, the Foundry cloud graders fail with a 401 and every
-metric comes back null), and do not set up `qa`, `production`, scheduled
-Doctor, or hosted deployment workflows yet.
+GitHub repo if needed, ensure local `main` tracks `origin/main` after the
+first push/connect, wire Azure OIDC and required Actions variables/secrets,
+create only the `dev` environment, verify the OIDC principal has **both**
+Foundry User access on the **dev** Foundry project **and** Cognitive Services
+OpenAI User on the underlying Azure AI Services account that hosts the
+evaluator model (both roles are required — without the OpenAI User role, the
+Foundry cloud graders fail with a 401 and every metric comes back null),
+verify `AZURE_TENANT_ID` is the tenant that owns the Entra app registration
+and its federated credential (not just a subscription `managedByTenants`
+value), and do not set up `qa`, `production`, scheduled Doctor, or hosted
+deployment workflows yet.
 
 I am using trunk-based development with `main` as both my trunk and dev
 branch. The generator's stock dev-deploy trigger is `push: branches:
@@ -1340,12 +1378,20 @@ that needs owner/admin permission.
 The workflow skill will normally do the following, but call out anything it
 skips:
 
-- Create/connect the GitHub remote.
+- Create/connect the GitHub remote and ensure local `main` tracks
+  `origin/main` (`git branch -vv` should show `[origin/main]`). If the skill
+  skips this, run `git branch --set-upstream-to=origin/main main` before the
+  later tutorial steps that use `git pull`.
 - Create the `dev` GitHub environment.
 - Configure OIDC federated credentials between GitHub and Entra ID.
 - Set Actions variables `AZURE_TENANT_ID`, `AZURE_SUBSCRIPTION_ID`,
   `AZURE_CLIENT_ID`, `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT` (the dev endpoint),
   and `APPLICATIONINSIGHTS_CONNECTION_STRING` if available.
+- Verify `AZURE_TENANT_ID` against the app registration / federated
+  credential tenant before the first run.
A subscription can be associated
+  with another tenant through `managedByTenants`; do not copy that tenant id
+  into the GitHub environment unless the app registration and federated
+  credential are actually visible there.
 - **Rewrite the dev deploy trigger to `main`.** The generator emits the
   stock GitFlow defaults (`pull_request: branches: [develop, "release/**",
   main]` on `agentops-pr.yml`, `push: branches: [develop]` on
@@ -1406,7 +1452,8 @@ If you want to wait on the first PR-workflow verification run from the
 terminal instead of the Actions UI:
 
 ```powershell
-$runId = gh run list --workflow agentops-pr.yml --branch main --limit 1 --json databaseId --jq '.[0].databaseId'
+$prBranch = gh pr view --json headRefName --jq '.headRefName'
+$runId = gh run list --workflow agentops-pr.yml --branch $prBranch --event pull_request --limit 1 --json databaseId --jq '.[0].databaseId'
 gh run view $runId --web
 gh run watch $runId --exit-status
 ```
@@ -1574,9 +1621,9 @@ thresholds are loose enough that a regression slips through, Doctor still
 catches it.
 
 ```powershell
-git switch main
-git pull
-git switch -c feature/regress-travel-agent
+git fetch origin
+$branch = "feature/regress-travel-agent-step16-$((Get-Date).ToString('yyyyMMddHHmmss'))"
+git switch -c $branch origin/main
 ```
 
 Edit `.agentops/prompts/travel-agent.md` to this intentionally vague
@@ -1592,8 +1639,8 @@ Commit and push:
 ```powershell
 git add .agentops\prompts\travel-agent.md
 git commit -m "Intentional regression: vague travel prompt"
-git push -u origin feature/regress-travel-agent
-gh pr create --base main --head feature/regress-travel-agent --title "Test AgentOps regression gate" --body "Evaluates an intentionally regressed travel-agent prompt."
+git push -u origin $branch
+gh pr create --base main --head $branch --title "Test AgentOps regression gate" --body "Evaluates an intentionally regressed travel-agent prompt."
 ```
 
 Watch the PR check:
@@ -1673,34 +1720,111 @@ regressions that thresholds alone miss, and the merge promotes through the
 deploy workflow. None of those gates require the developer to remember to
 look at a dashboard.
 
-## 18. Brief observability checkout (Foundry side)
+## 18. Observability checkout: traces into continuous evaluation
 
-The Foundry side of the loop is worth a short tour, even though it is
-not what AgentOps owns. This is the "Foundry tells you what happened"
-side of the conversation.
+Take a short tour of the Foundry runtime view, then turn the same production
+signal into evaluation coverage. This is the bridge from "what happened in
+real traces" to "what should keep getting evaluated."
 
 1. Open the `travel-agent-dev` project in the Foundry portal.
 2. Open the `travel-agent` agent and switch to the **Traces** tab. If
    Application Insights is not yet connected, connect or create the
    resource now.
-3. Find the most recent eval run in **Conversations** or
+3. Find a recent eval or playground run in **Conversations** or
    **Responses** and click the **Trace ID**. Inspect spans, latency,
-   model call, and the input/output panes.
-4. Switch to **Operate → Overview** and use **Ask AI** for a
-   dashboard-level summary. Example:
+   model calls, and the input/output panes.
+4. Switch to **Operate → Overview** and use **Ask AI** for a dashboard-level
+   summary. Example:
 
    ```text
    Help me identify any issues or anomalies in my agent metrics for the last
    24 hours.
   ```
 
-5. Optionally, sample the same operation through Application Insights
-   Logs (KQL) for the engineer-level view.
+5. Now use the traces as evaluation signal. In the project, open
+   **Data Generation**, then select **Create dataset → From traces**.
+6. In **Create dataset**, configure:
 
-This is the observability surface AgentOps does **not** replace. Doctor
-will check whether this telemetry is wired (App Insights connection
-string, recent traces, etc.)
and include it in the readiness call, but
-the runtime view itself lives in Foundry.
+   | Field | Value |
+   |---|---|
+   | **Dataset usage** | `Evaluation` |
+   | **Name** | `travel-agent-traces-step18` |
+   | **Agent** | `travel-agent` |
+   | **Date range** | Last day or last 7 days |
+   | **Maximum samples** | At least `15` |
+
+   Leave **Intelligent sampling** enabled when the time-range UI shows it.
+   Foundry will filter noisy traces, deduplicate near-identical prompts, and
+   select a representative sample instead of evaluating every request.
+
+   If the dialog shows **Setup incomplete: Assign the Foundry project's managed
+   identity the Reader role on Application Insights**, click **Resolve** if you
+   have permission. Otherwise ask an Azure admin to grant **Reader** on the
+   connected Application Insights resource to the **managed identity of the
+   `travel-agent-dev` Foundry project**. If Application Insights is
+   workspace-based, grant Reader on its backing Log Analytics workspace too.
+   Then wait a few minutes for RBAC to propagate and reopen the dialog.
+7. Select **Create** and track the background job on the **Data Generation**
+   tab. When it finishes, open the generated dataset from the **Data** tab and
+   preview the rows. This is the evaluation-ready sample created from real
+   traces.
+8. If the portal offers to start an evaluation from the completed job, open it
+   and confirm the generated dataset is selected. You do not need to finish a
+   new eval for this tutorial step; the point is to see how Foundry turns
+   traced behavior into a dataset you can evaluate continuously.
+
+> **Public preview.** Trace-to-dataset generation and intelligent sampling are
+> currently preview Foundry features. If your region or project does not show
+> **Create dataset → From traces**, continue with step 19 and treat this section
+> as a product tour.
+
+Optional KQL deep dive: query the evaluation metrics Foundry emits as
+`gen_ai.evaluation.result` events.
These land in the **`AppEvents`** table, which
+only resolves in the **Log Analytics workspace** that backs your Application
+Insights resource — not in the App Insights *scoped* Logs blade. Open
+**Monitor → Logs** (or the connected Log Analytics workspace), set **Time range**
+to **Set in query** (the query below uses `ago(30d)`), and run:
+
+```kusto
+AppEvents
+| where TimeGenerated > ago(30d)
+| where Name == "gen_ai.evaluation.result"
+| extend p = parse_json(tostring(Properties))
+| extend Conversation = tostring(p["gen_ai.conversation.id"]),
+         Agent = tostring(p["gen_ai.agent.id"]),
+         Evaluator = tostring(p["gen_ai.evaluation.name"]),
+         Score = todouble(p["gen_ai.evaluation.score.value"])
+| summarize Time = max(TimeGenerated), AvgScore = round(avg(Score), 2),
+            Metrics = make_bag(pack(Evaluator, Score))
+            by Conversation, Agent
+| order by Time desc
+| take 20
+```
+
+Each row is one conversation with its average score and a `Metrics` bag holding
+every evaluator score side by side. For a per-day rollup of average scores by
+evaluator, pivot instead:
+
+```kusto
+AppEvents
+| where TimeGenerated > ago(30d)
+| where Name == "gen_ai.evaluation.result"
+| extend p = parse_json(tostring(Properties))
+| extend Evaluator = tostring(p["gen_ai.evaluation.name"]),
+         Score = todouble(p["gen_ai.evaluation.score.value"])
+| summarize AvgScore = round(avg(Score), 2) by Day = bin(TimeGenerated, 1d), Evaluator
+| evaluate pivot(Evaluator, any(AvgScore))
+| order by Day desc
+```
+
+> **Empty results?** Telemetry can be sparse, so `Last 24 hours` / `Last 7 days`
+> may return nothing. Widen the time range (`ago(30d)` with **Set in query**, or
+> **Last 30 days**) and confirm you are in the **Log Analytics workspace**, where
+> `AppEvents` resolves.
+
+Foundry gives you the runtime trace view and trace-sampled evaluation datasets;
+AgentOps Doctor checks that telemetry and release evidence are wired into the
+readiness story.
 
 ## 19.
Sync local evidence and create the release evidence pack
 
@@ -1737,6 +1861,23 @@ deploys, explicit thresholds, or red-team/governance evidence. Treat those as th
 hardening backlog. The eval gates and the dev deploy loop are
 production-ready.
 
+You will likely also see **two critical findings** here, and that is expected
+in this tutorial:
+
+| Critical finding | Why it shows up |
+|---|---|
+| `latency.p95_production` | App Insights p95 latency exceeds the 5s default (a prompt agent reasoning over each request runs ~9–12s). |
+| `errors.production_rate` | Your own tutorial traffic (including the earlier `az login` / token retries) pushed the production error rate above the 5% default. |
+
+These criticals come from **real production telemetry of your own test
+traffic**, not from the release candidate's eval gate (which passed). They are
+honest signals: a real release would investigate latency and errors before
+promoting. For the tutorial they simply demonstrate that Doctor reads live
+runtime data. If you want to relax them for a demo, raise the Doctor thresholds
+in `.agentops/agent.yaml` (`checks.latency.p95_threshold_seconds` and
+`checks.errors.rate_threshold`) — these are separate from the `agentops.yaml`
+eval-gate thresholds.
+
 If you want to show the governance evidence path in the video, keep it as a
 short optional callout:
 
@@ -1756,10 +1897,31 @@ Guardrail setup, and red-team scans still happen in their owning tools.
 agentops cockpit --workspace .
 ```
 
-Open the local URL printed by the command. The Cockpit should show
-Foundry connection (sandbox by default; you can switch in the URL),
-AgentOps cloud-eval readiness, Doctor findings, release evidence, the
-PR and dev deploy CI pipelines, and next actions.
+Cockpit starts a read-only local web server and prints
+`http://127.0.0.1:8090`. Open that URL in your browser; press `Ctrl+C`
+in the terminal to stop it.
It reflects the **active azd environment**
+(`sandbox`, from `defaultEnvironment` in `.azure/config.json`); there is
+no URL switch. To inspect `dev` instead, stop Cockpit, point the active
+env at `dev` (set `defaultEnvironment: dev` in `.azure/config.json`, or
+export `AZURE_ENV_NAME=dev`), then rerun the command.
+
+Read the page top to bottom and confirm each card against what you built:
+
+| Section | What to confirm in this run |
+|---|---|
+| **Foundry connection** | Foundry project = `travel-agent-sandbox`, your Azure tenant is resolved (`az login`), and Agent = `travel-agent:2`. |
+| **Open in Foundry** | The deep-links open your sandbox project in the correct tenant. |
+| **Observability readiness** | Trace setup / sampling status pulled from the latest Doctor analysis. |
+| **AgentOps Doctor** | The same finding rollup you saw in step 19: **2 critical** (`latency.p95_production`, `errors.production_rate`), plus warnings. |
+| **Local eval history** | Your `agentops eval run` from step 19 appears as the latest entry. |
+| **Quality metrics** | coherence / fluency / similarity / response_completeness trend cards from your runs. |
+| **Production telemetry** | App Insights p95 latency (~11.7s) and error rate (~12%), the source of the two criticals. |
+| **CI/CD Pipelines** | The `pr` and `dev` workflows you generated are listed; `qa`/`prod`/scheduled are absent (expected). |
+| **Next actions** | The prioritized backlog Cockpit derives from the open findings. |
+
+Cockpit does not run checks or mutate anything; it renders the latest
+`results.json`, Doctor report, and evidence pack you already produced, and
+links out to Foundry / Azure Monitor for live runtime data.
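
The two criticals in the **Production telemetry** row follow from comparing the observed numbers with the Doctor defaults quoted earlier in the tutorial (5 s for p95 latency, 5% for error rate). The sketch below only illustrates that comparison and is not AgentOps code: `Finding` and `production_findings` are hypothetical names, and expressing the error-rate threshold as a fraction (`0.05`) rather than a percentage is an assumption about the config format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical stand-in for one Doctor finding (not the real AgentOps type)."""

    check_id: str
    observed: float
    threshold: float

    @property
    def critical(self) -> bool:
        # Assumption: a check is critical when the observed value exceeds its threshold.
        return self.observed > self.threshold


def production_findings(
    p95_seconds: float,
    error_rate: float,
    p95_threshold_seconds: float = 5.0,  # default quoted in the tutorial
    rate_threshold: float = 0.05,        # 5% default, written as a fraction (assumption)
) -> list[Finding]:
    """Compare observed production telemetry with the two Doctor thresholds."""
    return [
        Finding("latency.p95_production", p95_seconds, p95_threshold_seconds),
        Finding("errors.production_rate", error_rate, rate_threshold),
    ]


if __name__ == "__main__":
    # Values from the table: p95 of about 11.7 s and an error rate of about 12%.
    for finding in production_findings(11.7, 0.12):
        status = "critical" if finding.critical else "ok"
        print(f"{finding.check_id}: {finding.observed} vs {finding.threshold} -> {status}")
```

With the tutorial's numbers both checks exceed their defaults, which matches the two criticals Cockpit shows. Passing larger thresholds, mirroring the `.agentops/agent.yaml` adjustment described for step 19, would clear them in this sketch.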
## Success criteria diff --git a/plugins/agentops/package.json b/plugins/agentops/package.json index 29781f6e..105e5b4e 100644 --- a/plugins/agentops/package.json +++ b/plugins/agentops/package.json @@ -2,7 +2,7 @@ "name": "agentops-accelerator", "displayName": "AgentOps Accelerator — Skills for GitHub Copilot", "description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Accelerator and Microsoft Foundry agents.", - "version": "0.3.20", + "version": "0.3.21", "publisher": "AgentOpsAccelerator", "icon": "icon.png", "license": "MIT", diff --git a/plugins/agentops/plugin.json b/plugins/agentops/plugin.json index 9aea4e5f..fbd9382f 100644 --- a/plugins/agentops/plugin.json +++ b/plugins/agentops/plugin.json @@ -1,7 +1,7 @@ { "name": "agentops-accelerator", "description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Accelerator and Microsoft Foundry agents.", - "version": "0.3.20", + "version": "0.3.21", "author": { "name": "AgentOps Accelerator", "url": "https://github.com/Azure/agentops" diff --git a/plugins/agentops/skills/agentops-workflow/SKILL.md b/plugins/agentops/skills/agentops-workflow/SKILL.md index db15e443..90cb635b 100644 --- a/plugins/agentops/skills/agentops-workflow/SKILL.md +++ b/plugins/agentops/skills/agentops-workflow/SKILL.md @@ -76,12 +76,37 @@ by discovering the whole Azure subscription. - optional `APPLICATIONINSIGHTS_CONNECTION_STRING`. 5. Prefer existing values and exact checks: - `git remote get-url origin` and `gh repo view --json nameWithOwner`. + - `git branch -vv` to confirm the local trunk branch tracks + `origin/main` when the tutorial uses trunk-based `main`. - `gh variable list --env ` and `gh secret list --env `. - `agentops init show`, local `.agentops/.env` or `.azure//.env`, and `azd env get-values` values before `az account show`. - `az account show` only as a proposal for tenant/subscription; confirm before writing it to GitHub variables. -6. 
Copy CI variables from local AgentOps/azd configuration into the GitHub +6. For GitHub OIDC, treat `AZURE_TENANT_ID` as the tenant that owns the app + registration / federated credential, not merely the tenant associated with + the subscription or a `managedByTenants` entry. Before writing + `AZURE_TENANT_ID`, verify the chosen tenant can see the app registration and + the exact federated credential: + - `az ad app show --id ` in the active tenant, or an + equivalent Microsoft Graph query scoped to the proposed tenant. + - `az ad app federated-credential list --id ` and confirm + the `subject`, `issuer`, and `audiences`. + If the app is visible in one tenant but the Azure subscription is associated + with another tenant, use the app/federated-credential tenant for + `AZURE_TENANT_ID`; the subscription id remains `AZURE_SUBSCRIPTION_ID`. + Do not copy a `managedByTenants[*].tenantId` value into GitHub variables + unless the app and federated credential are verified there too. +7. When creating or connecting the GitHub remote for the prompt-agent tutorial, + make sure the local trunk branch tracks the remote trunk before telling the + user to continue: + - If `main` is newly pushed, use `git push -u origin main`. + - If `origin/main` already exists, use + `git branch --set-upstream-to=origin/main main`. + - Verify with `git branch -vv`; `main` must show `[origin/main]`. + Without this, a later `git pull` on `main` can fetch but not update the + local branch. +8. Copy CI variables from local AgentOps/azd configuration into the GitHub environment used by the workflow. Reuse local values for `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT`, `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_DEPLOYMENT`, and optional @@ -89,17 +114,28 @@ by discovering the whole Azure subscription. them again. Explain `AZURE_OPENAI_DEPLOYMENT` only if it is missing: it is the Azure OpenAI deployment used as the evaluator/judge model, not the user's agent. -7. 
Do not enumerate subscriptions, Foundry projects, Azure OpenAI resources, or +9. For prompt-agent tutorials that use Foundry trace sampling / trace-to-dataset, + verify observability RBAC before telling the user step 18 is ready: + - Resolve the dev Foundry project managed identity principal id. + - Resolve the connected Application Insights resource. + - Grant or verify **Reader** on that Application Insights resource to the dev + Foundry project managed identity. + - If the App Insights component is workspace-based, also grant or verify + **Reader** on the backing Log Analytics workspace. + This is separate from GitHub OIDC and separate from the signed-in user's + portal access. Operate dashboards can still render while trace-to-dataset + fails if the project identity cannot read App Insights. +10. Do not enumerate subscriptions, Foundry projects, Azure OpenAI resources, or model deployments to guess missing values. If `AZURE_SUBSCRIPTION_ID`, `AZURE_TENANT_ID`, `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT`, or `AZURE_OPENAI_DEPLOYMENT` is absent from AgentOps/azd/local env, ask the user to choose or provide it. Only run a scoped Azure query after the user confirms the subscription and the exact missing value. -8. For GitHub OIDC, derive the federated credential subject from the generated +11. For GitHub OIDC, derive the federated credential subject from the generated workflow. If the job has `environment: dev`, the subject is normally `repo:/:environment:dev`. Do not assume branch or `pull_request` subjects without reading the workflow. -9. Before triggering a Foundry prompt-agent workflow, make sure the OIDC app / +12. Before triggering a Foundry prompt-agent workflow, make sure the OIDC app / service principal has **two** RBAC assignments. Both are required; the eval step fails silently (every metric returns `null`) if only one is in place. 1. 
**Foundry User** on the Foundry project (or the Foundry resource scope @@ -118,7 +154,7 @@ by discovering the whole Azure subscription. metric scores" warning so the cause is visible in CI logs, but the workflow still fails the gate. Grant this role **before** the first run. Azure **Reader** is not enough for either step. -10. If either RBAC assignment is missing, do not run the workflow yet. +13. If either RBAC assignment is missing, do not run the workflow yet. Show the exact GitHub OIDC client ID / service principal, desired role, target scope (project for Foundry User, AI Services account for Cognitive Services OpenAI User), then ask the user to approve the role assignment or @@ -134,25 +170,30 @@ by discovering the whole Azure subscription. `/subscriptions//resourceGroups//providers/Microsoft.CognitiveServices/accounts/` and can be derived from `az cognitiveservices account list --resource-group --query "[?kind=='AIServices'].id" -o tsv`. -11. Ask before creating or updating GitHub repos, GitHub environments, +14. Ask before creating or updating GitHub repos, GitHub environments, variables/secrets, Entra app registrations/service principals, federated credentials, managed identities, or Azure RBAC assignments. -12. When creating federated credentials from PowerShell, avoid fragile +15. When creating federated credentials from PowerShell, avoid fragile interpolation. Do **not** write `"repo:$repo:environment:$envName"` because `$repo:` can be parsed as a scoped variable. Use `"repo:${repo}:environment:${envName}"` or `("repo:{0}:environment:{1}" -f $repo, $envName)`, then build JSON from a PowerShell object with `ConvertTo-Json`. -13. After creating or updating a federated credential, read it back and verify +16. After creating or updating a federated credential, read it back and verify before triggering a workflow: - `subject` exactly matches the generated workflow subject. - `issuer` is `https://token.actions.githubusercontent.com`. 
- `audiences` includes `api://AzureADTokenExchange`. If any value differs, fix the credential before running GitHub Actions. -14. Do not dispatch `gh workflow run` as a surprise validation step. First show +17. After setting GitHub environment variables, read them back and verify + `AZURE_TENANT_ID` still matches the app/federated-credential tenant before + triggering a run. If `azure/login` fails with `AADSTS53003`, first re-check + this tenant/app alignment before assuming Conditional Access is the root + cause. +18. Do not dispatch `gh workflow run` as a surprise validation step. First show that the GitHub environment, variables/secrets, federated credential, and Foundry RBAC are ready, then ask the user before triggering workflows. -15. Avoid broad discovery unless local config is missing. Do **not** run broad +19. Avoid broad discovery unless local config is missing. Do **not** run broad `az resource list`, `az graph query`, SDK inspection, or web search to find the Foundry project when `agentops init show`, `.agentops/.env`, or `.azure//.env` already has `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT`. If the @@ -317,6 +358,13 @@ across environments, set: Insights from the Foundry project endpoint; this value makes eval and Doctor telemetry explicit. +For Foundry prompt-agent projects that use trace sampling or +**Create dataset → From traces**, also verify the Foundry project managed +identity can read telemetry: grant or verify **Reader** on the connected +Application Insights resource, and on the backing Log Analytics workspace when +the App Insights component is workspace-based. This permission is not covered by +the GitHub OIDC service principal roles above. + Then configure Workload Identity Federation on the Azure side (`federated-credentials` on the app registration) for **each branch / environment** the workflows will run from. 
See diff --git a/src/agentops/templates/skills/agentops-workflow/SKILL.md b/src/agentops/templates/skills/agentops-workflow/SKILL.md index db15e443..90cb635b 100644 --- a/src/agentops/templates/skills/agentops-workflow/SKILL.md +++ b/src/agentops/templates/skills/agentops-workflow/SKILL.md @@ -76,12 +76,37 @@ by discovering the whole Azure subscription. - optional `APPLICATIONINSIGHTS_CONNECTION_STRING`. 5. Prefer existing values and exact checks: - `git remote get-url origin` and `gh repo view --json nameWithOwner`. + - `git branch -vv` to confirm the local trunk branch tracks + `origin/main` when the tutorial uses trunk-based `main`. - `gh variable list --env ` and `gh secret list --env `. - `agentops init show`, local `.agentops/.env` or `.azure//.env`, and `azd env get-values` values before `az account show`. - `az account show` only as a proposal for tenant/subscription; confirm before writing it to GitHub variables. -6. Copy CI variables from local AgentOps/azd configuration into the GitHub +6. For GitHub OIDC, treat `AZURE_TENANT_ID` as the tenant that owns the app + registration / federated credential, not merely the tenant associated with + the subscription or a `managedByTenants` entry. Before writing + `AZURE_TENANT_ID`, verify the chosen tenant can see the app registration and + the exact federated credential: + - `az ad app show --id ` in the active tenant, or an + equivalent Microsoft Graph query scoped to the proposed tenant. + - `az ad app federated-credential list --id ` and confirm + the `subject`, `issuer`, and `audiences`. + If the app is visible in one tenant but the Azure subscription is associated + with another tenant, use the app/federated-credential tenant for + `AZURE_TENANT_ID`; the subscription id remains `AZURE_SUBSCRIPTION_ID`. + Do not copy a `managedByTenants[*].tenantId` value into GitHub variables + unless the app and federated credential are verified there too. +7. 
When creating or connecting the GitHub remote for the prompt-agent tutorial, + make sure the local trunk branch tracks the remote trunk before telling the + user to continue: + - If `main` is newly pushed, use `git push -u origin main`. + - If `origin/main` already exists, use + `git branch --set-upstream-to=origin/main main`. + - Verify with `git branch -vv`; `main` must show `[origin/main]`. + Without this, a later `git pull` on `main` can fetch but not update the + local branch. +8. Copy CI variables from local AgentOps/azd configuration into the GitHub environment used by the workflow. Reuse local values for `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT`, `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_DEPLOYMENT`, and optional @@ -89,17 +114,28 @@ by discovering the whole Azure subscription. them again. Explain `AZURE_OPENAI_DEPLOYMENT` only if it is missing: it is the Azure OpenAI deployment used as the evaluator/judge model, not the user's agent. -7. Do not enumerate subscriptions, Foundry projects, Azure OpenAI resources, or +9. For prompt-agent tutorials that use Foundry trace sampling / trace-to-dataset, + verify observability RBAC before telling the user step 18 is ready: + - Resolve the dev Foundry project managed identity principal id. + - Resolve the connected Application Insights resource. + - Grant or verify **Reader** on that Application Insights resource to the dev + Foundry project managed identity. + - If the App Insights component is workspace-based, also grant or verify + **Reader** on the backing Log Analytics workspace. + This is separate from GitHub OIDC and separate from the signed-in user's + portal access. Operate dashboards can still render while trace-to-dataset + fails if the project identity cannot read App Insights. +10. Do not enumerate subscriptions, Foundry projects, Azure OpenAI resources, or model deployments to guess missing values. 
If `AZURE_SUBSCRIPTION_ID`, `AZURE_TENANT_ID`, `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT`, or `AZURE_OPENAI_DEPLOYMENT` is absent from AgentOps/azd/local env, ask the user to choose or provide it. Only run a scoped Azure query after the user confirms the subscription and the exact missing value. -8. For GitHub OIDC, derive the federated credential subject from the generated +11. For GitHub OIDC, derive the federated credential subject from the generated workflow. If the job has `environment: dev`, the subject is normally `repo:/:environment:dev`. Do not assume branch or `pull_request` subjects without reading the workflow. -9. Before triggering a Foundry prompt-agent workflow, make sure the OIDC app / +12. Before triggering a Foundry prompt-agent workflow, make sure the OIDC app / service principal has **two** RBAC assignments. Both are required; the eval step fails silently (every metric returns `null`) if only one is in place. 1. **Foundry User** on the Foundry project (or the Foundry resource scope @@ -118,7 +154,7 @@ by discovering the whole Azure subscription. metric scores" warning so the cause is visible in CI logs, but the workflow still fails the gate. Grant this role **before** the first run. Azure **Reader** is not enough for either step. -10. If either RBAC assignment is missing, do not run the workflow yet. +13. If either RBAC assignment is missing, do not run the workflow yet. Show the exact GitHub OIDC client ID / service principal, desired role, target scope (project for Foundry User, AI Services account for Cognitive Services OpenAI User), then ask the user to approve the role assignment or @@ -134,25 +170,30 @@ by discovering the whole Azure subscription. `/subscriptions//resourceGroups//providers/Microsoft.CognitiveServices/accounts/` and can be derived from `az cognitiveservices account list --resource-group --query "[?kind=='AIServices'].id" -o tsv`. -11. Ask before creating or updating GitHub repos, GitHub environments, +14. 
Ask before creating or updating GitHub repos, GitHub environments, variables/secrets, Entra app registrations/service principals, federated credentials, managed identities, or Azure RBAC assignments. -12. When creating federated credentials from PowerShell, avoid fragile +15. When creating federated credentials from PowerShell, avoid fragile interpolation. Do **not** write `"repo:$repo:environment:$envName"` because `$repo:` can be parsed as a scoped variable. Use `"repo:${repo}:environment:${envName}"` or `("repo:{0}:environment:{1}" -f $repo, $envName)`, then build JSON from a PowerShell object with `ConvertTo-Json`. -13. After creating or updating a federated credential, read it back and verify +16. After creating or updating a federated credential, read it back and verify before triggering a workflow: - `subject` exactly matches the generated workflow subject. - `issuer` is `https://token.actions.githubusercontent.com`. - `audiences` includes `api://AzureADTokenExchange`. If any value differs, fix the credential before running GitHub Actions. -14. Do not dispatch `gh workflow run` as a surprise validation step. First show +17. After setting GitHub environment variables, read them back and verify + `AZURE_TENANT_ID` still matches the app/federated-credential tenant before + triggering a run. If `azure/login` fails with `AADSTS53003`, first re-check + this tenant/app alignment before assuming Conditional Access is the root + cause. +18. Do not dispatch `gh workflow run` as a surprise validation step. First show that the GitHub environment, variables/secrets, federated credential, and Foundry RBAC are ready, then ask the user before triggering workflows. -15. Avoid broad discovery unless local config is missing. Do **not** run broad +19. Avoid broad discovery unless local config is missing. 
Do **not** run broad `az resource list`, `az graph query`, SDK inspection, or web search to find the Foundry project when `agentops init show`, `.agentops/.env`, or `.azure//.env` already has `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT`. If the @@ -317,6 +358,13 @@ across environments, set: Insights from the Foundry project endpoint; this value makes eval and Doctor telemetry explicit. +For Foundry prompt-agent projects that use trace sampling or +**Create dataset → From traces**, also verify the Foundry project managed +identity can read telemetry: grant or verify **Reader** on the connected +Application Insights resource, and on the backing Log Analytics workspace when +the App Insights component is workspace-based. This permission is not covered by +the GitHub OIDC service principal roles above. + Then configure Workload Identity Federation on the Azure side (`federated-credentials` on the app registration) for **each branch / environment** the workflows will run from. See