Azure · placerda · Jun 12, 2026 · Jun 10, 2026 · Jun 11, 2026 · Jun 11, 2026
diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
@@ -13,7 +13,7 @@
       "name": "agentops-accelerator",
       "source": "../../plugins/agentops",
       "description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Toolkit and Microsoft Foundry agents.",
-      "version": "0.3.20",
+      "version": "0.3.21",
       "keywords": [
         "agentops",
         "evaluation",

diff --git a/.github/plugin/marketplace.json b/.github/plugin/marketplace.json
@@ -13,7 +13,7 @@
       "name": "agentops-accelerator",
       "source": "../../plugins/agentops",
       "description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Toolkit and Microsoft Foundry agents.",
-      "version": "0.3.20",
+      "version": "0.3.21",
       "keywords": [
         "agentops",
         "evaluation",

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -5,6 +5,29 @@ This format follows [Keep a Changelog](https://keepachangelog.com/) and adheres
 
 ## [Unreleased]
 
+## [0.3.21] - 2026-06-12
+
+### Changed
+- **`agentops-workflow` skill now verifies OIDC tenant, branch upstream
+  tracking, and trace-sampling RBAC before wiring CI.** The packaged skill
+  instructs agents to treat `AZURE_TENANT_ID` as the tenant that owns the Entra
+  app registration / federated credential (not the subscription tenant), to set
+  and verify the local trunk branch upstream (`git branch -vv` must show
+  `[origin/main]`), and to grant **Reader** on Application Insights (and its
+  backing Log Analytics workspace) to the Foundry project managed identity for
+  trace-to-dataset flows.
+
+### Docs
+- **Prompt-agent, hosted-agent, and end-to-end tutorials hardened throughout.**
+  OIDC setup calls out the app-registration tenant; observability steps require
+  App Insights Reader for trace sampling and cover workspace-backed App Insights;
+  the telemetry step queries `gen_ai.evaluation` results from `AppEvents`
+  (table-safe, no hard-coded dates); the evidence step explains expected
+  production-telemetry criticals and where the Doctor thresholds live
+  (`.agentops/agent.yaml`); and the Cockpit step is now a concrete walkthrough
+  (exact `http://127.0.0.1:8090` URL, read-only note, per-section checks, and
+  azd-env switching instead of a non-existent URL switch).
+
 ## [0.3.20] - 2026-06-10
 
 ### Changed

diff --git a/docs/ci-github-actions.md b/docs/ci-github-actions.md
@@ -119,6 +119,12 @@ In Settings → Secrets and variables → Actions → **Variables**, add:
 | `AZURE_OPENAI_DEPLOYMENT` | Model deployment used by local evaluators and AgentOps cloud eval judges |
 | `APPLICATIONINSIGHTS_CONNECTION_STRING` | Optional fallback when the Foundry project's App Insights connection cannot be auto-discovered |
 
+Set `AZURE_TENANT_ID` to the tenant that owns the app registration / federated
+credential used by `AZURE_CLIENT_ID`. Do not use a subscription
+`managedByTenants` tenant ID unless the app registration and federated
+credential are also visible in that tenant; otherwise `azure/login` can fail at
+token issuance before AgentOps starts.
+
 Then on the Azure side, configure Workload Identity Federation
 (federated credentials) on the app registration so it can be assumed
 from GitHub Actions runs. See

diff --git a/docs/tutorial-end-to-end.md b/docs/tutorial-end-to-end.md
@@ -122,7 +122,7 @@ prompts.
 | Azure CLI is installed and `az login` succeeds with the tenant that owns the Foundry project. | AgentOps, Foundry SDK calls, Doctor, Cockpit, and CI setup all need the same Azure identity context. |
 | You have the Foundry project endpoint and can create or publish one Travel Agent target. | The target is either `travel-agent:<version>` for prompt agents or an HTTP endpoint for hosted agents. |
 | You have a chat-capable Azure OpenAI deployment, for example `gpt-4o-mini`. | Local evals and CI variables need a judge model for evaluator calls. |
-| Application Insights is connected to the Foundry project or agent runtime, or you can create/attach it. | Foundry Traces, Operate metrics/Ask AI when available, Azure Monitor, Doctor, Cockpit, and evidence links need telemetry. |
+| Application Insights is connected to the Foundry project or agent runtime, or you can create/attach it. For Foundry trace-to-dataset flows, you can also grant Reader on App Insights and its backing Log Analytics workspace to the Foundry project managed identity. | Foundry Traces, Operate metrics/Ask AI when available, trace sampling, Azure Monitor, Doctor, Cockpit, and evidence links need telemetry. |
 | You can deploy or expose any hosted endpoint that CI will call. | `localhost` works for local eval; remote CI needs a reachable HTTPS URL. |
 | You can push to the tutorial GitHub repository and run GitHub Actions or Azure Pipelines. | PR and environment workflows only run after the repo is published. |
 | GitHub CLI is authenticated with `gh auth login` if you use GitHub PR commands while testing CI. | The regression and release-gate steps are smoother when repo, PR, and Actions access are already confirmed. |
@@ -524,8 +524,9 @@ environment variable or equivalent Azure DevOps pipeline variable, verify the
 OIDC principal has **both** Foundry User access on the dev Foundry project
 **and** Cognitive Services OpenAI User access on the underlying Azure AI
 Services account that hosts the evaluator model (both are required — without
-the OpenAI User role, every cloud eval metric returns null), and show me the
-plan before changing GitHub or Azure.
+the OpenAI User role, every cloud eval metric returns null), verify
+AZURE_TENANT_ID is the tenant that owns the Entra app registration and its
+federated credential, and show me the plan before changing GitHub or Azure.
 ```
 
 That value is not an `agentops init` answer. It tells the Foundry cloud eval
@@ -877,6 +878,13 @@ may not have live traffic, scheduled workflows may not have history, and trace
 regression candidates may not exist yet. That is useful tutorial feedback, not
 a failure of Doctor.
 
+If production telemetry *does* carry enough live traffic to trip latency or
+error criticals, those are honest signals — not tutorial noise. The thresholds
+that decide critical-vs-warning live in `.agentops/agent.yaml`
+(`checks.latency.p95_threshold_seconds`, `checks.errors.rate_threshold`) and are
+separate from the `agentops.yaml` eval-gate thresholds; raise them only if you
+deliberately want to relax the production gate for a demo.
+
 ## 10. Run Foundry red-team scans
 
 Red-team scans are a Foundry capability. Run them from Foundry Observability /
@@ -952,16 +960,31 @@ reviews and accepts them.
 agentops cockpit --workspace .
 ```
 
-Use Cockpit as the local command center:
+Cockpit starts a read-only local web server and prints
+`http://127.0.0.1:8090`. Open that URL in your browser; press `Ctrl+C` in
+the terminal to stop it. It reflects the **active azd environment**
+(`sandbox`, from `defaultEnvironment` in `.azure/config.json`) — there is no
+URL switch. To inspect `dev`, stop Cockpit, point the active env at `dev`
+(set `"defaultEnvironment": "dev"` in `.azure/config.json`, or export
+`AZURE_ENV_NAME=dev`), then rerun the command.
 
-- Foundry connection and deep links;
-- Microsoft Foundry eval or AgentOps local eval gate status;
-- Doctor findings;
-- release evidence;
-- local eval history;
-- production telemetry snapshot;
-- CI/CD workflow status;
-- next actions.
+Read the page top to bottom and confirm each card:
+
+| Section | What to confirm |
+|---|---|
+| **Foundry connection** | The Foundry project and tenant resolve, and the agent identity matches your `agentops.yaml` target. |
+| **Open in Foundry** | The deep links open your project in the correct tenant. |
+| **Observability readiness** | Trace setup / sampling status from the latest Doctor analysis. |
+| **AgentOps Doctor** | The same finding rollup from the Doctor / evidence-pack step (criticals first, then warnings). |
+| **Local eval history** | Your `agentops eval run` baseline and regression reruns appear. |
+| **Quality metrics** | Evaluator score trends from your runs. |
+| **Production telemetry** | App Insights latency / error snapshot (or a clear "no live traffic" state in a fresh workspace). |
+| **CI/CD Pipelines** | The workflows you generated are listed. |
+| **Next actions** | The prioritized backlog Cockpit derives from the open findings. |
+
+Cockpit does not run checks or mutate anything — it renders the latest
+`results.json`, Doctor report, and evidence pack you already produced, and
+links out to Foundry / Azure Monitor for live runtime data.
 
 ## Completion checklist
 

diff --git a/docs/tutorial-hosted-agent-quickstart.md b/docs/tutorial-hosted-agent-quickstart.md
@@ -786,12 +786,13 @@ hosted-agent project.
 
 Create or connect the GitHub repo if needed, set AGENTOPS_AGENT_ENDPOINT in the
 `dev` environment to the deployed HTTPS endpoint, wire Azure OIDC and required
-Actions variables in the `dev` environment, and set any required endpoint token
-as a secret. The PR gate uses --doctor-gate critical so the workflow blocks on
-critical Doctor findings (regressions or other strict signals). Do not add
-scheduled Doctor, QA, or production workflows yet. Show me the plan before
-changing GitHub or Azure, and call out anything that needs owner/admin
-permission.
+Actions variables in the `dev` environment, verify AZURE_TENANT_ID is the tenant
+that owns the Entra app registration and its federated credential, and set any
+required endpoint token as a secret. The PR gate uses --doctor-gate critical so
+the workflow blocks on critical Doctor findings (regressions or other strict
+signals). Do not add scheduled Doctor, QA, or production workflows yet. Show me
+the plan before changing GitHub or Azure, and call out anything that needs
+owner/admin permission.
 ```
 
 Open both Doctor outputs. The report explains the findings; the evidence pack
@@ -800,6 +801,13 @@ In a fresh tutorial workspace, warnings about production telemetry, CI history,
 regression history are expected and useful: they show what remains before this
 local endpoint becomes an operated service.
 
+If production telemetry *does* carry enough live traffic to trip latency or
+error criticals, those are honest signals. The thresholds that decide
+critical-vs-warning live in `.agentops/agent.yaml`
+(`checks.latency.p95_threshold_seconds`, `checks.errors.rate_threshold`) and are
+separate from the `agentops.yaml` eval-gate thresholds; raise them only if you
+deliberately want to relax the production gate for a demo.
+
 If you later want a separate cadence outside PRs, generate the optional Doctor
 workflow with `agentops workflow generate --kinds doctor --force`.
 
@@ -816,8 +824,32 @@ look self-contained inside AgentOps.
 agentops cockpit --workspace .
 ```
 
-Cockpit shows the endpoint readiness, eval history, Doctor findings, telemetry
-status, release evidence, CI/CD, and next actions.
+Cockpit starts a read-only local web server and prints
+`http://127.0.0.1:8090` (this is the Cockpit UI port, not your agent's
+`:8000`). Open that URL in your browser; press `Ctrl+C` in the terminal to
+stop it. It reflects the **active azd environment** (`sandbox`, from
+`defaultEnvironment` in `.azure/config.json`) — there is no URL switch. To
+inspect `dev`, stop Cockpit, point the active env at `dev` (set
+`"defaultEnvironment": "dev"` in `.azure/config.json`, or export
+`AZURE_ENV_NAME=dev`), then rerun the command.
+
+Read the page top to bottom and confirm each card:
+
+| Section | What to confirm |
+|---|---|
+| **Foundry connection** | The Foundry project and tenant resolve, and the agent target is your hosted endpoint URL. |
+| **Open in Foundry** | The deep links open your project in the correct tenant. |
+| **Observability readiness** | Trace setup / sampling status from the latest Doctor analysis. |
+| **AgentOps Doctor** | The same finding rollup from the Doctor / evidence-pack step (criticals first, then warnings). |
+| **Local eval history** | Your `agentops eval run` baseline, regressed, and fixed reruns appear. |
+| **Quality metrics** | Evaluator score trends from your runs. |
+| **Production telemetry** | App Insights latency / error snapshot for the `travel-agent.chat` operation (or a "no live traffic" state in a fresh workspace). |
+| **CI/CD Pipelines** | The PR and dev deploy workflows you generated are listed. |
+| **Next actions** | The prioritized backlog Cockpit derives from the open findings. |
+
+Cockpit does not run checks or mutate anything — it renders the latest
+`results.json`, Doctor report, and evidence pack you already produced, and
+links out to Foundry / Azure Monitor for live runtime data.
 
 ## Success criteria