Merged
17 commits
2 changes: 1 addition & 1 deletion .claude-plugin/marketplace.json
@@ -13,7 +13,7 @@
"name": "agentops-accelerator",
"source": "../../plugins/agentops",
"description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Toolkit and Microsoft Foundry agents.",
-"version": "0.3.20",
+"version": "0.3.21",
"keywords": [
"agentops",
"evaluation",
2 changes: 1 addition & 1 deletion .github/plugin/marketplace.json
@@ -13,7 +13,7 @@
"name": "agentops-accelerator",
"source": "../../plugins/agentops",
"description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Toolkit and Microsoft Foundry agents.",
-"version": "0.3.20",
+"version": "0.3.21",
"keywords": [
"agentops",
"evaluation",
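Both marketplace manifests bump the `agentops-accelerator` entry to `0.3.21`, which must match the newest released CHANGELOG heading. A minimal Python sketch of a consistency check, assuming the plugin entry is a JSON object with `name` and `version` keys somewhere in each manifest (the surrounding manifest layout is not shown in this diff) and that release headings use the `## [x.y.z] - YYYY-MM-DD` form seen in the CHANGELOG. `find_plugin_version`, `latest_changelog_version`, and `versions_consistent` are hypothetical names, not part of AgentOps.

```python
import json
import re
from typing import Any, Optional


def find_plugin_version(node: Any, plugin_name: str) -> Optional[str]:
    """Walk any JSON value; return `version` of the first object named plugin_name."""
    if isinstance(node, dict):
        if node.get("name") == plugin_name and "version" in node:
            return node["version"]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_plugin_version(child, plugin_name)
        if found is not None:
            return found
    return None


# Matches release headings such as "## [0.3.21] - 2026-06-12";
# "## [Unreleased]" has no date suffix, so it is skipped.
RELEASE_HEADING = re.compile(
    r"^## \[(\d+\.\d+\.\d+)\] - \d{4}-\d{2}-\d{2}$", re.MULTILINE
)


def latest_changelog_version(changelog_text: str) -> Optional[str]:
    match = RELEASE_HEADING.search(changelog_text)
    return match.group(1) if match else None


def versions_consistent(manifest_texts: list, changelog_text: str,
                        plugin_name: str = "agentops-accelerator") -> bool:
    """True when every manifest's plugin version equals the newest release."""
    expected = latest_changelog_version(changelog_text)
    if expected is None:
        return False
    return all(
        find_plugin_version(json.loads(text), plugin_name) == expected
        for text in manifest_texts
    )
```

Walking the whole JSON tree instead of indexing a fixed key keeps the sketch independent of the unseen top-level layout; you would feed it the text of both `marketplace.json` files plus `CHANGELOG.md`.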
23 changes: 23 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,29 @@ This format follows [Keep a Changelog](https://keepachangelog.com/) and adheres

## [Unreleased]

+## [0.3.21] - 2026-06-12
+
+### Changed
+- **`agentops-workflow` skill now verifies OIDC tenant, branch upstream
+tracking, and trace-sampling RBAC before wiring CI.** The packaged skill
+instructs agents to treat `AZURE_TENANT_ID` as the tenant that owns the Entra
+app registration / federated credential (not the subscription tenant), to set
+and verify the local trunk branch upstream (`git branch -vv` must show
+`[origin/main]`), and to grant **Reader** on Application Insights (and its
+backing Log Analytics workspace) to the Foundry project managed identity for
+trace-to-dataset flows.
+
+### Docs
+- **Prompt-agent, hosted-agent, and end-to-end tutorials hardened end to end.**
+OIDC setup calls out the app-registration tenant; observability steps require
+App Insights Reader for trace sampling and cover workspace-backed App Insights;
+the telemetry step queries `gen_ai.evaluation` results from `AppEvents`
+(table-safe, no hard-coded dates); the evidence step explains expected
+production-telemetry criticals and where the Doctor thresholds live
+(`.agentops/agent.yaml`); and the Cockpit step is now a concrete walkthrough
+(exact `http://127.0.0.1:8090` URL, read-only note, per-section checks, and
+azd-env switching instead of a non-existent URL switch).
+
## [0.3.20] - 2026-06-10

### Changed
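The `0.3.21` entry says the skill must confirm that `git branch -vv` shows `[origin/main]` for the trunk branch. A minimal Python sketch of that check, parsing the text `git branch -vv` prints (a two-character marker, branch name, short SHA, then an optional bracketed upstream such as `[origin/main]` or `[origin/main: ahead 2]`). `upstream_of` and `has_expected_upstream` are hypothetical helpers, not the skill's actual implementation.

```python
from typing import Optional


def upstream_of(branch_vv_output: str, branch: str) -> Optional[str]:
    """Return the upstream that `git branch -vv` reports for `branch`, if any."""
    for raw in branch_vv_output.splitlines():
        # Each line starts with a 2-char marker: "* " (current branch),
        # "+ " (checked out in another worktree), or "  ". Then come the
        # branch name, short SHA, an optional [upstream], and the subject.
        parts = raw[2:].split(None, 2)
        if len(parts) < 2 or parts[0] != branch:
            continue
        rest = parts[2] if len(parts) == 3 else ""
        close = rest.find("]")
        if not rest.startswith("[") or close == -1:
            return None  # branch exists but shows no upstream
        # "[origin/main: ahead 2]" -> "origin/main"
        return rest[1:close].split(":", 1)[0]
    return None  # branch not listed at all


def has_expected_upstream(branch_vv_output: str, branch: str = "main",
                          expected: str = "origin/main") -> bool:
    return upstream_of(branch_vv_output, branch) == expected
```

The input would typically come from `subprocess.run(["git", "branch", "-vv"], capture_output=True, text=True, check=True).stdout`. A commit subject that itself starts with `[` looks like an upstream marker in this output, so `git rev-parse --abbrev-ref main@{upstream}` is a sturdier source when you need certainty.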
6 changes: 6 additions & 0 deletions docs/ci-github-actions.md
@@ -119,6 +119,12 @@ In Settings → Secrets and variables → Actions → **Variables**, add:
| `AZURE_OPENAI_DEPLOYMENT` | Model deployment used by local evaluators and AgentOps cloud eval judges |
| `APPLICATIONINSIGHTS_CONNECTION_STRING` | Optional fallback when the Foundry project's App Insights connection cannot be auto-discovered |

+Set `AZURE_TENANT_ID` to the tenant that owns the app registration / federated
+credential used by `AZURE_CLIENT_ID`. Do not use a subscription
+`managedByTenants` tenant ID unless the app registration and federated
+credential are also visible in that tenant; otherwise `azure/login` can fail at
+token issuance before AgentOps starts.
+
Then on the Azure side, configure Workload Identity Federation
(federated credentials) on the app registration so it can be assumed
from GitHub Actions runs. See
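The new paragraph warns that a mismatched `AZURE_TENANT_ID` can make `azure/login` fail at token issuance. Once a login does succeed, one way to confirm which tenant issued the token is to read the `tid` claim of the Entra access token and compare it with `AZURE_TENANT_ID`. A minimal Python sketch, assuming you obtain the token yourself (for example with `az account get-access-token --query accessToken -o tsv`). It only base64url-decodes the payload for diagnostics and does **not** verify the signature; `token_tenant_id` and `tenant_matches` are hypothetical names.

```python
import base64
import json


def token_tenant_id(jwt: str) -> str:
    """Return the `tid` (tenant ID) claim from a JWT payload.

    Diagnostic only: the signature is NOT verified.
    """
    parts = jwt.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT of the form header.payload.signature")
    payload = parts[1]
    # JWT segments are base64url without padding; restore padding to decode.
    padded = payload + "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return claims["tid"]


def tenant_matches(jwt: str, azure_tenant_id: str) -> bool:
    """True when the token was issued by the tenant named in AZURE_TENANT_ID."""
    # Tenant IDs are GUIDs; compare case-insensitively.
    return token_tenant_id(jwt).lower() == azure_tenant_id.strip().lower()
```

Never log the raw token; printing only the decoded `tid` keeps the check safe to run in a CI step.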
47 changes: 35 additions & 12 deletions docs/tutorial-end-to-end.md
@@ -122,7 +122,7 @@ prompts.
| Azure CLI is installed and `az login` succeeds with the tenant that owns the Foundry project. | AgentOps, Foundry SDK calls, Doctor, Cockpit, and CI setup all need the same Azure identity context. |
| You have the Foundry project endpoint and can create or publish one Travel Agent target. | The target is either `travel-agent:<version>` for prompt agents or an HTTP endpoint for hosted agents. |
| You have a chat-capable Azure OpenAI deployment, for example `gpt-4o-mini`. | Local evals and CI variables need a judge model for evaluator calls. |
-| Application Insights is connected to the Foundry project or agent runtime, or you can create/attach it. | Foundry Traces, Operate metrics/Ask AI when available, Azure Monitor, Doctor, Cockpit, and evidence links need telemetry. |
+| Application Insights is connected to the Foundry project or agent runtime, or you can create/attach it. For Foundry trace-to-dataset flows, you can also grant Reader on App Insights and its backing Log Analytics workspace to the Foundry project managed identity. | Foundry Traces, Operate metrics/Ask AI when available, trace sampling, Azure Monitor, Doctor, Cockpit, and evidence links need telemetry. |
| You can deploy or expose any hosted endpoint that CI will call. | `localhost` works for local eval; remote CI needs a reachable HTTPS URL. |
| You can push to the tutorial GitHub repository and run GitHub Actions or Azure Pipelines. | PR and environment workflows only run after the repo is published. |
| GitHub CLI is authenticated with `gh auth login` if you use GitHub PR commands while testing CI. | The regression and release-gate steps are smoother when repo, PR, and Actions access are already confirmed. |
@@ -524,8 +524,9 @@ environment variable or equivalent Azure DevOps pipeline variable, verify the
OIDC principal has **both** Foundry User access on the dev Foundry project
**and** Cognitive Services OpenAI User access on the underlying Azure AI
Services account that hosts the evaluator model (both are required — without
-the OpenAI User role, every cloud eval metric returns null), and show me the
-plan before changing GitHub or Azure.
+the OpenAI User role, every cloud eval metric returns null), verify
+AZURE_TENANT_ID is the tenant that owns the Entra app registration and its
+federated credential, and show me the plan before changing GitHub or Azure.
```

That value is not an `agentops init` answer. It tells the Foundry cloud eval
@@ -877,6 +878,13 @@ may not have live traffic, scheduled workflows may not have history, and trace
regression candidates may not exist yet. That is useful tutorial feedback, not
a failure of Doctor.

+If production telemetry *does* carry enough live traffic to trip latency or
+error criticals, those are honest signals, not tutorial noise. The thresholds
+that decide critical-vs-warning live in `.agentops/agent.yaml`
+(`checks.latency.p95_threshold_seconds`, `checks.errors.rate_threshold`) and are
+separate from the `agentops.yaml` eval-gate thresholds; raise them only if you
+deliberately want to relax the production gate for a demo.
+
## 10. Run Foundry red-team scans

Red-team scans are a Foundry capability. Run them from Foundry Observability /
@@ -952,16 +960,31 @@ reviews and accepts them.
agentops cockpit --workspace .
```

-Use Cockpit as the local command center:
+Cockpit starts a read-only local web server and prints
+`http://127.0.0.1:8090`. Open that URL in your browser; press `Ctrl+C` in
+the terminal to stop it. It reflects the **active azd environment**
+(`sandbox`, from `defaultEnvironment` in `.azure/config.json`); there is no
+URL switch. To inspect `dev`, stop Cockpit, point the active env at `dev`
+(set `"defaultEnvironment": "dev"` in `.azure/config.json`, or export
+`AZURE_ENV_NAME=dev`), then rerun the command.

-- Foundry connection and deep links;
-- Microsoft Foundry eval or AgentOps local eval gate status;
-- Doctor findings;
-- release evidence;
-- local eval history;
-- production telemetry snapshot;
-- CI/CD workflow status;
-- next actions.
+Read the page top to bottom and confirm each card:
+
+| Section | What to confirm |
+|---|---|
+| **Foundry connection** | The Foundry project and tenant resolve, and the agent identity matches your `agentops.yaml` target. |
+| **Open in Foundry** | The deep links open your project in the correct tenant. |
+| **Observability readiness** | Trace setup and sampling status match the latest Doctor analysis. |
+| **AgentOps Doctor** | The finding rollup matches the Doctor / evidence-pack step (criticals first, then warnings). |
+| **Local eval history** | Your `agentops eval run` baseline and regression reruns appear. |
+| **Quality metrics** | Evaluator score trends reflect your runs. |
+| **Production telemetry** | An App Insights latency / error snapshot appears (or a clear "no live traffic" state in a fresh workspace). |
+| **CI/CD Pipelines** | The workflows you generated are listed. |
+| **Next actions** | A prioritized backlog derived from the open findings is shown. |
+
+Cockpit does not run checks or mutate anything: it renders the latest
+`results.json`, Doctor report, and evidence pack you already produced, and
+links out to Foundry / Azure Monitor for live runtime data.

## Completion checklist

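The Cockpit walkthrough says Cockpit reflects the active azd environment, taken from `defaultEnvironment` in `.azure/config.json` or from an exported `AZURE_ENV_NAME`. A minimal Python sketch of that lookup, assuming `AZURE_ENV_NAME` takes precedence over the config file (the text offers both options but does not state Cockpit's precedence) and that the file is JSON such as `{"defaultEnvironment": "sandbox"}`. `read_azd_config` and `resolve_active_env` are hypothetical names, not Cockpit internals.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def read_azd_config(workspace: Path) -> Optional[str]:
    """Return the raw text of <workspace>/.azure/config.json, or None if absent."""
    path = workspace / ".azure" / "config.json"
    return path.read_text(encoding="utf-8") if path.is_file() else None


def resolve_active_env(config_text: Optional[str],
                       environ: Mapping[str, str]) -> Optional[str]:
    """Pick the azd environment a read-only viewer such as Cockpit would show."""
    # Assumed precedence (not confirmed for Cockpit): an exported
    # AZURE_ENV_NAME wins over defaultEnvironment in the config file.
    exported = environ.get("AZURE_ENV_NAME", "").strip()
    if exported:
        return exported
    if config_text is None:
        return None
    return json.loads(config_text).get("defaultEnvironment") or None


if __name__ == "__main__":
    print(resolve_active_env(read_azd_config(Path(".")), os.environ))
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps the precedence rule easy to test without touching the real shell.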
48 changes: 40 additions & 8 deletions docs/tutorial-hosted-agent-quickstart.md
@@ -786,12 +786,13 @@ hosted-agent project.

Create or connect the GitHub repo if needed, set AGENTOPS_AGENT_ENDPOINT in the
`dev` environment to the deployed HTTPS endpoint, wire Azure OIDC and required
-Actions variables in the `dev` environment, and set any required endpoint token
-as a secret. The PR gate uses --doctor-gate critical so the workflow blocks on
-critical Doctor findings (regressions or other strict signals). Do not add
-scheduled Doctor, QA, or production workflows yet. Show me the plan before
-changing GitHub or Azure, and call out anything that needs owner/admin
-permission.
+Actions variables in the `dev` environment, verify AZURE_TENANT_ID is the tenant
+that owns the Entra app registration and its federated credential, and set any
+required endpoint token as a secret. The PR gate uses --doctor-gate critical so
+the workflow blocks on critical Doctor findings (regressions or other strict
+signals). Do not add scheduled Doctor, QA, or production workflows yet. Show me
+the plan before changing GitHub or Azure, and call out anything that needs
+owner/admin permission.
```

Open both Doctor outputs. The report explains the findings; the evidence pack
@@ -800,6 +801,13 @@ In a fresh tutorial workspace, warnings about production telemetry, CI history,
regression history are expected and useful: they show what remains before this
local endpoint becomes an operated service.

+If production telemetry *does* carry enough live traffic to trip latency or
+error criticals, those are honest signals. The thresholds that decide
+critical-vs-warning live in `.agentops/agent.yaml`
+(`checks.latency.p95_threshold_seconds`, `checks.errors.rate_threshold`) and are
+separate from the `agentops.yaml` eval-gate thresholds; raise them only if you
+deliberately want to relax the production gate for a demo.
+
If you later want a separate cadence outside PRs, generate the optional Doctor
workflow with `agentops workflow generate --kinds doctor --force`.

@@ -816,8 +824,32 @@ look self-contained inside AgentOps.
agentops cockpit --workspace .
```

-Cockpit shows the endpoint readiness, eval history, Doctor findings, telemetry
-status, release evidence, CI/CD, and next actions.
+Cockpit starts a read-only local web server and prints
+`http://127.0.0.1:8090` (this is the Cockpit UI port, not your agent's
+`:8000`). Open that URL in your browser; press `Ctrl+C` in the terminal to
+stop it. It reflects the **active azd environment** (`sandbox`, from
+`defaultEnvironment` in `.azure/config.json`); there is no URL switch. To
+inspect `dev`, stop Cockpit, point the active env at `dev` (set
+`"defaultEnvironment": "dev"` in `.azure/config.json`, or export
+`AZURE_ENV_NAME=dev`), then rerun the command.
+
+Read the page top to bottom and confirm each card:
+
+| Section | What to confirm |
+|---|---|
+| **Foundry connection** | The Foundry project and tenant resolve, and the agent is your hosted endpoint URL. |
+| **Open in Foundry** | The deep links open your project in the correct tenant. |
+| **Observability readiness** | Trace setup and sampling status match the latest Doctor analysis. |
+| **AgentOps Doctor** | The finding rollup matches the Doctor / evidence-pack step (criticals first, then warnings). |
+| **Local eval history** | Your `agentops eval run` baseline, regressed, and fixed reruns appear. |
+| **Quality metrics** | Evaluator score trends reflect your runs. |
+| **Production telemetry** | An App Insights latency / error snapshot for the `travel-agent.chat` operation appears (or a "no live traffic" state in a fresh workspace). |
+| **CI/CD Pipelines** | The PR and dev deploy workflows you generated are listed. |
+| **Next actions** | A prioritized backlog derived from the open findings is shown. |
+
+Cockpit does not run checks or mutate anything: it renders the latest
+`results.json`, Doctor report, and evidence pack you already produced, and
+links out to Foundry / Azure Monitor for live runtime data.

## Success criteria

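Both tutorials note that Doctor's critical-vs-warning cutoffs for production telemetry live in `.agentops/agent.yaml` under `checks.latency.p95_threshold_seconds` and `checks.errors.rate_threshold`. A minimal Python sketch of how such thresholds could be applied to a telemetry snapshot, assuming the dotted paths map to nested keys, that exceeding a threshold counts as critical, that p95 uses the nearest-rank method, and that an empty snapshot maps to a "no data" state. Doctor's real rules (including any warning band) are not shown in this diff, a plain dict stands in for the parsed YAML, and `p95` and `classify` are hypothetical names.

```python
from typing import Any, Mapping, Sequence


def p95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty sample."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = -(-95 * len(ordered) // 100)  # ceil(0.95 * n) in integer math; 1-based
    return ordered[rank - 1]


def classify(latencies_s: Sequence[float], error_count: int, request_count: int,
             agent_config: Mapping[str, Any]) -> dict:
    """Compare a telemetry snapshot with agent.yaml-style thresholds."""
    if request_count == 0 or not latencies_s:
        # A fresh workspace with no live traffic has nothing to judge.
        return {"latency": "no data", "errors": "no data"}
    checks = agent_config["checks"]
    latency_limit = checks["latency"]["p95_threshold_seconds"]
    error_limit = checks["errors"]["rate_threshold"]
    error_rate = error_count / request_count
    return {
        "latency": "critical" if p95(latencies_s) > latency_limit else "ok",
        "errors": "critical" if error_rate > error_limit else "ok",
    }
```

Keeping these thresholds apart from the `agentops.yaml` eval gates means that relaxing the production gate for a demo leaves evaluation pass/fail criteria untouched.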