Merged
25 commits
14b86fa
docs: showcase ASSERT evidence in tutorial
placerda Jun 9, 2026
bfb5438
Merge pull request #272 from Azure/docs/showcase-assert-evidence-tuto…
placerda Jun 9, 2026
0300fa7
docs: clarify ASSERT evidence flow
placerda Jun 9, 2026
e6b1c2c
Merge pull request #273 from Azure/docs/clarify-assert-evidence-flow
placerda Jun 9, 2026
d34defa
docs: clarify synthetic multi-turn tutorial rows
placerda Jun 9, 2026
1ff25fe
docs: mention Foundry full conversation eval
placerda Jun 9, 2026
57cc5b4
Merge pull request #274 from Azure/docs/clarify-synthetic-multiturn-c…
placerda Jun 9, 2026
f9a20e3
docs: add Foundry full multi-turn evaluation step
placerda Jun 9, 2026
324dc6d
Merge pull request #275 from Azure/docs/add-foundry-full-conversation…
placerda Jun 9, 2026
51f2b5d
docs: clarify CLI conversation gate scope
placerda Jun 9, 2026
90362f2
Merge pull request #276 from Azure/docs/clarify-cli-conversation-gate…
placerda Jun 9, 2026
422a87f
docs: explain Foundry full conversation data source
placerda Jun 9, 2026
96527ad
Merge pull request #277 from Azure/docs/explain-full-conversations-da…
placerda Jun 9, 2026
3fad3a2
docs: remove manual Foundry evaluation URL step
placerda Jun 9, 2026
25b926f
Merge pull request #278 from Azure/docs/remove-manual-evaluation-url-…
placerda Jun 9, 2026
d1ea843
docs: require rubric gate in prompt tutorial
placerda Jun 9, 2026
896baad
Merge pull request #279 from Azure/docs/make-rubric-required-and-skil…
placerda Jun 9, 2026
c5a6ee1
docs: explain rubric evaluator concept
placerda Jun 9, 2026
c761c8d
Merge pull request #280 from Azure/docs/explain-rubric-evaluator-concept
placerda Jun 9, 2026
f6f92a6
docs: mark full multi-turn evaluation optional
placerda Jun 9, 2026
d4d2f2a
Merge pull request #281 from Azure/docs/mark-full-multiturn-optional
placerda Jun 9, 2026
66e581c
docs: explain rubric placeholders with concrete file lookups
placerda Jun 9, 2026
471c8fe
Merge pull request #282 from Azure/docs/explain-rubric-placeholders
placerda Jun 9, 2026
f3647d3
feat: agentops assert run + agentops redteam run as active CI gates (…
placerda Jun 9, 2026
ebb4e62
chore: prepare release 0.3.14
github-actions[bot] Jun 9, 2026
2 changes: 1 addition & 1 deletion .claude-plugin/marketplace.json
@@ -13,7 +13,7 @@
"name": "agentops-accelerator",
"source": "../../plugins/agentops",
"description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Toolkit and Microsoft Foundry agents.",
-"version": "0.3.13",
+"version": "0.3.14",
"keywords": [
"agentops",
"evaluation",
2 changes: 1 addition & 1 deletion .github/plugin/marketplace.json
@@ -13,7 +13,7 @@
"name": "agentops-accelerator",
"source": "../../plugins/agentops",
"description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Toolkit and Microsoft Foundry agents.",
-"version": "0.3.13",
+"version": "0.3.14",
"keywords": [
"agentops",
"evaluation",
22 changes: 22 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,28 @@ This format follows [Keep a Changelog](https://keepachangelog.com/) and adheres

## [Unreleased]

+## [0.3.14] - 2026-06-09
+
+### Added
+- **`agentops assert run` orchestrates the open-source ASSERT framework.**
+AgentOps now invokes the `assert-ai` CLI as an active CI step instead of only
+consuming pre-generated artifacts via `assert_path:`. A new `assert:` block in
+`agentops.yaml` (`config`, `results_dir`, `suite`, `run_id`,
+`fail_on_violations`) drives subprocess invocation, locates the run output
+under `<results_dir>/<suite>/<run>/`, parses `metrics.json` and
+`scores.jsonl`, and writes a normalized summary at `.agentops/assert/latest.json`
+that the release evidence pack ingests automatically. Exit code 2 when any
+policy dimension reports violations.
+- **`agentops redteam run` orchestrates Foundry's AI Red Teaming agent (PyRIT).**
+AgentOps now invokes `azure.ai.evaluation.red_team.RedTeam` against the
+configured target (Azure OpenAI deployment, Foundry prompt agent, or HTTP
+endpoint) and normalizes the per-category and per-strategy attack outcomes.
+A new `redteam:` block in `agentops.yaml` (`target`, `risk_categories`,
+`attack_strategies`, `num_objectives`, `fail_on_attack_success_rate`)
+controls the scan; results land at `.agentops/redteam/latest.json` so the
+evidence pack picks them up via `redteam_path:` automatically. Exit code 2
+when attack-success-rate exceeds the configured threshold.
+
## [0.3.13] - 2026-06-09

### Fixed
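Reviewer note on the 0.3.14 changelog entry above: it describes two gates that both end in exit code 2, one when any ASSERT policy dimension reports violations and one when the red-team attack success rate exceeds the configured threshold. The sketch below illustrates that decision logic in Python. It is not the AgentOps implementation. The function names, the per-dimension count mapping, and the reading of `fail_on_violations` as a boolean and `fail_on_attack_success_rate` as a numeric threshold are assumptions, because this diff does not show the schema of `.agentops/assert/latest.json` or `.agentops/redteam/latest.json`.

```python
from typing import Mapping

EXIT_OK = 0           # execution succeeded and all gates passed
EXIT_GATE_FAILED = 2  # execution succeeded but a gate failed


def assert_gate(violations: Mapping[str, int], fail_on_violations: bool = True) -> int:
    """Return 2 when any policy dimension reports violations, else 0.

    `violations` maps a (hypothetical) policy dimension name to its
    violation count; the real summary format is not shown in the diff.
    """
    if fail_on_violations and any(count > 0 for count in violations.values()):
        return EXIT_GATE_FAILED
    return EXIT_OK


def redteam_gate(successful_attacks: int, total_attacks: int, threshold: float) -> int:
    """Return 2 when the attack success rate exceeds the threshold, else 0."""
    if total_attacks <= 0:
        raise ValueError("total_attacks must be positive")
    rate = successful_attacks / total_attacks
    # "Exceeds" is read strictly: a rate equal to the threshold still passes.
    return EXIT_GATE_FAILED if rate > threshold else EXIT_OK
```

Reading "exceeds" as a strict comparison is a choice made for this sketch; the changelog wording does not say how a rate exactly equal to the threshold is treated.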
84 changes: 51 additions & 33 deletions README.md
@@ -1,7 +1,9 @@
<h1 align="center">AgentOps Accelerator</h1>

<p align="center">
-Answer the release question for Microsoft Foundry agents: can we ship it, and where is the proof?
+<b>Open-source framework and CLI for continuous evaluation, safety testing, and release readiness of Microsoft Foundry agents.</b>
+<br/>
+Can we ship it, and where is the proof?
</p>

<p align="center">
@@ -19,25 +21,52 @@ Answer the release question for Microsoft Foundry agents: can we ship it, and wh

## Overview

-AgentOps Accelerator helps teams turn Foundry agent work into a clear release
-decision. Foundry is the agent control plane; AgentOps turns Foundry signals and
-repo checks into repeatable gates, Doctor readiness, release evidence, and
-trace-driven regression loops.
-
-The project enables:
-
-- Local and CI execution for release gates
-- Foundry prompt agent, Foundry hosted endpoint, HTTP/JSON agent, and raw model targets
-- Auto-selected evaluators for RAG, tools, and model quality
-- Stable `results.json` for automation
-- PR-friendly `report.md`
-- Baseline comparison for regression detection
-- Doctor checks for repo, CI/CD, telemetry, landing zones, and Foundry setup
-- Release evidence packs for promotion review
-- Optional `azd ai agent eval` execution with Rubric/custom metric binding
-- ASSERT, ACS, and red-team governance evidence references
-- Trace promotion into regression datasets
-- Cockpit navigation for AgentOps, Foundry, and Azure Monitor
+**AgentOps Accelerator is an open-source framework and CLI that standardizes
+continuous evaluation, safety testing, and release readiness for enterprise AI
+agents — with Microsoft Foundry as the agent runtime.**
+
+It is an *orchestrator*, not a reimplementation. AgentOps wires together the
+tools you already use — Foundry Evaluations, `azd ai agent eval`, the
+open-source ASSERT framework, the PyRIT-backed AI Red Teaming agent, Azure
+Monitor / Application Insights, and your CI/CD platform — into a single
+repeatable release loop:
+
+1. **Evaluate** the agent against datasets, rubrics, and policies — locally or
+in the cloud — using auto-selected evaluators for RAG, tool use, model
+quality, and safety.
+2. **Probe** the agent with adversarial inputs by orchestrating ASSERT
+(`agentops assert run`) and the Foundry/PyRIT Red Teaming agent
+(`agentops redteam run`) as active CI steps.
+3. **Diagnose** repo, telemetry, landing zone, and Foundry readiness with
+`agentops doctor`.
+4. **Gate** the release with a deterministic exit-code contract that PRs and
+pipelines can rely on.
+5. **Prove** the release with a stable evidence pack (`evidence.json` +
+`evidence.md`) that bundles eval results, ASSERT verdicts, red-team
+findings, telemetry readiness, and Doctor findings for promotion review.
+6. **Learn from production** by promoting reviewed traces into regression
+datasets that feed the next eval cycle.
+
+The output is a clear answer to two questions reviewers actually ask:
+**can we ship it, and where is the proof?**
+
+### Core outputs
+
+| Artifact | Produced by | Audience |
+|---|---|---|
+| `results.json` | `agentops eval run` | CI / automation |
+| `report.md` | `agentops eval run` | PR reviewers |
+| `.agentops/assert/latest.json` | `agentops assert run` | Evidence pack, CI gate |
+| `.agentops/redteam/latest.json` | `agentops redteam run` | Evidence pack, CI gate |
+| `evidence.json` / `evidence.md` | `agentops doctor --evidence-pack` | Release approver |
+| Cockpit (localhost) | `agentops cockpit` | Engineer reviewing readiness |
+
+### Exit-code contract
+
+- `0` — execution succeeded and all gates passed
+- `2` — execution succeeded but a threshold, ASSERT violation, red-team rate,
+or Doctor severity gate failed
+- `1` — runtime or configuration error

## AgentOps and Microsoft Foundry

@@ -50,26 +79,15 @@ ship/no-ship workflow.
|---|---|---|
| Build and version | Foundry portal, Foundry SDK/Toolkit, `microsoft-foundry` skill, azd | Pins the exact candidate in `agentops.yaml` and generates the PR/release gate around it |
| Evaluate and compare | Foundry Evaluations, `azd ai agent eval`, Rubric evaluator, and official CI actions/extensions | Keeps datasets and thresholds in the repo, records evidence, normalizes azd/Rubric outputs, and provides local/fallback runs for non-prompt targets |
+| Probe safety | ASSERT framework, PyRIT-backed AI Red Teaming agent | Runs both as active CI steps via `agentops assert run` and `agentops redteam run`, normalizes verdicts, and gates the pipeline |
| Observe and investigate | Foundry Monitor, Traces, Azure Monitor, App Insights | Surfaces deep links, telemetry readiness, Doctor findings, and Cockpit navigation |
| Decide release | Branch protection, environments, approvals | Packages `evidence.json` / `evidence.md` for promotion review |
-| Govern controls | ASSERT, ACS, Foundry Guardrails, Foundry red-team scans | References reviewed artifacts by path/hash/status without executing or applying the external controls |
+| Govern controls | ACS, Foundry Guardrails | References reviewed artifacts by path/hash/status without executing or applying the external controls |
| Improve from production | Production traces and Foundry datasets | Promotes reviewed trace learnings into regression candidates |

The rhythm is simple: build and operate the agent in Foundry, keep the release
contract in the repo, and let AgentOps connect the two into a clean review loop.

-Core outputs:
-
-- `results.json` (machine-readable)
-- `report.md` (human-readable)
-- `evidence.json` / `evidence.md` (from `agentops doctor --evidence-pack`)
-
-Exit code contract:
-
-- `0` execution succeeded and all thresholds passed
-- `2` execution succeeded but one or more thresholds failed
-- `1` runtime or configuration error
-
## Quickstart

### 1) Install
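Reviewer note on the exit-code contract added to README.md: the three documented codes (`0`, `2`, `1`) lend themselves to a thin CI wrapper that separates "gate failed" from "tool broke". A minimal sketch, assuming only those three codes. The helper names and outcome labels are hypothetical, and treating any undocumented code as an error is a choice made here, not something the README states.

```python
import subprocess
from typing import Sequence

# Outcome labels are illustrative; only the numeric codes come from the README.
OUTCOMES = {
    0: "pass",         # execution succeeded and all gates passed
    2: "gate-failed",  # execution succeeded but a gate failed
    1: "error",        # runtime or configuration error
}


def classify(returncode: int) -> str:
    """Map a process exit code to an outcome label; unknown codes count as errors."""
    return OUTCOMES.get(returncode, "error")


def run_gate(command: Sequence[str]) -> str:
    """Run one gate command and classify its exit code without raising on failure."""
    completed = subprocess.run(list(command), check=False)
    return classify(completed.returncode)
```

With the `agentops` CLI on the PATH, `run_gate(["agentops", "assert", "run"])` would return `"gate-failed"` when the command exits with 2, which lets a pipeline post a review comment for a failed gate but page someone for an error.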
6 changes: 4 additions & 2 deletions docs/tutorial-end-to-end.md
@@ -892,10 +892,12 @@ Use AgentOps for the repo-side follow-through:
`.agentops/governance/redteam-plan.md`; keep raw payloads/results in the
approved secure system.
3. If you use ASSERT or Agent Control Specification, add reviewed artifacts to
-the repo or CI artifacts and point AgentOps at them:
+the repo or CI artifacts and point AgentOps at them. These artifacts join the
+normal release proof alongside eval results, Doctor findings, and workflow
+runs:

```yaml
-assert_path: .assert/evaluation-policy.yaml
+assert_path: .agentops/governance/assert-evidence.md
acs_path: acs.yaml
redteam_path: .agentops/governance/redteam-plan.md
```
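Reviewer note on the tutorial snippet above: the README table in this PR says AgentOps references reviewed governance artifacts "by path/hash/status". A rough sketch of what such a reference could look like for the three paths in the snippet follows. The record fields (`path`, `sha256`, `status`) and the status values are hypothetical, since the evidence pack format is not shown in this diff.

```python
import hashlib
from pathlib import Path

# Paths copied from the tutorial snippet; the mapping itself is illustrative.
EVIDENCE_PATHS = {
    "assert_path": ".agentops/governance/assert-evidence.md",
    "acs_path": "acs.yaml",
    "redteam_path": ".agentops/governance/redteam-plan.md",
}


def describe_artifact(path: str) -> dict:
    """Return a path/hash/status record for one referenced artifact.

    The file is only read and hashed, never executed or applied.
    """
    file = Path(path)
    if not file.is_file():
        return {"path": path, "sha256": None, "status": "missing"}
    digest = hashlib.sha256(file.read_bytes()).hexdigest()
    return {"path": path, "sha256": digest, "status": "present"}


if __name__ == "__main__":
    for key, path in EVIDENCE_PATHS.items():
        print(key, describe_artifact(path))
```

Hashing the file contents rather than trusting the path means a reviewer can tell whether the artifact that was approved is the one that shipped.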