From 6f779095e513e6c3fcdd83475625dcfa2e9bcfda Mon Sep 17 00:00:00 2001
From: Paulo Lacerda
Date: Tue, 9 Jun 2026 15:05:32 -0300
Subject: [PATCH 1/3] docs: rewrite README Overview in flowing human prose (#285)

Replace em-dash heavy, bullet-driven Overview with three flowing
paragraphs (no inline bold, no numbered loop list). Swap closing question
'where is the proof?' for 'how do we know?' to feel less confrontational.
Same content and six concepts (Evaluate, Probe, Diagnose, Gate, Prove,
Learn) woven into one paragraph.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 README.md | 65 +++++++++++++++++++++++++++++--------------------------
 1 file changed, 34 insertions(+), 31 deletions(-)

diff --git a/README.md b/README.md
index 495a2ff..ecef64b 100644
--- a/README.md
+++ b/README.md
@@ -3,7 +3,7 @@

Open-source framework and CLI for continuous evaluation, safety testing, and release readiness of Microsoft Foundry agents.
-Can we ship it, and where is the proof?
+Can we ship it, and how do we know?

@@ -21,34 +21,36 @@ Can we ship it, and where is the proof?

 ## Overview

-**AgentOps Accelerator is an open-source framework and CLI that standardizes
-continuous evaluation, safety testing, and release readiness for enterprise AI
-agents — with Microsoft Foundry as the agent runtime.**
+AgentOps Accelerator is an open-source framework and CLI that standardizes
+continuous evaluation, safety testing, and release readiness for enterprise
+AI agents running on Microsoft Foundry.

-It is an *orchestrator*, not a reimplementation. AgentOps wires together the
-tools you already use — Foundry Evaluations, `azd ai agent eval`, the
+It is an orchestrator, not a reimplementation. Foundry already builds and
+runs the agent. Tools like Foundry Evaluations, `azd ai agent eval`, the
 open-source ASSERT framework, the PyRIT-backed AI Red Teaming agent, Azure
-Monitor / Application Insights, and your CI/CD platform — into a single
-repeatable release loop:
-
-1. **Evaluate** the agent against datasets, rubrics, and policies — locally or
-   in the cloud — using auto-selected evaluators for RAG, tool use, model
-   quality, and safety.
-2. **Probe** the agent with adversarial inputs by orchestrating ASSERT
-   (`agentops assert run`) and the Foundry/PyRIT Red Teaming agent
-   (`agentops redteam run`) as active CI steps.
-3. **Diagnose** repo, telemetry, landing zone, and Foundry readiness with
-   `agentops doctor`.
-4. **Gate** the release with a deterministic exit-code contract that PRs and
-   pipelines can rely on.
-5. **Prove** the release with a stable evidence pack (`evidence.json` +
-   `evidence.md`) that bundles eval results, ASSERT verdicts, red-team
-   findings, telemetry readiness, and Doctor findings for promotion review.
-6. **Learn from production** by promoting reviewed traces into regression
-   datasets that feed the next eval cycle.
-
-The output is a clear answer to two questions reviewers actually ask:
-**can we ship it, and where is the proof?**
+Monitor and Application Insights, and whatever CI/CD platform your team
+prefers all exist and do their job well. What was missing was the glue that
+pulls them into one repeatable release loop. That is what AgentOps provides.
+
+The loop looks the same for every team and every agent. You evaluate the
+agent against your datasets, rubrics, and policies, either locally or in the
+cloud, with evaluators that AgentOps auto-selects based on whether the
+scenario is RAG, tool use, model quality, or safety. You probe the agent
+with adversarial inputs by running ASSERT through `agentops assert run` and
+the Foundry/PyRIT red teaming agent through `agentops redteam run`, both as
+active CI steps that gate the pipeline. You diagnose the rest of the
+picture (repo layout, telemetry wiring, landing zone, and Foundry
+configuration) with `agentops doctor`. The pipeline gates the release using
+a deterministic exit-code contract that pull requests and CI/CD workflows
+can rely on, and packages everything into a stable evidence pack
+(`evidence.json` and `evidence.md`) that bundles eval results, ASSERT
+verdicts, red-team findings, telemetry readiness, and Doctor findings for
+whoever signs off on production. Once the release ships, AgentOps closes
+the loop by promoting reviewed production traces back into regression
+datasets that feed the next eval cycle.
+
+The output is a clear answer to the two questions reviewers actually ask:
+can we ship it, and how do we know?

 ### Core outputs

@@ -63,10 +65,11 @@ The output is a clear answer to two questions reviewers actually ask:

 ### Exit-code contract

-- `0` — execution succeeded and all gates passed
-- `2` — execution succeeded but a threshold, ASSERT violation, red-team rate,
-  or Doctor severity gate failed
-- `1` — runtime or configuration error
+AgentOps commands exit with `0` when execution succeeded and every gate
+passed, with `2` when execution itself succeeded but a threshold, an ASSERT
+violation, a red-team attack-success rate, or a Doctor severity gate
+failed, and with `1` for runtime or configuration errors. Pipelines can
+rely on this contract without parsing output.

 ## AgentOps and Microsoft Foundry

From 6c7674af80d7dfad6cc7c512d6a34cefa5d927f5 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com>
Date: Tue, 9 Jun 2026 18:07:13 +0000
Subject: [PATCH 2/3] chore: prepare release 0.3.15

---
 .claude-plugin/marketplace.json | 2 +-
 .github/plugin/marketplace.json | 2 +-
 CHANGELOG.md                    | 2 ++
 plugins/agentops/package.json   | 2 +-
 plugins/agentops/plugin.json    | 2 +-
 5 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
index d503c6b..0f55e33 100644
--- a/.claude-plugin/marketplace.json
+++ b/.claude-plugin/marketplace.json
@@ -13,7 +13,7 @@
       "name": "agentops-accelerator",
       "source": "../../plugins/agentops",
       "description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Toolkit and Microsoft Foundry agents.",
-      "version": "0.3.14",
+      "version": "0.3.15",
       "keywords": [
         "agentops",
         "evaluation",
diff --git a/.github/plugin/marketplace.json b/.github/plugin/marketplace.json
index d503c6b..0f55e33 100644
--- a/.github/plugin/marketplace.json
+++ b/.github/plugin/marketplace.json
@@ -13,7 +13,7 @@
       "name": "agentops-accelerator",
       "source": "../../plugins/agentops",
       "description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Toolkit and Microsoft Foundry agents.",
-      "version": "0.3.14",
+      "version": "0.3.15",
       "keywords": [
         "agentops",
         "evaluation",
diff --git a/CHANGELOG.md b/CHANGELOG.md
index c084a28..fcfc104 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,8 @@ This format follows [Keep a Changelog](https://keepachangelog.com/) and adheres

 ## [Unreleased]

+## [0.3.15] - 2026-06-09
+
 ## [0.3.14] - 2026-06-09

 ### Added
diff --git a/plugins/agentops/package.json b/plugins/agentops/package.json
index 581612e..67739ae 100644
--- a/plugins/agentops/package.json
+++ b/plugins/agentops/package.json
@@ -2,7 +2,7 @@
   "name": "agentops-accelerator",
   "displayName": "AgentOps Accelerator — Skills for GitHub Copilot",
   "description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Accelerator and Microsoft Foundry agents.",
-  "version": "0.3.14",
+  "version": "0.3.15",
   "publisher": "AgentOpsAccelerator",
   "icon": "icon.png",
   "license": "MIT",
diff --git a/plugins/agentops/plugin.json b/plugins/agentops/plugin.json
index 00aad30..b928a5a 100644
--- a/plugins/agentops/plugin.json
+++ b/plugins/agentops/plugin.json
@@ -1,7 +1,7 @@
 {
   "name": "agentops-accelerator",
   "description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Accelerator and Microsoft Foundry agents.",
-  "version": "0.3.14",
+  "version": "0.3.15",
   "author": {
     "name": "AgentOps Accelerator",
     "url": "https://github.com/Azure/agentops"

From c8ee78e33771d55d53fb5baaf469f374cd60b4b5 Mon Sep 17 00:00:00 2001
From: Paulo Lacerda
Date: Tue, 9 Jun 2026 15:13:38 -0300
Subject: [PATCH 3/3] docs: tighten README Overview (shorter, no defensive framing)

Drop `It is an orchestrator, not a reimplementation` (defensive). State
what AgentOps does directly. Cut Overview from three paragraphs to two,
roughly 200 words.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 README.md | 37 ++++++++++++++-----------------------
 1 file changed, 14 insertions(+), 23 deletions(-)

diff --git a/README.md b/README.md
index ecef64b..ab55e7b 100644
--- a/README.md
+++ b/README.md
@@ -25,29 +25,20 @@ AgentOps Accelerator is an open-source framework and CLI that standardizes
 continuous evaluation, safety testing, and release readiness for enterprise
 AI agents running on Microsoft Foundry.

-It is an orchestrator, not a reimplementation. Foundry already builds and
-runs the agent. Tools like Foundry Evaluations, `azd ai agent eval`, the
-open-source ASSERT framework, the PyRIT-backed AI Red Teaming agent, Azure
-Monitor and Application Insights, and whatever CI/CD platform your team
-prefers all exist and do their job well. What was missing was the glue that
-pulls them into one repeatable release loop. That is what AgentOps provides.
-
-The loop looks the same for every team and every agent. You evaluate the
-agent against your datasets, rubrics, and policies, either locally or in the
-cloud, with evaluators that AgentOps auto-selects based on whether the
-scenario is RAG, tool use, model quality, or safety. You probe the agent
-with adversarial inputs by running ASSERT through `agentops assert run` and
-the Foundry/PyRIT red teaming agent through `agentops redteam run`, both as
-active CI steps that gate the pipeline. You diagnose the rest of the
-picture (repo layout, telemetry wiring, landing zone, and Foundry
-configuration) with `agentops doctor`. The pipeline gates the release using
-a deterministic exit-code contract that pull requests and CI/CD workflows
-can rely on, and packages everything into a stable evidence pack
-(`evidence.json` and `evidence.md`) that bundles eval results, ASSERT
-verdicts, red-team findings, telemetry readiness, and Doctor findings for
-whoever signs off on production. Once the release ships, AgentOps closes
-the loop by promoting reviewed production traces back into regression
-datasets that feed the next eval cycle.
+It connects Foundry Evaluations, `azd ai agent eval`, the open-source
+ASSERT framework, the PyRIT-backed AI Red Teaming agent, Azure Monitor,
+and your CI/CD platform into one repeatable release loop. You evaluate the
+agent against your datasets, rubrics, and policies with auto-selected
+evaluators for RAG, tool use, model quality, and safety. You probe it with
+adversarial inputs through `agentops assert run` and `agentops redteam
+run`. You diagnose the rest of the picture (repo layout, telemetry wiring,
+landing zone, and Foundry configuration) with `agentops doctor`. The
+pipeline gates the release using a deterministic exit-code contract, and
+packages everything into a stable evidence pack (`evidence.json` and
+`evidence.md`) that bundles eval results, ASSERT verdicts, red-team
+findings, telemetry readiness, and Doctor findings for whoever signs off
+on production. Once the release ships, reviewed production traces are
+promoted back into regression datasets that feed the next eval cycle.

 The output is a clear answer to the two questions reviewers actually ask:
 can we ship it, and how do we know?
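
Reviewer note: the exit-code contract documented in this series (`0` when every gate passed, `2` when execution succeeded but a gate failed, `1` for runtime or configuration errors) is what lets a pipeline branch without parsing output. The sketch below is one way a wrapper script might consume it, not code from the repository. The names `GateOutcome`, `classify_exit_code`, `run_step`, and `run_release_loop` are hypothetical, and treating any code other than `0` or `2` as an error is an assumption. The README text names `agentops assert run`, `agentops redteam run`, and `agentops doctor`; invoking them without arguments is also an assumption.

```python
import subprocess
from enum import Enum
from typing import Iterable, Sequence


class GateOutcome(Enum):
    """Hypothetical labels for the three documented exit codes."""
    PASSED = "passed"            # exit code 0
    GATE_FAILED = "gate_failed"  # exit code 2
    ERROR = "error"              # exit code 1


def classify_exit_code(code: int) -> GateOutcome:
    """Map a process exit code onto the documented 0 / 2 / 1 contract."""
    if code == 0:
        return GateOutcome.PASSED
    if code == 2:
        return GateOutcome.GATE_FAILED
    # Only 1 is documented as the error code. Folding every other value
    # (for example a signal-terminated process) into ERROR is an assumption.
    return GateOutcome.ERROR


def run_step(argv: Sequence[str]) -> GateOutcome:
    """Run one command and classify its exit code without reading its output."""
    try:
        completed = subprocess.run(list(argv), check=False)
    except FileNotFoundError:
        # The executable is not installed or not on PATH.
        return GateOutcome.ERROR
    return classify_exit_code(completed.returncode)


def run_release_loop(steps: Iterable[Sequence[str]]) -> GateOutcome:
    """Run steps in order and stop at the first one that does not pass."""
    for argv in steps:
        outcome = run_step(argv)
        if outcome is not GateOutcome.PASSED:
            return outcome
    return GateOutcome.PASSED


# Commands named in the README text; running them with no further
# arguments is an assumption made for this sketch.
RELEASE_STEPS = [
    ["agentops", "assert", "run"],
    ["agentops", "redteam", "run"],
    ["agentops", "doctor"],
]
```

A CI job could call `run_release_loop(RELEASE_STEPS)` and block promotion on `GateOutcome.GATE_FAILED` while retrying or alerting on `GateOutcome.ERROR`, which is the distinction that separating `2` from `1` makes possible.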