Standards for building agents, better
Updated Feb 22, 2026 - TypeScript
Agentic testing for agentic codebases
Ship agents you can audit.
The pre-flight check for AI agents
The open-source MultiAgentOps evaluation and verification harness for any industry business workflow.
Deterministic runtime for agent evaluation
GitHub template for agent-testable SaaS apps. Next.js 16 + shadcn/ui + Neon Postgres + agent-browser e2e testing via accessibility tree.
Intent-first unit testing framework for AI agents in Node.js and TypeScript.
Diagnose your AI agents in production. Extract policies from prompts, evaluate traces, generate diagnostic reports.
Qualitative benchmark suite for evaluating AI coding agents and orchestration paradigms on realistic, complex development tasks
Typed Kotlin DSL framework for AI agent systems.
Agent testing automation by simulating users and agents with a judge (langwatch-scenario)
Simulation environment for testing and validating autonomous agents
Evaluation and competition arena for testing agents, systems, or workflows in structured local-first scenarios.
The definitive benchmark for AI agents on OpenClaw. 45 tasks across 4 tiers. Powered by MyClaw.ai
Token-efficient stochastic testing for AI agents. 5-20x cost reduction. 10 framework adapters. Paper: arXiv:2603.02601
PHP testing framework for LLM agents: multi-turn dialogs, cassette replay, tool calling, LLM-as-judge assertions
Playwright for AI Agents: test what your agent DOES, not what it SAYS. YAML-first, 2900+ tests.
A Multi-Agent System for Cross-Checking Phishing URLs.
Build and manage intelligent autonomous agents using a modular, multi-language framework in Python and Rust.