agent-testing

Here are 36 public repositories matching this topic...

langwatch / better-agents

Standards for building agents, better

python cli typescript mcp dev-tool ai-agents agent-framework llmops lllm agent-testing

Updated Feb 22, 2026
TypeScript

langwatch / scenario

Agentic testing for agentic codebases

python-library javascript-library ai-testing agent-simulations agent-testing

Updated Mar 26, 2026
TypeScript

dowhiledev / nomos

Ship agents you can audit.

agent agentic-ai flow-based-agent step-guided-agent agent-testing

Updated Nov 9, 2025
Python

inkog-io / inkog

The pre-flight check for AI agents

cli golang static-analysis owasp security-tools devsecops sast ai-security llm-security owasp-llm-top-10 agent-testing

Updated Mar 24, 2026
Go

najeed / ai-agent-eval-harness

The open-source MultiAgentOps evaluation and verification harness for any industry business workflow.

Updated Mar 29, 2026
Python

justindobbs / Tracecore

Deterministic runtime for agent evaluation

reliability-engineering specification ai-agents benchmarking-framework autogen fastapi langchain observability-platform ai-evaluation-framework agent-testing agent-benchmark deterministic-testing autoresearch

Updated Mar 25, 2026
Python

iscale-llc / agentic-react-nextjs-shadcn

GitHub template for agent-testable SaaS apps. Next.js 16 + shadcn/ui + Neon Postgres + agent-browser e2e testing via accessibility tree.

react open-source typescript accessibility nextjs a11y ai-agents saas-template shadcn-ui drizzle-orm neon-postgres agent-testing

Updated Mar 3, 2026
TypeScript

vitron-ai / themis

Intent-first unit testing framework for AI agents in Node.js and TypeScript.

nodejs testing unit-testing typescript ai test-framework developer-tools ai-agents llm ai-testing agent-testing

Updated Mar 27, 2026
JavaScript

converra / agent-triage

Diagnose your AI agents in production. Extract policies from prompts, evaluate traces, generate diagnostic reports.

Updated Mar 10, 2026
TypeScript

pyros-projects / agent-comparison

Qualitative benchmark suite for evaluating AI coding agents and orchestration paradigms on realistic, complex development tasks

orchestration ai-agents ai-benchmarks qualitative-evaluation llm-agents coding-agents agentic-workflows agent-evaluation agent-testing ai-coding-assistants agent-comparison development-tasks

Updated Nov 25, 2025
Python

Deep-CodeAI / Agents.KT

Typed Kotlin DSL framework for AI agent systems.

Updated Mar 28, 2026
Kotlin

kimtth / agent-auto-eval-azure-aoai-sk

Agent testing automation 🤖 by simulating users 👥 and agents 🤝 with judge ⚖️(langwatch-scenario)

scenario azure-openai semantic-kernel agent-testing

Updated Jul 4, 2025
Python

joshualamerton / agentic-sandbox

Simulation environment for testing and validating autonomous agents

python ai developer-tools autonomous-agents ai-agents developer-tools-test agent-testing

Updated Mar 13, 2026
Python

NYX-305Parad0xLabs / null-arena

Evaluation and competition arena for testing agents, systems, or workflows in structured local-first scenarios.

python infrastructure benchmarking arena simulation evaluation scoring experimentation agent-testing local-f

Updated Mar 19, 2026
Python

LeoYeAI / myclaw-bench

The definitive benchmark for AI agents on OpenClaw. 45 tasks across 4 tiers. Powered by MyClaw.ai

benchmark ai-agent llm-evaluation ai-benchmark agent-testing openclaw myclaw

Updated Mar 9, 2026
Python

qualixar / agentassay

Token-efficient stochastic testing for AI agents. 5-20x cost reduction. 10 framework adapters. Paper: arXiv:2603.02601

python pytest-plugin regression-testing ai-agents statistical-testing ai-testing llm-testing token-optimization agent-testing qualixar

Updated Mar 6, 2026
Python

Avead556 / probellm

PHP testing framework for LLM agents — multi-turn dialogs, cassette replay, tool calling, LLM-as-judge assertions

testing php ai phpunit openai llm elevenlabs anthropic llm-testing agent-testing

Updated Feb 20, 2026
PHP

NeuZhou / agentprobe

🔬 Playwright for AI Agents — test what your agent DOES, not what it SAYS. YAML-first, 2900+ tests.

typescript mcp devtools test-automation ci-cd contract-testing testing-framework chaos-testing ai-agents playwright ai-testing ai-quality llm-testing agent-testing ai-agent-testing

Updated Mar 29, 2026
TypeScript

vksundararajan / cross-check

A Multi-Agent System for Cross-Checking Phishing URLs.

dockerfile pytest cybersecurity adk mesop agent-development agent-evals adk-python agent-testing

Updated Dec 17, 2025
Python

Donovan443 / agentic-codebase

Build and manage intelligent autonomous agents using a modular, multi-language framework in Python and Rust.

amp mcp opencode javascript-library code-search openai vscode-extension merkle-tree agents embedding human-in-the-loop jax voyage-ai sourcebot vibe-coding agent-testing

Updated Mar 29, 2026
Rust
