Testing and Evaluation by Iamsdt · Pull Request #44 · 10xHub/Agentflow

Iamsdt · 2026-01-02T15:11:22Z

This pull request adds comprehensive agent evaluation capabilities and expands support for third-party LLM adapters. The most significant changes include the introduction of a new agentflow.evaluation module for agent assessment, new event and trajectory collectors, updates to the README for new LLM adapters, and improvements to the LLM adapter interface.

Agent Evaluation Framework:

- Added the new agentflow.evaluation module, which provides agent evaluation tools including trajectory analysis, response quality assessment, LLM-as-judge criteria, and various reporters and simulators. This includes new classes like AgentEvaluator, EvalSet, TrajectoryCollector, and multiple evaluation criteria and reporters.
- Added agentflow.evaluation.collectors with TrajectoryCollector and EventCollector for capturing execution data during agent runs.
- Added agentflow.evaluation.criteria with a variety of evaluation criteria for agent assessment, such as trajectory matching, response similarity, hallucination detection, and LLM-based judging.
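
To make the idea of trajectory matching concrete, here is a minimal, self-contained sketch. It does not use the agentflow.evaluation API; `ToolCall`, `exact_match`, and `in_order_match` are hypothetical names chosen for illustration, and the scoring rules are assumptions rather than the criteria shipped in this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One step in an agent trajectory (hypothetical shape, not the Agentflow type)."""
    name: str
    args: tuple = ()


def exact_match(expected, actual):
    """Return 1.0 only if both trajectories call the same tools in the same order."""
    same = [c.name for c in expected] == [c.name for c in actual]
    return 1.0 if same else 0.0


def in_order_match(expected, actual):
    """Greedy score: fraction of expected tool names found, in order, within `actual`."""
    if not expected:
        return 1.0
    matched = 0
    remaining = iter(c.name for c in actual)
    for call in expected:
        # `any` consumes `remaining` up to and including the first matching name,
        # so later expected calls can only match later actual calls.
        if any(name == call.name for name in remaining):
            matched += 1
    return matched / len(expected)


expected = [ToolCall("search"), ToolCall("fetch"), ToolCall("summarize")]
actual = [ToolCall("search"), ToolCall("log"), ToolCall("fetch"), ToolCall("summarize")]

strict_score = exact_match(expected, actual)      # extra "log" call breaks exact equality
lenient_score = in_order_match(expected, actual)  # extra calls are tolerated
```

A strict criterion suits regression tests where the tool sequence is fixed; the lenient variant tolerates extra intermediate calls.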

LLM Adapter Improvements:

- Added support for Google Generative AI SDK with a new GoogleGenAIConverter in the LLM adapter interface (agentflow/adapters/llm/__init__.py). Updated documentation and exports accordingly.

Documentation Updates:

- Updated the README.md to include installation instructions for new LLM adapters (google-genai, litellm) and updated the multi-extras installation example to include these new options.

- Implement GoogleGenAIConverter to handle standard and streaming responses from the google-genai SDK.
- Update README.md to include installation and usage instructions for the Google GenAI adapter.
- Create documentation for the Google Generative AI adapter detailing features, installation, quick start, and examples.
- Add example script demonstrating standard and streaming response handling with Google GenAI.
- Enhance pyproject.toml to include google-genai as an optional dependency.
- Add comprehensive unit tests for GoogleGenAIConverter functionality, covering various response scenarios.
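
The difference between standard and streaming conversion can be sketched as follows. This is not the GoogleGenAIConverter implementation: `NormalizedMessage`, `convert_response`, and `convert_stream` are hypothetical, and the only assumption made about a provider response is that it exposes a `.text` attribute, which is mimicked here with stand-in objects rather than real SDK calls.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Iterable, Iterator


@dataclass
class NormalizedMessage:
    """Provider-neutral message (hypothetical; not the Agentflow message type)."""
    content: str
    delta: bool = False  # True for partial streaming updates


def convert_response(response) -> NormalizedMessage:
    """Convert one complete response; assumes it exposes a `.text` attribute."""
    return NormalizedMessage(content=getattr(response, "text", None) or "")


def convert_stream(chunks: Iterable) -> Iterator[NormalizedMessage]:
    """Yield a delta per non-empty chunk, then one final accumulated message."""
    parts = []
    for chunk in chunks:
        text = getattr(chunk, "text", None) or ""
        if text:
            parts.append(text)
            yield NormalizedMessage(content=text, delta=True)
    yield NormalizedMessage(content="".join(parts), delta=False)


# Stand-in objects instead of real SDK responses (assumption for the sketch).
single = convert_response(SimpleNamespace(text="Hello"))
fake_chunks = [
    SimpleNamespace(text="Hel"),
    SimpleNamespace(text=None),  # chunks without text are skipped
    SimpleNamespace(text="lo"),
]
messages = list(convert_stream(fake_chunks))
final = messages[-1]
```

Emitting deltas plus one final accumulated message lets a caller render partial output while still receiving a complete message to store.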

- Implemented unit tests for TrajectoryCollector, covering event collection, tool execution, and result handling.
- Added tests for EventCollector to verify event addition and filtering.
- Updated uv.lock to reflect changes in package versions and added new dependencies.
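
As a rough illustration of what event addition and filtering might look like, the sketch below defines a tiny collector from scratch. `Event` and `SimpleEventCollector` are hypothetical stand-ins; the real EventCollector and TrajectoryCollector interfaces are not shown in this PR description and may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """A recorded execution event (hypothetical shape)."""
    kind: str                      # e.g. "tool_call", "tool_result", "message"
    payload: dict = field(default_factory=dict)


class SimpleEventCollector:
    """Stores events in arrival order and supports filtering by kind."""

    def __init__(self):
        self.events = []

    def add(self, event):
        self.events.append(event)

    def filter(self, kind):
        """Return events of the given kind, preserving arrival order."""
        return [e for e in self.events if e.kind == kind]

    def tool_names(self):
        """Derive a simple trajectory: the names of tools called, in order."""
        return [e.payload.get("name") for e in self.filter("tool_call")]


collector = SimpleEventCollector()
collector.add(Event("message", {"text": "What is the weather?"}))
collector.add(Event("tool_call", {"name": "get_weather"}))
collector.add(Event("tool_result", {"name": "get_weather", "value": "sunny"}))
collector.add(Event("tool_call", {"name": "format_reply"}))

trajectory = collector.tool_names()
```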

- Introduced comprehensive documentation for evaluation reporters including ConsoleReporter, JSONReporter, JUnitXMLReporter, and HTMLReporter, detailing usage, options, and output formats.
- Added user simulation documentation explaining the use of ConversationScenario and UserSimulator for dynamic conversation testing, including examples for creating scenarios and running simulations.
- Updated the index and mkdocs configuration to include new documentation sections for evaluation and user simulation.
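
For readers unfamiliar with the JUnit XML format that CI systems consume, the following sketch builds a minimal report with the standard library. It is not the JUnitXMLReporter from this PR; `to_junit_xml` and the `(name, passed, message)` tuple layout are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def to_junit_xml(suite_name, results):
    """Serialize results, a list of (case_name, passed, message) tuples, as JUnit-style XML."""
    failures = sum(1 for _, passed, _ in results if not passed)
    suite = ET.Element(
        "testsuite",
        name=suite_name,
        tests=str(len(results)),
        failures=str(failures),
    )
    for case_name, passed, message in results:
        case = ET.SubElement(suite, "testcase", name=case_name)
        if not passed:
            # A failed case carries a <failure> child describing what went wrong.
            failure = ET.SubElement(case, "failure", message=message)
            failure.text = message
    return ET.tostring(suite, encoding="unicode")


xml_text = to_junit_xml(
    "agent-evals",
    [
        ("greets_user", True, ""),
        ("calls_search_tool", False, "expected tool 'search' was not called"),
    ],
)
root = ET.fromstring(xml_text)  # round-trip to confirm the output is well-formed
```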

- Introduced a new `testing` module for unit testing agents, providing utilities for behavior validation and assertions.
- Added an `evaluation` module for performance metrics collection and analysis of agent interactions.
- Enhanced the `@tool` decorator to support additional metadata, including tags for semantic categorization and improved documentation.
- Improved the `Agent` class to support tool filtering by tags, enhancing modularity in complex systems.
- Refactored existing code for better error handling, logging, and resource management.
- Removed the deprecated `InMemoryStore` class and updated related imports.
- Added comprehensive release notes for versions 0.5.6 and 0.5.7, summarizing key enhancements, bug fixes, and migration notes.
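
The tag-based tool filtering described above can be illustrated with a small standalone sketch. `tagged_tool`, the `tool_tags` attribute, and `filter_tools` are hypothetical names; they only show the general pattern of attaching metadata with a decorator and selecting on it, not the actual `@tool` or `Agent` signatures.

```python
def tagged_tool(func=None, *, tags=()):
    """Decorator that attaches a frozenset of tags to a function (hypothetical)."""
    def decorate(f):
        f.tool_tags = frozenset(tags)
        return f
    # Support both bare `@tagged_tool` and parameterized `@tagged_tool(tags=...)`.
    return decorate(func) if func is not None else decorate


@tagged_tool(tags=("search", "web"))
def web_search(query):
    return f"results for {query}"


@tagged_tool(tags=("math",))
def add(a, b):
    return a + b


@tagged_tool
def echo(text):
    return text


def filter_tools(tools, required_tags):
    """Keep tools that carry at least one of the required tags."""
    wanted = set(required_tags)
    return [t for t in tools if wanted & getattr(t, "tool_tags", frozenset())]


all_tools = [web_search, add, echo]
math_tools = filter_tools(all_tools, {"math"})
web_or_math = filter_tools(all_tools, {"web", "math"})
```

Matching on "any tag" is one design choice; requiring all tags would instead use `wanted <= t.tool_tags`.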


codecov · 2026-01-02T15:33:47Z

Codecov Report

❌ Patch coverage is 65.56927% with 753 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| agentflow/evaluation/evaluator.py | 36.93% | 104 Missing and 7 partials ⚠️ |
| agentflow/evaluation/criteria/llm_judge.py | 19.35% | 100 Missing ⚠️ |
| agentflow/testing/mock_mcp.py | 29.83% | 87 Missing ⚠️ |
| agentflow/evaluation/criteria/advanced.py | 59.16% | 60 Missing and 18 partials ⚠️ |
| agentflow/evaluation/simulators/user_simulator.py | 48.03% | 60 Missing and 6 partials ⚠️ |
| agentflow/adapters/llm/google_genai_converter.py | 72.41% | 33 Missing and 15 partials ⚠️ |
| agentflow/evaluation/testing.py | 40.27% | 43 Missing ⚠️ |
| ...flow/evaluation/collectors/trajectory_collector.py | 68.75% | 24 Missing and 11 partials ⚠️ |
| agentflow/evaluation/criteria/response.py | 67.74% | 12 Missing and 18 partials ⚠️ |
| agentflow/evaluation/eval_result.py | 80.74% | 23 Missing and 3 partials ⚠️ |

... and 13 more
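
The headline figures in this report can be cross-checked with a little arithmetic. The sketch below assumes patch coverage is computed as covered lines divided by total changed lines, with the 753 uncovered lines counting both missing and partial lines; under that assumption the reported percentage implies the size of the patch.

```python
# Figures quoted in the report above.
patch_coverage = 65.56927 / 100
missing_lines = 753

# If coverage = covered / total, then missing = total * (1 - coverage).
total_lines = round(missing_lines / (1 - patch_coverage))
covered_lines = total_lines - missing_lines

# Recompute the percentage from the inferred integers as a consistency check.
recomputed = round(100 * covered_lines / total_lines, 5)

print(total_lines, covered_lines, recomputed)  # 2187 1434 65.56927
```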

Iamsdt added 8 commits January 1, 2026 00:32

Added Unit Testing Options

869ac9b

feat: Add release notes and changelog for version 0.5.7 with new testing and evaluation frameworks

a1b5730

feat: Add Google SDK support for converters in release notes and changelog

6add8c3

fix: Update logger debug message to use configurable cut ratio for content string

1bb5fc2

Iamsdt merged commit ff0918a into main Jan 2, 2026
1 of 2 checks passed

Iamsdt deleted the v3 branch January 2, 2026 15:34
