
Testing and Evaluation #44

Merged
Iamsdt merged 8 commits into main from v3 on Jan 2, 2026

Iamsdt (Collaborator) commented on Jan 2, 2026

This pull request adds comprehensive agent evaluation capabilities and expands support for third-party LLM adapters. The most significant changes include the introduction of a new agentflow.evaluation module for agent assessment, new event and trajectory collectors, updates to the README for new LLM adapters, and improvements to the LLM adapter interface.

Agent Evaluation Framework:

  • Added the new agentflow.evaluation module, which provides agent evaluation tools including trajectory analysis, response quality assessment, LLM-as-judge criteria, and various reporters and simulators. This includes new classes like AgentEvaluator, EvalSet, TrajectoryCollector, and multiple evaluation criteria and reporters.
  • Added agentflow.evaluation.collectors with TrajectoryCollector and EventCollector for capturing execution data during agent runs. [1] [2]
  • Added agentflow.evaluation.criteria with a variety of evaluation criteria for agent assessment, such as trajectory matching, response similarity, hallucination detection, and LLM-based judging.
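The collector and trajectory-criterion ideas above can be illustrated with a small, self-contained sketch. It does not use agentflow's real classes: `ToolCallEvent`, `SimpleEventCollector`, and `trajectory_score` are hypothetical names. The three matching modes (exact, in-order, any-order) are a common way to score tool trajectories and are assumed here, not taken from this PR.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToolCallEvent:
    """One captured tool invocation (hypothetical event shape)."""
    name: str
    args: Dict[str, Any] = field(default_factory=dict)


@dataclass
class SimpleEventCollector:
    """Hypothetical collector that records tool-call events during a run."""
    events: List[ToolCallEvent] = field(default_factory=list)

    def add(self, event: ToolCallEvent) -> None:
        self.events.append(event)

    def filter(self, name: str) -> List[ToolCallEvent]:
        # Only events for one tool, in the order they were recorded.
        return [e for e in self.events if e.name == name]

    def trajectory(self) -> List[str]:
        # The trajectory is the ordered list of tool names that were called.
        return [e.name for e in self.events]


def trajectory_score(actual: List[str], expected: List[str], mode: str = "exact") -> float:
    """Return 1.0 if `actual` matches `expected` under `mode`, else 0.0.

    exact:     same tools in the same order, nothing extra
    in_order:  expected tools appear in order; extra calls in between are allowed
    any_order: every expected tool appears (with multiplicity); order is ignored
    """
    if mode == "exact":
        return float(actual == expected)
    if mode == "in_order":
        remaining = iter(actual)
        # `step in remaining` advances the iterator past the first match,
        # so this checks that `expected` is a subsequence of `actual`.
        return float(all(step in remaining for step in expected))
    if mode == "any_order":
        # Counter subtraction drops non-positive counts; an empty result
        # means every expected call is covered by the actual calls.
        return float(not (Counter(expected) - Counter(actual)))
    raise ValueError(f"unknown mode: {mode!r}")


collector = SimpleEventCollector()
collector.add(ToolCallEvent("search", {"q": "weather in Paris"}))
collector.add(ToolCallEvent("log"))
collector.add(ToolCallEvent("summarize"))

actual = collector.trajectory()  # ["search", "log", "summarize"]
print(trajectory_score(actual, ["search", "summarize"], "exact"))      # 0.0
print(trajectory_score(actual, ["search", "summarize"], "in_order"))   # 1.0
print(trajectory_score(actual, ["summarize", "search"], "any_order"))  # 1.0
```

Returning a 0/1 score per case keeps the sketch simple; averaging those scores over an eval set would give a pass rate.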

LLM Adapter Improvements:

  • Added support for Google Generative AI SDK with a new GoogleGenAIConverter in the LLM adapter interface (agentflow/adapters/llm/__init__.py). Updated documentation and exports accordingly. [1] [2]
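A converter's job, turning a google-genai response into a library's own message format, can be sketched without the SDK installed. This is an illustrative sketch, not agentflow's `GoogleGenAIConverter`: `ConvertedMessage`, `convert_response`, `convert_stream`, and `fake_chunk` are hypothetical, and `SimpleNamespace` objects stand in for real SDK responses. Only the attribute names read from the response (`candidates[0].content.parts`, `part.text`, `part.function_call`, and `usage_metadata.prompt_token_count` / `candidates_token_count` / `total_token_count`) follow the google-genai response types. Treating the last chunk's usage as the stream total is an assumption.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


@dataclass
class ConvertedMessage:
    """Hypothetical provider-neutral message (not agentflow's own Message class)."""
    text: str = ""
    tool_calls: List[Dict[str, Any]] = field(default_factory=list)
    usage: Dict[str, int] = field(default_factory=dict)


def convert_response(response: Any) -> ConvertedMessage:
    """Flatten one google-genai-style response (or one stream chunk)."""
    msg = ConvertedMessage()
    candidates = getattr(response, "candidates", None) or []
    content = candidates[0].content if candidates else None
    parts = (getattr(content, "parts", None) or []) if content is not None else []
    for part in parts:
        text = getattr(part, "text", None)
        if text:
            msg.text += text
        call = getattr(part, "function_call", None)
        if call is not None:
            msg.tool_calls.append({"name": call.name, "args": dict(call.args or {})})
    usage = getattr(response, "usage_metadata", None)
    if usage is not None:
        # Token counts may be None in real responses, so default them to 0.
        msg.usage = {
            "prompt_tokens": usage.prompt_token_count or 0,
            "completion_tokens": usage.candidates_token_count or 0,
            "total_tokens": usage.total_token_count or 0,
        }
    return msg


def convert_stream(chunks: Iterable[Any]) -> ConvertedMessage:
    """Merge streamed chunks: join text, collect tool calls, keep the last usage seen."""
    merged = ConvertedMessage()
    for chunk in chunks:
        piece = convert_response(chunk)
        merged.text += piece.text
        merged.tool_calls.extend(piece.tool_calls)
        if piece.usage:
            merged.usage = piece.usage
    return merged


def fake_chunk(text=None, call=None, usage=None):
    """Build a SimpleNamespace stand-in shaped like a google-genai response."""
    part = SimpleNamespace(text=text, function_call=call)
    candidate = SimpleNamespace(content=SimpleNamespace(parts=[part]))
    return SimpleNamespace(candidates=[candidate], usage_metadata=usage)


stream = [
    fake_chunk(text="Hello, "),
    fake_chunk(text="world"),
    fake_chunk(
        call=SimpleNamespace(name="get_weather", args={"city": "Paris"}),
        usage=SimpleNamespace(prompt_token_count=5, candidates_token_count=3, total_token_count=8),
    ),
]
result = convert_stream(stream)
print(result.text)        # Hello, world
print(result.tool_calls)  # [{'name': 'get_weather', 'args': {'city': 'Paris'}}]
```

Reusing `convert_response` for each chunk keeps the standard and streaming paths consistent; only the merge step differs.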

Documentation Updates:

  • Updated the README.md to include installation instructions for new LLM adapters (google-genai, litellm) and updated the multi-extras installation example to include these new options. [1] [2]

Iamsdt added 8 commits January 1, 2026 00:32
- Implement GoogleGenAIConverter to handle standard and streaming responses from the google-genai SDK.
- Update README.md to include installation and usage instructions for the Google GenAI adapter.
- Create documentation for the Google Generative AI adapter detailing features, installation, quick start, and examples.
- Add example script demonstrating standard and streaming response handling with Google GenAI.
- Enhance pyproject.toml to include google-genai as an optional dependency.
- Add comprehensive unit tests for GoogleGenAIConverter functionality, covering various response scenarios.
- Implemented unit tests for TrajectoryCollector, covering event collection, tool execution, and result handling.
- Added tests for EventCollector to verify event addition and filtering.
- Updated uv.lock to reflect changes in package versions and added new dependencies.
- Introduced comprehensive documentation for evaluation reporters, including ConsoleReporter, JSONReporter, JUnitXMLReporter, and HTMLReporter, detailing usage, options, and output formats.
- Added user simulation documentation explaining the use of ConversationScenario and UserSimulator for dynamic conversation testing, including examples for creating scenarios and running simulations.
- Updated the index and mkdocs configuration to include new documentation sections for evaluation and user simulation.
- Introduced a new `testing` module for unit testing agents, providing utilities for behavior validation and assertions.
- Added an `evaluation` module for performance metrics collection and analysis of agent interactions.
- Enhanced the `@tool` decorator to support additional metadata, including tags for semantic categorization and improved documentation.
- Improved the `Agent` class to support tool filtering by tags, enhancing modularity in complex systems.
- Refactored existing code for better error handling, logging, and resource management.
- Removed the deprecated `InMemoryStore` class and updated related imports.
- Added comprehensive release notes for versions 0.5.6 and 0.5.7, summarizing key enhancements, bug fixes, and migration notes.
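The `@tool` tag metadata and tag-based tool filtering described in these commits can be sketched in plain Python. The decorator signature, the `tool_tags` / `tool_description` attributes, and `filter_tools` are hypothetical illustrations, not agentflow's actual `@tool` or `Agent` API; the include/exclude semantics (keep a tool if it shares any tag with `include`, drop it if it shares any tag with `exclude`) are an assumed design.

```python
from typing import Callable, Iterable, List, Optional


def tool(*, tags: Iterable[str] = (), description: Optional[str] = None):
    """Hypothetical decorator: attach tag and description metadata to a function."""
    def wrap(fn: Callable) -> Callable:
        fn.tool_tags = frozenset(tags)
        # Fall back to the docstring when no explicit description is given.
        fn.tool_description = description or (fn.__doc__ or "").strip()
        return fn
    return wrap


@tool(tags={"search", "web"})
def web_search(query: str) -> str:
    """Search the web for a query."""
    return f"results for {query}"


@tool(tags={"math"})
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b


@tool(tags={"search", "internal"})
def search_docs(query: str) -> str:
    """Search internal documentation."""
    return f"docs for {query}"


def filter_tools(
    tools: Iterable[Callable],
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> List[Callable]:
    """Keep tools that share at least one tag with `include` (if given)
    and share no tag with `exclude`."""
    include_set = set(include or ())
    exclude_set = set(exclude or ())
    selected = []
    for fn in tools:
        tags = getattr(fn, "tool_tags", frozenset())
        if include_set and not (tags & include_set):
            continue
        if tags & exclude_set:
            continue
        selected.append(fn)
    return selected


all_tools = [web_search, add, search_docs]
print([f.__name__ for f in filter_tools(all_tools, include={"search"})])
# ['web_search', 'search_docs']
print([f.__name__ for f in filter_tools(all_tools, include={"search"}, exclude={"web"})])
# ['search_docs']
```

Storing tags as attributes on the function keeps the decorated tool directly callable, so filtering changes which tools an agent sees without wrapping or altering them.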
Iamsdt merged commit ff0918a into main on Jan 2, 2026
1 of 2 checks passed
Iamsdt deleted the v3 branch on January 2, 2026 at 15:34

Labels

None yet

Projects

None yet

1 participant