- Implement GoogleGenAIConverter to handle standard and streaming responses from the google-genai SDK.
- Update README.md to include installation and usage instructions for the Google GenAI adapter.
- Create documentation for the Google Generative AI adapter detailing features, installation, quick start, and examples.
- Add an example script demonstrating standard and streaming response handling with Google GenAI.
- Update pyproject.toml to include google-genai as an optional dependency.
- Add comprehensive unit tests for GoogleGenAIConverter functionality, covering various response scenarios.
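The converter's real interface is internal to this PR, but the pattern it describes — normalizing a complete response and an incremental stream into one shape — can be sketched as follows. All names here (`TextResponse`, `convert_standard`, `convert_stream`) are hypothetical stand-ins, not the PR's actual API:

```python
from dataclasses import dataclass, field
from typing import Iterable, List

@dataclass
class TextResponse:
    """Normalized result shared by the standard and streaming paths."""
    text: str
    chunks: List[str] = field(default_factory=list)

def convert_standard(text: str) -> TextResponse:
    # A non-streaming response arrives as one complete payload.
    return TextResponse(text=text)

def convert_stream(chunks: Iterable[str]) -> TextResponse:
    # A streaming response arrives as incremental chunks that are
    # accumulated into the same normalized shape as the standard path.
    parts = list(chunks)
    return TextResponse(text="".join(parts), chunks=parts)
```

The point of converging both paths on one dataclass is that downstream code (collectors, evaluators) never needs to know whether the adapter streamed.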
- Implemented unit tests for TrajectoryCollector, covering event collection, tool execution, and result handling.
- Added tests for EventCollector to verify event addition and filtering.
- Updated uv.lock to reflect changes in package versions and added new dependencies.
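The addition-and-filtering behavior these tests cover can be illustrated with a minimal stand-in collector (the `Event` and `EventCollector` definitions below are a sketch for illustration, not the package's real classes):

```python
from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass
class Event:
    kind: str              # e.g. "tool_call" or "model_response"
    payload: Dict[str, Any]

class EventCollector:
    """Accumulates events during an agent run and filters them by kind."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def add(self, event: Event) -> None:
        self._events.append(event)

    def filter(self, kind: str) -> List[Event]:
        # Return only the events of the requested kind, in arrival order.
        return [e for e in self._events if e.kind == kind]
```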
- Introduced comprehensive documentation for evaluation reporters, including ConsoleReporter, JSONReporter, JUnitXMLReporter, and HTMLReporter, detailing usage, options, and output formats.
- Added user simulation documentation explaining the use of ConversationScenario and UserSimulator for dynamic conversation testing, including examples for creating scenarios and running simulations.
- Updated the index and mkdocs configuration to include new documentation sections for evaluation and user simulation.
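A JSON reporter of the kind documented above typically serializes per-case results plus a summary into a machine-readable document. A minimal sketch (the `render_json_report` helper and its result shape are assumptions for illustration, not the documented API):

```python
import json
from typing import Any, Dict, List

def render_json_report(results: List[Dict[str, Any]]) -> str:
    """Serialize evaluation results into a JSON report with a summary block."""
    passed = sum(1 for r in results if r.get("passed"))
    report = {
        "summary": {
            "total": len(results),
            "passed": passed,
            "failed": len(results) - passed,
        },
        "results": results,  # one entry per evaluated case
    }
    return json.dumps(report, indent=2)
```

The same result list could feed other output formats (console table, JUnit XML, HTML) without changing how evaluations are run.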
…ing and evaluation frameworks
- Introduced a new `testing` module for unit testing agents, providing utilities for behavior validation and assertions.
- Added an `evaluation` module for performance metrics collection and analysis of agent interactions.
- Enhanced the `@tool` decorator to support additional metadata, including tags for semantic categorization and improved documentation.
- Improved the `Agent` class to support tool filtering by tags, enhancing modularity in complex systems.
- Refactored existing code for better error handling, logging, and resource management.
- Removed the deprecated `InMemoryStore` class and updated related imports.
- Added comprehensive release notes for versions 0.5.6 and 0.5.7, summarizing key enhancements, bug fixes, and migration notes.
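The tag-metadata-plus-filtering idea behind the `@tool` enhancement can be sketched in a few lines. This is a simplified illustration with hypothetical helpers (`tool`, `filter_tools`), not the package's actual decorator signature:

```python
from typing import Callable, List, Optional

def tool(*, tags: Optional[List[str]] = None) -> Callable:
    """Decorator that attaches semantic tags to a function as metadata."""
    def wrap(fn: Callable) -> Callable:
        fn.tags = tags or []  # stored on the function object itself
        return fn
    return wrap

def filter_tools(tools: List[Callable], tag: str) -> List[Callable]:
    # Keep only the tools carrying the requested tag; untagged tools
    # are treated as having no tags.
    return [t for t in tools if tag in getattr(t, "tags", [])]

@tool(tags=["math"])
def add(a: int, b: int) -> int:
    return a + b

@tool(tags=["io"])
def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()
```

An agent given the full tool list could then select, say, only `math`-tagged tools for a calculation-focused task, which is the modularity benefit the bullet above describes.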
This pull request adds comprehensive agent evaluation capabilities and expands support for third-party LLM adapters. The most significant changes include the introduction of a new `agentflow.evaluation` module for agent assessment, new event and trajectory collectors, updates to the README for new LLM adapters, and improvements to the LLM adapter interface.

Agent Evaluation Framework:
- Added the `agentflow.evaluation` module, which provides agent evaluation tools including trajectory analysis, response quality assessment, LLM-as-judge criteria, and various reporters and simulators. This includes new classes such as `AgentEvaluator`, `EvalSet`, and `TrajectoryCollector`, plus multiple evaluation criteria and reporters.
- Added `agentflow.evaluation.collectors` with `TrajectoryCollector` and `EventCollector` for capturing execution data during agent runs. [1] [2]
- Added `agentflow.evaluation.criteria` with a variety of evaluation criteria for agent assessment, such as trajectory matching, response similarity, hallucination detection, and LLM-based judging.

LLM Adapter Improvements:
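One common form of trajectory matching is checking that an expected sequence of tool calls appears, in order, within the agent's actual execution trace. The criterion classes in this PR are not shown here, so the following is only a sketch of that idea with a hypothetical helper name:

```python
from typing import List

def trajectory_matches(actual: List[str], expected: List[str]) -> bool:
    """Return True if `expected` appears as an in-order subsequence of
    `actual`. Extra steps in `actual` are allowed; reordering is not."""
    it = iter(actual)
    # Membership tests against an iterator consume it, so each expected
    # step must be found *after* the previously matched one.
    return all(step in it for step in expected)
```

Subsequence matching (rather than exact equality) is a deliberately lenient choice: it tolerates incidental extra steps while still failing when required steps are missing or out of order.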
- Added `GoogleGenAIConverter` to the LLM adapter interface (`agentflow/adapters/llm/__init__.py`), with documentation and exports updated accordingly. [1] [2]

Documentation Updates:
- Updated `README.md` to include installation instructions for the new LLM adapters (`google-genai`, `litellm`) and extended the multi-extras installation example to cover these new options. [1] [2]
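Following the optional-dependency pattern this bullet describes, installation would look roughly like the lines below. The distribution name and extras names are assumptions inferred from the module paths above; the exact extras defined in `pyproject.toml` may differ:

```shell
pip install "agentflow[google-genai]"          # Google GenAI adapter only
pip install "agentflow[litellm]"               # LiteLLM adapter only
pip install "agentflow[google-genai,litellm]"  # multiple extras at once
```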