Fixed evaluation framework, fixed gen_ai_format function by atharvacoolkni · Pull Request #51 · 10xHub/Agentflow

atharvacoolkni · 2026-02-25T12:48:34Z

This pull request introduces a major refactor and expansion of the agent evaluation framework, improving modularity, clarity, and configuration flexibility. The changes reorganize configuration files, add new preset and reporter configuration options, expand criterion configuration methods, and update imports across the package for better usability and maintainability.

Core framework and configuration improvements:

Split configuration models into a dedicated agentflow/evaluation/config package, moving eval_config.py and adding a new __init__.py for clear import paths and documentation. [1] [2]
Introduced ReporterConfig for fine-grained control over evaluation report generation, including output formats, verbosity, and file details. This is now part of EvalConfig. [1] [2] [3]
Added new preset configurations and expanded criterion configuration methods, including semantic response matching (LLM-based and ROUGE), node order matching, keyword presence, factual accuracy, hallucination, and safety, with detailed docstrings for each. Default judge models now use "gemini-2.5-flash". [1] [2] [3]
Updated and clarified module-level imports and documentation in agentflow/evaluation/__init__.py and agentflow/evaluation/collectors/__init__.py, grouping imports by theme and improving example usage. [1] [2]
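To make the ReporterConfig item above concrete, here is a minimal sketch of how reporter settings might nest inside an evaluation config. It uses plain dataclasses and is not the code from this PR: every field name (`enabled`, `formats`, `verbose`, `output_dir`, `file_prefix`), the `output_paths` helper, and the shape of `EvalConfig` are assumptions chosen only to illustrate "output formats, verbosity, and file details".

```python
from dataclasses import dataclass, field


@dataclass
class ReporterConfig:
    """Hypothetical stand-in; field names are illustrative, not the real API."""

    enabled: bool = True
    formats: list[str] = field(default_factory=lambda: ["console"])
    verbose: bool = False
    output_dir: str = "eval_reports"
    file_prefix: str = "report"

    def output_paths(self) -> list[str]:
        # Console output writes no file, so it is skipped here.
        return [
            f"{self.output_dir}/{self.file_prefix}.{fmt}"
            for fmt in self.formats
            if fmt != "console"
        ]


@dataclass
class EvalConfig:
    """Hypothetical container that nests the reporter settings."""

    criteria: dict[str, dict] = field(default_factory=dict)
    reporter: ReporterConfig = field(default_factory=ReporterConfig)


config = EvalConfig(reporter=ReporterConfig(formats=["console", "json", "html"]))
print(config.reporter.output_paths())
```

Keeping reporter settings in their own object means report generation can be tuned or disabled without touching the criteria, which appears to be the motivation for making it part of EvalConfig.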

API and robustness enhancements:

Improved token usage extraction in litellm_converter.py to handle missing or None values robustly.
Updated default model in UserSimulatorConfig to "gemini-2.5-flash" for consistency with LLM-based criteria.

These changes collectively make the evaluation framework more modular, easier to configure, and more robust for both internal and external users.

Framework and configuration refactor:

Moved evaluation configuration to agentflow/evaluation/config, added ReporterConfig, and expanded preset and criterion configuration methods for greater flexibility and clarity. [1] [2] [3] [4] [5] [6] [7]
Updated imports and documentation in agentflow/evaluation/__init__.py and agentflow/evaluation/collectors/__init__.py for improved usability and thematic grouping. [1] [2]

Criterion and preset expansion:

Added new configuration methods for node order matching, semantic response matching (LLM and ROUGE), keyword presence, factual accuracy, hallucination, and safety criteria, with detailed docstrings. Default judge model is now "gemini-2.5-flash". [1] [2] [3]
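Two of the criteria named above, keyword presence and node order matching, can be illustrated with plain functions. This is a sketch under assumptions, not the scoring logic in this PR: the function names are hypothetical, keyword matching is assumed to be a case-insensitive substring check, and node order is assumed to mean "expected nodes appear in the same relative order" (the real criterion may require an exact match instead).

```python
def keyword_presence_score(response: str, keywords: list[str]) -> float:
    """Fraction of keywords found in the response (case-insensitive)."""
    if not keywords:
        return 1.0  # Nothing required, so nothing is missing.
    text = response.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords)


def node_order_matches(actual: list[str], expected: list[str]) -> bool:
    """True if `expected` is an ordered subsequence of `actual`."""
    remaining = iter(actual)
    # `node in remaining` consumes the iterator up to the first match,
    # so later expected nodes can only match later actual nodes.
    return all(node in remaining for node in expected)


score = keyword_presence_score(
    "The refund was issued to your card.", ["refund", "card", "invoice"]
)
visited = ["start", "plan", "tool", "respond"]
print(round(score, 2))
print(node_order_matches(visited, ["plan", "respond"]))
print(node_order_matches(visited, ["respond", "plan"]))
```

Both checks are deterministic and need no model call, unlike the LLM-judged criteria (semantic matching, factual accuracy, hallucination, safety) listed alongside them.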

API robustness:

Improved token usage extraction in litellm_converter.py to handle missing or None values, preventing errors in downstream processing.
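The kind of defensive extraction described above can be sketched as follows. This is not the code in litellm_converter.py: the helper names are hypothetical, and the sketch assumes the response exposes an OpenAI-style `usage` object (as a dict or as attributes) with `prompt_tokens`, `completion_tokens`, and `total_tokens`, any of which may be absent or None.

```python
from types import SimpleNamespace
from typing import Any


def _as_int(value: Any) -> int:
    """Coerce a possibly missing or malformed token count to an int."""
    try:
        return int(value) if value is not None else 0
    except (TypeError, ValueError):
        return 0


def extract_token_usage(response: Any) -> dict[str, int]:
    """Read token counts without raising on missing or None fields."""
    if isinstance(response, dict):
        usage = response.get("usage")
    else:
        usage = getattr(response, "usage", None)

    def read(name: str) -> int:
        if usage is None:
            return 0
        raw = usage.get(name) if isinstance(usage, dict) else getattr(usage, name, None)
        return _as_int(raw)

    prompt = read("prompt_tokens")
    completion = read("completion_tokens")
    # Fall back to the sum when the provider omits the total.
    total = read("total_tokens") or prompt + completion
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": total,
    }


partial = SimpleNamespace(usage=SimpleNamespace(prompt_tokens=12, completion_tokens=None))
print(extract_token_usage(partial))
print(extract_token_usage({"usage": None}))
```

Returning zeros instead of raising keeps downstream aggregation simple, at the cost of hiding the difference between "zero tokens" and "usage not reported"; callers that care about that distinction would need a separate flag.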

Model configuration:

Updated default model in UserSimulatorConfig to "gemini-2.5-flash" for consistency across LLM-based evaluation criteria.

codecov · 2026-02-26T12:25:49Z

Codecov Report

❌ Patch coverage is 40.25596% with 1027 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
agentflow/evaluation/reporters/html.py	19.37%	117 Missing and 12 partials ⚠️
agentflow/evaluation/evaluator.py	18.00%	122 Missing and 1 partial ⚠️
agentflow/evaluation/reporters/console.py	39.53%	93 Missing and 11 partials ⚠️
agentflow/evaluation/simulators/user_simulator.py	34.14%	70 Missing and 11 partials ⚠️
agentflow/evaluation/reporters/manager.py	25.23%	80 Missing ⚠️
agentflow/evaluation/criteria/llm_utils.py	16.09%	71 Missing and 2 partials ⚠️
agentflow/evaluation/reporters/json.py	25.26%	54 Missing and 17 partials ⚠️
...flow/evaluation/collectors/trajectory_collector.py	50.36%	64 Missing and 4 partials ⚠️
agentflow/evaluation/criteria/trajectory.py	29.62%	38 Missing ⚠️
agentflow/graph/agent.py	2.85%	33 Missing and 1 partial ⚠️
... and 18 more

atharvacoolkni added 2 commits February 25, 2026 17:35

fixed evalution framework , fixed gen_ai_format function

74cc91a

changed location of evaution_tests to examples and fixed precommit

6f146d9

Iamsdt approved these changes Feb 26, 2026

Iamsdt merged commit 9034de3 into 10xHub:main Feb 26, 2026
1 of 2 checks passed

Iamsdt merged 2 commits into 10xHub:main from atharvacoolkni:main
