Fixed evaluation framework, fixed gen_ai_format function #51

Merged
Iamsdt merged 2 commits into 10xHub:main from atharvacoolkni:main on Feb 26, 2026
@atharvacoolkni (Contributor)

This pull request introduces a major refactor and expansion of the agent evaluation framework, improving modularity, clarity, and configuration flexibility. The changes reorganize configuration files, add new preset and reporter configuration options, expand criterion configuration methods, and update imports across the package for better usability and maintainability.

Core framework and configuration improvements:

  • Split configuration models into a dedicated agentflow/evaluation/config package, moving eval_config.py into it and adding a new __init__.py for clear import paths and documentation.
  • Introduced ReporterConfig for fine-grained control over evaluation report generation, including output formats, verbosity, and file details. It is now part of EvalConfig.
  • Added new preset configurations and expanded criterion configuration methods, including semantic response matching (LLM-based and ROUGE), node order matching, keyword presence, factual accuracy, hallucination, and safety, with detailed docstrings for each. Default judge models now use "gemini-2.5-flash".
  • Updated and clarified module-level imports and documentation in agentflow/evaluation/__init__.py and agentflow/evaluation/collectors/__init__.py, grouping imports by theme and improving example usage.
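
To illustrate how a reporter configuration nested inside the evaluation configuration might look, here is a minimal sketch. It uses plain dataclasses as stand-ins and is not the actual agentflow code: the field names (formats, verbose, output_dir, file_prefix, judge_model) and the report_paths helper are hypothetical, chosen only to mirror the "output formats, verbosity, and file details" described above.

```python
from dataclasses import dataclass, field


@dataclass
class ReporterConfig:
    """Hypothetical stand-in; the real ReporterConfig fields may differ."""
    formats: list[str] = field(default_factory=lambda: ["console"])
    verbose: bool = False
    output_dir: str = "eval_reports"
    file_prefix: str = "report"


@dataclass
class EvalConfig:
    """Hypothetical stand-in that nests the reporter settings."""
    judge_model: str = "gemini-2.5-flash"  # default named in this PR
    reporter: ReporterConfig = field(default_factory=ReporterConfig)


def report_paths(config: EvalConfig) -> list[str]:
    """Return one output path per file-based format (console writes no file)."""
    r = config.reporter
    return [
        f"{r.output_dir}/{r.file_prefix}.{fmt}"
        for fmt in r.formats
        if fmt != "console"
    ]


cfg = EvalConfig(reporter=ReporterConfig(formats=["console", "json", "html"]))
print(report_paths(cfg))
```

Nesting the reporter settings in one object keeps report options next to the rest of the evaluation settings, so a single config value can be passed around.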

API and robustness enhancements:

  • Improved token usage extraction in litellm_converter.py to handle missing or None values robustly.
  • Updated default model in UserSimulatorConfig to "gemini-2.5-flash" for consistency with LLM-based criteria.
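
The token usage fix can be pictured with a small defensive helper. This is a sketch under assumptions, not the code in litellm_converter.py: the function name extract_token_usage is hypothetical, and it assumes the response exposes an OpenAI-style usage object with prompt_tokens, completion_tokens, and total_tokens, any of which may be absent or None.

```python
from collections.abc import Mapping
from typing import Any


def _get(obj: Any, name: str) -> Any:
    """Read a field from a dict-like or attribute-style object; None if absent."""
    if obj is None:
        return None
    if isinstance(obj, Mapping):
        return obj.get(name)
    return getattr(obj, name, None)


def _as_int(value: Any) -> int:
    """Coerce a token count to int, treating None or bad values as 0."""
    try:
        return int(value) if value is not None else 0
    except (TypeError, ValueError):
        return 0


def extract_token_usage(response: Any) -> dict[str, int]:
    """Hypothetical helper: never raises on a missing or None usage block."""
    usage = _get(response, "usage")
    prompt = _as_int(_get(usage, "prompt_tokens"))
    completion = _as_int(_get(usage, "completion_tokens"))
    total = _get(usage, "total_tokens")
    # Fall back to the sum when the provider omits total_tokens.
    total = _as_int(total) if total is not None else prompt + completion
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": total,
    }


print(extract_token_usage({"usage": {"prompt_tokens": 3, "completion_tokens": None}}))
```

Returning zeros instead of raising keeps downstream aggregation simple, at the cost of hiding cases where a provider reported no usage at all.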

These changes collectively make the evaluation framework more modular, easier to configure, and more robust for both internal and external users.


Framework and configuration refactor:

  • Moved evaluation configuration to agentflow/evaluation/config, added ReporterConfig, and expanded preset and criterion configuration methods for greater flexibility and clarity.
  • Updated imports and documentation in agentflow/evaluation/__init__.py and agentflow/evaluation/collectors/__init__.py for improved usability and thematic grouping.

Criterion and preset expansion:

  • Added new configuration methods for node order matching, semantic response matching (LLM and ROUGE), keyword presence, factual accuracy, hallucination, and safety criteria, with detailed docstrings. Default judge model is now "gemini-2.5-flash".
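
For intuition, two of the non-LLM criteria named above could be scored roughly as follows. The PR description does not say how agentflow implements them, so both function names and the scoring rules (fraction of keywords found, expected nodes appearing as an ordered subsequence) are assumptions made for illustration only.

```python
def keyword_presence_score(response: str, keywords: list[str]) -> float:
    """Fraction of keywords found in the response (case-insensitive substring match)."""
    if not keywords:
        return 1.0  # nothing required, so trivially satisfied
    text = response.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords)


def node_order_matches(actual: list[str], expected: list[str]) -> bool:
    """True if the expected nodes occur in the actual trace in the same order.

    Other nodes may appear in between; this is a subsequence check.
    """
    remaining = iter(actual)
    # `node in remaining` consumes the iterator up to the first match,
    # so later expected nodes must appear after earlier ones.
    return all(node in remaining for node in expected)


print(keyword_presence_score("The refund was issued today", ["refund", "invoice"]))
print(node_order_matches(["router", "search", "summarize"], ["router", "summarize"]))
```

A stricter variant could require an exact sequence match; the subsequence form is used here because agent graphs often visit extra nodes between the ones a test cares about.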

API robustness:

  • Improved token usage extraction in litellm_converter.py to handle missing or None values, preventing errors in downstream processing.

Model configuration:

  • Updated default model in UserSimulatorConfig to "gemini-2.5-flash" for consistency across LLM-based evaluation criteria.

@Iamsdt Iamsdt merged commit 9034de3 into 10xHub:main Feb 26, 2026
1 of 2 checks passed