feat: Add pointwise evaluation mode with pytest integration #4
Merged
- Add mode='pointwise' parameter to @evaluation_test decorator
- Enable elegant row-by-row evaluation where core logic is separated from test configuration
- Add comprehensive word_count example using pointwise mode with haikus dependency
- Update README.md with clean architecture documentation and Mermaid diagram
- Show parameterized evaluation components in visual diagram
- Include both pointwise and batch mode examples
- Add dataset adapter helper for word_count evaluation
- Deprecate old @reward_function pattern in favor of pytest-based approach

This provides a much more elegant API where users define just the core evaluation logic and everything else (models, datasets, thresholds, rollout strategies) is parameterized in the decorator, with full pytest integration for testing and CI/CD.
dphuang2
reviewed
Aug 1, 2025
Collaborator
I'm going to hijack this
added 8 commits
August 3, 2025 10:05
# Conflicts: # README.md
…cated _execute_function to streamline execution of both async and non-async functions.
…n for pointwise and batch modes, updating tests to use 'rows' instead of 'input_dataset' for consistency.
Summary
This PR introduces a new pointwise evaluation mode that provides an elegant API for writing LLM evaluation functions with full pytest integration.
Key Changes
🎯 New Pointwise Mode
mode='pointwise' parameter to @evaluation_test decorator
📊 Clean Architecture
📚 Documentation Overhaul
🧪 Working Example
Before vs After
Old Pattern (Deprecated):
New Pattern (Recommended):
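As a rough illustration of the pointwise idea, the sketch below separates the per-row scoring logic from the test configuration. Everything in it is an assumption made for illustration: the `evaluation_test` defined here is a self-contained stand-in, not the library's decorator, and the `Row` type, the `rows` and `threshold` parameter names, and the scoring rule are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Row:
    """Hypothetical dataset row: one model response to score."""
    response: str


def evaluation_test(*, rows: List[Row], mode: str = "batch", threshold: float = 0.0):
    """Stand-in decorator for illustration only (not the library's implementation).

    In 'pointwise' mode the wrapped function is called once per row;
    in 'batch' mode it is called once with the full list of rows.
    """
    if mode not in ("pointwise", "batch"):
        raise ValueError(f"unknown mode: {mode!r}")

    def decorator(fn):
        def test_wrapper() -> None:
            if mode == "pointwise":
                scores = [fn(row) for row in rows]
            else:
                scores = fn(rows)
            # Record the aggregate so callers can inspect it after the run.
            test_wrapper.mean_score = sum(scores) / len(scores)
            assert test_wrapper.mean_score >= threshold, "mean score below threshold"

        # Keep the original name so a test runner reports it sensibly.
        test_wrapper.__name__ = fn.__name__
        return test_wrapper

    return decorator


@evaluation_test(
    rows=[Row("an old silent pond"), Row("a frog jumps in")],
    mode="pointwise",
    threshold=0.5,
)
def test_word_count(row: Row) -> float:
    # Core logic only: score one row by its word count, capped at 1.0.
    return min(len(row.response.split()) / 4, 1.0)
```

Because the wrapper takes no arguments and returns nothing, a test runner such as pytest can call it like any other test function; the dataset, mode, and threshold live entirely in the decorator call.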
Testing
Migration Path
@reward_function pattern still works but is deprecated
This provides a much more elegant and maintainable approach to LLM evaluation functions.
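One common way to keep a deprecated decorator working while steering users toward its replacement is to emit a `DeprecationWarning` on each call. The sketch below shows that general technique with Python's standard `warnings` module; the `reward_function` defined here is a hypothetical stand-in, and how the project actually implements the deprecation is not shown in this PR text.

```python
import functools
import warnings


def reward_function(fn):
    """Hypothetical stand-in for the legacy decorator: still runs, but warns."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        warnings.warn(
            "@reward_function is deprecated; use @evaluation_test instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller, not this wrapper
        )
        return fn(*args, **kwargs)

    return wrapper


@reward_function
def word_count(text: str) -> int:
    # Legacy-style function: behavior is unchanged, only a warning is added.
    return len(text.split())
```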