Commit 0e452bb

Add IFEval reward function for instruction-following evaluation
- 112 total constraints (54 IFEval/IFTrain + 58 IFBench OOD)
- Self-contained module with no external repo dependencies
- Partial credit scoring (fraction of constraints satisfied)
- Automatic <think> tag stripping for reasoning models
1 parent 8e2686b commit 0e452bb

9 files changed, +8735 -0 lines changed

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+# IFEval Reward Function
+
+Evaluates how well model responses follow instruction constraints. Returns a partial credit score (0.0 to 1.0).
+
+## Quick Start
+
+```python
+import sys
+sys.path.insert(0, '/path/to/eval_protocol/rewards/ifeval')
+from reward import ifeval_partial_credit_reward
+
+response = "Hello world! This is my response."
+ground_truth = {
+    "instruction_id": ["keywords:existence"],
+    "kwargs": [{"keywords": ["hello", "world"]}]
+}
+
+score = ifeval_partial_credit_reward(response, ground_truth)
+# Score: 1.0 (all constraints satisfied)
+```
+
22+
## Dependencies
23+
24+
```bash
25+
pip install spacy nltk langdetect emoji syllapy immutabledict
26+
python -m spacy download en_core_web_sm
27+
```
28+
+## Notes
+
+- Automatically strips `<think>...</think>` tags before evaluation
+- Ground truth can be a dict, list, or JSON string
+- 112 total constraints (54 IFEval/IFTrain + 58 IFBench OOD)
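
The README describes three behaviours: `<think>` stripping, flexible ground-truth input, and partial credit as the fraction of constraints satisfied. The sketch below illustrates how they could fit together. It is not the code from this commit: `CHECKERS`, `strip_think`, `normalize_ground_truth`, `partial_credit`, and the `demo:min_words` ID are hypothetical names, the `keywords:existence` check is a simplified stand-in, and treating a list ground truth as a one-element wrapper around a dict is an assumption the README does not confirm.

```python
import json
import re

# Simplified stand-in checkers (hypothetical). The real module ships 112 constraints.
def _keywords_existence(response, keywords):
    lowered = response.lower()
    return all(keyword.lower() in lowered for keyword in keywords)

def _min_words(response, num_words):
    return len(response.split()) >= num_words

CHECKERS = {
    "keywords:existence": _keywords_existence,
    "demo:min_words": _min_words,  # made-up ID, for illustration only
}

THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_think(response):
    """Remove <think>...</think> spans so only the final answer is scored."""
    return THINK_RE.sub("", response).strip()

def normalize_ground_truth(ground_truth):
    """Accept a dict, a list, or a JSON string and return a dict."""
    if isinstance(ground_truth, str):
        ground_truth = json.loads(ground_truth)
    if isinstance(ground_truth, list):
        # Assumption: a list wraps a single ground-truth dict.
        ground_truth = ground_truth[0]
    return ground_truth

def partial_credit(response, ground_truth):
    """Return the fraction of listed constraints that the response satisfies."""
    gt = normalize_ground_truth(ground_truth)
    text = strip_think(response)
    ids = gt["instruction_id"]
    if not ids:
        return 0.0
    passed = sum(
        1
        for instruction_id, kwargs in zip(ids, gt["kwargs"])
        if CHECKERS[instruction_id](text, **(kwargs or {}))
    )
    return passed / len(ids)
```

With two constraints where only the word-count check passes, this sketch returns 0.5. Keywords that appear only inside the `<think>` span do not count, because the span is removed before any checker runs.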
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+"""IFEval reward function for evaluating instruction-following capabilities.
+
+Usage:
+# Option 1: Import spacy first to avoid cupy conflicts in some Docker environments
+import spacy
+from eval_protocol.rewards.ifeval import ifeval_partial_credit_reward
+
+# Option 2: Direct import (add ifeval dir to path)
+import sys
+sys.path.insert(0, '/path/to/eval_protocol/rewards/ifeval')
+from reward import ifeval_partial_credit_reward
+
+score = ifeval_partial_credit_reward(response, ground_truth)
+"""
+
+from .reward import ifeval_partial_credit_reward
+
+__all__ = ["ifeval_partial_credit_reward"]
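
The two import options in the docstring could be combined into one fallback helper, roughly as sketched below. `load_ifeval_reward` and its `ifeval_dir` parameter are hypothetical and not part of this commit. Only the module paths and the function name come from the docstring.

```python
import importlib
import sys


def load_ifeval_reward(ifeval_dir=None):
    """Return ifeval_partial_credit_reward via the package, else via ifeval_dir."""
    try:
        # Option 1 from the docstring: import spacy before the reward module.
        import spacy  # noqa: F401
    except ImportError:
        pass  # spacy is not installed; continue and let the imports below decide

    try:
        module = importlib.import_module("eval_protocol.rewards.ifeval")
    except ImportError:
        if ifeval_dir is None:
            raise
        # Option 2: put the ifeval directory on sys.path and import reward.py directly.
        sys.path.insert(0, ifeval_dir)
        module = importlib.import_module("reward")
    return module.ifeval_partial_credit_reward
```

A caller would pass the directory that holds `reward.py` only when the package import is unavailable. If no directory is given, the original `ImportError` propagates.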
