# Calibration Evaluator

This directory contains an evaluator for measuring the calibration of LLM classifiers.
It calculates:
- **Accuracy**: Fraction of correct predictions.
- **Brier Score**: Mean squared error between the predicted probabilities and the true labels. Lower is better.
- **ECE (Expected Calibration Error)**: Average absolute difference between confidence and accuracy within confidence bins, weighted by the number of samples in each bin. Lower is better.

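
The three metrics above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in `evaluator.py`: the function name `calibration_metrics`, the multi-class form of the Brier score, the use of the top-class probability as the confidence, and the default of 10 equal-width bins are all assumptions.

```python
def calibration_metrics(probs, labels, n_bins=10):
    """Compute accuracy, Brier score, and ECE (hypothetical helper).

    probs  -- list of per-sample probability lists, one entry per class
    labels -- list of integer true-class indices
    """
    n = len(labels)
    correct, confidence, brier_sum = [], [], 0.0
    for p, y in zip(probs, labels):
        pred = max(range(len(p)), key=p.__getitem__)  # index of the top class
        correct.append(1.0 if pred == y else 0.0)
        confidence.append(p[pred])
        # Squared error against the one-hot encoding of the true label.
        brier_sum += sum((pk - (1.0 if k == y else 0.0)) ** 2
                         for k, pk in enumerate(p))

    # Equal-width bins [lo, hi); a confidence of exactly 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidence, correct):
        bins[min(int(c * n_bins), n_bins - 1)].append((c, ok))

    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            avg_acc = sum(ok for _, ok in members) / len(members)
            ece += len(members) / n * abs(avg_conf - avg_acc)

    return {"accuracy": sum(correct) / n, "brier": brier_sum / n, "ece": ece}


metrics = calibration_metrics(
    probs=[[0.95, 0.05], [0.65, 0.35], [0.15, 0.85], [0.75, 0.25]],
    labels=[0, 1, 1, 0],
)
print(metrics)
```

A perfectly calibrated classifier has an ECE of 0. Note that the bin count affects the ECE value, so scores are only comparable when computed with the same binning.
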
## Usage

1. Install dependencies:
   ```bash
   pip install datasets numpy openai python-dotenv
   ```

2. Set your Fireworks API key in `.env` or as an environment variable:
   ```bash
   export FIREWORKS_API_KEY=your_key
   ```

3. Run the evaluation script:
   ```bash
   python run_calibration.py
   ```

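
Before running the script, it can help to confirm that the key from step 2 is actually visible to Python. The check below is a sketch under assumptions: the helper name `get_fireworks_api_key` is hypothetical, and how `run_calibration.py` itself reads the key is not shown here. It uses `load_dotenv` from `python-dotenv` when that package is installed and otherwise falls back to the process environment.

```python
import os

try:
    # python-dotenv is optional here; load_dotenv() reads a .env file if present.
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass


def get_fireworks_api_key(env=os.environ):
    """Return the Fireworks API key, or raise a clear error if it is missing."""
    key = env.get("FIREWORKS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "FIREWORKS_API_KEY is not set; export it or add it to .env"
        )
    return key
```

Calling `get_fireworks_api_key()` with no arguments checks the current environment and fails early with a readable message instead of an authentication error partway through a run.
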
## Files

- `evaluator.py`: Contains the `calibration_evaluator` batch reward function.
- `run_calibration.py`: Script that loads the AG News dataset and runs the evaluation on the specified models.

## Configuration

You can modify `run_calibration.py` to:
- Change the models being evaluated (`MODELS` list).
- Change the dataset or number of samples.
- Adjust the class mapping if using a different dataset.

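
As a rough illustration of what adjusting the class mapping might involve, the sketch below pairs dataset label ids with the tokens the model is asked to answer with. Only the `MODELS` name comes from this README. `NUM_SAMPLES`, `AG_NEWS_LABELS`, `build_class_mapping`, the placeholder model ids, and the letter tokens are hypothetical, and `run_calibration.py` may structure its configuration differently. The four label names are those of the Hugging Face `ag_news` dataset.

```python
# Hypothetical configuration values; placeholders, not real model ids.
MODELS = ["<model-id-1>", "<model-id-2>"]
NUM_SAMPLES = 200  # assumed sample count, chosen only for illustration

# Label ids and names of the Hugging Face `ag_news` dataset.
AG_NEWS_LABELS = {0: "World", 1: "Sports", 2: "Business", 3: "Sci/Tech"}


def build_class_mapping(label_names, class_tokens):
    """Map each dataset label id to the answer token the model should emit."""
    if len(label_names) != len(class_tokens):
        raise ValueError("need exactly one class token per dataset label")
    return dict(zip(sorted(label_names), class_tokens))


# Assumed single-letter answer tokens, one per AG News class.
mapping = build_class_mapping(AG_NEWS_LABELS, ["A", "B", "C", "D"])
print(mapping)
```

When switching datasets, the number of labels and the number of class tokens have to stay in sync, which is why the helper rejects mismatched lengths.
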
You can modify `evaluator.py` to:
- Change the class tokens (`CLASS_TOKENS`) if the model uses different tokenization.
- Adjust `top_logprobs` if needed (note that some models limit this to 5).
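
To see why `CLASS_TOKENS` and `top_logprobs` interact, here is one common way to turn the top log-probabilities of the answer token into class probabilities. It is an assumption about the general technique, not a description of what `evaluator.py` does: the function name, the letter tokens, and the uniform fallback are all hypothetical.

```python
import math

# Hypothetical class tokens; the real CLASS_TOKENS in evaluator.py may differ.
CLASS_TOKENS = ["A", "B", "C", "D"]


def class_probs_from_top_logprobs(top_logprobs, class_tokens=CLASS_TOKENS):
    """Renormalize probability mass over the class tokens.

    top_logprobs -- dict mapping candidate tokens to log-probabilities
                    for the position where the model emits its answer.
    """
    # A class token missing from the top-k list contributes zero mass.
    masses = [math.exp(top_logprobs[t]) if t in top_logprobs else 0.0
              for t in class_tokens]
    total = sum(masses)
    if total == 0.0:
        # No class token was returned at all: fall back to a uniform guess.
        return [1.0 / len(class_tokens)] * len(class_tokens)
    return [m / total for m in masses]


example = {"A": math.log(0.6), "B": math.log(0.2), " the": math.log(0.2)}
print(class_probs_from_top_logprobs(example))
```

Under this approach, a `top_logprobs` limit smaller than the number of classes means some class tokens can never appear in the returned list, so their probability is treated as zero. A tokenizer that emits, say, a leading-space variant of a class token would likewise be missed unless the class tokens are adjusted to match.
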