Add Bayes@N metric by mohsenhariri · Pull Request #1219 · huggingface/lighteval

mohsenhariri · 2026-04-29T00:05:57Z

Summary

This PR adds Bayes@N as a corpus-level sampling metric for repeated generative evaluations.

Bayes@N estimates model performance from repeated categorical outcomes using posterior moments, returning:

bayes@n: posterior mean performance estimate
bayes@n_sigma: posterior standard deviation, representing posterior uncertainty

The implementation supports binary outcomes by default and multi-category outcomes when category weights are provided. It also supports optional row-aligned prior observations.
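
To make the two returned values concrete, here is a minimal sketch of how posterior moments can be computed from repeated categorical outcomes. It is not the code added in this PR: it assumes a uniform Dirichlet prior (one pseudo-count per category) for each row, and the function name `bayes_at_n_sketch` and its parameters are hypothetical. The linked scorio source is the reference for the actual formula.

```python
import math


def bayes_at_n_sketch(rows, weights=(0.0, 1.0), prior_rows=None):
    """Posterior mean and standard deviation of the weighted score.

    rows:       M lists, each holding N category indices in range(len(weights)).
    weights:    score assigned to each category; (0.0, 1.0) is the binary case.
    prior_rows: optional M lists of earlier outcomes, aligned row by row.
    """
    mean_sum = 0.0
    var_sum = 0.0
    for i, row in enumerate(rows):
        # ASSUMPTION: uniform Dirichlet prior, i.e. one pseudo-count per category.
        counts = [1.0] * len(weights)
        observed = list(row) + (list(prior_rows[i]) if prior_rows else [])
        for category in observed:
            counts[category] += 1.0
        total = sum(counts)
        # First and second moments of the weighted score under the posterior mean.
        first = sum(w * c for w, c in zip(weights, counts)) / total
        second = sum(w * w * c for w, c in zip(weights, counts)) / total
        mean_sum += first
        # Variance of a linear function of a Dirichlet-distributed vector.
        var_sum += (second - first ** 2) / (total + 1.0)
    num_rows = len(rows)
    # Rows are treated as independent, so their variances add.
    return mean_sum / num_rows, math.sqrt(var_sum) / num_rows


# Binary case: two prompts, four generations each (1 = correct, 0 = incorrect).
mu, sigma = bayes_at_n_sketch([[1, 1, 0, 1], [0, 0, 0, 1]])
```

For multi-category outcomes, the same sketch takes one weight per category, for example `weights=(0.0, 0.5, 1.0)` for wrong, partially correct, and correct.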

Related resources:

Documentation: https://mohsenhariri.github.io/scorio/
GitHub repository: https://github.com/mohsenhariri/scorio
Source implementation: https://github.com/mohsenhariri/scorio/blob/main/scorio/eval/bayes.py
Paper: Don’t Pass@k: A Bayesian Framework for Large Language Model Evaluation, ICLR 2026
Blog post: https://mohsenhariri.github.io/papers/2025-10-21-bayes.html

Changes

- Added reusable Bayes@N posterior moment computation in lighteval.metrics.bayes_at_n.
- Added BayesAtN sample-level row collection for repeated generations.
- Added BayesAtNCorpus corpus-level aggregators for posterior mean and posterior standard deviation.
- Registered:
  - Metrics.bayes_at_n
  - Metrics.bayes_at_n_math
- Updated grouped metric parameter handling for metric names such as ["bayes@n", "bayes@n_sigma"].
- Documented Bayes@N in the metric list.
- Added unit tests for math behavior, validation, corpus aggregation, metric registration, and sample scoring.
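
The split between sample-level row collection and corpus-level aggregation can be sketched as below. This only illustrates the two-stage shape, not lighteval's API: `collect_row`, `corpus_mean`, and `corpus_sigma` are hypothetical names, scoring is a plain exact match, and the binary case with an assumed Beta(1, 1) prior stands in for the full computation. It also shows why the standard deviation has to be computed at corpus level: it depends on the per-row variances summed over all rows, not on an average of per-row scores.

```python
import math


def collect_row(generations, gold):
    """Sample-level step (hypothetical): one 0/1 outcome per generation."""
    return [int(text.strip() == gold.strip()) for text in generations]


def _item_moments(row):
    # Binary case with an ASSUMED Beta(1, 1) prior on the success rate.
    n = len(row)
    p = (sum(row) + 1) / (n + 2)
    return p, p * (1 - p) / (n + 3)


def corpus_mean(rows):
    """Corpus-level step (hypothetical): posterior mean over all rows."""
    return sum(_item_moments(r)[0] for r in rows) / len(rows)


def corpus_sigma(rows):
    """Corpus-level step (hypothetical): posterior standard deviation."""
    return math.sqrt(sum(_item_moments(r)[1] for r in rows)) / len(rows)


# Each prompt contributes one row; both aggregators see the full set of rows.
rows = [
    collect_row(["4", "4", "5", "4"], gold="4"),
    collect_row(["7", "8", "9", "2"], gold="2"),
]
scores = {"bayes@n": corpus_mean(rows), "bayes@n_sigma": corpus_sigma(rows)}
```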

Testing

ruff check src/lighteval/metrics/bayes_at_n.py src/lighteval/metrics/metrics_corpus.py src/lighteval/metrics/metrics_sample.py src/lighteval/metrics/metrics.py src/lighteval/tasks/registry.py src/lighteval/pipeline.py tests/unit/metrics/test_bayes_at_n.py
pytest -q tests/unit/metrics/test_bayes_at_n.py

Add Bayes@N metric

a65e81c

mohsenhariri wants to merge 1 commit into huggingface:main from mohsenhariri:lighteval-bayes-at-n
