Add Bayes@N metric #1219

Open
mohsenhariri wants to merge 1 commit into huggingface:main from mohsenhariri:lighteval-bayes-at-n

Conversation

@mohsenhariri

Summary

This PR adds Bayes@N as a corpus-level sampling metric for repeated generative evaluations.

Bayes@N estimates model performance from repeated categorical outcomes using posterior moments, returning:

  • bayes@n: posterior mean performance estimate
  • bayes@n_sigma: posterior standard deviation, representing posterior uncertainty
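The PR does not spell out the model here; for intuition, a minimal sketch of such posterior moments for the binary case, assuming a Beta-Binomial model with a uniform Beta(1, 1) prior (the function name, signature, and prior defaults are illustrative, not the PR's actual API):

```python
import math


def bayes_at_n_binary(successes: int, n: int, prior_a: float = 1.0, prior_b: float = 1.0):
    """Posterior mean and std of the success rate under a Beta prior.

    With a Beta(prior_a, prior_b) prior and `successes` correct outcomes out
    of `n` repeated generations, the posterior is
    Beta(prior_a + successes, prior_b + n - successes).
    """
    a = prior_a + successes
    b = prior_b + (n - successes)
    mean = a / (a + b)                             # bayes@n
    var = (a * b) / ((a + b) ** 2 * (a + b + 1))   # Beta posterior variance
    return mean, math.sqrt(var)                    # (bayes@n, bayes@n_sigma)


# 7 correct out of 10 samples under the uniform prior:
mean, sigma = bayes_at_n_binary(7, 10)
```

Unlike a raw pass rate of 7/10, the posterior mean is shrunk toward the prior, and sigma shrinks as n grows, which is what makes the uncertainty estimate useful at small sample counts.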

The implementation supports binary outcomes by default and multi-category outcomes when category weights are provided. It also supports optional row-aligned prior observations.
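The multi-category case can be sketched the same way with a Dirichlet posterior, where category weights map each outcome category to a score and prior observations enter as pseudo-counts. This is an assumed model, not a copy of the PR's implementation; the closed-form variance below follows from standard Dirichlet moments:

```python
import math


def bayes_at_n_weighted(counts, weights, prior_counts=None):
    """Posterior mean and std of the expected score sum_k w_k * p_k
    under a Dirichlet posterior over category probabilities.

    counts[k]       -- observed outcomes in category k across the N samples
    weights[k]      -- score assigned to category k (e.g. 1.0 / 0.5 / 0.0)
    prior_counts[k] -- optional pseudo-counts (defaults to a flat prior of 1)
    """
    if prior_counts is None:
        prior_counts = [1.0] * len(counts)
    alpha = [c + p for c, p in zip(counts, prior_counts)]
    alpha0 = sum(alpha)
    m = [a / alpha0 for a in alpha]                  # posterior category means
    mean = sum(w * mk for w, mk in zip(weights, m))  # bayes@n
    # Var(sum_k w_k p_k) for p ~ Dirichlet(alpha):
    var = (sum(w * w * mk for w, mk in zip(weights, m)) - mean ** 2) / (alpha0 + 1)
    return mean, math.sqrt(var)


# 10 samples: 6 fully correct, 3 partially correct, 1 wrong
mean, sigma = bayes_at_n_weighted([6, 3, 1], [1.0, 0.5, 0.0])
```

With weights [1.0, 0.0] and two categories this reduces exactly to the binary Beta case, so the weighted form subsumes the default.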

Related resources:

Changes

  • Added reusable Bayes@N posterior moment computation in lighteval.metrics.bayes_at_n.
  • Added BayesAtN sample-level row collection for repeated generations.
  • Added BayesAtNCorpus corpus-level aggregators for posterior mean and posterior standard deviation.
  • Registered:
    • Metrics.bayes_at_n
    • Metrics.bayes_at_n_math
  • Updated grouped metric parameter handling for metric names such as ["bayes@n", "bayes@n_sigma"].
  • Documented Bayes@N in the metric list.
  • Added unit tests for math behavior, validation, corpus aggregation, metric registration, and sample scoring.
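One plausible shape for the corpus-level step, purely for orientation: collect per-row posterior moments at the sample level, then average the means and propagate the variances. This is a hypothetical sketch of an aggregator, not lighteval's actual BayesAtNCorpus code, and it assumes independence across rows:

```python
import math


def aggregate_bayes_at_n(row_moments):
    """Combine per-row posterior moments into corpus-level bayes@n moments.

    row_moments -- list of (mean_i, sigma_i), one tuple per evaluated example.
    Assuming rows are independent, the variance of the average of R row means
    is sum(sigma_i^2) / R^2.
    """
    r = len(row_moments)
    corpus_mean = sum(m for m, _ in row_moments) / r          # bayes@n
    corpus_sigma = math.sqrt(sum(s * s for _, s in row_moments)) / r  # bayes@n_sigma
    return corpus_mean, corpus_sigma


# three rows with their per-row posterior means and standard deviations:
moments = [(0.75, 0.12), (0.50, 0.15), (0.90, 0.08)]
mean, sigma = aggregate_bayes_at_n(moments)
```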

Testing

  • ruff check src/lighteval/metrics/bayes_at_n.py src/lighteval/metrics/metrics_corpus.py src/lighteval/metrics/metrics_sample.py src/lighteval/metrics/metrics.py src/lighteval/tasks/registry.py src/lighteval/pipeline.py tests/unit/metrics/test_bayes_at_n.py
  • pytest -q tests/unit/metrics/test_bayes_at_n.py
