ARC Whitebox Estimation Challenge — `whestbench`

Library + CLI for the ARC Whitebox Estimation Challenge. Generates random ReLU MLPs, runs FLOP-budgeted estimators against Monte Carlo ground truth, and produces score reports.

For participants

👉 Start at the whest-starterkit. That repo is the on-ramp: a working estimator.py, four worked examples, and stage-by-stage walkthroughs from "just iterate locally" to "package a submission".

For an interactive visualization of small random MLPs and estimator behavior, see the WhestBench Explorer — an in-browser companion that's optional but useful for building intuition.

This repo is the underlying engine. You don't need to clone it directly.

For library / CLI users

from whestbench import BaseEstimator, MLP, sample_mlp
import flopscope as flops
import flopscope.numpy as fnp


class MyEstimator(BaseEstimator):
    def predict(self, mlp: MLP, budget: int) -> fnp.ndarray:
        # Placeholder: one zero per (layer, neuron); the FLOP budget is unused.
        return fnp.zeros((mlp.depth, mlp.width))
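
To build intuition for what a (depth, width) array of estimates might be compared against, here is a minimal sketch in plain NumPy. It does not use the whestbench or flopscope APIs. The helper names `sample_relu_mlp` and `monte_carlo_means`, the standard-normal input distribution, the weight scaling, and the choice of mean post-ReLU activation as the target are all assumptions made for illustration. See docs/reference/ for the actual estimator contract and scoring.

```python
import numpy as np


def sample_relu_mlp(depth, width, rng):
    """Hypothetical stand-in for a random MLP: one (width, width) matrix per layer."""
    # He-style scaling (an assumption) keeps activation magnitudes roughly stable.
    scale = np.sqrt(2.0 / width)
    return [rng.normal(0.0, scale, size=(width, width)) for _ in range(depth)]


def monte_carlo_means(weights, n_samples, rng):
    """Estimate each neuron's mean post-ReLU activation by sampling random inputs."""
    width = weights[0].shape[0]
    x = rng.normal(size=(n_samples, width))  # assumed input distribution: N(0, I)
    means = []
    for w in weights:
        x = np.maximum(x @ w, 0.0)  # linear layer followed by ReLU
        means.append(x.mean(axis=0))
    return np.stack(means)  # shape: (depth, width)


rng = np.random.default_rng(0)
weights = sample_relu_mlp(depth=3, width=8, rng=rng)
truth = monte_carlo_means(weights, n_samples=10_000, rng=rng)

# An all-zeros prediction, like the placeholder estimator, scored by mean squared error.
baseline = np.zeros_like(truth)
mse = float(np.mean((baseline - truth) ** 2))
print(truth.shape, mse)
```

Under these assumptions, a sampling-based reference costs on the order of n_samples × depth × width² multiply-adds. That is the kind of cost a FLOP-budgeted estimator would try to undercut by reasoning about the weights directly.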

CLI entry point (registered as both whest and whestbench):

whest validate --estimator path/to/estimator.py
whest run --estimator path/to/estimator.py --runner local
whest doctor

See docs/reference/cli-reference.md for the full command surface.

Repository layout

src/whestbench/
├── __init__.py            ← public API surface
├── cli.py                 ← `whest`/`whestbench` entry point
├── concurrency.py         ← parallel execution helpers
├── dataset.py             ← evaluation dataset I/O
├── doctor.py              ← `whest doctor` environment checks
├── domain.py              ← MLP, SetupContext, scoring spec
├── estimators.py          ← BaseEstimator + reference impls (mean/cov/combined)
├── generation.py          ← sample_mlp
├── hardware.py            ← hardware probing
├── loader.py              ← estimator module loading
├── packaging.py           ← submission packaging
├── presentation/          ← Rich rendering helpers
├── profiler.py            ← FLOP profiler integration
├── protocol.py            ← Server runner JSON protocol
├── reporting.py           ← Rich score report + smoke panels
├── runner.py              ← local/server runner orchestration
├── scoring.py             ← evaluate_estimator, ContestSpec
├── sdk.py                 ← Python SDK surface
├── simulation.py          ← Monte Carlo ground truth via flopscope
├── subprocess_worker.py   ← isolated estimator subprocess
└── templates/             ← `whest init` template assets
docs/
├── index.md               ← Library/CLI reference index
└── reference/             ← cli-reference, estimator-contract, score-report-fields, ...

Releases

Tagged via release-please. See docs/RELEASING.md.

Underlying FLOP accounting library: AIcrowd/flopscope (replaced the deprecated whest).

License

See LICENSE.
