
A Living Benchmark for Machine Learning on Tabular Data 💫


🚀 Leaderboard · 📂 Example Scripts · 📊 Dataset Curation · 📄 Paper

TabArena is a living benchmarking system that makes benchmarking tabular machine learning models a reliable experience. TabArena implements best practices to ensure methods are represented at their peak potential, including cross-validated ensembles, strong hyperparameter search spaces contributed by the method authors, early stopping, model refitting, parallel bagging, memory usage estimation, and more.
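Cross-validated ensembling (bagging) is one of the practices listed above. The sketch below is a rough, generic illustration of the idea and is not TabArena's or AutoGluon's implementation. It fits one model per fold on the remaining folds and records each training row's out-of-fold (validation) prediction. It then averages the fold models' test predictions. The toy learner `fit_mean` and the helper names are hypothetical stand-ins for a real model.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# A fitted "model" is any function mapping a feature value to a prediction.
Predictor = Callable[[float], float]


def fit_mean(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Hypothetical toy learner: always predicts the mean training target."""
    m = mean(ys)
    return lambda x: m


def kfold_indices(n: int, k: int) -> List[List[int]]:
    """Split range(n) into k contiguous folds (no shuffling, for clarity)."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cv_bagging(
    xs: Sequence[float],
    ys: Sequence[float],
    x_test: Sequence[float],
    k: int,
    fit: Callable[[Sequence[float], Sequence[float]], Predictor],
) -> Tuple[List[float], List[float]]:
    """Return (out-of-fold predictions, bagged test predictions)."""
    oof = [0.0] * len(xs)
    test_preds = [0.0] * len(x_test)
    for val_idx in kfold_indices(len(xs), k):
        val_set = set(val_idx)
        train_idx = [i for i in range(len(xs)) if i not in val_set]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        # Each training row is predicted only by the fold model that did not see it.
        for i in val_idx:
            oof[i] = model(xs[i])
        # Test predictions are averaged over all k fold models.
        for j, x in enumerate(x_test):
            test_preds[j] += model(x) / k
    return oof, test_preds
```

The out-of-fold predictions give an honest validation score for the whole bag without a separate holdout set. The averaged fold models act as the final ensemble.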

TabArena currently consists of:

  • 51 manually curated tabular datasets representing real-world tabular data tasks.
  • 9 to 30 evaluated splits per dataset.
  • 27+ tabular machine learning methods, including 10+ tabular foundation models.
  • More than 50 million trained models across the benchmark, with all validation and test predictions cached to enable tuning and post-hoc ensembling analysis.
  • A live TabArena leaderboard showcasing the results.
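Caching validation predictions makes post-hoc ensembling cheap, because an ensemble can be built from the stored predictions without refitting any model. A widely used technique for this is greedy ensemble selection with replacement (Caruana et al., 2004). The sketch below is a generic illustration of that technique, assuming regression with mean squared error. It is not TabArena's code, and the model names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def mse(pred: List[float], y: List[float]) -> float:
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)


def greedy_ensemble_selection(
    val_preds: Dict[str, List[float]], y_val: List[float], n_iter: int
) -> Counter:
    """Caruana-style selection with replacement; returns how often each model was picked."""
    counts: Counter = Counter()
    ensemble_sum = [0.0] * len(y_val)  # running sum of the selected models' predictions
    for step in range(1, n_iter + 1):
        best_name, best_err = None, float("inf")
        for name, preds in val_preds.items():
            # Validation error of the uniform average if `name` were added once more.
            candidate = [(s + p) / step for s, p in zip(ensemble_sum, preds)]
            err = mse(candidate, y_val)
            if err < best_err:
                best_name, best_err = name, err
        counts[best_name] += 1
        ensemble_sum = [s + p for s, p in zip(ensemble_sum, val_preds[best_name])]
    return counts
```

Dividing each count by `n_iter` gives the ensemble weights. Because models are picked with replacement, a strong model can receive a weight above 1/(number of models).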

⚡ Quickstart

Tip

The fastest way to try TabArena end-to-end:

pip install uv
git clone https://github.com/autogluon/tabarena.git && cd tabarena
uv sync --extra benchmark
uv run python examples/benchmarking/run_quickstart_tabarena.py

For other install paths (evaluation only, editable AutoGluon, using TabArena as a dependency), see Installation below.

πŸ•ΉοΈ Use Cases

We share more details on various use cases of TabArena in our examples:

Datasets

Please refer to our dataset curation repository to learn more about our datasets or to contribute data!

More Documentation

TabArena's code is currently being polished. Detailed documentation for TabArena will be available soon.

🪄 Installation

Important

Requires Python 3.11–3.13 and uv.
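If you are unsure which interpreter `uv` or your virtual environment picked up, a small standard-library check like the one below confirms that it falls in the supported range of 3.11 to 3.13. This helper is not part of TabArena. The bounds simply mirror the requirement stated above.

```python
import sys
from typing import Tuple

SUPPORTED_MIN: Tuple[int, int] = (3, 11)
SUPPORTED_MAX: Tuple[int, int] = (3, 13)


def is_supported(version: Tuple[int, int]) -> bool:
    """True if (major, minor) lies within the documented 3.11-3.13 range."""
    return SUPPORTED_MIN <= version <= SUPPORTED_MAX


if __name__ == "__main__":
    current = tuple(sys.version_info[:2])
    status = "supported" if is_supported(current) else "NOT supported"
    print(f"Python {current[0]}.{current[1]} is {status} by TabArena")
```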

Pick the install path that matches what you want to do:

📊 Evaluation only: leaderboards & metrics, no model fitting
git clone https://github.com/autogluon/tabarena.git
cd tabarena
uv sync
🚀 Benchmark: core set of models for benchmarking

Installs the core models used for standard benchmarking: tabpfn, tabicl, ebm, search_spaces, realmlp, tabdpt, tabm.

git clone https://github.com/autogluon/tabarena.git
cd tabarena
uv sync --extra benchmark
➕ Benchmark + Extended: core models plus the extended model set

The extended extra is experimental and may fail to resolve or install due to incompatible version requirements across model dependencies. Use it only if you specifically need every model in a single environment; otherwise prefer benchmark or benchmark plus one specific model.

Layers the extended model set (modernnca, xrfm, sap-rpt-oss, ...) on top of the core benchmark set.

git clone https://github.com/autogluon/tabarena.git
cd tabarena
uv sync --extra benchmark --extra extended

To install only one extended model on top of benchmark (recommended over extended when you only need a single extra model), pass its extra by name. For example, to add just xrfm:

git clone https://github.com/autogluon/tabarena.git
cd tabarena
uv sync --extra benchmark --extra xrfm
🛠️ Developer: editable AutoGluon + editable TabArena

Create a virtual environment:

uv venv --seed --python 3.12 ~/.venvs/tabarena
source ~/.venvs/tabarena/bin/activate

Install editable AutoGluon and TabArena:

git clone https://github.com/autogluon/autogluon.git
./autogluon/full_install.sh

git clone https://github.com/autogluon/tabarena.git
uv pip install --prerelease=allow -e "./tabarena/tabarena[benchmark]"

In PyCharm, mark tabarena/ and each autogluon/src/ subdirectory as Sources Root so imports resolve.

📦 Use TabArena as a dependency

Add the following to your project's dependencies:

"tabarena @ git+https://github.com/autogluon/tabarena.git#subdirectory=tabarena"

📦 TabArena Artifacts

TabArena caches predictions, results, and leaderboards as downloadable artifacts so you can reproduce or extend any analysis without re-running the benchmark.
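Because the cached results are small tables of per-method test errors, aggregate statistics can be recomputed locally. The sketch below computes two of the leaderboard statistics TabArena reports: average rank and win rate. It uses common textbook definitions, assuming lower error is better, tied errors share the average rank, ties count as half a win, and every method has a result on every dataset. These definitions are assumptions for illustration, and TabArena's leaderboard code may differ in details. The method and dataset names are toy placeholders.

```python
from itertools import combinations
from typing import Dict

# errors[dataset][method] = test error (lower is better); assumes no missing entries
Errors = Dict[str, Dict[str, float]]


def average_rank(errors: Errors) -> Dict[str, float]:
    """Mean rank per method across datasets; tied errors share the average rank."""
    totals: Dict[str, float] = {}
    for per_method in errors.values():
        ordered = sorted(per_method.values())
        for method, err in per_method.items():
            first = ordered.index(err) + 1  # 1-based position of the first equal error
            count = ordered.count(err)
            rank = first + (count - 1) / 2  # average of the tied positions
            totals[method] = totals.get(method, 0.0) + rank
    return {m: t / len(errors) for m, t in totals.items()}


def win_rate(errors: Errors) -> Dict[str, float]:
    """Fraction of pairwise (dataset, opponent) comparisons won; ties count 0.5."""
    wins: Dict[str, float] = {}
    games: Dict[str, int] = {}
    for per_method in errors.values():
        for a, b in combinations(per_method, 2):
            ea, eb = per_method[a], per_method[b]
            score_a = 1.0 if ea < eb else 0.5 if ea == eb else 0.0
            wins[a] = wins.get(a, 0.0) + score_a
            wins[b] = wins.get(b, 0.0) + 1.0 - score_a
            games[a] = games.get(a, 0) + 1
            games[b] = games.get(b, 0) + 1
    return {m: wins[m] / games[m] for m in wins}
```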

Artifact tiers, sizes, and examples

Artifacts download to ~/.cache/tabarena/ by default. Override the location with the TABARENA_CACHE environment variable.
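Your own analysis scripts can follow the same lookup rule to locate downloaded artifacts: use `TABARENA_CACHE` if it is set, otherwise `~/.cache/tabarena/`. The helper below is a hypothetical sketch of that rule, not a TabArena function. Treating an empty variable as unset is an assumption, and TabArena's own resolution logic may handle such edge cases differently.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical helper: TABARENA_CACHE if set, else ~/.cache/tabarena."""
    env = os.environ if env is None else env
    override = env.get("TABARENA_CACHE")
    if override:  # assumption: an empty value is treated as unset
        return Path(override).expanduser()
    return Path.home() / ".cache" / "tabarena"
```

To redirect downloads in a shell session, set the variable before running any script, for example `export TABARENA_CACHE=/mnt/bigdisk/tabarena`.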

Raw data is ~100 GB per method type. Point TABARENA_CACHE at a large disk before downloading it.
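Because each method type's raw data is roughly 100 GB, it can be worth checking free space at the target location before downloading. The sketch below relies only on `shutil.disk_usage` from the standard library. The 100 GB constant is the approximate figure quoted above, and the function names are hypothetical.

```python
import shutil
from pathlib import Path

RAW_GB_PER_METHOD = 100  # approximate raw-data size per method type, per the README
BYTES_PER_GB = 10**9


def required_bytes(n_methods: int, gb_per_method: float = RAW_GB_PER_METHOD) -> int:
    """Rough space estimate for downloading raw data for n_methods method types."""
    return int(n_methods * gb_per_method * BYTES_PER_GB)


def has_space_for(path: Path, n_methods: int) -> bool:
    """Check free space on the filesystem that holds `path` (or its nearest existing parent)."""
    probe = path
    while not probe.exists():  # disk_usage needs a path that exists
        probe = probe.parent
    return shutil.disk_usage(probe).free >= required_bytes(n_methods)
```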

| Tier | Contents | Size / method | Example |
|---|---|---|---|
| Raw data | Per-child test predictions, full metadata, system info | ~100 GB | inspect_raw_data.py |
| Processed data | Minimal data for HPO simulation, portfolios, leaderboards | ~10 GB | inspect_processed_data.py |
| Results | Per-config / HPO DataFrames (test error, val error, train time, inference time) | <1 MB | run_generate_main_leaderboard.py |
| Leaderboards | Aggregated Elo, win rate, average rank, improvability | <1 MB | n/a |
| Figures & Plots | Generated from results and leaderboards | n/a | n/a |
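Cached per-config results make HPO simulation possible. Every configuration's validation and test error is already stored, so a tuning run can be replayed by lookup instead of by training. The minimal sketch below simulates random search this way: it picks the config with the lowest validation error from a random subset and reports that config's test error. The record layout and config names are hypothetical and do not reflect TabArena's artifact schema.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class ConfigResult:
    name: str          # hypothetical config identifier
    val_error: float   # validation error, used for selection
    test_error: float  # test error, only reported and never used for selection


def simulate_random_search(
    results: List[ConfigResult], n_trials: int, seed: int = 0
) -> ConfigResult:
    """Replay random search from cached results: sample configs, keep the best by val error."""
    rng = random.Random(seed)
    sampled = rng.sample(results, k=min(n_trials, len(results)))
    return min(sampled, key=lambda r: r.val_error)
```

Averaging the selected config's `test_error` over many seeds estimates the expected tuned performance for a given trial budget. Selection uses only validation error, so the test error is never used to choose a config.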

📄 Citation

Tip

If you use TabArena in a scientific publication, please cite our paper.

TabArena: A Living Benchmark for Machine Learning on Tabular Data
Nick Erickson, Lennart Purucker, Andrej Tschalzev, David Holzmüller, Prateek Mutalik Desai, David Salinas, Frank Hutter
NeurIPS 2025, Datasets and Benchmarks Track

📄 arXiv · 🎤 NeurIPS poster & video

BibTeX

The entry uses year=2026 because NeurIPS'25 proceedings are published in 2026.

@article{erickson2026tabarena,
  title   = {TabArena: A Living Benchmark for Machine Learning on Tabular Data},
  author  = {Erickson, Nick and Purucker, Lennart and Tschalzev, Andrej and Holzm{\"u}ller, David and Desai, Prateek and Salinas, David and Hutter, Frank},
  journal = {Advances in Neural Information Processing Systems},
  volume  = {38},
  year    = {2026}
}

Relation to TabRepo

TabArena was built upon and now replaces TabRepo. To see details about TabRepo, the portfolio simulation repository, refer to tabrepo.md.
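One common way to build a configuration portfolio is greedy forward selection. It repeatedly adds the config that most lowers the average error, assuming each task then uses the best config in the portfolio. The sketch below illustrates this generic idea and is not the TabRepo or TabArena implementation. It simplifies by selecting directly on one error table with no validation/test split, and the task and config names are hypothetical.

```python
from typing import Dict, List

# errors[task][config] = error of `config` on `task` (lower is better)
ErrorTable = Dict[str, Dict[str, float]]


def portfolio_error(errors: ErrorTable, portfolio: List[str]) -> float:
    """Average over tasks of the best error achieved by any config in the portfolio."""
    return sum(min(per_cfg[c] for c in portfolio) for per_cfg in errors.values()) / len(errors)


def greedy_portfolio(errors: ErrorTable, size: int) -> List[str]:
    """Hypothetical helper: greedy forward selection of up to `size` configs."""
    configs = list(next(iter(errors.values())))  # assumes every task has the same configs
    portfolio: List[str] = []
    for _ in range(min(size, len(configs))):
        remaining = [c for c in configs if c not in portfolio]
        # Add the config whose inclusion yields the lowest average best-in-portfolio error.
        best = min(remaining, key=lambda c: portfolio_error(errors, portfolio + [c]))
        portfolio.append(best)
    return portfolio
```

Greedy selection favors complementary configs. After the best single config is chosen, the next pick is the one that covers the tasks where the portfolio is still weakest.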
