A Living Benchmark for Machine Learning on Tabular Data 💫

🚀 Leaderboard · 📂 Example Scripts · 📊 Dataset Curation · 📄 Paper

TabArena is a living benchmarking system that makes benchmarking tabular machine learning models a reliable experience. TabArena implements best practices to ensure methods are represented at their peak potential, including cross-validated ensembles, strong hyperparameter search spaces contributed by the method authors, early stopping, model refitting, parallel bagging, memory usage estimation, and more.

TabArena currently consists of:

51 manually curated tabular datasets representing real-world tabular data tasks.
9 to 30 evaluated splits per dataset.
27+ tabular machine learning methods, including 10+ tabular foundation models.
More than 50 million trained models across the benchmark, with all validation and test predictions cached to enable tuning and post-hoc ensembling analysis.
A live TabArena leaderboard showcasing the results.
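
As an illustration of why cached validation predictions matter, the sketch below runs a generic greedy ensemble selection (adding models with replacement, one per round) on toy validation predictions. It is a minimal sketch and not TabArena's own implementation: the function names, the toy models `high` and `low`, and the squared-error loss are all assumptions made for this example.

```python
def mse(pred, target):
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def greedy_ensemble_weights(val_preds, y_val, n_rounds=10):
    """Greedy ensemble selection with replacement (hypothetical helper).

    val_preds: {model_name: [cached prediction per validation row]}
    Returns {model_name: weight}; the weights sum to 1.
    """
    counts = {name: 0 for name in val_preds}
    running_sum = [0.0] * len(y_val)
    for step in range(1, n_rounds + 1):
        best_name, best_loss = None, float("inf")
        for name, preds in val_preds.items():
            # Validation loss of the ensemble average if this model were added.
            candidate = [(s + p) / step for s, p in zip(running_sum, preds)]
            loss = mse(candidate, y_val)
            if loss < best_loss:
                best_name, best_loss = name, loss
        counts[best_name] += 1
        running_sum = [s + p for s, p in zip(running_sum, val_preds[best_name])]
    return {name: c / n_rounds for name, c in counts.items()}


# Toy data (invented): one model over-predicts by 1, the other under-predicts by 1.
y_val = [1.0, 2.0, 3.0]
val_preds = {"high": [2.0, 3.0, 4.0], "low": [0.0, 1.0, 2.0]}
print(greedy_ensemble_weights(val_preds, y_val, n_rounds=4))
# → {'high': 0.5, 'low': 0.5}
```

Because only cached predictions are needed, the selection never retrains a model; the resulting weights can then be applied to the cached test predictions of the same models.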

⚡ Quickstart

Tip

The fastest way to try TabArena end-to-end:

pip install uv
git clone https://github.com/autogluon/tabarena.git && cd tabarena
uv sync --extra benchmark
uv run python examples/benchmarking/run_quickstart_tabarena.py

For other install paths (eval-only, editable AutoGluon, dependency), see Installation below.

🕹️ Use Cases

We share more details on various use cases of TabArena in our examples:

📊 Benchmarking Predictive Machine Learning Models: please refer to examples/benchmarking.
🚀 Using SOTA Tabular Models Benchmarked by TabArena: please refer to examples/running_tabarena_models.
🗃️ Analysing Metadata and Meta-Learning: please refer to examples/meta.
📈 Generating Plots and Leaderboards: please refer to examples/plots_and_leaderboards.
🔁 Reproducibility: we share instructions for reproducibility in examples.

Datasets

Please refer to our dataset curation repository to learn more about the datasets or to contribute data!

🪄 Installation

Important

Requires Python 3.11–3.13 and uv.
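
To check both requirements before installing, a small pre-flight script along these lines may help. It is a sketch using only the standard library; the helper names are hypothetical and not part of TabArena, and the version bounds are copied from the note above.

```python
import shutil
import sys

# Supported range taken from the requirement above (Python 3.11 to 3.13, inclusive).
MIN_PYTHON = (3, 11)
MAX_PYTHON = (3, 13)


def python_version_supported(version=None):
    """Return True if (major, minor) lies inside the supported range."""
    major, minor = (version or sys.version_info)[:2]
    return MIN_PYTHON <= (major, minor) <= MAX_PYTHON


def uv_available():
    """Return True if a `uv` executable is found on PATH."""
    return shutil.which("uv") is not None


if __name__ == "__main__":
    print("Python version supported:", python_version_supported())
    print("uv found on PATH:", uv_available())
```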

Pick the install path that matches what you want to do:

📊 Evaluation only — leaderboards & metrics, no model fitting

git clone https://github.com/autogluon/tabarena.git
cd tabarena
uv sync

🚀 Benchmark — core set of models for benchmarking

Installs the core models used for standard benchmarking: tabpfn, tabicl, ebm, search_spaces, realmlp, tabdpt, tabm.

git clone https://github.com/autogluon/tabarena.git
cd tabarena
uv sync --extra benchmark

➕ Benchmark + Extended — core models plus the extended model set

The extended extra is experimental and may fail to resolve or install due to incompatible version requirements across model dependencies. Use it only if you specifically need every model in a single environment; otherwise prefer benchmark or benchmark plus one specific model.

Layers the extended model set (modernnca, xrfm, sap-rpt-oss, ...) on top of the core benchmark set.

git clone https://github.com/autogluon/tabarena.git
cd tabarena
uv sync --extra benchmark --extra extended

To install only one extended model on top of benchmark (recommended over extended when you only need a single extra model), pass its extra by name — for example, just xrfm:

git clone https://github.com/autogluon/tabarena.git
cd tabarena
uv sync --extra benchmark --extra xrfm

🛠️ Developer — editable AutoGluon + editable TabArena

Create a virtual environment:

uv venv --seed --python 3.12 ~/.venvs/tabarena
source ~/.venvs/tabarena/bin/activate

Install editable AutoGluon and TabArena:

git clone https://github.com/autogluon/autogluon.git
./autogluon/full_install.sh

git clone https://github.com/autogluon/tabarena.git
uv pip install --prerelease=allow -e "./tabarena/tabarena[benchmark]"

In PyCharm, mark tabarena/ and each autogluon/src/ subdirectory as Sources Root so imports resolve.

📦 Use TabArena as a dependency

Add the following to your project's dependencies:

"tabarena @ git+https://github.com/autogluon/tabarena.git#subdirectory=tabarena"

📦 TabArena Artifacts

TabArena caches predictions, results, and leaderboards as downloadable artifacts so you can reproduce or extend any analysis without re-running the benchmark.

Artifact tiers, sizes, and examples

Artifacts download to ~/.cache/tabarena/ by default. Override the location with the TABARENA_CACHE environment variable.

Raw data is ~100 GB per method type. Point TABARENA_CACHE at a large disk before downloading it.
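
Before starting a large download, it can be worth confirming where artifacts will land and whether the disk has room. The sketch below mirrors the documented override rule with the standard library only; `resolve_cache_dir`, `free_gb`, and `has_room_for` are hypothetical helpers, and TabArena's internal path resolution may differ in detail.

```python
import os
import shutil
from pathlib import Path

# Default location and variable name are taken from the text above.
DEFAULT_CACHE = Path.home() / ".cache" / "tabarena"


def resolve_cache_dir(environ=None):
    """Return the TABARENA_CACHE override if set, else the default location."""
    environ = os.environ if environ is None else environ
    override = environ.get("TABARENA_CACHE")
    return Path(override).expanduser() if override else DEFAULT_CACHE


def free_gb(path):
    """Free space (GB) on the filesystem holding `path` or its nearest existing parent."""
    probe = Path(path).resolve()
    while not probe.exists():
        probe = probe.parent
    return shutil.disk_usage(probe).free / 1e9


def has_room_for(path, required_gb):
    """Return True if at least `required_gb` GB are free at `path`."""
    return free_gb(path) >= required_gb


if __name__ == "__main__":
    cache_dir = resolve_cache_dir()
    print(f"Artifacts would be stored in: {cache_dir}")
    # ~100 GB is the raw-data size per method type stated above.
    print("Room for one method's raw data:", has_room_for(cache_dir, 100))
```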

Tier	Contents	Size / method	Example
Raw data	Per-child test predictions, full metadata, system info	~100 GB	`inspect_raw_data.py`
Processed data	Minimal data for HPO simulation, portfolios, leaderboards	~10 GB	`inspect_processed_data.py`
Results	Per-config / HPO DataFrames (test error, val error, train time, inference time)	<1 MB	`run_generate_main_leaderboard.py`
Leaderboards	Aggregated ELO, win-rate, average rank, improvability	<1 MB	—
Figures & Plots	Generated from results and leaderboards	—	—
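
To make the leaderboard aggregates more concrete, the sketch below computes average rank and win-rate from a small per-dataset error table. It rests on stated assumptions: lower error is better, tied methods share the mean of their ranks, and a tie counts as half a win. The method names and error values are invented, and TabArena's exact definitions (including ELO and improvability, which are not covered here) may differ.

```python
def average_ranks(errors):
    """errors: {dataset: {method: error}}, lower is better.

    Returns {method: mean rank across datasets}; ties share the mean rank.
    """
    ranks = {}
    for per_method in errors.values():
        for method, err in per_method.items():
            better = sum(e < err for e in per_method.values())
            tied = sum(e == err for e in per_method.values())  # includes the method itself
            ranks.setdefault(method, []).append(better + (tied + 1) / 2)
    return {m: sum(r) / len(r) for m, r in ranks.items()}


def win_rates(errors):
    """Fraction of pairwise comparisons won against every other method (ties count half)."""
    wins, games = {}, {}
    for per_method in errors.values():
        for a, err_a in per_method.items():
            for b, err_b in per_method.items():
                if a == b:
                    continue
                score = 1.0 if err_a < err_b else 0.5 if err_a == err_b else 0.0
                wins[a] = wins.get(a, 0.0) + score
                games[a] = games.get(a, 0) + 1
    return {m: wins[m] / games[m] for m in wins}


# Invented toy results: test error per method on two datasets.
errors = {
    "dataset_1": {"model_a": 0.10, "model_b": 0.20, "model_c": 0.30},
    "dataset_2": {"model_a": 0.25, "model_b": 0.15, "model_c": 0.25},
}
print(average_ranks(errors))  # → {'model_a': 1.75, 'model_b': 1.5, 'model_c': 2.75}
print(win_rates(errors))      # → {'model_a': 0.625, 'model_b': 0.75, 'model_c': 0.125}
```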

📄 Citation

Tip

If you use TabArena in a scientific publication, please cite our paper.

TabArena: A Living Benchmark for Machine Learning on Tabular Data
Nick Erickson, Lennart Purucker, Andrej Tschalzev, David Holzmüller, Prateek Mutalik Desai, David Salinas, Frank Hutter
NeurIPS 2025, Datasets and Benchmarks Track

📄 arXiv · 🎤 NeurIPS poster & video

BibTeX

The entry uses year=2026 because NeurIPS'25 proceedings are published in 2026.

@article{erickson2026tabarena,
  title   = {TabArena: A Living Benchmark for Machine Learning on Tabular Data},
  author  = {Erickson, Nick and Purucker, Lennart and Tschalzev, Andrej and Holzm{\"u}ller, David and Desai, Prateek and Salinas, David and Hutter, Frank},
  journal = {Advances in Neural Information Processing Systems},
  volume  = {38},
  year    = {2026}
}

Relation to TabRepo

TabArena was built on top of TabRepo and now replaces it. For details about TabRepo, the portfolio simulation repository, refer to tabrepo.md.
