The submission pipeline and the stored results for the lean-eval benchmark.
This repository owns two things:
- The submission process — the issue intake, the
submissionworkflow that fetches a submission, evaluates it with comparator, and records the outcome, and the reconciler that catches stranded submission issues. - The results store —
results/<github-login>.json, the append-only public log of solved problems.
The benchmark problem set, the lean-eval CLI, and the comparator/landrun
security model live in leanprover/lean-eval.
The public leaderboard that renders these results is
leanprover/lean-eval-leaderboard
(view it →).
Open a Submit benchmark solution
issue. You point it at any content that contains at least one
lakefile.toml whose name matches a benchmark problem id with a
Submission.lean alongside it — a generated workspace, a fork of
leanprover/lean-eval with changes under generated/, a repo with
several workspaces, or a public gist. The CI walks the content and tries
every match.
Only Submission.lean and files under Submission/ are read. Nothing
else from your submission is inspected or published — only the set of
solved problem ids plus the metadata you enter on the form.
If your submission lives in a private repository, install the
lean-eval-bot GitHub App on it so the CI can clone it:
https://github.com/apps/lean-eval-bot.
Every evaluated submission's compressed source tarball is retained
indefinitely, age-encrypted, in the private
leanprover/lean-eval-audit
repository so that the exact bytes evaluated for any past submission
remain recoverable if a comparator regression, soundness incident, or
research question requires re-examining them. Decryption keys are
held only by the small set of maintainers listed in
.audit/recipients.txt; submitting agrees
to this retention (see the submission form's third acknowledgement).
The compressed source tarball is capped at 10 MiB; submissions
above the cap are rejected before evaluation. See
docs/audit-archive.md for the design and
the decryption procedure.
results/ holds machine-written artifacts produced by the submission
CI. Do not edit them by hand.
results/
<github-login>.json
One file per submitter; filenames use the lowercased GitHub login. Users without a successful submission have no file.
Successes are sticky: once a (user, model, problem) triple is
recorded it is never modified or removed, even if a later submission from
the same user no longer proves it.
{
"schema_version": 1,
"user": "kim-em",
"solved": {
"Claude Opus 4.7": {
"two_plus_two": {
"solved_at": "2026-05-01T03:16:18Z",
"benchmark_commit": "953d54a7af5038566775507761e48e365e7feb3b",
"submission_kind": "gist",
"submission_repo": "kim-em/22bad2dccd67bcca0df87c01d072ef39",
"submission_ref": "567b8d1feebbc6ccbb1f8ebb0a7bbcf5e914f135",
"submission_public": true,
"issue_number": 45,
"production_description": "..."
}
}
}
}| Field | Type | Description |
|---|---|---|
schema_version |
integer | Currently 1. |
user |
string | GitHub login, original case preserved. |
solved |
object | Map from model name to per-problem records. Never empty. |
The keys of solved are free-form model identifiers from the submission
form. Each value maps <problem_id> to a record:
| Field | Type | Description |
|---|---|---|
solved_at |
string | ISO 8601 UTC timestamp the record was first written. |
benchmark_commit |
string | 40-char SHA of the leanprover/lean-eval commit evaluated against. |
submission_kind |
string | github_repo or gist. |
submission_repo |
string | owner/repo for a repository, user/gist-id for a gist. |
submission_ref |
string | 40-char SHA pinning the submission at evaluation time. |
submission_public |
boolean | Whether the submission source was public at evaluation time. |
issue_number |
integer | The leanprover/lean-eval-submissions issue that triggered the evaluation. |
production_description |
string | absent | Optional free-form description from the form. |
issue_numberprovenance. Records written by this repository refer toleanprover/lean-eval-submissionsissues. Records dated before the submission pipeline moved here refer toleanprover/lean-evalissues.
When the submission CI records a successful submission:
- It reads
results/<login>.json, or starts from an emptysolvedmap. - The submission carries one model name; that is the bucket key.
- For each problem that passed: if
solved[<model>][<problem_id>]already exists, do nothing (sticky no-op); otherwise add a record. - If at least one new record was added, the CI commits and pushes; otherwise it makes no commit.
Breaking schema changes bump schema_version; consumers should refuse a
file whose schema_version they do not know.
submission issue on lean-eval-submissions
→ submission.yml: checkout leanprover/lean-eval (problem set + probes), evaluate
→ write results/<login>.json here, push
→ repository_dispatch results-advanced → lean-eval-leaderboard redeploys
submission-reconciler.yml is an hourly safety net: it closes submission
issues that never received a bot comment (workflow disabled, runner died,
etc.).
- Secrets, GitHub Apps, and branch protection:
docs/ci-secrets.md. - Security model / threat analysis:
SECURITY.md. ci.ymlruns the Python test suite,actionlint, the workflow-pin audit, andtests/test_submission_workflow.py(a structural guard onsubmission.yml's security-critical shape).