Add AIME2025, GPQA, HealthBench evaluation_test suites; unify row-limiting via pytest flag; clean up examples #193

Triggered via pull request August 9, 2025 23:59
Status Success
Total duration 8m 26s
Artifacts 4

ci.yml

on: pull_request
Lint & Type Check 1m 25s
Matrix: test-core
Batch Evaluation Tests 1m 39s
MCP End-to-End Tests 48s
Upload Coverage 5s

Annotations

1 warning
MCP End-to-End Tests
No files were found with the provided path: coverage.xml. No artifacts will be uploaded.

Artifacts

Produced during runtime
Name                 Status   Size     Digest
coverage-batch-eval  Expired  30.6 KB  sha256:610cf815998140497fbc452fc13fb47315c7f63b05f6c8f6b3d3f979016fa84e
coverage-core-3.10   Expired  36.3 KB  sha256:bfde16a1ff5de670fba298baeee75b1ec51a83588898dd7d50d552c4df9d282f
coverage-core-3.11   Expired  36.3 KB  sha256:a5befdc6ecda3d54aba9cae97f101ab34a6489445feebee58a46c75d3be7adad
coverage-core-3.12   Expired  36.3 KB  sha256:feea972f201d325b45252e0de687a0fa10762f0ca4b03eacea7ce0da78150457