
Add AIME2025, GPQA, HealthBench evaluation_test suites; unify row-limiting via pytest flag; clean up examples #195

Triggered via pull request: August 10, 2025 05:24
Status: Failure
Total duration: 6m 8s
Artifacts: 2

ci.yml

on: pull_request
Lint & Type Check: 53s
Matrix: test-core
Batch Evaluation Tests: 1m 36s
MCP End-to-End Tests: 48s
Upload Coverage: 0s

Annotations

2 errors and 1 warning
Core Tests (Python 3.12): Process completed with exit code 1.
Core Tests (Python 3.11): Process completed with exit code 1.
MCP End-to-End Tests: No files were found with the provided path: coverage.xml. No artifacts will be uploaded.

Artifacts

Produced during runtime
Name                           Size     Digest
coverage-batch-eval (Expired)  30.7 KB  sha256:01a7b53b055a83d8d07a986e256c7c78fba6565adbbe26e429ce03a075e7a818
coverage-core-3.10 (Expired)   36.6 KB  sha256:ac08a00538707773367a4b814506628f3b26954950b4c0d5aedded3daf34d1ba