dont allow all 0 #363

Merged: xzrderek merged 4 commits into main from derekx/dont-allow-all-0 on Dec 10, 2025

xzrderek commented on Dec 10, 2025

Note

Enforces a small success threshold for evaluation tests (host/Docker), forwards EP_SUMMARY_JSON into containers, and adds tests validating threshold behavior.

  • CLI:
    • Local test runner (eval_protocol/cli_commands/local_test.py):
      • Always pass --ep-success-threshold 0.001 to pytest (host and Docker) so all-zero-score runs fail.
      • Forward EP_SUMMARY_JSON into Docker when it points under ~/.eval_protocol to expose summary artifacts on host.
    • RFT creation (eval_protocol/cli_commands/create_rft.py):
      • Expand validator docstring to clarify enforced success threshold behavior.
  • Tests (tests/test_evaluation_postprocess.py):
    • Add threshold tests asserting that all-zero scores fail at success=0.01 and that a score exactly equal to the threshold passes.
    • Import updates to support new tests.
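
The behavior summarized above can be illustrated with a minimal sketch. This is not the code from the PR: the function names (`meets_threshold`, `pytest_args`, `docker_env_args`) are hypothetical, aggregating scores by their mean is an assumption, and the real runner may also remap the summary path for the container's filesystem. Only the `--ep-success-threshold` flag, the `0.001` value, the `EP_SUMMARY_JSON` variable, and the `~/.eval_protocol` directory come from the summary.

```python
from pathlib import Path
from statistics import mean

# Value the PR summary says is always passed to pytest.
SUCCESS_THRESHOLD = 0.001


def meets_threshold(scores, threshold=SUCCESS_THRESHOLD):
    """Return True when the aggregate score reaches the threshold.

    Assumption: the aggregate is the mean of the scores. The comparison
    is >=, so a score exactly equal to the threshold passes, while an
    all-zero run fails for any positive threshold.
    """
    if not scores:
        return False  # assumption: an empty run is treated as a failure
    return mean(scores) >= threshold


def pytest_args(test_path, threshold=SUCCESS_THRESHOLD):
    """Build a pytest argument list that always carries the threshold flag."""
    return [test_path, "--ep-success-threshold", str(threshold)]


def docker_env_args(env, home=None):
    """Return docker `-e` arguments forwarding EP_SUMMARY_JSON.

    The variable is forwarded only when it points under ~/.eval_protocol;
    otherwise no arguments are added.
    """
    home = Path(home) if home is not None else Path.home()
    summary = env.get("EP_SUMMARY_JSON")
    if not summary:
        return []
    root = (home / ".eval_protocol").resolve()
    path = Path(summary).expanduser().resolve()
    if path == root or root in path.parents:
        return ["-e", f"EP_SUMMARY_JSON={summary}"]
    return []
```

Using `>=` rather than `>` is what lets the equality case pass while still rejecting a run in which every score is zero.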

Written by Cursor Bugbot for commit a891471.

xzrderek requested a review from benjibc on December 10, 2025, 07:05
xzrderek merged commit 4ee6b27 into main on Dec 10, 2025
2 of 3 checks passed
xzrderek deleted the derekx/dont-allow-all-0 branch on December 10, 2025, 07:27