[evals] bump eval runner to v1.3, add v1.3 results placeholder by Obsidian68 · Pull Request #18 · Obsidian68/Engram

Obsidian68 · 2026-05-03T13:12:20Z

Summary

Updated evals/runner.py: RESULTS_PATH changed from v1.2.json to v1.3.json, version string changed from "1.2.0" to "1.3.0"
Added evals/results/v1.3.json placeholder with version "1.3.0", zero metrics, empty sweep_results
Updated CHANGELOG with v1.3.0 section
Updated STATUS and progress continuity docs

Known limitations

evals/sweep.py line 152 still has "version": "1.2.0" — out of scope for this branch, needs updating on integration

Test plan

uv run pytest — 277 tests pass
uv run ruff check evals/ — lint clean
uv run ruff format --check evals/ — format clean
Zero TODOs, FIXMEs, HACKs, XXXs in changed files
evals/results/v1.3.json contains valid JSON with version "1.3.0"
python -m evals.runner runs end-to-end against live v1.3 server (requires running server)

🤖 Generated with Claude Code

- Update RESULTS_PATH from v1.2.json to v1.3.json - Update version string from 1.2.0 to 1.3.0 - Add evals/results/v1.3.json placeholder (populated on integration) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 85afc24321

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-03T13:14:35Z


 GOLDEN_SET_PATH = Path(__file__).parent / "golden_set" / "queries.jsonl"
-RESULTS_PATH = Path(__file__).parent / "results" / "v1.2.json"
+RESULTS_PATH = Path(__file__).parent / "results" / "v1.3.json"


Keep sweep version in sync with the v1.3 results target

Changing RESULTS_PATH to v1.3.json here also affects evals/sweep.py because it imports this constant (from evals.runner import ... RESULTS_PATH), but sweep still hardcodes "version": "1.2.0" in its output. After this commit, running the sweep will write a 1.2.0 payload into the v1.3 results file, which can corrupt versioned eval artifacts and downstream comparisons that trust file/version consistency.

Useful? React with 👍 / 👎.

chatgpt-codex-connector Bot reviewed May 3, 2026

View reviewed changes

Obsidian68 merged commit 294e5ea into feat/integration May 3, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[evals] bump eval runner to v1.3, add v1.3 results placeholder#18

[evals] bump eval runner to v1.3, add v1.3 results placeholder#18
Obsidian68 merged 1 commit intofeat/integrationfrom
feat/evals-v1.3

Obsidian68 commented May 3, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Obsidian68 commented May 3, 2026

Summary

Known limitations

Test plan

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant