The eval module has been moved from `src/sre_agent/eval` into a top-level `evals` directory. The eval entry points in `[project.scripts]` have been removed. Imports have been updated to reflect the move. READMEs have been updated to use `python -m evals.tool_call.run` and `python -m evals.diagnosis_quality.run`. Implements: #177
A few comments, but it looks mostly fine. I'm not sure what the content in the PR description is. Is it a list of previous commits?
The PR is for moving the eval suite out to the root, right? Let's simplify the description.
I would also try running the eval suite on your end, following the README, to see whether the instructions make sense to you or need more clarification.
Nice work though!!
I think the command in the run section needs to be updated:
uv sync --group eval
uv run sre-agent-run-tool-call-eval
does not work.
I think the command in the run section needs to be updated:
uv sync --group eval
uv run sre-agent-run-diagnosis-quality-eval
does not work.
```diff
-uv run sre-agent-run-tool-call-eval
-uv run sre-agent-run-diagnosis-quality-eval
+uv run python -m evals.tool_call.run
```
Let’s add `uv sync --group eval` above this to ensure Opik is installed before running the eval suites. We should also make it clear that Opik needs to be set up first.
Let's add the following:
"""
Assuming you already have Opik up and running. If not, please refer to the README in either of the eval suites for setup instructions. Once ready, run the following to install prerequisites:
export GITHUB_PERSONAL_ACCESS_TOKEN="..."
export ANTHROPIC_API_KEY="..."
uv sync --group eval
"""
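The prerequisites in the suggested text could be checked before syncing. A minimal sketch, assuming only the two environment variables named above are required; `check_eval_env` is a hypothetical helper and is not part of the repository:

```bash
# Hypothetical pre-flight check (not part of the repo): verifies that the
# environment variables named in the suggested README text are set.
check_eval_env() {
  missing=0
  for var in GITHUB_PERSONAL_ACCESS_TOKEN ANTHROPIC_API_KEY; do
    # Read the value of the variable whose name is stored in $var.
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  # Returns 0 when both variables are set, 1 otherwise.
  return "$missing"
}
```

It could then gate the commands from the README, for example `check_eval_env && uv sync --group eval && uv run python -m evals.tool_call.run`. It does not check that Opik itself is running.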
Added the suggested comments about setting up Opik to the main README, and fixed the other READMEs.
Moved evals from `src/sre_agent/eval/` to a top-level `evals` folder, and fixed imports.