Skip to content

eval/top level evals#204

Open
HamzaSardar wants to merge 2 commits intodevelopfrom
eval/top-level-evals
Open

eval/top level evals#204
HamzaSardar wants to merge 2 commits intodevelopfrom
eval/top-level-evals

Conversation

@HamzaSardar
Copy link
Copy Markdown

@HamzaSardar HamzaSardar commented Mar 18, 2026

Moved evals from src/sre_agent/eval/ to a top-level evals folder, and fixed imports.

The eval module has been moved from `src/sre_agent/eval` into a top level `evals`. The eval entry points in `[project.scripts]` have been removed. Imports have been updated to reflect the move. READMEs have been updated to use `python -m evals.tool_call.run` and `python -m evals.diagnosis_quality.run`.

Implements: #177
@HamzaSardar HamzaSardar changed the base branch from main to develop March 18, 2026 15:09
@HamzaSardar HamzaSardar self-assigned this Mar 18, 2026
@HamzaSardar HamzaSardar requested a review from osw282 March 18, 2026 15:10
Copy link
Copy Markdown
Contributor

@osw282 osw282 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments but looks mostly fine, not sure what's the stuff in PR description, are they like a list of previous commits?

The PR is for moving the eval suite out to the root right, let's simplified the description.

I would also try running the eval suite on your end following the readme to see if the instructions makes sense to you or need more clarification.

Nice work though!!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think command in the run section needs to be updated:

uv sync --group eval
uv run sre-agent-run-tool-call-eval

does not work.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think command in the run section needs to be updated:

uv sync --group eval
uv run sre-agent-run-diagnosis-quality-eval

does not work.

```bash
uv run sre-agent-run-tool-call-eval
uv run sre-agent-run-diagnosis-quality-eval
uv run python -m evals.tool_call.run
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let’s add uv sync --group eval above this to ensure Opik is installed before running the eval suites. We should also make it clear that Opik needs to be set up first.

Let's add the below:

"""
Assuming you already have Opik up and running. If not, please refer to the README in either of the eval suites for setup instructions. Once ready, run the following to install prerequisites:

export GITHUB_PERSONAL_ACCESS_TOKEN="..."
export ANTHROPIC_API_KEY="..."
uv sync --group eval

"""

Added the suggested comments about setting up Opik to the main README, and fixed the other READMEs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants