eval/top level evals by HamzaSardar · Pull Request #204 · fuzzylabs/sre-agent

HamzaSardar · 2026-03-18T15:07:32Z

Moved evals from `src/sre_agent/eval/` to a top-level `evals` folder, and fixed imports.

The eval module has been moved from `src/sre_agent/eval` into a top-level `evals` folder. The eval entry points in `[project.scripts]` have been removed, and imports have been updated to reflect the move. The READMEs have been updated to use `python -m evals.tool_call.run` and `python -m evals.diagnosis_quality.run`. Implements: #177

osw282

A few comments, but it looks mostly fine. I'm not sure what the content in the PR description is. Is it a list of previous commits?

The PR is for moving the eval suite out to the root, right? Let's simplify the description.

I would also try running the eval suite on your end, following the README, to see whether the instructions make sense or need more clarification.

Nice work though!!

osw282 · 2026-03-19T09:42:18Z

evals/tool_call/README.md

I think the command in the run section needs to be updated. The following does not work:

    uv sync --group eval
    uv run sre-agent-run-tool-call-eval

osw282 · 2026-03-19T09:43:08Z

evals/diagnosis_quality/README.md

I think the command in the run section needs to be updated. The following does not work:

    uv sync --group eval
    uv run sre-agent-run-diagnosis-quality-eval

osw282 · 2026-03-19T09:44:29Z

README.md

````diff
 ```bash
-uv run sre-agent-run-tool-call-eval
-uv run sre-agent-run-diagnosis-quality-eval
+uv run python -m evals.tool_call.run
````


Let's add `uv sync --group eval` above this to ensure Opik is installed before running the eval suites. We should also make it clear that Opik needs to be set up first.

Let's add the below:

"""
Assuming you already have Opik up and running. If not, please refer to the README in either of the eval suites for setup instructions. Once ready, run the following to install prerequisites:

export GITHUB_PERSONAL_ACCESS_TOKEN="..." export ANTHROPIC_API_KEY="..." uv sync --group eval

"""

Added the suggested comments about setting up Opik to the main README, and fixed the other READMEs.

HamzaSardar added the evaluation label Mar 18, 2026

HamzaSardar changed the base branch from main to develop March 18, 2026 15:09

HamzaSardar self-assigned this Mar 18, 2026

HamzaSardar requested a review from osw282 March 18, 2026 15:10

osw282 requested changes Mar 19, 2026


Fixed READMEs to reflect new invocation of eval suites.

60507ee

Added the suggested comments about setting up Opik to the main README, and fixed the other READMEs.

HamzaSardar wants to merge 2 commits into `develop` from `eval/top-level-evals`


2 participants
