chore(evals): unblock ToolPredictionScorer by JoshFerge · Pull Request #962 · getsentry/sentry-mcp

JoshFerge · 2026-05-13T22:04:43Z

Summary

Two unrelated bugs were both blocking the entire eval suite — every eval currently crashes with `MCPClientError: Connection closed` before any scoring runs. Both are fixed here.

Stale CLI flag. `toolPredictionScorer` spawns `pnpm exec sentry-mcp` with `--access-token=mocked-access-token --all-scopes`. `--all-scopes` was removed in refactor: remove legacy grantedScopes in favor of grantedSkills #668 ("remove legacy grantedScopes in favor of grantedSkills"), so the spawned binary prints its usage banner and exits, surfacing as a closed-stdio error. Drop the flag.
OpenAI strict response_format rejection. The prediction schema uses `arguments: z.record(z.any()).optional().default({})`. zod-to-JSON-Schema emits this as `additionalProperties: {}` (no `type` key), which OpenAI's strict structured-output endpoint refuses with `AI_APICallError: Invalid schema for response_format 'response': … schema must have a 'type' key`. Verified this also fails on the latest `@ai-sdk/openai@3.0.63` + `ai@6.0.182`, so it's not a stale-dep issue. Encode arguments as a JSON string instead.

With both fixed, the eval suite runs end-to-end again (verified locally — autofix.eval.ts cases score 1.00 against the current main).

Test plan

`pnpm run tsc`
`pnpm run lint`
`pnpm run test` — all 1341 tests pass
`pnpm vitest run autofix.eval.ts` with `OPENAI_API_KEY` set — both cases score 1.00

Notes for reviewers

`predictedTools[*].arguments` was already only read for the rationale/metadata field in the scorer's return value, so renaming to `argumentsJson` doesn't break scoring.
First in a stack of three. Subsequent PRs do the Sentry-side autofix migration to explorer-mode endpoints.

🤖 Generated with Claude Code

Agent transcript: https://claudescope.sentry.dev/share/HYSozfr18sU5lMEbTaUO5UX2YmVyaBdNoueQpOS6xJQ

- Drop the stale `--all-scopes` flag the bin no longer accepts; the scorer-side stdio spawn was printing usage and exiting, surfacing as `MCPClientError: Connection closed` across every eval. - Switch the `predictedTools[*].arguments` field to a JSON-encoded string; `z.record(z.any())` emits `additionalProperties` with no `type`, which OpenAI's strict response_format rejects. With both fixes, `autofix.eval.ts` scores 1.00 / 1.00 against the new explorer-mode tool flow. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

dcramer · 2026-05-13T22:36:29Z

-      "exec",
-      "sentry-mcp",
-      "--access-token=mocked-access-token",
-      "--all-scopes",


im actually surprised this wouldnt break something

JoshFerge temporarily deployed to Actions May 13, 2026 22:04 — with GitHub Actions Inactive

This was referenced May 13, 2026

feat(seer): add explorer-mode autofix schemas + API methods #963

Draft

feat(seer): migrate analyze_issue_with_seer + summary to explorer sections #966

Draft

feat(seer): migrate autofix tool to explorer-mode endpoint #961

Closed

dcramer reviewed May 13, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore(evals): unblock ToolPredictionScorer#962

chore(evals): unblock ToolPredictionScorer#962
JoshFerge wants to merge 1 commit into
mainfrom
chore/evals-unblock-prediction-scorer

JoshFerge commented May 13, 2026 •

edited

Loading

Uh oh!

dcramer May 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

JoshFerge commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Notes for reviewers

Uh oh!

dcramer May 13, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

JoshFerge commented May 13, 2026 •

edited

Loading