
feat(frontend): migrate evaluator playground to workflow evaluators #3767

Merged
mmabrouk merged 16 commits into docs/migrate-evaluator-playground-plan from feat/migrate-evaluator-playground-frontend
Feb 16, 2026

@mmabrouk mmabrouk commented Feb 16, 2026

Summary

  • Migrates evaluator configs CRUD to use workflow evaluator endpoints
  • Migrates evaluator run/invoke to use workflow-based invocation
  • Includes API fixes for schema persistence and builtin evaluator hydration

Stacked on #3572. Merges into docs/migrate-evaluator-playground-plan which targets main.

Previously merged as #3576 and #3577 into the wrong base branch (chore/migrate-evaluators).



Expose output schemas from evaluator templates and send them on config create/edit, including dynamic derivation for auto_ai_critique and json_multi_field_match. Also remove legacy /evaluators/map usage and relax config listing filters so older non-human configs remain visible.
…ound-run

feat(frontend): migrate evaluator run invocation

vercel bot commented Feb 16, 2026

The latest updates on your projects.

Project: agenta-documentation
Deployment: Ready
Actions: Preview, Comment
Updated (UTC): Feb 16, 2026 1:21pm

@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. Backend Evaluation feature Frontend labels Feb 16, 2026
…d-plan' into feat/migrate-evaluator-playground-frontend

# Conflicts:
#	docs/design/migrate-evaluator-playground/current-system.md
#	docs/design/migrate-evaluator-playground/new-endpoints.md
#	docs/design/migrate-evaluator-playground/plan.md
#	docs/design/migrate-evaluator-playground/status.md
#	web/oss/src/lib/atoms/evaluation.ts
@mmabrouk mmabrouk merged commit 44009a7 into docs/migrate-evaluator-playground-plan Feb 16, 2026
4 of 5 checks passed

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 1 potential issue.

View 7 additional findings in Devin Review.


Comment on lines +848 to +853
return SimpleEvaluatorData(
    **{
        **hydrated_data_dict,
        **existing_data_dict,
    }
)

🟡 Shallow merge in _ensure_builtin_evaluator_data can silently discard hydrated output schemas

When the existing simple_evaluator_data has a schemas field that lacks an outputs key (e.g., schemas: {} or schemas: {"inputs": {...}}), the shallow dict merge {**hydrated_data_dict, **existing_data_dict} replaces the hydrated schemas, which contains the computed outputs, with the existing incomplete schemas. The output schema the function was supposed to inject is lost.

Root Cause & Impact

The function's purpose is to fill in missing output schemas for builtin evaluators. At api/oss/src/core/evaluators/service.py:848-853, the merge is:

return SimpleEvaluatorData(
    **{
        **hydrated_data_dict,   # has schemas.outputs
        **existing_data_dict,   # may have schemas WITHOUT outputs
    }
)

Because Python dict unpacking is a shallow merge, the entire schemas key from existing_data_dict replaces the one from hydrated_data_dict. For example:

  • hydrated_data_dict["schemas"] = {"outputs": {"type": "object", ...}}
  • existing_data_dict["schemas"] = {} (or {"inputs": {...}})
  • Result: schemas = {} — the outputs schema is lost
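
The effect described in the bullets above can be reproduced in isolation. This is a minimal sketch with made-up dict values; only the variable names hydrated_data_dict and existing_data_dict come from the review, and the real dicts in service.py may carry more keys.

```python
# Illustrative values only; the real dicts come from the evaluator data models.
hydrated_data_dict = {"schemas": {"outputs": {"type": "object"}}}
existing_data_dict = {"schemas": {"inputs": {"type": "object"}}}

# Same shape of merge as in the reviewed code: for a duplicate top-level key,
# the later dict's value replaces the earlier one wholesale.
shallow = {**hydrated_data_dict, **existing_data_dict}

# The nested "outputs" key is gone because the whole "schemas" value was replaced.
print("outputs" in shallow["schemas"])  # False
print(shallow["schemas"])  # {'inputs': {'type': 'object'}}
```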

The _has_outputs_schema guard at line 817 only checks if outputs already exist and returns early. It does not prevent the merge from discarding the hydrated outputs when schemas exists but lacks the outputs key.

Impact: Evaluator revisions can be persisted without the expected data.schemas.outputs, which may break downstream consumers that rely on knowing the evaluator's output shape (e.g., workflow invocation, result rendering).

Prompt for agents
In api/oss/src/core/evaluators/service.py, the _ensure_builtin_evaluator_data method (around line 848-853) performs a shallow dict merge that can lose the hydrated schemas.outputs. Replace the shallow merge with a deep merge that preserves nested dict keys. Specifically, when both hydrated_data_dict and existing_data_dict have a 'schemas' key, the merge should combine them so that existing schemas keys take precedence but missing keys (like 'outputs') are filled in from the hydrated dict. For example:

merged = {**hydrated_data_dict, **existing_data_dict}
if 'schemas' in hydrated_data_dict and 'schemas' in existing_data_dict:
    merged['schemas'] = {**hydrated_data_dict['schemas'], **existing_data_dict['schemas']}

Then construct SimpleEvaluatorData(**merged).
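
If more than one nested key needs the same treatment, the one-level fix above could be generalized to a recursive merge. The sketch below is an assumption, not code from the PR: merge_prefer_existing is a hypothetical helper name, and it works on plain dicts rather than on SimpleEvaluatorData.

```python
from typing import Any, Dict


def merge_prefer_existing(
    hydrated: Dict[str, Any], existing: Dict[str, Any]
) -> Dict[str, Any]:
    """Hypothetical helper: recursively merge two dicts.

    Values from `existing` take precedence; keys present only in
    `hydrated` (such as schemas.outputs) are kept.
    """
    merged = dict(hydrated)
    for key, value in existing.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides hold a dict: merge one level deeper.
            merged[key] = merge_prefer_existing(merged[key], value)
        else:
            # Scalars, lists, and type mismatches: the existing value wins.
            merged[key] = value
    return merged


# Illustrative values only.
hydrated_data_dict = {"schemas": {"outputs": {"type": "object"}}}
existing_data_dict = {"schemas": {"inputs": {"type": "string"}}}

merged = merge_prefer_existing(hydrated_data_dict, existing_data_dict)
print(sorted(merged["schemas"]))  # ['inputs', 'outputs']
```

Note that in this sketch an explicit None in the existing dict still overwrites the hydrated value; whether that is desirable depends on how the existing data is serialized, which the review does not specify.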