feat(frontend): migrate evaluator playground to workflow evaluators by mmabrouk · Pull Request #3767 · Agenta-AI/agenta

mmabrouk · 2026-02-16T13:13:50Z

Summary

Migrates evaluator configs CRUD to use workflow evaluator endpoints
Migrates evaluator run/invoke to use workflow-based invocation
Includes API fixes for schema persistence and builtin evaluator hydration

Stacked on #3572. Merges into docs/migrate-evaluator-playground-plan which targets main.

Previously merged as #3576 and #3577 into the wrong base branch (chore/migrate-evaluators).

Expose output schemas from evaluator templates and send them on config create/edit, including dynamic derivation for auto_ai_critique and json_multi_field_match. Also remove legacy /evaluators/map usage and relax config listing filters so older non-human configs remain visible.

vercel · 2026-02-16T13:13:55Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	Feb 16, 2026 1:21pm

…d-plan' into feat/migrate-evaluator-playground-frontend

# Conflicts:
#   docs/design/migrate-evaluator-playground/current-system.md
#   docs/design/migrate-evaluator-playground/new-endpoints.md
#   docs/design/migrate-evaluator-playground/plan.md
#   docs/design/migrate-evaluator-playground/status.md
#   web/oss/src/lib/atoms/evaluation.ts

devin-ai-integration

Devin Review found 1 potential issue.

View 7 additional findings in Devin Review.

devin-ai-integration · 2026-02-16T13:25:46Z

api/oss/src/core/evaluators/service.py

+        return SimpleEvaluatorData(
+            **{
+                **hydrated_data_dict,
+                **existing_data_dict,
+            }
+        )


🟡 Shallow merge in _ensure_builtin_evaluator_data can silently discard hydrated output schemas

When the existing simple_evaluator_data has a schemas field that lacks an outputs key (e.g., schemas: {} or schemas: {"inputs": {...}}), the shallow dict merge {**hydrated_data_dict, **existing_data_dict} replaces the hydrated schemas (which contains the computed outputs) with the existing, incomplete schemas. The output schema the function was supposed to inject is lost.

Root Cause & Impact

The function's purpose is to fill in missing output schemas for builtin evaluators. At api/oss/src/core/evaluators/service.py:848-853, the merge is:

    return SimpleEvaluatorData(
        **{
            **hydrated_data_dict,  # has schemas.outputs
            **existing_data_dict,  # may have schemas WITHOUT outputs
        }
    )

Because Python dict unpacking is a shallow merge, the entire schemas key from existing_data_dict replaces the one from hydrated_data_dict. For example:

hydrated_data_dict["schemas"] = {"outputs": {"type": "object", ...}}

existing_data_dict["schemas"] = {} (or {"inputs": {...}})

Result: schemas = {} — the outputs schema is lost
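The loss described above can be reproduced with plain dictionaries. This is a minimal sketch: the dict contents are illustrative placeholders, not the real payloads produced by the service.

```python
# Illustrative values only; in the service these dicts are derived from
# SimpleEvaluatorData instances.
hydrated_data_dict = {
    "schemas": {"outputs": {"type": "object"}},
}
existing_data_dict = {
    "schemas": {"inputs": {"type": "object"}},
}

# Shallow merge: later keys win, but only at the top level, so the whole
# "schemas" value from existing_data_dict replaces the hydrated one.
merged = {**hydrated_data_dict, **existing_data_dict}

print(merged["schemas"])               # {'inputs': {'type': 'object'}}
print("outputs" in merged["schemas"])  # False
```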

The _has_outputs_schema guard at line 817 only checks if outputs already exist and returns early. It does not prevent the merge from discarding the hydrated outputs when schemas exists but lacks the outputs key.
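To make the gap concrete, here is a hypothetical stand-in for such a guard. The name has_outputs_schema and its body are assumptions for illustration and are not copied from service.py. It returns True only when schemas.outputs is already populated, so both problematic shapes fall through to the merge.

```python
from typing import Any, Mapping, Optional


def has_outputs_schema(data: Optional[Mapping[str, Any]]) -> bool:
    """Hypothetical guard: True only if data["schemas"]["outputs"] is set."""
    if not data:
        return False
    schemas = data.get("schemas")
    if not isinstance(schemas, Mapping):
        return False
    return bool(schemas.get("outputs"))


# Outputs already present: the caller can return early.
complete = {"schemas": {"outputs": {"type": "object"}}}
# Schemas present but without outputs: the guard does not fire, and the
# shallow merge that follows would overwrite the hydrated schemas.
empty_schemas = {"schemas": {}}
inputs_only = {"schemas": {"inputs": {"type": "object"}}}

print(has_outputs_schema(complete))       # True
print(has_outputs_schema(empty_schemas))  # False
print(has_outputs_schema(inputs_only))    # False
```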

Impact: Evaluator revisions can be persisted without the expected data.schemas.outputs, which may break downstream consumers that rely on knowing the evaluator's output shape (e.g., workflow invocation, result rendering).

Prompt for agents

In api/oss/src/core/evaluators/service.py, the _ensure_builtin_evaluator_data method (around line 848-853) performs a shallow dict merge that can lose the hydrated schemas.outputs. Replace the shallow merge with a deep merge that preserves nested dict keys. Specifically, when both hydrated_data_dict and existing_data_dict have a 'schemas' key, the merge should combine them so that existing schemas keys take precedence but missing keys (like 'outputs') are filled in from the hydrated dict. For example:

    merged = {**hydrated_data_dict, **existing_data_dict}
    if 'schemas' in hydrated_data_dict and 'schemas' in existing_data_dict:
        merged['schemas'] = {**hydrated_data_dict['schemas'], **existing_data_dict['schemas']}

Then construct SimpleEvaluatorData(**merged).
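The suggested fix can be sketched as a standalone helper operating on plain dicts. The function name merge_evaluator_data is hypothetical, and the handling of a None schemas value is an added assumption (it guards against serialized models that emit unset fields as None); neither is confirmed by the PR.

```python
from typing import Any, Dict


def merge_evaluator_data(
    hydrated: Dict[str, Any],
    existing: Dict[str, Any],
) -> Dict[str, Any]:
    """Merge two data dicts, combining the nested "schemas" dict one level deep.

    Existing values take precedence; keys missing from existing["schemas"]
    (such as "outputs") are filled in from hydrated["schemas"].
    """
    merged = {**hydrated, **existing}

    hydrated_schemas = hydrated.get("schemas")
    existing_schemas = existing.get("schemas")

    if isinstance(hydrated_schemas, dict) and isinstance(existing_schemas, dict):
        merged["schemas"] = {**hydrated_schemas, **existing_schemas}
    elif existing_schemas is None and hydrated_schemas is not None:
        # Assumption: an explicit None in the existing data should not
        # erase the hydrated schemas.
        merged["schemas"] = hydrated_schemas

    return merged


result = merge_evaluator_data(
    {"uri": "hydrated", "schemas": {"outputs": {"type": "object"}}},
    {"uri": "existing", "schemas": {"inputs": {"type": "object"}}},
)
print(result["uri"])              # existing
print(sorted(result["schemas"]))  # ['inputs', 'outputs']
```

The merge goes only one level deep on purpose: if the existing data already defines schemas.outputs, that value is kept whole rather than blended with the hydrated one.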


mmabrouk added 15 commits January 27, 2026 20:41

docs: add evaluator playground migration planning workspace

e5a0e16

docs: update plan to direct migration (no adapters), split into PR 1 …

df1e622

…(CRUD) and PR 2 (Run)

feat(frontend): migrate evaluator configs CRUD

e3e633d

fix(frontend): remove duplicate hook imports

02ad4dc

feat(frontend): invoke evaluators via workflows

9b9435a

fix(frontend): use explicit evaluator URI for invoke

8ecf91c

fix(frontend): harden evaluator invoke interface

23e6d62

Merge branch 'chore/migrate-evaluators' into feat/migrate-evaluator-p…

8495417

…layground-frontend

Merge branch 'feat/migrate-evaluator-playground-frontend' into feat/m…

dfcd091

…igrate-evaluator-playground-run

fix(frontend): exclude container ag metrics from overview

00c2aa2

Revert "fix(frontend): exclude container ag metrics from overview"

bef12ec

This reverts commit 00c2aa2.

fix(api): hydrate builtin evaluator schemas on simple CRUD

fa91fcf

chore(api): apply ruff formatting cleanup

0c6ae55

Merge pull request #3577 from Agenta-AI/feat/migrate-evaluator-playgr…

2daf78d

…ound-run feat(frontend): migrate evaluator run invocation

dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. Backend Evaluation feature Frontend labels Feb 16, 2026

mmabrouk merged commit 44009a7 into docs/migrate-evaluator-playground-plan Feb 16, 2026
4 of 5 checks passed

vercel bot deployed to Preview February 16, 2026 13:21 View deployment

devin-ai-integration bot reviewed Feb 16, 2026

mmabrouk merged 16 commits into docs/migrate-evaluator-playground-plan from feat/migrate-evaluator-playground-frontend
