Skip to content

Fix experiment trigger on remote eval#128

Open
Pedro Mora (stretpjc) wants to merge 2 commits into
mainfrom
fix-experiment-trigger-on-remote-eval
Open

Fix experiment trigger on remote eval#128
Pedro Mora (stretpjc) wants to merge 2 commits into
mainfrom
fix-experiment-trigger-on-remote-eval

Conversation

@stretpjc

Copy link
Copy Markdown

Summary

Remote evals triggered as an Experiment from the Braintrust UI failed (IllegalArgumentException: braintrust parent (playground_id) not found) while Playground runs worked. Experiment runs send parent=null + experiment_name/project_id; the dev server had no branch for that path. This PR adds the missing path and a regression test for it.

Changes

  • Devserver.extractParentInfo: when no playground parent is present, create an experiment via ExperimentsApi.postExperiment(...) and parent spans to experiment_id:<id>.
  • Devserver.extractParentInfo: pass ensure_new=true so repeated UI runs create distinct experiments instead of appending to the first.
  • Devserver.extractParentInfo: keep the playground path (playground_id:<id>) unchanged and turn the old unconditional throw into a real "neither signal present" fallback.
  • DevserverTest.testExperimentEval: new test added in this PR, sending the experiment-run shape and asserting clean summary/done with the eval span parented to experiment_id:<id>.
  • cassettes/.../v1_experiment-java-experiment-repro.json: add stub for POST /v1/experiment matching "ensure_new":true (also serves as the regression guard for the flag).

Testing

  • The new testExperimentEval failed before the fix with the exact reported error; passes now.
  • Full DevserverTest suite passes (playground paths unaffected); validated end-to-end against a local fixed-SDK build.

Out of scope

  • Eval.java (CLI runner) calls postExperiment without ensureNew — left unchanged as it may be intentional for CLI re-runs.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8bc4a344a5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

Comment on lines +1174 to +1177
new CreateExperiment()
.projectId(projectId)
.name(experimentName)
.ensureNew(true));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Link dataset runs to the created experiment

When the Experiment trigger is run against a Braintrust dataset (data.dataset_id or project_name/dataset_name), this creates the experiment with only project/name/ensure_new before extractDataset() opens the dataset cursor. Unlike Eval.run(), which copies the dataset id and cursor version into CreateExperiment, the new remote-eval experiment is not linked to the dataset/version, so the Experiment page loses the dataset association even though the rows were fetched from it.

Useful? React with 👍 / 👎.

@realark Andrew Kent (realark) left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(going to take over this branch and refactor devserver and evals to share a code path to address issues like the one the bot raised)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants