Fix experiment trigger on remote eval#128
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8bc4a344a5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "Codex (@codex) review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".
| new CreateExperiment() | ||
| .projectId(projectId) | ||
| .name(experimentName) | ||
| .ensureNew(true)); |
There was a problem hiding this comment.
Link dataset runs to the created experiment
When the Experiment trigger is run against a Braintrust dataset (data.dataset_id or project_name/dataset_name), this creates the experiment with only project/name/ensure_new before extractDataset() opens the dataset cursor. Unlike Eval.run(), which copies the dataset id and cursor version into CreateExperiment, the new remote-eval experiment is not linked to the dataset/version, so the Experiment page loses the dataset association even though the rows were fetched from it.
Useful? React with 👍 / 👎.
Andrew Kent (realark)
left a comment
There was a problem hiding this comment.
(going to take over this branch and refactor devserver and evals to share a code path to address issues like the one the bot raised)
Summary
Remote evals triggered as an Experiment from the Braintrust UI failed (
IllegalArgumentException: braintrust parent (playground_id) not found) while Playground runs worked. Experiment runs sendparent=null+experiment_name/project_id; the dev server had no branch for that path. This PR adds the missing path and a regression test for it.Changes
Devserver.extractParentInfo: when no playground parent is present, create an experiment viaExperimentsApi.postExperiment(...)and parent spans toexperiment_id:<id>.Devserver.extractParentInfo: passensure_new=trueso repeated UI runs create distinct experiments instead of appending to the first.Devserver.extractParentInfo: keep the playground path (playground_id:<id>) unchanged and turn the old unconditional throw into a real "neither signal present" fallback.DevserverTest.testExperimentEval: new test added in this PR, sending the experiment-run shape and asserting clean summary/done with the eval span parented toexperiment_id:<id>.cassettes/.../v1_experiment-java-experiment-repro.json: add stub forPOST /v1/experimentmatching"ensure_new":true(also serves as the regression guard for the flag).Testing
testExperimentEvalfailed before the fix with the exact reported error; passes now.DevserverTestsuite passes (playground paths unaffected); validated end-to-end against a local fixed-SDK build.Out of scope
Eval.java(CLI runner) callspostExperimentwithoutensureNew— left unchanged as it may be intentional for CLI re-runs.