Improve parameter flexibility by Qard · Pull Request #160 · braintrustdata/autoevals

Stephen Belanger (Qard) · 2026-01-12T22:55:52Z

This makes max_tokens configurable and makes both it and temperature fall back to model-provided defaults when they are not set.

Fixes #149
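For readers unfamiliar with the pattern, here is a minimal sketch of what "fall back to model-provided defaults" can look like on the Python side. It is not the code from this PR: the helper name `build_completion_params` and its signature are hypothetical, chosen only for illustration. The idea is that a parameter the caller leaves unset is omitted from the request entirely, so the model provider applies its own default.

```python
from typing import Any, Dict, Optional


def build_completion_params(
    model: str,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper (not part of autoevals): build request kwargs.

    A parameter is included only when the caller set it explicitly.
    Anything left as None is omitted, so the provider's default applies.
    """
    params: Dict[str, Any] = {"model": model}
    # Compare against None rather than truthiness so that an explicit
    # temperature of 0 is still sent.
    if temperature is not None:
        params["temperature"] = temperature
    if max_tokens is not None:
        params["max_tokens"] = max_tokens
    return params


# Nothing set: only the model name is sent.
defaults_only = build_completion_params("some-model")

# Explicit values, including a temperature of 0, are passed through.
explicit = build_completion_params("some-model", temperature=0, max_tokens=256)
```

Checking `is not None` instead of truthiness matters here: a temperature of 0 is a common explicit choice for evaluators and must not be mistaken for "unset".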

github-actions · 2026-01-12T22:56:07Z

Braintrust eval report

Autoevals (parameter-flexibility-1768263664)

Score	Average	Improvements	Regressions
NumericDiff	72.8% (-1pp)	-	2 🔴
Time_to_first_token	1.34tok (-0.12tok)	112 🟢	7 🔴
Llm_calls	1.55 (+0)	-	-
Tool_calls	0 (+0)	-	-
Errors	0 (+0)	-	-
Llm_errors	0 (+0)	-	-
Tool_errors	0 (+0)	-	-
Prompt_tokens	279.25tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-
Completion_tokens	19.3tok (+0tok)	-	-
Completion_reasoning_tokens	0tok (+0tok)	-	-
Total_tokens	298.54tok (+0tok)	-	-
Estimated_cost	0$ (+0$)	-	-
Duration	2.51s (+1s)	114 🟢	105 🔴
Llm_duration	2.78s (-0.25s)	114 🟢	5 🔴

github-actions · 2026-01-12T22:56:07Z

Braintrust eval report

Autoevals (parameter-flexibility-1768258553)

Score	Average	Improvements	Regressions
NumericDiff	71.6% (-2pp)	1 🟢	7 🔴
Time_to_first_token	1.38tok (+0.05tok)	40 🟢	79 🔴
Llm_calls	1.55 (+0)	-	-
Tool_calls	0 (+0)	-	-
Errors	0 (+0)	-	-
Llm_errors	0 (+0)	-	-
Tool_errors	0 (+0)	-	-
Prompt_tokens	279.25tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-
Completion_tokens	19.3tok (+0tok)	-	-
Completion_reasoning_tokens	0tok (+0tok)	-	-
Total_tokens	298.54tok (+0tok)	-	-
Estimated_cost	0$ (+0$)	-	-
Duration	2.77s (+1.37s)	17 🟢	201 🔴
Llm_duration	2.87s (+0.08s)	28 🟢	89 🔴

Olmo Maldonado (ibolmo)

Quick tests would be good, but looks reasonable.

Olmo Maldonado (ibolmo)

ty!

github-actions · 2026-01-13T17:04:42Z

Braintrust eval report

Autoevals (main-1768323886)

Score	Average	Improvements	Regressions
NumericDiff	72.8% (-1pp)	-	2 🔴
Time_to_first_token	1.38tok (-0.05tok)	88 🟢	28 🔴
Llm_calls	1.55 (+0)	-	-
Tool_calls	0 (+0)	-	-
Errors	0 (+0)	-	-
Llm_errors	0 (+0)	-	-
Tool_errors	0 (+0)	-	-
Prompt_tokens	279.25tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-
Completion_tokens	19.3tok (+0tok)	-	-
Completion_reasoning_tokens	0tok (+0tok)	-	-
Total_tokens	298.54tok (+0tok)	-	-
Estimated_cost	0$ (+0$)	-	-
Duration	3.56s (+2.12s)	96 🟢	123 🔴
Llm_duration	2.8s (-0.12s)	95 🟢	23 🔴

Stephen Belanger (Qard) requested review from Ankur Goyal (ankrgyl) and Olmo Maldonado (ibolmo) January 12, 2026 22:55

Stephen Belanger (Qard) self-assigned this Jan 12, 2026

Stephen Belanger (Qard) added the enhancement, lang:python, and lang:typescript labels Jan 12, 2026

Olmo Maldonado (ibolmo) approved these changes Jan 13, 2026

Improve parameter flexibility

19fceba

This makes max_tokens configurable and makes both it and temperature fallback to model-provided defaults otherwise.

Stephen Belanger (Qard) force-pushed the parameter-flexibility branch from 89c7134 to 19fceba January 13, 2026 00:20

Stephen Belanger (Qard) requested a review from Olmo Maldonado (ibolmo) January 13, 2026 00:33

Olmo Maldonado (ibolmo) approved these changes Jan 13, 2026

Stephen Belanger (Qard) merged commit df6af22 into main Jan 13, 2026
7 checks passed

Stephen Belanger (Qard) deleted the parameter-flexibility branch January 13, 2026 17:04
