docs: document score result shape by Rul1an · Pull Request #189 · braintrustdata/autoevals

Rul1an (Rul1an) · 2026-05-18T18:15:56Z

Summary

Adds a README section documenting the small Score result object that scorers return.

The new section calls out the public fields consumers should read when storing, comparing, or exporting evaluation results: name, score, metadata, and the deprecated error field. It also clarifies that inputs, expected values, prompts, and runtime context are kept separately from the returned Score.
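As a rough illustration of the shape described above, the sketch below models the four public fields as a Python dataclass. It is a hypothetical stand-in written for this example, not the autoevals source: the class name `ScoreSketch`, the field types, the defaults, and the `to_record` helper are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ScoreSketch:
    """Hypothetical stand-in for the Score result; field types are assumed."""

    name: str
    score: Optional[float] = None
    metadata: Dict[str, Any] = field(default_factory=dict)
    error: Optional[Exception] = None  # deprecated field, listed only for completeness


def to_record(result: ScoreSketch) -> Dict[str, Any]:
    """Copy the non-deprecated public fields into a plain dict for storage or export."""
    return {
        "name": result.name,
        "score": result.score,
        "metadata": dict(result.metadata),
    }


# Illustrative values only; they are not taken from a real evaluation run.
example = ScoreSketch(name="NumericDiff", score=0.5, metadata={"note": "illustrative"})
record = to_record(example)
```

A consumer written this way reads only `name`, `score`, and `metadata`, so it keeps working if the deprecated `error` field is eventually removed.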

Context

This follows the public-surface clarification from #187, where maintainers confirmed the returned Score object is the right minimal scorer-consumer boundary.

Validation

git diff --check

Copilot

Pull request overview

Adds README documentation defining the public Score result object returned by scorers, clarifying which fields external consumers should use and what context is intentionally kept outside the Score.

Changes:

Documented the Score result shape (name, score, metadata, and deprecated error) in the README.
Clarified that inputs/expected values/prompts/runtime context are not included in Score and should be stored separately.
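One way a consumer might keep inputs and expected values beside the returned result, rather than inside it, is sketched below. The `EvalRow` wrapper, its field names, and `export_row` are hypothetical, and the score is represented as a plain dict carrying only the documented field names.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class EvalRow:
    """Hypothetical consumer-side row: context lives beside the score, not in it."""

    input: Any
    expected: Any
    score: Dict[str, Any]  # holds only name, score, and metadata


def export_row(row: EvalRow) -> str:
    """Serialize one row to JSON with stable key order for diffing or storage."""
    return json.dumps(asdict(row), sort_keys=True)


# Illustrative values only.
row = EvalRow(
    input="2 + 2",
    expected="4",
    score={"name": "NumericDiff", "score": 1.0, "metadata": {}},
)
exported = export_row(row)
```

Keeping the context in the wrapper means the score payload stays the same size regardless of how large the prompts or inputs are.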

github-actions · 2026-06-08T19:07:40Z

Braintrust eval report

Autoevals (main-1780945663)

| Score | Average | Improvements | Regressions |
| --- | --- | --- | --- |
| NumericDiff | 78.9% (+0pp) | 8 🟢 | 9 🔴 |
| Time_to_first_token | 8.95tok (-1.61tok) | 91 🟢 | 28 🔴 |
| Llm_calls | 1.09 (+0) | - | - |
| Tool_calls | 0 (+0) | - | - |
| Errors | 0 (+0) | - | - |
| Llm_errors | 0 (+0) | - | - |
| Tool_errors | 0 (+0) | - | - |
| Prompt_tokens | 317.7tok (+0tok) | - | - |
| Prompt_cached_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_5m_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_1h_tokens | 0tok (+0tok) | - | - |
| Completion_tokens | 249.22tok (+0.82tok) | 50 🟢 | 54 🔴 |
| Completion_reasoning_tokens | 0tok (+0tok) | - | - |
| Total_tokens | 566.92tok (+0.82tok) | 50 🟢 | 54 🔴 |
| Estimated_cost | 0$ (+0$) | 48 🟢 | 52 🔴 |
| Duration | 9.44s (-1.2s) | 153 🟢 | 66 🔴 |
| Llm_duration | 10.55s (-1.43s) | 87 🟢 | 32 🔴 |

docs: document score result shape

9bfb902

Copilot AI review requested due to automatic review settings May 18, 2026 18:15

Copilot started reviewing on behalf of Rul1an (Rul1an) May 18, 2026 18:16

Copilot AI reviewed May 18, 2026

Comment thread README.md Outdated

Rul1an (Rul1an) added 2 commits May 18, 2026 20:23

docs: clarify score error compatibility

997ac32

docs: clarify score metadata scope

fe6d143

Abhijeet Prasad (AbhiPrasad) approved these changes Jun 8, 2026

Abhijeet Prasad (AbhiPrasad) merged commit 7dcc2ed into braintrustdata:main Jun 8, 2026
3 of 8 checks passed

Abhijeet Prasad (AbhiPrasad) merged 3 commits into braintrustdata:main from Rul1an:codex/document-score-result-surface
