
docs: document score result shape #189

Merged

Abhijeet Prasad (AbhiPrasad) merged 3 commits into braintrustdata:main from Rul1an:codex/document-score-result-surface on Jun 8, 2026



@Rul1an (Contributor)

Summary

Documents, in the README, the minimal Score result object that scorers return.

The new section calls out the public fields that consumers should read when storing, comparing, or exporting evaluation results: name, score, and metadata, plus the deprecated error field. It also clarifies that inputs, expected values, prompts, and runtime context are not part of the returned Score and are kept separately from it.
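A minimal Python sketch of the consumer-side boundary described above, assuming a Score that carries only the four documented fields. The `Score` dataclass here is a hypothetical stand-in, not the library's own class, and `EvalRecord`, `export_record`, and `compare` are illustrative helpers rather than autoevals APIs. The consumer stores input, expected value, and output in its own record next to the Score, and reads only name, score, and metadata.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Score:
    """Hypothetical stand-in mirroring the documented fields (not the library's class)."""
    name: str
    score: Optional[float]
    metadata: Dict[str, Any] = field(default_factory=dict)
    error: Optional[Exception] = None  # deprecated; intentionally never read below


@dataclass
class EvalRecord:
    """Hypothetical consumer record: context lives next to the Score, not inside it."""
    input: Any
    expected: Any
    output: Any
    result: Score


def export_record(record: EvalRecord) -> str:
    # Export only the public Score fields (name, score, metadata) plus the
    # consumer's own context; the deprecated error field is skipped.
    return json.dumps(
        {
            "input": record.input,
            "expected": record.expected,
            "output": record.output,
            "scorer": record.result.name,
            "score": record.result.score,
            "metadata": record.result.metadata,
        },
        sort_keys=True,
    )


def compare(a: Score, b: Score) -> Optional[float]:
    """Return b.score - a.score for the same scorer, or None if either score is missing."""
    if a.name != b.name:
        raise ValueError("scores come from different scorers")
    if a.score is None or b.score is None:
        return None
    return b.score - a.score


# Usage: compare a baseline and a candidate result from the same scorer.
baseline = Score(name="ExactMatch", score=0.5)
candidate = Score(name="ExactMatch", score=1.0, metadata={"rationale": "identical"})
record = EvalRecord(input="2+2", expected="4", output="4", result=candidate)
print(export_record(record))
print(compare(baseline, candidate))
```

Keeping the context in a separate record means the Score stays small and stable, so storage and comparison code does not need to change when prompts or runtime details do.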

Context

This follows the public-surface clarification in #187, where maintainers confirmed that the returned Score object is the right minimal boundary between scorers and their consumers.

Validation

  • git diff --check

Copilot AI review requested due to automatic review settings May 18, 2026 18:15

Copilot AI left a comment


Pull request overview

Adds README documentation defining the public Score result object returned by scorers, clarifying which fields external consumers should use and what context is intentionally kept outside the Score.

Changes:

  • Documented the Score result shape (name, score, metadata, and deprecated error) in the README.
  • Clarified that inputs/expected values/prompts/runtime context are not included in Score and should be stored separately.


Review comment thread on README.md (outdated)
Abhijeet Prasad (@AbhiPrasad) merged commit 7dcc2ed into braintrustdata:main on Jun 8, 2026
3 of 8 checks passed
github-actions (bot) commented on Jun 8, 2026


Braintrust eval report

Autoevals (main-1780945663)

| Score | Average | Improvements | Regressions |
|---|---|---|---|
| NumericDiff | 78.9% (+0pp) | 8 🟢 | 9 🔴 |
| Time_to_first_token | 8.95tok (-1.61tok) | 91 🟢 | 28 🔴 |
| Llm_calls | 1.09 (+0) | - | - |
| Tool_calls | 0 (+0) | - | - |
| Errors | 0 (+0) | - | - |
| Llm_errors | 0 (+0) | - | - |
| Tool_errors | 0 (+0) | - | - |
| Prompt_tokens | 317.7tok (+0tok) | - | - |
| Prompt_cached_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_5m_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_1h_tokens | 0tok (+0tok) | - | - |
| Completion_tokens | 249.22tok (+0.82tok) | 50 🟢 | 54 🔴 |
| Completion_reasoning_tokens | 0tok (+0tok) | - | - |
| Total_tokens | 566.92tok (+0.82tok) | 50 🟢 | 54 🔴 |
| Estimated_cost | 0$ (+0$) | 48 🟢 | 52 🔴 |
| Duration | 9.44s (-1.2s) | 153 🟢 | 66 🔴 |
| Llm_duration | 10.55s (-1.43s) | 87 🟢 | 32 🔴 |
