
Log per-sample details as trackio.Trace in push_to_wandb#1217

Open
abidlabs wants to merge 1 commit into huggingface:main from abidlabs:add-trackio-traces

Conversation

@abidlabs
Member

@abidlabs abidlabs commented Apr 27, 2026

Summary

Lighteval already has a Trackio path (import trackio as wandb in evaluation_tracker.py), but two gaps remained that made it less useful than it could be:

  1. push_to_wandb accepts details_datasets but never uses it (line 297) — only aggregated results are logged. Per-sample data is dropped on the floor for both wandb and Trackio users.
  2. trackio.Trace is unused, even though lighteval's per-sample shape (Doc.query + ModelResponse.text + metric dict + gold_index/choices) maps cleanly onto a conversational trace.

This PR closes both: when Trackio is the active backend, each per-sample detail is emitted as a trackio.Trace(messages=..., metadata=...), so individual eval samples are inspectable on the dashboard alongside the aggregated metrics.

The wandb code path is unchanged.


Trace shape

For each row in a task's details_datasets:

trackio.Trace(
    messages=[
        # only if doc.instruction is set:
        {"role": "system",    "content": doc.instruction},
        {"role": "user",      "content": model_response.input or doc.query},
        {"role": "assistant", "content": model_response.text[0]},
    ],
    metadata={
        "task": task_name,
        **metric,                # all per-sample metric values
        "gold_index": doc.gold_index,
        "gold": doc.choices[doc.gold_index],   # when resolvable
    },
)

Logged under {task_name}/sample so samples group by task on the dashboard.
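The per-row mapping above can be sketched as a small pure-Python helper. This is illustrative only: the function name and signature are mine, not lighteval's actual internals, but the message/metadata shape matches the trace shown above.

```python
def build_trace_inputs(instruction, query, response_text, task_name,
                       metric, gold_index=None, choices=None):
    """Assemble the messages/metadata payload for one per-sample trace.

    Hypothetical helper: names are illustrative, not lighteval's API.
    """
    messages = []
    if instruction:  # system turn only when the doc carries an instruction
        messages.append({"role": "system", "content": instruction})
    messages.append({"role": "user", "content": query})
    messages.append({"role": "assistant", "content": response_text})

    metadata = {"task": task_name, **metric}  # all per-sample metric values
    if gold_index is not None:
        metadata["gold_index"] = gold_index
        if choices is not None and 0 <= gold_index < len(choices):
            metadata["gold"] = choices[gold_index]  # only when resolvable
    return messages, metadata
```

The two return values would feed `trackio.Trace(messages=..., metadata=...)` directly.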

Changes

  • src/lighteval/logging/evaluation_tracker.py:
    • Track whether the active backend is Trackio (self._using_trackio) and stash the imported module on the tracker.
    • Add _log_details_as_traces and call it from push_to_wandb behind the trackio check.
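The backend check described above boils down to a preferred-import probe: use trackio when it is installed, otherwise fall back to wandb. A minimal, generic sketch (the helper name `first_available` is mine, not the PR's):

```python
import importlib.util


def first_available(*module_names):
    """Return the first installed module name from the list, else None."""
    for name in module_names:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


# Sketch of how the tracker might use it (illustrative, not the PR's code):
#   backend = first_available("trackio", "wandb")
#   self._using_trackio = backend == "trackio"
```

Probing with `find_spec` avoids actually importing the module until it is needed.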

The existing Trackio path (`import trackio as wandb`) already let users
opt in via WANDB_PROJECT, but two real gaps remained:

1. push_to_wandb received `details_datasets` and never used it — every
   per-sample detail was dropped on the floor for both wandb and
   Trackio users, so dashboards only ever showed aggregated metrics.

2. trackio.Trace was unused, even though lighteval's per-sample shape
   (Doc.query + ModelResponse.text + metric dict + gold_index/choices)
   maps cleanly onto a conversational trace.

This change:
- Tracks whether the active backend is Trackio (`self._using_trackio`)
  and stashes the imported module on the tracker.
- Adds `_log_details_as_traces` which iterates each task's sample
  dataset and emits one trackio.Trace per row, with system instruction
  / user prompt / assistant completion as messages and metric values,
  task name, and gold answer as metadata. Logged under
  `{task_name}/sample` so samples are grouped on the dashboard.
- Wires it from push_to_wandb behind the trackio check, so the wandb
  code path is unchanged.

Per-sample details still flow to the Hub via push_to_hub as before.
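The iteration in `_log_details_as_traces` can be sketched as a plain loop over per-task rows. This is a simplification under assumed inputs (rows with `messages` and `metadata` already assembled); the real method would build `trackio.Trace` objects and log them under the `{task_name}/sample` key:

```python
def traces_by_task(details_datasets):
    """Group per-sample trace payloads under a '{task}/sample' key.

    details_datasets: mapping of task name -> iterable of row dicts with
    'messages' and 'metadata' fields (illustrative shape, not lighteval's).
    """
    logged = {}
    for task_name, rows in details_datasets.items():
        key = f"{task_name}/sample"  # groups samples by task on the dashboard
        logged[key] = [
            {"messages": row["messages"], "metadata": row["metadata"]}
            for row in rows
        ]
    return logged
```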
@bot-ci-comment

A documentation preview is built for this PR; all documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@abidlabs abidlabs marked this pull request as ready for review April 27, 2026 17:38