Log per-sample details as trackio.Trace in push_to_wandb#1217
Open
abidlabs wants to merge 1 commit intohuggingface:mainfrom
Open
Log per-sample details as trackio.Trace in push_to_wandb#1217abidlabs wants to merge 1 commit intohuggingface:mainfrom
abidlabs wants to merge 1 commit intohuggingface:mainfrom
Conversation
The existing Trackio path (`import trackio as wandb`) already let users
opt in via WANDB_PROJECT, but two real gaps remained:
1. push_to_wandb received `details_datasets` and never used it — every
per-sample detail was dropped on the floor for both wandb and
Trackio users, so dashboards only ever showed aggregated metrics.
2. trackio.Trace was unused, even though lighteval's per-sample shape
(Doc.query + ModelResponse.text + metric dict + gold_index/choices)
maps cleanly onto a conversational trace.
This change:
- Tracks whether the active backend is Trackio (`self._using_trackio`)
and stashes the imported module on the tracker.
- Adds `_log_details_as_traces` which iterates each task's sample
dataset and emits one trackio.Trace per row, with system instruction
/ user prompt / assistant completion as messages and metric values,
task name, and gold answer as metadata. Logged under
`{task_name}/sample` so samples are grouped on the dashboard.
- Wires it from push_to_wandb behind the trackio check, so the wandb
code path is unchanged.
Per-sample details still flow to the Hub via push_to_hub as before.
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Lighteval already has a Trackio path (
import trackio as wandbinevaluation_tracker.py), but two gaps remained that made it less useful than it could be:push_to_wandbacceptsdetails_datasetsbut never uses it (line 297) — only aggregated results are logged. Per-sample data is dropped on the floor for both wandb and Trackio users.trackio.Traceis unused, even though lighteval's per-sample shape (Doc.query+ModelResponse.text+metricdict +gold_index/choices) maps cleanly onto a conversational trace.This PR closes both: when Trackio is the active backend, each per-sample detail is emitted as a
trackio.Trace(messages=..., metadata=...), so individual eval samples are inspectable on the dashboard alongside the aggregated metrics.The wandb code path is unchanged.
Trace shape
For each row in a task's
details_datasets:Logged under
{task_name}/sampleso samples group by task on the dashboard.Changes
src/lighteval/logging/evaluation_tracker.py:self._using_trackio) and stash the imported module on the tracker._log_details_as_tracesand call it frompush_to_wandbbehind the trackio check.