Show aggregated metrics in UI (Part 1) by dphuang2 · Pull Request #43 · eval-protocol/python-sdk

dphuang2 · 2025-08-09T07:28:33Z

No description provided.

use model_dump(mode="json")

Add run_id to EvalMetadataSchema for unique run identification
- Introduced run_id as an optional string to the EvalMetadataSchema to uniquely identify evaluation runs.
- Updated description to clarify the purpose of the run_id field.

Add run_id field to EvalMetadata for unique run identification
- Added run_id as an optional string to the EvalMetadata class to uniquely identify groups of evaluation rows.
- Updated the field description to clarify its purpose in relation to evaluation tests.

Fix evaluation result assignment in markdown highlighting test
- Updated the test_markdown_highlighting_evaluation function to assign the evaluation result directly to the row when no assistant response is found, ensuring proper handling of evaluation results.

Add run_id generation in evaluation_test for unique identification
- Integrated the generate_id function to create a run_id within the evaluation_test function.
- Passed the generated run_id to the evaluation function, ensuring unique identification of evaluation runs.
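The run_id changes described above can be pictured with a minimal sketch. It is not the SDK's code: `EvalMetadata` here is a plain dataclass stand-in with a hypothetical `name` field, `generate_id` is a hypothetical replacement for the SDK helper of the same name, and `evaluation_test` is reduced to the one step of stamping a shared run_id on every row.

```python
import uuid
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


def generate_id() -> str:
    """Hypothetical stand-in for the SDK's generate_id helper."""
    return uuid.uuid4().hex


@dataclass
class EvalMetadata:
    """Simplified stand-in; the real model is assumed to carry more fields."""
    name: str  # hypothetical field, used only for illustration
    run_id: Optional[str] = None  # optional, shared by all rows of one run


def evaluation_test(names: List[str]) -> List[Dict[str, Optional[str]]]:
    """Create one run_id per invocation and attach it to every row."""
    run_id = generate_id()
    return [asdict(EvalMetadata(name=n, run_id=run_id)) for n in names]
```

Rows produced by the same call share a run_id, while a second call gets a fresh one, which is what lets a UI group rows by run.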


- Introduced a new test case to validate the handling of multiple column fields in the computePivot function.
- Verified correct computation of cell values, row totals, column totals, and grand total for the pivot table with composite columns.
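As a rough illustration of what such a test checks, here is a small Python sketch of a pivot with composite column keys. The actual computePivot lives in the UI's TypeScript code and its signature is not shown in this PR; the function name `compute_pivot`, its parameters, and the choice of sum as the aggregation are all assumptions.

```python
from collections import defaultdict
from typing import Any, Dict, List


def compute_pivot(
    records: List[Dict[str, Any]],
    row_fields: List[str],
    column_fields: List[str],
    value_field: str,
) -> Dict[str, Any]:
    """Sum value_field into cells keyed by (row key, column key) tuples."""
    cells: Dict[tuple, float] = defaultdict(float)
    row_totals: Dict[tuple, float] = defaultdict(float)
    col_totals: Dict[tuple, float] = defaultdict(float)
    grand_total = 0.0
    for rec in records:
        # Several column fields combine into one composite column key.
        row_key = tuple(rec[f] for f in row_fields)
        col_key = tuple(rec[f] for f in column_fields)
        value = float(rec[value_field])
        cells[(row_key, col_key)] += value
        row_totals[row_key] += value
        col_totals[col_key] += value
        grand_total += value
    return {
        "cells": dict(cells),
        "row_totals": dict(row_totals),
        "col_totals": dict(col_totals),
        "grand_total": grand_total,
    }
```

With two column fields, each distinct pair of values becomes its own column, and the row totals, column totals, and grand total must all agree with the sum of the cells.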


Dylan Huang added 15 commits August 6, 2025 23:05

save

3c605d5

Merge branch 'main' into show-aggregated-metrics-in-ui

b9c88a1

Wrap logo image in a link to the Eval Protocol website for improved n…

39c5f88

…avigation.

TODO: test the pivot table logic

cb32a50

Refactor WebSocket log initialization message handling in logs_server.py

6666f5e

- Simplified the construction of the log initialization message by creating a data dictionary before sending it over the WebSocket, improving code readability.
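The refactor pattern described here, building the payload as a dictionary first and serializing it in one place, might look roughly like the sketch below. The message shape, the `"initialize_logs"` type string, and the `build_init_message` helper are assumptions for illustration; the real structure in logs_server.py is not shown in this PR.

```python
import json
from typing import Any, Dict, List


def build_init_message(logs: List[Dict[str, Any]]) -> str:
    """Assemble the payload as a dict, then serialize it once."""
    data = {
        "type": "initialize_logs",  # hypothetical message type
        "logs": logs,
    }
    return json.dumps(data)
```

The resulting string is what would be handed to the WebSocket's send call, so the send line stays short and the payload is easy to inspect or test on its own.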

flatten json test

dd90428

refine pivot.ts

83dcf47

Merge branch 'main' into show-aggregated-metrics-in-ui

ffcb08d

# Conflicts: # eval_protocol/pytest/evaluation_test.py

assertion error means finished

b28fa2b

Refactor EvalMetadata and EvaluationRow models; add cohort_id, rollou…

3ad780b

…t_id, and run_id fields. Update evaluation_test to handle new identifiers and improve documentation on evaluation concepts.

Add invocation_id field to EvaluationRow model and update correspondi…

c90f4c6

…ng schema in eval-protocol types. This enhances tracking of invocation context for evaluation rows.

rename as it's causing issues in pytest collection

87c3dcb

square up all the id madness and add a test

2ceaf72

dphuang2 changed the title ~~Show aggregated metrics in UI~~ Show aggregated metrics in UI (Part 1) Aug 10, 2025

dphuang2 merged commit e355931 into main Aug 10, 2025
7 checks passed

dphuang2 deleted the show-aggregated-metrics-in-ui branch August 10, 2025 22:11

dphuang2 merged 15 commits into main from show-aggregated-metrics-in-ui
