Chore report evals as experiments in langfuse by valdis · Pull Request #254 · eggai-tech/qualops

valdis · 2026-06-17T08:44:33Z

This PR does two unrelated but welcome things:

Langfuse SDK upgrade (langfuse@3 → @langfuse/client@5.4.1)

The v5 client ships a structured, namespaced resource API (`langfuse.api.datasetRunItems.create`, `langfuse.dataset.get`, `langfuse.score.create`, etc.) in place of v3's flat method surface. The migration is mechanical: every call site is updated to the new namespaced path, and the manual pagination loop over `datasetItemsList` is replaced by `langfuse.dataset.get()`, which returns the items directly.
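For context on what `langfuse.dataset.get()` replaces, a manual pagination loop looks roughly like the sketch below. It is a generic illustration, not the removed code: `fetchPage` and `fetchAllItems` are hypothetical names, and the `{ data, meta: { page, limit, totalItems, totalPages } }` page shape is an assumption modeled on Langfuse's paginated list responses.

```typescript
// Assumed page shape, modeled on Langfuse's paginated list responses.
// Illustrative only; the real v3 response types may differ.
interface Page<T> {
  data: T[];
  meta: { page: number; limit: number; totalItems: number; totalPages: number };
}

// Request pages 1..totalPages in order and concatenate their items.
// `fetchPage` is a hypothetical stand-in for the v3 list endpoint.
async function fetchAllItems<T>(
  fetchPage: (page: number, limit: number) => Promise<Page<T>>,
  limit = 50,
): Promise<T[]> {
  const items: T[] = [];
  let page = 1;
  let totalPages = 1;
  do {
    const res = await fetchPage(page, limit);
    items.push(...res.data);
    totalPages = res.meta.totalPages;
    page += 1;
  } while (page <= totalPages);
  return items;
}

// Usage with an in-memory fake: 120 items served 50 per page (3 requests).
const fakeItems = Array.from({ length: 120 }, (_, i) => ({ id: `item-${i}` }));
const fakeFetchPage = async (page: number, limit: number): Promise<Page<{ id: string }>> => ({
  data: fakeItems.slice((page - 1) * limit, page * limit),
  meta: { page, limit, totalItems: fakeItems.length, totalPages: Math.ceil(fakeItems.length / limit) },
});
```

With v5 this bookkeeping goes away: per this PR, `langfuse.dataset.get()` handles pagination itself and returns the items directly.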

The key functional addition: `datasetRunItems.create` now sends a `runDescription` (model · mode · preset) and a richer `metadata` payload, so each eval run is registered as a Langfuse experiment with enough context to tell runs apart in the UI without clicking into individual traces.

…ve switch

The `openai-compatible` case used a raw string literal instead of the enum constant, bypassing the `never` check in the default branch. Adding `OPENAI_COMPATIBLE` to `AIProviderType` and using it in the switch restores exhaustiveness: adding a new `AIProviderName` now causes a compile error if the factory switch is not updated.
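The exhaustiveness pattern this fix relies on can be sketched as follows. Only `AIProviderType` and `OPENAI_COMPATIBLE` come from the PR; the other enum members, `AIProvider`, `assertNever`, and `createProvider` are hypothetical. Because every case uses an enum constant, the compiler narrows the value to `never` in the `default` branch, so an enum member without a matching case turns the `assertNever` call into a type error.

```typescript
// Hypothetical provider enum; only OPENAI_COMPATIBLE is named in the PR.
enum AIProviderType {
  OPENAI = "openai",
  ANTHROPIC = "anthropic",
  OPENAI_COMPATIBLE = "openai-compatible",
}

// Hypothetical provider shape returned by the factory.
interface AIProvider {
  readonly kind: AIProviderType;
  readonly needsBaseUrl: boolean;
}

// Only accepts a value the compiler has narrowed to `never`; at runtime it
// also throws if an unexpected value slips through (e.g. via a cast).
function assertNever(value: never): never {
  throw new Error(`Unhandled provider type: ${String(value)}`);
}

// Hypothetical factory: all cases use enum constants, so `type` is `never`
// in the default branch until someone adds an enum member without a case.
function createProvider(type: AIProviderType): AIProvider {
  switch (type) {
    case AIProviderType.OPENAI:
      return { kind: type, needsBaseUrl: false };
    case AIProviderType.ANTHROPIC:
      return { kind: type, needsBaseUrl: false };
    case AIProviderType.OPENAI_COMPATIBLE:
      return { kind: type, needsBaseUrl: true };
    default:
      return assertNever(type);
  }
}
```

Adding, say, a fourth enum member without a new `case` makes `assertNever(type)` fail to compile, which is the guarantee the commit restores.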

Each dataset run item now carries `runDescription` and run-level metadata (`model`, `mode`, `provider`, `preset`, `configPath`), so Langfuse shows meaningful experiment context alongside each run rather than just the auto-generated name. The SDK upserts these fields on the dataset run object on every call, so the metadata stays consistent across all items in the same run.
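A minimal sketch of deriving the run-level fields purely from the run configuration, so every item in a run sends identical `runDescription` and `metadata` and the upsert stays stable. `EvalRunConfig`, `RunItemBody`, and `buildRunItemBody` are hypothetical, and the request body shape is an assumption; only `runDescription`, the "model · mode · preset" format, and the metadata keys come from this PR.

```typescript
// Hypothetical run configuration; field names mirror the metadata keys in the PR.
interface EvalRunConfig {
  runName: string;
  model: string;
  mode: string;
  provider: string;
  preset: string;
  configPath: string;
}

// Assumed request shape for one dataset run item; check the SDK types before relying on it.
interface RunItemBody {
  runName: string;
  runDescription: string;
  metadata: Record<string, string>;
  datasetItemId: string;
  traceId: string;
}

// Run-level fields depend only on `config`, so every item in the same run
// carries the same description and metadata; only the item/trace IDs vary.
function buildRunItemBody(config: EvalRunConfig, datasetItemId: string, traceId: string): RunItemBody {
  return {
    runName: config.runName,
    runDescription: `${config.model} · ${config.mode} · ${config.preset}`,
    metadata: {
      model: config.model,
      mode: config.mode,
      provider: config.provider,
      preset: config.preset,
      configPath: config.configPath,
    },
    datasetItemId,
    traceId,
  };
}
```

Keeping the builder free of per-item inputs for the run-level fields is what makes "consistent across all items" hold by construction rather than by convention.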

Replace `langfuse@3.38.20` with `@langfuse/client@5.4.1` across the eval pipeline. Key changes:

- `Langfuse` → `LangfuseClient` everywhere
- `langfuse.score()` → `langfuse.score.create()`
- `langfuse.shutdownAsync()` → `langfuse.shutdown()`
- `langfuse.api.datasetsGet()` + manual pagination → `langfuse.dataset.get()` (auto-paginated)
- `langfuse.api.datasetRunItemsCreate()` → `langfuse.api.datasetRunItems.create()`
- `langfuse.api.datasetsCreate()` → `langfuse.api.datasets.create()`
- `langfuse.api.datasetItemsCreate()` → `langfuse.api.datasetItems.create()`
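The renamed call sites can be pictured with a narrow structural type, as in the sketch below. The method paths follow the list above; `EvalClient`, `EvalResult`, `reportRun`, and all parameter and return shapes are simplified, hypothetical assumptions rather than the SDK's real type definitions. The helper links each traced item to the run, records its score, and calls `shutdown()` once at the end, even if a call throws.

```typescript
// Minimal structural view of the v5 calls used in the eval pipeline.
// Method paths follow the PR; parameter/return shapes are assumptions.
interface EvalClient {
  score: {
    create(body: { traceId: string; name: string; value: number }): void | Promise<void>;
  };
  api: {
    datasetRunItems: {
      create(body: { runName: string; datasetItemId: string; traceId: string }): Promise<unknown>;
    };
  };
  shutdown(): Promise<void>;
}

// Hypothetical per-item result produced by an eval run.
interface EvalResult {
  datasetItemId: string;
  traceId: string;
  scoreName: string;
  value: number;
}

// Hypothetical helper: register every item with the run, score it, and
// shut the client down exactly once, even if one of the calls throws.
async function reportRun(client: EvalClient, runName: string, results: EvalResult[]): Promise<void> {
  try {
    for (const r of results) {
      // v3: langfuse.api.datasetRunItemsCreate(...)
      await client.api.datasetRunItems.create({ runName, datasetItemId: r.datasetItemId, traceId: r.traceId });
      // v3: langfuse.score(...)
      await client.score.create({ traceId: r.traceId, name: r.scoreName, value: r.value });
    }
  } finally {
    // v3: langfuse.shutdownAsync()
    await client.shutdown();
  }
}
```

Typing the helper against a narrow interface instead of the concrete `LangfuseClient` class keeps it testable with a fake client; whether that suits the pipeline's existing structure is a judgment call.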

github-actions · 2026-06-17T08:50:32Z

QualOps Code Quality Analysis

Status: ✅ PASSED - No issues found

Summary

Total Issues: 0
Critical: 0 🔴
High: 0 🟠
Medium: 0 🟡
Low: 0 🟢
Files Analyzed: 8

No issues found in the analyzed code.

📊 Full Report

View detailed report

Powered by QualOps

valdis added 3 commits June 17, 2026 11:45

valdis force-pushed the chore-report-evals-as-experiments-in-langfuse branch from 15f4b79 to 0274b45 on June 17, 2026 08:46

sebastianwessel approved these changes Jun 18, 2026

valdis wants to merge 3 commits into main from chore-report-evals-as-experiments-in-langfuse
