
feat: add Flue framework harness adapter #77

Open
sergical wants to merge 6 commits into main from feat/harness-flue


@sergical (Member)

Summary

  • Add @vitest-evals/harness-flue — a harness adapter for the Flue agent harness framework
  • The adapter owns the Flue runtime lifecycle (createFlueContext → init() → session), captures tool calls and usage via the FlueEvent stream, filters out the internal result tools (finish/give_up), and normalizes everything into a HarnessRun
  • Add apps/demo-flue with a refund agent eval exercising structured output via valibot result: schemas, tool call assertions, and factuality judging
  • Add docs page for the Flue harness following the existing template
  • Update CLAUDE.md, README.md, and workspace configs
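Capturing usage from the event stream amounts to folding over usage-bearing events and summing their token counts. A minimal sketch, assuming a hypothetical `UsageEvent` shape and a hypothetical `aggregateUsage` helper; neither is taken from @flue/runtime or from this PR's code, and the real FlueEvent union may report usage differently:

```typescript
// Hypothetical event shape: the real FlueEvent union in @flue/runtime may name
// these fields differently or report usage on another event type.
interface UsageEvent {
  type: "usage";
  model: string;
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical normalized totals, loosely modeled on what a HarnessRun might carry.
interface UsageTotals {
  model?: string;
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

// Sum token counts across every turn and keep the first model name seen.
function aggregateUsage(events: readonly UsageEvent[]): UsageTotals {
  const totals: UsageTotals = { inputTokens: 0, outputTokens: 0, totalTokens: 0 };
  for (const event of events) {
    totals.inputTokens += event.inputTokens;
    totals.outputTokens += event.outputTokens;
    if (totals.model === undefined) {
      totals.model = event.model;
    }
  }
  totals.totalTokens = totals.inputTokens + totals.outputTokens;
  return totals;
}
```

Keeping the first model name is one arbitrary choice for "model extraction"; an adapter could instead record every distinct model if a session switches models mid-run.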

Test plan

  • 19 unit tests covering the event collector, usage aggregation, model extraction, and output parsing (packages/harness-flue/src/index.test.ts)
  • 2 e2e eval cases pass against the live Anthropic API (apps/demo-flue/evals/refund.eval.ts)
  • Full workspace build passes (all 7 packages + docs site at 126 pages)
  • Existing 298 tests unaffected
  • Docs site builds with new Flue harness page

🤖 Generated with Claude Code

Add `@vitest-evals/harness-flue` — a harness adapter for the Flue agent
harness framework. The adapter owns the Flue runtime lifecycle, captures
tool calls and usage from the event stream, filters internal result tools
(finish/give_up), and normalizes everything into HarnessRun.

Includes a demo app (apps/demo-flue) with a refund agent eval that
exercises structured output via valibot schemas, tool call assertions,
and factuality judging — mirroring the existing ai-sdk and openai-agents
demos.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

vercel Bot commented May 24, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: vitest-evals-docs · Deployment: Ready · Actions: Preview, Comment · Updated: May 24, 2026 1:11pm (UTC)


The docs:check CI step requires JSDoc on all exported symbols from
package entrypoints. Moved event collector, usage aggregation, and
output helpers into src/internals.ts so they're importable by tests
but not part of the public API surface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The path alias in tsconfig.base.json caused tsc to resolve
harness-flue source files despite the exclude, since path-resolved
files bypass the include/exclude globs. Removing the alias lets the
exclude work correctly. The demo-flue app has its own tsconfig with
bundler resolution and pnpm workspace linking handles the import.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread packages/harness-flue/src/internals.ts
…internal tools

The demo-flue app also needs bundler resolution for @flue/runtime
imports, so exclude it from the root typecheck alongside harness-flue.

Also filter internal tools (finish/give_up) at tool_start time so
their arguments aren't left in the pendingArgs map.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
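The tool_start filtering in the commit above can be sketched as a small collector that pairs tool_start and tool_end events by call id. Everything here is hypothetical: the event field names (`toolCallId`, `toolName`, `args`, `result`, `isError`) and the `ToolCallCollector` class are illustrative assumptions, not the actual internals.ts code or the real FlueEvent types.

```typescript
// Hypothetical event shapes: stand-ins for the tool events in the FlueEvent stream.
type ToolEvent =
  | { type: "tool_start"; toolCallId: string; toolName: string; args: unknown }
  | { type: "tool_end"; toolCallId: string; toolName: string; result: unknown; isError?: boolean };

interface ToolCallRecord {
  name: string;
  arguments: unknown;
  result: unknown;
  isError: boolean;
}

// Result tools used internally to end a run; they are not user-visible tool calls.
const INTERNAL_TOOLS: ReadonlySet<string> = new Set(["finish", "give_up"]);

class ToolCallCollector {
  // Arguments keyed by call id, held until the matching tool_end arrives.
  private readonly pendingArgs = new Map<string, unknown>();
  readonly toolCalls: ToolCallRecord[] = [];

  handle(event: ToolEvent): void {
    // Filter internal tools on both start and end, so their arguments are
    // never stored in pendingArgs and nothing is left behind after the run.
    if (INTERNAL_TOOLS.has(event.toolName)) {
      return;
    }
    if (event.type === "tool_start") {
      this.pendingArgs.set(event.toolCallId, event.args);
      return;
    }
    // tool_end: pair the result with the arguments captured at tool_start.
    const args = this.pendingArgs.get(event.toolCallId);
    this.pendingArgs.delete(event.toolCallId);
    this.toolCalls.push({
      name: event.toolName,
      arguments: args,
      result: event.result,
      isError: event.isError ?? false,
    });
  }

  // Number of tool calls that started but have not finished yet.
  get pendingCount(): number {
    return this.pendingArgs.size;
  }
}
```

Checking the tool name before branching on the event type is what keeps finish/give_up arguments out of the map entirely, rather than storing them and discarding them at tool_end.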
The path alias in tsconfig.base.json is needed by vite-tsconfig-paths
for test resolution, but it causes the root tsc to pull in harness-flue
source files that need bundler module resolution. Use a dedicated
tsconfig.typecheck.json that omits the flue alias for the root
typecheck while keeping the full alias set for Vite.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The adapter no longer imports from @flue/runtime/internal — it only
imports public types from @flue/runtime. The user owns the Flue
runtime lifecycle in their run function and wires the adapter's event
handler via ctx.subscribeEvent(). This removes the need for bundler
module resolution in the package, the just-bash dependency, and all
root tsconfig workarounds.

The demo app uses @flue/runtime/internal since it needs to construct
the FlueContext for testing. Only apps/demo-flue is excluded from the
root typecheck (same pattern as packages/docs).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
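The ownership split in this final commit (user code builds the context, the adapter only contributes an event handler) can be sketched with a fake context. `EventSubscribable`, `FakeContext`, `createEventRecorder`, and `run` are hypothetical names for illustration; the only detail taken from the PR is that the handler is wired through `ctx.subscribeEvent()`, and the callback signature assumed here may not match @flue/runtime.

```typescript
// Hypothetical stand-ins: the real FlueContext and FlueEvent types come from
// @flue/runtime, and their exact shapes are not shown in this PR.
interface SketchEvent {
  type: string;
  toolName?: string;
}

// Assumed shape: subscribeEvent registers a callback for every emitted event.
interface EventSubscribable {
  subscribeEvent(handler: (event: SketchEvent) => void): void;
}

// Adapter side: exposes a handler and the events it saw; it never
// constructs or initializes the runtime itself.
function createEventRecorder() {
  const events: SketchEvent[] = [];
  return {
    handler: (event: SketchEvent): void => {
      events.push(event);
    },
    events: events as readonly SketchEvent[],
  };
}

// Minimal fake context so the sketch runs without @flue/runtime.
class FakeContext implements EventSubscribable {
  private readonly handlers: Array<(event: SketchEvent) => void> = [];

  subscribeEvent(handler: (event: SketchEvent) => void): void {
    this.handlers.push(handler);
  }

  emit(event: SketchEvent): void {
    for (const handler of this.handlers) handler(event);
  }
}

// User side: the run function owns the context and wires in the adapter's handler.
function run(ctx: FakeContext, recorder: ReturnType<typeof createEventRecorder>): string {
  ctx.subscribeEvent(recorder.handler);
  // In a real eval, the Flue session would emit these while the agent works.
  ctx.emit({ type: "tool_start", toolName: "lookupOrder" });
  ctx.emit({ type: "tool_end", toolName: "lookupOrder" });
  return "refund approved";
}
```

Because the adapter only needs a callback, the package can depend on public @flue/runtime types alone, which is the reason the commit gives for dropping the @flue/runtime/internal import from the package.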