feat: add Flue framework harness adapter #77
Open
sergical wants to merge 6 commits into
Add `@vitest-evals/harness-flue`, a harness adapter for the Flue agent harness framework. The adapter owns the Flue runtime lifecycle, captures tool calls and usage from the event stream, filters internal result tools (`finish`/`give_up`), and normalizes everything into `HarnessRun`. Includes a demo app (`apps/demo-flue`) with a refund agent eval that exercises structured output via valibot schemas, tool call assertions, and factuality judging, mirroring the existing ai-sdk and openai-agents demos.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
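A minimal sketch of the normalization step described in this commit, under stated assumptions: `ToolCallRecord`, `UsageEvent`, `NormalizedRun`, `aggregateUsage`, `normalizeRun`, and the `lookupOrder` tool are hypothetical names for illustration, not the real `FlueEvent` or `HarnessRun` types from `@flue/runtime` or vitest-evals. It only shows the idea: internal result tools (`finish`/`give_up`) are dropped and per-event token usage is summed into one record.

```typescript
// Hypothetical shapes for illustration only; the real FlueEvent and
// HarnessRun types in @flue/runtime and vitest-evals may differ.
interface ToolCallRecord {
  name: string;
  arguments: unknown;
  result?: unknown;
}

interface UsageEvent {
  inputTokens: number;
  outputTokens: number;
}

interface NormalizedRun {
  output: unknown;
  toolCalls: ToolCallRecord[];
  usage: { inputTokens: number; outputTokens: number; totalTokens: number };
}

// Internal result tools the agent calls to end a run; not user-facing tools.
const INTERNAL_TOOLS = new Set(["finish", "give_up"]);

// Sum token counts across every usage event emitted during the run.
function aggregateUsage(events: UsageEvent[]): NormalizedRun["usage"] {
  let inputTokens = 0;
  let outputTokens = 0;
  for (const e of events) {
    inputTokens += e.inputTokens;
    outputTokens += e.outputTokens;
  }
  return { inputTokens, outputTokens, totalTokens: inputTokens + outputTokens };
}

// Drop internal result tools and package output, tool calls, and usage together.
function normalizeRun(
  output: unknown,
  toolCalls: ToolCallRecord[],
  usageEvents: UsageEvent[],
): NormalizedRun {
  return {
    output,
    toolCalls: toolCalls.filter((call) => !INTERNAL_TOOLS.has(call.name)),
    usage: aggregateUsage(usageEvents),
  };
}

// Usage with made-up data: the "finish" call is filtered out.
const run = normalizeRun(
  { approved: true },
  [
    { name: "lookupOrder", arguments: { orderId: "A1" } },
    { name: "finish", arguments: { approved: true } },
  ],
  [
    { inputTokens: 100, outputTokens: 20 },
    { inputTokens: 50, outputTokens: 10 },
  ],
);
```

Keeping the filter in the normalization step means assertions on tool calls only ever see the agent's real tools, not the mechanism it uses to return a result.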
The `docs:check` CI step requires JSDoc on all symbols exported from package entrypoints. Moved the event collector, usage aggregation, and output helpers into `src/internals.ts` so tests can import them without making them part of the public API surface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The path alias in `tsconfig.base.json` caused `tsc` to resolve harness-flue source files despite the `exclude`, because files reached through a path alias are pulled in by import resolution and bypass the `include`/`exclude` globs. Removing the alias lets the exclude work correctly. The demo-flue app has its own tsconfig with bundler resolution, and pnpm workspace linking handles the import.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…internal tools

The demo-flue app also needs bundler resolution for `@flue/runtime` imports, so exclude it from the root typecheck alongside harness-flue. Also filter internal tools (`finish`/`give_up`) at `tool_start` time so their arguments aren't left in the `pendingArgs` map.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
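The `pendingArgs` fix in this commit can be sketched as a small collector. The event field names (`toolCallId`, `toolName`, `args`, `result`), the `ToolCallCollector` class, and the `lookupOrder` tool are hypothetical stand-ins, not the adapter's actual code. The point is that skipping `finish`/`give_up` at `tool_start` means their arguments never enter the map, so nothing is left behind once the run ends.

```typescript
// Hypothetical event shapes standing in for the real FlueEvent union.
type ToolEvent =
  | { type: "tool_start"; toolCallId: string; toolName: string; args: unknown }
  | { type: "tool_end"; toolCallId: string; result: unknown };

interface CapturedCall {
  name: string;
  arguments: unknown;
  result: unknown;
}

const INTERNAL_TOOLS = new Set(["finish", "give_up"]);

class ToolCallCollector {
  // Arguments seen at tool_start, waiting for their matching tool_end.
  private pendingArgs = new Map<string, { name: string; args: unknown }>();
  readonly calls: CapturedCall[] = [];

  handle(event: ToolEvent): void {
    if (event.type === "tool_start") {
      // Filter internal tools here so their args never enter pendingArgs.
      if (INTERNAL_TOOLS.has(event.toolName)) return;
      this.pendingArgs.set(event.toolCallId, { name: event.toolName, args: event.args });
      return;
    }
    // tool_end: only calls recorded at tool_start have an entry.
    const pending = this.pendingArgs.get(event.toolCallId);
    if (!pending) return; // internal tool or unknown id
    this.pendingArgs.delete(event.toolCallId);
    this.calls.push({ name: pending.name, arguments: pending.args, result: event.result });
  }

  // Number of tool calls started but not yet finished.
  get pendingCount(): number {
    return this.pendingArgs.size;
  }
}

// Usage with made-up events: one user tool, then the internal "finish" tool.
const collector = new ToolCallCollector();
collector.handle({ type: "tool_start", toolCallId: "1", toolName: "lookupOrder", args: { orderId: "A1" } });
collector.handle({ type: "tool_end", toolCallId: "1", result: { status: "delivered" } });
collector.handle({ type: "tool_start", toolCallId: "2", toolName: "finish", args: { approved: true } });
collector.handle({ type: "tool_end", toolCallId: "2", result: null });
```

After these four events only the `lookupOrder` call is recorded and the map is empty.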
The path alias in `tsconfig.base.json` is needed by vite-tsconfig-paths for test resolution, but it causes the root `tsc` to pull in harness-flue source files that need bundler module resolution. Use a dedicated `tsconfig.typecheck.json` that omits the flue alias for the root typecheck while keeping the full alias set for Vite.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The adapter no longer imports from `@flue/runtime/internal`; it only imports public types from `@flue/runtime`. The user owns the Flue runtime lifecycle in their run function and wires the adapter's event handler via `ctx.subscribeEvent()`. This removes the need for bundler module resolution in the package, the just-bash dependency, and all root tsconfig workarounds. The demo app uses `@flue/runtime/internal` because it needs to construct the `FlueContext` for testing. Only `apps/demo-flue` is excluded from the root typecheck (same pattern as `packages/docs`).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
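A sketch of the ownership split this commit describes, where the user's run function holds the context and wires the adapter's handler in. `FakeContext`, `AgentEvent`, `AgentEventHandler`, and `runAgent` are hypothetical; in particular, that `subscribeEvent()` returns an unsubscribe function is an assumption for illustration, not confirmed Flue API. The run is kept synchronous for brevity, whereas a real agent run would be async.

```typescript
// Hypothetical stand-ins: the real FlueContext, its event type, and the
// exact subscribeEvent() signature in @flue/runtime may differ.
type AgentEvent = { type: string };
type AgentEventHandler = (event: AgentEvent) => void;

class FakeContext {
  private handlers = new Set<AgentEventHandler>();

  // Assumed contract: register a handler and get back an unsubscribe function.
  subscribeEvent(handler: AgentEventHandler): () => void {
    this.handlers.add(handler);
    return () => {
      this.handlers.delete(handler);
    };
  }

  // Deliver an event to every currently registered handler.
  emit(event: AgentEvent): void {
    this.handlers.forEach((handler) => handler(event));
  }
}

// The user's run function owns the context and decides when the adapter
// listens. Synchronous for brevity; a real agent run would be async.
function runAgent(ctx: FakeContext, onEvent: AgentEventHandler): string {
  const unsubscribe = ctx.subscribeEvent(onEvent);
  try {
    // Stand-in for the agent doing work and emitting events.
    ctx.emit({ type: "tool_start" });
    ctx.emit({ type: "tool_end" });
    return "done";
  } finally {
    unsubscribe(); // stop capturing once this run finishes
  }
}

// Usage: collect event types the way an adapter handler might.
const seen: string[] = [];
const ctx = new FakeContext();
const output = runAgent(ctx, (event) => {
  seen.push(event.type);
});
ctx.emit({ type: "after_run" }); // not captured: the handler was removed
```

With this split the adapter only needs a handler function, so it can depend on public types alone while the caller decides how the context is built.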
Summary
- Add `@vitest-evals/harness-flue`: a harness adapter for the Flue agent harness framework
- Owns the Flue runtime lifecycle (`createFlueContext` → `init()` → session), captures tool calls and usage via the `FlueEvent` stream, filters internal result tools (`finish`/`give_up`), and normalizes into `HarnessRun`
- Includes `apps/demo-flue` with a refund agent eval exercising structured output via valibot `result:` schemas, tool call assertions, and factuality judging

Test plan
- … (`packages/harness-flue/src/index.test.ts`)
- … (`apps/demo-flue/evals/refund.eval.ts`)

🤖 Generated with Claude Code