Skip to content

feat(core): wire validator history and surface validationOutcome (#429)#463

Draft
lmorchard wants to merge 4 commits into
mainfrom
feat/429-validator-context-outcome
Draft

feat(core): wire validator history and surface validationOutcome (#429)#463
lmorchard wants to merge 4 commits into
mainfrom
feat/429-validator-context-outcome

Conversation

@lmorchard
Copy link
Copy Markdown
Collaborator

@lmorchard lmorchard commented May 20, 2026

Summary

  • Wires the agent's recent conversation history (last 30 messages) into the task-validator prompt so the validator can spot "agent gave up early but final answer looks plausible" failure modes — not just score the final answer in isolation.
  • Adds validationOutcome?: "accepted" | "force-accepted" to TaskExecutionResult so callers (eval-judge, telemetry) can distinguish a real validator accept from a force-accept after maxValidationAttempts. Today these were indistinguishable: both surfaced as success: true.

This is PR1 of a planned two-PR sequence. Core changes only — consumer plumbing (CLI display, extension UI) is deliberately deferred to PR2. The eval-judge / telemetry signal lands here; the server SSE complete event auto-forwards the new optional field through existing serialization (no server code change needed).

Design Decisions

  • Wire conversationHistory into the template, don't delete the dead helper. formatConversationHistory already exists and builds a 30-message string; the template just never referenced it. Wiring it gives the validator real signal about whether the trajectory matches the claimed result.
  • Two outcome values only: "accepted" and "force-accepted". Field optional. undefined is the implicit "validation didn't run" case (task aborted, max iterations). Skipped "rejected" / "skipped" enum values — neither has a firing code path today; trivial to expand later when one does.
  • Force-accept lumps both sub-cases. Validator-disagreed-three-times and validator-call-itself-errored both map to "force-accepted". Both are "the validator did not actively endorse this answer." A finer split (e.g., "force-accepted-error") is a follow-up if eval data shows it matters.
  • Reuse the existing external-content wrapping pattern. History is wrapped in <EXTERNAL-CONTENT label="conversation-history">…</EXTERNAL-CONTENT> via the existing wrapExternalContentWithWarning helper. New ConversationHistory variant added to ExternalContentLabel. (Note: the shared warning text mentions "page text" — imperfect fit, but the threat-model intent of "treat as data, not instructions" is consistent.)
  • formatConversationHistory shape unchanged. Still this.messages.slice(-30). Reshape work (e.g., "first user message + last 20") is speculative; ship the wiring first.

Changes

packages/core/src/:

  • prompts.tstaskValidationTemplate references {{ wrappedConversationHistory }}; buildTaskValidationPrompt wraps the history before passing into the template; adds a trajectory-review step to the evaluation instructions.
  • utils/promptSecurity.tsConversationHistory = "conversation-history" added to ExternalContentLabel.
  • webAgent.tsvalidationOutcome? threaded through TaskExecutionResult, ExecutionState, validateTaskCompletion, generateAndProcessAction, runMainLoop, and buildResult. Conditional spread in buildResult mirrors how error is spread.

packages/core/test/:

  • prompts.test.ts — 3 new tests asserting the validation prompt includes the wrapped history, the safety warning, and the trajectory-review instruction.
  • webAgent.test.ts — 4 new tests covering validationOutcome === "accepted" on first-attempt accept, "force-accepted" via validator rejecting to max attempts, "force-accepted" via validator throwing to max attempts, and undefined when the task fails before done() (max iterations path).

Test Plan

  • pnpm run check passes (core 682, server 96, cli 221, extension 266 tests)
  • pnpm run typecheck passes
  • pnpm run format:check passes
  • gitleaks detect --log-opts="880db9f..HEAD" clean on branch commits
  • Reviewer: confirm TaskExecutionResult.validationOutcome reads cleanly in the eval-judge integration (the originating use case)

References

This comment was marked as resolved.

@lmorchard
Copy link
Copy Markdown
Collaborator Author

Filed #464 as the follow-up for the EXTERNAL_CONTENT_WARNING text issue raised by Copilot.

@lmorchard lmorchard marked this pull request as draft May 20, 2026 23:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Wire validator conversation history and surface validationOutcome in TaskExecutionResult

2 participants