
fix(llm): pass existing folder cases into outline prompt to avoid duplicates #335

Merged: therealbrad merged 2 commits into main from fix/outline-existing-cases-context on May 22, 2026
@therealbrad (Contributor)

Description

When a user generated test cases for the same story a second time (for example, first run focused on critical paths, second run on edge cases via the suggestion chips), the AI would produce overlapping titles because the outline phase had no awareness of cases that already existed.

The single-shot generator (route.ts) has always fetched folder-hierarchy context and passed it into the prompt with an explicit "do not duplicate" instruction. The newer two-phase outline → expand flow dropped that context — buildOutlineUserPrompt only saw the issue plus the free-form user notes, so a second run on the same story produced near-duplicates.

This restores parity. No UI, behavior, or API surface changes — same inputs, same flow; the outline LLM just now sees the same coverage context the single-shot generator always did.

Changes:

  • buildOutlineUserPrompt renders an EXISTING TEST CASES — DO NOT DUPLICATE OR SUBSTANTIALLY OVERLAP block when the context carries cases. Each entry is the title plus up to 200 chars of description (the outline only needs enough signal to recognise overlap; full step detail is unnecessary).
  • outline/route.ts now mirrors the single-shot route's token-budget math and calls fetchHierarchyContext before assembling the user prompt, then re-builds the prompt with the enriched context.
  • Four new unit tests for buildOutlineUserPrompt covering the no-existing-cases path, the rendered block, 200-char truncation, and no-description entries.
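The rendering rule in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `ExistingCase` shape, the function name `renderExistingCasesBlock`, the bullet format, and the `...` truncation marker are all assumptions. Only the heading text and the 200-character limit come from the description above.

```typescript
// Hypothetical shape of one existing case; the real context type is not shown in the PR.
interface ExistingCase {
  name: string;
  description?: string;
}

// Limit taken from the PR description: title plus up to 200 chars of description.
const MAX_DESCRIPTION_CHARS = 200;

// Heading text from the PR description (\u2014 is the dash used in that heading).
const HEADING = "EXISTING TEST CASES \u2014 DO NOT DUPLICATE OR SUBSTANTIALLY OVERLAP";

// Returns an empty string when there are no cases, so the caller can skip the block.
function renderExistingCasesBlock(cases: ExistingCase[]): string {
  if (cases.length === 0) return "";
  const lines = cases.map((c) => {
    const description = (c.description || "").trim();
    // Entries without a description render as the title alone.
    if (description === "") return `- ${c.name}`;
    // Assumed truncation style: hard cut plus an ellipsis marker.
    const clipped =
      description.length > MAX_DESCRIPTION_CHARS
        ? description.slice(0, MAX_DESCRIPTION_CHARS) + "..."
        : description;
    return `- ${c.name}: ${clipped}`;
  });
  return [HEADING, ...lines].join("\n");
}
```

The four cases named in the last bullet (no existing cases, rendered block, truncation, no-description entries) map directly onto the branches of this sketch.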

Related Issue

N/A — customer support report.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)

Testing

  • 4 new unit tests for buildOutlineUserPrompt.
  • Full unit suite: 7420 passed / 0 failed.

Checklist

  • My code follows the project style guidelines
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective
  • New and existing unit tests pass locally with my changes

therealbrad and others added 2 commits May 22, 2026 14:33
…licates

When a user generated test cases for the same story a second time (e.g.
first run focused on critical paths, second run on edge cases via the
notes suggestions), the AI would produce overlapping titles because the
outline phase had no awareness of cases that already existed.

The single-shot generator (route.ts) has always fetched folder-hierarchy
context and passed it into the prompt with an explicit "do not duplicate"
instruction. The newer two-phase outline -> expand flow dropped that
context. buildOutlineUserPrompt only saw the issue plus the free-form
user notes, so a second run on the same story produced near-duplicates.

Restore parity:
- buildOutlineUserPrompt renders an "EXISTING TEST CASES" block when
  the context carries cases. Each entry is the title plus up to 200
  chars of description (the outline only needs enough signal to
  recognise overlap; full step detail is unnecessary).
- outline/route.ts now mirrors the single-shot route's token-budget
  math and calls fetchHierarchyContext before assembling the user
  prompt, then re-builds the prompt with the enriched context.
- Four new unit tests for buildOutlineUserPrompt covering the
  no-existing-cases path, the rendered block, 200-char truncation, and
  no-description entries.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The first pass at this PR copied the single-shot generator's 65%-of-
request budget. That budget makes sense for the single-shot path
(which generates entire cases in one round-trip), but it's too
generous for the outline phase: at default 4096 tokens-per-request,
the outline call could end up carrying ~2500 tokens of folder
context plus all the steps and field values for every fetched case.

A customer hit a 60s Anthropic timeout on every regeneration against
a folder that already had cases. The outline LLM only needs to know
WHICH titles already exist to avoid emitting overlapping ones —
descriptions and steps add no useful dedup signal at the title
stage.

- Outline route: hardcoded 1500-token context budget instead of
  scaling with maxTokensPerRequest. Fits roughly 100-200 case
  names without bloating the prompt or pushing the request past
  the configured integration timeout (default 30s).
- buildOutlineUserPrompt now renders titles only, no descriptions.
  Heading renamed to "EXISTING TEST CASE TITLES" to reflect the
  shape, and a comment explains why descriptions are omitted.
- Updated existing-cases tests: replaced the 200-char-truncation
  and description-rendering tests with a single "descriptions and
  steps never appear" assertion that locks the contract.
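A titles-only renderer along the lines described above might look like the sketch below. The `ExistingCase` shape and the function name `renderExistingTitlesBlock` are hypothetical; only the heading text and the "no descriptions, no steps" contract come from the commit message.

```typescript
// Hypothetical input shape: cases may carry more data than the outline prompt needs.
interface ExistingCase {
  name: string;
  description?: string;
  steps?: string[];
}

function renderExistingTitlesBlock(cases: ExistingCase[]): string {
  // Keep only non-empty titles. Descriptions and steps are deliberately ignored:
  // per the commit message they add no useful dedup signal at the title stage.
  const titles = cases
    .map((c) => c.name.trim())
    .filter((title) => title.length > 0);
  if (titles.length === 0) return "";
  return ["EXISTING TEST CASE TITLES", ...titles.map((t) => `- ${t}`)].join("\n");
}
```

A test that locks the contract would pass in cases with populated `description` and `steps` fields and assert that neither string appears anywhere in the output.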

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@therealbrad therealbrad merged commit 4b0c3a3 into main May 22, 2026
5 checks passed
@therealbrad therealbrad deleted the fix/outline-existing-cases-context branch May 22, 2026 20:59
@therealbrad (Contributor, Author)

🎉 This PR is included in version 0.29.9 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

therealbrad added a commit that referenced this pull request May 23, 2026
…#336)

The fixed 1500-token budget set in PR #335 traded one problem for
another: small folders worked but large ones still risked timing out,
and the budget math was charging for full case content (steps + field
values) while the outline prompt only renders names. With the full-
content estimator, 1500 tokens fits ~7-10 cases — far fewer than the
LLM could actually consume.
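The gap between the two ways of charging the budget can be illustrated with a small estimator. Everything here is an assumption made for illustration: the `CaseRecord` shape, the four-characters-per-token heuristic, and the sample case sizes. The `"names" | "full"` type mirrors the parameter named in the commit message, but this is not the real `fetchHierarchyContext`.

```typescript
type Mode = "names" | "full";

// Hypothetical record shape for a stored test case.
interface CaseRecord {
  name: string;
  steps: string[];
  fieldValues: Record<string, string>;
}

// Assumption: a rough 4-characters-per-token heuristic.
const CHARS_PER_TOKEN = 4;

function estimateCaseTokens(c: CaseRecord, mode: Mode): number {
  let chars = c.name.length;
  if (mode === "full") {
    // Full mode also charges for steps and field values.
    for (const step of c.steps) chars += step.length;
    for (const key of Object.keys(c.fieldValues)) {
      chars += key.length + c.fieldValues[key].length;
    }
  }
  return Math.ceil(chars / CHARS_PER_TOKEN);
}

// Counts how many cases fit, in order, before the budget is exhausted.
function countCasesThatFit(cases: CaseRecord[], budgetTokens: number, mode: Mode): number {
  let used = 0;
  let count = 0;
  for (const c of cases) {
    const cost = estimateCaseTokens(c, mode);
    if (used + cost > budgetTokens) break;
    used += cost;
    count++;
  }
  return count;
}

// Sample data (made up): 200 cases with a 60-char title, five 140-char steps, one field.
const filler = (n: number): string => new Array(n + 1).join("x");
const sampleCases: CaseRecord[] = [];
for (let i = 0; i < 200; i++) {
  sampleCases.push({
    name: filler(60),
    steps: [filler(140), filler(140), filler(140), filler(140), filler(140)],
    fieldValues: { priority: "high" },
  });
}

const fitFull = countCasesThatFit(sampleCases, 1500, "full"); // 7 with this sample
const fitNames = countCasesThatFit(sampleCases, 1500, "names"); // 100 with this sample
```

With this made-up sample the same 1500-token budget admits 7 cases when full content is billed and 100 when only names are billed, which is the order-of-magnitude difference the commit message describes.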

Two changes:

1. fetchHierarchyContext gains a `mode: "names" | "full"` parameter.
   In "names" mode it skips loading steps and field values from the
   DB and bills the budget per name length only — so 1500 tokens
   fits ~100 case titles instead of ~7-10. Default stays "full" so
   the single-shot and stream generators don't change behavior.

2. The outline route runs the LLM with an adaptive context budget,
   modeled on the duplicate-detection split-and-retry pattern:
   - Start at 1500 tokens for a fresh integration.
   - On a clean success, grow the learned budget by 1.5x for the
     next call, capped at 8000 (~600 titles).
   - On a timeout, halve the budget and retry the same request,
     up to depth 3. Remember the smaller working size so the next
     call starts from there (then grows back up over successive
     successful calls).
   - If the halve-chain bottoms out below 100 tokens, the final
     attempt runs with no existing-cases context — matching
     pre-PR-#335 behavior, so the call always returns something.
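The adaptive budget rules in item 2 can be sketched as a few pure helpers. The constants (1500, 1.5x, 8000, depth 3, 100) come from the commit message; the function names, the use of `Math.floor`, the per-integration keying, and the convention that a budget of `0` means "no existing-cases context" are assumptions, not the contents of outline/adaptive-budget.ts.

```typescript
const INITIAL_BUDGET = 1500; // starting budget for a fresh integration
const MAX_BUDGET = 8000; // growth cap
const GROWTH_FACTOR = 1.5; // applied after a clean success
const MAX_RETRY_DEPTH = 3; // maximum number of halvings per request
const MIN_USEFUL_BUDGET = 100; // below this, drop the context entirely

// In-memory only, keyed by a hypothetical integration id; lost on restart.
const learnedBudgets: Record<string, number> = {};

function getBudget(integrationId: string): number {
  return Object.prototype.hasOwnProperty.call(learnedBudgets, integrationId)
    ? learnedBudgets[integrationId]
    : INITIAL_BUDGET;
}

// After a clean success, grow the budget for the next call, capped at MAX_BUDGET.
function recordSuccess(integrationId: string, usedBudget: number): void {
  learnedBudgets[integrationId] = Math.min(MAX_BUDGET, Math.floor(usedBudget * GROWTH_FACTOR));
}

// After a retry succeeded at a smaller size, remember that working size.
function recordReducedBudget(integrationId: string, workingBudget: number): void {
  learnedBudgets[integrationId] = workingBudget;
}

// Budgets to try for one request: the starting budget, then one halving per timeout.
// A 0 entry stands for "run with no existing-cases context" and ends the chain.
function budgetsToTry(start: number): number[] {
  const budgets = [start];
  let current = start;
  for (let depth = 1; depth <= MAX_RETRY_DEPTH; depth++) {
    current = Math.floor(current / 2);
    if (current < MIN_USEFUL_BUDGET) {
      budgets.push(0);
      break;
    }
    budgets.push(current);
  }
  return budgets;
}
```

Under these assumptions a fresh integration would try 1500, 750, 375, then 187 tokens on successive timeouts, while an integration that had learned a budget of 300 would try 300, 150, and then fall back to no context. A caller would walk this list, retrying only when the failure is classified as a timeout.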

State is in-memory only and is lost on restart. This is consistent with
the rest of test case generation, which has never persisted state across
restarts.

Extracted budget helpers to outline/adaptive-budget.ts with 10 unit
tests covering growth, cap, never-below-initial, per-integration
isolation, convergence sequence, and isTimeoutError classification.
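The commit message does not show how `isTimeoutError` classifies failures, so the following is only a guess at one reasonable approach. The specific signals checked are assumptions: `TimeoutError` is the error name produced by `AbortSignal.timeout()`, `ETIMEDOUT` is the Node.js system error code for a timed-out operation, and the message pattern is a loose fallback.

```typescript
// Illustrative classifier; the real isTimeoutError may use different signals.
function isTimeoutError(err: unknown): boolean {
  // Non-object values (strings, numbers, undefined) are never treated as timeouts here.
  if (typeof err !== "object" || err === null) return false;
  const e = err as { name?: unknown; code?: unknown; message?: unknown };
  if (e.name === "TimeoutError") return true;
  if (e.code === "ETIMEDOUT") return true;
  // Fallback: match "timeout", "timed out", or "time out" in the message.
  return typeof e.message === "string" && /timed?\s?out/i.test(e.message);
}
```

Keeping the classifier narrow matters for the retry loop: halving the context budget only helps when the failure was a timeout, so errors such as authentication or validation failures should not trigger a retry.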

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>