fix(llm): pass existing folder cases into outline prompt to avoid duplicates #335
Merged
…licates

When a user generated test cases for the same story a second time (e.g. first run focused on critical paths, second run on edge cases via the notes suggestions), the AI would produce overlapping titles because the outline phase had no awareness of cases that already existed.

The single-shot generator (route.ts) has always fetched folder-hierarchy context and passed it into the prompt with an explicit "do not duplicate" instruction. The newer two-phase outline -> expand flow dropped that context. buildOutlineUserPrompt only saw the issue plus the free-form user notes, so a second run on the same story produced near-duplicates.

Restore parity:

- buildOutlineUserPrompt renders an "EXISTING TEST CASES" block when the context carries cases. Each entry is the title plus up to 200 chars of description (the outline only needs enough signal to recognise overlap; full step detail is unnecessary).
- outline/route.ts now mirrors the single-shot route's token-budget math and calls fetchHierarchyContext before assembling the user prompt, then rebuilds the prompt with the enriched context.
- Four new unit tests for buildOutlineUserPrompt covering the no-existing-cases path, the rendered block, 200-char truncation, and no-description entries.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
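The first-pass rendering rule (title plus at most 200 characters of description, no block when there are no cases) can be sketched as follows. This is a minimal illustration, not the actual buildOutlineUserPrompt: the `ExistingCase` shape, the `renderExistingCasesBlock` name and the `- title: description` entry format are assumptions.

```typescript
// Hypothetical shape; the real hierarchy-context type is not shown in this PR.
interface ExistingCase {
  title: string;
  description?: string | null;
}

const MAX_DESCRIPTION_CHARS = 200;

// First-pass rendering: title plus at most 200 chars of description per case.
// Returns "" when there are no existing cases, so the prompt is unchanged.
function renderExistingCasesBlock(cases: ExistingCase[]): string {
  if (cases.length === 0) return "";
  const entries = cases.map((c) => {
    const description = c.description?.trim();
    // No-description entries render as the title alone.
    if (!description) return `- ${c.title}`;
    return `- ${c.title}: ${description.slice(0, MAX_DESCRIPTION_CHARS)}`;
  });
  return ["EXISTING TEST CASES", ...entries].join("\n");
}
```

Truncating rather than dropping descriptions was the first attempt at keeping enough overlap signal while bounding prompt size.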
The first pass at this PR copied the single-shot generator's 65%-of-request budget. That budget makes sense for the single-shot path (which generates entire cases in one round-trip), but it's too generous for the outline phase: at the default of 4096 tokens per request, the outline call could end up carrying ~2500 tokens of folder context plus all the steps and field values for every fetched case. A customer hit a 60s Anthropic timeout on every regeneration against a folder that already had cases.

The outline LLM only needs to know WHICH titles already exist to avoid emitting overlapping ones. Descriptions and steps add no useful dedup signal at the title stage.

- Outline route: hardcoded 1500-token context budget instead of scaling with maxTokensPerRequest. Fits roughly 100-200 case names without bloating the prompt or pushing the request past the configured integration timeout (default 30s).
- buildOutlineUserPrompt now renders titles only, no descriptions. Heading renamed to "EXISTING TEST CASE TITLES" to reflect the shape, and a comment explains why descriptions are omitted.
- Updated existing-cases tests: replaced the 200-char-truncation and description-rendering tests with a single "descriptions and steps never appear" assertion that locks the contract.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
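The titles-only contract can be sketched in a few lines. This is a hypothetical illustration, not the PR's code: `ExistingCase` and `renderExistingTitlesBlock` are assumed names, and the token budget (applied in the route, not in the prompt builder) is deliberately left out.

```typescript
// Hypothetical record shape: descriptions and steps exist on the data,
// but this renderer never reads them.
interface ExistingCase {
  title: string;
  description?: string | null;
  steps?: string[];
}

// Titles-only rendering: one "- <title>" line per case under the renamed heading.
// Returns "" when there are no cases, so the prompt is unchanged.
function renderExistingTitlesBlock(cases: ExistingCase[]): string {
  if (cases.length === 0) return "";
  return ["EXISTING TEST CASE TITLES", ...cases.map((c) => `- ${c.title}`)].join("\n");
}
```

Because the renderer only ever touches `title`, a "descriptions and steps never appear" assertion holds structurally instead of depending on a truncation limit.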
🎉 This PR is included in version 0.29.9 🎉

The release is available on GitHub release.

Your semantic-release bot 📦🚀
therealbrad added a commit that referenced this pull request on May 23, 2026
…#336)

The fixed 1500-token budget set in PR #335 traded one problem for another: small folders worked but large ones still risked timing out, and the budget math was charging for full case content (steps + field values) while the outline prompt only renders names. With the full-content estimator, 1500 tokens fits ~7-10 cases, far fewer than the LLM could actually consume.

Two changes:

1. fetchHierarchyContext gains a `mode: "names" | "full"` parameter. In "names" mode it skips loading steps and field values from the DB and bills the budget per name length only, so 1500 tokens fits ~100 case titles instead of ~7-10. Default stays "full" so the single-shot and stream generators don't change behavior.
2. The outline route runs the LLM with an adaptive context budget, modeled on the duplicate-detection split-and-retry pattern:
   - Start at 1500 tokens for a fresh integration.
   - On a clean success, grow the learned budget by 1.5x for the next call, capped at 8000 (~600 titles).
   - On a timeout, halve the budget and retry the same request, up to depth 3. Remember the smaller working size so the next call starts from there (then grows back up over successive successful calls).
   - If the halve-chain bottoms out below 100 tokens, the final attempt runs with no existing-cases context, matching pre-PR-#335 behavior, so the call always returns something.

State is in-memory only, lost on restart. Test case generation has never persisted across restarts.

Extracted budget helpers to outline/adaptive-budget.ts with 10 unit tests covering growth, cap, never-below-initial, per-integration isolation, convergence sequence, and isTimeoutError classification.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
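The adaptive policy in change 2 can be sketched as a small retry wrapper. This is a minimal sketch under stated assumptions, not the contents of outline/adaptive-budget.ts: the function names, the `Map` keyed by integration ID, the `isTimeoutError` heuristic, treating a budget of 0 as "no existing-cases context", and flooring the remembered size at 100 after a no-context success are all hypothetical. The constants (1500 initial, 1.5x growth, 8000 cap, depth 3, 100-token floor) come from the commit message.

```typescript
// Constants taken from the #336 commit message.
const INITIAL_BUDGET = 1500; // tokens for a fresh integration
const MAX_BUDGET = 8000; // growth cap (~600 titles)
const GROWTH_FACTOR = 1.5; // applied after a clean, first-try success
const MIN_BUDGET = 100; // a halved budget below this drops existing-cases context
const MAX_DEPTH = 3; // maximum number of halve-and-retry steps

// Hypothetical in-memory state keyed by integration ID; lost on restart.
const learnedBudget = new Map<string, number>();

function startingBudget(integrationId: string): number {
  return learnedBudget.get(integrationId) ?? INITIAL_BUDGET;
}

// Assumed heuristic: the PR does not say how isTimeoutError classifies errors.
function isTimeoutError(err: unknown): boolean {
  if (!(err instanceof Error)) return false;
  return (
    err.name === "AbortError" ||
    err.name === "TimeoutError" ||
    /time(d)? ?out/i.test(err.message)
  );
}

// `call` builds and sends the outline prompt with at most `budget` tokens of
// existing-case names; a budget of 0 means "omit the existing-cases block".
async function callWithAdaptiveBudget<T>(
  integrationId: string,
  call: (budget: number) => Promise<T>,
): Promise<T> {
  let budget = startingBudget(integrationId);
  for (let depth = 0; ; depth++) {
    try {
      const result = await call(budget);
      const next =
        depth === 0
          ? Math.min(MAX_BUDGET, Math.round(budget * GROWTH_FACTOR)) // clean success: grow
          : Math.max(budget, MIN_BUDGET); // after retries: remember the smaller working size
      learnedBudget.set(integrationId, next);
      return result;
    } catch (err) {
      // Rethrow non-timeouts, timeouts past the depth limit, and no-context timeouts.
      if (!isTimeoutError(err) || depth >= MAX_DEPTH || budget === 0) throw err;
      const halved = Math.floor(budget / 2);
      budget = halved < MIN_BUDGET ? 0 : halved;
    }
  }
}
```

Keeping the learned budget per integration means one slow provider shrinking its budget does not penalize another; because the map lives in process memory, a restart resets every integration to 1500.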
Description
When a user generated test cases for the same story a second time (for example, first run focused on critical paths, second run on edge cases via the suggestion chips), the AI would produce overlapping titles because the outline phase had no awareness of cases that already existed.
The single-shot generator (`route.ts`) has always fetched folder-hierarchy context and passed it into the prompt with an explicit "do not duplicate" instruction. The newer two-phase outline → expand flow dropped that context: `buildOutlineUserPrompt` only saw the issue plus the free-form user notes, so a second run on the same story produced near-duplicates.

This restores parity. No UI, behavior, or API surface changes: same inputs, same flow; the outline LLM now simply sees the same coverage context the single-shot generator always did.
Changes:
- `buildOutlineUserPrompt` renders an `EXISTING TEST CASES — DO NOT DUPLICATE OR SUBSTANTIALLY OVERLAP` block when the context carries cases. Each entry is the title plus up to 200 chars of description (the outline only needs enough signal to recognise overlap; full step detail is unnecessary).
- `outline/route.ts` now mirrors the single-shot route's token-budget math and calls `fetchHierarchyContext` before assembling the user prompt, then rebuilds the prompt with the enriched context.
- Four new unit tests for `buildOutlineUserPrompt` covering the no-existing-cases path, the rendered block, 200-char truncation, and no-description entries.

Related Issue
N/A — customer support report.
Type of Change
Testing
`buildOutlineUserPrompt`.

Checklist