fix(llm): adaptive context budget for outline phase to avoid timeouts #336
Open
therealbrad wants to merge 1 commit into
The fixed 1500-token budget set in PR #335 traded one problem for another: small folders worked, but large ones still risked timing out, and the budget math was charging for full case content (steps + field values) while the outline prompt only renders names. With the full-content estimator, 1500 tokens fit ~7-10 cases, far fewer than the LLM could actually consume.

Two changes:

1. fetchHierarchyContext gains a `mode: "names" | "full"` parameter. In "names" mode it skips loading steps and field values from the DB and bills the budget per name length only, so 1500 tokens fit ~100 case titles instead of ~7-10 cases. Default stays "full" so the single-shot and stream generators don't change behavior.

2. The outline route runs the LLM with an adaptive context budget, modeled on the duplicate-detection split-and-retry pattern:
   - Start at 1500 tokens for a fresh integration.
   - On a clean success, grow the learned budget by 1.5x for the next call, capped at 8000 (~600 titles).
   - On a timeout, halve the budget and retry the same request, up to depth 3. Remember the smaller working size so the next call starts from there (then grows back up over successive successful calls).
   - If the halve-chain bottoms out below 100 tokens, the final attempt runs with no existing-cases context, matching pre-PR-#335 behavior, so the call always returns something.

State is in-memory only and is lost on restart. Test case generation has never persisted across restarts.

Extracted budget helpers to outline/adaptive-budget.ts with 10 unit tests covering growth, cap, never-below-initial, per-integration isolation, convergence sequence, and isTimeoutError classification.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
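A minimal sketch of the per-integration state the commit message describes, assuming a module-level `Map` keyed by integration id. `getStartingBudget` is the name used in the tests; `recordOutcome`, its `clean` flag, rounding with `Math.ceil`, and falling back to the initial budget when the learned value is zero are assumptions for illustration, not the contents of `outline/adaptive-budget.ts`.

```typescript
// Hypothetical sketch of in-memory adaptive budget state (not the PR's code).
// Constants mirror the numbers in the commit message.
const INITIAL_BUDGET = 1500; // tokens for a fresh integration
const MAX_BUDGET = 8000;     // growth cap
const GROWTH_FACTOR = 1.5;   // applied after a clean success

// Learned budget per integration id. In-memory only, so it is lost on restart.
const learnedBudgets = new Map<string, number>();

function getStartingBudget(integrationId: string): number {
  const learned = learnedBudgets.get(integrationId);
  // Unknown integration, or a learned value of 0 (assumed here to mean the last
  // call fell back to no context), starts again from the initial budget.
  return learned !== undefined && learned > 0 ? learned : INITIAL_BUDGET;
}

// Record the budget a call finally succeeded with. `clean` means it succeeded
// on the first attempt, without any halving.
function recordOutcome(integrationId: string, workingBudget: number, clean: boolean): void {
  const next = clean
    ? Math.min(MAX_BUDGET, Math.ceil(workingBudget * GROWTH_FACTOR))
    : workingBudget; // after a timeout, remember the smaller size that worked
  learnedBudgets.set(integrationId, next);
}

// Usage: six clean calls on one integration reproduce the convergence example.
const seen: number[] = [];
for (let call = 0; call < 6; call++) {
  const budget = getStartingBudget("fast-integration");
  seen.push(budget);
  recordOutcome("fast-integration", budget, true);
}
seen.push(getStartingBudget("fast-integration"));
console.log(seen.join(" → ")); // 1500 → 2250 → 3375 → 5063 → 7595 → 8000 → 8000
```

Because the map is keyed by integration id, a slow integration that has shrunk its budget does not affect a fast one, which keeps its own growth curve.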
Description
Follow-up to PR #335. The fixed 1500-token outline context budget shipped in #335 traded one problem for another:
- Large folders could still time out.
- `fetchHierarchyContext` bills the budget per case including all step and field-value text, but the outline prompt only renders titles. Net effect: 1500 tokens fit only ~7–10 cases instead of the ~100 titles the LLM could actually consume.

This PR makes the outline phase handle both, and patterns it after the existing duplicate-detection / auto-tag split-and-retry logic so the codebase keeps one shape for this kind of thing.
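To make the billing mismatch concrete, here is a hypothetical estimator that charges either the title alone (`"names"`) or the title plus step and field-value text (`"full"`). The `CaseSummary` shape, the ~4-characters-per-token heuristic, and the greedy cut-off are assumptions; the real `fetchHierarchyContext` also avoids loading steps and field values from the DB in `"names"` mode, which this sketch does not model.

```typescript
// Hypothetical sketch: billing a context budget per case in "full" vs "names" mode.
type ContextMode = "names" | "full";

interface CaseSummary {
  name: string;
  steps?: string[];       // only needed in "full" mode
  fieldValues?: string[]; // only needed in "full" mode
}

const CHARS_PER_TOKEN = 4; // rough heuristic, an assumption

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

function caseCost(c: CaseSummary, mode: ContextMode): number {
  if (mode === "names") return estimateTokens(c.name); // title only
  const parts = [c.name, ...(c.steps ?? []), ...(c.fieldValues ?? [])];
  return parts.reduce((sum, p) => sum + estimateTokens(p), 0);
}

// Greedily include cases until the next one would exceed the budget.
function selectWithinBudget(cases: CaseSummary[], budget: number, mode: ContextMode): CaseSummary[] {
  const selected: CaseSummary[] = [];
  let used = 0;
  for (const c of cases) {
    const cost = caseCost(c, mode);
    if (used + cost > budget) break;
    selected.push(c);
    used += cost;
  }
  return selected;
}

// Usage: 200 cases, each with a 60-char title and four 200-char steps.
const cases: CaseSummary[] = Array.from({ length: 200 }, () => ({
  name: "t".repeat(60), // 15 tokens under the heuristic
  steps: Array.from({ length: 4 }, () => "s".repeat(200)), // 4 × 50 tokens
}));
console.log(selectWithinBudget(cases, 1500, "names").length); // 100
console.log(selectWithinBudget(cases, 1500, "full").length);  // 6
```

With these made-up sizes the same 1500-token budget admits 100 titles but only 6 full cases, the same order of gap the description reports.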
Two changes:
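1. `fetchHierarchyContext` gains a `mode: "names" | "full"` parameter.
   - `"names"` mode skips loading steps + field values from the DB and bills the budget per name length only: ~100 titles in 1500 tokens instead of ~7–10 cases.
   - Default stays `"full"` so the single-shot and stream generators don't change behavior.
2. Outline route runs the LLM with an adaptive context budget, modeled on `duplicate-analysis.service.ts:149-224`:
   - Start at 1500 tokens for a fresh integration.
   - On a clean success, grow the learned budget by 1.5x for the next call, capped at 8000 (~600 titles).
   - On a timeout, halve the budget and retry the same request, up to depth 3, and remember the smaller working size so the next call starts from there.
   - If the halve-chain bottoms out below 100 tokens, the final attempt runs with no existing-cases context, matching pre-#335 behavior.

In-memory state only, lost on restart. Test case generation has never persisted across restarts.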
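A sketch of the halve-and-retry loop as this description reads: up to four attempts (depth 0 through 3), halving after each timeout, then one final attempt with no existing-cases context, which matches the stated worst case of 4 timeouts before the no-context fallback. `runWithAdaptiveBudget`, the convention that a budget of `0` means "omit existing cases", and the matching rules inside this `isTimeoutError` are assumptions; `duplicate-analysis.service.ts` may structure this differently.

```typescript
// Hypothetical sketch of the outline route's split-and-retry loop (not the PR's code).
const MAX_DEPTH = 3;    // halvings allowed after the first attempt
const MIN_BUDGET = 100; // below this, skip straight to the no-context attempt

// Accepts an LlmError-like { code: "TIMEOUT" } or an Error whose message mentions a timeout.
function isTimeoutError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { code?: unknown; message?: unknown };
  if (e.code === "TIMEOUT") return true;
  return typeof e.message === "string" && /timeout/i.test(e.message);
}

interface BudgetedResult<T> {
  result: T;
  budgetUsed: number; // 0 means the call ran without existing-cases context
  clean: boolean;     // true if the first attempt succeeded
}

async function runWithAdaptiveBudget<T>(
  startBudget: number,
  call: (budget: number) => Promise<T>,
): Promise<BudgetedResult<T>> {
  let budget = startBudget;
  for (let depth = 0; depth <= MAX_DEPTH && budget >= MIN_BUDGET; depth++) {
    try {
      const result = await call(budget);
      return { result, budgetUsed: budget, clean: depth === 0 };
    } catch (err) {
      if (!isTimeoutError(err)) throw err; // only timeouts trigger a retry
      budget = Math.floor(budget / 2);
    }
  }
  // Chain exhausted or budget too small: last attempt with no existing-cases context.
  // If this attempt also fails, the error propagates to the caller.
  const result = await call(0);
  return { result, budgetUsed: 0, clean: false };
}
```

In this reading, the caller passes in `getStartingBudget(integrationId)` and records `budgetUsed` and `clean` afterwards, so a success after halving remembers the smaller size while a clean success grows it.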
Convergence example (fast integration, 6 successful calls in a session):

1500 → 2250 → 3375 → 5063 → 7595 → 8000 (cap) → 8000

Worst case (doomed call on a very slow integration): 4 × timeout in total before falling back to no context.

Related Issue
N/A — direct follow-up to a customer report addressed in PR #335.
Type of Change
Testing
- `outline/adaptive-budget.test.ts`:
  - `getStartingBudget`: initial, growth, cap, never-below-initial, learned-zero recovery, per-integration isolation, and the full convergence sequence (1500 → 2250 → 3375 → 5063 → 7595 → 8000)
  - `isTimeoutError` recognises both the `LlmError { code: "TIMEOUT" }` shape and plain `Error("Request timeout")` messages

Checklist