fix(llm): adaptive context budget for outline phase to avoid timeouts #336

Open
therealbrad wants to merge 1 commit into main from fix/outline-adaptive-context-budget

Description

Follow-up to PR #335. The fixed 1500-token outline context budget shipped in #335 traded one problem for another:

  1. Budget math was over-charging. fetchHierarchyContext bills the budget per case including all steps and field-value text. The outline prompt only renders titles. Net effect: 1500 tokens fit only ~7–10 cases instead of the ~100 titles the LLM could actually consume.
  2. Large folders still risked timing out. A fixed budget can't adapt to a slow integration or a model that's having a bad minute.

This PR makes the outline phase tolerant of both, and patterns it after the existing duplicate-detection / auto-tag split-and-retry logic so the codebase keeps one shape for this kind of problem.

Two changes:

  1. fetchHierarchyContext gains a mode: "names" | "full" parameter.

    • "names" mode skips loading steps + field values from the DB and bills the budget per name length only. ~100 titles in 1500 tokens instead of ~7–10 cases.
    • Default stays "full" so the single-shot and stream generators don't change behavior.
  2. Outline route runs the LLM with an adaptive context budget. Modeled on duplicate-analysis.service.ts:149-224:

    • Start at 1500 tokens for a fresh integration.
    • On a clean success → grow the learned budget by 1.5× for next call, capped at 8000 (~600 titles).
    • On a timeout → halve the budget and retry the same request, up to depth 3. Save the smaller working size so the next call starts there.
    • If the halve-chain bottoms out below 100 tokens → final attempt with no existing-cases context. Matches pre-#335 behavior, so the call always returns something.
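
The difference between the two billing modes in change 1 can be sketched as follows. This is a minimal illustration, not the actual `fetchHierarchyContext` code: the `ExistingCase` shape, `costOf`, `selectWithinBudget`, and the 4-characters-per-token heuristic are all assumptions made for the example.

```typescript
// Hypothetical shapes; the real fetchHierarchyContext signature is not shown in this PR.
type ContextMode = "names" | "full";

interface ExistingCase {
  name: string;
  steps: string[];       // billed only in "full" mode
  fieldValues: string[]; // billed only in "full" mode
}

// Assumption: a rough 4-characters-per-token heuristic, not the project's estimator.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// "names" mode charges for the title alone; "full" mode charges for everything.
function costOf(c: ExistingCase, mode: ContextMode): number {
  if (mode === "names") return estimateTokens(c.name);
  return estimateTokens([c.name, ...c.steps, ...c.fieldValues].join("\n"));
}

// Greedily take cases in order until the next one would exceed the budget.
function selectWithinBudget(
  cases: ExistingCase[],
  budget: number,
  mode: ContextMode = "full", // default mirrors the PR: existing callers keep "full"
): ExistingCase[] {
  const picked: ExistingCase[] = [];
  let used = 0;
  for (const c of cases) {
    const cost = costOf(c, mode);
    if (used + cost > budget) break;
    picked.push(c);
    used += cost;
  }
  return picked;
}
```

Under this heuristic, a 60-character title costs 15 tokens, so a 1500-token budget holds 100 titles in "names" mode, while the same cases billed with a few steps and field values each fit only a handful.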

State is in-memory only and is lost on restart. Test case generation has never persisted state across restarts.

Convergence example (fast integration, 6 successful calls in a session): 1500 → 2250 → 3375 → 5063 → 7595 → 8000 (cap) → 8000
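
The growth rule behind that sequence can be sketched like this. `getStartingBudget` is the helper named in the tests; `recordSuccess`, the constant names, and the `Map`-based store are hypothetical, and rounding to the nearest integer is an assumption chosen because it reproduces the numbers above.

```typescript
// Values from the PR description; constant names are hypothetical.
const INITIAL_BUDGET = 1500;
const MAX_BUDGET = 8000;
const GROWTH_FACTOR = 1.5;

// In-memory only: learned budgets keyed by integration id, lost on restart.
const learnedBudgets = new Map<string, number>();

// Falls back to the initial budget when nothing usable has been learned yet.
function getStartingBudget(integrationId: string): number {
  const learned = learnedBudgets.get(integrationId);
  return learned !== undefined && learned > 0 ? learned : INITIAL_BUDGET;
}

// Hypothetical helper: after a clean success, grow the budget for the next call.
// Assumption: round to the nearest integer, then apply the cap.
function recordSuccess(integrationId: string, usedBudget: number): void {
  const grown = Math.round(usedBudget * GROWTH_FACTOR);
  learnedBudgets.set(integrationId, Math.min(MAX_BUDGET, grown));
}
```

Calling `getStartingBudget` then `recordSuccess` in a loop for one integration id walks through the sequence above, and a second integration id still starts at 1500.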

Worst case (doomed call on a very slow integration): four timeouts in total (the initial attempt plus three halved retries) before falling back to no context.
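
A rough sketch of the halve-and-retry loop, to make the worst case concrete. All names are hypothetical, and the real route is modeled on `duplicate-analysis.service.ts` rather than on this code. It assumes `run` stands in for the outline LLM call, that a budget of 0 means "no existing-cases context", and that halved budgets are rounded down.

```typescript
// Values from the PR description; every name here is hypothetical.
const MAX_HALVE_DEPTH = 3;
const MIN_BUDGET = 100;

async function runWithAdaptiveBudget<T>(
  startBudget: number,
  run: (budget: number) => Promise<T>,  // stand-in for the outline LLM call
  isTimeout: (err: unknown) => boolean, // classifier supplied by the caller
): Promise<{ result: T; budget: number }> {
  let budget = startBudget;
  // depth 0 is the initial attempt; depths 1..3 are the halved retries.
  for (let depth = 0; depth <= MAX_HALVE_DEPTH && budget >= MIN_BUDGET; depth++) {
    try {
      // Return the working budget so the caller can save it for the next call.
      return { result: await run(budget), budget };
    } catch (err) {
      if (!isTimeout(err)) throw err; // only timeouts trigger a retry
      budget = Math.floor(budget / 2);
    }
  }
  // Final attempt without context; errors from this call propagate to the caller.
  return { result: await run(0), budget: 0 };
}
```

Starting from 1500, a call that always times out is attempted at 1500, 750, 375, and 187 tokens, then once more with no context. Returning the budget that worked lets the caller store the smaller size as the next starting point.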

Related Issue

N/A — direct follow-up to a customer report addressed in PR #335.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • Performance / reliability

Testing

  • 10 new unit tests for the adaptive-budget helpers in outline/adaptive-budget.test.ts:
    • getStartingBudget initial, growth, cap, never-below-initial, learned-zero recovery, per-integration isolation, and the full convergence sequence (1500 → 2250 → 3375 → 5063 → 7595 → 8000)
    • isTimeoutError recognises both the LlmError { code: "TIMEOUT" } shape and plain Error("Request timeout") messages
  • Full unit suite: 7429 passed / 0 failed (+10 vs main).
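
For reference, the two error shapes that `isTimeoutError` is tested against could be classified along these lines. This is a sketch only: the real `LlmError` class is not shown in this PR description, so the example checks structurally for a `code` property instead of using `instanceof`, and the case-insensitive `timeout` match is an assumption.

```typescript
// Sketch of a timeout classifier covering the two shapes named in the tests:
//   { code: "TIMEOUT" }  and  Error("Request timeout").
function isTimeoutError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const { code, message } = err as { code?: unknown; message?: unknown };
  if (code === "TIMEOUT") return true;
  // Assumption: any message containing "timeout" counts as a timeout.
  return typeof message === "string" && /timeout/i.test(message);
}
```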

Checklist

  • My code follows the project style guidelines
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective
  • New and existing unit tests pass locally with my changes

The fixed 1500-token budget set in PR #335 traded one problem for
another: small folders worked but large ones still risked timing out,
and the budget math was charging for full case content (steps + field
values) while the outline prompt only renders names. With the full-
content estimator, 1500 tokens fits ~7-10 cases — far fewer than the
LLM could actually consume.

Two changes:

1. fetchHierarchyContext gains a `mode: "names" | "full"` parameter.
   In "names" mode it skips loading steps and field values from the
   DB and bills the budget per name length only — so 1500 tokens
   fits ~100 case titles instead of ~7-10. Default stays "full" so
   the single-shot and stream generators don't change behavior.

2. The outline route runs the LLM with an adaptive context budget,
   modeled on the duplicate-detection split-and-retry pattern:
   - Start at 1500 tokens for a fresh integration.
   - On a clean success, grow the learned budget by 1.5x for the
     next call, capped at 8000 (~600 titles).
   - On a timeout, halve the budget and retry the same request,
     up to depth 3. Remember the smaller working size so the next
     call starts from there (then grows back up over successive
     successful calls).
   - If the halve-chain bottoms out below 100 tokens, the final
     attempt runs with no existing-cases context — matching
     pre-PR-#335 behavior, so the call always returns something.

State is in-memory only, lost on restart. Test case generation has
never persisted across restarts.

Extracted budget helpers to outline/adaptive-budget.ts with 10 unit
tests covering growth, cap, never-below-initial, per-integration
isolation, convergence sequence, and isTimeoutError classification.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>