fix: prevent KeyError crash and context exhaustion in TOC processing#188
Shreyansh1729 wants to merge 2 commits into VectifyAI:main
Conversation
- Use `.get()` with safe defaults for all LLM response dict accesses
- Optimize `extract_toc_content` retry loop to grow `chat_history` incrementally instead of rebuilding with the full accumulated response
- Optimize `toc_transformer` retry loop to use `chat_history` instead of re-embedding the entire raw TOC and incomplete JSON in each prompt
- Return best-effort results on max retries instead of raising
- Add 14 mock-based tests covering all fix scenarios

Closes VectifyAI#163
- Upstream main (28542de) — latest VectifyAI/PageIndex
- PR VectifyAI#188: fix: prevent KeyError crash and context exhaustion in TOC processing
- Fix `list_index` shadowing (VectifyAI#167) already in upstream
- LiteLLM integration (VectifyAI#168) already in upstream

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
JiwaniZakir left a comment
The single_toc_item_index_fixer change in page_index.py (line 744) swaps a KeyError for a potential TypeError: json_content.get('physical_index') returns None when the key is absent, and then convert_physical_index_to_int(None) is called. Whether this actually resolves the crash depends entirely on how convert_physical_index_to_int handles None — worth checking and possibly providing an explicit fallback value (e.g., .get('physical_index', 0)) rather than silently passing None downstream.
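A minimal sketch of the explicit guard suggested above. The helper name `convert_physical_index_to_int` comes from the PR; its body here is a stand-in for the real parser, and the fallback value of 0 is an assumption, not the repo's actual default:

```python
# Sketch of the suggested guard. convert_physical_index_to_int's body is a
# stand-in for the real parser; the fallback value 0 is an assumption.
def convert_physical_index_to_int(physical_index):
    return int(physical_index)  # stand-in: the real function parses LLM output

def safe_physical_index(json_content, default=0):
    """Avoid both KeyError (missing key) and TypeError (None passed on)."""
    value = json_content.get('physical_index')
    if value is None:  # key absent, or explicitly null in the LLM response
        return default
    return convert_physical_index_to_int(value)
```

This keeps the fallback decision in one place instead of letting `None` leak downstream.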
The for/else refactor in both extract_toc_content and toc_transformer is a clear improvement over the manual attempt counter, but the else branch now only logs a warning instead of raising. This means callers silently receive a partial or potentially malformed result after max retries, which could cause confusing failures further downstream. The original raise Exception was at least explicit about the failure; a silent degraded return should at minimum be documented in the function's docstring or validated before returning.
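The pattern under discussion can be sketched generically (this is an illustration, not the repo's code): the `else` branch of a `for` loop runs only when the loop completes without `break`, which is exactly where a silent degraded return would otherwise slip through.

```python
# Generic for/else retry pattern: the else branch runs only if the loop
# finished without a break, i.e. no attempt succeeded.
def retry(op, max_retries=3):
    for attempt in range(max_retries):
        result = op(attempt)
        if result is not None:
            break  # success: skip the else branch
    else:
        # raising here keeps the failure explicit instead of handing
        # callers a partial or malformed result
        raise Exception(f"failed after {max_retries} attempts")
    return result
```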
In toc_transformer's loop, last_complete = last_complete[:position+2] runs on every iteration — including iterations where last_complete already contains successfully appended content from a previous pass. This means valid accumulated output could be silently truncated on each retry cycle if rfind('}') finds an interior } rather than the terminal one.
- Restore explicit Exception on max retries instead of silent warning
- Move truncation logic before the retry loop so it only runs once on the initial incomplete response, not on every iteration
- Add explicit `None` guard for `physical_index` before passing to `convert_physical_index_to_int` to prevent potential TypeError
- Update test to expect Exception on max retries
Thanks for the detailed review. All three points are valid — pushed fixes in ce67562:
@JiwaniZakir check it now
Confirmed on both points:
@JiwaniZakir it is ready for merge
The closing rationale is sound — changing the fallback from
Valid point about the

For this PR, the scope is intentionally narrow: fix the crash from #163 with minimal risk.

Happy to open a follow-up PR to normalize
The context exhaustion fix is the more critical of the two — passing the full accumulated response on each retry is quadratic in cost for large documents. To confirm the refactored retry logic is correct, check that the new prompt on iteration N contains only the remaining unparsed portion of the TOC, not the concatenation of all prior partial responses.
Confirmed — the prompt on iteration N contains only the fixed continuation string, not the accumulated response.
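A mock-based sketch of that check. `run_retry_loop`, `CONTINUE_PROMPT`, and the `endswith('}')` completion test are illustrative stand-ins, not the repo's actual API:

```python
# Verify that retry prompts are the fixed continuation string, never the
# accumulated response. All names here are illustrative stand-ins.
CONTINUE_PROMPT = "Continue from where you left off."

def run_retry_loop(call_llm, max_retries=3):
    history = [{"role": "user", "content": "<base prompt with the raw TOC>"}]
    result = ""
    for _ in range(max_retries):
        chunk = call_llm(history)
        result += chunk
        history.append({"role": "assistant", "content": chunk})
        if chunk.endswith("}"):  # illustrative completion check
            return result
        history.append({"role": "user", "content": CONTINUE_PROMPT})
    raise Exception("max retries exceeded")

seen_user_msgs = []
def mock_llm(history):
    seen_user_msgs.append(history[-1]["content"])  # the newest user message
    return '{"a": 1' if len(seen_user_msgs) == 1 else '}'

assert run_retry_loop(mock_llm) == '{"a": 1}'
# Every prompt after the first is the fixed continuation string:
assert seen_user_msgs[1:] == [CONTINUE_PROMPT]
```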
The base prompt (containing the full TOC text) is constructed once at lines 160-165 and stored in

The retry loop (lines 180-187) only passes
Boundary tracking is handled through the conversation context itself. Here's the message sequence the LLM sees on iteration N: each assistant message contains only its own chunk. The LLM sees the full conversation thread and naturally continues from where the last response ended. The offset is implicit in the chat sequence — the model knows exactly where it left off because it can see its own prior outputs as separate messages.
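A minimal sketch of that message sequence, assuming OpenAI-style role/content messages; the continuation wording and TOC snippets are made up for illustration:

```python
# Illustrative chat_history on iteration N: each assistant turn holds only
# its own chunk, so no chunk is ever duplicated in context.
chat_history = [
    {"role": "user", "content": "<base prompt: full raw TOC text, sent once>"},
    {"role": "assistant", "content": '{"toc": [{"title": "Chapter 1",'},    # chunk 1 only
    {"role": "user", "content": "Continue from where you left off."},
    {"role": "assistant", "content": ' "physical_index": 12}, {"title":'},  # chunk 2 only
    {"role": "user", "content": "Continue from where you left off."},       # iteration N prompt
]
# Reassembly is just concatenating assistant chunks in order; no offset
# bookkeeping is needed because each chunk exists exactly once.
partial_json = "".join(m["content"] for m in chat_history if m["role"] == "assistant")
```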
@JiwaniZakir I think this is ready now
The narrow scope is reasonable for an emergency fix, but the
The total tokens in

The old code was quadratic — on iteration N, the prompt f-string embedded the full accumulated output (

Now it's O(total_output) — each chunk exists once. For an 800-page document, if the TOC output is ~10k tokens split across 5 retries, the final call sees ~10k tokens of history vs the old code seeing ~30k+ due to duplication. Context limits could still be hit on extremely large TOCs, but that would require a fundamentally different approach (chunking the input TOC itself) — a separate concern from this fix.

Agreed on the
@JiwaniZakir check it
@JiwaniZakir, @KylinMountain I think it's ready for merge
Summary
Fixes a critical bug where processing large documents (~800+ pages) causes a `KeyError: 'completed'` crash and context window exhaustion during TOC transformation retries.

Root Cause
Two independent issues combined to break large document processing:
1. KeyError on truncated LLM output: When `extract_json()` returns `{}` due to malformed or truncated JSON, direct dictionary access like `json_content['completed']` throws a `KeyError` instead of gracefully handling the failure.
2. Context window exhaustion in retry loops: Both `extract_toc_content()` and `toc_transformer()` rebuilt the full prompt context on every retry iteration — embedding the entire raw TOC text AND the growing accumulated response into each new prompt. For 800-page documents, this quickly exceeds the model's 128k token limit.

Changes Made
- Replaced `json_content['key']` accesses with `.get('key', default)` to handle empty/malformed LLM responses gracefully
- Refactored `extract_toc_content()` retry loop to grow `chat_history` incrementally — only the delta/new content is appended per iteration, not the full accumulated response
- Refactored `toc_transformer()` retry loop to use `chat_history` instead of re-embedding the entire raw TOC and incomplete JSON directly in each prompt
- Moved the truncation logic (`last_complete[:position+2]`) to run once before the retry loop, preventing valid accumulated content from being cut on subsequent passes
- Added an explicit `None` guard for `physical_index` before calling `convert_physical_index_to_int`
- Restored explicit `Exception` on max retries so callers fail fast instead of receiving partial results

Testing
14 mock-based tests added in `tests/test_issue_163.py`:

- `extract_toc_content` retry behavior (first-try success, continuation, max retries raises exception, incremental history growth)
- `toc_transformer` retry behavior (first-try success, missing key handling)

Closes #163