ci: add concurrency throttling for LLM API calls #288

Open
okwn wants to merge 1 commit into VectifyAI:main from okwn:ci/add-concurrency-throttling

okwn commented May 22, 2026

Summary

Adds semaphore-based concurrency throttling for LLM API calls to prevent HTTP 429 rate limits when processing large documents with many concurrent calls.

Changes

  • pageindex/concurrency.py (new): Semaphore-based throttling module with limited_llm_acompletion() wrapper and set_max_concurrent() / get_max_concurrent() configuration functions
  • pageindex/config.yaml: Add max_concurrent_llm_calls: 5 config option
  • pageindex/utils.py:
    • Import concurrency helpers
    • Replace llm_acompletion export with throttled version for backward compatibility
    • ConfigLoader.load() now applies the concurrency setting on load
    • Relax key validation to allow new config keys
  • pageindex/page_index.py: Use limited_llm_acompletion in check_title_appearance, check_title_appearance_in_start, and verify_toc functions
  • tests/test_concurrency.py (new): Tests for semaphore behavior and concurrent call limiting
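
For readers who have not opened the diff, the following is a minimal sketch of what a semaphore-based throttling module of this shape could look like. It is not the code in this PR: the function names come from the list above, but the signatures, the lazy semaphore creation, and the `fake_llm_acompletion` stand-in are assumptions made for illustration.

```python
import asyncio

# Assumed module state; the default mirrors max_concurrent_llm_calls: 5.
_max_concurrent = 5
_semaphore = None  # built lazily so it is created inside the running event loop


def set_max_concurrent(limit: int) -> None:
    """Set the global cap; the semaphore is rebuilt on next use."""
    global _max_concurrent, _semaphore
    if limit < 1:
        raise ValueError("limit must be >= 1")
    _max_concurrent = limit
    _semaphore = None


def get_max_concurrent() -> int:
    """Return the currently configured cap."""
    return _max_concurrent


def _get_semaphore() -> asyncio.Semaphore:
    """Create the shared semaphore on first use."""
    global _semaphore
    if _semaphore is None:
        _semaphore = asyncio.Semaphore(_max_concurrent)
    return _semaphore


async def fake_llm_acompletion(prompt: str) -> str:
    """Hypothetical stand-in for the real LLM call (not part of the PR)."""
    await asyncio.sleep(0.01)
    return f"echo: {prompt}"


async def limited_llm_acompletion(prompt: str) -> str:
    """Wait for a free slot, then forward the call."""
    async with _get_semaphore():
        return await fake_llm_acompletion(prompt)
```

In this sketch, changing the limit while calls are in flight would replace the semaphore under them, so the limit is best set once at startup, for example when the config is loaded.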

Behavior

The semaphore limits concurrent LLM API calls globally (default: 5 concurrent). When processing large documents that would otherwise fire hundreds of simultaneous LLM calls, excess requests now wait in a queue and start as earlier calls finish, so no more than the configured number are in flight at once.
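
The effect can be checked with a small standalone experiment. The sketch below is an illustration, not the test file from this PR: `run_throttled` and `fake_call` are hypothetical names, and the LLM call is replaced by a short sleep. It launches 50 coroutines at once and records the highest number running inside the semaphore at the same time.

```python
import asyncio


async def run_throttled(n_calls: int, limit: int) -> int:
    """Run n_calls fake LLM calls under a semaphore; return the peak in flight."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fake_call(i: int) -> int:
        nonlocal in_flight, peak
        async with semaphore:
            # Only `limit` coroutines can be inside this block at once.
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stands in for the network round trip
            in_flight -= 1
        return i

    # All calls are scheduled immediately; the semaphore gates execution.
    await asyncio.gather(*(fake_call(i) for i in range(n_calls)))
    return peak


peak = asyncio.run(run_throttled(n_calls=50, limit=5))
print(f"peak concurrent calls: {peak}")
```

A semaphore gives a sliding window rather than fixed batches: a waiting call starts as soon as any running call releases its slot.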

Testing

Tests run under pytest (see tests/test_concurrency.py).

- Add semaphore-based throttling via pageindex/concurrency.py
- Add max_concurrent_llm_calls config option (default: 5)
- Apply throttling to check_title_appearance functions in page_index.py
- Add basic concurrency tests