fix: address Copilot review on utils-split PR by denis-samatov · Pull Request #182 · VectifyAI/PageIndex

denis-samatov · 2026-03-26T13:12:08Z

Summary

This PR addresses the automated Copilot review comments left on the feature/utils-split PR.

Changes

`pageindex/core/pdf.py`

Fix KeyError crash: get_page_tokens() now wraps tiktoken.encoding_for_model() in try/except KeyError and falls back to cl100k_base — consistent with the same fallback logic already in core/llm.py's count_tokens()

`tests/conftest.py`

Fix broken sys.path: Replaced ../src with .. (project root). The repo has no src/ directory, so the previous path was silently adding a non-existent location to sys.path, masking potential import failures

`pageindex/utils.py`

Restore backward compatibility: Added explicit wrappers for the removed legacy symbols llm_completion and llm_acompletion. Without these, any code importing from pageindex.utils would raise NameError at runtime. The wrappers delegate to the new ChatGPT_API and ChatGPT_API_async in core/llm, with a thread-executor fallback for the async path

Testing

All 11 unit tests pass:

11 passed in 0.51s

- core/pdf.py: add KeyError fallback in get_page_tokens() for unknown model names, consistent with count_tokens() in core/llm.py - tests/conftest.py: fix sys.path to point at project root instead of non-existent '../src' directory - pageindex/utils.py: add backward-compatible wrappers for removed legacy symbols llm_completion and llm_acompletion so existing callers don't raise NameError at runtime

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

Copilot

Pull request overview

This PR follows up on the earlier feature/utils-split work by wiring pageindex to the new core/* modules, restoring legacy pageindex.utils symbols, and adding/adjusting tests + config loading to match the refactor.

Changes:

Re-export core modules via pageindex/utils.py and add legacy llm_completion / llm_acompletion wrappers.
Move page_index.py off from .utils import * onto explicit core.* imports and update call sites accordingly.
Add core/ modules (llm, pdf, tree, logging), introduce Pydantic-based config loading, and add unit tests.

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
`pageindex/utils.py`	Re-exports `core` modules and adds backward-compat wrappers for removed legacy symbols.
`pageindex/page_index.py`	Switches to explicit imports from `core/*` and updates LLM call sites and option handling.
`pageindex/core/llm.py`	Adds OpenAI client helpers, token counting, and JSON extraction utilities.
`pageindex/core/pdf.py`	Adds PDF helpers and per-page token counting with a tiktoken fallback.
`pageindex/core/tree.py`	Adds tree/structure utilities and summary generation helpers.
`pageindex/core/logging.py`	Adds JSON file logger used by indexing pipeline.
`pageindex/config.py`	Introduces Pydantic-backed `PageIndexConfig` + loader.
`config.yaml`	Adds repo-level default configuration values.
`pyproject.toml`	Declares package metadata + pinned dependencies.
`tests/conftest.py`	Fixes sys.path injection to project root for tests.
`tests/test_tree.py`	Adds tests for new tree utilities.
`tests/test_llm.py`	Adds tests for JSON extraction and token counting.
`tests/test_config.py`	Adds tests for config loading and validation behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

pageindex/core/tree.py

pageindex/core/pdf.py

pageindex/utils.py

pageindex/page_index.py

pageindex/core/tree.py

pageindex/core/pdf.py

pageindex/utils.py

- core/tree.py: fix IndentationError in add_node_text() and add_node_text_with_labels() — extra space on node['text'] = ... line - core/pdf.py: guard against None from page.extract_text() in both extract_text_from_pdf() and get_page_tokens() with 'or ""' - utils.py: use functools.partial in run_in_executor fallback to safely pass kwargs; add return_finish_reason routing in llm_completion wrapper; remove unused _core_pkg import - page_index.py: load default ConfigLoader when opt is None in page_index_main() to prevent model=None crash; fix tob_ typo in single_toc_item_index_fixer()

JiwaniZakir

The default values in config.yaml and PageIndexConfig in pageindex/config.py are inconsistent in several places: toc_check_page_num is 20 vs. 3, max_page_num_each_node is 10 vs. 5, max_token_num_each_node is 20000 vs. 4000, and both if_add_doc_description and if_add_node_text are false in the YAML but True in the Pydantic model. Since ConfigLoader falls back to the Pydantic defaults when no config file is found (e.g., during testing or when the package is installed without the repo's config.yaml), this creates two distinct behavioral profiles depending on the deployment context — likely unintentional. The Pydantic field defaults should either match config.yaml exactly, or config.yaml should be the sole source of truth and the Pydantic defaults should be treated as last-resort fallbacks with that clearly documented.

Additionally, in ConfigLoader.load, filtering out None values from user_dict (if v is not None) silently prevents a caller from explicitly clearing a field that has a non-None default (e.g., setting api_key=None to override a key loaded from the YAML). Consider distinguishing between "not provided" and "explicitly set to None," possibly by using MISSING sentinel values or model_fields_set from Pydantic v2.

…utils-split # Conflicts: # examples/documents/conftest.py # examples/documents/test_config.py # examples/documents/test_llm.py # examples/documents/test_tree.py # pageindex/config.yaml # pageindex/utils.py

denis-samatov added 5 commits March 26, 2026 18:59

refactor: modularize utils.py into core package

a10ad51

feat: add pydantic configuration and validation

67f6cdf

fix(page_index): address Copilot review feedback

afecf2f

fix: logic tweaks, boolean conditionals, and added retry limits

d2bed43

Copilot AI review requested due to automatic review settings March 26, 2026 13:12

claude bot reviewed Mar 26, 2026

View reviewed changes

Copilot started reviewing on behalf of denis-samatov March 26, 2026 13:12 View session

Copilot AI reviewed Mar 26, 2026

View reviewed changes

JiwaniZakir reviewed Mar 30, 2026

View reviewed changes

denis-samatov added 2 commits March 30, 2026 14:02

fix: align config defaults and preserve overrides

f01793a

Merge remote-tracking branch 'upstream/main' into fix/copilot-review-…

46cf6b9

…utils-split # Conflicts: # examples/documents/conftest.py # examples/documents/test_config.py # examples/documents/test_llm.py # examples/documents/test_tree.py # pageindex/config.yaml # pageindex/utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: address Copilot review on utils-split PR#182

fix: address Copilot review on utils-split PR#182
denis-samatov wants to merge 8 commits intoVectifyAI:mainfrom
denis-samatov:fix/copilot-review-utils-split

denis-samatov commented Mar 26, 2026

Uh oh!

claude bot left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

JiwaniZakir left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

denis-samatov commented Mar 26, 2026

Summary

Changes

pageindex/core/pdf.py

tests/conftest.py

pageindex/utils.py

Testing

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Claude Code Review

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

JiwaniZakir left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

`pageindex/core/pdf.py`

`tests/conftest.py`

`pageindex/utils.py`