VectifyAI · denis-samatov · Mar 26, 2026 · Copilot · Mar 26, 2026
diff --git a/pageindex.egg-info/PKG-INFO b/pageindex.egg-info/PKG-INFO
@@ -0,0 +1,147 @@
+Metadata-Version: 2.4
+Name: pageindex
+Version: 0.1.0
+Summary: Vectorless, reasoning-based RAG indexer
+License: MIT
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: openai==1.101.0
+Requires-Dist: pymupdf==1.26.4
+Requires-Dist: PyPDF2==3.0.1
+Requires-Dist: python-dotenv==1.1.0
+Requires-Dist: tiktoken==0.11.0
+Requires-Dist: pyyaml==6.0.2
+Requires-Dist: pydantic>=2.0
+Provides-Extra: dev
+Requires-Dist: pytest>=7.4.0; extra == "dev"
+Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
+Dynamic: license-file
+
+<div align="center">
+
+<a href="https://vectify.ai/pageindex" target="_blank">
+  <img src="https://github.com/user-attachments/assets/46201e72-675b-43bc-bfbd-081cc6b65a1d" alt="PageIndex Banner" />
+</a>
+
+<br/>
+<br/>
+
+<p align="center">
+  <a href="https://trendshift.io/repositories/14736" target="_blank"><img src="https://trendshift.io/api/badge/repositories/14736" alt="VectifyAI%2FPageIndex | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
+</p>
+
+# PageIndex: Reasoning-Based Vectorless RAG
+
+<p align="center"><b>Reasoning-native RAG&nbsp; ◦ &nbsp;No Vector DB&nbsp; ◦ &nbsp;No Chunking&nbsp; ◦ &nbsp;Human-like Retrieval</b></p>
+
+<h4 align="center">
+  <a href="https://vectify.ai">🏠 Homepage</a>&nbsp; • &nbsp;
+  <a href="https://chat.pageindex.ai">🖥️ Chat Platform</a>&nbsp; • &nbsp;
+  <a href="https://pageindex.ai/mcp">🔌 MCP</a>&nbsp; • &nbsp;
+  <a href="https://docs.pageindex.ai">📚 Documentation</a>&nbsp; • &nbsp;
+  <a href="https://discord.com/invite/VuXuf29EUj">💬 Discord</a>&nbsp; • &nbsp;
+  <a href="https://ii2abc2jejf.typeform.com/to/tK3AXl8T">✉️ Contact Us</a>&nbsp;
+</h4>
+
+</div>
+
+<details open>
+<summary><h3>📢 Latest Updates</h3></summary>
+
+ **🔥 Releases:**
+- [**PageIndex Chat**](https://chat.pageindex.ai): The first human-like agentic platform for document analysis, built for professional long-context documents. Also available via [MCP](https://pageindex.ai/mcp) or [API](https://docs.pageindex.ai/quickstart) (beta).
+
+ **📝 Articles:**
+- [**PageIndex Framework**](https://pageindex.ai/blog/pageindex-intro): Introduces the PageIndex framework — an *agentic, in-context tree index* that empowers LLMs to perform *reasoning-based, human-like retrieval* over long documents without a Vector DB or chunking.
+
+ **🧪 Cookbooks:**
+- [Vectorless RAG](https://docs.pageindex.ai/cookbook/vectorless-rag-pageindex): A minimal, practical example of reasoning-based RAG using PageIndex. No vectors, no chunks, and human-like retrieval.
+- [Vision-based Vectorless RAG](https://docs.pageindex.ai/cookbook/vision-rag-pageindex): Vision-only RAG without OCR; a reasoning-native approach that acts directly over PDF page images.
+</details>
+
+---
+
+# 📑 Introduction to PageIndex
+
+Tired of poor retrieval accuracy with Vector DBs on long, professional documents? Traditional vector RAG relies on semantic *similarity* rather than true *relevance*. But **similarity ≠ relevance** — what we need for retrieval is **relevance**, and relevance requires **reasoning**. When dealing with professional documents where domain knowledge and multi-step reasoning matter, similarity search often fails.
+
+Inspired by AlphaGo, we propose **[PageIndex](https://vectify.ai/pageindex)** — a reasoning-based, **Vectorless RAG** framework that builds a **hierarchical tree index** from long documents and prompts the LLM to **reason over this index** for **agentic, context-aware retrieval**.
+
+---
+
+# ⚙️ Package Usage
+
+### 1. Install Dependencies
+
+```bash
+pip3 install --upgrade -r requirements.txt
+pip3 install -e .
+```
+
+### 2. Provide your OpenAI API Key
+
+Create a `.env` file in the root directory and add your API key:
+
+```bash
+OPENAI_API_KEY=your_openai_key_here
+```
+
+### 3. Run PageIndex on your PDF
+
+```bash
+pageindex --pdf_path /path/to/your/document.pdf
+```
+
+---
+
+# 💻 Developer Guide
+
+This section is for developers contributing to `PageIndex` or integrating it as a library.
+
+### Development Setup
+
+1.  **Clone the repository:**
+    ```bash
+    git clone https://github.com/VectifyAI/PageIndex.git
+    cd PageIndex
+    ```
+
+2.  **Install development dependencies:**
+    ```bash
+    pip install -e ".[dev]"
+    # Or, if the package is already installed, add only the test tools:
+    pip install pytest pytest-asyncio
+    ```
+
+3.  **Run Tests:**
+    We use `pytest` for unit and integration testing.
+    ```bash
+    pytest
+    ```
+
+### Project Structure
+
+The project has been refactored into a modular library structure under `pageindex`.
+
+-   `pageindex/core/`: Core logic modules.
+    -   `llm.py`: LLM interactions and token counting.
+    -   `pdf.py`: PDF text extraction and processing.
+    -   `tree.py`: Tree data structure manipulation and recursion.
+    -   `logging.py`: Custom logging utilities.
+-   `pageindex/config.py`: Configuration loading and validation (Pydantic).
+-   `pageindex/cli.py`: Command Line Interface entry point.
+-   `pageindex/utils.py`: Facade for backward compatibility.
+
+### Configuration
+
+Configuration is loaded by `pageindex/config.py`. Default settings live in `config.yaml`; you can edit them there or override them via environment variables (`PAGEINDEX_CONFIG`) or CLI arguments.
+Settings are validated with Pydantic, so values of the wrong type are rejected when the configuration is loaded.
+
+For the API reference, see [API_REFERENCE.md](docs/API_REFERENCE.md).
+
+---
+
+# ⭐ Support Us
+
+Give us a star 🌟 if you like the project. Thank you!
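The Configuration section of the README in this hunk describes layered settings: defaults from `config.yaml`, an environment variable, and CLI arguments, with Pydantic validation on top. The diff does not show `pageindex/config.py`, so the following is only a minimal sketch of that layering under stated assumptions, not the project's implementation. It assumes `PAGEINDEX_CONFIG` holds a path to an alternative config file, uses hypothetical field names (`model`, `max_pages`), and swaps in a plain dataclass check where the project uses Pydantic so that the example runs with the standard library alone.

```python
from dataclasses import dataclass, fields
from typing import Mapping, Optional

DEFAULT_CONFIG_PATH = "config.yaml"


def resolve_config_path(cli_path: Optional[str], environ: Mapping[str, str]) -> str:
    """Pick the config file: CLI argument, then PAGEINDEX_CONFIG, then the default."""
    # Assumption: PAGEINDEX_CONFIG holds a path to an alternative config file.
    return cli_path or environ.get("PAGEINDEX_CONFIG") or DEFAULT_CONFIG_PATH


@dataclass(frozen=True)
class Settings:
    # Hypothetical fields for illustration; not the project's real schema.
    model: str = "example-model"
    max_pages: int = 10

    def __post_init__(self) -> None:
        # Stand-in for the checks Pydantic would run when the config is loaded.
        if not isinstance(self.model, str):
            raise TypeError("model must be a string")
        if not isinstance(self.max_pages, int) or self.max_pages <= 0:
            raise ValueError("max_pages must be a positive integer")


def build_settings(file_values: Mapping[str, object],
                   cli_overrides: Mapping[str, object]) -> Settings:
    """Layer CLI overrides on top of values parsed from the config file."""
    known = {f.name for f in fields(Settings)}
    merged = {k: v for k, v in file_values.items() if k in known}
    # A CLI value of None means "flag not given" and must not mask the file value.
    merged.update({k: v for k, v in cli_overrides.items()
                   if k in known and v is not None})
    return Settings(**merged)


if __name__ == "__main__":
    path = resolve_config_path(None, {"PAGEINDEX_CONFIG": "/tmp/custom.yaml"})
    settings = build_settings({"model": "from-file", "max_pages": 5},
                              {"max_pages": 20, "model": None})
    print(path, settings)
```

Keeping path resolution separate from value merging means each precedence rule can be tested without touching the filesystem or the real environment.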
diff --git a/pageindex.egg-info/SOURCES.txt b/pageindex.egg-info/SOURCES.txt
@@ -0,0 +1,28 @@
+LICENSE
+README.md
+pyproject.toml
+pageindex/__init__.py
+pageindex/cli.py
+pageindex/config.py
+pageindex/page_index.py
+pageindex/page_index_md.py
+pageindex/utils.py
+pageindex.egg-info/PKG-INFO
+pageindex.egg-info/SOURCES.txt
+pageindex.egg-info/dependency_links.txt
+pageindex.egg-info/entry_points.txt
+pageindex.egg-info/requires.txt
+pageindex.egg-info/top_level.txt
+pageindex/core/__init__.py
+pageindex/core/llm.py
+pageindex/core/logging.py
+pageindex/core/pdf.py
+pageindex/core/tree.py
+scripts/analyze_notebooks.py
+scripts/local_client_adapter.py
+scripts/refactor_notebooks_logic.py
+scripts/verify_adapter.py
+tests/conftest.py
+tests/test_config.py
+tests/test_llm.py
+tests/test_tree.py
diff --git a/pageindex.egg-info/dependency_links.txt b/pageindex.egg-info/dependency_links.txt
@@ -0,0 +1 @@
+
diff --git a/pageindex.egg-info/entry_points.txt b/pageindex.egg-info/entry_points.txt
@@ -0,0 +1,2 @@
+[console_scripts]
+pageindex = pageindex.cli:main
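The `console_scripts` entry above maps the `pageindex` command to the `main` callable in `pageindex.cli`: the installer generates a small launcher that imports the module named before the colon and calls the attribute named after it. As a rough sketch of that resolution step (the helper name `resolve_entry_point` is hypothetical, and a standard-library target stands in for `pageindex.cli:main` so the example runs without the package installed):

```python
import importlib


def resolve_entry_point(spec: str):
    """Resolve a 'module.path:attr' string to the object it names."""
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:attr', got {spec!r}")
    obj = importlib.import_module(module_name)
    # The attribute part may be dotted (for example "Class.method").
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    return obj


if __name__ == "__main__":
    # Stand-in target; the entry in this diff would be "pageindex.cli:main".
    dumps = resolve_entry_point("json:dumps")
    print(dumps({"ok": True}))
```

In an installed environment the same lookup is available through the standard library's `importlib.metadata.entry_points`, which reads `entry_points.txt` from the package metadata.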
diff --git a/pageindex.egg-info/requires.txt b/pageindex.egg-info/requires.txt
@@ -0,0 +1,11 @@
+openai==1.101.0
+pymupdf==1.26.4
+PyPDF2==3.0.1
+python-dotenv==1.1.0
+tiktoken==0.11.0
+pyyaml==6.0.2
+pydantic>=2.0
+
+[dev]
+pytest>=7.4.0
+pytest-asyncio>=0.21.0
diff --git a/pageindex.egg-info/top_level.txt b/pageindex.egg-info/top_level.txt
@@ -0,0 +1,6 @@
+data
+docs
+notebooks
+pageindex
+scripts
+tests
-data
-docs
-notebooks
-pageindex
-scripts
-tests
+pageindex
diff --git a/pageindex/core/__init__.py b/pageindex/core/__init__.py