Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The model is recommended based on the computer's specs; the context size is optimized for a trade-off between quality and efficiency.
Pull request overview
This PR adds a fully local RAG (Retrieval-Augmented Generation) capability to DocFinder and expands indexing beyond PDFs (TXT/MD/DOCX), with supporting backend endpoints, UI updates, and test coverage.
Changes:
- Introduces `docfinder.rag` (local LLM selection/download + a RAG orchestration engine) and new RAG HTTP endpoints (`/rag/models`, `/rag/download`, `/rag/chat`).
- Adds page-aware chunking + retrieval (`chunk_text_stream_paged`, `get_context_by_page`, `get_context_window`) to support better context assembly.
- Expands ingestion/indexing and UX to support multi-format documents and adds a Spotlight-style quick-search panel on macOS.
Reviewed changes
Copilot reviewed 26 out of 28 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| tests/test_rag.py | New tests for context window/page retrieval, page-aware chunking, model selection, and RAG engine assembly. |
| tests/test_pdf_loader.py | Updates mocks to use the new paged text iterator for PDF chunk building. |
| tests/test_indexer.py | Updates patch points from PDF-only discovery to multi-document discovery. |
| tests/test_imports.py | Updates version assertion to 2.0.0. |
| tests/test_frontend.py | Updates frontend template loader signature to accept a template name. |
| tests/test_cli.py | Updates CLI messaging from “PDFs” to “supported documents”. |
| src/docfinder/web/templates/spotlight.html | Adds dedicated Spotlight panel HTML/JS for quick search. |
| src/docfinder/web/templates/index.html | Major UI redesign + adds in-app Spotlight overlay, indexing warnings, and RAG settings + chat panel. |
| src/docfinder/web/frontend.py | Template loader now supports multiple templates and serves /spotlight. |
| src/docfinder/web/app.py | Adds RAG endpoints, system RAM endpoint, indexing scan endpoint, exclude-path support, and embed batch sizing by RAM. |
| src/docfinder/utils/text.py | Adds chunk_text_stream_paged() for page-provenance chunking. |
| src/docfinder/utils/files.py | Adds supported extensions + iter_document_paths(); keeps iter_pdf_paths() for compatibility. |
| src/docfinder/settings.py | Changes default hotkey to <alt>+d. |
| src/docfinder/rag/llm.py | New local LLM management: model tiers, RAM-based selection, download, and llama-cpp wrapper. |
| src/docfinder/rag/engine.py | New RAG engine coordinating search + context assembly + generation. |
| src/docfinder/rag/__init__.py | Adds RAG package init. |
| src/docfinder/ingestion/pdf_loader.py | Generalizes ingestion to PDF/TXT/MD/DOCX; adds paged iterators and stores page in chunk metadata. |
| src/docfinder/index/storage.py | Search now returns document_id; adds get_context_window() + get_context_by_page(). |
| src/docfinder/index/indexer.py | Generalizes indexer to index supported docs; adds exclusions and embedding batch-size override support. |
| src/docfinder/gui.py | Adds native macOS Spotlight NSPanel + CGEventTap-based global hotkey handling; integrates with web endpoints. |
| src/docfinder/embedding/encoder.py | Adds optional batch_size override to EmbeddingModel.embed(). |
| src/docfinder/cli.py | Updates CLI to index supported document types instead of PDFs only. |
| src/docfinder/__init__.py | Bumps package version to 2.0.0. |
| pyproject.toml | Bumps version + adds python-docx dependency and [rag] extra for llama-cpp-python. |
| README.md | Updates messaging and demo media for multi-format + new UX. |
| CLAUDE.md | Adds repo command/architecture guidance for Claude Code tooling. |
| CHANGELOG.md | Adds 2.0.0 release notes and updates compare links. |
RAG globals are accessed/modified from multiple threads without synchronization: _rag_llm_lock is defined but never used, and _rag_download is mutated inside _load_rag_llm() (run via asyncio.to_thread) while being read by /rag/download/status. This can lead to inconsistent status/progress and makes future changes riskier. Use _rag_llm_lock (or a dedicated lock) to guard updates to _rag_llm and _rag_download (or replace _rag_download with an immutable snapshot updated atomically).
Suggested change:

```python
def _get_rag_download_state() -> dict[str, Any]:
    """Return a thread-safe snapshot of the RAG download state."""
    with _rag_llm_lock:
        # Return a shallow copy so callers cannot mutate the shared dict
        return dict(_rag_download)
```
```python
# Collect pages: start with center_page, then expand symmetrically
collected: List[dict] = []
total_chars = 0

# Add center page first
for c in chunks_with_page:
    if c["page"] == center_page:
        collected.append(c)
        total_chars += len(c["text"])

# Expand to adjacent pages
expand = 1
while total_chars < max_chars:
    added = False
    for offset_page in (center_page - expand, center_page + expand):
        if offset_page < 0:
            continue
        for c in chunks_with_page:
            if c["page"] == offset_page and total_chars < max_chars:
                collected.append(c)
                total_chars += len(c["text"])
                added = True
```
get_context_by_page() does repeated full scans of chunks_with_page inside the expansion loop (for offset_page ...: for c in chunks_with_page:). For documents with many chunks/pages this becomes O(n * pages_expanded) and can be noticeably slow in RAG. Consider pre-grouping chunks by page (e.g., page -> [chunks]) once, then appending from the adjacent pages without re-scanning the entire chunk list each time (and consider skipping page 0 by checking offset_page <= 0).
Suggested change:

```python
# Group chunks by page for efficient access during expansion
page_to_chunks = {}
for c in chunks_with_page:
    page_to_chunks.setdefault(c["page"], []).append(c)

# Collect pages: start with center_page, then expand symmetrically
collected: List[dict] = []
total_chars = 0

# Add center page first
for c in page_to_chunks.get(center_page, []):
    collected.append(c)
    total_chars += len(c["text"])

# Expand to adjacent pages
expand = 1
while total_chars < max_chars:
    added = False
    for offset_page in (center_page - expand, center_page + expand):
        # Skip non-positive pages; page numbers are expected to be 1-based
        if offset_page <= 0:
            continue
        for c in page_to_chunks.get(offset_page, []):
            if total_chars >= max_chars:
                break
            collected.append(c)
            total_chars += len(c["text"])
            added = True
```
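For illustration, a self-contained, runnable sketch of the pre-grouping strategy. The `collect_pages` name and the exact chunk shape (`{"page": int, "text": str}`) are assumptions, not the PR's actual function:

```python
from collections import defaultdict
from typing import List

def collect_pages(chunks: List[dict], center_page: int, max_chars: int) -> List[dict]:
    """Gather chunks from center_page outward until max_chars is reached.

    Grouping once up front makes each expansion step O(chunks on that
    page) instead of a full scan over every chunk in the document.
    """
    by_page: dict = defaultdict(list)
    for c in chunks:
        by_page[c["page"]].append(c)

    collected: List[dict] = []
    total_chars = 0

    # Center page first.
    for c in by_page.get(center_page, []):
        collected.append(c)
        total_chars += len(c["text"])

    # Expand symmetrically until the budget is met or nothing new is added.
    expand = 1
    while total_chars < max_chars:
        added = False
        for page in (center_page - expand, center_page + expand):
            if page <= 0:  # pages are 1-based
                continue
            for c in by_page.get(page, []):
                if total_chars >= max_chars:
                    break
                collected.append(c)
                total_chars += len(c["text"])
                added = True
        if not added:
            break
        expand += 1
    return collected
```

The `if not added: break` guard also prevents an infinite loop when the document runs out of pages before the budget is filled.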
```python
@app.post("/index/scan")
async def scan_index_paths(payload: ScanPayload) -> dict[str, Any]:
    """Scan paths for PDFs and return file stats without indexing."""
```
scan_index_paths() docstring says it scans paths for PDFs, but the implementation uses iter_document_paths() and returns counts by extension for multiple document types. Update the docstring to match the current behavior (supported documents rather than PDFs only).
Suggested change:

```python
    """Scan the given paths for supported document files and return file stats (including counts by extension) without indexing."""
```
```python
pages = [(1, "A" * 60), (2, "B" * 60)]
results = list(chunk_text_stream_paged(pages, max_chars=100, overlap=0))

assert len(results) >= 1
# First chunk starts on page 1
assert results[0][1] == 1
```
test_chunk_spanning_pages only asserts the first chunk’s page number. Given the intended contract (“page_number is the page that contributed the start of the chunk”), add an assertion for the next yielded chunk when overlap=0 (it should start on page 2 in this setup). This would catch incorrect page provenance when chunk_text_stream_paged() slices the buffer past the end of the first page.
Suggested change:

```python
# Second chunk should start on page 2 when overlap=0.
pages = [(1, "A" * 60), (2, "B" * 60)]
results = list(chunk_text_stream_paged(pages, max_chars=100, overlap=0))
# We expect exactly two chunks: 100 chars, then 20 chars.
assert len(results) == 2
# First chunk starts on page 1
assert results[0][1] == 1
# Second chunk starts on page 2
assert results[1][1] == 2
```
```python
@staticmethod
def _assemble_context_text(chunks: List[dict], results: List[SearchResult]) -> str:
    """Join chunk texts, trimming symmetrically if the total exceeds the budget."""
```
_assemble_context_text() docstring says it “trims symmetrically”, but the implementation truncates by stopping once the global character budget is reached while iterating documents/chunks in order. Either adjust the docstring to match the actual behavior, or implement true symmetric trimming around the highest-relevance chunks if that’s the intent.
Suggested change:

```python
    """Join chunk texts into a single context string, truncating once the
    global character budget is reached.

    Chunks are grouped by document path and ordered by their chunk index.
    Text is appended in that order until ``_MAX_CONTEXT_CHARS`` is exceeded;
    any remaining chunks are omitted (prefix-style trimming, not symmetric)."""
```
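A tiny standalone sketch of the described behavior, appending chunks in order and stopping at the budget (the names and the budget value here are illustrative, not the PR's):

```python
from typing import List

_MAX_CONTEXT_CHARS = 20  # tiny budget, purely for illustration

def assemble_context(texts: List[str]) -> str:
    """Append chunk texts in order and stop once the character budget is
    reached, so trailing chunks are simply omitted (prefix-style
    truncation, not symmetric trimming)."""
    parts: List[str] = []
    total = 0
    for t in texts:
        if total >= _MAX_CONTEXT_CHARS:
            break
        parts.append(t)
        total += len(t)
    return "\n\n".join(parts)
```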
```python
# Track which page the start of the buffer came from
buf_page = 0
step = max(max_chars - overlap, 1)

for page_num, part in pages:
    if not buffer:
        buf_page = page_num
    buffer += part
    while len(buffer) >= max_chars:
        yield buffer[:max_chars], buf_page
        buffer = buffer[step:]
        # After slicing, the start of the buffer has shifted.
        # The overlap region still belongs to the old page, so
        # we keep buf_page unchanged — it's the page of the
        # beginning of the chunk. It will be updated when new
        # text is appended from the next page.
        if not buffer:
            buf_page = page_num

if buffer:
    yield buffer, buf_page
```
chunk_text_stream_paged() can assign the wrong page_number once a chunk spans pages and overlap is small/zero. After yielding a chunk, buffer = buffer[step:] may advance the buffer start into a later page, but buf_page is intentionally left unchanged, so subsequent yielded chunks can be labeled with the previous page even though they start on the new page. Consider tracking page boundaries in the buffer (e.g., buffer as a deque of (page,text) segments) and updating buf_page when slicing crosses into the next segment, or recomputing the start page based on consumed lengths.
Suggested change:

```python
# Track segments of the buffer with their originating page numbers.
# Each entry is (start_index, end_index, page_num) relative to the current buffer.
segments: list[tuple[int, int, int]] = []
step = max(max_chars - overlap, 1)

for page_num, part in pages:
    if not part:
        continue
    # Append new text and record its segment range in the buffer.
    start = len(buffer)
    buffer += part
    end = len(buffer)
    segments.append((start, end, page_num))

    # Emit chunks while we have enough buffered text.
    while len(buffer) >= max_chars and segments:
        # The page for the start of the chunk is the page of buffer index 0,
        # which is always the page of the first remaining segment.
        current_page = segments[0][2]
        yield buffer[:max_chars], current_page

        # Advance the buffer by `step` characters and update segments accordingly.
        consume = step
        buffer = buffer[consume:]
        if not buffer:
            segments = []
            break
        new_segments: list[tuple[int, int, int]] = []
        for seg_start, seg_end, seg_page in segments:
            seg_start -= consume
            seg_end -= consume
            # Drop segments fully consumed by the shift.
            if seg_end <= 0:
                continue
            # Clamp partially consumed segments to start at 0.
            if seg_start < 0:
                seg_start = 0
            new_segments.append((seg_start, seg_end, seg_page))
        segments = new_segments

if buffer and segments:
    # Remaining buffer is shorter than max_chars; its start page is the
    # page of the first segment.
    yield buffer, segments[0][2]
```
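An alternative, non-streaming sketch of the same fix: record each page's start offset in the concatenated text and resolve a chunk's page with a binary search. This assumes the whole document fits in memory, unlike the streaming generator under review, and `chunk_paged` is an illustrative name:

```python
from bisect import bisect_right
from typing import Iterable, Iterator, Tuple

def chunk_paged(
    pages: Iterable[Tuple[int, str]], max_chars: int, overlap: int = 0
) -> Iterator[Tuple[str, int]]:
    """Yield (chunk, page_number) where page_number is the page that
    contributed the chunk's first character."""
    starts: list = []     # start offset of each page in `text`
    page_nums: list = []  # page number for each start offset
    text = ""
    for page_num, part in pages:
        if not part:
            continue
        starts.append(len(text))
        page_nums.append(page_num)
        text += part

    step = max(max_chars - overlap, 1)
    pos = 0
    while pos < len(text):
        # Find the page whose span contains offset `pos`.
        page = page_nums[bisect_right(starts, pos) - 1]
        yield text[pos:pos + max_chars], page
        pos += step
```

On the two-page example from the tests (60 "A"s then 60 "B"s, max_chars=100, overlap=0) this yields the first chunk labeled page 1 and the second labeled page 2, which is exactly the provenance contract the review comment asks for.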
```python
    # Stop the run loop so the thread exits
    Quartz.CFRunLoopStop(
        Quartz.CFRunLoopGetMain()
    )
except Exception:
```
GlobalHotkeyManager.stop() disables the CGEventTap but attempts to stop the run loop via CFRunLoopStop(CFRunLoopGetMain()), which does not stop the tap thread’s CFRunLoopRun() (it’s running on the tap thread’s current run loop). This can leave the tap thread running and may cause hotkey reloads to create multiple active taps. Store the tap thread’s run loop (e.g., in self._tap_runloop) inside _run_tap() and stop that run loop in stop() (or signal the thread and call CFRunLoopStop on the correct loop).
Suggested change:

```python
    # Stop the tap thread's run loop so the thread exits.
    # Prefer a stored tap run loop if available; otherwise fall back.
    tap_runloop = getattr(self, "_tap_runloop", None)
    if tap_runloop is not None:
        Quartz.CFRunLoopStop(tap_runloop)
    else:
        Quartz.CFRunLoopStop(Quartz.CFRunLoopGetMain())
except Exception:
    # Swallowing exceptions here preserves existing behavior,
    # but avoids crashing if stopping the run loop fails.
```
```python
hf_hub_download(
    repo_id=spec.repo_id,
    filename=spec.filename,
    local_dir=str(dest_dir),
```
hf_hub_download() here uses local_dir=... but does not set local_dir_use_symlinks=False (unlike ensure_model() in docfinder.rag.llm). On Windows (and some locked-down environments) HF’s default symlink behavior can fail or require elevated privileges. Consider passing local_dir_use_symlinks=False for consistency and portability.
Suggested change:

```python
    local_dir=str(dest_dir),
    local_dir_use_symlinks=False,
```
Description
Add a fully local RAG (Retrieval-Augmented Generation) system to DocFinder, enabling users to ask questions about
their documents using a local LLM. No data leaves the user's machine. The system automatically selects the best
model based on available RAM, downloads it once, and uses GPU acceleration when available (Metal on Apple
Silicon, CUDA on NVIDIA). Context retrieval is page-aware: PDF uses real pages, Markdown splits on headings, Word
groups paragraphs, and plain text uses virtual pages.
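As a sketch of the plain-text case described above, virtual pagination could look like the following (`virtual_pages` is an illustrative helper, not necessarily the PR's implementation):

```python
from typing import List, Tuple

def virtual_pages(text: str, page_chars: int = 3000) -> List[Tuple[int, str]]:
    """Carve plain text into 1-based (page_number, text) "virtual pages"
    of roughly page_chars characters, the same shape that real PDF pages
    feed into the page-aware chunker."""
    return [
        (i // page_chars + 1, text[i:i + page_chars])
        for i in range(0, len(text), page_chars)
    ]
```

Because every format is normalized to (page_number, text) pairs, the downstream chunking and retrieval code never needs to know whether a "page" was a real PDF page, a Markdown section, or a 3000-character slice.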
Changes Made
- New docfinder.rag package: llm.py handles local LLM management (model tiers, RAM-based model selection, download via HuggingFace Hub, GPU detection (Metal/CUDA/CPU), and a llama-cpp-python inference wrapper). engine.py provides a RAGEngine class coordinating Searcher + LocalLLM with token-budget-aware context assembly.
- Page-aware context retrieval: get_context_by_page() starts at the matched page and expands symmetrically to adjacent pages until the token budget is filled. Falls back to get_context_window() (±10 chunks) for documents indexed without page metadata.
- Multi-format ingestion (PDF/TXT/MD/DOCX) with page provenance preserved through the chunking process. Each format uses its natural structure: PDF real pages, Markdown headings, Word paragraph groups (10), plain text virtual pages (~3000 chars).
- New RAG endpoints: /rag/models, /rag/download and /rag/download/status (background download with real-time byte-level progress via HF tqdm patching), /rag/chat (question answering over a page-aware context window).
- RAG settings UI: model cards with size/RAM/recommended badges and a real-time download progress bar that persists across tab switches.
- Chat panel with Enter-to-send; the chat button is visible only when RAG is enabled and a model is downloaded.