
Feature/rag #29

Merged

filippostanghellini merged 7 commits into main from feature/rag
Mar 12, 2026

Conversation

@filippostanghellini (Owner)

Description

Add a fully local RAG (Retrieval-Augmented Generation) system to DocFinder, enabling users to ask questions about
their documents using a local LLM. No data leaves the user's machine. The system automatically selects the best
model based on available RAM, downloads it once, and uses GPU acceleration when available (Metal on Apple
Silicon, CUDA on NVIDIA). Context retrieval is page-aware: PDF uses real pages, Markdown splits on headings, Word
groups paragraphs, and plain text uses virtual pages.
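The RAM-based selection described above can be sketched as a simple tier table. The tier names come from this PR; the RAM thresholds below are illustrative assumptions, not DocFinder's actual cutoffs.

```python
# Hedged sketch of RAM-based model tier selection. Thresholds are
# assumptions for illustration; only the tier names come from the PR.
MODEL_TIERS = [
    (24, "Qwen2.5-7B"),    # plenty of RAM: largest tier
    (12, "Qwen2.5-3B"),    # mid-range machines
    (0, "Qwen2.5-1.5B"),   # low-RAM fallback
]


def select_model(total_ram_gb: float) -> str:
    """Return the first tier whose RAM threshold the machine meets."""
    for min_ram_gb, name in MODEL_TIERS:
        if total_ram_gb >= min_ram_gb:
            return name
    return MODEL_TIERS[-1][1]


print(select_model(32.0))  # Qwen2.5-7B
print(select_model(8.0))   # Qwen2.5-1.5B
```

Ordering the tiers from largest to smallest means the first matching threshold wins, so the biggest model that fits is always picked.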

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📝 Documentation update
  • 🎨 Code style update (formatting, renaming)
  • ♻️ Code refactor (no functional changes)
  • ⚡ Performance improvement
  • ✅ Test update
  • 🔧 Configuration change
  • 🏗️ Build/CI update

Changes Made

  • New src/docfinder/rag/ module — llm.py handles model selection (Qwen2.5-7B/3B/1.5B based on RAM), GGUF download
    via HuggingFace Hub, GPU detection (Metal/CUDA/CPU), and llama-cpp-python inference wrapper. engine.py provides
    RAGEngine class coordinating Searcher + LocalLLM with token-budget-aware context assembly.
  • Page-aware context retrieval — new get_context_by_page() in storage.py retrieves chunks by page/section and
    expands symmetrically to adjacent pages until the token budget is filled. Falls back to get_context_window() (±10
    chunks) for documents indexed without page metadata.
  • Page-aware chunking pipeline — chunk_text_stream_paged() in utils/text.py propagates page provenance through
    the chunking process. Each format uses its natural structure: PDF real pages, Markdown headings, Word paragraph
    groups (10), plain text virtual pages (~3000 chars).
  • RAG backend endpoints — /rag/models (list available models with recommended flag), /rag/download +
    /rag/download/status (background download with real-time byte-level progress via HF tqdm patching), /rag/chat
    (question answering over page-aware context window).
  • RAG Settings UI — "AI Chat (RAG)" section in Settings with enable/disable toggle, hardware detection, three
    model cards with size/RAM/recommended badge, real-time download progress bar that persists across tab switches.
  • Chat panel UI — slide-up panel from bottom-right on search results, with conversation history, auto-focus,
    Enter-to-send. Chat button conditionally visible only when RAG is enabled and model is downloaded.
  • Version bump to 2.0.0 across pyproject.toml, __init__.py, app.py, and CHANGELOG.md.
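The plain-text "virtual page" scheme mentioned above (~3000 characters per page) can be sketched as a fixed-size windowing pass. The function name and exact split are assumptions for illustration; the real pipeline feeds pages like these into chunk_text_stream_paged().

```python
from typing import Iterator, Tuple

# Hedged sketch of virtual paging for plain text (~3000 chars per page).
VIRTUAL_PAGE_CHARS = 3000


def iter_virtual_pages(text: str, page_chars: int = VIRTUAL_PAGE_CHARS) -> Iterator[Tuple[int, str]]:
    """Yield 1-based (page_number, text) pairs in fixed-size character windows."""
    for start in range(0, len(text), page_chars):
        yield start // page_chars + 1, text[start : start + page_chars]


pages = list(iter_virtual_pages("x" * 7000))
print([(n, len(t)) for n, t in pages])  # [(1, 3000), (2, 3000), (3, 1000)]
```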

Testing

  • Existing tests pass locally (198 passed, 1 skipped)
  • Added new tests for the changes (31 tests in test_rag.py)
  • Manual testing performed

Test details

  • TestGetContextWindow (8 tests) — fixed-window retrieval, clamping, ordering, edge cases
  • TestGetContextByPage (7 tests) — page-based retrieval, expansion, max_chars budget, ordering
  • TestChunkTextStreamPaged (3 tests) — page provenance through chunking pipeline
  • TestTxtPaged (3 tests) — plain text virtual paging
  • TestMdPaged (3 tests) — Markdown heading-based sections
  • TestDocxPaged (1 test) — Word paragraph grouping
  • TestSelectModel (5 tests) — RAM-based model tier selection
  • TestRAGEngineContextAssembly (2 tests) — context trimming and ordering
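The fixed-window fallback exercised by TestGetContextWindow (±10 chunks, clamped at document boundaries) can be sketched as follows. The signature is an assumption: the real get_context_window() works against the index store, not a plain id list.

```python
from typing import List

# Hedged sketch of a ±radius fixed-window retrieval with boundary clamping.
def context_window(chunk_ids: List[int], center: int, radius: int = 10) -> List[int]:
    """Return chunk ids within ``radius`` of ``center``, clamped to the
    document's bounds and returned in ascending order."""
    lo = max(center - radius, min(chunk_ids))
    hi = min(center + radius, max(chunk_ids))
    return [cid for cid in sorted(chunk_ids) if lo <= cid <= hi]


ids = list(range(30))  # a document with 30 chunks
print(context_window(ids, center=2))   # clamped at the start: 0..12
print(context_window(ids, center=15))  # full window: 5..25
```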

filippostanghellini and others added 5 commits March 11, 2026 23:26
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The model is recommended based on the computer's specs; the context is optimized for a trade-off between quality and efficiency.
Copilot AI review requested due to automatic review settings March 12, 2026 16:37
@github-actions

github-actions bot commented Mar 12, 2026

Dependency Review

The following issues were found:
  • ✅ 0 vulnerable package(s)
  • ✅ 0 package(s) with incompatible licenses
  • ✅ 0 package(s) with invalid SPDX license definitions
  • ⚠️ 1 package(s) with unknown licenses.
See the Details below.

License Issues

pyproject.toml

| Package | Version | License | Issue Type |
| --- | --- | --- | --- |
| python-docx | >= 1.1.0 | Null | Unknown License |

OpenSSF Scorecard

| Package | Version | Score | Details |
| --- | --- | --- | --- |
| pip/python-docx | >= 1.1.0 | Unknown | Unknown |

Scanned Files

  • pyproject.toml

@codecov-commenter

codecov-commenter commented Mar 12, 2026

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 32.65306% with 594 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/docfinder/gui.py | 0.00% | 241 Missing ⚠️ |
| src/docfinder/web/app.py | 16.88% | 192 Missing ⚠️ |
| src/docfinder/ingestion/pdf_loader.py | 57.51% | 65 Missing ⚠️ |
| src/docfinder/rag/llm.py | 37.03% | 51 Missing ⚠️ |
| src/docfinder/rag/engine.py | 57.64% | 36 Missing ⚠️ |
| src/docfinder/index/indexer.py | 75.00% | 4 Missing ⚠️ |
| src/docfinder/utils/files.py | 75.00% | 2 Missing ⚠️ |
| src/docfinder/index/storage.py | 97.22% | 1 Missing ⚠️ |
| src/docfinder/settings.py | 0.00% | 1 Missing ⚠️ |
| src/docfinder/web/frontend.py | 83.33% | 1 Missing ⚠️ |


Copilot AI left a comment

Pull request overview

This PR adds a fully local RAG (Retrieval-Augmented Generation) capability to DocFinder and expands indexing beyond PDFs (TXT/MD/DOCX), with supporting backend endpoints, UI updates, and test coverage.

Changes:

  • Introduces docfinder.rag (local LLM selection/download + a RAG orchestration engine) and new RAG HTTP endpoints (/rag/models, /rag/download, /rag/chat).
  • Adds page-aware chunking + retrieval (chunk_text_stream_paged, get_context_by_page, get_context_window) to support better context assembly.
  • Expands ingestion/indexing and UX to support multi-format documents and adds a Spotlight-style quick-search panel on macOS.

Reviewed changes

Copilot reviewed 26 out of 28 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| tests/test_rag.py | New tests for context window/page retrieval, page-aware chunking, model selection, and RAG engine assembly. |
| tests/test_pdf_loader.py | Updates mocks to use the new paged text iterator for PDF chunk building. |
| tests/test_indexer.py | Updates patch points from PDF-only discovery to multi-document discovery. |
| tests/test_imports.py | Updates version assertion to 2.0.0. |
| tests/test_frontend.py | Updates frontend template loader signature to accept a template name. |
| tests/test_cli.py | Updates CLI messaging from “PDFs” to “supported documents”. |
| src/docfinder/web/templates/spotlight.html | Adds dedicated Spotlight panel HTML/JS for quick search. |
| src/docfinder/web/templates/index.html | Major UI redesign + adds in-app Spotlight overlay, indexing warnings, and RAG settings + chat panel. |
| src/docfinder/web/frontend.py | Template loader now supports multiple templates and serves /spotlight. |
| src/docfinder/web/app.py | Adds RAG endpoints, system RAM endpoint, indexing scan endpoint, exclude-path support, and embed batch sizing by RAM. |
| src/docfinder/utils/text.py | Adds chunk_text_stream_paged() for page-provenance chunking. |
| src/docfinder/utils/files.py | Adds supported extensions + iter_document_paths(); keeps iter_pdf_paths() for compatibility. |
| src/docfinder/settings.py | Changes default hotkey to <alt>+d. |
| src/docfinder/rag/llm.py | New local LLM management: model tiers, RAM-based selection, download, and llama-cpp wrapper. |
| src/docfinder/rag/engine.py | New RAG engine coordinating search + context assembly + generation. |
| src/docfinder/rag/__init__.py | Adds RAG package init. |
| src/docfinder/ingestion/pdf_loader.py | Generalizes ingestion to PDF/TXT/MD/DOCX; adds paged iterators and stores page in chunk metadata. |
| src/docfinder/index/storage.py | Search now returns document_id; adds get_context_window() + get_context_by_page(). |
| src/docfinder/index/indexer.py | Generalizes indexer to index supported docs; adds exclusions and embedding batch-size override support. |
| src/docfinder/gui.py | Adds native macOS Spotlight NSPanel + CGEventTap-based global hotkey handling; integrates with web endpoints. |
| src/docfinder/embedding/encoder.py | Adds optional batch_size override to EmbeddingModel.embed(). |
| src/docfinder/cli.py | Updates CLI to index supported document types instead of PDFs only. |
| src/docfinder/__init__.py | Bumps package version to 2.0.0. |
| pyproject.toml | Bumps version + adds python-docx dependency and [rag] extra for llama-cpp-python. |
| README.md | Updates messaging and demo media for multi-format + new UX. |
| CLAUDE.md | Adds repo command/architecture guidance for Claude Code tooling. |
| CHANGELOG.md | Adds 2.0.0 release notes and updates compare links. |


```python
        "error": None,
    }
```
Copilot AI Mar 12, 2026

RAG globals are accessed/modified from multiple threads without synchronization: _rag_llm_lock is defined but never used, and _rag_download is mutated inside _load_rag_llm() (run via asyncio.to_thread) while being read by /rag/download/status. This can lead to inconsistent status/progress and makes future changes riskier. Use _rag_llm_lock (or a dedicated lock) to guard updates to _rag_llm and _rag_download (or replace _rag_download with an immutable snapshot updated atomically).

Suggested change:

```python
def _get_rag_download_state() -> dict[str, Any]:
    """Return a thread-safe snapshot of the RAG download state."""
    with _rag_llm_lock:
        # Return a shallow copy so callers cannot mutate the shared dict
        return dict(_rag_download)
```

Copilot uses AI. Check for mistakes.
Comment on lines +288 to +309:

```python
    # Collect pages: start with center_page, then expand symmetrically
    collected: List[dict] = []
    total_chars = 0

    # Add center page first
    for c in chunks_with_page:
        if c["page"] == center_page:
            collected.append(c)
            total_chars += len(c["text"])

    # Expand to adjacent pages
    expand = 1
    while total_chars < max_chars:
        added = False
        for offset_page in (center_page - expand, center_page + expand):
            if offset_page < 0:
                continue
            for c in chunks_with_page:
                if c["page"] == offset_page and total_chars < max_chars:
                    collected.append(c)
                    total_chars += len(c["text"])
                    added = True
```
Copilot AI Mar 12, 2026

get_context_by_page() does repeated full scans of chunks_with_page inside the expansion loop (for offset_page ...: for c in chunks_with_page:). For documents with many chunks/pages this becomes O(n * pages_expanded) and can be noticeably slow in RAG. Consider pre-grouping chunks by page (e.g., page -> [chunks]) once, then appending from the adjacent pages without re-scanning the entire chunk list each time (and consider skipping page 0 by checking offset_page <= 0).

Suggested change (replacing the collection loop from the commented lines with a pre-grouped version):

```python
    # Group chunks by page for efficient access during expansion
    page_to_chunks = {}
    for c in chunks_with_page:
        page_to_chunks.setdefault(c["page"], []).append(c)

    # Collect pages: start with center_page, then expand symmetrically
    collected: List[dict] = []
    total_chars = 0

    # Add center page first
    for c in page_to_chunks.get(center_page, []):
        collected.append(c)
        total_chars += len(c["text"])

    # Expand to adjacent pages
    expand = 1
    while total_chars < max_chars:
        added = False
        for offset_page in (center_page - expand, center_page + expand):
            # Skip non-positive pages; page numbers are expected to be 1-based
            if offset_page <= 0:
                continue
            for c in page_to_chunks.get(offset_page, []):
                if total_chars >= max_chars:
                    break
                collected.append(c)
                total_chars += len(c["text"])
                added = True
```


```python
@app.post("/index/scan")
async def scan_index_paths(payload: ScanPayload) -> dict[str, Any]:
    """Scan paths for PDFs and return file stats without indexing."""
```
Copilot AI Mar 12, 2026

scan_index_paths() docstring says it scans paths for PDFs, but the implementation uses iter_document_paths() and returns counts by extension for multiple document types. Update the docstring to match the current behavior (supported documents rather than PDFs only).

Suggested change:

```python
    """Scan the given paths for supported document files and return file stats (including counts by extension) without indexing."""
```

Comment on lines +212 to +217:

```python
        pages = [(1, "A" * 60), (2, "B" * 60)]
        results = list(chunk_text_stream_paged(pages, max_chars=100, overlap=0))

        assert len(results) >= 1
        # First chunk starts on page 1
        assert results[0][1] == 1
```
Copilot AI Mar 12, 2026

test_chunk_spanning_pages only asserts the first chunk’s page number. Given the intended contract (“page_number is the page that contributed the start of the chunk”), add an assertion for the next yielded chunk when overlap=0 (it should start on page 2 in this setup). This would catch incorrect page provenance when chunk_text_stream_paged() slices the buffer past the end of the first page.

Suggested change:

```python
        pages = [(1, "A" * 60), (2, "B" * 60)]
        results = list(chunk_text_stream_paged(pages, max_chars=100, overlap=0))

        # We expect exactly two chunks: 100 chars, then 20 chars.
        assert len(results) == 2
        # First chunk starts on page 1
        assert results[0][1] == 1
        # Second chunk starts on page 2
        assert results[1][1] == 2
```


```python
    @staticmethod
    def _assemble_context_text(chunks: List[dict], results: List[SearchResult]) -> str:
        """Join chunk texts, trimming symmetrically if the total exceeds the budget."""
```
Copilot AI Mar 12, 2026

_assemble_context_text() docstring says it “trims symmetrically”, but the implementation truncates by stopping once the global character budget is reached while iterating documents/chunks in order. Either adjust the docstring to match the actual behavior, or implement true symmetric trimming around the highest-relevance chunks if that’s the intent.

Suggested change:

```python
        """Join chunk texts into a single context string, truncating once the
        global character budget is reached.

        Chunks are grouped by document path and ordered by their chunk index.
        Text is appended in that order until ``_MAX_CONTEXT_CHARS`` is exceeded;
        any remaining chunks are omitted (prefix-style trimming, not symmetric)."""
```

Comment on lines +56 to +78:

```python
    # Track which page the start of the buffer came from
    buf_page = 0
    step = max(max_chars - overlap, 1)

    for page_num, part in pages:
        if not buffer:
            buf_page = page_num
        buffer += part
        while len(buffer) >= max_chars:
            yield buffer[:max_chars], buf_page
            buffer = buffer[step:]
            # After slicing, the start of the buffer has shifted.
            # The overlap region still belongs to the old page, so
            # we keep buf_page unchanged — it's the page of the
            # beginning of the chunk. It will be updated when new
            # text is appended from the next page.
            if not buffer:
                buf_page = page_num

    if buffer:
        yield buffer, buf_page
```

Copilot AI Mar 12, 2026

chunk_text_stream_paged() can assign the wrong page_number once a chunk spans pages and overlap is small/zero. After yielding a chunk, buffer = buffer[step:] may advance the buffer start into a later page, but buf_page is intentionally left unchanged, so subsequent yielded chunks can be labeled with the previous page even though they start on the new page. Consider tracking page boundaries in the buffer (e.g., buffer as a deque of (page,text) segments) and updating buf_page when slicing crosses into the next segment, or recomputing the start page based on consumed lengths.

Suggested change (replacing the commented lines with a segment-tracking implementation):

```python
    # Track segments of the buffer with their originating page numbers.
    # Each entry is (start_index, end_index, page_num) relative to the current buffer.
    segments: list[tuple[int, int, int]] = []
    step = max(max_chars - overlap, 1)

    for page_num, part in pages:
        if not part:
            continue
        # Append new text and record its segment range in the buffer.
        start = len(buffer)
        buffer += part
        end = len(buffer)
        segments.append((start, end, page_num))

        # Emit chunks while we have enough buffered text.
        while len(buffer) >= max_chars and segments:
            # The page for the start of the chunk is the page of buffer index 0,
            # which is always the page of the first remaining segment.
            current_page = segments[0][2]
            yield buffer[:max_chars], current_page
            # Advance the buffer by `step` characters and update segments accordingly.
            consume = step
            buffer = buffer[consume:]
            if not buffer:
                segments = []
                break
            new_segments: list[tuple[int, int, int]] = []
            for seg_start, seg_end, seg_page in segments:
                seg_start -= consume
                seg_end -= consume
                # Drop segments fully consumed by the shift.
                if seg_end <= 0:
                    continue
                # Clamp partially consumed segments to start at 0.
                if seg_start < 0:
                    seg_start = 0
                new_segments.append((seg_start, seg_end, seg_page))
            segments = new_segments

    if buffer and segments:
        # Remaining buffer is shorter than max_chars; its start page is the
        # page of the first segment.
        yield buffer, segments[0][2]
```

Comment on lines +627 to +631:

```python
            # Stop the run loop so the thread exits
            Quartz.CFRunLoopStop(
                Quartz.CFRunLoopGetMain()
            )
        except Exception:
```
Copilot AI Mar 12, 2026

GlobalHotkeyManager.stop() disables the CGEventTap but attempts to stop the run loop via CFRunLoopStop(CFRunLoopGetMain()), which does not stop the tap thread’s CFRunLoopRun() (it’s running on the tap thread’s current run loop). This can leave the tap thread running and may cause hotkey reloads to create multiple active taps. Store the tap thread’s run loop (e.g., in self._tap_runloop) inside _run_tap() and stop that run loop in stop() (or signal the thread and call CFRunLoopStop on the correct loop).

Suggested change:

```python
            # Stop the tap thread's run loop so the thread exits.
            # Prefer a stored tap run loop if available; otherwise fall back.
            tap_runloop = getattr(self, "_tap_runloop", None)
            if tap_runloop is not None:
                Quartz.CFRunLoopStop(tap_runloop)
            else:
                Quartz.CFRunLoopStop(Quartz.CFRunLoopGetMain())
        except Exception:
            # Swallowing exceptions here preserves existing behavior,
            # but avoids crashing if stopping the run loop fails.
```

```python
    hf_hub_download(
        repo_id=spec.repo_id,
        filename=spec.filename,
        local_dir=str(dest_dir),
```
Copilot AI Mar 12, 2026

hf_hub_download() here uses local_dir=... but does not set local_dir_use_symlinks=False (unlike ensure_model() in docfinder.rag.llm). On Windows (and some locked-down environments) HF’s default symlink behavior can fail or require elevated privileges. Consider passing local_dir_use_symlinks=False for consistency and portability.

Suggested change:

```python
        local_dir=str(dest_dir),
        local_dir_use_symlinks=False,
```

filippostanghellini merged commit e2f2337 into main Mar 12, 2026
21 checks passed
filippostanghellini deleted the feature/rag branch March 13, 2026 12:16