Indexing optimization, UI improvement and v1.2.0 release preparation by filippostanghellini · Pull Request #28 · filippostanghellini/DocFinder

filippostanghellini · 2026-03-10T22:49:07Z

No description provided.

github-actions · 2026-03-10T22:49:18Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

src/docfinder/gui.py

Copilot

Pull request overview

This PR prepares the v1.2.0 release by improving indexing/search performance, adding desktop-oriented UX (progress polling, folder picker, global hotkey + settings), and refreshing the web UI styling and interactions.

Changes:

Speed up indexing via batched SQLite inserts, larger embedding/index batches, and a singleton embedder warmed at app startup.
Add async indexing jobs with /index/status/{job_id} polling + corresponding UI progress bar.
Introduce persistent user settings (global hotkey) and desktop integrations (pywebview folder picker, hotkey reload, macOS dock icon fix, Windows icon-parameter compatibility).

Reviewed changes

Copilot reviewed 11 out of 15 changed files in this pull request and generated 13 comments.

Show a summary per file

File	Description
src/docfinder/web/templates/index.html	Major UI refresh; adds progress polling, settings tab + hotkey capture modal, folder picker integration, and new JS flows.
src/docfinder/web/app.py	Adds embedder singleton + warm startup, async indexing job registry/status endpoint, and settings GET/POST APIs.
src/docfinder/settings.py	New persistent settings module storing user preferences (hotkey + enabled flag).
src/docfinder/index/storage.py	Speeds up chunk inserts with `executemany()` batching.
src/docfinder/index/indexer.py	Adds progress callback support and increases chunk batch size.
src/docfinder/gui.py	Adds global hotkey manager (pynput), desktop JS API, and macOS/Windows icon handling improvements.
src/docfinder/embedding/encoder.py	Raises embedding batch size and removes manual mini-batching/GC loop.
pyproject.toml	Bumps version to 1.2.0; adds GUI deps for hotkey + macOS Cocoa framework.
README.md	Streamlines docs and updates run-from-source instructions.
Makefile	Adds `setup`, `run`, and `run-web` convenience targets.
CHANGELOG.md	Adds v1.2.0 release notes and link updates.
images/index-documents.png, images/database-documents.png	Screenshot assets referenced by docs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-10T22:55:27Z

src/docfinder/web/app.py

+    # Pre-load the embedding model at startup so the first request is instant
+    await asyncio.to_thread(_get_embedder)


The lifespan hook eagerly preloads the embedding model on startup. In the test suite, the first request is often an unpatched call (e.g., empty-query validation), which will still trigger startup and load the real SentenceTransformer model, making tests slow/flaky. Consider guarding preload behind an env flag (disabled in tests) or making preload optional so startup validation-only requests don't force model initialization.

Suggested change

# Pre-load the embedding model at startup so the first request is instant

await asyncio.to_thread(_get_embedder)

# Optionally pre-load the embedding model at startup so the first request is instant.

# Controlled by DOCFINDER_PRELOAD_EMBEDDINGS (default: enabled). Set to "0"/"false"/"no" to disable.

preload_flag = os.getenv("DOCFINDER_PRELOAD_EMBEDDINGS", "1").lower()

if preload_flag not in ("0", "false", "no"):

await asyncio.to_thread(_get_embedder)

Copilot · 2026-03-10T22:55:28Z

src/docfinder/web/app.py

+    job_id = str(uuid.uuid4())
+    job: dict[str, Any] = {
+        "id": job_id,
+        "status": "running",
+        "processed": 0,
+        "total": 0,
+        "current_file": "",
+        "stats": None,
+        "error": None,
+    }
+    _index_jobs[job_id] = job


_index_jobs is an unbounded global registry and completed/errored jobs are never evicted. Repeated indexing will grow memory usage over time. Consider removing jobs after a TTL, capping the dict size, or deleting entries once the frontend has fetched final stats.

Copilot · 2026-03-10T22:55:28Z

src/docfinder/web/app.py

+@app.get("/index/status/{job_id}")
+async def get_index_status(job_id: str) -> dict[str, Any]:
+    """Poll the status of a running or completed indexing job."""
+    job = _index_jobs.get(job_id)
+    if job is None:
+        raise HTTPException(status_code=404, detail="Job not found")
+    return job


New async indexing workflow adds /index/status/{job_id}, but there are no tests validating job creation, polling transitions (running→complete/error), or 404 behavior. Since tests/test_web_app.py already covers /index, add coverage for the status endpoint to ensure progress polling works and stays stable.

Copilot · 2026-03-10T22:55:28Z

src/docfinder/web/templates/index.html

+        const safePath = r.path.replace(/'/g, "\\'");
+        const title = r.title || r.path.split('/').pop();
+        const pct   = Math.round(r.score * 100);


safePath only escapes single quotes before being embedded into an inline JS string. On Windows, paths contain backslashes (e.g., C:\docs\a.pdf), which will be interpreted as escape sequences inside the generated JS and can corrupt the path or break the handler. Prefer avoiding inline onclick entirely (attach a click listener and store the path in a data attribute), or at minimum escape backslashes/newlines and generate the argument via JSON.stringify.

Copilot · 2026-03-10T22:55:28Z

src/docfinder/web/templates/index.html

+            const interval = setInterval(async () => {
+              try {
+                const statusRes = await fetch(`${apiBase}/index/status/${jobId}`);
+                const job = await statusRes.json();
+                if (job.status === 'running') {
+                  indexStatus.innerHTML = renderProgress(job);
+                } else if (job.status === 'complete') {
+                  clearInterval(interval);
+                  resolve(job);
+                } else if (job.status === 'error') {
+                  clearInterval(interval);
+                  reject(new Error(job.error || 'Indexing failed'));
+                }
+              } catch (err) { clearInterval(interval); reject(err); }
+            }, 600);


Using setInterval with an async callback can cause overlapping in-flight fetches if a request takes longer than the interval (600ms), leading to out-of-order UI updates and unnecessary load. Prefer a recursive setTimeout that schedules the next poll only after the previous one finishes (or guard with an inFlight flag).

Suggested change

const interval = setInterval(async () => {

try {

const statusRes = await fetch(`${apiBase}/index/status/${jobId}`);

const job = await statusRes.json();

if (job.status === 'running') {

indexStatus.innerHTML = renderProgress(job);

} else if (job.status === 'complete') {

clearInterval(interval);

resolve(job);

} else if (job.status === 'error') {

clearInterval(interval);

reject(new Error(job.error || 'Indexing failed'));

}

} catch (err) { clearInterval(interval); reject(err); }

}, 600);

const poll = async () => {

try {

const statusRes = await fetch(`${apiBase}/index/status/${jobId}`);

const job = await statusRes.json();

if (job.status === 'running') {

indexStatus.innerHTML = renderProgress(job);

setTimeout(poll, 600);

} else if (job.status === 'complete') {

resolve(job);

} else if (job.status === 'error') {

reject(new Error(job.error || 'Indexing failed'));

} else {

// Unknown status: poll again conservatively

setTimeout(poll, 600);

}

} catch (err) {

reject(err);

}

};

poll();

Copilot · 2026-03-10T22:55:29Z

src/docfinder/web/app.py

+@app.get("/settings")
+async def get_settings() -> dict:
+    """Return current user settings."""
+    return load_settings()
+
+
+@app.post("/settings")
+async def update_settings(payload: SettingsPayload) -> dict:
+    """Persist updated settings and return the full settings dict."""
+    current = load_settings()
+    if payload.hotkey is not None:
+        current["hotkey"] = payload.hotkey
+    if payload.hotkey_enabled is not None:
+        current["hotkey_enabled"] = payload.hotkey_enabled
+    _save_settings(current)
+    return current


New /settings endpoints were added, but there are existing comprehensive tests for docfinder.web.app and no tests currently cover settings load/update behavior. Add tests for GET/POST /settings (including partial updates) to prevent regressions in persistence/serialization.

Copilot · 2026-03-10T22:55:29Z

src/docfinder/web/templates/index.html

+        resultsMeta.style.display = 'none';
+        resultsList.innerHTML = '<div class="loading"><div class="spinner"></div><span>Searching…</span></div>';
+


resultsList is a <ul>, but loading/empty/error states are injected as a <div> via innerHTML. This produces invalid list markup and can confuse screen readers. Render these states as <li> elements (or move them to a separate container outside the <ul>).

Copilot · 2026-03-10T22:55:29Z

src/docfinder/web/templates/index.html

+      document.addEventListener('keydown', e => {
+        if ((e.metaKey || e.ctrlKey) && e.key === 'k') {
+          e.preventDefault();
+          document.querySelector('[data-tab="search"]').click();
+          setTimeout(() => document.getElementById('query').focus(), 60);
+        }
+      });


The global ⌘K/Ctrl+K handler runs even while the hotkey capture modal is open. Because it's registered earlier than the capture modal keydown handler, pressing ⌘K during capture will switch tabs instead of recording the combo. Add a guard (e.g., if #hotkey-modal is active, ignore) or register the capture handler in the capture phase so it always wins.

Copilot · 2026-03-10T22:55:29Z

src/docfinder/web/templates/index.html

+              try {
+                const statusRes = await fetch(`${apiBase}/index/status/${jobId}`);
+                const job = await statusRes.json();
+                if (job.status === 'running') {
+                  indexStatus.innerHTML = renderProgress(job);
+                } else if (job.status === 'complete') {
+                  clearInterval(interval);
+                  resolve(job);


Progress polling doesn't check statusRes.ok. If the status endpoint returns 404/500 (or a non-job JSON like {detail: ...}), job.status will be undefined and the interval will keep running forever. Handle non-2xx responses by clearing the interval and surfacing an error to the user.

Copilot · 2026-03-10T22:55:30Z

src/docfinder/web/templates/index.html

-
-          // Render stats
+          const res  = await fetch(`${apiBase}/documents`);
+          const data = await res.json();


The Documents load path parses JSON without checking res.ok. If the server returns an error payload, the code will try to read data.stats and likely throw a generic JS exception, which then becomes a low-quality UI error. Check res.ok and raise using data.detail (similar to the search and delete flows).

Suggested change

const data = await res.json();

const data = await res.json();

if (!res.ok) {

throw new Error(data.detail || `Failed to load documents (status ${res.status})`);

}

codecov-commenter · 2026-03-11T02:06:28Z

⚠️ Please install the to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 37.87234% with 146 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/docfinder/gui.py	0.00%	95 Missing ⚠️
src/docfinder/web/app.py	68.75%	25 Missing ⚠️
src/docfinder/settings.py	35.13%	24 Missing ⚠️
src/docfinder/index/indexer.py	88.23%	2 Missing ⚠️

📢 Thoughts on this report? Let us know!

filippostanghellini added 4 commits March 10, 2026 19:54

feat: optimize indexing method and update UI

1365da3

release: v1.2.0, Windows crash fix, 2-4x faster indexing, UI overhaul

000ab5d

release: v.1.2.0

0c76b35

update changelog.md

1971c13

Copilot AI review requested due to automatic review settings March 10, 2026 22:49

Copilot started reviewing on behalf of filippostanghellini March 10, 2026 22:49 View session

github-advanced-security bot found potential problems Mar 10, 2026

View reviewed changes

src/docfinder/gui.py Dismissed Show dismissed Hide dismissed

src/docfinder/gui.py Dismissed Show dismissed Hide dismissed

filippostanghellini added 2 commits March 10, 2026 23:50

ruff check

295a756

update test

7ee05c1

Copilot AI reviewed Mar 10, 2026

View reviewed changes

filippostanghellini merged commit 57e1108 into main Mar 10, 2026
21 checks passed

filippostanghellini deleted the feature/optimize-indexing branch March 10, 2026 23:20

		# Pre-load the embedding model at startup so the first request is instant
		await asyncio.to_thread(_get_embedder)

-    # Pre-load the embedding model at startup so the first request is instant
-    await asyncio.to_thread(_get_embedder)
+    # Optionally pre-load the embedding model at startup so the first request is instant.
+    # Controlled by DOCFINDER_PRELOAD_EMBEDDINGS (default: enabled). Set to "0"/"false"/"no" to disable.
+    preload_flag = os.getenv("DOCFINDER_PRELOAD_EMBEDDINGS", "1").lower()
+    if preload_flag not in ("0", "false", "no"):
+        await asyncio.to_thread(_get_embedder)

		resultsMeta.style.display = 'none';
		resultsList.innerHTML = '<div class="loading"><div class="spinner"></div><span>Searching…</span></div>';

Conversation

filippostanghellini commented Mar 10, 2026

Uh oh!

github-actions bot commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Dependency Review

Scanned Files

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codecov-commenter commented Mar 11, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

github-actions bot commented Mar 10, 2026 •

edited

Loading