Skip to content

Indexing optimization, UI improvement and v1.2.0 release preparation#28

Merged
filippostanghellini merged 6 commits intomainfrom
feature/optimize-indexing
Mar 10, 2026
Merged

Indexing optimization, UI improvement and v1.2.0 release preparation#28
filippostanghellini merged 6 commits intomainfrom
feature/optimize-indexing

Conversation

@filippostanghellini
Copy link
Copy Markdown
Owner

No description provided.

Copilot AI review requested due to automatic review settings March 10, 2026 22:49
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 10, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR prepares the v1.2.0 release by improving indexing/search performance, adding desktop-oriented UX (progress polling, folder picker, global hotkey + settings), and refreshing the web UI styling and interactions.

Changes:

  • Speed up indexing via batched SQLite inserts, larger embedding/index batches, and a singleton embedder warmed at app startup.
  • Add async indexing jobs with /index/status/{job_id} polling + corresponding UI progress bar.
  • Introduce persistent user settings (global hotkey) and desktop integrations (pywebview folder picker, hotkey reload, macOS dock icon fix, Windows icon-parameter compatibility).

Reviewed changes

Copilot reviewed 11 out of 15 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
src/docfinder/web/templates/index.html Major UI refresh; adds progress polling, settings tab + hotkey capture modal, folder picker integration, and new JS flows.
src/docfinder/web/app.py Adds embedder singleton + warm startup, async indexing job registry/status endpoint, and settings GET/POST APIs.
src/docfinder/settings.py New persistent settings module storing user preferences (hotkey + enabled flag).
src/docfinder/index/storage.py Speeds up chunk inserts with executemany() batching.
src/docfinder/index/indexer.py Adds progress callback support and increases chunk batch size.
src/docfinder/gui.py Adds global hotkey manager (pynput), desktop JS API, and macOS/Windows icon handling improvements.
src/docfinder/embedding/encoder.py Raises embedding batch size and removes manual mini-batching/GC loop.
pyproject.toml Bumps version to 1.2.0; adds GUI deps for hotkey + macOS Cocoa framework.
README.md Streamlines docs and updates run-from-source instructions.
Makefile Adds setup, run, and run-web convenience targets.
CHANGELOG.md Adds v1.2.0 release notes and link updates.
images/index-documents.png, images/database-documents.png Screenshot assets referenced by docs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +54 to +55
# Pre-load the embedding model at startup so the first request is instant
await asyncio.to_thread(_get_embedder)
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lifespan hook eagerly preloads the embedding model on startup. In the test suite, the first request is often an unpatched call (e.g., empty-query validation), which will still trigger startup and load the real SentenceTransformer model, making tests slow/flaky. Consider guarding preload behind an env flag (disabled in tests) or making preload optional so startup validation-only requests don't force model initialization.

Suggested change
# Pre-load the embedding model at startup so the first request is instant
await asyncio.to_thread(_get_embedder)
# Optionally pre-load the embedding model at startup so the first request is instant.
# Controlled by DOCFINDER_PRELOAD_EMBEDDINGS (default: enabled). Set to "0"/"false"/"no" to disable.
preload_flag = os.getenv("DOCFINDER_PRELOAD_EMBEDDINGS", "1").lower()
if preload_flag not in ("0", "false", "no"):
await asyncio.to_thread(_get_embedder)

Copilot uses AI. Check for mistakes.
Comment on lines +348 to +358
job_id = str(uuid.uuid4())
job: dict[str, Any] = {
"id": job_id,
"status": "running",
"processed": 0,
"total": 0,
"current_file": "",
"stats": None,
"error": None,
}
_index_jobs[job_id] = job
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_index_jobs is an unbounded global registry and completed/errored jobs are never evicted. Repeated indexing will grow memory usage over time. Consider removing jobs after a TTL, capping the dict size, or deleting entries once the frontend has fetched final stats.

Copilot uses AI. Check for mistakes.
Comment on lines +376 to +382
@app.get("/index/status/{job_id}")
async def get_index_status(job_id: str) -> dict[str, Any]:
"""Poll the status of a running or completed indexing job."""
job = _index_jobs.get(job_id)
if job is None:
raise HTTPException(status_code=404, detail="Job not found")
return job
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New async indexing workflow adds /index/status/{job_id}, but there are no tests validating job creation, polling transitions (running→complete/error), or 404 behavior. Since tests/test_web_app.py already covers /index, add coverage for the status endpoint to ensure progress polling works and stays stable.

Copilot uses AI. Check for mistakes.
Comment on lines +1101 to +1103
const safePath = r.path.replace(/'/g, "\\'");
const title = r.title || r.path.split('/').pop();
const pct = Math.round(r.score * 100);
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

safePath only escapes single quotes before being embedded into an inline JS string. On Windows, paths contain backslashes (e.g., C:\docs\a.pdf), which will be interpreted as escape sequences inside the generated JS and can corrupt the path or break the handler. Prefer avoiding inline onclick entirely (attach a click listener and store the path in a data attribute), or at minimum escape backslashes/newlines and generate the argument via JSON.stringify.

Copilot uses AI. Check for mistakes.
Comment on lines +1225 to +1239
const interval = setInterval(async () => {
try {
const statusRes = await fetch(`${apiBase}/index/status/${jobId}`);
const job = await statusRes.json();
if (job.status === 'running') {
indexStatus.innerHTML = renderProgress(job);
} else if (job.status === 'complete') {
clearInterval(interval);
resolve(job);
} else if (job.status === 'error') {
clearInterval(interval);
reject(new Error(job.error || 'Indexing failed'));
}
} catch (err) { clearInterval(interval); reject(err); }
}, 600);
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using setInterval with an async callback can cause overlapping in-flight fetches if a request takes longer than the interval (600ms), leading to out-of-order UI updates and unnecessary load. Prefer a recursive setTimeout that schedules the next poll only after the previous one finishes (or guard with an inFlight flag).

Suggested change
const interval = setInterval(async () => {
try {
const statusRes = await fetch(`${apiBase}/index/status/${jobId}`);
const job = await statusRes.json();
if (job.status === 'running') {
indexStatus.innerHTML = renderProgress(job);
} else if (job.status === 'complete') {
clearInterval(interval);
resolve(job);
} else if (job.status === 'error') {
clearInterval(interval);
reject(new Error(job.error || 'Indexing failed'));
}
} catch (err) { clearInterval(interval); reject(err); }
}, 600);
const poll = async () => {
try {
const statusRes = await fetch(`${apiBase}/index/status/${jobId}`);
const job = await statusRes.json();
if (job.status === 'running') {
indexStatus.innerHTML = renderProgress(job);
setTimeout(poll, 600);
} else if (job.status === 'complete') {
resolve(job);
} else if (job.status === 'error') {
reject(new Error(job.error || 'Indexing failed'));
} else {
// Unknown status: poll again conservatively
setTimeout(poll, 600);
}
} catch (err) {
reject(err);
}
};
poll();

Copilot uses AI. Check for mistakes.
Comment on lines +232 to +247
@app.get("/settings")
async def get_settings() -> dict:
"""Return current user settings."""
return load_settings()


@app.post("/settings")
async def update_settings(payload: SettingsPayload) -> dict:
"""Persist updated settings and return the full settings dict."""
current = load_settings()
if payload.hotkey is not None:
current["hotkey"] = payload.hotkey
if payload.hotkey_enabled is not None:
current["hotkey_enabled"] = payload.hotkey_enabled
_save_settings(current)
return current
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New /settings endpoints were added, but there are existing comprehensive tests for docfinder.web.app and no tests currently cover settings load/update behavior. Add tests for GET/POST /settings (including partial updates) to prevent regressions in persistence/serialization.

Copilot uses AI. Check for mistakes.
Comment on lines +1129 to +1131
resultsMeta.style.display = 'none';
resultsList.innerHTML = '<div class="loading"><div class="spinner"></div><span>Searching…</span></div>';

Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resultsList is a <ul>, but loading/empty/error states are injected as a <div> via innerHTML. This produces invalid list markup and can confuse screen readers. Render these states as <li> elements (or move them to a separate container outside the <ul>).

Copilot uses AI. Check for mistakes.
Comment on lines +1054 to +1060
document.addEventListener('keydown', e => {
if ((e.metaKey || e.ctrlKey) && e.key === 'k') {
e.preventDefault();
document.querySelector('[data-tab="search"]').click();
setTimeout(() => document.getElementById('query').focus(), 60);
}
});
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The global ⌘K/Ctrl+K handler runs even while the hotkey capture modal is open. Because it's registered earlier than the capture modal keydown handler, pressing ⌘K during capture will switch tabs instead of recording the combo. Add a guard (e.g., if #hotkey-modal is active, ignore) or register the capture handler in the capture phase so it always wins.

Copilot uses AI. Check for mistakes.
Comment on lines +1226 to +1233
try {
const statusRes = await fetch(`${apiBase}/index/status/${jobId}`);
const job = await statusRes.json();
if (job.status === 'running') {
indexStatus.innerHTML = renderProgress(job);
} else if (job.status === 'complete') {
clearInterval(interval);
resolve(job);
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Progress polling doesn't check statusRes.ok. If the status endpoint returns 404/500 (or a non-job JSON like {detail: ...}), job.status will be undefined and the interval will keep running forever. Handle non-2xx responses by clearing the interval and surfacing an error to the user.

Copilot uses AI. Check for mistakes.

// Render stats
const res = await fetch(`${apiBase}/documents`);
const data = await res.json();
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Documents load path parses JSON without checking res.ok. If the server returns an error payload, the code will try to read data.stats and likely throw a generic JS exception, which then becomes a low-quality UI error. Check res.ok and raise using data.detail (similar to the search and delete flows).

Suggested change
const data = await res.json();
const data = await res.json();
if (!res.ok) {
throw new Error(data.detail || `Failed to load documents (status ${res.status})`);
}

Copilot uses AI. Check for mistakes.
@filippostanghellini filippostanghellini merged commit 57e1108 into main Mar 10, 2026
21 checks passed
@filippostanghellini filippostanghellini deleted the feature/optimize-indexing branch March 10, 2026 23:20
@codecov-commenter
Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 37.87234% with 146 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/docfinder/gui.py 0.00% 95 Missing ⚠️
src/docfinder/web/app.py 68.75% 25 Missing ⚠️
src/docfinder/settings.py 35.13% 24 Missing ⚠️
src/docfinder/index/indexer.py 88.23% 2 Missing ⚠️

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants