Merged
31 changes: 30 additions & 1 deletion CHANGELOG.md
@@ -7,6 +7,34 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

## [1.2.0] - 2026-03-10

### Fixed
- **Windows: Fixed crash on startup** — `create_window()` raised `TypeError: unexpected keyword argument 'icon'` on pywebview builds that don't expose the `icon` parameter. The app now checks for parameter support at runtime via `inspect.signature` and falls back gracefully, so the window opens without an icon instead of crashing entirely.
- **macOS: Fixed dock icon showing Python logo** — When running from source the dock now displays the DocFinder logo instead of the generic Python 3.x icon, via AppKit `NSApplication.setApplicationIconImage_()`.
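
A minimal sketch of the runtime check described in the Windows fix, assuming the goal is simply to drop keyword arguments the installed function does not declare. `supported_kwargs` and `create_window_without_icon` are hypothetical names used for illustration; the actual DocFinder code may be structured differently.

```python
import inspect
from typing import Any, Callable


def supported_kwargs(func: Callable[..., Any], **kwargs: Any) -> dict[str, Any]:
    """Return only the keyword arguments that `func` can accept."""
    params = inspect.signature(func).parameters
    # A **kwargs catch-all means every keyword is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    return {
        name: value
        for name, value in kwargs.items()
        if name in params and params[name].kind in keyword_kinds
    }


# Hypothetical stand-in for a create_window() build that lacks `icon`.
def create_window_without_icon(title: str, url: str) -> str:
    return f"{title} -> {url}"


options = supported_kwargs(
    create_window_without_icon,
    title="DocFinder",
    url="http://127.0.0.1:8000",
    icon="Logo.png",
)
# `icon` was filtered out, so this call does not raise TypeError.
window = create_window_without_icon(**options)
```

Filtering by signature keeps the call site unchanged on builds that do support `icon`, which is why it can be preferable to catching `TypeError` after the fact.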

### Added
- **Global hotkey** — bring DocFinder to the front from any app with a configurable system-wide keyboard shortcut (default: `⌘+Shift+F` on macOS, `Ctrl+Shift+F` on Windows/Linux); implemented via pynput `GlobalHotKeys`
- **Settings tab** — new gear-icon tab lets you enable/disable the global hotkey and change the key combination via an interactive capture modal (press the desired keys, confirm)
- **Native folder picker** — "Browse…" button in the Index tab opens the system file dialog (Finder on macOS, Explorer on Windows) via `window.pywebview.api.pick_folder()`; button is shown only when running inside the desktop app
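
The global hotkey entry can be sketched roughly as follows, assuming the defaults are expressed in pynput's hotkey notation (`<cmd>`, `<ctrl>`, `<shift>`). `default_hotkey` and `start_hotkey_listener` are hypothetical helper names, not functions taken from the DocFinder source.

```python
import sys
from typing import Callable


def default_hotkey(platform: str = sys.platform) -> str:
    """Default combination in pynput's hotkey notation."""
    modifier = "<cmd>" if platform == "darwin" else "<ctrl>"
    return f"{modifier}+<shift>+f"


def start_hotkey_listener(combo: str, on_activate: Callable[[], None]):
    """Start a background listener; return None if pynput is unavailable."""
    try:
        # Optional dependency, installed through the `gui` extra.
        from pynput import keyboard
    except ImportError:
        return None
    listener = keyboard.GlobalHotKeys({combo: on_activate})
    listener.start()  # the listener runs in its own thread
    return listener
```

A caller would pass a callback that raises the window, for example `start_hotkey_listener(default_hotkey(), bring_to_front)`, where `bring_to_front` is whatever function the app uses to focus its window.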

### Performance
- **Indexing 2–4× faster** through several compounding improvements:
  - `insert_chunks()` now uses `executemany()` for batch SQLite inserts (was one `execute()` per row)
  - `EmbeddingModel.embed()` uses SentenceTransformer's native batching directly (batch size 32, up from 8); removed the artificial inner mini-batch loop of 4
  - Chunk batch size per document increased from 32 to 64
  - Removed `gc.collect()` calls from inside the per-chunk loop; one call per document is sufficient
  - Removed the artificial 2-files-at-a-time outer loop during indexing
- **First request instant** — `EmbeddingModel` is now a singleton loaded once at startup; previously a new model instance was created for every `/search`, `/documents`, `/index`, and `/cleanup` request
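
The batch-insert change can be illustrated with the standard-library `sqlite3` module. This is a sketch under assumptions: the `chunks` table layout, the `Chunk` tuple, and the name `insert_chunks_batched` are hypothetical and do not reflect DocFinder's real schema.

```python
import sqlite3
from typing import Iterable

# Hypothetical row layout: (document_id, chunk_index, text)
Chunk = tuple[int, int, str]


def insert_chunks_batched(conn: sqlite3.Connection, chunks: Iterable[Chunk]) -> None:
    """Insert all chunks with one executemany() call inside one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO chunks (document_id, chunk_index, text) VALUES (?, ?, ?)",
            chunks,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chunks (document_id INTEGER, chunk_index INTEGER, text TEXT)"
)
insert_chunks_batched(conn, [(1, i, f"chunk {i}") for i in range(64)])
row_count = conn.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]
```

Compared with one `execute()` per row, `executemany()` prepares the statement once and moves the per-row loop out of Python code, which is typically where the savings come from.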

### UI
- **Real-time indexing progress** — animated progress bar with file counter and current filename, updated every 600 ms via polling
- **macOS-native design** — header uses `backdrop-filter: saturate(180%) blur(20px)` for the system frosted-glass effect; improved shadows and depth
- **⌘K / Ctrl+K** shortcut to jump to search from any tab; search input auto-focused on load
- **Drag & drop** — drag a folder from Finder/Explorer directly onto the path input in the Index tab
- Relevance score shown as a **percentage** (e.g. `87%`) instead of a raw float
- Search result **count** displayed above the results list
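
For the percentage display, a small sketch of the conversion, assuming the raw relevance score is a float that is meant to fall between 0 and 1. The helper name `format_relevance` is hypothetical, and the real UI may well do this in JavaScript rather than Python.

```python
def format_relevance(score: float) -> str:
    """Render a relevance score as a whole-number percentage."""
    # Clamp first so out-of-range scores never render as "-3%" or "104%".
    clamped = min(max(score, 0.0), 1.0)
    return f"{clamped:.0%}"


label = format_relevance(0.87)
```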

## [1.1.2] - 2025-12-15

### Fixed
@@ -152,7 +180,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Fixed linting issues for consistent code style
- Updated ruff configuration to use non-deprecated settings

[Unreleased]: https://github.com/filippostanghellini/DocFinder/compare/v1.1.2...HEAD
[Unreleased]: https://github.com/filippostanghellini/DocFinder/compare/v1.2.0...HEAD
[1.2.0]: https://github.com/filippostanghellini/DocFinder/compare/v1.1.2...v1.2.0
[1.1.2]: https://github.com/filippostanghellini/DocFinder/compare/v1.1.1...v1.1.2
[1.1.1]: https://github.com/filippostanghellini/DocFinder/compare/v1.0.1...v1.1.1
[1.0.1]: https://github.com/filippostanghellini/DocFinder/compare/v1.0.0...v1.0.1
29 changes: 27 additions & 2 deletions Makefile
@@ -1,6 +1,31 @@
.PHONY: lint format test check-all install clean build-macos build-windows build-linux
.PHONY: setup run run-web lint format format-check test check-all install install-gui clean build-macos build-windows build-linux

# Install dependencies
# ── First-time setup ──────────────────────────────────────────────────────────
# Creates a virtual environment and installs all dependencies in one command.
# Run this once after cloning the repository.
setup:
	python -m venv .venv
	.venv/bin/pip install --upgrade pip --quiet
	.venv/bin/pip install -e ".[dev,web,gui]"
	@echo ""
	@echo "✅ Setup complete!"
	@echo " Launch the desktop app : make run"
	@echo " Launch the web UI : make run-web"
	@echo " Run tests : make test"

# ── Run ───────────────────────────────────────────────────────────────────────

# Launch the native desktop GUI
run:
	.venv/bin/docfinder-gui

# Launch the web interface (opens in browser at http://127.0.0.1:8000)
run-web:
	.venv/bin/docfinder web

# ── Install (legacy targets, prefer 'make setup') ─────────────────────────────

# Install dependencies (no GUI)
install:
	.venv/bin/pip install -e ".[dev,web]"

223 changes: 32 additions & 191 deletions README.md
@@ -4,227 +4,68 @@
[![CodeQL](https://img.shields.io/github/actions/workflow/status/filippostanghellini/DocFinder/codeql.yml?branch=main&label=CodeQL&logo=github)](https://github.com/filippostanghellini/DocFinder/actions/workflows/codeql.yml)
[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](LICENSE)
[![Python](https://img.shields.io/badge/python-3.10%2B-blue?logo=python&logoColor=white)](https://www.python.org/downloads/)
[![Code style: ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![Stars](https://img.shields.io/github/stars/filippostanghellini/DocFinder?style=social)](https://github.com/filippostanghellini/DocFinder/stargazers)
[![Release](https://img.shields.io/github/v/release/filippostanghellini/DocFinder?logo=github)](https://github.com/filippostanghellini/DocFinder/releases)
[![Downloads](https://img.shields.io/github/downloads/filippostanghellini/DocFinder/total?logo=github)](https://github.com/filippostanghellini/DocFinder/releases)

<p align="center">
<img src="Logo.png" alt="DocFinder Logo" width="200">
<img src="Logo.png" alt="DocFinder Logo" width="160">
</p>

<p align="center">
<strong>🔍 Local-first semantic search for your PDF documents</strong>
<strong>Local-first semantic search for your PDF documents.</strong><br>
Everything runs on your machine — no cloud, no accounts, complete privacy.
</p>

<p align="center">
Index and search your PDFs using AI-powered semantic embeddings.<br>
Everything runs locally on your machine: no cloud, no external services, complete privacy.
</p>

---

## ✨ Features
<table width="100%">
<tr>
<td width="50%"><img src="images/search.png" alt="Search" width="100%"></td>
<td width="50%"><img src="images/index.png" alt="Index" width="100%"></td>
</tr>
</table>

- **100% Local**: Your documents never leave your machine
- **Fast Semantic Search**: Find documents by meaning, not just keywords
- **Cross-Platform**: Native apps for macOS, Windows, and Linux
- **GPU Accelerated**: Auto-detects Apple Silicon, NVIDIA, or AMD GPUs
- **PDF Optimized**: Powered by PyMuPDF for reliable text extraction
- **Web Interface**: UI for indexing and searching
## Features

---
- **Semantic search** — find documents by meaning, not just keywords
- **100% local** — your files never leave your machine
- **GPU accelerated** — auto-detects Apple Silicon (Metal), NVIDIA (CUDA), AMD (ROCm)
- **Cross-platform** — native apps for macOS, Windows, and Linux
- **Global shortcut** — bring DocFinder to front from anywhere with a configurable hotkey

## 🚀 Quick Start
## Download

### 1. Install

Download the app for your platform from [**GitHub Releases**](https://github.com/filippostanghellini/DocFinder/releases):

| Platform | Download |
|----------|----------|
| Platform | Installer |
|----------|-----------|
| **macOS** | [DocFinder-macOS.dmg](https://github.com/filippostanghellini/DocFinder/releases/latest) |
| **Windows** | [DocFinder-Windows-Setup.exe](https://github.com/filippostanghellini/DocFinder/releases/latest) |
| **Linux** | [DocFinder-Linux-x86_64.AppImage](https://github.com/filippostanghellini/DocFinder/releases/latest) |

### 2. Index Your Documents

1. Open DocFinder
2. Enter the path to your PDF folder (e.g., `~/Documents/Papers`)
3. Click **Index** and wait for completion

### 3. Search

Type a natural language query like:
- *"contract about property sale"*
- *"machine learning introduction"*
- *"invoice from December 2024"*

DocFinder finds relevant documents by **meaning**, not just exact keywords.

---

## 📸 Screenshots

<details>
<summary>Click to expand</summary>

**Search**
![Search](images/search.png)

**Index Documents**
![Index Documents](images/index-documents.png)

**Database**
![Database](images/database-documents.png)
**macOS** — open the DMG, drag DocFinder to Applications, then right-click → **Open** on first launch (Gatekeeper warning — normal for unsigned open-source apps).

</details>

---

## 💻 System Requirements

| Component | Minimum | Recommended |
|-----------|---------|-------------|
| **RAM** | 4 GB | 8 GB+ |
| **Disk Space** | 500 MB | 1 GB+ |
| **macOS** | 11.0 (Big Sur) | 13.0+ (Ventura) |
| **Windows** | 10 | 11 |
| **Linux** | Ubuntu 20.04+ | Ubuntu 22.04+ |

### GPU Support (Optional)

DocFinder **automatically detects** your hardware and uses the best available option:

| Hardware | Support | Notes |
|----------|---------|-------|
| **Apple Silicon** (M1/M2/M3/M4) | ✅ Automatic | Uses Metal Performance Shaders |
| **NVIDIA GPU** | ✅ With `[gpu]` extra | Requires CUDA drivers |
| **AMD GPU** | ✅ Automatic | Uses ROCm on Linux |
| **CPU** | ✅ Always works | Fallback option |

---

## 📦 Installation

### Desktop App (Recommended)

#### macOS

1. Download `DocFinder-macOS.dmg`
2. Open the DMG and drag **DocFinder** to **Applications**
3. **First launch**: Right-click → **Open** → Click **Open** again

> ⚠️ macOS shows a warning because the app isn't signed with an Apple Developer ID. This is normal for open-source software.

#### Windows

1. Download `DocFinder-Windows-Setup.exe`
2. Run the installer
3. If SmartScreen warns you: Click **More info** → **Run anyway**

#### Linux
**Windows** — run the installer; if SmartScreen appears choose **More info → Run anyway**.

**Linux**
```bash
wget https://github.com/filippostanghellini/DocFinder/releases/latest/download/DocFinder-Linux-x86_64.AppImage
chmod +x DocFinder-Linux-x86_64.AppImage
./DocFinder-Linux-x86_64.AppImage
chmod +x DocFinder-Linux-x86_64.AppImage && ./DocFinder-Linux-x86_64.AppImage
```

---

### Python Package
## Run from Source

For developers or advanced users:
Requires Python 3.10+ and `make`.

```bash
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate

# Install DocFinder
pip install .

# With GPU support (NVIDIA)
pip install '.[gpu]'

# With all extras (development + web + GUI)
pip install '.[dev,web,gui]'
```

---

## 🔧 Usage

### Desktop App

Just launch **DocFinder** from your Applications folder, Start Menu, or run the AppImage.

### Command Line

```bash
# Index a folder of PDFs
docfinder index ~/Documents/PDFs

# Search your documents
docfinder search "quarterly financial report"

# Launch web interface
docfinder web

# Launch desktop GUI (from source)
docfinder-gui
```

### Where is my data stored?

| Mode | Database Location |
|------|-------------------|
| Desktop App | `~/Documents/DocFinder/docfinder.db` |
| Development | `data/docfinder.db` |

---

## 🛠️ Build from Source

```bash
# Clone the repository
git clone https://github.com/filippostanghellini/DocFinder.git
cd DocFinder

# Install dependencies
make install-gui

# Run the GUI
docfinder-gui

# Build native app (macOS)
make build-macos
make setup # create .venv and install all dependencies
make run # desktop GUI
make run-web # web interface at http://127.0.0.1:8000
```

---

## 📁 Project Structure

```
src/docfinder/
├── ingestion/ # PDF parsing and text chunking
├── embedding/ # AI model wrappers (sentence-transformers, ONNX)
├── index/ # SQLite vector storage and search
├── utils/ # File handling and text utilities
└── web/ # FastAPI web interface
```

---

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
## Contributing

---
Contributions are welcome. Feel free to open an issue or submit a pull request.

## 📄 License
## License

This project is licensed under the **GNU Affero General Public License v3.0 (AGPL-3.0)**.
Licensed under the **GNU Affero General Public License v3.0 (AGPL-3.0)**.

> **Note**: DocFinder was originally released under the MIT License. Starting from version 1.1.1, the license was changed to AGPL-3.0 to comply with [PyMuPDF](https://pymupdf.readthedocs.io/) licensing requirements.
> DocFinder was originally released under the MIT License. Starting from version 1.1.1 the license was changed to AGPL-3.0 to comply with the [PyMuPDF](https://pymupdf.readthedocs.io/) licensing requirements, as PyMuPDF itself is AGPL-3.0 licensed.
Binary file removed images/database-documents.png
Binary file not shown.
Binary file removed images/index-documents.png
Binary file not shown.
Binary file added images/index.png
Binary file modified images/search.png
6 changes: 4 additions & 2 deletions pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "docfinder"
version = "1.1.2"
version = "1.2.0"
license = "AGPL-3.0-or-later"
description = "Local-first semantic search CLI for PDF documents."
authors = [
Expand Down Expand Up @@ -45,7 +45,9 @@ gui = [
"pywebview>=5.0.0",
"fastapi>=0.115.0",
"uvicorn[standard]>=0.32.0",
"pydantic>=2.9.0"
"pydantic>=2.9.0",
"pynput>=1.7.0",
"pyobjc-framework-Cocoa>=9.0; sys_platform == 'darwin'"
]
gpu = [
"onnxruntime-gpu>=1.17.0"