A private, local PDF research assistant that reads your documents aloud and answers questions about them. Upload PDFs or capture webpages, then chat with your content using AI, all running on your own machine, no subscriptions required.
- Docker and Docker Compose
- A local LLM runtime (choose one):
  - Docker Model Runner (built into Docker Desktop)
  - Ollama (lightweight CLI)
  - LMStudio (GUI app)
Choose your LLM setup
- Copy `.env.example` to `.env`.
- Pick one of the LLM servers below and set `LLM_API_URL` to its URL.
- Enable Model Runner in Docker Desktop → Settings → Features
- Pull models:

      docker model pull ai/qwen3:latest
      docker model pull ai/nomic-embed-text-v1.5:latest

- Set `LLM_API_URL` in the `.env` file to: `LLM_API_URL=http://host.docker.internal:12434`
- Install Ollama
- Pull models:

      ollama pull llama3.2
      ollama pull nomic-embed-text

- Set `LLM_API_URL` in the `.env` file to: `LLM_API_URL=http://host.docker.internal:11434`
- Install LMStudio
- Download a chat model and embedding model
- Start Local Server (port 1234)
- Set `LLM_API_URL` in the `.env` file to: `LLM_API_URL=http://host.docker.internal:1234/v1`

    docker-compose up --build

- Open: http://localhost:3000
- Upload PDFs or add webpages
- Click Play to hear documents read aloud
- Ask questions about your content
- Text-to-Speech: High-quality voice reads your PDFs aloud
- Sentence Tracking: Visual highlighting shows what's being read
- Multiple Documents: Switch between PDFs and webpages with tabs
- PDF Annotations: Highlight, draw, and comment directly on documents
- AI Assistant: Ask questions about your uploaded content
- Smart Memory: Remembers previous conversations in each thread
- Web Search: Optionally include live internet results
- Reasoning Display: See how the AI thinks through problems
- Modern Interface: Clean, intuitive design
- Thread Organization: Keep different topics separate
- Customizable: Adjust AI behavior per conversation
- Private: Everything runs locally on your machine
- Create a Thread - Use the sidebar to start a new conversation
- Add Content - Upload PDFs or add webpage URLs
- Start Reading - Click Play to hear documents aloud
- Ask Questions - Type questions in the chat
- Play Controls: Click Play or double-click any sentence to start
- Voice Settings: Choose different voices and adjust speed (0.5x-2.0x)
- Auto-Scroll: Document follows along automatically
- Select Model: Choose your preferred AI model
- Internet Search: Toggle to include live web results
- View Reasoning: Expand panels to see AI thinking process
- Semantic Memory: See which past conversations were used
- Thread Settings: Click the settings icon to adjust AI behavior
- System Role: Change the AI's persona
- Tool Instructions: Guide how AI uses different tools
- Custom Instructions: Add extra directions
Architecture & Services

    ┌─────────────────────────────────────────────────────────────────────────────────────────┐
    │                                     Docker Compose                                      │
    ├─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┤
    │ Frontend        │ RAG Service     │ Browser Capture │ PostgreSQL      │ Weaviate        │
    │ (Next.js)       │ (FastAPI)       │ (Selenium)      │ (Primary DB)    │ (Vector DB)     │
    │ Port: 3000      │ Port: 8000      │ Port: 7800      │ Port: 5432      │ Port: 8080      │
    └─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┘
                                                 │
                                                 ▼
                         ┌───────────────────────────────────────────────┐
                         │         DMR / Ollama / LMStudio / LLM         │
                         │              (OpenAI-compatible)              │
                         │             Port: 12434 (default)             │
                         └───────────────────────────────────────────────┘

| Service | Port | Description |
|---|---|---|
| Frontend | 3000 | Next.js React app with PDF viewer, chat UI, thread management, and TTS |
| RAG Service | 8000 | FastAPI server for PDF processing, document indexing, AI chat, thread/message/file management |
| Browser Capture | 7800 | Selenium-based service for interactive webpage capture and PDF conversion |
| PostgreSQL | 5432 | Primary database for threads, messages, files, settings, and annotations |
| Weaviate | 8080 | Vector database for semantic and memory search |
| DMR/Ollama/LMStudio | 12434 | Local LLM server (external, user-provided) |
Advanced AI Features
- Orchestrator Agent: LangGraph-powered agent that plans, selects tools, and synthesizes answers
- Intent Agent (optional): Pre-processes questions to improve query clarity and search precision
- Tool-Calling: Dynamic tool selection including document search, memory recall, web search, and clarification
- Configurable Iterations: Control tool-call rounds with forced final answer to prevent infinite loops
- Multi-Provider Extraction: Supports reasoning traces from Claude, OpenAI o-series, DeepSeek, QwQ, Qwen3-Thinking
- Database Storage: Reasoning traces persisted alongside answers in PostgreSQL
- UI Display: Expandable reasoning panels in chat bubbles for transparent AI thinking
- Thread-Scoped Collections: Each thread has isolated vector collections in Weaviate
- Multi-Source Retrieval: Simultaneous search across PDFs, webpages, and past conversations
- Semantic Recollection: UI highlights which past messages were used in current answers
- Context Management: Intelligent token budgeting for optimal LLM context window usage
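
The Context Management bullet above mentions token budgeting without saying how it is done. The following is a minimal sketch of one common approach, greedy packing by relevance score, and not the project's actual implementation. The `Chunk` type, the `reserved` parameter, and the four-characters-per-token estimate are all assumptions made for illustration; only the 8192 default comes from `DEFAULT_TOKEN_BUDGET`.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical retrieved passage; not a type from this project."""
    source: str   # e.g. "pdf", "webpage", or "memory"
    text: str
    score: float  # retrieval relevance, higher is better


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


def pack_context(chunks: list[Chunk], token_budget: int = 8192,
                 reserved: int = 1024) -> list[Chunk]:
    """Greedily keep the highest-scoring chunks that fit the budget.

    `reserved` holds back room for the question and the answer.
    """
    remaining = token_budget - reserved
    selected = []
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        cost = estimate_tokens(chunk.text)
        if cost <= remaining:
            selected.append(chunk)
            remaining -= cost
    return selected
```

A real tokenizer would give tighter estimates than the character heuristic; the greedy loop itself stays the same.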
Technology Stack

RAG Service

| Technology | Purpose |
|---|---|
| FastAPI | Web framework |
| LangChain | LLM/Embedding integration |
| LangGraph | Stateful multi-agent workflow |
| Weaviate Client | Vector database operations |
| SQLModel | ORM built on SQLAlchemy |
| SQLAlchemy | Async database operations |
| Alembic | Database migration management |
| asyncpg | Async PostgreSQL driver |

Browser Capture

| Technology | Purpose |
|---|---|
| Selenium | WebDriver automation |
| Brave Browser | Headless browser rendering |
| WeasyPrint | PDF conversion fallback |
| FastAPI | Service API framework |

Frontend

| Technology | Purpose |
|---|---|
| Next.js | React framework |
| Material-UI (MUI) | UI components (v7) |
| EmbedPDF | PDF rendering with annotations |
| react-markdown | Chat message rendering |
| React Query | Async state management |
Project Structure

    askpdf/
    ├── docker-compose.yml          # Multi-service orchestration
    ├── run_tests.sh                # Comprehensive test runner
    ├── browser_capture/            # Selenium-based webpage capture service
    ├── rag_service/                # FastAPI backend with AI, RAG, and database
    │   ├── app/
    │   │   ├── api/                # REST API route handlers
    │   │   ├── agent/              # Multi-agent AI system
    │   │   ├── db/                 # PostgreSQL data layer
    │   │   ├── services/           # Business logic services
    │   │   └── rag/                # RAG core logic
    │   └── tests/                  # Comprehensive test suite
    └── frontend/                   # Next.js React application
        ├── src/
        │   ├── components/         # UI components
        │   ├── hooks/              # React hooks
        │   └── lib/                # Utility functions
        └── package.json

Configuration & Environment
Environment variables are managed in a `.env` file for better security and maintainability. Configuration is split across two files:
- `.env` file - For user-configurable settings (models, database URLs, behavior settings)
- `docker-compose.yml` - For service-specific configuration (networking, basic service settings)
- Copy the example file:

      cp .env.example .env

- Configure your LLM provider in `.env`:

      # Choose your LLM provider
      LLM_API_URL=http://host.docker.internal:1234/v1    # LMStudio
      # LLM_API_URL=http://host.docker.internal:11434    # Ollama
      # LLM_API_URL=http://host.docker.internal:12434    # Docker Model Runner

- Review other settings in `.env` and adjust as needed for your use case.
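
To sanity-check a `.env` file before starting the stack, a small script can read `LLM_API_URL` and report which runtime its port corresponds to. The sketch below is an illustration only: `parse_env` and `detect_runtime` are hypothetical helpers, the parser handles just plain `KEY=VALUE` lines with `#` comments (a simplification of real `.env` handling), and the port-to-runtime mapping is taken from the setup steps in this README.

```python
from urllib.parse import urlsplit


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "   # LMStudio".
        value = value.split(" #", 1)[0].strip().strip('"').strip("'")
        values[key.strip()] = value
    return values


# Ports as listed in the setup steps of this README.
KNOWN_PORTS = {12434: "Docker Model Runner", 11434: "Ollama", 1234: "LMStudio"}


def detect_runtime(llm_api_url: str) -> str:
    """Guess the runtime from the URL's port; 'unknown' if it is not listed."""
    return KNOWN_PORTS.get(urlsplit(llm_api_url).port, "unknown")
```

For example, `detect_runtime(parse_env(open(".env").read())["LLM_API_URL"])` would name the runtime the current `.env` points at, assuming the default ports are in use.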
LLM Configuration
| Variable | Default | Description |
|---|---|---|
| `LLM_API_URL` | (none) | External LLM server URL (Docker Model Runner/Ollama/LMStudio) |
Model Configuration
| Variable | Default | Description |
|---|---|---|
| `LOCAL_EMBEDDING_MODEL` | `BAAI/bge-m3` | Single local embedding model to use |
| `LOCAL_RERANKER_MODEL` | `BAAI/bge-reranker-v2-m3` | Single local reranker model to use |
| `EMBEDDING_DEVICE` | `cpu` | Device for embedding models (cpu/cuda/mps) |
| `RERANKER_DEVICE` | `cpu` | Device for reranker models (cpu/cuda/mps) |
AI Behavior & Limits
| Variable | Default | Description |
|---|---|---|
| `DEFAULT_TOKEN_BUDGET` | `8192` | Context window size for AI responses |
| `DEFAULT_MAX_ITERATIONS` | `10` | Maximum tool-call rounds for AI reasoning |
| `MIN_MAX_ITERATIONS` | `1` | Minimum allowed iterations |
| `MAX_MAX_ITERATIONS` | `30` | Maximum allowed iterations |
| `MAX_CUSTOM_INSTRUCTIONS_CHARS` | `2000` | Maximum custom instruction length |
| `MAX_SYSTEM_ROLE_CHARS` | `500` | Maximum system role description length |
| `MAX_TOOL_INSTRUCTION_CHARS` | `500` | Maximum tool instruction length |
| `INTENT_AGENT_MAX_ITERATIONS` | `1` | Maximum iterations for intent agent |
| `MAX_ITERATIONS_SUFFICIENT_COVERAGE` | `2` | Iteration bonus for sufficient coverage |
| `MAX_ITERATIONS_PROBABLY_SUFFICIENT_COVERAGE` | `4` | Iteration bonus for probable sufficient coverage |
| `WEB_SEARCH_ITERATION_BONUS` | `2` | Extra iterations when web search is enabled |
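
The iteration limits above, together with the "forced final answer" behavior described under Advanced AI Features, can be pictured as a bounded loop. This is a sketch under stated assumptions rather than the project's agent code: `clamp_iterations`, `run_agent`, and the `step` callback are hypothetical names, and the coverage and web-search bonuses are left out because the README does not say how they are applied.

```python
from typing import Callable, Optional

# Defaults copied from the table above.
MIN_MAX_ITERATIONS = 1
MAX_MAX_ITERATIONS = 30
DEFAULT_MAX_ITERATIONS = 10


def clamp_iterations(requested: Optional[int]) -> int:
    """Keep a per-thread setting inside the allowed range."""
    if requested is None:
        return DEFAULT_MAX_ITERATIONS
    return max(MIN_MAX_ITERATIONS, min(MAX_MAX_ITERATIONS, requested))


def run_agent(step: Callable[[int, bool], Optional[str]],
              max_iterations: Optional[int] = None) -> str:
    """Call `step` until it returns an answer, forcing one on the last round.

    `step(round_index, must_answer)` is a hypothetical callback: it returns
    None after a tool call, or a string once it has a final answer.
    """
    limit = clamp_iterations(max_iterations)
    for i in range(limit):
        must_answer = i == limit - 1  # last round: no more tool calls
        answer = step(i, must_answer)
        if answer is not None:
            return answer
    raise RuntimeError("step() returned no answer even when must_answer was True")
```

Passing `must_answer` into the step is what prevents an endless chain of tool calls: on the final round the model is asked to answer with what it already has.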
Document Processing (Docling)
| Variable | Default | Description |
|---|---|---|
| `DOCLING_DO_OCR` | `True` | Enable OCR for scanned images (preserves digital text) |
| `DOCLING_DO_TABLE_STRUCTURE` | `True` | Extract table structure from documents |
| `DOCLING_TABLE_MODE` | `ACCURATE` | Table extraction mode (FAST/ACCURATE) |
| `DOCLING_FORCE_FULL_PAGE_OCR` | `False` | Force full-page OCR (keep false for digital PDFs) |
| `DOCLING_DO_FORMULA_ENRICHMENT` | `False` | Enable mathematical formula extraction |
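
Because environment variables are always strings, values such as `True` and `False` need explicit parsing: in Python, `bool("False")` is `True`. The sketch below shows one way to turn the variables above into typed settings. It is an assumption about how such loading could look, not the service's code; `env_bool`, `DoclingSettings`, and `load_docling_settings` are hypothetical names, and the accepted spellings of true and false are a choice made here.

```python
from dataclasses import dataclass
from typing import Mapping

TRUE_VALUES = {"1", "true", "yes", "on"}
FALSE_VALUES = {"0", "false", "no", "off"}


def env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a boolean flag, rejecting values that are neither true nor false."""
    raw = env.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"{name} must be a boolean, got {raw!r}")


@dataclass(frozen=True)
class DoclingSettings:
    do_ocr: bool
    do_table_structure: bool
    table_mode: str
    force_full_page_ocr: bool
    do_formula_enrichment: bool


def load_docling_settings(env: Mapping[str, str]) -> DoclingSettings:
    """Build typed settings, using the defaults from the table above."""
    mode = env.get("DOCLING_TABLE_MODE", "ACCURATE").strip().upper()
    if mode not in {"FAST", "ACCURATE"}:
        raise ValueError(f"DOCLING_TABLE_MODE must be FAST or ACCURATE, got {mode!r}")
    return DoclingSettings(
        do_ocr=env_bool(env, "DOCLING_DO_OCR", True),
        do_table_structure=env_bool(env, "DOCLING_DO_TABLE_STRUCTURE", True),
        table_mode=mode,
        force_full_page_ocr=env_bool(env, "DOCLING_FORCE_FULL_PAGE_OCR", False),
        do_formula_enrichment=env_bool(env, "DOCLING_DO_FORMULA_ENRICHMENT", False),
    )
```

Calling `load_docling_settings(os.environ)` would apply it to the running process; failing fast on a misspelled value is usually easier to debug than silently falling back to a default.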
Database Configuration
| Variable | Default | Description |
|---|---|---|
| `DATABASE_URL` | `postgresql+asyncpg://postgres:postgres@postgresql:5432/askpdf` | PostgreSQL connection string |
| `TEST_DATABASE_URL` | `postgresql+asyncpg://postgres:postgres@postgresql:5432/test_askpdf` | Test database connection string |
| `POSTGRES_POOL_SIZE` | `10` | Database connection pool size |
| `POSTGRES_MAX_OVERFLOW` | `20` | Maximum additional connections beyond pool size |
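
The connection string packs several settings into one value, and the two pool variables together set an upper bound on open connections. As a rough illustration using only the standard library, the sketch below splits the URL into its parts and computes that bound. `describe_database_url` and `max_connections` are hypothetical helpers, and the bound assumes a single engine with SQLAlchemy's usual queue-style pool.

```python
from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict:
    """Split a SQLAlchemy-style URL into dialect, driver, host, port, database."""
    parts = urlsplit(url)
    dialect, _, driver = parts.scheme.partition("+")
    return {
        "dialect": dialect,
        "driver": driver or None,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


def max_connections(pool_size: int = 10, max_overflow: int = 20) -> int:
    """Upper bound on simultaneous connections for one engine."""
    return pool_size + max_overflow
```

With the defaults above, one engine can hold at most 30 connections, so the PostgreSQL `max_connections` setting should leave room for that plus any other clients. The host `postgresql` in the default URL is the Compose service name, which resolves only inside the Compose network.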
Frontend Service
| Variable | Default | Description |
|---|---|---|
| `NEXT_PUBLIC_API_URL` | `http://localhost:8000` | RAG service API URL for frontend communication |
RAG Service - Core Configuration
| Variable | Default | Description |
|---|---|---|
| `LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR) |
| `WEAVIATE_URL` | `http://weaviate:8080` | Weaviate vector database endpoint |
| `WEAVIATE_HYBRID_ALPHA` | `0.7` | Hybrid search balance (0.0=pure keyword, 1.0=pure vector) |
| `CAPTURE_SERVICE_URL` | `http://browser-capture:8080` | Browser capture service endpoint |
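
To build intuition for `WEAVIATE_HYBRID_ALPHA`, the sketch below blends a vector score and a keyword score with a single weight. It follows Weaviate's documented convention, in which an alpha of 1 is pure vector search and an alpha of 0 is pure keyword search. It is a simplified weighted sum for illustration, not Weaviate's actual fusion algorithm, and `hybrid_score` is a hypothetical helper that assumes both inputs are already normalized to the 0 to 1 range.

```python
def hybrid_score(vector_score: float, keyword_score: float,
                 alpha: float = 0.7) -> float:
    """Weighted blend: alpha weights the vector side, (1 - alpha) the keyword side."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    return alpha * vector_score + (1.0 - alpha) * keyword_score
```

With the default of 0.7, semantic similarity dominates but exact keyword matches can still lift a result; lowering alpha helps when queries contain rare terms such as identifiers or part numbers.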
- Initial Setup: Copy `.env.example` to `.env` and configure your settings:

      cp .env.example .env
      # Edit .env with your preferred settings

- Apply Changes: After modifying environment variables, restart the services:

      docker-compose down
      docker-compose up --build

You need a chat model with tool calling support and an embedding model:
| Runtime | Chat model example | Embedding model example |
|---|---|---|
| DMR | `ai/qwen3:latest` | `ai/nomic-embed-text-v1.5:latest` |
| Ollama | `llama3.2` | `nomic-embed-text` |
| LMStudio | `google/gemma-3-12b` | `text-embedding-embeddinggemma-300m-qat` |
API Reference

- `POST /api/threads` - Create new thread
- `POST /api/threads/{thread_id}/chat` - Chat with documents
- `PUT /api/threads/{thread_id}/settings` - Update thread settings
- `GET /api/threads/{thread_id}/messages` - List messages

- `POST /api/threads/{thread_id}/files/upload` - Upload PDF
- `GET /api/threads/{thread_id}/files/{file_hash}` - Get file data
- `GET /api/threads/{thread_id}/files/{file_hash}/status` - Check processing status

- `GET /api/models` - List available models
- `GET /api/health/chat-model/{model}` - Check chat model health
- `GET /api/health/embed-model/{model}` - Check embedding model health
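
When scripting against these routes, two small helpers cover most needs: one to fill in the path placeholders, and one to poll the file status route until processing finishes. The sketch below is illustrative only. `api_url` and `wait_for_file` are hypothetical helpers, the base URL is the `NEXT_PUBLIC_API_URL` default, and because the README does not document response bodies, the caller supplies both the HTTP fetch function and the check that decides when a file is ready. Percent-encoding every placeholder value is also an assumption about what the server accepts.

```python
import time
from typing import Any, Callable
from urllib.parse import quote

API_BASE = "http://localhost:8000"  # NEXT_PUBLIC_API_URL default


def api_url(template: str, base: str = API_BASE, **params: str) -> str:
    """Fill {placeholders} in a route template, percent-encoding each value."""
    encoded = {name: quote(value, safe="") for name, value in params.items()}
    return base.rstrip("/") + template.format(**encoded)


def wait_for_file(fetch_status: Callable[[str], Any],
                  is_done: Callable[[Any], bool],
                  url: str,
                  attempts: int = 30,
                  delay: float = 2.0,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `url` until `is_done(response)` is true or attempts run out."""
    for _ in range(attempts):
        if is_done(fetch_status(url)):
            return True
        sleep(delay)
    return False
```

A status URL would be built with `api_url("/api/threads/{thread_id}/files/{file_hash}/status", thread_id=..., file_hash=...)` and passed to `wait_for_file` along with whichever HTTP client the script already uses.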
Testing

    ./run_tests.sh [options]

- `--verbose` - Verbose output
- `--file <file>` - Run specific test file
- `--coverage` - Run with coverage report
- `--db-tests` - Run PostgreSQL database tests
- `--api` - Run API endpoint tests

- Database Tests: PostgreSQL operations, models, repositories
- API Tests: Endpoint testing, integration tests
- Parsing Tests: PDF processing with Docling and pdfplumber
Contributions are welcome! Please feel free to submit a Pull Request.
This project uses the following third-party technologies:
- Kokoro - Text-to-speech model
- spaCy - Natural language processing
- LangChain - LLM framework
- LangGraph - Stateful AI workflows
- Weaviate - Vector database
- FastAPI - Web framework
- Next.js - React framework
- hexgrad for the amazing Kokoro-82M model
- spaCy for robust NLP capabilities
- LangChain team for the excellent LLM framework
- Weaviate for the powerful vector database
- The open-source community for all the amazing tools
For questions, issues, or suggestions, please open an issue on the GitHub repository.