Ask questions about a PDF, get answers with clickable, page-anchored citations, and see the source passages highlighted in a side-by-side PDF viewer.
Built on the Claude Agent SDK and PyMuPDF. Authentication piggybacks on your local Claude Code session — no API keys to manage.
- **Two-pane web UI** — PDF on the left, chat on the right.
- **Color-matched citations** — each citation gets a distinct pastel color, applied to the PDF highlight, the citation chip's left border, and the inline `(p. N)` pill in the answer text. When several citations land on the same page, the colors tell you which highlight goes with which chip at a glance.
- **Click-to-jump** — both inline `(p. 4)` references and the citation chips scroll the PDF to the right page on click.
- **Three answer modes** — pick by how much initiative you want the model to take:
  - `auto` (default) — concise, extractive, shaped to the question. Use when you're asking about the paper. Examples: "What does the paper claim about X?", "Which datasets did they evaluate on?", "Summarize section 4."
  - `strict` — pure extraction, never infer or interpret. Use when you want the literal text only. Examples: "List exactly the methods they evaluated.", "Quote the threats-to-validity section."
  - `freehand` — the model is your collaborator; the user prompt is the spec. Inference, synthesis, application, and structural framing are encouraged — every factual claim is still anchored to a citation, but inferences are marked ("suggests", "implies", "extending this"). Use for anything generative. Examples: "Fill this SLR rubric for me: [paste rubric]", "Draft a related-work paragraph that critically engages with this paper.", "What hypotheses does the framework in §3 suggest about [my domain]?", "Identify weaknesses in their evaluation."

  Rule of thumb: `auto` for "tell me about the paper," `freehand` for "use the paper to do something for me," `strict` when you don't want the model adding any flavor.
- **Per-PDF chat memory** — follow-up questions like "make that more concise" or "what did I ask first?" work because each turn sees the prior conversation. A clear button wipes history; turns rehydrate when you switch back to a PDF.
- **Abstract is off-limits** — `extract_pages` wraps the abstract in explicit `[BEGIN ABSTRACT]`/`[END ABSTRACT]` markers and the prompt forbids citing inside them, forcing the model to anchor claims in the body where they're elaborated.
- **Markdown answers** — with bold, lists, headings, and inline code.
- **Activity trail** — a collapsed `Thought for Ns · K steps` per turn shows the agent's reasoning and tool calls.
- **Robust word-coordinate highlighting** — multi-line wraps, hyphenated breaks, and minor paraphrases all match. Cross-page fallback: if the cited page misses, every other page is scanned and the longest match wins, so off-by-one page numbers from the model self-heal.
- **Model selector** — Haiku 4.5 (fast/cheap default), Sonnet 4.6, Opus 4.7, or whatever your CLI's `/model` is set to (inherit).
- **CLI** — same agent, headless: `python pdf_qa.py paper.pdf "question" [--mode auto|strict|freehand]`.
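The longest-contiguous-run matching that the highlighting feature relies on can be sketched in a few lines. This is a simplified illustration, not the actual `highlight_lib` code: `norm` and `longest_common_run` are hypothetical names, and the real library operates on word bounding boxes from PyMuPDF rather than plain strings.

```python
def norm(w: str) -> str:
    """Normalize a word: lowercase, alphanumerics only (drops hyphens, ligature debris)."""
    return "".join(c for c in w.lower() if c.isalnum())

def longest_common_run(words: list[str], quote_words: list[str]) -> tuple[int, int]:
    """Return (start, length) of the longest contiguous run of page words that
    matches a contiguous run of quote words, compared after normalization."""
    a = [norm(w) for w in words]
    b = [norm(w) for w in quote_words]
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)          # DP row: common-run length ending at (i-1, j-1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] and a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return (best_end - best_len, best_len)
```

Because both sides are normalized word-by-word, punctuation, case, and trailing hyphens on wrapped words don't break the match — which is the property the feature list describes.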
- **Python 3.10 or newer.** The Claude Agent SDK requires it. If `pip install` reports `No matching distribution found for claude-agent-sdk`, your Python is too old — check with `python --version` and install a newer one (python.org).
- **Claude Code CLI** installed and signed in once interactively (`claude` from a terminal). The Agent SDK spawns the `claude` CLI as a subprocess and inherits whatever it's authenticated with — works with a Claude Pro/Max subscription (OAuth) or an `ANTHROPIC_API_KEY` configured via the CLI. Where Claude Code stores its credentials per OS:

  | OS | Path |
  | --- | --- |
  | macOS / Linux | `~/.claude/` |
  | Windows | `%USERPROFILE%\.claude\` (`C:\Users\<you>\.claude\`) |

  Nothing in this repo touches that directory; auth lives entirely outside the project, so cloning is enough — no `.env` files, no key configuration.
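A quick way to sanity-check both prerequisites before launching — this helper is hypothetical, not part of the repo; it only uses the standard library:

```python
import shutil
from pathlib import Path

def claude_ready(cli: str = "claude") -> dict:
    """Report whether the Claude Code CLI is on PATH and its credentials dir exists."""
    return {
        "cli_on_path": shutil.which(cli) is not None,
        "credentials_dir": (Path.home() / ".claude").is_dir(),
    }

print(claude_ready())
```

If `cli_on_path` is `False`, install Claude Code; if `credentials_dir` is `False`, run `claude` once interactively to sign in.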
```shell
git clone https://github.com/<you>/sift.git
cd sift
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

If `python3` resolves to an older interpreter, run the venv step with the specific binary, e.g. `python3.12 -m venv venv`.
```shell
git clone https://github.com/<you>/sift.git
cd sift
python -m venv venv
venv\Scripts\Activate.ps1
pip install -r requirements.txt
```

If PowerShell blocks the activate script with an execution-policy error, run once: `Set-ExecutionPolicy -Scope CurrentUser RemoteSigned`. Alternatively, use `venv\Scripts\activate.bat` from cmd.exe instead of PowerShell.
```shell
uvicorn app:app --port 8000
```

Open http://localhost:8000. Upload a PDF, ask a question, and click any `(p. N)` reference or citation chip to jump.
```shell
python pdf_qa.py path/to/paper.pdf "What is the main contribution?"
python pdf_qa.py --model sonnet --mode freehand paper.pdf "Limitations?"
```

Outputs: `paper_highlighted.pdf` and `paper_citations.json` next to the input PDF.
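The exact schema of the citations JSON isn't documented here, but an entry pairs a page, the highlighted text, and the citation's pastel color. As an illustration only — every field name below is hypothetical, not the repo's actual schema:

```python
import json

# Hypothetical shape of one citation entry — illustrative only.
citation = {
    "page": 4,
    "quote": "We evaluate on three benchmark datasets.",
    "highlighted_text": "We evaluate on three benchmark datasets.",
    "color": "#f6d5a8",  # shared by the chip border, the (p. N) pill, and the PDF highlight
}
print(json.dumps(citation, indent=2))
```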
- PyMuPDF extracts per-page text into `<paper>.pages.txt`.
- The agent reads that file, identifies passages that ground each claim, and writes a tiny script that calls `highlight_lib.highlight_pdf(input, output, citations, passages)`. `highlight_lib` matches each passage at the word-coordinate level: it pulls every word's bounding box via `page.get_text("words")` and finds the longest contiguous matching run vs. the quote, normalized lowercase + alphanumeric. Word-level matching survives anything `search_for` chokes on: line wraps, hyphenated breaks, ligatures.
- Citations JSON records the actual highlighted text plus a per-citation pastel color, so the chip's border, the inline `(p. N)` pill, and the yellow PDF region always agree.
- Per-PDF chat memory lives in an in-process dict (`CHATS`) keyed by filename. Each `/ask` prepends the last 10 turns to the prompt as a `PRIOR CONVERSATION` block, so the agent can answer follow-ups that reference earlier turns. Memory is volatile (lost on uvicorn restart); add a JSON dump in `app.py` if you want persistence.
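The chat-memory mechanism above can be sketched as follows — a simplified stand-in, not the actual `app.py` code; the function names are hypothetical:

```python
from collections import defaultdict

MAX_TURNS_KEPT = 10
# filename -> [(question, answer), ...]
CHATS: dict[str, list[tuple[str, str]]] = defaultdict(list)

def build_prompt(file_id: str, question: str) -> str:
    """Prepend the last MAX_TURNS_KEPT turns as a PRIOR CONVERSATION block."""
    turns = CHATS[file_id][-MAX_TURNS_KEPT:]
    if not turns:
        return question
    history = "\n".join(f"Q: {q}\nA: {a}" for q, a in turns)
    return f"PRIOR CONVERSATION\n{history}\n\nCURRENT QUESTION\n{question}"

def record_turn(file_id: str, question: str, answer: str) -> None:
    """Append a completed turn to the per-PDF history."""
    CHATS[file_id].append((question, answer))
```

Because `CHATS` is an in-process dict, the sketch shares the real implementation's trade-off: zero setup, but history vanishes when the server restarts.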
```
.
├── app.py              FastAPI server (web UI + SSE streaming)
├── agent_core.py       Shared agent setup and prompt for CLI + web
├── highlight_lib.py    Word-coordinate highlighting library
├── pdf_qa.py           CLI entrypoint
├── static/index.html   Two-pane UI (vanilla JS, no build step)
├── pdfs/               User PDFs and generated artifacts (gitignored)
└── requirements.txt
```
Endpoints (`app.py`):

| Endpoint | Purpose |
| --- | --- |
| `GET /static` | UI |
| `GET /config` | model + mode choices |
| `POST /upload` | multipart PDF upload |
| `GET /pdfs` | list uploaded PDFs |
| `GET /pdf/{id}` | serve PDF (`?highlighted=true` for annotated copy) |
| `GET /history/{id}` | per-PDF chat history for rehydration on reload |
| `POST /clear/{id}` | wipe chat history for a PDF |
| `POST /ask` | SSE stream: `stats`, `text`, `tool`, `done`, `error` |
Per-turn options sent to `/ask`: `{ file_id, question, model: "haiku|sonnet|opus|inherit", mode: "auto|strict|freehand" }`. The defaults (`DEFAULT_MODEL`, `DEFAULT_MODE`) and the per-PDF turn cap (`MAX_TURNS_KEPT`) live in `agent_core.py` and `app.py` respectively.
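The `/ask` stream uses the standard SSE wire format (`event:`/`data:` lines separated by blank lines), so any client can consume it. A minimal parser — independent of this repo — might look like:

```python
def parse_sse(raw: str) -> list[tuple[str, str]]:
    """Split a decoded SSE stream into (event, data) pairs."""
    events: list[tuple[str, str]] = []
    event, data_lines = "message", []   # "message" is the SSE default event name
    for line in raw.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and data_lines:  # blank line terminates an event
            events.append((event, "\n".join(data_lines)))
            event, data_lines = "message", []
    if data_lines:                       # flush a trailing, unterminated event
        events.append((event, "\n".join(data_lines)))
    return events
```

Feeding it the body of a `/ask` response would yield pairs like `("text", …)` and `("done", …)`, matching the event names in the endpoint table.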
- `max_buffer_size` is set to 20 MB on the SDK transport because the `Read` tool returns large JSON payloads when invoked on big files. Don't lower it.
- The `Read` tool is intentionally pointed at the extracted `.pages.txt`, never at the PDF directly — invoking `Read` on a PDF returns each page as base64 image data and immediately blows past any sane buffer.
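The extraction step that produces `.pages.txt` can be sketched like so — the page-marker format and function names are assumptions, not the repo's actual convention; the PyMuPDF calls (`fitz.open`, `page.get_text()`) are the library's standard API:

```python
def format_pages(pages: list[str]) -> str:
    """Join per-page texts with explicit page markers so citations can name a page."""
    return "\n".join(f"=== PAGE {i} ===\n{text}" for i, text in enumerate(pages, start=1))

def extract_pages_txt(pdf_path: str) -> str:
    """Extract per-page text from a PDF with PyMuPDF (requires `pip install pymupdf`)."""
    import fitz  # PyMuPDF
    with fitz.open(pdf_path) as doc:
        return format_pages([page.get_text() for page in doc])
```

Keeping the agent on this plain-text artifact is what makes the 20 MB buffer sufficient: text pages are a few KB each, while base64 page images are not.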
