Document Q&A system: upload PDF/TXT documents and ask questions. Uses a multi-agent pipeline (router, retrieval, generator, evaluator) with LangGraph, LlamaIndex RAG, and Qdrant.
flowchart TD
UserQuery[User Query]
Router[Router Agent]
Retrieval[Retrieval Agent]
Clarification[Clarification Agent]
RAG[RAG Pipeline LlamaIndex]
Generator[Answer Generation Agent]
Evaluator[Evaluation Agent]
Response[Response]
UserQuery --> Router
Router -->|"intent: answer"| Retrieval
Router -->|"intent: clarify"| Clarification
Clarification --> Response
Retrieval --> RAG
RAG --> Generator
Generator --> Evaluator
Evaluator --> Response
- Router: Classifies query as `answer` or `clarify` (GPT-4o).
- Retrieval: Fetches top-k chunks from Qdrant via LlamaIndex.
- Generator: Builds answer from context + query (GPT-4o).
- Evaluator: Confidence score and guardrails (PASS/FAIL).
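The control flow described above can be sketched in plain Python. This is a dependency-free illustration, not the project's LangGraph code: the `PipelineState` fields, the function names, the word-count heuristic standing in for the GPT-4o router, the word-overlap ranking standing in for Qdrant search, and the 0.5 confidence threshold are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PipelineState:
    """Hypothetical state passed between agents; field names are illustrative."""
    question: str
    intent: str = ""
    chunks: List[str] = field(default_factory=list)
    answer: str = ""
    confidence: float = 0.0
    verdict: str = ""


def route(state: PipelineState) -> PipelineState:
    # Stand-in for the GPT-4o router: very short questions count as ambiguous.
    state.intent = "answer" if len(state.question.split()) >= 3 else "clarify"
    return state


def clarify(state: PipelineState) -> PipelineState:
    state.answer = "Could you give more detail about what you want to know?"
    return state


def retrieve(state: PipelineState, corpus: List[str], top_k: int = 2) -> PipelineState:
    # Stand-in for vector search: rank chunks by words shared with the question.
    words = set(state.question.lower().split())
    ranked = sorted(corpus, key=lambda c: len(words & set(c.lower().split())), reverse=True)
    state.chunks = ranked[:top_k]
    return state


def generate(state: PipelineState) -> PipelineState:
    # Stand-in for the GPT-4o generator: return the best-ranked chunk verbatim.
    state.answer = state.chunks[0] if state.chunks else ""
    return state


def evaluate(state: PipelineState, threshold: float = 0.5) -> PipelineState:
    # Toy confidence: fraction of question words present in the retrieved context.
    words = set(state.question.lower().split())
    context = set(" ".join(state.chunks).lower().split())
    state.confidence = len(words & context) / len(words) if words else 0.0
    state.verdict = "PASS" if state.confidence >= threshold else "FAIL"
    return state


def run_pipeline(question: str, corpus: List[str]) -> PipelineState:
    state = route(PipelineState(question=question))
    if state.intent == "clarify":
        # The clarify branch skips retrieval, generation, and evaluation.
        return clarify(state)
    return evaluate(generate(retrieve(state, corpus)))
```

Calling `run_pipeline("what stores vectors", ["qdrant stores vectors", "fastapi serves the api"])` follows the answer branch, while a one-word question takes the clarify branch and returns without a verdict.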
| Layer | Technology |
|---|---|
| LLM | OpenAI GPT-4o |
| Orchestration | LangChain + LangGraph |
| Embeddings | OpenAI text-embedding-3-small |
| Vector DB | Qdrant (Docker) |
| RAG | LlamaIndex |
| Backend | FastAPI |
| Observability | LangSmith |
| Package manager | uv |
- Copy env and set API keys: `cp .env.example .env`, then edit `.env` to set `OPENAI_API_KEY=sk-...` and optionally `LANGCHAIN_API_KEY=...`
- Start Qdrant and the app: `docker compose up --build`
- API: `http://localhost:8000` (docs at `http://localhost:8000/docs`).
- Install uv, then: `uv sync --extra dev`
- Run Qdrant (e.g. `docker run -p 6333:6333 qdrant/qdrant:latest`).
- Set `.env` as above; then: `uv run uvicorn api.main:app --reload --port 8000`
| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | Required for embeddings and GPT-4o. |
| `LANGCHAIN_API_KEY` | Optional; for LangSmith tracing. |
| `LANGCHAIN_TRACING_V2` | Set to `true` to enable LangSmith. |
| `LANGCHAIN_PROJECT` | Project name in LangSmith (e.g. `prod-rag-agent`). |
| `QDRANT_HOST` | `localhost` locally; `qdrant` in Docker. |
| `QDRANT_PORT` | `6333`. |
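As a rough illustration of how these variables might be read at startup, the sketch below loads them into a typed settings object using only the standard library. The `Settings` class, the `load_settings` function, and the fallback defaults (`localhost`, `6333`) are assumptions for the example; the project's actual configuration code may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container mirroring the variables in the table."""
    openai_api_key: str
    langchain_api_key: Optional[str]
    tracing_enabled: bool
    langchain_project: Optional[str]
    qdrant_host: str
    qdrant_port: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    # Accept an explicit mapping so the loader can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required")
    return Settings(
        openai_api_key=api_key,
        langchain_api_key=env.get("LANGCHAIN_API_KEY"),
        tracing_enabled=env.get("LANGCHAIN_TRACING_V2", "").lower() == "true",
        langchain_project=env.get("LANGCHAIN_PROJECT"),
        # Assumed fallbacks: local Qdrant on its default port.
        qdrant_host=env.get("QDRANT_HOST", "localhost"),
        qdrant_port=int(env.get("QDRANT_PORT", "6333")),
    )
```

Failing fast on a missing `OPENAI_API_KEY` surfaces a configuration mistake at startup rather than on the first upload or query.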
- POST /upload: Upload a PDF or TXT file (multipart form field `file`). Content is chunked, embedded, and stored in Qdrant.
- POST /query: Body: `{"question": "..."}`. Returns `answer`, `confidence`, `sources`, `latency_ms` (and `clarification: true` when the router asks for clarification).
- GET /health: Liveness check.
- Upload a document: `curl -X POST http://localhost:8000/upload -F "file=@doc.pdf"`
- Ask a question: `curl -X POST http://localhost:8000/query -H "Content-Type: application/json" -d '{"question": "What is the main topic of the document?"}'`
- Example response: `{ "answer": "The document describes...", "confidence": 0.92, "sources": [{"text_preview": "..."}], "latency_ms": 1850.5 }`
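The same `/query` call can be made from Python with the standard library alone. The sketch below is illustrative: the helper names are hypothetical, and it assumes that a clarification response carries the clarifying question in the `answer` field, which the endpoint description does not state explicitly.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_query_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /query request without sending it."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_response(payload: dict) -> str:
    """Turn a /query response body into a one-line summary."""
    if payload.get("clarification"):
        # Assumption: the clarifying question is returned in "answer".
        return f"Clarification needed: {payload.get('answer', '')}"
    return (
        f"{payload['answer']} "
        f"(confidence {payload['confidence']:.2f}, {payload['latency_ms']:.1f} ms)"
    )


def ask(question: str, base_url: str = BASE_URL) -> dict:
    """Send the question to a running server and return the parsed JSON body."""
    request = build_query_request(question, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the app running locally, `print(summarize_response(ask("What is the main topic of the document?")))` prints the answer together with its confidence and latency.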
- Target: p95 response time < 3 seconds for a typical query (upload not included).
- Achieved latency depends on document size, retrieval count, and OpenAI latency; `latency_ms` is returned on every `/query` response.
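One way to check the p95 target is to collect the `latency_ms` values from a batch of `/query` responses and compute the 95th percentile. The sketch below uses the nearest-rank method; the function names and the choice of percentile method are assumptions for the example, not something the project prescribes.

```python
from typing import Sequence


def p95(latencies_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of the collected latency_ms values."""
    if not latencies_ms:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(latencies_ms)
    # ceil(0.95 * n) computed with integers to avoid floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_target(latencies_ms: Sequence[float], target_ms: float = 3000.0) -> bool:
    """True when the p95 latency is below the target (3 seconds by default)."""
    return p95(latencies_ms) < target_ms
```

Nearest-rank always returns an observed sample, so a few very slow requests in a small batch show up directly in the result instead of being smoothed by interpolation.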
With LANGCHAIN_TRACING_V2=true and LANGCHAIN_API_KEY set, LangChain/LangGraph calls are traced automatically. View runs and traces in your LangSmith project (screenshot placeholder: add a screenshot of the LangSmith trace for a /query request).
- Unit + integration: `uv run pytest`
- Integration only: `uv run pytest -m integration`
Requires no real API keys (OpenAI and Qdrant are mocked or in-memory in tests).