iscarface/rag_agent

Production RAG Pipeline with Multi-Agent Orchestration

Document Q&A system: upload PDF/TXT documents and ask questions. Uses a multi-agent pipeline (router, retrieval, generator, evaluator) with LangGraph, LlamaIndex RAG, and Qdrant.

Architecture

flowchart TD
    UserQuery[User Query]
    Router[Router Agent]
    Retrieval[Retrieval Agent]
    Clarification[Clarification Agent]
    RAG[RAG Pipeline LlamaIndex]
    Generator[Answer Generation Agent]
    Evaluator[Evaluation Agent]
    Response[Response]

    UserQuery --> Router
    Router -->|"intent: answer"| Retrieval
    Router -->|"intent: clarify"| Clarification
    Clarification --> Response
    Retrieval --> RAG
    RAG --> Generator
    Generator --> Evaluator
    Evaluator --> Response
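  • Router: Classifies each query as answer or clarify (GPT-4o).
  • Clarification: Handles queries routed as clarify; its output goes straight to the response, skipping retrieval.
  • Retrieval: Fetches the top-k chunks from Qdrant via LlamaIndex.
  • Generator: Builds the answer from the retrieved context and the query (GPT-4o).
  • Evaluator: Assigns a confidence score and applies guardrail checks (PASS/FAIL).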
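
The control flow in the diagram can be sketched in plain Python. This is only a stand-in for the LangGraph graph, not the repository's code: the `State` fields, the agent function bodies, the word-count routing rule, and the 0.5 threshold are all hypothetical placeholders for the GPT-4o and Qdrant calls.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """Data passed between agents (field names are assumptions)."""
    question: str
    intent: str = ""
    chunks: list = field(default_factory=list)
    answer: str = ""
    confidence: float = 0.0
    verdict: str = ""
    clarification: bool = False


def router(state: State) -> State:
    # Placeholder rule; the real router asks GPT-4o to classify the query.
    state.intent = "answer" if len(state.question.split()) >= 3 else "clarify"
    return state


def clarification(state: State) -> State:
    # Goes straight to the response, as in the diagram.
    state.answer = "Could you give more detail about what you want to know?"
    state.clarification = True
    return state


def retrieval(state: State) -> State:
    # Placeholder; the real agent fetches top-k chunks from Qdrant.
    state.chunks = ["(retrieved chunk)"]
    return state


def generator(state: State) -> State:
    # Placeholder; the real agent prompts GPT-4o with context + query.
    state.answer = f"Answer based on {len(state.chunks)} chunk(s)."
    return state


def evaluator(state: State) -> State:
    # Placeholder scoring; the 0.5 threshold is an arbitrary assumption.
    state.confidence = 0.9 if state.chunks else 0.0
    state.verdict = "PASS" if state.confidence >= 0.5 else "FAIL"
    return state


def run_pipeline(question: str) -> State:
    state = router(State(question=question))
    if state.intent == "clarify":
        return clarification(state)
    for agent in (retrieval, generator, evaluator):
        state = agent(state)
    return state
```

In a LangGraph version, each function would presumably become a node and the `if` in `run_pipeline` a conditional edge out of the router.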

Tech Stack

| Layer | Technology |
| --- | --- |
| LLM | OpenAI GPT-4o |
| Orchestration | LangChain + LangGraph |
| Embeddings | OpenAI text-embedding-3-small |
| Vector DB | Qdrant (Docker) |
| RAG | LlamaIndex |
| Backend | FastAPI |
| Observability | LangSmith |
| Package manager | uv |

Setup

Using Docker Compose

  1. Copy env and set API keys:
    cp .env.example .env
    # Edit .env: OPENAI_API_KEY=sk-... and optionally LANGCHAIN_API_KEY=...
  2. Start Qdrant and the app:
    docker compose up --build
  3. API: http://localhost:8000 (docs at http://localhost:8000/docs).

Local development (uv)

  1. Install uv, then:
    uv sync --extra dev
  2. Run Qdrant (e.g. docker run -p 6333:6333 qdrant/qdrant:latest).
  3. Set .env as above; then:
    uv run uvicorn api.main:app --reload --port 8000

Environment (.env)

| Variable | Description |
| --- | --- |
| OPENAI_API_KEY | Required for embeddings and GPT-4o. |
| LANGCHAIN_API_KEY | Optional; for LangSmith tracing. |
| LANGCHAIN_TRACING_V2 | Set to true to enable LangSmith. |
| LANGCHAIN_PROJECT | Project name in LangSmith (e.g. prod-rag-agent). |
| QDRANT_HOST | localhost locally; qdrant in Docker. |
| QDRANT_PORT | 6333. |
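
As a rough illustration of how these variables might be read at startup, here is a minimal sketch. The `Settings` class and `load_settings` function are hypothetical names, not taken from the repository; the sketch assumes the variables are already present in the process environment (it does not parse `.env` itself) and that `localhost` and `6333` are reasonable fallbacks.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    qdrant_host: str
    qdrant_port: int
    tracing_enabled: bool
    langchain_project: Optional[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from an environment mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        # The only variable the table marks as required.
        raise RuntimeError("OPENAI_API_KEY is required")
    return Settings(
        openai_api_key=api_key,
        qdrant_host=env.get("QDRANT_HOST", "localhost"),  # assumed fallback
        qdrant_port=int(env.get("QDRANT_PORT", "6333")),  # assumed fallback
        tracing_enabled=env.get("LANGCHAIN_TRACING_V2", "").lower() == "true",
        langchain_project=env.get("LANGCHAIN_PROJECT"),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to exercise in tests without touching the real environment.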

API

  • POST /upload — Upload a PDF or TXT file (multipart form file). Content is chunked, embedded, and stored in Qdrant.
  • POST /query — Body: {"question": "..."}. Returns answer, confidence, sources, latency_ms (and clarification: true when the router asks for clarification).
  • GET /health — Liveness check.
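
A client could model the /query response along these lines. This is a sketch built only from the field names listed above; the `QueryResponse` class, the `parse_query_response` helper, and the choice to default every field except `answer` when it is missing are assumptions, since the exact schema is not spelled out here.

```python
import json
from dataclasses import dataclass, field


@dataclass
class QueryResponse:
    answer: str
    confidence: float = 0.0
    sources: list = field(default_factory=list)
    latency_ms: float = 0.0
    clarification: bool = False  # True when the router asks for clarification


def parse_query_response(body: str) -> QueryResponse:
    """Parse the JSON body returned by POST /query."""
    data = json.loads(body)
    return QueryResponse(
        answer=data["answer"],
        confidence=float(data.get("confidence", 0.0)),
        sources=list(data.get("sources", [])),
        latency_ms=float(data.get("latency_ms", 0.0)),
        clarification=bool(data.get("clarification", False)),
    )
```

Callers would typically check `clarification` first and, if it is set, show `answer` to the user as a follow-up question rather than as a final answer.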

Example flow

  1. Upload a document:
    curl -X POST http://localhost:8000/upload -F "file=@doc.pdf"
  2. Ask a question:
    curl -X POST http://localhost:8000/query -H "Content-Type: application/json" -d '{"question": "What is the main topic of the document?"}'
  3. Example response:
    {
      "answer": "The document describes...",
      "confidence": 0.92,
      "sources": [{"text_preview": "..."}],
      "latency_ms": 1850.5
    }
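
The same query can be issued from Python with only the standard library. This is a minimal sketch mirroring the curl call above; `build_query_request` and `ask` are hypothetical helper names, and the 30-second timeout is an arbitrary assumption.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_query_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /query request without sending it."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = BASE_URL, timeout: float = 30.0) -> dict:
    """Send the question and return the decoded JSON response."""
    request = build_query_request(question, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

With the API running locally, `ask("What is the main topic of the document?")["answer"]` should return the generated answer text.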

Performance

  • Target: p95 response time < 3 seconds for a typical query (upload not included).
  • Achieved latency depends on document size, retrieval count, and OpenAI latency; latency_ms is returned on every /query response.
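
One way to check the target against collected `latency_ms` values is a nearest-rank percentile. The sketch below is an illustration, not part of the repository: the `p95` and `meets_target` helpers are hypothetical, and nearest-rank is just one of several common percentile definitions.

```python
def p95(latencies_ms: list) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # 1-based rank = ceil(0.95 * n), computed in integers to avoid float error.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_target(latencies_ms: list, target_ms: float = 3000.0) -> bool:
    """True when the p95 latency is under the target (3 seconds by default)."""
    return p95(latencies_ms) < target_ms
```

With few samples the p95 is simply the slowest or near-slowest request, so the check only becomes meaningful once a reasonable number of /query calls has been recorded.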

Observability (LangSmith)

With LANGCHAIN_TRACING_V2=true and LANGCHAIN_API_KEY set, LangChain/LangGraph calls are traced automatically. View runs and traces in your LangSmith project. (Placeholder: a screenshot of the LangSmith trace for a /query request is still to be added.)

Tests

  • Unit + integration: uv run pytest
  • Integration only: uv run pytest -m integration

The tests require no real API keys: OpenAI and Qdrant are mocked or run in-memory.
