AI Backend

An AI backend platform built with FastAPI, featuring RAG pipelines, vector search, streaming AI responses, and an async architecture with Celery background workers.


Architecture Overview

┌─────────────────────────────────────────────────────────┐
│                      FastAPI App                        │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐               │
│  │  Routes  │  │   Auth   │  │  Health  │               │
│  └────┬─────┘  └────┬─────┘  └──────────┘               │
│       │             │                                   │
│  ┌────▼─────────────▼───────┐                           │
│  │      Service Layer       │                           │
│  │ AuthService │ AIService  │                           │
│  │ FileService │ SubService │                           │
│  └────┬─────────────┬───────┘                           │
│       │             │                                   │
│  ┌────▼────┐   ┌────▼───────────────────┐               │
│  │  Repos  │   │      AI Pipeline       │               │
│  │  (DB)   │   │  Embed → Index → RAG   │               │
│  └────┬────┘   └────┬───────────────────┘               │
│       │             │                                   │
│  ┌────▼─────┐  ┌────▼────────┐   ┌────────────────┐     │
│  │PostgreSQL│  │   Qdrant    │   │   OpenAI API   │     │
│  └──────────┘  └─────────────┘   └────────────────┘     │
└─────────────────────────────────────────────────────────┘
         │
    ┌────▼───────────────────────┐
    │    Redis + Celery Workers  │
    │  Embedding │ Indexing │ GC │
    └────────────────────────────┘

Tech Stack

Layer              Technology
Framework          FastAPI + Uvicorn
Database           PostgreSQL 16 + SQLAlchemy 2.0 Async
Cache / Broker     Redis 7
Vector DB          Qdrant
AI Provider        OpenAI (GPT-4o + text-embedding-3-small)
Background Tasks   Celery
Auth               JWT (access + refresh tokens)
Validation         Pydantic v2
Migrations         Alembic
Testing            Pytest + pytest-asyncio + HTTPX
Containerization   Docker + Docker Compose

Features

Authentication

  • JWT access tokens (30-minute expiry)
  • Refresh token rotation with secure hashing
  • bcrypt password hashing
  • Role-based access control (user / admin)

AI Capabilities

  • Document Chat — RAG-powered Q&A over uploaded documents
  • Resume Analyzer — Structured resume feedback with optional job description matching
  • Code Review — Security, performance, and quality analysis
  • Meeting Summarizer — Transcript summarization with action items
  • Streaming Responses — Real-time SSE token streaming for all AI endpoints

File Processing Pipeline

Upload → Validate → Store → Queue Celery Task
  → Extract Text (PDF/DOCX/TXT/Code)
  → Chunk Text (configurable window + overlap)
  → Generate Embeddings (OpenAI batch)
  → Index to Qdrant
  → Update File Status → Done
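The chunking step above can be sketched as a sliding window over words. This is an assumed, simplified version: the function name `chunk_words` is hypothetical, and the defaults mirror the documented `CHUNK_SIZE` (512) and `CHUNK_OVERLAP` (50) settings.

```python
def chunk_words(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into word windows of chunk_size that share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

Each chunk repeats the last `overlap` words of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk.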

Vector Search (RAG)

  • Semantic similarity search via Qdrant cosine distance
  • Per-user + per-file metadata filtering
  • Configurable top-k retrieval with score threshold
  • Context injection into structured prompts
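The retrieval behaviour listed above can be illustrated without a vector database. The sketch below is a plain-Python stand-in for what Qdrant does server-side, not the project's retriever: the `Chunk` dataclass, `retrieve`, `build_prompt`, and the prompt wording are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    user_id: str
    file_id: str
    text: str
    vector: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vector, chunks, user_id, file_id=None, top_k=5, score_threshold=0.0):
    """Return up to top_k (score, chunk) pairs, best first, after filtering."""
    scored = []
    for chunk in chunks:
        if chunk.user_id != user_id:
            continue  # per-user isolation
        if file_id is not None and chunk.file_id != file_id:
            continue  # optional per-file filter
        score = cosine_similarity(query_vector, chunk.vector)
        if score >= score_threshold:
            scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_prompt(question: str, hits) -> str:
    """Inject the retrieved chunk texts into a simple structured prompt."""
    context = "\n\n".join(
        f"[{i}] {chunk.text}" for i, (_, chunk) in enumerate(hits, start=1)
    )
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Filtering before scoring keeps one user's chunks out of another user's results regardless of how similar they are to the query.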

Subscription System

  • Free / Pro / Enterprise tiers
  • Per-month request and token quotas
  • Quota enforcement via dependency injection
  • Stripe-ready schema (customer_id, subscription_id columns)
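A per-month quota check might look like the sketch below. The tier limits are placeholder numbers invented for illustration (the README does not state the real limits), and `Plan`, `Usage`, `check_quota`, and `record_usage` are hypothetical names. In the application, a check like this would typically be wrapped in a FastAPI dependency and attached to routes with `Depends`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    max_requests: int  # requests allowed per billing month
    max_tokens: int    # tokens allowed per billing month


# Placeholder limits for illustration only; not the project's real quotas.
PLANS = {
    "free": Plan(max_requests=100, max_tokens=50_000),
    "pro": Plan(max_requests=5_000, max_tokens=2_000_000),
    "enterprise": Plan(max_requests=100_000, max_tokens=50_000_000),
}


@dataclass
class Usage:
    requests: int = 0
    tokens: int = 0


class QuotaExceeded(Exception):
    """Raised when a request would exceed the plan's monthly quota."""


def check_quota(tier: str, usage: Usage, requested_tokens: int = 0) -> None:
    """Raise QuotaExceeded if one more request would go over the plan limits."""
    plan = PLANS[tier]
    if usage.requests + 1 > plan.max_requests:
        raise QuotaExceeded("monthly request quota exhausted")
    if usage.tokens + requested_tokens > plan.max_tokens:
        raise QuotaExceeded("monthly token quota exhausted")


def record_usage(usage: Usage, tokens_used: int) -> None:
    """Count a completed request against the current billing period."""
    usage.requests += 1
    usage.tokens += tokens_used
```

Checking before the AI call and recording afterwards keeps rejected requests from consuming quota.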

Background Processing (Celery)

  • embeddings queue — document processing & indexing
  • indexing queue — vector operations
  • cleanup queue — expired token purge, orphaned file cleanup
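One way to map tasks onto these three queues is a routing table in the shape of Celery's `task_routes` setting, which accepts glob patterns as keys. The module paths below are assumptions based on the `tasks/` directory and may not match the real task names; `queue_for` is a hypothetical helper that only mimics the lookup so the mapping can be checked without a running worker.

```python
from fnmatch import fnmatch

# Shaped like Celery's `task_routes` setting; the task module paths are assumed.
TASK_ROUTES = {
    "app.tasks.embeddings.*": {"queue": "embeddings"},
    "app.tasks.indexing.*": {"queue": "indexing"},
    "app.tasks.cleanup.*": {"queue": "cleanup"},
}


def queue_for(task_name: str, default: str = "celery") -> str:
    """Return the queue a task name would be routed to.

    Falls back to `default` ("celery" is Celery's default queue name)
    when no pattern matches.
    """
    for pattern, route in TASK_ROUTES.items():
        if fnmatch(task_name, pattern):
            return route["queue"]
    return default
```

Separate queues let each workload be scaled independently, for example by starting a dedicated worker with `celery worker -Q embeddings`.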

Project Structure

app/
├── api/v1/          # Thin route handlers
│   ├── auth.py
│   ├── ai.py
│   ├── files.py
│   ├── stream.py
│   └── subscriptions.py
├── ai/              # OpenAI integration & RAG pipeline
│   ├── client.py
│   ├── completions.py
│   ├── embeddings.py
│   └── pipeline.py
├── core/            # Security, exceptions, logging
├── db/              # SQLAlchemy engine & session
├── models/          # ORM models (7 tables)
├── schemas/         # Pydantic v2 request/response schemas
├── repositories/    # Data access layer (no business logic)
├── services/        # Business logic layer
├── tasks/           # Celery workers
├── vector/          # Qdrant client, indexer, retriever
├── streaming/       # SSE helpers
├── middleware/      # Request logging, rate limiting
├── dependencies/    # FastAPI DI (auth, db, quota)
├── utils/           # File extraction, text chunking
└── tests/           # pytest async test suite

Quick Start

Using Docker (recommended)

cp .env.example .env
# Edit .env with your OPENAI_API_KEY and SECRET_KEY

docker compose up --build

The API will be available at http://localhost:8000.

Local Development

python -m venv venv
source venv/bin/activate          # Windows: venv\Scripts\activate
pip install -r requirements.txt

cp .env.example .env
# Fill in DATABASE_URL, REDIS_URL, OPENAI_API_KEY, SECRET_KEY

alembic upgrade head
uvicorn app.main:app --reload

API Reference

Authentication

POST /api/v1/auth/register    Register new user
POST /api/v1/auth/login       Obtain access + refresh tokens
POST /api/v1/auth/refresh     Rotate refresh token

Files

POST   /api/v1/files/upload   Upload document (PDF/DOCX/TXT/code)
GET    /api/v1/files          List user's files
DELETE /api/v1/files/{id}     Delete file + vectors

AI Endpoints

POST /api/v1/ai/chat              General AI chat
POST /api/v1/ai/document-chat     RAG chat over uploaded document
POST /api/v1/ai/resume-analyze    Resume analysis
POST /api/v1/ai/code-review       Code review
POST /api/v1/ai/meeting-summary   Meeting transcript summarizer

Streaming (SSE)

POST /api/v1/stream/chat              Streaming general chat
POST /api/v1/stream/document-chat     Streaming RAG document chat

SSE events: token, done, error
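For reference, the three event types could be framed on the wire as shown in this sketch. It follows the standard Server-Sent Events text format (`event:` and `data:` fields, blank line between events); the payload shapes, including the `token` key, are assumptions, and `format_sse` / `parse_sse` are hypothetical helpers rather than the project's streaming module.

```python
import json


def format_sse(event: str, data: dict) -> str:
    """Serialize one event in SSE wire format, terminated by a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def parse_sse(stream: str) -> list[tuple[str, dict]]:
    """Parse a complete SSE text stream into (event, payload) pairs."""
    events = []
    for frame in stream.split("\n\n"):
        if not frame.strip():
            continue  # skip the empty tail after the final blank line
        event, payload_lines = "message", []  # "message" is the SSE default type
        for line in frame.split("\n"):
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                payload_lines.append(line[len("data:"):].strip())
        payload = json.loads("\n".join(payload_lines)) if payload_lines else {}
        events.append((event, payload))
    return events
```

A response body would then be a sequence of `token` frames followed by a single `done` frame, or an `error` frame if generation fails.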

Subscriptions

GET  /api/v1/subscriptions/plans    Available plans
GET  /api/v1/subscriptions/me       Current subscription
POST /api/v1/subscriptions/upgrade  Upgrade tier
GET  /api/v1/subscriptions/usage    Current period usage

Health

GET /health    Service health check

Environment Variables

Variable                      Description                  Default
DATABASE_URL                  PostgreSQL async URL         required
REDIS_URL                     Redis URL                    required
OPENAI_API_KEY                OpenAI secret key            required
SECRET_KEY                    JWT signing secret           required
QDRANT_URL                    Qdrant HTTP URL              http://localhost:6333
OPENAI_MODEL                  Chat model                   gpt-4o
OPENAI_EMBEDDING_MODEL        Embedding model              text-embedding-3-small
ACCESS_TOKEN_EXPIRE_MINUTES   Access token TTL             30
REFRESH_TOKEN_EXPIRE_DAYS     Refresh token TTL            7
MAX_FILE_SIZE                 Max upload bytes             10485760 (10 MB)
CHUNK_SIZE                    Embedding chunk word count   512
CHUNK_OVERLAP                 Chunk overlap words          50
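The variables above could be loaded and validated at startup roughly as follows. This is a standard-library sketch under assumed names (`Settings`, `load_settings`); the project itself may well use a Pydantic settings class instead.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("DATABASE_URL", "REDIS_URL", "OPENAI_API_KEY", "SECRET_KEY")


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    openai_api_key: str
    secret_key: str
    qdrant_url: str
    openai_model: str
    openai_embedding_model: str
    access_token_expire_minutes: int
    refresh_token_expire_days: int
    max_file_size: int
    chunk_size: int
    chunk_overlap: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, applying documented defaults."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required settings: {', '.join(missing)}")
    settings = Settings(
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        openai_api_key=env["OPENAI_API_KEY"],
        secret_key=env["SECRET_KEY"],
        qdrant_url=env.get("QDRANT_URL", "http://localhost:6333"),
        openai_model=env.get("OPENAI_MODEL", "gpt-4o"),
        openai_embedding_model=env.get("OPENAI_EMBEDDING_MODEL", "text-embedding-3-small"),
        access_token_expire_minutes=int(env.get("ACCESS_TOKEN_EXPIRE_MINUTES", "30")),
        refresh_token_expire_days=int(env.get("REFRESH_TOKEN_EXPIRE_DAYS", "7")),
        max_file_size=int(env.get("MAX_FILE_SIZE", "10485760")),
        chunk_size=int(env.get("CHUNK_SIZE", "512")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "50")),
    )
    if settings.chunk_overlap >= settings.chunk_size:
        raise RuntimeError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    return settings
```

Failing fast on a missing required variable surfaces configuration mistakes at startup instead of on the first request.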

Running Tests

# Requires a test PostgreSQL database: ai_saas_test
pytest -v

Database Migrations

# Generate migration after model changes
alembic revision --autogenerate -m "description"

# Apply migrations
alembic upgrade head

# Roll back one
alembic downgrade -1

Streaming Example (JavaScript)

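// Runs as an ES module on Node 18+ (top-level await, global fetch).
// Node needs an absolute URL; localhost:8000 is the Quick Start address.
// `token` is an access token obtained from POST /api/v1/auth/login.
const response = await fetch('http://localhost:8000/api/v1/stream/chat', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${token}`
  },
  body: JSON.stringify({ message: 'Explain async/await in Python' })
});
if (!response.ok) throw new Error(`Request failed: ${response.status}`);

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep any partial line for the next chunk
  for (const line of lines) {
    if (!line.startsWith('data:')) continue;
    const payload = line.slice(5).trim();
    if (!payload) continue;
    const data = JSON.parse(payload);
    if (data.token) process.stdout.write(data.token);
  }
}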