gauravlochab/agentic-invoice-processor


Agentic Invoice Processor

Production-grade AI agent platform for automated invoice processing and compliance workflows

FastAPI LangGraph React OpenAI License: MIT


Screenshots

  • Login: JWT auth with role-based access (accuracy reviewer, compliance reviewer, senior, admin)
  • Invoice Inbox: invoice queue with status stats, filters by vendor/date/status, and email sync controls
  • AI Chat Interface: natural language interface powered by LangChain tool-calling agents; approve, reject, and query invoices in plain English


Overview

An end-to-end agentic invoice processing system built on LangGraph, FastAPI, and React. The platform automates the full invoice lifecycle:

Email/Upload ingestion → OCR → LLM extraction → Contract matching → Human-in-the-loop approval → ERP export

Two interrupt gates enforce human oversight at accuracy review and compliance review stages, while a full audit trail captures every state transition for compliance-grade evidence.


Architecture

LangGraph 7-Node State Machine

                        ┌─────────────────────────────┐
  Email / File Upload   │                             │
  ────────────────────► │     EXTRACT_DOCUMENT        │
                        │  GPT-4-turbo (temp=0)       │
                        │  EasyOCR + JSON correction  │
                        └──────────────┬──────────────┘
                                       │
                        ┌──────────────▼──────────────┐
                        │     VALIDATE_CONTRACT       │
                        │  GPT-4o vendor normalization│
                        │  ±5% rate tolerance check   │
                        └──────────────┬──────────────┘
                                       │
                        ┌──────────────▼──────────────┐
                        │      PERSIST_INVOICE        │
                        │  PostgreSQL + MinIO storage │
                        └──────────────┬──────────────┘
                                       │
                        ┌──────────────▼──────────────┐
                        │  ⏸  Human Review: Accuracy  │  ← INTERRUPT GATE 1
                        │     (Chat or UI approval)   │
                        └──────────────┬──────────────┘
                                       │
                        ┌──────────────▼──────────────┐
                        │  PROCESS_ACCURACY_APPROVAL  │
                        └──────────────┬──────────────┘
                                       │
                        ┌──────────────▼──────────────┐
                        │ ⏸  Human Review: Compliance │  ← INTERRUPT GATE 2
                        │     (Chat or UI approval)   │
                        └──────────────┬──────────────┘
                                       │
                        ┌──────────────▼──────────────┐
                        │ PROCESS_COMPLIANCE_APPROVAL │
                        └──────────────┬──────────────┘
                                       │
                        ┌──────────────▼──────────────┐
                        │        CREATE_EXPORT        │
                        │  JSON/CSV ERP-ready export  │
                        └─────────────────────────────┘
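
The control flow in the diagram can be sketched in plain Python. This is not the LangGraph API and not the project's code: the `Run` class, the `advance` function, the `GATE_*` labels, and the choice to stop the run on rejection are all assumptions made for illustration. Only the node order comes from the diagram.

```python
from dataclasses import dataclass, field
from typing import Optional

# Node order follows the diagram; the GATE_* names are hypothetical labels.
STEPS = [
    "EXTRACT_DOCUMENT",
    "VALIDATE_CONTRACT",
    "PERSIST_INVOICE",
    "GATE_ACCURACY",               # interrupt gate 1
    "PROCESS_ACCURACY_APPROVAL",
    "GATE_COMPLIANCE",             # interrupt gate 2
    "PROCESS_COMPLIANCE_APPROVAL",
    "CREATE_EXPORT",
]
GATES = {"GATE_ACCURACY", "GATE_COMPLIANCE"}


@dataclass
class Run:
    position: int = 0                          # index of the next step
    history: list = field(default_factory=list)
    status: str = "running"                    # running | waiting | rejected | done


def advance(run: Run, decision: Optional[str] = None) -> Run:
    """Execute steps until the next gate; `decision` answers a paused gate."""
    if run.status == "waiting":
        if decision not in ("approve", "reject"):
            raise ValueError("a paused run needs decision='approve' or 'reject'")
        run.history.append(f"{STEPS[run.position]}:{decision}")
        if decision == "reject":
            run.status = "rejected"            # assumption: rejection ends the run
            return run
        run.position += 1
        run.status = "running"
    while run.status == "running" and run.position < len(STEPS):
        step = STEPS[run.position]
        if step in GATES:
            run.status = "waiting"             # pause until a human decides
            return run
        run.history.append(step)
        run.position += 1
    if run.status == "running":
        run.status = "done"
    return run
```

Calling `advance(Run())` runs the first three nodes and pauses at the accuracy gate; each later call with `decision="approve"` moves the run to the next gate or to completion.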

Key Features

Automated Ingestion

  • Gmail and Outlook IMAP polling with configurable intervals
  • SHA-256 content deduplication, so byte-identical attachments are not processed twice
  • Attachment extraction (PDF, PNG, JPG) routed directly into the LangGraph workflow
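
A minimal sketch of content-hash deduplication, assuming the hash is computed over the raw attachment bytes. The `content_hash` function and `Deduplicator` class are hypothetical names, and the in-memory set stands in for whatever uniqueness check the project runs against its stored hashes.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hex SHA-256 digest of the raw attachment bytes."""
    return hashlib.sha256(data).hexdigest()


class Deduplicator:
    """In-memory stand-in for a uniqueness check on a stored hash column."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_new(self, data: bytes) -> bool:
        """Return True the first time these exact bytes are seen."""
        digest = content_hash(data)
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```

A content hash only catches byte-identical files. A re-scanned or re-exported copy of the same invoice produces a different digest and would need a separate check, for example on vendor and invoice number.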

Multi-Model LLM Pipeline

  • GPT-4-turbo (temperature=0) for deterministic field extraction from raw OCR text
  • GPT-4o for intelligent contract matching and vendor normalization
  • GPT-3.5-turbo for the natural-language chat interface (cost-efficient)
  • Malformed JSON auto-correction loop with up to 3 retry attempts
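
The JSON auto-correction loop might look like the sketch below. The `correct` callback is a placeholder for the step that asks the model to repair its own output. Its signature, the function name `parse_with_correction`, and the reading of "up to 3 retry attempts" as one initial parse plus three corrections are assumptions.

```python
import json
from typing import Callable


def parse_with_correction(
    raw: str,
    correct: Callable[[str, str], str],
    max_retries: int = 3,
) -> dict:
    """Parse `raw` as JSON. On failure, call `correct(text, error)` to get a
    repaired string and try again, up to `max_retries` corrections."""
    text = raw
    for attempt in range(max_retries + 1):
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError as exc:
            if attempt == max_retries:
                raise ValueError(
                    f"still invalid after {max_retries} corrections"
                ) from exc
            text = correct(text, str(exc))     # hypothetical repair step
            continue
        if not isinstance(parsed, dict):
            raise ValueError("expected a JSON object")
        return parsed
    raise AssertionError("unreachable")
```

Passing the parser's error message to the repair step gives the model something concrete to fix, and the hard retry cap keeps a persistently malformed response from looping forever.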

LangGraph State Machine

  • 7-node directed graph with typed InvoiceState shared across all nodes
  • 2 human-in-the-loop interrupt gates using LangGraph's interrupt() primitive
  • Resumable workflow — persists state to PostgreSQL between interrupt gates
  • Celery worker executes the graph asynchronously; Redis tracks job state
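
To illustrate the shared typed state, here is a sketch in which each node returns only the keys it changes and the state is serialized between steps. The field names, the stubbed `extract_document` values, and the JSON round trip are illustrative assumptions; the project's real `InvoiceState` and its PostgreSQL persistence may look different.

```python
import json
from typing import Callable, Optional, TypedDict


class InvoiceState(TypedDict, total=False):
    # Illustrative fields only; not taken from the project's source.
    invoice_id: str
    ocr_text: str
    vendor: Optional[str]
    total: Optional[float]
    status: str


def extract_document(state: InvoiceState) -> InvoiceState:
    """Stub node: returns a partial update (values stand in for the LLM call)."""
    return {"vendor": "Example Vendor", "total": 120.0, "status": "extracted"}


def apply_node(state: InvoiceState,
               node: Callable[[InvoiceState], InvoiceState]) -> InvoiceState:
    """Merge a node's partial update into a new state dict."""
    return {**state, **node(state)}


def save(state: InvoiceState) -> str:
    """Serialize the state so a paused workflow can be resumed later."""
    return json.dumps(state, sort_keys=True)


def load(blob: str) -> InvoiceState:
    return json.loads(blob)
```

Keeping the state JSON-serializable is what makes a pause at a review gate cheap: the worker can exit, and a later approval reloads the snapshot and continues from the next node.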

Intelligent Contract Matching

  • LLM-based vendor name normalization (handles abbreviations, legal suffixes)
  • ±5% rate tolerance for line-item price variance
  • Service coverage validation against active contract scope
  • Mismatch reasons surfaced to reviewers with field-level evidence
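
The ±5% rate check reduces to a relative-difference comparison. The sketch below assumes the variance is measured against the contract rate and that the boundary is inclusive; neither detail is stated above, and `within_tolerance` is a hypothetical name.

```python
from decimal import Decimal

TOLERANCE = Decimal("0.05")  # the ±5% band from the feature list


def within_tolerance(
    invoiced: Decimal,
    contracted: Decimal,
    tolerance: Decimal = TOLERANCE,
) -> bool:
    """True when the invoiced rate deviates from the contract rate by at most
    `tolerance`, measured relative to the contract rate."""
    if contracted <= 0:
        raise ValueError("contract rate must be positive")
    return abs(invoiced - contracted) <= tolerance * contracted
```

`Decimal` avoids binary floating-point rounding at the boundary, which matters when a rate sits exactly 5% above or below the contract value.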

Natural Language Chat

  • 5 LangChain tools exposed to a tool-calling agent: approve_invoice, reject_invoice, find_invoice, list_invoices, get_invoice_details
  • Per-session conversation memory with PostgreSQL-backed ChatSession / ChatMessage tables
  • Resolve entire review queues by typing plain English commands

Full Audit Trail

  • Append-only InvoiceStatusLog records every state transition with timestamp and actor
  • InvoiceReview table captures reviewer identity, decision, and free-text notes
  • Export table records every downstream handoff for compliance evidence
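
An append-only status log can be sketched as immutable entries behind an interface that only adds and reads. The class and field names below are assumptions, and the in-memory list stands in for the `invoice_status_logs` table.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class StatusLogEntry:
    invoice_id: str
    from_status: str
    to_status: str
    actor: str
    at: datetime


class StatusLog:
    """Append-only: entries can be added and read, never edited or removed."""

    def __init__(self) -> None:
        self._entries: list[StatusLogEntry] = []

    def record(self, invoice_id: str, from_status: str,
               to_status: str, actor: str) -> StatusLogEntry:
        entry = StatusLogEntry(invoice_id, from_status, to_status, actor,
                               datetime.now(timezone.utc))
        self._entries.append(entry)
        return entry

    def history(self, invoice_id: str) -> tuple:
        """All transitions for one invoice, oldest first."""
        return tuple(e for e in self._entries if e.invoice_id == invoice_id)
```

In a real database the same guarantee usually comes from permissions or triggers that reject `UPDATE` and `DELETE` on the log table, rather than from application code alone.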

Tech Stack

| Layer | Technologies |
|---|---|
| Backend | FastAPI 0.110, LangGraph, LangChain, CrewAI, PostgreSQL 16, Redis 7, MinIO, Celery |
| Frontend | React 18, TypeScript, Vite, Tailwind CSS |
| AI / LLM | OpenAI GPT-4-turbo, GPT-4o, GPT-3.5-turbo, EasyOCR |
| Infra | Docker Compose (6 services: postgres, redis, minio, mailpit, api, worker) |

Database Schema

11 tables covering the full invoice lifecycle:

| Table | Purpose |
|---|---|
| users | Authentication and role management |
| invoice_documents | Raw file metadata, MinIO object keys, SHA-256 hashes |
| invoices | Extracted structured data (vendor, amounts, dates, line items) |
| invoice_line_items | Individual service/product rows per invoice |
| contracts | Vendor contracts with rate schedules and validity windows |
| invoice_reviews | Reviewer decisions, notes, timestamps for both review gates |
| invoice_status_logs | Append-only audit trail of every status transition |
| exports | ERP export records with format, destination, and delivery status |
| chat_sessions | Per-user conversation context for the LangChain chat agent |
| chat_messages | Message history (role + content) for each chat session |
| email_ingestion_logs | IMAP poll records, attachment counts, deduplication outcomes |

Quick Start

Prerequisites

  • Docker and Docker Compose
  • OpenAI API key

Setup

git clone https://github.com/gauravlochab/agentic-invoice-processor.git
cd agentic-invoice-processor

# Copy and configure environment variables
cp ops/.env.example ops/.env
# Edit ops/.env and set OPENAI_API_KEY and other required values

# Start all 6 services
docker compose -f ops/docker-compose.yml up --build

The API will be available at http://localhost:8000 and the React frontend at http://localhost:5173.

Run database migrations

docker compose -f ops/docker-compose.yml exec api alembic upgrade head

API Endpoints

| Method | Route | Description |
|---|---|---|
| POST | /api/v1/auth/login | Obtain JWT access token |
| GET | /api/v1/invoices/ | List invoices with filters and pagination |
| GET | /api/v1/invoices/{id} | Full invoice detail with line items and audit log |
| POST | /api/v1/invoices/{id}/approve | Approve invoice at current review gate |
| POST | /api/v1/invoices/{id}/reject | Reject invoice with reason |
| POST | /api/v1/invoices/upload | Upload a PDF/image and trigger extraction workflow |
| GET | /api/v1/contracts/ | List all vendor contracts |
| POST | /api/v1/contracts/ | Create a new vendor contract |
| POST | /api/v1/chat/message | Send a message to the LangChain invoice agent |
| GET | /api/v1/chat/sessions | List chat sessions for the current user |
| POST | /api/v1/email-sync/trigger | Manually trigger IMAP poll |
| GET | /api/v1/analytics/summary | Invoice volume, approval rates, cycle-time metrics |
| GET | /api/v1/queue/status | Celery queue depth and worker health |
| GET | /api/v1/health | Service liveness and dependency health check |
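
As a usage sketch, the helpers below build requests for two of the routes above using only the standard library. The routes and base URL come from this README. The `Bearer` authorization scheme, the `status` query parameter, the empty approve body, and the helper names are assumptions that should be checked against the running API's OpenAPI docs.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # API address from the Quick Start section


def list_invoices_request(token: str, **filters: str) -> urllib.request.Request:
    """Build (but do not send) GET /api/v1/invoices/ with a Bearer token."""
    query = urllib.parse.urlencode(filters)
    url = f"{BASE_URL}/api/v1/invoices/" + (f"?{query}" if query else "")
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )


def approve_request(token: str, invoice_id: str) -> urllib.request.Request:
    """Build POST /api/v1/invoices/{id}/approve (request body is assumed empty)."""
    safe_id = urllib.parse.quote(invoice_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/invoices/{safe_id}/approve",
        data=b"",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )
```

Either request can be sent with `urllib.request.urlopen(req)` once the stack is running and a token has been obtained from `/api/v1/auth/login`.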

Project Structure

agentic-invoice-processor/
├── backend/
│   └── app/
│       ├── api/v1/routers/       # FastAPI route handlers
│       │   ├── auth.py
│       │   ├── invoices.py
│       │   ├── contracts.py
│       │   ├── chat.py
│       │   ├── email_sync.py
│       │   └── analytics.py
│       ├── core/                 # Settings, security, middleware
│       ├── db/                   # SQLAlchemy models, schemas, migrations
│       ├── services/
│       │   ├── extraction/       # GPT-4-turbo field extraction from OCR text
│       │   ├── contracts/        # Contract matching logic
│       │   ├── ingestion/        # IMAP polling, deduplication
│       │   ├── ocr/              # EasyOCR wrapper
│       │   └── workflow/         # LangGraph state machine (7 nodes)
│       ├── crew/                 # CrewAI orchestration helpers
│       └── workers/              # Celery task definitions
├── frontend/
│   └── src/
│       ├── components/           # Shared UI components
│       ├── pages/                # Route-level page components
│       │   ├── Dashboard.tsx
│       │   ├── InvoiceDetail.tsx
│       │   ├── AccuracyReviewScreen.tsx
│       │   ├── ComplianceReviewScreen.tsx
│       │   ├── Chat.tsx
│       │   └── Inbox.tsx
│       ├── hooks/                # React Query data hooks
│       └── types/                # TypeScript interfaces
├── ops/
│   ├── docker-compose.yml        # Local dev (6 services)
│   ├── docker-compose.prod.yml   # Production configuration
│   ├── .env.example              # Required environment variables
│   ├── Dockerfile.api
│   ├── Dockerfile.worker
│   └── Dockerfile.web
└── tests/                        # Pytest integration tests

Environment Variables

Key variables required in ops/.env (see ops/.env.example for the full list):

OPENAI_API_KEY=sk-...
DATABASE_URL=postgresql://user:pass@postgres:5432/invoices
REDIS_URL=redis://redis:6379/0
MINIO_ENDPOINT=minio:9000
MINIO_ACCESS_KEY=...
MINIO_SECRET_KEY=...
SECRET_KEY=<random-256-bit-string>
ALLOWED_ORIGINS=http://localhost:5173
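
A small startup check can fail fast when one of these variables is missing. The sketch below is an assumption about how such validation could be written, not the project's settings module. `load_settings` is a hypothetical name, and treating `ALLOWED_ORIGINS` as a comma-separated list is a guess.

```python
import os
from typing import Mapping

# The key variables listed above; ops/.env.example may define more.
REQUIRED = (
    "OPENAI_API_KEY", "DATABASE_URL", "REDIS_URL", "MINIO_ENDPOINT",
    "MINIO_ACCESS_KEY", "MINIO_SECRET_KEY", "SECRET_KEY", "ALLOWED_ORIGINS",
)


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Return the required settings, or raise naming every missing variable."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    settings = {name.lower(): env[name] for name in REQUIRED}
    # Assumption: multiple origins are separated by commas.
    settings["allowed_origins"] = [
        origin.strip()
        for origin in env["ALLOWED_ORIGINS"].split(",")
        if origin.strip()
    ]
    return settings
```

Reporting all missing names at once saves a restart cycle for each variable that was forgotten in `ops/.env`.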

License

MIT License — see LICENSE for details.
