A comprehensive full-stack platform for legal document processing, natural language querying, and intelligent metadata extraction. Built with FastAPI + React TypeScript, demonstrating enterprise-grade architecture and production-ready code.
- Document Upload: Drag-and-drop interface for multiple PDF/DOCX legal documents
- Intelligent Processing: Automatic extraction of metadata (agreement types, jurisdictions, industries, geography)
- Natural Language Querying: Ask questions in plain English across your document collection
- Interactive Dashboard: Visual insights with charts and analytics
- Secure Authentication: JWT-based security with rate limiting
- Scalable Architecture: Built for production with proper error handling and logging
- Framework: FastAPI with Python 3.9+
- Database: SQLAlchemy ORM with SQLite (configurable for PostgreSQL/MySQL)
- Authentication: JWT tokens with refresh mechanism
- Document Processing: Intelligent pattern recognition for metadata extraction
- API Design: RESTful endpoints with proper validation and error handling
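The "intelligent pattern recognition" step above can be sketched with regular expressions. This is a simplified illustration only — the pattern tables and field names below are assumptions, not the project's actual extraction rules:

```python
import re

# Hypothetical pattern tables -- the real extractor's rules may differ.
AGREEMENT_TYPES = {
    "NDA": r"\bnon-disclosure agreement\b|\bNDA\b",
    "MSA": r"\bmaster services? agreement\b|\bMSA\b",
    "Employment": r"\bemployment agreement\b",
}
GOVERNING_LAW = r"governed by (?:the )?laws? of (?:the )?([A-Za-z][A-Za-z ]*?)(?=[.,;]|$)"

def extract_metadata(text: str) -> dict:
    """Return best-effort metadata for a single document's text."""
    meta = {"agreement_type": None, "governing_law": None}
    for label, pattern in AGREEMENT_TYPES.items():
        if re.search(pattern, text, re.IGNORECASE):
            meta["agreement_type"] = label
            break
    law = re.search(GOVERNING_LAW, text, re.IGNORECASE)
    if law:
        meta["governing_law"] = law.group(1).strip()
    return meta
```

For example, `extract_metadata("This Non-Disclosure Agreement is governed by the laws of the UAE.")` yields `{"agreement_type": "NDA", "governing_law": "UAE"}`.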
- Framework: React 18 with TypeScript
- Styling: Tailwind CSS with responsive design
- Charts: Recharts for data visualization
- State Management: React Context API with hooks
- File Handling: React Dropzone for document uploads
- Python 3.9+
- Node.js 18+
- Poetry (Python dependency management)
- npm or yarn
Install Poetry (if not already installed):

```bash
curl -sSL https://install.python-poetry.org | python3 -
```
Install Dependencies:

```bash
poetry install
```
Environment Configuration:

```bash
cp env.example .env
# Edit .env with your configuration
```
Run Backend:

```bash
poetry run python run.py
```
Install Dependencies:

```bash
cd frontend
npm install
```
Run Development Server:

```bash
npm run dev
```
Build for Production:

```bash
npm run build
```
```env
# Database Configuration
DATABASE_URL=sqlite:///./legal_intel.db

# File Upload Settings
UPLOAD_DIR=uploads
MAX_FILE_SIZE=10485760

# Security
SECRET_KEY=your-super-secret-key-change-this-in-production
ALGORITHM=HS256
ACCESS_TOKEN_EXPIRE_MINUTES=30
REFRESH_TOKEN_EXPIRE_DAYS=7

# Redis Configuration
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=0
REDIS_PASSWORD=

# Rate Limiting
RATE_LIMIT_PER_MINUTE=60
RATE_LIMIT_PER_HOUR=1000
MAX_CONCURRENT_UPLOADS=5

# AI/LLM Configuration
OPENAI_API_KEY=your-openai-api-key
OPENAI_MODEL=gpt-4
```

- `POST /auth/register` - User registration
- `POST /auth/login` - User login
- `POST /auth/refresh` - Refresh access token
- `POST /auth/logout` - User logout
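Under the hood, an HS256 JWT like the ones configured above is just two base64url-encoded JSON segments plus an HMAC-SHA256 signature. A stdlib-only sketch of the mechanism — a real deployment would rely on a maintained library such as python-jose or PyJWT, not this illustration:

```python
import base64, hashlib, hmac, json, time

def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, per the JWT spec."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_token(sub: str, secret: str, expires_in: int = 1800) -> str:
    """Build a minimal HS256 JWT: header.payload.signature."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({"sub": sub, "exp": int(time.time()) + expires_in}).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = _b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_token(token: str, secret: str) -> bool:
    """Check the signature and expiry; returns False on any mismatch."""
    try:
        header, payload, sig = token.split(".")
    except ValueError:
        return False
    signing_input = f"{header}.{payload}".encode()
    expected = _b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return False
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    return claims["exp"] > time.time()
```

The refresh flow works the same way with a longer `expires_in` (here the 7-day `REFRESH_TOKEN_EXPIRE_DAYS` window).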
- `POST /documents/upload` - Upload multiple documents
- `GET /documents/` - List all documents
- `GET /documents/{id}` - Get document details
- `DELETE /documents/{id}` - Delete document
- `GET /documents/stats/summary` - Document statistics
- `POST /query` - Natural language query across documents
- `GET /query/history` - Query history
- `GET /dashboard` - Dashboard analytics and insights
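The dashboard's aggregate counts amount to a group-by over the extracted metadata. A sketch of that idea — the actual endpoint presumably queries through SQLAlchemy, and the field names here are illustrative:

```python
from collections import Counter

def dashboard_summary(documents: list[dict]) -> dict:
    """Aggregate per-field counts for the dashboard charts."""
    return {
        "total_documents": len(documents),
        "by_agreement_type": dict(Counter(d.get("agreement_type") or "Unknown" for d in documents)),
        "by_jurisdiction": dict(Counter(d.get("jurisdiction") or "Unknown" for d in documents)),
    }
```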
Example 1: Find agreements governed by UAE law
Question: "Which agreements are governed by UAE law?"
Response:

```json
{
  "results": [
    {
      "document": "nda_abudhabi.pdf",
      "governing_law": "UAE"
    },
    {
      "document": "supplier_contract_dubai.docx",
      "governing_law": "UAE"
    }
  ],
  "total_results": 2
}
```

Example 2: Find technology industry contracts
Question: "Show me all contracts in the technology industry"
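A toy sketch of how queries like these could be resolved with simple keyword matching against the extracted metadata — the real pipeline sends the question to an LLM per the `OPENAI_*` settings, and the field names here are assumptions:

```python
def run_query(question: str, documents: list[dict]) -> dict:
    """Return documents whose metadata values appear in the question text."""
    q = question.lower()
    results = []
    for doc in documents:
        matched = {k: v for k, v in doc.items()
                   if k != "document" and v and str(v).lower() in q}
        if matched:
            results.append({"document": doc["document"], **matched})
    return {"results": results, "total_results": len(results)}
```

Running `run_query("Which agreements are governed by UAE law?", docs)` over metadata records like those in Example 1 reproduces that response shape; the technology-industry question matches any record whose `industry` value is "Technology".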
- Navigate to the Upload page
- Drag and drop PDF/DOCX files or click to browse
- Files are automatically processed and metadata extracted
- View processing status and results
Run Tests:

```bash
# Backend
poetry run pytest
poetry run pytest tests/ -v

# Frontend
cd frontend
npm test
```

Code Quality:

```bash
# Backend
poetry run black .
poetry run isort .
poetry run flake8
poetry run mypy .
poetry run pylint app/

# Frontend
cd frontend
npm run lint
```

- Database: Use PostgreSQL or MySQL for production
- Redis: Configure Redis for session management and caching
- File Storage: Use cloud storage (AWS S3, Azure Blob) for documents
- Environment: Set production environment variables
- Process Manager: Use systemd, supervisor, or Docker
- Build: `npm run build`
- Serve: Use nginx, Apache, or cloud hosting
- CDN: Configure CDN for static assets
```bash
# Build and run with Docker Compose
docker-compose up -d
```

- JWT-based authentication with refresh tokens
- Rate limiting on API endpoints
- File type validation and size limits
- SQL injection protection via SQLAlchemy
- CORS configuration for frontend integration
- Secure password hashing with bcrypt
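The file type and size checks can be as simple as the sketch below; the real validation likely also inspects content types, and the limit mirrors the `MAX_FILE_SIZE=10485760` setting above:

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".pdf", ".docx"}
MAX_FILE_SIZE = 10 * 1024 * 1024  # 10 MiB, matching MAX_FILE_SIZE=10485760

def validate_upload(filename: str, size: int) -> tuple[bool, str]:
    """Return (ok, reason) for a candidate upload."""
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        return False, "unsupported file type"
    if size > MAX_FILE_SIZE:
        return False, "file too large"
    return True, "ok"
```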
- SQLite database (suitable for development/small scale)
- In-memory processing for document analysis
- Basic caching with Redis
- Database: PostgreSQL with connection pooling
- Document Processing: Async processing with Celery
- Caching: Redis cluster for distributed caching
- Load Balancing: Multiple backend instances
- CDN: CloudFront/Akamai for static assets
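The caching layer follows the cache-aside pattern. A sketch written against any client exposing `get`/`setex` (redis-py uses these method names), with a dict-backed stub standing in for a live Redis:

```python
import json

class MemoryStub:
    """Minimal stand-in for a Redis client (get/setex only)."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def setex(self, key, ttl, value):
        self.store[key] = value  # TTL is ignored in this stub

def cached(client, key: str, compute, ttl: int = 300):
    """Return the cached value for key, computing and storing it on a miss."""
    hit = client.get(key)
    if hit is not None:
        return json.loads(hit)
    value = compute()
    client.setex(key, ttl, json.dumps(value))
    return value
```

With a real `redis.Redis(...)` client substituted for the stub, the same function distributes cached dashboard stats across backend instances.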
For production use with large documents:
- Implement streaming file processing
- Use background tasks for document analysis
- Implement document chunking for very large files
- Add progress tracking for long-running operations
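Streaming keeps memory usage flat regardless of file size. A minimal sketch of chunked reading (the 1 MiB chunk size is an illustrative choice):

```python
def iter_chunks(path: str, chunk_size: int = 1 << 20):
    """Yield a file's bytes in fixed-size chunks instead of loading it whole."""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return
            yield chunk
```

Each chunk can then be handed to a background worker (e.g. a Celery task) for analysis, with per-chunk progress reported back to the client.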
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
For support and questions:
- Create an issue in the GitHub repository
- Check the documentation
- Review the API endpoints
- Advanced AI: Integration with real LLM APIs (OpenAI, Anthropic)
- Document Indexing: Elasticsearch for advanced search capabilities
- Real-time Updates: WebSocket support for live document processing
- Advanced Analytics: Machine learning insights and trend analysis
- Multi-language Support: Internationalization for global use
- API Versioning: Proper API versioning for production use
- Monitoring: Prometheus metrics and Grafana dashboards
- CI/CD: Automated testing and deployment pipelines