This project implements a production-style analytics platform for a fictional Seoul-based café selling coffees and cakes.
The system simulates transactional café orders, ingests them into a relational database with proper constraints, and exposes governed analytics through Apache Superset — all running in a fully containerized environment.
🎯 Objective
Build, debug, and operate a realistic data engineering pipeline rather than a demo-only dashboard.
┌──────────────────┐
│ Python Ingestion │
│   (Batch Job)    │
└────────┬─────────┘
         ↓
┌──────────────────┐
│  PostgreSQL 15   │
│  Orders & Items  │
└────────┬─────────┘
         ↓
┌──────────────────┐
│ Apache Superset  │
│ BI & Dashboards  │
└──────────────────┘
(All services orchestrated via Docker Compose)
- Designed around real POS systems
- Supports multi-item orders
- Optimized for analytics & aggregations
- Enforces data integrity via constraints
Menu items (one row per menu item):

| Column | Type / Constraint |
|---|---|
| item_no | INT, PRIMARY KEY |
| item_name | TEXT |
| category | coffee \| cake |
| description | TEXT |
| price | NUMERIC(8,2) |

Order line items (one row per item per order):

| Column | Type / Constraint |
|---|---|
| order_no, item_no | composite PRIMARY KEY |
| quantity | INT |
| order_ts | TIMESTAMP |

Grain: one row per item per order. This enables accurate revenue, quantity, and time-series analysis.
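A minimal DDL sketch of this layout is shown below. The table names items and order_items are assumptions for illustration (the project's actual initialization scripts may differ), and the connection parameters mirror the SQLAlchemy URI used later in this README, so the snippet only runs where the postgres hostname resolves (i.e. inside the Compose network):

```python
import psycopg2

# Illustrative DDL matching the columns described above.
DDL = """
CREATE TABLE IF NOT EXISTS items (
    item_no     INT PRIMARY KEY,
    item_name   TEXT NOT NULL,
    category    TEXT NOT NULL CHECK (category IN ('coffee', 'cake')),
    description TEXT,
    price       NUMERIC(8, 2) NOT NULL
);

CREATE TABLE IF NOT EXISTS order_items (
    order_no  INT NOT NULL,
    item_no   INT NOT NULL REFERENCES items (item_no),  -- FK to the menu
    quantity  INT NOT NULL CHECK (quantity > 0),
    order_ts  TIMESTAMP NOT NULL,
    PRIMARY KEY (order_no, item_no)                     -- line-item grain
);
"""

with psycopg2.connect(
    host="postgres", dbname="cafe", user="cafe_user", password="cafe_pass"
) as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)
```

The composite primary key on (order_no, item_no) is what defines the line-item grain; aggregating quantity and price over it yields revenue and order-size metrics directly.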
The ingestion service:
- Inserts café menu items (idempotent)
- Generates realistic order volumes
- Creates multi-item orders
- Spreads orders across operating hours
- Enforces foreign-key relationships
📁 Location: ingestion/ingest.py
The ingestion container runs as a batch job and exits after completion.
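A condensed sketch of such a batch job follows, assuming the items / order_items tables from the schema section; the menu rows, order count, and the 08:00-22:00 operating window are illustrative placeholders rather than the values used in ingestion/ingest.py:

```python
import random
from datetime import datetime, timedelta

import psycopg2

# Illustrative menu rows: (item_no, item_name, category, description, price)
MENU = [
    (1, "Americano", "coffee", "Double shot over hot water", 4.50),
    (2, "Cafe Latte", "coffee", "Espresso with steamed milk", 5.00),
    (3, "Cheesecake", "cake", "Classic baked slice", 6.50),
]

def ingest(conn, n_orders: int = 200) -> None:
    with conn.cursor() as cur:
        # Idempotent menu load: re-running the job never duplicates items.
        cur.executemany(
            """INSERT INTO items (item_no, item_name, category, description, price)
               VALUES (%s, %s, %s, %s, %s)
               ON CONFLICT (item_no) DO NOTHING""",
            MENU,
        )
        opening = datetime.now().replace(hour=8, minute=0, second=0, microsecond=0)
        for order_no in range(1, n_orders + 1):
            # Spread orders across a 14-hour operating window.
            order_ts = opening + timedelta(minutes=random.randint(0, 14 * 60))
            # Multi-item orders: 1-3 distinct menu items, each with its own quantity.
            for item_no, *_ in random.sample(MENU, k=random.randint(1, 3)):
                cur.execute(
                    """INSERT INTO order_items (order_no, item_no, quantity, order_ts)
                       VALUES (%s, %s, %s, %s)""",
                    (order_no, item_no, random.randint(1, 3), order_ts),
                )

if __name__ == "__main__":
    # Connection details follow the URI documented below; "postgres" resolves
    # only on the Compose network, e.g. when run from the ingestion container.
    with psycopg2.connect(
        host="postgres", dbname="cafe", user="cafe_user", password="cafe_pass"
    ) as conn:
        ingest(conn)
```

Because order_items.item_no references items, the foreign-key relationship listed above is enforced by the database itself rather than by the generator script.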
Superset is used as the analytics layer, enabling:
- Revenue analysis by time
- Coffee vs cake sales comparison
- Average order size
- Item-level performance
- Peak café hours
Superset connects to PostgreSQL using the SQLAlchemy URI `postgresql+psycopg2://cafe_user:cafe_pass@postgres:5432/cafe`. Note that the host is the Compose service name postgres, not localhost.
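As an example of the analyses above, revenue by hour reduces to a single aggregation over the line-item grain. The sketch below assumes the items / order_items names used earlier and must run where the postgres hostname resolves (inside a container on the Compose network); the same SQL can be pasted into Superset's SQL Lab:

```python
from sqlalchemy import create_engine, text

engine = create_engine(
    "postgresql+psycopg2://cafe_user:cafe_pass@postgres:5432/cafe"
)

# Hourly revenue: line-item quantity times the menu price.
REVENUE_BY_HOUR = text("""
    SELECT date_trunc('hour', oi.order_ts) AS sales_hour,
           SUM(oi.quantity * i.price)      AS revenue
    FROM order_items AS oi
    JOIN items       AS i USING (item_no)
    GROUP BY sales_hour
    ORDER BY sales_hour
""")

with engine.connect() as conn:
    for sales_hour, revenue in conn.execute(REVENUE_BY_HOUR):
        print(sales_hour, revenue)
```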
This project intentionally implements governed analytics, not open access.
| Role | Capabilities |
|---|---|
| Admin | Full access (DB, SQL Lab, users, dashboards) |
| Analyst | Query datasets, build charts & dashboards |
| Viewer | Read-only dashboard access |
- Mirrors enterprise BI environments
- Prevents unrestricted SQL access
- Enables controlled data exposure
- Validates multi-user workflows
The platform was accessed and validated from:
- Multiple laptops
- Browser-only sessions
- Different user roles
This confirmed:
- Stateless UI behavior
- Dockerized service reliability
- Role-specific access enforcement
- Production-like usability
| Service | Purpose |
|---|---|
| postgres | Relational data store |
| ingestion | Batch data generator |
| superset | BI & analytics UI |
All services communicate via a private Docker network.
Superset requires database drivers to be installed inside its runtime environment.
A custom Dockerfile is used to safely enable PostgreSQL support.
📁 superset/Dockerfile
FROM apache/superset:latest
USER root
RUN apt-get update && apt-get install -y \
    build-essential \
    libpq-dev \
    gcc \
    && rm -rf /var/lib/apt/lists/*

RUN pip install --no-cache-dir \
    psycopg2-binary \
    sqlalchemy \
    sqlalchemy-utils \
    redis \
    pymysql \
    clickhouse-connect

USER superset

This resolves common PostgresEngineSpec errors in Superset.
git clone <repo-url>
cd seoul-cafe-analytics

docker compose down -v
docker compose build --no-cache
docker compose up

- URL: http://localhost:8088
- Username: admin
- Password: admin
To verify the PostgreSQL driver inside the Superset container:

docker exec -it cafe_superset superset shell

import psycopg2
psycopg2.__version__

Key issues worked through along the way:

- Superset driver loading issues
- Docker build vs runtime Python environments
- Foreign-key constraint handling
- Duplicate primary key errors
- Secure, role-based analytics access
Harsh Indoria
