SQL-driven financial crime analysis on a simulated fintech transactional database. Ten progressive queries — from foundational aggregations to advanced window functions — answering real compliance and analytics questions.
Built as a portfolio project to demonstrate SQL proficiency for Data Analyst and Financial Crime Analyst roles in fintech.
This project takes a relational database with 4 tables (users, merchants, transactions, alerts) and answers 10 business questions that a compliance or analytics team at a neobank would face daily. Each query teaches a SQL concept while solving a real problem.
Database schema:
users (3,000 rows) merchants (500 rows)
├── user_id (PK) ├── merchant_id (PK)
├── signup_date ├── merchant_name
├── country ├── category
├── age ├── country
├── risk_tier └── is_high_risk
└── is_active
transactions (200,000 rows) alerts (4,000 rows)
├── transaction_id (PK) ├── alert_id (PK)
├── user_id (FK → users) ├── transaction_id (FK → transactions)
├── merchant_id (FK → merchants)├── alert_date
├── timestamp ├── status
├── amount_usd ├── analyst_id
├── country └── resolution_days
├── channel
└── is_fraud
| # | Business question | SQL concepts |
|---|---|---|
| Q1 | Top 5 countries by transaction volume | GROUP BY, COUNT, SUM, ORDER BY, LIMIT |
| Q2 | Fraud rate by merchant category | JOIN, AVG on binary column |
| Q3 | High-volume users with suspicious activity | GROUP BY, HAVING |
| # | Business question | SQL concepts |
|---|---|---|
| Q4 | Classify transactions into risk bands | CASE WHEN, conditional categorization |
| Q5 | Countries with above-average fraud rates | Subquery in HAVING |
| Q6 | Full alert investigation view (3-table join) | Multiple JOINs, table aliases |
| Q7 | Analyst productivity and detection accuracy | CASE WHEN inside SUM, NULLIF |
| # | Business question | SQL concepts |
|---|---|---|
| Q8 | Top users by spending with global % share | CTE (WITH), ROW_NUMBER, CROSS JOIN |
| Q9 | Time between consecutive user transactions | LAG, PARTITION BY, velocity detection |
| Q10 | Transaction amount vs user's rolling average | Window frames (ROWS BETWEEN), anomaly scoring |
Velocity detection (Q9): uses LAG() OVER (PARTITION BY user_id ORDER BY timestamp) to calculate time gaps between consecutive transactions per user. Transactions with gaps of seconds or minutes indicate card testing or account takeover — the same logic behind real-time fraud monitoring systems.
Anomaly scoring (Q10): computes a rolling average of each user's last 10 transactions using ROWS BETWEEN 10 PRECEDING AND 1 PRECEDING (excluding the current row to prevent data leakage), then flags transactions where the amount exceeds 5x the user's baseline. This is the SQL equivalent of the amount_vs_user_avg feature in the FinCrime Transaction Monitor ML project.
Analyst KPIs (Q7): calculates precision (confirmed fraud / total closed alerts) per analyst using conditional aggregation (CASE WHEN inside SUM), demonstrating how to build operational dashboards from raw alert data.
This project complements the FinCrime Transaction Monitor (XGBoost + Streamlit). The ML project engineers features in Python; this project shows the same logic can be implemented in pure SQL:
| ML feature (Python) | SQL equivalent (this project) |
|---|---|
tx_count_1h (velocity) |
Q9: LAG + time delta |
amount_vs_user_avg (anomaly) |
Q10: rolling AVG with ROWS BETWEEN |
is_high_risk_mcc (risk flag) |
Q4: CASE WHEN risk bands |
In production, feature engineering lives in SQL (executed in BigQuery, Snowflake, or a feature store), and the ML model consumes the results. This project demonstrates fluency in both approaches.
- Python 3.10+
- ~50 MB free disk space
git clone https://github.com/Rivalry11/fincrime-sql-analytics.git
cd fincrime-sql-analytics
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt# Generate database (if not present)
python data/generate_db.py
# Open notebook
jupyter notebook notebooks/fincrime_sql_analytics.ipynbThe database file data/fincrime.db is included in the repo, so you can open the notebook directly without generating data.
- DuckDB — in-process analytical database (no server setup required)
- SQL — all analysis in pure SQL, no Python data manipulation
- Python — only for database generation and notebook execution
- Jupyter — interactive notebook with documented queries
Fundamentals: SELECT, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT, JOIN, aggregate functions (COUNT, SUM, AVG, ROUND)
Intermediate: CASE WHEN, subqueries, multiple JOINs (3+ tables), conditional aggregation (CASE inside SUM), NULLIF, IN operator
Advanced: Common Table Expressions (WITH/CTE), window functions (ROW_NUMBER, LAG), PARTITION BY, window frames (ROWS BETWEEN), CROSS JOIN, EXTRACT
Camila Rubio Cuellar Data Analyst transitioning to Financial Crime Analytics LinkedIn · GitHub
Built April 2026 as part of an active job search portfolio.