Skip to content

Rivalry11/fincrime-sql-analytics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🔍 FinCrime SQL Analytics

SQL-driven financial crime analysis on a simulated fintech transactional database. Ten progressive queries — from foundational aggregations to advanced window functions — answering real compliance and analytics questions.

Built as a portfolio project to demonstrate SQL proficiency for Data Analyst and Financial Crime Analyst roles in fintech.


📋 Overview

This project takes a relational database with 4 tables (users, merchants, transactions, alerts) and answers 10 business questions that a compliance or analytics team at a neobank would face daily. Each query teaches a SQL concept while solving a real problem.

Database schema:

users (3,000 rows)          merchants (500 rows)
├── user_id (PK)            ├── merchant_id (PK)
├── signup_date             ├── merchant_name
├── country                 ├── category
├── age                     ├── country
├── risk_tier               └── is_high_risk
└── is_active

transactions (200,000 rows)     alerts (4,000 rows)
├── transaction_id (PK)         ├── alert_id (PK)
├── user_id (FK → users)        ├── transaction_id (FK → transactions)
├── merchant_id (FK → merchants)├── alert_date
├── timestamp                   ├── status
├── amount_usd                  ├── analyst_id
├── country                     └── resolution_days
├── channel
└── is_fraud

🎯 Queries and business questions

Level 1 — Aggregations & GROUP BY

# Business question SQL concepts
Q1 Top 5 countries by transaction volume GROUP BY, COUNT, SUM, ORDER BY, LIMIT
Q2 Fraud rate by merchant category JOIN, AVG on binary column
Q3 High-volume users with suspicious activity GROUP BY, HAVING

Level 2 — CASE WHEN, Subqueries, Multiple JOINs

# Business question SQL concepts
Q4 Classify transactions into risk bands CASE WHEN, conditional categorization
Q5 Countries with above-average fraud rates Subquery in HAVING
Q6 Full alert investigation view (3-table join) Multiple JOINs, table aliases
Q7 Analyst productivity and detection accuracy CASE WHEN inside SUM, NULLIF

Level 3 — CTEs & Window Functions

# Business question SQL concepts
Q8 Top users by spending with global % share CTE (WITH), ROW_NUMBER, CROSS JOIN
Q9 Time between consecutive user transactions LAG, PARTITION BY, velocity detection
Q10 Transaction amount vs user's rolling average Window frames (ROWS BETWEEN), anomaly scoring

🔬 Key technical highlights

Velocity detection (Q9): uses LAG() OVER (PARTITION BY user_id ORDER BY timestamp) to calculate time gaps between consecutive transactions per user. Transactions with gaps of seconds or minutes indicate card testing or account takeover — the same logic behind real-time fraud monitoring systems.

Anomaly scoring (Q10): computes a rolling average of each user's last 10 transactions using ROWS BETWEEN 10 PRECEDING AND 1 PRECEDING (excluding the current row to prevent data leakage), then flags transactions where the amount exceeds 5x the user's baseline. This is the SQL equivalent of the amount_vs_user_avg feature in the FinCrime Transaction Monitor ML project.

Analyst KPIs (Q7): calculates precision (confirmed fraud / total closed alerts) per analyst using conditional aggregation (CASE WHEN inside SUM), demonstrating how to build operational dashboards from raw alert data.


🔗 Connection to FinCrime Transaction Monitor

This project complements the FinCrime Transaction Monitor (XGBoost + Streamlit). The ML project engineers features in Python; this project shows the same logic can be implemented in pure SQL:

ML feature (Python) SQL equivalent (this project)
tx_count_1h (velocity) Q9: LAG + time delta
amount_vs_user_avg (anomaly) Q10: rolling AVG with ROWS BETWEEN
is_high_risk_mcc (risk flag) Q4: CASE WHEN risk bands

In production, feature engineering lives in SQL (executed in BigQuery, Snowflake, or a feature store), and the ML model consumes the results. This project demonstrates fluency in both approaches.


🚀 How to run

Prerequisites

  • Python 3.10+
  • ~50 MB free disk space

Setup

git clone https://github.com/Rivalry11/fincrime-sql-analytics.git
cd fincrime-sql-analytics

python -m venv venv
source venv/bin/activate          # Windows: venv\Scripts\activate
pip install -r requirements.txt

Run

# Generate database (if not present)
python data/generate_db.py

# Open notebook
jupyter notebook notebooks/fincrime_sql_analytics.ipynb

The database file data/fincrime.db is included in the repo, so you can open the notebook directly without generating data.


🛠️ Tech stack

  • DuckDB — in-process analytical database (no server setup required)
  • SQL — all analysis in pure SQL, no Python data manipulation
  • Python — only for database generation and notebook execution
  • Jupyter — interactive notebook with documented queries

📚 SQL concepts covered

Fundamentals: SELECT, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT, JOIN, aggregate functions (COUNT, SUM, AVG, ROUND)

Intermediate: CASE WHEN, subqueries, multiple JOINs (3+ tables), conditional aggregation (CASE inside SUM), NULLIF, IN operator

Advanced: Common Table Expressions (WITH/CTE), window functions (ROW_NUMBER, LAG), PARTITION BY, window frames (ROWS BETWEEN), CROSS JOIN, EXTRACT


👤 Author

Camila Rubio Cuellar Data Analyst transitioning to Financial Crime Analytics LinkedIn · GitHub


Built April 2026 as part of an active job search portfolio.

About

SQL-driven financial crime analysis with 10 progressive queries, from GROUP BY to window functions, on a simulated fintech database.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors