This project builds a behavioral profile table for users based on their transaction history.
It uses SQL (SQLite) and Python (Pandas + SQLAlchemy) to transform raw transactional data into analytical features that describe customer engagement, activity level, and points accumulation patterns.
The goal was to simulate a real-world ETL pipeline that extracts, transforms, and loads behavioral metrics across multiple time windows (Life, D7, D14, D28, D56).
Each record in the final table represents a user, enriched with behavioral metrics:
- Total transaction count over multiple periods (Life, D7, D14, D28, D56).
- Days since last transaction.
- Customer age in the dataset.
- Most frequently used product (Life, D7, D14, D28, D56).
- Current points balance.
- Accumulated positive points (Life, D7, D14, D28, D56).
- Accumulated negative points (Life, D7, D14, D28, D56).
- Most active day of the week (D56).
- Most active time of day (D28) — Morning, Afternoon, Night, or Early Morning.
- Engagement ratio comparing activity in D56 versus lifetime.
These features can be used for customer segmentation, churn analysis, or predictive modeling.
etl_project.sql defines all transformations using CTEs (Common Table Expressions) to calculate metrics for each customer.
etl.py automates the process:
- Iterates through multiple reference dates.
- Runs the SQL transformations for each date.
- Appends the results to the feature table
tb_feature_clients_storeinside the SQLite database. - The result is a feature store ready for analytics or machine learning pipelines.
SQLite — local database.
SQL — data transformations and aggregation.
Python — ETL orchestration and automation.
Pandas — data manipulation.
SQLAlchemy — database connection.
The original SQLite file (database.db) could not be uploaded because GitHub has a 25 MB file size limit.
You can download it from Google Drive here:
Download database.db 👉 Google Drive Link
- Clone this repository.
- Download the
database.dbfile and place it in the project directory. - Run the ETL process:
python etl.py