Data Science Student @ ITERA
I design and build scalable, reliable data systems — from real-time streaming pipelines to batch lakehouse architectures. My focus is on production engineering practices: idempotent pipelines, data quality enforcement, fault tolerance, and reproducibility.
| Domain | Technologies |
|---|---|
| Orchestration & Infrastructure | Apache Airflow · Dagster · Docker · GitHub Actions |
| Data Processing | Apache Spark · dbt Core · Python |
| Streaming | Apache Kafka |
| Storage & Warehouse | PostgreSQL · DuckDB · MinIO · SQL Server · Azure |
| Data Quality & Observability | Great Expectations · Soda Core |
| Other | FastAPI · PyTorch · DVC · SSIS |
Kafka · Spark · dbt · Airflow · FastAPI · DuckDB · Docker
Designed and implemented a production-oriented data platform simulating real-world e-commerce analytics with streaming ingestion and robust failure handling.
- Architecture: Real-time pipeline using Kafka + Medallion Architecture (Bronze → Silver → Gold)
- Reliability: Implemented Dead Letter Queue (DLQ) and idempotent pipelines (safe retries, zero duplication)
- Data Quality: Enforced validation gates at the Silver layer before downstream processing
- Orchestration: Managed workflows using Airflow DAGs with failure handling
- Serving Layer: FastAPI + DuckDB for zero-ETL analytics endpoints (revenue, funnel, traffic, top products)
- Engineering Practice: Documented all major decisions using Architecture Decision Records (ADR)
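The DLQ + idempotency pattern above can be sketched in plain Python. The actual platform consumes from Kafka and writes to a Bronze table; the in-memory lists, event shape, and function name here are illustrative assumptions:

```python
import json

def process_batch(events, seen_ids, sink, dlq):
    """Apply each event at most once; route malformed events to the DLQ.

    Re-running the same batch is safe: events whose IDs were already
    applied are skipped, so retries cause zero duplication.
    """
    for raw in events:
        try:
            event = json.loads(raw)
            event_id = event["event_id"]  # required key; missing -> DLQ
        except (json.JSONDecodeError, KeyError) as exc:
            dlq.append({"payload": raw, "error": repr(exc)})
            continue
        if event_id in seen_ids:          # idempotency guard
            continue
        sink.append(event)
        seen_ids.add(event_id)

# Simulated retry: the same batch is delivered twice, including one bad record.
batch = ['{"event_id": 1, "amount": 10}',
         '{"event_id": 2, "amount": 5}',
         'not-json']
seen, sink, dlq = set(), [], []
process_batch(batch, seen, sink, dlq)
process_batch(batch, seen, sink, dlq)  # retry is a no-op for good events
```

In production the `seen_ids` check would be a keyed upsert (e.g. a MERGE on the Bronze table) and the DLQ a dedicated Kafka topic rather than a list.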
Dagster · dbt Core · DuckDB · Soda Core · GitHub Actions
Built a modern data stack (MDS) with strong data contracts and full automation.
- Implemented SCD Type 2 for historical change tracking
- Fully automated CI/CD pipeline with testing & validation on every push
- Enforced data quality checks as deployment gates via Soda Core
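SCD Type 2 preserves history by closing the current version of a row and opening a new one whenever a tracked attribute changes. The project implements this with dbt; a minimal Python sketch of the merge logic (column names are illustrative):

```python
from datetime import date

def apply_scd2(history, incoming, today):
    """Merge incoming rows into an SCD Type 2 history table.

    Each history row: {"key", "value", "valid_from", "valid_to", "is_current"}.
    Changed rows close the current version and append a new one; unchanged
    rows are left alone, so the merge is safe to re-run.
    """
    current = {r["key"]: r for r in history if r["is_current"]}
    for key, value in incoming.items():
        row = current.get(key)
        if row is not None and row["value"] == value:
            continue                      # no change: nothing to do
        if row is not None:               # changed: close the old version
            row["valid_to"] = today
            row["is_current"] = False
        history.append({"key": key, "value": value,
                        "valid_from": today, "valid_to": None,
                        "is_current": True})
    return history

hist = apply_scd2([], {"cust_1": "Bronze"}, date(2024, 1, 1))
hist = apply_scd2(hist, {"cust_1": "Gold"}, date(2024, 6, 1))  # tier changed
```

The same shape maps directly onto a dbt snapshot with the `check` strategy, where dbt manages the validity columns for you.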
Apache Spark · Apache Airflow · MinIO · PostgreSQL · Great Expectations
- Processed 2.9M+ records using distributed Spark clusters
- Built an end-to-end pipeline with automated data validation
- Simulated cloud data lake storage using MinIO (S3-compatible)
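A validation gate rejects a batch before it can reach downstream layers. The project uses Great Expectations for this; a hand-rolled sketch of the same idea, with illustrative rule names and thresholds:

```python
def validate_batch(rows):
    """Run declarative checks over a batch; return (passed, failures).

    The pipeline promotes the batch only if every check passes --
    otherwise the run fails fast with the list of violations.
    """
    checks = [
        ("order_id not null", lambda r: r.get("order_id") is not None),
        ("amount >= 0", lambda r: isinstance(r.get("amount"), (int, float))
                                  and r["amount"] >= 0),
    ]
    failures = []
    for i, row in enumerate(rows):
        for name, check in checks:
            if not check(row):
                failures.append((i, name))
    return (not failures, failures)

ok, errs = validate_batch([{"order_id": 1, "amount": 9.5},
                           {"order_id": None, "amount": -2}])
```

Great Expectations expresses the same checks as a reusable expectation suite and adds profiling and HTML data docs on top.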
dbt Core · PostgreSQL · Docker
- Designed a star schema data warehouse for analytics-ready reporting
- Built modular transformations with dbt, including tests and lineage tracking
Apache Airflow · Docker · PostgreSQL · Python
- Developed a fault-tolerant ETL pipeline for real-time market data ingestion
- Implemented retry logic, scheduling, and containerized deployment
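Airflow handles retries declaratively (`retries` and `retry_delay` on a task); the underlying pattern, sketched as a plain Python decorator around a deliberately flaky fetch:

```python
import functools
import time

def with_retries(attempts=3, base_delay=0.01):
    """Retry a flaky callable with exponential backoff before giving up."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == attempts - 1:
                        raise              # out of retries: surface the error
                    time.sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator

calls = {"n": 0}

@with_retries(attempts=3)
def fetch_quote():
    calls["n"] += 1
    if calls["n"] < 3:                     # fail twice, then succeed
        raise ConnectionError("market feed unavailable")
    return {"symbol": "BTC", "price": 42000.0}

quote = fetch_quote()
```

Exhausting the retries re-raises the last exception, which in the Airflow version marks the task failed and triggers its `on_failure` handling.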
Azure · SSIS · SQL Server
- Role: Principal Data Engineer & Team Lead
- Delivered an institutional warehouse with dimensional modeling and cloud ETL pipelines on Azure
PyTorch · EfficientNet · Docker · DVC
- Built a fully reproducible ML pipeline with Docker + DVC + fixed seeds
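Reproducibility hinges on pinning every source of randomness. A sketch with Python's stdlib RNG; the actual pipeline would also seed the framework RNGs (e.g. `torch.manual_seed`) and pin data versions via DVC:

```python
import random

def make_split(n_samples, seed):
    """Deterministic train/val split: same seed, same shuffle, every run."""
    rng = random.Random(seed)   # isolated RNG, unaffected by global state
    indices = list(range(n_samples))
    rng.shuffle(indices)
    cut = int(0.8 * n_samples)  # 80/20 split
    return indices[:cut], indices[cut:]

train_a, val_a = make_split(100, seed=42)
train_b, val_b = make_split(100, seed=42)  # identical on every run
```

Using a local `random.Random(seed)` instead of the module-level RNG keeps the split stable even if other code reseeds the global state.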
- Achieved a ROC-AUC of 0.9801 on a blind test set with quality-aware loss engineering
Primary focus is Data Engineering & platform reliability — ML projects demonstrate systems thinking applied to the full model lifecycle.