# Khairunnisa Maharani

Data Engineer | Building Production-Ready Data Platforms

Data Science Student @ ITERA

I design and build scalable, reliable data systems — from real-time streaming pipelines to batch lakehouse architectures. My focus is on production engineering practices: idempotent pipelines, data quality enforcement, fault tolerance, and reproducibility.


## 🛠️ Core Engineering Stack

| Domain | Technologies |
| --- | --- |
| Orchestration & Infrastructure | Docker, Airflow, Dagster |
| Data Processing | Python, Spark, dbt, DuckDB |
| Streaming | Kafka |
| Storage & Warehouse | PostgreSQL, MinIO, Azure |
| Data Quality & Observability | Great Expectations, Soda |
| Other | GitHub Actions, Bash |

## 🚀 Featured Data Engineering Projects

### glowcart

Kafka · Spark · dbt · Airflow · FastAPI · DuckDB · Docker

Designed and implemented a production-oriented data platform simulating real-world e-commerce analytics with streaming ingestion and robust failure handling.

  • Architecture: Real-time pipeline using Kafka + Medallion Architecture (Bronze → Silver → Gold)
  • Reliability: Implemented Dead Letter Queue (DLQ) and idempotent pipelines (safe retries, zero duplication)
  • Data Quality: Enforced validation gates at the Silver layer before downstream processing
  • Orchestration: Managed workflows using Airflow DAGs with failure handling
  • Serving Layer: FastAPI + DuckDB for zero-ETL analytics endpoints (revenue, funnel, traffic, top products)
  • Engineering Practice: Documented all major decisions using Architecture Decision Records (ADR)
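The DLQ and idempotency guarantees above can be sketched with an in-memory model (an illustrative sketch with hypothetical field names, not the actual glowcart code): each event is applied at most once, and malformed events are quarantined instead of crashing the pipeline.

```python
# Illustrative sketch: idempotent event processing with a Dead Letter Queue.
# Seen event IDs are skipped, so retries never create duplicates; events that
# fail validation are routed to the DLQ for later inspection or replay.

def process_events(events, store, seen_ids, dlq):
    """Apply each event at most once; quarantine malformed ones."""
    for event in events:
        event_id = event.get("event_id")
        if event_id is None or "amount" not in event:
            dlq.append(event)          # quarantine instead of failing the batch
            continue
        if event_id in seen_ids:       # idempotency guard: retries are safe
            continue
        seen_ids.add(event_id)
        store[event_id] = event["amount"]

store, seen, dlq = {}, set(), []
batch = [
    {"event_id": "e1", "amount": 10},
    {"event_id": "e2"},                # malformed -> DLQ
    {"event_id": "e1", "amount": 10},  # duplicate retry -> skipped
]
process_events(batch, store, seen, dlq)
```

Because processing is keyed on `event_id`, replaying the same batch after a failure leaves the store unchanged, which is what "safe retries, zero duplication" means in practice.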

### modern-data-platform-dagster

Dagster · dbt Core · DuckDB · Soda Core · GitHub Actions

Built a modern data stack (MDS) with strong data contracts and fully automated CI/CD.

  • Implemented SCD Type 2 for historical change tracking
  • Fully automated CI/CD pipeline with testing & validation on every push
  • Enforced data quality checks as deployment gates via Soda Core
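SCD Type 2, as used above, preserves history by closing the current row and appending a new version whenever a tracked attribute changes. A minimal list-of-dicts sketch of that upsert logic (hypothetical field names, not the project's dbt models):

```python
# Illustrative SCD Type 2 upsert over a list-of-dicts "table".
# Each row carries valid_from / valid_to / is_current; an attribute change
# closes the old version and appends a new current one.

def scd2_upsert(table, key, new_row, as_of):
    current = next(
        (r for r in table if r[key] == new_row[key] and r["is_current"]), None
    )
    if current is not None:
        if all(current[k] == v for k, v in new_row.items()):
            return  # no change: keep the existing current row
        current["is_current"] = False   # close the old version
        current["valid_to"] = as_of
    table.append({**new_row, "valid_from": as_of, "valid_to": None,
                  "is_current": True})

history = []
scd2_upsert(history, "customer_id",
            {"customer_id": 1, "city": "Bandar Lampung"}, "2024-01-01")
scd2_upsert(history, "customer_id",
            {"customer_id": 1, "city": "Jakarta"}, "2024-06-01")
```

In dbt, the same behavior is typically achieved declaratively with snapshots rather than hand-written merge logic.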

### nyc-taxi-pipeline-spark-airflow

Apache Spark · Apache Airflow · MinIO · PostgreSQL · Great Expectations

  • Processed 2.9M+ records using distributed Spark clusters
  • Built an end-to-end pipeline with automated data validation
  • Simulated cloud data lake storage using MinIO (S3-compatible)
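Automated validation of this kind reduces to running declarative checks over the data and halting the pipeline when any fail. A stdlib-only sketch of that gate (illustrative check names, not the real Great Expectations API):

```python
# Illustrative data-quality gate: each expectation is a (name, predicate)
# pair applied to every record; any failure blocks downstream processing.

def run_quality_gate(records, expectations):
    failures = []
    for name, predicate in expectations:
        bad = [r for r in records if not predicate(r)]
        if bad:
            failures.append((name, len(bad)))
    return failures  # empty list => gate passes

trips = [
    {"fare": 12.5, "distance": 3.1},
    {"fare": -4.0, "distance": 0.8},   # negative fare should trip the gate
]
checks = [
    ("fare_non_negative", lambda r: r["fare"] >= 0),
    ("distance_positive", lambda r: r["distance"] > 0),
]
failures = run_quality_gate(trips, checks)
```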

dbt Core · PostgreSQL · Docker

  • Designed a star schema data warehouse for analytics-ready reporting
  • Built modular transformations with dbt, including tests and lineage tracking
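In a star schema, fact rows reference dimension rows through surrogate keys rather than natural business keys. The lookup-or-insert step at load time can be sketched as follows (illustrative names only, not the project's dbt models):

```python
# Illustrative surrogate-key assignment when loading a dimension:
# natural keys map to stable integer surrogates that fact rows reference.

def get_surrogate_key(dim, natural_key):
    """Return the surrogate key for natural_key, inserting it if new."""
    if natural_key not in dim:
        dim[natural_key] = len(dim) + 1  # next surrogate key
    return dim[natural_key]

dim_product = {}
fact_sales = []
for sale in [{"sku": "A-1", "qty": 2}, {"sku": "B-7", "qty": 1},
             {"sku": "A-1", "qty": 5}]:
    fact_sales.append({
        "product_key": get_surrogate_key(dim_product, sale["sku"]),
        "qty": sale["qty"],
    })
```

Decoupling facts from natural keys is what lets dimensions change (including SCD-style history) without rewriting the fact table.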

Apache Airflow · Docker · PostgreSQL · Python

  • Developed a fault-tolerant ETL pipeline for real-time market data ingestion
  • Implemented retry logic, scheduling, and containerized deployment
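Retry logic for ingestion like this typically uses bounded exponential backoff: re-attempt a transient failure a fixed number of times, doubling the wait between attempts. A generic stdlib sketch (not the project's Airflow configuration, which would use task-level `retries`/`retry_delay` instead):

```python
import time

def fetch_with_retry(fetch, max_attempts=3, base_delay=0.01):
    """Call fetch(); on failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # retries exhausted: surface the error
            time.sleep(base_delay * (2 ** attempt))

# A hypothetical flaky source that succeeds on the third call.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return {"price": 101.5}

result = fetch_with_retry(flaky)
```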

### LPMPP-Data-Warehouse-Project

Azure · SSIS · SQL Server

  • Role: Principal Data Engineer & Team Lead
  • Delivered an institutional warehouse with dimensional modeling and cloud ETL pipelines on Azure

## 🧠 Additional Experience

PyTorch · EfficientNet · Docker · DVC

  • Built a fully reproducible ML pipeline with Docker + DVC + fixed seeds
  • Achieved ROC-AUC 0.9801 on blind test set with quality-aware loss engineering
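Reproducibility here mostly comes down to pinning every source of randomness and every input. A stdlib illustration of the fixed-seed principle (the actual pipeline would also pin framework seeds such as `torch.manual_seed` and data versions via DVC):

```python
import random

def make_train_split(items, seed=42, train_frac=0.8):
    """Deterministic shuffle + split: same seed => identical split every run."""
    rng = random.Random(seed)   # local RNG avoids global-state leakage
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

data = list(range(10))
split_a = make_train_split(data)
split_b = make_train_split(data)
```

Because the RNG is seeded locally, two runs (or two machines) produce byte-identical splits, which is a precondition for comparing metrics like the ROC-AUC above across experiments.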

My primary focus is data engineering and platform reliability; the ML projects demonstrate systems thinking applied to the full model lifecycle.


## 📊 GitHub Analytics
