This repository showcases an end-to-end data engineering and machine learning pipeline for classifying disaster-related messages in real time. It is designed to ingest raw text streams (e.g., from social media and SMS) and instantly categorize distress signals so they can be routed to the appropriate emergency response agencies. The project demonstrates robust ETL pipeline construction, Natural Language Processing (NLP) over high-dimensional text features, and deployment of a `RandomForestClassifier` served through a scalable Flask web application.
For an extensive dive into the mathematical decisioning and architectural trade-offs behind this project, see TECHNICAL.md.
```mermaid
flowchart LR
    A[Raw Data CSVs] -->|ETL Script| B(Pandas Cleaning)
    B --> C[(SQLite Database)]
    C -->|ML Pipeline| D{TF-IDF Tokenizer}
    D --> E[Random Forest Classifier]
    E --> F((Model.pkl))
    F -->|Flask App| G[Web Deployment]
    G --> H[End User Visualizations]
```
- Deterministic ETL: Extracts textual data, imputes missing values, engineers deterministic categorization features, and normalizes output into SQLite.
- Robust ML Pipeline: Implements `scikit-learn` via a customized pipeline containing an NLTK-based tokenizer, TF-IDF vectorizer, and a computationally optimized `RandomForestClassifier` wrapped within a `MultiOutputClassifier`.
- Full-Stack Presentation: Dynamic inference via a Python/Flask web front-end containing interactive Plotly charts tracking dataset distribution metrics.
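The pipeline composition described above can be sketched as follows. This is a hedged, self-contained illustration, not the project's exact code: the project uses an NLTK-based tokenizer (e.g., with lemmatization), for which a simple regex tokenizer stands in here, and the estimator settings are assumptions.

```python
# Sketch of a TF-IDF -> multi-output Random Forest pipeline.
import re

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline


def tokenize(text):
    """Lowercase and split into word tokens (stand-in for the NLTK tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_pipeline():
    """TF-IDF features feeding one RandomForest per output category."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(tokenizer=tokenize, token_pattern=None)),
        ("clf", MultiOutputClassifier(RandomForestClassifier(n_estimators=50))),
    ])
```

Wrapping the forest in `MultiOutputClassifier` fits one classifier per category column, which is what allows a single `.predict()` call to emit every disaster label at once.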
Ensure you have an active Python virtual environment (e.g., venv or conda), and run the following commands to initialize the pipeline dependencies.
```bash
git clone https://github.com/stephengardnerd/DataEngineering_MLPipeline.git
cd DataEngineering_MLPipeline
pip install -r requirements.txt
```

This ingestion script merges the raw disaster datasets, applies robust preprocessing, and streams the cleaned output into a relational SQLite database.
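Conceptually, the merge-and-clean core of that ingestion step resembles the sketch below. The column names (`id`, `categories`) and the semicolon-delimited `related-1;request-0;...` category format are assumptions based on the standard layout of this dataset, and the table name `messages` is hypothetical.

```python
# Hedged sketch of the ETL step: merge the two CSVs, expand the category
# string into binary columns, de-duplicate, and write to SQLite.
import sqlite3

import pandas as pd


def clean(messages: pd.DataFrame, categories: pd.DataFrame) -> pd.DataFrame:
    """Merge messages with categories and expand categories into 0/1 columns."""
    df = messages.merge(categories, on="id")
    # "related-1;request-0;..." -> one column per category name
    cats = df["categories"].str.split(";", expand=True)
    cats.columns = [value.split("-")[0] for value in cats.iloc[0]]
    for col in cats.columns:
        cats[col] = cats[col].str[-1].astype(int)  # keep the trailing 0/1 digit
    return pd.concat([df.drop(columns="categories"), cats], axis=1).drop_duplicates()


def save(df: pd.DataFrame, db_path: str) -> None:
    """Persist the cleaned frame into a SQLite table for the ML stage."""
    with sqlite3.connect(db_path) as conn:
        df.to_sql("messages", conn, index=False, if_exists="replace")
```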
```bash
python disaster_response_pipeline_project/data/process_data.py \
    disaster_response_pipeline_project/data/disaster_messages.csv \
    disaster_response_pipeline_project/data/disaster_categories.csv \
    disaster_response_pipeline_project/data/DisasterResponse.db
```

This command loads the normalized data, trains the Random Forest algorithm (leveraging a MultiOutput wrapper), and persists the output as a `.pkl` for dynamic inference.
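The load-and-persist flow inside `train_classifier.py` might look roughly like this sketch. The table name `messages` and the column layout (an `id` column, a `message` column, and the remaining columns as category targets) are assumptions, not verified against the repository.

```python
# Hedged sketch of loading training data from SQLite and pickling the model.
import pickle
import sqlite3

import pandas as pd


def load_data(db_path: str):
    """Read the cleaned table and split it into messages (X) and labels (Y)."""
    with sqlite3.connect(db_path) as conn:
        df = pd.read_sql("SELECT * FROM messages", conn)
    X = df["message"]
    Y = df.drop(columns=["id", "message"])  # remaining columns are the categories
    return X, Y


def save_model(model, pkl_path: str) -> None:
    """Serialize the fitted pipeline so the Flask app can load it for inference."""
    with open(pkl_path, "wb") as f:
        pickle.dump(model, f)
```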
```bash
python disaster_response_pipeline_project/models/train_classifier.py \
    disaster_response_pipeline_project/data/DisasterResponse.db \
    disaster_response_pipeline_project/models/classifier.pkl
```

Initiate the routing application to interface with the predictive model in real time.
```bash
cd disaster_response_pipeline_project/app
python run.py ../data/DisasterResponse.db ../models/classifier.pkl
```

Navigate to http://0.0.0.0:3001/ to view the running implementation.
Author: Stephen D. Gardner