Unsupervised Learning Projects

This repository contains three data science projects demonstrating unsupervised learning techniques, including clustering, dimensionality reduction, and recommendation systems. These projects were completed as part of advanced machine learning coursework and showcase practical applications of data science in business contexts.

Projects Overview

Project	Technique	Business Application	Key Outcome
Wholesale Customer Segmentation	K-Means Clustering, Hierarchical Clustering	Market segmentation for B2B wholesale distributor	Identified distinct customer segments based on purchasing patterns
Restaurant Recommendation System	Collaborative Filtering, Matrix Factorization	Personalized restaurant recommendations	Built recommendation engine for improved customer experience
Employee Retention Analysis	K-Means Clustering, PCA	HR analytics and talent retention	Segmented employees and provided actionable retention strategies

Technologies Used

Python 3.x
Core Libraries:
- pandas - Data manipulation and analysis
- NumPy - Numerical computing
- scikit-learn - Machine learning algorithms
- Matplotlib & Seaborn - Data visualization
- SciPy - Statistical analysis and hierarchical clustering
Machine Learning Techniques:
- K-Means Clustering
- Hierarchical Clustering (Agglomerative)
- Principal Component Analysis (PCA)
- Collaborative Filtering
- Standardization & Normalization
- Silhouette Analysis
- Elbow Method

Project Details

1. Wholesale Customer Segmentation

File: section05_clustering_project.ipynb

Objective: Segment wholesale customers based on their annual spending across six product categories to enable targeted marketing strategies.

Dataset: Wholesale customer data containing annual spending on Fresh products, Milk, Grocery, Frozen goods, Detergents & Paper, and Delicatessen items for 440 customers.

Methodology:

Data preprocessing and standardization using StandardScaler
Exploratory data analysis to understand spending patterns
K-Means clustering with optimal cluster determination via elbow method
Hierarchical clustering with dendrogram visualization
Cluster validation using silhouette scores
Cluster profiling and interpretation
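The methodology above can be sketched with scikit-learn and SciPy. This is a minimal illustration, not the notebook's code. The spend table is synthetic: random log-normal values stand in for the 440 customers, and the column names are assumed. k is chosen here simply by the highest silhouette score over k = 2..8. The same scaled matrix then feeds a Ward-linkage hierarchy, and the adjusted Rand index measures how closely the two partitions agree.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the 440-customer file: synthetic, right-skewed annual spend
rng = np.random.default_rng(0)
categories = ["Fresh", "Milk", "Grocery", "Frozen", "Detergents_Paper", "Delicatessen"]
spend = pd.DataFrame(rng.lognormal(mean=8.0, sigma=1.0, size=(440, 6)), columns=categories)

# Standardize so high-volume categories do not dominate the Euclidean distance
X = StandardScaler().fit_transform(spend)

# Elbow method (inertia, for plotting) and silhouette score for a range of k
inertia, silhouette = {}, {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = km.fit_predict(X)
    inertia[k] = km.inertia_
    silhouette[k] = silhouette_score(X, labels)

# Assumption: pick k by the best silhouette; in practice also inspect the elbow plot
best_k = max(silhouette, key=silhouette.get)
kmeans_labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)

# Agglomerative (Ward) clustering on the same scaled data; Z is the linkage matrix
Z = linkage(X, method="ward")
hier_labels = fcluster(Z, t=best_k, criterion="maxclust") - 1  # fcluster labels start at 1

# Agreement between the two methods (1.0 = identical partitions up to label names)
agreement = adjusted_rand_score(kmeans_labels, hier_labels)

# Cluster profile: mean annual spend per category for each K-Means segment
profile = spend.assign(cluster=kmeans_labels).groupby("cluster").mean().round(0)
print(profile)
```

On the real data you would plot `inertia` against k to look for the elbow, and pass `Z` to `scipy.cluster.hierarchy.dendrogram` to draw the tree.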

Key Findings:

Successfully identified distinct customer segments based on purchasing behavior
Different customer types show clear preferences for specific product categories
Segments can be targeted with customized marketing and inventory strategies

Business Impact: Enables the wholesale distributor to develop segment-specific marketing campaigns, optimize inventory management, and improve customer relationship management.

2. Restaurant Recommendation System

File: section09_recommender_project.ipynb

Objective: Build a collaborative filtering recommendation system to suggest restaurants to users based on historical rating patterns.

Dataset: Restaurant ratings dataset containing 1,161 ratings from multiple consumers across various restaurants, with ratings on a 0-2 scale.

Methodology:

Created user-item rating matrix with consumers as rows and restaurants as columns
Implemented collaborative filtering approach
Handled sparse matrix with mean imputation strategy
Generated personalized restaurant recommendations based on user similarity
Evaluated recommendation quality through rating predictions
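The following is a minimal user-based collaborative filtering sketch of these steps. The column names `userID`, `placeID` and `rating` and the ten toy ratings are hypothetical. `predict` and `recommend` are illustrative helpers, not functions from the notebook. The sketch works as follows:

- Each user's ratings are mean-centered.
- Missing entries are filled with 0, which is equivalent to imputing that user's own mean.
- Users are compared with cosine similarity.
- A missing rating is predicted as the user's mean plus a similarity-weighted average of the nearest neighbors' centered ratings, clipped to the 0-2 scale.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy ratings on the 0-2 scale
ratings = pd.DataFrame({
    "userID":  ["U1", "U1", "U1", "U2", "U2", "U3", "U3", "U3", "U4", "U4"],
    "placeID": [101, 102, 103, 101, 103, 101, 102, 104, 102, 104],
    "rating":  [2, 1, 0, 2, 1, 1, 2, 2, 0, 1],
})

# User-item matrix: consumers as rows, restaurants as columns, NaN = not rated
matrix = ratings.pivot_table(index="userID", columns="placeID", values="rating")

# Mean-center each user's ratings to remove individual rating tendencies
user_means = matrix.mean(axis=1)
centered = matrix.sub(user_means, axis=0)

# After centering, filling NaN with 0 is the same as imputing the user's own mean
filled = centered.fillna(0)
sim = pd.DataFrame(cosine_similarity(filled), index=matrix.index, columns=matrix.index)

def predict(user, place, k=2):
    """Predict a rating from the k most similar users who rated `place`."""
    others = matrix[place].dropna().index.drop(user, errors="ignore")
    if len(others) == 0:
        return float(user_means[user])
    neighbors = sim.loc[user, others].sort_values(ascending=False).head(k)
    denom = neighbors.abs().sum()
    if denom == 0:
        return float(user_means[user])
    offset = (neighbors * centered.loc[neighbors.index, place]).sum() / denom
    return float(np.clip(user_means[user] + offset, 0, 2))

def recommend(user, n=3):
    """Rank the restaurants `user` has not rated by predicted rating."""
    unrated = matrix.columns[matrix.loc[user].isna()]
    preds = {place: predict(user, place) for place in unrated}
    return sorted(preds.items(), key=lambda kv: kv[1], reverse=True)[:n]

print(recommend("U1"))
```

Mean-centering before computing similarity means a habitually generous rater and a habitually strict one can still be recognized as having similar tastes.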

Key Findings:

Successfully built a recommendation engine that identifies similar users
System can predict ratings for unvisited restaurants
Mean-centering improves recommendation accuracy by accounting for user rating tendencies

Business Impact: Enhances customer experience through personalized restaurant suggestions, potentially increasing customer engagement and satisfaction.

3. Employee Retention Analysis

File: section11_final_project.ipynb

Objective: Analyze employee data to identify distinct workforce segments and develop targeted retention strategies to reduce attrition.

Dataset: Employee dataset including demographic information (Age, Gender), job characteristics (JobLevel, Department, MonthlyIncome, PerformanceRating, JobSatisfaction), and attrition status.

Methodology:

Data Preparation & EDA
- Data cleaning and type validation
- Exploratory analysis of employee characteristics
- Encoding of categorical variables
Initial Clustering (Round 1)
- K-Means clustering on raw features
- Optimal cluster determination using elbow method and silhouette analysis
PCA for Visualization (Round 1)
- Dimensionality reduction to 2D for cluster visualization
- Assessment of explained variance
Refined Clustering (Round 2)
- Feature standardization
- Re-application of K-Means clustering
- Improved cluster separation
Enhanced Visualization (Round 2)
- PCA-based 2D visualization of refined clusters
- Cluster interpretation and profiling
Cluster Analysis & Recommendations
- Deep dive into each cluster's characteristics
- Correlation of cluster membership with attrition rates
- Development of segment-specific retention strategies
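The two clustering rounds can be sketched as below, under stated assumptions:

- The employee table is randomly generated using the column names listed above.
- k = 4 is a placeholder, not the value chosen in the notebook.
- Attrition is excluded from the features so it can be used afterwards to interpret the clusters.

The silhouette scores from the raw and standardized rounds are computed in different feature spaces, so they are not directly comparable. Standardization matters because MonthlyIncome, whose values run into the thousands, would otherwise dominate the Euclidean distance.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic employee table using the column names from the dataset description
rng = np.random.default_rng(1)
n = 500
employees = pd.DataFrame({
    "Age": rng.integers(20, 60, n),
    "Gender": rng.choice(["Female", "Male"], n),
    "JobLevel": rng.integers(1, 6, n),
    "Department": rng.choice(["Sales", "R&D", "HR"], n),
    "MonthlyIncome": np.clip(rng.normal(6500, 2500, n), 1000, None),
    "PerformanceRating": rng.integers(3, 5, n),
    "JobSatisfaction": rng.integers(1, 5, n),
    "Attrition": rng.choice(["Yes", "No"], n, p=[0.16, 0.84]),
})

# Encode categorical variables; hold out Attrition for interpreting clusters later
features = pd.get_dummies(
    employees.drop(columns="Attrition"),
    columns=["Gender", "Department"], drop_first=True, dtype=int,
)

def fit_kmeans(X, k=4):
    """Fit K-Means (k=4 is a placeholder) and return labels plus silhouette score."""
    km = KMeans(n_clusters=k, n_init=10, random_state=1)
    labels = km.fit_predict(X)
    return labels, silhouette_score(X, labels)

# Round 1: raw features, where MonthlyIncome's scale dominates the distance
raw_labels, raw_sil = fit_kmeans(features.to_numpy(dtype=float))

# Round 2: standardized features so each variable contributes comparably
X_scaled = StandardScaler().fit_transform(features)
scaled_labels, scaled_sil = fit_kmeans(X_scaled)

# PCA to 2D for plotting; the summed ratio shows how much variance the 2D view keeps
pca = PCA(n_components=2)
coords = pca.fit_transform(X_scaled)
explained = pca.explained_variance_ratio_.sum()

# Attrition rate per cluster, highest-risk segment first
employees["Cluster"] = scaled_labels
attrition_by_cluster = (
    employees.assign(Left=employees["Attrition"].eq("Yes"))
    .groupby("Cluster")["Left"].mean()
    .sort_values(ascending=False)
)
print(f"2D PCA keeps {explained:.1%} of the variance")
print(attrition_by_cluster)
```

The `coords` array can be scattered with Matplotlib or Seaborn, colored by `Cluster`, to produce the 2D visualization described in Round 2.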

Key Findings:

Identified distinct employee segments with varying attrition risks
Uncovered relationships between job satisfaction, income, performance, and retention
Different employee segments require tailored retention approaches

Business Impact: Provides HR with actionable insights to reduce turnover through targeted interventions, potentially saving significant recruitment and training costs.

Key Skills Demonstrated

Technical Skills

Machine Learning: K-Means, Hierarchical Clustering, PCA, Collaborative Filtering
Data Analysis: EDA, statistical analysis, pattern recognition
Data Preprocessing: Standardization, normalization, encoding, missing value handling
Python Programming: Efficient use of pandas, NumPy, scikit-learn
Visualization: Creating insightful plots with Matplotlib and Seaborn

Analytical Skills

Feature selection and engineering
Cluster validation and interpretation
Model evaluation and optimization
Hyperparameter tuning (determining optimal k)
Business insight generation from technical analysis

Domain Knowledge

Customer segmentation strategies
Recommendation system design
HR analytics and workforce management
Understanding of business metrics and KPIs

Author

Charles - Data Scientist.

License

MIT License - See LICENSE file for details.

Acknowledgments

These projects were completed as part of the Maven Data Science in Python: Unsupervised Learning course.

Repository: dataville/unsupervised_learning_class