This project implements a binary classifier using logistic regression on the Breast Cancer Wisconsin (Diagnostic) dataset.
The goal is to train this classifier and evaluate its performance with several metrics and visualizations.
Tools and libraries:
- Python
- Scikit-learn
- Pandas
- Matplotlib
- Seaborn
The dataset used is the Breast Cancer Wisconsin Diagnostic dataset, which contains features computed from digitized images of fine needle aspirates (FNA) of breast masses.
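The project's own data file and column names are not shown here. As a minimal sketch, assuming scikit-learn's bundled copy of the same dataset (`sklearn.datasets.load_breast_cancer`) instead of the project's file, loading, a quick exploration, and the label conversion could look like this. The bundled copy encodes the target as 0 = malignant and 1 = benign, so the sketch flips it to make malignant the positive class; which class the project treats as positive is an assumption.

```python
from sklearn.datasets import load_breast_cancer

# Assumption: scikit-learn's bundled copy replaces the project's own data file.
data = load_breast_cancer(as_frame=True)
df = data.frame  # 30 feature columns plus a "target" column

# Quick exploration: shape, total missing values, and class balance.
print(df.shape)  # (569, 31)
print(df.isna().sum().sum())
print(df["target"].value_counts())

# The bundled target uses 0 = malignant, 1 = benign.
# Assumption: flip it so that malignant is the positive class (1).
y = (df["target"] == 0).astype(int)
X = df.drop(columns=["target"])

# 569 samples in total: 212 malignant, 357 benign.
print(X.shape, int(y.sum()))
```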
Workflow:
- Data Loading & Exploration
- Preprocessing
(i) Dropping unnecessary columns
(ii) Converting categorical labels to binary
- Train/Test Split and Feature Standardization
- Model Training with Logistic Regression
- Model Evaluation
(i) Confusion Matrix
(ii) Precision, Recall, F1-score
(iii) ROC-AUC score
(iv) ROC Curve Visualization
- Threshold Tuning
(i) Precision and recall at different thresholds
- Sigmoid Function Explanation
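The split, standardization, training, and evaluation steps listed above can be sketched end to end as follows. This is a minimal sketch, not the project's code: it assumes scikit-learn's bundled dataset, a stratified 80/20 split with `random_state=42`, and `max_iter=1000`, all chosen here for illustration. The last lines illustrate the sigmoid step: logistic regression computes a score `z = w·x + b` and maps it to a probability with `σ(z) = 1 / (1 + e^(-z))`, so for a binary model `predict_proba` should match the sigmoid of `decision_function`.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    classification_report,
    confusion_matrix,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: bundled dataset, relabeled so that malignant = 1.
X, y_raw = load_breast_cancer(return_X_y=True)
y = (y_raw == 0).astype(int)

# A stratified split keeps the class ratio similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training set only, then apply it to both sets.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

model = LogisticRegression(max_iter=1000)
model.fit(X_train_s, y_train)

y_pred = model.predict(X_test_s)               # labels at the default 0.5 cut-off
y_proba = model.predict_proba(X_test_s)[:, 1]  # estimated P(malignant)

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=["benign", "malignant"]))
print("ROC-AUC:", roc_auc_score(y_test, y_proba))

# Points of the ROC curve: false positive rate vs. true positive rate.
fpr, tpr, thresholds = roc_curve(y_test, y_proba)


def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# decision_function returns z = w·x + b for each test sample.
z = model.decision_function(X_test_s)
print("sigmoid(z) matches predict_proba:", np.allclose(sigmoid(z), y_proba))
```

To visualize the results, `fpr` and `tpr` can be passed to Matplotlib's `plt.plot`, and the confusion matrix can be drawn with Seaborn's `heatmap`.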
Results:
- Achieved high accuracy and ROC-AUC on the test set using logistic regression.
- Tuned the decision threshold to observe the trade-off between precision and recall.
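A minimal sketch of threshold tuning, assuming scikit-learn's bundled copy of the dataset (malignant relabeled as 1), a stratified 80/20 split, and three hypothetical thresholds (0.3, 0.5, 0.7) chosen only for illustration. Instead of calling `predict`, the sketch compares the predicted probability of the positive class against each threshold and reports precision and recall.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: bundled dataset, malignant relabeled as the positive class (1).
X, y_raw = load_breast_cancer(return_X_y=True)
y = (y_raw == 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# The pipeline fits the scaler on the training data only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_proba = model.predict_proba(X_test)[:, 1]

# Hypothetical thresholds; 0.5 corresponds to the default used by predict().
results = {}
for threshold in (0.3, 0.5, 0.7):
    y_pred = (y_proba >= threshold).astype(int)
    p = precision_score(y_test, y_pred, zero_division=0)
    r = recall_score(y_test, y_pred)
    results[threshold] = (p, r)
    print(f"threshold={threshold:.1f}  precision={p:.3f}  recall={r:.3f}")
```

Raising the threshold flags fewer samples as malignant, so recall can only stay the same or fall, while precision typically rises; lowering it does the opposite. In a screening setting, a lower threshold is often preferred to avoid missed malignant cases.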