A beginner-friendly machine learning project that trains a Random Forest classifier on the classic Iris dataset to predict flower species with high accuracy — covering the full ML pipeline from data loading to model evaluation.
- Python 3 — Core programming language
- pandas — Used to load and structure the dataset into a readable DataFrame
- scikit-learn — Provides the Iris dataset, train/test splitting, Random Forest model, and evaluation metrics
- matplotlib — Imported for potential data visualization support
- RandomForestClassifier — An ensemble learning model that combines multiple decision trees for accurate predictions
- accuracy_score & classification_report — Metrics used to measure how well the model performs
- Load the dataset — The built-in Iris dataset is loaded from scikit-learn and converted into a pandas DataFrame with proper column names
- Explore the data — The first 5 rows of features (sepal length, sepal width, petal length, petal width) are printed for inspection
- Split the data — The dataset is divided into 80% training and 20% testing using `train_test_split` with a fixed `random_state=42` for reproducibility
- Initialize the model — A `RandomForestClassifier` is created with `random_state=42` to ensure consistent results across runs
- Train the model — The model is fitted on the training data using `model.fit(X_train, y_train)`
- Make predictions — The trained model predicts flower species for the unseen test data
- Evaluate performance — `accuracy_score` calculates overall accuracy and `classification_report` breaks down precision, recall, and F1-score per class
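
The steps above can be sketched end to end as follows. This is a minimal reconstruction from the description, not necessarily the project's exact script: `test_size=0.2` is inferred from the stated 80/20 split, and variable names other than `model`, `X_train`, and `y_train` (for example `X`, `y`, `y_pred`, `accuracy`) are assumptions chosen for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Load the built-in Iris dataset and wrap the features in a DataFrame
# whose columns carry the measurement names.
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target

# Explore: print the first 5 rows of the four measurements.
print(X.head())

# Split 80% / 20%. test_size=0.2 is an assumption inferred from the text;
# random_state=42 makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Initialize and train the Random Forest (other hyperparameters left at defaults).
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict species for the held-out test rows.
y_pred = model.predict(X_test)

# Evaluate: overall accuracy plus per-class precision, recall, and F1-score.
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

Iris has 150 rows, so this split leaves 120 rows for training and 30 for testing. Per-class scores are therefore computed from only about ten test flowers each.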
- How to build a complete machine learning pipeline in Python — from loading and splitting data to training and evaluating a classifier using scikit-learn
- How Random Forest works as an ensemble model, and how metrics like accuracy, precision, recall, and F1-score are used to measure real model performance
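
To make the last point concrete, the sketch below opens up a fitted forest. It assumes scikit-learn's implementation, where the forest's class probabilities are the average of its individual trees' probabilities, and it recomputes precision, recall, and F1-score by hand from the confusion matrix. Names such as `forest`, `tree_probs`, and `manual_pred` are hypothetical and chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

# Plain NumPy arrays are used here so the individual trees can be queried directly.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
forest = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Ensemble step: average the class probabilities of every tree,
# then pick the class with the highest averaged probability.
tree_probs = np.mean(
    [tree.predict_proba(X_test) for tree in forest.estimators_], axis=0
)
manual_pred = forest.classes_[np.argmax(tree_probs, axis=1)]
forest_pred = forest.predict(X_test)
print("Trees in the forest:", len(forest.estimators_))
print("Agreement with forest.predict:", np.mean(manual_pred == forest_pred))

# Metrics by hand: rows of the confusion matrix are true classes,
# columns are predicted classes.
cm = confusion_matrix(y_test, forest_pred)
true_positives = np.diag(cm)
precision = true_positives / cm.sum(axis=0)  # correct / everything predicted as the class
recall = true_positives / cm.sum(axis=1)     # correct / everything that truly is the class
f1 = 2 * precision * recall / (precision + recall)
accuracy = true_positives.sum() / cm.sum()

# Cross-check against scikit-learn's own per-class computation.
sk_precision, sk_recall, sk_f1, _ = precision_recall_fscore_support(y_test, forest_pred)
print("Precision:", precision)
print("Recall:", recall)
print("F1-score:", f1)
print("Accuracy:", accuracy)
```

Accuracy summarizes all classes in a single number, while precision and recall show which species are confused with which. That is why the per-class report is worth reading even when overall accuracy looks good.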