This project is a hands-on exploration of key components in modern Convolutional Neural Networks. Using the STL-10 dataset, it follows a four-part journey: we begin with a simple baseline CNN, diagnose its weaknesses, implement foundational layers (BatchNorm, Dropout) from scratch to fix them, and finally, contrast this approach with the power of modern Transfer Learning (ResNet-18).
All code, analysis, and experiments are contained in the main.ipynb notebook.
This repository is structured as a progressive exploration, with each part building on the last.
We first implement a small, two-layer CNN (ToyCNN) to establish a performance baseline. The goal is to observe its raw performance and, more importantly, analyze its training curves to diagnose its primary weakness.
- Key Result: The baseline model quickly overfits the training data, motivating the need for more advanced stabilization and regularization techniques.
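The curve-based diagnosis described above can be sketched as a small helper. This is only an illustration: the function names and the 0.15 gap threshold are assumptions made for this example, not code taken from `main.ipynb`. It treats a large final train/validation accuracy gap, combined with a validation loss that bottomed out before the last epoch, as a sign of overfitting.

```python
from typing import Sequence


def generalization_gap(train_acc: Sequence[float], val_acc: Sequence[float]) -> list[float]:
    """Per-epoch difference between training and validation accuracy."""
    if len(train_acc) != len(val_acc):
        raise ValueError("curves must cover the same number of epochs")
    return [t - v for t, v in zip(train_acc, val_acc)]


def best_val_epoch(val_loss: Sequence[float]) -> int:
    """Index of the epoch with the lowest validation loss (first one on ties)."""
    return min(range(len(val_loss)), key=lambda i: val_loss[i])


def looks_overfit(
    train_acc: Sequence[float],
    val_acc: Sequence[float],
    val_loss: Sequence[float],
    gap_threshold: float = 0.15,  # hypothetical cut-off, tune per task
) -> bool:
    """Heuristic: a wide final accuracy gap AND validation loss that
    reached its minimum before the final epoch."""
    gaps = generalization_gap(train_acc, val_acc)
    peaked_early = best_val_epoch(val_loss) < len(val_loss) - 1
    return gaps[-1] > gap_threshold and peaked_early
```

A check like this does not replace plotting the curves; it only turns the visual pattern (training accuracy still rising while validation loss turns upward) into something that can be asserted on.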
Before improving our model, we build the tools. This section is a "deep dive" into two critical layers, implementing them from first principles to understand why they work.
- Batch Normalization: Implemented from scratch (including forward and backward passes) to build a deep intuition for how it achieves faster, more stable training, an effect its original paper attributes to reducing internal covariate shift.
- Dropout: Implemented the inverted dropout technique from scratch to fundamentally understand how it acts as a powerful regularizer and helps a model achieve better generalization by preventing neural co-adaptation.
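As a rough illustration of what these two layers involve, here is a minimal NumPy sketch of training-mode batch normalization (forward and backward) and inverted dropout. It is not the repository's implementation: the function names, the `(N, D)` input layout, and the omission of running statistics for inference are all assumptions made to keep the example short.

```python
import numpy as np


def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode batch norm for x of shape (N, D), normalizing over N."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)                  # biased (population) variance
    inv_std = 1.0 / np.sqrt(var + eps)
    x_hat = (x - mu) * inv_std           # zero mean, ~unit variance per feature
    out = gamma * x_hat + beta           # learnable scale and shift
    return out, (x_hat, gamma, inv_std)


def batchnorm_backward(dout, cache):
    """Gradients w.r.t. x, gamma, beta given the upstream gradient dout."""
    x_hat, gamma, inv_std = cache
    n = dout.shape[0]
    dbeta = dout.sum(axis=0)
    dgamma = (dout * x_hat).sum(axis=0)
    dx_hat = dout * gamma
    # Collapsed form of the chain rule through the batch mean and variance.
    dx = (inv_std / n) * (
        n * dx_hat - dx_hat.sum(axis=0) - x_hat * (dx_hat * x_hat).sum(axis=0)
    )
    return dx, dgamma, dbeta


def dropout_forward(x, p_drop, rng, train=True):
    """Inverted dropout: zero units with probability p_drop and rescale the
    survivors by 1 / (1 - p_drop), so no rescaling is needed at test time."""
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if not train or p_drop == 0.0:
        return x, None                   # identity in evaluation mode
    keep = 1.0 - p_drop
    mask = (rng.random(x.shape) < keep) / keep
    return x * mask, mask


def dropout_backward(dout, mask):
    """Gradient flows only through the units that were kept."""
    return dout if mask is None else dout * mask
```

A common way to validate a backward pass like `batchnorm_backward` is to compare it against a centered finite-difference estimate of the gradient on a small random input.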
Armed with a deep, first-principles understanding of how these layers work (from Part II), we now apply these concepts iteratively. We will build our final ToyCNNModified in stages, adding one feature at a time to observe its specific impact. This methodical process is central to machine learning engineering.
- Step 3.1: Tackling Overfitting with Dropout: First, we apply the library's `nn.Dropout` to our baseline model to control the severe overfitting diagnosed in Part I.
- Step 3.2: Increasing Capacity by Going Deeper: With regularization in place, we increase the model's capacity by adding another convolutional block, allowing it to learn more complex features.
- Step 3.3: Stabilizing the Deeper Network with Batch Norm: Finally, we add the library's `nn.BatchNorm` to stabilize the training of our new, deeper network, solving the internal covariate shift problem we studied in Part II.
- Key Result: The final `ToyCNNModified` combines all three improvements, resulting in a model that successfully overcomes overfitting, has sufficient capacity, and trains stably, achieving a significant boost in validation accuracy.
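To make the three steps concrete, the following PyTorch sketch shows one way such a model could be assembled. It is a hypothetical stand-in, not the actual `ToyCNNModified` from the notebook: the class name, channel widths, global average pooling, and dropout rate are assumptions. It uses `nn.BatchNorm2d`, the PyTorch batch-norm variant for convolutional feature maps, and assumes STL-10 inputs of shape 3×96×96 with 10 classes.

```python
import torch
from torch import nn


def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Conv -> BatchNorm -> ReLU -> 2x2 max-pool (halves height and width)."""
    return nn.Sequential(
        # bias is redundant because BatchNorm adds its own learnable shift
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),          # Step 3.3: stabilize the deeper net
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )


class ToyCNNModifiedSketch(nn.Module):
    """Illustrative three-block CNN; layer sizes are assumptions."""

    def __init__(self, num_classes: int = 10, p_drop: float = 0.5):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(3, 16),
            conv_block(16, 32),
            conv_block(32, 64),          # Step 3.2: the additional block
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),     # (N, 64, H, W) -> (N, 64, 1, 1)
            nn.Flatten(),
            nn.Dropout(p_drop),          # Step 3.1: regularize the head
            nn.Linear(64, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))
```

Remember to call `model.train()` during training and `model.eval()` during validation, since both `nn.Dropout` and `nn.BatchNorm2d` behave differently in the two modes.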
Finally, we explore the most common and powerful technique in applied computer vision. Instead of training from scratch, we adapt a ResNet-18 model pre-trained on ImageNet, leveraging its powerful learned features.
- Key Result: By fine-tuning only the final classification layer, this model achieves strong performance (>90% validation accuracy) with minimal training, demonstrating the clear advantage of transfer learning for most practical tasks.
This table, drawn from main.ipynb, summarizes the journey:
| Stage | What was Implemented | Main Benefit | Typical Outcome (Val Acc) |
|---|---|---|---|
| I. ToyCNN (Baseline) | A basic CNN + training loop | Establish baseline & diagnose overfitting | > 50% |
| II. Custom Layers | BatchNorm & Dropout from scratch | Understand internals by building them | (Implementations validated by tests) |
| III. ToyCNNModified | Iterative Improvement: 1. Add library Dropout 2. Go Deeper 3. Add library BatchNorm | Systematic improvement & generalization | > 60% (Final Model) |
| IV. ResNet-18 | Head swap + fine-tuning | Leverage pre-trained models for SOTA results | > 90% |
- Clone the project

      git clone https://github.com/<your-user>/CNN-Custom-Layers.git
      cd CNN-Custom-Layers
- Create an environment & install dependencies
  - Conda (recommended):

        bash conda/install.sh
        conda activate cnn_custom_layers

    The helper script provisions everything defined in `conda/environment.yml` and finishes with `pip install -e .` so the local `cs6740` package is importable from anywhere.
  - Plain `pip` (if you already have Python ≥3.9 available):

        python -m venv .venv
        source .venv/bin/activate
        pip install torch torchvision numpy matplotlib pandas altair jupyter pytest
        pip install -e .

    Feel free to mirror the conda dependencies if you need additional tooling such as `bandit` or `mypy`.
- (Optional but recommended) Run the unit tests

      pytest

  These tests validate the from-scratch `BatchNorm`, `Dropout`, CNN architectures, and the ResNet fine-tuning wrapper before you start experimenting.
- Launch the notebook

      jupyter lab main.ipynb  # or `jupyter notebook main.ipynb`

  Execute the notebook top to bottom. The first setup cell mirrors the steps above when running in Colab; on a local machine you can skip the duplicate installations if your environment is already prepared.
- Data & checkpoints
  - When you first instantiate `cs6740.image_loader.ImageLoader`, PyTorch will download STL-10 into `./data/` automatically (or whatever path you set in the setup cell).
  - Model checkpoints are written to `./model_checkpoints/` by default; create the folder beforehand if you want to persist intermediate models.
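If you prefer to create the checkpoint folder from Python (for example in the notebook's setup cell) rather than by hand, a small helper along these lines would do. The function name is hypothetical; only the default path comes from the description above.

```python
from pathlib import Path


def ensure_checkpoint_dir(path: str = "./model_checkpoints") -> Path:
    """Create the checkpoint folder (and any parents) if it is missing.

    Safe to call repeatedly: an existing folder is left untouched.
    """
    ckpt_dir = Path(path)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    return ckpt_dir
```

Calling `ensure_checkpoint_dir()` once before training is enough; the returned `Path` can then be joined with a file name when saving a model.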
This project was originally based on a course assignment. The foundational code (the Solver module, data loaders, and general project structure) was provided.
My core contributions and the focus of this repository are:
- The complete from-scratch implementation (forward and backward pass) of the Batch Normalization and Dropout layers.
- The architecture design and implementation of `ToyCNN` and `ToyCNNModified`.
- The setup and fine-tuning logic for the `MyResNet18` transfer learning model.
- All experiment analysis, visualization, and discussion in the `main.ipynb` notebook.