JadeXu16/CNN-Custom-Layers


PyTorch CNN Implementation: A Deep Dive into Custom BatchNorm & Dropout

This project is a hands-on exploration of key components in modern Convolutional Neural Networks. Using the STL-10 dataset, it follows a four-part journey: we begin with a simple baseline CNN, diagnose its weaknesses, implement foundational layers (BatchNorm, Dropout) from scratch to fix them, and finally, contrast this approach with the power of modern Transfer Learning (ResNet-18).

All code, analysis, and experiments are contained in the main.ipynb notebook.

The Project's Journey in Four Parts

This repository is structured as a progressive exploration, with each part building on the last.

Part I: The Baseline (ToyCNN)

We first implement a small, two-layer CNN (ToyCNN) to establish a performance baseline. The goal is to observe its raw performance and, more importantly, analyze its training curves to diagnose its primary weakness.

  • Key Result: The baseline model quickly overfits the training data, motivating the need for more advanced stabilization and regularization techniques.
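As a rough illustration of what a two-layer baseline of this kind can look like, here is a minimal PyTorch sketch. The class name `ToyCNNSketch`, the channel counts, kernel sizes, and pooling factors are all assumptions made for illustration; the actual ToyCNN in main.ipynb may differ. The only facts relied on are that STL-10 images are 96x96 RGB and that the dataset has 10 classes.

```python
import torch
import torch.nn as nn


class ToyCNNSketch(nn.Module):
    """Hypothetical two-conv-layer baseline; not the notebook's exact ToyCNN."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, padding=2),   # 3x96x96 -> 16x96x96
            nn.ReLU(),
            nn.MaxPool2d(4),                              # -> 16x24x24
            nn.Conv2d(16, 32, kernel_size=5, padding=2),  # -> 32x24x24
            nn.ReLU(),
            nn.MaxPool2d(4),                              # -> 32x6x6
        )
        # No dropout and no normalization: this is the unregularized baseline.
        self.classifier = nn.Linear(32 * 6 * 6, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(torch.flatten(x, start_dim=1))


model = ToyCNNSketch()
logits = model(torch.randn(8, 3, 96, 96))  # a fake batch of 8 STL-10-sized images
print(logits.shape)
```

With no regularization and a small labeled training set, a model like this can memorize the training data quickly, which is the overfitting pattern the training curves are meant to expose.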

Part II: Building the Toolkit (From Scratch)

Before improving our model, we build the tools. This section is a "deep dive" into two critical layers, implementing them from first principles to understand why they work.

  • Batch Normalization: Implemented from scratch (forward and backward passes) to build intuition for how it makes training faster and more stable by reducing internal covariate shift.
  • Dropout: Implemented inverted dropout from scratch to understand how it regularizes a model and improves generalization by discouraging co-adaptation between neurons.
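The two layers above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the repository's implementation: it handles only training-mode batch norm on a 2-D `(N, D)` input (no running statistics, no convolutional layout), and the function names and signatures are hypothetical.

```python
import numpy as np


def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode batch norm over an (N, D) batch."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)                  # biased variance, as in standard batch norm
    inv_std = 1.0 / np.sqrt(var + eps)
    x_hat = (x - mu) * inv_std           # zero mean, unit variance per feature
    out = gamma * x_hat + beta           # learnable scale and shift
    return out, (x_hat, gamma, inv_std)


def batchnorm_backward(dout, cache):
    """Gradients of the loss w.r.t. x, gamma, and beta."""
    x_hat, gamma, inv_std = cache
    n = dout.shape[0]
    dbeta = dout.sum(axis=0)
    dgamma = (dout * x_hat).sum(axis=0)
    dx_hat = dout * gamma
    # Compact form that accounts for x's influence on the batch mean and variance.
    dx = (inv_std / n) * (
        n * dx_hat - dx_hat.sum(axis=0) - x_hat * (dx_hat * x_hat).sum(axis=0)
    )
    return dx, dgamma, dbeta


def dropout_forward(x, p_drop, rng, train=True):
    """Inverted dropout: rescale at train time so inference is the identity."""
    if not train or p_drop == 0.0:
        return x, None
    keep = 1.0 - p_drop
    mask = (rng.random(x.shape) < keep) / keep  # kept units are scaled by 1/keep
    return x * mask, mask


def dropout_backward(dout, mask):
    """Gradient flows only through the units that were kept."""
    return dout if mask is None else dout * mask


rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 5))
gamma, beta = np.ones(5), np.zeros(5)
out, cache = batchnorm_forward(x, gamma, beta)
print(out.mean(axis=0), out.std(axis=0))  # per-feature mean near 0, std near 1

dropped, mask = dropout_forward(np.ones((1000, 100)), p_drop=0.5, rng=rng)
print(dropped.mean())  # stays close to 1 because of the 1/keep rescaling
```

The rescaling by `1 / keep` at training time is what makes this the inverted variant: the expected activation is unchanged, so no correction is needed at inference.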

Part III: Iterative Improvement (Building ToyCNNModified)

Armed with a first-principles understanding of how these layers work (from Part II), we apply them iteratively. We build the final ToyCNNModified in stages, adding one feature at a time to observe its specific impact. This methodical process is central to machine learning engineering.

  • Step 3.1: Tackling Overfitting with Dropout: First, we apply the library's nn.Dropout to our baseline model to control the severe overfitting diagnosed in Part I.

  • Step 3.2: Increasing Capacity by Going Deeper: With regularization in place, we increase the model's capacity by adding another convolutional block, allowing it to learn more complex features.

  • Step 3.3: Stabilizing the Deeper Network with Batch Norm: Finally, we add the library's batch-norm layer (nn.BatchNorm2d for convolutional feature maps) to stabilize the training of our new, deeper network, mitigating the internal covariate shift problem we studied in Part II.

  • Key Result: The final ToyCNNModified combines all three improvements, resulting in a model that successfully overcomes overfitting, has sufficient capacity, and trains stably, achieving a significant boost in validation accuracy.
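The three steps can be pictured together in one small model. The sketch below is an assumption-laden illustration, not the notebook's ToyCNNModified: the class name, channel widths, dropout rate, and placement of each layer are hypothetical choices, and inputs are assumed to be 96x96 RGB images.

```python
import torch
import torch.nn as nn


def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Conv -> BatchNorm -> ReLU -> 2x2 max-pool (halves the spatial size)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),  # Step 3.3: normalize each channel over the batch
        nn.ReLU(),
        nn.MaxPool2d(2),
    )


class ToyCNNModifiedSketch(nn.Module):
    """Hypothetical deeper model combining dropout, depth, and batch norm."""

    def __init__(self, num_classes: int = 10, p_drop: float = 0.5):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(3, 16),   # 96x96 -> 48x48
            conv_block(16, 32),  # 48x48 -> 24x24
            conv_block(32, 64),  # 24x24 -> 12x12 (Step 3.2: the extra block)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p_drop),  # Step 3.1: regularize before the linear layer
            nn.Linear(64 * 12 * 12, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


model = ToyCNNModifiedSketch()
model.eval()  # dropout becomes a no-op; batch norm uses its running statistics
with torch.no_grad():
    logits = model(torch.randn(4, 3, 96, 96))
print(logits.shape)
```

Remember to switch between `model.train()` and `model.eval()`: both Dropout and BatchNorm2d behave differently in the two modes, and evaluating in training mode gives noisy validation numbers.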

Part IV: The State-of-the-Art (Transfer Learning with ResNet-18)

Finally, we explore one of the most common and effective techniques in applied computer vision. Instead of training from scratch, we adapt a ResNet-18 model pre-trained on ImageNet, reusing the features it has already learned.

  • Key Result: By fine-tuning only the final classification layer, this model reaches over 90% validation accuracy with minimal training, demonstrating the clear advantage of transfer learning for most practical tasks.

Performance Summary

This table, drawn from main.ipynb, summarizes the journey:

| Stage | What Was Implemented | Main Benefit | Typical Outcome (Val Acc) |
| --- | --- | --- | --- |
| I. ToyCNN (Baseline) | A basic CNN + training loop | Establish baseline & diagnose overfitting | > 50% |
| II. Custom Layers | BatchNorm & Dropout from scratch | Understand internals by building them | (Implementations validated by tests) |
| III. ToyCNNModified | Iterative improvement: (1) add library Dropout, (2) go deeper, (3) add library BatchNorm | Systematic improvement & generalization | > 60% (final model) |
| IV. ResNet-18 | Head swap + fine-tuning | Leverage pre-trained models for SOTA results | > 90% |

How to Run

  1. Clone the project

    git clone https://github.com/<your-user>/CNN-Custom-Layers.git
    cd CNN-Custom-Layers
  2. Create an environment & install dependencies

    • Conda (recommended):
      bash conda/install.sh
      conda activate cnn_custom_layers
      The helper script provisions everything defined in conda/environment.yml and finishes with pip install -e . so the local cs6740 package is importable from anywhere.
    • Plain pip (if you already have Python ≥3.9 available):
      python -m venv .venv
      source .venv/bin/activate
      pip install torch torchvision numpy matplotlib pandas altair jupyter pytest
      pip install -e .
      Feel free to mirror the conda dependencies if you need additional tooling such as bandit or mypy.
  3. (Optional but recommended) Run the unit tests

    pytest

    These tests validate the from-scratch BatchNorm, Dropout, CNN architectures, and the ResNet fine-tuning wrapper before you start experimenting.

  4. Launch the notebook

    jupyter lab main.ipynb  # or `jupyter notebook main.ipynb`

    Execute the notebook top to bottom. The first setup cell mirrors the steps above when running in Colab; on a local machine you can skip the duplicate installations if your environment is already prepared.

  5. Data & checkpoints

    • When you first instantiate cs6740.image_loader.ImageLoader, PyTorch will download STL-10 into ./data/ automatically (or whatever path you set in the setup cell).
    • Model checkpoints are written to ./model_checkpoints/ by default; create the folder beforehand if you want to persist intermediate models.
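If you prefer to create the checkpoint folder from Python rather than by hand, a minimal sketch looks like this. The file name `example_checkpoint.pt` and the stand-in `nn.Linear` model are hypothetical; the project's Solver may use its own naming and checkpoint format.

```python
from pathlib import Path

import torch
import torch.nn as nn

checkpoint_dir = Path("model_checkpoints")
checkpoint_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists

model = nn.Linear(4, 2)  # stand-in for a trained model
checkpoint_path = checkpoint_dir / "example_checkpoint.pt"  # hypothetical file name
torch.save(model.state_dict(), checkpoint_path)

# Restoring: build the same architecture, then load the saved weights into it.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(checkpoint_path))
```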

Acknowledgements

This project was originally based on a course assignment. The foundational code (the Solver module, data loaders, and general project structure) was provided.

My core contributions and the focus of this repository are:

  • The complete from-scratch implementation (forward and backward pass) of the Batch Normalization and Dropout layers.
  • The architecture design and implementation of ToyCNN and ToyCNNModified.
  • The setup and fine-tuning logic for the MyResNet18 transfer learning model.
  • All experiment analysis, visualization, and discussion in the main.ipynb notebook.
