A clean, dependency-free implementation of a Multi-Layer Perceptron (MLP) built entirely from scratch using Python and NumPy.
This repository serves as an educational breakdown of foundational deep learning concepts, demonstrating how a network utilizes forward propagation, Mean Squared Error (MSE), and backpropagation via the chain rule to solve the classic non-linear XOR (Exclusive OR) problem.
The XOR gate is a fundamental problem in computational complexity. Because the outputs are not linearly separable, a single-layer perceptron without a hidden layer cannot converge on a solution. This implementation introduces a
| Input |
Input |
Target Output |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
The network maps a 2-dimensional input space through a 4-dimensional hidden representation to output a single scalar probability between 0 and 1.
Data flows sequentially from left to right through matrix multiplication, bias addition, and element-wise activation:
Hidden Layer
The input matrix is projected into a 4-dimensional hidden space:
Dimensions:
$X \in \mathbb{R}^{4 \times 2}$ ,$W_1 \in \mathbb{R}^{2 \times 4}$ , and$b_1 \in \mathbb{R}^{1 \times 4}$
The hidden representations are compressed into a single scalar prediction:
Dimensions:
$a_1 \in \mathbb{R}^{4 \times 4}$ ,$W_2 \in \mathbb{R}^{4 \times 1}$ , and$b_2 \in \mathbb{R}^{1 \times 1}$
We use the Sigmoid activation function to map arbitrary real-valued inputs into a probability distribution curve bounded by
To calculate gradients during the backward pass, we compute its first derivative:
The network measures structural variance via Mean Squared Error (MSE):
To minimize
Parameters are adjusted dynamically in the opposite direction of the gradient step scaled by the learning rate (
The environment requires standard Python 3.x along with numpy for vectorized matrix mathematics.
pip install numpy