This repository contains a Jupyter Notebook that breaks down the implementation of a Transformer model using PyTorch. It provides a "from-scratch" approach to understanding how modern NLP models process sequential data.
The project follows the architecture introduced in the landmark paper "Attention is All You Need." It focuses on the modular implementation of the encoder and decoder components.
- Input Embeddings: Converting token IDs into dense vectors scaled by the square root of the model dimension.
- Positional Encoding: Using sine and cosine functions to inject sequence order information into embeddings.
- Multi-Head Attention: A custom class implementation that handles linear projections for Queries, Keys, and Values, head splitting, and attention weight computation.
- Model Inspection: Visualizing the full `nn.Transformer` object structure, including encoder/decoder layers, normalization, and dropout.
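The embedding and positional-encoding steps above can be sketched as follows. This is a minimal illustration under the paper's formulas, not the notebook's exact code; class and parameter names here are assumptions:

```python
import math
import torch
import torch.nn as nn

class InputEmbeddings(nn.Module):
    """Token embedding scaled by sqrt(d_model), as in 'Attention Is All You Need'."""
    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.d_model = d_model
        self.embedding = nn.Embedding(vocab_size, d_model)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Scale embeddings so their magnitude matches the positional encodings.
        return self.embedding(token_ids) * math.sqrt(self.d_model)

class PositionalEncoding(nn.Module):
    """Fixed sine/cosine positional encoding added to the embeddings."""
    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)  # (max_len, 1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions: sine
        pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions: cosine
        self.register_buffer("pe", pe)                 # not a learned parameter

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); add the first seq_len position encodings.
        return x + self.pe[: x.size(1)]
```

For example, `PositionalEncoding(512)(InputEmbeddings(1000, 512)(torch.randint(0, 1000, (2, 10))))` yields a `(2, 10, 512)` tensor ready for the encoder.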
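The multi-head attention bullet (projections, head splitting, attention weights) can be sketched like this; again a hedged reconstruction of the standard technique, with names chosen for illustration rather than taken from the notebook:

```python
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Project Q/K/V, split into heads, apply scaled dot-product attention."""
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.num_heads = num_heads
        self.d_k = d_model // num_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, q, k, v, mask=None):
        batch = q.size(0)

        def split(x, proj):
            # Project, then reshape to (batch, heads, seq_len, d_k).
            return proj(x).view(batch, -1, self.num_heads, self.d_k).transpose(1, 2)

        q, k, v = split(q, self.w_q), split(k, self.w_k), split(v, self.w_v)

        # Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        weights = scores.softmax(dim=-1)

        # Concatenate heads back to (batch, seq_len, d_model) and project.
        out = (weights @ v).transpose(1, 2).contiguous().view(batch, -1, self.num_heads * self.d_k)
        return self.w_o(out)
```

Splitting into heads lets each head attend over a lower-dimensional subspace (`d_k = d_model / num_heads`), so the total computation stays comparable to single-head attention.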
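The model-inspection step amounts to printing the built-in module, which PyTorch renders as a tree of encoder/decoder layers with their `LayerNorm` and `Dropout` submodules (the hyperparameters below are the paper's defaults, not necessarily the notebook's):

```python
import torch.nn as nn

# Build the reference nn.Transformer and print its module tree.
model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)
print(model)  # lists each TransformerEncoderLayer/TransformerDecoderLayer,
              # plus the LayerNorm and Dropout modules inside them
```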
- Clone the repository and set up the environment:

```bash
git clone https://github.com/Joe-Naz01/transformers.git
cd transformers
conda create -n transformers python
conda activate transformers
pip install -r requirements.txt
```