mini seq2seq

Minimal Seq2Seq model with attention for neural machine translation in PyTorch.

This implementation focuses on the following features:

Modular structure that can be reused in other projects
Minimal code for readability
Full use of batching and GPU acceleration

The dataset (Multi30k, DE→EN) is loaded with the HuggingFace datasets library; tokenization uses spaCy.

Model description

Encoder: Bidirectional GRU
Decoder: GRU with Attention Mechanism
Attention: additive attention as described in "Neural Machine Translation by Jointly Learning to Align and Translate" (Bahdanau et al.)
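
To make the attention component concrete, here is a minimal sketch of additive (Bahdanau-style) attention in PyTorch. It is not the code in model.py: the class name AdditiveAttention, the argument names, and the separate attn_dim projection size are assumptions made for illustration. It scores each encoder output against the current decoder state, normalizes the scores with a softmax, and returns the weighted sum of encoder outputs as the context vector.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdditiveAttention(nn.Module):
    """Illustrative additive attention: score = v^T tanh(W_s s + W_h h)."""

    def __init__(self, dec_dim: int, enc_dim: int, attn_dim: int):
        super().__init__()
        # All three layer names and attn_dim are hypothetical choices.
        self.w_dec = nn.Linear(dec_dim, attn_dim, bias=False)
        self.w_enc = nn.Linear(enc_dim, attn_dim, bias=True)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, dec_state, enc_outputs, mask=None):
        # dec_state:   (batch, dec_dim)          current decoder hidden state
        # enc_outputs: (batch, src_len, enc_dim) all encoder outputs
        # mask:        (batch, src_len) bool, True for real (non-pad) tokens
        energy = torch.tanh(
            self.w_dec(dec_state).unsqueeze(1) + self.w_enc(enc_outputs)
        )                                         # (batch, src_len, attn_dim)
        scores = self.v(energy).squeeze(-1)       # (batch, src_len)
        if mask is not None:
            # Padded positions receive zero attention weight.
            scores = scores.masked_fill(~mask, float("-inf"))
        weights = F.softmax(scores, dim=-1)       # (batch, src_len)
        # Weighted sum of encoder outputs: (batch, enc_dim)
        context = torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)
        return context, weights


if __name__ == "__main__":
    attn = AdditiveAttention(dec_dim=8, enc_dim=16, attn_dim=12)
    context, weights = attn(torch.randn(2, 8), torch.randn(2, 5, 16))
    print(context.shape, weights.shape)
```

With a bidirectional GRU encoder, enc_dim would typically be twice the encoder hidden size if the two directions are concatenated, or equal to it if they are summed; which of these the repository uses is not stated here.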

Requirements

Python 3.9+
PyTorch >= 2.0 (CPU, CUDA, or Apple MPS)
datasets (HuggingFace, replaces torchtext)
spaCy >= 3.7

pip install -r requirements.txt
python -m spacy download de_core_news_sm
python -m spacy download en_core_web_sm

Train

python train.py -epochs 30 -batch_size 32 -lr 3e-4
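
For readers unfamiliar with single-dash long flags, the sketch below shows how a command line like the one above could be parsed with argparse. It is an illustration, not the contents of train.py: the flag names come from this README, but build_parser and every default value are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags mentioned in this README."""
    p = argparse.ArgumentParser(description="seq2seq training flags (sketch)")
    # Defaults below are placeholders, not the project's real defaults.
    p.add_argument("-epochs", type=int, default=30, help="number of epochs")
    p.add_argument("-batch_size", type=int, default=32, help="sentences per batch")
    p.add_argument("-lr", type=float, default=3e-4, help="learning rate")
    p.add_argument("-hidden_size", type=int, default=512, help="GRU hidden size (assumed default)")
    p.add_argument("-embed_size", type=int, default=256, help="embedding size (assumed default)")
    return p


if __name__ == "__main__":
    # Parse the example command from the Train section.
    args = build_parser().parse_args(
        ["-epochs", "30", "-batch_size", "32", "-lr", "3e-4"]
    )
    print(args)
```

argparse accepts option strings with a single leading dash as long as the name matches exactly, so -epochs and -embed_size do not conflict.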

The device is auto-detected in the order CUDA → MPS → CPU. Smaller -hidden_size / -embed_size values are useful for quick CPU smoke tests.
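
The fallback order can be sketched as a small pure function. This is an assumption about how such a selection might be written, not the project's actual code; choose_device is a hypothetical name, and the availability flags are passed in so the priority logic can be checked without any hardware.

```python
def choose_device(cuda_available: bool, mps_available: bool) -> str:
    """Return a device name following the CUDA -> MPS -> CPU priority."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


if __name__ == "__main__":
    # Enumerate every combination of availability flags.
    for cuda in (True, False):
        for mps in (True, False):
            print(f"cuda={cuda}, mps={mps} -> {choose_device(cuda, mps)}")
```

In a PyTorch script, the two flags would typically come from torch.cuda.is_available() and torch.backends.mps.is_available(), and the returned name would be passed to torch.device(...).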

Sanity check (CPU, 500 batches, hidden=128/embed=64):

step	train loss	perplexity
init	9.19	9803
50	6.98	1071
100	5.48	239
250	5.15	173
500	4.84	127

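Final validation loss: 4.93. For reference, a model that predicts uniformly over the vocabulary has a loss of log(|V|) ≈ 9.19, which matches the initial training loss above.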
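
The perplexity column in the sanity check is the exponential of the train loss (natural log), and the uniform-prediction baseline is log(|V|). The short check below recomputes both from the reported numbers. The function names are illustrative, and reading the initial perplexity of 9803 as the vocabulary size is an inference from the table rather than something stated in this README. Because the losses are rounded to two decimals, the recomputed perplexities differ slightly from the table.

```python
import math


def perplexity(loss_nats: float) -> float:
    """Perplexity of a cross-entropy loss measured in nats."""
    return math.exp(loss_nats)


def uniform_loss(vocab_size: int) -> float:
    """Cross-entropy of a uniform prediction over vocab_size tokens."""
    return math.log(vocab_size)


# (train loss, perplexity) pairs copied from the sanity-check table.
REPORTED = {
    "init": (9.19, 9803),
    "50": (6.98, 1071),
    "100": (5.48, 239),
    "250": (5.15, 173),
    "500": (4.84, 127),
}

if __name__ == "__main__":
    for step, (loss, table_ppl) in REPORTED.items():
        print(f"step {step}: exp({loss}) = {perplexity(loss):.1f} (table: {table_ppl})")
    # Assumed vocabulary size, inferred from the initial perplexity.
    print(f"log(9803) = {uniform_loss(9803):.2f}")
```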

References

Based on the following implementations
