mini seq2seq

Minimal Seq2Seq model with attention for neural machine translation in PyTorch.

This implementation focuses on the following features:

  • Modular structure to be used in other projects
  • Minimal code for readability
  • Full utilization of batching and the GPU

Dataset (Multi30k DE→EN) is loaded via HuggingFace datasets; tokenization uses spaCy.
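As a rough illustration of the preprocessing that feeds batched training, the sketch below builds a vocabulary from tokenized sentences and pads a batch to equal length. It is not the repository's code: `tokenize` uses whitespace splitting as a stand-in for the spaCy tokenizers, and the special tokens, the `min_freq` parameter and all helper names are assumptions made for illustration.

```python
from collections import Counter

# Special tokens; these names and their order are assumptions for this sketch.
SPECIALS = ["<pad>", "<unk>", "<sos>", "<eos>"]


def tokenize(text):
    """Whitespace stand-in for a spaCy tokenizer (assumption)."""
    return text.lower().split()


def build_vocab(sentences, min_freq=1):
    """Map each token seen at least `min_freq` times to an integer id."""
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    # Sort by descending frequency, then alphabetically, for determinism.
    kept = sorted((t for t, c in counts.items() if c >= min_freq),
                  key=lambda t: (-counts[t], t))
    return {tok: i for i, tok in enumerate(SPECIALS + kept)}


def encode(text, vocab):
    """Convert a sentence to ids, wrapped in <sos> ... <eos>."""
    unk = vocab["<unk>"]
    ids = [vocab.get(tok, unk) for tok in tokenize(text)]
    return [vocab["<sos>"]] + ids + [vocab["<eos>"]]


def pad_batch(batch, pad_id):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in batch]


german = ["ein Mann läuft", "eine Frau liest ein Buch"]
vocab = build_vocab(german)
batch = pad_batch([encode(s, vocab) for s in german], vocab["<pad>"])
```

Padding to the longest sentence in each batch is what lets a whole batch be processed as one rectangular tensor on the GPU.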

Model description

Requirements

  • Python 3.9+
  • PyTorch >= 2.0 (CPU, CUDA, or Apple MPS)
  • datasets (HuggingFace, replaces torchtext)
  • spaCy >= 3.7

pip install -r requirements.txt
python -m spacy download de_core_news_sm
python -m spacy download en_core_web_sm

Train

python train.py -epochs 30 -batch_size 32 -lr 3e-4
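The flags in the command above, together with `-hidden_size` and `-embed_size`, could be parsed with `argparse` roughly as follows. This is a sketch rather than the actual contents of train.py: the flag names come from this README, while the default values, help strings and the function name `parse_arguments` are assumptions.

```python
import argparse


def parse_arguments(argv=None):
    """Parse training hyperparameters (defaults here are illustrative)."""
    p = argparse.ArgumentParser(description="Train a seq2seq model (sketch)")
    p.add_argument("-epochs", type=int, default=30,
                   help="number of training epochs")
    p.add_argument("-batch_size", type=int, default=32,
                   help="sentences per batch")
    p.add_argument("-lr", type=float, default=3e-4,
                   help="initial learning rate")
    p.add_argument("-hidden_size", type=int, default=512,
                   help="RNN hidden units (assumed default)")
    p.add_argument("-embed_size", type=int, default=256,
                   help="embedding dimension (assumed default)")
    return p.parse_args(argv)


# Equivalent to: python train.py -epochs 30 -batch_size 32 -lr 3e-4
args = parse_arguments(["-epochs", "30", "-batch_size", "32", "-lr", "3e-4"])
```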

Device is auto-detected (CUDA → MPS → CPU). Smaller -hidden_size / -embed_size flags are useful for CPU smoke runs.
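The CUDA → MPS → CPU fallback order can be expressed as a small pure function. The name `pick_device` and its boolean arguments are hypothetical; the repository's own detection code may differ.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return a device string, preferring CUDA, then Apple MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# Example: a Mac without an NVIDIA GPU but with MPS support.
device_name = pick_device(cuda_available=False, mps_available=True)
```

With PyTorch installed, the two flags would typically come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and the result can be passed to `torch.device(...)`.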

Sanity check (CPU, 500 batches, hidden=128/embed=64):

step   train loss   perplexity
init   9.19         9803
50     6.98         1071
100    5.48         239
250    5.15         173
500    4.84         127

Final validation loss: 4.93. For comparison, a randomly initialized model that predicts uniformly over the vocabulary has loss log(|V|) ≈ 9.19.
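The reported perplexities are consistent with perplexity being the exponential of the cross-entropy loss, and the quoted prior is the loss of a uniform guess over the vocabulary. The sketch below checks both relationships against the numbers reported above; the function names are hypothetical, and the 1% tolerance is an assumption chosen to absorb rounding of the logged losses.

```python
import math

# (step, train loss, reported perplexity) copied from the sanity-check run.
LOG = [("init", 9.19, 9803), (50, 6.98, 1071), (100, 5.48, 239),
       (250, 5.15, 173), (500, 4.84, 127)]


def perplexity(loss: float) -> float:
    """Perplexity is exp(mean cross-entropy in nats)."""
    return math.exp(loss)


def uniform_prior_loss(vocab_size: int) -> float:
    """Loss of a model that assigns probability 1/|V| to every token."""
    return math.log(vocab_size)


# Every reported perplexity should match exp(loss) up to rounding.
consistent = all(math.isclose(perplexity(loss), ppl, rel_tol=0.01)
                 for _, loss, ppl in LOG)

# The vocabulary size implied by an initial loss of 9.19 nats.
implied_vocab = round(math.exp(9.19))
```

An initial loss close to log(|V|) is a quick sign that the output layer and loss are wired up correctly before any training has happened.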

References

Based on the following implementations
