
transformer-training

Here is 1 public repository matching this topic.

Advanced mathematical optimization and deep learning optimizers, implemented from scratch. Covers KKT duality, L-BFGS, proximal methods (ADMM, FISTA), stochastic algorithms (SVRG, Lion), and recent deep learning optimizers such as K-FAC, Shampoo, Sophia, SAM, and Muon, bridging rigorous convex analysis with large-scale Transformer training.

  • Updated May 27, 2026
  • Jupyter Notebook
