HAYDARKILIC/optimization_methods

Optimization Methods

A six-week, research-oriented curriculum that re-derives modern continuous optimization from first principles, implements every algorithm in pure NumPy and JAX, and benchmarks them on canonical convex problems, deep neural networks, and large-scale machine learning pipelines.

Audience. Graduate students, ML researchers, and engineers who already know multivariable calculus, linear algebra, and basic Python, and want to understand why Adam works, when SGD fails, and how proximal methods turn nonsmooth penalties into closed-form updates.

Philosophy. No black boxes. Every method is derived on paper, implemented in NumPy, validated against a reference (CVXPY for convex problems, PyTorch/Optax for deep learning baselines), and then stress-tested on a non-trivial workload.
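The implement-then-validate loop described above can be sketched on a small least-squares problem. This is only an illustration, not code from the course notebooks: the problem sizes, the helper name `gradient_descent`, and the use of `np.linalg.lstsq` as the reference solver (standing in for CVXPY so the snippet needs only NumPy) are all assumptions.

```python
import numpy as np

# Hypothetical test problem: minimize 0.5 * ||A x - b||^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
b = rng.standard_normal(50)

def gradient_descent(A, b, n_iters=2000):
    """Plain gradient descent with the fixed step size 1/L."""
    # L is the Lipschitz constant of the gradient: the squared spectral norm of A.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = A.T @ (A @ x - b)
        x = x - grad / L
    return x

x_gd = gradient_descent(A, b)

# Reference solution from a trusted solver (stand-in for CVXPY here).
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)

max_abs_err = np.max(np.abs(x_gd - x_ref))
print(f"max |x_gd - x_ref| = {max_abs_err:.2e}")
```

The same pattern should carry over to other methods: swap in a different solver for `gradient_descent` and keep the reference comparison unchanged.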


Curriculum at a Glance

| Week | Theme | Core Methods | Capstone |
|------|-------|--------------|----------|
| 1 | Mathematical foundations | Convexity, subdifferentials, smoothness, strong convexity, KKT | Geometry of loss landscapes |
| 2 | Unconstrained smooth optimization | Gradient descent, line search, Nesterov momentum, conjugate gradient, BFGS, L-BFGS, trust region, Gauss–Newton | Levenberg–Marquardt for nonlinear least squares |
| 3 | Proximal & nonsmooth optimization | Subgradient method, proximal gradient, FISTA, ADMM, primal–dual splitting | L1/L2 regression, total-variation image denoising |
| 4 | Stochastic optimization | SGD, mini-batch variance, SVRG, SAGA, Adam, AdamW, AdaGrad, RMSProp, Lookahead, Lion | Training a ResNet from scratch |
| 5 | Constrained optimization & duality | Lagrangian duality, projected gradient, Frank–Wolfe, interior-point, augmented Lagrangian, SQP | Portfolio optimization with cardinality constraints |
| 6 | Modern deep learning optimizers | Second-order methods (K-FAC, Shampoo, Sophia), sharpness-aware minimization (SAM, ASAM), schedule-free methods, Muon, distributed optimization | Pre-training a small transformer with three different optimizer families |

Each week ships with 3–4 numbered Jupyter notebooks, a lecture.md with the derivations, and a lab_*.ipynb that solves a research-grade problem end-to-end.
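As a small taste of the Week 3 material, the claim that proximal methods turn nonsmooth penalties into closed-form updates can be made concrete with the L1 norm, whose proximal operator is elementwise soft-thresholding. The sketch below is illustrative only: the function name `soft_threshold` and the sample vector are assumptions, not identifiers from the notebooks.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||x||_1 evaluated at v.

    Solves argmin_x 0.5 * ||x - v||^2 + tau * ||x||_1 in closed form:
    each entry is shrunk toward zero by tau, and entries with
    magnitude at most tau become exactly zero.
    """
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

v = np.array([3.0, -0.5, 1.2, -2.0])
x = soft_threshold(v, tau=1.0)
print(x)  # entries with |v_i| <= 1 are zeroed, the rest move 1.0 toward zero
```

A proximal gradient step for an L1-regularized problem would apply this operator right after an ordinary gradient step on the smooth part of the objective.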


Repository Layout

optimization_methods/
├── README.md
├── requirements.txt
├── week1_foundations/
│   ├── lecture.md
│   ├── 01_convex_sets_and_functions.ipynb
│   ├── 02_subdifferentials_and_smoothness.ipynb
│   ├── 03_optimality_conditions_kkt.ipynb
│   └── lab_loss_landscapes.ipynb
├── week2_unconstrained_smooth/
│   ├── lecture.md
│   ├── 01_gradient_descent_and_linesearch.ipynb
│   ├── 02_momentum_and_nesterov_acceleration.ipynb
│   ├── 03_quasi_newton_bfgs_lbfgs.ipynb
│   ├── 04_trust_region_and_gauss_newton.ipynb
│   └── lab_levenberg_marquardt.ipynb
├── week3_proximal_nonsmooth/
│   ├── lecture.md
│   ├── 01_subgradient_method.ipynb
│   ├── 02_proximal_gradient_and_fista.ipynb
│   ├── 03_admm_and_splitting.ipynb
│   └── lab_total_variation_denoising.ipynb
├── week4_stochastic_optim/
│   ├── lecture.md
│   ├── 01_sgd_variance_and_mini_batch.ipynb
│   ├── 02_variance_reduction_svrg_saga.ipynb
│   ├── 03_adaptive_methods_adam_family.ipynb
│   ├── 04_modern_variants_lion_lookahead.ipynb
│   └── lab_train_resnet_from_scratch.ipynb
├── week5_constrained_duality/
│   ├── lecture.md
│   ├── 01_lagrangian_duality.ipynb
│   ├── 02_projected_gradient_and_frank_wolfe.ipynb
│   ├── 03_interior_point_methods.ipynb
│   ├── 04_augmented_lagrangian_and_sqp.ipynb
│   └── lab_constrained_portfolio.ipynb
├── week6_modern_deep_learning_optim/
│   ├── lecture.md
│   ├── 01_second_order_kfac_shampoo.ipynb
│   ├── 02_sharpness_aware_minimization.ipynb
│   ├── 03_schedule_free_and_muon.ipynb
│   ├── 04_distributed_optimization_zero.ipynb
│   └── lab_transformer_pretraining_shootout.ipynb
├── utils/
│   ├── plotting.py
│   ├── benchmarks.py
│   └── reference_problems.py
├── figures/
└── references/
    └── reading_list.md

Prerequisites

  • Math: multivariable calculus, real analysis at the level of Rudin Ch. 1–6, linear algebra (eigendecomposition, SVD), basic convex analysis.
  • Programming: Python 3.10+, NumPy, comfortable with Jupyter.
  • Helpful but not required: familiarity with PyTorch or JAX, prior ML coursework.

Installation

git clone https://github.com/HAYDARKILIC/optimization_methods.git
cd optimization_methods
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
jupyter lab

How to Use This Course

Each week is self-contained but builds on the previous one. The recommended cadence:

  1. Read lecture.md (1–2 hours). It contains the full derivations.
  2. Work through the numbered notebooks in order (~4–6 hours per week). Don't skip the convergence-rate experiments — they are how intuition is built.
  3. Tackle lab_*.ipynb end-to-end. Each lab is a small research project.
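To give a sense of what a convergence-rate experiment from step 2 might look like, here is a minimal sketch, assuming a hand-picked diagonal quadratic rather than any problem from the notebooks. It runs gradient descent with step size 1/L and compares the observed per-step contraction of the error with the textbook rate 1 - mu/L for a mu-strongly convex, L-smooth function.

```python
import numpy as np

# Hypothetical problem: f(x) = 0.5 * sum(d_i * x_i^2), minimized at x = 0.
d = np.array([1.0, 4.0, 10.0])   # Hessian eigenvalues
mu, L = d.min(), d.max()         # strong convexity and smoothness constants

x = np.ones(3)
errors = []
for _ in range(60):
    errors.append(np.linalg.norm(x))  # distance to the minimizer
    x = x - (d * x) / L               # gradient step with step size 1/L

errors = np.array(errors)
ratios = errors[1:] / errors[:-1]     # observed per-step contraction
predicted = 1.0 - mu / L              # worst-case linear rate

print(f"observed late-stage ratio: {ratios[-1]:.6f}")
print(f"predicted rate 1 - mu/L:   {predicted:.6f}")
```

Early ratios are smaller than the prediction because the well-conditioned directions die out quickly; the slowest direction dominates later, and the ratio settles at the predicted rate.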

Solutions to all exercises live on the solutions branch. Resist looking until you have suffered.


License

MIT for code. CC-BY-4.0 for prose and figures.

About

Advanced Mathematical Optimization & Deep Learning Optimizers from scratch. Covers KKT duality, L-BFGS, proximal methods (ADMM, FISTA), stochastic algorithms (SVRG, Lion), and cutting-edge deep learning optimizers like K-FAC, Shampoo, Sophia, SAM, and Muon. Bridging strict convex calculus with large-scale Transformer training.
