# Optimization Methods

A six-week, research-oriented curriculum that re-derives modern continuous optimization from first principles, implements every algorithm in pure NumPy and JAX, and benchmarks them on canonical convex problems, deep neural networks, and large-scale machine learning pipelines.

Audience. Graduate students, ML researchers, and engineers who already know multivariable calculus, linear algebra, and basic Python, and want to understand why Adam works, when SGD fails, and how proximal methods turn nonsmooth penalties into closed-form updates.
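
As a concrete instance of the "closed-form update" idea, the proximal operator of the L1 penalty reduces to elementwise soft-thresholding. The snippet below is a minimal sketch, not code from this repository: the names `soft_threshold` and `prox_objective`, the test vector, and the grid resolution are all illustrative assumptions. It compares the closed form against a brute-force grid minimization of the prox objective.

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal operator of lam * ||x||_1, applied elementwise to v."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def prox_objective(x, v, lam):
    """Scalar objective whose minimizer over x is the prox of lam*|.| at v."""
    return 0.5 * (x - v) ** 2 + lam * np.abs(x)

# Hypothetical inputs chosen to hit all three regimes: shrink, zero out, shrink.
v = np.array([-2.0, -0.3, 0.0, 0.5, 1.7])
lam = 0.5
closed_form = soft_threshold(v, lam)

# Brute-force check: minimize the scalar objective on a fine grid for each entry.
grid = np.linspace(-3.0, 3.0, 60001)
brute_force = np.array(
    [grid[np.argmin(prox_objective(grid, vi, lam))] for vi in v]
)

print("closed form :", closed_form)
print("brute force :", brute_force)
print("max gap     :", np.max(np.abs(closed_form - brute_force)))
```

Entries with magnitude at most `lam` are set exactly to zero, which is why L1 penalties produce sparse solutions.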

Philosophy. No black boxes. Every method is derived on paper, implemented in NumPy, validated against a reference (CVXPY for convex problems, PyTorch/Optax for deep learning baselines), and then stress-tested on a non-trivial workload.
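
The derive, implement, validate loop can be sketched on a small least-squares problem. This is an illustrative example under stated assumptions, not the course's own code: the function name `gradient_descent_least_squares` and the random problem are hypothetical, and `np.linalg.lstsq` stands in for the CVXPY reference so the snippet needs only NumPy.

```python
import numpy as np

# Hypothetical test problem: a small, well-conditioned least-squares instance.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
b = rng.standard_normal(50)

def gradient_descent_least_squares(A, b, num_iters=2000):
    """Minimize 0.5 * ||A x - b||^2 by gradient descent with step 1/L."""
    # L is the Lipschitz constant of the gradient: the largest eigenvalue
    # of A^T A, i.e. the squared spectral norm of A.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)
        x = x - grad / L
    return x

x_gd = gradient_descent_least_squares(A, b)

# Reference solution (stand-in for the CVXPY check described above).
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)

print("distance to reference:", np.linalg.norm(x_gd - x_ref))
```

The same pattern carries over to the other methods: swap in a different update rule and keep the reference comparison unchanged.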

## Curriculum at a Glance

| Week | Theme | Core Methods | Capstone |
|------|-------|--------------|----------|
| 1 | Mathematical foundations | Convexity, subdifferentials, smoothness, strong convexity, KKT | Geometry of loss landscapes |
| 2 | Unconstrained smooth optimization | Gradient descent, line search, Nesterov momentum, conjugate gradient, BFGS, L-BFGS, trust region, Gauss–Newton | Levenberg–Marquardt for nonlinear least squares |
| 3 | Proximal & nonsmooth optimization | Subgradient method, proximal gradient, FISTA, ADMM, primal–dual splitting | L1/L2 regression, total-variation image denoising |
| 4 | Stochastic optimization | SGD, mini-batch variance, SVRG, SAGA, Adam, AdamW, AdaGrad, RMSProp, Lookahead, Lion | Training a ResNet from scratch |
| 5 | Constrained optimization & duality | Lagrangian duality, projected gradient, Frank–Wolfe, interior-point, augmented Lagrangian, SQP | Portfolio optimization with cardinality constraints |
| 6 | Modern deep learning optimizers | Second-order methods (K-FAC, Shampoo, Sophia), sharpness-aware minimization (SAM, ASAM), schedule-free methods, Muon, distributed optimization | Pre-training a small transformer with three different optimizer families |

Each week ships with three or four numbered Jupyter notebooks, a lecture.md with the derivations, and a lab_*.ipynb notebook that solves a research-grade problem end-to-end.

## Repository Layout

optimization_methods/
├── README.md
├── requirements.txt
├── week1_foundations/
│   ├── lecture.md
│   ├── 01_convex_sets_and_functions.ipynb
│   ├── 02_subdifferentials_and_smoothness.ipynb
│   ├── 03_optimality_conditions_kkt.ipynb
│   └── lab_loss_landscapes.ipynb
├── week2_unconstrained_smooth/
│   ├── lecture.md
│   ├── 01_gradient_descent_and_linesearch.ipynb
│   ├── 02_momentum_and_nesterov_acceleration.ipynb
│   ├── 03_quasi_newton_bfgs_lbfgs.ipynb
│   ├── 04_trust_region_and_gauss_newton.ipynb
│   └── lab_levenberg_marquardt.ipynb
├── week3_proximal_nonsmooth/
│   ├── lecture.md
│   ├── 01_subgradient_method.ipynb
│   ├── 02_proximal_gradient_and_fista.ipynb
│   ├── 03_admm_and_splitting.ipynb
│   └── lab_total_variation_denoising.ipynb
├── week4_stochastic_optim/
│   ├── lecture.md
│   ├── 01_sgd_variance_and_mini_batch.ipynb
│   ├── 02_variance_reduction_svrg_saga.ipynb
│   ├── 03_adaptive_methods_adam_family.ipynb
│   ├── 04_modern_variants_lion_lookahead.ipynb
│   └── lab_train_resnet_from_scratch.ipynb
├── week5_constrained_duality/
│   ├── lecture.md
│   ├── 01_lagrangian_duality.ipynb
│   ├── 02_projected_gradient_and_frank_wolfe.ipynb
│   ├── 03_interior_point_methods.ipynb
│   ├── 04_augmented_lagrangian_and_sqp.ipynb
│   └── lab_constrained_portfolio.ipynb
├── week6_modern_deep_learning_optim/
│   ├── lecture.md
│   ├── 01_second_order_kfac_shampoo.ipynb
│   ├── 02_sharpness_aware_minimization.ipynb
│   ├── 03_schedule_free_and_muon.ipynb
│   ├── 04_distributed_optimization_zero.ipynb
│   └── lab_transformer_pretraining_shootout.ipynb
├── utils/
│   ├── plotting.py
│   ├── benchmarks.py
│   └── reference_problems.py
├── figures/
└── references/
    └── reading_list.md

## Prerequisites

- Math: multivariable calculus, real analysis at the level of Rudin Ch. 1–6, linear algebra (eigendecomposition, SVD), basic convex analysis.
- Programming: Python 3.10+, NumPy, and working familiarity with Jupyter.
- Helpful but not required: familiarity with PyTorch or JAX, prior ML coursework.

## Installation

git clone https://github.com/HAYDARKILIC/optimization_methods.git
cd optimization_methods
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
jupyter lab

## How to Use This Course

Each week's materials are self-contained, but the topics build on those of earlier weeks. The recommended cadence:

1. Read lecture.md (1–2 hours). It contains the full derivations.
2. Work through the numbered notebooks in order (~4–6 hours per week). Don't skip the convergence-rate experiments; they are how intuition is built.
3. Tackle lab_*.ipynb end-to-end. Each lab is a small research project.
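
To give a flavor of what a convergence-rate experiment looks like, here is a minimal sketch on a strongly convex quadratic. It is not taken from the course notebooks: the problem, the function names `run_gd` and `run_nesterov`, and the iteration count are assumptions made for illustration. It measures the per-iteration contraction of gradient descent, compares it with the textbook rate 1 - mu/L for step size 1/L, and runs Nesterov's method with constant momentum for contrast.

```python
import numpy as np

# Hypothetical test problem: f(x) = 0.5 * x^T diag(d) x, minimized at x* = 0.
d = np.linspace(1.0, 100.0, 20)   # eigenvalues of the Hessian
mu, L = d.min(), d.max()          # strong convexity and smoothness constants
x0 = np.ones_like(d)

def run_gd(num_iters):
    """Gradient descent with step 1/L; returns ||x_k - x*|| per iteration."""
    x, errors = x0.copy(), []
    for _ in range(num_iters):
        x = x - (d * x) / L       # the gradient of f is d * x
        errors.append(np.linalg.norm(x))
    return np.array(errors)

def run_nesterov(num_iters):
    """Nesterov's method with the constant momentum used for strongly convex f."""
    kappa = L / mu
    beta = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
    x_prev, x, errors = x0.copy(), x0.copy(), []
    for _ in range(num_iters):
        y = x + beta * (x - x_prev)          # extrapolation step
        x_prev, x = x, y - (d * y) / L       # gradient step taken at y
        errors.append(np.linalg.norm(x))
    return np.array(errors)

err_gd = run_gd(300)
err_nag = run_nesterov(300)

# Once the slowest mode dominates, successive GD errors shrink by 1 - mu/L.
empirical_rate = err_gd[-1] / err_gd[-2]
predicted_rate = 1.0 - mu / L

print("GD empirical rate :", empirical_rate)
print("GD predicted rate :", predicted_rate)
print("final error, GD vs Nesterov:", err_gd[-1], err_nag[-1])
```

Plotting `err_gd` and `err_nag` on a log scale makes the gap between the two rates visible as a difference in slope.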

Solutions to all exercises live on the solutions branch. Resist looking until you have suffered.


## License

MIT for code. CC-BY-4.0 for prose and figures.
