|
1 | 1 | { |
2 | 2 | "cells": [ |
3 | | - { |
4 | | - "cell_type": "raw", |
5 | | - "metadata": {}, |
6 | | - "source": [ |
7 | | - "---\n", |
8 | | - "title: PyTorch Hyperparameter Tuning --- A Tutorial for spotPython\n", |
9 | | - "subtitle: Version 0.2.14\n", |
10 | | - "format:\n", |
11 | | - " pdf:\n", |
12 | | - " template: bart23e_template.tex\n", |
13 | | - " fig-width: 7\n", |
14 | | - " fig-height: 5\n", |
15 | | - " keep-tex: true\n", |
16 | | - " linenumbers: false\n", |
17 | | - " doublespacing: false\n", |
18 | | - " number-sections: true\n", |
19 | | - " runninghead: PyTorch Hyperparameter Tuning With spotPython\n", |
20 | | - " html:\n", |
21 | | - " fig-width: 7\n", |
22 | | - " fig-height: 5\n", |
23 | | - "author:\n", |
24 | | - " - name: Thomas Bartz-Beielstein\n", |
25 | | - " affiliations:\n", |
26 | | - " - name: SpotSeven Lab\n", |
27 | | - " - city: Gummersbach\n", |
28 | | - " country: Germany\n", |
29 | | - " postal-code: 51643\n", |
30 | | - " orcid: 0000-0002-5938-5158\n", |
31 | | - " email: bartzbeielstein@gmail.com\n", |
32 | | - " url: 'https://www.spotseven.de'\n", |
33 | | - "abstract: |\n", |
34 | | - " The goal of hyperparameter tuning (or hyperparameter optimization) is to optimize the hyperparameters to improve the performance of the machine or deep learning model. spotPython (\"Sequential Parameter Optimization Toolbox in Python\") is the Python version of the well-known hyperparameter tuner SPOT, which has been developed in the R programming environment for statistical analysis for over a decade. PyTorch is an optimized tensor library for deep learning using GPUs and CPUs. This document shows how to integrate the spotPython hyperparameter tuner into the PyTorch training workflow. As an example, the results of the CIFAR10 image classifier are used. In addition to an introduction to spotPython, this tutorial also includes a brief comparison with Ray Tune, a Python library for running experiments and tuning hyperparameters. This comparison is based on the PyTorch hyperparameter tuning tutorial. The advantages and disadvantages of both approaches are discussed. We show that spotPython achieves similar or even better results while being more flexible and transparent than Ray Tune.\n", |
35 | | - "keywords:\n", |
36 | | - " - hyperparameter tuning\n", |
37 | | - " - hyperparameter optimization\n", |
38 | | - " - spotPython\n", |
39 | | - " - PyTorch\n", |
40 | | - " - CIFAR10\n", |
41 | | - " - optimization\n", |
42 | | - " - deep learning\n", |
43 | | - " - machine learning\n", |
44 | | - "bibliography: bart23e.bib\n", |
45 | | - "execute:\n", |
46 | | - " cache: false\n", |
47 | | - " eval: false\n", |
48 | | - " echo: true\n", |
49 | | - " warning: false\n", |
50 | | - "---" |
51 | | - ] |
52 | | - }, |
53 | | - { |
54 | | - "attachments": {}, |
55 | | - "cell_type": "markdown", |
56 | | - "metadata": {}, |
57 | | - "source": [ |
58 | | - "# Hyperparameter Tuning {#sec-hyperparameter-tuning}\n", |
59 | | - "\n", |
60 | | - "Hyperparameter tuning is an important, but often difficult and computationally intensive task.\n", |
61 | | - "Changing the architecture of a neural network or the learning rate of an optimizer can have a significant impact on the performance.\n", |
62 | | - "\n", |
63 | | - "The goal of hyperparameter tuning is to optimize the hyperparameters in a way that improves the performance of the machine learning or deep learning model.\n", |
64 | | - "The simplest, but also most computationally expensive, approach uses manual search (or trial-and-error [@Meignan:2015vp]).\n", |
65 | | - "Simple random search, i.e., random and repeated selection of hyperparameters for evaluation, and grid search (also called lattice search) are commonly encountered.\n", |
66 | | - "In addition, methods that perform directed search and other model-free algorithms, i.e., algorithms that do not explicitly rely on a model, e.g., evolution strategies [@Bart13j] or pattern search [@Torczon00], play an important role.\n", |
67 | | - "Also, \"hyperband\", i.e., a multi-armed bandit strategy that dynamically allocates resources to a set of random configurations and uses successive bisections to stop configurations with poor performance [@Li16a], is very common in hyperparameter tuning.\n", |
68 | | - "The most sophisticated and efficient approaches are the Bayesian optimization and surrogate model based optimization methods, which are based on the optimization of cost functions determined by simulations or experiments.\n", |
69 | | - "\n", |
70 | | - "Below, we consider a hyperparameter tuning approach that uses surrogate model based optimization and builds on the Python version of SPOT (\"Sequential Parameter Optimization Toolbox\") [@BLP05]. It is suitable for situations where only limited resources are available. This may be due to limited availability and cost of hardware, or due to the fact that confidential data may only be processed locally, e.g., due to legal requirements.\n", |
71 | | - "Furthermore, in our approach, the understanding of algorithms is seen as a key tool for enabling transparency and explainability. This can be enabled, for example, by quantifying the contribution of machine learning and deep learning components (nodes, layers, split decisions, activation functions, etc.).\n", |
72 | | - "Understanding the importance of hyperparameters and the interactions between multiple hyperparameters plays a major role in the interpretability and explainability of machine learning models.\n", |
73 | | - "SPOT provides statistical tools for understanding hyperparameters and their interactions. Finally, the SPOT software code is available in the open source `spotPython` package on GitHub^[[https://github.com/sequential-parameter-optimization](https://github.com/sequential-parameter-optimization)], allowing replicability of the results.\n", |
74 | | - "This tutorial describes the Python variant of SPOT, which is called `spotPython`. The R implementation is described in @bart21i.\n", |
75 | | - "SPOT is an established open source software that has been maintained for more than 15 years [@BLP05; @bart21i].\n", |
76 | | - "\n", |
77 | | - "This tutorial is structured as follows. The concept of the hyperparameter tuning software `spotPython` is described in @sec-spot. \n", |
78 | | - "@sec-quickstart (\"Quickstart\") describes the execution of the example from the tutorial \"Hyperparameter Tuning with Ray Tune\" [@pyto23a].\n", |
79 | | - "@sec-hyperparameter-tuning-for-pytorch describes the integration of `spotPython` into the ``PyTorch`` training workflow in detail and presents the results. Finally, @sec-summary presents a summary and an outlook.\n", |
80 | | - "\n", |
81 | | - "::: {.callout-note}\n", |
82 | | - "The corresponding `.ipynb` notebook [@bart23e] is updated regularly and reflects updates and changes in the `spotPython` package.\n", |
83 | | - "It can be downloaded from [https://github.com/sequential-parameter-optimization/spotPython/blob/main/notebooks/14_spot_ray_hpt_torch_cifar10.ipynb](https://github.com/sequential-parameter-optimization/spotPython/blob/main/notebooks/14_spot_ray_hpt_torch_cifar10.ipynb).\n", |
84 | | - ":::\n", |
85 | | - "\n", |
86 | | - "\n", |
87 | | - "# The Hyperparameter Tuning Software SPOT {#sec-spot}\n", |
88 | | - "\n", |
89 | | - "Surrogate model based optimization methods are common approaches in simulation and optimization. SPOT was developed because there is a great need for sound statistical analysis of simulation and optimization algorithms. SPOT includes methods for tuning based on classical regression and analysis of variance techniques.\n", |
90 | | - "It also includes tree-based models such as classification and regression trees and random forests as well as Bayesian optimization (Gaussian process models, also known as Kriging). Combinations of different meta-modeling approaches are possible. SPOT comes with a sophisticated surrogate model based optimization method that can handle discrete and continuous inputs. Furthermore, any model implemented in `scikit-learn` can be used out-of-the-box as a surrogate in `spotPython`.\n", |
91 | | - "\n", |
92 | | - "SPOT implements key techniques such as exploratory fitness landscape analysis and sensitivity analysis. It can be used to understand the performance of various algorithms, while simultaneously giving insights into their algorithmic behavior.\n", |
93 | | - "In addition, SPOT can be used as an optimizer and for automatic and interactive tuning. Details on SPOT and its use in practice are given by @bart21i.\n", |
94 | | - "\n", |
95 | | - "A typical hyperparameter tuning process with `spotPython` consists of the following steps:\n", |
96 | | - "\n", |
97 | | - "1. Loading the data (training and test datasets), see @sec-data-loading.\n", |
98 | | - "2. Specification of the preprocessing model, see @sec-specification-of-preprocessing-model. This model is called `prep_model` (\"preparation\" or pre-processing).\n", |
99 | | - "The information required for the hyperparameter tuning is stored in the dictionary `fun_control`. Thus, the information needed for the execution of the hyperparameter tuning is available in a readable form.\n", |
100 | | - "3. Selection of the machine learning or deep learning model to be tuned, see @sec-selection-of-the-algorithm. This is called the `core_model`. Once the `core_model` is defined, then the associated hyperparameters are stored in the `fun_control` dictionary. First, the hyperparameters of the `core_model` are initialized with the default values of the `core_model`.\n", |
101 | | - "As default values we use the default values contained in the `spotPython` package for the algorithms of the `torch` package.\n", |
102 | | - "4. Modification of the default values for the hyperparameters used in `core_model`, see @sec-modification-of-default-values. This step is optional.\n", |
103 | | - " 1. numeric parameters are modified by changing the bounds.\n", |
104 | | - " 2. categorical parameters are modified by changing the categories (\"levels\").\n", |
105 | | - "5. Selection of target function (loss function) for the optimizer, see @sec-selection-of-target-function.\n", |
106 | | - "6. Calling SPOT with the corresponding parameters, see @sec-call-the-hyperparameter-tuner. The results are stored in a dictionary and are available for further analysis.\n", |
107 | | - "7. Presentation, visualization and interpretation of the results, see @sec-results-tuning.\n", |
108 | | - "\n", |
109 | | - "\n", |
110 | | - "# Quickstart {#sec-quickstart}" |
111 | | - ] |
112 | | - }, |
113 | 3 | { |
114 | 4 | "cell_type": "code", |
115 | 5 | "execution_count": 1, |
116 | 6 | "metadata": {}, |
117 | 7 | "outputs": [], |
118 | 8 | "source": [ |
119 | | - "#| echo: true\n", |
120 | | - "#| eval: false\n", |
121 | 9 | "import numpy as np\n", |
122 | 10 | "import pandas as pd\n", |
123 | | - "import itertools\n", |
124 | 11 | "from math import inf\n", |
125 | 12 | "import torch\n", |
126 | 13 | "import torchmetrics\n", |
|
132 | 19 | "\n", |
133 | 20 | "from spotPython.spot import spot\n", |
134 | 21 | "from spotPython.utils.init import fun_control_init\n", |
135 | | - "from spotPython.data.torchdata import load_data_cifar10\n", |
136 | 22 | "from spotPython.hyperparameters.values import (\n", |
137 | 23 | " add_core_model_to_fun_control,\n", |
138 | 24 | " modify_hyper_parameter_levels,\n", |
|
146 | 32 | "from spotPython.data.torch_hyper_dict import TorchHyperDict\n", |
147 | 33 | "from spotPython.fun.hypertorch import HyperTorch\n", |
148 | 34 | "from spotPython.torch.netvbdp import Net_vbdp\n", |
149 | | - "# from spotPython.torch.netcifar10 import Net_CIFAR10\n", |
150 | 35 | "from spotPython.torch.traintest import (\n", |
151 | 36 | " train_tuned,\n", |
152 | 37 | " test_tuned,\n", |
|
170 | 55 | "metadata": {}, |
171 | 56 | "outputs": [], |
172 | 57 | "source": [ |
173 | | - "\n", |
174 | | - "#| echo: true\n", |
175 | | - "#| eval: false\n", |
176 | 58 | "fun_control = fun_control_init(task=\"classification\", tensorboard_path=\"runs/25_spot_torch_vbdp\")\n", |
177 | 59 | "fun_control.update({\"show_batch_interval\": 100_000_000})\n", |
178 | 60 | "# load data\n", |
|
200 | 82 | "name": "stderr", |
201 | 83 | "output_type": "stream", |
202 | 84 | "text": [ |
203 | | - "/var/folders/hf/rnzlxrlx0fq3fr59krvfr1800000gn/T/ipykernel_46069/3336454340.py:8: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`\n", |
| 85 | + "/var/folders/dw/pvtj6mt91znd0hftcztqb0k00000gn/T/ipykernel_12542/3336454340.py:8: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`\n", |
204 | 86 | " train_df[target_column] = target\n" |
205 | 87 | ] |
206 | 88 | }, |
|
496 | 378 | "\n", |
497 | 379 | "config: {'_L0': 6112, 'l1': 32768, 'dropout_prob': 0.7103122166156, 'lr_mult': 0.001, 'batch_size': 4, 'epochs': 64, 'k_folds': 1, 'patience': 64, 'optimizer': 'AdamW', 'sgd_momentum': 0.9}\n", |
498 | 380 | "Epoch: 1\n", |
499 | | - "Loss on hold-out set: 2.39726079185054\n", |
500 | | - "Accuracy on hold-out set: 0.10377358490566038\n", |
501 | | - "MAPK value on hold-out data: 0.19732704402515722\n", |
502 | | - "Epoch: 2\n" |
| 381 | + "Loss on hold-out set: 2.397013398836244\n", |
| 382 | + "Accuracy on hold-out set: 0.14622641509433962\n", |
| 383 | + "MAPK value on hold-out data: 0.23506289308176098\n", |
| 384 | + "Epoch: 2\n", |
| 385 | + "Loss on hold-out set: 2.3962632890017526\n", |
| 386 | + "Accuracy on hold-out set: 0.12735849056603774\n", |
| 387 | + "MAPK value on hold-out data: 0.2272012578616352\n", |
| 388 | + "Epoch: 3\n", |
| 389 | + "Loss on hold-out set: 2.3953690034038617\n", |
| 390 | + "Accuracy on hold-out set: 0.14622641509433962\n", |
| 391 | + "MAPK value on hold-out data: 0.23977987421383645\n", |
| 392 | + "Epoch: 4\n", |
| 393 | + "Loss on hold-out set: 2.394227927585818\n", |
| 394 | + "Accuracy on hold-out set: 0.13679245283018868\n", |
| 395 | + "MAPK value on hold-out data: 0.2366352201257861\n", |
| 396 | + "Epoch: 5\n", |
| 397 | + "Loss on hold-out set: 2.3928299624964877\n", |
| 398 | + "Accuracy on hold-out set: 0.13679245283018868\n", |
| 399 | + "MAPK value on hold-out data: 0.23663522012578614\n", |
| 400 | + "Epoch: 6\n", |
| 401 | + "Loss on hold-out set: 2.3910861375196926\n", |
| 402 | + "Accuracy on hold-out set: 0.13679245283018868\n", |
| 403 | + "MAPK value on hold-out data: 0.23506289308176098\n", |
| 404 | + "Epoch: 7\n" |
503 | 405 | ] |
504 | 406 | } |
505 | 407 | ], |
|
723 | 625 | "name": "python", |
724 | 626 | "nbconvert_exporter": "python", |
725 | 627 | "pygments_lexer": "ipython3", |
726 | | - "version": "3.10.11" |
| 628 | + "version": "3.10.10" |
727 | 629 | } |
728 | 630 | }, |
729 | 631 | "nbformat": 4, |
|