Commit ab3c94f

qiushiyan and topepo authored
Add new model type auto_ml (#758)
* add auto_ml
* use default engine
* missing pkgdown entry
* automl -> auto_ml
* validation parameter
* typo

Co-authored-by: topepo <mxkuhn@gmail.com>
1 parent b4f065b commit ab3c94f

10 files changed: +311 -0 lines changed

NAMESPACE

Lines changed: 1 addition & 0 deletions
@@ -137,6 +137,7 @@ export(C5.0_train)
 export(C5_rules)
 export(add_rowindex)
 export(augment)
+export(auto_ml)
 export(autoplot)
 export(bag_mars)
 export(bag_tree)

R/auto_ml.R

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+#' Automatic Machine Learning
+#'
+#' @description
+#'
+#' `auto_ml()` defines an automated searching and tuning process where
+#' many models of different families are trained and ranked given their
+#' performance on the training data.
+#'
+#' \Sexpr[stage=render,results=rd]{parsnip:::make_engine_list("auto_ml")}
+#'
+#' More information on how \pkg{parsnip} is used for modeling is at
+#' \url{https://www.tidymodels.org/}.
+#'
+#' @param mode A single character string for the prediction outcome mode.
+#' Possible values for this model are "unknown", "regression", or
+#' "classification".
+#' @param engine A single character string specifying what computational engine
+#' to use for fitting.
+#'
+#' @template spec-details
+#'
+#' @template spec-references
+#'
+#' @seealso \Sexpr[stage=render,results=rd]{parsnip:::make_seealso_list("auto_ml")}
+#' @export
+auto_ml <- function(mode = "unknown", engine = "h2o") {
+  args <- list()
+  out <- list(args = args, eng_args = NULL,
+              mode = mode, method = NULL, engine = engine)
+  class(out) <- make_classes("auto_ml")
+  out
+}
+
+# ------------------------------------------------------------------------------
+set_new_model("auto_ml")
+set_model_mode("auto_ml", "regression")
+set_model_mode("auto_ml", "classification")
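
The constructor added in `R/auto_ml.R` only assembles a specification object; nothing is fitted and no h2o server is contacted. As a minimal sketch, assuming an installed parsnip release that contains this commit, the returned object can be inspected like this:

```r
library(parsnip)

# Build a specification. This does not train anything and does not need
# a running h2o server.
spec <- auto_ml(mode = "regression")

class(spec)   # class vector built by make_classes("auto_ml")
spec$engine   # default engine from the function signature
spec$mode     # mode supplied above
```

The mode can also be left at its default of `"unknown"` and set later with `set_mode()`, which is the pattern used in the documentation files of this commit.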

R/auto_ml_h2o.R

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+#' Automatic machine learning via h2o
+#'
+#' [h2o::h2o.automl] defines an automated model training process and returns a
+#' leaderboard of models with best performances.
+#'
+#' @includeRmd man/rmd/auto_ml_h2o.md details
+#'
+#' @name details_auto_ml_h2o
+#' @keywords internal
+NULL
+
+# See inst/README-DOCS.md for a description of how these files are processed

R/print.R

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@ get_model_desc <- function(cls) {
 
 model_descs <- tibble::tribble(
   ~cls, ~desc,
+  "auto_ml", "Automatic Machine Learning",
   "bag_mars", "Bagged MARS",
   "bag_tree", "Bagged Decision Tree",
   "bart", "BART",

_pkgdown.yml

Lines changed: 1 addition & 0 deletions
@@ -33,6 +33,7 @@ figures:
 reference:
   - title: Models
     contents:
+      - auto_ml
       - bag_mars
       - bag_tree
       - bart

inst/models.tsv

Lines changed: 2 additions & 0 deletions
@@ -1,4 +1,6 @@
 "model" "mode" "engine" "pkg"
+"auto_ml" "classification" "h2o" "agua"
+"auto_ml" "regression" "h2o" "agua"
 "bag_mars" "classification" "earth" "baguette"
 "bag_mars" "regression" "earth" "baguette"
 "bag_tree" "censored regression" "rpart" "censored"

man/auto_ml.Rd

Lines changed: 41 additions & 0 deletions

man/details_auto_ml_h2o.Rd

Lines changed: 96 additions & 0 deletions

man/rmd/auto_ml_h2o.Rmd

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+```{r, child = "aaa.Rmd", include = FALSE}
+```
+
+`r descr_models("auto_ml", "h2o")`
+
+## Tuning Parameters
+
+This model has no tuning parameters.
+
+Engine arguments of interest
+
+- `max_runtime_secs` and `max_models`: control the maximum running time and the maximum number of models to build in the automatic process.
+
+- `exclude_algos` and `include_algos`: character vectors naming the algorithms to exclude or include during model building. For the full list of supported algorithms, see the details section of [h2o::h2o.automl()].
+
+- `validation`: A number between 0 and 1 specifying the _proportion_ of training data reserved as a validation set. This is used by h2o for performance assessment and potential early stopping.
+
+## Translation from parsnip to the original package (regression)
+
+[agua::h2o_train_auto()] is a wrapper around [h2o::h2o.automl()].
+
+```{r h2o-reg}
+auto_ml() %>%
+  set_engine("h2o") %>%
+  set_mode("regression") %>%
+  translate()
+```
+
+
+## Translation from parsnip to the original package (classification)
+
+```{r h2o-cls}
+auto_ml() %>%
+  set_engine("h2o") %>%
+  set_mode("classification") %>%
+  translate()
+```
+
+## Preprocessing requirements
+
+```{r child = "template-makes-dummies.Rmd"}
+```
+
+## Initializing h2o
+
+```{r child = "template-h2o-init.Rmd"}
+```
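
The engine arguments listed in `man/rmd/auto_ml_h2o.Rmd` are supplied through `set_engine()`. The following is a minimal sketch, assuming that parsnip and the agua extension package (with its h2o dependency) are installed; the numeric values are illustrative only, not recommendations.

```r
library(parsnip)
library(agua)   # extension package that provides the h2o engine

# Engine arguments are captured by set_engine() and stored on the
# specification; they are only handed to the engine when fit() is called.
spec <- auto_ml() %>%
  set_engine(
    "h2o",
    max_runtime_secs = 120,  # illustrative time budget in seconds
    max_models = 10,         # illustrative cap on the number of models
    validation = 0.1         # proportion of training data held out
  ) %>%
  set_mode("regression")

# Names of the captured engine arguments
names(spec$eng_args)
```

Because the arguments are stored unevaluated, creating this specification does not require a running h2o server.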

man/rmd/auto_ml_h2o.md

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+
+
+
+For this engine, there are multiple modes: classification and regression
+
+## Tuning Parameters
+
+This model has no tuning parameters.
+
+Engine arguments of interest
+
+- `max_runtime_secs` and `max_models`: control the maximum running time and the maximum number of models to build in the automatic process.
+
+- `exclude_algos` and `include_algos`: character vectors naming the algorithms to exclude or include during model building. For the full list of supported algorithms, see the details section of [h2o::h2o.automl()].
+
+- `validation`: A number between 0 and 1 specifying the _proportion_ of training data reserved as a validation set. This is used by h2o for performance assessment and potential early stopping.
+
+## Translation from parsnip to the original package (regression)
+
+[agua::h2o_train_auto()] is a wrapper around [h2o::h2o.automl()].
+
+
+```r
+auto_ml() %>%
+  set_engine("h2o") %>%
+  set_mode("regression") %>%
+  translate()
+```
+
+```
+## Automatic Machine Learning Model Specification (regression)
+##
+## Computational engine: h2o
+##
+## Model fit template:
+## agua::h2o_train_auto(x = missing_arg(), y = missing_arg(), weights = missing_arg(),
+##     validation_frame = missing_arg(), verbosity = NULL)
+```
+
+
+## Translation from parsnip to the original package (classification)
+
+
+```r
+auto_ml() %>%
+  set_engine("h2o") %>%
+  set_mode("classification") %>%
+  translate()
+```
+
+```
+## Automatic Machine Learning Model Specification (classification)
+##
+## Computational engine: h2o
+##
+## Model fit template:
+## agua::h2o_train_auto(x = missing_arg(), y = missing_arg(), weights = missing_arg(),
+##     validation_frame = missing_arg(), verbosity = NULL)
+```
+
+## Preprocessing requirements
+
+
+Factor/categorical predictors need to be converted to numeric values (e.g., dummy or indicator variables) for this engine. When using the formula method via \\code{\\link[=fit.model_spec]{fit()}}, parsnip will convert factor columns to indicators.
+
+## Initializing h2o
+
+
+To use the h2o engine with tidymodels, please run `h2o::h2o.init()` first. By default, this connects R to the local h2o server. This needs to be done in every new R session. You can also connect to a remote h2o server with an IP address; for more details, see [h2o::h2o.init()].
+
+You can control the number of threads in the thread pool used by h2o with the `nthreads` argument. By default, it uses all CPUs on the host. This is different from the usual parallel processing mechanism in tidymodels for tuning: while tidymodels parallelizes over resamples, h2o parallelizes over hyperparameter combinations for a given resample.
+
+h2o will automatically shut down the local h2o instance started by R when R is terminated. To manually stop the h2o server, run `h2o::h2o.shutdown()`.
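
The initialization and shutdown steps described in `man/rmd/auto_ml_h2o.md` can be put together roughly as follows. This is a sketch under stated assumptions: the parsnip, agua, and h2o packages are installed, a Java runtime is available for the local h2o server, and the thread count, time budget, and use of the built-in `mtcars` data are illustrative choices that do not come from the commit.

```r
library(parsnip)
library(agua)

# Start (or connect to) a local h2o server; required once per R session.
# nthreads = 2 is an illustrative limit; the default uses all CPUs.
h2o::h2o.init(nthreads = 2)

spec <- auto_ml() %>%
  set_engine("h2o", max_runtime_secs = 30) %>%  # illustrative time budget
  set_mode("regression")

# Fit on a small built-in data set; h2o trains and ranks several models.
auto_fit <- fit(spec, mpg ~ ., data = mtcars)

# Stop the local server explicitly, skipping the interactive prompt.
h2o::h2o.shutdown(prompt = FALSE)
```

Shutting down explicitly is optional here, since the local instance started by R is stopped when the R session ends.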
