Commit c4a52a5

Updates for quosure changes
1 parent a987e74 commit c4a52a5

2 files changed: +70 -55 lines changed

vignettes/articles/Scratch.Rmd

Lines changed: 28 additions & 20 deletions
@@ -64,14 +64,14 @@ A row for "unknown" modes is not needed in this object.
6464

6565
Now, we enumerate the _main arguments_ for each engine. `parsnip` standardizes the names of arguments across different models and engines. For example, random forest and boosting use multiple trees to create the ensemble. Instead of using different argument names, `parsnip` standardizes on `trees` and the underlying code translates to the actual arguments used by the different functions.
6666

67-
In our case, the MDA argument name will be "subclasses".
67+
In our case, the MDA argument name will be "sub_classes".
6868

6969
Here, the object name will have the suffix `_arg_key`, with columns for the engines and rows for the arguments. The entries of the data frame are the actual argument names for each engine (or `NA` when an engine doesn't have that argument). Ours:
7070

7171
```{r arg-key}
7272
mixture_da_arg_key <- data.frame(
73-
mda = "subclasses",
74-
row.names = "subclasses",
73+
mda = "subclasses",
74+
row.names = "sub_classes",
7575
stringsAsFactors = FALSE
7676
)
7777
```
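
To illustrate how such a key reads when more than one engine is involved, here is a minimal sketch built around the `trees` argument mentioned above. The object `example_arg_key` and the helper `lookup_engine_arg()` are hypothetical names invented for this example and are not part of `parsnip`; the entries `num.trees` and `ntree` are the tree-count arguments of `ranger` and `randomForest`.

```r
# Hypothetical key: rows are the standardized names, columns are the engines.
example_arg_key <- data.frame(
  ranger = c("num.trees", "mtry"),
  randomForest = c("ntree", "mtry"),
  row.names = c("trees", "mtry"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: translate a standardized name for a given engine.
lookup_engine_arg <- function(key, arg, engine) {
  key[arg, engine]
}

ranger_name <- lookup_engine_arg(example_arg_key, "trees", "ranger")
rf_name <- lookup_engine_arg(example_arg_key, "trees", "randomForest")
```

The same standardized name resolves to a different engine argument depending on the column, which is the translation the key is meant to encode.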
@@ -89,27 +89,25 @@ The internals of `parsnip` will use these objects during the creation of the mod
8989
This is a fairly simple function that can follow a basic template. The main arguments to our function will be:
9090

9191
* The mode. If the model can do more than one mode, you might default this to "unknown". In our case, since it is only a classification model, it makes sense to default it to that mode.
92-
* The argument names (`subclasses` here). These should be defaulted to `NULL`.
93-
* An argument, `others`, that can be used to pass in other arguments to the underlying model fit functions.
94-
* `...`, although they are not currently used. We encourage developers to move the `...` after mode so that users are encouraged to use named arguments to the model specification.
92+
* The argument names (`sub_classes` here). These should be defaulted to `NULL`.
93+
* `...` is used to pass in other arguments to the underlying model fit functions.
9594

9695
A basic version of the function is:
9796

9897
```{r model-fun}
9998
mixture_da <-
100-
function(mode = "classification", ..., subclasses = NULL, others = list()) {
101-
102-
# start with some basic error traps
103-
check_empty_ellipse(...)
104-
99+
function(mode = "classification", sub_classes = NULL, ...) {
100+
# Check for correct mode
105101
if (!(mode %in% mixture_da_modes))
106102
stop("`mode` should be one of: ",
107103
paste0("'", mixture_da_modes, "'", collapse = ", "),
108104
call. = FALSE)
109105
110-
args <- list(subclasses = subclasses)
111-
112-
# save the other arguments but remove them if they are null.
106+
# Capture the arguments in quosures
107+
others <- enquos(...)
108+
args <- list(sub_classes = enquo(sub_classes))
109+
110+
# Save the other arguments but remove them if they are null.
113111
no_value <- !vapply(others, is.null, logical(1))
114112
others <- others[no_value]
115113
@@ -233,7 +231,7 @@ For example:
233231
library(parsnip)
234232
library(tidyverse)
235233
236-
mixture_da(subclasses = 2) %>%
234+
mixture_da(sub_classes = 2) %>%
237235
translate(engine = "mda")
238236
```
239237

@@ -248,7 +246,7 @@ iris_split <- initial_split(iris, prop = 0.90)
248246
iris_train <- training(iris_split)
249247
iris_test <- testing(iris_split)
250248
251-
mda_spec <- mixture_da(subclasses = 2)
249+
mda_spec <- mixture_da(sub_classes = 2)
252250
253251
mda_fit <- mda_spec %>%
254252
fit(Species ~ ., data = iris_train, engine = "mda")
@@ -278,7 +276,7 @@ There are some models (e.g. `glmnet`, `plsr`, `Cubist`, etc.) that can make pred
278276
For example, if I fit a linear regression model via `glmnet` and get four values of the regularization parameter (`lambda`):
279277

280278
```{r glmnet}
281-
linear_reg(others = list(nlambda = 4)) %>%
279+
linear_reg(nlambda = 4) %>%
282280
fit(mpg ~ ., data = mtcars, engine = "glmnet") %>%
283281
predict(new_data = mtcars[1:3, -1])
284282
```
@@ -302,7 +300,7 @@ logistic_reg() %>% translate(engine = "glm")
302300
303301
# but you can change it:
304302
305-
logistic_reg(others = list(family = expr(binomial(link = "probit")))) %>%
303+
logistic_reg(family = binomial(link = "probit")) %>%
306304
translate(engine = "glm")
307305
```
308306

@@ -322,13 +320,23 @@ translate.rand_forest <- function (x, engine, ...){
322320
# Run the general method to get the real arguments in place
323321
x <- translate.default(x, engine, ...)
324322
323+
# Make code easier to read
324+
arg_vals <- x$method$fit$args
325+
325326
# Check and see if they make sense for the engine and/or mode:
326327
if (x$engine == "ranger") {
327-
if (any(names(x$method$fit$args) == "importance"))
328-
if (is.logical(x$method$fit$args$importance))
328+
if (any(names(arg_vals) == "importance"))
329+
# We want to check the type of `importance` but it is a quosure. We first
330+
# get the expression. If it is logical, the value of `quo_get_expr` will
331+
# not be an expression but the actual logical. The wrapping of `isTRUE`
332+
# is there in case it is not an atomic value.
333+
if (isTRUE(is.logical(quo_get_expr(arg_vals$importance))))
329334
stop("`importance` should be a character value. See ?ranger::ranger.",
330335
call. = FALSE)
336+
if (x$mode == "classification" && !any(names(arg_vals) == "probability"))
337+
arg_vals$probability <- TRUE
331338
}
339+
x$method$fit$args <- arg_vals
332340
x
333341
}
334342
```
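
The type check on a quosure used above can be tried in isolation. The following is a minimal sketch that assumes the `rlang` package is installed; `is_logical_quosure()` is a hypothetical helper written for this illustration and is not a `parsnip` function.

```r
library(rlang)

# Hypothetical helper mirroring the check in `translate.rand_forest`:
# TRUE only when the quosure wraps a literal logical value.
is_logical_quosure <- function(q) {
  isTRUE(is.logical(quo_get_expr(q)))
}

literal_lgl <- quo(TRUE)         # expression is the logical constant itself
literal_chr <- quo("impurity")   # expression is a character constant
unevaluated <- quo(some_flag)    # expression is a symbol; never evaluated here

res_lgl <- is_logical_quosure(literal_lgl)
res_chr <- is_logical_quosure(literal_chr)
res_sym <- is_logical_quosure(unevaluated)
```

One consequence of inspecting the expression rather than its value: a variable that happens to hold a logical is stored as a symbol, so a check of this kind does not flag it.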

vignettes/parsnip_Intro.Rmd

Lines changed: 42 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -77,24 +77,23 @@ The arguments to the default function are:
7777
args(rand_forest)
7878
```
7979

80-
However, there might be other arguments that you would like to change or allow to vary. These are accessible using the `others` option. This is a named list of arguments in the form of the underlying function being called. For example, `ranger` has an option to set the internal random number seed. To set this to a specific value:
80+
However, there might be other arguments that you would like to change or allow to vary. These can be passed through the `...` slot as named arguments, using the names that the underlying function expects. For example, `ranger` has an option to set the internal random number seed. To set this to a specific value:
8181

8282
```{r rf-seed}
8383
rf_with_seed <- rand_forest(
84-
trees = 2000, mtry = varying(),
85-
others = list(seed = 63233),
84+
trees = 2000,
85+
mtry = varying(),
86+
seed = 63233,
8687
mode = "regression"
8788
)
8889
rf_with_seed
8990
```
9091

91-
If the model function contains the ellipses (`...`), these additional arguments can be passed along using `others`.
92-
9392
### Process
9493

9594
To fit the model, you must:
9695

97-
* define the model, including the _mode_,
96+
* have a defined model, including the _mode_,
9897
* have no `varying()` parameters, and
9998
* specify a computational engine.
10099

@@ -123,44 +122,52 @@ translate(rf_with_seed, engine = "randomForest")
123122

124123
These models can be fit using the `fit` function. Only the model object is returned.
125124

126-
```r
125+
```{r, eval = FALSE}
127126
fit(rf_mod, mpg ~ ., data = mtcars, engine = "ranger")
128127
```
129128

130129
```
131-
## parsnip model object
132-
##
133-
## Ranger result
134-
##
135-
## Call:
136-
## ranger::ranger(formula = mpg ~ ., data = mtcars, num.trees = 2000, num.threads = 1, verbose = FALSE, seed = sample.int(10^5, 1))
137-
##
138-
## Type: Regression
139-
## Number of trees: 2000
140-
## Sample size: 32
141-
## Number of independent variables: 10
142-
## Mtry: 3
143-
## Target node size: 5
144-
## Variable importance mode: none
145-
## Splitrule: variance
146-
## OOB prediction error (MSE): 5.71
147-
## R squared (OOB): 0.843
130+
#> parsnip model object
131+
#>
132+
#> Ranger result
133+
#>
134+
#> Call:
135+
#> ranger::ranger(formula = formula, data = data, num.trees = ~2000, num.threads = 1, verbose = FALSE, seed = sample.int(10^5, 1))
136+
#>
137+
#> Type: Regression
138+
#> Number of trees: 2000
139+
#> Sample size: 32
140+
#> Number of independent variables: 10
141+
#> Mtry: 3
142+
#> Target node size: 5
143+
#> Variable importance mode: none
144+
#> Splitrule: variance
145+
#> OOB prediction error (MSE): 5.71
146+
#> R squared (OOB): 0.843
148147
```
149148

150149

151-
```r
150+
```{r, eval = FALSE}
152151
fit(rf_mod, mpg ~ ., data = mtcars, engine = "randomForest")
153152
```
154153

155154
```
156-
## parsnip model object
157-
##
158-
## Call:
159-
## randomForest(x = as.data.frame(x), y = y, ntree = 2000)
160-
## Type of random forest: regression
161-
## Number of trees: 2000
162-
## No. of variables tried at each split: 3
163-
##
164-
## Mean of squared residuals: 5.6
165-
## % Var explained: 84.1
155+
#> parsnip model object
156+
#>
157+
#>
158+
#> Call:
159+
#> randomForest(x = as.data.frame(x), y = y, ntree = ~2000)
160+
#> Type of random forest: regression
161+
#> Number of trees: 2000
162+
#> No. of variables tried at each split: 3
163+
#>
164+
#> Mean of squared residuals: 5.6
165+
#> % Var explained: 84.1
166166
```
167+
168+
Note that the call objects show a tilde in front of the tree count: `num.trees = ~2000` for the `ranger` fit and `ntree = ~2000` for the `randomForest` fit. The tilde is the consequence of `parsnip` using quosures to process the model specification's arguments.
169+
170+
Normally, when a function is executed, its arguments are evaluated as soon as the function body uses them. In `parsnip`, the model specification's arguments are _not_ evaluated; instead, each expression is captured along with the environment where it should be evaluated. That pairing of expression and environment is a quosure.
171+
172+
`parsnip` uses these expressions to make a model fit call that is evaluated. The tilde in the call above reflects that the argument was captured using a quosure.
173+
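
The capture-then-evaluate behaviour described above can be sketched with `rlang` directly. This example assumes `rlang` is installed; `capture_trees()` is a hypothetical stand-in for a model specification function and is not part of `parsnip`.

```r
library(rlang)

# Hypothetical spec function: captures its argument instead of evaluating it.
capture_trees <- function(trees = NULL) {
  enquo(trees)
}

n <- 1000
q <- capture_trees(trees = n * 2)

# The stored expression is still the unevaluated code the user typed.
captured_code <- deparse(quo_get_expr(q))

# Evaluation happens later, in the environment where `n` was defined.
captured_value <- eval_tidy(q)
```

Because the environment travels with the expression, the value of `n` is found at evaluation time even though `capture_trees()` itself never looks at it.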
