tidymodels / extratests
Integration and other testing for tidymodels
License: Other
This is related to #185, which was darn hard to figure out.
What happens is that the .onLoad() of baguette is triggered in a test, which is what we want. But it stays triggered for all of the following tests, so we sometimes get different results depending on whether we run all files or just one file. It also exhibits a classic test smell, since the order of the tests matters (a possible guard is sketched after the reprex below).
library(parsnip)
library(testthat)
parsnip::get_model_env()$bag_tree_args
#> # A tibble: 0 × 5
#> # ℹ 5 variables: engine <chr>, parsnip <chr>, original <chr>, func <list>,
#> # has_submodel <lgl>
# could be in a previous file
test_that("load baguette", {
library(baguette)
})
#> ── Skip: load baguette ─────────────────────────────────────────────────────────
#> Reason: empty test
parsnip::get_model_env()$bag_tree_args
#> # A tibble: 6 × 5
#> engine parsnip original func has_submodel
#> <chr> <chr> <chr> <list> <lgl>
#> 1 rpart class_cost cost <named list [2]> FALSE
#> 2 rpart tree_depth maxdepth <named list [2]> FALSE
#> 3 rpart min_n minsplit <named list [2]> FALSE
#> 4 rpart cost_complexity cp <named list [2]> FALSE
#> 5 C5.0 class_cost cost <named list [2]> FALSE
#> 6 C5.0 min_n minCases <named list [2]> FALSE
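One way to make a test like this robust to file order (a sketch of a possible guard, not a fix we have settled on; the test name is made up) is to assert the precondition explicitly:
# hypothetical guard: skip when an earlier test file already loaded baguette
test_that("loading baguette registers the bag_tree engines", {
  skip_if("baguette" %in% loadedNamespaces(), "baguette already loaded by an earlier file")
  n_before <- nrow(parsnip::get_model_env()$bag_tree_args)
  library(baguette)
  n_after <- nrow(parsnip::get_model_env()$bag_tree_args)
  expect_gt(n_after, n_before)
})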
GH R-CMD-CHECK is failing with:
Error:
! error in pak subprocess
Caused by error:
! Could not solve package dependencies:
* deps::.: Can't install dependency aorsf
* aorsf: Can't find package called aorsf.
* any::sessioninfo: dependency conflict
* any::rcmdcheck: dependency conflict
This is because aorsf got booted from CRAN. Let's configure the workflow to install it from GitHub for now.
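For instance (assuming the source still lives at ropensci/aorsf on GitHub), the workflow could install it with:
# install aorsf from its GitHub repository instead of CRAN,
# matching the workflow's existing try() pattern
try(pak::pkg_install("ropensci/aorsf"))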
It's back :) Let's undo https://github.com/tidymodels/extratests/pull/172/files
I have had problems in textrecipes and themis lately with not getting tunable steps to work. I have a feeling it has something to do with .onLoad(), but it is hard to test within the packages individually. Here is an example of it not working in themis; I feel like it used to work at one point.
library(tidymodels)
library(themis)
data("credit_data")
ds_rec <- recipe(Status ~ Age + Income + Assets, data = credit_data) %>%
step_meanimpute(all_predictors()) %>%
step_bsmote(Status, all_neighbors = tune())
mod_spec <- null_model() %>%
set_engine("parsnip")
wf_spec <- workflow() %>%
add_model(mod_spec) %>%
add_recipe(ds_rec)
folds <- vfold_cv(credit_data, v = 2)
tune_grid(wf_spec, resamples = folds, grid = grid_regular(all_neighbors()))
#> Warning: No tuning parameters have been detected, performance will be evaluated
#> using the resamples with no tuning. Did you want [fit_resamples()]?
#> x Fold1: recipe: Error: You cannot `prep()` a tuneable recipe. Argument(s) with `...
#> x Fold2: recipe: Error: You cannot `prep()` a tuneable recipe. Argument(s) with `...
#> Warning: All models failed in tune_grid(). See the `.notes` column.
#> Warning: This tuning result has notes. Example notes on model fitting include:
#> recipe: Error: You cannot `prep()` a tuneable recipe. Argument(s) with `tune()`: 'all_neighbors'. Do you want to use a tuning function such as `tune_grid()`?
#> recipe: Error: You cannot `prep()` a tuneable recipe. Argument(s) with `tune()`: 'all_neighbors'. Do you want to use a tuning function such as `tune_grid()`?
#> # Tuning results
#> # 2-fold cross-validation
#> # A tibble: 2 x 4
#> splits id .metrics .notes
#> <list> <chr> <list> <list>
#> 1 <split [2.2K/2.2K]> Fold1 <NULL> <tibble [1 × 1]>
#> 2 <split [2.2K/2.2K]> Fold2 <NULL> <tibble [1 × 1]>
Created on 2020-08-07 by the reprex package (v0.3.0)
varying() has been replaced by tune(), and varying_args() by tune_args() (which has tests in test-tune_args.R).
It'd be good to check if any of the tests here should be updated (and possibly migrated to live either in test-tune_args.R or in recipes) or if they could be deleted.
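For reference, the swap is one-to-one; a minimal sketch reusing the themis example from above:
library(tidymodels)
library(themis)
data("credit_data", package = "modeldata")
rec <- recipe(Status ~ ., data = credit_data) %>%
  step_bsmote(Status, all_neighbors = tune())  # formerly: all_neighbors = varying()
tune_args(rec)                                 # formerly: varying_args(rec)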
GH-R-CMD-CHECK uses try(pak::pkg_install()) for each dev package:
extratests/.github/workflows/GH-R-CMD-check.yaml
Lines 61 to 83 in def7031
This is nice in that we don't have to deal with known dependency resolution issues, but also means that installs can be overwritten by later ones, e.g. in #198, making it difficult to test against dev branches.
We could instead install all of the dev packages in one pak::pkg_install() call, like:
try(pak::pkg_install(paste0("tidymodels/", c("rsample", "recipes", "parsnip", "dials", "workflows", "tune"))))
This would mean that we're alerted when packages have circular dev dependencies, which is nice in the case where we did that accidentally, but would require making temporary changes every time we do it knowingly.
The current structure has one file per tuning function with four test_that() expressions for four different metric types. Each expression does the tuning and then tests the structure of the results (a tune_results object) as well as a bunch of methods for the tune_results object.
The tests for racing currently execute both tune_race_anova() and tune_race_win_loss() within one test_that() call and then basically check everything twice. I think it's worth splitting those apart so that we can more easily tell which of the two tuning functions a failure relates to. I'd put them in two different files, too, to follow the structure used for the other tuning functions; see the sketch below.
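A rough skeleton of the proposed layout (the file names and the toy tuning setup here are illustrative assumptions):
# test-tune_race_anova.R -- a parallel file would cover tune_race_win_loss()
test_that("tune_race_anova() returns a tune_results object", {
  skip_if_not_installed("finetune")
  skip_if_not_installed("kknn")
  spec <- parsnip::set_mode(
    parsnip::nearest_neighbor(neighbors = hardhat::tune()),
    "regression"
  )
  set.seed(1)
  res <- finetune::tune_race_anova(
    spec, mpg ~ ., resamples = rsample::vfold_cv(mtcars, v = 5)
  )
  expect_s3_class(res, "tune_results")
})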
Current GH CI is failing with:
══ Failed ══════════════════════════════════════════════════════════════════════
── 1. Failure ('test-censored-case-weights.R:36:3'): bag_tree - rpart censored c
Snapshot of code has changed:
old[2:5] vs new[2:5]
wt_fit$fit$call
Output
bagging.data.frame(formula = Surv(time, event) ~ ., data = data,
- weights = weights)
+ weights = weights, cp = ~0, minsplit = ~2)
We have not been able to reproduce locally.
they are looooong
This file is taking really long to run, which is very inconvenient when updating snapshots for snapshot tests.
I see a good bit of:
Error (test-survival-tune_race_anova.R:208:3): race tuning (anova) survival models with static metric
Error in `UseMethod("slice")`: no applicable method for 'slice' applied to an object of class "c('tbl_df', 'tbl', 'data.frame')"
Backtrace:
▆
1. ├─testthat::expect_equal(metric_all %>% slice(), exp_metric_all) at test-survival-tune_race_anova.R:208:3
2. │ └─testthat::quasi_label(enquo(object), label, arg = "object")
3. │ └─rlang::eval_bare(expr, quo_get_env(quo))
4. ├─metric_all %>% slice()
5. └─xgboost::slice(.)
when I run tests locally.
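As the backtrace shows, the bare slice() call is dispatching to xgboost::slice() instead of dplyr::slice(): attaching xgboost after dplyr masks the dplyr generic. Namespacing the call in the test sidesteps the masking:
library(dplyr)
library(xgboost)                      # attached later, so xgboost::slice() masks dplyr::slice()
metric_all <- tibble::tibble(x = 1:3)
dplyr::slice(metric_all, 1)           # explicit namespacing restores the intended method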
What we already have: nothing for survival models
What we still want: something that emulates what we are doing for tuning (since this is a subset of our tuning methods).
I've added a template in the work for #117 since it will be easier for reviewers to do both at once.
The master branch of this repository will soon be renamed to main, as part of a coordinated change across several GitHub organizations (including, but not limited to: tidyverse, r-lib, tidymodels, and sol-eng). We anticipate this will happen by the end of September 2021.
That will be preceded by a release of the usethis package, which will gain some functionality around detecting and adapting to a renamed default branch. There will also be a blog post at the time of this master --> main change.
The purpose of this issue is to:
message id: euphoric_snowdog
In skips, I see:
── Skipped tests ──
Installed workflows is version 1.1.3; but 1.1.3.9007 is required (1):
test-parsnip-survival-censoring-weights.R:225:3
The current dev workflows version is only 1.1.3.9001, so this test is never being run and likely ought to be.
I think we should test what happens if you pass time_points to fit_resamples() when they are not needed.
Originally posted by @EmilHvitfeldt in #141 (comment)
We're seeing the following, including on the CRAN R CMD CHECK workflow:
old[2:6] vs new[2:6]
multinom_reg(penalty = 0.01) %>% set_engine("glmnet") %>% fit(class ~ ., data = hpc_data) %>%
predict(hpc_data, penalty = 0:1)
Condition
- Error in `.check_glmnet_penalty_predict()`:
+ Error:
- ! `penalty` should be a single numeric value. `multi_predict()` can be used to get multiple predictions per row of data.
+ ! argument "..3" is missing, with no default
We already have about 18 invocations of collect_metrics(), but none for survival models. There was a TODO from the time of the initial tests, but eval_time had not been implemented yet.
What we still want: survival-specific examples that look across different metric types and different cases for eval_time.
Tests on augment() for parsnip models already exist, but not yet for workflow objects. Functionality was added in tidymodels/workflows#200.
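A minimal sketch of what such a test could exercise (the lm setup is an arbitrary example):
library(workflows)
library(parsnip)

wf <- workflow(mpg ~ ., linear_reg())  # formula preprocessor plus model spec
wf_fit <- fit(wf, data = mtcars)
augment(wf_fit, new_data = mtcars)     # returns mtcars plus a .pred column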
Are they running as intended?
Several are being skipped, e.g.:
- (code run outside of test_that()) ('test-spark-linear-reg.R:2:1') - Reason: spark_not_installed() is TRUE
The corresponding issue is tidymodels/tune#755. The issue mentioned in the comments is a duplicate of 755.
extratests/tests/testthat/test-survival-tune-race.R
Lines 654 to 658 in 1531442
and
extratests/tests/testthat/test-survival-tune-race.R
Lines 668 to 674 in 1531442
The issue mentioned in this comment has been closed: tidymodels/finetune#81
extratests/tests/testthat/test-survival-tune-race.R
Lines 404 to 406 in 1531442
Functionality/documentation
To discuss
Testing
- fit_resamples() #113
- last_fit() #114
- fit_best() #115
- collect_metrics() #117
- show_best(), select_best() #118
- augment() #132
- workflow_set() and workflow_map() #119
- tune_race_*() tidymodels/finetune#81
The output for test failures is very noisy, making it hard to spot the actual failures amidst all the messages; one possible mitigation is sketched after the log below.
test_check("extratests", reporter = "summary")
parsnip-extension-messaging: ....Loading required package: survival
...................
S3-registration: ......
butcher-recipes:
Attaching package: 'Matrix'
The following objects are masked from 'package:tidyr':
expand, pack, unpack
....
encodings-glmnet: Loaded glmnet 4.1-8
.........
encodings-randomForest: ......
encodings-ranger: ......
engine-parameters-c5: ..! The Gaussian process model is being fit using 3 features but only has 3
data points to do so. This may cause errors or a poor model fit.
! The Gaussian process model is being fit using 3 features but only has 4
data points to do so. This may cause errors or a poor model fit.
....! The Gaussian process model is being fit using 3 features but only has 3
data points to do so. This may cause errors or a poor model fit.
! The Gaussian process model is being fit using 3 features but only has 4
data points to do so. This may cause errors or a poor model fit.
..
engine-parameters-earth: ..! The Gaussian process model is being fit using 2 features but only has 3
data points to do so. This may cause errors or a poor model fit.
..
5. Correct values ('test-recipes-nnmf_sparse.R:21:3') - Adding new snapshot for variant 'recipes1.0.9.9000':
Code
print(rec)
Message
-- Recipe ----------------------------------------------------------------------
-- Inputs
Number of variables by role
outcome: 1
predictor: 4
-- Operations
* Non-negative matrix factorization for: all_predictors()
6. No NNF ('test-recipes-nnmf_sparse.R:43:3') - Adding new snapshot for variant 'recipes1.0.9.9000':
Code
print(rec)
Message
-- Recipe ----------------------------------------------------------------------
-- Inputs
Number of variables by role
outcome: 1
predictor: 4
-- Training information
Training data contained 150 data points and no incomplete rows.
-- Operations
* No non-negative matrix factorization was extracted from: Sepal.Length,
Sepal.Width, Petal.Length, Petal.Width | Trained
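One possible mitigation (an assumption on my part, not a decided fix): attach the noisy packages quietly in a testthat helper file so that attach-time messages don't drown out the failures.
# e.g. in tests/testthat/helper-quiet.R (hypothetical file name)
suppressPackageStartupMessages({
  library(Matrix)
  library(glmnet)
  library(survival)
})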
The tune version in GH-R-CMD-CHECK is a specific tune PR:
If we are past all the required versions on CRAN and RSPM, we don't really need those skip statements anymore, do we? Is there a case where an older version would get installed and break things again?
If we don't need the skip statements anymore, I suggest we remove them (even if they don't "hurt") so that we have one less thing to pay attention to when we look at the tests, i.e., lowering the cognitive load 😄
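For context, the guards in question look like this (an illustrative example, not a specific line from the tests):
# runs only if the installed version is at least the stated one
skip_if_not_installed("workflows", minimum_version = "1.1.3")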
Let's test that we can use both the formula and the matrix interface to the underlying parsnip model, i.e., trigger fit() and fit_xy() from a workflow, by supplying a model formula.
This is a good opportunity to test stratification, since strata() is a specials term that needs to go into the model formula arg of add_model(). Both engines for the proportional_hazards() model, "survival" and "glmnet", support stratification.
We should also test that prediction works, regardless of which fit interface was used.
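A minimal sketch of the formula-interface side (assuming the censored extension package and the survival::lung data purely for illustration):
library(tidymodels)
library(censored)  # provides the engines for proportional_hazards()
library(survival)

ph_spec <- proportional_hazards() %>% set_engine("survival")

wf <- workflow() %>%
  add_variables(outcomes = c(time, status), predictors = c(age, sex, ph.ecog)) %>%
  add_model(ph_spec, formula = Surv(time, status) ~ age + ph.ecog + strata(sex))

wf_fit <- fit(wf, data = lung)                          # triggers fit(), not fit_xy()
predict(wf_fit, new_data = lung[1:5, ], type = "time")  # prediction after a formula fit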
We might be able to speed up the tests if we start long-running tests first.
Is this already covered in yardstick?
In the GHA logs, there are plenty of warnings about non-portable file names, which we should inspect and either fix or silence if possible, because they make the output very noisy.
Example from R CMD build
stage (as part of check-r-package
):
Warning in utils::tar(filepath, pkgname, compression = compression, compression_level = 9L, :
storing paths of more than 100 bytes is not portable:
‘extratests/tests/testthat/_snaps/survival-tune-bayes/dynamic-metric-bayes-search-with-two-time-points-marginals.png’
Then, in R CMD check
:
- checking for portable file names ... NOTE
Found the following non-portable file paths:
extratests/tests/testthat/_snaps/survival-tune-bayes/dynamic-metric-bayes-search-with-two-time-points-marginals.png
And again in R CMD check results
:
❯ checking for portable file names ... NOTE
Found the following non-portable file paths:
extratests/tests/testthat/_snaps/survival-tune-bayes/dynamic-metric-bayes-search-with-two-time-points-marginals.png
With the following note
Tarballs are only required to store paths of up to 100 bytes and cannot
store those of more than 256 bytes, with restrictions including to 100
bytes for the final component.
See section ‘Package structure’ in the ‘Writing R Extensions’ manual.
Since the files are part of snapshot tests, are these tests carried out?
Maybe add a test or two that an object created by spatialsample can be used for tuning.
I'm not sure if/where this is currently tested; a quick sketch is below.
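A quick sketch (assuming spatialsample's spatial_block_cv() and its bundled boston_canopy data; the column names are taken on faith from that example data):
library(tidymodels)
library(spatialsample)

data("boston_canopy", package = "spatialsample")
folds <- spatial_block_cv(boston_canopy, v = 5)  # spatial resamples, not vfold_cv()

wf <- workflow(canopy_percentage_2019 ~ mean_temp, linear_reg())
fit_resamples(wf, resamples = folds)  # the test would check this runs cleanly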
Hi all,
I'm reading this book to get to know tidymodels, and I ran into a problem with running parallel resampling on Windows.
For the examples in Chapter 10, the main parallel part of fit_resamples() works fine with doParallel on Windows, but there is one issue with extract_fit_parsnip(x):
library(tidymodels)
# All operating systems
library(doParallel)
library(kableExtra)
library(tidyr)
tidymodels_prefer()
data(ames)
ames <- mutate(ames, Sale_Price = log10(Sale_Price))
set.seed(502)
ames_split <- initial_split(ames, prop = 0.80, strata = Sale_Price)
ames_train <- training(ames_split)
ames_test <- testing(ames_split)
ames_rec <-
recipe(Sale_Price ~ Neighborhood + Gr_Liv_Area + Year_Built + Bldg_Type +
Latitude + Longitude, data = ames_train) %>%
step_log(Gr_Liv_Area, base = 10) %>%
step_other(Neighborhood, threshold = 0.01) %>%
step_dummy(all_nominal_predictors()) %>%
step_interact( ~ Gr_Liv_Area:starts_with("Bldg_Type_") ) %>%
step_ns(Latitude, Longitude, deg_free = 20)
lm_model <- linear_reg() %>% set_engine("lm")
lm_wflow <-
workflow() %>%
add_model(lm_model) %>%
add_recipe(ames_rec)
lm_fit <- fit(lm_wflow, ames_train)
# This line is O.K.
extract_fit_parsnip(lm_fit) %>% tidy()
# ------------------------------------------------------------------------------------
set.seed(1001)
ames_folds <- vfold_cv(ames_train, v = 10)
# Create a cluster object and then register:
# cl <- makePSOCKcluster(parallel::detectCores())
cl <- makePSOCKcluster(10)
registerDoParallel(cl)
get_model <- function(x) {
# not O.K. on Windows & Linux.
extract_fit_parsnip(x) %>% tidy()
# This line is O.K.
# extract_recipe(x, estimated = TRUE)
}
ctrl <- control_resamples(save_pred = TRUE, verbose = TRUE, extract = get_model)
set.seed(1003)
lm_res <- lm_wflow %>% fit_resamples(resamples = ames_folds, control = ctrl)
# Stop parallel
stopCluster(cl)
# These lines are O.K.
lm_res
lm_res$.metrics[[1]]
lm_res$.notes[[1]]
lm_res$.predictions[[1]]
> lm_res$.extracts[[1]]
# A tibble: 1 x 2
.extracts .config
<list> <chr>
1 <try-errr [1]> Preprocessor1_Model1
> # To get the results
> lm_res$.extracts[[1]][[1]]
[[1]]
[1] "Error in UseMethod(\"tidy\") : \n no applicable method for 'tidy' applied to an object of class \"lm\"\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in UseMethod("tidy"): no applicable method for 'tidy' applied to an object of class "lm">
Any idea?
Best regards.
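A likely explanation (my reading, not an official answer): the tidy() method for lm objects lives in broom, which is attached in your main session but not on the PSOCK workers, so method dispatch fails there. Making the extractor self-contained should fix it:
get_model <- function(x) {
  requireNamespace("broom")  # loading the namespace registers broom's tidy() methods on the worker
  generics::tidy(hardhat::extract_fit_parsnip(x))
}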
What we already have:
What we still want: