tidymodels / extratests
Integration and other testing for tidymodels
License: Other
This is related to #185, which was darn hard to figure out.
What happens is that the .onLoad() of baguette is triggered in a test, which is what we want. But it stays triggered for all of the following tests, so we sometimes get different results depending on whether we run all files or just one file. It also exhibits a classic test smell, since the order of the tests matters (a possible guard is sketched after the reprex below).
library(parsnip)
library(testthat)
parsnip::get_model_env()$bag_tree_args
#> # A tibble: 0 × 5
#> # ℹ 5 variables: engine <chr>, parsnip <chr>, original <chr>, func <list>,
#> # has_submodel <lgl>
# could be in a previous file
test_that("load baguette", {
library(baguette)
})
#> ── Skip: load baguette ─────────────────────────────────────────────────────────
#> Reason: empty test
parsnip::get_model_env()$bag_tree_args
#> # A tibble: 6 × 5
#> engine parsnip original func has_submodel
#> <chr> <chr> <chr> <list> <lgl>
#> 1 rpart class_cost cost <named list [2]> FALSE
#> 2 rpart tree_depth maxdepth <named list [2]> FALSE
#> 3 rpart min_n minsplit <named list [2]> FALSE
#> 4 rpart cost_complexity cp <named list [2]> FALSE
#> 5 C5.0 class_cost cost <named list [2]> FALSE
#> 6 C5.0 min_n minCases <named list [2]> FALSE
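One way to make a test like this robust to file order (a sketch of a possible guard, not a fix we have settled on; the test name is made up) is to assert the precondition explicitly:
# hypothetical guard: skip when an earlier test file already loaded baguette
test_that("loading baguette registers the bag_tree engines", {
  skip_if("baguette" %in% loadedNamespaces(), "baguette already loaded by an earlier file")
  n_before <- nrow(parsnip::get_model_env()$bag_tree_args)
  library(baguette)
  n_after <- nrow(parsnip::get_model_env()$bag_tree_args)
  expect_gt(n_after, n_before)
})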
GH R-CMD-CHECK is failing with:
Error:
! error in pak subprocess
Caused by error:
! Could not solve package dependencies:
* deps::.: Can't install dependency aorsf
* aorsf: Can't find package called aorsf.
* any::sessioninfo: dependency conflict
* any::rcmdcheck: dependency conflict
This is because aorsf got booted from CRAN. Let's configure the workflow to install it from GitHub for now.
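For instance (assuming the source still lives at ropensci/aorsf on GitHub), the workflow could install it with:
# install aorsf from its GitHub repository instead of CRAN,
# matching the workflow's existing try() pattern
try(pak::pkg_install("ropensci/aorsf"))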
It's back :) Let's undo https://github.com/tidymodels/extratests/pull/172/files
I have had problems in textrecipes and themis lately with not getting tunable steps to work. I have a feeling it has something to do with .onLoad(), but it is hard to test within the packages individually. Here is an example of it not working in themis; I feel like it used to work at one point.
library(tidymodels)
library(themis)
data("credit_data")
ds_rec <- recipe(Status ~ Age + Income + Assets, data = credit_data) %>%
step_meanimpute(all_predictors()) %>%
step_bsmote(Status, all_neighbors = tune())
mod_spec <- null_model() %>%
set_engine("parsnip")
wf_spec <- workflow() %>%
add_model(mod_spec) %>%
add_recipe(ds_rec)
folds <- vfold_cv(credit_data, v = 2)
tune_grid(wf_spec, resamples = folds, grid = grid_regular(all_neighbors()))
#> Warning: No tuning parameters have been detected, performance will be evaluated
#> using the resamples with no tuning. Did you want [fit_resamples()]?
#> x Fold1: recipe: Error: You cannot `prep()` a tuneable recipe. Argument(s) with `...
#> x Fold2: recipe: Error: You cannot `prep()` a tuneable recipe. Argument(s) with `...
#> Warning: All models failed in tune_grid(). See the `.notes` column.
#> Warning: This tuning result has notes. Example notes on model fitting include:
#> recipe: Error: You cannot `prep()` a tuneable recipe. Argument(s) with `tune()`: 'all_neighbors'. Do you want to use a tuning function such as `tune_grid()`?
#> recipe: Error: You cannot `prep()` a tuneable recipe. Argument(s) with `tune()`: 'all_neighbors'. Do you want to use a tuning function such as `tune_grid()`?
#> # Tuning results
#> # 2-fold cross-validation
#> # A tibble: 2 x 4
#> splits id .metrics .notes
#> <list> <chr> <list> <list>
#> 1 <split [2.2K/2.2K]> Fold1 <NULL> <tibble [1 × 1]>
#> 2 <split [2.2K/2.2K]> Fold2 <NULL> <tibble [1 × 1]>
Created on 2020-08-07 by the reprex package (v0.3.0)
varying() has been replaced by tune(), and varying_args() by tune_args() (which has tests in test-tune_args.R).
It'd be good to check if any of the tests here should be updated (and possibly migrated to live either in test-tune_args.R or in recipes) or if they could be deleted.
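For reference, the swap is one-to-one; a minimal sketch reusing the themis example from above:
library(tidymodels)
library(themis)
data("credit_data", package = "modeldata")
rec <- recipe(Status ~ ., data = credit_data) %>%
  step_bsmote(Status, all_neighbors = tune())  # formerly: all_neighbors = varying()
tune_args(rec)                                 # formerly: varying_args(rec)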
GH-R-CMD-CHECK uses try(pak::pkg_install()) for each dev package:
extratests/.github/workflows/GH-R-CMD-check.yaml
Lines 61 to 83 in def7031
This is nice in that we don't have to deal with known dependency resolution issues, but also means that installs can be overwritten by later ones, e.g. in #198, making it difficult to test against dev branches.
We could instead install all of the dev packages in one pak::pkg_install() call, like:
try(pak::pkg_install(paste0("tidymodels/", c("rsample", "recipes", "parsnip", "dials", "workflows", "tune"))))
This would mean that we're alerted when packages have circular dev dependencies, which is nice in the case where we did that accidentally, but would require making temporary changes every time we do it knowingly.
The current structure has one file per tuning function with four test_that() expressions for four different metric types. Each expression does the tuning and then tests the structure of the results (a tune_results object) as well as a bunch of methods for the tune_results object.
The tests for racing currently execute both tune_race_anova() and tune_race_win_loss() within one test_that() call and then basically check everything twice. I think it's worth splitting those apart so that we can more easily tell which of the two tuning functions a failure relates to. I'd put them in two different files, too, to follow the structure used for the other tuning functions; see the sketch below.
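A rough skeleton of the proposed layout (the file names and the toy tuning setup here are illustrative assumptions):
# test-tune_race_anova.R -- a parallel file would cover tune_race_win_loss()
test_that("tune_race_anova() returns a tune_results object", {
  skip_if_not_installed("finetune")
  skip_if_not_installed("kknn")
  spec <- parsnip::set_mode(
    parsnip::nearest_neighbor(neighbors = hardhat::tune()),
    "regression"
  )
  set.seed(1)
  res <- finetune::tune_race_anova(
    spec, mpg ~ ., resamples = rsample::vfold_cv(mtcars, v = 5)
  )
  expect_s3_class(res, "tune_results")
})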
Current GH CI is failing with:
══ Failed ══════════════════════════════════════════════════════════════════════
── 1. Failure ('test-censored-case-weights.R:36:3'): bag_tree - rpart censored c
Snapshot of code has changed:
old[2:5] vs new[2:5]
wt_fit$fit$call
Output
bagging.data.frame(formula = Surv(time, event) ~ ., data = data,
- weights = weights)
+ weights = weights, cp = ~0, minsplit = ~2)
We have not been able to reproduce locally.
they are looooong
This file is taking really long to run, which is very inconvenient when updating snapshots for snapshot tests.
I see a good bit of:
Error (test-survival-tune_race_anova.R:208:3): race tuning (anova) survival models with static metric
Error in `UseMethod("slice")`: no applicable method for 'slice' applied to an object of class "c('tbl_df', 'tbl', 'data.frame')"
Backtrace:
▆
1. ├─testthat::expect_equal(metric_all %>% slice(), exp_metric_all) at test-survival-tune_race_anova.R:208:3
2. │ └─testthat::quasi_label(enquo(object), label, arg = "object")
3. │ └─rlang::eval_bare(expr, quo_get_env(quo))
4. ├─metric_all %>% slice()
5. └─xgboost::slice(.)
when I run tests locally.
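As the backtrace shows, the bare slice() call is dispatching to xgboost::slice() instead of dplyr::slice(): attaching xgboost after dplyr masks the dplyr generic. Namespacing the call in the test sidesteps the masking:
library(dplyr)
library(xgboost)                      # attached later, so xgboost::slice() masks dplyr::slice()
metric_all <- tibble::tibble(x = 1:3)
dplyr::slice(metric_all, 1)           # explicit namespacing restores the intended method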
What we already have: nothing for survival models
What we still want: something that emulates what we are doing for tuning (since this is a subset of our tuning methods).
I've added a template in the work for #117 since it will be easier for reviewers to do both at once.
The master branch of this repository will soon be renamed to main, as part of a coordinated change across several GitHub organizations (including, but not limited to: tidyverse, r-lib, tidymodels, and sol-eng). We anticipate this will happen by the end of September 2021.
That will be preceded by a release of the usethis package, which will gain some functionality around detecting and adapting to a renamed default branch. There will also be a blog post at the time of this master --> main change.
The purpose of this issue is to:
message id: euphoric_snowdog
In skips, I see:
── Skipped tests ──
Installed workflows is version 1.1.3; but 1.1.3.9007 is required (1):
test-parsnip-survival-censoring-weights.R:225:3
The current dev workflows version is only 1.1.3.9001, so this test is never being run and likely ought to be.
I think we should test what happens if you pass time_points to fit_resamples() when they are not needed.
Originally posted by @EmilHvitfeldt in #141 (comment)
We're seeing the following, including on the CRAN R CMD CHECK workflow:
old[2:6] vs new[2:6]
multinom_reg(penalty = 0.01) %>% set_engine("glmnet") %>% fit(class ~ ., data = hpc_data) %>%
predict(hpc_data, penalty = 0:1)
Condition
- Error in `.check_glmnet_penalty_predict()`:
+ Error:
- ! `penalty` should be a single numeric value. `multi_predict()` can be used to get multiple predictions per row of data.
+ ! argument "..3" is missing, with no default
We already have about 18 invocations of collect_metrics(), but none for survival models. There was a TODO from the time of the initial tests, but eval_time had not been implemented yet.
What we still want: survival-specific examples that look across different metric types and different cases for eval_time.
Tests on augment() for parsnip models already exist, but not yet for workflow objects. Functionality was added in tidymodels/workflows#200.
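A minimal sketch of what such a test could exercise (the lm setup is an arbitrary example):
library(workflows)
library(parsnip)

wf <- workflow(mpg ~ ., linear_reg())  # formula preprocessor plus model spec
wf_fit <- fit(wf, data = mtcars)
augment(wf_fit, new_data = mtcars)     # returns mtcars plus a .pred column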
Are they running as intended?
Several are being skipped, e.g.:
- (code run outside of test_that()) ('test-spark-linear-reg.R:2:1') - Reason: spark_not_installed() is TRUE
The corresponding issue is tidymodels/tune#755. The issue mentioned in the comments is a duplicate of 755.
extratests/tests/testthat/test-survival-tune-race.R
Lines 654 to 658 in 1531442
and
extratests/tests/testthat/test-survival-tune-race.R
Lines 668 to 674 in 1531442
The issue mentioned in this comment has been closed: tidymodels/finetune#81
extratests/tests/testthat/test-survival-tune-race.R
Lines 404 to 406 in 1531442
Functionality/documentation
To discuss
Testing
- fit_resamples() #113
- last_fit() #114
- fit_best() #115
- collect_metrics() #117
- show_best(), select_best() #118
- augment() #132
- workflow_set() and workflow_map() #119
- tune_race_*() tidymodels/finetune#81
The output for test failures is very noisy, making it hard to spot the actual failures amidst all the messages; one possible mitigation is sketched after the log below.
test_check("extratests", reporter = "summary")
parsnip-extension-messaging: ....Loading required package: survival
...................
S3-registration: ......
butcher-recipes:
Attaching package: 'Matrix'
The following objects are masked from 'package:tidyr':
expand, pack, unpack
....
encodings-glmnet: Loaded glmnet 4.1-8
.........
encodings-randomForest: ......
encodings-ranger: ......
engine-parameters-c5: ..! The Gaussian process model is being fit using 3 features but only has 3
data points to do so. This may cause errors or a poor model fit.
! The Gaussian process model is being fit using 3 features but only has 4
data points to do so. This may cause errors or a poor model fit.
....! The Gaussian process model is being fit using 3 features but only has 3
data points to do so. This may cause errors or a poor model fit.
! The Gaussian process model is being fit using 3 features but only has 4
data points to do so. This may cause errors or a poor model fit.
..
engine-parameters-earth: ..! The Gaussian process model is being fit using 2 features but only has 3
data points to do so. This may cause errors or a poor model fit.
..
5. Correct values ('test-recipes-nnmf_sparse.R:21:3') - Adding new snapshot for variant 'recipes1.0.9.9000':
Code
print(rec)
Message
-- Recipe ----------------------------------------------------------------------
-- Inputs
Number of variables by role
outcome: 1
predictor: 4
-- Operations
* Non-negative matrix factorization for: all_predictors()
6. No NNF ('test-recipes-nnmf_sparse.R:43:3') - Adding new snapshot for variant 'recipes1.0.9.9000':
Code
print(rec)
Message
-- Recipe ----------------------------------------------------------------------
-- Inputs
Number of variables by role
outcome: 1
predictor: 4
-- Training information
Training data contained 150 data points and no incomplete rows.
-- Operations
* No non-negative matrix factorization was extracted from: Sepal.Length,
Sepal.Width, Petal.Length, Petal.Width | Trained
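One possible mitigation (an assumption on my part, not a decided fix): attach the noisy packages quietly in a testthat helper file so that attach-time messages don't drown out the failures.
# e.g. in tests/testthat/helper-quiet.R (hypothetical file name)
suppressPackageStartupMessages({
  library(Matrix)
  library(glmnet)
  library(survival)
})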
The tune version in GH-R-CMD-CHECK is a specific tune PR:
If we are past all the required versions on CRAN and RSPM, we don't really need those skip statements anymore, do we? Is there a case where an older version would get installed and break things again?
If we don't need the skip statements anymore, I suggest we remove them (even if they don't "hurt") so that we have one less thing to pay attention to when we look at the tests, i.e., lowering the cognitive load 😄
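For context, the guards in question look like this (an illustrative example, not a specific line from the tests):
# runs only if the installed version is at least the stated one
skip_if_not_installed("workflows", minimum_version = "1.1.3")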
Let's test that we can use both the formula and the matrix interface to the underlying parsnip model, i.e., trigger fit() and fit_xy() from a workflow, by supplying a model formula.
This is a good opportunity to test stratification, since strata() is a specials term that needs to go into the model formula arg of add_model(). Both engines for the proportional_hazards() model, "survival" and "glmnet", support stratification.
We should also test that prediction works, regardless of which fit interface was used.
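A minimal sketch of the formula-interface side (assuming the censored extension package and the survival::lung data purely for illustration):
library(tidymodels)
library(censored)  # provides the engines for proportional_hazards()
library(survival)

ph_spec <- proportional_hazards() %>% set_engine("survival")

wf <- workflow() %>%
  add_variables(outcomes = c(time, status), predictors = c(age, sex, ph.ecog)) %>%
  add_model(ph_spec, formula = Surv(time, status) ~ age + ph.ecog + strata(sex))

wf_fit <- fit(wf, data = lung)                          # triggers fit(), not fit_xy()
predict(wf_fit, new_data = lung[1:5, ], type = "time")  # prediction after a formula fit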
We might be able to speed up the tests if we start long-running tests first.
Is this already covered in yardstick?
In the GHA logs, there are plenty of warnings about non-portable file names, which we should inspect and either fix or silence if possible, because they make the output very noisy.
Example from R CMD build
stage (as part of check-r-package
):
Warning in utils::tar(filepath, pkgname, compression = compression, compression_level = 9L, :
storing paths of more than 100 bytes is not portable:
‘extratests/tests/testthat/_snaps/survival-tune-bayes/dynamic-metric-bayes-search-with-two-time-points-marginals.png’
Then, in R CMD check
:
- checking for portable file names ... NOTE
Found the following non-portable file paths:
extratests/tests/testthat/_snaps/survival-tune-bayes/dynamic-metric-bayes-search-with-two-time-points-marginals.png
And again in R CMD check results
:
❯ checking for portable file names ... NOTE
Found the following non-portable file paths:
extratests/tests/testthat/_snaps/survival-tune-bayes/dynamic-metric-bayes-search-with-two-time-points-marginals.png
With the following note
Tarballs are only required to store paths of up to 100 bytes and cannot
store those of more than 256 bytes, with restrictions including to 100
bytes for the final component.
See section ‘Package structure’ in the ‘Writing R Extensions’ manual.
Since the files are part of snapshot tests, are these tests carried out?
Maybe add a test or two that an object created by spatialsample can be used for tuning.
I'm not sure if/where this is currently tested; a quick sketch is below.
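A quick sketch (assuming spatialsample's spatial_block_cv() and its bundled boston_canopy data; the column names are taken on faith from that example data):
library(tidymodels)
library(spatialsample)

data("boston_canopy", package = "spatialsample")
folds <- spatial_block_cv(boston_canopy, v = 5)  # spatial resamples, not vfold_cv()

wf <- workflow(canopy_percentage_2019 ~ mean_temp, linear_reg())
fit_resamples(wf, resamples = folds)  # the test would check this runs cleanly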
Hi all,
I'm reading this book to get to know tidymodels, and I ran into a problem with running parallel resampling on Windows.
For the examples in Chapter 10, the main parallel part of fit_resamples() works fine with doParallel on Windows, but there is one issue with extract_fit_parsnip(x):
library(tidymodels)
# All operating systems
library(doParallel)
library(kableExtra)
library(tidyr)
tidymodels_prefer()
data(ames)
ames <- mutate(ames, Sale_Price = log10(Sale_Price))
set.seed(502)
ames_split <- initial_split(ames, prop = 0.80, strata = Sale_Price)
ames_train <- training(ames_split)
ames_test <- testing(ames_split)
ames_rec <-
recipe(Sale_Price ~ Neighborhood + Gr_Liv_Area + Year_Built + Bldg_Type +
Latitude + Longitude, data = ames_train) %>%
step_log(Gr_Liv_Area, base = 10) %>%
step_other(Neighborhood, threshold = 0.01) %>%
step_dummy(all_nominal_predictors()) %>%
step_interact( ~ Gr_Liv_Area:starts_with("Bldg_Type_") ) %>%
step_ns(Latitude, Longitude, deg_free = 20)
lm_model <- linear_reg() %>% set_engine("lm")
lm_wflow <-
workflow() %>%
add_model(lm_model) %>%
add_recipe(ames_rec)
lm_fit <- fit(lm_wflow, ames_train)
# This line is O.K.
extract_fit_parsnip(lm_fit) %>% tidy()
# ------------------------------------------------------------------------------------
set.seed(1001)
ames_folds <- vfold_cv(ames_train, v = 10)
# Create a cluster object and then register:
# cl <- makePSOCKcluster(parallel::detectCores())
cl <- makePSOCKcluster(10)
registerDoParallel(cl)
get_model <- function(x) {
# not O.K. on Windows & Linux.
extract_fit_parsnip(x) %>% tidy()
# This line is O.K.
# extract_recipe(x, estimated = TRUE)
}
ctrl <- control_resamples(save_pred = TRUE, verbose = TRUE, extract = get_model)
set.seed(1003)
lm_res <- lm_wflow %>% fit_resamples(resamples = ames_folds, control = ctrl)
# Stop parallel
stopCluster(cl)
# These lines are O.K.
lm_res
lm_res$.metrics[[1]]
lm_res$.notes[[1]]
lm_res$.predictions[[1]]
> lm_res$.extracts[[1]]
# A tibble: 1 x 2
.extracts .config
<list> <chr>
1 <try-errr [1]> Preprocessor1_Model1
> # To get the results
> lm_res$.extracts[[1]][[1]]
[[1]]
[1] "Error in UseMethod(\"tidy\") : \n no applicable method for 'tidy' applied to an object of class \"lm\"\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in UseMethod("tidy"): no applicable method for 'tidy' applied to an object of class "lm">
Any idea?
Best regards.
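A likely explanation (my reading, not an official answer): the tidy() method for lm objects lives in broom, which is attached in your main session but not on the PSOCK workers, so method dispatch fails there. Making the extractor self-contained should fix it:
get_model <- function(x) {
  requireNamespace("broom")  # loading the namespace registers broom's tidy() methods on the worker
  generics::tidy(hardhat::extract_fit_parsnip(x))
}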
What we already have:
What we still want: