tidymodels / baguette
parsnip Model Functions for Bagging
Home Page: https://baguette.tidymodels.org
License: Other
To turn off butchering the objects.
Prepare for release:
devtools::build_readme()
urlchecker::url_check()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
rhub::check_for_cran()
revdepcheck::cloud_check()
cran-comments.md
Submit to CRAN:
usethis::use_version('major')
devtools::submit_cran()
Wait for CRAN...
usethis::use_github_release()
usethis::use_dev_version()
parsnip has exported the bag_mlp() model type for some time now, but the engine implementations for that model type in this package haven't yet hit CRAN.
The metadata on this implementation lives in parsnip's model info table. With CRAN versions of both packages installed, then, we get some confusing behavior:
library(parsnip)
bag_mlp("classification")
#> ! parsnip could not locate an implementation for `bag_mlp` classification model
#> specifications.
#> ℹ The parsnip extension package baguette implements support for this
#> specification.
#> ℹ Please install (if needed) and load to continue.
#>
#> Bagged Neural Network Model Specification (classification)
#>
#> Computational engine: nnet
library(baguette)
bag_mlp("classification")
#> ! parsnip could not locate an implementation for `bag_mlp` classification model
#> specifications.
#> ℹ The parsnip extension package baguette implements support for this
#> specification.
#> ℹ Please install (if needed) and load to continue.
#>
#> Bagged Neural Network Model Specification (classification)
#>
#> Computational engine: nnet
Created on 2023-03-28 with reprex v2.0.2
With dev baguette, the second print proceeds just fine. Note that disabling parsnip::prompt_missing_implementation() doesn't prevent fit errors; they just get more confusing.
The other option here could be to downgrade extension versions used when regenerating the model info table before this next parsnip release.
Include an allow_parallel option that shuts off parallel processing even if a sequential plan is not enacted.
So this:
From: Prof Brian Ripley [email protected]
Date: Saturday, February 10, 2024 at 2:49 AM
To: CRAN [email protected]
Subject: CRAN packages requiring 'TeachingDemos'
Packages
BCEA BDWreg BTLLasso CEEMDANML CompareMultipleModels DWreg Ecdat Ecfun
FREEtree GPCMlasso HDSpatialScan HH LocalControl MARSANNhybrid MARSGWR
MARSSVRhybrid Power2Stage PowerTOST RBPcurve RcmdrPlugin.TeachingDemos
RcmdrPlugin.UCA SPEDInstabR SSDM TipDatingBeast WaveletML
WaveletMLbestFL adepro baguette berryFunctions bexy biomod2 brxx bujar
cTOST cmaRs earth ecm ecospat finnts gbts gecko geospt geosptdb gk
httk icensBKL invacost metafolio metajam missingHE moderate.mediation
nlnet palaeoSig pheble plotmo pre rbooster replicateBE spm spm2
stepgbm steprf tehtuner tourr tsensembler vaccine viraldomain
viralmodels viralx voi wallace
require package TeachingDemos
directly or indirectly and it has now been orphaned. The CRAN policy
"Orphaned CRAN packages should not be strict requirements (in the
‘Depends’, ‘Imports’ or ‘LinkingTo’ fields, including indirectly).
They are allowed in ‘Suggests’ if used conditionally, although this is
discouraged."
We don't use TeachingDemos at all; I presume this is because the earth package does (but has not been updated for this issue on CRAN yet).
I'm going to do a conditional Suggests for earth, which may break some people's code, then undo the change once earth is updated ¯\_(ツ)_/¯
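Using a package conditionally from Suggests usually comes down to a runtime check before the dependent code runs. The following is a minimal base R sketch of that pattern, not baguette's actual code; the helper name check_suggested() and the placeholder package name are hypothetical.

```r
# Hypothetical helper: stop with a clear message when a package listed in
# Suggests is not installed, instead of failing deep inside the fitting code.
check_suggested <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      "The '", pkg, "' package is required for this model. ",
      "Install it with install.packages(\"", pkg, "\").",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# An installed package passes silently.
check_suggested("stats")

# A missing package (placeholder name) produces an informative error.
res <- tryCatch(
  check_suggested("notARealPackage123"),
  error = function(e) conditionMessage(e)
)
print(res)
```

Users who already have earth installed would see no change under this pattern; only those relying on it being pulled in automatically would hit the error.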
Right now these go to the extract() function and opts is used to pass them.
It's probably better to use ... instead. If so, then there are some arguments that need "." prefixes (like control, etc.).
"regression rules"?
We might eventually add C5.0 rules...
Calculate out-of-bag error and use it for model performance estimates and hyperparameter tuning.
I did a fair amount of sleuthing on this and it looks like this used to be a feature that could be requested through control_bag(), but was removed in 9fae03c because of something related to C5.0.
I'm wondering if we can get this back for rpart models?
Extending this further, I wonder if it might be possible to use tune_grid() with a bagged model, using the OOB samples as the validation set, to tune hyperparameters. I also thought it might be worth thinking about a new function called something like fit_bagged() that would basically operate exactly like fit_resamples(), but would provide the metrics on the OOB samples.
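To make the request concrete, here is a rough base R sketch of an out-of-bag estimate. It is not baguette's implementation: lm() stands in for the tree base learner, and the data set, formula, and number of members are arbitrary choices for illustration.

```r
set.seed(123)
dat <- mtcars
n <- nrow(dat)
times <- 25  # number of bootstrap members (arbitrary)

# Running sum and count of out-of-bag predictions for each row
oob_sum <- numeric(n)
oob_n <- integer(n)

for (i in seq_len(times)) {
  in_bag <- sample.int(n, n, replace = TRUE)
  oob <- setdiff(seq_len(n), in_bag)
  # lm() is a stand-in for the base learner (an assumption of this sketch)
  fit <- lm(mpg ~ wt + hp, data = dat[in_bag, ])
  oob_sum[oob] <- oob_sum[oob] + predict(fit, newdata = dat[oob, ])
  oob_n[oob] <- oob_n[oob] + 1L
}

# Average each row's predictions over the members that did not train on it
has_oob <- oob_n > 0
oob_pred <- oob_sum[has_oob] / oob_n[has_oob]
oob_rmse <- sqrt(mean((dat$mpg[has_oob] - oob_pred)^2))
oob_rmse
```

Each row is scored only by members that never saw it, which is why the estimate can serve as a validation metric without a separate resampling step.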
We are standardizing to have our parsnip-adjacent packages keep their model definitions in parsnip so that other packages can add engines.
This package should take a version dependency of parsnip >= 0.1.7.9000.
That way we can either have the R objects back or the expressions (as opposed to just the expressions).
Move var_imp, oob, and allow_parallel inside.
with their defaults for this package (e.g. cost_complexity = 0, etc.). This may require each model to have its own function though.
Create special bagging functions for tree-based models with costs. Make the interface easy when there are two classes and derive class probabilities from vote percentages of hard classifications.
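Deriving class probabilities from vote percentages can be sketched in a few lines of base R. The vote matrix below is made-up illustration data, and the .pred_ column prefix is only borrowed from the tidymodels naming convention.

```r
# Hard class predictions: one row per observation, one column per ensemble
# member (made-up values for illustration).
votes <- matrix(
  c("yes", "yes", "no",
    "no",  "no",  "no",
    "yes", "no",  "yes",
    "yes", "yes", "yes"),
  ncol = 3, byrow = TRUE
)
lvls <- c("no", "yes")

# Probability of each class = share of members that voted for it
vote_probs <- sapply(lvls, function(lvl) rowMeans(votes == lvl))
colnames(vote_probs) <- paste0(".pred_", lvls)
vote_probs
```

With few members the resulting probabilities are coarse (multiples of 1/3 here), so this approach is more useful as the number of members grows.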
.estimator
etc
Prepare for release:
git pull
devtools::build_readme()
urlchecker::url_check()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
rhub::check_for_cran()
revdepcheck::cloud_check()
cran-comments.md
git push
Submit to CRAN:
usethis::use_version('minor')
devtools::submit_cran()
Wait for CRAN...
git push
usethis::use_github_release()
usethis::use_dev_version()
git push
To be more consistent with tune names
2022
usethis::use_tidy_coc()
master --> main issues
development is mode: auto in pkgdown config
usethis::use_lifecycle()
2023
Necessary:
person(given = "Posit Software, PBC", role = c("cph", "fnd"))
use_mit_license()
use_tidy_logo()
usethis::use_tidy_coc()
usethis::use_tidy_github_actions()
Optional:
pak::pak("org/pkg") over devtools::install_github("org/pkg") in README
use_tidy_dependencies() and/or replace compat files with use_standalone()
use_standalone("r-lib/rlang", "types-check") instead of home grown argument checkers
First of all, thanks a lot for baguette! It's a superb addition to tidymodels.
I've been playing around with bagged decision trees and random forests and noticed that both models have different argument names for the same thing.
Since random forests are just an adaptation of bagged trees to sample mtry differently between trees, they both have an argument for how many trees/bootstraps are performed. This means that the argument trees of rand_forest controls the same thing as the argument times from bagger. They both refer to the number of trees (or bootstraps) used in the ensemble.
If my intuition is right, have you considered adding the trees argument to bag_tree? It can be mapped directly to the times argument of bagger such that there's no backwards break. This means that the times argument would stop being an engine-specific argument and would become part of bag_tree. This wouldn't be a problem in terms of engines since both rpart and C5.0 support the argument. This way there's consistency between models, which makes switching between packages more familiar.
If you feel this might be useful, we can discuss it further and I could prepare a PR for this.
Prepare for release:
git pull
devtools::build_readme()
urlchecker::url_check()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
rhub::check_for_cran()
revdepcheck::cloud_check()
cran-comments.md
git push
Submit to CRAN:
usethis::use_version('patch')
devtools::submit_cran()
Wait for CRAN...
git push
usethis::use_github_release()
usethis::use_dev_version()
git push
I'm seeing a lot of warnings in the tests of the type below; it would be nice to clean them up 🧹
Warning (test-mars.R:72:3): check model reduction
non-uniform 'Rounding' sampler used
Backtrace:
1. baguette::bagger(...)
at test-mars.R:72:2
24. dplyr `<fn>`(`<smplWrnn>`)
25. dplyr:::check_muffled_warning(w)
26. base::withRestarts(...)
27. base withOneRestart(expr, restarts[[1L]])
28. base doWithOneRestart(return(expr), restart)
Pre-history
usethis::use_readme_rmd()
usethis::use_roxygen_md()
usethis::use_github_links()
usethis::use_pkgdown_github_pages()
usethis::use_tidy_github_labels()
usethis::use_tidy_style()
usethis::use_tidy_description()
urlchecker::url_check()
2020
usethis::use_package_doc()
@importFrom directives here. usethis::use_import_from() is handy for this.
usethis::use_testthat(3) and upgrade to 3e, testthat 3e vignette
R/ files and test/ files for workflow happiness. usethis::rename_files() can be helpful.
2021
usethis::use_tidy_dependencies()
usethis::use_tidy_github_actions() and update artisanal actions to use setup-r-dependencies
cran-comments.md
Authors@R of DESCRIPTION like so, if appropriate: person("RStudio", role = c("cph", "fnd"))
2022
usethis::use_tidy_coc()
master --> main issues
development is mode: auto in pkgdown config
The docs for bag_tree() show cost_complexity values greater than one, which leads folks astray. We should show values that would actually work for, say, rpart, like this:
library(dials)
#> Loading required package: scales
cost_complexity()
#> Cost-Complexity Parameter (quantitative)
#> Transformer: log-10
#> Range (transformed scale): [-10, -1]
Created on 2020-12-17 by the reprex package (v0.3.0.9001)
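To illustrate why values of one or more are unhelpful, the sketch below calls rpart directly (not through bag_tree()) on iris. It assumes the parsnip cost_complexity argument maps to rpart's cp, and relies on rpart's documented behaviour that cp = 1 always gives a tree with no splits.

```r
library(rpart)

# The dials range of [-10, -1] on the log-10 scale, back-transformed
cp_range <- 10^c(-10, -1)
cp_range

# A value near that range lets rpart make splits.
fit_small <- rpart(Species ~ ., data = iris, cp = 0.01)

# cp = 1 suppresses every split, leaving only the root node.
fit_large <- rpart(Species ~ ., data = iris, cp = 1)

# Number of nodes in each tree (each row of `frame` is one node)
nrow(fit_small$frame)
nrow(fit_large$frame)
```

A documentation example with cost_complexity of, say, 0.01 would therefore produce a tree that actually splits, whereas anything at or above one cannot.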
Prepare for release:
devtools::build_readme()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
rhub::check_for_cran()
revdepcheck::revdep_check(num_workers = 4)
cran-comments.md
Submit to CRAN:
usethis::use_version('minor')
devtools::submit_cran()
Wait for CRAN...
usethis::use_github_release()
usethis::use_dev_version()
Prepare for release:
devtools::build_readme()
urlchecker::url_check()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
rhub::check_for_cran()
revdepcheck::revdep_check(num_workers = 4)
cran-comments.md
Submit to CRAN:
usethis::use_version('patch')
devtools::submit_cran()
Wait for CRAN...
usethis::use_github_release()
usethis::use_dev_version()
Provide an interface for subsampling based on class outcomes (or censoring indicators). Do this before bootstrap sampling.
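The ordering described above (subsample by class first, then bootstrap) might look like the following base R sketch. The simulated data and the choice of down-sampling to the smallest class are assumptions for illustration; no interface for this exists in the text above.

```r
set.seed(42)

# Simulated, imbalanced two-class outcome (illustration only)
dat <- data.frame(
  x = rnorm(120),
  y = factor(rep(c("common", "rare"), times = c(100, 20)))
)

# Step 1: down-sample every class to the size of the smallest class.
min_n <- min(table(dat$y))
rows_by_class <- split(seq_len(nrow(dat)), dat$y)
keep <- unlist(lapply(
  rows_by_class,
  function(idx) idx[sample.int(length(idx), min_n)]
))
balanced <- dat[keep, ]

# Step 2: draw the bootstrap sample from the balanced data.
boot_rows <- sample.int(nrow(balanced), replace = TRUE)
boot_dat <- balanced[boot_rows, ]

table(balanced$y)
```

Repeating both steps for every ensemble member lets each member see a different subset of the majority class, so less of that data is discarded overall.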
The master branch of this repository will soon be renamed to main, as part of a coordinated change across several GitHub organizations (including, but not limited to: tidyverse, r-lib, tidymodels, and sol-eng). We anticipate this will happen by the end of September 2021.
That will be preceded by a release of the usethis package, which will gain some functionality around detecting and adapting to a renamed default branch. There will also be a blog post at the time of this master --> main change.
The purpose of this issue is to:
message id: euphoric_snowdog
Use withr::with_seed() for bagging fit code
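As a sketch of what this could look like, the example below wraps bootstrap index generation in withr::with_seed(). The helper boot_indices() is hypothetical and is not baguette code; the point is that the resamples are reproducible while the caller's random number stream is left as it was.

```r
library(withr)

# Hypothetical helper: draw bootstrap row indices under a local seed.
boot_indices <- function(n, times, seed) {
  with_seed(
    seed,
    lapply(seq_len(times), function(i) sample.int(n, n, replace = TRUE))
  )
}

set.seed(1)
a <- boot_indices(10, times = 3, seed = 2024)
after_a <- runif(1)  # drawn from the global stream after the call

set.seed(1)
b <- boot_indices(10, times = 3, seed = 2024)
after_b <- runif(1)

set.seed(1)
plain <- runif(1)  # the same draw with no bagging call in between

identical(a, b)            # same seed, same resamples
identical(after_a, plain)  # the global stream was not advanced by the call
```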
Use cli errors in favor of rlang / home-grown machinery
I'm finding that bag_tree models with the same data and parameter values will inconsistently fail with the error:
#> Warning: Unknown or uninitialised column: `importance`.
#> Error: Input must be a vector, not NULL.
#> Timing stopped at: 0.471 0 0.471
Below is an example where I run the same model 5 times; it errors for the first 4 runs, only to work on the 5th:
library(tidymodels)
library(baguette)
bag_tree(cost_complexity = 0.5) %>%
set_mode("classification") %>%
set_engine("rpart", times = 3) %>%
fit(Species ~ ., data = iris)
#> Warning: Unknown or uninitialised column: `importance`.
#> Error: Input must be a vector, not NULL.
#> Timing stopped at: 0.525 0.004 0.529
bag_tree(cost_complexity = 0.5) %>%
set_mode("classification") %>%
set_engine("rpart", times = 3) %>%
fit(Species ~ ., data = iris)
#> Warning: Unknown or uninitialised column: `importance`.
#> Error: Input must be a vector, not NULL.
#> Timing stopped at: 0.469 0 0.469
bag_tree(cost_complexity = 0.5) %>%
set_mode("classification") %>%
set_engine("rpart", times = 3) %>%
fit(Species ~ ., data = iris)
#> Warning: Unknown or uninitialised column: `importance`.
#> Error: Input must be a vector, not NULL.
#> Timing stopped at: 0.471 0 0.472
bag_tree(cost_complexity = 0.5) %>%
set_mode("classification") %>%
set_engine("rpart", times = 3) %>%
fit(Species ~ ., data = iris)
#> Warning: Unknown or uninitialised column: `importance`.
#> Error: Input must be a vector, not NULL.
#> Timing stopped at: 0.475 0 0.475
bag_tree(cost_complexity = 0.5) %>%
set_mode("classification") %>%
set_engine("rpart", times = 3) %>%
fit(Species ~ ., data = iris)
#> parsnip model object
#>
#> Fit time: 496ms
#> Bagged CART (classification with 3 members)
#>
#> Variable importance scores include:
#>
#> # A tibble: 4 x 4
#> term value std.error used
#> <chr> <dbl> <dbl> <int>
#> 1 Petal.Length 50.2 0.953 3
#> 2 Petal.Width 50.2 0.953 3
#> 3 Sepal.Length 39.4 2.48 3
#> 4 Sepal.Width 25.0 3.06 3
Created on 2021-04-10 by the reprex package (v2.0.0)
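One possible explanation, offered as an assumption rather than a confirmed diagnosis: with a large cost_complexity, some bootstrap samples may yield rpart trees with no splits, and such trees carry no variable importance. The sketch below uses rpart directly (not baguette) and forces a root-only tree with cp = 1 to show the missing element.

```r
library(rpart)

# cp = 1 forces a root-only tree, standing in for a bootstrap sample on which
# no split clears the complexity threshold (an assumption of this sketch).
stump <- rpart(Species ~ ., data = iris, cp = 1)

# A small cp gives a tree with splits for comparison.
tree <- rpart(Species ~ ., data = iris, cp = 0.01)

# The root-only fit has no variable.importance element; the split tree does.
is.null(stump$variable.importance)
is.null(tree$variable.importance)
```

If this is what happens, whether a run fails would depend on which bootstrap samples are drawn, which would be consistent with the intermittent behaviour reported above.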
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.5 (2021-03-31)
#> os Ubuntu 20.10
#> system x86_64, linux-gnu
#> ui X11
#> language en_AU:en
#> collate en_AU.UTF-8
#> ctype en_AU.UTF-8
#> tz Australia/Perth
#> date 2021-04-10
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.4)
#> backports 1.2.1 2020-12-09 [1] CRAN (R 4.0.4)
#> baguette * 0.1.0 2020-10-28 [1] CRAN (R 4.0.5)
#> broom * 0.7.6 2021-04-05 [1] CRAN (R 4.0.5)
#> butcher 0.1.4 2021-03-19 [1] CRAN (R 4.0.5)
#> C50 0.1.3.1 2020-05-26 [1] CRAN (R 4.0.5)
#> class 7.3-18 2021-01-24 [4] CRAN (R 4.0.3)
#> cli 2.4.0 2021-04-05 [1] CRAN (R 4.0.5)
#> codetools 0.2-17 2020-10-17 [4] CRAN (R 4.0.3)
#> colorspace 2.0-0 2020-11-11 [1] CRAN (R 4.0.4)
#> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.4)
#> Cubist 0.2.3 2020-01-10 [1] CRAN (R 4.0.5)
#> DBI 1.1.1 2021-01-15 [1] CRAN (R 4.0.4)
#> dials * 0.0.9 2020-09-16 [1] CRAN (R 4.0.4)
#> DiceDesign 1.9 2021-02-13 [1] CRAN (R 4.0.4)
#> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.4)
#> dplyr * 1.0.5 2021-03-05 [1] CRAN (R 4.0.4)
#> earth 5.3.0 2020-10-11 [1] CRAN (R 4.0.5)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.4)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.4)
#> fansi 0.4.2 2021-01-15 [1] CRAN (R 4.0.4)
#> foreach 1.5.1 2020-10-15 [1] CRAN (R 4.0.4)
#> Formula 1.2-4 2020-10-16 [1] CRAN (R 4.0.5)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.4)
#> furrr 0.2.2 2021-01-29 [1] CRAN (R 4.0.4)
#> future 1.21.0 2020-12-10 [1] CRAN (R 4.0.4)
#> generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.4)
#> ggplot2 * 3.3.3 2020-12-30 [1] CRAN (R 4.0.4)
#> globals 0.14.0 2020-11-22 [1] CRAN (R 4.0.4)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.4)
#> gower 0.2.2 2020-06-23 [1] CRAN (R 4.0.4)
#> GPfit 1.0-8 2019-02-08 [1] CRAN (R 4.0.4)
#> gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.4)
#> hardhat 0.1.5 2020-11-09 [1] CRAN (R 4.0.4)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.4)
#> htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.4)
#> infer * 0.5.4 2021-01-13 [1] CRAN (R 4.0.4)
#> inum 1.0-3 2021-02-08 [1] CRAN (R 4.0.5)
#> ipred 0.9-11 2021-03-12 [1] CRAN (R 4.0.4)
#> iterators 1.0.13 2020-10-15 [1] CRAN (R 4.0.4)
#> knitr 1.31 2021-01-27 [1] CRAN (R 4.0.4)
#> lattice 0.20-41 2020-04-02 [4] CRAN (R 4.0.3)
#> lava 1.6.9 2021-03-11 [1] CRAN (R 4.0.4)
#> lhs 1.1.1 2020-10-05 [1] CRAN (R 4.0.4)
#> libcoin 1.0-8 2021-02-08 [1] CRAN (R 4.0.5)
#> lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.4)
#> listenv 0.8.0 2019-12-05 [1] CRAN (R 4.0.4)
#> lubridate 1.7.10 2021-02-26 [1] CRAN (R 4.0.4)
#> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.4)
#> MASS 7.3-53.1 2021-02-12 [4] CRAN (R 4.0.3)
#> Matrix 1.3-2 2021-01-06 [4] CRAN (R 4.0.3)
#> modeldata * 0.1.0 2020-10-22 [1] CRAN (R 4.0.4)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.4)
#> mvtnorm 1.1-1 2020-06-09 [1] CRAN (R 4.0.5)
#> nnet 7.3-15 2021-01-24 [4] CRAN (R 4.0.3)
#> parallelly 1.24.0 2021-03-14 [1] CRAN (R 4.0.4)
#> parsnip * 0.1.5 2021-01-19 [1] CRAN (R 4.0.4)
#> partykit 1.2-13 2021-03-03 [1] CRAN (R 4.0.5)
#> pillar 1.5.1 2021-03-05 [1] CRAN (R 4.0.4)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.4)
#> plotmo 3.6.0 2020-09-13 [1] CRAN (R 4.0.5)
#> plotrix 3.8-1 2021-01-21 [1] CRAN (R 4.0.5)
#> plyr 1.8.6 2020-03-03 [1] CRAN (R 4.0.4)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.4)
#> pROC 1.17.0.1 2021-01-13 [1] CRAN (R 4.0.4)
#> prodlim 2019.11.13 2019-11-17 [1] CRAN (R 4.0.4)
#> purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.0.4)
#> R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.4)
#> Rcpp 1.0.6 2021-01-15 [1] CRAN (R 4.0.4)
#> recipes * 0.1.15 2020-11-11 [1] CRAN (R 4.0.4)
#> reprex 2.0.0 2021-04-02 [1] CRAN (R 4.0.5)
#> reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.0.4)
#> rlang 0.4.10 2020-12-30 [1] CRAN (R 4.0.4)
#> rmarkdown 2.7 2021-02-19 [1] CRAN (R 4.0.4)
#> rpart 4.1-15 2019-04-12 [4] CRAN (R 4.0.3)
#> rsample * 0.0.9 2021-02-17 [1] CRAN (R 4.0.4)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.0.4)
#> scales * 1.1.1 2020-05-11 [1] CRAN (R 4.0.4)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.4)
#> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.4)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.4)
#> styler 1.4.1 2021-03-30 [1] CRAN (R 4.0.5)
#> survival 3.2-10 2021-03-16 [4] CRAN (R 4.0.4)
#> TeachingDemos 2.12 2020-04-07 [1] CRAN (R 4.0.5)
#> tibble * 3.1.0 2021-02-25 [1] CRAN (R 4.0.4)
#> tidymodels * 0.1.2 2020-11-22 [1] CRAN (R 4.0.4)
#> tidyr * 1.1.3 2021-03-03 [1] CRAN (R 4.0.4)
#> tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.4)
#> timeDate 3043.102 2018-02-21 [1] CRAN (R 4.0.4)
#> tune * 0.1.3 2021-02-28 [1] CRAN (R 4.0.4)
#> usethis 2.0.1 2021-02-10 [1] CRAN (R 4.0.4)
#> utf8 1.2.1 2021-03-12 [1] CRAN (R 4.0.4)
#> vctrs 0.3.7 2021-03-29 [1] CRAN (R 4.0.5)
#> withr 2.4.1 2021-01-26 [1] CRAN (R 4.0.4)
#> workflows * 0.2.2 2021-03-10 [1] CRAN (R 4.0.4)
#> xfun 0.22 2021-03-11 [1] CRAN (R 4.0.4)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.4)
#> yardstick * 0.0.8 2021-03-28 [1] CRAN (R 4.0.4)
#>
#> [1] /home/andrew/R/x86_64-pc-linux-gnu-library/4.0
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
and other things that we might want from tree models
The pkgdown site doesn't include the same information as the GitHub readme. Is this on purpose?
https://baguette.tidymodels.org/dev/
https://baguette.tidymodels.org/
Prepare for release:
git pull
devtools::build_readme()
urlchecker::url_check()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
rhub::check_for_cran()
revdepcheck::cloud_check()
cran-comments.md
git push
Submit to CRAN:
usethis::use_version('major')
devtools::submit_cran()
Wait for CRAN...
git push
usethis::use_github_release()
usethis::use_dev_version()
git push
Prepare for release:
git pull
urlchecker::url_check()
devtools::build_readme()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
revdepcheck::cloud_check()
cran-comments.md
git push
Submit to CRAN:
usethis::use_version('patch')
devtools::submit_cran()
Wait for CRAN...
usethis::use_github_release()
usethis::use_dev_version(push = TRUE)