statworx / bouncer
Automated Feature Selection
License: MIT License
Thanks for this great package.
The function featureSelection is not working because of the following part of the code:
future::plan(multiprocess, workers = cores)
I managed to fix it by rewriting this part of the function as follows:
future::plan("multicore", workers = cores)
And now it works again :)
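One possible explanation for the report above is that the bare symbol `multiprocess` is not visible when future is not attached, and to my knowledge the `multiprocess` strategy was also deprecated in later releases of future. A minimal sketch, assuming the future package is installed, that passes the strategy as a string and falls back to `multisession` where forking is unavailable. The names `choose_strategy()` and `set_parallel_plan()` are hypothetical and not part of bounceR.

```r
# Hypothetical helper: return a future strategy name as a string.
# "multicore" relies on forking, which is not available on Windows.
choose_strategy <- function(supports_fork) {
  if (supports_fork) "multicore" else "multisession"
}

# Hypothetical wrapper: set the plan using the chosen strategy.
# Assumes the future package is installed.
set_parallel_plan <- function(cores) {
  strategy <- choose_strategy(future::supportsMulticore())
  future::plan(strategy, workers = cores)
}

# Usage (not run here, since it starts parallel workers):
# set_parallel_plan(cores = 2)
```

Passing the strategy by name avoids depending on which symbols happen to be on the search path.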
Hi! Thanks for the package.
Is that requirement really necessary?
> devtools::install_github("STATWORX/bounceR")
ERROR: this R is version 3.4.1, package 'bounceR' requires R >= 3.4.3
Installation failed: Command failed (1)
During installation I get an error like this:
Searching for similar problems suggests that it is related to a version mismatch in one or more dependencies. But how do I find out exactly which package has the wrong version?
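One way to narrow this down is to compare the installed version of each dependency against the minimum it is expected to have. The sketch below uses only base R. The `required` vector is a hypothetical placeholder (including a deliberately missing package) and should be replaced with the entries from the package's DESCRIPTION file; `check_versions()` is an illustrative name.

```r
# Hypothetical minimum versions; replace with the real entries from the
# Imports / Depends fields of the package's DESCRIPTION file.
required <- c(stats = "3.0.0", utils = "3.0.0", notInstalledPkg = "1.0.0")

check_versions <- function(required) {
  # Installed version as a string, or NA if the package is missing
  installed <- vapply(names(required), function(pkg) {
    if (requireNamespace(pkg, quietly = TRUE)) {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_
    }
  }, character(1))
  # A package is OK when it is installed and at least the required version
  ok <- unname(mapply(function(have, need) {
    !is.na(have) && numeric_version(have) >= need
  }, installed, required))
  data.frame(package = names(required), installed = unname(installed),
             required = unname(required), ok = ok, row.names = NULL)
}

report <- check_versions(required)
print(report[!report$ok, ])  # packages that are missing or too old
```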
Really love the idea for this package. Been having fun playing with some of the test data functions, but I noticed a snag when I tried to use the builder() function.
test_df <- sim_data()
# feature selection
sel <- featureSelection(df = test_df, # this is out of date, needed to change to "data = "
                        target = "y",
                        index = NULL,
                        method = "randomboost",
                        n_cores = 1)
# extract one feature
form <- builder(object = sel, n_features = 1)
This throws the following error:
Error in object[["stability"]] : this S4 class is not subsettable
Looking at the function definition, it seems like it's just a syntax issue with S4: sel@stability[,'feature'] works fine, so I basically did a find and replace on "object[[]]" in the function and it seemed to work okay.
`myBuilder <- function(object, n_features = 5) {
  # Only objects returned by featureSelection() are supported
  if (!methods::is(object, "sel_obj"))
    stop(paste0("This function only works with objects of type ",
                sQuote("sel_obj"), "!"))
  # S4 slots are accessed with @ rather than [[ ]]
  features <- dplyr::pull(object@stability[, "feature"])
  if (n_features > length(features)) {
    n_features <- length(features)
    warning(paste0("Seems like you chose too many features. Try to reduce ",
                   sQuote("n_features"), ". Using all features: ",
                   n_features))
  }
  form <- as.formula(paste(object@setup[["target"]], "~",
                           paste(features[1:n_features], collapse = "+"),
                           sep = ""))
  return(form)
}
form <- myBuilder(object = sel, n_features = 10)
form
y ~ noise199 + noise053 + noise119 + noise182 + noise095 + noise009 +
    noise197 + noise061 + noise091 + noise198
`
I'm not a GitHub pro by any means; in fact, I've never submitted an issue or suggestion on someone's code before. Hopefully I haven't committed some cardinal sin with this issue post, but this is an awesome package and it seemed like this small tweak worked. Hope this is helpful. Great work so far; I will be following to see how this package develops!
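The S4 subsetting error described above can be reproduced without bounceR. The sketch below defines a stand-in class, `sel_demo`, with the same slot names that appear in the report (`stability`, `setup`); it is an assumption for illustration and not the package's actual `sel_obj` definition.

```r
library(methods)

# Stand-in class; the real "sel_obj" definition lives in bounceR.
setClass("sel_demo", representation(stability = "data.frame", setup = "list"))

obj <- new("sel_demo",
           stability = data.frame(feature = c("x1", "x2", "x3"),
                                  stringsAsFactors = FALSE),
           setup = list(target = "y"))

# `[[` is not defined for this S4 class, so this call raises an error,
# which is captured here instead of stopping the script.
bracket_result <- tryCatch(obj[["stability"]], error = function(e) e)

# Slots are reached with `@` (or methods::slot()).
features <- obj@stability[["feature"]]
form <- as.formula(paste(obj@setup[["target"]], "~",
                         paste(features[1:2], collapse = " + ")))
print(form)
```

Once the slot has been extracted with `@`, the usual `[[` and `[` operators work again, because the slot contents are ordinary data frames and lists.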
I've tried setting the max_time parameter to "5 mins" or to an interval of 5 and both function calls ran for well over 20 minutes.
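For reference, a time budget is often enforced by checking a deadline between iterations. The sketch below is a generic illustration in base R and makes no claim about how bounceR interprets `max_time`; `run_with_budget()`, `step`, and `max_secs` are hypothetical names.

```r
# Hypothetical helper: run step(i) repeatedly until either max_iter
# iterations are done or max_secs seconds have elapsed.
run_with_budget <- function(step, max_iter, max_secs) {
  deadline <- Sys.time() + max_secs
  results <- list()
  for (i in seq_len(max_iter)) {
    # The deadline is only checked between iterations, so a single
    # long-running step can still overshoot the budget.
    if (Sys.time() >= deadline) break
    results[[i]] <- step(i)
  }
  results
}

started <- Sys.time()
out <- run_with_budget(function(i) { Sys.sleep(0.05); i },
                       max_iter = 1000, max_secs = 0.3)
elapsed <- as.numeric(difftime(Sys.time(), started, units = "secs"))
message("Completed ", length(out), " iterations in ", round(elapsed, 2), " s")
```

A check like this cannot interrupt an iteration that is already running, which is one way a run can exceed its nominal limit.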
Hi,
Could you please have a look at the tree base learner? I passed a regression problem to the featureSelection function, but it fails with:
Error in paste("In iteration", i, "I could not fit a model in round", :
object 'ii' not found
After further checking, the problem seems to come from the script below:
mboost::gamboost(as.formula(paste(target, "~", ".", sep = "")),
                 data = df_mirrored_model,
                 control = mboost::boost_control(mstop = boosting[["mstop"]], nu = boosting[["nu"]]),
                 baselearner = c("btree"))
It throws this error message:
Error in get(baselearner, mode = "function", envir = parent.frame()) :
object 'btree' of mode 'function' was not found
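The error message shows a lookup by name via `get(baselearner, mode = "function", envir = parent.frame())`. A plausible reading, stated here as an assumption, is that `btree` cannot be found because mboost is called with `mboost::` but is not attached. The base R sketch below imitates that situation with a stand-in environment; `fake_namespace` and `btree_demo` are hypothetical names.

```r
# Stand-in for a package namespace that is installed but not attached.
fake_namespace <- new.env()
assign("btree_demo", function(x) paste("tree learner on", x),
       envir = fake_namespace)

# A lookup by name from an environment that cannot see the function fails,
# mirroring get(baselearner, mode = "function", envir = parent.frame()).
lookup_failed <- tryCatch(
  get("btree_demo", mode = "function", envir = globalenv()),
  error = function(e) e
)

# Looking in the environment that holds the function succeeds.
learner <- get("btree_demo", mode = "function", envir = fake_namespace)
print(learner("df_mirrored_model"))
```

If this is indeed the cause, attaching the package with `library(mboost)` before calling featureSelection might work around it.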
I got this error:
Error in .local(.Object, ...) :
user cannot request for more solutions than is possible given the data set
Code:
train_df <- sim_data(n = 1000,
                     modelvars = 10,
                     noisevars = 10,
                     model_sd = 4,
                     noise_sd = 4,
                     epsilon_sd = 4,
                     outcome = "regression",
                     cutoff = NULL)
head(train_df[, 1:10])
# Maximum Relevance Minimum Redundancy Filter
test_mr <- featureFiltering(data = train_df,
                            target = "y",
                            method = "mrmr",
                            returning = "names")
Hello,
I ran into the following issue when I tried to install bounceR.
Attempt to install:
devtools::install_github("STATWORX/bounceR")
Resulted in:
Downloading GitHub repo STATWORX/bounceR@master
Skipping 1 packages not available: mRMRe
Installing 1 packages: mRMRe
Installing package into ‘C:/Users/Martin/Documents/GIT/as_systems/prediction_automatization/packrat/lib/x86_64-w64-mingw32/3.5.1’
(as ‘lib’ is unspecified)
Error: (converted from warning) package ‘mRMRe’ is not available (for R version 3.5.1)
Result of session_info():
- Session info ------------------------------------------------------------------------------------------
 setting  value
 version  R version 3.5.1 (2018-07-02)
 os       Windows >= 8 x64
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  English_United Kingdom.1252
 ctype    English_United Kingdom.1252
 tz       Europe/London
 date     2019-02-06
- Packages ----------------------------------------------------------------------------------------------
 ! package       * version date       lib source
 P assertthat      0.2.0   2017-04-11 [?] CRAN (R 3.5.1)
   backports       1.1.3   2018-12-14 [1] CRAN (R 3.5.2)
 P callr           3.1.1   2018-12-21 [?] CRAN (R 3.5.2)
 P cli             1.0.1   2018-09-25 [?] CRAN (R 3.5.2)
 P crayon          1.3.4   2017-09-16 [?] CRAN (R 3.5.1)
   curl            3.2     2018-03-28 [1] CRAN (R 3.5.1)
   desc            1.2.0   2018-05-01 [1] CRAN (R 3.5.1)
 P devtools      * 2.0.1   2018-10-26 [?] CRAN (R 3.5.2)
 P digest          0.6.18  2018-10-10 [?] CRAN (R 3.5.2)
 P fs              1.2.6   2018-08-23 [?] CRAN (R 3.5.2)
 P glue            1.3.0   2018-07-17 [?] CRAN (R 3.5.1)
 P magrittr        1.5     2014-11-22 [?] CRAN (R 3.5.1)
 P memoise         1.1.0   2017-04-21 [?] CRAN (R 3.5.1)
   packrat         0.5.0-5 2019-02-06 [1] Github (rstudio/packrat@9dee5d8)
 P pkgbuild        1.0.2   2018-10-16 [?] CRAN (R 3.5.2)
 P pkgload         1.0.2   2018-10-29 [?] CRAN (R 3.5.2)
 P prettyunits     1.0.2   2015-07-13 [?] CRAN (R 3.5.1)
 P processx        3.2.1   2018-12-05 [?] CRAN (R 3.5.2)
   ps              1.3.0   2018-12-21 [1] CRAN (R 3.5.2)
 V R6              2.2.2   2018-10-04 [1] CRAN (R 3.5.2)
 P Rcpp            1.0.0   2018-11-07 [?] CRAN (R 3.5.2)
 P remotes         2.0.2   2018-10-30 [?] CRAN (R 3.5.2)
 P RevoUtils     * 11.0.1  2018-08-01 [?] local
 R RevoUtilsMath * 11.0.0  <NA>       [?] <NA>
 P rlang           0.3.1   2019-01-08 [?] CRAN (R 3.5.2)
   rprojroot       1.3-2   2018-01-03 [1] CRAN (R 3.5.1)
   rstudioapi      0.9.0   2019-01-09 [1] CRAN (R 3.5.2)
 P sessioninfo     1.1.1   2018-11-05 [?] CRAN (R 3.5.2)
 P usethis       * 1.4.0   2018-08-14 [?] CRAN (R 3.5.2)
   withr           2.1.2   2018-03-15 [1] CRAN (R 3.5.1)
I am also running Microsoft R Open and currently use packrat for reproducibility, with repos pointing to the global RStudio CDN server. I thought it might be related to the packrat project, but I also got the error in a clean R session (i.e., outside any project).
Thank you for any help.
Please note: my workaround for this issue was to install an archived version of mRMRe from https://cran.r-project.org/src/contrib/Archive/mRMRe/.
The code I used was:
library(remotes)
install_version("mRMRe", "2.0.7")
# Then proceed to:
devtools::install_github("STATWORX/bounceR")
The package installed correctly after this and was able to load.
Thanks a lot for incorporating furrr! It's really great to see it get some love and use in other packages. I had a suggestion on "best practices" for using future-based packages; hopefully you find it useful.
I suggest you remove the future::plan() call from featureSelection(). It is best practice to let the user supply the plan; the developer only worries about what code is parallelized, not how it is parallelized.
The reason for this is that by setting plan(multiprocess) you are inherently limiting the user to their local computer for parallel feature selection. future can do much more than this, like run on EC2 or a remote cluster. Ideally, this is what you'd have:
# by default, future_map() runs sequentially if you don't specify any plan
featureSelection(...)
# runs in parallel on your local computer
plan(multiprocess)
featureSelection(...)
# runs in parallel sharded over a cluster somewhere
plan(cluster)
featureSelection(...)
# runs in parallel on multiple ec2 instances
plan(cluster, workers = ec2_ip_addresses)
featureSelection(...)
# sends x, y, and z each to a node of the cluster AND runs in parallel on those cluster nodes
plan(list(cluster, multiprocess))
future_map(list(x, y, z), ~ featureSelection(.x))
See how many fun things you can do if you let the user specify the plan?
I am getting this error when installing from Github:
Installing package into ‘/Users/<me>/lib/R’
(as ‘lib’ is unspecified)
* installing *source* package ‘bounceR’ ...
** R
** byte-compile and prepare package for lazy loading
Error in setMethod("plot", signature("sel_obj"), function(x, n_features = NULL) { :
no existing definition for function ‘plot’
Error : unable to load R code in package ‘bounceR’
ERROR: lazy loading failed for package ‘bounceR’
This appears to be tied to line 17 in 9e83d61.
Can anyone reproduce this?
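The message "no existing definition for function" is what `setMethod()` reports when no function of that name is visible at the point where the method is defined. The sketch below reproduces this in base R with a made-up function name, `plot_demo`, and a stand-in class, `sel_demo`; both are hypothetical. Whether the installation failure above has exactly this cause is an assumption.

```r
library(methods)

# Stand-in class; not the real "sel_obj" from bounceR.
setClass("sel_demo", representation(stability = "data.frame"))

# No function called plot_demo exists yet, so setMethod() fails.
first_try <- tryCatch(
  setMethod("plot_demo", signature("sel_demo"),
            function(x, ...) nrow(x@stability)),
  error = function(e) e
)

# Creating (or importing) the generic first makes the same call succeed.
setGeneric("plot_demo", function(x, ...) standardGeneric("plot_demo"))
setMethod("plot_demo", signature("sel_demo"),
          function(x, ...) nrow(x@stability))

obj <- new("sel_demo", stability = data.frame(feature = c("a", "b")))
n_rows <- plot_demo(obj)
```

In a package, the usual equivalent is to make the function visible in the package namespace (for example by importing `plot` in NAMESPACE) before the `setMethod()` call is evaluated.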
threshold_corr is built using the average correlations of the features, but this average also includes the correlation of 1 of the target with itself. This means that when I either don't have many features or they are only weakly correlated, the correlated_features variable remains empty and a subscriptOutOfBounds exception is thrown:
cc <- bounceR::featureFiltering(data = fewFeaturesData,
                                target = targetCol,
                                method = "cc",
                                returning = "data")
Error in subset_df[, paste(collinear$Var1[i])] : subscript out of bounds
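The effect described above can be illustrated with simulated data. This is a sketch under the assumption that the threshold is the mean absolute correlation with the target; the actual computation in bounceR may differ, and all variable names below are illustrative.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 0.4 * x1 + 0.1 * x2 + rnorm(n)
few_features <- data.frame(y = y, x1 = x1, x2 = x2)

# Absolute correlation of every column with the target, target included.
abs_cor <- abs(cor(few_features)[, "y"])

# Threshold as described in the report: the mean includes cor(y, y) = 1.
threshold_with_target <- mean(abs_cor)

# Alternative: drop the target before averaging.
feature_cor <- abs_cor[names(abs_cor) != "y"]
threshold_without_target <- mean(feature_cor)

# Features whose correlation with the target exceeds each threshold.
selected_with    <- names(feature_cor)[feature_cor > threshold_with_target]
selected_without <- names(feature_cor)[feature_cor > threshold_without_target]
```

With only a few features, the self-correlation of 1 carries a large weight in the mean, so the threshold that includes the target can never select more features than the one that excludes it.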