bouncer's Issues

future::plan(multiprocess, workers = cores) not working

Thanks for this great package.

The function featureSelection is not working because of the following part of the code:

# enable parallel processing

future::plan(multiprocess, workers = cores)

I managed to fix it by rewriting this part of the function as follows:
future::plan("multicore", workers = cores)

And now it works again :)
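A note on portability of this fix: the "multicore" strategy relies on process forking, which is not available on Windows, whereas "multisession" works on every platform. The following is a minimal sketch, not the package's actual code, of choosing between the two at run time. The variable `cores` and its value are assumptions for illustration.

```r
library(future)

cores <- 2  # hypothetical worker count, chosen only for this example

# Use forked workers where the platform supports them, otherwise
# fall back to background R sessions (which also work on Windows).
if (supportsMulticore()) {
  plan("multicore", workers = cores)
} else {
  plan("multisession", workers = cores)
}

n_workers <- nbrOfWorkers()
print(n_workers)

plan("sequential")  # reset the plan so later code is unaffected
```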

Package 'bounceR' requires R >= 3.4.3

Hi! Thanks for the package.

Is that requirement really necessary?

> devtools::install_github("STATWORX/bounceR")

ERROR: this R is version 3.4.1, package 'bounceR' requires R >= 3.4.3
Installation failed: Command failed (1)

Install error: Object vI is not found

During installation I get an error like this:

  • installing source package 'bounceR' ...
    ** R
    ** preparing package for lazy loading
    Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :
    object 'vI' is not found
    ERROR: lazy loading failed for package 'bounceR'
  • removing 'C:/Program Files/R/R-3.4.4/library/bounceR'

Searching for similar problems suggests that this is related to a version mismatch in one or more dependencies. But how do I find out exactly which package has the wrong version?
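One way to narrow this down is to compare the installed version of each dependency against the minimum version a package declares. The sketch below uses base R only. The helper names `parse_deps` and `check_deps` and the example dependency string are hypothetical; in practice the string would come from the Imports or Depends field of the package's DESCRIPTION file.

```r
# Split a DESCRIPTION-style dependency field into package names
# and (optional) minimum versions.
parse_deps <- function(field) {
  entries <- trimws(strsplit(field, ",")[[1]])
  entries <- entries[nzchar(entries)]
  pkg <- sub("\\s*\\(.*$", "", entries)
  min_version <- ifelse(grepl(">=", entries, fixed = TRUE),
                        sub("^.*>=\\s*([0-9.-]+).*$", "\\1", entries),
                        NA_character_)
  data.frame(package = pkg, min_version = min_version,
             stringsAsFactors = FALSE)
}

# Look up the installed version of each dependency and flag the ones
# that are missing or older than required.
check_deps <- function(field) {
  deps <- parse_deps(field)
  deps <- deps[deps$package != "R", , drop = FALSE]
  deps$installed <- vapply(deps$package, function(p) {
    tryCatch(as.character(utils::packageVersion(p)),
             error = function(e) NA_character_)
  }, character(1))
  deps$ok <- mapply(function(inst, req) {
    if (is.na(inst)) return(FALSE)   # not installed at all
    if (is.na(req)) return(TRUE)     # no minimum version declared
    utils::compareVersion(inst, req) >= 0
  }, deps$installed, deps$min_version)
  deps
}

# Hypothetical dependency string, for illustration only
example_field <- "stats, utils (>= 3.0.0), notARealPackage123 (>= 1.0)"
result <- check_deps(example_field)
print(result)
```

Rows with `ok == FALSE` are the candidates to update or reinstall.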

builder() function S4 class issue

Really love the idea for this package. Been having fun playing with some of the test data functions, but I noticed a snag when I tried to use the builder() function.

test_df <- sim_data()

# feature selection
sel <- featureSelection(df = test_df, #this is out of date, needed to change to "data = "
                        target = "y",
                        index = NULL,
                        method = "randomboost",
                        n_cores = 1)
# extract one feature
form <- builder(object = sel, n_features = 1)

This throws the following error:
Error in object[["stability"]] : this S4 class is not subsettable

Looking at the function definition, it seems like it's just a syntax issue with S4. sel@stability[,'feature'] works fine so I basically did a find and replace on "object[[]]" in the function and it seemed to work okay.

myBuilder <- function(object, n_features = 5) {
  if (!class(object) == "sel_obj")
    stop(paste0("This function only works with objects of type ",
                sQuote("sel_obj"), "!"))
  # sel_obj is an S4 class, so slots are accessed with @ instead of [[ ]];
  # pull() comes from dplyr
  if (n_features > length(pull(object@stability[, "feature"]))) {
    n_features <- length(pull(object@stability[, "feature"]))
    warning(paste0("Seems like you chose too many features. Try to reduce ",
                   sQuote("n_features"), ". Using all features: ",
                   n_features))
  }
  form <- as.formula(paste(object@setup[["target"]],
                           "~",
                           paste(pull(object@stability[, "feature"][1:n_features, ]),
                                 collapse = "+"),
                           sep = ""))
  return(form)
}

form <- myBuilder(object = sel, n_features = 10)

form
y ~ noise199 + noise053 + noise119 + noise182 + noise095 + noise009 +
    noise197 + noise061 + noise091 + noise198

I'm not a GitHub pro by any means; in fact, I've never submitted an issue or suggestion on someone's code before. Hopefully I haven't committed some cardinal sin with this issue post, but this is an awesome package and this small tweak seemed to work. I hope this is helpful. Great work so far; I will be following to see how this package develops!
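The S4 behaviour described in this issue can be reproduced without bounceR. The sketch below defines a hypothetical class `demo_sel_obj` whose slot names (`stability`, `setup`) follow the issue text; it is a stand-in, not the package's real `sel_obj` definition.

```r
library(methods)

# Hypothetical stand-in for bounceR's "sel_obj" class
setClass("demo_sel_obj",
         representation(stability = "data.frame", setup = "list"))

obj <- new("demo_sel_obj",
           stability = data.frame(feature = c("x1", "x2", "x3"),
                                  stringsAsFactors = FALSE),
           setup = list(target = "y"))

# List-style subsetting is not defined for a plain S4 object,
# so the error is captured here as a message
list_style <- tryCatch(obj[["stability"]],
                       error = function(e) conditionMessage(e))
print(list_style)

# Slot access with @ works
top_features <- obj@stability[["feature"]][1:2]
form <- as.formula(paste(obj@setup[["target"]], "~",
                         paste(top_features, collapse = " + ")))
print(form)
```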

Tree base learner not working

Hi,

Could you please have a look at the tree base learner? I passed a regression problem to the featureSelection function, but it fails with:
Error in paste("In iteration", i, "I could not fit a model in round", :
object 'ii' not found

After further checking, the problem seems to come from the following call:
mboost::gamboost(as.formula(paste(target, "~", ".", sep = "")),
data = df_mirrored_model,
control = mboost::boost_control(mstop = boosting[["mstop"]], nu = boosting[["nu"]]),
baselearner = c("btree"))

It throws this error message:
Error in get(baselearner, mode = "function", envir = parent.frame()) :
object 'btree' of mode 'function' was not found
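One plausible reading of this error, offered as an assumption rather than a confirmed diagnosis: the base learner is looked up by name with `get()`, which only searches attached packages, while the calling code reaches mboost through `mboost::` without attaching it. The sketch below illustrates the same lookup difference using `file_ext` from the `tools` package, which ships with R but is normally not attached. It stands in for `btree` purely for illustration.

```r
# Look a function up by name on the search path; return the error
# message instead of stopping if it cannot be found.
lookup_by_name <- function(fun_name) {
  tryCatch(get(fun_name, mode = "function"),
           error = function(e) conditionMessage(e))
}

# Typically fails unless the 'tools' package has been attached
by_name <- lookup_by_name("file_ext")

# Works regardless of what is attached, because the namespace is named explicitly
by_namespace <- getExportedValue("tools", "file_ext")

print(is.function(by_name))
print(by_namespace("model.rds"))
```

If this reading applies, attaching the package with `library(mboost)` before calling featureSelection might serve as a workaround.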

error in featureFiltering with small variable count

I got this error:

Error in .local(.Object, ...) : 
  user cannot request for more solutions than is possible given the data set

Code:

train_df <- sim_data(n = 1000,
                     modelvars = 10,
                     noisevars = 10,
                     model_sd = 4,
                     noise_sd = 4,
                     epsilon_sd = 4,
                     outcome = "regression",
                     cutoff = NULL)
head(train_df[, 1:10])
# Maximum Relevance Minimum Redundancy Filter
test_mr <- featureFiltering(data = train_df,
                            target = "y",
                            method = "mrmr",
                            returning = "names")

Error: (converted from warning) package ‘mRMRe’ is not available (for R version 3.5.1)

Hello,
I ran into the following issue when I tried to install bounceR.

Attempt to install:

devtools::install_github("STATWORX/bounceR")

Resulted in:

Downloading GitHub repo STATWORX/bounceR@master
Skipping 1 packages not available: mRMRe
Installing 1 packages: mRMRe
Installing package into ‘C:/Users/Martin/Documents/GIT/as_systems/prediction_automatization/packrat/lib/x86_64-w64-mingw32/3.5.1’
(as ‘lib’ is unspecified)
Error: (converted from warning) package ‘mRMRe’ is not available (for R version 3.5.1)

Result of session_info():

- Session info ------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.5.1 (2018-07-02)
 os       Windows >= 8 x64            
 system   x86_64, mingw32             
 ui       RStudio                     
 language (EN)                        
 collate  English_United Kingdom.1252 
 ctype    English_United Kingdom.1252 
 tz       Europe/London               
 date     2019-02-06                  

- Packages ----------------------------------------------------------------------------------------------
 ! package       * version date       lib source                          
 P assertthat      0.2.0   2017-04-11 [?] CRAN (R 3.5.1)                  
   backports       1.1.3   2018-12-14 [1] CRAN (R 3.5.2)                  
 P callr           3.1.1   2018-12-21 [?] CRAN (R 3.5.2)                  
 P cli             1.0.1   2018-09-25 [?] CRAN (R 3.5.2)                  
 P crayon          1.3.4   2017-09-16 [?] CRAN (R 3.5.1)                  
   curl            3.2     2018-03-28 [1] CRAN (R 3.5.1)                  
   desc            1.2.0   2018-05-01 [1] CRAN (R 3.5.1)                  
 P devtools      * 2.0.1   2018-10-26 [?] CRAN (R 3.5.2)                  
 P digest          0.6.18  2018-10-10 [?] CRAN (R 3.5.2)                  
 P fs              1.2.6   2018-08-23 [?] CRAN (R 3.5.2)                  
 P glue            1.3.0   2018-07-17 [?] CRAN (R 3.5.1)                  
 P magrittr        1.5     2014-11-22 [?] CRAN (R 3.5.1)                  
 P memoise         1.1.0   2017-04-21 [?] CRAN (R 3.5.1)                  
   packrat         0.5.0-5 2019-02-06 [1] Github (rstudio/packrat@9dee5d8)
 P pkgbuild        1.0.2   2018-10-16 [?] CRAN (R 3.5.2)                  
 P pkgload         1.0.2   2018-10-29 [?] CRAN (R 3.5.2)                  
 P prettyunits     1.0.2   2015-07-13 [?] CRAN (R 3.5.1)                  
 P processx        3.2.1   2018-12-05 [?] CRAN (R 3.5.2)                  
   ps              1.3.0   2018-12-21 [1] CRAN (R 3.5.2)                  
 V R6              2.2.2   2018-10-04 [1] CRAN (R 3.5.2)                  
 P Rcpp            1.0.0   2018-11-07 [?] CRAN (R 3.5.2)                  
 P remotes         2.0.2   2018-10-30 [?] CRAN (R 3.5.2)                  
 P RevoUtils     * 11.0.1  2018-08-01 [?] local                           
 R RevoUtilsMath * 11.0.0  <NA>       [?] <NA>                            
 P rlang           0.3.1   2019-01-08 [?] CRAN (R 3.5.2)                  
   rprojroot       1.3-2   2018-01-03 [1] CRAN (R 3.5.1)                  
   rstudioapi      0.9.0   2019-01-09 [1] CRAN (R 3.5.2)                  
 P sessioninfo     1.1.1   2018-11-05 [?] CRAN (R 3.5.2)                  
 P usethis       * 1.4.0   2018-08-14 [?] CRAN (R 3.5.2)                  
   withr           2.1.2   2018-03-15 [1] CRAN (R 3.5.1)

I am also running Microsoft R Open and currently use packrat for reproducibility, with repos pointing to the RStudio global CDN. I thought it might be related to the packrat project, but I also got the error in a clean R session (i.e., outside any project).

Thank you for any help.

Please note: my workaround for this issue was to install an archived version of mRMRe from https://cran.r-project.org/src/contrib/Archive/mRMRe/.

The code I used was:

library(remotes)
install_version("mRMRe", "2.0.7")
# Then proceed to:
devtools::install_github("STATWORX/bounceR")

The package installed correctly after this and was able to load.

A suggestion on your `furrr` implementation

Thanks a lot for incorporating furrr! It's really great to see it get some love and use in other packages. I had a suggestion on "best practices" of using future based packages, hopefully you find it useful.

I suggest you remove the future::plan() call from featureSelection(). It is best practice to let the user supply the plan, and the developer only worries about what code is parallelized, not how it is parallelized.

The reason for this is that you are inherently limiting the user by setting plan(multiprocess) to only be able to use their local computer for parallel feature selection. future can do much more than this, like run on EC2 or a remote cluster. Ideally, this is what you'd have:

# by default, future_map() runs sequentially if you don't specify any plan
featureSelection(...)

# runs in parallel on your local computer
plan(multiprocess)
featureSelection(...)

# runs in parallel sharded over a cluster somewhere
plan(cluster)
featureSelection(...)

# runs in parallel on multiple ec2 instances
plan(cluster, workers = ec2_ip_addresses)
featureSelection(...)

# sends x, y, and z each to a node of the cluster AND runs in parallel on those cluster nodes
plan(list(cluster, multiprocess))
future_map(list(x, y, z), ~ featureSelection(.x))

See how many fun things you can do if you let the user specify the plan?
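To make the suggestion concrete, here is a minimal sketch of a function that parallelizes its work with furrr but never calls `plan()` itself. The function `score_features` and its toy computation are hypothetical stand-ins, not bounceR code; the point is only that the same function runs under whichever plan the caller has set.

```r
library(future)
library(furrr)

# Hypothetical package function: it declares WHAT runs in parallel,
# but leaves HOW to the caller's plan()
score_features <- function(features) {
  future_map_dbl(features, function(x) mean(x^2))
}

features <- list(a = 1:10, b = 11:20, c = 21:30)

# Caller chooses sequential execution
plan(sequential)
res_seq <- score_features(features)

# Caller chooses two background R sessions on the local machine
plan(multisession, workers = 2)
res_par <- score_features(features)

plan(sequential)  # reset the plan

print(all.equal(res_seq, res_par))
```

The results agree under both plans, which is what lets a package leave the choice of backend entirely to the user.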

Install error: "no existing definition for function ‘plot’"

I am getting this error when installing from Github:

Installing package into ‘/Users/<me>/lib/R’
(as ‘lib’ is unspecified)
* installing *source* package ‘bounceR’ ...
** R
** byte-compile and prepare package for lazy loading
Error in setMethod("plot", signature("sel_obj"), function(x, n_features = NULL) { : 
  no existing definition for function ‘plot’
Error : unable to load R code in package ‘bounceR’
ERROR: lazy loading failed for package ‘bounceR’

This appears to be tied to the following call, and after some initial research I am not quite sure what is going wrong here:

setMethod("plot", signature("sel_obj"),

Can anyone reproduce this?

When few features exist or none of the features correlates well with the target, a subscriptOutOfBounds exception is thrown

threshold_corr is built using the average correlations of the features, but this also includes the correlation "1" of the target value itself. This means that in situations in which I either don't have many features or they correlate badly, the correlated_features variable remains empty and a subscriptOutOfBounds exception is thrown:

cc <- bounceR::featureFiltering(data = fewFeaturesData,
                                target = targetCol,
                                method = "cc",
                                returning = "data")

Error in subset_df[, paste(collinear$Var1[i])] : subscript out of bounds
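The effect of including the target's self-correlation in the threshold can be illustrated with simulated data. This is a sketch under stated assumptions: the selection rule ("keep features whose absolute correlation with the target exceeds the mean") is inferred from the description above and is not copied from the bounceR source, and all variable names are hypothetical.

```r
set.seed(1)
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 0.3 * dat$x1 + rnorm(n)   # x1 is weakly related to y, x2 is noise

# Absolute correlation of every column with the target, target included
cors <- abs(cor(dat)[, "y"])

# Threshold as described in the issue: the mean includes cor(y, y) = 1
threshold_with_self <- mean(cors)

# Alternative: drop the target before averaging
threshold_without_self <- mean(cors[names(cors) != "y"])

selected_with_self <- setdiff(names(cors)[cors > threshold_with_self], "y")
selected_without_self <- setdiff(names(cors)[cors > threshold_without_self], "y")

print(selected_with_self)
print(selected_without_self)
```

With only a few weakly correlated features, the self-correlation of 1 pulls the mean above every feature's correlation, so the first selection can come back empty. Any later indexing by the selected names would then fail in the way reported here.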
