mlr-org / mlr3misc

Miscellaneous helper functions for mlr3

Home Page: https://mlr3misc.mlr-org.com

License: GNU Lesser General Public License v3.0

Languages: R 95.20%, C 4.80%
Topics: mlr3, miscellaneous, machine-learning, r, r-package

mlr3misc's Introduction

mlr3misc

Package website: release | dev

Miscellaneous helper functions for mlr3.

mlr3misc's People

Contributors

be-marc, berndbischl, coorsaa, github-actions[bot], gsenseless, jakob-r, mb706, mllg, pat-s, sebffischer

mlr3misc's Issues

insert_named doesn't work as expected for regr.glmnet

According to book chapter 2.3, we can use mlr3misc::insert_named() to set specific arguments without resetting existing arguments. However, when I tried it on a learner of class "regr.glmnet", the function did not appear to work at all:

library(mlr3)
library(mlr3learners)
library(mlr3misc)

learner <- mlr_learners$get("regr.glmnet")
learner$param_set

<ParamSet>
                  id    class lower upper                     levels        default parents    value
 1:        alignment ParamFct    NA    NA            lambda,fraction         lambda                 
 2:            alpha ParamDbl     0     1                                         1                 
 3:              big ParamDbl  -Inf   Inf                                   9.9e+35                 
 4:           devmax ParamDbl     0     1                                     0.999                 
 5:            dfmax ParamInt     0   Inf                            <NoDefault[3]>                 
 6:              eps ParamDbl     0     1                                     1e-06                 
 7:            epsnr ParamDbl     0     1                                     1e-08                 
 8:            exact ParamLgl    NA    NA                 TRUE,FALSE          FALSE                 
 9:          exclude ParamInt     1   Inf                            <NoDefault[3]>                 
10:             exmx ParamDbl  -Inf   Inf                                       250                 
11:           family ParamFct    NA    NA           gaussian,poisson       gaussian         gaussian
12:             fdev ParamDbl     0     1                                     1e-05                 
13:            gamma ParamDbl  -Inf   Inf                                         1   relax         
14:          grouped ParamLgl    NA    NA                 TRUE,FALSE           TRUE                 
15:        intercept ParamLgl    NA    NA                 TRUE,FALSE           TRUE                 
16:             keep ParamLgl    NA    NA                 TRUE,FALSE          FALSE                 
17:           lambda ParamUty    NA    NA                            <NoDefault[3]>                 
18: lambda.min.ratio ParamDbl     0     1                            <NoDefault[3]>                 
19:     lower.limits ParamUty    NA    NA                            <NoDefault[3]>                 
20:            maxit ParamInt     1   Inf                                    100000                 
21:            mnlam ParamInt     1   Inf                                         5                 
22:             mxit ParamInt     1   Inf                                       100                 
23:           mxitnr ParamInt     1   Inf                                        25                 
24:        newoffset ParamUty    NA    NA                            <NoDefault[3]>                 
25:          nlambda ParamInt     1   Inf                                       100                 
26:           offset ParamUty    NA    NA                                                           
27:         parallel ParamLgl    NA    NA                 TRUE,FALSE          FALSE                 
28:   penalty.factor ParamUty    NA    NA                            <NoDefault[3]>                 
29:             pmax ParamInt     0   Inf                            <NoDefault[3]>                 
30:             pmin ParamDbl     0     1                                     1e-09                 
31:             prec ParamDbl  -Inf   Inf                                     1e-10                 
32:    predict.gamma ParamDbl  -Inf   Inf                                         1                 
33:            relax ParamLgl    NA    NA                 TRUE,FALSE          FALSE                 
34:                s ParamDbl     0   Inf                                      0.01                 
35:      standardize ParamLgl    NA    NA                 TRUE,FALSE           TRUE                 
36:           thresh ParamDbl     0   Inf                                     1e-07                 
37:         trace.it ParamInt     0     1                                         0                 
38:    type.gaussian ParamFct    NA    NA           covariance,naive <NoDefault[3]>  family         
39:    type.logistic ParamFct    NA    NA     Newton,modified.Newton <NoDefault[3]>                 
40:     type.measure ParamFct    NA    NA deviance,class,auc,mse,mae       deviance                 
41: type.multinomial ParamFct    NA    NA          ungrouped,grouped <NoDefault[3]>                 
42:     upper.limits ParamUty    NA    NA                            <NoDefault[3]>                 
                  id    class lower upper                     levels        default parents    value

## insert new params
insert_named(learner$param_set$values,
             list(family="poisson",
                  lambda=0))
learner$param_set

<ParamSet>
                  id    class lower upper                     levels        default parents    value
 1:        alignment ParamFct    NA    NA            lambda,fraction         lambda                 
 2:            alpha ParamDbl     0     1                                         1                 
 3:              big ParamDbl  -Inf   Inf                                   9.9e+35                 
 4:           devmax ParamDbl     0     1                                     0.999                 
 5:            dfmax ParamInt     0   Inf                            <NoDefault[3]>                 
 6:              eps ParamDbl     0     1                                     1e-06                 
 7:            epsnr ParamDbl     0     1                                     1e-08                 
 8:            exact ParamLgl    NA    NA                 TRUE,FALSE          FALSE                 
 9:          exclude ParamInt     1   Inf                            <NoDefault[3]>                 
10:             exmx ParamDbl  -Inf   Inf                                       250                 
11:           family ParamFct    NA    NA           gaussian,poisson       gaussian         gaussian
12:             fdev ParamDbl     0     1                                     1e-05                 
13:            gamma ParamDbl  -Inf   Inf                                         1   relax         
14:          grouped ParamLgl    NA    NA                 TRUE,FALSE           TRUE                 
15:        intercept ParamLgl    NA    NA                 TRUE,FALSE           TRUE                 
16:             keep ParamLgl    NA    NA                 TRUE,FALSE          FALSE                 
17:           lambda ParamUty    NA    NA                            <NoDefault[3]>                 
18: lambda.min.ratio ParamDbl     0     1                            <NoDefault[3]>                 
19:     lower.limits ParamUty    NA    NA                            <NoDefault[3]>                 
20:            maxit ParamInt     1   Inf                                    100000                 
21:            mnlam ParamInt     1   Inf                                         5                 
22:             mxit ParamInt     1   Inf                                       100                 
23:           mxitnr ParamInt     1   Inf                                        25                 
24:        newoffset ParamUty    NA    NA                            <NoDefault[3]>                 
25:          nlambda ParamInt     1   Inf                                       100                 
26:           offset ParamUty    NA    NA                                                           
27:         parallel ParamLgl    NA    NA                 TRUE,FALSE          FALSE                 
28:   penalty.factor ParamUty    NA    NA                            <NoDefault[3]>                 
29:             pmax ParamInt     0   Inf                            <NoDefault[3]>                 
30:             pmin ParamDbl     0     1                                     1e-09                 
31:             prec ParamDbl  -Inf   Inf                                     1e-10                 
32:    predict.gamma ParamDbl  -Inf   Inf                                         1                 
33:            relax ParamLgl    NA    NA                 TRUE,FALSE          FALSE                 
34:                s ParamDbl     0   Inf                                      0.01                 
35:      standardize ParamLgl    NA    NA                 TRUE,FALSE           TRUE                 
36:           thresh ParamDbl     0   Inf                                     1e-07                 
37:         trace.it ParamInt     0     1                                         0                 
38:    type.gaussian ParamFct    NA    NA           covariance,naive <NoDefault[3]>  family         
39:    type.logistic ParamFct    NA    NA     Newton,modified.Newton <NoDefault[3]>                 
40:     type.measure ParamFct    NA    NA deviance,class,auc,mse,mae       deviance                 
41: type.multinomial ParamFct    NA    NA          ungrouped,grouped <NoDefault[3]>                 
42:     upper.limits ParamUty    NA    NA                            <NoDefault[3]>                 
                  id    class lower upper                     levels        default parents    value

However, it does work when I overwrite the values directly:

learner$param_set$values <- list(family="poisson",
                                 lambda=0)
learner$param_set

<ParamSet>
                  id    class lower upper                     levels        default parents   value
 1:        alignment ParamFct    NA    NA            lambda,fraction         lambda                
 2:            alpha ParamDbl     0     1                                         1                
 3:              big ParamDbl  -Inf   Inf                                   9.9e+35                
 4:           devmax ParamDbl     0     1                                     0.999                
 5:            dfmax ParamInt     0   Inf                            <NoDefault[3]>                
 6:              eps ParamDbl     0     1                                     1e-06                
 7:            epsnr ParamDbl     0     1                                     1e-08                
 8:            exact ParamLgl    NA    NA                 TRUE,FALSE          FALSE                
 9:          exclude ParamInt     1   Inf                            <NoDefault[3]>                
10:             exmx ParamDbl  -Inf   Inf                                       250                
11:           family ParamFct    NA    NA           gaussian,poisson       gaussian         poisson
12:             fdev ParamDbl     0     1                                     1e-05                
13:            gamma ParamDbl  -Inf   Inf                                         1   relax        
14:          grouped ParamLgl    NA    NA                 TRUE,FALSE           TRUE                
15:        intercept ParamLgl    NA    NA                 TRUE,FALSE           TRUE                
16:             keep ParamLgl    NA    NA                 TRUE,FALSE          FALSE                
17:           lambda ParamUty    NA    NA                            <NoDefault[3]>               0
18: lambda.min.ratio ParamDbl     0     1                            <NoDefault[3]>                
19:     lower.limits ParamUty    NA    NA                            <NoDefault[3]>                
20:            maxit ParamInt     1   Inf                                    100000                
21:            mnlam ParamInt     1   Inf                                         5                
22:             mxit ParamInt     1   Inf                                       100                
23:           mxitnr ParamInt     1   Inf                                        25                
24:        newoffset ParamUty    NA    NA                            <NoDefault[3]>                
25:          nlambda ParamInt     1   Inf                                       100                
26:           offset ParamUty    NA    NA                                                          
27:         parallel ParamLgl    NA    NA                 TRUE,FALSE          FALSE                
28:   penalty.factor ParamUty    NA    NA                            <NoDefault[3]>                
29:             pmax ParamInt     0   Inf                            <NoDefault[3]>                
30:             pmin ParamDbl     0     1                                     1e-09                
31:             prec ParamDbl  -Inf   Inf                                     1e-10                
32:    predict.gamma ParamDbl  -Inf   Inf                                         1                
33:            relax ParamLgl    NA    NA                 TRUE,FALSE          FALSE                
34:                s ParamDbl     0   Inf                                      0.01                
35:      standardize ParamLgl    NA    NA                 TRUE,FALSE           TRUE                
36:           thresh ParamDbl     0   Inf                                     1e-07                
37:         trace.it ParamInt     0     1                                         0                
38:    type.gaussian ParamFct    NA    NA           covariance,naive <NoDefault[3]>  family        
39:    type.logistic ParamFct    NA    NA     Newton,modified.Newton <NoDefault[3]>                
40:     type.measure ParamFct    NA    NA deviance,class,auc,mse,mae       deviance                
41: type.multinomial ParamFct    NA    NA          ungrouped,grouped <NoDefault[3]>                
42:     upper.limits ParamUty    NA    NA                            <NoDefault[3]>                
                  id    class lower upper                     levels        default parents   value

I'm running R 4.0.2, mlr3 0.9.0, mlr3learners 0.4.3, mlr3misc 0.6.0.
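
A possible explanation, sketched under assumptions: R lists are copied on modification, so a function that inserts elements into a list can only hand back a modified copy, and the call above discards that return value. The helper insert_named_sketch() below is a hypothetical base-R stand-in, not the mlr3misc implementation; it only illustrates why the result has to be assigned back to the original list.

```r
# Hypothetical stand-in for insert_named() on plain lists (not the mlr3misc code).
insert_named_sketch = function(x, y) {
  x[names(y)] = y  # modifies the local copy only
  x                # the caller has to keep this return value
}

values = list(family = "gaussian")

# Return value discarded: 'values' is left untouched.
insert_named_sketch(values, list(family = "poisson", lambda = 0))
unchanged = values

# Return value assigned back: 'values' now holds the new settings.
values = insert_named_sketch(values, list(family = "poisson", lambda = 0))
```

Applied to the learner, the same pattern would be to assign the result of insert_named() back to learner$param_set$values.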

Release mlr3misc 0.14.0

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Polish NEWS
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • git push
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • Add preemptive link to blog post in pkgdown news menu
  • usethis::use_github_release()
  • usethis::use_dev_version(push = TRUE)
  • Finish blog post
  • Tweet

now i need flatten

I want to peel off exactly one level of a list of lists. For example, flattening the first list below should give the second:

x = list(list(iris[1:2,]), list(iris[1:2,]))

list(iris[1:2,], iris[1:2,])
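
As a minimal sketch of the requested behaviour, base R's unlist() with recursive = FALSE removes exactly one level. This assumes every top-level element is itself a plain list; a data.frame sitting directly at the top level would be split into its columns.

```r
x = list(list(iris[1:2, ]), list(iris[1:2, ]))

# recursive = FALSE concatenates the inner lists instead of flattening everything
flat = unlist(x, recursive = FALSE)

length(flat)              # number of elements after peeling off one level
is.data.frame(flat[[1]])  # the data.frames themselves stay intact
```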

dictionary_sugar_get seems to fail when dictionary entries are functions

library("mlr3")
rsmp("holdout", x = 1)
#> Error: object of type 'closure' is not subsettable
#> Environment:
#>   1: rsmp("holdout", x = 1)
#>   2: dictionary_sugar_get(mlr_resamplings, .key, ...)
#>   3: stopf("Cannot set argument '%s' for '%s' (not a constructor argument, not a
#>   4: stop(simpleError(str_wrap(sprintf(msg, ...), width = wrap), call = NULL))
#>   5: simpleError(str_wrap(sprintf(msg, ...), width = wrap), call = NULL)
#>   6: structure(list(message = as.character(message), call = call), class = class
#>   7: str_wrap(sprintf(msg, ...), width = wrap)
#>   8: sprintf(msg, ...)
#>   9: did_you_mean(nn, c(constructor_args, param_ids, fields(obj$value)))
#>   10: unique(candidates)
#>   11: fields(obj$value)
#>   12: setdiff(names(x$public_methods), c("initialize", "clone", "print", "format"
#>   13: as.vector(x)

The problem seems to be that the rsmp() entries are plain functions rather than R6 constructors, which is explicitly allowed for dictionaries.

need something that removes environments from functions

I want to save an anonymous function in an object and serialize that object, but that object should not grow huge. It is then necessary that the function's environment hierarchy doesn't contain too much other stuff.

Usecase:

x = 1
large_obj = [huge object that takes lots of memory]
want_to_serialize = function(arg) return(arg + x)
saveRDS(want_to_serialize, "savefile.rds")

want_to_serialize's environment now contains large_obj and the savefile grows large. Instead I would want a function detachEnv(fun, keep, basis):

want_to_serialize_lean = detachEnv(want_to_serialize, keep = "x")
saveRDS(want_to_serialize_lean, "savefile.rds")

code for this:

library(checkmate)  # provides assertEnvironment(), assertCharacter(), assertFunction()

# removes the environment from fun and potentially wraps it in a new environment
detachEnv <- function(fun, keep = character(0), basis = topenv(parent.frame())) {
  assertEnvironment(basis)
  assertCharacter(keep, any.missing = FALSE)
  assertFunction(fun)
  if (length(keep)) {
    keepvals <- mget(keep, parent.frame(), inherits = TRUE)
    basis <- new.env(parent = basis, size = length(keepvals))
    mapply(assign, names(keepvals), keepvals, MoreArgs = list(envir = basis))
  }
  environment(fun) <- basis
  fun
}

name up for debate
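
To make the size effect concrete, here is a small base-R sketch. It assumes the function is created inside another function, because closures whose environment is the global environment do not drag their variables into the serialized output. The names make_fun and lean_env are made up for this illustration.

```r
make_fun = function() {
  large_obj = numeric(1e6)  # stand-in for a huge object (about 8 MB)
  x = 1
  function(arg) arg + x     # closure environment holds both 'x' and 'large_obj'
}

f = make_fun()
size_before = length(serialize(f, NULL))

# Rebuild the environment so that it contains only what the function needs.
lean_env = new.env(parent = globalenv())
assign("x", get("x", envir = environment(f)), envir = lean_env)
environment(f) = lean_env

size_after = length(serialize(f, NULL))
c(before = size_before, after = size_after)
```

The function still evaluates as before, since x is found in the new environment.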

I can't download mlr3 onto my Rstudio

I tried to install it on my rstudio, but it just won't work. I did:

  1. install.packages("mlr3verse")
  2. install.packages("mlr3")

Every time I run library(mlr3verse) or library(mlr3), I get this error:
Error: package or namespace load failed for ‘mlr3verse’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):
there is no package called ‘mlr3misc’

I have the latest Rstudio, is there something else that's wrong?

Automatic cleanup when unloading a package

As mlr3 populates dictionaries from packages like mlr3 or mlr3pipelines when loading extensions, those objects should be removed again when unloading the extension.
An idea brought forward by Lukas is to automate this, which I think is a good idea:
mlr-org/mlr3proba#301 (comment)

A simple solution would be to tag the dictionary entries with the package that added them.
Then, during the .onUnload function, we can remove all objects that were added by the package that is being unloaded.
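
The tagging idea could look roughly like the following sketch. It uses a plain environment as a stand-in for a Dictionary, and register_entry(), unregister_package() and the keys are hypothetical names, not existing mlr3misc functions.

```r
registry = new.env()

# Store each entry together with the name of the package that added it.
register_entry = function(key, value, package) {
  assign(key, list(value = value, package = package), envir = registry)
}

# Remove every entry that was added by 'package'; returns the removed keys.
unregister_package = function(package) {
  keys = ls(registry)
  owned = vapply(keys, function(key) {
    identical(get(key, envir = registry)$package, package)
  }, logical(1))
  rm(list = keys[owned], envir = registry)
  invisible(keys[owned])
}

register_entry("core_entry", 1, package = "mlr3")
register_entry("extension_entry_a", 2, package = "mlr3proba")
register_entry("extension_entry_b", 3, package = "mlr3proba")

# What the extension's .onUnload() would call:
removed = unregister_package("mlr3proba")
remaining = ls(registry)
```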

backports version

mlr3misc uses backports::errorCondition, so to the degree it imports backports, it should require the backports version that introduced that. Otherwise it's very hard for a user who happens to have an old backports version installed to find out what the problem is.

BBmisc::chunk()

I could use this in the pipelines.
Is it broken, or does it make sense to copy this?
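
For reference, splitting a vector into consecutive chunks of a fixed size needs only base R. The following is a sketch under that narrow assumption; chunk_sketch() is a hypothetical name and does not reproduce the interface of BBmisc::chunk().

```r
chunk_sketch = function(x, chunk_size) {
  stopifnot(chunk_size >= 1)
  # element i goes into chunk number ceiling(i / chunk_size)
  unname(split(x, ceiling(seq_along(x) / chunk_size)))
}

chunks = chunk_sketch(1:10, chunk_size = 3)
lengths(chunks)
```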

set_params

I think we should really offer this set_params public method for Learners, Graphs etc. so we can use it in the book.
Last time @mllg mentioned that we could just offer an S3 method that does something akin to

set_params.R6 = function(x, ...) {
  if (!is.null(x$param_set)) {
    x$param_set$values = insert_named(x$param_set$values, list(...))
  } 
  invisible(x)
}

Admittedly that is less work, and we could simply add it in mlr3misc without touching other packages. But I think the downside is that this is somewhat inconsistent: the user then has to guess whether something is implemented as a method or as an S3 function. I think offering both would be fine as well, but I definitely think that the following should work.

learner$set_params(x = 1, y = 2) 

This is also related to the suggestion that @mb706 made at some point, that it might make sense to introduce another class into the hierarchy that represents an object that has a param set and some other standardized properties? But even if this would make sense it would definitely take some time so I would suggest we make an mlr3misc release with this set_params utility that is already implemented and then add the method set_params to

  • Learner
  • Graph
  • PipeOp
  • Measure
  • Resampling
  • TaskGenerator
  • Optimizer
  • Terminator
  • Tuner
  • Surrogate
  • AcqOptimizer

I am not sure whether I forgot something.
I can do this if you are ok @mllg @be-marc

unnest() fails if column name y is present

a = data.table(y = list(0), opt_x = list(list(z_1 = 100, z_2 = 200)))
unnest(a, "opt_x")
# >    x y z_1 z_2
# > 1: 1 0   0   0

a = data.table(z = list(0), opt_x = list(list(z_1 = 100, z_2 = 200)))
unnest(a, "opt_x")
# >    x z z_1 z_2
# > 1: 1 0 100 200

The bug occurs when the unnested columns are added to the data.table with rcbind since y is used as the variable name for the unnested columns.

rcbind
function (x, y) 
{
    assert_data_table(x)
    assert_data_table(y)
    if (ncol(x) == 0L) {
        return(y)
    }
    if (ncol(y) == 0L) {
        return(x)
    }
    if (nrow(x) != nrow(y)) {
        stopf("Tables have different number of rows (x: %i, y: %i)", 
            nrow(x), nrow(y))
    }
    dup = intersect(names(x), names(y))
    if (length(dup)) {
        stopf("Duplicated names: %s", str_collapse(dup))
    }
    x[, `:=`(names(y), y)][]
}
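
The same kind of name masking can be reproduced without data.table. In this base-R sketch, with() evaluates its expression with the columns of x taking precedence over the variables of the calling function, which mirrors how a column named y can shadow the argument y in the last line of rcbind. The function lookup_y() is made up for the illustration.

```r
# 'y' is both a function argument and (possibly) a column of 'x'.
lookup_y = function(x, y) {
  with(x, y)  # columns of 'x' are searched before the function's own variables
}

masked = lookup_y(data.frame(y = 0), y = "argument")    # column 'y' wins
unmasked = lookup_y(data.frame(z = 0), y = "argument")  # argument 'y' is used
```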

register callback to namespace being loaded

UI something like

register_mlr3 = function() { ... }
register_mlr3pipelines = function() { ... }
.onLoad = function(libname, pkgname) {
  mlr3misc::react_to_namespace("mlr3", register_mlr3)
  mlr3misc::react_to_namespace("mlr3pipelines", register_mlr3pipelines)
}

should look something like

react_to_namespace = function(namespace, callback) {
  if (already_loaded(namespace)) callback()
  setHook(packageEvent(namespace, "onLoad"), .......)
  setHook(packageEvent(namespace_name_of(parent.frame()), "onUnload"), function(...) { < do the onUnload stuff > })
}
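
A runnable sketch of the loading half of this idea, using only base R. The name react_to_namespace_sketch is hypothetical, and the onUnload part from the pseudocode above is deliberately left out.

```r
react_to_namespace_sketch = function(namespace, callback) {
  # Run immediately if the namespace is already loaded ...
  if (isNamespaceLoaded(namespace)) callback()
  # ... and register a hook so that the callback also runs on future loads.
  setHook(packageEvent(namespace, "onLoad"), function(...) callback(), action = "append")
  invisible(NULL)
}

state = new.env()
state$calls = 0L

# "base" is always loaded, so the callback fires right away.
react_to_namespace_sketch("base", function() state$calls = state$calls + 1L)
state$calls
```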

Warn when package is installed with srcrefs

Object sizes can grow very large when packages are installed with the --with-keep.source option. We should give a warning when users have mlr3 installed with srcrefs, and it should be possible to disable this warning using an option.
The warning should point to the FAQ section of the mlr-org website: mlr-org/mlr3website#132
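
One way such a check could work, as a sketch: functions from a package installed with --with-keep.source carry a "srcref" attribute, so testing one function for that attribute is enough. The function name warn_if_srcref and the option name mypkg.warn_srcref are assumptions made for this example.

```r
warn_if_srcref = function(fun, option = "mypkg.warn_srcref") {
  has_srcref = !is.null(attr(fun, "srcref"))
  # The warning is on by default and can be switched off via the option.
  if (has_srcref && isTRUE(getOption(option, default = TRUE))) {
    warning("Function carries srcrefs; serialized objects may become large. ",
      "Set options(", option, " = FALSE) to silence this warning.", call. = FALSE)
  }
  invisible(has_srcref)
}

# Two test functions, created with and without source references.
with_src = eval(parse(text = "function(x) x + 1", keep.source = TRUE))
without_src = eval(parse(text = "function(x) x + 1", keep.source = FALSE))

warn_if_srcref(without_src)  # silent, nothing to report
```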

Problem at installation: undefined symbol

Hi,
I just wanted to install your wonderful mlr3verse package (from CRAN). But it stopped at the installation of mlr3misc with the following error I can not understand:

Error: package or namespace load failed for ‘mlr3misc’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/applvg/os/usr/lib64/R/library/mlr3misc/libs/mlr3misc.so':
/applvg/os/usr/lib64/R/library/mlr3misc/libs/mlr3misc.so: undefined symbol: REAL_RO
Error: loading failed
Execution halted
ERROR: loading failed

Further, I get the following warnings:

gcc -m64 -std=gnu99 -I/applvg/os/apps/gcc/gmp/include -I/applvg/os/apps/gcc/mpfr/include -I/usr/include/R -DNDEBUG -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -c count_missing.c -o count_missing.o
count_missing.c: In function ‘count_missing_logical’:
count_missing.c:11:5: warning: implicit declaration of function ‘LOGICAL_RO’ [-Wimplicit-function-declaration]
const int * xp = LOGICAL_RO(x);
^
count_missing.c:11:22: warning: initialization makes pointer from integer without a cast
const int * xp = LOGICAL_RO(x);
^
count_missing.c: In function ‘count_missing_integer’:
count_missing.c:26:5: warning: implicit declaration of function ‘INTEGER_RO’ [-Wimplicit-function-declaration]
const int * xp = INTEGER_RO(x);
^
count_missing.c:26:22: warning: initialization makes pointer from integer without a cast
const int * xp = INTEGER_RO(x);
^
count_missing.c: In function ‘count_missing_double’:
count_missing.c:41:5: warning: implicit declaration of function ‘REAL_RO’ [-Wimplicit-function-declaration]
const double * xp = REAL_RO(x);
^
count_missing.c:41:25: warning: initialization makes pointer from integer without a cast
const double * xp = REAL_RO(x);
^
count_missing.c: In function ‘count_missing_complex’:
count_missing.c:52:5: warning: implicit declaration of function ‘COMPLEX_RO’ [-Wimplicit-function-declaration]
const Rcomplex * xp = COMPLEX_RO(x);
^
count_missing.c:52:27: warning: initialization makes pointer from integer without a cast
const Rcomplex * xp = COMPLEX_RO(x);
^

I have a Redhat 6 Unix System with R 3.4.3 and my default gcc is a 4.8.5. I also tried with the newer version 4.9.4.
Are there any dependencies I can not see? Do I need a newer gcc?
The version of the package I got from CRAN is 0.9.0

Best
Ulf
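
The implicit-declaration warnings suggest that the headers of this R installation do not declare the read-only accessors (LOGICAL_RO, INTEGER_RO, REAL_RO, COMPLEX_RO) at all, which would point to the R version rather than to gcc. As far as I know these accessors arrived with R 3.5.0; treat that version number as an assumption and check it against the R NEWS file. A quick check from within R:

```r
# Assumed minimum version for the *_RO accessors (see the hedge above).
required = "3.5.0"
r_version_ok = getRversion() >= required

if (!r_version_ok) {
  message("R ", getRversion(), " is older than ", required,
    "; the compiled code of the package is unlikely to load.")
}
```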

Suggestion: `Dictionary$help()` function

For many objects there is lrn("xxx")$help(), but the dictionary can also contain entries that do not map 1-to-1 with R6-classes, such as ppl(), which constructs different Graphs. However, we have the nomenclature of <dictname>_<key> for help files. I therefore suggest that Dictionary$help(<key>) tries to show help for the <dictname>_<key> help entry (and, failing that, tries to construct <key> and call its help() function).

Suggestion: Compound assignment operators

We could define operators such as %+=%, %-=% etc. What I seem to need a lot is

x = c(x, <something>)

so maybe we could have an %c=% operator for that.

See here for my implementation in an old project.
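
A minimal sketch of what a %c=% operator could look like. This is not the implementation linked above; it only supports a plain variable name on the left-hand side and assigns in the calling frame.

```r
`%c=%` = function(lhs, rhs) {
  target = substitute(lhs)
  if (!is.name(target)) {
    stop("left-hand side must be a plain variable name")
  }
  # Overwrite the variable in the caller with the concatenated value.
  assign(as.character(target), c(lhs, rhs), envir = parent.frame())
  invisible(NULL)
}

x = 1:2
x %c=% 3L
x
```

Supporting left-hand sides such as x$field or x[[i]] would need more work, which is one argument for keeping such operators out of a general helper package.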

'transpose' nameclash with data.table transpose

The transpose function name-clashes with the data.table::transpose function. We usually import both these packages, so we end up having to write mlr3misc::transpose, while we usually don't prefix mlr3misc functions. Renaming this function to transpose_list would probably be more convenient. (We would be breaking with purrr function names).

consider adding of "reduce" function

Aggregating lists down to a single element is an important operation.
Base R has Reduce(). I will at least have a look at what purrr does and see whether it is beneficial.
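
For comparison, this is what the base R function already covers: folding a list with a binary function, optionally keeping the intermediate results.

```r
xs = list(1:3, 2:5, 3:9)

# Fold the list with a binary function: elements common to all vectors.
common = Reduce(intersect, xs)

# accumulate = TRUE keeps every intermediate result (here: running sums).
running = Reduce(`+`, 1:4, accumulate = TRUE)
```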

parallelization of jobs with priority queue

Unfortunately the futures package does not offer this (and R seems to suck at synchronized parallelization in general) so we would have to hack this in a way using polling.

I can write this, but I am wondering if this should be in misc or belongs to its own package?
@berndbischl @mllg

map_dtc renames col names

xs = list(a = data.table(foo = 1), b = data.table(bar = 2))
d = map_dtc(xs, identity)
print(d)

leads to

   a.foo b.bar
1:     1     2

This is pretty much always unwanted, I guess. I would suggest either always unnaming the xs list or adding a flag for that, which is TRUE by default.

better name handling in data.table::CJ

i have this code now here

https://github.com/mlr-org/paradox/blob/master/R/generate_design_grid.R

  # the un / renaming sucks a bit here, caused by dotdotdot-interface of CJ. would like to have a better way, but dont know
  # FIXME: mini helper in mlr3misc for this?
  ns = names(res); res = unname(res)
  res = do.call(CJ, c(res, sorted = FALSE))
  set_names(res, ns)

it sucks. problem is that res can contain "sorted".

should we add a helper CJ(list, sorted) ?

I would rather avoid this, but the above sucks a lot and keeping CJ as it is, is buggy and unsafe

About how `$mget()` handles multiple inputs

To prevent the following

library(mlr3learners)
library(mlr3)

foo = mlr_learners$mget("classif.rpart", "classif.svm")

length(foo)
#> [1] 1

Created on 2019-07-29 by the reprex package (v0.3.0)

In this case passing the learners as a vector is required.
Since the possible ellipsis in mget() usually needs to be named, we could somehow account for cases like above and maybe even get rid of the c() wrapping?

map_dtc is unreasonably slow when .f returns data.table

When the function in map_dtc returns a data.table with many rows, map_dtc appears to be slower than it needs to be by a factor of about 100.

system.time(mlr3misc::map_dtc(1:3, function(x) runif(1e6, max = x)))
#>    user  system elapsed 
#>   0.043   0.000   0.044 
system.time(mlr3misc::map_dtc(1:3, function(x) data.table(x = runif(1e6, max = x))))
#>    user  system elapsed 
#>   5.124   0.006   5.147 

profvis tells me that this is because name_dots is called in data.table.

crate() function in mlr3misc

I wrote something similar to carrier::crate here. The difference is that .fn does not need to be a verbatim function definition, with the cost that all values being used need to be declared in .... Do we want this in mlr3misc? I think it would be generally helpful in places where functions are given to objects, e.g. in ParamUty constructor argument custom_check.

Bug in crate function

library(mlr3misc)

l = list(a = 1)

crate(function() print(a), a = l$a)
#> Error in vapply(.x, .f, FUN.VALUE = .value, USE.NAMES = FALSE, ...): values must be length 1,
#>  but FUN(X[[2]]) result is length 3

Created on 2023-08-27 with reprex v2.0.2
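
One plausible cause, stated as an assumption because it is not verified against the mlr3misc source: if the expressions passed in ... are turned into names with something like as.character(), a plain symbol gives one string, while a compound expression such as l$a gives three. That would match the "result is length 3" part of the error message.

```r
simple = as.character(quote(a))      # a symbol converts to a single string
compound = as.character(quote(l$a))  # a call converts to one string per component

length(simple)
length(compound)
```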

Leanification fails when installing package with `--with-keep.source`

The object size below depends on how many packages are loaded before:

library(mlr3)
library(mlr3verse)
library(data.table)

task = tsk("iris")
learner = lrn("classif.rpart")

learner$fallback = lrn("classif.featureless")

learner$train(task)

pth = tempfile(fileext = ".rds")

saveRDS(learner$state, pth)

x = readRDS(pth)

pryr::object_size(x)
#> 19.78 MB

Created on 2023-09-13 with reprex v2.0.2

pmap segfaults with bad parameters?

I think we should sanitize our inputs, even if .mapply doesn't.

> mlr3misc::pmap(1:4, function(x) x)

 *** caught segfault ***
address 0x200000001, cause 'memory not mapped'

Traceback:
 1: .mapply(.f, .x, list(...))
 2: mlr3misc::pmap(1:4, function(x) x)
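
A sketch of the kind of input check meant here. pmap_checked() is a hypothetical wrapper, not the mlr3misc code; it refuses anything that is not a list before .mapply() ever sees it.

```r
pmap_checked = function(.x, .f, ...) {
  if (!is.list(.x)) {
    stop("'.x' must be a list or data.frame of argument vectors, not ", class(.x)[1L])
  }
  .mapply(.f, .x, list(...))
}

# Valid input: the list elements are iterated in parallel.
sums = pmap_checked(list(x = 1:2, y = 3:4), function(x, y) x + y)

# Invalid input now produces a regular R error instead of reaching .mapply().
bad = tryCatch(
  pmap_checked(1:4, function(x) x),
  error = function(e) conditionMessage(e)
)
```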

Release mlr3misc 0.15.0

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Polish NEWS
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • git push
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • Add preemptive link to blog post in pkgdown news menu
  • usethis::use_github_release()
  • usethis::use_dev_version(push = TRUE)
  • Finish blog post
  • Tweet

can we please make transpose S3

I need to do special NA handling in paradox when a generated dt contains NAs in some elements.

When transpose generates the list of row configurations, the user should have the option to filter out NAs quickly (before config evaluation).

For this, I could set a class for the returned dt objects in paradox and then specialize transpose in paradox.

connected issues

mlr-org/paradox#140
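
To illustrate the request, here is a base-R sketch of an S3 generic with a specialised method that drops NA entries from each row. It uses a data.frame instead of a data.table, and the generic rows_to_lists() as well as the class na_design are hypothetical names chosen for the example.

```r
rows_to_lists = function(x, ...) UseMethod("rows_to_lists")

# Default behaviour: one named list per row.
rows_to_lists.data.frame = function(x, ...) {
  lapply(seq_len(nrow(x)), function(i) as.list(x[i, , drop = FALSE]))
}

# Specialised behaviour: reuse the default, then drop NA entries per row.
rows_to_lists.na_design = function(x, ...) {
  lapply(NextMethod(), function(row) row[!vapply(row, is.na, logical(1))])
}

design = data.frame(a = c(1, 2), b = c(NA, 3))
class(design) = c("na_design", class(design))

configs = rows_to_lists(design)
```

With a generic in place, a downstream package only has to set the class on the objects it returns and register its own method.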
