mlr-org / mlr3misc

Miscellaneous helper functions for mlr3

Home Page: https://mlr3misc.mlr-org.com

License: GNU Lesser General Public License v3.0

R 95.20% C 4.80%

mlr3 miscellaneous machine-learning r r-package

mlr3misc's Introduction

mlr3misc

Package website: release | dev

Miscellaneous helper functions for mlr3.

mlr3misc's Issues

insert_named doesn't work as expected for regr.glmnet

According to book chapter 2.3, we can use mlr3misc::insert_named() to set specific arguments without resetting existing ones. However, when I tried it on a learner of class "regr.glmnet", the function didn't appear to work at all:

library(mlr3)
library(mlr3learners)
library(mlr3misc)

learner <- mlr_learners$get("regr.glmnet")
learner$param_set

<ParamSet>
                  id    class lower upper                     levels        default parents    value
 1:        alignment ParamFct    NA    NA            lambda,fraction         lambda                 
 2:            alpha ParamDbl     0     1                                         1                 
 3:              big ParamDbl  -Inf   Inf                                   9.9e+35                 
 4:           devmax ParamDbl     0     1                                     0.999                 
 5:            dfmax ParamInt     0   Inf                            <NoDefault[3]>                 
 6:              eps ParamDbl     0     1                                     1e-06                 
 7:            epsnr ParamDbl     0     1                                     1e-08                 
 8:            exact ParamLgl    NA    NA                 TRUE,FALSE          FALSE                 
 9:          exclude ParamInt     1   Inf                            <NoDefault[3]>                 
10:             exmx ParamDbl  -Inf   Inf                                       250                 
11:           family ParamFct    NA    NA           gaussian,poisson       gaussian         gaussian
12:             fdev ParamDbl     0     1                                     1e-05                 
13:            gamma ParamDbl  -Inf   Inf                                         1   relax         
14:          grouped ParamLgl    NA    NA                 TRUE,FALSE           TRUE                 
15:        intercept ParamLgl    NA    NA                 TRUE,FALSE           TRUE                 
16:             keep ParamLgl    NA    NA                 TRUE,FALSE          FALSE                 
17:           lambda ParamUty    NA    NA                            <NoDefault[3]>                 
18: lambda.min.ratio ParamDbl     0     1                            <NoDefault[3]>                 
19:     lower.limits ParamUty    NA    NA                            <NoDefault[3]>                 
20:            maxit ParamInt     1   Inf                                    100000                 
21:            mnlam ParamInt     1   Inf                                         5                 
22:             mxit ParamInt     1   Inf                                       100                 
23:           mxitnr ParamInt     1   Inf                                        25                 
24:        newoffset ParamUty    NA    NA                            <NoDefault[3]>                 
25:          nlambda ParamInt     1   Inf                                       100                 
26:           offset ParamUty    NA    NA                                                           
27:         parallel ParamLgl    NA    NA                 TRUE,FALSE          FALSE                 
28:   penalty.factor ParamUty    NA    NA                            <NoDefault[3]>                 
29:             pmax ParamInt     0   Inf                            <NoDefault[3]>                 
30:             pmin ParamDbl     0     1                                     1e-09                 
31:             prec ParamDbl  -Inf   Inf                                     1e-10                 
32:    predict.gamma ParamDbl  -Inf   Inf                                         1                 
33:            relax ParamLgl    NA    NA                 TRUE,FALSE          FALSE                 
34:                s ParamDbl     0   Inf                                      0.01                 
35:      standardize ParamLgl    NA    NA                 TRUE,FALSE           TRUE                 
36:           thresh ParamDbl     0   Inf                                     1e-07                 
37:         trace.it ParamInt     0     1                                         0                 
38:    type.gaussian ParamFct    NA    NA           covariance,naive <NoDefault[3]>  family         
39:    type.logistic ParamFct    NA    NA     Newton,modified.Newton <NoDefault[3]>                 
40:     type.measure ParamFct    NA    NA deviance,class,auc,mse,mae       deviance                 
41: type.multinomial ParamFct    NA    NA          ungrouped,grouped <NoDefault[3]>                 
42:     upper.limits ParamUty    NA    NA                            <NoDefault[3]>                 
                  id    class lower upper                     levels        default parents    value

## insert new params
insert_named(learner$param_set$values,
             list(family="poisson",
                  lambda=0))
learner$param_set

<ParamSet>
                  id    class lower upper                     levels        default parents    value
 1:        alignment ParamFct    NA    NA            lambda,fraction         lambda                 
 2:            alpha ParamDbl     0     1                                         1                 
 3:              big ParamDbl  -Inf   Inf                                   9.9e+35                 
 4:           devmax ParamDbl     0     1                                     0.999                 
 5:            dfmax ParamInt     0   Inf                            <NoDefault[3]>                 
 6:              eps ParamDbl     0     1                                     1e-06                 
 7:            epsnr ParamDbl     0     1                                     1e-08                 
 8:            exact ParamLgl    NA    NA                 TRUE,FALSE          FALSE                 
 9:          exclude ParamInt     1   Inf                            <NoDefault[3]>                 
10:             exmx ParamDbl  -Inf   Inf                                       250                 
11:           family ParamFct    NA    NA           gaussian,poisson       gaussian         gaussian
12:             fdev ParamDbl     0     1                                     1e-05                 
13:            gamma ParamDbl  -Inf   Inf                                         1   relax         
14:          grouped ParamLgl    NA    NA                 TRUE,FALSE           TRUE                 
15:        intercept ParamLgl    NA    NA                 TRUE,FALSE           TRUE                 
16:             keep ParamLgl    NA    NA                 TRUE,FALSE          FALSE                 
17:           lambda ParamUty    NA    NA                            <NoDefault[3]>                 
18: lambda.min.ratio ParamDbl     0     1                            <NoDefault[3]>                 
19:     lower.limits ParamUty    NA    NA                            <NoDefault[3]>                 
20:            maxit ParamInt     1   Inf                                    100000                 
21:            mnlam ParamInt     1   Inf                                         5                 
22:             mxit ParamInt     1   Inf                                       100                 
23:           mxitnr ParamInt     1   Inf                                        25                 
24:        newoffset ParamUty    NA    NA                            <NoDefault[3]>                 
25:          nlambda ParamInt     1   Inf                                       100                 
26:           offset ParamUty    NA    NA                                                           
27:         parallel ParamLgl    NA    NA                 TRUE,FALSE          FALSE                 
28:   penalty.factor ParamUty    NA    NA                            <NoDefault[3]>                 
29:             pmax ParamInt     0   Inf                            <NoDefault[3]>                 
30:             pmin ParamDbl     0     1                                     1e-09                 
31:             prec ParamDbl  -Inf   Inf                                     1e-10                 
32:    predict.gamma ParamDbl  -Inf   Inf                                         1                 
33:            relax ParamLgl    NA    NA                 TRUE,FALSE          FALSE                 
34:                s ParamDbl     0   Inf                                      0.01                 
35:      standardize ParamLgl    NA    NA                 TRUE,FALSE           TRUE                 
36:           thresh ParamDbl     0   Inf                                     1e-07                 
37:         trace.it ParamInt     0     1                                         0                 
38:    type.gaussian ParamFct    NA    NA           covariance,naive <NoDefault[3]>  family         
39:    type.logistic ParamFct    NA    NA     Newton,modified.Newton <NoDefault[3]>                 
40:     type.measure ParamFct    NA    NA deviance,class,auc,mse,mae       deviance                 
41: type.multinomial ParamFct    NA    NA          ungrouped,grouped <NoDefault[3]>                 
42:     upper.limits ParamUty    NA    NA                            <NoDefault[3]>                 
                  id    class lower upper                     levels        default parents    value

However, it does work when I overwrite the values directly:

learner$param_set$values <- list(family="poisson",
                                 lambda=0)
learner$param_set

<ParamSet>
                  id    class lower upper                     levels        default parents   value
 1:        alignment ParamFct    NA    NA            lambda,fraction         lambda                
 2:            alpha ParamDbl     0     1                                         1                
 3:              big ParamDbl  -Inf   Inf                                   9.9e+35                
 4:           devmax ParamDbl     0     1                                     0.999                
 5:            dfmax ParamInt     0   Inf                            <NoDefault[3]>                
 6:              eps ParamDbl     0     1                                     1e-06                
 7:            epsnr ParamDbl     0     1                                     1e-08                
 8:            exact ParamLgl    NA    NA                 TRUE,FALSE          FALSE                
 9:          exclude ParamInt     1   Inf                            <NoDefault[3]>                
10:             exmx ParamDbl  -Inf   Inf                                       250                
11:           family ParamFct    NA    NA           gaussian,poisson       gaussian         poisson
12:             fdev ParamDbl     0     1                                     1e-05                
13:            gamma ParamDbl  -Inf   Inf                                         1   relax        
14:          grouped ParamLgl    NA    NA                 TRUE,FALSE           TRUE                
15:        intercept ParamLgl    NA    NA                 TRUE,FALSE           TRUE                
16:             keep ParamLgl    NA    NA                 TRUE,FALSE          FALSE                
17:           lambda ParamUty    NA    NA                            <NoDefault[3]>               0
18: lambda.min.ratio ParamDbl     0     1                            <NoDefault[3]>                
19:     lower.limits ParamUty    NA    NA                            <NoDefault[3]>                
20:            maxit ParamInt     1   Inf                                    100000                
21:            mnlam ParamInt     1   Inf                                         5                
22:             mxit ParamInt     1   Inf                                       100                
23:           mxitnr ParamInt     1   Inf                                        25                
24:        newoffset ParamUty    NA    NA                            <NoDefault[3]>                
25:          nlambda ParamInt     1   Inf                                       100                
26:           offset ParamUty    NA    NA                                                          
27:         parallel ParamLgl    NA    NA                 TRUE,FALSE          FALSE                
28:   penalty.factor ParamUty    NA    NA                            <NoDefault[3]>                
29:             pmax ParamInt     0   Inf                            <NoDefault[3]>                
30:             pmin ParamDbl     0     1                                     1e-09                
31:             prec ParamDbl  -Inf   Inf                                     1e-10                
32:    predict.gamma ParamDbl  -Inf   Inf                                         1                
33:            relax ParamLgl    NA    NA                 TRUE,FALSE          FALSE                
34:                s ParamDbl     0   Inf                                      0.01                
35:      standardize ParamLgl    NA    NA                 TRUE,FALSE           TRUE                
36:           thresh ParamDbl     0   Inf                                     1e-07                
37:         trace.it ParamInt     0     1                                         0                
38:    type.gaussian ParamFct    NA    NA           covariance,naive <NoDefault[3]>  family        
39:    type.logistic ParamFct    NA    NA     Newton,modified.Newton <NoDefault[3]>                
40:     type.measure ParamFct    NA    NA deviance,class,auc,mse,mae       deviance                
41: type.multinomial ParamFct    NA    NA          ungrouped,grouped <NoDefault[3]>                
42:     upper.limits ParamUty    NA    NA                            <NoDefault[3]>                
                  id    class lower upper                     levels        default parents   value

I'm running R 4.0.2, mlr3 0.9.0, mlr3learners 0.4.3, mlr3misc 0.6.0.
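
A possible explanation, sketched under assumptions: for plain lists, insert_named() appears to return a modified copy instead of changing its argument (the set_params snippet further down this page assigns the result back, and another issue notes that only data.table and environment inputs are modified in place). The helper insert_named_sketch below is a hypothetical base-R stand-in, not the mlr3misc implementation. It only shows why discarding the return value leaves the stored values untouched.

```r
# Hypothetical stand-in for insert_named() on lists (base R only).
insert_named_sketch <- function(x, y) {
  x[names(y)] <- y   # modifies the local copy of x
  x                  # the caller has to keep this return value
}

values <- list(family = "gaussian")

# Return value discarded: 'values' is unchanged.
insert_named_sketch(values, list(family = "poisson", lambda = 0))
values_after_discard <- values

# Return value assigned back: 'values' is updated.
values <- insert_named_sketch(values, list(family = "poisson", lambda = 0))
```

With the real function, the analogous call would presumably be learner$param_set$values = insert_named(learner$param_set$values, list(family = "poisson", lambda = 0)).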

Release mlr3misc 0.14.0

Prepare for release:

Submit to CRAN:

usethis::use_version('minor')
devtools::submit_cran()
Approve email

Wait for CRAN...

now i need flatten

I want to peel off exactly one level of a list of lists.

So this input
x = list(list(iris[1:2,]), list(iris[1:2,]))

should become
list(iris[1:2,], iris[1:2,])
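
For reference, base R can already peel off one level with unlist(recursive = FALSE). The snippet below is a minimal sketch of the requested behaviour, not a proposal for the final function.

```r
# A list of lists, each holding one data.frame.
x <- list(list(iris[1:2, ]), list(iris[1:2, ]))

# recursive = FALSE removes only the outer level of nesting,
# so the data.frames themselves stay intact.
flat <- unlist(x, recursive = FALSE)

length(flat)
is.data.frame(flat[[1]])
```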

dictionary_sugar_get seems to fail when dictionary entries are functions

library("mlr3")
rsmp("holdout", x = 1)
#> Error: object of type 'closure' is not subsettable
#> Environment:
#>   1: rsmp("holdout", x = 1)
#>   2: dictionary_sugar_get(mlr_resamplings, .key, ...)
#>   3: stopf("Cannot set argument '%s' for '%s' (not a constructor argument, not a
#>   4: stop(simpleError(str_wrap(sprintf(msg, ...), width = wrap), call = NULL))
#>   5: simpleError(str_wrap(sprintf(msg, ...), width = wrap), call = NULL)
#>   6: structure(list(message = as.character(message), call = call), class = class
#>   7: str_wrap(sprintf(msg, ...), width = wrap)
#>   8: sprintf(msg, ...)
#>   9: did_you_mean(nn, c(constructor_args, param_ids, fields(obj$value)))
#>   10: unique(candidates)
#>   11: fields(obj$value)
#>   12: setdiff(names(x$public_methods), c("initialize", "clone", "print", "format"
#>   13: as.vector(x)

The problem seems to be that the entries behind rsmp() are functions and not R6 constructors, which is explicitly allowed for dictionaries.

if we are at it, we might add "rerun"

this is replicate(simplify = FALSE)
R's default sucks
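
To make the complaint concrete, here is a small base-R comparison. With the default simplify = "array", replicate() collapses equal-length results into a matrix, while simplify = FALSE always returns a list.

```r
set.seed(1)

# Default: two numbers per run are simplified into a 2 x 3 matrix.
simplified <- replicate(3, rnorm(2))

# simplify = FALSE: one list element per run, whatever the result type.
runs <- replicate(3, rnorm(2), simplify = FALSE)

is.matrix(simplified)
is.list(runs)
```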

need something that removes environments from functions

I want to save an anonymous function in an object and serialize that object, but that object should not grow huge. It is then necessary that the function's environment hierarchy doesn't contain too much other stuff.

Usecase:

x = 1
large_obj = numeric(1e7)  # stand-in for a huge object that takes lots of memory
want_to_serialize = function(arg) return(arg + x)
saveRDS(want_to_serialize, "savefile.rds")

want_to_serialize's environment now contains large_obj and the savefile grows large. Instead I would want a function detachEnv(fun, keep, basis):

want_to_serialize_lean = detachEnv(want_to_serialize, keep = "x")
saveRDS(want_to_serialize_lean, "savefile.rds")

code for this:

# the assert*() functions below come from the checkmate package
library(checkmate)

# removes the environment from fun and potentially wraps it in a new environment
detachEnv <- function(fun, keep = character(0), basis = topenv(parent.frame())) {
  assertEnvironment(basis)
  assertCharacter(keep, any.missing = FALSE)
  assertFunction(fun)
  if (length(keep)) {
    keepvals <- mget(keep, parent.frame(), inherits = TRUE)
    basis <- new.env(parent = basis, size = length(keepvals))
    mapply(assign, names(keepvals), keepvals, MoreArgs = list(envir = basis))
  }
  environment(fun) <- basis
  fun
}

name up for debate
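
The size effect can be measured with base R alone. The sketch below is an illustration under assumptions: make_fun is a hypothetical factory whose frame holds a large object, and the lean variant is built by hand instead of calling detachEnv.

```r
# The closure returned here captures the whole frame, including large_obj.
make_fun <- function() {
  x <- 1
  large_obj <- numeric(1e6)
  function(arg) arg + x
}
f <- make_fun()
size_full <- length(serialize(f, NULL))

# Rebuild the function on a fresh environment that keeps only 'x'.
lean_env <- new.env(parent = globalenv())
assign("x", get("x", envir = environment(f)), envir = lean_env)
f_lean <- f
environment(f_lean) <- lean_env
size_lean <- length(serialize(f_lean, NULL))

c(full = size_full, lean = size_lean)
```

The lean function still evaluates correctly because the one variable it needs was copied over.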

insert_named() treat NULL like empty list / vector / whatever

Let insert_named(NULL, x) just be x (assuming x is named)?

I can't download mlr3 onto my Rstudio

I tried to install it on my rstudio, but it just won't work. I did:

install.packages("mlr3verse")
install.packages("mlr3")

Every time I run library(mlr3verse) or library(mlr3), I get this error:
Error: package or namespace load failed for ‘mlr3verse’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):
there is no package called ‘mlr3misc’

I have the latest Rstudio, is there something else that's wrong?

Dictionaries are marked as unclonable

why? i just wanted to clone one.
can we please at least document the reason?

Automatic cleanup when unloading a package

As mlr3 populates dictionaries from packages like mlr3 or mlr3pipelines when loading extensions, those objects should be removed again when unloading the extension.
An idea brought forward by Lukas is to automate this, which I think is a good idea:
mlr-org/mlr3proba#301 (comment)

A simple solution would be to tag the dictionary entries with the package that added them.
Then, during the .onUnload function, we can remove all objects that were added by the package that is being unloaded.
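
The tagging idea could look roughly like this. It is a minimal sketch on a plain environment: registry, add_entry and remove_entries_of are hypothetical names and not part of the Dictionary class.

```r
# A plain environment plays the role of the dictionary.
registry <- new.env()

# Store each entry together with the package that added it.
add_entry <- function(key, value, pkg) {
  assign(key, list(value = value, pkg = pkg), envir = registry)
}

# Remove every entry tagged with 'pkg'; intended to be called from .onUnload.
remove_entries_of <- function(pkg) {
  keys <- ls(registry)
  owned <- keys[vapply(keys, function(k) identical(get(k, envir = registry)$pkg, pkg), logical(1))]
  rm(list = owned, envir = registry)
  invisible(owned)
}

add_entry("classif.base", "base learner", pkg = "mlr3")
add_entry("surv.extra", "extension learner", pkg = "mlr3proba")
remove_entries_of("mlr3proba")
ls(registry)
```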

backports version

mlr3misc uses backports::errorCondition, so to the degree it imports backports, it should require the backports version that introduced that. Otherwise it's very hard for a user who happens to have an old backports version installed to find out what the problem is.

BBmisc::chunk()

I could use this in the pipelines.
Is it broken, or does it make sense to copy this?

map_dtc doesn't work when the mapped call returns data.tables

i would like to do this

library(data.table)
library(mlr3misc)

foo = function(mu) {
  setDT(as.data.frame(rnorm(2, mean = mu)))
}
d = map_dtc(1:3, foo) # doesn't work
print(d)

# this works
d = lapply(1:3, foo)
d = do.call(cbind, d)
print(d)

set_params

I think we should really offer this set_params public method for Learners, Graphs etc. so we can use it in the book.
Last time @mllg mentioned that we could just offer an S3 method that does something akin to

set_params.R6 = function(x, ...) {
  if (!is.null(x$param_set)) {
    x$param_set$values = insert_named(x$param_set$values, list(...))
  } 
  invisible(x)
}

Admittedly, that is less work and we could simply add it in mlr3misc without touching other packages. But I think the downside is that this is somewhat inconsistent: the user then has to guess whether something is implemented as a method or as an S3 function (?). I think offering both would be fine as well, but I definitely think that the following should work.

learner$set_params(x = 1, y = 2)

This is also related to the suggestion that @mb706 made at some point, that it might make sense to introduce another class into the hierarchy that represents an object that has a param set and some other standardized properties? But even if this would make sense it would definitely take some time so I would suggest we make an mlr3misc release with this set_params utility that is already implemented and then add the method set_params to

I am not sure whether I forgot something.
I can do this if you are ok @mllg @be-marc

unnest() fails if column name y is present

a = data.table(y = list(0), opt_x = list(list(z_1 = 100, z_2 = 200)))
unnest(a, "opt_x")
# >    x y z_1 z_2
# > 1: 1 0   0   0

a = data.table(z = list(0), opt_x = list(list(z_1 = 100, z_2 = 200)))
unnest(a, "opt_x")
# >    x z z_1 z_2
# > 1: 1 0 100 200

The bug occurs when the unnested columns are added to the data.table with rcbind: rcbind names its second argument y, and inside x[, ...] that name clashes with the existing column y.

rcbind
function (x, y) 
{
    assert_data_table(x)
    assert_data_table(y)
    if (ncol(x) == 0L) {
        return(y)
    }
    if (ncol(y) == 0L) {
        return(x)
    }
    if (nrow(x) != nrow(y)) {
        stopf("Tables have different number of rows (x: %i, y: %i)", 
            nrow(x), nrow(y))
    }
    dup = intersect(names(x), names(y))
    if (length(dup)) {
        stopf("Duplicated names: %s", str_collapse(dup))
    }
    x[, `:=`(names(y), y)][]
}
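
The masking can be imitated in base R. This is only an illustration, assuming (as the output above suggests) that the expression inside x[, ...] sees the columns of x before the arguments of rcbind. It uses eval() on a data.frame and is not data.table's actual evaluation code.

```r
# Plays the role of rcbind's argument 'y' (the unnested columns).
y <- list(z_1 = 100, z_2 = 200)

# A table that happens to have a column named 'y'.
x <- data.frame(y = 0)

# Evaluating the symbol 'y' with the table's columns in front
# finds the column, not the outer variable.
masked <- eval(quote(y), x, environment())

# Looking it up in the surrounding environment gives the intended object.
unmasked <- get("y", envir = environment())

masked
unmasked$z_1
```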

named_list() should produce an empty, named list

right now it returns an error

Partial argument matching for dictionary sugar get

For example, a learner parameter or field whose name partially matches dict cannot be set in a lrn() call, because its value is used as the dict argument instead.
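
The effect can be reproduced with plain functions. In this sketch, sugar_get_sketch, lrn_sketch and their "exact" variants are hypothetical stand-ins. It relies on two rules of R argument matching: formals before ... can be matched partially, while formals after ... must be named in full.

```r
# 'dict' comes before '...', so a named argument 'di' partially matches it.
sugar_get_sketch <- function(dict, .key, ...) {
  list(dict = dict, key = .key, dots = list(...))
}
lrn_sketch <- function(.key, ...) sugar_get_sketch("learner_dictionary", .key, ...)

res <- lrn_sketch("some_learner", di = 5)
res$dict   # the parameter value ended up here

# Formals placed after '...' are only matched by their full name.
sugar_get_exact <- function(..., dict, .key) {
  list(dict = dict, key = .key, dots = list(...))
}
lrn_exact <- function(.key, ...) {
  sugar_get_exact(dict = "learner_dictionary", .key = .key, ...)
}

res_exact <- lrn_exact("some_learner", di = 5)
res_exact$dots
```

Moving the formals behind ... is just one way to avoid the clash; whether it fits the existing interface is a separate question.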

register callback to namespace being loaded

UI something like

register_mlr3 = function() { ... }
register_mlr3pipelines = function() { ... }
.onLoad = function(libname, pkgname) {
  mlr3misc::react_to_namespace("mlr3", register_mlr3)
  mlr3misc::react_to_namespace("mlr3pipelines", register_mlr3pipelines)
}

should look something like

react_to_namespace = function(namespace, callback) {
  if (already_loaded(namespace)) callback()
  setHook(packageEvent(namespace, "onLoad"), .......)
  setHook(packageEvent(namespace_name_of(parent.frame()), "onUnload"), function(...) { < do the onUnload stuff > })
}
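
A runnable sketch of the load-time half, using only base R (isNamespaceLoaded(), setHook(), packageEvent()). The name react_to_namespace_sketch is hypothetical, and the unload handling from the outline above is deliberately left out.

```r
react_to_namespace_sketch <- function(namespace, callback) {
  # Run immediately if the namespace is already there ...
  if (isNamespaceLoaded(namespace)) {
    callback()
  }
  # ... and register a hook so the callback also runs on future loads.
  setHook(packageEvent(namespace, "onLoad"), function(...) callback(), action = "append")
  invisible(NULL)
}

# Usage: 'base' is always loaded, so the callback fires right away.
calls <- 0L
react_to_namespace_sketch("base", function() calls <<- calls + 1L)
calls
```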

map_at does not work for data.frames or data.tables

toposort: is "depth" or layer really mathematically defined? if not should be removed?

Warn when package is installed with srcrefs

Object sizes can grow very large when packages are installed with the --with-keep.source option. We should give a warning when users have mlr3 installed with srcrefs. It should be possible to disable this warning using an option.
The warning should point to the FAQ section of the mlr-org website: mlr-org/mlr3website#132
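
One way such a check could work, sketched with hypothetical names (warn_if_srcref_sketch and the option sketch.warn_srcref are made up for this example): look for the "srcref" attribute on a function and warn unless the option is switched off.

```r
warn_if_srcref_sketch <- function(fun, option_name = "sketch.warn_srcref") {
  has_srcref <- !is.null(attr(fun, "srcref"))
  # The option defaults to TRUE, so the warning is opt-out.
  if (has_srcref && isTRUE(getOption(option_name, default = TRUE))) {
    warning("Function carries source references; saved objects may be larger than expected.",
      call. = FALSE)
  }
  invisible(has_srcref)
}

# Two otherwise identical functions, parsed with and without source references.
with_src <- eval(parse(text = "function(x) x + 1", keep.source = TRUE))
without_src <- eval(parse(text = "function(x) x + 1", keep.source = FALSE))

has_src_quiet <- suppressWarnings(warn_if_srcref_sketch(with_src))
has_src_none <- warn_if_srcref_sketch(without_src)
```

In a package, the check would presumably be applied to one of the package's own functions at load time.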

Problem at installation: undefined symbol

Hi,
I just wanted to install your wonderful mlr3verse package (from CRAN). But it stopped at the installation of mlr3misc with the following error I can not understand:

Error: package or namespace load failed for ‘mlr3misc’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/applvg/os/usr/lib64/R/library/mlr3misc/libs/mlr3misc.so':
/applvg/os/usr/lib64/R/library/mlr3misc/libs/mlr3misc.so: undefined symbol: REAL_RO
Error: loading failed
Execution halted
ERROR: loading failed

Further, I get the following warnings:

gcc -m64 -std=gnu99 -I/applvg/os/apps/gcc/gmp/include -I/applvg/os/apps/gcc/mpfr/include -I/usr/include/R -DNDEBUG -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -c count_missing.c -o count_missing.o
count_missing.c: In function ‘count_missing_logical’:
count_missing.c:11:5: warning: implicit declaration of function ‘LOGICAL_RO’ [-Wimplicit-function-declaration]
const int * xp = LOGICAL_RO(x);
^
count_missing.c:11:22: warning: initialization makes pointer from integer without a cast
const int * xp = LOGICAL_RO(x);
^
count_missing.c: In function ‘count_missing_integer’:
count_missing.c:26:5: warning: implicit declaration of function ‘INTEGER_RO’ [-Wimplicit-function-declaration]
const int * xp = INTEGER_RO(x);
^
count_missing.c:26:22: warning: initialization makes pointer from integer without a cast
const int * xp = INTEGER_RO(x);
^
count_missing.c: In function ‘count_missing_double’:
count_missing.c:41:5: warning: implicit declaration of function ‘REAL_RO’ [-Wimplicit-function-declaration]
const double * xp = REAL_RO(x);
^
count_missing.c:41:25: warning: initialization makes pointer from integer without a cast
const double * xp = REAL_RO(x);
^
count_missing.c: In function ‘count_missing_complex’:
count_missing.c:52:5: warning: implicit declaration of function ‘COMPLEX_RO’ [-Wimplicit-function-declaration]
const Rcomplex * xp = COMPLEX_RO(x);
^
count_missing.c:52:27: warning: initialization makes pointer from integer without a cast
const Rcomplex * xp = COMPLEX_RO(x);
^

I have a Redhat 6 Unix System with R 3.4.3 and my default gcc is a 4.8.5. I also tried with the newer version 4.9.4.
Are there any dependencies I can not see? Do I need a newer gcc?
The version of the package I got from CRAN is 0.9.0

Best
Ulf

implement transpose

Just .mapply(list, design, list())?

So we can remove: mlr-org/paradox#115
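
A quick check of what the one-liner does, with a made-up design given as a named list of equal-length columns:

```r
# Column-wise input: one vector per parameter.
design <- list(a = 1:2, b = c("x", "y"))

# .mapply() calls list() once per position, using the column names
# as argument names, which yields one named list per row.
rows <- .mapply(list, design, list())

str(rows)
```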

Suggestion: `Dictionary$help()` function

For many objects there is lrn("xxx")$help(), but the dictionary can also contain entries that do not map 1-to-1 with R6-classes, such as ppl(), which constructs different Graphs. However, we have the nomenclature of <dictname>_<key> for help files. I therefore suggest that Dictionary$help(<key>) tries to show help for the <dictname>_<key> help entry (and, failing that, tries to construct <key> and call its help() function).

Suggestion: Compound assignment operators

We could define operators such as %+=%, %-=% etc. What I seem to need a lot is

x = c(x, <something>)

so maybe we could have an %c=% operator for that.

See here for my implementation in an old project.
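
A minimal sketch of how %c=% might be defined in base R. This is not the linked implementation. It captures the left-hand side unevaluated and performs the assignment in the caller's frame. The right-hand side is evaluated first and embedded as a value, so it should not itself be a language object.

```r
`%c=%` <- function(lhs, rhs) {
  target <- substitute(lhs)                  # e.g. the symbol x or the call l$v
  assignment <- call("<-", target, call("c", target, rhs))
  invisible(eval.parent(assignment))         # runs: target <- c(target, rhs)
}

x <- 1:2
x %c=% 3L
x

# Works for list elements too, since the left-hand side is kept as an expression.
l <- list(v = "a")
l$v %c=% "b"
l$v
```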

'transpose' nameclash with data.table transpose

The transpose function name-clashes with the data.table::transpose function. We usually import both these packages, so we end up having to write mlr3misc::transpose, while we usually don't prefix mlr3misc functions. Renaming this function to transpose_list would probably be more convenient. (We would be breaking with purrr function names).

can we please have a compose function?

also see this here
mlr-org/paradox#245

consider adding of "reduce" function

Aggregating lists down to a single element is an important operation.
base R has "Reduce". I will at least have a look at what purrr does and see if it is beneficial
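
For comparison, this is what base R's Reduce() offers today:

```r
# Fold a list into one element by repeatedly applying a binary function.
parts <- list(c(a = 1), c(b = 2), c(c = 3))
combined <- Reduce(c, parts)

# accumulate = TRUE keeps the intermediate results as well.
running <- Reduce(`+`, 1:4, accumulate = TRUE)

combined
running
```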

modify from purrr

https://purrr.tidyverse.org/reference/modify.html

This is useful for some graph modifications in mlr3pipelines

my mistake

maybe insert_named should not be in-place for data.table / environment

it is not for any other type. maybe make this optional

parallelization of jobs with priority queue

Unfortunately the future package does not offer this (and R seems to suck at synchronized parallelization in general), so we would have to hack this using polling.

I can write this, but I am wondering if this should be in misc or belongs to its own package?
@berndbischl @mllg

what is the best way to map an R6 method over a list of objects

`crate()` function seems to change behaviour of the byte code compiler

library(mlr3misc)

f = function() NULL
fc = crate(f)

f()
#> NULL
f()
#> NULL

fc()
#> NULL
fc()
#> NULL

f
#> function() NULL
#> <bytecode: 0x55fcfee0baa0>
fc
#> function() NULL
#> <environment: 0x55fcfec76550>

Created on 2023-10-30 with reprex v2.0.2

can we get has_element?

we currently use code like this instead

any(map_lgl(xs, identical, x))
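
A base-R sketch of what such a helper could look like. The name has_element_sketch is hypothetical and vapply() stands in for map_lgl().

```r
# TRUE if any element of the list xs is identical() to x.
has_element_sketch <- function(xs, x) {
  any(vapply(xs, identical, logical(1), y = x))
}

xs <- list(1, "a", 1:3)
has_element_sketch(xs, 1:3)   # an identical element exists
has_element_sketch(xs, 1L)    # 1 (double) is not identical to 1L (integer)
```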

map_dtc renames col names

xs = list(a = data.table(foo = 1), b = data.table(bar = 2))
d = map_dtc(xs, identity)
print(d)

leads to

   a.foo b.bar
1:     1     2

this is pretty much always unwanted i guess. i would suggest to either always unname the xs list, or have a flag for that, which is by default true

better name handling in data.table::CJ

i have this code now here

https://github.com/mlr-org/paradox/blob/master/R/generate_design_grid.R

  # the un / renaming sucks a bit here, caused by dotdotdot-interface of CJ. would like to have a better way, but dont know
  # FIXME: mini helper in mlr3misc for this?
  ns = names(res); res = unname(res)
  res = do.call(CJ, c(res, sorted = FALSE))
  set_names(res, ns)

it sucks. problem is that res can contain "sorted".

should we add a helper CJ(list, sorted) ?

I would rather avoid this, but the above sucks a lot and keeping CJ as it is, is buggy and unsafe

About how `$mget()` handles multiple inputs

To prevent the following

library(mlr3learners)
library(mlr3)

foo = mlr_learners$mget("classif.rpart", "classif.svm")

length(foo)
#> [1] 1

Created on 2019-07-29 by the reprex package (v0.3.0)

In this case passing the learners as a vector is required.
Since the possible ellipsis in mget() usually needs to be named, we could somehow account for cases like above and maybe even get rid of the c() wrapping?

chunk has wrong docs

it says chunks takes a vec, it takes a length arg

lrn(), msr(), po(), ... could print all available keys when called without an argument.

Unfortunately we can't do tab completion, but this is the closest thing

crate .parent default should be topenv() of calling environment

map_dtc is unreasonably slow when .f returns data.table

When the function in map_dtc returns a data.table with many rows, map_dtc appears to be slower than it needs to be by a factor of about 100.

system.time(mlr3misc::map_dtc(1:3, function(x) runif(1e6, max = x)))
#>    user  system elapsed 
#>   0.043   0.000   0.044 
system.time(mlr3misc::map_dtc(1:3, function(x) data.table(x = runif(1e6, max = x))))
#>    user  system elapsed 
#>   5.124   0.006   5.147

profvis tells me that this is because name_dots is called in data.table.

Rename print helpers, implement more

crate() function in mlr3misc

I wrote something similar to carrier::crate here. The difference is that .fn does not need to be a verbatim function definition, with the cost that all values being used need to be declared in .... Do we want this in mlr3misc? I think it would be generally helpful in places where functions are given to objects, e.g. in ParamUty constructor argument custom_check.
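
To illustrate the idea (this is a sketch, not the linked code): every value the function needs is passed by name in ..., and the function is re-homed in a fresh environment that contains only those values. The name crate_sketch and the globalenv() default for .parent are assumptions made for this example.

```r
crate_sketch <- function(.fn, ..., .parent = globalenv()) {
  values <- list(...)
  env <- new.env(parent = .parent)
  # Copy only the declared values into the new environment.
  for (name in names(values)) {
    assign(name, values[[name]], envir = env)
  }
  environment(.fn) <- env
  .fn
}

make_adder <- function() {
  offset <- 10
  unrelated <- numeric(1e5)   # not declared, so it is not carried along
  crate_sketch(function(z) z + offset, offset = offset)
}

adder <- make_adder()
adder(1)
ls(environment(adder))
```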

Bug in crate function

library(mlr3misc)

l = list(a = 1)

crate(function() print(a), a = l$a)
#> Error in vapply(.x, .f, FUN.VALUE = .value, USE.NAMES = FALSE, ...): values must be length 1,
#>  but FUN(X[[2]]) result is length 3

Created on 2023-08-27 with reprex v2.0.2

Leanification fails when installing packages with `--with-keep.source`

The object size below depends on how many packages are loaded before

library(mlr3)
library(mlr3verse)
library(data.table)

task = tsk("iris")
learner = lrn("classif.rpart")

learner$fallback = lrn("classif.featureless")

learner$train(task)

pth = tempfile(fileext = ".rds")

saveRDS(learner$state, pth)

x = readRDS(pth)

pryr::object_size(x)
#> 19.78 MB

Created on 2023-09-13 with reprex v2.0.2

unnest: does not work with non-scalar columns

it is the underlying issue reported here
mlr-org/mlr3tuning#201

i guess the case / problem is very clear

a) we document this at least as a restriction that unnesting non-scalar columns is not possible
b) probably better: unnest the rest into individual list-columns

pmap segfaults with bad parameters?

I think we should sanitize our inputs, even if .mapply doesn't.

> mlr3misc::pmap(1:4, function(x) x)

 *** caught segfault ***
address 0x200000001, cause 'memory not mapped'

Traceback:
 1: .mapply(.f, .x, list(...))
 2: mlr3misc::pmap(1:4, function(x) x)
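
A sketch of the kind of input check meant here. The name pmap_checked_sketch is hypothetical and the real fix may well use different assertions. The point is only to reject a non-list .x before it reaches .mapply().

```r
pmap_checked_sketch <- function(.x, .f, ...) {
  # .mapply() expects a list of argument vectors; fail early otherwise.
  if (!is.list(.x)) {
    stop("'.x' must be a list of argument vectors, got: ", class(.x)[1L], call. = FALSE)
  }
  .mapply(.f, .x, list(...))
}

# Valid input: element-wise sums of two vectors.
ok <- pmap_checked_sketch(list(1:2, 3:4), function(a, b) a + b)

# Invalid input now gives a regular R error that can be caught.
bad <- tryCatch(
  pmap_checked_sketch(1:4, function(x) x),
  error = function(e) conditionMessage(e)
)
```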

Release mlr3misc 0.15.0

Prepare for release:

Submit to CRAN:

usethis::use_version('minor')
devtools::submit_cran()
Approve email

Wait for CRAN...

can we please make transpose S3

I need to do special NA handling in paradox when a generated dt contains NAs in some elements.

When transpose generates the list of row configurations, the user should have the option to filter out NAs quickly (before config evaluation).

For this, I could set a class for returned dt objects in paradox and then specialize transpose in paradox.

connected issues

mlr-org/paradox#140
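
Roughly what the S3 variant could look like, as a sketch with hypothetical names (transpose_sketch, class na_design). The default method turns columns into row lists, and the specialized method drops NA entries from each row afterwards.

```r
transpose_sketch <- function(x, ...) UseMethod("transpose_sketch")

# Default: one named list per row.
transpose_sketch.default <- function(x, ...) {
  .mapply(list, unclass(x), list())
}

# Specialization for a design class: remove scalar NA entries per row.
transpose_sketch.na_design <- function(x, ...) {
  rows <- NextMethod()
  lapply(rows, function(row) {
    keep <- !vapply(row, function(v) length(v) == 1L && is.na(v), logical(1))
    row[keep]
  })
}

design <- structure(list(a = c(1, NA), b = c("u", "v")), class = "na_design")
rows <- transpose_sketch(design)
str(rows)
```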