a-hanf / mlr3automl
Automated machine learning in mlr3
License: GNU Lesser General Public License v3.0
Dear Alex,
Thanks for your hard work on this amazing package! I watched your presentation in useR! session 3A and followed your examples on my own data. However, I'm having trouble analysing the results using DALEX and other related packages (ArenaR, Triplot, ModelStudio, rSAFE). I wonder if you could provide example code in your vignette that shows how to:
Thanks again for your amazing work,
John
python runbenchmark.py mlr3automl openml/t/59 -f 0
...
Error in makeActiveBinding(name, active[[name]], public_bind_env) :
symbol already has a regular binding
Calls: run ... assert_r6 -> checkR6 -> -> makeActiveBinding
Thanks for the great package. I planned to write an AutoML package for finance (investing) using mlr3, but it seems you have already made a great package (better than I would be able to do, for sure :)).
I have tried the package on my dataset with only one learner and Inf runtime. Here is the simple code:
bmr_results = AutoML(my_task, learner_list = c("classif.ranger"), runtime = Inf)
bmr_results$train()
I don't understand how I can inspect the results of the model after training.
I can see the following methods and attributes:
names(bmr_results)
[1] ".__enclos_env__" "custom_trafo" "additional_params" "portfolio" "tuner" "runtime" "tuning_terminator"
[8] "measure" "resampling" "preprocessing" "learner" "learner_timeout" "learner_list" "task"
[15] "clone" "initialize" "tuned_params" "resample" "predict" "train"
I can't see an aggregate method.
I have also tried to use the resample method instead of train, but I got the same result.
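For what it's worth, here is a hedged guess based only on the names() output above; none of these members are documented here, so the exact signatures are assumptions:
bmr_results$train()
bmr_results$tuned_params()    # assumption: returns the tuned hyperparameter values
bmr_results$learner           # assumption: the underlying tuned learner/pipeline
bmr_results$predict(my_task)  # assumption: predictions for a task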
Additionally, I would like to know if it is possible to use feature selection steps in the preprocessing?
Line 191 in e157226: when learner_timeout is not NULL, the condition evaluates to TRUE; when learner_timeout is NULL, it evaluates to NA.
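The exact expression at line 191 isn't reproduced here; as a generic illustration of why a condition on a possibly-NULL field needs an explicit guard (a sketch, not the package's actual code):
learner_timeout = NULL

# a bare comparison against a NULL (or NA) value does not give a clean
# TRUE/FALSE, which is what breaks if ()
learner_timeout > 0

# defensive variants that always yield TRUE or FALSE
isTRUE(learner_timeout > 0)
!is.null(learner_timeout) && learner_timeout > 0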
When I add additional learners, I very often get the following error:
Error in gunion(x) :
Assertion on 'ids of pipe operators' failed: Must have unique names, but element 7 is duplicated.
Sample code:
library(mlr3)
library(mlr3automl)
library(paradox)

# define additional hyperparameters for the extra learners
new_params = ParamSet$new(list(
  ParamInt$new("classif.kknn.k", lower = 1, upper = 5, default = 3),
  ParamDbl$new("classif.glmnet.alpha", lower = 0, upper = 1),
  ParamInt$new("classif.nnet.size", lower = 1, upper = 10),
  ParamDbl$new("classif.nnet.decay", lower = 0, upper = 0.5)
  # ParamInt$new("classif.bart.ntree", lower = 500, upper = 1000),
  # ParamDbl$new("classif.C50.CF", lower = 0, upper = 1),
  # ParamInt$new("classif.C50.trials", lower = 1, upper = 40)
))

# transformation: k is tuned on a log2 scale
my_trafo = function(x, param_set) {
  if ("classif.kknn.k" %in% names(x)) {
    x[["classif.kknn.k"]] = 2^x[["classif.kknn.k"]]
  }
  return(x)
}

task_ <- tsk("iris")
bmr_results = AutoML(task_,
                     learner_list = c("classif.ranger", "classif.xgboost", "classif.liblinear",
                                      "classif.kknn", "classif.glmnet", "classif.nnet"),
                     additional_params = new_params,
                     custom_trafo = my_trafo,
                     runtime = Inf)
If I remove the last learner (classif.nnet), I don't get the errors. But if I use some other learner from the mlr3extralearners package, I get the same error.
Currently, all the packages that mlr3automl depends on are in the Imports section of the DESCRIPTION file. I wonder if it would be helpful to move mlr3 to Depends, so you don't need to explicitly load it when running the examples?
> library(mlr3automl)
> iris_task <- tsk('iris')
Error in tsk("iris") : could not find function "tsk"
You have to do this:
library(mlr3)
library(mlr3automl)
iris_task <- tsk('iris')
model <- AutoML(iris_task)
model$train()
This is very unintuitive behaviour.
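A sketch of what moving mlr3 into Depends could look like in the DESCRIPTION file (the field contents here are illustrative, not the package's actual dependency list):
Depends:
    R (>= 3.5.0),
    mlr3
Imports:
    mlr3pipelines,
    mlr3tuning,
    paradox
Packages listed under Depends are attached together with mlr3automl, so tsk() would be available after a single library(mlr3automl) call.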
self$tuner = tnr("hyperband", eta = 3L) needs to be set to self$tuner = tnr("hyperband", eta = 3L, repetitions = Inf).
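For context, a minimal sketch of the proposed setting; my understanding is that repetitions = Inf lets the hyperband schedule restart indefinitely, so tuning only stops once the surrounding terminator fires:
library(mlr3tuning)
library(mlr3hyperband)

tuner = tnr("hyperband", eta = 3L, repetitions = Inf)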
Hello,
Thanks for this fantastic package. One strange thing I've noticed is that I don't seem to be able to activate the preprocessing stage. I can tell for a couple of reasons (I think)! I have some numeric data that contains NAs and that I want to do regression on. If I run my data as-is using AutoML(task)
with no options set then I get an error:
Error in check_prediction_data.PredictionDataRegr(pdata) :
Assertion on 'pdata$response' failed: Contains missing values (element 1).
OK, so if I remove the NAs first, then after running predict on the full data set (minus the NAs) I get something like:
Very good predictions, except that they lie at an angle to the x-y line, which I think is a result of a lack of scaling and centering, because if I do this manually first I get:
Obviously the plot looks different now, as mlr3 doesn't know about the scaling, but the predictions lie nicely around the x-y line.
So it seems the default is to do no preprocessing (it's not quite clear from the help pages). But when I set the option AutoML(task, preprocessing = "full"), I get no difference in the outcome with the original data or with the manually scaled data. Plus, if I leave in the NAs, I still get the error:
Error in check_prediction_data.PredictionDataRegr(pdata) :
Assertion on 'pdata$response' failed: Contains missing values (element 1).
The help pages suggest NAs can be handled, as they mention imputation, but I still get the error. And, as I mentioned above, the predictions on data after removing the NAs look the same as when not setting the preprocessing option. Am I missing something?
EDIT: but setting preprocessing = po("scale") does work:
so it seems like it's the "full", "stability" and "none" options that aren't being respected. Or I'm being stupid!
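For reference, the workaround from the EDIT as a complete call (a sketch; task stands for the regression task described above):
library(mlr3automl)
library(mlr3pipelines)

# passing a PipeOp directly, which (per the EDIT above) does get applied
model = AutoML(task, preprocessing = po("scale"))
model$train()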
Hi!
Is there currently an option to use mlr3automl for data where time is important?
Most importantly, it should use a resampling that respects time. However, as far as I know, the mlr3temporal package does not yet provide these options.
Is there a way to do a temporal train-test split with mlr3automl?
Thank You!
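One possible stopgap outside of mlr3automl itself: build an order-respecting holdout by hand with mlr3's custom resampling (a sketch; the task and the 80/20 split point are illustrative):
library(mlr3)

task = tsk("boston_housing")   # stand-in for a time-ordered task
n = task$nrow
split = floor(0.8 * n)

resampling = rsmp("custom")
resampling$instantiate(
  task,
  train_sets = list(seq_len(split)),    # earlier observations only
  test_sets  = list(seq(split + 1, n))  # later observations only
)
Whether an instantiated resampling like this can be passed through AutoML's resampling argument is something I haven't verified.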
I think you might be missing a command to install mlr3extralearners in your list of installation commands:
devtools::install_github('https://github.com/mlr-org/mlr3@master')
devtools::install_github('https://github.com/mlr-org/mlr3tuning@autotuner-notimeout')
devtools::install_github('https://github.com/a-hanf/mlr3automl@development')
Here's the error I got:
> devtools::install_github('https://github.com/a-hanf/mlr3automl@development')
Downloading GitHub repo a-hanf/mlr3automl@master
Skipping 2 packages not available: mlr3extralearners, glmnet
✓ checking for file ‘/private/var/folders/gj/cm0k4b_s42j30zs376cq_5hh0000gn/T/Rtmp6Nhtam/remotesa773542f4f5/a-hanf-mlr3automl-eed029b/DESCRIPTION’ ...
─ preparing ‘mlr3automl’:
✓ checking DESCRIPTION meta-information ...
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘mlr3automl_0.0.0.9000.tar.gz’
Installing package into ‘/Users/me/Library/R/3.5/library’
(as ‘lib’ is unspecified)
ERROR: dependency ‘mlr3extralearners’ is not available for package ‘mlr3automl’
* removing ‘/Users/me/Library/R/3.5/library/mlr3automl’
Error: Failed to install 'mlr3automl' from GitHub:
(converted from warning) installation of package ‘/var/folders/gj/cm0k4b_s42j30zs376cq_5hh0000gn/T//Rtmp6Nhtam/filea7745b7e50b/mlr3automl_0.0.0.9000.tar.gz’ had non-zero exit status
However, after installing mlr3extralearners, the install of mlr3automl worked:
devtools::install_github('https://github.com/mlr-org/mlr3extralearners@master')
It seems to me that it is not possible to use two or more tasks in AutoML?
If that's true, I would like to make a feature request.
This is usually possible when using mlr3's benchmark, where it is possible to define multiple tasks, learners, etc.
If I use two tasks in AutoML:
library(mlr3automl)
library(mlr3verse)
task_1 <- tsk("iris")
task_2 <- tsk("iris")
bmr_results = AutoML(list(task_1, task_2))
bmr_results = AutoML(c(task_1, task_2))
it returns an error:
Error in if (task$task_type == "classif") { : argument is of length zero
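Until multi-task support exists, a per-task loop seems like the obvious workaround (a hedged sketch; the second task is just a placeholder):
library(mlr3automl)
library(mlr3verse)

tasks = list(tsk("iris"), tsk("sonar"))
models = lapply(tasks, function(t) {
  m = AutoML(t)
  m$train()
  m
})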
BTW, is there any way I can contribute to this package and help with development? Maybe adding new learners; there are many of them in the mlr3 extension packages?
Here is code where I would have expected the aggregated results of the two identical benchmarks at the end to be identical, but they are not. Since I am only an intermediate-level coder in R, perhaps there is something wrong with my code. In any event, I pass this along for your consideration as a possible issue in mlr3automl. As you can imagine, this code takes a while to execute, ~10 minutes on my iMac Pro.
#############################################################
# Cross-validating the regression learners
#############################################################
library("doFuture")
library("doRNG")
library("future")
library("future.apply")
library("mlr3verse")
library("mlr3automl")
library("mlr3hyperband")
# set logger thresholds
lgr::get_logger("mlr3")$set_threshold("error")
lgr::get_logger("bbotk")$set_threshold("error")
# specify regression learners
learners = list(
  lrn("regr.featureless", id = "fl"),
  lrn("regr.lm", id = "lm"),
  lrn("regr.cv_glmnet", id = "glm"),
  lrn("regr.ranger", id = "rf"),
  lrn("regr.xgboost", id = "xgb"),
  lrn("regr.svm", id = "svm")
)
learner_ids = sapply(learners, function(x) x$id)
# define regression task
task = tsk("boston_housing")
# select small subset of features
task$select(c("age", "crim", "lat", "lon"))
# specify resampling
resampling = rsmp("cv")
# specify measure
measure = msr("regr.mse")
# autotuners for models with hyperparameters
learners[[3]] = create_autotuner(
learner = lrn("regr.cv_glmnet"),
tuner = tnr("hyperband")
)
learners[[4]] = create_autotuner(
learner = lrn("regr.ranger"),
tuner = tnr("hyperband"),
num_effective_vars = length(
task$feature_names
)
)
learners[[5]] = create_autotuner(
learner = lrn("regr.xgboost"),
tuner = tnr("hyperband")
)
learners[[6]] = create_autotuner(
learner = lrn("regr.svm"),
tuner = tnr("hyperband")
)
# create benchmark grid
design = benchmark_grid(
tasks = task,
learners = learners,
resamplings = resampling
)
# start parallel processing
registerDoFuture()
plan(multisession, workers = availableCores() - 1)
registerDoRNG(123456)
# execute benchmark
bmr1 = mlr3::benchmark(design)
# terminate parallel processing
plan(sequential)
# start parallel processing
registerDoFuture()
plan(multisession, workers = availableCores() - 1)
registerDoRNG(123456)
# execute benchmark
bmr2 = mlr3::benchmark(design)
# terminate parallel processing
plan(sequential)
# test for reproducibility
bmr1$aggregate()$regr.mse == bmr2$aggregate()$regr.mse
Here are a couple of interesting clues. If I run this code several times, the end result is the same each time (i.e., the same mix of TRUE and FALSE results for the different stochastic learners). But if I run this code in R and then run the same code in RStudio, I get a different mix of TRUE and FALSE results depending on the platform. Finally, if I run this code substituting a different dataset, then I get a different mix of TRUE and FALSE results at the end.
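One thing that might be worth trying (unverified for mlr3automl): as far as I understand, mlr3 parallelizes resampling iterations via future.apply with parallel-safe RNG streams, so calling set.seed() immediately before each benchmark() call, instead of relying on registerDoRNG(), may make the two runs comparable.
# hedged variation on the code above: seed right before each benchmark() run
plan(multisession, workers = availableCores() - 1)
set.seed(123456)
bmr1 = mlr3::benchmark(design)
plan(sequential)

plan(multisession, workers = availableCores() - 1)
set.seed(123456)
bmr2 = mlr3::benchmark(design)
plan(sequential)

bmr1$aggregate()$regr.mse == bmr2$aggregate()$regr.mse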
Using current versions of mlr3pipelines breaks mlr3automl due to this change: have a look at robustify ppl #489
Either AutoMLTuner specializes mlr3tuning::AutoTuner and AutoMLTuner$new(lrn("classif.svm")) creates the desired AutoTuner, or create_auto_tuner(lrn("classif.svm")) creates an mlr3tuning::AutoTuner object.
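The function-style variant resembles what shows up elsewhere in these issues as create_autotuner(); a sketch of how that call might look (the tuner choice is illustrative):
library(mlr3automl)
library(mlr3learners)
library(mlr3tuning)
library(mlr3hyperband)

at = create_autotuner(
  learner = lrn("classif.svm"),
  tuner = tnr("hyperband")
)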
I got a warning about not having the emoa package installed; not sure if this is a real issue or not. Just posting here for reference, in case not having it installed changes/hurts the results (if it's required for hyperband?).
> library(mlr3automl)
> iris_task <- tsk('iris')
> model <- AutoML(iris_task)
numeric_cols all_cols
no_encoding 4 4
one_hot_encoding 4 4
impact_encoding 4 4
Warning messages:
1: Package 'emoa' required but not installed for Tuner '<TunerHyperband>'
2: Package 'emoa' required but not installed for Optimizer '<OptimizerChain>'
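emoa is a regular CRAN package, so if the warning turns out to matter for hyperband, installing it should make it go away:
install.packages("emoa")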
There is an error when installing the new version of the package:
> devtools::install_github('a-hanf/mlr3automl')
Downloading GitHub repo a-hanf/mlr3automl@HEAD
√ checking for file 'C:\Users\Mislav\AppData\Local\Temp\Rtmp86dDna\remotes3efceab31af\a-hanf-mlr3automl-0a0d8c7/DESCRIPTION'
- preparing 'mlr3automl':
√ checking DESCRIPTION meta-information ...
- checking for LF line-endings in source and make files and shell scripts
- checking for empty or unneeded directories
- building 'mlr3automl_0.0.0.9000.tar.gz'
Installing package into ‘C:/Users/Mislav/Documents/R/win-library/4.1’
(as ‘lib’ is unspecified)
* installing *source* package 'mlr3automl' ...
** using staged installation
** R
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
converting help for package 'mlr3automl'
finding HTML links ... done
AutoML html
finding level-2 HTML links ... done
AutoMLBase html
Error: C:/Users/Mislav/AppData/Local/Temp/RtmpOowfbs/R.INSTALL6c4c644b4c74/mlr3automl/man/AutoMLBase.Rd:271: Bad \link text
* removing 'C:/Users/Mislav/Documents/R/win-library/4.1/mlr3automl'
* restoring previous 'C:/Users/Mislav/Documents/R/win-library/4.1/mlr3automl'
Warning message:
In i.p(...) :
installation of package ‘C:/Users/Mislav/AppData/Local/Temp/Rtmp86dDna/file3efc390535c3/mlr3automl_0.0.0.9000.tar.gz’ had non-zero exit status
Something broke:
model = AutoML(tsk("iris"), runtime=60)
Warning message:
The fallback learner 'response' and the base learner 'prob' have different predict types
model$train()
Error in makeActiveBinding(name, active[[name]], public_bind_env) :
symbol already has a regular binding
Called from: makeActiveBinding(name, active[[name]], public_bind_env)
It works without the runtime parameter.