
mlr-org / mlr3tuning


Hyperparameter optimization package of the mlr3 ecosystem

Home Page: https://mlr3tuning.mlr-org.com/

License: GNU Lesser General Public License v3.0

R 100.00%
mlr3 machine-learning tuning r tune optimization hyperparameter-tuning bbotk r-package hyperparameter-optimization

mlr3tuning's People

Contributors

be-marc, berndbischl, github-actions[bot], jakob-r, juliambr, mb706, mllg, pat-s, sebffischer, sumny


mlr3tuning's Issues

Stacking of nested resampling objects does not work (yet)

Something like this does not work at the moment due to the filtering of the tasks:

task = mlr3::mlr_tasks$get("iris")

learner = mlr3::mlr_learners$get("classif.rpart")
learner$param_vals = list(minsplit = 3)

resampling = mlr3::mlr_resamplings$get("cv")
resampling$param_vals = list(folds = 4L)

param_set = paradox::ParamSet$new(params = list(
  paradox::ParamDbl$new("cp", lower = 0.001, upper = 0.1)
))

terminator = TerminatorEvaluations$new(5L)
ff = FitnessFunction$new(task, learner, resampling, param_set)
inner_tuner = TunerRandomSearch$new(ff, terminator)
outer = mlr3::mlr_resamplings$get("cv")

TunerNestedResampling$new(inner_tuner, outer)
#> Error: Stacking of nested resampling tuner is not supported.

I don't even know whether we want this or whether we should just stick to the error message?

Tuner:tune_result

  1. the public method seems to be undocumented

  2. why is this not an active binding?
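
A minimal R6 sketch of the second point (the class and field names are illustrative, not the actual mlr3tuning implementation): expose the result as a read-only active binding instead of a public method.

library(R6)

TunerSketch = R6Class("TunerSketch",
  private = list(
    .result = NULL  # filled by tune()
  ),
  active = list(
    # read-only active binding: tuner$tune_result instead of tuner$tune_result()
    tune_result = function(rhs) {
      if (!missing(rhs)) stop("tune_result is read-only")
      private$.result
    }
  )
)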

Weird deep cloning behavior of tuner because of hooks

This is (I guess) a pretty annoying issue. With deep = TRUE, the terminator of gs is still used within the hooks of gs1, but the terminator that is supposed to stop gs1 never gets updated. Therefore, tuning runs forever:

> learner = mlr3::mlr_learners$get("classif.rpart")
> learner$predict_type = "prob"
> resampling = mlr3::mlr_resamplings$get("holdout")
> measures = mlr3::mlr_measures$mget(c("auc", "mmce"))
> param_set = paradox::ParamSet$new(
+   params = list(
+     paradox::ParamDbl$new("cp", lower = 0.001, upper = 0.1),
+     paradox::ParamInt$new("minsplit", lower = 2, upper = 5)
+   )
+ )
>
> ff = FitnessFunction$new(
+   task = task,
+   learner = learner,
+   resampling = resampling,
+   measures = measures,
+   param_set = param_set,
+   ctrl = tune_control(store_prediction = TRUE) # for the exceptions
+ )
>
> terminator = TerminatorEvaluations$new(5)
> gs = TunerRandomSearch$new(ff, terminator)
>
> gs1 = gs$clone(deep = TRUE)
> gs1$tune()
INFO [mlr3] Benchmarking 5 experiments
INFO [mlr3] Running learner 'classif.rpart1' on task 'spam (iteration 1/1)' ...
INFO [mlr3] Running learner 'classif.rpart2' on task 'spam (iteration 1/1)' ...
INFO [mlr3] Running learner 'classif.rpart3' on task 'spam (iteration 1/1)' ...
INFO [mlr3] Running learner 'classif.rpart4' on task 'spam (iteration 1/1)' ...
INFO [mlr3] Running learner 'classif.rpart5' on task 'spam (iteration 1/1)' ...
INFO [mlr3] Finished benchmark
INFO [mlr3] Benchmarking 5 experiments
INFO [mlr3] Running learner 'classif.rpart1' on task 'spam (iteration 1/1)' ...
INFO [mlr3] Running learner 'classif.rpart2' on task 'spam (iteration 1/1)' ...
INFO [mlr3] Running learner 'classif.rpart3' on task 'spam (iteration 1/1)' ...
INFO [mlr3] Running learner 'classif.rpart4' on task 'spam (iteration 1/1)' ...
INFO [mlr3] Running learner 'classif.rpart5' on task 'spam (iteration 1/1)' ...
INFO [mlr3] Finished benchmark
INFO [mlr3] Benchmarking 5 experiments
INFO [mlr3] Running learner 'classif.rpart1' on task 'spam (iteration 1/1)' ...
INFO [mlr3] Running learner 'classif.rpart2' on task 'spam (iteration 1/1)' ...
INFO [mlr3] Running learner 'classif.rpart3' on task 'spam (iteration 1/1)' ...
INFO [mlr3] Running learner 'classif.rpart4' on task 'spam (iteration 1/1)' ...
INFO [mlr3] Running learner 'classif.rpart5' on task 'spam (iteration 1/1)' ...
INFO [mlr3] Finished benchmark
INFO [mlr3] Benchmarking 5 experiments
INFO [mlr3] Running learner 'classif.rpart1' on task 'spam (iteration 1/1)' ...
INFO [mlr3] Running learner 'classif.rpart2' on task 'spam (iteration 1/1)' ...
^C
> gs$terminator
TerminatorEvaluations with 0 remaining evaluations
> gs1$terminator
TerminatorEvaluations with 5 remaining evaluations
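
A minimal, self-contained sketch of the suspected cause (these toy classes stand in for the real ones): a hook closure created in initialize() keeps referencing the original terminator, while clone(deep = TRUE) gives the copy its own terminator that the hook never touches.

library(R6)

Counter = R6Class("Counter", public = list(n = 0))

ToyTuner = R6Class("ToyTuner",
  public = list(
    terminator = NULL,
    hook = NULL,
    initialize = function(terminator) {
      self$terminator = terminator
      # the closure captures this particular terminator object
      self$hook = function() terminator$n <- terminator$n + 1
    }
  )
)

t1 = ToyTuner$new(Counter$new())
t2 = t1$clone(deep = TRUE)   # deep clone copies the terminator field ...
t2$hook()                    # ... but the hook still updates t1's terminator
t1$terminator$n              # 1
t2$terminator$n              # 0  -> the clone's own terminator is never updated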

AutoTuner: If the learner has configured param_vals, AutoTuner loses them

The following does not error, but does not behave as it should:

library(mlr3)
library(mlr3tuning)
library(paradox)

# Define the ParamSet
ps = ParamSet$new(params = list(ParamDbl$new(id = "cp", lower = 0, upper = .8)))
# We set an additional hardcoded minsplit
lrn = mlr_learners$get("classif.rpart", param_vals = list(minsplit = 2))
terminator = TerminatorEvaluations$new(20)

at = AutoTuner$new(lrn, "cv3", measures = "classif.acc", ps,
  terminator, tuner = TunerRandomSearch, tuner_settings = list())

at$param_set$values
#> named list()
# => We seem to lose the param_vals here

at$train("iris")

at$param_set$values
#> $cp
#> [1] 0.3260657

Store "tuning path" in AutoTuner

This is simply the BMR of the PerformanceEvaluator.

Create an option so that this is not stored by default, but is settable.

Simply store the bmr slot of the PE in an AT slot at the end of AutoTuner$train.
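
A rough sketch of the last point (the option and slot names are assumptions, not the actual API):

# at the end of AutoTuner$train(task), after tuning has finished:
if (isTRUE(self$store_tuning_path)) {   # hypothetical option, off by default
  self$tuning_path = private$pe$bmr     # BenchmarkResult of all evaluated configurations
}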

PerformanceEvaluator:eval should return the numerical score

I just saw code like this (it looks horrible):

TunerGenSA = R6Class("TunerGenSA",
  inherit = Tuner,
  public = list(
    initialize = function(pe, terminator, ...) {
      if (any(pe$param_set$storage_type != "numeric")) {
        err_msg = "Parameter types need to be numeric"
        lg$error(err_msg)
        stopf(err_msg)
      }

      # Default settings:
      settings = list(smooth = FALSE, acceptance.param = -15,
        simple.function = FALSE, temperature = 250)
      super$initialize(id = "GenSA", pe = pe, terminator = terminator,
        settings = insert_named(settings, list(...)))
    }
  ),
  private = list(
    tune_step = function() {
      blackBoxFun = function(x, pe) {
        # use the first measure for optimization
        measure = pe$measures[[1L]]
        # remember the hashes of all experiments evaluated so far
        hashes = pe$bmr$data$hash
        # convert the parameter vector to a one-row data.table
        x = setDT(as.list(x))
        pe$eval(x)
        # the new hash identifies the experiment we just added
        new_hash = setdiff(pe$bmr$data$hash, hashes)
        # calculate performance and flip the sign for maximization measures
        perf = pe$bmr$resample_result(new_hash)$aggregate(measure)
        if (!measure$minimize) perf = -perf
        return(perf)
      }
      self$GenSA_res = GenSA(fn = blackBoxFun, lower = self$pe$param_set$lower,
        upper = self$pe$param_set$upper, control = self$settings, pe = self$pe)
    }
  )
)
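
For comparison, this is roughly what the objective function would shrink to if eval() returned the aggregated score directly (a sketch of the requested interface, not the current one):

blackBoxFun = function(x, pe) {
  measure = pe$measures[[1L]]
  perf = pe$eval(setDT(as.list(x)))   # hypothetical: eval() returns the numeric score
  if (measure$minimize) perf else -perf
}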

multiple measures?

a) simply for logging
b) so that we support multi-criteria optimization

Note that this needs to be properly documented in the interface.

Tuning: plugin or hook function

This was really missing in mlr.

The user should be allowed to define, for each tuning step,

  • some piece of code that is run
  • and that can log into the archive
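
A hypothetical sketch of what registering such a hook could look like (the method name, hook name, and archive fields are all made up):

tuner$add_hook("on_step_end", function(tuner, archive) {
  # runs after every tuning step; may read from or log into the archive
  message(sprintf("evaluated %i configurations so far", nrow(archive$data)))
})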

FitnessFunction sets ids of learners

  learners = mlr3misc::imap(design$transpose(), function(xt, i) {
    learner = self$learner$clone()
    learner$param_set$values = insert_named(learner$param_set$values, xt)
    learner$id = paste0(learner$id, n_evals + i)
    return(learner)
  })

Please do NOT change stuff like this internally.
The ID should stay as it is.
And please add unit tests that ensure this does not happen.
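
A sketch of such a unit test with testthat (the FitnessFunction$eval() call and its signature are assumptions based on snippets elsewhere in this tracker):

library(testthat)

test_that("FitnessFunction does not rename the learner", {
  task = mlr3::mlr_tasks$get("iris")
  learner = mlr3::mlr_learners$get("classif.rpart")
  resampling = mlr3::mlr_resamplings$get("holdout")
  param_set = paradox::ParamSet$new(params = list(
    paradox::ParamDbl$new("cp", lower = 0.001, upper = 0.1)
  ))
  old_id = learner$id

  ff = FitnessFunction$new(task, learner, resampling, param_set)
  ff$eval(data.table::data.table(cp = 0.01))  # hypothetical eval() signature

  expect_identical(learner$id, old_id)
})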

tuner$aggregated() with only one result gives an error

fitness_function = FitnessFunction$new(
  task = mlr_tasks$get("iris"), 
  learner = mlr_learners$get("classif.ranger"), 
  resampling = mlr_resamplings$get("cv"), 
  param_set = ParamSet$new(params = list(
    ParamInt$new("mtry", lower = 1, upper = 4)
  ))
)
terminator_1 = TerminatorEvaluations$new(max_evaluations = 1)
tuner_1 = TunerRandomSearch$new(ff = fitness_function, terminator = terminator_1, batch_size = 1)
tuner_1$tune()
tuner_1$aggregated()
# Error in rbindlist(x[[col]], fill = TRUE, use.names = TRUE) : 
#   Item 1 of input is not a data.frame, data.table or list

Syntactic sugar for simple tuning

Maybe this is a common enough case

ff <- PerformanceEvaluator$new(task, lrn, cv, ps)
tuner <- TunerRandomSearch$new(ff, TerminatorEvaluations$new(10))
result <- tuner$tune()$tune_result()

that it would be worth it to define a shortcut, e.g.

result <- TuneRandom(task, lrn, cv, ps, 10)
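
A sketch of how such a shortcut could be implemented on top of the classes used above (the name TuneRandom comes from the proposal; the body is an assumption):

TuneRandom = function(task, learner, resampling, param_set, n_evals) {
  pe = PerformanceEvaluator$new(task, learner, resampling, param_set)
  tuner = TunerRandomSearch$new(pe, TerminatorEvaluations$new(n_evals))
  tuner$tune()$tune_result()
}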

Interface of `update_start()` and `update_end()`

I think it would be nice to allow the Terminator to filter out configurations during update_start(), so that we can effectively control termination inside batches of jobs (AFAIK this is not possible at the moment). E.g., something like TerminationWalltime could simply throw away all new configurations once the time budget for the optimization is exhausted, and TerminationThreshold could do the same as soon as a suitable configuration is found in $experiments.
Similarly, update_end() could remove experiments that were performed illegally (or set their performance values to the worst possible value or NA).
I guess something like this would be nice:

xs = self$terminator$update_prepare(ff, xs)  # may remove points from xs
self$ff$eval_vectorized(xs)
self$terminator$update_finalize(ff) # may set performances to NA

Additionally, while passing the FF is not the worst idea, maybe we can break it down into something simpler? I find it confusing that the terminator operates on the class it is stored in.

AutoTuner, how to pass the tuning method

At the moment the tuning method is passed as a factory (the class generator):

at = AutoTuner$new(learner, resampling, param_set, terminator, tuner = TunerGridSearch,
  tuner_settings = list(resolution = 10L))

This is not very nice. We should pass an instantiated tuner object. The problem is that the tuner then already contains the task, which cannot be set by the AutoTuner, and setting it is required for nested resampling. Suggestion: the tune() member function of the tuner gets a task for the tuning. This would be more consistent with train(task) and therefore much easier to use within the AutoTuner.
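
A sketch of the suggested interface (the constructor arguments and the AutoTuner wiring shown here are assumptions):

# the tuner is constructed without a task ...
tuner = TunerGridSearch$new(param_set, terminator, resolution = 10L)
at = AutoTuner$new(learner, resampling, tuner = tuner)

# ... and only receives the task when tuning starts, analogous to learner$train(task):
# inside AutoTuner$train(task):
#   tuner$tune(task)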

What do you think about that?

Renaming of the AutoTuner class

The "auto" in the name is misleading; it sounds like something magical happens. Maybe we should think about renaming the learner.

Nested resampling outer resampling

We have to replace the resampling object of the fitness function with a ResamplingCustom object to correctly evaluate the performance. This is not possible at the moment. See mlr-org/mlr3#108

At the moment we just use a holdout to see that it works in general:

# We want this:
resampling_temp = ResamplingCustom$new()
resampling_temp$instantiate(self$ff$task, train_sets = list(train_set_temp))

# But we are doing this instead:
resampling_temp = ResamplingHoldout$new()
self$ff$resampling = resampling_temp

# To use the fitness function functionality:
self$ff$eval(tuner_temp$tune_result()$param_vals)

I also think that we should keep this structure and use the fitness function's eval, since it automatically handles the registration in the benchmark result, etc.

learner API: new function predict_trace

Many learners have a "sequence" parameter, so you can get the full trace from a single training run,

e.g. ntree, boosting rounds, or the "s" param in glmnet.

This should be supported.

Basically we need to allow "marking up" one hyperparameter in the set with a tag,
and then maybe not add a new predict_trace function but handle this in the normal predict.
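
A sketch of the "mark up with a tag" idea using paradox tags (the tag name "trace" is an assumption):

library(paradox)

param_set = ParamSet$new(params = list(
  # mark glmnet's "s" as the sequence/trace parameter via a tag
  ParamDbl$new("s", lower = 0, upper = 1, tags = "trace")
))
param_set$tags
#> $s
#> [1] "trace"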

Feature request: `getTuneResult`

From the old mlr help page:

"Returns the optimal hyperparameters and optimization path after training."

Could inherit the options of getTuneResultOptPath().

AutoTuner vignette

We need this. Also describe how nested resampling can be done with the AutoTuner.

Tuner does not have a printer

It currently has none, but it has a "state": the number of performed iterations, and so on.
What does mlr3 do here?
And should the PerformanceEvaluator also have a printer?
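
A minimal sketch of what such a printer could report (the class and its fields are illustrative, not the actual Tuner):

library(R6)

TunerWithPrinter = R6Class("TunerWithPrinter",
  public = list(
    terminator = NULL,
    n_evals = 0L,
    initialize = function(terminator) {
      self$terminator = terminator
    },
    print = function(...) {
      cat("<Tuner>\n")
      cat("* Evaluations performed:", self$n_evals, "\n")
      cat("* Terminator:", class(self$terminator)[1L], "\n")
      invisible(self)
    }
  )
)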

Check error handling and fallbacks for tuner

This needs to be done. A test setup could be (see the sketch below):

  • debug learner with parameter space including error_train and error_predict
  • featureless learner as fallback learner

Add a section to the tuner vignette on how to see what really happens during the tuning.
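
A sketch of that setup (classif.debug and its error_train / error_predict parameters exist in mlr3; the encapsulation and fallback wiring shown is an assumption about how the test would be set up):

library(mlr3)

# debug learner that fails in a configurable fraction of train/predict calls
learner = mlr_learners$get("classif.debug")
learner$param_set$values = list(error_train = 0.5, error_predict = 0.5)

# catch errors via encapsulation and fall back to the featureless learner
learner$encapsulate = c(train = "evaluate", predict = "evaluate")
learner$fallback = mlr_learners$get("classif.featureless")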

Merge eval and eval_vectorized

We should have just one eval() method that takes a data.table of parameter values (one row per configuration) and evaluates the fitness function. That would be more consistent with paradox.
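
A sketch of the proposed call (the unified signature is an assumption):

ff$eval(data.table::data.table(   # one row per configuration
  cp       = c(0.01, 0.05),
  minsplit = c(2L, 4L)
))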

Tuning: GridSearch on mixed int/numeric space

I want to tune over the following Parameter Space:

param_set = paradox::ParamSet$new(
  params = list(
    paradox::ParamDbl$new("classif.svm.cost", lower = 0.1, upper = 1),
    paradox::ParamInt$new("pca.rank.",  lower = 1, upper = 3)
  )
)

calling

pe = PerformanceEvaluator$new(iris, learner, resampling, param_set)
terminator = TerminatorEvaluations$new(30)
tuner = TunerGridSearch$new(pe, terminator)$tune()

I would now assume that it searches over pca.rank. 1, 2, 3 and cost 0.1, 0.2, 0.3, ..., 1,
as this would amount to 30 evaluations.

Instead I get WARN [11:38:30.936] Set number of maximal evaluations to 21 to avoid multiple computation of the same grid.

I.e., it seems to somehow derive a weird resolution from the number of evaluations?
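
For reference, the 30-point grid expected above can be written down directly with paradox by specifying per-parameter resolutions (10 values for cost, 3 for pca.rank.):

design = paradox::generate_design_grid(param_set,
  param_resolutions = c(classif.svm.cost = 10L, pca.rank. = 3L))
nrow(design$data)
#> [1] 30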

FitnessFunction: Think about naming

Is it good to have the word "function" in the name of that class? It might suggest that the created object is a function, when in fact it is a class.
I see confusion coming.

Why not something similar to mlr, like TuneWrapper?

fitness function vignette

Explain what the fitness function is; also mention paradox, what it is, and link to its vignette.

Custom Tuner vignette

  • How to write the objective function using the FitnessFunction
  • Explain how the different Terminators work, using the GenSA example
  • Example of how to write a real custom class, e.g. a TunerGenSA class
