mlr-org / mlr3

mlr3: Machine Learning in R - next generation

Home Page: https://mlr3.mlr-org.com

License: GNU Lesser General Public License v3.0

R 99.56% TeX 0.44%
machine-learning data-science classification regression r mlr3 r-package

mlr3's People

Contributors

adibender, be-marc, berndbischl, coorsaa, giuseppec, hadley, jakob-r, jemus42, kant, mb706, mboecker, michaelchirico, mllg, pat-s, pfistfl, quayau, raphaels1, sebffischer, sumny, tdhock, tpielok, web-flow, zzawadz


mlr3's Issues

Store measures as part of experiment, not as part of task?

Storing measures as part of the task looked like a good idea at first, but it complicates things for benchmarks where there are different tasks and different measures. This could be caught at the start of benchmark(), but we generally want to be able to fuse rather arbitrary experiments into a BenchmarkResult. Possible next steps:

  1. Measures stay as-is, part of the task. Determine the union of all measures in a BenchmarkResult, and calculate missing scores on-demand.
  2. Measures are stored as part of experiment, defaulting to the measures of the task. When experiments are combined, update experiments and calculate missing scores.

I currently tend to prefer (2). With (1), there is no natural location to store the measure object other than altering the task, which is awkward.
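Option (1) could be sketched in base R, assuming each experiment records the ids of its measures (measure_union and the ids below are hypothetical, not mlr3 API):

```r
# Sketch of option (1): determine the union of all measure ids across a
# BenchmarkResult and, per experiment, the scores that are still missing.
measure_union = function(measure_ids) Reduce(union, measure_ids)

ids = list(c("classif.acc", "classif.auc"), c("classif.acc", "classif.logloss"))
all_measures = measure_union(ids)       # "classif.acc" "classif.auc" "classif.logloss"
missing = lapply(ids, function(m) setdiff(all_measures, m))
```

The missing scores per experiment could then be computed on-demand.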

Error in vignette: 01-basics

Reprex:

library(mlr3)
#> The mlr3 package is currently work-in-progress. Do not use in production. The API will change. You have been warned.

mlr_tasks
#> <DictionaryTask> with 6 stored values: bh, iris, pima, sonar,
#>   spam, zoo
#> 
#> Public: add, get, has, items, keys, mget, remove

# list keys
names(mlr_tasks)
#>  [1] ".__enclos_env__" "keys"            "items"          
#>  [4] "remove"          "mget"            "initialize"     
#>  [7] "has"             "print"           "add"            
#> [10] "get"

# get a quick overview
as.data.frame(mlr_tasks)
#> Error in as.data.frame.default(mlr_tasks): cannot coerce class 'c("DictionaryTask", "Dictionary", "R6")' to a data.frame

Created on 2018-11-08 by the reprex package (v0.2.1)

Set predict_type and other options for multiple learners in a benchmark setting

# get some example tasks
tasks = mlr_tasks$mget(c("pima", "sonar", "spam"))

# get a featureless learner and a classification tree
learners = mlr_learners$mget(c("classif.featureless", "classif.rpart"))

# let the learners predict probabilities instead of class labels (required for AUC measure)
learners$classif.featureless$predict_type = "prob"
learners$classif.rpart$predict_type = "prob"

I would like to write learners$predict_type = "prob" instead of setting the predict_type (and other options) on each learner individually.

This would probably require learners to be an R6 collection class with a method that sets certain slots on its children.
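A minimal sketch of such a broadcast setter, assuming only that learners are R6 objects (i.e. environments modified by reference); set_all is a hypothetical helper, and plain environments stand in for learners here:

```r
# Hypothetical helper (not part of mlr3): set a field on every learner in a list.
set_all = function(learners, field, value) {
  for (lrn in learners) lrn[[field]] = value  # R6 objects are modified in place
  invisible(learners)
}

# demo with plain environments standing in for R6 learners
l1 = new.env(); l2 = new.env()
set_all(list(l1, l2), "predict_type", "prob")
l1$predict_type  # "prob"
```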

format arg for task$data()

task$data() currently only returns data.tables.
In cases where the backend is a sparse matrix, we would like to supply a format = "sparse" argument so the data is returned in sparse-matrix format.

  requireNamespace("Matrix")
  data = Matrix::Matrix(sample(0:1, 30, replace = TRUE), ncol = 3, sparse = TRUE)
  colnames(data) = c("x1", "x2", "target")
  rownames(data) = paste0("row_", 1:10)
  b = as_data_backend(data)
  task = TaskRegr$new(id = "spmat", b, target = "target")

Instead of:

  d = task$data()
  task$backend$data(paste0("row_", seq_len(nrow(d))), colnames(d), format = "sparse")

I would like to write:

task$data(format = "sparse")

I can try to create a PR if desired.
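One possible shape for the dispatch, sketched outside of mlr3 with a hypothetical task_data() wrapper (the real implementation would live inside Task$data()):

```r
# Hypothetical sketch: dispatch on a `format` argument, converting the
# default rectangular return value to a sparse Matrix on request.
task_data = function(task, format = "data.table") {
  d = task$data()
  switch(format,
    "data.table" = d,
    "sparse" = {
      requireNamespace("Matrix")
      Matrix::Matrix(as.matrix(d), sparse = TRUE)
    },
    stop("unsupported format: ", format)
  )
}
```

With a sparse backend this would avoid the manual round trip through task$backend$data() shown above.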

rename key(s) to id(s)

In Dictionary we use key, while everywhere else we use id. It should probably be id everywhere.

[discussion] scope of mlr3

Hi all. I would like to start a discussion about the scope of mlr3 - I hope it can be useful for the community.

First of all, I'm very glad to see that mlr has converged on R6. IMHO there is no need to re-invent the wheel: in the R community we just need to leverage the design of the super-successful scikit-learn. For this reason I've created the mlapi pkg, which almost mimics the scikit-learn API. I use it in https://github.com/dselivanov/text2vec and https://github.com/dselivanov/rsparse.

I believe that such a core pkg as mlr3 should:

Provide only the interface other pkgs should follow. We have a zoo of pkgs implementing ML algorithms, and their quality and interfaces vary a lot. I think it is obvious by now that the approach taken by caret and mlr was not entirely correct: we can't wrap every useful pkg and re-create its API/interface.

So essentially, if a pkg aims to be the standard pkg for ML in R, there are 2 choices:

  • implement everything internally, as is done in scikit-learn
  • provide an interface and some utils to help reduce boilerplate coding, and let other developers follow it (although I personally don't like the tidyverse, it is a good example of how this approach can work successfully)

Use the correct level of abstraction. Here I strongly believe we have to stick to matrices (dense, and sparse from the Matrix pkg), as is done in scikit-learn. On top of that we may implement "transformers" to construct design matrices from data.frames.

Missing Functionality

  • Dimension reduction
    • Feature selection
    • Filtering
    • Ensemble filters
  • Plots:
    • ROC/Threshold
    • Benchmark Plots
    • Learner Prediction
    • Calibration
    • Learning Curves
  • Tasks:
    • Cost sensitive
    • Anomaly
    • Multi Output
    • Stacking
    • FDA
    • Survival
    • Clustering
    • Forecasting
    • Spatial (i.e. coordinates)
  • Resampling
    • Spatial CV

Default return / improved error message for rr$experiment()

When calling rr$experiment() or rr$experiments() without an argument, we should either have a sensible default (e.g. iter = 1) accompanied by a message, or print a custom error message.

library(mlr3)
task = mlr_tasks$get("iris")
learner = mlr_learners$get("classif.rpart")

resampling = mlr_resamplings$get("cv")
resampling$param_vals = list(folds = 3)
rr = resample(task, learner, resampling)
#> INFO [mlr3] Running learner 'classif.rpart' on task 'iris (iteration 1/3)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'iris (iteration 2/3)' ...
#> INFO [mlr3] Running learner 'classif.rpart' on task 'iris (iteration 3/3)' ...

rr$experiment()
#> Error in assert_int(iter, lower = 1L, upper = nrow(self$data), coerce = TRUE): argument "iter" is missing, with no default

Created on 2018-12-18 by the reprex package (v0.2.1)
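The proposed behaviour could look like the following sketch (a hypothetical method body, not the real implementation; n stands in for nrow(self$data)):

```r
# Sketch: default to iter = 1 with a message instead of an opaque assert error.
get_experiment = function(iter = NULL, n = 3L) {
  if (is.null(iter)) {
    message("Argument 'iter' is missing, returning the experiment for iteration 1")
    iter = 1L
  }
  stopifnot(iter >= 1L, iter <= n)
  iter  # the real method would return self$data[iter] wrapped as an Experiment
}
```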

Refactor `predict_type`

Currently, this is a single string, i.e. "response" or "prob". We need to encode that "prob" automatically includes "response".
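One way to encode this, sketched in base R with an assumed ordering of predict types (the names here mirror the issue, the helper is hypothetical):

```r
# Sketch: treat predict types as an ordered hierarchy, so that requesting
# "prob" implies "response" is available as well.
predict_types = c("response", "prob")
implied_types = function(requested) {
  predict_types[seq_len(match(requested, predict_types))]
}

implied_types("prob")      # "response" "prob"
implied_types("response")  # "response"
```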

prediction objects

What I always disliked in mlr was that manually creating prediction objects was difficult.
Currently, it seems that we can only create prediction objects by passing a task.
Suppose I have a vector of predictions and a vector of true values. It would be great if I could use mlr3 to construct a prediction object from just these (plus possibly other information such as predict_type).
Any thoughts?
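A constructor along these lines could be as small as the following sketch (make_prediction and the class name are hypothetical, not mlr3 API):

```r
# Hypothetical constructor: build a prediction object from raw vectors,
# without requiring a task.
make_prediction = function(truth, response, predict_type = "response") {
  stopifnot(length(truth) == length(response))
  structure(
    list(truth = truth, response = response, predict_type = predict_type),
    class = "PredictionSketch"
  )
}

p = make_prediction(truth = c("a", "b", "b"), response = c("a", "a", "b"))
mean(p$truth == p$response)  # accuracy: 2/3
```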

Refactor control / options

  • Make everything an option?
  • Do not set defaults in .onAttach?
  • Control object (or "overwrites") could be saved inside the experiment

syntax a bit complicated

In my humble opinion the syntax is still a bit complicated, even apart from the custom R6 classes.

Compare the code:

data("mtcars", package = "datasets")
b = BackendDataTable$new(data = mtcars[, 1:3])
task = TaskRegr$new(id = "cars", b, target = "mpg")

with

data("mtcars", package = "datasets")
task = makeRegrTask(data = mtcars[, 1:3], target = "mpg")

The second one also seems more intuitive to me at the moment - how should I know that I need the $new() method to create a new task... ;)
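A thin sugar layer could hide the backend and $new() calls without giving up the R6 classes underneath; task_regr here is a hypothetical wrapper, not an existing mlr3 function:

```r
# Hypothetical convenience wrapper around the R6 constructors:
task_regr = function(id, data, target) {
  TaskRegr$new(id = id, as_data_backend(data), target = target)
}

# task = task_regr("cars", mtcars[, 1:3], target = "mpg")
```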

Issue labels

Can we discuss if we want to use the same ones as in mlr or a different structure?

We will probably soon have > 20 issues and should group them.

Maybe it's worth taking a look at how other projects organize their labels.

The labels should be somewhat generic so we can use them across all repos.

Encapsulation for scoring

Unsure if we really need this, but we could also encapsulate the "score" step via evaluate or callr. Then we would also need to store a score_log.

Presumably easy to implement, but not very urgent.

Task creation with ordered factor variable fails

Task creation from a dataset with an ordered factor variable fails for both TaskClassif and TaskRegr:

df = data.frame(x = c(1, 2, 3), y = factor(c(1, 2, 3), ordered = TRUE), z = c("M", "R", "R"))
b = as_data_backend(df)
TaskClassif$new(id = "id", backend = b, target = "z")

throws:

Error in vapply(.x, .f, FUN.VALUE = .value, USE.NAMES = FALSE, ...) : 
  values must be length 1,
 but FUN(X[[2]]) result is length 2

It works fine without the ordered factor variable.
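The failure presumably comes from class() returning a length-2 vector for ordered factors, which trips a vapply() expecting one string per column; taking only the first class element would be one fix. This is a guess at the cause, not a confirmed diagnosis:

```r
# Ordered factors carry a two-element class vector:
y = factor(1:3, ordered = TRUE)
class(y)  # "ordered" "factor"

df = data.frame(x = 1:3, y = y)
# vapply(df, class, character(1))  # would fail: values must be length 1

# Taking only the first class element avoids the problem:
vapply(df, function(x) class(x)[1L], character(1))
```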

scaling well in mlr 3

It would be nice if the pbdR packages were utilized in mlr3 for efficient scalability.
