mlr-org / mlr

Machine Learning in R

Home Page: https://mlr.mlr-org.com

License: Other

Languages: R 97.99%, HTML 1.65%, C 0.31%, Shell 0.05%
Topics: machine-learning, data-science, tuning, cran, r-package, predictive-modeling, classification, regression, statistics, r

mlr's People

Contributors

alexengelhardt, berndbischl, bhvieira, coorsaa, danielhorn, dominikkirchhoff, florianfendt, gegznav, giuseppec, hetong007, ja-thomas, jackknifex, jakob-r, jakobbossek, karinschork, kerschke, larskotthoff, mariaerdmann, masongallo, mb706, mllg, pat-s, pfistfl, philipppro, pre-commit-ci[bot], schiffner, studerus, t-8-n, web-flow, zmjones


mlr's Issues

[survival] getTaskFormula: interface change required

Using the survival::Surv function on the LHS of the formula is the preferred way to construct the formulas required by most survival packages, as it avoids copies of the input data.
But the argument delete.env is a hindrance here: with no environment attached, the survival package is not on the search path and the function lookup will fail. On the other hand, I'd rather not carry these environments around, for obvious reasons.

Is it okay to touch the interface of this function? The parameter delete.env is never used inside mlr, but it might be used in other projects.
I'd opt to replace it with a new parameter env, defaulting to NULL or emptyenv(). I could then set this to as.environment("package:survival"), which should have a similar effect but still allow the function lookup. A sketch of the idea is below.
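
A minimal sketch of the proposal; buildSurvFormula is a made-up helper name used only to illustrate the idea (the real change would live in getTaskFormula):

# Hypothetical sketch: build the formula without dragging the calling
# environment along, but attach one in which survival::Surv can be found.
buildSurvFormula = function(time, event, features, env = emptyenv()) {
  f = as.formula(paste0("Surv(", time, ", ", event, ") ~ ",
    paste(features, collapse = " + ")))
  environment(f) = env
  f
}

# For survival tasks, this environment makes Surv() resolvable again:
# f = buildSurvFormula("time", "status", c("age", "sex"),
#   env = as.environment("package:survival"))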

Stratified CV does not distribute observations to folds equally

This code snippet is called on each class label separately:

instantiateResampleInstance.CVDesc = function(desc, size) {
  test.inds = sample(size)
  # don't warn when we can't split evenly
  test.inds = suppressWarnings(split(test.inds, seq_len(desc$iters)))
  makeResampleInstanceInternal(desc, size, test.inds=test.inds)
}

Remaining obs are distributed to the first folds. After joining the separate per-class splits, you can end up with up to [iters] more observations in the first fold than in the others. A small illustration is below.
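
A small base R illustration of the effect (not mlr code):

# Per class, split() hands the leftover observations to the first folds:
set.seed(1)
per.class = suppressWarnings(split(sample(10), seq_len(3)))
lengths(per.class)  # 4 3 3 -- the extra observation lands in fold 1
# Joining such splits over several classes stacks all leftovers onto the
# first folds, so fold sizes can differ by more than one observation.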

Nested Resampling

How can I implement a version of nested resampling?

So far, I'm splitting my data into training and test sets (using method = "subsample").
Now I want to run a feature selection on the training sets, using cross-validation. Afterwards, I want to evaluate my results on the test sets of the subsamples.

Unfortunately, I can't find anything similar in the tutorial.
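
A minimal sketch of how this could look with a feature-selection wrapper, assuming makeFeatSelWrapper and getFeatSelResult behave as in current mlr (the inner resampling performs the selection, the outer subsampling gives the evaluation):

library(mlr)

task = makeClassifTask(data = iris, target = "Species")
lrn = makeLearner("classif.rpart")

# inner loop: feature selection via cross-validation on each training set
inner = makeResampleDesc("CV", iters = 5)
ctrl = makeFeatSelControlRandom(maxit = 10)
wrapped.lrn = makeFeatSelWrapper(lrn, resampling = inner, control = ctrl)

# outer loop: subsampling for the unbiased performance estimate
outer = makeResampleDesc("Subsample", iters = 3)
r = resample(wrapped.lrn, task, resampling = outer, extract = getFeatSelResult)
r$extract  # selected features per outer iteration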

Tutorial: set show.info=FALSE to reduce some unnecessary output

In some cases, e.g., when calling resample in later tutorial sections, show.info should be set to FALSE so we do not clutter the page with unnecessary output.

Only do this when the output is very long and the reader does not gain any additional understanding from seeing it.
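
For reference, the two ways a tutorial chunk can be silenced (per call, or globally via configureMlr):

library(mlr)
task = makeClassifTask(data = iris, target = "Species")
lrn = makeLearner("classif.rpart")
rdesc = makeResampleDesc("CV", iters = 3)

# per call:
r = resample(lrn, task, rdesc, show.info = FALSE)

# or once at the top of a tutorial page:
configureMlr(show.info = FALSE)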

Extract probability matrix from prediction object

It seems helpful to either have a method to extract the probability matrix from a prediction object or to store it directly as a matrix / data.frame.

Example

learner <- makeLearner('classif.lda', predict.type="prob")
task <- makeClassifTask(data=iris, target="Species")
mod <- train(learner=learner, task=task)
pred.obj <- predict(mod, newdata=iris)
as.matrix(pred.obj$data[,paste("prob.", levels(pred.obj$data$response), sep="")])

This does not seem like an elegant solution, neither does anything I can come up with at the moment.

I think pred.obj$pred should return a matrix. But I don't know how that would interfere with existing methods.
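
A possible accessor, as a sketch (the helper name getProbMatrix is invented here; it is not an existing mlr function):

# Hypothetical helper: pull the class-probability columns out of a
# prediction object as a matrix with the class levels as column names.
getProbMatrix = function(pred) {
  lvls = levels(pred$data$response)
  prob = as.matrix(pred$data[, paste0("prob.", lvls), drop = FALSE])
  colnames(prob) = lvls
  prob
}

# usage: getProbMatrix(pred.obj)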

MultiClass AUC

See whether we can create an AUC measure for more than 2 classes in mlr.

Here is a hint, sent by Markus by mail.

learner = makeLearner("classif.lda", predict.type = "prob")
task = makeClassifTask(data = iris, target = "Species")
mod = train(learner = learner, task = task)
pred.obj = predict(mod, newdata = iris)
library(HandTill2001)
predicted = as.matrix(pred.obj$data[, paste("prob.", levels(pred.obj$data$response), sep = "")])
colnames(predicted) = levels(pred.obj$data$response)
# compare the true labels (not the predicted ones) against the probabilities
auc(multcap(response = pred.obj$data$truth, predicted = predicted))

  1. Investigate

  2. Add the measure, document and test it (a sketch of how it could be wrapped is below)

  3. Briefly describe in tutorial / ROC part.
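
A rough sketch of how the Hand/Till AUC could be wrapped as an mlr measure; the makeMeasure arguments shown here are an assumption about the final shape, not tested code:

library(mlr)
library(HandTill2001)

# Hypothetical measure definition (id, properties and fun signature are assumptions):
multiclass.auc = makeMeasure(
  id = "multiclass.auc", minimize = FALSE,
  properties = c("classif", "classif.multi", "req.pred", "req.prob"),
  fun = function(task, model, pred, feats, extra.args) {
    lvls = levels(pred$data$response)
    predicted = as.matrix(pred$data[, paste0("prob.", lvls)])
    colnames(predicted) = lvls
    auc(multcap(response = pred$data$truth, predicted = predicted))
  }
)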

add "range" as a new aggregation function

Similar to this

my.range.aggr = mlr:::makeAggregation(id="test.range", 
  fun = function (task, perf.test, perf.train, measure, group, pred) max(perf.test) - min(perf.test))

Possibly export makeAggregation so the user can do this, too.

Also explain how to do this in the tutorial.

Feature filtering

  1. Check that filtering is nicely explained in the tutorial

  2. Can we access the filtered features after training a filter wrapper? (See the sketch after this list.)

  3. Add MRMR, maybe also fmrmr
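
Regarding point 2, a sketch under the assumption that the filter wrapper exposes its selection via an accessor (makeFilterWrapper, fw.method, fw.abs and getFilteredFeatures as in current mlr; names may differ in older versions):

library(mlr)

task = makeClassifTask(data = iris, target = "Species")
lrn = makeFilterWrapper(makeLearner("classif.rpart"),
  fw.method = "anova.test", fw.abs = 2)
mod = train(lrn, task)
getFilteredFeatures(mod)  # the two features kept by the filter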

Proposition: Add methods from package 'DiscriMiner'

Methods such as plsDA and geoDA. Perhaps take a look at whether it is an interesting package in general. The last update was in November 2013.

Also linDA and quDA are available, but I don't know how they differ from MASS::lda or qda, which already exist in mlr.

Not every learner is compatible with makeBaggingWrapper()

For instance

library("mlr")
data(iris)
tsk = makeClassifTask(data=iris, target="Species")
lrn = makeLearner("classif.fnn")
bagLrn = makeBaggingWrapper(lrn, bag.iters=5, bag.replace=TRUE, bag.size=0.6, bag.feats=3/4, predict.type="prob")
rsmpl = makeResampleDesc("RepCV", reps=5, fold=2)
resample(learner=bagLrn, task=tsk, resampling=rsmpl)

[Resample] repeated cross-validation iter: 1
Fehler in (function (train, test, cl, k = 1, prob = FALSE, algorithm = c("kd_tree",  : 
  dims of 'test' and 'train' differ

I think the predictor dislikes getting the full dataset, including variables that were not used during training (bag.feats = 3/4).

Reread Michel's new imputation code and add a section in the tutorial

  • read code
  • read roxygen help
  • correct errors in both and extend docs a bit
  • add section in tutorial to explain how it works

these are the files:

  • Impute.R
  • ImputeMethods.R
  • PreprocImputeWrapper.R

Also, ImputeWrapper is probably a better and shorter name than PreprocImputeWrapper.

Unify interface of "preprocessing operations before training"

We already have a couple of those:

  • impute
  • filter features
  • over/undersample
  • what else?

We have to make a list, then make the interface the same, along these lines (a sketch follows after this outline):

  • doTheOp(obj, data, target): generic

  • doTheOp.data.frame

  • doTheOp.task

  • makeOpWrapper: internally calls doTheOp

  • getOpResults(model): allows the user to access the operation results after the wrapper has been trained
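
A minimal R sketch of that pattern; all names are the placeholders from the outline above (the exact argument list, e.g. whether obj and data stay separate, is still open), not existing mlr functions:

# Hypothetical unified preprocessing interface: one generic, methods for
# data.frame and Task, plus an accessor for the trained wrapper model.
doTheOp = function(obj, target, ...) UseMethod("doTheOp")

doTheOp.data.frame = function(obj, target, ...) {
  # operate directly on the data.frame; return the changed data plus a
  # description of what was done (needed later to transform prediction data)
}

doTheOp.Task = function(obj, target, ...) {
  # pull the data out of the task, delegate to the data.frame method,
  # then rebuild the task from the result
}

getOpResults = function(model) {
  # access the stored operation results from a trained wrapper model
}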

Describe how ROC curves can be plotted with mlr and ROCR

Construct an example: a 2-class problem from mlbench, 2 learners.

Cross-validate and compare the ROC curves in one plot.

Add this example to the tutorial and to the @example of the function in asROCRPredictions.R.

ROCR has examples showing how the plot is constructed; copy a simple one to run after calling asROCRPredictions. A sketch is below.
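
Roughly what the tutorial chunk could look like; asROCRPrediction is assumed to be the converter this issue refers to (name and availability are assumptions), the ROCR calls are standard:

library(mlr)
library(ROCR)
library(mlbench)

data(Sonar)
task = makeClassifTask(data = Sonar, target = "Class")
rdesc = makeResampleDesc("CV", iters = 5)

lrn1 = makeLearner("classif.lda", predict.type = "prob")
lrn2 = makeLearner("classif.rpart", predict.type = "prob")
res1 = resample(lrn1, task, rdesc)
res2 = resample(lrn2, task, rdesc)

# convert the mlr predictions for ROCR, then overlay both ROC curves
p1 = ROCR::performance(asROCRPrediction(res1$pred), "tpr", "fpr")
p2 = ROCR::performance(asROCRPrediction(res2$pred), "tpr", "fpr")
plot(p1)
plot(p2, add = TRUE, col = "blue")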

caret::train can shadow mlr::train

It is annoying that when a learner pulls in caret, caret's train method shadows mlr's. This only happens in the user's global search path, but it leads to an unintuitive error message for users.

I currently see no real fix except renaming, which I dislike.
Let's think about it. In the meantime, users can qualify the call explicitly (see below).
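
Not a fix, but the workaround users can apply today is to namespace the call, which is unaffected by search-path order:

library(mlr)
library(caret)  # loading caret afterwards puts caret::train in front of mlr::train

task = makeClassifTask(data = iris, target = "Species")
lrn = makeLearner("classif.rpart")

# explicit namespacing side-steps the shadowing entirely
mod = mlr::train(lrn, task)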

Add a (web / github) example to show multicriteria evaluation with mlr

Here is an example of how to look simultaneously at mmce and the range of errors over the resampling iterations.

library(mlr)
library(mlbench)
library(ggplot2)

task = makeClassifTask(data = iris, target = "Species")
lrn = makeLearner("classif.rpart")
rdesc = makeResampleDesc("CV", iters = 2)

ms1 = mmce
my.range.aggr = mlr:::makeAggregation(id = "test.range",
  fun = function(task, perf.test, perf.train, measure, group, pred) max(perf.test) - min(perf.test))
ms2 = setAggregation(mmce, my.range.aggr)

res = selectFeatures(lrn, task, rdesc, measures = list(ms1, ms2),
  control = makeFeatSelControlExhaustive())
perf.data = as.data.frame(res$opt.path)
p = ggplot(perf.data, aes(x = mmce.test.mean, y = mmce.test.range)) +
  geom_point()
print(p)

Observation weighting

mlr already supports weighted observations. Learners have a property that tells you whether they can be fitted with observation weights, and listLearners can give you all such learners.

  1. Describe better in the tutorial how this works, probably in the "learner" part.

  2. train and resample allow passing weights; tuneParams, selectFeatures and the corresponding wrappers do not.
    Discuss and then extend. Maybe one also wants to set the weights in the task? That is less annoying in some cases. (See the sketch after this list.)
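
A small sketch of the parts that already work (listLearners with a properties filter and the weights argument of train and resample, as in current mlr):

library(mlr)

# all classification learners that can handle observation weights
weighted.lrns = listLearners("classif", properties = "weights")

task = makeClassifTask(data = iris, target = "Species")
lrn = makeLearner("classif.rpart")
w = ifelse(iris$Species == "setosa", 2, 1)  # example weights, chosen arbitrarily

mod = train(lrn, task, weights = w)
r = resample(lrn, task, makeResampleDesc("CV", iters = 3), weights = w)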
