mlr-org / mlr
Machine Learning in R
Home Page: https://mlr.mlr-org.com
License: Other
I think there were some minor issues with lower bounds or other stuff
Click on the links on any page of the tutorial
Options:
NULL
e.g. numeric(0)
NA
Using the survival::Surv function on the LHS of the formula is the preferred way to construct formulas required by most survival packages, as this does not force copies of the input data.
But the argument delete.env is a hindrance here: with no environment attached, the survival package is not on the search path and the function lookup will fail. On the other hand, I'd like to not carry these environments around, for obvious reasons.
Is it okay to touch the interface of this function? The parameter delete.env is never used in mlr, but might be used in other projects.
I'd opt to replace it with a new parameter env defaulting to NULL or emptyenv(). I could then set this to as.environment("package:survival"), which should have a similar effect but will allow the function lookup.
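Very roughly, the change could look like the following sketch (the function name makeSurvFormula, its arguments and the env default are hypothetical, only meant to illustrate the idea):
makeSurvFormula = function(target, features, env = NULL) {
  f = as.formula(paste0("Surv(", paste(target, collapse = ", "), ") ~ ",
    paste(features, collapse = " + ")))
  # instead of deleting the environment entirely, attach the one the caller
  # asks for, e.g. as.environment("package:survival"), so Surv() can be found
  environment(f) = if (is.null(env)) emptyenv() else env
  f
}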
Output as list or a generic for conversion would be nice.
This code snippet is called on each class label separately:
instantiateResampleInstance.CVDesc = function(desc, size) {
  test.inds = sample(size)
  # don't warn when we can't split evenly
  test.inds = suppressWarnings(split(test.inds, seq_len(desc$iters)))
  makeResampleInstanceInternal(desc, size, test.inds = test.inds)
}
Remaining observations are distributed to the first folds. After joining the separate splits, you can end up with up to [iters] more observations in the first fold than in the others.
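A small illustration of that behaviour (not mlr code, just base R): splitting 7 observations of one class into 3 folds puts the remainder into the first folds.
set.seed(1)
inds = sample(7)
lengths(suppressWarnings(split(inds, seq_len(3))))  # fold sizes: 3, 2, 2
Called once per class label, these per-class remainders all land in the first folds, which is where the imbalance after joining comes from.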
How can I implement a version of nested resampling?
So far, I'm splitting my data into training and test sets (using method = "subsample").
Now, I want to run a feature selection on the training sets, using cross-validation. Afterwards, I want to evaluate my results on the test sets of the subsamples.
Unfortunately, I can't find anything similar in the tutorial.
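A rough sketch of one way this could be set up (treat it as an illustration, not a definitive answer; the control settings are arbitrary): the feature selection runs on an inner CV inside each outer subsampling split.
library(mlr)
task = makeClassifTask(data = iris, target = "Species")
lrn = makeLearner("classif.rpart")
inner = makeResampleDesc("CV", iters = 3)
wrapped = makeFeatSelWrapper(lrn, resampling = inner,
  control = makeFeatSelControlRandom(maxit = 10))
outer = makeResampleDesc("Subsample", iters = 5)
# the outer loop estimates performance of the whole "select features, then fit" procedure
r = resample(wrapped, task, resampling = outer, show.info = FALSE)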
https://github.com/berndbischl/mlr/blob/master/R/SupervisedTask_operators.R#L140
I guess this is unwanted?
Maybe show an example of a grid search and convert the opt.path to a data.frame.
Maybe for normal tuning and wrappers.
Users simply need to understand how to get all evaluated points.
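A possible shape for such a tutorial example (a sketch; the parameter grid is arbitrary):
library(mlr)
task = makeClassifTask(data = iris, target = "Species")
ps = makeParamSet(makeDiscreteParam("cp", values = c(0.01, 0.05, 0.1)))
rdesc = makeResampleDesc("CV", iters = 3)
res = tuneParams("classif.rpart", task = task, resampling = rdesc,
  par.set = ps, control = makeTuneControlGrid(), show.info = FALSE)
# all evaluated points, one row per grid point
as.data.frame(res$opt.path)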
Checked by Bernd:
fu** R.
In some cases, e.g., calling resample in later tutorial sections, show.info should be set to FALSE, so we do not get so much crap on the page.
Only when the output is very long and the reader does not really gain any additional understanding from seeing it.
Also in the tutorial.
What if the user wants to set a certain constant threshold value for a learner?
Wasn't there an option for that? Check again.
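If I remember correctly, setThreshold on a probability prediction is the existing option; a minimal sketch to check against (the threshold value is arbitrary):
library(mlr)
library(mlbench)
data(Sonar)
task = makeClassifTask(data = Sonar, target = "Class", positive = "M")
lrn = makeLearner("classif.rpart", predict.type = "prob")
mod = train(lrn, task)
pred = predict(mod, task = task)
# re-threshold: predict the positive class "M" only if its probability exceeds 0.7
pred2 = setThreshold(pred, 0.7)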
It seems helpful to either have a method to extract the probability matrix from a prediction object or to store it directly as a matrix / data.frame.
Example:
library(mlr)
learner <- makeLearner("classif.lda", predict.type = "prob")
task <- makeClassifTask(data = iris, target = "Species")
mod <- train(learner = learner, task = task)
pred.obj <- predict(mod, newdata = iris)
as.matrix(pred.obj$data[, paste("prob.", levels(pred.obj$data$response), sep = "")])
This does not seem like an elegant solution, neither does anything I can come up with at the moment.
I think pred.obj$pred should return a matrix, but I don't know how that would interfere with existing methods.
See whether we can create an AUC measure for more than 2 classes in mlr.
Here is a hint, sent by Markus by mail.
library(mlr)
library(HandTill2001)
learner <- makeLearner("classif.lda", predict.type = "prob")
task <- makeClassifTask(data = iris, target = "Species")
mod <- train(learner = learner, task = task)
pred.obj <- predict(mod, newdata = iris)
predicted <- as.matrix(pred.obj$data[, paste("prob.", levels(pred.obj$data$response), sep = "")])
colnames(predicted) <- levels(pred.obj$data$response)
auc(multcap(response = pred.obj$data$response, predicted = predicted))
Investigate
Add the measure, document it, and test it.
Briefly describe it in the tutorial / ROC part.
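A rough sketch of how the measure could be defined via makeMeasure, reusing the HandTill2001 computation from above (the id, properties and best/worst values are assumptions, not a final implementation):
library(mlr)
library(HandTill2001)
multiclass.auc = makeMeasure(id = "multiclass.auc", minimize = FALSE,
  best = 1, worst = 0.5,
  properties = c("classif", "classif.multi", "req.pred", "req.truth", "req.prob"),
  fun = function(task, model, pred, feats, extra.args) {
    probs = as.matrix(pred$data[, paste("prob.", levels(pred$data$truth), sep = "")])
    colnames(probs) = levels(pred$data$truth)
    auc(multcap(response = pred$data$truth, predicted = probs))
  })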
Basically, one wants to see which feature gets added or removed and how that changes performance.
The code to get started is here:
https://github.com/berndbischl/mlr/blob/master/R/analyzeFeatSelResult.R
Only the first 2 functions are relevant; the rest should be checked and possibly removed if not useful.
Somehow the make tutorial script does not work correctly. Check the performance page.
Using knitr manually from RStudio works fine.
Code is in todo-files/benchmark
Similar to this
my.range.aggr = mlr:::makeAggregation(id = "test.range",
  fun = function(task, perf.test, perf.train, measure, group, pred) max(perf.test) - min(perf.test))
Possibly export makeAggregation so the user can do this, too.
Also explain how to do this in the tutorial.
result = resample(learner = lrn, task = tsk, resampling = rsmpl, show.info = FALSE)
still prints: Loading packages on slaves: mlr
Or look over here to see what happens.
Check that filtering is nicely explained in the tutorial
Can we access the filtered features after training a filter wrapper?
add MRMR, maybe also fmrmr
Methods such as plsDA and geoDA. Perhaps take a look at whether it is an interesting package in general. The last update was in November 2013.
Also linDA and quDA are available, but I don't know the difference compared to MASS's lda or qda, which already exist in mlr.
For instance
library("mlr")
data(iris)
tsk = makeClassifTask(data=iris, target="Species")
lrn = makeLearner("classif.fnn")
bagLrn = makeBaggingWrapper(lrn, bag.iters=5, bag.replace=TRUE, bag.size=0.6, bag.feats=3/4, predict.type="prob")
rsmpl = makeResampleDesc("RepCV", reps=5, folds=2)
resample(learner=bagLrn, task=tsk, resampling=rsmpl)
[Resample] repeated cross-validation iter: 1
Error in (function (train, test, cl, k = 1, prob = FALSE, algorithm = c("kd_tree", :
dims of 'test' and 'train' differ
I think the predictor dislikes the fact that it gets the full dataset, including variables not used during learning (bag.feats=3/4).
Not that tragic. But see here
r = resample(lrn, task, resampling = rout, extract = getTuneResult, show.info = FALSE)
Will generate a lot of output.
For imbalanced classes. What are good and simple strategies here?
Some methods can be used for regression, some for classification.
Some work with categorical, some with numerical, some with mixed feature sets.
Check this in the code and document it on the help page.
Whether they look right
Can sometimes be annoying. Add option to configureMlr?
These are the files:
Also, ImputeWrapper is probably a better and shorter name than PreprocImputeWrapper.
We already have a couple of those:
We have to make a list, then make the interface the same, so something like:
doTheOp(obj, data, target) : generic
doTheOp.data.frame
doTheOp.task
makeOpWrapper: internally calls doTheOp
getOpResults(model): allows the user to access the operation results when the wrapper was trained
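A minimal sketch of that pattern (the names are taken from the list above, the bodies are placeholders; dispatching on the data argument is an assumption):
doTheOp = function(obj, data, target) UseMethod("doTheOp", data)

doTheOp.data.frame = function(obj, data, target) {
  # do the actual operation on the raw data and return both the transformed
  # data and the operation results, so getOpResults() can expose them later
  list(data = data, result = NULL)
}

doTheOp.Task = function(obj, data, target) {
  # unwrap the task and delegate to the data.frame method
  doTheOp(obj, getTaskData(data), getTaskTargetNames(data))
}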
Construct an example: a 2-class problem from mlbench, 2 learners.
Cross-validate and compare ROC curves in one plot.
Add this example to the tutorial part and to the @example in asROCRPredictions.R.
ROCR has examples showing how the plot is constructed; copy a simple one after calling asROCRPredictions.
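One possible shape for such an example, using ROCR directly to pool the cross-validated predictions into one curve per learner (a sketch; the tutorial version would go through asROCRPredictions as described above):
library(mlr)
library(mlbench)
library(ROCR)
data(Sonar)
task = makeClassifTask(data = Sonar, target = "Class", positive = "M")
lrns = list(makeLearner("classif.lda", predict.type = "prob"),
  makeLearner("classif.rpart", predict.type = "prob"))
rdesc = makeResampleDesc("CV", iters = 5)
cols = c("blue", "red")
for (i in seq_along(lrns)) {
  r = resample(lrns[[i]], task, rdesc, show.info = FALSE)
  # pooled predictions over all folds, scored by the probability of the positive class "M"
  rocr.pred = prediction(r$pred$data$prob.M, r$pred$data$truth, label.ordering = c("R", "M"))
  plot(performance(rocr.pred, "tpr", "fpr"), col = cols[i], add = (i > 1))
}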
It is annoying that if a learner from caret is loaded, caret's train method shadows mlr's method. This only happens in the global user namespace, but it leads to an unintuitive error message for users.
I currently see no real fix except renaming, which I dislike.
Let's think about it.
In all GitHub wiki / readme / tutorial files.
Here is an example of how to simultaneously look at mmce and the range of errors over resampling.
library(mlr)
library(mlbench)
library(ggplot2)
task = makeClassifTask(data = iris, target = "Species")
lrn = makeLearner("classif.rpart")
rdesc = makeResampleDesc("CV", iters = 2)
ms1 = mmce
my.range.aggr = mlr:::makeAggregation(id = "test.range",
  fun = function(task, perf.test, perf.train, measure, group, pred) max(perf.test) - min(perf.test))
ms2 = setAggregation(mmce, my.range.aggr)
res = selectFeatures(lrn, task, rdesc, measures = list(ms1, ms2), control = makeFeatSelControlExhaustive())
perf.data = as.data.frame(res$opt.path)
p = ggplot(data = perf.data, aes(x = mmce.test.mean, y = mmce.test.range)) +
  geom_point()
print(p)
Add a page for such stuff in the wiki.
mlr already supports weighted observations. Learners have a property that tells you whether they can be fitted in a weighted way. listLearners can give you all such learners.
Better describe in the tutorial how this works, probably in the "learner" part.
train and resample allow passing weights.
tuneParams, selectFeatures and the corresponding wrappers do not.
Discuss and then extend. Maybe one also wants to set the weights in the task? Less annoying in some cases.
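For reference, roughly how the current state looks (a sketch; the weights here are random and only for illustration):
library(mlr)
task = makeClassifTask(data = iris, target = "Species")
lrn = makeLearner("classif.rpart")
w = runif(nrow(iris))
# works: weights in train and resample
mod = train(lrn, task, weights = w)
r = resample(lrn, task, makeResampleDesc("CV", iters = 3), weights = w, show.info = FALSE)
# list all classification learners that support observation weights
listLearners("classif", properties = "weights")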
It might not really be needed, but it seems like it had been there, and at least analyzeFeatSelResult depended on it.
names(getOptPathEl(opt.path, i))
NULL