Store measures as part of experiment, not as part of task?

mlr-org commented on May 15, 2024

Store measures as part of experiment, not as part of task?

from mlr3.

Comments (4)

jakob-r commented on May 15, 2024

If somebody wants to benchmark over multiple measures, that is their own responsibility. The result will just contain missing values. Plots and tables should work fine nonetheless:

(e.g. the following will just have some boxes missing)

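library(ggplot2)
# simulated results: 50 scores spread over 2 tasks, 2 learners and 10 measures
set.seed(1)
res = data.frame(score = runif(50), task = sample(c("a","b"), 50, TRUE), learner = sample(c("A","B"), 50, TRUE), measure = sample(paste0("measure", 1:10), 50, TRUE))
# task/measure/learner combinations without data simply show no box
ggplot(res, aes(x = learner, y = score)) + geom_boxplot() + facet_grid(task~measure)

library(dplyr)
# mean score per learner/task/measure; combinations that never occur are absent from the table
res %>% group_by(learner, task, measure) %>% summarise(score_mean = mean(score))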

I would not even care about missing scores. If you throw in tasks with different measures, you might not be able to compare them anyway (even if they had the same measure).

berndbischl commented on May 15, 2024

Storing measures as part of the task looked like a good idea at first, but it complicates things for benchmarks where there are different tasks and different measures.

Michel, I find this rather important (and I liked the previous design).

Can you please provide a CONCRETE example of what could go wrong now, in your opinion? Because I don't see this yet.

I am assuming this:
I have tasks t_1, ..., t_k, each with a potentially different list of measures.
benchmark takes an arbitrary input design table describing the experiments: task | learner | resampling

Where is the problem now? I assumed the result dt is a dt where each row is an experiment.

berndbischl commented on May 15, 2024

@mllg shall we define what happens now (after the Hangout call) and close here?

mllg commented on May 15, 2024

Each experiment now additionally stores the measures (as an extra slot/column). You can now optionally provide a list of measures to e$score(), resample() and benchmark(), with a fallback to task$measures to keep the API simple. Note that this can lead to benchmark results with different performance measures (and, as a result, missing values in the performance aggregation). Still to do: add methods to calculate performance values for missing or additional measures.
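
To illustrate where the missing values in the aggregation come from, here is a minimal base-R sketch. It does not use the mlr3 API. The table `scores`, the task and learner names, the measure names and all numbers are made up for illustration, on the assumption that each experiment row carries the measure it was scored with.

```r
# Hypothetical per-experiment scores: task t1 was scored with "mmce",
# task t2 with "mse" (all names and values are invented for this sketch).
scores = data.frame(
  task    = c("t1", "t1", "t2", "t2"),
  learner = c("A", "B", "A", "B"),
  measure = c("mmce", "mmce", "mse", "mse"),
  score   = c(0.10, 0.15, 2.3, 1.9)
)

# Aggregate into one row per (task, learner) with one column per measure.
# Measures that were never computed for a task end up as NA.
wide = reshape(
  scores,
  idvar = c("task", "learner"),
  timevar = "measure",
  direction = "wide"
)
print(wide)
```

The NA cells in `wide` are exactly the task/measure combinations for which performance values would have to be calculated after the fact.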
