Comments (13)
Hey @ck37
If you are able to compute the indices yourself, you can already do this with `ResamplingCustom` (see https://mlr3.mlr-org.com/reference/mlr_resamplings_custom.html):
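```r
library(mlr3)

# Create a small task with 10 observations
task = tsk("penguins")
task$filter(1:10)

# Instantiate the custom resampling with user-defined row ids;
# keep train and test sets of an iteration disjoint
custom = rsmp("custom")
train_sets = list(1:5, 6:10)
test_sets = list(6:10, 1:5)
custom$instantiate(task, train_sets, test_sets)

# Inspect the row ids of the first iteration
custom$train_set(1)
custom$test_set(1)
```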
from mlr3spatiotempcv.
Issue-Label Bot is automatically applying the label `feature_request` to this issue, with a confidence of 0.56. Please mark this comment with 👍 or 👎 to give our bot feedback!
A standalone temporal CV would definitely fit here, yes.
There are different approaches to accounting for this; the most common is probably clustering (with k-means as the default approach).
Another approach I know of is using predefined groups of temporal clusters (if the groups are clearly separated).
The latter is already doable, and the k-means clustering could quickly be adapted from `spcv-coords`.
Ultimately, you want to separate observations that are close in time, because they are naturally highly correlated with each other. This is the same issue as with observations that are close in space.
And yes, we can port over `RollingWindowCV`.
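A minimal sketch of the k-means idea in base R, assuming the time stamps are available as a plain `Date` vector: cluster the numeric time axis with `kmeans()` and hold out one cluster per fold. The helper `temporal_cluster_folds()` is hypothetical and not part of mlr3spatiotempcv or `spcv-coords`.

```r
# Hypothetical helper: temporal block CV via k-means on the time axis.
# Assumes `dates` is a Date vector; clustering 1-D time stamps is only
# one of several ways to form temporal blocks.
temporal_cluster_folds = function(dates, k, seed = 1) {
  set.seed(seed)
  # kmeans() needs numeric input, so use days since 1970-01-01;
  # several random starts reduce the risk of a poor local optimum
  cl = kmeans(as.numeric(dates), centers = k, nstart = 25)$cluster
  # one fold per cluster: test on that cluster, train on all others
  lapply(sort(unique(cl)), function(g) {
    list(train = which(cl != g), test = which(cl == g))
  })
}

# Example: three bursts of daily observations, far apart in time
dates = as.Date("2020-01-01") + c(0:9, 100:109, 200:209)
folds = temporal_cluster_folds(dates, k = 3)
```

Unlike a forward-chaining split, this clustering does not guarantee that training data precede the test data; it only keeps temporally close observations in the same fold.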
I think what we want here is not clustering, but rather a train/test split such that
`max(train$date) < min(test$date)`,
i.e. we always test how well our algorithms generalize to future settings.
This means the training data grows in each fold, just as in `RollingWindowCV`.
I think @mllg wanted this as well. Have you already started something there?
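A sketch of such forward-chaining splits in base R, assuming one row per time stamp and equally sized test blocks; `growing_window_folds()` is a hypothetical helper, not an existing mlr3 resampling. The sorted rows are cut into `folds + 1` consecutive blocks, and iteration i trains on blocks 1 to i and tests on block i + 1.

```r
# Hypothetical helper: growing-window splits with max(train date) < min(test date).
# Assumes no tied time stamps; ties across a block boundary would break
# the strict inequality.
growing_window_folds = function(dates, folds) {
  ord = order(dates)                 # row indices sorted by time
  n = length(ord)
  stopifnot(n >= folds + 1)
  # assign sorted positions to folds + 1 consecutive, equally sized blocks
  block = ceiling(seq_len(n) * (folds + 1) / n)
  lapply(seq_len(folds), function(i) {
    # train on all blocks up to i, test on the next block
    list(train = ord[block <= i], test = ord[block == i + 1])
  })
}

dates = as.Date("2021-01-01") + 19:0   # 20 days, deliberately unsorted
splits = growing_window_folds(dates, folds = 4)
```

Each training set is a superset of the previous one, which matches the growing training data described above.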
Ah ok, this is also an interesting approach!
In `RollingWindowCV`, if you specify `folds`, some observations are discarded in some folds, is that correct? (judging from the example and test fold 1).
I am not sure `folds` is a good name here, since folds are usually defined by non-overlapping test sets; this is more of a bootstrapping approach? Would `iters` be a better term in this case?
An argument that controls the increase of the training set as a percentage could also be interesting.
> have you already started something there?

Nope, nothing like this exists yet; I have never had such a dataset.
But I'd say it would fit really well into this package.
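To make the "percentage increase" and "folds vs. iters" points concrete, here is a sketch under assumed semantics: the training window starts at a fraction `initial` of the time-sorted rows and grows by a fraction `step` per iteration, and each test set is the next `horizon` rows. The function and all of its argument names are hypothetical.

```r
# Hypothetical helper: training window grows by a fixed percentage per iteration.
growing_window_pct = function(dates, initial = 0.5, step = 0.1, horizon = 5L) {
  ord = order(dates)                          # row indices sorted by time
  n = length(ord)
  # last training position per iteration; round() guards against
  # floating-point noise in n * fraction
  ends = round(n * seq(initial, 1, by = step))
  ends = ends[ends + horizon <= n]            # keep iterations with a full test set
  lapply(ends, function(e) {
    list(train = ord[seq_len(e)], test = ord[e + seq_len(horizon)])
  })
}

dates = as.Date("2021-01-01") + 0:39          # 40 days
iters = growing_window_pct(dates, initial = 0.5, step = 0.1, horizon = 5L)
```

With 40 rows the training sets have 20, 24, 28 and 32 rows while each test set has 5, so consecutive test sets overlap by one row. Test sets that are not disjoint are why `iters` may describe such a scheme better than `folds`.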
Datasets:
- The bike sharing dataset is an example of such a dataset; we use it in two gallery posts: mlr-org/mlr3gallery#13 and mlr-org/mlr3gallery#64

A resource on time-series cross-validation: https://robjhyndman.com/hyndsight/tscv/
They call it a fold there, but I don't care much about the naming.
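The linked post describes evaluation on a rolling forecasting origin. A base-R sketch of the index generation, assuming the rows are already sorted by time (positions 1 to n); `rolling_origin_splits()` is hypothetical. By default the training set expands; the optional `window` argument, an assumption for illustration, switches to a fixed-length training window.

```r
# Hypothetical helper: rolling forecasting origin on time-sorted rows 1..n.
# Each test set is the single observation h steps after the origin.
rolling_origin_splits = function(n, min_train, h = 1L, window = NULL) {
  stopifnot(min_train <= n - h)
  origins = seq(min_train, n - h)             # last training row per split
  lapply(origins, function(o) {
    # expanding window by default, fixed-length window if `window` is given
    start = if (is.null(window)) 1L else max(1L, o - window + 1L)
    list(train = seq(start, o), test = o + h)
  })
}

expanding = rolling_origin_splits(10, min_train = 5)
fixed = rolling_origin_splits(10, min_train = 5, window = 3)
```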
I believe this would also fit nicely in mlr3. Tasks already have the column role "order", which could be used in something like `ResamplingOrderedCV` or `ResamplingOrderedHoldout`.
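A rough sketch of what an ordered holdout could compute, assuming the values of the column with the "order" role are available as a plain vector; `ordered_holdout()` is hypothetical and not an existing mlr3 resampling.

```r
# Hypothetical helper: holdout split along an ordering column.
# The first `ratio` share of rows (by order value) is used for training,
# the remaining, later rows for testing.
ordered_holdout = function(order_values, ratio = 2 / 3) {
  ord = order(order_values)
  n_train = round(ratio * length(ord))
  # both sets must be non-empty (ord[-integer(0)] would return nothing)
  stopifnot(n_train >= 1, n_train < length(ord))
  idx = seq_len(n_train)
  list(train = ord[idx], test = ord[-idx])
}

h = ordered_holdout(c(5, 3, 1, 4, 2, 6))
```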
If we already have a dedicated package for spatial and temporal CV, I'd argue it should live there, simply because that is where users would look for it?
Coming back to this after a while, I now have a different view on this:
- I think it would be neat to have one dedicated package for spatiotemporal tasks and resampling methods, and I think {mlr3spatiotempcv} would be a good fit. I also think having different task classes is more confusing than helpful, as spatial and temporal tasks share many properties. Accordingly, `TaskRegrST` and `TaskClassifST` already reflect the temporal aspect in their name (the "T" in "ST").
- I see {mlr3forecasting} more on the same level as {mlr3raster}, i.e. taking care of the prediction calls while leaving tasks and resampling to {mlr3spatiotempcv}.

From a user's point of view, tasks and resampling could then be handled with a single extension package (i.e. {mlr3spatiotempcv}).
When it comes to prediction/measures/learners, {mlr3raster} (or maybe {mlr3spatial}) and {mlr3forecasting} would come into play.
Thoughts?
Oliveira et al. (2021) could be an interesting read.
I would like to postpone the implementation until after the paper has been submitted. Including it before then would require introducing and discussing a somewhat distinct field, which I would like to avoid right now.
I need this kind of method to use mlr3 for EHR-based machine learning, specifically the ability to define training/validation/test sets using date-based splits.
Is it possible for me to provide the splits to mlr3 and use the existing framework? I haven't been able to find out how to do that in the documentation so far. Otherwise it seems I will need to use tidymodels.
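For a fixed date-based partition, the indices can be computed up front. A minimal base-R sketch with a hypothetical helper `date_split()` and made-up cutoff dates; the resulting train and validation indices could then be handed to `rsmp("custom")` via `custom$instantiate(task, list(s$train), list(s$valid))`, as in the `ResamplingCustom` example at the top, assuming the task's row ids coincide with the row positions 1 to n.

```r
# Hypothetical helper: one train / validation / test partition by date.
# Rows before `valid_start` train, rows in [valid_start, test_start)
# validate, rows from `test_start` on are held out for the final test.
date_split = function(dates, valid_start, test_start) {
  stopifnot(valid_start < test_start)
  list(
    train = which(dates < valid_start),
    valid = which(dates >= valid_start & dates < test_start),
    test  = which(dates >= test_start)
  )
}

dates = as.Date(c("2019-06-01", "2020-03-01", "2021-02-01",
                  "2018-01-01", "2020-12-31"))
s = date_split(dates, as.Date("2020-01-01"), as.Date("2021-01-01"))
```

Rows with a missing date end up in none of the three sets, because `which()` ignores `NA`.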
Ah ok, awesome - appreciate the help & fast response 🙏
Related Issues (20)
- Check out `spcosa` package
- spatial resampling for train and test set in computer vision cases
- Loading mlr3spatiotempcv prevents pipelines with target variable transformations from making predictions
- New SpCV method Zalazar et al.
- Handling of `sf` objects WRT `DataBackends`
- Longterm play of Task*ST and DataBackends
- `as_task_*_st` and friends could allow setting column roles directly
- Update method help pages
- as.data.table(mlr_resamplings) does not work without suggested packages
- Add label and man field to resamplings
- Clarify the use of column roles for grouping features
- Task printer should show `time` and `space` column roles
- Log message during `private$sample()` when column roles "space" and "time" are set
- sf object no longer accepted by TaskClassifST
- CRAN 2.0.1 version produces bug when registering `sf` objects as spatial backend for `TaskClassifST`
- `register_mlr3` fails due to non-matching columns
- cleanup when unloading
- Please remove dependencies on **rgdal**, **rgeos**, and/or **maptools**
- Failure with the new version of **blockCV**
- linnenbrink2023 reference broken in mlr3spatiotempcv vignette