Are you referring to this part of the vignette?
```r
library(blockCV)

# buffering with presence-background data
bf2 <- buffering(speciesData = pb_data, # presence-background data
                 species = "Species",
                 theRange = 68000,
                 spDataType = "PB",
                 addBG = TRUE, # add background data to testing folds
                 progress = TRUE)
```
Meaning we should support the arguments
- `addBG`
- `spDataType`
- `species`

in `spcv_buffer`.

The target needs to be transformed to 0/1 before sampling.
If I understand correctly, the target is always binary. With "presence-background" (also called "presence-only") data you simply perform a buffered LOOCV on the presence points only, whereas with "presence-absence" data you do it on both presence and absence observations.
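A minimal base-R sketch of the buffered LOOCV described above, assuming planar coordinates in the same unit as the buffer radius. `buffered_loocv()` and `the_range` are hypothetical names for illustration, not the blockCV or mlr3spatiotempcv implementation. With presence-absence data every observation becomes a test set; with presence-background data only the presence points do.

```r
# Hypothetical helper: buffered leave-one-out CV on planar coordinates.
# Each element of `test_ids` becomes its own test set; the training set is
# every observation farther away than `the_range` from that test point.
buffered_loocv <- function(coords, the_range, test_ids = seq_len(nrow(coords))) {
  d <- as.matrix(dist(coords))  # pairwise Euclidean distances
  lapply(test_ids, function(i) {
    list(
      test  = i,
      train = unname(which(d[i, ] > the_range))
    )
  })
}

# Toy data: four points on a line, response coded 1 = presence, 0 = absence
coords <- cbind(x = c(0, 1, 5, 10), y = c(0, 0, 0, 0))
resp <- c(1, 0, 1, 0)

# Presence-absence: every observation is a test set
pa_folds <- buffered_loocv(coords, the_range = 2)
# Presence-background: only presence points are test sets
pb_folds <- buffered_loocv(coords, the_range = 2, test_ids = which(resp == 1))
```

The test point itself never ends up in its training set, because its distance to itself (0) is never larger than the radius.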
from mlr3spatiotempcv.
Yes, I am referring to this part.

> If I understand correctly, the target is always binary. With "presence-background" (also called "presence-only") data you simply perform a buffered LOOCV on the presence points only, whereas with "presence-absence" data you do it on both presence and absence observations.

The target column needs to be encoded as a numeric `0` or `1`. I think `TRUE` or `FALSE` might also work, because in R `TRUE == 1` is `TRUE`. However, any other encoding of presence/absence would not work. But we can easily support this transformation if `positive` is set in the `TaskClassif` object.
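A short base-R sketch of the encoding point above. `to_binary()` is a hypothetical helper, not part of mlr3 or blockCV; it assumes the target is a factor and that the positive level is known, e.g. from the `positive` field of an mlr3 `TaskClassif`.

```r
# In R, logical values compare equal to 1/0, so a TRUE/FALSE target
# would pass an `== 1` check used for selecting presences.
stopifnot(TRUE == 1, FALSE == 0)

# Hypothetical helper: recode a factor target to 0/1 via its positive level.
to_binary <- function(y, positive) {
  stopifnot(is.factor(y), positive %in% levels(y))
  as.integer(y == positive)  # 1 for the positive class, 0 otherwise
}

y <- factor(c("presence", "absence", "presence"))
to_binary(y, positive = "presence")
# [1] 1 0 1
```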
Ah, you mean it won't work if the target is a factor?
Actually, that sounds a bit odd, since binary targets should be encoded as a factor.
I am not sure adapting this behavior is the right thing to do; maybe this should be changed upstream.
The presence points are selected like this in `blockCV::buffering`:

```r
presences <- speciesData[speciesData@data[, species] == 1, ]
```
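The selection above compares the response column with `1`. This sketch uses a plain `data.frame` as a stand-in for the `@data` slot of the `Spatial*DataFrame` (an assumption for illustration) to show why a factor target with text labels selects nothing: comparing a factor with a number compares its labels as strings, and `"presence"` is not `"1"`.

```r
# Stand-ins for speciesData@data with a numeric and a factor response
df_num <- data.frame(Species = c(1, 0, 1))
df_fac <- data.frame(Species = factor(c("presence", "absence", "presence")))

# Same `== 1` selection as in blockCV::buffering
n_num <- nrow(df_num[df_num[, "Species"] == 1, , drop = FALSE])
n_fac <- nrow(df_fac[df_fac[, "Species"] == 1, , drop = FALSE])

n_num  # 2 presences found
n_fac  # 0 rows: the factor labels never equal "1"
```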
@be-marc Would you like to take this on?
@pat-s Yes
Notes about `spDataType`, `species` and `addBG`:

1. `spDataType = "PA"`, `species = NULL` - One fold for each observation.
2. `spDataType = "PA"`, `species = Response` - One fold for each observation, plus extra data about how many presence and absence observations are in each fold. The extra data might not be usable for us.
3. `spDataType = "PB"`, `species = NULL`, `addBG = TRUE` - Same as 1.
4. `spDataType = "PB"`, `species = NULL`, `addBG = TRUE` - Same as 1.
5. `spDataType = "PB"`, `species = "response"`, `addBG = TRUE` - One fold for each positive observation. Background points (negative observations) located inside the buffer are included in the test folds.
6. `spDataType = "PB"`, `species = "response"`, `addBG = FALSE` - One fold for each positive observation. Each test fold contains just that one positive observation.
For binary classification, we can reduce this to 1, 5 and 6.
For multiclass classification and regression, only 1 makes sense.
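A minimal sketch of the two presence-background combinations with a 0/1 response (`spDataType = "PB"` with `species` set, `addBG` TRUE or FALSE). `pb_buffer_folds()` is a hypothetical helper, assuming planar coordinates and following the description in this thread, not blockCV's actual code: one fold per presence point, training set = everything outside the buffer.

```r
# Hypothetical helper: one fold per presence point (resp == 1).
# Training set: all observations outside the buffer. With add_bg = TRUE,
# background points (0s) inside the buffer are added to the test set;
# other presences inside the buffer end up in neither set.
pb_buffer_folds <- function(coords, resp, the_range, add_bg = TRUE) {
  stopifnot(all(resp %in% c(0, 1)), nrow(coords) == length(resp))
  d <- as.matrix(dist(coords))  # pairwise Euclidean distances
  lapply(which(resp == 1), function(i) {
    inside <- d[i, ] <= the_range
    test <- i
    if (add_bg) {
      test <- c(i, unname(which(inside & resp == 0)))
    }
    list(test = test, train = unname(which(!inside)))
  })
}

coords <- cbind(x = c(0, 1, 5, 10), y = c(0, 0, 0, 0))
resp <- c(1, 0, 1, 0)
with_bg <- pb_buffer_folds(coords, resp, the_range = 2, add_bg = TRUE)
without_bg <- pb_buffer_folds(coords, resp, the_range = 2, add_bg = FALSE)
```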
PA and PB always assume a binary response variable.
- PA: presence and absence points are known.
- PB: only presence points are known; absence points are artificially created.

Therefore, 3 and 4 look like 1 from a partitioning perspective but differ when it comes to modelling.
Maybe you were aware of all of this already and I just did this for no reason 😄
We should include the list you wrote down in the help page of SpCVBuffer.
> PA and PB always assume a binary response variable

The wording is a bit misleading here. Both can be used with a multiclass and continuous response. Both work if `species = NULL` and produce the same folds. `spDataType = "PB"` and `species = NULL` throws an error.
> We should include the list you wrote down in the help page of SpCVBuffer.

I will create a new table with all possibilities. Maybe we should be a bit stricter than the original function. I think it would be confusing if different parameter combinations created the same result, and if parameter combinations that do not really make sense were allowed.
Multiclass and continuous response

1. `spDataType = "PA"`, `species = NULL` - Each observation is one test set. For each test set, all observations outside the buffer are the training set.

All other combinations should throw an error.

Two-class response

1. `spDataType = "PA"`, `species = NULL` - Each observation (positive or negative) is one test set. For each test set, all observations (positive and negative) outside the buffer are the training set.
2. `spDataType = "PB"`, `species = "response"`, `addBG = TRUE` - Each positive observation is one test set, and the negative observations (background points) inside the buffer are also included in the test set. For each test set, all observations (positive and negative) outside the buffer are the training set.
3. `spDataType = "PB"`, `species = "response"`, `addBG = FALSE` - Each positive observation is one test set. For each test set, all observations (positive and negative) outside the buffer are the training set.

Not allowed:

- `spDataType = "PA"`, `addBG = FALSE` - `addBG` is only usable for `spDataType = "PB"`, so we should not allow this.
- `spDataType = "PA"`, `species = Response` - Same train and test sets as 1. Since we do not use the extra information about the distribution of positive and negative observations in the train and test sets, we should not allow this combination.
- `spDataType = "PB"` and `species = NULL` should not be allowed, since `PB` only makes sense if we can distinguish between positive and negative observations.
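A hypothetical argument check encoding the stricter rules in the table above. `check_buffer_args()` and its `twoclass` flag are illustrative names, not an existing mlr3spatiotempcv API; `addBG = NULL` stands for "argument not set", which is one way to reject `addBG` together with `"PA"` regardless of its value.

```r
# Hypothetical validator for the allowed spcv buffer combinations.
# `twoclass`: whether the task has a binary response.
check_buffer_args <- function(twoclass, spDataType = "PA", species = NULL,
                              addBG = NULL) {
  if (identical(spDataType, "PA")) {
    # Plain buffered LOOCV: allowed for any response type
    if (!is.null(species)) stop("`species` is not allowed with spDataType = 'PA'.")
    if (!is.null(addBG)) stop("`addBG` is only usable with spDataType = 'PB'.")
    return(invisible(TRUE))
  }
  if (identical(spDataType, "PB")) {
    # Presence-background: binary response, response column, explicit addBG
    if (!twoclass) stop("spDataType = 'PB' requires a two-class response.")
    if (is.null(species)) stop("spDataType = 'PB' requires `species`.")
    if (!(isTRUE(addBG) || isFALSE(addBG))) {
      stop("`addBG` must be TRUE or FALSE with spDataType = 'PB'.")
    }
    return(invisible(TRUE))
  }
  stop("`spDataType` must be 'PA' or 'PB'.")
}

# Allowed: two-class combination 3
check_buffer_args(twoclass = TRUE, spDataType = "PB", species = "response", addBG = FALSE)
# Rejected: PB with a non-binary response
try(check_buffer_args(twoclass = FALSE, spDataType = "PB", species = "response", addBG = TRUE))
```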
> The wording is a bit misleading here. Both can be used with a multiclass and continuous response. Both work if `species = NULL` and produce the same folds. `spDataType = "PB"` and `species = NULL` throws an error.

In this case it's just a LOOCV with some observations removed.
PA and PB always refer to a binary response.
It's even a dedicated modeling field ("species distribution modeling") that uses these terms exclusively and has its own algorithms.
Your last table gives a good overview:
- All kinds of response types: LOOCV with a buffer around the test set. We should not use "PA" for this one but something else.
- All others are specific to a binary response (either standard PA or the niche case PB).