If the target has a binary outcome, a presence-background approach (see blockCV::buffe

Support presence-background option in "Spatial Buffer CV" (mlr3spatiotempcv, closed)

mlr-org commented on May 24, 2024

from mlr3spatiotempcv.

Comments (11)

pat-s commented on May 24, 2024

You are referring to this part of the vignette?

# buffering with presence-background data
bf2 <- buffering(speciesData = pb_data, # presence-background data
                 species = "Species",
                 theRange = 68000,
                 spDataType = "PB",
                 addBG = TRUE, # add background data to testing folds
                 progress = TRUE)

meaning we should support arguments

addBG
spDataType
species

in spcv-buffer.

Target needs to be transformed to 0/1 before sampling.

If I understand correctly, the target is always binary. With "presence-background" (also called "presence-only") you simply perform a buffered LOOCV on the presence points only, whereas with "presence-absence" you do it on both presence and absence observations.

be-marc commented on May 24, 2024

Yes I am referring to this part.

> If I understand correctly, the target is always binary. With "presence-background" (also called "presence-only") you simply perform a buffered LOOCV on the presence points only, whereas with "presence-absence" you do it on both presence and absence observations.

The target column needs to be encoded as a numeric 0 or 1. I think TRUE or FALSE might also work, because in R TRUE == 1 is TRUE. However, any other encoding of presence/absence would not work. But we can easily support this transformation if positive is set in the TaskClassif object.
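A minimal sketch of that transformation in base R, assuming the positive class label is known (for example from the task's positive field). The helper name encode_presence is hypothetical and is not part of mlr3spatiotempcv or blockCV.

```r
# Hypothetical helper: recode a two-class target to 0/1 integers,
# where 1 marks the positive (presence) class.
encode_presence <- function(target, positive) {
  target <- as.character(target)
  # the positive label must occur and there may be at most two classes
  stopifnot(positive %in% target, length(unique(target)) <= 2L)
  as.integer(target == positive)
}

species <- factor(c("present", "absent", "present", "absent"))
encode_presence(species, positive = "present")
# 1 0 1 0
```

The recoded vector would only be needed for the sampling step; the task itself could keep its factor target.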

pat-s commented on May 24, 2024

Ah, you mean it won't work if the target is a factor?
Actually, that sounds a bit weird, since binary targets should be encoded as a factor.
Not sure if adapting this behavior is the right thing to do; this should maybe be changed upstream.

be-marc commented on May 24, 2024

The presence data is filtered out like this in blockCV::buffering

presences <- speciesData[speciesData@data[,species]==1,]
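To illustrate why the encoding matters for this filter, here is a small base R sketch on a toy data frame that stands in for speciesData@data. The column names are made up for the example.

```r
# Toy stand-in for speciesData@data with three encodings of the same target.
d <- data.frame(
  num = c(1, 0, 1),
  lgl = c(TRUE, FALSE, TRUE),
  fct = factor(c("present", "absent", "present"))
)

d$num == 1  # TRUE FALSE TRUE
d$lgl == 1  # TRUE FALSE TRUE
d$fct == 1  # FALSE FALSE FALSE: factor labels are compared as strings

nrow(d[d$fct == 1, ])  # 0, so no presences would be selected
```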

pat-s commented on May 24, 2024

@be-marc Would you like to take this on?

be-marc commented on May 24, 2024

@pat-s Yes

be-marc commented on May 24, 2024

Notes about spDataType, species and addBG:

1. spDataType = PA, species = NULL - One fold for each observation.
2. spDataType = PA, species = "response" - One fold for each observation, plus extra data about how many presence and absence observations are in each fold. The extra data might not be usable for us.
3. spDataType = PB, species = NULL, addBG = TRUE - Same as 1.
4. spDataType = PB, species = NULL, addBG = TRUE - Same as 1.
5. spDataType = PB, species = "response", addBG = TRUE - One fold for each positive observation. Background points (negative observations) located inside the buffer are included in the test folds.
6. spDataType = PB, species = "response", addBG = FALSE - One fold for each positive observation. Test folds are just one positive observation.

For binary classification, we can reduce this to 1, 5 and 6.

For multiclass classification and regression, only 1 makes sense.

pat-s commented on May 24, 2024

PA and PB always assume a binary response variable.

PA = presence & absence points are known
PB = only presence points are known, absence points are artificially created

Therefore 3&4 seem like 1 from a partitioning perspective but are different when it comes to modelling.

Maybe you were aware of all of this already and I just did this for no reason 😄

We should include the list you wrote down in the help page of SpCVBuffer.

be-marc commented on May 24, 2024

> PA and PB always assume a binary response variable

The wording is a bit misleading here. Both can be used with a multiclass and continuous response. Both work if species = NULL and produce the same folds. spDataType = PB and species = NULL throws an error.

> We should include the list you wrote down in the help page of SpCVBuffer.

I will create a new table with all possibilities. Maybe we should be a little stricter than the original function. I think it would be confusing if different parameter combinations created the same result, and if parameter combinations that do not really make sense were allowed.

be-marc commented on May 24, 2024

Multiclass and continuous response

spDataType = PA, species = NULL - Each observation is one test set. For each test set, all observations outside the buffer are the training set.

All other combinations should throw an error.

Twoclass response

1. spDataType = PA, species = NULL - Each observation (positive or negative) is one test set. For each test set, all observations (positive and negative) outside the buffer are the training set.
2. spDataType = PB, species = "response", addBG = TRUE - Each positive observation is one test set, and the negative observations (background points) inside the buffer are also included in the test set. For each test set, all observations (positive and negative) outside the buffer are the training set.
3. spDataType = PB, species = "response", addBG = FALSE - Each positive observation is one test set. For each test set, all observations (positive and negative) outside the buffer are the training set.
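The three two-class variants can be sketched in base R roughly as follows. This is not the blockCV or mlr3spatiotempcv implementation: the function buffer_folds and its arguments are hypothetical, distances are plain Euclidean distances on planar coordinates, and treating points exactly on the buffer boundary as "inside" is an assumption.

```r
# Hypothetical sketch of buffered leave-one-out partitioning.
# coords:   two-column matrix of planar coordinates
# buffer:   buffer radius in coordinate units
# presence: 0/1 vector (1 = presence), only needed for type = "PB"
buffer_folds <- function(coords, buffer, presence = NULL,
                         type = c("PA", "PB"), add_bg = FALSE) {
  type <- match.arg(type)
  d <- as.matrix(dist(coords))  # pairwise Euclidean distances
  # PA: every observation is a test point; PB: only presences are
  centers <- if (type == "PA") seq_len(nrow(coords)) else which(presence == 1)
  lapply(centers, function(i) {
    test <- i
    if (type == "PB" && add_bg) {
      # add background points that fall inside the buffer
      test <- c(i, unname(which(d[i, ] <= buffer & presence == 0)))
    }
    # training set: everything outside the buffer
    list(test = test, train = unname(which(d[i, ] > buffer)))
  })
}

coords <- cbind(x = c(0, 1, 2, 10, 11), y = c(0, 0, 0, 0, 0))
presence <- c(1, 0, 0, 1, 0)
folds <- buffer_folds(coords, buffer = 1.5, presence = presence,
                      type = "PB", add_bg = TRUE)
folds[[1]]$test   # 1 2
folds[[1]]$train  # 3 4 5
```

With type = "PA" the same function yields one fold per observation, which also covers the multiclass and continuous case, since presence is never inspected there.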

spDataType = PA, addBG = FALSE - addBG is only usable with spDataType = PB, so we should not allow this.
spDataType = PA, species = "response" - Same train and test sets as 1. Since we do not use the extra information about the distribution of positive and negative observations in the train and test sets, we should not allow this combination.
spDataType = PB, species = NULL - Should not be allowed, since PB only makes sense if we can distinguish between positive and negative observations.
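A stricter argument check along these lines could look like the sketch below. The function check_buffer_args and the response_type labels are hypothetical; the argument names mirror blockCV::buffering, and using NULL to mean "not set" for addBG is an assumption made for the example.

```r
# Hypothetical validation of the allowed parameter combinations.
check_buffer_args <- function(response_type = c("twoclass", "multiclass", "regr"),
                              spDataType = c("PA", "PB"),
                              species = NULL, addBG = NULL) {
  response_type <- match.arg(response_type)
  spDataType <- match.arg(spDataType)
  if (spDataType == "PA") {
    # PA is plain buffered leave-one-out; the extra arguments are rejected
    if (!is.null(addBG)) stop("addBG is only usable with spDataType = 'PB'")
    if (!is.null(species)) stop("species is not used with spDataType = 'PA'")
  } else {
    # PB needs a two-class response and a column that identifies presences
    if (response_type != "twoclass") stop("spDataType = 'PB' requires a two-class response")
    if (is.null(species)) stop("spDataType = 'PB' requires species")
  }
  invisible(TRUE)
}

check_buffer_args("regr", "PA")                                         # passes
check_buffer_args("twoclass", "PB", species = "response", addBG = TRUE) # passes
# check_buffer_args("multiclass", "PB", species = "response")           # would stop
```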

pat-s commented on May 24, 2024

> The wording is a bit misleading here. Both can be used with a multiclass and continuous response. Both work if species = NULL and produce the same folds. spDataType = PB and species = NULL throws an error.

In this case it's just a LOOCV with some observations removed.

PA and PB always refer to a binary response.
These terms are specific to a dedicated modeling field ("species distribution modeling"), which has its own algorithms.

Your last table shows a good overview:

All kinds of response types: LOOCV with a buffer around the test set. We should not use "PA" for this one but something else

All others are specific to a binary response (either standard PA or the niche case PB).
