
Comments (11)

pat-s commented on May 24, 2024

You are referring to this part of the vignette?

# buffering with presence-background data
bf2 <- buffering(speciesData = pb_data, # presence-background data
                 species = "Species",
                 theRange = 68000,
                 spDataType = "PB",
                 addBG = TRUE, # add background data to testing folds
                 progress = TRUE)

meaning we should support arguments

  • addBG
  • spDataType
  • species

in spcv-buffer.

Target needs to be transformed to 0/1 before sampling.

If I understand correctly, the target is always binary. With "presence-background" (also called "presence-only") data you simply perform a buffered LOOCV on the presence points only, whereas with "presence-absence" data you do it on both presence and absence observations.

from mlr3spatiotempcv.

be-marc commented on May 24, 2024

Yes I am referring to this part.

If I understand correctly, the target is always binary. With "presence-background" (also called "presence-only") data you simply perform a buffered LOOCV on the presence points only, whereas with "presence-absence" data you do it on both presence and absence observations.

The target column needs to be encoded as a numeric 0 or 1. I think TRUE or FALSE might also work, because in R TRUE == 1 evaluates to TRUE. However, any other encoding of presence/absence would not work. We can easily support this transformation if positive is set in the TaskClassif object.
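
A minimal sketch of such a transformation in base R, assuming the label of the positive class is known (for example from the task's positive field). `encode_target` is a hypothetical helper used only for illustration; it is not part of mlr3spatiotempcv or blockCV.

```r
# Hypothetical helper: map a two-class target to the numeric 0/1 encoding
# that a `== 1` presence filter expects.
encode_target <- function(target, positive) {
  target <- as.character(target)
  # The positive label must occur and there must be at most two classes.
  stopifnot(positive %in% target, length(unique(target)) <= 2L)
  as.integer(target == positive)
}

y <- factor(c("present", "absent", "present"))
encode_target(y, positive = "present")
```

Going through `as.character()` keeps the helper independent of how the factor levels happen to be ordered.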


pat-s commented on May 24, 2024

Ah, you mean it won't work if the target is a factor?
Actually that sounds a bit weird, since binary targets should be encoded as a factor.
Not sure if adapting this behavior is the right thing to do; this should maybe be changed upstream.


be-marc commented on May 24, 2024

The presence data is filtered out like this in blockCV::buffering:

presences <- speciesData[speciesData@data[,species]==1,]
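
To illustrate why the encoding matters, here is a small base R sketch that applies the same `== 1` comparison to toy data frames. The data frames stand in for `speciesData@data`, and their names and values are made up for this example.

```r
# Toy stand-ins for speciesData@data; the column name follows the vignette.
pts_numeric <- data.frame(Species = c(1, 0, 1, 0))
pts_labels <- data.frame(
  Species = factor(c("presence", "absence", "presence", "absence"))
)

# Same comparison as in the presence filter.
nrow(pts_numeric[pts_numeric$Species == 1, , drop = FALSE])  # 2 rows kept
nrow(pts_labels[pts_labels$Species == 1, , drop = FALSE])    # 0 rows kept
```

With labelled factor levels the comparison is FALSE for every row, so no presence points survive the filter.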


pat-s commented on May 24, 2024

@be-marc Would you like to take this on?


be-marc commented on May 24, 2024

@pat-s Yes


be-marc commented on May 24, 2024

Notes about spDataType, species and addBG:

  1. spDataType = PA, species = NULL - One fold for each observation.
  2. spDataType = PA, species = "response" - One fold for each observation, plus extra data about how many presence and absence observations are in each fold. The extra data might not be usable for us.
  3. spDataType = PB, species = NULL, addBG = TRUE - Same as 1.
  4. spDataType = PB, species = NULL, addBG = FALSE - Same as 1.
  5. spDataType = PB, species = "response", addBG = TRUE - One fold for each positive observation. Background points (negative observations) located inside the buffer are included in the test folds.
  6. spDataType = PB, species = "response", addBG = FALSE - One fold for each positive observation. Each test fold is just one positive observation.

For binary classification, we can reduce this to 1, 5 and 6.

For multiclass classification and regression, only 1 makes sense.
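
Cases 1, 5 and 6 can be sketched in base R as follows. This is only an illustration under simplifying assumptions: it uses planar Euclidean distances via `dist()` rather than the spatial distance handling in blockCV, and `buffer_folds`, `coords`, `resp`, `buffer`, `pb` and `add_bg` are made-up names, not arguments of either package.

```r
# Sketch of buffered leave-one-out folds on planar coordinates (base R only).
buffer_folds <- function(coords, buffer, resp = NULL, pb = FALSE, add_bg = FALSE) {
  d <- unname(as.matrix(dist(coords)))   # pairwise Euclidean distances
  n <- nrow(d)
  # PB: one fold per presence (resp == 1); otherwise one fold per observation.
  centers <- if (pb) which(resp == 1) else seq_len(n)
  lapply(centers, function(i) {
    inside <- which(d[i, ] <= buffer)    # points within the buffer, including i
    test <- i
    if (pb && add_bg) {
      # Add background points (resp == 0) that fall inside the buffer.
      test <- c(i, inside[resp[inside] == 0])
    }
    # Everything outside the buffer is used for training.
    list(train = setdiff(seq_len(n), inside), test = test)
  })
}

coords <- cbind(x = c(0, 1, 2, 10), y = c(0, 0, 0, 0))
resp <- c(1, 0, 1, 0)
folds <- buffer_folds(coords, buffer = 1.5, resp = resp, pb = TRUE, add_bg = TRUE)
folds[[1]]  # test: points 1 and 2, train: points 3 and 4
```

Calling the function with `pb = FALSE` corresponds to case 1, `pb = TRUE, add_bg = TRUE` to case 5, and `pb = TRUE, add_bg = FALSE` to case 6.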


pat-s commented on May 24, 2024

PA and PB always assume a binary response variable.

PA = presence & absence points are known
PB = only presence points are known, absence points are artificially created

Therefore 3&4 seem like 1 from a partitioning perspective but are different when it comes to modelling.

Maybe you were aware of all of this already and I just did this for no reason 😄

We should include the list you wrote down in the help page of SpCVBuffer.


be-marc commented on May 24, 2024

PA and PB always assume a binary response variable

The wording is a bit misleading here. Both can be used with a multiclass and continuous response. Both work if species = NULL and produce the same folds. spDataType = PB and species = NULL throws an error.

We should include the list you wrote down in the help page of SpCVBuffer.

I will create a new table with all possibilities. Maybe we should be a little more strict than the original function. I think it would be confusing if different parameter combinations created the same result and if parameter combinations were allowed that do not really make sense.


be-marc commented on May 24, 2024

Multiclass and continuous response

  1. spDataType = PA, species = NULL - Each observation is one test set. For each test set, all observations outside the buffer are the training set.
  • All other combinations should throw an error.

Twoclass response

  1. spDataType = PA, species = NULL - Each observation (positive or negative) is one test set. For each test set, all observations (positive and negative) outside the buffer are the training set.
  2. spDataType = PB, species = "response", addBG = TRUE - Each positive observation is one test set, and the negative observations (background points) inside the buffer are also included in the test set. For each test set, all observations (positive and negative) outside the buffer are the training set.
  3. spDataType = PB, species = "response", addBG = FALSE - Each positive observation is one test set. For each test set, all observations (positive and negative) outside the buffer are the training set.
  • spDataType = PA, addBG = FALSE - addBG is only usable with spDataType = PB, so we should not allow this.
  • spDataType = PA, species = "response" - Same train and test sets as 1. Since we do not use the extra information about the distribution of positive and negative observations in the train and test sets, we should not allow this combination.
  • spDataType = PB and species = NULL should not be allowed since PB only makes sense if we can distinguish between positive and negative observations.
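
One possible way to enforce these rules is an explicit argument check. The sketch below is hypothetical: `check_buffer_args` and its `twoclass` flag are illustrative names, and rejecting any non-NULL `addBG` together with PA is an assumption that is slightly stricter than the list above.

```r
# Hypothetical argument check mirroring the allowed combinations listed above.
check_buffer_args <- function(twoclass, spDataType = "PA", species = NULL, addBG = NULL) {
  spDataType <- match.arg(spDataType, c("PA", "PB"))
  if (spDataType == "PA") {
    # PA: plain buffered LOOCV, valid for every response type.
    if (!is.null(species)) stop("`species` is not allowed with spDataType = 'PA'")
    if (!is.null(addBG)) stop("`addBG` is only usable with spDataType = 'PB'")
  } else {
    # PB: needs a twoclass response and a column identifying presences.
    if (!twoclass) stop("spDataType = 'PB' requires a twoclass response")
    if (is.null(species)) stop("spDataType = 'PB' requires `species`")
  }
  invisible(TRUE)
}

check_buffer_args(twoclass = TRUE, spDataType = "PB", species = "response", addBG = TRUE)
```

Failing early with a clear message avoids the situation where several parameter combinations silently produce identical folds.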


pat-s commented on May 24, 2024

The wording is a bit misleading here. Both can be used with a multiclass and continuous response. Both work if species = NULL and produce the same folds. spDataType = PB and species = NULL throws an error.

In this case it's just a LOOCV with some observations removed.

PA and PB always refer to a binary response.
There is even a dedicated modeling field ("species distribution modeling") that uses only these terms and has its own algorithms.

Your last table shows a good overview:

  1. All kinds of response types: LOOCV with a buffer around the test set. We should not use "PA" for this one but something else.

All others are specific to a binary response (either standard PA or the niche case PB).
