lleisong / itsdm

Purely presence-only species distribution modeling with isolation forest and its variations such as Extended isolation forest and SCiForest.

Home Page: https://lleisong.github.io/itsdm/

License: Other

R 100.00%
species-distribution-modelling presence-onlymodel outlier-detection isolation-forest shapley-value

itsdm's Introduction

itsdm

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

Overview

itsdm uses isolation forest and its variations, such as SCiForest and EIF (extended isolation forest), to model species distributions. It provides features including:

  • A few functions to download environmental variables.
  • Outlier tree-based detection of suspicious environmental outliers.
  • Isolation forest-based environmental suitability modeling.
  • Non-spatial response curves of environmental variables.
  • Spatial response maps of environmental variables.
  • Variable importance analysis.
  • Presence-only model evaluation.
  • A method to convert predicted suitability to a presence-absence map.
  • Variable contribution analysis for the target observations.
  • A method to analyze the spatial impacts of a changing environment.

Installation

Install the CRAN release of itsdm with:

install.packages("itsdm")

You can install the development version of itsdm from GitHub with:

# install.packages("remotes")
remotes::install_github("LLeiSong/itsdm")

Example

This basic example trains a model on a virtual species, then plots the predicted suitability and the independent response curves:

library(itsdm)
library(dplyr)
library(stars)
library(ggplot2)

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
obs_type <- "presence_absence"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = obs_type)

# Get environmental variables
env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 6, 12, 15))

# Train the model
mod <- isotree_po(
  obs_mode = "presence_absence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 200,
  sample_size = 0.8, ndim = 2,
  seed = 123L)

# Check results
## Suitability
ggplot() +
  geom_stars(data = mod$prediction) +
  scale_fill_viridis_c('Predicted suitability',
                       na.value = 'transparent') +
  coord_equal() +
  theme_linedraw()

## Plot independent response curves
plot(mod$independent_responses, 
     target_var = c('bio1', 'bio12'))

The Shapley value-based analysis can also be applied to external models. Here is an example that analyzes the impact of a 200 mm decrease in bio12 on species distribution, based on a Random Forest (RF) prediction:

# Prepare data
# randomForest() below comes from the randomForest package
library(randomForest)
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% 
  filter(usage == "train")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12)) %>% 
  split()

model_data <- stars::st_extract(
  env_vars, at = as.matrix(obs_df %>% select(x, y))) %>% 
  as.data.frame()
names(model_data) <- names(env_vars)
model_data <- model_data %>% 
  mutate(occ = obs_df[['observation']])
model_data$occ <- as.factor(model_data$occ)

mod_rf <- randomForest(
  occ ~ .,
  data = model_data,
  ntree = 200)

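# Prediction wrapper: return the predicted probability of class "1" (presence)
pfun <- function(X.model, newdata) {
  # newdata is a data.frame of environmental variables
  predict(X.model, newdata, type = "prob")[, "1"]
}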

# Use a fixed value
climate_changes <- detect_envi_change(
  model = mod_rf,
  var_occ = model_data %>% select(-occ),
  variables = env_vars,
  target_var = "bio12",
  bins = 20,
  var_future = -200,
  pfun = pfun)

Contributor

  1. David Cortes, who helped improve the flexibility of calling isotree.

We welcome any help! Please make a pull request or reach out to [email protected] if you would like to contribute.

Funding

This package is part of the project "Combining Spatially-explicit Simulation of Animal Movement and Earth Observation to Reconcile Agriculture and Wildlife Conservation". The project is funded by the NASA FINESST program (award number: 80NSSC20K1640).

itsdm's Issues

How can I model a species distribution for a future climate scenario?

I am using itsdm for species distribution modeling.

I have an isotree_po object trained with historic bioclimatic variables. I want to model the species distribution for a future climate scenario, and I have the future bioclimatic variables.

I tried using the function probability, but the stars object has 3 dimensions (x, y, and the 19 bands).

How can I have a stars object of stacked rasters with only 2 dimensions?
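
One possible approach, sketched under assumptions: stars provides split(), which moves a dimension into attributes (the README example above pipes slice('band', ...) into split() in the same way). The sketch below uses the Landsat sample file shipped with stars as a stand-in for a 19-band bioclimatic stack. Whether the result then satisfies probability for a given model (for example, attribute names matching the training variables) is not verified here.

```r
library(stars)

# Sample multi-band raster shipped with stars
# (a stand-in for a bioclimatic stack; not real climate data)
tif <- system.file("tif/L7_ETMs.tif", package = "stars")
env_3d <- read_stars(tif)
dim(env_3d)    # three dimensions: x, y, band

# Move the "band" dimension into attributes: one attribute per band
env_2d <- split(env_3d, "band")
dim(env_2d)    # two dimensions: x and y
names(env_2d)  # one attribute name per former band
```

If the attribute names need to match the variable names used in training, they can be reassigned with names(env_2d) <- ... before prediction.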

time dimension

Would it be possible to include a time dimension in the data matching? A use case would be observations collected over a long period of time (e.g. long-term monitoring data or fossils), where the species distribution or niche may have shifted, so matching to a single time step doesn't really make sense.

As an example, I've attached a stars object (RDS file within the zip archive) with 3 dimensions (x, y, time), and 3 attributes (temperature, precipitation, elevation), and sampling points (geopackage within the zip archive) that vary through time.

stars_example_3d_w_time.zip

Better way to visualize the point cloud in function `plot.ShapDependence`

As @ldemaz mentioned, the point cloud is currently very dense, so it may mask where the correlations are strongest. Here are some ideas for improving it:

  1. @ldemaz suggested using a sampling strategy to thin the points so the relationship becomes clearer, e.g. sample = "auto", where sampling is done automatically when point density is greater than X. Alternative values could be "none" or a proportion.
  2. Or, as we discussed, fit a simplified curve (with a confidence interval?) to the point cloud, and plot the fitted curve using geom_line or geom_path instead of plotting the points directly.
  3. Or we can combine both options in the plot function.
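
The three ideas above can be sketched in base R as follows. This is only an illustration under assumptions: the data are simulated, thin_points and its max_n threshold are hypothetical names that do not exist in itsdm, and loess with a normal-approximation band stands in for whichever smoother the plot function might eventually use.

```r
set.seed(42)

# Simulated dense point cloud: variable values vs. Shapley values
# (made-up data, only to illustrate the idea)
n <- 5000
pts <- data.frame(x = runif(n, 0, 30))
pts$shap <- sin(pts$x / 5) + rnorm(n, sd = 0.3)

# Hypothetical helper: "auto" thins only when the cloud has more than
# max_n points, "none" keeps everything, a number is a proportion to keep
thin_points <- function(df, sample = "auto", max_n = 1000) {
  if (identical(sample, "none")) return(df)
  if (identical(sample, "auto")) {
    if (nrow(df) <= max_n) return(df)
    size <- max_n
  } else {
    size <- round(nrow(df) * sample)
  }
  df[sample.int(nrow(df), size), , drop = FALSE]
}

# Idea 1: thin the points
thinned <- thin_points(pts, sample = "auto")

# Idea 2: fit a smooth curve with an approximate 95% band
fit <- loess(shap ~ x, data = thinned)
grid <- data.frame(x = seq(min(thinned$x), max(thinned$x), length.out = 100))
pred <- predict(fit, newdata = grid, se = TRUE)
grid$fit <- pred$fit
grid$lower <- pred$fit - 1.96 * pred$se.fit
grid$upper <- pred$fit + 1.96 * pred$se.fit

# Idea 3: draw the thinned points together with the fitted curve
plot(thinned$x, thinned$shap, pch = 16, cex = 0.4, col = "grey60",
     xlab = "Variable value", ylab = "Shapley value")
lines(grid$x, grid$fit, lwd = 2)
lines(grid$x, grid$lower, lty = 2)
lines(grid$x, grid$upper, lty = 2)
```

In a ggplot2-based plot method, the same grid data frame could be passed to geom_line for the curve and geom_ribbon for the band.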

Breaking change(s) in new version of fastshap

Hi @LLeiSong,

I am preparing a new release of fastshap for CRAN, and it seems like it will break some functionality in your package. You can see a list of changes here. I suspect the biggest change affecting your package is that fastshap >= 0.1.0 will no longer return a tibble, but rather a matrix.

Just wanted to give you a heads up before I plan to submit in two weeks.

itsdm

Run revdepcheck::revdep_details(, "itsdm") for more info

Newly broken

  • checking examples ... ERROR
    Running examples in ‘itsdm-Ex.R’ failed
    The error most likely occurred in:
    
    > ### Name: detect_envi_change
    > ### Title: Detect areas influenced by a changing environment variable.
    > ### Aliases: detect_envi_change
    > 
    > ### ** Examples
    > 
    > # Using a pseudo presence-only occurrence dataset of
    ...
    +   variables = mod$variables,
    +   shap_nsim = 1,
    +   target_var = "bio1",
    +   var_future = 5)
    Just set the single future variable.
    Change current bio1 with 5.
    Error in UseMethod("pull") : 
      no applicable method for 'pull' applied to an object of class "c('explain', 'matrix', 'array')"
    Calls: detect_envi_change ... shap_dependence -> lapply -> FUN -> data.frame -> %>% -> pull
    Execution halted
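
The error above arises because pull() has no method for a matrix. A minimal sketch of one possible workaround, assuming the Shapley values arrive as a numeric matrix carrying the extra class "explain" (as the error message suggests): convert the matrix to a data frame before extracting columns. The matrix below is built by hand to mimic that class; it is not produced by fastshap, and this is not necessarily how itsdm resolved the problem.

```r
# Hand-made stand-in for a Shapley values matrix (made-up numbers)
shap_mat <- matrix(
  c(0.10, -0.20, 0.05,
    0.30,  0.15, -0.10),
  ncol = 2,
  dimnames = list(NULL, c("bio1", "bio12")))
shap_mat <- structure(shap_mat, class = c("explain", class(shap_mat)))
class(shap_mat)  # starts with "explain", followed by the matrix classes

# Drop the extra class, then convert so column-wise tools work again
shap_df <- as.data.frame(unclass(shap_mat))
shap_df[["bio1"]]
```

Once the values are in a data frame, existing column-based code (including dplyr verbs such as pull) can operate on them as it did when a tibble was returned.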
    

How to format a `stars`-object for the `probability` function?

Hi,

Thank you for your package.

I was interested in the way you managed to use the predict-function on a raster stack/stars object.
I'm not doing any species distribution modeling, but I was interested in using your probability function.

However, when trying to run the function, I got the following error:

Error in probability(bio_for, env_vars) : 
  Please format inputs to stars object with x and y dimensions only, and distribute variables to attributes.

Since I'm very new to the stars package, I have no clue how to "distribute variables to attributes".
I read your code for the isotree_po function and found that you use split(3) when a stars object has 3 dimensions, but passing the result of that call to the probability function did not work either.

So how can I prepare the stars-object with the variables as attributes?

Here is a reproducible example of what I was trying to do:

library(itsdm)
library(dplyr)
library(stars)
library(isotree)  # provides isolation.forest()

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% 
  read_stars() %>%
  slice('band', c(1, 5, 12))

bio_mat <- occ_virtual_species %>% 
  select(x,y) %>% 
  as.matrix() %>%
  st_extract(x = env_vars, at =.)

bio_for <- isolation.forest(bio_mat,
                 ndim=3, 
                 ntrees=10,
                 missing_action="fail")

probability(bio_for, env_vars)

Make the SHAP-related functions flexible to use external models

Currently, the SHAP-related functions use an internally defined function (.pfun_shap) to calculate Shapley values, so they can only be applied to models created within itsdm. These functions are variable_contrib, shap_dependence, variable_analysis, and spatial_response.

They would be more useful if they could take a user-customized predict function for a model created outside of itsdm. This will be a long-running enhancement for the package, and this issue will stay open and be updated until the whole feature is ready for release.
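
To illustrate what a user-supplied predict function could look like, here is a minimal sketch using a logistic regression from base R as the external model. It follows the two-argument form (X.model, newdata) used by pfun in the README example. The simulated data and the names mod_glm and pfun_glm are made up for illustration, and whether every SHAP-related function accepts such a wrapper depends on the package version.

```r
set.seed(1)

# Simulated occurrence data (made-up numbers, not a real species)
n <- 200
model_data <- data.frame(
  bio1 = rnorm(n, mean = 22, sd = 3),
  bio12 = rnorm(n, mean = 900, sd = 200))
p <- plogis(-10 + 0.3 * model_data$bio1 + 0.004 * model_data$bio12)
model_data$occ <- rbinom(n, size = 1, prob = p)

# External model created outside of itsdm
mod_glm <- glm(occ ~ bio1 + bio12, data = model_data, family = binomial())

# Prediction wrapper: takes the model and a data.frame of variables,
# returns one numeric value (probability of presence) per row
pfun_glm <- function(X.model, newdata) {
  as.numeric(predict(X.model, newdata = newdata, type = "response"))
}

preds <- pfun_glm(mod_glm, model_data[, c("bio1", "bio12")])
range(preds)
```

The key design point is that the wrapper hides model-specific details (here, type = "response") and always returns a plain numeric vector, so the calling code does not need to know which kind of model it is explaining.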
