tutorials_marine_sdm's Introduction

Marine Species Distribution Model (SDM) Tutorial

Overview

This tutorial was developed during OceanHackWeek 2023 (OHW23) to provide a simple workflow for developing a marine Species Distribution Model (SDM) in R. To see the project as it stood at the end of OHW23, go to the ohw23_proj branch or see the ohw23_proj release.

Background

Species Distribution Modelling (SDM), also known as niche, environmental, or ecological modelling, uses an algorithm to predict the distribution of a species across space and time from environmental data. Understanding the relationship between the species of interest and the physical environment it occupies informs the selection of the environmental factors to include in the model.

SDMs also need biotic information: at the very least, the locations of individuals. Abundances or densities can also be used as inputs, but they are not compulsory. Absences, that is, the locations where individuals of a species are NOT present, are just as important because they provide information about the environmental conditions where individuals are not usually sighted. Absences are often not recorded in biological data, but we can use background points (also known as pseudo-absences), which provide information about the full range of environmental conditions available to the species of interest in our study area.
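As a minimal sketch of the idea of background points, the base R code below draws random locations inside a rectangular study area. The bounding box coordinates, the number of points, and the column names are illustrative assumptions, not values taken from the tutorial. The sketch also ignores two things a real workflow would handle: points that fall on land, and the fact that sampling uniformly in longitude and latitude is not uniform in area.

```r
# Minimal sketch: random background points inside a bounding box (base R only).
# The bounding box and sample size below are hypothetical.
set.seed(2023)

bbox <- c(xmin = 41, xmax = 71, ymin = -2, ymax = 31)  # degrees, assumed
n_background <- 1000

background <- data.frame(
  lon = runif(n_background, min = bbox[["xmin"]], max = bbox[["xmax"]]),
  lat = runif(n_background, min = bbox[["ymin"]], max = bbox[["ymax"]]),
  presence = 0L  # 0 marks a background point, 1 would mark a presence record
)

head(background)
```

In practice the points would then be filtered to ocean cells, for example by keeping only locations where the environmental layers have values.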

For a review of the performance of different SDM algorithms, see the following publications:

  • Valavi, Guillera-Arroita, Lahoz-Monfort, Elith (2021). Predictive performance of presence-only species distribution models: a benchmark study with reproducible code. DOI: 10.1002/ecm.1486

  • Elith et al. (2006). Novel methods improve prediction of species’ distributions from occurrence data. DOI: 10.1111/j.2006.0906-7590.04596.x

For a discussion of the impact of background data on SDMs, see: Phillips et al. (2009). Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data. DOI: 10.1890/07-2153.1. For background sample generation, refer to work by Valavi.

Datasets used in the tutorial

Biological Data

Our area of interest is the Indian Ocean, where four species of sea turtles have been reported:

  • Loggerhead, Caretta caretta
  • Green, Chelonia mydas
  • Olive Ridley, Lepidochelys olivacea
  • Hawksbill, Eretmochelys imbricata

For this tutorial, we will focus on predicting the areas occupied by loggerhead sea turtles. To do this, we will use presence-only data from 2000 to the present, sourced from the Ocean Biodiversity Information System (OBIS) via the robis package.

Environmental Data

This tutorial focuses on regions in the northern Indian Ocean, specifically the western Arabian Sea, Persian Gulf, Gulf of Oman, Gulf of Aden and Red Sea. Environmental predictor variables were sourced via the sdmpredictors R package. The package gives access to the high-resolution Bio-ORACLE (https://bio-oracle.org/) and MARSPEC (http://marspec.org/) layers for a range of marine variables. Note that these variables are location specific but not time specific: they are values averaged over time periods.

Workflow/Roadmap

This tutorial is based on the notes by Ben Tupper (Bigelow Lab, Maine), and highlights modeling presence-only data with the maxnet R package.

Tutorial roadmap

  1. Presence Data -- obtain Loggerhead sea turtle (C. caretta) presence data from OBIS via robis
  2. Background Points -- create random background points within our area of interest using two methods
  3. Environmental Data -- obtain environmental predictors of interest using sdmpredictors
  4. Model -- run species distribution model and predict using maxnet
  5. Data Visualizations
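The modelling step in the roadmap can be sketched as follows. This is an illustration under stated assumptions, not the tutorial's own code: the predictor names (sst, depth) and all values are simulated, and the model is only fitted if the maxnet package is installed. It shows the input shape maxnet expects: a vector of 1s (presences) and 0s (background points), plus a data frame of predictor values with one row per point.

```r
# Minimal sketch with simulated data; variable names and values are hypothetical.
set.seed(42)

n_pres <- 50   # number of simulated presence records
n_bg   <- 200  # number of simulated background points

# Predictor values at presence locations (narrow range) and at
# background locations (the full range available in the study area).
pres <- data.frame(sst = rnorm(n_pres, mean = 27, sd = 1),
                   depth = runif(n_pres, min = 0, max = 200))
bg   <- data.frame(sst = runif(n_bg, min = 20, max = 32),
                   depth = runif(n_bg, min = 0, max = 3000))

p   <- c(rep(1, n_pres), rep(0, n_bg))  # 1 = presence, 0 = background
env <- rbind(pres, bg)                  # rows in the same order as p

# Fit and predict only when maxnet is available.
if (requireNamespace("maxnet", quietly = TRUE)) {
  model <- maxnet::maxnet(p, env)
  # Predicted relative suitability for each row of env, on the cloglog scale
  suitability <- predict(model, env, clamp = TRUE, type = "cloglog")
  print(summary(as.vector(suitability)))
}
```

In the tutorial itself, the predictor values come from the environmental layers extracted at the presence and background locations rather than from simulated numbers.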

References

Tutorial developers


Who is this tutorial intended for?

Some experience programming in R is needed to make the most of this tutorial. To run the tutorial, clone this repository onto your local machine by creating a new project that uses version control (git).

The tutorial content was developed in R version 4.2.2 on Linux.

Additional resources

If you need additional support with R programming, you can check the following resources:

tutorials_marine_sdm's People

Contributors

eeholmes, lidefi87, mackenziefiss, caitobrien, marysolokas, sjhong0117, lauratsang, pfreire29, btupper, cacourtier, 7yl4r

tutorials_marine_sdm's Issues

identify "data stories" to look for

Looking for ideas on what data stories we might be able to look for using SDMs. The ideal data stories will have lots of data and be easy to see. Seasonal migrations might be a good starting point.

Within sea turtle data in the Arabian/Indian seas:

  • Seasonal upwellings bring up cold water and chlorophyll blooms. How might this affect sea turtles?

predict function issue

Hi, I am trying to run the predict function following the same steps as in the exercise but I get the following error:

predicted <- predict(sdm.model,
                     envs.tars %>% sf::st_crop(bb),
                     clamp = clamp, type = type)

Error in [.stars(newdata, , v) :
  selecting using invalid value label(s)?

I managed to replicate everything else without problems, but I have not been able to solve this error.

identify gridded input layers

Input layers for generating an SDM need to be identified. Popular ones are sea surface temperature, chlorophyll a concentration, and bathymetry.

I have experience pulling these through ERDDAP, but there are other sources too. If using ERDDAP we can try using the (experimental) extractr library. A list of datasets in ERDDAP I like can be found here.

identify taxa occurrence data source(s)

I recommend OBIS and/or GBIF for this. The robis library works well for querying OBIS. The data can be messy, but it is still the best I have seen for large spatiotemporal regions of interest.

The holders of SEAMAP data listed in #1 are working to convert their data into Darwin Core so it can also be added to OBIS.

Adjusted get_seaturtle_data.Rmd

@lidefi87 Just a heads up: I adjusted a few lines at the intro/end of your get_seaturtles_data.RMD so it fits better into the quarto.yml flow. Feel free to let me know if you have any questions.
