iiasa / ibis.iSDM

Modelling framework for creating integrated SDMs

Home Page: https://iiasa.github.io/ibis.iSDM/

License: Creative Commons Attribution 4.0 International

R 98.96% Stan 1.04%
biodiversity species-distribution-modelling spatial-grain scenarios spatial-predictions integrated-framework poisson-process sdm bayesian

ibis.iSDM's Introduction

The ibis framework - An Integrated model for BiodIversity distribution projectionS



The ibis.iSDM package provides a series of convenience functions to fit integrated Species Distribution Models (iSDMs). By integrated models we generally refer to SDMs that incorporate information from different biodiversity datasets, as well as external parameters such as priors or offsets with respect to certain variables and regions. See Fletcher et al. (2019) and Isaac et al. (2020) for an introduction to iSDMs.

Installation

The latest version can be installed from GitHub. A CRAN release is planned, but in the meantime the package can be found on R-universe as well.

# Installation from R-universe (not yet on CRAN)
install.packages("ibis.iSDM", repos = "https://iiasa.r-universe.dev")

# Installation directly from GitHub
install.packages("remotes")
remotes::install_github("IIASA/ibis.iSDM")

Basic usage

See the relevant reference site and articles.

Note that the package is in active development and parameters of some functions might change.

Citation:

Jung, Martin. 2023. “An Integrated Species Distribution Modelling Framework for Heterogeneous Biodiversity Data.” Ecological Informatics, 102127, DOI

Acknowledgement

ibis.iSDM is developed and maintained by the Biodiversity, Ecology and Conservation group at the International Institute for Applied Systems Analysis (IIASA), Austria.

Contributors

All contributions to this project are gratefully acknowledged using the allcontributors package following the all-contributors specification. Contributions of any kind are welcome!


Martin-Jung

mhesselbarth

ibis.iSDM's Issues

Fix muddled namespaces and prevent functions from other packages leaking into them

Loading the package (CTRL+SHIFT+L) places the functions from all imported or depended-upon packages into ibis.iSDM's own namespace.
To reproduce:

  • Check available functions via ibis.iSDM:: (and press <TAB>), showing functions such as extract from raster or INLA-related functions.

A similar and likely directly related bug:
Help files from the ibis.iSDM package are not loaded. For instance, after loading the package, ?distribution does not point to the distribution function but instead to stats::distribution

Fix remaining `observed` references in threshold

I missed a few references to observed in the threshold() function, which need to use the new field_occurrence argument. Will fix first thing tomorrow.

Right now, the code will still run if the default argument is not changed and the column of the provided points is named "observed". So the behavior is the same as before.

Add sufficient documentation to all functions with examples

Most functions in the package have only a skeleton of documentation, which falls short of best practices.
Ideally all functions that have matured and are exported should have, at a minimum and in the following order:

  • Overall name
  • Short description
  • @param: Parameters for each
  • @details: More detailed information on what the function does
  • \describe{} Eventual more detailed description of input parameters
  • @return: Explanation of the outputs that are returned
  • @examples: with simple test code
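Taken together, a documented function would then follow a roxygen2 skeleton along these lines (function name and all tag contents here are placeholders, not actual package code):

```r
#' Overall name: fit a simple example model
#'
#' Short description of what the function does.
#'
#' @param x A numeric input (placeholder).
#' @param verbose Should progress messages be printed (logical)?
#'
#' @details
#' More detailed information on what the function does, e.g.:
#' \describe{
#'   \item{x}{Eventual more detailed description of this input parameter.}
#' }
#'
#' @return The input \code{x}, unchanged (explanation of the outputs).
#'
#' @examples
#' example_fun(1, verbose = FALSE)
example_fun <- function(x, verbose = TRUE) {
  if (verbose) message("Processing input...")
  x
}
```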

Add further spatial interpretability functionalities to ibis

Often we would like to ask what the limits to transferability are for a given model, where the model behaves badly, and which predictors are driving the prediction spatially. Some of this information can be obtained by looking at predictions of model uncertainty (INLA and GDB currently, showing uncertainty in input data + model error + parameters).
There is also already the option to create spatial partial effects via the $spartial() function for GDB and BART models (and possibly INLA?).
But further metrics in convenient functions would be good, particularly when we are talking about scenarios where transferability between time steps is a major issue.
Implementing the MESS (Multivariate Environmental Similarity Surface) index would be great, as would other proposed transferability metrics. Possibly this has already been done in other packages and some code examples could be taken from there.
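As a minimal base-R sketch of the MESS logic (my own simplified reading of Elith et al. 2010, for a single point against a reference matrix; not any package's API):

```r
# Simplified MESS: similarity of a point p to a reference set, per variable,
# taking the minimum across variables (negative values = novel conditions).
mess_point <- function(p, ref) {
  sims <- vapply(seq_along(p), function(j) {
    v <- ref[, j]
    f <- 100 * mean(v < p[j])           # percent of reference values below p
    if (f == 0)        100 * (p[j] - min(v)) / (max(v) - min(v))
    else if (f <= 50)  2 * f
    else if (f < 100)  2 * (100 - f)
    else               100 * (max(v) - p[j]) / (max(v) - min(v))
  }, numeric(1))
  min(sims)
}

ref <- cbind(temp = runif(100, 5, 25), prec = runif(100, 200, 1200))
mess_point(c(temp = 30, prec = 600), ref)  # negative: temperature is novel here
```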

Check that polygons work correctly with data integration

Currently adding small polygons via add_biodiversity_polpo etc. is not working as expected, raising errors because coordinates are wrong (check the conversion to WKT / sampling points).
Needs to be fixed as soon as the test data from dataclima is ready

Pipe operator

Currently both pipe operators are used (%>%, |>). I would suggest we either

  1. Remove pipes completely
  2. Stick to one (probably magrittr, to avoid having to rely on R >= 4.1)

Thoughts?

train() fails (missing value where TRUE/FALSE needed) when no CRS

Bit of a minor issue really...

I have been testing this out by generating neutral landscapes using {NLMR} https://ropensci.github.io/NLMR/index.html and then using those to generate virtual species with {virtualspecies}; however, the rasters generated by NLMR aren't supplied with a CRS. Everything works as far as train(), whereby I get this error:

> fit <- train(mod,
+              runname =  "Test INLA run",
+              verbose = T #
+              )
[Estimation] 2023-08-16 09:11:42 | Collecting input parameters.
[Estimation] 2023-08-16 09:11:44 | Adding engine-specific parameters.
Error in if (sf::st_is_longlat(bdry$crs)) { : 
  missing value where TRUE/FALSE needed

Which I presume is because I don't have a CRS:

> print(mod)
<Biodiversity distribution model>
Background extent: 
     xmin: 0, xmax: 0,
     ymin: 999, ymax: 999
   projection: NA
 --------- 
Biodiversity data:
   Point - Presence only <163 records>
 --------- 
  predictors:     env1, env2, env3 (3 predictors)
  priors:         <Default>
  latent:         None
  log:            <Console>
  engine:         <INLABRU>

Understandable if a CRS is needed but the error message is not immediately clear.

Is a CRS essential for model runs? In that case, adding an assertion such as

assertthat::assert_that(!is.na(sf::st_crs(x)))

(checking whichever object actually carries the CRS) about the CRS not being NA would give a user-friendly message. (This is what I think needs to be done, but I don't actually have much package development experience.)

Or otherwise set up the function so that it can operate without a CRS? But maybe the previous option makes more sense.

Implement temporal interpolation between scenario time steps

Currently project() only works with (future) covariates that have a specified regular time range, e.g. [2020, 2021, 2022, 2023]. In some projects we are working on, however, data comes with larger intervals such as decades [2020, 2030, 2040]. Here a helpful option could be to support temporal interpolation between those intervals, thus effectively altering the stars objects.

  • Allow for temporal interpolation within time steps (parameter date_interpolation in project()).
  • Add a helper function to interpolate between time steps according to a specified interval.
  • (optional) Also support different types of interpolation, from linear and nearest-neighbour to smooth interpolation.
  • Add some unit tests on default data.
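At its core, per-pixel linear interpolation between the supplied time steps could be sketched with base R's approx() (a conceptual sketch of the idea, not the project() implementation):

```r
# Covariate values supplied only for decadal steps:
years  <- c(2020, 2030, 2040)
values <- c(0.2, 0.5, 0.4)   # value of one covariate in one pixel

# Interpolate to annual steps; conceptually this would be applied to every
# pixel of the stars object when something like date_interpolation = "linear"
# is requested.
annual <- approx(x = years, y = values, xout = 2020:2040, method = "linear")
annual$y[1:3]  # 0.20 0.23 0.26
```

approx() also offers method = "constant" for step-wise behaviour; smooth interpolation would need e.g. spline().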

Ensure that log file entries are correctly written and that progress updates are meaningful

ibis now supports a log function that saves all console output in a given text file.

    x <- distribution(india) %>%
      add_biodiversity_poipo(train, field_occurrence = 'observed', name = 'test') %>%
      add_predictors(covariates, transform = 'none', derivates = 'none') %>%
      add_log(filename = paste0('inlalog_', 'test', '.txt')) %>%
      engine_inla(optional_mesh = mesh)

However, in tests for trained models this sometimes resulted in written output and sometimes not. Furthermore, it needs to be ensured throughout the package that progress messages are printed at every processing step and that they can be turned off as well (via a central verbose = FALSE parameter in train).

INLA - Allow spatial dependency to be dataset specific

Currently add_latent_spatial() for INLA only adds a single spatial latent effect as an SPDE to the model, which is then shared in case multiple likelihoods are specified. This assumes that across datasets/likelihoods there is a single source of spatial dependency/error shared by all datasets.
Another option would be to have individual new latent effects that are copied but specific to each dataset. This could be added as a parameter to add_latent_spatial() (maybe separate_spde = FALSE?)
So

f(spatialfield1, model = spde, hyper = priors)

and

f(spatialfield2, copy = 'spatialfield1', fixed = FALSE, hyper = priors)

Implement locally minimizing factors for predictions

A recent preprint suggested the use of min-linear logistic regressions for SDMs, which can help, for example, to identify those variables most important for a given pixel or area.

While I don't think we need to add their variation of a logistic regression as an engine, I got another idea for a cool feature we could add. Specifically, one could add a function that

  1. Calculates the spatial partial coefficient (via spartial) for each variable.
  2. Identifies for each grid cell the maximum of the partial contributions.
  3. Assigns a factorized representation of the variables to the resulting grid.
  4. A plot parameter could also be supplied.

This would enable us to visually identify which variables are the main driving factor in each grid cell.
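The per-cell maximum step could build on terra (a sketch with synthetic data, assuming the spartial() outputs have been collected into one multi-layer SpatRaster; layer names here are made up):

```r
library(terra)

# Stack of spatial partial contributions, one layer per variable
# (in practice obtained via the model's $spartial() for each variable).
parts <- rast(nrows = 10, ncols = 10, nlyrs = 3)
values(parts) <- matrix(runif(300), ncol = 3)
names(parts) <- c("bio01", "bio12", "elevation")

# For each grid cell, which layer contributes most?
dominant <- which.max(parts)

# Turn the layer index into a factor map of variable names for plotting
levels(dominant) <- data.frame(ID = 1:3, variable = names(parts))
plot(dominant)
```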

Improvements thin_observations

I already fixed quite a few bugs in thin_observations and streamlined the code which should make moving forward easier (14b2e52).

Further improvements:

  1. The use of minpoint is still confusing in terms of language, I think. To me, it implies there is a chance that more points could also remain in a cell/zone. However, this is not the case: if a cell/zone is sampled, that is the fixed number of points that remains in the cell.
  2. The upper limit of totake depends on the minimum point count per cell. However, in the environmental and zone methods the sampling is done on a much larger scale. Thus, in this case it would be nice to use the minimum count per zone instead?
  3. The weighting by bias value doesn't really make sense because the sampling is grouped by cell, i.e., all weights are exactly the same anyhow.
  4. I started to implement a spatial option, but similarly to the bias case all intensity weights will actually be the same because of the grouping by cell.

Prevent functions from altering non-assigned class objects

For some (to me) unexplained reason, distribution objects seem to store data even if the object itself is not stored. Although negligible for now, this is a bug that we need to fix.

To reproduce:

library(ibis)
library(sf)
library(raster)

background <- raster::raster(system.file('extdata/europegrid_50km.tif', package='ibis'))
# Get test species
virtual_points <- sf::st_read(system.file('extdata/input_data.gpkg', package='ibis'),'points',quiet = TRUE)
ll <- list.files('inst/extdata/predictors/',full.names = T)
predictors <- raster::stack(ll);names(predictors) <- tools::file_path_sans_ext(basename(ll))

# Store new object in x with poipo data only
x <- distribution(background) %>%
  add_biodiversity_poipo(virtual_points, field_occurrence = 'Observed', name = 'Virtual points') 

Now simply call the function to add predictors, but do not store the result anywhere.
For some reason the predictors are nevertheless stored in x. I think this is undesirable...

x %>% add_predictors(predictors, transform = 'none',derivates = 'none') 
print(x)
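This is likely down to the reference semantics of environments, on which proto-style classes are built; a minimal base-R illustration of the mechanism (demo names are mine, not package code):

```r
# Environments (and classes built on them, such as proto/R6 objects)
# are modified in place, not copied on assignment:
obj <- new.env()
obj$predictors <- NULL

add_predictors_demo <- function(x) {
  x$predictors <- "covariates"   # mutates the caller's object directly
  x
}

add_predictors_demo(obj)  # result not assigned anywhere...
obj$predictors            # ...yet obj now contains "covariates"
```

Avoiding this would require the class methods to explicitly clone the object before modifying it.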

Error in match.arg(option, several.ok = FALSE)

Hello,

While running the vignette example I encountered the following error message:

Error in match.arg(option, several.ok = FALSE) : 
  'arg' should be one of “inla.call”, “inla.arg”, “fmesher.call”, “fmesher.arg”, “num.threads”, “smtp”, “safe”, “pardiso.license”, “keep”, “verbose”, “save.memory”, “working.directory”, “silent”, “debug”, “show.warning.graph.file”, “scale.model.default”, “short.summary”, “inla.timeout”, “fmesher.timeout”, “inla.mode”, “fmesher.evolution”

Here's my sessionInfo()

R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)

Matrix products: default


locale:
[1] LC_COLLATE=Portuguese_Brazil.utf8  LC_CTYPE=Portuguese_Brazil.utf8   
[3] LC_MONETARY=Portuguese_Brazil.utf8 LC_NUMERIC=C                      
[5] LC_TIME=Portuguese_Brazil.utf8    

time zone: America/Sao_Paulo
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] assertthat_0.2.1 uuid_1.1-0       terra_1.7-29     xgboost_1.7.5.1  inlabru_2.8.0   
[6] sp_2.0-0         ibis.iSDM_0.0.5 

loaded via a namespace (and not attached):
 [1] tensorA_0.36.2       utf8_1.2.3           generics_0.1.3       class_7.3-22        
 [5] KernSmooth_2.23-22   lattice_0.21-8       magrittr_2.0.3       grid_4.3.0          
 [9] iterators_1.0.14     foreach_1.5.2        jsonlite_1.8.7       Matrix_1.6-0        
[13] e1071_1.7-13         backports_1.4.1      DBI_1.1.3            fansi_1.0.4         
[17] scales_1.2.1.9000    codetools_0.2-19     abind_1.4-5          cli_3.6.1           
[21] rlang_1.1.1          units_0.8-2          INLA_23.06.29        splines_4.3.0       
[25] munsell_0.5.0        withr_2.5.0          tools_4.3.0          proto_1.0.0         
[29] parallel_4.3.0       checkmate_2.2.0      dplyr_1.1.2          colorspace_2.1-0    
[33] ggplot2_3.4.2        vctrs_0.6.3          posterior_1.4.1      R6_2.5.1            
[37] proxy_0.4-27         lifecycle_1.0.3      classInt_0.4-9       pkgconfig_2.0.3     
[41] pillar_1.9.0         gtable_0.3.3         glue_1.6.2           data.table_1.14.8   
[45] Rcpp_1.0.11          sf_1.0-14            tibble_3.2.1         tidyselect_1.2.0    
[49] rstudioapi_0.15.0    farver_2.1.1         compiler_4.3.0       distributional_0.3.2

Any idea what could be happening?

Thanks!

Support temporally specific predictions in ibis.iSDM by temporally linking occurrences and covariates

A key assumption of SDMs is that the species is in equilibrium with its environmental niche. This assumption can however be false if species occurrences are seasonally dependent or vary over time (e.g. shifting their distribution).
A simple way of accounting for this, to a limited extent, is to make all predictions in the package temporally specific by having the occurrences $s$ in a location $i$ interact with the environmental covariates $x$. This configuration will also have consequences for any temporal projections via project(...). Ideally, this integration also takes account of any lagged effects (e.g. $\beta_{t-1}$) so that impacts of temporal biases (e.g. a species not being recorded in a certain year) can be mediated to some extent. Depending on the unit of time (month, season, year, decade), the specific implementation might need to be altered.

Note: I propose to not work on this prior to the switch to the new terra framework, e.g. #17

Implementing this in the ibis.iSDM package would require several steps including some rewrites of the model formulations:

  • Propose to add this as a parameter to each add_biodiversity_* function call, e.g. temporal = TRUE, temporal_column = "year" and temporal_lag = FALSE
  • Some of the same parameters as for the biodiversity datasets have to be added to the covariates, specifically temporal = TRUE. Further the add_predictors() function has to support stars objects (default framework) as input here.
  • Implement error checks, unit tests and additional filters to make sure occurrences and covariates temporally align.
  • During model setup and data preparation we need a specific function that takes (if not available) the nearest environmental covariate as an estimate for a point. This option could be set or unset in the train() call or the ibis_options().
  • Update model formula for each prediction step and change outputs generated by each.
  • Make sure projections work with the updated formula.
  • Update each threshold, plot, summary and validation call to, in case of doubt, always output the first layer / time slot. Setting the temporal parameter should not affect the regular ibis functions.
  • Implement testthat calls
  • Check also that #28 works
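Under this proposal, a call might then look like the following (all temporal parameters shown are proposed, not yet implemented, and the data objects are placeholders):

```r
# Hypothetical interface sketch for temporally explicit models
x <- distribution(background) |>
  add_predictors(env = covariates_stars, transform = "none",
                 derivates = "none", temporal = TRUE) |>
  add_biodiversity_poipo(occ_points, field_occurrence = "observed",
                         temporal = TRUE, temporal_column = "year",
                         temporal_lag = FALSE) |>
  engine_inla()
fit <- train(x)
```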

[Scenarios] Implement a `summarize` function for showing directionality and displacement shifts of range centroids

The idea is to be able to summarize directionality and displacement shifts of the geographic range centroid, for instance by delineating a standard deviational ellipse (SDE) (Furfey et al.). So, to calculate the direction as a bearing relative to true north (0°) and the linear distance, respectively, between the centroids of the reference and future ranges.
Functions to calculate this are available via calc_sde in the 'aspace' R package (Bui et al. 2012).
This can also calculate latitudinal shifts of northern/southern margins as well as gains/losses of area relative to the reference range
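The bearing and displacement between two centroids could be computed along these lines (a base-R sketch on projected coordinates with made-up values; calc_sde would supply the ellipse itself):

```r
# Centroids of reference and future ranges (projected coordinates, metres)
ref <- c(x = 4500000, y = 2700000)
fut <- c(x = 4520000, y = 2760000)

# Displacement (linear distance between centroids)
displacement <- sqrt(sum((fut - ref)^2))

# Direction as a bearing relative to true north (0 deg, clockwise)
bearing <- (atan2(fut["x"] - ref["x"], fut["y"] - ref["y"]) * 180 / pi) %% 360

displacement  # ~63,246 m
bearing       # ~18.4 deg, i.e. towards the north-north-east
```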

Bug in thin_observations

There is a bug in thin_observations, specifically in the bias method, if no observation is located above the percentile of the bias layer. In this case, the vector for sub-setting the data.frame is empty.

[Enhancement] Support a validation comparison wrapper to compare different predictions

Currently validate() can be run on individual rasters and fitted model objects, but there is no way to compare two different spatial predictions in terms of their metrics. The idea is thus to add a handy function called compare() that acts as a wrapper on validate() outputs (or directly on model objects?).
Steps for implementation:

  • Ensure that each validate() output has an attribute that identifies the tibble as such.

  • Implement an S3 method for compare() on >= 2 of these outputs. The compare function then calculates the differential between the best and the next models and sorts the output (with a default parameter for sorting provided).

  • When called on a list the function should also work.

  • (Optional): Evaluate whether it is possible to run compare() directly on DistributionModels instead of validation outputs. This requires that each model is of the same family and/or can return some sort of parsimony criterion (AIC, BIC, WAIC, LOO, etc...). This likely requires considerably more work...
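The wrapper logic could be sketched as follows (compare_validations and its arguments are hypothetical, and I am assuming each validate() output is a data.frame with 'metric' and 'value' columns):

```r
# Hypothetical sketch of compare() on two or more validate() outputs
compare_validations <- function(..., sort_by = "auc") {
  outs <- list(...)
  stopifnot(all(vapply(outs, is.data.frame, logical(1))))
  # Pull out the chosen metric from each validation output
  scores <- vapply(outs, function(v) v$value[v$metric == sort_by], numeric(1))
  res <- data.frame(model = seq_along(outs), score = scores)
  res <- res[order(-res$score), ]
  # Differential between the best model and each of the others
  res$diff_to_best <- res$score[1] - res$score
  res
}

v1 <- data.frame(metric = "auc", value = 0.81)
v2 <- data.frame(metric = "auc", value = 0.74)
compare_validations(v1, v2)
```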

Beef up existing vignettes so that they cover recent additions to the package

There are a number of recently added functions for which it would be good to have examples in the vignettes. Similarly, I think there is also a need to make a vignette highlighting all the helper functions available in the package. Long-term this can only help internal/external folks in using the package.

  • Update existing vignettes, explaining the available parameters better and fixing typos.
  • Add an additional vignette on data preparation functions in the package, highlighting both terra and stars options.
  • Add a vignette on options for mechanistic SDMs, pending their addition.

DESCRIPTION

Two questions about the DESCRIPTION

  1. Why is Rcpp imported?
  2. Why is the Collate field needed? I have never seen that before

get_ngbvalue is rather slow for fine meshes / grids

The get_ngbvalue function is rather slow, particularly for large grids. Rather than extracting the nearest neighbour, it might be possible to simply call raster::extract(..., fun = mean), particularly if the mesh (when using INLA) is really fine. This would massively improve speed, but might not work as a simple rule for all cases and meshes.

Maybe there are some rulesets to decide whether this can be improved

Support for compositing of threshold values via `ensemble`

Currently threshold() creates the exact values (x$get_thresholdvalue()) for the provided suitability layer / threshold. But this information is never passed on to any eventual ensemble() calls. This can be suboptimal if a threshold is already created by maximizing, for example, a TSS score with independent data, but is then validated again using the same dataset.

Ideally ensemble() should detect and reuse created threshold values if found in a model (or as an attribute in a SpatRaster file). This could be supported directly within ensemble() by using the various threshold values and averaging them as well (so, for example, a weighted mean of a series of thresholds from the provided models).

Todo:

  • Add a respective parameter to ensemble() to reuse threshold values if found (default: TRUE) and apply the same method used for the ensemble calculations.
  • Ensure that there is no logical fallacy when using the same parameter in ensemble forecasts (e.g. applied to scenario objects).
  • Add a small unit test to ensure this is tested and works
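The compositing itself could be as simple as applying the ensemble's aggregation method to the stored threshold values (a sketch with made-up numbers; extracting them via $get_thresholdvalue() is the accessor mentioned above):

```r
# Threshold values extracted from each model, e.g. via x$get_thresholdvalue()
thresholds <- c(0.42, 0.36, 0.51)
weights    <- c(0.5, 0.3, 0.2)   # e.g. the weights used in the ensemble

# Composite threshold matching a weighted-mean ensemble
composite <- weighted.mean(thresholds, weights)
composite  # 0.42
```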

Helper function to summarize patch and landscape metrics on predictions

Capturing the idea spun around during the BEC meeting today.
Ideally we have some helper function that loads the namespace of landscapemetrics and calculates patch and landscape metrics on outputs of train, e.g. BiodiversityDistribution objects.

  • Add an inherent function to DistributionModel class objects in bdproto-distributionmodel.R, named (for example) summary_patch or something else easy to interpret.
  • The function should calculate a range of "standard" metrics such as average patch size, effective mesh index, average distance between patches, etc.
  • Depending on whether a threshold (binary) has been added (check via self$get_thresholdvalue in the object), calculate either continuous (entropy) or discrete metrics.
  • (optional) Add a separate generic function to address this class function, for example summarize_patches or summarize_landscape?
  • Add unit tests

Relevant reference here:
Lucas, P. M., González‐Suárez, M., & Revilla, E. (2019). Range area matters, and so does spatial configuration: predicting conservation status in vertebrates. Ecography, 42(6), 1103-1114. https://onlinelibrary.wiley.com/doi/abs/10.1111/ecog.03865
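Such a helper could lean directly on landscapemetrics (a sketch on a synthetic binary prediction; summary_patch is the hypothetical name proposed above, and the chosen metrics are just examples):

```r
library(terra)
library(landscapemetrics)

# A binary (thresholded) prediction as input (synthetic here)
pred <- rast(nrows = 50, ncols = 50, vals = rbinom(2500, 1, 0.3))

# Hypothetical helper: a set of "standard" patch/landscape metrics
summary_patch <- function(r) {
  calculate_lsm(r, what = c("lsm_l_np",      # number of patches
                            "lsm_p_area",    # patch area
                            "lsm_l_mesh",    # effective mesh size
                            "lsm_l_enn_mn")) # mean nearest-neighbour distance
}
summary_patch(pred)
```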

[Conceptual] Allow data integration options for different datasets

Currently different datasets are fitted in sequence of each other. Ideally we have more options here, specifically:

  • Allow more than 2 datasets for each algorithm.
  • Allow changing the order of the algorithms used for integration.
  • Support two different forms, that is (a) running in sequence and (b) consensus mapping via ensemble. For the latter, support the use of an external (withheld) dataset for validation for each data source.

Convenience function to support a PCA based Niche overview as graphical plot

The idea (by Piero) is as follows:

  • Take the environmental covariates from a BiodiversityDistribution object and do a PCA on them.
  • Plot the first and second PC axes in environmental space as points (sample ~10,000 of them).
  • Then overlay in a different colour all datasets, such as points or ranges, to assess where the data falls.

This would allow a visual assessment of the extent to which the data falls within the whole environmental space. It should not replace the existing functionality of partial_density() but instead look at all covariates.


  • Implement the function, either within the class definition or as external helper function (in plot.r)
  • Make sure it works and add a unit test
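A minimal sketch of the idea with base R and terra (synthetic covariates; the commented overlay step assumes hypothetical occurrence points occ_pts and is not runnable as-is):

```r
library(terra)

# Environmental covariates (synthetic here; in practice taken from
# the BiodiversityDistribution object)
covs <- rast(nrows = 100, ncols = 100, nlyrs = 3)
values(covs) <- matrix(rnorm(30000), ncol = 3)
names(covs) <- c("bio01", "bio12", "elevation")

# PCA on a background sample of the environmental space
bg  <- spatSample(covs, size = 10000, method = "regular")
pca <- prcomp(bg, scale. = TRUE)
plot(pca$x[, 1:2], pch = ".", col = "grey50",
     xlab = "PC1", ylab = "PC2", main = "Environmental niche space")

# Overlay the conditions at occurrence points (occ_pts: hypothetical sf points)
# occ_env <- terra::extract(covs, occ_pts, ID = FALSE)
# points(predict(pca, occ_env)[, 1:2], col = "red", pch = 19)
```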

Ensure that all unit tests resolve correctly

I started adding some unit tests to the package (folder tests/testthat). Some of them were created a while ago and no longer resolve without error, which needs to be fixed.
A reference for the testthat functions can be found here, but also see the existing unit tests.

  • If bugs are encountered while using the package, fix them and add a test!
  • If a new feature or function is added, add a test!

Trouble running models with engine_bart - additional documentation available?

I am trying to run some integrated SDMs with the BART engine to compare them to models I ran with the embarcadero package using just one type of data, but I cannot get the model to run with my data or with the test data in the package. Perhaps I am missing some input parameters required for running, but either I can't find the relevant documentation or it doesn't exist yet.

Here is the example code that I tried to run with the data included in the package:

# Background layer
background <- terra::rast(system.file("extdata/europegrid_50km.tif", package = "ibis.iSDM", mustWork = TRUE))

# Load virtual species points
virtual_species <- sf::st_read(system.file("extdata/input_data.gpkg",package = "ibis.iSDM", mustWork = TRUE), "points")

# Predictors
predictors <- terra::rast(list.files(system.file("extdata/predictors/",  package = "ibis.iSDM",  mustWork = TRUE), "*.tif", full.names = TRUE))

# Make use only of a few of them
predictors <- subset(predictors, c("bio01_mean_50km","bio03_mean_50km","bio19_mean_50km",
                                   "CLC3_112_mean_50km","CLC3_132_mean_50km",
                                   "CLC3_211_mean_50km","CLC3_312_mean_50km",
                                   "elevation_mean_50km"))

mod.bart <- distribution(background) |>  
  add_biodiversity_poipo(virtual_species, field_occurrence = "Observed") |>
  add_predictors(env = predictors, transform = "scale", derivates = "none") |> 
  # A presence only dataset
  engine_bart() |> 
  # Train
  train(runname            = "Combined prediction", 
        only_linear        = FALSE)

When I try to run the model, I get the following error message:

Error in { : 
  task 1 failed - "unused arguments (newdata = list(c(-1.93168078800026, -1.92505439169727, -1.73861676829675, -1.70758217144704, -1.91975684387195, -1.99588761789738, -2.01016499496625, -2.02379005065133, -2.0738679664727, -2.09042068029056, -2.02469272600033, -1.9532551984751, -1.92903428398442, -1.7727665735043, -1.70782203692347, -1.65397925751084, -1.60795537344207, -1.63335829909637, -1.60448384112033, -1.55650025550981, -1.50606712040359, -1.40233581968582, -1.333916922279, -1.43675011679354, -1.54260525821538, -1.67125244832142, 
-1.79359415857199, -1.8530093145463, -2.04810651605745, -2.14524600704668, -2.15189296939762, -2.15647784789255, -2.22220024484918, -2.25294869772256, -2.14733133777705, -2.15867029427131, -2.19162297350525, -2.13085110411295, -1.96465055016459, -1.83387463222748, -1.67105924292237, -1.58961747535737, -1.61047172299883, -1.61022006790944, -1.61688190719595, -1.51371195679359, -1.42725549060479, -1.33465129260574, -1.16207506811305, -1.2490927369

I feel like I could easily be missing some necessary options, so maybe this is an easy fix.

My current session info as well:

R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] INLA_23.09.09     sp_1.6-0          matrixStats_1.0.0 dbarts_0.9-23     assertthat_0.2.1 
 [6] igraph_1.4.2      terra_1.7-29      xgboost_1.7.5.1   glmnet_4.1-7      Matrix_1.5-4     
[11] inlabru_2.9.0     fmesher_0.1.2     sf_1.0-12         lubridate_1.9.2   forcats_1.0.0    
[16] stringr_1.5.0     dplyr_1.1.2       purrr_1.0.1       readr_2.1.4       tidyr_1.3.0      
[21] tibble_3.2.1      ggplot2_3.4.3     tidyverse_2.0.0   ibis.iSDM_0.0.8  

loaded via a namespace (and not attached):
 [1] DBI_1.1.3              remotes_2.4.2          readxl_1.4.2           rlang_1.1.0           
 [5] magrittr_2.0.3         e1071_1.7-13           compiler_4.3.0         vctrs_0.6.2           
 [9] rgbif_3.7.7            httpcode_0.3.0         pkgconfig_2.0.3        shape_1.4.6           
[13] crayon_1.5.2           taxize_0.9.100         fastmap_1.1.1          lwgeom_0.2-13         
[17] utf8_1.2.3             rmarkdown_2.21         tzdb_0.3.0             bit_4.0.5             
[21] xfun_0.39              embarcadero_1.2.0.1003 jsonlite_1.8.4         reshape_0.8.9         
[25] uuid_1.1-1             parallel_4.3.0         R6_2.5.1               stringi_1.7.12        
[29] cellranger_1.1.0       stars_0.6-3            Rcpp_1.0.10            iterators_1.0.14      
[33] knitr_1.42             zoo_1.8-12             Metrics_0.1.4          splines_4.3.0         
[37] timechange_0.2.0       tidyselect_1.2.0       rstudioapi_0.14        abind_1.4-5           
[41] yaml_2.3.7             codetools_0.2-19       curl_5.0.0             lattice_0.21-8        
[45] plyr_1.8.8             withr_2.5.0            evaluate_0.20          survival_3.5-5        
[49] units_0.8-1            proxy_0.4-27           xml2_1.3.3             pillar_1.9.0          
[53] whisker_0.4.1          KernSmooth_2.23-20     foreach_1.5.2          generics_0.1.3        
[57] vroom_1.6.1            hms_1.1.3              munsell_0.5.0          scales_1.2.1          
[61] rgdal_1.6-6            class_7.3-21           glue_1.6.2             lazyeval_0.2.2        
[65] tools_4.3.0            data.table_1.14.8      grid_4.3.0             bold_1.2.0            
[69] ape_5.7-1              colorspace_2.1-0       nlme_3.1-162           raster_3.6-20         
[73] conditionz_0.1.0       proto_1.0.0            cli_3.6.1              fansi_1.0.4           
[77] gtable_0.3.4           oai_0.4.0              digest_0.6.31          classInt_0.4-9        
[81] crul_1.3               htmltools_0.5.5        lifecycle_1.0.3        dismo_1.3-9           
[85] httr_1.4.5             bit64_4.0.5      

Allow `validate` to work with scenario outputs

Currently the validate() function only works with fitted models and independently supplied raster layers.
Ideally it would also be possible to validate a scenario object.
In this case validation points per time step (set as another column) have to be provided as input.
This functionality could be quite helpful in cases where scenarios are back-cast (e.g. using data from years 2000 to 2005 to project and validate for years 2005-2020).

Outputs should be overall averages as well as averages per time step.

Add functionality to provide predictors in terms of point observations rather than gridded input.

To derive coefficients it can be valuable to match occurrence information to point rather than gridded inputs.
In this case point data is matched through nearest-neighbour matching to the nearest available point (already implemented via get_ngbvalue()).
This is for instance the case when we aim to derive coefficients for scenarios rather than making current predictions. By default this mode is thus for inference only.
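The nearest-neighbour matching underlying this can be illustrated with sf (a standalone sketch with toy data, not the internal get_ngbvalue() code):

```r
library(sf)

# Occurrence points and point-based predictor observations (toy data)
occ  <- st_as_sf(data.frame(x = c(1, 4), y = c(1, 4)),
                 coords = c("x", "y"))
pred <- st_as_sf(data.frame(x = c(0, 5), y = c(0, 5), temp = c(10, 20)),
                 coords = c("x", "y"))

# For each occurrence, take the value of the nearest predictor point
idx <- st_nearest_feature(occ, pred)
occ$temp <- pred$temp[idx]
occ$temp  # 10 20
```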

Change `raster` functions to `terra`

There is an indication that terra is generally faster than raster for many functions. Thus we could change all the functionality to terra. It is unlikely we can remove raster as a dependency, since many other packages continue to use it.
See the package website.

Currently having terra and raster loaded together still causes problems (for instance for as.data.frame(rasterObj)). Thus wait until terra has matured more

Note:
Also see how the use of the stars package develops

Implement wrappers for existing external mechanistic 'SDMs'

There are a number of more or less 'mechanistic' SDMs out there. Many of these packages still require or are based on initialization layers which are usually derived from correlative SDMs. Generally, it should be noted that the definition of what counts as 'mechanistic' is somewhat fluid, but in the context of the ibis.iSDM package we refer primarily to those functionalities that allow adding features not usually covered in correlative SDMs, e.g. demography, eco-evolutionary processes or population dynamics.

  • Add a wrapper for the poems R-package
  • Add a wrapper for the steps R-package.
  • Add a wrapper for the RangeShiftR R-package so that ibis.iSDM outputs can be passed on seamlessly
  • Add some unit tests
  • Write a vignette explaining the available options in the package (links to #67 )

Integrating several datasets

In the following example, predictions from the first biodiversity dataset are used as an additional predictor for the second dataset. However, the extracted predicted values are also added to the first dataset. Is there a particular reason for this? As far as I understand it, it should be enough to only add the values to the next biodiversity dataset in the sequence?

```r
distribution(background) |>
  add_predictors(env = predictors, transform = "scale", derivates = "none") |>
  add_biodiversity_poipo(poipo = virtual_species, field_occurrence = "Observed",
                         formula = formula("observed ~ a + b")) |>
  add_biodiversity_poipo(poipo = virtual_species, field_occurrence = "Observed",
                         formula = formula("observed ~ c + d")) |>
  train(method_integration = "predictor")
```

Implement a (Deep) Neural Network as an engine in ibis

Among the different families of engines and algorithms in the package, one that we particularly miss at the moment is conventional and/or deep neural networks. A range of packages is available in R (usually calling Python) that can handle these types of models (e.g. keras or tensorflow).

  • Add engine_keras as new engine
  • Ensure that all possible data types and options (add_offset(), add_priors(), etc) are supported
  • Add some documentation to the package.
  • Add unit tests.

Allow offsets and latent effects to be dataset type specific

Currently both the add_range_offset() function and the add_latent_spatial() function add the offset, respectively the shared SPDE, to all equations regardless of dataset type.
Ideally there would be a dataset = ... and/or group = parameter (or similar) to allow different offsets per dataset, and furthermore the option of shared or independent spatial latent effects.
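A hypothetical interface could look like the following; the group = and shared = parameters shown here do not exist yet, and po_data/pa_data/species_range are placeholder objects.

```r
# Hypothetical interface; 'group =' and 'shared =' do not exist yet.
distribution(background) |>
  add_biodiversity_poipo(po_data, field_occurrence = "observed") |>
  add_biodiversity_poipa(pa_data, field_occurrence = "observed") |>
  add_range_offset(species_range, group = 1) |>  # offset for dataset 1 only
  add_latent_spatial(shared = FALSE)             # independent spatial effects
```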

[Visualization] Implement a simple function to visualize the marginal response over the environmental space

Currently partial() only allows the visualization of the direct response function on the linear or link scale. However, a much more common question is to assess where, across the full range of one or two covariates, the observed data fall in terms of occurrence density.
The proposition is thus to expand the partial scripts with an additional function that allows one to quickly visualize this coverage in light of the available and provided data.

  • Add a new function partial_density(). The output should always be a ggplot object.
  • Ensure that different data types work
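A sketch of the proposed signature; partial_density() does not exist yet and all parameter and covariate names below are placeholders.

```r
# Proposed signature sketch; partial_density() does not exist yet.
# 'mod' is a fitted model; one or two covariates can be supplied.
p <- partial_density(mod, x.var = c("bio01", "bio12"))
# The result should always be a ggplot object showing where the observed
# data fall in terms of occurrence density across the covariate range.
```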

Helper function to compare two or more predictions in terms of value and configuration overlap

A common issue with SDM projections is not only their validation, but also the comparison between predictions. In other words: how similar are two or more SDM projections in terms of values, overlap and even spatial configuration?
We could think of adding a helper function to the package that allows one to assess this.

  • Implement a wrapper method (compare() ?) that compares two or more predictions in terms of their overlap
  • Basic functionality could be a simple value comparison for continuous data (e.g. Pearson's r, Bray-Curtis dissimilarity) or categorical data (e.g. Sørensen similarity index, Schoener's D; see Broennimann et al. 2012).
  • Or similarity in terms of spatial configuration, such as enabled via the motif package?
  • Also support validation by providing a common metric, calling validate() individually, and then ranking the results.
  • Add some unit tests and test data.
  • Small vignette entry
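A usage sketch of such a helper; compare() does not exist yet and the method names are only those floated in the bullets above.

```r
# Proposed sketch only; compare() does not exist yet.
# p1, p2 are SpatRaster predictions on the same grid.
compare(p1, p2, method = "pearson")                 # continuous values
compare(p1 >= 0.5, p2 >= 0.5, method = "sorensen")  # categorical overlap
```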

Implement LOCV for INLA for single and multiple datasets

Prediction in INLA works by directly providing the unknown ŷ to the model. This behaviour can also be used in cross-validation to assess the precision of the model. The benefit is that we can validate the model while also providing all of the data directly to it.
Pseudo-Code:

  • Take the provided dataset
  • Store a proportion p (5%?) of the dataset elsewhere
  • Set p to NA
  • Predict y with the model
  • Calculate (R)MSE

Besides single data points there should also be options to do this for whole datasets, e.g. remove all data (points, polygons) of dataset X and then predict it with the remaining data.

These options could be implemented either as a parameter or, ideally, separately as a whole function applied to a DistributionModel object, for instance via a new function called validate() or similar.
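The pseudo-code above can be illustrated generically in base R with lm() as a stand-in model; in INLA the held-out responses would instead be set to NA and predicted jointly with the rest of the data.

```r
# Generic illustration of the hold-out logic; lm() is a stand-in.
set.seed(42)
d <- data.frame(x = runif(200))
d$y <- 2 * d$x + rnorm(200, sd = 0.1)

p <- 0.05                                   # proportion held out (5%)
hold <- sample(nrow(d), size = ceiling(p * nrow(d)))
truth <- d$y[hold]                          # store held-out values elsewhere
d$y[hold] <- NA                             # "set p to NA"

fit <- lm(y ~ x, data = d)                  # NA responses are dropped
pred <- predict(fit, newdata = d[hold, ])   # predict the held-out points
rmse <- sqrt(mean((pred - truth)^2))        # calculate (R)MSE
```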

Preparation for CRAN Release

First release:

  • usethis::use_cran_comments()
  • Update (aspirational) install instructions in README
  • Proofread Title: and Description:
  • Check that all exported functions have @return and @examples
  • Check that Authors@R: includes a copyright holder (role 'cph')
  • Check licensing of included files
  • usethis::use_revdep()
  • Reverse dependency checks revdepcheck::revdep_check(num_workers = 4)
  • Review https://github.com/DavisVaughan/extrachecks

Prepare for release:

  • git pull
  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • git push

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • git push
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • git push

Change parallel processing to `doFuture` throughout

The future package and its accompanying doFuture functionality seem to be much richer and more capable than doParallel.

  • Check where parallel processing is being used and where it could be
  • Change all parallel processing throughout the package and remove doParallel dependency
  • Make options easier to use so that the handling of cores and threads is more streamlined
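As a minimal sketch of the switch: registerDoFuture() lets existing foreach %dopar% code run on any future backend, so most doParallel call sites should only need their setup code changed.

```r
library(foreach)
library(doFuture)

registerDoFuture()                              # route %dopar% through future
future::plan(future::multisession, workers = 2) # backend chosen in one place

res <- foreach(i = 1:4, .combine = c) %dopar% sqrt(i)

future::plan(future::sequential)                # restore sequential processing
```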

[Scenarios] Idea for an additional constraint -> minimum area size

Rationale: Smaller habitat patches do not provide the same value in terms of suitability as larger habitat patches. The idea would thus be that, between scenario steps t and t+1, it is assessed whether a particular patch is isolated and, if so, its value is reduced (continuous projections) or the patch removed entirely (binary threshold) for subsequent predictions. This would thus also affect dispersal calculations, as it affects the amount of source patches.
For continuous projections this needs some thought; possibly a neighbourhood kernel sensitive to contrasts or similar?

Proposed function name: `add_constraint_minsize()`
Parameters: `min_size = numeric()`
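A hypothetical usage sketch; neither the constraint function nor the surrounding object names exist as shown, and the value of min_size is a placeholder.

```r
# Hypothetical usage; add_constraint_minsize() does not exist yet.
sc <- scenario(fit) |>
  add_predictors(future_env) |>
  add_constraint_minsize(min_size = 10) |>  # minimum patch size to retain
  project()
```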

Add additional warning messages to engines with regards to assumptions and applicability

Many engines in the package can be particularly data hungry or make certain assumptions about the input data provided to the package. Further messages via myLog() in a highlighted colour should be added directly to the engines to make users aware of any violations of assumptions.

Update:
Additionally or alternatively, a better idea would be to add a simple wrapper that tests common model assumptions, implemented in a specific function check(fit).

  • Implement such a function (check()) as a separate call.
  • Implement engine and parameter specific posthoc checks.
  • Add a test
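A sketch of the intended workflow; check() as described does not yet exist and the behaviour in the comments is only the proposal above.

```r
# Sketch of the proposed helper; check() does not exist as described.
fit <- train(x)   # any fitted DistributionModel
check(fit)
# Intended behaviour: print engine- and parameter-specific messages
# on violated model assumptions after fitting.
```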
