funecology / fundiversity

📦 R package to compute functional diversity indices efficiently

Home Page: https://funecology.github.io/fundiversity/

License: GNU General Public License v3.0

R 65.39% TeX 34.61%
biodiversity biodiversity-indicators biodiversity-informatics functional-diversity functional-ecology functional-trait functional-traits r r-package trait trait-based traits

fundiversity's contributors: actions-user, bisaloo, lionel-, rekyt

fundiversity's Issues

Error when site contains no species in fd_fdiv() and fd_feve()

I don't know how this bug slipped through the cracks, but here it is:

fundiversity::fd_fdiv(fundiversity::traits_plants, fundiversity::site_sp_plants)
#> Error in traits[names(sub_site), , drop = FALSE]: subscript out of bounds

sum(fundiversity::site_sp_plants[10,])
#> [1] 0

fundiversity::fd_fdiv(fundiversity::traits_plants,
                      fundiversity::site_sp_plants[1:9,])
#>        site      FDiv
#> 1  elev_250 0.6341541
#> 2  elev_500 0.6543063
#> 3 elev_1000 0.7111319
#> 4 elev_1500 0.7546447
#> 5 elev_2000 0.7437969
#> 6 elev_2500 0.7620128
#> 7 elev_3000 0.6939043
#> 8 elev_3500 0.6414894
#> 9 elev_3750 0.5879492

Created on 2021-08-03 by the reprex package (v2.0.0)

This suggests that the individual fd_div() computation should start with an early check of whether all abundances are 0.
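A minimal sketch of that early check, in base R. The helper apply_per_site() and the toy index are hypothetical (the index is a stand-in for the real FDiv formula, not fundiversity's code); the point is that an empty site returns NA before any trait subsetting can fail.

```r
# Apply a per-site index function to every row of a site-species matrix,
# returning NA for empty sites instead of letting the subsetting fail.
apply_per_site <- function(site_sp, traits, index_fun) {
  values <- vapply(seq_len(nrow(site_sp)), function(i) {
    abund <- site_sp[i, ]
    # Early check: no species present at this site
    if (all(abund == 0)) {
      return(NA_real_)
    }
    present <- colnames(site_sp)[abund > 0]
    index_fun(traits[present, , drop = FALSE], abund[abund > 0])
  }, numeric(1))
  data.frame(site = rownames(site_sp), value = values, row.names = NULL)
}

# Toy data: site "s3" contains no species
traits <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2,
                 dimnames = list(c("sp1", "sp2", "sp3"), c("t1", "t2")))
site_sp <- matrix(c(1, 0, 0,
                    1, 1, 0,
                    0, 0, 0), nrow = 3, byrow = TRUE,
                  dimnames = list(c("s1", "s2", "s3"), rownames(traits)))

# Toy index (stand-in for FDiv): abundance-weighted mean of the first trait
res <- apply_per_site(site_sp, traits,
                      function(tr, ab) sum(tr[, 1] * ab) / sum(ab))
res
```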

FRic returns NA without warnings or messages

We use this example in the introduction vignette when subsetting the number of species.
However, it returns NA because there are not enough species present for FRic to be computed.

library("fundiversity")

fd_fric(traits_birds, site_sp_birds[, 1:5])
#> Differing number of species between trait dataset and site-species matrix
#> Taking subset of species
#>        site FRic
#> 1  elev_250   NA
#> 2  elev_500   NA
#> 3 elev_1000   NA
#> 4 elev_1500   NA
#> 5 elev_2000   NA
#> 6 elev_2500   NA
#> 7 elev_3000   NA
#> 8 elev_3500   NA

Created on 2022-10-21 with reprex v2.0.2

In my opinion, we should issue a warning so that users know why FRic values are NA, and we should also change the example in the introduction vignette,
for example to the following:

library("fundiversity")

fd_fric(traits_birds, site_sp_birds[, 1:60])
#> Differing number of species between trait dataset and site-species matrix
#> Taking subset of species
#>        site        FRic
#> 1  elev_250 18963.31311
#> 2  elev_500 18963.31311
#> 3 elev_1000 38586.75398
#> 4 elev_1500 38114.26828
#> 5 elev_2000  5888.93690
#> 6 elev_2500  5256.70628
#> 7 elev_3000  2710.81803
#> 8 elev_3500    88.11684

Created on 2022-10-21 with reprex v2.0.2
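A minimal sketch of the proposed warning, assuming the rule that a convex hull in d trait dimensions needs at least d + 1 species. The function name check_fric_species() is hypothetical and not part of fundiversity.

```r
# Hypothetical guard run before a FRic-like computation: with fewer than
# (number of traits + 1) species the hull volume is undefined, so warn
# explicitly instead of silently returning NA.
check_fric_species <- function(site_sp, traits) {
  n_traits <- ncol(traits)
  n_species <- rowSums(site_sp > 0)
  too_few <- n_species <= n_traits
  if (any(too_few)) {
    warning(
      "Some sites have fewer than ", n_traits + 1, " species (number of ",
      "traits + 1); FRic cannot be computed and will be NA for: ",
      paste(rownames(site_sp)[too_few], collapse = ", "),
      call. = FALSE
    )
  }
  # TRUE for sites where FRic can be computed
  invisible(!too_few)
}
```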

Add a switch in `options()` to prevent use of memoised function?

Currently, if the user has memoise installed, they automatically get the memoised versions of fd_chull() and fd_chull_intersect(), with no way to opt out.

I think it would be good to offer the option to opt out, if the user wishes to do so for whatever reason.
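One possible shape for such a switch, as a sketch only: the option name `fundiversity.memoise` is the one that appears in the package documentation quoted further down, while make_chull_fun(), chull_plain() and the default of TRUE are assumptions. The 2-D grDevices::chull() stands in for the real hull function.

```r
# Stand-in for the real hull function: indices of the 2-D convex hull vertices
chull_plain <- function(traits) grDevices::chull(traits)

# Read the option when the function is built (e.g. from .onLoad()), so the
# option has to be set before the package is loaded.
make_chull_fun <- function() {
  use_memo <- isTRUE(getOption("fundiversity.memoise", default = TRUE)) &&
    requireNamespace("memoise", quietly = TRUE)
  if (use_memo) memoise::memoise(chull_plain) else chull_plain
}
```

With `options(fundiversity.memoise = FALSE)` set beforehand, make_chull_fun() returns the plain function even when memoise is installed.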

Add support for sparse matrices

When using huge site-species matrices, it is sometimes more memory efficient to use sparse matrices.
We could implement the computation of indices on sparse matrices through the Matrix package (a recommended package shipped with R).
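A small sketch of what this could look like with Matrix: relative abundances and abundance-weighted trait centroids (a building block of FDis) computed directly on a sparse site-species matrix. The toy data are made up, and this is an illustration rather than fundiversity's code.

```r
library(Matrix)

# Toy site-species abundances, mostly zeros
site_sp <- matrix(c(2, 0, 0, 0,
                    0, 1, 0, 3,
                    0, 0, 0, 0), nrow = 3, byrow = TRUE,
                  dimnames = list(paste0("s", 1:3), paste0("sp", 1:4)))
traits <- matrix(c(1, 2, 3, 4,
                   10, 20, 30, 40), ncol = 2,
                 dimnames = list(paste0("sp", 1:4), c("t1", "t2")))

# Only the non-zero entries are stored
sparse_sp <- Matrix(site_sp, sparse = TRUE)

# Relative abundances: scale each row by 1 / total (empty sites stay at 0)
totals <- rowSums(sparse_sp)
rel_ab <- Diagonal(x = ifelse(totals > 0, 1 / totals, 0)) %*% sparse_sp

# Abundance-weighted trait centroid of each site
centroids <- as.matrix(rel_ab %*% traits)
centroids
```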

Renaming the vignettes to order them logically?

For now, the vignettes on CRAN, as listed by vignette(package = "fundiversity") or browseVignettes(package = "fundiversity"), are sorted in alphabetical order:

This may not be desirable, as it would make more sense to have the introduction vignette first, followed by the parallelization vignette and the others.

Maybe we should rename the files "fundiversity", "fundiversity-2", etc., as is done in the future package?

My preferred order would be:

  1. Introduction/Overview
  2. Parallelization
  3. Performance
  4. Correctness
  5. Design Principles

Maybe we should also rename the introduction vignette and parallelization vignette to have more explicit names.

Warn user when computing FRic with many species/many traits

geometry::convhulln(), used in fd_fric(), has limitations on data size. To avoid wasting computing time, we should probably issue a warning for large datasets (many species × many traits) saying that the computation may fail in this case.
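A sketch of such a pre-check. The function name and both thresholds are illustrative assumptions, not documented Qhull limits; sensible values would have to come from benchmarks.

```r
# Hypothetical check run before calling geometry::convhulln() in fd_fric().
# max_traits and max_species are placeholder values, not known limits.
warn_large_fric <- function(traits, max_traits = 6, max_species = 1e4) {
  if (ncol(traits) > max_traits || nrow(traits) > max_species) {
    warning(
      "Computing FRic on ", nrow(traits), " species x ", ncol(traits),
      " traits: the convex hull computation may be very slow or fail. ",
      "Consider reducing the number of traits.",
      call. = FALSE
    )
  }
  invisible(traits)
}
```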

Performance comparison

To show the value of having another package to compute functional diversity indices, we should also write a vignette comparing the performance of fundiversity and related packages.

Errors when using non-quantitative traits in fd_fric()

We should probably test the behavior of all functions with non-quantitative traits so that they warn the user instead of silently computing meaningless values.

See for example:

data("aviurba", package = "ade4")
fundiversity::fd_fric(aviurba$traits)
#> Warning in storage.mode(p) <- "double": NAs introduced by coercion
#> Error in geometry::convhulln(traits, "FA"): The first argument should not contain any NAs

Created on 2021-08-05 by the reprex package (v2.0.0)
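A minimal sketch of an up-front check that rejects non-numeric traits, so coercion never gets a chance to create NAs before geometry::convhulln() is called. The name check_numeric_traits() is hypothetical.

```r
# Hypothetical input check: stop with an informative error if any trait
# column is not numeric, instead of letting coercion create NAs downstream.
check_numeric_traits <- function(traits) {
  traits <- as.data.frame(traits)
  is_num <- vapply(traits, is.numeric, logical(1))
  if (!all(is_num)) {
    stop("Non-continuous trait data found in column(s): ",
         paste(names(traits)[!is_num], collapse = ", "),
         ". Please provide only continuous trait data.", call. = FALSE)
  }
  as.matrix(traits)
}
```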

How should we deal with missing values?

Probably discard them but should we display a warning before doing so?

When a value is missing, should we discard the whole row or just ignore this value? What are the implications on the resulting indices (beyond the very practical fact that few datasets will be complete for all traits and species)?
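The two options can be contrasted in a short sketch (hypothetical helper, toy data). Dropping whole species keeps every index computable on complete coordinates, while ignoring single cells keeps more data but only works for statistics that can skip missing values, such as per-trait means; a convex hull still needs complete coordinates.

```r
traits <- matrix(c(1, 2, NA, 4,
                   5, 6, 7, 8), ncol = 2,
                 dimnames = list(paste0("sp", 1:4), c("t1", "t2")))

# Option 1 (hypothetical helper): drop every species with at least one
# missing trait value, and warn about which ones were dropped
drop_incomplete <- function(traits) {
  complete <- stats::complete.cases(traits)
  if (!all(complete)) {
    warning("Removing ", sum(!complete), " species with missing trait ",
            "values: ", paste(rownames(traits)[!complete], collapse = ", "),
            call. = FALSE)
  }
  traits[complete, , drop = FALSE]
}

# Option 2: keep every species and ignore only the missing cells, e.g.
# per-trait means computed from all available values
trait_means <- colMeans(traits, na.rm = TRUE)
```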

Have a worked through example with individual level dataset

I'm getting a lot of questions from users confused about using fundiversity with individual-level datasets, because we keep referring to "site-species" and "species-trait" matrices.
Even though we've explained it in the paper and in the introduction vignette, it seems we need to explain it elsewhere with a well-chosen example (maybe in a dedicated vignette?).
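A sketch of the kind of example that could go in such a vignette: individuals simply take the place of species. The data frame, plot labels and trait names are made up for illustration.

```r
# Individual-level data: each row is one measured individual
ind <- data.frame(
  id     = c("ind1", "ind2", "ind3", "ind4", "ind5"),
  plot   = c("A", "A", "A", "B", "B"),
  height = c(12.1, 15.3, 9.8, 20.4, 18.7),
  sla    = c(18.2, 21.5, 16.9, 12.3, 13.8)
)

# "species x traits" matrix becomes individuals x traits
traits_ind <- as.matrix(ind[, c("height", "sla")])
rownames(traits_ind) <- ind$id

# "site x species" matrix becomes plots x individuals (1 = present in plot)
site_ind <- unclass(table(ind$plot, ind$id))

# Both objects can then be passed as usual, e.g.
# fundiversity::fd_fric(traits_ind, site_ind)
```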

Parallel computation of functional diversity indices

From the performance vignette (#8), we can see that some indices can take a while to compute with big matrices.
One interesting (but certainly expensive, development-wise) feature would be automatic parallel computation of functional diversity indices across sites, by splitting the site-species matrix into chunks.
The implementation could use the future package.
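A sketch of the chunking idea, assuming the index is computed independently for each site. The helper compute_in_chunks() is hypothetical; it defaults to base lapply(), and future.apply::future_lapply() (after choosing a plan with future::plan()) could be passed in its place to run the chunks in parallel.

```r
# Split the site-species matrix into row chunks, compute the index on each
# chunk independently, then bind the per-chunk results back together.
compute_in_chunks <- function(site_sp, index_fun, n_chunks = 2,
                              lapply_fun = lapply) {
  chunk_id <- cut(seq_len(nrow(site_sp)), breaks = n_chunks, labels = FALSE)
  chunks <- lapply(split(seq_len(nrow(site_sp)), chunk_id),
                   function(rows) site_sp[rows, , drop = FALSE])
  results <- lapply_fun(chunks, index_fun)
  do.call(rbind, unname(results))
}

# Toy index: species richness per site
richness <- function(m) {
  data.frame(site = rownames(m), richness = rowSums(m > 0), row.names = NULL)
}
```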

Add CITATION file

Related to #27
Even before envisioning a manuscript, we should add a CITATION file that mentions the version of the package used, so that people can start citing it before any scientific publication.

New index: intersection between convex hulls

As ecologists may be interested in the intersection of different convex hulls, fundiversity could provide a wrapper around geometry::intersectn() that outputs the volume of the intersection between two convex hulls.

The input of this wrapper would be similar to that of fd_fric() (site-species matrix, trait matrix), and the output would be a distance matrix or, rather, a tidy data.frame whose first two columns give the ids of the two sites considered and whose third column gives the volume of their intersection.

Computationally, this could be quite intensive, but it should parallelize without issue (see #13).
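A sketch of the proposed tidy output, one row per pair of sites. The wrapper name and its intersect_volume argument are hypothetical: in the real feature that argument would wrap geometry::intersectn(), while here a toy bounding-box overlap (explicitly not a convex hull intersection) is used so the example runs without geometry.

```r
# One row per pair of sites; `intersect_volume` computes the volume shared
# by two trait point sets (hypothetical, e.g. a geometry::intersectn() wrapper).
pairwise_intersections <- function(site_sp, traits, intersect_volume) {
  pairs <- utils::combn(rownames(site_sp), 2)
  vols <- apply(pairs, 2, function(p) {
    pts1 <- traits[colnames(site_sp)[site_sp[p[1], ] > 0], , drop = FALSE]
    pts2 <- traits[colnames(site_sp)[site_sp[p[2], ] > 0], , drop = FALSE]
    intersect_volume(pts1, pts2)
  })
  data.frame(first_site = pairs[1, ], second_site = pairs[2, ],
             intersection = vols)
}

# Toy stand-in: overlap of axis-aligned bounding boxes (NOT convex hulls)
box_overlap <- function(a, b) {
  lo <- pmax(apply(a, 2, min), apply(b, 2, min))
  hi <- pmin(apply(a, 2, max), apply(b, 2, max))
  prod(pmax(0, hi - lo))
}
```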

fd_dis gives error if site x species matrix does not have row names

I ran into an issue using fd_dis() when my site x species matrix doesn't have row names. For my purposes, I cannot force my site x species matrices to have row names as they are contained within a simulation matrix (class simmat). I expect this could be an issue for future users as well.

An easy fix would be either to force the "site" column of the data frame returned by fd_dis() to be the row indices:

data.frame(site = seq_len(nrow(sp_com)), FDis = fdis_site, row.names = NULL)

or to use an if statement:

if (is.null(rownames(sp_com))) {
  rownames(sp_com) <- seq_len(nrow(sp_com))
}
data.frame(site = rownames(sp_com), FDis = fdis_site, row.names = NULL)

Site name when no site-species matrix is provided

When no site-species matrix is provided, the site is called "s1"; should it be called something else?
Also, the row name is then "s1"; this should probably be set to NULL to avoid confusion:

fundiversity::fd_fric(fundiversity::traits_birds)
#>    site     FRic
#> s1   s1 230967.7

Created on 2020-12-11 by the reprex package (v0.3.0)

Citation & documentation

We should cite the dataset used as an example in the package, as well as document it:

Nowak, Larissa et al. (2019), Data from: Projecting consequences of global warming for the functional diversity of fleshy-fruited plants and frugivorous birds along a tropical elevational gradient, Dryad, Dataset, https://doi.org/10.5061/dryad.c0n737b

Do not compute convex hulls on each row separately

A naive implementation to compute FRic is to compute the convex hull for each row separately.

However, this problem can be simplified since each row is a subset of the entire species list. The various convex hulls are not computed on completely independent points but on a subset / the union / etc. of points for which we previously computed the convex hull.
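The union case can be checked in 2-D with base R's grDevices::chull(): the hull of the union of two sites is the hull of the union of their hull vertices, so previously computed vertices can be reused instead of all points. This is only a 2-D illustration with random toy data; the same property holds in higher dimensions, where geometry::convhulln() would be used instead.

```r
set.seed(42)
traits <- matrix(rnorm(200), ncol = 2,
                 dimnames = list(paste0("sp", 1:100), c("t1", "t2")))

# Species names of the vertices of the 2-D convex hull of a set of species
hull_vertices <- function(species) {
  pts <- traits[species, , drop = FALSE]
  rownames(pts)[grDevices::chull(pts)]
}

site_a <- paste0("sp", 1:60)
site_b <- paste0("sp", 41:100)

# Naive: recompute the hull from every point of the union
naive <- hull_vertices(union(site_a, site_b))

# Reuse: only the vertices of the two already-computed hulls are needed
reused <- hull_vertices(union(hull_vertices(site_a), hull_vertices(site_b)))
```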

Release fundiversity 0.2.1

Prepare for release:

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()

Specific error obtained with Qhull

This issue aims to collect specific errors from Qhull to better handle #38 in the future.

I obtained the following error in Qhull:

Error: Received error code 5 from qhull. Qhull error:
QH6271 qhull precision error (qh_check_dupridge): wide merge (1122660224434 times wider) due to duplicate ridge with nearly coincident points (0.027) between f102531 and f102515, merge dist 0.093, while processing p2261

  • Ignore error with option 'Q12'
  • To be fixed in a later version of Qhull
  • Vertex distance 0.027 is greater than 100 times maximum distance 8.3e-14
    Please report to [email protected] with steps to reproduce and all output
    ERRONEOUS FACET:
  • f102531
    • flags: top new seen mergehorizon dupridge mergeridge2
    • normal: 0.05806 0.1744 -0.5788 0.7752 -0.174
    • offset: 0.1009689
    • vertices: p2261(v1312) p2266(v1046) p2272(v799) p2262(v289) p2265(v160)
    • neighboring facets: f12145 f12159 f102533 f102515 f102532
    • ridges:
    • r74938 tested
      vertices: p2266(v1046) p2272(v799) p2262(v289) p2265(v160)
      between f102531 and f12145
    • r97773
      vertices: p2261(v13

This happened when running the inst/new_benchmark.R script with the following parameters: 50 sites, 5 traits, 200 species, code at commit c9d85d8.

Standardize FRic by theoretical maximum

Most packages that compute FRic offer a version standardized by the regional maximum. We could offer this in fd_fric() with a stand = TRUE argument.
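A 2-D illustration of the standardization: FRic of a site as the area of its convex hull, divided by the hull area of the whole species pool. The toy traits and helper name are assumptions; in fd_fric() the volume would come from geometry::convhulln() in any number of dimensions.

```r
# Area of the 2-D convex hull of a set of points
hull_area <- function(pts) {
  h <- pts[grDevices::chull(pts), , drop = FALSE]
  x <- h[, 1]
  y <- h[, 2]
  # Shoelace formula over the ordered hull vertices
  abs(sum(x * c(y[-1], y[1]) - c(x[-1], x[1]) * y)) / 2
}

traits <- matrix(c(0, 4, 4, 0, 1,
                   0, 0, 4, 4, 1), ncol = 2,
                 dimnames = list(paste0("sp", 1:5), c("t1", "t2")))

fric_regional <- hull_area(traits)                          # species pool
fric_site <- hull_area(traits[c("sp1", "sp2", "sp3"), ])   # one site
fric_std <- fric_site / fric_regional
```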

FDiv is computed on the vertices of the convex hull

FDiv is computed on the vertices of the convex hull formed by the data points, not on the entirety of the points.

This means we have to compute the convex hull even when computing FDiv.

This might cause quite a performance hit, so it might be worth caching the result of convhulln() to avoid unnecessary re-computation when computing both FRic and FDiv. This is likely the role of verts.txt in FD.
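A minimal sketch of such a cache, keyed on the sorted set of species so that FRic and FDiv computed on the same site share one hull computation. It uses a plain environment and the 2-D grDevices::chull() for illustration; memoise::memoise() would provide the same behavior generically. The names are hypothetical.

```r
traits <- rbind(sp1 = c(0, 0), sp2 = c(4, 0), sp3 = c(4, 4),
                sp4 = c(0, 4), sp5 = c(1, 1))
colnames(traits) <- c("t1", "t2")

hull_cache <- new.env(parent = emptyenv())
n_hull_calls <- 0

# Return the hull vertices for a set of species, computing them only once
cached_hull <- function(traits, species) {
  key <- paste(sort(species), collapse = "|")
  if (!exists(key, envir = hull_cache, inherits = FALSE)) {
    n_hull_calls <<- n_hull_calls + 1
    pts <- traits[species, , drop = FALSE]
    assign(key, rownames(pts)[grDevices::chull(pts)], envir = hull_cache)
  }
  get(key, envir = hull_cache, inherits = FALSE)
}

sp <- rownames(traits)
v_fric <- cached_hull(traits, sp)       # e.g. needed for FRic
v_fdiv <- cached_hull(traits, rev(sp))  # same site for FDiv: cache hit
# FDiv then uses the centre of gravity of the hull vertices
g <- colMeans(traits[v_fdiv, , drop = FALSE])
```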

Should we write a vignette for non-continuous trait data?

I often get emails about non-continuous trait data, which fundiversity doesn't handle.
I wonder if we should write a full vignette showing the general workflow, and link to it in the error message for non-continuous trait data.

The workflow would go as follows (with a worked-through example):

  1. Identify the nature of all of your traits (binary, fuzzy-coded, categorical)
  2. Compute a dissimilarity matrix between your species of interest (Gower's dissimilarity or its extensions from Podani or Pavoine)
  3. Project this dissimilarity matrix onto continuous axes through a Principal Coordinates Analysis
  4. Select a number of PCoA axes and interpret them using correlation with original trait data
  5. Use them in fundiversity

I don't think it would be necessary to add any feature to fundiversity to deal with this case, as it's covered extensively by other tools (especially mFD), but maybe having a long-form documentation could be helpful to point users to.
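Steps 2 to 5 could be sketched in base R as follows. The trait table is made up, and Gower's dissimilarity is coded by hand only to keep the example dependency-free (cluster::daisy(metric = "gower") or mFD would normally be used). Square-rooted Gower dissimilarities are taken before the PCoA because they are Euclidean, which avoids negative eigenvalues.

```r
# Toy mixed-type trait table (hypothetical species and traits)
traits <- data.frame(
  body_mass = c(10, 25, 40, 55, 70, 120),
  diet      = factor(c("insect", "insect", "fruit", "fruit", "seed", "seed")),
  migratory = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE),
  row.names = paste0("sp", 1:6)
)

# Step 2: Gower dissimilarity, averaging over traits the range-scaled
# differences (numeric traits) and 0/1 mismatches (other traits)
gower_dist <- function(df) {
  n <- nrow(df)
  d <- matrix(0, n, n, dimnames = list(rownames(df), rownames(df)))
  for (col in df) {
    if (is.numeric(col)) {
      rng <- diff(range(col))
      if (rng > 0) d <- d + abs(outer(col, col, "-")) / rng
    } else {
      d <- d + outer(as.character(col), as.character(col), "!=")
    }
  }
  stats::as.dist(d / ncol(df))
}
d <- gower_dist(traits)

# Step 3: Principal Coordinates Analysis on sqrt(Gower)
pcoa <- stats::cmdscale(sqrt(d), k = 2, eig = TRUE)
axes <- pcoa$points
colnames(axes) <- c("PCoA1", "PCoA2")

# Step 4: interpret the axes through correlations with original traits
cor(axes, traits$body_mass)

# Step 5: use the axes as continuous traits, e.g. fundiversity::fd_fric(axes)
```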

Hex logo

Just dropping it here :)
It would be really cool to have a hex logo!
We should brainstorm on how best to represent functional diversity. Possibly two convex hulls of points with their intersection highlighted?

Add tip in error message because of non-continuous traits

This is the second person who has contacted me because of the non-continuous trait error message:

fundiversity/R/fd_fric.R

Lines 65 to 68 in 0a58ab6

if (!is.numeric(traits)) {
  stop("Non-continuous trait data found in input traits. ",
       "Please provide only continuous trait data", call. = FALSE)
}

Maybe we should point people to multivariate analyses that give back continuous traits, e.g. by adding a line such as "If you want to use non-numeric traits with fundiversity, you have to transform them to obtain numerical traits beforehand (e.g., through a PCoA or similar techniques)".

It would probably be too specific, but it would at least point users to ways to overcome the issue themselves.
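A sketch of the check above with the proposed tip appended to the message. The wrapper name check_continuous() is hypothetical; the wording of the tip is the one suggested in this issue.

```r
# Same check as in fd_fric(), with the suggested pointer added to the error
check_continuous <- function(traits) {
  if (!is.numeric(traits)) {
    stop("Non-continuous trait data found in input traits. ",
         "Please provide only continuous trait data. ",
         "If you want to use non-numeric traits with fundiversity, ",
         "you have to transform them to obtain numerical traits beforehand ",
         "(e.g., through a PCoA or similar techniques).",
         call. = FALSE)
  }
  invisible(traits)
}
```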

Managing error from qhull transferred to geometry::convhulln() in fd_chull()

Sometimes, when running thousands of FRic computations with fd_fric(), qhull throws errors because of edge cases. This cancels the computation of all simulations, and there is currently no option for the user to proceed anyway or at least to output an NA volume. Some of these errors are genuine errors, while others are warnings from Qhull that are turned into errors when transferred to R.

The solution would be to work with the geometry maintainers on how to pass warnings and errors to R, but it's probably not that easy.

The error message from qhull indicates that the Pp option can solve some of these issues.
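On the R side, one possible way to let a long run proceed is to catch the error and return an NA volume with a warning. This is a sketch: safe_volume() is hypothetical, and volume_fun could be for instance function(p) geometry::convhulln(p, "FA")$vol.

```r
# Turn an error from the hull computation into an NA volume plus a warning,
# so that one failing site does not abort a whole batch of simulations.
safe_volume <- function(pts, volume_fun) {
  tryCatch(
    volume_fun(pts),
    error = function(e) {
      warning("Convex hull computation failed, returning NA: ",
              conditionMessage(e), call. = FALSE)
      NA_real_
    }
  )
}
```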

Be more specific in the help with memoization

Somewhat connected to #71, but the current description of memoisation

fundiversity/R/fd_fric.R

Lines 34 to 41 in 2d19f5c

#' @details By default, when loading \pkg{fundiversity}, the functions to
#' compute convex hulls are
#' [memoised](https://en.wikipedia.org/wiki/Memoization) through the `memoise`
#' package if it is installed. To deactivate this behavior you can set the
#' option `fundiversity.memoise` to `FALSE` by running the following line:
#' `options(fundiversity.memoise = FALSE)`. If you use it interactively it will
#' only affect your current session. Add it to your script(s) or `.Rprofile`
#' file to avoid toggling it each time.

doesn't mention that the option must be set before loading fundiversity.

It would also be good to mention when to use memoisation and when not to.

Using future and memoise together

In the fundiversity manuscript, we officially recommend against using memoise and future at the same time.
However, there may be ways to use both.

I will collect here possibilities to work with both:

For now, it seems that code using memoise is not straightforward to parallelize.
Maybe we should be extra careful and, when the package is loaded with memoisation enabled, warn that it shouldn't be combined with parallelization.
We should also add this in:

  1. The README file
  2. The parallelization vignette
  3. The memoisation documentation segment
  4. The parallelization documentation segment.
