funecology / fundiversity

📦 R package to compute functional diversity indices efficiently

Home Page: https://funecology.github.io/fundiversity/

License: GNU General Public License v3.0

biodiversity biodiversity-indicators biodiversity-informatics functional-diversity functional-ecology functional-trait functional-traits r r-package trait trait-based traits

fundiversity's Issues

Error when site contains no species in fd_fdiv() and fd_feve()

I don't know how this bug slipped through the cracks, but here it is:

fundiversity::fd_fdiv(fundiversity::traits_plants, fundiversity::site_sp_plants)
#> Error in traits[names(sub_site), , drop = FALSE]: subscript out of bounds

sum(fundiversity::site_sp_plants[10,])
#> [1] 0

fundiversity::fd_fdiv(fundiversity::traits_plants,
                      fundiversity::site_sp_plants[1:9,])
#>        site      FDiv
#> 1  elev_250 0.6341541
#> 2  elev_500 0.6543063
#> 3 elev_1000 0.7111319
#> 4 elev_1500 0.7546447
#> 5 elev_2000 0.7437969
#> 6 elev_2500 0.7620128
#> 7 elev_3000 0.6939043
#> 8 elev_3500 0.6414894
#> 9 elev_3750 0.5879492

Created on 2021-08-03 by the reprex package (v2.0.0)

This means that the individual fd_div() computation should check early on whether all abundances are 0.
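A minimal sketch of such an early check, in base R only. `compute_site_index()` is a hypothetical helper and not the package's internal code; returning `NA` (rather than 0 or an error) for an empty site is an assumption about the desired behavior.

```r
# Hypothetical helper, not the package's internal code: wraps any per-site
# index computation and short-circuits when the site contains no species.
compute_site_index <- function(site_abund, index_fun) {
  # A site where all abundances are 0 has nothing to compute on
  if (all(site_abund == 0)) {
    return(NA_real_)
  }
  # Keep only the species that are actually present in the site
  present <- site_abund[site_abund > 0]
  index_fun(present)
}

empty_site <- c(sp1 = 0, sp2 = 0, sp3 = 0)
full_site  <- c(sp1 = 2, sp2 = 0, sp3 = 1)

res_empty <- compute_site_index(empty_site, length)  # NA instead of an error
res_full  <- compute_site_index(full_site, length)   # counts present species
```

Here `length` is only a toy stand-in for the real index computation.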

Investigate RCGAL as a replacement of geometry

https://github.com/stla/RCGAL

I have not tested anything yet.

Would need to:

wait until it's on CRAN
evaluate the performance vs geometry

Downside:

has more imports but that might be discussed with the developer(?)

Check that computations are numerically correct

As we are reimplementing indices, we should probably check that the computations are numerically correct by comparing them against other packages.

Add references that compute convex hulls intersections

Change title of package in DESCRIPTION as it doesn't only compute "Alpha functional diversity indices"

For now, the title of the package is "Easy Computation of Alpha Functional Diversity Indices". However, given that we include fd_fric_intersect() and could include other beta-diversity indices in the future, we could drop the word "Alpha" altogether.

I'm opening this issue to remind myself to do so and to check that we get rid of "Alpha" across CITATION files (and Zenodo, etc.).

Add comparison to theoretical algorithmic complexity to performance vignette

Related issue: #4

FRic returns NA without warnings nor messages

We use this example in the introduction vignette when subsetting the number of species.
However, it returns NA because there are not enough species present for FRic to be computed.

library("fundiversity")

fd_fric(traits_birds, site_sp_birds[, 1:5])
#> Differing number of species between trait dataset and site-species matrix
#> Taking subset of species
#>        site FRic
#> 1  elev_250   NA
#> 2  elev_500   NA
#> 3 elev_1000   NA
#> 4 elev_1500   NA
#> 5 elev_2000   NA
#> 6 elev_2500   NA
#> 7 elev_3000   NA
#> 8 elev_3500   NA

Created on 2022-10-21 with reprex v2.0.2

In my opinion, we should emit a warning so the user knows why the FRic values are NA, and we should also change the example in the introduction vignette.
Like with the following:

library("fundiversity")

fd_fric(traits_birds, site_sp_birds[, 1:60])
#> Differing number of species between trait dataset and site-species matrix
#> Taking subset of species
#>        site        FRic
#> 1  elev_250 18963.31311
#> 2  elev_500 18963.31311
#> 3 elev_1000 38586.75398
#> 4 elev_1500 38114.26828
#> 5 elev_2000  5888.93690
#> 6 elev_2500  5256.70628
#> 7 elev_3000  2710.81803
#> 8 elev_3500    88.11684

Created on 2022-10-21 with reprex v2.0.2
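The warning proposed in this issue could look roughly like the sketch below. `check_enough_species()` is a hypothetical helper, and the condition used (strictly more species than traits, since a convex hull in d dimensions needs at least d + 1 points) is an assumption about the check, not the package's actual code.

```r
# Hypothetical check, not the package's actual code.
# Assumption: a convex hull in d dimensions needs at least d + 1 points.
check_enough_species <- function(n_species, n_traits, site = "s1") {
  if (n_species <= n_traits) {
    warning("Site '", site, "' has ", n_species, " species for ", n_traits,
            " traits; FRic needs more species than traits and is set to NA.",
            call. = FALSE)
    return(FALSE)
  }
  TRUE
}

# Capture the warning message to show what the user would see
msg <- tryCatch(
  check_enough_species(3, 4, site = "elev_250"),
  warning = function(w) conditionMessage(w)
)
msg
```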

Add a switch in `options()` to prevent use of memoised function?

Currently, if the user has memoise on their computer, they will automatically get the memoised version of fd_chull() and fd_chull_intersect(), without any way to opt-out.

I think it would be good to offer the option to opt out, if the user wishes to do so for whatever reason.
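One possible shape for such a switch, as a sketch only. The option name `fundiversity.memoise` is the one quoted in the documentation excerpt of a later issue; `use_memoise()` is a hypothetical helper name, and defaulting to `TRUE` when the option is unset is an assumption.

```r
# Hypothetical helper: decides whether the memoised versions should be used.
# Memoisation is on only if the option is not FALSE and memoise is installed.
use_memoise <- function() {
  isTRUE(getOption("fundiversity.memoise", default = TRUE)) &&
    requireNamespace("memoise", quietly = TRUE)
}

# The user opts out before the check is run
options(fundiversity.memoise = FALSE)
opted_out <- use_memoise()

# Restore the default (unset) state
options(fundiversity.memoise = NULL)
```

A check like this would typically run once when the package is loaded, which is why the option would have to be set before loading.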

Add support for sparse matrices

When using huge site-species matrices, it is sometimes more memory efficient to use sparse matrices.
We could implement the computation of indices with sparse matrices through the Matrix package (a recommended package shipped with standard R installations).
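As a rough illustration, the sketch below stores a toy site-species matrix as a sparse matrix and checks that a site-by-trait product matches the dense result. The data are made up, and this shows only the matrix product, not any actual index computation from the package.

```r
library(Matrix)  # recommended package, assumed to be installed

# Toy site-species matrix (2 sites, 4 species), mostly zeros
site_sp <- matrix(c(1, 0, 0, 0,
                    0, 0, 2, 1),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("s1", "s2"), paste0("sp", 1:4)))

# Toy trait matrix (4 species, 2 traits)
traits <- matrix(c(1, 2, 3, 4,
                   10, 20, 30, 40),
                 ncol = 2,
                 dimnames = list(paste0("sp", 1:4), c("t1", "t2")))

# Sparse representation: only non-zero abundances are stored
sparse_site_sp <- Matrix(site_sp, sparse = TRUE)

# Abundance-weighted trait sums per site, dense versus sparse
dense_res  <- site_sp %*% traits
sparse_res <- as.matrix(sparse_site_sp %*% traits)

all.equal(dense_res, sparse_res, check.attributes = FALSE)
```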

Renaming the vignettes to order them logically?

For now, the vignettes on CRAN and those listed by vignette(package = "fundiversity") or browseVignettes(package = "fundiversity") are ordered alphabetically.

This may not be a desirable order as it would make more sense to have the introduction vignette first, then the parallelization and other vignettes.

Maybe we should rename the files to be "fundiversity", "fundiversity-2", etc. as is done by future?

My preferred order would be:

Introduction/Overview
Parallelization
Performance
Correctness
Design Principles

Maybe we should also rename the introduction vignette and parallelization vignette to have more explicit names.

Warn user when computing FRic with many species/many traits

geometry::convhulln(), used in fd_fric(), has some limitations on data size. To avoid wasting computing time, we should probably issue a warning for large datasets (many species × many traits) saying that the computation may fail in this case.

Performance comparison

To show the value of having another package to compute functional diversity indices, we should also write a vignette that compares the performance of fundiversity with that of related packages.

Improve function efficiency through matrix algebra

I'm pretty sure that the call to apply() in fd_raoq() could be simplified through matrix algebra:
https://github.com/Bisaloo/fundiversity/blob/1e175ba88531e08242ab9c69b9e470b4cee9e759/R/fd_raoq.R#L60-L62
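A small sketch of the idea on toy data, assuming Rao's quadratic entropy is computed per site as the sum over species pairs of p_i * p_j * d_ij. The package's exact convention (for example halved or squared dissimilarities) may differ, and the objects below are illustrative, not the linked code.

```r
# Relative abundances (rows = sites, columns = species)
P <- matrix(c(0.5, 0.5, 0,
              0.2, 0.3, 0.5),
            nrow = 2, byrow = TRUE)

# Species-by-species dissimilarity matrix from a single toy trait
D <- as.matrix(dist(c(1, 2, 4)))

# Row-wise version with apply(): p' D p for each site
q_apply <- apply(P, 1, function(p) as.numeric(t(p) %*% D %*% p))

# Same quantity with matrix algebra only: one product, one row sum
q_matrix <- rowSums((P %*% D) * P)

all.equal(q_apply, q_matrix)
```

The second form replaces one small matrix product per site with a single product over all sites.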

Errors when using non-quantitative traits in fd_fric()

We should probably test the behavior of all functions when using non-quantitative traits, so that the functions warn the user instead of silently computing something.

See for example:

data("aviurba", package = "ade4")
fundiversity::fd_fric(aviurba$traits)
#> Warning in storage.mode(p) <- "double": NAs introduced by coercion
#> Error in geometry::convhulln(traits, "FA"): The first argument should not contain any NAs

Created on 2021-08-05 by the reprex package (v2.0.0)

How should we deal with missing values?

Probably discard them but should we display a warning before doing so?

When a value is missing, should we discard the whole row or just ignore this value? What are the implications on the resulting indices (beyond the very practical fact that few datasets will be complete for all traits and species)?

Have a worked through example with individual level dataset

I'm getting a lot of questions from confused users about using fundiversity with individual-level datasets, because we keep referring to the "site-species" matrix and the "species-trait" matrix.
Even though we've written it in the paper and in the introduction vignette, it seems that we need to specify it elsewhere with a well chosen example (maybe in another dedicated vignette?).

Parallel computation of functional diversity indices

From the performance vignette (#8), we can see that some indices can take a while to compute with big matrices.
One interesting (but certainly expensive, development-wise) feature would be to allow automatic parallel computation of functional diversity indices across sites by splitting the site-species matrix into chunks.
The implementation could use the future package.
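The chunking step itself can be sketched in base R. `toy_index()` below is a hypothetical stand-in for any per-site index function, and the sequential `lapply()` is where a parallel lapply-like function (for instance `future.apply::future_lapply()`, not tested here) could be swapped in.

```r
# Hypothetical stand-in for a per-site index function
toy_index <- function(site_sp) {
  data.frame(site = rownames(site_sp),
             richness = unname(rowSums(site_sp > 0)),
             row.names = NULL)
}

set.seed(1)
site_sp <- matrix(rbinom(40, 1, 0.5), nrow = 8,
                  dimnames = list(paste0("s", 1:8), paste0("sp", 1:5)))

# Assign each site (row) to one of n_chunks contiguous chunks
n_chunks <- 4
chunk_id <- cut(seq_len(nrow(site_sp)), breaks = n_chunks, labels = FALSE)
chunks   <- split(seq_len(nrow(site_sp)), chunk_id)

# Compute the index chunk by chunk (sequentially here)
res_list <- lapply(chunks, function(rows) {
  toy_index(site_sp[rows, , drop = FALSE])
})

# Bind the per-chunk results back into one data.frame
res <- do.call(rbind, res_list)
rownames(res) <- NULL
res
```

Because sites are independent for such indices, the combined result is the same as running the function once on the full matrix.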

Add CITATION file

Related to #27
Even before envisioning a manuscript, add a CITATION file that mentions the version of the package used, so that people can start using it before any scientific publication.

Handle the different error sources from geometry::convhulln()

95e5823 fixes the case where geometry::convhulln() returns an error because there are fewer points than dimensions.
We should also handle the case where the dimensionality is artificially reduced because several points have the same coordinates.

New index: intersection between convex hulls

As ecologists may be interested in the intersection of different convex hulls, fundiversity could provide a wrapper around geometry::intersectn() that outputs the volume of the intersection between two convex hulls.

The input of this wrapper would be similar to fd_fric() (site-species matrix, trait matrix) and the output would be a distance matrix or rather a tidy data.frame with the first two columns giving the ids of the two considered sites, the third column would give the volume of the intersection.

Computationally, this could be quite intensive, but it should parallelize without issue (see #13).

fd_fdis() errors when the site-species object is a data.frame(); it expects a matrix.

Reprex

library("fundiversity")

fd_fdis(traits_birds, as.data.frame(site_sp_birds))
#> Error in sp_com %*% traits: requires numeric/complex matrix/vector arguments

Created on 2023-01-30 with reprex v2.0.2

The solution would be to wrap sp_com in fd_fdis() into as.matrix() before performing matrix multiplication.
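The proposed fix can be illustrated on toy data. The object names `sp_com` and `traits` follow the error message above; the values are made up, and this is only the coercion step, not the full fd_fdis() computation.

```r
# Site-species data supplied as a data.frame (2 sites, 2 species)
sp_com <- data.frame(sp1 = c(1, 0), sp2 = c(2, 1),
                     row.names = c("s1", "s2"))

# Trait matrix (2 species, 2 traits)
traits <- matrix(c(1, 3, 2, 4), nrow = 2,
                 dimnames = list(c("sp1", "sp2"), c("t1", "t2")))

# `sp_com %*% traits` fails while sp_com is a data.frame,
# so coerce it to a matrix first; row and column names are kept
sp_com <- as.matrix(sp_com)

res <- sp_com %*% traits
res
```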

fd_dis gives error if site x species matrix does not have row names

I ran into an issue using fd_dis() when my site x species matrix doesn't have row names. For my purposes, I cannot force my site x species matrices to have row names as they are contained within a simulation matrix (class simmat). I expect this could be an issue for future users as well.

An easy fix would be to either just force "Site" in the return dataframe from the fd_dis() function to be 1:nrow :

data.frame(site = 1:nrow(sp_com), FDis = fdis_site, row.names = NULL)

or use an if statement

if(is.null(rownames(sp_com))) { rownames(sp_com) <- 1:nrow(sp_com) }
data.frame(site = rownames(sp_com), FDis = fdis_site, row.names = NULL)

Site name when no site-species matrix is provided

When no site-species matrix is provided, the site is called "s1". Should it be called something else?
Also, the row name is then "s1"; row.names should probably be set to NULL to avoid confusion:

fundiversity::fd_fric(fundiversity::traits_birds)
#>    site     FRic
#> s1   s1 230967.7

Created on 2020-12-11 by the reprex package (v0.3.0)

Create 'design principles' vignette

Using the info currently in the wiki

Archive releases into Zenodo

Just a reminder to do this so that the package is archived.

Add link to published paper

We should add a link to the published paper in the CITATION file, as well as in the description.

Citation & documentation

We should cite the dataset used as example in the package as well as document it:

Nowak, Larissa et al. (2019), Data from: Projecting consequences of global warming for the functional diversity of fleshy-fruited plants and frugivorous birds along a tropical elevational gradient, Dryad, Dataset, https://doi.org/10.5061/dryad.c0n737b

Do not compute convex hulls on each row separately

A naive implementation to compute FRic is to compute the convex hull for each row separately.

However, this problem can be simplified since each row is a subset of the entire species list. The various convex hulls are not computed on completely independent points but on a subset / the union / etc. of points for which we previously computed the convex hull.

Dependency for MST in FEve

ade4: mstree()
ape: mst()
igraph: mst()
vegan: spantree()

fd_fdiv() returns `NaN` on datasets without rownames

Reprex:

library(fundiversity)

data(traits_birds)
rownames(traits_birds) <- NULL

fd_fdiv(traits_birds)
#>   site FDiv
#> 1   s1  NaN

Created on 2022-08-05 by the reprex package (v2.0.1.9000)

Row names should not be mandatory when sp_com is not provided.

Release fundiversity 0.2.1

Prepare for release:

Check current CRAN check results
Polish NEWS
~~devtools::build_readme()~~ (done by GitHub Action)
urlchecker::url_check()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
rhub::check_for_cran()
~~revdepcheck::revdep_check(num_workers = 4)~~
Update cran-comments.md

Submit to CRAN:

usethis::use_version('patch')
devtools::submit_cran()
Approve email

Wait for CRAN...

Accepted 🎉
usethis::use_github_release()
usethis::use_dev_version()

Specific error obtained with Qhull

Here is an issue that aims to be a list of specific errors due to Qhull to better manage #38 in the future.

I obtained the following error in Qhull:

Error: Received error code 5 from qhull. Qhull error:
QH6271 qhull precision error (qh_check_dupridge): wide merge (1122660224434 times wider) due to duplicate ridge with nearly coincident points (0.027) between f102531 and f102515, merge dist 0.093, while processing p2261

Ignore error with option 'Q12'

To be fixed in a later version of Qhull

Vertex distance 0.027 is greater than 100 times maximum distance 8.3e-14
Please report to [email protected] with steps to reproduce and all output
ERRONEOUS FACET:

f102531

flags: top new seen mergehorizon dupridge mergeridge2

normal: 0.05806 0.1744 -0.5788 0.7752 -0.174

offset: 0.1009689

vertices: p2261(v1312) p2266(v1046) p2272(v799) p2262(v289) p2265(v160)

neighboring facets: f12145 f12159 f102533 f102515 f102532

ridges:

r74938 tested
vertices: p2266(v1046) p2272(v799) p2262(v289) p2265(v160)
between f102531 and f12145

r97773
vertices: p2261(v13

This happened when running the inst/new_benchmark.R script with the following parameters: 50 sites, 5 traits, 200 species, with the code at commit c9d85d8.

Standardize FRic by theoretical maximum

Most packages that compute FRic propose a version where it is standardized by regional maximum. We could propose this in fd_fric() with a stand = TRUE argument

Mention vignette and link them in the README

FDiv is computed on the vertices of the convex hull

FDiv is computed on the vertices of the convex hull formed by the data points, not on the entirety of the points.

This means we have to compute the convex hull even for FDiv.

This might cause quite a performance hit so it might be interesting to think about caching the result of convhulln() to avoid unnecessary re-computation when getting both FRic and FDiv. This is likely the role of verts.txt in FD.

Should we write a vignette for non-continuous trait data?

I often get emailed about non-continuous trait data that fundiversity doesn't handle.
I wonder if we should write a full vignette to show the general workflow, and include a pointer to it in the error message for non-continuous trait data.

The workflow would go as follows (with a worked-through example):

Identify the nature of all of your traits (binary, fuzzy-coded, categorical)
Compute a dissimilarity matrix between your species of interest (Gower's dissimilarity or its extensions from Podani or Pavoine)
Project this dissimilarity matrix onto continuous axes through a Principal Coordinates Analysis
Select a number of PCoA axes and interpret them using correlation with original trait data
Use them in fundiversity

I don't think it would be necessary to add any feature to fundiversity to deal with this case, as it's covered extensively by other tools (especially mFD), but maybe having a long-form documentation could be helpful to point users to.
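The workflow above could be sketched as follows on made-up data, using `cluster::daisy()` for Gower's dissimilarity and `stats::cmdscale()` for the PCoA. The trait table, the choice of two axes, and the object names are illustrative assumptions; extensions of Gower's dissimilarity (Podani, Pavoine) are not covered here.

```r
library(cluster)  # recommended package, provides daisy()

# Step 1: toy trait table mixing categorical and continuous traits
traits <- data.frame(
  diet      = factor(c("seed", "insect", "seed", "fruit", "insect")),
  nocturnal = factor(c("no", "no", "yes", "no", "yes")),
  mass      = c(12.5, 30.1, 8.2, 55.0, 21.7),
  row.names = paste0("sp", 1:5)
)

# Step 2: Gower's dissimilarity between species
gower_dist <- daisy(traits, metric = "gower")

# Step 3: project the dissimilarities onto continuous axes with a PCoA
pcoa_axes <- cmdscale(gower_dist, k = 2)
colnames(pcoa_axes) <- c("PCoA1", "PCoA2")

# Step 4: relate an axis to an original continuous trait
axis1_mass_cor <- cor(pcoa_axes[, "PCoA1"], traits$mass)

# Step 5: pcoa_axes is now a numeric species-by-axes matrix
pcoa_axes
```

The resulting numeric matrix, with species as row names, has the shape that the trait argument of the fundiversity functions expects.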

Hex logo

Just dropping it here :)
It would be really cool to have a hex logo!
We should brainstorm on how to represent functional diversity the best? Possibly two convex hulls of points and their intersection highlighted?

Add tip in error message because of non-continuous traits

This is the second person to contact me about the non-continuous trait error message:

fundiversity/R/fd_fric.R

Lines 65 to 68 in 0a58ab6

    
    if (!is.numeric(traits)) {
      stop("Non-continuous trait data found in input traits. ",
           "Please provide only continuous trait data", call. = FALSE)
    }

Maybe we should point people to multivariate analyses to get back continuous traits. Like adding a line "If you want to use non-numeric traits with fundiversity, you have to transform them to obtain numerical traits beforehand (e.g., through a PCoA or similar techniques)"

It may be too specific, but it would at least point users to ways of overcoming the issue themselves.

Simplify CI matrix by using `_R_CHECK_DEPENDS_ONLY_TESTS_`

Since R 4.1.3, we can run tests without packages from Suggests by setting the _R_CHECK_DEPENDS_ONLY_TESTS_ env var to true.

https://cran.r-project.org/doc/manuals/r-devel/R-ints.html#index-_005fR_005fCHECK_005fDEPENDS_005fONLY_005fTESTS_005f

Managing error from qhull transferred to geometry::convhulln() in fd_chull()

Sometimes, when running thousands of FRic computations with fd_fric(), qhull errors out because of edge cases. This cancels the whole simulation, and there is currently no option for the user to proceed anyway, or at least to output an NA volume. Some of these errors are real errors, while others are warnings from Qhull that are turned into errors when passed on to R.

The solution would be to check with the geometry maintainers how to pass warnings and errors to R, but that is probably not easy.

The error message from qhull indicates that the Pp option can solve some of these issues.
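Independently of how geometry reports Qhull conditions, the "output an NA volume" behavior could be sketched with `tryCatch()`. `safe_volume()` and the two toy functions are hypothetical stand-ins: no real convex hull is computed here, and the error text only imitates the Qhull message quoted in another issue.

```r
# Hypothetical wrapper: `hull_fun` stands for any function that may throw
# an error (e.g. a convex hull volume computation)
safe_volume <- function(hull_fun, ...) {
  tryCatch(
    hull_fun(...),
    error = function(e) {
      # Downgrade the error to a warning and return a missing volume
      warning("Convex hull computation failed: ", conditionMessage(e),
              call. = FALSE)
      NA_real_
    }
  )
}

# Toy stand-ins, not real volume computations
working_hull <- function(x) sum(x)
failing_hull <- function(x) stop("QH6271 qhull precision error")

res_ok  <- safe_volume(working_hull, c(1, 2, 3))
res_bad <- suppressWarnings(safe_volume(failing_hull, c(1, 2, 3)))
```

With a wrapper like this, one failing site yields `NA` plus a warning, and the remaining computations still run.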

Be more specific in the help with memoization

Somewhat related to #71, but the current description of memoisation

fundiversity/R/fd_fric.R

Lines 34 to 41 in 2d19f5c

    
    #' @details By default, when loading \pkg{fundiversity}, the functions to
    #' compute convex hulls are
    #' [memoised](https://en.wikipedia.org/wiki/Memoization) through the `memoise`
    #' package if it is installed. To deactivate this behavior you can set the
    #' option `fundiversity.memoise` to `FALSE` by running the following line:
    #' `options(fundiversity.memoise = FALSE)`. If you use it interactively it will
    #' only affect your current session. Add it to your script(s) or `.Rprofile`
    #' file to avoid toggling it each time.

doesn't mention that the option should be set before loading fundiversity.

It could also be good to mention when to use memoisation and when not to.

Using future and memoise together

In the fundiversity manuscript, we officially recommended not using memoise and future at the same time.
However, there may be ways to get both.

I will collect here possibilities to work with both:

A question on SO about using plumber, future, and memoise together: https://stackoverflow.com/q/70805314
A discussion in the future repo: HenrikBengtsson/future#506
An answer on SO (by Henrik Bengtsson, the author of future) suggesting using the R.cache file caching system instead: https://stackoverflow.com/a/48102804
This open issue on memoise repo: r-lib/memoise#29

For now, it seems that parallelizing while using memoise is not straightforward.
Maybe we should be extra careful and add a warning, shown when the package is loaded with memoisation, that it shouldn't be used with parallelization.
We should also add this in:

The README file
The parallelization vignette
The memoisation documentation segment
The parallelization documentation segment.
