
theoreticalecology / s-jSDM


Scalable joint species distribution modeling

Home Page: https://theoreticalecology.github.io/s-jSDM/

License: GNU General Public License v3.0

R 64.15% Python 23.91% Jupyter Notebook 11.94%
species-distribution-modelling species-interactions machine-learning deep-learning gpu-acceleration

s-jsdm's Introduction

Project Status: Active – The project has reached a stable, usable state and is being actively developed. License: GPL v3.

s-jSDM - Fast and accurate Joint Species Distribution Modeling

About the method

The method is described in Pichler & Hartig (2021) A new joint species distribution model for faster and more accurate inference of species associations from big community data, https://doi.org/10.1111/2041-210X.13687. The code for producing the results in this paper is available under the subfolder publications in this repo.

The method itself is wrapped into an R package, available under the subfolder sjSDM. You can also use it stand-alone under Python (see instructions below). Note: for both the R and the Python package, Python >= 3.7 and PyTorch must be installed (more details below).

Installing the R / Python package

R-package

Install the package via

install.packages("sjSDM")

Dependencies for the package can be installed before or after installing the package. Detailed explanations of the dependencies are provided in vignette("Dependencies", package = "sjSDM"). Very briefly, the dependencies can be installed automatically from within R:

sjSDM::install_sjSDM(version = "gpu") # or
sjSDM::install_sjSDM(version = "cpu")

To cite sjSDM, please use the following citation:

citation("sjSDM")

Development

If you want to install the current (development) version from this repository, run

devtools::install_github("https://github.com/TheoreticalEcology/s-jSDM", subdir = "sjSDM", ref = "master")

Once the dependencies are installed, the following code should run:

Workflow

Simulate a community and fit a sjSDM model:

library(sjSDM)
## ── Attaching sjSDM ──────────────────────────────────────────────────── 1.0.4 ──

## ✔ torch <environment> 
## ✔ torch_optimizer  
## ✔ pyro  
## ✔ madgrad
set.seed(42)
community <- simulate_SDM(sites = 100, species = 10, env = 3, se = TRUE)
Env <- community$env_weights
Occ <- community$response
SP <- matrix(rnorm(200, 0, 0.3), 100, 2) # spatial coordinates (no effect on species occurrences)

model <- sjSDM(Y = Occ, env = linear(data = Env, formula = ~X1+X2+X3), spatial = linear(data = SP, formula = ~0+X1:X2), se = TRUE, family=binomial("probit"), sampling = 100L)
summary(model)
## Family:  binomial 
## 
## LogLik:  -510.9816 
## Regularization loss:  0 
## 
## Species-species correlation matrix: 
## 
##  sp1  1.0000                                 
##  sp2 -0.3780  1.0000                             
##  sp3 -0.2050 -0.4070  1.0000                         
##  sp4 -0.1850 -0.3860  0.8220  1.0000                     
##  sp5  0.6820 -0.4090 -0.1240 -0.0730  1.0000                 
##  sp6 -0.3050  0.4870  0.1630  0.1510 -0.1220  1.0000             
##  sp7  0.5830 -0.1190  0.0960  0.1200  0.5520  0.2450  1.0000         
##  sp8  0.3140  0.1690 -0.5280 -0.5460  0.2330 -0.0480  0.1300  1.0000     
##  sp9 -0.0620 -0.0250  0.0840  0.0640 -0.4010 -0.3430 -0.2060 -0.1380  1.0000 
##  sp10     0.2080  0.4750 -0.7140 -0.6490  0.2540  0.1410  0.1480  0.4560 -0.2850  1.0000
## 
## 
## 
## Spatial: 
##            sp1       sp2      sp3       sp4      sp5      sp6      sp7      sp8
## X1:X2 2.103188 -4.041381 3.452883 0.2332844 2.681165 1.325118 3.126471 1.928931
##             sp9     sp10
## X1:X2 0.9001696 1.262238
## 
## 
## 
##                  Estimate Std.Err Z value Pr(>|z|)    
## sp1 (Intercept)   -0.0847  0.2671   -0.32  0.75124    
## sp1 X1             1.3854  0.5241    2.64  0.00820 ** 
## sp1 X2            -2.4736  0.4839   -5.11  3.2e-07 ***
## sp1 X3            -0.2583  0.4362   -0.59  0.55385    
## sp2 (Intercept)   -0.0145  0.2601   -0.06  0.95560    
## sp2 X1             1.2578  0.5233    2.40  0.01625 *  
## sp2 X2             0.2357  0.4909    0.48  0.63112    
## sp2 X3             0.6825  0.4302    1.59  0.11266    
## sp3 (Intercept)   -0.5653  0.2861   -1.98  0.04819 *  
## sp3 X1             1.4285  0.5099    2.80  0.00509 ** 
## sp3 X2            -0.4155  0.5096   -0.82  0.41489    
## sp3 X3            -1.1364  0.4898   -2.32  0.02034 *  
## sp4 (Intercept)   -0.1156  0.2580   -0.45  0.65406    
## sp4 X1            -1.5792  0.4921   -3.21  0.00133 ** 
## sp4 X2            -1.9313  0.5088   -3.80  0.00015 ***
## sp4 X3            -0.4306  0.4314   -1.00  0.31822    
## sp5 (Intercept)   -0.2109  0.2526   -0.83  0.40378    
## sp5 X1             0.7425  0.4843    1.53  0.12525    
## sp5 X2             0.5624  0.4582    1.23  0.21969    
## sp5 X3            -0.7171  0.4154   -1.73  0.08433 .  
## sp6 (Intercept)    0.2184  0.2707    0.81  0.41973    
## sp6 X1             2.6087  0.5552    4.70  2.6e-06 ***
## sp6 X2            -1.1176  0.5271   -2.12  0.03400 *  
## sp6 X3             0.2021  0.4461    0.45  0.65049    
## sp7 (Intercept)   -0.0719  0.2448   -0.29  0.76903    
## sp7 X1            -0.3372  0.4899   -0.69  0.49132    
## sp7 X2             0.3403  0.4328    0.79  0.43175    
## sp7 X3            -1.4822  0.4269   -3.47  0.00052 ***
## sp8 (Intercept)    0.1574  0.1625    0.97  0.33270    
## sp8 X1             0.3657  0.3158    1.16  0.24688    
## sp8 X2             0.3236  0.3102    1.04  0.29688    
## sp8 X3            -1.2363  0.2850   -4.34  1.4e-05 ***
## sp9 (Intercept)    0.0235  0.2003    0.12  0.90667    
## sp9 X1             1.4160  0.3943    3.59  0.00033 ***
## sp9 X2            -1.0606  0.3755   -2.82  0.00473 ** 
## sp9 X3             0.7943  0.3444    2.31  0.02111 *  
## sp10 (Intercept)  -0.0825  0.2076   -0.40  0.69104    
## sp10 X1           -0.5510  0.3781   -1.46  0.14505    
## sp10 X2           -1.3145  0.3777   -3.48  0.00050 ***
## sp10 X3           -0.5257  0.3590   -1.46  0.14310    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(model)

We support other distributions (see the count-data example after this list):

  • Count data with Poisson:
model <- sjSDM(Y = Occ, env = linear(data = Env, formula = ~X1+X2+X3), spatial = linear(data = SP, formula = ~0+X1:X2), se = TRUE, family=poisson("log"))
  • Count data with negative binomial (still experimental; if you run into errors or problems, please let us know):
model <- sjSDM(Y = Occ, env = linear(data = Env, formula = ~X1+X2+X3), spatial = linear(data = SP, formula = ~0+X1:X2), se = TRUE, family="nbinom")
  • Gaussian (normal):
model <- sjSDM(Y = Occ, env = linear(data = Env, formula = ~X1+X2+X3), spatial = linear(data = SP, formula = ~0+X1:X2), se = TRUE, family=gaussian("identity"))
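For the count families, Y must contain counts rather than 0/1 data. A minimal, purely illustrative sketch (the rpois() counts below are simulated ad hoc and are not produced by simulate_SDM):

Counts <- matrix(rpois(100 * 10, lambda = 2), 100, 10) # hypothetical count matrix: 100 sites x 10 species
model_counts <- sjSDM(Y = Counts,
                      env = linear(data = Env, formula = ~X1+X2+X3),
                      spatial = linear(data = SP, formula = ~0+X1:X2),
                      family = poisson("log"))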

Anova

ANOVA can be used to partition the three components (abiotic, biotic, and spatial):

an = anova(model)
print(an)
## Analysis of Deviance Table
## 
## Terms added sequentially:
## 
##           Deviance Residual deviance R2 Nagelkerke R2 McFadden
## Abiotic  157.95722        1177.48500       0.79394      0.1139
## Biotic   175.41278        1160.02944       0.82694      0.1265
## Spatial   17.38643        1318.05579       0.15959      0.0125
## Full     385.98836         949.45386       0.97893      0.2784
plot(an)

The ANOVA shows the relative changes in R2 attributable to the groups and their intersections.

Internal metacommunity structure

Following Leibold et al. (2022), we can calculate and visualize the internal metacommunity structure (i.e., the partitioning of the three components across species and sites). The internal structure is already calculated by the ANOVA, and we can visualize it with the plot method:

results = plotInternalStructure(an) # or plot(an, internal = TRUE)
## Registered S3 methods overwritten by 'ggtern':
##   method           from   
##   grid.draw.ggplot ggplot2
##   plot.ggplot      ggplot2
##   print.ggplot     ggplot2

The plot function returns the results for the internal metacommunity structure (env = environmental, spa = spatial, codist = co-distribution, i.e. biotic associations):

print(results$data$Species)
##           env         spa     codist         r2
## 1  0.17677667 0.000000000 0.16810146 0.03375475
## 2  0.08724636 0.026656011 0.18072040 0.02946228
## 3  0.12613742 0.004529856 0.21004115 0.03407084
## 4  0.16648179 0.000000000 0.15890110 0.03241345
## 5  0.08585343 0.005811074 0.16168802 0.02533525
## 6  0.18787936 0.012341719 0.11489709 0.03151182
## 7  0.10765006 0.012898782 0.13292549 0.02534743
## 8  0.12445149 0.015040332 0.06188116 0.02013730
## 9  0.17762242 0.000000000 0.04357315 0.02196159
## 10 0.08805858 0.017406001 0.13890574 0.02443703

Installation troubleshooting

If the installation fails, check out the help of ?install_sjSDM, ?installation_help, and vignette("Dependencies", package = "sjSDM").

  1. Try install_sjSDM()
  2. Start a new R session; if no 'PyTorch not found' message appears, the installation worked. Otherwise, see ?installation_help
  3. If you still cannot get the package to run, open an issue on the issue tracker or write an email to maximilian.pichler at ur.de (the typical sequence is sketched in code below)
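In code, the typical sequence looks roughly like this (a sketch; install_diagnostic() is the diagnostic helper referenced in ?install_diagnostic):

sjSDM::install_sjSDM(version = "cpu")  # step 1: install PyTorch and the other Python dependencies
# restart R, then:
library(sjSDM)                         # step 2: no 'PyTorch not found' message should appear
# if loading still fails, collect diagnostic information for an issue report:
sjSDM::install_diagnostic()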

Python Package

pip install sjSDM_py

Python example

import sjSDM_py as fa
import numpy as np
import torch
Env = np.random.randn(100, 5)
Occ = np.random.binomial(1, 0.5, [100, 10])

model = fa.Model_sjSDM(device=torch.device("cpu"), dtype=torch.float32)
model.add_env(5, 10)
model.build(5, optimizer=fa.optimizer_adamax(0.001),scheduler=False)
model.fit(Env, Occ, batch_size = 20, epochs = 10)
# print(model.weights)
# print(model.covariance)
## Iter: 10/10 100%|##########| [00:00, 26.74it/s, loss=7.144]

Calculate Importance:

Beta = np.transpose(model.env_weights[0])
Sigma = ( model.sigma @ model.sigma.t() + torch.diag(torch.ones([1])) ).data.cpu().numpy()
covX = fa.covariance( torch.tensor(Env).t() ).data.cpu().numpy()

fa.importance(beta=Beta, covX=covX, sigma=Sigma)
## {'env': array([[ 1.2717709e-02,  7.8437300e-03,  6.6514793e-03, -3.3015787e-04,
##          2.3806898e-04],
##        [ 9.9158729e-05, -1.8891758e-06,  1.0537009e-03,  4.0511694e-04,
##          1.1120385e-02],
##        [ 6.1564189e-03,  5.9850062e-03,  9.2307013e-03,  5.4843356e-03,
##         -3.3683516e-04],
##        [ 1.3349474e-02,  4.3294221e-04,  1.8103119e-03,  1.4068705e-02,
##          6.6316797e-04],
##        [ 3.3953122e-05,  3.1304134e-03,  2.6658648e-03, -3.6165391e-05,
##          7.3677581e-03],
##        [ 2.7722977e-03,  1.9519718e-03,  4.8086399e-04,  3.0876237e-03,
##          1.7828522e-04],
##        [ 7.9284189e-03,  5.7881157e-04,  7.5722663e-03,  2.1802005e-06,
##          4.2433664e-03],
##        [-5.1329907e-06,  6.0144444e-03,  2.1059261e-05,  7.5954124e-03,
##          1.5537007e-03],
##        [ 7.7161851e-04,  1.7209088e-02,  4.8407568e-03,  1.8020724e-03,
##          5.6920521e-04],
##        [ 1.7108561e-02,  7.2742125e-04,  9.4651995e-04,  6.8342132e-03,
##          1.1830850e-02]], dtype=float32), 'biotic': array([0.9728792 , 0.9873236 , 0.9734804 , 0.9696754 , 0.9868381 ,
##        0.9915289 , 0.97967494, 0.9848205 , 0.9748072 , 0.9625524 ],
##       dtype=float32)}

s-jsdm's People

Contributors

caiwang0503, florianhartig, maximilianpi, warriorkt


s-jsdm's Issues

CPU dtype="float64" error

A user encountered overflow problems and the use of doubles should help here, but:

> com = simulate_SDM(env = 3L, species = 5L, sites = 100L)
> ## fit model:
> model = sjSDM(Y = com$response,env = com$env_weights, iter = 2L, dtype = "float64") 
Iter: 0/2   0%|          | [00:00, ?it/s]
 Error in py_call_impl(callable, dots$args, dots$keywords) : 
  RuntimeError: expected scalar type Float but found Double Timing stopped at: 0.018 0 0.019

Phylogeny

I got a question from a user about how to include a phylogenetic distance matrix. At some point we will have to tackle this problem.
At the moment I can think of two options:
a) use the phylogenetic distance matrix as a kind of species-species "prior" on the env weights
b) treat phylogenetic eigenvectors as traits and fit a fourth-corner model (as done in the gllvm package); see the eigenvector sketch below
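To illustrate option b): phylogenetic eigenvectors can be obtained from the distance matrix by an ordinary PCoA. A minimal sketch with a random placeholder distance matrix (sjSDM does not yet accept these as traits; this only shows what the eigenvectors would look like):

D <- as.matrix(dist(matrix(rnorm(10 * 5), 10, 5))) # placeholder "phylogenetic" distance matrix for 10 species
phylo_eigen <- cmdscale(D, k = 3)                  # first 3 phylogenetic eigenvectors, one row per species
head(phylo_eigen)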

CRAN release

To do:

  • move dependencies vignette to package help
  • update description (e.g. description field)
  • update sjSDM documentation (add reference to Chen et al. 2018 and our preprint)
  • NA and data format see #37
  • plot.sjSDM method with an image plot of the associations
  • prepare CRAN submission notice
  • revise anova and R-squared; change anova to rely on importance
  • improve documentations, add empirical dataset
  • improve tuning function, e.g. add best model, and #83 #84
  • improve installation! (e.g. see the gluonts package or rstudio-keras)
  • remove dependencies (pyro-ppl etc.)

installation problems

I'm having trouble installing on macOS, which is weird because I previously installed without problems (and then it stopped working). I removed both conda env folders (r-sjSDM and sjSDM_env) before installing, and I only have miniconda2.

conda create --name sjSDM_env python=3.7
conda activate sjSDM_env
conda install pytorch torchvision cpuonly -c pytorch # cpu
devtools::install_github("https://github.com/TheoreticalEcology/s-jSDM", subdir = "sjSDM", build_vignettes = TRUE, build_manual = TRUE)

library(sjSDM)
install_sjSDM(version = "cpu", conda_python_version = "3.7") 

I get this error:

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

All requested packages already installed.

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

  • torchvision
  • torch

Current channels:

To search for alternate channels that may provide the conda package you're
looking for, navigate to

https://anaconda.org

and use the search bar at the top of the page.

Installation failed... Try to install manually PyTorch (install instructions: https://github.com/TheoreticalEcology/s-jSDM
If the installation still fails, please report the following error on https://github.com/TheoreticalEcology/s-jSDM/issues
one or more Python packages failed to install [error code 1]

install issues

Hi Max, as I said, on my new system, it first didn't work at all (conda not found). I installed Anaconda with python 3.7

I now re-installed reticulate, and now it finds the python system (so, do we maybe have to increase the minimum version for reticulate? Unfortunately, not sure which version I had before).

However, now I get

PackagesNotFoundError: The following packages are not available from current channels:

  - torch
  - torchvision

Current channels:

  - https://conda.anaconda.org/conda-forge/osx-64
  - https://conda.anaconda.org/conda-forge/noarch
  - https://repo.anaconda.com/pkgs/main/osx-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/osx-64
  - https://repo.anaconda.com/pkgs/r/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.

I have a call now, will try to solve this later, just to let you know.

Improve documentation

Vignettes should answer/cover (derived from user question):

  • how to evaluate model fit?
  • which learning rate should I choose?
  • what about model selection?

Anything to add? @florianhartig

nn.Sequential to DNN()

It should be possible to pass neural network objects (torch.nn.Sequential(...)) directly to the DNN() config; e.g., this could be used to pass custom NNs such as CNNs or pre-trained NNs to sjSDM.

DNN support

Implementation of functional DNN api (same style as in rstudio-keras)

install to google colab / kaggle

Hey, I want to install sjSDM on Google Colab, but I get this error message. I don't know if it's easy to fix, but it's better to check with you.
Error: Failed to install 'sjSDM' from GitHub:
(converted from warning) installation of package ‘/tmp/RtmpJEcT8x/file436a79464/sjSDM_0.1.3.9000.tar.gz’ had non-zero exit status
Traceback:

  1. devtools::install_github("https://github.com/TheoreticalEcology/s-jSDM",
    . subdir = "sjSDM", auth_token = "xxxxxxx")
  2. pkgbuild::with_build_tools({
    . ellipsis::check_dots_used(action = getOption("devtools.ellipsis_action",
    . rlang::warn))
    . {
    . remotes <- lapply(repo, github_remote, ref = ref, subdir = subdir,
    . auth_token = auth_token, host = host)
    . install_remotes(remotes, auth_token = auth_token, host = host,
    . dependencies = dependencies, upgrade = upgrade, force = force,
    . quiet = quiet, build = build, build_opts = build_opts,
    . build_manual = build_manual, build_vignettes = build_vignettes,
    . repos = repos, type = type, ...)
    . }
    . }, required = FALSE)
  3. install_remotes(remotes, auth_token = auth_token, host = host,
    . dependencies = dependencies, upgrade = upgrade, force = force,
    . quiet = quiet, build = build, build_opts = build_opts, build_manual = build_manual,
    . build_vignettes = build_vignettes, repos = repos, type = type,
    . ...)
  4. tryCatch(res[[i]] <- install_remote(remotes[[i]], ...), error = function(e) {
    . stop(remote_install_error(remotes[[i]], e))
    . })
  5. tryCatchList(expr, classes, parentenv, handlers)
  6. tryCatchOne(expr, names, parentenv, handlers[[1L]])
  7. value[3L]

on.load() checks

I think we check for pytorch, but not for python / conda, right? I would add such a check.

As said, maybe best to get a general diagnostics function, which checks the system for requirements, and provides a comprehensive error message, together with the note to post this in GitHub in case the problem persists?
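A rough sketch of what such a diagnostics function could look like, based on reticulate (the function name and message texts are made up for illustration):

check_requirements <- function() {
  # check for a Python installation first
  if (!reticulate::py_available(initialize = FALSE))
    stop("No Python/conda installation found - run sjSDM::install_sjSDM() first.")
  # then check each required Python module
  for (mod in c("torch", "torch_optimizer", "pyro", "madgrad")) {
    if (!reticulate::py_module_available(mod))
      message("Python module '", mod, "' not found - see ?installation_help or ?install_diagnostic.")
  }
}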

multiple gpu when running 'sjSDM'

Hi,
We figured out that there's no argument 'n_gpu' in the function 'sjSDM', but only in 'sjSDM_cv'. Is it possible to use multiple GPUs to run the 'sjSDM' function at all? If so, is it implemented yet?
Thanks a lot!

Better error message for missing pytorch installation?

Without installing, I got this error message when running the sjSDM

Error in sjSDM(X = com$env_weights, Y = com$response, iter = 10L) : 
  object 'fa' not found

I assume that is because of the missing pytorch install. Given that we can anticipate that a user would forget to do this, maybe provide a better error message?

error: ModuleNotFoundError: No module named 'pyro'

Dear Max,

I have the following error when installing the latest version of the package:

.onLoad failed in loadNamespace() for 'sjSDM', details: call: py_module_import(module, convert = convert) error: ModuleNotFoundError: No module named 'pyro'

However, pyro has been installed on my Windows system with Anaconda.
Maybe there is a path to change somewhere to allow proper installation.

Best wishes,

François

Importance, R^2 and p-values for single env predictors.

Hi,

maybe I just haven't found it, but is there a way to see the importance, R^2, and p-value of a single environmental predictor?
e.g.
model <- sjSDM(Y = Occ, env = linear(data = Env, formula = ~X1+X2+X3), spatial = linear(data = SP, formula = ~0+X1:X2), se = TRUE, family=binomial("probit"), sampling = 100L, device = 'gpu' )

Where can I see the contribution of X3 in the model? If I understood your outputs correctly, all predictors from the env argument are summed into A in the anova() and under env in the importance() output?!

Best regards,
Julian

NumPy array is not writeable, and PyTorch does not support non-writeable tensors

Dear colleagues,

I have the following issue when attempting to run s-jSDM with the R package:

..\torch\csrc\utils\tensor_numpy.cpp:141: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program.

I can't figure out what the problem is here, or how to resolve it.
Do you have any idea?

Best wishes,

François

Install error sh: line 1: 79672 Killed: 9

Some users get error messages such as the following during install

sh: line 1: 79672 Killed: 9               R_TESTS= '/Library/Frameworks/R.framework/Resources/bin/R' --no-save --no-restore --no-echo 2>&1 < '/var/folders/m_/zb7c8p_13k59p3zrpw84c4hm0000gq/T//Rtmpc0CxFQ/file13725537e4ec1'
ERROR: loading failed
* removing ‘/Library/Frameworks/R.framework/Versions/4.0/Resources/library/sjSDM’
Warning in install.packages :
  installation of package ‘/Users/pedro/Desktop/sjSDM_0.0.6.9000.tar.gz’ had non-zero exit status​

Migrating sjSDM code to AWS

Hello,

I'm trying to run an R script that uses 'sjSDM' to do model training on AWS SageMaker. I'm trying to run the code in a Docker container, but the installation procedure fails to install PyTorch and all the other sjSDM dependencies. I'm trying to install sjSDM and dependencies in a Dockerfile using RUN R -e "remotes::install_github('https://github.com/TheoreticalEcology/s-jSDM', subdir = 'sjSDM', dep=FALSE)" and RUN R -e "sjSDM::install_sjSDM(version = 'gpu')".

I wanted to point out this issue for anyone who tries to migrate sjSDM code to AWS, but I would also like to solve this. Thanks.

Python install

Hi, Max, I just removed the link that didn't work in fc6fb9b

Does the rest of the pip stuff work (e.g. pip install sjSDM_py), or was this changed now that the code is in the package?

dependency installation issue in 0.1.8 - missing madgrad

I reinstalled s-jSDM this morning to get the importance update. Loading the package gave this readout:

── Attaching sjSDM ──────────────────────────────────────────────────── 0.1.8 ──
✔ torch
✔ torch_optimizer
✔ pyro
✖ madgrad

Torch or other dependencies not found:
1. Use install_sjSDM() to install Pytorch and conda automatically
2. Installation trouble shooting guide: ?installation_help
3. If 1) and 2) did not help, please create an issue on https://github.com/TheoreticalEcology/s-jSDM/issues (see ?install_diagnostic)

I tried install_sjSDM() with version = "cpu" which successfully added madgrad, but removed pyro. I also tried version = c("cpu","gpu") and version = "gpu" but that didn't change anything. install_sjSDM() says that all requirements are satisfied, including pyro, but the package still won't load successfully.

install diagnostic

Hi Max, I wonder if we should merge install diagnostic with installation_help. Seems to me logical to have both functions together.

Also, possibly, I wonder if check_dependencies or so would be a better name for the function?

linear() doesn't accept formula as object

It doesn't seem possible to pass a formula as an object to sjSDM() with linear().

set.seed(42)
# simulate data
community <- simulate_SDM(sites = 100, species = 10, env = 3)
Env <- community$env_weights

Env <- as.data.frame(Env)

# make formula
form1 <- as.formula(~V1+V2+V3)
form1

Env.lin1 <- linear(data = Env, formula = form1) # this throws an error: (Error: object of type 'symbol' is not subsettable)

Env.lin2 <- linear(data = Env, formula = ~V1+V2+V3) # this is OK

Memory problems for importance() with large covariances

Question from a user (redacted for conciseness and privacy):

... we have been working on analyzing an absolutely enormous XXX dataset with s-jSDM.

Good news: given enough processors and memory, s-jSDM does handle datasets in the tens of thousands of species pretty well.

However, I have run into a subsequent memory problem when attempting to parse the importance from the model output. I’ve looked at the code for the function and I’m pretty sure it stems from the matrix multiplication expression involving the species covariance matrix (unsurprising, given its size).

So I was wondering: have either of you run any tests on resource requirements for the importance function to see how they scale with the number of species?
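For a rough sense of the scaling, the dense species-species covariance matrix alone grows quadratically with the number of species (a back-of-the-envelope calculation in R, assuming doubles):

S <- 20000              # number of species
S^2 * 8 / 1024^3        # one dense S x S double matrix: ~3 GB
# the matrix products in importance() form intermediate objects of this size,
# so several such matrices may be held in memory simultaneously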

readme suggestions

change to simulate_SDM(sites = 100, species = 50, env = 5)
or change to matrix(rnorm(800), 100, 2)

Also, I find ternary diagrams easier to read if the three elements are on the axes (e.g. environment at the top vertex, biotic bottom left, spatial bottom right).

sjSDM_cv() Error in unserialize(node$con) : error reading from connection

I get a weird error running sjSDM_cv(). I'm using R 4.0.0. My students are running R 3.6.3 and are able to run the test code and also run it on their dataset.

So I'm wondering if it's an R 4 thing.

# sjSDM_cv()
# simulate sparse community:
com = simulate_SDM(env = 5L, species = 25L, sites = 100L, sparse = 0.5)

# tune regularization:
tune_results = sjSDM_cv(Y = com$response,
                        env = com$env_weights, 
                        tune = "random", # random steps in tune-parameter space
                        CV = 3L, # 3-fold cross validation
                        tune_steps = 25L,
                        alpha_cov = seq(0, 1, 0.1),
                        alpha_coef = seq(0, 1, 0.1),
                        lambda_cov = seq(0, 0.1, 0.001), 
                        lambda_coef = seq(0, 0.1, 0.001),
                        n_cores = 2L, # small models can be also run in parallel on the GPU
                        iter = 2L # we can pass arguments to sjSDM via ...
                        )

Error in unserialize(node$con) : error reading from connection

AIC function

I have implemented a logLik function in 80c906b ... question is if we should also implement an AIC ... I would tend towards not, because of the problem of counting df.
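For reference, a user could assemble an AIC by hand from logLik(), but they would have to supply the parameter count themselves, which is exactly the df-counting problem (the value of k below is a placeholder, and I assume logLik() returns the log-likelihood):

ll  <- logLik(model)     # logLik method from 80c906b
k   <- 40                # placeholder: number of effective parameters (the contentious part)
aic <- -2 * as.numeric(ll) + 2 * k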

sjSDM::install_sjSDM(version = "cpu") seems to want pytorch

> sjSDM::install_sjSDM(version = "cpu")

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

  - torchvision
  - torch

Current channels:

  - https://conda.anaconda.org/conda-forge/osx-64
  - https://conda.anaconda.org/conda-forge/noarch
  - https://repo.anaconda.com/pkgs/main/osx-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/osx-64
  - https://repo.anaconda.com/pkgs/r/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.



Installation failed... Try to install manually PyTorch (install instructions: https://github.com/TheoreticalEcology/s-jSDM
If the installation still fails, please report the following error on https://github.com/TheoreticalEcology/s-jSDM/issues
one or more Python packages failed to install [error code 1]

Traits - fourth corner model?

We could provide the option to include traits following the fourth corner model from Brown et al., 2015 (The GLLVM model does it this way).

Setting a multivariate penalty (prior) on the environmental predictors would be another option (I think Hmsc does it this way), but I think the former would be preferable because any type of penalty would interfere with the p-values.

Error message in sjSDM if pytorch not available

Hi, I just tried this out, if you run

com = simulate_SDM(env = 3L, species = 5L, sites = 100L)
model = sjSDM(Y = com$response,env = com$env_weights, iter = 10L)

without PyTorch (luckily, I can do this, as I still haven't updated), I get

 Error in reticulate::py_is_null_xptr(fa) : object 'fa' not found 
3.
reticulate::py_is_null_xptr(fa) at utils.R#84
2.
check_module() at sjSDM.R#58
1.
sjSDM(Y = com$response, env = com$env_weights, iter = 10L) 

whereas a good error message would say "pytorch not installed". I would just do the startup check also in sjSDM to check if the requirements are there.

test can't run in 0.1.8

hi there,
when I update to 0.1.8 and run the test model, it throws an error.
What happened with my Mac?
(screenshot attached: Screen Shot 2021-06-17 at 7.07.11 PM)

Model_sjSDM object has no attribute 'set_weights'

I get the following error :

pred = predict(model, test_X)
  # Error in py_get_attr_impl(x, name, silent) : 
  # AttributeError: 'Model_sjSDM' object has no attribute 'set_weights'

Indeed, the model has an attribute weights, but not an attribute set_weights.

Space

What shall we do about spatial predictors?

a) No api changes but provide an example with additional predictors in the env matrix
b) provide an extra argument in sjSDM for spatial predictors

Install "private" conda version?

Hi Max, just a follow-up to #23 - Now at least it works from a clean (= conda-free) computer. One thing that I am wondering - what happens if a user already has a conda version on their computer? At the moment, you are trying to use it, right?

Wouldn't it be safer to always install a dedicated "private" miniconda version for sjSDM?

Register importance and possibly other functions as S3 classes

Just running through Pedro's example while having the randomForest package loaded, I noted that loading randomForest first creates a problem

> imp = importance(model)
Error in UseMethod("importance") : 
  no applicable method for 'importance' applied to an object of class "c('sjSDM', 'linear')"

because randomForest registers importance as an S3 generic. Because of this, I think it would be safer to register all reasonably general-sounding functions as S3 generics, or else use names such as sjSDM_importance (but I would prefer the former). A sketch of the S3 mechanics follows below.
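For illustration, the S3 mechanics would look roughly like this (a sketch, not the actual package code):

# generic exported by sjSDM
importance <- function(x, ...) UseMethod("importance")
# method for sjSDM objects; once registered as an S3 method (S3method(importance, sjSDM) in NAMESPACE),
# dispatch on class "sjSDM" works even when another package also exports an 'importance' generic
importance.sjSDM <- function(x, ...) {
  # ... compute env/biotic/spatial importance for the fitted model ...
}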

p-values on env components

As discussed. If faster, I would calculate the hessian per species, as one can assume that env estimates will be approximately independent across species.

sjSDM_model - hide or push?

At the moment, sjSDM_model is only / mostly? used internally to build the model. I wonder - is it distracting to have this open, and should we rather hide it? If we're not hiding it, I would add it a bit more prominently to the help and link it to other functions.

Error in py_call_impl(callable, dots$args, dots$keywords) : can't convert np.ndarray of type numpy.object_

Dear Max,

I am sorry to bother you with a new issue when using sjSDM function with device = "gpu"...

Error in py_call_impl(callable, dots$args, dots$keywords) : 
  TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, int64, int32, int16, int8, uint8, and bool.

Detailed traceback: 
  File "C:\Program Files\R\R-3.6.2\library\sjSDM\python\sjSDM_py\model_new.py", line 207, in fit
    dataLoader = self._get_DataLoader(X, Y, SP, RE, batch_size, True, parallel)
  File "C:\Program Files\R\R-3.6.2\library\sjSDM\python\sjSDM_py\model_new.py", line 164, in _get_DataLoader
    torch.tensor(Y, dtype=torch.float32, device=torch.device('cpu')))

Error in py_call_impl: not enough values to unpack

I have the following error with sjSDM function,

Error in py_call_impl(callable, dots$args, dots$keywords) : ValueError: not enough values to unpack (expected 2, got 1)

Any idea on what could be the cause?

TypeError: type torch.cuda.FloatTensor not available

With the latest version, I have the new following error :

Error in py_call_impl(callable, dots$args, dots$keywords) : 
  TypeError: type torch.cuda.FloatTensor not available. Torch not compiled with CUDA enabled.

Detailed traceback: 
  File "C:\Program Files\R\R-3.6.2\library\sjSDM\python\sjSDM_py\model_new.py", line 171, in build
    torch.set_default_tensor_type('torch.cuda.FloatTensor')
  File "C:\ProgramData\Anaconda3\envs\r-reticulate\lib\site-packages\torch\__init__.py", line 206, in set_default_tensor_type
    _C._set_default_tensor_type(t)

Does the new version require reinstalling Torch or Cuda?
