pln-team / plnmodels Goto Github PK

View Code? Open in Web Editor NEW

54.0 7.0 19.0 110 MB

A collection of Poisson lognormal models for multivariate count data analysis

Home Page: https://pln-team.github.io/PLNmodels

License: GNU General Public License v3.0

R 84.41% C++ 14.29% TeX 1.31%

poisson-lognormal-model count-data network-inference pca multivariate-analysis r-package

plnmodels's People

Contributors

Stargazers

Watchers

Forkers

ayanotakeuchi pythseq ahmedihafez jgarthur codacy-badger haohua zihao12 stjordanis uberstig fgindraud scj-robin yannmean evayiwenwang liqiang3323 nicolasjouvin cole-trapnell-lab camillemndn ananyapam7 jie-lee-hs

plnmodels's Issues

PLNnetwork_stabs should be a method of PLNnetworkfamily

The resampling procedure implemented in PLNnetwork_stabs relies on the sample data (responses, covariates, offsets) and parameters (penalties).

To avoid bad specifications, useless copies, I suggest to pass this function as a method of PLNnetworkfamily.

This is a prior requirement for answering Issue 5, that is, implementaiton of StARS.

PLN does not work when Y is a matrix with a single column

The following code fails

library(PLNmodels)
Y1 <- matrix(rpois(10, exp(0.5)), ncol = 1)
Y2 <- matrix(rpois(10, exp(0.5)), ncol = 1)
Y  <- cbind(Y1, Y2)
colnames(Y1) <- "Y1"
colnames(Y) <- c("Y1", "Y2")

PLN(Y ~ 1) # this works

PLN(Y1 ~ 1) # this does not work

Error in matrix(0, nrow(Y), ncol(Y)) : non-numeric matrix extent

Add dependencies that are not on CRAN?

prepare_data_from_biom and prepare_data_from_phyloseq depend respectively on biomformat and phyloseq which are not on the CRAN but on Bioconductor and we only use those packages to have access to extractor functions. Right now, I don't importFrom the extractors to limit the number of dependencies but this results in warnings when checking the package.

Should we:

add the bioconductor dependencies (knowing that phyloseq already depends on biomformat)
or ditch the two functions and only mention them in some "import data" vignettte?

Add test / se for estimated parameters in PLN

Implement score test score and/or Wald test to test significance of model parameters. Stick with univariate tests for the time being.

Vignettes are not build nor downloaded with devtools::install_github

Hi,

Installing the package from the Github repo with devtools (as recommended in the README) did not build the vignettes

devtools::install_github("jchiquet/PLNmodels", build_vignettes = TRUE)

The vignettes directory was not even downloaded from the Github repo by R. I tested a manual download of the archive and a local installation but it did not build the vignettes either.

However, the issue is solved (vignettes are properly built) when removing the default parameters of the build_opts options (as follows)

devtools::install_github("jchiquet/PLNmodels",build_opts = c("--no-resave-data"))
# instead of the following default 
# build_opts = c("--no-resave-data", "--no-manual",   "--no-build-vignettes")

Nonetheless, some dependencies from the vignettes (e.g. GGally or tidyverse) needed to be installed by the user.

Hope you wanted to know
Charlie

Offset

Create a set_offset method to implement a few popular normalization methods: TSS, CSS, GMPR, etc.

Problem when forcing the penalties in PLNnet

I get an error when trying to fix the penalties for PLNnetwork, even when reusing the penalties provided by default:

PLNnet = PLNnetwork(Y ~ 1);
Initialization...
Adjusting 30 PLN with sparse inverse covariance estimation
Joint optimization alternating gradient descent and graphical-lasso
sparsifying penalty = 0.593625
Post-treatments
DONE!
lambda = PLNnet$penalties

lambda = exp(seq(1.5, -1.5, length.out=50))

PLNnet = PLNnetwork(Y ~ 1, penalties=lambda)
Initialization...
Adjusting 30 PLN with sparse inverse covariance estimation
Joint optimization alternating gradient descent and graphical-lasso
Error in if ((convergence[iter] < control$ftol_out

dimnames in Post-Treatment (Matrix, matrix or data.frame objects)

Salut Julien,

je copie ci-dessous un bout de code dans lequel PLN plante (1e appel "PLN(Y ~ X)") mais ne plante plus si je nomme les colonnes de la matrice des réponses (2e appel "PLN(Y ~ X)"). J'ai le même plantage en simulant Y avec rPLN (3e appel "PLN(Y ~ X)").

> rm(list=ls())
> library(PLNmodels)
> n = 10; d = 3; p = 10
> X = matrix(rnorm(n*d), n, d)
> Y = matrix(rpois(n*p, 1), n, p)

> PLN(Y ~ X)

 Initialization...
 Adjusting a PLN model with full covariance model
 Post-treatments...Error in dimnamesGets(x, value) : 
  invalid dimnames given for “dgCMatrix” object

> colnames(Y) = paste0('Y', 1:p)
> PLN(Y ~ X)

 Initialization...
 Adjusting a PLN model with full covariance model
 Post-treatments...
 DONE!
A multivariate Poisson Lognormal fit with full covariance model.
==================================================================
 nb_param   loglik      BIC     ICL R_squared
       95 -100.612 -209.985 -73.565     0.007
==================================================================
* Useful fields
    $model_par, $latent, $var_par, $optim_par
    $loglik, $BIC, $ICL, $loglik_vec, $nb_param, $criteria
* Useful S3 methods
    print(), coef(), sigma(), vcov(), fitted(), predict(), standard_error()

> Y = rPLN (n=n, X%*%rnorm(d), Sigma = diag(p))
> PLN(Y ~ X)

 Initialization...
 Adjusting a PLN model with full covariance model
 Post-treatments...Error in dimnamesGets(x, value) : 
  invalid dimnames given for “dgCMatrix” object

PLNLDA plot fails with one dimension (aka with a 2-level grouping factor)

The following code runs ok:

library(PLNmodels)
data("mollusk")
mollusc <- prepare_data(mollusk$Abundance, mollusk$Covariate, offset = "TSS")
mollusc$Trt<-sample(c("A","B"),size = nrow(mollusc),replace = T)
LDA_TEST <- PLNLDA(Abundance ~ 1 + site + offset(log(Offset)), mollusc, grouping = mollusc$Trt)

However, the plot method fails with

plot(LDA_TEST)
Don't know how to automatically pick scale for object of type function. Defaulting to continuous.
Erreur : All columns in a tibble must be 1d or 2d objects:
* Column `colour` is function
Call `rlang::last_error()` to see a backtrace

And the backtrace corresponds to:

<error>
message: All columns in a tibble must be 1d or 2d objects:
* Column `colour` is function
class:   `rlang_error`
backtrace:
  1. graphics::plot(LDA_TESTdiag)
  2. PLNmodels:::plot.PLNLDAfit(LDA_TESTdiag)
  3. x$plot_LDA(nb_axes = nb_axes, var_cols = var_cols, plot = plot)
  4. gridExtra::arrangeGrob(...)
  5. base::lapply(grobs[toconv], ggplot2::ggplotGrob)
  6. ggplot2:::FUN(X[[i]], ...)
  9. ggplot2:::ggplot_build.ggplot(x)
 10. ggplot2:::by_layer(function(l, d) l$compute_aesthetics(d, plot))
 11. ggplot2:::f(l = layers[[i]], d = data[[i]])
 12. l$compute_aesthetics(d, plot)
 13. ggplot2:::f(..., self = self)
 14. ggplot2:::as_gg_data_frame(evaled)
 17. tibble:::as_tibble.list(x)
 18. tibble:::lst_to_tibble(x, .rows, .name_repair, col_lengths(x))
 19. tibble:::check_valid_cols(x)
Call `rlang::last_trace()` to see the full backtrace

It seems to incriminate the tibble.

Inversion issue in fisher information matrix

In the last version (which compute approximated standard error of the coefficients), I tried

library(PLNmodels)
data(mollusk)
out <- PLNPCA(Abundance ~ 1 + log(offset(duration)) + site, data = mollusk, rank = 5)

which fails with

Error in .local(a, b, ...) : 
  cs_lu(A) failed: near-singular A (or out of memory) 
15..local(a, b, ...) 
14.solve(.) 
13.solve(.) 
12.function_list[[i]](value) 
11.freduce(value, `_function_list`) 
10.`_fseq`(`_lhs`) 
9.eval(quote(`_fseq`(`_lhs`)), env, env) 
8.eval(quote(`_fseq`(`_lhs`)), env, env) 
7.withVisible(eval(quote(`_fseq`(`_lhs`)), env, env)) 
6.fim %>% solve %>% diag %>% sqrt %>% matrix(nrow = self$d) %>%  t() at PLNfit-class.R#298
5.self$compute_standard_error() at PLNfit-class.R#190
4.super$postTreatment(responses, covariates, offsets, weights) at PLNPCAfit-class.R#122
3.model$postTreatment(self$responses, self$covariates, self$offsets,  self$weights) at PLNfamily-class.R#57
2. myPCA$postTreatment() at PLNPCA.R#64
1. PLNPCA(Abundance ~ 1 + log(offset(duration)) + site, data = mollusk, 
    rank = 5)

Modify predict method for PLNPCA

Change predict method PLNPCA to easily handle new data by (i) estimating position in the latent space (without changing others parameters) and (ii) using those positions to predict class.

improve doc/vignette on PCA

I managed to run a PLNPCA on my data but I m lost at how to get the following elements:
rotation matrix?
covariates regressors coefficients values?
Principal components values?

my model is something like PLNPCA(pois_matrix ~ log_exposure_time + numeric_covariate)

Prediction in PLNmodels

Hi @mahendra-mariadassou,

Regarding the prediction function in
https://github.com/jchiquet/PLNmodels/blob/abcfa7072c488dd1037a00fa1b7a6dc7f9148585/R/PLNfit-class.R#L288

I was wondering why we use the expectancy of Z, that is exp(XB + O), for the prediction and not the expectancy (under the variational distribution) of what we denoted A in the model (which is equal to exp(O + M + X B + S/2) for PLN). It should now be possible to compute this with a single call to the V-EM step function that you wrote, even for new data...

What do yo think ?

Add weights parameters

I imagine it could be possible to weight each observation differently with a weight vector.
An argument weights would be handy in the PLN(PCA) calls (like in glm basically)

PLNPCA was reported to be slow at initialization when there are many variables, even when constraining with a small rank

I think this can be corrected by initializing with a linear model rather than with a PLN when p is too large.

Fails of some tests in CRAN devel version of R due t change of class in R base

Dear maintainer,

Please see the problems shown on
https://cran.r-project.org/web/checks/check_results_PLNmodels.html.

Specifically, see the problems shown for the r-devel Debian checks.

These can be reproduced by checking with --as-cran using current
r-devel, which for now sets

R_CLASS_MATRIX_ARRAY=true

in the check environment, to the effect that

R> class(matrix(1 : 4, 2, 2))
[1] "matrix" "array"

(and no longer just "matrix" as before).

According to the R NEWS file,

For now only active when environment variable R_CLASS_MATRIX_ARRAY
is set to non-empty, but planned to be the new unconditional behavior
when R 4.0.0 is released:

matrix objects now also inherit from class "array", namely, e.g.,
class(diag(1)) is c("matrix", "array") which invalidates code
assuming that length(class(obj)) == 1, an incorrect assumption that
is less frequently fulfilled now.

S3 methods for "array", i.e., .array(), are now also
dispatched for matrix objects.

Apparently your package no longer works correctly when
class(matrix(...)) gives a vector of length two: please fix as
necessary.

See
https://developer.r-project.org/Blog/public/2019/11/09/when-you-think-class.-think-again/index.html
for more information about correctly using class() in package code.

Please correct before 2019-12-18 to safely retain your package on CRAN.

Best,
-k

vcov is traditionally associated with the covariance of the coefficients, not with the residuals covariance

vcov should be used to access estimateor of the regression coefficients (also related to #10)
to access the covariance of the residuals (aka Sigma), @mahendra-mariadassou and I suggest to use
- coefficients(PLN, type = "covariance)
- define a specific method. In LME4; "VarCorr" is used
- overload vcov to add an argument (type = c("coefficients", "residuals")

For portability and hope for CRAN acceptance, we should rely on nloptr to include nlopt

I am on it in the nloptrAPI branch

plot_network fails when latent_network is of class dsyMatrix

The problem comes from igraph::graph_from_adjacency_matrix which expects Matrix objects to be sparse.

## MWE
x <- new("dsyMatrix"
    , x = c(0, 0.0830239291515631, 0.124614466958739, 0.0830239291515631, 
0, 0.169519822826263, 0.124614466958739, 0.169519822826263, 0
)
    , Dim = c(3L, 3L)
    , Dimnames = list(c("ARTHAUD Nathalie", "ASSELINEAU François", "CHEMINADE Jacques"
), c("ARTHAUD Nathalie", "ASSELINEAU François", "CHEMINADE Jacques"
))
    , uplo = "U"
    , factors = list()
)

igraph::graph_from_adjacency_matrix(x)

This can be fixed by forcing latent_network to output a sparse Matrix.

To boost coverage, we should add unit tests for the construction and manipulation of each class

Typically, object construction and call to class method is done internally and is not directly tested at the moment. Doing basic tests should boost the code coverage and help us maintaining the package.

Plot method for PLN network

Salut Julien,

je continue mon exploration de PLNnetwork. J'ai l'impression qu'il y a une blague dans l'option la fonction plot() en présence de covariables. La figure en bas à gauche ne contient qu'une colonne alors qu'elle devrait en comprendre autant que de covariables.

parallel fit?

when fitting the PLNPCA up to 5 dimensions, it uses 10 cores. I wonder if we could use more cores inside each fit to speed up the process?

Warnings in vignette PLN.Rmd on solaris and os x

CRAN maintainers now ask for removing warnings in

https://cran.r-project.org/web/checks/check_results_PLNmodels.html

Warning: Transformation introduced infinite values in continuous x-axis
Quitting from lines 186-190 (PLN.Rmd)
Error: processing vignette 'PLN.Rmd' failed with diagnostics:
unused argument (options = options)
--- failed re-building ‘PLN.Rmd’

probably due to an invalid option for the version of ggplot in solaris and os x...

Installation fails on Windows 64 with gcc8 due to DLL 'Rhdf5lib' not found

CRAN asks to correct the following error before 2020-01-15

https://www.r-project.org/nosvn/R.check/r-devel-windows-ix86+x86_64-gcc8/PLNmodels-00install.html

I suspect this is related to installation of BioConductor packages (phyloseq msot probably).

Add StARS selection criteria to PLN network

Add a stability selection step by default (with a limited number of resamples) and update getBestModel to output network with StARS selected lambda.

Add tests for graphical output

Use vdiffr to add tests for graphical output. Seems to work nicely with base graphics and ggplot but less so with grobs.

Does the method work for Poisson distributed data that's not an integer?

Hello, I enjoyed reading your paper and would like to use your method. I have to duel boot my mac to run ubuntu before I can install and run your software though, so I wanted to ask my question before I try yo get your software running.

My data consists of Poisson distributed count data that has been divided by a constant to account for a covariate. Does your approach only work for integer count data? Or can it work with transformed count data that's no longer an integer.

Best,
Will

Message d'erreur lors d'une tentative d'installation sous Mac OS

Infos sur la session :

R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

devtools::install_github("jchiquet/PLNmodels")
Downloading GitHub repo jchiquet/PLNmodels@master
from URL https://api.github.com/repos/jchiquet/PLNmodels/zipball/master
Installing PLNmodels
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save
--no-restore --quiet CMD INSTALL
'/private/var/folders/rz/5x9sk5zn3x78vx78l4q8yk100000gn/T/RtmpxDaFGu/devtools9337ae5d612/jchiquet-PLNmodels-661d206'
--library='/Library/Frameworks/R.framework/Versions/3.4/Resources/library' --install-tests

installing source package ‘PLNmodels’ ...
Using NLOPT_LIBS=-L/usr/local/Cellar/nlopt/2.4.2_2/lib -lnlopt -lm
** libs
clang++ -std=gnu++11 -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG -I"/Library/Frameworks/R.framework/Versions/3.4/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.4/Resources/library/RcppArmadillo/include" -I/usr/local/include -fopenmp -fPIC -Wall -g -O2 -c RcppExports.cpp -o RcppExports.o
clang: error: unsupported option '-fopenmp'
make: *** [RcppExports.o] Error 1
ERROR: compilation failed for package ‘PLNmodels’
removing ‘/Library/Frameworks/R.framework/Versions/3.4/Resources/library/PLNmodels’
Installation failed: Command failed (1)

Flexible offset computations

Change compute_offset so that it can compute "block-wise" offsets (for example one for bacterial taxa and one for fungal taxa).

Error in common_samples(counts, covariates) : Incompatible dimensions

Hello Julien,
I tried to create the list input in order to use PLNmodels and I encountered some problems.

I have two dataframes :

head(Covariante)

Prec0city Rep G L S M T MG B0u1dFree Year
1 1 1 1 1 0 1 1 1 0 2017
2 1 1 1 1 1 1 1 1 0 2017
3 1 1 1 1 1 1 0 1 0 2017
4 1 1 1 1 1 1 1 1 0 2017
5 1 1 1 1 1 0 1 1 0 2017
6 1 1 1 1 1 1 1 1 0 2017

head(Abundance)

U1k10w1_75 U1k10w1_34 U1k10w1_27 U1k10w1_111 U1k10w1_100 U1k10w1_6 U1k10w1_56 U1k10w1_48 U1k10w1_60 U1k10w1_68
4023 21132 0 18661 0 97939 11972 7695 4804 5309
5544 15848 0 15970 0 86601 10622 7005 4112 4016
3692 18743 0 16992 0 83510 12104 6355 4135 5051
U1k10w1_69 U1k10w1_115 U1k10w1_31 U1k10w1_35 U1k10w1_49 U1k10w1_112 U1k10w1_130 U1k10w1_37 U1k10w1_32
9119 11325 82049 23164 10984 16138 0 20213 18832
7458 0 55167 20371 10559 13495 0 18999 20433
8679 9141 48772 22972 10898 14701 10551 19786 18824
...
...
...

I created a list of these dataframes :
liste=list(Abundance=Abundance, Covariante=Covariante)

str(liste, max.level = 1)

List of 2
$ Abundance :'data.frame': 444 obs. of 256 variables:
$ Covariante:'data.frame': 444 obs. of 10 variables:

names(liste$Covariante)

[1] "Prec0city" "Rep" "G" "L" "S" "M" "T" "MG" "B0u1dFree"
[10] "Year"

names(liste$Abundance)

[1] "U1k10w1_75" "U1k10w1_34" "U1k10w1_27" "U1k10w1_111" "U1k10w1_100" "U1k10w1_6" "U1k10w1_56"
[8] "U1k10w1_48" "U1k10w1_60" "U1k10w1_68" "U1k10w1_69" "U1k10w1_115" "U1k10w1_31" "U1k10w1_35"
[15] "U1k10w1_49" "U1k10w1_112" "U1k10w1_130" "U1k10w1_37" "U1k10w1_32"
...
...

So I think here I have the good entries for the function prepare_data, so I tested :

data_prepare <- prepare_data(counts = liste$Abundance, covariates = liste$Covariate)

Error in common_samples(counts, covariates) : Incompatible dimensions
In addition: Warning message:
In common_samples(counts, covariates) :
There are no matching names in the count matrix and the covariates data.frame.
Function will proceed assuming:

samples are in the same order;

samples are rows of the abundance matrix.

My samples are in the same order and samples are rows of the abundance matrix, so no problem with the warnings. But I don't understand the error :

Error in common_samples(counts, covariates) : Incompatible dimensions

I have the same number of rows, so what is the problem of incompatible dimensions ?

Thank you for your help.
Best,
Amandine

Adjustment of the penalty matrix to include prior independent features/species

Hi!
The ranges of penalties (lambda) is currently uniform for the entire covariance matrix except the diagonal during the glasso step. Could it be possible to specify the penalties of some pairs of features (or species) within the PLNnetwork function ?

Prior sparsity could then already be included if some pairs were not of interest. Such additional feature could be done by providing the indices of the entries to be set to zeroes just like in the the glasso package, and set such pairs to a high penalty (ex. lambda_[max] or a huge number).
However glassoFast only accepts penalty matrices so a post treatment of the generated matrix of penalty to set the pairs to high penalties. Such post treatment could happen there https://github.com/jchiquet/PLNmodels/blob/bf14cc032f0cd011fd1fd4afbb2e30cccc548abe/R/PLNnetworkfit-class.R#L76

Or the user could provide the entire matrix to PLNnetwork function, but it should be the list of matrices over the range of the suggested penalties and it seems a bit more difficult to include.

Thanks,
Charlie

PLN LDA can't plot when the grouping variable has only two levels

When running PLN LDA when the grouping variable (here Group) has only two levels (TRUE,FALSE), the visualisation of the correlation circle is not implemented yet and raise the following error:

Error in combn(axes, 2, simplify = FALSE) : n < m

Here is a MWE:

library(ade4)
data("trichometeo")
trichometeo$meteo$Group <- trichometeo$meteo$Precip.Tot > 0

X<-as.matrix(trichometeo$fau)
G<-as.factor(trichometeo$meteo$Group)
library(PLNmodels)
LDA<-PLNLDA(X,G)
LDA$plot_LDA()

Definition of BIC/ICL is package dependent: we should explicitly say that it should be maximized in PLN

In PLNmodels, the BIC (and similarly EBIC, ICL, ...) is defined by loglik - 1/2 log(n) * n_param. It allows us to easily compare these criteria with the loglikelihood. However it is often defined in others packages by -2 loglik + log(n) n_param. So we should explicitly say that in PLNmodels, 'the greater BIC, the better',

This can be done in the console when calling plot.PLNfamily or added in the print method.

Stop PLNnetwork function when "blocked" (enhancement)

May be, stop after a given running duration ...

Add preconditionning in variational optimization for CCSAQ

Preconditionning is available in nlopt for CCSAQ and may speed up the optimizaiton process.

For the basic PLN model, all calculations have been derived for the Hessian and for the product of the Hessian with a 2 (n x p) vectors of variational parameters vec(M,S), and should also be memory efficient (same storage as the gradient actually).

Can't extract the StARS probability for all edges

Once the stability selection has been ran, it would be convenient to extract the probabilities from StARS for any model selected (BIC, StARS etc..). However, the stability_path data frame contains numeric indexes for the nodes instead of the nodes labels.

I wrote few lines to extract such probabilities just to check if it was feasible:

Data

library(PLNmodels)
library(tidyverse)
data("mollusk")
mollusc <- prepare_data(mollusk$Abundance, mollusk$Covariate, offset = "TSS")
mollusc<- mollusc %>%
  mutate(site_new =
           fct_collapse(site,
                        Gravier = c("GGravier1", "GGravier2", "GGravier3"),
                        Negria = c("Negria1", "Negria2")
           )
  )

Nets<-PLNnetwork( Abundance ~ 0 + site_new + offset(log(Offset)),data=mollusc,
                  control_init = list(nPenalties = 40,min.ratio=1e-2))
net_BIC<-getBestModel(Nets, "BIC")
Nets$stability_selection()

or Nets <- readRDS("Nets_debug_StARS.RDS") using the RDS (and zipped) file attached.

Issue


addStARsProba<-function(PLNfamily,PLNnet,debug=FALSE){
  # Check if stability was performed
  if (anyNA(PLNfamily$stability)) stop("stability selection has not yet been performed! Use stability_selection()")
  # Get the node labels
  node.labels <- colnames(PLNnet$latent_network())
  # Fetch the stability_path table and construct a named list with 
  #  the edges identifiers as names and the StARS probability as values (for the given penalty)
  probs<-PLNfamily$stability_path %>%
    # had to round to the same digits otherwise no intersection
    filter(round(Penalty,6)==round(PLNnet$penalty,6)) %>% 
    # Create the edge sequence
    mutate(NamedEdge=paste(node.labels[as.numeric(Node1)],
                           node.labels[as.numeric(Node2)],sep="|")) %>%
    select(x = Prob, nm = NamedEdge) %>% pmap(set_names) %>% unlist # as named list
  # Get the igraph object
  g<-plot(PLNnet,plot=F,remove.isolated = F)
  # match the edge sequence and add the probabilities
  if(debug){
    list("E" = probs[igraph::as_ids(igraph::E(g))],
         "Probs"=probs) 
  } else{ probs[igraph::as_ids(igraph::E(g))]}
}

# Extract the probabilities
addStARsProba(Nets, net_BIC)
Bit|Vac Vac|Vea    <NA>    <NA> Ans|Anl Ans|An- Ans|Pio Anl|An- Anl|Pio Pic|Pis Pin|Cat Pim|Pi- Cam|Cat 
   0.80    0.65      NA      NA    0.80    1.00    0.35    0.60    0.40    0.60    0.70    0.80    0.95 
Euf|Vea 
   1.00

Presence of some NAs, meaning some edges in the selected network are not in the stability selection.

> which(colnames(net_BIC$latent_network())=="Lit")
[1] 7
> which(colnames(net_BIC$latent_network())=="Pis")
[1] 25
> Nets$stability_path[Nets$stability_path$Edge=="7--25",]
[1] Edge    Penalty Prob    Node1   Node2  
<0 lignes> (ou 'row.names' de longueur nulle)
> Nets$stability_path[Nets$stability_path$Edge=="25--7",]
[1] Edge    Penalty Prob    Node1   Node2  
<0 lignes> (ou 'row.names' de longueur nulle)

So the numerical indexes of the nodes in the StARS data frame would not be in the same order as the initial table?

I thought about tweaking the bit of code there but it seems to depends on the edge_to_node which I was not sure what it was doing.

Thanks for considering this issue,
Charlie

Nets_debug_StARS.zip

Use the pipe "|" separator of edges instead of "--"

Minor issue here, but if #27 modifies the lines close to L150, it could be useful to change as well the separator from "--" to "|" L155.
Igraph uses the "--" to "print" the edges but the edge identifiers are coded with "|" instead to provide both directed and undirected and it is useful for subsetting.

library(igraph)
g<-graph_from_literal( Alice-Bob-Cecil-Alice, Daniel-Cecil-Eugene, Cecil-Gordon )
> E(g)
+ 6/6 edges from dbcead1 (vertex names):
[1] Alice--Bob    Alice--Cecil  Bob  --Cecil  Cecil--Daniel Cecil--Eugene Cecil--Gordon
> as_ids(E(g))
[1] "Alice|Bob"    "Alice|Cecil"  "Bob|Cecil"    "Cecil|Daniel" "Cecil|Eugene" "Cecil|Gordon"

Make the interface more GLM-like?

Would be handy to be able to provide a formula and the data where to pick the terms so it could deal with interactions automatically. Currently I understand I need to pass df$covariate_1 etc in the call to PLN(PCA) Also if I want interactions I have to build the vector before hand which is quite cumbersome.

Ideally something like PLNPCA(formula = output ~ x1 + x2 * x3, data = df)

Network reconstruction on very sparse matrices

Hello,

I submitted a GitHub issue a few months back asking whether the PLNnetwork() function works on a matrix of Poisson distributed data that's been transformed so it's not an integer. That issue is closed and the function worked fine. However, I tried running PLNnetwork() on the matrix containing raw counts and it basically returned an identity matrix as the network, where all the values are zero except for the diagonal elements.

The raw count matrix is very sparse (the majority of non-zero observations have counts of one or two). There are also only six observations. Does PLNnetwork() automatically remove observations with low counts, similar to how a lot of ecological analyses ignore species that are only counted once (i.e., singletons)?

Again, thank you for building and maintaining this package and for responding to issues. I've already gotten a lot of use out of this package.

Best,
Will

Extend to distributions mix

Was wondering if the algorithms here could be extended to a mix of count/binomial/continuous variables factors estimation in a unique algorithm. Say I have two Poisson and two Gaussian and 2 binomial variables?

Computational cost of post-treatment is excessive

This is due to multiple call to post-treatment for each child object. We can gain some time for instance by computing nullPoissonModel once and not multiple time.

https://github.com/jchiquet/PLNmodels/blob/c69a1da3d5cee434e44bbd8502f1a79cb365caae/R/utils.R#L19

called

https://github.com/jchiquet/PLNmodels/blob/c69a1da3d5cee434e44bbd8502f1a79cb365caae/R/PLNfamily-class.R#L57

Fisher matrix for PLNPCA

Check whether Fisher matrix computation is correct for PLNPCA (matrices have different dimensions than in standard PLN, some matrices maybe non conformable).

When preparing data from phyloseq, the abundance table has no rownames which breaks some plot functions

Basically, the object is converted to a matrix but it should have some rownames. Otherwise private$M does not have rownames and plot_indivudal_map breaks appart.

Add options to getModel

Method getModel is very sensitive to rounding errors on param value (especially in PLNnetwork). Add support to:

choose model by number in the list (like networks$models[[i]] but using the getModel interface) or
allow approximate param value matching (I guess matching up to 1e-8 should be enough).

What should we use as the null model in our goodness of fit criterion (R2) ?

We typically cannot compare R2 when different offset have been used right ? So at least send a warning to the user.

Handle missing data for real

We sold in the PCA paper that PLN PCA can easily handle with missing data, but it is not yet implemented...

Offset RLE

Correct RLE offset to mimic DESeq2 by ignoring null and infinite values in the median of ratio:
https://github.com/jchiquet/PLNmodels/blob/c54e7da9cd9be40001afdfa30f6e351fe833b3a6/R/import_utils.R#L137

Posterior probabilities in PLNLDA

Change predict.PLNLDAfit so it:

reports posterior probabilities (in addition to MAP and likelihoods)
allows for user-specified prior probabilities (instead of the ones recovered from the training data set).
computes average positions in the latent (e.g. as weighted averages of the positions conditional under each group).

Allow vector and matrix offsets in `prepare_data`

Change prepare_data so that it accepts user-specified offsets (either vector or matrix).

Samples with NA in covariates are silently removed

Hi again,

I just noticed that some samples were removed if NA were present in the covariates used (reprex below). Perhaps some users would like to know at least as a warning, like in the prepare_data function, before the #8 is solved, I guess. Otherwise, one can still check before including them.

data("trichoptera")
trichoptera <- prepare_data(trichoptera$Abundance, trichoptera$Covariate)
trichoptera$Pressure[29:32]<-NA
fit <- PLN(Abundance ~ 1 + Pressure + Temperature, data = trichoptera)

> dim(fitted(fit))
[1] 45 17
> dim(trichoptera$Abundance)
[1] 49 17

> data.frame(
+ fitted   = as.vector(fitted(fit)),
+ observed = as.vector(trichoptera$Abundance)
+ )
Error in data.frame(fitted = as.vector(fitted(fit)), observed = as.vector(trichoptera$Abundance)) : 
  les arguments impliquent des nombres de lignes différents : 765, 833

pln-team / plnmodels Goto Github PK

plnmodels's People

Contributors

Stargazers

Watchers

Forkers

plnmodels's Issues

I get an error when trying to fix the penalties for PLNnetwork, even when reusing the penalties provided by default:

lambda = exp(seq(1.5, -1.5, length.out=50))

Recommend Projects

Recommend Topics

Recommend Org