pmbio / sclvm

scLVM is a modelling framework for single-cell RNA-seq data that can be used to dissect the observed heterogeneity into different sources, thereby allowing for the correction of confounding sources of variation.

License: Apache License 2.0


sclvm's Introduction

scLVM

What is scLVM?

scLVM was primarily designed to account for cell-cycle-induced variation in single-cell RNA-seq data where the cell cycle is the primary source of variability. For other use cases, tutorials will follow shortly.

Software by Florian Buettner, Paolo Casale and Oliver Stegle. scLVM is explained in more detail in the accompanying publication [1].

Philosophy

Observed heterogeneity in single-cell profiling data is multi-factorial. scLVM provides an efficient framework for unravelling this heterogeneity, correcting for confounding factors and facilitating unbiased downstream analyses. scLVM builds on Gaussian process latent variable models and linear mixed models. The underlying models are based on inference schemes implemented in LIMIX.

Installation:

scLVM can be installed using pip install scLVM on most systems. If you have trouble using pip, have a look at the detailed instructions in the wiki.
- It requires Python 2.7 with scipy, h5py, numpy and pylab.
- In addition, scLVM relies heavily on limix (version 1.0.8 or higher).
- If you would like to use the non-linear GPLVM for visualisation, we suggest installing the GPy package. This can be installed using pip install GPy.
- Preprocessing steps are executed in R and require R>3.0. They can be performed either as part of the R package (see also next bullet point) or via scripts. For an example of how raw counts can be processed appropriately, see our markdown vignette.
- For users who prefer to run the entire scLVM pipeline in R, we also provide an R package, which is based on rPython. The scLVM R package can be downloaded here

How to use scLVM?

The current software version should be considered beta. Still, the method works and can be used to reproduce the results of the accompanying publication [1]. More extensive documentation, tutorials and examples will be available soon.

Good starting points are the tutorials for our R package and for the Python implementation.

For an illustration of how scLVM can be applied to the T-cell data considered in Buettner et al. [1], we have prepared a notebook that can be viewed interactively or alternatively as PDF export. This is also available for the R package.

While in principle the R package and the Python package have the same functionality, we recommend using the R package, as more extensive documentation is available for it and development currently focuses on it.

Problems?

If you want to use scLVM and encounter any issues, please contact us by email: [email protected]

License

See LICENSE

References

[1] Buettner F, Natarajan KN, Casale FP, Proserpio V, Scialdone A, Theis FJ, Teichmann SA, Marioni JC & Stegle O, 2015. Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells, Nature Biotechnology, doi: 10.1038/nbt.3102.


sclvm's Issues

ERROR running: sclvm = init(sclvm,Y=Y,tech_noise = tech_noise)

Error in python.exec(paste(objName, " = scLVM(Y,geneID=geneID,tech_noise=tech_noise)", :
The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
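
The message above is the error NumPy raises when a multi-element array is used where a single True/False is expected, for example in a test such as `if tech_noise != None:`. Whether that is the exact statement failing inside scLVM is not confirmed by this report. The sketch below only illustrates the pattern, using a small stand-in class (`TinyArray`, hypothetical, so the example runs without NumPy) and the usual `is not None` remedy.

```python
class TinyArray(object):
    """Hypothetical stand-in that mimics two NumPy array behaviours:
    elementwise comparison and an ambiguous truth value."""

    def __init__(self, values):
        self.values = list(values)

    def __ne__(self, other):
        # Elementwise comparison returns another array, not a bool.
        return TinyArray(v != other for v in self.values)

    def __bool__(self):
        if len(self.values) > 1:
            raise ValueError(
                "The truth value of an array with more than one element "
                "is ambiguous. Use a.any() or a.all()"
            )
        return bool(self.values[0]) if self.values else False


def has_noise_buggy(tech_noise):
    # `!=` is evaluated elementwise, then `if` asks for a single bool.
    if tech_noise != None:  # noqa: E711 (deliberately the fragile form)
        return True
    return False


def has_noise_fixed(tech_noise):
    # Identity test never triggers elementwise comparison.
    return tech_noise is not None
```

With an array argument, `has_noise_buggy` raises the ValueError quoted in the issue, while `has_noise_fixed` simply returns True.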

scLVM R package is really hard to install and use.
Can anybody give a user friendly guideline of scLVM !!!

Use of scLVM for single cell data under multiple conditions

I wish to remove the cell cycle effect from my data, which consists of single-cell RNA-seq for cells under different conditions. I was wondering which of the following methods makes more sense:

(1) Removing the cell cycle effect using the entire data as input
(2) Removing the effect for each condition separately and then combining the corrected data
Thank you.

Issue in creating scLVM object (Python execution error)

Error Message: python.exec(paste(objName, " = scLVM(Y,geneID=geneID,tech_noise=tech_noise)", : name 'scLVM' is not defined

I am currently working through the tutorial for the scLVM R package and run into the above error message when I try to create an sclvm object (line 137 of scLVM_vignette.Rmd). I have been able to create the fitTechnicalNoise object (needed for the sclvm object). However, when I use it to initialise the sclvm object, I get the above error.

Previous issues suggest that the rPython version may be the cause. My current rPython version is 0.0-6, and I have also added a check that the versions match. The suggestions posted previously, which I have also executed, are included below.

Previous Running suggestions:
Suggested help: python --version
export RPYTHON_PYTHON_VERSION="2.7.16"
R CMD INSTALL /Users/chadwatson/Downloads/rPython_0.0-6.tar.gz

I have also tried using smaller matrices and changing the fitType of the fitTechnicalNoise object, all of which result in the same python.exec error from line 137 of the tutorial. Is there a way to fix this issue so that I can proceed with the tutorial?
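
A "name 'scLVM' is not defined" error is what Python reports when the name was never successfully imported, so one possible first diagnostic is to check which modules the embedded interpreter can actually import. The helper below is a sketch, not part of scLVM or rPython: `missing_modules` and `REQUIRED` are hypothetical names, and the module list is taken from the requirements in the README.

```python
def missing_modules(names):
    """Return the module names from `names` that fail to import."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


# Requirements listed in the README, plus scLVM itself.
REQUIRED = ["scipy", "h5py", "numpy", "pylab", "limix", "scLVM"]
```

Calling `missing_modules(REQUIRED)` from the same Python installation that rPython was built against (for example through `python.exec`) would list the modules that interpreter cannot find. An empty list would point away from a missing dependency.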

error message when loading scLVM package

Hello, I followed the installation steps for the scLVM R package but I run into the following error message when doing R CMD INSTALL scLVM:

Error : .onAttach failed in attachNamespace() for 'scLVM', details:
call: if (pmatch("No module named ", e[[1]]) == 1) {
error: missing value where TRUE/FALSE needed

Do you have any knowledge how to fix this error?

Thanks

Feature request: more parameters to function getVariableGenes()

Dear scLVM team,

I am currently using some of the functionality implemented in scLVM to analyse your single-cell data.
I've made a few modifications to the plotting/identifying function getVariableGenes(... method="fdr" ...) that might be of interest as extra features (I've commented out the sections I have not updated):

# -----------------------------------------------------------------
# modification of getVariableGenes() to add more parameters:
# - (required) ERCC counts for overlay on plot
# - modifiable min bio dispersion parameter
# - adjustable ylim
# - possibility to have an interactive plot to identify points (genes)
#   within a hand-drawn polygon
# - returns plotted X and Y values
#   (& interactively selected points if applicable)
#   for further manipulation
getVariableGenes <- function (
  nCountsEndo, nCountsERCC, fit, method = "fit", threshold = 0.1, minBiolDisp=0.5,
  ylim = NULL,
  fit_type = NULL, sfEndo = NULL, sfERCC = NULL, plot = T,
  interactive=FALSE) {

  if (!(method %in% c("fdr", "fit"))) {
    stop("'method' needs to be either 'fdr' or 'fit'")
  }
  if (is.null(fit_type)) {
    print("No 'fit_type' specified. Trying to guess it from parameter names")
    if ("a0" %in% names(coefficients(fit)) & "a1tilde" %in% 
        names(coefficients(fit))) {
      fit_type = "counts"
    }
    else {
      if ("a" %in% names(coefficients(fit)) & "k" %in% 
          names(coefficients(fit))) {
        fit_type = "log"
      }
      else {
        if (is.call(fit$call)) {
          fit_type = "logvar"
        }
      }
    }
    print(paste("Assuming 'fit_type' is ", "'", fit_type, 
                "'", sep = ""))
  }
  if (is.null(fit_type)) {
    stop("Couldn't guess fit_type. Please specify it or run the fitTechnicalNoise function to obtain the fit")
  }
  if (!(fit_type %in% c("counts", "log", "logvar"))) {
    stop("'fit_type' needs to be one of 'counts', 'log' or 'logvar'")
  }
  if (method == "fdr" & fit_type != "counts") {
    stop("method='fdr', can only be used with fit_type 'counts'")
  }
  if (method == "fdr" & (is.null(sfERCC) | is.null(sfEndo))) {
    stop("Please specify sfERCC and sfEndo when using method='fdr'")
  }
  if (method == "fdr") {
    meansEndo <- rowMeans(nCountsEndo)
    varsEndo <- rowVars(nCountsEndo)
    cv2Endo <- varsEndo/meansEndo^2

    meansERCC <- rowMeans(nCountsERCC)
    varsERCC <- rowVars(nCountsERCC)
    cv2ERCC <- varsERCC/meansERCC^2

    minBiolDisp <- (minBiolDisp^2)
    xi <- mean(1/sfERCC)
    m <- ncol(nCountsEndo)
    psia1thetaA <- mean(1/sfERCC) + (coefficients(fit)["a1tilde"] - xi) * mean(sfERCC/sfEndo)
    cv2thA <- coefficients(fit)["a0"] + minBiolDisp + coefficients(fit)["a0"] * minBiolDisp
    testDenomA <- (meansEndo * psia1thetaA + meansEndo^2 * cv2thA)/(1 + cv2thA/m)
    pA <- 1 - pchisq(varsEndo * (m - 1)/testDenomA, m - 1)
    padjA <- p.adjust(pA, "BH")
    print(table(padjA < threshold))
    is_het = padjA < threshold
    is_het[is.na(is_het)] = FALSE
    if (plot == TRUE) {
      if (is.null(ylim)) ylim <- c(0.1, 250)
      plot(meansEndo, cv2Endo, log = "xy", col = 1 + is_het, 
           ylim = ylim, xlab = "Mean Counts", ylab = "CV2 Counts")
      xg <- 10^seq(-3, 5, length.out = 100)
      lines(xg, coefficients(fit)[1] + coefficients(fit)[2]/xg, 
            lwd = 2, col = "green")
      try(points(meansERCC, cv2ERCC, pch = 20, cex = 1, 
                 col = "blue"))
      legend("bottomleft", c("Endo. genes", "Var. genes", 
                             "ERCCs", "Fit"), pch = c(1, 1, 20, NA), 
                             lty = c(NA, NA, NA, 1), 
                             col = c("black", "red", "blue", "green"), 
                             cex = 0.7)
    }
  }
#   if (method == "fit" & fit_type == "log") {
#     LCountsEndo <- log10(nCountsEndo + 1)
#     LmeansEndo <- rowMeans(LCountsEndo)
#     Lcv2Endo = rowVars(LCountsEndo)/LmeansEndo^2
#     is_het = (fit$opts$offset * coefficients(fit)["a"] * 
#                 10^(-coefficients(fit)["k"] * LmeansEndo) < Lcv2Endo) & 
#       LmeansEndo > fit$opts$minmean
#     if (plot == TRUE) {
#       plot(LmeansEndo, Lcv2Endo, log = "y", col = 1 + 
#              is_het, xlab = "meansLogEndo", ylab = "cv2LogEndo")
#       xg <- seq(0, 5.5, length.out = 100)
#       lines(xg, fit$opts$offset * coefficients(fit)[1] * 
#               10^(-coefficients(fit)[2] * xg), lwd = 2, col = "green")
#       legend("bottomright", c("Endo. genes", "Var. genes", 
#                               "Fit"), pch = c(1, 1, NA), lty = c(NA, NA, 1), 
#              col = c("black", "red", "blue"), cex = 0.7)
#     }
#   }
#   if (method == "fit" & fit_type == "counts") {
#     meansEndo <- rowMeans(nCountsEndo)
#     varsEndo <- rowVars(nCountsEndo)
#     cv2Endo <- varsEndo/meansEndo^2
#     is_het = (coefficients(fit)[[1]] + coefficients(fit)[[2]]/meansEndo) < 
#       cv2Endo
#     if (plot == TRUE) {
#       if (is.null(ylim)) ylim <- c(0.1, 95)
#       plot(meansEndo, cv2Endo, log = "xy", col = 1 + is_het, 
#            ylim = ylim, xlab = "Mean Counts", ylab = "CV2 Counts")
#       xg <- 10^seq(-3, 5, length.out = 100)
#       lines(xg, coefficients(fit)[1] + coefficients(fit)[2]/xg, 
#             lwd = 2, col = "green")
#       legend("bottomright", c("Endo. genes", "Var. genes", 
#                               "Fit"), pch = c(1, 1, NA), lty = c(NA, NA, 1), 
#              col = c("black", "red", "green"), cex = 0.7)
#     }
#   }
#   if (method == "fit" & fit_type == "logvar") {
#     LCountsEndo <- log10(nCountsEndo + 1)
#     LmeansEndo <- rowMeans(LCountsEndo)
#     LVarsEndo <- rowVars(LCountsEndo)
#     xg = LmeansEndo
#     Var_techEndo_logfit_loess = predict(fit, LmeansEndo)
#     minVar_Endo = min(LVarsEndo[LmeansEndo > 2.5])
#     if (any(xg > 2.5 & Var_techEndo_logfit_loess < 0.6 * 
#             minVar_Endo)) {
#       idx = which(xg > 2.5 & Var_techEndo_logfit_loess < 
#                     0.6 * minVar_Endo)
#       Var_techEndo_logfit_loess[idx] = 0.6 * minVar_Endo
#     }
#     is_het = (Var_techEndo_logfit_loess < LVarsEndo) & LmeansEndo > 
#       0.3
#     print(sum(is_het))
#     if (plot == TRUE) {
#       plot(LmeansEndo, LVarsEndo, log = "y", col = 1 + 
#              is_het, xlab = "meansLogEndo", ylab = "varsLogEndo")
#       xg <- seq(0, 5.5, length.out = 100)
#       Var_techEndo_logfit_loess = predict(fit, xg)
#       if (any(xg > 2.5 & Var_techEndo_logfit_loess < 0.6 * 
#               minVar_Endo)) {
#         idx_1 = which(xg > 2.5 & Var_techEndo_logfit_loess < 
#                         0.6 * minVar_Endo)[1]
#         idx_end = length(Var_techEndo_logfit_loess)
#         Var_techEndo_logfit_loess[idx_1:idx_end] = 0.6 * 
#           minVar_Endo
#       }
#       lines(xg, Var_techEndo_logfit_loess, lwd = 2, col = "green")
#       legend("bottomright", c("Endo. genes", "Var. genes", 
#                               "Fit"), pch = c(1, 1, NA), lty = c(NA, NA, 1), 
#              col = c("black", "red", "green"), cex = 0.7)
#     }
#   }

  if (exists("meansEndo"))  X <- meansEndo
  if (exists("LmeansEndo")) X <- LmeansEndo
  if (exists("cv2Endo"))    Y <- cv2Endo
  if (exists("Lcv2Endo"))   Y <- Lcv2Endo

  selected <- NULL
  if (interactive) {
    if(!require(sp))      stop("Interactiveness requires package sp")
    if(!require(splancs)) stop("Interactiveness requires package splancs.")
    poly <- getpoly( quiet=TRUE )
    selected <- names(X)[point.in.polygon(X, Y, poly[,1], poly[,2])>0]
  }

  return( list(
    X        = X,
    Y        = Y,
    is_het   = is_het,
    selected = selected
  ))
}

Hope this helps in some way!

Best regards,

-- Alex

fitLMM dies with AttributeError

Hi,
When I call fitLMM, I get this:

`AttributeError Traceback (most recent call last)
in ()
7
8 # fit lmm without correction
----> 9 pv0,beta0,info0 = sclvm.fitLMM(K=None,i0=i0,i1=i1,verbose=False)
10 pv0,beta0,info0 = hack_fitLMM(sclvm, K=None,i0=i0,i1=i1,verbose=False)
11 # fit lmm with correction

/usr/lib/python2.7/site-packages/scLVM/core.pyc in fitLMM(self, K, tech_noise, idx, i0, i1, verbose)
333 else:
334 _K = None
--> 335 lm = QTL.test_lmm(Ystd,Ystd[:,ids:ids+1],K=_K,verbose=False,**lmm_params)
336 pv[count,:] = lm.getPv()[0,:]
337 beta[count,:] = lm.getBetaSNP()[0,:]

AttributeError: 'module' object has no attribute 'test_lmm'`

I'm using limix version 1.0.18, but previously tried it with 2.0.2 and had the same problem. The problem seems to be that scLVM calls QTL.test_lmm() when it should be calling QTL.qtl_test_lmm(). I think that test_lmm() was renamed.

Thanks and cheers.
Urbs
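
Assuming the report is right that the function exists under either `test_lmm` or `qtl_test_lmm` depending on the limix version, a caller could look the function up by name instead of hard-coding one spelling. The sketch below uses `SimpleNamespace` objects as stand-ins for the QTL module; `resolve_lmm_test` is a hypothetical helper, and the real limix signatures are not verified here.

```python
from types import SimpleNamespace


def resolve_lmm_test(qtl_module):
    """Return the LMM test function under whichever name the module exposes."""
    for name in ("test_lmm", "qtl_test_lmm"):
        func = getattr(qtl_module, name, None)
        if callable(func):
            return func
    raise AttributeError("neither test_lmm nor qtl_test_lmm found")


# Stand-ins for two hypothetical layouts of the QTL module.
old_style = SimpleNamespace(test_lmm=lambda *args, **kwargs: "old")
new_style = SimpleNamespace(qtl_test_lmm=lambda *args, **kwargs: "new")
```

In `fitLMM`, the call `QTL.test_lmm(...)` would then become `resolve_lmm_test(QTL)(...)`, which keeps working if either name is present.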

Useful __all__ and no * import

Your example notebooks import stuff from utils and import core as scLVM

that’s unnecessary if you import only the useful stuff in __init__.py.

then users can simply do

from scLVM import load_data, scLVM, ...
# or
import scLVM
d = scLVM.load_data(...)
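
The suggestion above can be sketched with a throwaway package. The package name `sclvm_demo` and its two tiny modules are hypothetical stand-ins written to a temporary directory; only the `__init__.py` pattern (explicit re-exports plus `__all__`) is the point.

```python
import importlib
import os
import sys
import tempfile

# What the package's __init__.py would contain: explicit re-exports.
INIT_SOURCE = (
    "from .core import scLVM\n"
    "from .utils import load_data\n"
    "__all__ = ['scLVM', 'load_data']\n"
)


def build_demo_package(root, name="sclvm_demo"):
    """Write a minimal package with core.py, utils.py and __init__.py."""
    pkg_dir = os.path.join(root, name)
    os.makedirs(pkg_dir)
    sources = {
        "core.py": "class scLVM(object):\n    pass\n",
        "utils.py": "def load_data(path):\n    return path\n",
        "__init__.py": INIT_SOURCE,
    }
    for filename, source in sources.items():
        with open(os.path.join(pkg_dir, filename), "w") as handle:
            handle.write(source)
    return name


def import_demo_package():
    """Build the package in a temp directory and import it."""
    root = tempfile.mkdtemp()
    name = build_demo_package(root)
    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        return importlib.import_module(name)
    finally:
        sys.path.remove(root)
```

After `pkg = import_demo_package()`, both `pkg.scLVM` and `pkg.load_data` are available at the top level, so users need neither `import core as scLVM` nor a star import.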

Error running init(sclvm,Y=Y,tech_noise = tech_noise)

I got this error message when running the example in the vignette and running on my own dataset:
Error in python.exec(paste(objName, " = scLVM(Y,geneID=geneID,tech_noise=tech_noise)", : The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I'm pretty sure I installed every dependency correctly.

python.exec("import sys; print(sys.version)") 2.7.10 (default, Feb 7 2017, 00:08:15) [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)]

and calling python --version in command window get
Python 2.7.10

R session info:

sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] org.Mm.eg.db_3.4.1 AnnotationDbi_1.38.1 scLVM_0.99.3 rPython_0.0-6 RJSONIO_1.3-0
[6] DESeq2_1.16.1 SummarizedExperiment_1.6.3 DelayedArray_0.2.7 matrixStats_0.52.2 GenomicRanges_1.28.4
[11] GenomeInfoDb_1.12.2 IRanges_2.10.2 S4Vectors_0.14.3 gplots_3.0.1 statmod_1.4.30
[16] genefilter_1.58.1 cellrangerRkit_1.1.0 Rmisc_1.5 plyr_1.8.4 lattice_0.20-35
[21] bit64_0.9-7 bit_1.1-12 ggplot2_2.2.1 RColorBrewer_1.1-2 Biobase_2.36.2
[26] BiocGenerics_0.22.0 Matrix_1.2-10

loaded via a namespace (and not attached):
[1] bitops_1.0-6 httr_1.2.1 rprojroot_1.2 tools_3.4.1 backports_1.1.0
[6] R6_2.2.2 irlba_2.2.1 DT_0.2 rpart_4.1-11 KernSmooth_2.23-15
[11] Hmisc_4.0-3 DBI_0.7 lazyeval_0.2.0 colorspace_1.3-2 nnet_7.3-12
[16] gridExtra_2.2.1 compiler_3.4.1 htmlTable_1.9 plotly_4.7.0 checkmate_1.8.3
[21] caTools_1.17.1 scales_0.4.1 stringr_1.2.0 digest_0.6.12 foreign_0.8-69
[26] rmarkdown_1.6 XVector_0.16.0 base64enc_0.1-3 pkgconfig_2.0.1 htmltools_0.3.6
[31] htmlwidgets_0.9 rlang_0.1.1 RSQLite_2.0 bindr_0.1 jsonlite_1.5
[36] BiocParallel_1.10.1 gtools_3.5.0 acepack_1.4.1 dplyr_0.7.2 RCurl_1.95-4.8
[41] magrittr_1.5 GenomeInfoDbData_0.99.0 Formula_1.2-2 Rcpp_0.12.12 munsell_0.4.3
[46] stringi_1.1.5 yaml_2.1.14 zlibbioc_1.22.0 rhdf5_2.20.0 Rtsne_0.13
[51] grid_3.4.1 blob_1.1.0 gdata_2.18.0 splines_3.4.1 annotate_1.54.0
[56] locfit_1.5-9.1 knitr_1.16 geneplotter_1.54.0 XML_3.98-1.9 glue_1.1.1
[61] evaluate_0.10.1 latticeExtra_0.6-28 data.table_1.10.4 gtable_0.2.0 purrr_0.2.2.2
[66] tidyr_0.6.3 assertthat_0.2.0 xtable_1.8-2 survival_2.41-3 viridisLite_0.2.0
[71] tibble_1.3.3 pheatmap_1.0.8 memoise_1.1.0 bindrcpp_0.2 cluster_2.0.6

Uninitialized dataCB reference in "transform_counts_demo_no_spikeins.Rmd"

There seems to be an uninitialized "dataCB" reference in the "transform_counts_demo_no_spikeins.Rmd" demo file.

negative expression after cell cycle correction

Hi
This is Joe from Cedars-Sinai Medical Center. I am currently using scLVM on single-cell RNA-seq data to reduce the cell cycle effect. This is a very nice pipeline for single-cell data analysis! So far everything has gone well, and no error messages appear when following your R tutorials. However, when I checked the gene expression after cell cycle correction, I found that some genes have negative expression values. I also ran the scLVM pipeline on your demo data (mouse T cells) and again found negative values after cell cycle correction. Gene expression can never be negative, so how should these negative values be interpreted? Thanks.

Looking forward to your reply.

Joe
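
One way negative values can arise is if the corrected values are residuals on a log-like scale: subtracting a fitted cell-cycle component from a small log-expression value can push it below zero. This is an assumption for illustration, not a statement from the scLVM authors, and the sketch uses plain linear regression on a single factor rather than scLVM's latent variable model. The function names and the toy data are hypothetical.

```python
import math


def log_transform(counts):
    """log10(count + 1) for each count."""
    return [math.log10(c + 1) for c in counts]


def regress_out(values, factor):
    """Subtract the least-squares linear effect of a centred factor."""
    n = len(values)
    f_mean = sum(factor) / n
    v_mean = sum(values) / n
    centred = [f - f_mean for f in factor]
    slope = sum(fc * (v - v_mean) for fc, v in zip(centred, values)) / sum(
        fc * fc for fc in centred
    )
    return [v - slope * fc for v, fc in zip(values, centred)]


# Toy gene: four cells, two of them with zero counts.
counts = [0, 0, 90, 900]
cell_cycle = [-1.0, 1.0, -1.0, 1.0]  # hypothetical cell-cycle factor
corrected = regress_out(log_transform(counts), cell_cycle)
```

Here the second cell starts at a log-expression of zero and ends up negative after the fitted factor is subtracted, while the gene's mean is unchanged. Under this reading, a negative corrected value is an adjusted log-scale quantity, not a negative count.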

issue with plot getVariableGenes

Hi,

In the plot drawn by getVariableGenes, some elements are missing:

(1) spike-in
(2) fit curve

is_het = getVariableGenes(nCounts_batch$ENS, techNoise$fit, method = "fdr",threshold = 0.1, fit_type="counts",sfEndo=f_size_batch$sfENS, sfERCC=f_size_batch$sfERCC)

FALSE TRUE
27752 5461
Error in points(meansERCC, cv2ERCC, pch = 20, cex = 1, col = "blue") :
object 'meansERCC' not found

Is it possible to add these?

Many thanks

Cynthia

python commands to get tech_noise

Hello,
I cannot find a python tutorial to get tech_noise, only the R version available. I do not want to jump between the two languages. Can anyone provide a python script to calculate tech_noise?

Feature Request- Conditional GPLVM

In the Nature Methods paper it suggests it would be possible to include covariates in the scLVM procedure but I don't think it is fully implemented yet in the gp_clvm function. Just wanted to say I'm very interested in this application and look forward to seeing it as a new feature.

Error running vignette

I am going through the vignette for the R package using the mouse TCell data. Everything works fine up until this point:

CellCycleARD = fitFactor(sclvm,geneSet = ens_ids_cc, k=20,use_ard = TRUE)

Which returns this error

Error in python.exec(paste("X,K,Kint,varGPLVM_ARD = ", objName, ".fitFactor(idx=idx, X0=X0, k=k,standardize=standardize, use_ard=use_ard, interaction=interaction, initMethod=initMethod)", : BLAS/LAPACK routine 'DLASCL' gave error code -4

Interestingly if I run the same method on my own (human) single cell data, it works fine, but I am not sure how to interpret the plot that is generated by the next part of the vignette.

plot(seq(1, length(CellCycleARD$X_ard)), CellCycleARD$X_ard, xlab = '# Factor', ylab = 'Variance explained')
title('Variance explained by latent factors')

core.py limix version check triggers automatic failure state, attempts to load nonexistent package

I'm trying to get scLVM working and am running into some installation problems.

Below I reference code in core.py:

import limix
try:
	limix.__version__
	if versiontuple(limix.__version__)>versiontuple('0.7.3'):
		import limix.deprecated as limix

except:
	import limix.deprecated as limix

versiontuple is not defined in scope at import time, so the NameError raised when it is referenced immediately triggers the bare except block. As a result, limix.deprecated is always imported, regardless of the installed version.

However, many versions of limix do not have a deprecated package and it is not trivial to find the correct version of the package that will work with this code.

Suggest hard coding the correct versions into requirements.txt or setup.py so that a valid version is required for installation.
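
A minimal sketch of what a working version check could look like, assuming the intent of the quoted code is "use limix.deprecated for limix newer than 0.7.3". The `versiontuple` implementation and the `pick_limix_api` helper are illustrative guesses, not the project's actual code.

```python
import re


def versiontuple(version):
    """Turn '1.0.8' into (1, 0, 8); stops at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def pick_limix_api(installed_version, has_deprecated):
    """Decide which import path to use, failing loudly instead of silently."""
    if versiontuple(installed_version) > versiontuple("0.7.3"):
        if not has_deprecated:
            raise ImportError(
                "limix %s has no 'deprecated' subpackage" % installed_version
            )
        return "limix.deprecated"
    return "limix"
```

Comparing tuples of integers avoids the string-comparison trap where "0.10.0" sorts before "0.7.3". Raising a specific ImportError, and catching only ImportError at the call site, would also stop an unrelated NameError from being swallowed.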

error with format R package scLVM_0.99.2.tar.gz

Hi, I'm not able to install or open your tar.gz (there seems to be an issue with the archive):

R CMD INSTALL scLVM_0.99.2.tar.gz
Error in getOctD(x, offset, len) : invalid octal digit

tar -xvzf scLVM_0.99.2.tar.gz

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now

Many thanks

Cynthia
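
The "not in gzip format" message means the file does not start with the gzip signature, the two bytes 0x1f 0x8b. A quick check like the sketch below can tell a genuine archive from, say, a web page saved under a .tar.gz name. That is one common cause and is not confirmed for this report. The helper names are hypothetical.

```python
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(path):
    """Return True if the file starts with the two-byte gzip signature."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def first_bytes(path, n=60):
    """Return the first n bytes, useful for spotting e.g. an HTML header."""
    with open(path, "rb") as handle:
        return handle.read(n)
```

If `looks_like_gzip("scLVM_0.99.2.tar.gz")` returns False, printing `first_bytes(...)` shows what the file actually contains.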

IndexError: index XX is out of bounds for axis 0 with size XX of scLVM/core.py

Hi,

The code around line 204 of scLVM/core.py:

if not _conv:
        var[count,-2] = SP.maximum(0,y.var()-tech_noise[ids])
        var[count,-1] = tech_noise[ids]
        count+=1;
        if self.geneID is not None: geneID[count] = self.geneID[ids]
        continue

I think count+=1; should come after if self.geneID is not None: geneID[count] = self.geneID[ids]. Otherwise, when I call varianceDecomposition() with i0 and i1 set to, for example, 100 and 150, the out-of-bounds error occurs frequently as ids reaches i1.
Also, I don't understand why the assignment geneID[count] = self.geneID[ids] is done only when _conv is False, rather than for every ids.

Thanks!
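
The ordering problem described above can be reproduced in isolation. The sketch below is a simplified model, not the scLVM code: it assumes every gene takes the branch in question, and `collect_gene_ids` is a hypothetical function that keeps only the counter and the ID assignment.

```python
def collect_gene_ids(all_ids, i0, i1, increment_first=False):
    """Copy all_ids[i0:i1] into a fixed-size list using a running counter."""
    out = [None] * (i1 - i0)
    count = 0
    for ids in range(i0, i1):
        if increment_first:
            # Order reported in the issue: the slot index is already one ahead.
            count += 1
            out[count] = all_ids[ids]
        else:
            # Proposed order: write to the current slot, then advance.
            out[count] = all_ids[ids]
            count += 1
    return out


gene_ids = ["gene%d" % i for i in range(200)]
```

With `i0=100, i1=150`, the increment-first order leaves slot 0 unset and raises IndexError on the last gene, while the proposed order fills all 50 slots.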

Issue with general use of the package via pip, R and github installation

I explored various ways of installing and using scLVM, without any success. I suggest you don't use this package anymore unless there are updates (open pull request) regarding imports and other Errors.

installation error

Tried installing with pip and got the following error, any suggestions?

Using cached scLVM-0.1.5.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 20, in <module>
      File "/private/var/folders/_r/b9d9b51x4c7dkxfv5m74bwsw0000gn/T/pip-build-UVqiVd/sclvm/setup.py", line 16, in <module>
        with open(path.join(here, 'README.md'), encoding='utf-8') as f:
      File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 884, in open
        file = __builtin__.open(filename, mode, buffering)
    IOError: [Errno 2] No such file or directory: '/private/var/folders/_r/b9d9b51x4c7dkxfv5m74bwsw0000gn/T/pip-build-UVqiVd/sclvm/README.md'
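
The traceback shows setup.py opening README.md, which is absent from the unpacked source distribution. One defensive pattern, sketched below with a hypothetical helper name, is to fall back to a short description when the file is not shipped. Listing README.md in MANIFEST.in so that it is included in the sdist is the usual complementary fix, though whether that is the cause here is an assumption.

```python
import io
import os


def read_long_description(directory, filename="README.md", fallback=""):
    """Return the README contents, or `fallback` if the file is not shipped."""
    path = os.path.join(directory, filename)
    if not os.path.isfile(path):
        return fallback
    # io.open accepts `encoding` on both Python 2.7 and Python 3.
    with io.open(path, encoding="utf-8") as handle:
        return handle.read()
```

In setup.py this would be used as `long_description=read_long_description(here, fallback="scLVM")`, where `here` is the directory containing setup.py.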

Calculation of squared coefficient of variation

The following snippet of R code calculates the squared coefficient of variation for endogenous genes and ERCC spike-ins:

normalized counts (brennecke)

meansMmus <- rowMeans( nCountsMmus )
varsMmus <- rowVars( nCountsMmus )
cv2Mmus <- varsMmus / meansMmus^2

meansERCC <- rowMeans( nCountsERCC )
varsERCC <- rowVars( nCountsERCC )
cv2ERCC <- varsERCC / meansERCC^2

But squared coefficient of variation should be calculated as (variance/means)^2 and not variance/mean^2.

Is there a mistake in the code or am I missing something?
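
For reference, the coefficient of variation is the standard deviation divided by the mean, so its square is variance / mean^2, which matches the quoted R code. The short check below uses made-up numbers and hypothetical function names to compare the three expressions. It uses the sample variance, as R's var does.

```python
import statistics


def cv2(values):
    """Squared coefficient of variation as variance / mean^2."""
    return statistics.variance(values) / statistics.mean(values) ** 2


def cv2_from_sd(values):
    """The same quantity written as (sd / mean)^2."""
    return (statistics.stdev(values) / statistics.mean(values)) ** 2


def variance_over_mean_squared(values):
    """(variance / mean)^2, the alternative proposed in the question."""
    return (statistics.variance(values) / statistics.mean(values)) ** 2


example = [2.0, 4.0, 4.0, 6.0]
```

For `example`, `cv2` and `cv2_from_sd` agree (1/6), whereas `variance_over_mean_squared` gives a different number (4/9), because it squares the variance a second time.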

'module' object has no attribute 'CFixedCF'

Dear Authors,

I am running the vignette
https://github.com/PMBio/scLVM/blob/master/R/tutorials/scLVM_vignette.Rmd

on the current (cloned) version of scLVM
https://github.com/PMBio/scLVM/blob/master/R/scLVM_0.99.2.tar.gz.

In the section 'Fitting multiple factors', the line
% th2 = fitFactor(sclvmMult, idx = idx_Th2, XKnown = Xcc, k = 1, interaction=TRUE)
returns an error message of the type
% 'module' object has no attribute 'CFixedCF'.

Do you have an explanation and possibly a solution for that?

Thanks,
Jens

Error in running scLVM_vignette.Rmd

Error when running init(sclvm,Y=Y,tech_noise = tech_noise)

Hi,
I have already installed the scLVM R package and limix (from Anaconda) and am trying to go through the vignette. However, I encountered the following error when running init(sclvm,Y=Y,tech_noise = tech_noise):

Loading required package: rPython
Loading required package: RJSONIO
Error in python.exec(paste(objName, " = scLVM(Y,geneID=geneID,tech_noise=tech_noise)", : name 'scLVM' is not defined

All the previous code is exactly the same as in the vignette. I am wondering why this error occurs and how to fix it. Thanks!

Below is the sessioninfo:

                             R version 3.2.3 (2015-12-10)
                       Platform: x86_64-apple-darwin13.4.0 (64-bit)
                       Running under: OS X 10.11.3 (El Capitan)

                       locale:
                             [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

                       attached base packages:
                             [1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods  
                       [9] base     

                       other attached packages:
                             [1] rPython_0.0-6             RJSONIO_1.3-0             org.Mm.eg.db_3.1.2       
                       [4] RSQLite_1.0.0             DBI_0.3.1                 AnnotationDbi_1.30.1     
                       [7] Biobase_2.28.0            scLVM_0.99.2              DESeq2_1.8.2             
                       [10] RcppArmadillo_0.6.600.4.0 Rcpp_0.12.5               GenomicRanges_1.20.8     
                       [13] GenomeInfoDb_1.4.3        IRanges_2.2.9             S4Vectors_0.6.6          
                       [16] BiocGenerics_0.14.0       gplots_3.0.1              ggplot2_2.1.0            
                       [19] statmod_1.4.24            genefilter_1.50.0        

                       loaded via a namespace (and not attached):
                             [1] RColorBrewer_1.1-2   futile.logger_1.4.1  plyr_1.8.4          
                       [4] XVector_0.8.0        bitops_1.0-6         futile.options_1.0.0
                       [7] tools_3.2.3          rpart_4.1-10         lattice_0.20-33     
                       [10] annotate_1.46.1      gtable_0.2.0         gridExtra_2.2.1     
                       [13] cluster_2.0.3        gtools_3.5.0         caTools_1.17.1      
                       [16] locfit_1.5-9.1       nnet_7.3-12          grid_3.2.3          
                       [19] XML_3.98-1.4         survival_2.38-3      BiocParallel_1.2.22 
                       [22] foreign_0.8-66       latticeExtra_0.6-28  gdata_2.17.0        
                       [25] Formula_1.2-1        geneplotter_1.46.0   lambda.r_1.1.7      
                       [28] scales_0.4.0         Hmisc_3.17-2         splines_3.2.3       
                       [31] rsconnect_0.4.2.2    xtable_1.8-2         colorspace_1.2-6    
                       [34] KernSmooth_2.23-15   acepack_1.3-3.3      munsell_0.4.3

proper naming

It’s bad style to name things in lowerCamelCase

Also classes have to be UpperCamelCase

As a python user I expect to do

# aah, I’m importing a class from a module, sure!
from sclvm import ScLVM

not

# huh, “scLVM” twice? the first one is a module, and the second one?
from scLVM.core import scLVM

error message when load limix in R

I was trying to use scLVM but I got an error message. I was following the instruction at https://github.com/PMBio/scLVM/blob/master/R/tutorials/scLVMr_demo_nospikeins.Rmd. Below is what I did and the error message I got:

I installed LIMIX by downloading the limix-master from github. I used the setup.py to install it: python setup.py install --user
I installed scLVM R package, by downloading the file from (https://github.com/PMBio/scLVM/tree/master/R). I installed it in the terminal: R CMD INSTALL scLVM_0.99.2.tar.gz -l ../R/x86_64-pc-linux-gnu-library/2.10/
I then opened R, and was able to load scLVM by library(scLVM). However, I was not able to run configLimix(limix_path). I found two possible limix_path:
./src/interfaces/python or ./build/temp.linux-x86_64-2.7/src/interfaces/python, but both of these paths gave me the same error message:

configLimix(limix_path)
File "<string>", line 3
except Exception as e:_r_error = e.__str__()
^
IndentationError: expected an indented block
Warning message:
In file(con, "r") :
file("") only supports open = "w+" and open = "w+b": using the former

Please let me know how I would be able to use scLVM. Thanks.

Feature request: parallelised varianceDecomposition()