marionilab / scran Goto Github PK

View Code? Open in Web Editor NEW

39.0 7.0 23.0 2.92 MB

Clone of the Bioconductor repository for the scran package.

Home Page: https://bioconductor.org/packages/devel/bioc/html/scran.html

R 93.36% C++ 6.22% C 0.07% TeX 0.36%

human-cell-atlas single-cell-rna-seq bioconductor-package

scran's People

Contributors

Stargazers

Watchers

scran's Issues

To-do list

Need to add support for ranked overlapExprs output.
Need to modify trendVar and decomposeVar to report zero residual d.f. for all-zero groups.
Need to modify findMarkers to correctly compute variance in the presence of all-zero groups.
Need to modify findMarkers to test residuals with design=, rather than fitting a linear model.
Need to switch findMarkers to doing pairwise Welch t-tests.
Need to modify residual calculations to add back coefficients, like removeBatchEffect.

makeTechTrend() throws error : 'to' must be a finite number

Hi,

I am trying to apply your beautiful 10X template (https://bioconductor.org/packages/release/workflows/vignettes/simpleSingleCell/inst/doc/work-3-tenx.html) to our data set. Everything seems about right until the computeSumFactors() step which warns me that it "encountered negative size factor estimates" which I can avoid with "positive = TRUE". In both cases, the next step, makeTechTrend() throws an error which I did not manage to track back in the source code : "Error in seq.default(from = 0, to = upper.value, length.out = 100) : 'to' must be a finite number".

Any idea what might be happening here?

Thank a million !

Cheers,
Marc

Impossible to use list of matrices as mnnCorrect input

Hi,

How could one use a list of matrices as mnnCorrect input (more convenient for automation purpose)?

Taking the example provided in scran user manual, this works ok:

B1 <- matrix(rnorm(10000), ncol=50)                     # Batch 1
B2 <- matrix(rnorm(10000), ncol=50)                    # Batch 2
out <- mnnCorrect(B1, B2)                                       # corrected values

But there is an error with the following:

mat_list=list()
mat_list[["Batch_1"]]=B1
mat_list[["Batch_2"]]=B2
str(mat_list)
List of 2
 $ Batch1: num [1:200, 1:50] 1.107 -0.828 1.559 -1.353 0.667 ...
 $ Batch2: num [1:200, 1:50] -0.231 0.894 0.369 1.606 -1.346 ...

out <- mnnCorrect(mat_list)
Error in mnnCorrect(mat_list) : at least two batches must be specified

out <- mnnCorrect(cat(paste(gsub("^","mat_list$",names(mat_list)),collapse=", "))
Error in mnnCorrect(mat_list) : at least two batches must be specified

out <- mnnCorrect(capture.output(cat(paste(gsub("^","mat_list$",names(mat_list)),collapse=", ")))
Error in mnnCorrect(mat_list) : at least two batches must be specified

Is there a way to achieve this?

Thanks !

Error with computeSumFactors #2

Hi,

I was following the vignette from https://master.bioconductor.org/packages/release/workflows/html/simpleSingleCell.html
to count HVG:s starting from a gene-cell -matrix, but am facing issues with SummarizedExperiment -function, with the following error message coming along:

Error in .colSums(object) :
unknown SEXP type for SummarizedExperiment object

I saw this issue before, and then it was perhaps a discrepancy issue with SummarizedExperiment and SCE-object. However, even with a reproducible example I get the same issue as well:

nrows <- 200
ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
colData <- DataFrame(Treatment=rep(c("IO", "No"), 3),
                     row.names=LETTERS[1:6])

temp <- SummarizedExperiment(assays=list(counts=counts),
                     colData=colData)


calcAverage(temp)

Error in .colSums(object) : 
  unknown SEXP type for SummarizedExperiment object

With R version 3.5.1 running under macOS Sierra 10.12.6

FDR values show huge difference between `multiBlockVar()` and `ModelGeneVar()`

Hi,
I'm trying to update my workflow according to the new version of scran (v1.13.30)
In the feature selection step, I found that the new function ModelGeneVar() produced a very different hypothesis test result compare to multiBlockVar.
The multiBlockVar would give me ~ 19000 genes below FDR < 0.01 while ModelGeneVar only give ~ 300 genes

out <- multiBlockVar(sce, block = sce$donor, trend.args = list(min.mean = 0.5, use.spikes = F))
out.new <- modelGeneVar(sce, block = sce$donor)
par(mfrow = c(1,2))
hist(out$FDR)
hist(out.new$FDR)

The deconvoluted biological component is quite consistent with only slight difference:

I was wondering that what is the difference between two functions ? Do you change the hypothesis test ?

Thanks,

Yao He

fastMNN correction vector calculation

Hi,

Haven't quite delved into the source code yet, but in the documentation for the fastMNN function, it says for argument ndist:

A numeric scalar specifying the threshold beyond which neighbours are to be ignored when computing correction vectors. Each threshold is defined in terms of the number of median distances.

I'm wondering how cells with all neighbors exceeding this threshold have their correction vectors calculated, since those neighbors are to be ignored, if I understand correctly.

In the original paper (doi: 10.1038/nbt.4091) for MNN correction, it was stated that correction vectors for all cells would be a weighted average of the MNN vectors.

Thanks for any help you're able to give.

Negative expression value in the MNN corrected matrix

Hi,
One naive question regarding the MNN output.
I tried to run mnnCorrect to perform a batch normalization on my data sets. All input matrices have been normalized independently by using the normalise in the same scran package

clusters <- quickCluster(sce1)
sce1 <- computeSumFactors(sce1, clusters = clusters, size = seq(25, 110, 5))
sce1 <- normalise(sce1, log_exprs_offset = 1, exprs_values = 'counts')
clusters <- quickCluster(sce2)
sce2 <- computeSumFactors(sce2, clusters = clusters, size = seq(25, 110, 5))
sce2 <- normalise(sce2, log_exprs_offset = 1, exprs_values = 'counts')
......
mnn.result <- mnnCorrect(exprs(sce1), exprs(sce2), exprs(sce3), k = 20, cos.norm.in = TRUE, cos.norm.out = FALSE)

The cos.norm.out has been set to FALSE so the output matrix should be on the log scale, which is the same as to the input matrices. However, I found that a certain amount of genes had negative expression value in the corrected matrix.

I guess this is normal after scaling. However, if I really want to get rid of those negative log expression value, what is the best practice? mnnCorrect did not allow me to supply a log offset. Will it be OK if I just simply add a constant

mnn.matrix <- do.call(cbind, mnn.result[[1]])
mnn.matrix <- mnn.matrix - min(mnn.matrix)

All suggestions and recommended are welcomed.

Thanks in advance,
Yi-Chien.

Normalizing cell-specific biases issue for experiment without spike in

Hi,
really powerful package!
I am currently using data set from 10X to try the package, which is lack of spike-ins.
So I am confused that how should I use the normalization as well as trendVar function to deal with my data? (Actually, I tried to set use.spikes=FALSE, assuming that technical noise is the major contributor to the variance of most genes in the data set, however, I got top HVGs as house-keeping genes such as ACTA2, TPM1.)
Best.

Returning in findMarkers() the t-statistics (or other pairwise statistics calculated)

Hi Aaron,

For a project with @mattntran, we want to run findMarkers() and retain all the t-statistics it calculates. Matt currently has a line of code like this:

findMarkers(sce.hpc, groups=sce.hpc$contrast,
    assay.type="logcounts", design=mod, test="t",
    direction="up", pval.type="all", full.stats=TRUE
)

Current code

I see that pairwiseTTests() at pairwiseTTests.R#L271 computes the information it needs for calculating the t-statistics using .get_t_test_stats() at pairwiseTTests.R#L303-L324. Then that information is used for computing the t-statistics and the p-values at pairwiseTTests.R#L276 using .run_t_test() pairwiseTTests.R#L431-L466 that itself calculates the t-statistic (cur.t) depending on some input options. For example, if thresh.lfc is 0, then it does it by dividing the log fold change by the square root of the error pairwiseTTests.R#L437. Currently, .run_t_test() doesn't return the cur.t value at the end at pairwiseTTests.R#L465.

Option 1: Currently compute t-statistic

One option currently for us, would be to use the p-value to compute the t-statistic.

Since we know that thresh.lfc option we are using and because we used direction = "up" (so pairwiseTTests.R#L447-L454 is not run), we could use stats::qt() to get the t-statistic from the p-value. We would need to run the internal function .get_t_test_stats() pairwiseTTests.R#L303-L324 to obtain the degrees of freedom.

Or maybe we missed a function in scran() that does this already.

Option 2: edit `scran` to support returning the statistics

Another option would be to edit scran such that .run_t_test() returns the cur.t at pairwiseTTests.R#L465, then also return it at STATFUN at pairwiseTTests.R#L285-L294 then potentially also edit .pairwise_blocked_template() utils_markers.R#L38-L51 (you know what would need to be edited :P).

But then, maybe you'd also want to return the corresponding statistic for the Wilcoxon tests or other tests supported by findMarkers() when full.stats = TRUE.

Let us know how you think it would be best to proceed. For example, I can try submitting a PR about the t-stats.

Best,
Leo

Add more options for marker discovery

Allow specification of an exclude= argument to complement restrict=.
Allow specification of a minimum number of significant comparisons with pval.type="any".
Allow specification of a minimum number of significant comparisons with pval.type="some".

mnnCorrect output generates error when calculating TSNE

Hi, I don't have much experience with scran, but I've recently been testing the use of mnnCorrect on some of the samples I have. Everything seems to work and I can generate figures up until I try to run TSNE calculations on the generated output. At that point I get the error message:
Error in svd(x, nu = 0, nv = k) : a dimension is zero

After some digging around, I haven't quite figured out how to solve this error message in this particular instance. I was just wondering if you've had experience with something like this before, and if I should be doing some extra filtering or normalization pre/post analysis to avoid this.

The general overview of the code I use is the following:

rawData <- Read10X(filepath1)
fixed <- SingleCellExperiment(assays = list(counts=as.matrix(rawData)), colData = list(Sample = rep('Fixed',dim(rawData)[2])))
fixed <- computeSumFactors(fixed)
fixed <- normalize(fixed)

rawData <- Read10X(filepath2)
mixed <- SingleCellExperiment(assays = list(counts=as.matrix(rawData)), colData = list(Sample = rep('Mixed',dim(rawData)[2])))
mixed <- computeSumFactors(mixed)
mixed <- normalize(mixed)

out <- mnnCorrect(fixed@assays$data$logcounts, mixed@assays$data$logcounts)

mixed@assays$data$mnnCounts <- out$corrected[[2]]
fixed@assays$data$mnnCounts <- out$corrected[[1]]
combined <- cbind(fixed, mixed)
combined <- runPCA(combined, exprs_values = "mnnCounts")
plotReducedDim(combined, use_dimred = "PCA",
colour_by = "Sample", shape_by = "Sample")

combined <- runTSNE(combined, exprs_values = "logcounts")
plotTSNE(combined, colour_by = "Sample", shape_by = "Sample")

Thanks

Add 'known doublet' mode to doubletCells()

Following up from our Slack convo:

any thoughts on a way to tell scran::doubletCells() 'i know these droplets are doublets, treat them accordingly'? or another way of approaching a dataset where i know some droplets are doublets (based on HTOs and genotypes) and want to use 'guilt by assocation' like suggested in https://osca.bioconductor.org/doublet-detection.html#doublet-detection-in-multiplexed-experiments?

problem with argument "span“ in trendVar function

Hi, Aaron
Now I want to use trendVar to indentify HVGs in scRNA data, but I'm a little confused with the "span" argument. As you know, different "span" values lead to substantial difference in the downstream analysis. Do you have any suggestions for how to decide the "span" value for specific dataset?
Looking forward for your reply!

Ask about trendVar step

Hi again, I met an error when running trendVar in identifying highly variable genes as below.
"Error in model.frame.default(formula = to.fit ~ kept.means, weights = c(1_EpiAPST2 = 59L, :
variable lengths differ (found for 'kept.means')"
If set use.spikes=FALSE then no error.
My questions are:

How to fix the error?
If it is a big differece use or don´t use spick-in here?
Thank you very much!

Default sigma change in mnnCorrect

Hi,

The default value of sigma has changed from 1.0 to 0.1, but the documentation still says:

The choice of sigma determines the extent of smoothing - a value of 1 is used by default to reflect the boundaries of the space after cosine normalization

What's the reason behind the change? Is 0.1 a better parameter for most use cases? Thanks!

Streamline DE machinery

Bring pairwiseWilcox() and pairwiseBinom() under the auspices of findMarkers()
Deprecate overlapExprs(), which is basically just findMarkers() for pairwiseWilcox() anyway.
Add a way to quickly get the top markers for use in marker detection in SingleR.

Comparing modelGeneVar with old function trendVar

Hi.
Thanks to your package, I've been normalized sc-RNA seq data and found HVGs for years.
Now, I'm wondering the difference between updated function 'modelGeneVar' and old function trendVar.
The results between two functions are quiet different.
Also, I cannot find fitting method like loess and splind in modelGeneVar.
Is there any way to see same results with trendVar from modelGeneVar?
And what is the major difference between two functions?

doubletCluster error: invalid rownames length

Hi，

I'm using doubletCluster to identify potential doublets but having this error:
dbl.out <- doubletCluster(sce, sce$Cluster)
Error in rownames<-(*tmp*, value = c("AL627309.1", "CICP27", "AL627309.7", :
invalid rownames length
This error does not show up when I run doubletCluster using the following example?
https://rdrr.io/github/MarioniLab/scran/man/doubletCluster.html
Anyone can help?

Best,
Danshu

computeSumFactors

Error in "Modeling the technical noise in gene expression" with trendVar()

Hello,

I am reproducing the pipeline in my RStudio (v1.2.1335) (R 3.6) called "Analyzing single-cell RNAseq data containing read counts" and at Section 7 "Modeling the technical noise in gene expression" I get the following error though I executed all earlier commands necessary for trendVar, including computeSpikeFactors() and normalize(), without any hassle. That's indeed the first first bug I found the in the running pipeline so everything worked out pretty well up until now.

var.fit <- trendVar(sce, parametric=TRUE, block=sce$Plate,loess.args=list(span=0.3))
Error in seq_len(nrow(x)) :
argument must be coercible to non-negative integer
In addition: Warning message:
In seq_len(nrow(x)) : first element used of 'length.out' argument

The sce object looks OK, too. I am guessing and NAs or zeros might be the leading cause but do not know how to fix this. Thank you so very much for the time and the effort!

problem about "BOOST_CONSTEXPR" while installing the scran package

I'm trying to install scran with command:
BiocManager::install("scran")

And then this ERROR came out:

In file included from /xxx/xxx/rpackages/BH/include/boost/utility.hpp(22),
                 from /xxx/xxx/rpackages/BH/include/boost/range/size.hpp(25),
                 from /xxx/xxx/rpackages/BH/include/boost/random/hyperexponential_distribution.hpp(29),
                 from /xxx/xxx/rpackages/BH/include/boost/random.hpp(69),
                 from rand_custom.h(6),
                 from compute_rho_null.cpp(4):
/xxx/xxx/rpackages/BH/include/boost/core/noncopyable.hpp(42): error: defaulted default constructor cannot be constexpr because the corresponding implicitly declared default constructor would not be constexpr
        BOOST_CONSTEXPR noncopyable() = default;
                        ^

compilation aborted for compute_rho_null.cpp (code 2)
make: *** [compute_rho_null.o] Error 2
ERROR: compilation failed for package ‘scran’

I tried reinstalling the 'BH' pacakge and it didn't solve the problem.
I really don't know how to solve the problem.

My R version is
R version 3.6.0 (2019-04-26) -- "Planting of a Tree"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

Many thanks for any suggestions.

Copy-paste typo in modelGeneVarByPoisson docs

The title and description of ?modelGeneVarByPoisson makes reference to using spike-ins, but this function is intended for use when there aren't spike-ins available.
I'm guessing it's a copy-paste typo from ?modelGeneVarWithSpikes.

Unexpected error when using fastMNN

Thanks for your advice in the last question. We used the pipeline in
pipeline
All are perfect in the first time, but when we use other gene as variable genes, error appears.

mnn.out <- do.call(fastMNN, c(original, list(k=50, d=50, approximate=TRUE, auto.order=TRUE)))
Error in (function (A, nv = 5, nu = nv, maxit = 100, work = nv + 7, reorth = TRUE,  :
  BLAS/LAPACK routine 'DLASCL' gave error code -4

Thanks & Best wishes

Error with mnnCorrect: Not compatible with requested type

First of all thank you very much for this so helpful package!

I am encountering an error when using the newly developed mnnCorrect function for correcting batch effects.
Please find below the error message when applying this function to two datasets generated with 10x Genomics technology (capture of cells at two timepoints):

> res <- mnnCorrect(d0, d5)
Error in compute.correction.vectors(ref.batch.in, other.batch.in, s1,  : 
  Not compatible with requested type: [type=S4; target=double].

Here are some details on the input objects:

# d0 converted to CellDataset object (generated with monocle)
> class(d0)
[1] "dgCMatrix"
attr(,"package")
[1] "Matrix"
> class(d5)
[1] "dgCMatrix"
attr(,"package")
[1] "Matrix"
# CellDataset objects
> d0_cells
CellDataSet (storageMode: environment)
assayData: 33694 features, 872 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: AACACGTAGAAACGCC-1 AACGTTGGTCTTCAAG-1 ...
    TTTGTCATCTGCTGCT-3 (872 total)
  varLabels: barcode Size_Factor
  varMetadata: labelDescription
featureData
  featureNames: ENSG00000243485 ENSG00000237613 ... ENSG00000268674
    (33694 total)
  fvarLabels: id gene_short_name
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
> d5_cells
CellDataSet (storageMode: environment)
assayData: 33694 features, 3239 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: AAACCTGAGCCATCGC-1 AAACCTGAGCTGTCTA-1 ...
    TTTGTCATCACAACGT-6 (3239 total)
  varLabels: barcode Size_Factor
  varMetadata: labelDescription
featureData
  featureNames: ENSG00000243485 ENSG00000237613 ... ENSG00000268674
    (33694 total)
  fvarLabels: id gene_short_name
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:

Below are R session informations. Thank you very much in advance for your help

R Under development (unstable) (2018-01-13 r74117)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] scran_1.7.12               SingleCellExperiment_1.1.2
 [3] SummarizedExperiment_1.9.5 DelayedArray_0.5.16       
 [5] matrixStats_0.52.2         Biobase_2.39.1            
 [7] GenomicRanges_1.31.3       GenomeInfoDb_1.15.1       
 [9] IRanges_2.13.10            S4Vectors_0.17.21         
[11] BiocGenerics_0.25.1        BiocParallel_1.13.1       
[13] BiocInstaller_1.29.3      

loaded via a namespace (and not attached):
 [1] bitops_1.0-6           DDRTree_0.1.5          bit64_0.9-7           
 [4] RColorBrewer_1.1-2     progress_1.1.2         httr_1.3.1            
 [7] dynamicTreeCut_1.63-1  tools_3.5.0            DT_0.2                
[10] R6_2.2.2               irlba_2.3.2            vipor_0.4.5           
[13] DBI_0.7                lazyeval_0.2.1         colorspace_1.3-2      
[16] gridExtra_2.3          prettyunits_1.0.2      bit_1.1-12            
[19] compiler_3.5.0         slam_0.1-42            scales_0.5.0          
[22] stringr_1.2.0          digest_0.6.14          XVector_0.19.5        
[25] scater_1.7.3           pkgconfig_2.0.1        htmltools_0.3.6       
[28] limma_3.35.5           htmlwidgets_0.9        rlang_0.1.6           
[31] RSQLite_2.0            VGAM_1.0-4             FNN_1.1               
[34] shiny_1.0.5            bindr_0.1              combinat_0.0-8        
[37] dplyr_0.7.4            RCurl_1.95-4.10        magrittr_1.5          
[40] GenomeInfoDbData_1.1.0 Matrix_1.2-12          Rhdf5lib_1.1.5        
[43] Rcpp_0.12.14           ggbeeswarm_0.6.0       munsell_0.4.3         
[46] viridis_0.4.1          stringi_1.1.6          edgeR_3.21.5          
[49] zlibbioc_1.25.0        rhdf5_2.23.4           Rtsne_0.13            
[52] plyr_1.8.4             grid_3.5.0             blob_1.1.0            
[55] ggrepel_0.7.0          shinydashboard_0.6.1   lattice_0.20-35       
[58] splines_3.5.0          locfit_1.5-9.1         pillar_1.1.0          
[61] igraph_1.1.2           rjson_0.2.15           reshape2_1.4.3        
[64] biomaRt_2.35.6         XML_3.98-1.9           glue_1.2.0            
[67] data.table_1.10.4-3    httpuv_1.3.5           gtable_0.2.0          
[70] assertthat_0.2.0       ggplot2_2.2.1          mime_0.5              
[73] xtable_1.8-2           monocle_2.7.0          qlcMatrix_0.9.5       
[76] HSMMSingleCell_0.113.0 viridisLite_0.2.0      tibble_1.4.1          
[79] pheatmap_1.0.8         AnnotationDbi_1.41.4   beeswarm_0.2.3        
[82] memoise_1.1.0          tximport_1.7.4         bindrcpp_0.2          
[85] cluster_2.0.6          fastICA_1.2-1          densityClust_0.3      
[88] statmod_1.4.30

multiBatchNorm with subset.row

Hi,
I am using multiBatchNorm function on two SingleCellExperiment objects with different numbers of rows. I want to use the subset.row argument to specify the common rows between the datasets to use for the normalization, but I keep getting an error that the number of rows is different between the dataset.

`> sce_cptp28_test
class: SingleCellExperiment
dim: 11745 1552
metadata(0):
assays(1): counts
rownames(11745): ENSG00000237491 ENSG00000225880 ... ENSG00000278817
ENSG00000271254
rowData names(0):
colnames(1552): AAACGGGCATGCAATC AAACGGGTCGAATGCT ... TTTGTCATCAAGATCC
TTTGTCATCCTATGTT
colData names(0):
reducedDimNames(0):
spikeNames(0):

sce_cptp29_test
class: SingleCellExperiment
dim: 12200 2009
metadata(0):
assays(1): counts
rownames(12200): ENSG00000237491 ENSG00000225880 ... ENSG00000278384
ENSG00000271254
rowData names(0):
colnames(2009): AAACCTGCAAGTAGTA AAACCTGCATCTATGG ... TTTGTCAGTTATGCGT
TTTGTCAGTTGCTCCT
colData names(0):
reducedDimNames(0):
spikeNames(0):
str(var_genes)
chr [1:2000] "ENSG00000143546" "ENSG00000115523" "ENSG00000101439" ...
rescaled_test <- multiBatchNorm(sce_cptp28_test, sce_cptp29_test, subset.row = var_genes)
Error in .check_batch_consistency(batches, byrow = TRUE) :
number of rows is not the same across batches
`

Error in .check_batch_consistency(batches, byrow = TRUE) : number of rows is not the same across batches

Hi, Aaron
When I run the fastMNN on my sampls( which from 10x), it keep complain the error "number of rows is not the same across batches". May you help me with that?
Best,
Yong

pathlist <- c("pig_v94/filtered_p/Adult_nc_r1_p", "pig_v94/filtered_p/Adult_r1_p" )
pathlist <-pathlist[1:2]
library(DropletUtils)
library(scran)
library(SingleCellExperiment)
dalist <- list()
davar <- list()
batch <- c()
aliases <- c()
if (is.null(aliases)) aliases <- sub(".*/","", pathlist)
for( i in 1:length(pathlist)){
newite <- read10xCounts(pathlist[i], col.names=T)
colnames(newite) <- paste(aliases[i],colnames(newite),sep="_")
logcounts(newite) <- counts(newite)
dec <- tryCatch(rownames(decomposeVar(newite, trendVar(newite, parametric=T, use.spikes = FALSE))),error=function(cond){return(c())})
dalist <- c(dalist, newite)
davar <- c(davar, dec)
batch <- c(batch, rep(aliases[i], ncol(counts(newite))))
}
Warning messages:
1: In (function (formula, data = parent.frame(), start, control = nls.control(), :
singular gradient
2: In (function (formula, data = parent.frame(), start, control = nls.control(), :
singular gradient
dalist <- do.call(multiBatchNorm, dalist)
Warning messages:
1: In .batch_rescaler(batches, subset.row = nonspike.subset, exprs_values = assay.type, :
no size factors in batch 1, using sum of counts instead
2: In .batch_rescaler(batches, subset.row = nonspike.subset, exprs_values = assay.type, :
no size factors in batch 2, using sum of counts instead
davar <- sort(table(as.character(davar)))
if (length(davar) > 2000) davar <- names(davar)[1:2000]
else davar <- names(davar)
mnninput <- list();
data <- list()
for( i in 1:length(pathlist)){
mnninput <- c(mnninput, list(as.matrix(logcounts(dalist[[i]]))[davar,]))
data <- c(data, (logcounts(dalist[[i]])))
print(dim(mnninput[[i]]))
}
[1] 2000 2552
[1] 2000 737
mnnout <- do.call(fastMNN, c(dalist, list(k=20,d=50,approximation=T,subset.row=davar)))
Error in .check_batch_consistency(batches, byrow = TRUE) :
number of rows is not the same across batches

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] DropletUtils_1.2.1 scater_1.10.0 ggplot2_3.1.0 scran_1.10.1 SingleCellExperiment_1.4.0
[6] SummarizedExperiment_1.12.0 DelayedArray_0.8.0 matrixStats_0.54.0 Biobase_2.42.0 GenomicRanges_1.34.0
[11] GenomeInfoDb_1.18.0 IRanges_2.16.0 S4Vectors_0.20.0 BiocGenerics_0.28.0 BiocParallel_1.16.0

loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 locfit_1.5-9.1 lattice_0.20-35 digest_0.6.18 assertthat_0.2.0
[6] R6_2.3.0 plyr_1.8.4 dynamicTreeCut_1.63-1 pillar_1.3.0 zlibbioc_1.28.0
[11] rlang_0.3.0.1 lazyeval_0.2.1 irlba_2.3.2 rstudioapi_0.8 Matrix_1.2-14
[16] labeling_0.3 BiocNeighbors_1.0.0 statmod_1.4.30 Rtsne_0.13 stringr_1.3.1
[21] igraph_1.2.2 RCurl_1.95-4.11 munsell_0.5.0 HDF5Array_1.10.0 compiler_3.5.1
[26] vipor_0.4.5 pkgconfig_2.0.2 ggbeeswarm_0.6.0 tidyselect_0.2.5 tibble_1.4.2
[31] gridExtra_2.3 GenomeInfoDbData_1.2.0 edgeR_3.24.0 viridisLite_0.3.0 crayon_1.3.4
[36] dplyr_0.7.8 withr_2.1.2 bitops_1.0-6 grid_3.5.1 gtable_0.2.0
[41] magrittr_1.5 scales_1.0.0 stringi_1.2.4 XVector_0.22.0 reshape2_1.4.3
[46] viridis_0.5.1 bindrcpp_0.2.2 limma_3.38.2 DelayedMatrixStats_1.4.0 cowplot_0.9.3
[51] Rhdf5lib_1.4.0 tools_3.5.1 glue_1.3.0 beeswarm_0.2.3 purrr_0.2.5
[56] colorspace_1.3-2 rhdf5_2.26.0 bindr_0.1.1

Error: Not compatible with requested type: [type=character; target=double].

findMarkers test.type functionality broke

I think the test.type functionality of findMarkers() has broke.

When I try running my former code after updating the package I now get the following error message: Error in .local(x, ...) : unused argument (test.type = "wilcox")

'clusters' or 'cluster' in computeSumFactor

Hi,

Just a very naive question.
In the reference manual of scran, computeSumFactors has one argument named clusters. However, in almost all of the documents/tutorials, people use cluster instead of clusters.

For instance:

Are cluster and clusters interchangeable in computeSumFactors?

Thanks,
Yi-Chien (Bart).

NaNs in mnnCorrect output matrix

Hi,
I found that in corrected batch2 data, most of values are NaN. Here are my codes for running mnnCorrect. In logcounts(sce1) and logcounts(sce2), there are no NAs or infinite values.

fit1 <- trendVar(sce1, parametric=TRUE,use.spikes=FALSE)
dec1 <- decomposeVar(sce1, fit1)
fit2 <- trendVar(sce2, parametric=TRUE,use.spikes=FALSE)
dec2 <- decomposeVar(sce2, fit2)
universe <- intersect(rownames(dec1), rownames(dec2))
mean.bio <- (dec1[universe,"bio"] + dec2[universe,"bio"])/2
chosen <- universe[mean.bio > 0]
original <- list(batch1=logcounts(sce1)[chosen,],batch2=logcounts(sce2)[chosen,])
mnn.out <- do.call(mnnCorrect, c(original, list(k=20,cos.norm.in=FALSE,cos.norm.out=FALSE)))
rmat <- do.call(cbind, list(counts1=counts(sce1)[chosen,],counts2=counts(sce2)[chosen,]))
mnndat <- do.call(cbind, list(mnn1=mnn.out$corrected[[1]],mnn2=mnn.out$corrected[[2]]))
sce <- SingleCellExperiment(list(counts=rmat,logcounts=mnndat))

Best,
danshu

is sizeFactorNames being removed now in scran?

sce = computeSumFactors(sce)
sce = normalize(sce)
var.fit = trendVar(sce, method="loess", use.spikes=TRUE)

you will have errors:

Error in sizeFactorNames(object) : could not find function "sizeFactorNames"

in normalize, add centre_size_factors = FALSE avoid the error but you still have the same error when running trendVar. I think it is because trendVar check the size factors by .check_centered_SF(x, assay.type=assay.type).

Error in denoisePCA function

An error occurs while running denoisePCA function.

Error in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth = TRUE, :
BLAS/LAPACK routine 'DLASCL' gave error code -4

Did anybody have the similar issue before?

object ‘areSizeFactorsCentred’ is not exported by 'namespace:scater'

I have installed all the library successfully and they are also the latest version. However, I have this issue with load scran failed.

library(DropletUtils)
library(SingleCellExperiment)
library(scater)
library(SC3)
library(scran)
Error: package or namespace load failed for ‘scran’:
object ‘areSizeFactorsCentred’ is not exported by 'namespace:scater'
Best,
Yong

Can I cluster cells before normalization with PCA + kmeans?

Hi,

I noticed that the manual of quickCluster recommends using the igraph method for dataset with large amount of cells. PCA + kmeans is a very fast clustering method even for large dataset. I would like to known is it appropriate to cluster cells with PCA + kmeans (Will the performance of scran be considerably affected)?

Thanks in advance.

porblem using denoisePCA

I am trying to follow the tutorial

https://master.bioconductor.org/packages/release/workflows/vignettes/simpleSingleCell/inst/doc/work-3-tenx.html

everything works fine except when I ran this line

sce <- denoisePCA(sce, technical=new.trend, approximate=TRUE)

Giving me this error

Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
object 'get_default_block_maxlength' not found

Can you help me pleae?

Thanks

**R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=es_ES.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=es_ES.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] scran_1.10.2 EnsDb.Hsapiens.v86_2.99.0 ensembldb_2.5.3 AnnotationFilter_1.5.2
[5] GenomicFeatures_1.34.1 AnnotationDbi_1.44.0 scater_1.10.1 ggplot2_3.1.0
[9] dplyr_0.8.0.1 BiocFileCache_1.6.0 dbplyr_1.3.0 DropletUtils_1.2.2
[13] SingleCellExperiment_1.4.1 SummarizedExperiment_1.12.0 DelayedArray_0.8.0 matrixStats_0.54.0
[17] Biobase_2.42.0 GenomicRanges_1.34.0 GenomeInfoDb_1.18.2 IRanges_2.16.0
[21] S4Vectors_0.20.1 BiocGenerics_0.28.0 BiocParallel_1.16.6

loaded via a namespace (and not attached):
[1] ProtGenerics_1.14.0 bitops_1.0-6 bit64_0.9-7 progress_1.2.0
[5] httr_1.3.1 dynamicTreeCut_1.63-1 tools_3.5.2 irlba_2.3.3
[9] R6_2.4.0 HDF5Array_1.10.1 vipor_0.4.5 DBI_1.0.0
[13] lazyeval_0.2.1 colorspace_1.4-0 withr_2.1.2 tidyselect_0.2.5
[17] gridExtra_2.3 prettyunits_1.0.2 bit_1.1-14 curl_3.3
[21] compiler_3.5.2 BiocNeighbors_1.0.0 rtracklayer_1.41.3 labeling_0.3
[25] scales_1.0.0 rappdirs_0.3.1 stringr_1.4.0 digest_0.6.18
[29] Rsamtools_1.33.3 XVector_0.22.0 pkgconfig_2.0.2 limma_3.38.3
[33] rlang_0.3.1 rstudioapi_0.9.0 RSQLite_2.1.1 DelayedMatrixStats_1.3.4
[37] RCurl_1.95-4.12 magrittr_1.5 GenomeInfoDbData_1.2.0 Matrix_1.2-16
[41] Rcpp_1.0.0 ggbeeswarm_0.6.0 munsell_0.5.0 Rhdf5lib_1.4.2
[45] viridis_0.5.1 stringi_1.4.3 yaml_2.2.0 edgeR_3.24.3
[49] zlibbioc_1.28.0 rhdf5_2.26.2 plyr_1.8.4 grid_3.5.2
[53] blob_1.1.1 crayon_1.3.4 lattice_0.20-38 cowplot_0.9.4
[57] Biostrings_2.50.2 hms_0.4.2 locfit_1.5-9.1 pillar_1.3.1
[61] igraph_1.2.4 reshape2_1.4.3 biomaRt_2.37.3 XML_3.98-1.19
[65] glue_1.3.1 gtable_0.2.0 purrr_0.3.1 assertthat_0.2.0
[69] viridisLite_0.3.0 tibble_2.0.1 GenomicAlignments_1.17.3 beeswarm_0.2.3
[73] memoise_1.1.0 statmod_1.4.30**

References to `normalize`

Scattered references throughout docs to (defunct?) function normalize(), for example in multiBatchNorm() function docs

MNN correction generates NaN or Inf values

I found that after MNN correction , some genes were assign NaN or Inf values. Such values impeded further MNN correction ( in the nested setting ).
It seems that such NaN or Inf values are generates by zero counts genes. I am wondering that what is the reasonable way to handle these NaN or Inf values ?

modelGeneVar function missing

Good afternoon,

I'm trying to follow the workflow from http://bioconductor.org/packages/devel/bioc/vignettes/batchelor/inst/doc/correction.html for batch correcting two SCE. I have scRNAseq, scran, scater, and batchelor installed and loaded but I cannot find the modelGeneVar function to run! I've checked that this function exists from Scran (https://rdrr.io/github/MarioniLab/scran/src/R/modelGeneVar.R) and I tried to install from github for the latest version but it ends with:

`** R
** inst
** byte-compile and prepare package for lazy loading
Error: object ‘bsparam’ is not exported by 'namespace:BiocSingular'
Execution halted
ERROR: lazy loading failed for package ‘scran’

removing ‘/home/brian/R/x86_64-pc-linux-gnu-library/3.6/scran’
restoring previous ‘/home/brian/R/x86_64-pc-linux-gnu-library/3.6/scran’
Error: Failed to install 'scran' from GitHub:
(converted from warning) installation of package ‘/tmp/RtmpKn8SFk/filecd82737ce8f/scran_1.13.30.tar.gz’ had non-zero exit status
`
I'm working with an older version of HDF5Array (1.10.1) because Installing 1.12.2 is having compiler problems, hope this isn't causing the issue.

`gcc: error: "/home/brian/R/x86_64-pc-linux-gnu-library/3.6/Rhdf5lib/lib/libhdf5.a": No such file or directory
gcc: error: "/home/brian/R/x86_64-pc-linux-gnu-library/3.6/Rhdf5lib/lib/libsz.a": No such file or directory
/usr/share/R/share/make/shlib.mk:6: recipe for target 'HDF5Array.so' failed
make: *** [HDF5Array.so] Error 1
ERROR: compilation failed for package ‘HDF5Array’

removing ‘/home/brian/R/x86_64-pc-linux-gnu-library/3.6/HDF5Array’
restoring previous ‘/home/brian/R/x86_64-pc-linux-gnu-library/3.6/HDF5Array’
Error: Failed to install 'scran' from GitHub:
(converted from warning) installation of package ‘HDF5Array’ had non-zero exit status
`

Session Info is:
`> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding

locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
[7] LC_PAPER=en_CA.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] BiocSingular_1.0.0 batchelor_1.0.1
[3] scRNAseq_1.10.0 scater_1.12.2
[5] ggplot2_3.2.1 scran_1.12.1
[7] SingleCellExperiment_1.6.0 SummarizedExperiment_1.14.1
[9] DelayedArray_0.10.0 BiocParallel_1.18.1
[11] matrixStats_0.55.0 Biobase_2.44.0
[13] GenomicRanges_1.36.1 GenomeInfoDb_1.20.0
[15] IRanges_2.18.3 S4Vectors_0.22.1
[17] BiocGenerics_0.30.0 Rhdf5lib_1.6.2
[19] Seurat_3.1.1

loaded via a namespace (and not attached):
[1] ggbeeswarm_0.6.0 Rtsne_0.15 colorspace_1.4-1
[4] ggridges_0.5.1 dynamicTreeCut_1.63-1 XVector_0.24.0
[7] BiocNeighbors_1.2.0 leiden_0.3.1 listenv_0.7.0
[10] npsurv_0.4-0 remotes_2.1.0 ggrepel_0.8.1
[13] codetools_0.2-16 splines_3.6.1 R.methodsS3_1.7.1
[16] lsei_1.2-0 zeallot_0.1.0 jsonlite_1.6
[19] ica_1.0-2 cluster_2.1.0 png_0.1-7
[22] R.oo_1.22.0 uwot_0.1.4 sctransform_0.2.0
[25] BiocManager_1.30.7 compiler_3.6.1 httr_1.4.1
[28] dqrng_0.2.1 backports_1.1.5 assertthat_0.2.1
[31] Matrix_1.2-17 lazyeval_0.2.2 limma_3.40.6
[34] htmltools_0.4.0 tools_3.6.1 rsvd_1.0.2
[37] igraph_1.2.4.1 gtable_0.3.0 glue_1.3.1
[40] GenomeInfoDbData_1.2.1 RANN_2.6.1 reshape2_1.4.3
[43] dplyr_0.8.3 Rcpp_1.0.2 vctrs_0.2.0
[46] gdata_2.18.0 ape_5.3 nlme_3.1-141
[49] DelayedMatrixStats_1.6.1 gbRd_0.4-11 lmtest_0.9-37
[52] stringr_1.4.0 globals_0.12.4 lifecycle_0.1.0
[55] irlba_2.3.3 gtools_3.8.1 statmod_1.4.32
[58] future_1.14.0 edgeR_3.26.8 zlibbioc_1.30.0
[61] MASS_7.3-51.4 zoo_1.8-6 scales_1.0.0
[64] RColorBrewer_1.1-2 curl_4.2 reticulate_1.13
[67] pbapply_1.4-2 gridExtra_2.3 stringi_1.4.3
[70] caTools_1.17.1.2 bibtex_0.4.2 Rdpack_0.11-0
[73] SDMTools_1.1-221.1 rlang_0.4.0 pkgconfig_2.0.3
[76] bitops_1.0-6 lattice_0.20-38 ROCR_1.0-7
[79] purrr_0.3.2 htmlwidgets_1.5.1 cowplot_1.0.0
[82] tidyselect_0.2.5 RcppAnnoy_0.0.13 plyr_1.8.4
[85] magrittr_1.5 R6_2.4.0 gplots_3.0.1.1
[88] withr_2.1.2 pillar_1.4.2 fitdistrplus_1.0-14
[91] survival_2.44-1.1 RCurl_1.95-4.12 tibble_2.1.3
[94] future.apply_1.3.0 tsne_0.1-3 crayon_1.3.4
[97] KernSmooth_2.23-16 plotly_4.9.0 viridis_0.5.1
[100] locfit_1.5-9.1 grid_3.6.1 data.table_1.12.4
[103] metap_1.1 digest_0.6.21 tidyr_1.0.0
[106] R.utils_2.9.0 RcppParallel_4.4.4 munsell_0.5.0
[109] beeswarm_0.2.3 viridisLite_0.3.0 vipor_0.4.5 `

Any assistance in being able to use the modelGeneVar would be greatly appreciated!

Best,
-Frances

Installation scran package for R version 3.5.0

Dear Marioni's group,

I tried to install package scran and test the function fastMNN using the guide in this page:
https://rdrr.io/github/MarioniLab/scran/
install.packages("devtools")
library(devtools)
install_github("MarioniLab/scran")

However, I used Rstudio-server and R version 3.5.0. And it failed
to install this package, the log file is:
package ‘kmknn’ is not available (for R version 3.5.0)
package ‘scran’ is not available (for R version 3.5.0)

Have you ever tested your package using R version 3.5.x?
Thanks a lot,
Best Regards,
Hoa Tran

Finding C++ code for cxx_smooth_gaussian_kernel

Apologies as I am not a regular R user. But I am having a hard time finding the code for cxx_smooth_gaussian_kernel which you use in the compute.correction.vectors function of mnnCorrect.R. It seems that it may be registered in the init.cpp file:

./src/init.cpp:57: REGISTER(smooth_gaussian_kernel, 4),
but that still doesn't help me find the actual code. Thanks!

Add type="jaccard" to BuildSNNGraph

Would it be possible to add an option to weight edges in the SNN graph by the Jaccard index between the two sets of nearest neighbours? This would replicate how Seurat builds its SNN graph.

Exporting the batch corrected data

Hi,

I am trying the MNN approach following the tutorial at http://bioconductor.riken.jp/packages/release/workflows/vignettes/simpleSingleCell/inst/doc/work-5-mnn.html

I would like to know how can I export the batch corrected expression values from sce. Are sce@assays$data$logcounts the batch corrected values ?

Error with computeSumFactors

When QC'ing my 10X genomics scRNA-seq data, I've followed the tutorial as found here: https://f1000researchdata.s3.amazonaws.com/manuscripts/10712/3370b003-e980-4395-b641-daad385c65d3_9501_-__aaron_lun_v2.pdf?doi=10.12688/f1000research.9501.2

Everything proceeds normally until the normalization step and then receive the following error

Error in .local(x, ...) : unknown SEXP type for SCESet object

I'm currently using R v3.4.1 and have downloaded/installed everything from stable releases of scater and scran. Is this an issue of the C++ underneath returning something that the R v3.4.2 stable release version of SingleCellExperiment or scater (I'm creating a newSCESet with the count matrix)?

Thanks for your time!

fastMNN compute.variances

In the fastMNN function description I read:

If compute.variances=TRUE, the function will compute the percentage of variance that is parallel to the average correction vectors at each merge step.

As I understand it, for n batches I should get n-1 percentages as there are n-1 merge steps. For example, for 2 batches there is a single merge step and I should get 1 percentage. What I get from the function though is n percentages, that is 2 percentages in the case of 2 batches.

Will you please explain this?

Convenience function for marker selection - unique global markers

Some sort of function to do above could be implemented that looks at pairwise lfc's to pick the top "globally unique" markers, to get nice heatmaps in less tries.

Something like this is what I currently do:

## after running findMarkers ...

## Select unique-ish markers
top_markers <- lapply(markers, function(x, lfc_threshold, leniency) {
    ## lfc_threshold = min lfc per cluster
    ## leniency = number of clusters allowed to not meet lfc_threshold
    cols <- grepl('logFC', colnames(x))
    mat <- apply(x[, cols], 2, function(f) { f > 1.25 })
    rowsums <- apply(mat, 1, sum)
    filt <- rowsums > sum(cols) - 3
    rownames(x)[filt]
})
    
top_markers <- sort(unique(unlist(top_markers)))
top_markers <- top_markers[!grepl('MT-|^RP', top_markers)]

Try out the bootstrapping function

Note to self: add a draft function for bootstrapping based on the idea in SingleCellThoughts. Try not to use mean mapping, but instead use the number of co-clustered cells between pairs of clusters.

makeTechTrend() identified many housekeeping/ribosome genes as variable genes

Hi,
Following the new updated simpleSingleCell workflow , I use makeTechTrend() to fit mean-variance trend for my 10X data. However, after decomposeVar() from the new fit, many housekeeping genes, which always show highest expression, are detected as variable genes with very large biological component.
I wonder is there any way to avoid this ? Or should I trust this result ?
The housekeeping gene list I used are sourced from http://science.sciencemag.org/content/352/6282/189

hk.genes <- c("ACTB
B2M
HNRPLL
HPRT
PSMB2
PSMB4
PPIA
PRPS1
PRPS1L1
PRPS1L3
PRPS2
PRPSAP1
PRPSAP2
RPL10
RPL10A
RPL10L
RPL11
RPL12
RPL13
RPL14
RPL15
RPL17
RPL18
RPL19
RPL21
RPL22
RPL22L1
RPL23
RPL24
RPL26
RPL27
RPL28
RPL29
RPL3
RPL30
RPL32
RPL34
RPL35
RPL36
RPL37
RPL38
RPL39
RPL39L
RPL3L
RPL4
RPL41
RPL5
RPL6
RPL7
RPL7A
RPL7L1
RPL8
RPL9
RPLP0
RPLP1
RPLP2
RPS10
RPS11
RPS12
RPS13
RPS14
RPS15
RPS15A
RPS16
RPS17
RPS18
RPS19
RPS20
RPS21
RPS24
RPS25
RPS26
RPS27
RPS27A
RPS27L
RPS28
RPS29
RPS3
RPS3A
RPS4X
RPS5
RPS6
RPS6KA1
RPS6KA2
RPS6KA3
RPS6KA4
RPS6KA5
RPS6KA6
RPS6KB1
RPS6KB2
RPS6KC1
RPS6KL1
RPS7
RPS8
RPS9
RPSA
TRPS1
UBB
")

hk.genes <- strsplit(hk.genes, "\n")[[1]]
keep <- rowSums(scater::calcIsExprs(sce, detection_limit = 0, exprs_values = "counts")) >= 30
sce <- sce[keep,]
new.trend <- makeTechTrend(x=sce)
fit <- trendVar(sce.sub, loess.args = list(span = .05), design = NULL, min.mean = 0.05, use.spikes = F)
fit$trend <- new.trend

dec <- decomposeVar(fit=fit)
 plot(dec$mean, dec$total, pch=16, cex=0.6, xlab="Mean log-expression",ylab="Variance of log-expression")
  points(dec[hk.genes, "mean"], dec[hk.genes, "total"], col="red", pch=16)
  curve(fit$trend(x), col="dodgerblue", add=TRUE, lwd=2)
top.dec <- dec[order(dec$bio, decreasing = T),]
as.data.frame(top.dec[rownames(top.dec) %in% hk.genes, ])

Most of data are changed into NAN

Dear John C Marioni
Our team is doing research related to umbilical cord blood. When we tried to use MNNs to remove the batch effect in our RNA data from single-cell RNA sequencing(10X Genomics),we found that more than 90% of data were changed into NAN. Then we changed k value and used variant genes to subset the rows, but this issue still exists. Here are the code.

##first part
library(dplyr)
library(Seurat)
exp=readRDS("ucb1.exp")
exp.db.matrix<-matrix(as.double(as.matrix(exp)), nrow = nrow(exp))
ucb1<-CreateSeuratObject(raw.data = exp.db.matrix,is.expr = 0.8, min.cells = 30, min.genes =400, project = 'UCB')
mito.genes <- grep(pattern = "^MT-", x = rownames(x = ucb1@data), value = TRUE)
percent.mito <- Matrix::colSums(ucb1@raw.data[mito.genes, ])/Matrix::colSums(ucb1@raw.data)
ucb1 <- AddMetaData(object = ucb1, metadata = percent.mito, col.name = "percent.mito")
VlnPlot(object = ucb1, features.plot = c("nGene", "nUMI", "percent.mito"), nCol = 3, group.by = "orig.ident")
ucb1<- FilterCells(object = ucb1, subset.names = c("nGene", "percent.mito"),
                   low.thresholds = c(200, -Inf), high.thresholds = c(4000, 0.1))
Cont_MFs2 <- as.matrix(ucb1@data)
write.table(Cont_MFs2,file = "fliter.1.csv")
##second part
library(scran)
library(FNN)
Cont_MFs2 <- read.table("fliter.1.csv")
Cont_MFs2=as.matrix(Cont_MFs2)
Cont_sce <- SingleCellExperiment(assays = list(counts = Cont_MFs2))
Cont_sce <- scran::computeSumFactors(Cont_sce)
Cont_sce <- normalize(Cont_sce)
Cont_sce_norm <- as.matrix(exprs(Cont_sce))
Cont_norm_nolog <- (2^(Cont_sce_norm)) -1
Cont_norm_ln <- log(Cont_norm_nolog + 1)
save(Cont_norm_ln,file = "f1_normalize.Robj")
##final part
name1= load("f1_normalize.Robj")
name2=load("f2_normalize.Robj")
ref=get(name1)
ref=as.matrix(ref)
data=get(name2)
genef=read.table("var.txt")
vargene=genef[,1]
name=row.names(ref)
t1=data.frame(gene=name)
data1=as.data.frame(data)
data1$gene=row.names(data)
mer1=merge(t1,data1,by='gene',all=T)
mer1[is.na(mer1)] <- 0
mer2 <- mer1[,-1]
row.names(mer2) <- as.character(mer1$gene)
data<- as.matrix(mer2)
Xmnn <- scran::mnnCorrect(ref,data, k=37, sigma=0.1, cos.norm.in=FALSE, cos.norm.out=FALSE, var.adj=TRUE,subset.row=vargene)
saveRDS(Xmnn$corrected[[2]],"XmnnCorrected.2.rds")

Best regards!

object ‘normalizeCounts’ is not exported by 'namespace:scater'

Hi I got the below error message when load scran package:

No methods found in package ‘DelayedArray’ for request: ‘write_block_to_sink’ when loading ‘DelayedMatrixStats’
Error: package or namespace load failed for ‘scran’:
 object ‘normalizeCounts’ is not exported by 'namespace:scater'

Any suggestions?

findMarkers: Error in FUN(...) : class name has no 'package' attribute

I am pretty new to R and have no idea why I am getting the error in the title of this issue. Below is my code:

library(scran)
library(purrr)
counts <- read.csv('data.csv', header=TRUE, sep=',')
sampletypes <- unlist(map(strsplit(colnames(counts), '_'), {function(x) x[1]}))
findMarkers(counts, sampletypes)
Error in FUN(...) : class name has no 'package' attribute

I looked at one of the other issues and checked > BiocManager::valid() which gave me the following:

* sessionInfo()

R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS: /home/tobias/projects/aim1_mutational_signatures/envs/r_env/lib/R/lib/libRblas.so
LAPACK: /home/tobias/projects/aim1_mutational_signatures/envs/r_env/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] purrr_0.3.0                 scran_1.10.2                SingleCellExperiment_1.4.1 
 [4] SummarizedExperiment_1.12.0 DelayedArray_0.8.0          matrixStats_0.54.0         
 [7] Biobase_2.42.0              GenomicRanges_1.34.0        GenomeInfoDb_1.18.1        
[10] IRanges_2.16.0              S4Vectors_0.20.1            BiocGenerics_0.28.0        
[13] BiocParallel_1.16.6         RevoUtils_11.0.1            RevoUtilsMath_11.0.0       

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0               locfit_1.5-9.1           lattice_0.20-38         
 [4] assertthat_0.2.0         R6_2.3.0                 plyr_1.8.4              
 [7] dynamicTreeCut_1.63-1    ggplot2_3.1.0            pillar_1.3.1            
[10] zlibbioc_1.28.0          rlang_0.3.1              lazyeval_0.2.1          
[13] rstudioapi_0.9.0         Matrix_1.2-15            BiocNeighbors_1.0.0     
[16] statmod_1.4.30           stringr_1.4.0            igraph_1.2.3            
[19] RCurl_1.95-4.11          munsell_0.5.0            HDF5Array_1.10.1        
[22] compiler_3.5.1           vipor_0.4.5              pkgconfig_2.0.2         
[25] ggbeeswarm_0.6.0         tidyselect_0.2.5         tibble_2.0.1            
[28] gridExtra_2.3            GenomeInfoDbData_1.2.0   edgeR_3.24.3            
[31] viridisLite_0.3.0        crayon_1.3.4             dplyr_0.7.8             
[34] bitops_1.0-6             grid_3.5.1               gtable_0.2.0            
[37] magrittr_1.5             scales_1.0.0             stringi_1.2.4           
[40] XVector_0.22.0           reshape2_1.4.3           viridis_0.5.1           
[43] bindrcpp_0.2.2           limma_3.38.3             scater_1.10.1           
[46] DelayedMatrixStats_1.4.0 Rhdf5lib_1.4.2           tools_3.5.1             
[49] glue_1.3.0               beeswarm_0.2.3           BiocVersion_3.8.0       
[52] yaml_2.2.0               colorspace_1.4-0         rhdf5_2.26.2            
[55] BiocManager_1.30.4       bindr_0.1.1             

Bioconductor version '3.8'

  * 1 packages out-of-date
  * 0 packages too new

create a valid installation with

  BiocManager::install("curl", update = TRUE, ask = FALSE)

more details: BiocManager::valid()$too_new, BiocManager::valid()$out_of_date

Warning messages:
1: In file(con, "r") :
  URL 'https://bioconductor.org/config.yaml': status was 'Problem with the SSL CA cert (path? access rights?)'
2: 1 packages out-of-date; 0 packages too new

marionilab / scran Goto Github PK

scran's People

Contributors

Stargazers

Watchers

Forkers

scran's Issues

A numeric scalar specifying the threshold beyond which neighbours are to be ignored when computing correction vectors. Each threshold is defined in terms of the number of median distances.

Current code

Option 1: Currently compute t-statistic

Option 2: edit scran to support returning the statistics

Recommend Projects

Recommend Topics

Recommend Org

Option 2: edit `scran` to support returning the statistics