singler-inc / singler

Clone of the Bioconductor repository for the SingleR package.

Home Page: https://bioconductor.org/packages/devel/bioc/html/SingleR.html

License: GNU General Public License v3.0

R 53.76% C++ 44.08% Dockerfile 0.13% Shell 0.09% TeX 1.82% C 0.12%

singler bioconductor

singler's Issues

could not find function "HumanPrimaryCellAtlasData"

Hi there,
I am using SingleR for cell annotation, but when loading the built-in Human Primary Cell Atlas reference, I get the following error:

library(SingleR)
hpca.se <- HumanPrimaryCellAtlasData()
Error in HumanPrimaryCellAtlasData() :
could not find function "HumanPrimaryCellAtlasData"

I have installed the SingleR package correctly. Could you tell me how to fix this?

Basic Usage From Seurat Object?

Hello,

I realize this package is evolving quickly, but I'm wondering if you can comment on this, which tries to start with a Seurat object and run using HPCA data:

seuratObj <- readRDS(file = 'Test.seurat0.rds')
singler <- OOSAP:::GenerateSingleR(seuratObj = seuratObj0)

seuratObj <- readRDS(file = 'seurat.rds')
ref <- SingleR::HumanPrimaryCellAtlasData()

#Subset genes:
genesPresent <- intersect(rownames(seuratObj), rownames(ref))
ref <- ref[genesPresent,]
seuratObj <- seuratObj[genesPresent,]
sce <- as.SingleCellExperiment(seuratObj)

sce <- scater:::logNormCounts(sce, log = T)
pred <- SingleR::SingleR(test = sce, ref = ref, labels = ref$label.main, assay.type.ref = 'normcounts')

The HPCA data only contains the assay type 'normcounts', not 'logcounts' like the pred() default, so I assume we need to switch? I'm transforming the raw seurat counts using logNormCounts(log=T), which puts them into the normcounts() assay slot.

Thanks for any help/suggestions.
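
For reference, scater's documentation describes logNormCounts() as storing its result under "logcounts" when log = TRUE and under "normcounts" when log = FALSE, so checking assayNames(sce) may help here. Below is a minimal base-R sketch of that kind of log-normalization, assuming library-size factors, a pseudo-count of 1 and a made-up toy matrix; it is an illustration, not the package's code.

```r
# Toy count matrix (genes x cells); values are made up for illustration.
counts <- matrix(c(0, 2, 4, 1, 3, 8, 5, 0, 10), nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3)))

# Library-size factors, centred to have a mean of 1 across cells.
lib.sizes <- colSums(counts)
size.factors <- lib.sizes / mean(lib.sizes)

# Divide each cell by its size factor, then log2-transform with a pseudo-count of 1.
norm.counts <- sweep(counts, 2, size.factors, "/")
log.counts <- log2(norm.counts + 1)
```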

Changes for Bioconductor submission

Tagging @dviraran, @dtm2451.

Algorithm

I have tried to keep the output as close as possible to the original. A few minor changes have been made to improve speed:

The quantile is rounded to the nearest cell. This should have negligible impact on the scores but allows us to use a different algorithm to compute the quantile, which should greatly improve speed. Still quadratic, but it should have a much better scaling.
The fine-tuning step for genes="sd" does not use the 500 gene cut-off, instead just taking all genes with standard deviations above the SD threshold. This was performed simply to make it easier to implement (and also to make it consistent with the initial marker gene detection prior to fine-tuning). Note that ease of implementation is important; C++ gets pretty intense.
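
To illustrate the first point, here is a small sketch comparing an exact quantile with one rounded to a whole number of samples. The correlations are random and the rounding rule is an assumption made for the example, not taken from the package source.

```r
set.seed(42)
# Hypothetical correlations between one test cell and 7 reference samples of a label.
cors <- runif(7, min = -0.2, max = 0.9)
q <- 0.8

# Exact quantile with interpolation, as computed by quantile().
exact <- quantile(cors, probs = q, names = FALSE)

# Rounded version: pick the k-th largest correlation, where k is a whole number of samples.
# The rounding rule here is an assumption, not taken from the package source.
k <- max(1L, as.integer(round(length(cors) * (1 - q))))
rounded <- sort(cors, decreasing = TRUE)[k]
```

With many samples per label the two values should be close. With only a handful, as here, they can differ noticeably.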

In my hands, the original SingleR on the pancreas data example takes 500 seconds on 7 cores, while this modified SingleR takes 125 seconds on 1 core. It should be possible to achieve greater speed by parallelizing the fine-tuning here, I just haven't gotten around to it.

Functions

As discussed, I have removed all functions except the most critical ones. This made the entire package much simpler to test and document, and it avoids problems with feature creep.

Some of the visualization functions belong somewhere else, e.g., Dittoseq?
Some of the data structure-related functions are no longer necessary if we're using the SingleCellExperiment as the data container. Note that this doesn't preclude use with non-SCE-based workflows (e.g., Seurat), the functions will happily accept raw count/expression matrices.
I wasn't sure what the SingScore and GSVA-related functions were meant for, so I took them out for the time being.

(IMO, feature creep is a real problem - looking at https://github.com/dviraran/SingleR/issues suggests that most of the issues have nothing to do with the SingleR() function! Many of them seem to be problems with converting to/from data structures; better to shift the responsibility for answering those questions to the maintainers who defined those data structures.)

In addition, the SingleR() function itself can be called by advanced users as two separate components, trainSingleR() and classifySingleR(). This allows the training cost (i.e., marker gene detection, setting up the nearest neighbor indices) to be paid once for multiple rounds of classification with different test datasets.
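
A rough sketch of that split is shown below. The wrapper name classify_many is hypothetical, and the inputs are assumed to be log-expression matrices with gene names as row names that share the same genes.

```r
# Train once on the reference, then classify several test datasets.
# 'ref' and each element of 'tests' are assumed to be log-expression matrices
# with gene names as row names; 'labels' has one entry per column of 'ref'.
classify_many <- function(ref, labels, tests) {
    if (!requireNamespace("SingleR", quietly = TRUE)) {
        stop("the SingleR package is required")
    }
    trained <- SingleR::trainSingleR(ref, labels = labels)
    lapply(tests, function(test) SingleR::classifySingleR(test, trained))
}
```

The training cost is paid in the single trainSingleR() call, and each test dataset only incurs the classification step.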

Documentation

There's now a lot more of it, concentrated in fewer functions. The vignette has also been stripped down so that it will run within the 5 minute timeframe for Bioconductor builds.

Unit tests

Now there are some.

Data

This is the last remaining outstanding issue (we should open a new Issue to discuss this). The various reference datasets have been stripped out to reduce the repository size (cloning took minutes and 1 GB of space!) and the package size. Instead, I propose that data be stored in Bioconductor's ExperimentHub, where it is pulled down as needed by each user and cached locally. This separates code from data, reduces the package/repo size, and avoids cluttering the user's R installation with reference datasets they might not ever use.

To put data in ExperimentHub, we need to describe how the reference datasets were obtained. I don't want to have an entire treatise on scraping GEO for bulk RNA-seq data, but we should be able to point interested parties to a repository of code that did it. Is that code somewhere in the NI paper? If so, our description can simply point to the NI paper, pull objects from the data directory of https://github.com/dviraran/SingleR, and clean them up for EHub upload.

To do:

Need to parallelize various functions with BiocParallel. This allows the same type of parallelization as mclapply, but also allows users to parallelize with SNOW, Slurm, LSF, etc.
Need to add some functions in BiocNeighbors to reduce memory usage and make SingleR() slightly faster. Specifically, this avoids allocating a matrix of distances to neighbors when only the distance to the kth neighbor is of interest.
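
As a sketch of why only the distance to the kth neighbor matters: if ranks are centred and scaled to length 1/2 (a scaling chosen for this example), Spearman's correlation can be recovered from the Euclidean distance, so the kth smallest distance gives the kth largest correlation. The data below are random and purely illustrative.

```r
# Centre the ranks and scale them so each vector has length 1/2.
scaled_ranks <- function(x) {
    r <- rank(x) - mean(rank(x))
    r / (2 * sqrt(sum(r^2)))
}

set.seed(1)
test.cell <- rnorm(50)
ref.samples <- matrix(rnorm(50 * 6), nrow = 50)  # 6 hypothetical reference samples

a <- scaled_ranks(test.cell)
b <- apply(ref.samples, 2, scaled_ranks)

# Euclidean distance from the test cell to every reference sample.
d <- sqrt(colSums((b - a)^2))

# With this scaling, Spearman's rho equals 1 - 2 * d^2.
rho.from.dist <- 1 - 2 * d^2
rho.direct <- as.vector(cor(test.cell, ref.samples, method = "spearman"))

# Only the k-th smallest distance is needed to get the k-th largest correlation.
k <- 2
kth.rho <- 1 - 2 * sort(d)[k]^2
```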

Interpreting first.label vs. label?

Hello,

The data frame returned by SingleR has 'label', 'first.label' and 'score'. I find instances where cells have the highest score for 'NK_cell', with label=T_cells and first.label=NK_cell. This is being generated using HPCA as the reference. Do you have a suggestion on how I should interpret this?

Related to that: do you expect that users might interrogate the score matrix and set some kind of threshold, or are you assuming SingleR has internally done this and one would interpret any call to be of good quality?
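
As far as the package documentation goes, first.labels holds the top-scoring label before fine-tuning and labels holds the call after fine-tuning, so the two can disagree. The sketch below uses a made-up score matrix to flag changed calls and to compute one possible confidence measure, the gap between the best and median score per cell. The 0.05 cut-off is arbitrary.

```r
# Made-up stand-in for a SingleR result: a score matrix plus labels before and after fine-tuning.
scores <- matrix(c(0.62, 0.60, 0.10,
                   0.30, 0.55, 0.20,
                   0.41, 0.40, 0.39),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(NULL, c("NK_cell", "T_cells", "B_cell")))
first.labels <- colnames(scores)[max.col(scores, ties.method = "first")]
labels <- c("T_cells", "T_cells", "NK_cell")  # hypothetical fine-tuned calls

# Cells whose label changed during fine-tuning.
changed <- first.labels != labels

# Gap between the best score and the median score for each cell.
# A small gap suggests an ambiguous call; the cut-off is arbitrary.
delta <- apply(scores, 1, max) - apply(scores, 1, median)
low.confidence <- delta < 0.05
```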

pseudoBulk as a reference

Hi Aaron,
cool to see you're involved in this project!
I want to create reference data to use with SingleR, is it ok to generate pseudobulks from an annotated SC dataset? And would I need to TPM the pseudobulks to create the reference or what normalization should I use then?
thanks!
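
One possible way to build such pseudobulks, sketched in base R on toy data. The counts-per-million scaling and log transform are assumptions made for the example, not a recommendation from the maintainers.

```r
set.seed(10)
# Toy annotated single-cell counts: 5 genes x 8 cells, with a cell-type label per cell.
counts <- matrix(rpois(40, lambda = 5), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("cell", 1:8)))
cell.type <- rep(c("A", "B"), each = 4)

# Sum counts across cells of the same type; rowsum() groups rows, hence the transposes.
pseudobulk <- t(rowsum(t(counts), group = cell.type))

# Scale each profile to counts per million and log-transform.
cpm <- sweep(pseudobulk, 2, colSums(pseudobulk), "/") * 1e6
ref <- log2(cpm + 1)
ref.labels <- colnames(ref)
```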

Error in nn.d^2 : non-numeric argument to binary operator

test and ref are SingleCellExperiment from seurat objects

scDWSpr <- as.SingleCellExperiment(ref)
sc10x.se <- as.SingleCellExperiment(test)

filtered by common genes

common <- intersect(rownames(sc10x.se),rownames(scDWSpr))
scDWSpr <- scDWSpr[common,]
sc10x.se <- sc10x.se[common,]

singler <- SingleR(sc10x.se,ref=scDWSpr,labels=scDWSpr$ident)

...gives error:

Error in nn.d^2 : non-numeric argument to binary operator
In addition: Warning message:
In (function (jobs, data, centers, info, distance, k, query, get.index, :
tied distances detected in nearest-neighbor calculation

Expression matrices

Where can I find the expression matrices for the built-in references such as HumanPrimaryCellAtlasData and MouseRNAseqData? I'm interested in trying to add a few more cell types to the existing matrices to generate new references. It would be nice if you already have the matrices somewhere. Otherwise, I would have to grab the matrix from each data source.

Thanks!

ImmGenData has a NA row name

I don't know where it came from, but we should filter this out.
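
A minimal sketch of such a filter, using a toy matrix in place of the real reference:

```r
# Toy reference matrix with one missing row name.
ref <- matrix(1:6, nrow = 3,
              dimnames = list(c("Actb", NA, "Cd3e"), c("s1", "s2")))

# Keep only rows with a usable gene name.
keep <- !is.na(rownames(ref)) & rownames(ref) != ""
ref <- ref[keep, , drop = FALSE]
```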

plotScoreHeatmap Error

Hi,

I created a SingleR result from Seurat 3 integrated objects, converted with as.SingleCellExperiment, using:

ap.int.singler.im<-SingleR(ap.int.sc, ref=immgen.se, labels = immgen.se$label.main, assay.type.ref = "logcounts" )

Then I ran plotScoreHeatmap() and got the following error:

Error in UseMethod("depth") : 
  no applicable method for 'depth' applied to an object of class "NULL"

Here is the str() output of my SingleR object:

Formal class 'DFrame' [package "S4Vectors"] with 6 slots
  ..@ rownames       : chr [1:18423] "AAACCCACACTGTGTA_1" "AAACCCACATTAAAGG_1" "AAACCCAGTGCATGTT_1" "AAACCCAGTTAGTTCG_1" ...
  ..@ nrows          : int 18423
  ..@ listData       :List of 4
  .. ..$ scores       : num [1:18423, 1:20] 0.334 0.133 0.246 0.107 0.128 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : NULL
  .. .. .. ..$ : chr [1:20] "Macrophages" "Monocytes" "B cells" "DC" ...
  .. ..$ first.labels : chr [1:18423] "Fibroblasts" "Endothelial cells" "Fibroblasts" "Epithelial cells" ...
  .. ..$ tuning.scores:Formal class 'DFrame' [package ""] with 6 slots
  .. .. .. ..@ rownames       : NULL
  .. .. .. ..@ nrows          : int 18423
  .. .. .. ..@ listData       :List of 2
  .. .. .. .. ..$ first : num [1:18423] 0.336 0.34 0.542 0.217 0.115 ...
  .. .. .. .. ..$ second: num [1:18423] 0.0716 0.2805 0.4848 -0.0373 0.1129 ...
  .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. ..@ elementMetadata: NULL
  .. .. .. ..@ metadata       : list()
  .. ..$ labels       : chr [1:18423] "Fibroblasts" "Endothelial cells" "Fibroblasts" "Epithelial cells" ...
  ..@ elementType    : chr "ANY"
  ..@ elementMetadata: NULL
  ..@ metadata       : list()

I see two slots having NULL. I'd appreciate any pointers on this.

Thanks.

Cannot use Custom References

I have some custom references that I made with the previous version of SingleR.
However when I try to use them with SingleR 1.0.4 from Bioconductor I get the following error:

Error in FUN(X[[i]], ...) : 'ref' must have row names

I tried to look if there's a way to build references using the new version of SingleR, but haven't found much.
Should I stick with the old legacy version?

Thanks!
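
A hedged sketch of one way to adapt an older reference: assuming it is a list with an expression matrix in data and labels in main_types (a layout that appears in older example code), the matrix needs gene names as row names before it can be passed as ref. The helper name and toy data are hypothetical.

```r
# Hypothetical legacy-style reference: a list holding an expression matrix and labels.
old.ref <- list(
    data = matrix(rexp(12), nrow = 4,
                  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))),
    main_types = c("T_cells", "B_cell", "T_cells")
)

as_new_reference <- function(old) {
    mat <- as.matrix(old$data)
    if (is.null(rownames(mat))) {
        stop("the expression matrix needs gene names as row names")
    }
    stopifnot(length(old$main_types) == ncol(mat))
    list(ref = mat, labels = old$main_types)
}

new.ref <- as_new_reference(old.ref)
# Then, for example: SingleR(test = my.test, ref = new.ref$ref, labels = new.ref$labels)
# where 'my.test' is a placeholder for your own test data.
```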

How to adjust the plot with plotScoreHeatmap?

I am trying to use plotScoreHeatmap the way I used the heatmap function in the previous version of SingleR, as below:
SingleR.DrawHeatmap(singler$singler[[1]]$SingleR.single.main, top.n = Inf,
clusters = singler$meta.data$orig.ident)

And how can I show the top 50 cell types, which would give me more detail:
SingleR.DrawHeatmap(singler$singler[[1]]$SingleR.single, top.n = 50,
clusters = singler$meta.data$orig.ident)

NovershternHematopoieticData

Hello,

An error occurs when I use "NovershternHematopoieticData()" as a reference.
It works fine with the "label.main" option, but when I set the "label.fine" option I get the following message:
Error in (function (Exprs, scores, References, quantile, tune_thresh, :
Not compatible with requested type: [type=NULL; target=integer].

Other references such as HumanPrimaryCellAtlasData() or BlueprintEncodeData() work fine with both "label.main" and "label.fine".

Thanks for your help.

Applying more than one reference set to SingleR analysis

Hi all,
In order to subject my data to a bigger reference set, I would like to combine some reference sets together: HumanPrimaryCellAtlasData(), BlueprintEncodeData() and MonacoImmuneData(). Is there an option to do this?

Best Regards,
Erik
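
One naive approach, sketched on toy matrices: restrict the references to their shared genes, bind the samples together, and prefix labels with their source. This does nothing about batch effects between references, so treat it as an illustration under that assumption rather than a recommended workflow.

```r
# Two toy references (log-expression matrices) with partly overlapping genes.
ref1 <- matrix(runif(12), nrow = 4, dimnames = list(c("g1", "g2", "g3", "g4"), NULL))
ref2 <- matrix(runif(6), nrow = 3, dimnames = list(c("g2", "g3", "g5"), NULL))
labels1 <- c("T_cells", "B_cell", "T_cells")
labels2 <- c("NK cells", "T cells")

# Restrict both references to their shared genes and bind the samples together.
shared <- intersect(rownames(ref1), rownames(ref2))
combined <- cbind(ref1[shared, , drop = FALSE], ref2[shared, , drop = FALSE])

# Prefix each label with its source so that labels from different references stay distinct.
combined.labels <- c(paste0("ref1.", labels1), paste0("ref2.", labels2))
```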

Re-use the download reference

Hi,

I downloaded reference data successfully on a server. I'd like to run SingleR on another server which failed to download the reference data several times. Can I copy the content of ~/.cache/ExperimentHub to the failed server and use the cache in SingleR? How should I set the option when running it?

Thank you very much.

Best,

Fei
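
A possible sketch, assuming the copied directory is a complete ExperimentHub cache and that setExperimentHubOption("CACHE", ...) is available in the installed ExperimentHub version. The helper name and path are hypothetical, and whether a copied cache is usable offline has not been verified here.

```r
# Point ExperimentHub at a cache directory copied from another machine.
# 'cache.dir' is a hypothetical path; adjust it to wherever the copy lives.
use_copied_cache <- function(cache.dir) {
    if (!dir.exists(cache.dir)) {
        stop("cache directory not found: ", cache.dir)
    }
    if (requireNamespace("ExperimentHub", quietly = TRUE)) {
        ExperimentHub::setExperimentHubOption("CACHE", cache.dir)
    }
    invisible(cache.dir)
}

# Example: use_copied_cache("~/.cache/ExperimentHub")
```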

Cellular annotation with multiple references

Hi,
When providing a list of references for cellular annotation, are scores from fine-tuning considered?

I am trying to annotate cells in a Seurat object against multiple references (BlueprintEncode and HumanPrimaryCellAtlas). Previously, I have noticed ambiguous annotations for Tregs and Tcell-gamma delta cells, presumably since BlueprintEncode doesn't include the Tcell-gamma delta samples.

Reading from the current manual (SingleR_1.1.6 ), I see a convenient option to provide references in a list and obtain cellular annotation with the top score across references. Since scores from only the first step are comparable (before fine.tuning) across references, then how are we able to obtain label.fine annotations in the output? Is the fine.tune step even executed when providing a list of references?
If possible, kindly share further details.

Code:

setup reference datasets

list.ref <- list()
list.ref[["bp.encode"]] <- BlueprintEncodeData()
list.ref[["hpca"]] <- HumanPrimaryCellAtlasData()

labels

list.labels <- lapply(list.ref, function(x) x$label.fine)

run SingleR

pred <- SingleR(test = SeuratObject@assays$SCT@counts,
ref = list.ref, labels = list.labels, fine.tune = T)
Best,
Namit

Label doesn't look right

Howdy, can I get some input on some data I have please:

I'm having a few labels which confuse me a little. A good example is the ID to the far right of this heatmap. This data was run in cluster mode. The far right one after fine-tuning gets the label LE... the SM correlation seems so strong though that it seems unlikely that LE is the correct assignment. Even on the left there are a bunch of clusters which look like they should maybe be Hillock, but are getting an LE assignment.

FYI... I am trying to use a human reference to ID a mouse target, so the human genes are filtered and converted to orthologous mouse genes. That, together with filtering for shared genes, leaves me only about 14K genes. I know this is really reduced... but the first label is convincing and the final label seems unconvincing.

Thanks!
Gervaise

Error in intI(i, n = x@Dim[1], dn[[1]], give.dn = FALSE) : invalid character indexing

Happens on this line in .build_trained_index:

sr.out <- .scaled_colranks_safe(current[common, , drop = FALSE])

In my case, this occurs because common is not a subset of row.names(current). This occurs because I passed genes in as the result of scran::getTopMarkers and then the ref SCE gets subset to include genes in common with the test SCE.

PR on way.
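
A minimal sketch of the guard, using a toy matrix in place of the reference and a made-up marker list: intersecting the requested genes with the available row names before indexing avoids the invalid-index error.

```r
# Toy reference and a marker list that includes a gene absent from the reference.
current <- matrix(rnorm(12), nrow = 4,
                  dimnames = list(c("g1", "g2", "g3", "g4"), NULL))
markers <- c("g2", "g4", "g9")

# Indexing with 'markers' directly would fail on "g9", so drop absent genes first.
common <- intersect(markers, rownames(current))
current.sub <- current[common, , drop = FALSE]
```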

Using SingleR with clustered data

Hello,

I've successfully used the SingleR function to assign cell-types using the "single" method, but am not succeeding with the "cluster" method. The clustering is done in Seurat and the object is imported as a SingleCellExperiment.

sample.pred.cluster.dice <- SingleR(test = sample.dice.sce, ref = dice.se.common, labels = dice.se.common$label.fine, fine.tune = TRUE, method = "cluster", clusters = "seurat_clusters")

Here is the error:

Error in .local(x, group, reorder, na.rm, ...) :
incorrect length for 'group'

Thanks for your help!
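
One possible cause (a guess from the call above) is that clusters receives the column name as a string rather than one value per cell, for example sample.dice.sce$seurat_clusters. The hypothetical helper below checks the length before calling SingleR().

```r
# Check that 'clusters' supplies one value per cell (column) of the test matrix.
check_clusters <- function(test, clusters) {
    if (length(clusters) != ncol(test)) {
        stop("'clusters' has length ", length(clusters),
             " but 'test' has ", ncol(test), " cells")
    }
    invisible(TRUE)
}

test.mat <- matrix(0, nrow = 3, ncol = 5)
cell.clusters <- c("0", "0", "1", "1", "2")

# One value per cell: passes.
check_clusters(test.mat, cell.clusters)

# A single string has length 1, so this fails.
res <- try(check_clusters(test.mat, "seurat_clusters"), silent = TRUE)
```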

colnames not showing in plotScoreHeatmap

Despite setting plotScoreHeatmap(..., show_colnames = TRUE) to try to pass the pheatmap argument, the function currently does not allow the override to occur. This makes interpreting the scores heatmap difficult without the colnames (although it is understandable why you'd want it off for single-cell based score heatmaps). To fix this, I had to go into the function and manually set show_colnames to TRUE.

## hotfix to make colnames appear
plotScoreHeatmap <- function(...) {
...
args <- list(mat = scores[, order, drop = FALSE], border_color = NA,
    show_colnames = TRUE, clustering_method = "ward.D2",
    cluster_cols = cluster_cols, breaks = breaks, ...)
...
}
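
A more general alternative to hard-coding the value, sketched in base R: build the defaults as a list and let user-supplied arguments replace them with modifyList(). The helper name is hypothetical and show_colnames = FALSE as the default is an assumption.

```r
# Build the argument list so that user-supplied values replace the defaults.
build_heatmap_args <- function(...) {
    defaults <- list(border_color = NA, show_colnames = FALSE,
                     clustering_method = "ward.D2")
    modifyList(defaults, list(...))
}

args <- build_heatmap_args(show_colnames = TRUE)
# 'args' could then be combined with the score matrix and passed on via do.call().
```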

Index out of bounds when subseting using common geneset

Hi, using SingleR v1 from Bioconductor. When using a custom classifier, I run intersect() between the classifier and the set to annotate, but then when running

SingleR(test = logcounts(xxxxx)[ common_genes, ], 
             ref = loaded_classifier[ common_genes, ],
              ....)

I get the error

Error in SummarizedExperiment:::.SummarizedExperiment.charbound(subset,  : 
  index out of bounds: gene1 gene2 gene3

Testing to make sure whether the rownames in the intersect really are in both the classifier and the dataset shows that yes, all rownames are present. Only when running the SingleR function does an index out of bounds error get thrown. Thoughts/suggestions??

Applying of Seurat cell-clustering analysis in SingleR

Hi All!
I'm currently working on my first project of Single Cell analysis and it will be great to have your advice.
I did all the necessary analysis of my data with Seurat, including combining 5 data sets, filtering the cells, normalizing and scaling the expression values, etc. Finally, I clustered the pre-processed data and found 25 cell clusters; now I need to annotate them.
As I understand from the vignette, I should provide SingleR with the counts. Two questions:
1. Is there any option to use the differentially expressed marker gene output of Seurat to ensure that the SingleR analysis will define the same 25 clusters?
2. Which pre-processing is required for the input data?

Best,
Erik

Not able to download reference data from Bioconductor when using SingleR

I am trying to use SingleR for analysis of my dataset and hitting a roadblock accessing the reference data. I cannot download within R from Bioconductor due to an expired license. Can you suggest a work around?

hpca.se <- HumanPrimaryCellAtlasData()

Error in .util_download(x, rid[i], proxy, config, "bfcadd()", ...) : 
  bfcadd() failed; see warnings()
In addition: Warning messages:
1: download failed
  web resource path: ‘https://experimenthub.bioconductor.org/metadata/experimenthub.sqlite3’
  local file path: ‘/wynton/home/fong/bkeenan/.cache/ExperimentHub/1de062c5e7265_experimenthub.sqlite3’
  reason: Peer's Certificate has expired. 
2: bfcadd() failed; resource removed
  rid: BFC6
  fpath: ‘https://experimenthub.bioconductor.org/metadata/experimenthub.sqlite3’
  reason: download failed

Adding SingleR cluster annotations back into Seurat Object

Hi There!

I am currently attempting to add the SingleR cluster annotations back into my original Seurat Object (on which I've already run UMAP clustering and have 14 cell clusters), but unfortunately it does not seem to be working (the metadata column just outputs <NA>).

This is my current code:

#convert Seurat object into SingleCellExperiment
matrix <- as.SingleCellExperiment(p1_1)

#load immune cell database
imm <- DatabaseImmuneCellExpressionData()

#Grab the common genes between the sets.
common <- intersect(rownames(matrix), rownames(imm))
imm <- imm[common,]
matrix <- matrix[common,]

#Run SingleR
singler <- SingleR(test = matrix, ref = imm, labels = imm$label.fine, method = 'cluster', assay.type.ref = 'logcounts', clusters = matrix$seurat_clusters)

#Add SingleR clusters back into Seurat object
p1_1[['singler_clusters']] <- singler$labels[match(p1_1[[]][['seurat_clusters']], singler$first.labels)]

I believe the issue is with the last part where I am trying to add the SingleR assigned clusters back into my original Seurat object. The readme file has this piece of code:

seurat.obj[["SingleR.cluster.labels"]] <- singler.results$labels[match(seurat.obj[[]][["my.input.clusters"]], singler.results$clusts)]

But when I look at my 'singler' object, I have no 'clusts' variable (which is why I just used 'first.labels'). And once again, after I run the above code, my newly defined 'singler_clusters' metadata column is blank!

Any help would be greatly appreciated! Many thanks.
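
Matching cluster IDs against first.labels compares cluster numbers with cell-type names, which would explain the NA values. Here is a sketch of the lookup on made-up data, assuming the cluster-mode result has one row per cluster with the cluster IDs as row names:

```r
# Made-up cluster-level result: one row per cluster, named by cluster ID.
cluster.results <- data.frame(labels = c("T cells", "B cells", "NK cells"),
                              row.names = c("0", "1", "2"),
                              stringsAsFactors = FALSE)

# Per-cell cluster assignments, as they might appear in the Seurat metadata.
cell.clusters <- factor(c("1", "0", "2", "0", "1"))

# Look up each cell's cluster among the result's row names.
idx <- match(as.character(cell.clusters), rownames(cluster.results))
cell.labels <- cluster.results$labels[idx]
```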

CreateSinglerObject function

Hi,

The CreateSinglerObject function has been removed. Could you tell me what to do instead for the following code:

singler <- CreateSinglerObject(counts=counts,
                               project.name="excelerate course", # choose
                               min.genes = 200, # ignore cells with fewer than 200 transcripts
                               technology = "CEL-Seq2", # choose
                               species = "Human",
                               citation = "Schelker et al. 2017", # choose
                               ref.list = list(hpca=hpca, bpe=blueprint_encode),
                               normalize.gene.length = FALSE,        # needed for full-length

Can't create SummarizedExperiment object

Commit I used: e08fcba

If I try hpca.se <- SingleR::BlueprintEncodeData() I get the following error:

Using temporary cache /tmp/RtmpxUy46B/BiocFileCache
snapshotDate(): 2019-04-29
Using temporary cache /tmp/RtmpxUy46B/BiocFileCache
Error in .local(x, i, j, ...): 'i' must be length 1
Traceback:

1. SingleR::BlueprintEncodeData()
2. .create_se(file.path("blueprint_encode", version), assays = "logcounts", 
 .     rm.NA = rm.NA, has.rowdata = FALSE, has.coldata = TRUE)
3. hub[hub$rdatapath == file.path(host, paste0(a, ".rds"))][[1]]
4. hub[hub$rdatapath == file.path(host, paste0(a, ".rds"))][[1]]
5. .local(x, i, j, ...)
6. stop("'i' must be length 1")

The same for hpca and the other references. As I can't update to the bioconductor devel I installed this directly from github. Can this be an issue?
Actually I want to apply SingleR to a Seurat object, but I am not quite sure how. In the old SingleR version I used CreateSinglerObject on the Seurat counts....

Protect against NAs

Remove NAs from incoming data in all *SingleR() functions.

Prune NAs from the reference data sets, tagging @friedue.

Can the new SingleR do the same t-SNE plot as the old one?

I used the old version of SingleR and I need the t-SNE plot function.
Can the new SingleR do the same t-SNE plot as the old one?
And how can I do it? I use Seurat a lot, but I have no experience with scater.
Thank you very much.

download

mouse_1=ImmGenData()
Error in curl::curl_fetch_memory(url, handle = handle) :
Timeout was reached: [experimenthub.bioconductor.org] Connection timed out after 10001 milliseconds

How can I fix it?

Package dependencies origin

I was trying to follow the vignette when I got an error because scRNAseq does not provide the function LaMannoBrainData. I then tried to install BiocManager::install("LTLA/scRNAseq"), but again I got an error:

   ERROR: package installation failed
Error: Failed to install 'scRNAseq' from GitHub:
  System command error, exit status: 1, stdout + stderr (last 10 lines):
E> ** R
E> ** data
E> ** inst
E> ** byte-compile and prepare package for lazy loading
E> Error: object ‘splitAltExps’ is not exported by 'namespace:SingleCellExperiment'
E> Execution halted
E> ERROR: lazy loading failed for package ‘scRNAseq’
E> * removing ‘/tmp/RtmpMWKJi7/Rinst290421337ff/scRNAseq’
E>       -----------------------------------
E> ERROR: package installation failed

All these errors seem to arise from using different versions of packages for development than those available through Bioconductor or CRAN. I am not using Bioconductor devel.
Are all the dependencies of the package already on Bioconductor devel? Or are there some dependencies on Github?

(I don't want to move to Bioconductor devel on my system but I would like to run some parts of the vignette)

Thanks

Sparse matrix counts support

When SingleR() is passed SingleCellExperiment objects with sparse matrices as counts with method = 'cluster' and clusters defined, a rowsum error is thrown. As far as I can tell, nothing is wrong with the count matrix nor the cluster labels. The example code runs fine, and the only difference I can see is that the count matrix for the SCE object there is a base matrix, whereas the objects I'm using have sparse matrix formats (dgCMatrix). Running with method = 'single' works fine. Adding sparse matrix support would be excellent.

To reproduce:

library(Seurat)
library(SingleR)
library(scater)

scrna.sce <- as.SingleCellExperiment(pbmc_small)

hpca <- SingleR::getReferenceDataset(dataset = 'hpca')
common <- intersect(rownames(scrna.sce), rownames(hpca$data))
scrna.sce <- scrna.sce[common,]
scrna.sce <- scrna.sce[,colSums(counts(scrna.sce)) > 0]

scrna.sce <- scater::normalize(scrna.sce)
hpca$data <- hpca$data[common,]

# Works.
pred.hpca2 <- SingleR(test=scrna.sce, training=hpca$data, labels=hpca$main_types, assay.type = 2)

# Throws "rowsum 'x': must be numeric error" at 'colsum(test, clusters)' step of SingleR().
pred.hpca2 <- SingleR(test=scrna.sce, training=hpca$data, labels=hpca$main_types, assay.type = 2, 
  method = "cluster", clusters = as.character(scrna.sce$ident))

This can be worked around by manually converting the sparse matrix to a typical matrix and assigning that to counts:

class(counts(scrna.sce))
> 'dgCMatrix'

counts(scrna.sce) <- as.matrix(counts(scrna.sce))
class(counts(scrna.sce))
> 'matrix'

scrna.sce <- scater::normalize(scrna.sce)

# Now works.
pred.hpca <- SingleR(test=scrna.sce, training=hpca$data, labels=hpca$main_types, assay.type = 2, 
                    method = "cluster", clusters = as.character(scrna.sce$ident))

The error isn't immediately intuitive and given the performance benefits of sparse matrices, it would be good to look further into this.
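
For what it's worth, per-cluster sums can be computed without densifying the counts. The sketch below uses the Matrix package on random toy data and multiplies by a sparse cluster indicator matrix; it illustrates the idea and is not the package's implementation.

```r
library(Matrix)

set.seed(3)
# Toy sparse count matrix: 6 genes x 10 cells, with a cluster label per cell.
counts <- rsparsematrix(6, 10, density = 0.4,
                        rand.x = function(n) rpois(n, 3) + 1)
clusters <- rep(c("a", "b"), each = 5)

# fac2sparse() returns a clusters-by-cells indicator matrix; multiplying by its
# transpose sums the columns of 'counts' within each cluster without densifying.
indicator <- fac2sparse(factor(clusters))
cluster.sums <- counts %*% t(indicator)

# Dense equivalent, for comparison only.
dense.sums <- t(rowsum(t(as.matrix(counts)), clusters))
```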

Easy option to perform single-cell marker detection

Like in the vignette, but as a trainSingleR() option rather than requiring users to do it all manually.

Less sucky unidirectional colors in plotScoreHeatmap

The current white->blue color scale sucks a bit, mostly because it's hard to distinguish between "blue" and "very blue". We should look for something with better evenness if possible. I guess I don't mind Suggestsing a dedicated package for that. Good ol' viridis never let me down.

Also, the bidirectional color scale should reach its maximum color intensity at max(abs(scores)), so as to make use of the full range of colors when the largest correlation is not 1 or -1.
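
A small sketch of the second point, on a made-up score matrix: symmetric breaks are derived from max(abs(scores)) so the palette ends are reached even when no correlation is near 1 or -1. The palette and number of colors are arbitrary choices.

```r
# Toy score matrix whose largest absolute correlation is well below 1.
scores <- matrix(c(-0.30, 0.10, 0.45, 0.20, -0.05, 0.38), nrow = 2)

# Symmetric breaks that span [-limit, limit], so both ends of the palette are used.
limit <- max(abs(scores))
n.colors <- 100
breaks <- seq(-limit, limit, length.out = n.colors + 1)

# A simple diverging palette from base R; any diverging palette would do.
score.colors <- colorRampPalette(c("blue", "white", "red"))(n.colors)
```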

Download reference failed

Hi,

I want to get the "DatabaseImmuneCellExpressionData" dataset using the "DatabaseImmuneCellExpressionData()" function, but the download failed with the following error:

Warning messages:
1: download failed
  web resource path: ‘https://experimenthub.bioconductor.org/fetch/3112’
  local file path: ‘/data/.cache/ExperimentHub/dbd05d1658b9_3112’
  reason: Timeout was reached: [s3.amazonaws.com] Connection timed out after 10002 milliseconds
2: bfcadd() failed; resource removed
  rid: BFC45
  fpath: ‘https://experimenthub.bioconductor.org/fetch/3112’
  reason: download failed
3: download failed
  hub path: ‘https://experimenthub.bioconductor.org/fetch/3112’
  cache resource: ‘EH3096 : 3112’
  reason: bfcadd() failed; see warnings()

I wonder if you would consider uploading a copy (RData or RDS) for me.
Thanks!

Function for compressing scRNA-seq data to be used as an efficient reference data set

Following up on the ideas discussed in another issue, it'd be great to have a way of wrangling the count matrix of a labelled scRNA-seq data set into an object that can be efficiently used with SingleR.

It is possible to have your cake and eat it too, via k-means clustering.
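
A rough sketch of the k-means idea on random toy data: within each label, cells are clustered and the centroids are kept as pseudo-samples. The number of centers and all object names are arbitrary choices made for the example.

```r
set.seed(7)
# Toy labelled single-cell log-expression matrix: 20 genes x 60 cells.
sc.ref <- matrix(rnorm(20 * 60), nrow = 20,
                 dimnames = list(paste0("gene", 1:20), NULL))
sc.labels <- rep(c("A", "B", "C"), each = 20)

# For each label, cluster its cells and keep the cluster centroids as pseudo-samples.
n.centers <- 3  # arbitrary choice for this toy example
compressed <- lapply(split(seq_along(sc.labels), sc.labels), function(cells) {
    km <- kmeans(t(sc.ref[, cells, drop = FALSE]), centers = n.centers)
    t(km$centers)
})

agg.ref <- do.call(cbind, compressed)
agg.labels <- rep(names(compressed), each = n.centers)
```

The compressed reference has far fewer columns than the original while still keeping several profiles per label, rather than a single average.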

Error in .local

Dear all,
I installed the development version (1.1.0), but get the following error when trying to get the Immune Cell Expression values:
devtools::install_github('LTLA/SingleR')
library(SingleR)
immCellExpr <- DatabaseImmuneCellExpressionData()
snapshotDate(): 2019-04-29
Error in .local(x, i, j, ...) : 'i' must be length 1

Any ideas?

I have attached the sessionInfo:
sessionInfo_SingleR.txt

Thanks in advance!
kind regards,
Aldo

Exact quantile calculations

The old version of SingleR computed the quantiles exactly. The new version rounds up the relevant probability to the nearest cell, in order to identify the quantile via the NN search (can't have fractional neighbors). This results in some differences in the results between versions - these should have been negligible beyond very small numbers of samples per label.

The problem arises when you actually do only have very few samples per label, and the labels are very poorly separated (basically most of our fine labels), such that the approximation of the quantile results in a different label. One can debate whether the old behavior was correct in the first place - certainly taking the 80th percentile of scores when you only have two observations is kind of meaningless - but it would be better to have the exact same result as before, if for no other reason than to stop people asking questions about why the results have changed.

The fix is somewhat involved and requires modification to BiocNeighbors to retrieve distances to the last two neighbors, plus addition of some interpolation code to mimic quantile().
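
The interpolation step could look something like the sketch below, which combines the two order statistics that bracket the requested quantile using the default (type 7) rule of quantile(). The function name and the example values are made up.

```r
# Interpolate between the two order statistics that bracket the requested quantile,
# following the default (type 7) definition used by quantile().
interpolated_quantile <- function(x, p) {
    sorted <- sort(x)
    h <- (length(sorted) - 1) * p      # zero-based fractional position
    lo <- floor(h)
    hi <- min(lo + 1, length(sorted) - 1)
    sorted[lo + 1] + (h - lo) * (sorted[hi + 1] - sorted[lo + 1])
}

cors <- c(0.12, 0.55, 0.31, 0.80, 0.47)  # made-up correlations for one label
interpolated_quantile(cors, 0.8)
```

Only the two bracketing values are needed, which is why retrieving the distances to two neighbors would be enough.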

README.md MouseRNAseqData()

Looks like MouseBulkData() referenced in README.md is actually MouseRNAseqData()

Can't find 'pheatmap'

When I run the code
plotScoreHeatmap(pred.grun), I get this error:
there is no package called ‘pheatmap’
Is there anything I should do?
Thank you.

Additional Labels for Immgen Data

Proposing an additional layer of annotations for the T cell population. Splitting on CD4 vs CD8 would be first and foremost (looks like it's already mostly doable by grepping on T.4 or T.8 from the fine labels). Adding the different subsets would be next; again, this can mostly be derived from the name already.

In case this is useful, leaving this here for consideration. Obviously way too many different ways to cut this data to really have it all in a single object and make everybody happy.

To add this to the current immgen dataset, below be some code. Please excuse the tidyverse coding.

library(SingleR)
library(tidyverse)

immgen <- ImmGenData()

manual <- tribble(
    ~label.manual, ~label.fine,
    "CD4 Naive", c("T cells (T.4NVE)", "T cells (T.4NVE44-49D-11A-)", "T cells (T.4Nve)"),
    "CD4 Effector", "T cells (T.4EFF49D+11A+.D8.LCMV)",
    "CD4 Memory", c("T cells (T.4MEM)", "T cells (T.4MEM44H62L)",
                    "T cells (T.4MEM49D+11A+.D30.LCMV)", "T cells (T.4Mem)"),
    "CD8 Naive", c("T cells (T.8NVE)", "T cells (T.8NVE.OT1)", "T cells (T.8Nve)"),
    "CD8 Effector", c("T cells (T.8EFF.OT1.D10.LISOVA)", "T cells (T.8EFF.OT1.D10LIS)",
                      "T cells (T.8EFF.OT1.D8.LISOVA)", "T cells (T.8EFF.OT1.D8.VSVOVA)",
                      "T cells (T.8EFF.OT1.D8LISO)"),
    "CD8 Memory", c("T cells (T.8MEM)", "T cells (T.8MEM.OT1.D100.LISOVA)",
                    "T cells (T.8MEM.OT1.D106.VSVOVA)", "T cells (T.8MEM.OT1.D45.LISOVA)",
                    "T cells (T.8Mem)"),
    "Treg", "T cells (T.Tregs)"
)

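## label.fine is a list-column here; expand it to one row per fine label
manual <- unnest(manual, cols = label.fine)

manual.vec <- manual$label.manual
names(manual.vec) <- manual$label.fine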


## Filter based on manual new annotation
immgen.tc <- immgen[, immgen$label.fine %in% manual$label.fine]

## Append new label
immgen.tc$label.manual <- manual.vec[immgen.tc$label.fine]

Why does HPCA contain 'Neutrophil' and 'Neutrophils'

Probably one for @friedue.

table(SingleR::HumanPrimaryCellAtlasData()$label.main)
#> 
#>            Astrocyte               B_cell                   BM 
#>                    2                   26                    7 
#>           BM & Prog.         Chondrocytes                  CMP 
#>                    1                    8                    2 
#>                   DC Embryonic_stem_cells    Endothelial_cells 
#>                   88                   17                   64 
#>     Epithelial_cells         Erythroblast          Fibroblasts 
#>                   16                    8                   10 
#>          Gametocytes                  GMP          Hepatocytes 
#>                    5                    2                    3 
#>           HSC_-G-CSF            HSC_CD34+            iPS_cells 
#>                   10                    6                   42 
#>        Keratinocytes           Macrophage                  MEP 
#>                   25                   90                    2 
#>             Monocyte                  MSC            Myelocyte 
#>                   60                    9                    2 
#> Neuroepithelial_cell              Neurons           Neutrophil 
#>                    1                   16                    3 
#>          Neutrophils              NK_cell          Osteoblasts 
#>                   18                    5                   15 
#>            Platelets     Pre-B_cell_CD34-     Pro-B_cell_CD34+ 
#>                    5                    2                    2 
#>        Pro-Myelocyte  Smooth_muscle_cells              T_cells 
#>                    2                   16                   68 
#>    Tissue_stem_cells 
#>                   55

Created on 2019-08-30 by the reprex package (v0.3.0)

Looking at how they get broken down in the 'fine' label:

                                         Neutrophil Neutrophils
  Neutrophil                                      3           3
  Neutrophil:commensal_E._coli_MG1655             0           2
  Neutrophil:GM-CSF_IFNg                          0           4
  Neutrophil:inflam                               0           4
  Neutrophil:LPS                                  0           4
  Neutrophil:uropathogenic_E._coli_UTI89          0           1

Should 'Neutrophil' and 'Neutrophils' be one 'main' label or is there a distinction I'm missing?
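
If the two labels do turn out to be equivalent, one user-side workaround is to collapse them before passing the labels to SingleR(). A minimal base R sketch, using a made-up label vector in place of hpca.se$label.main; keeping 'Neutrophils' as the surviving name is an assumption, not a decision from the maintainers:

```r
# Stand-in for hpca.se$label.main (hypothetical values, not the real reference).
label.main <- c("Neutrophil", "Neutrophils", "Monocyte", "Neutrophils", "Neutrophil")

# Collapse the singular spelling into the plural one.
merged <- label.main
merged[merged == "Neutrophil"] <- "Neutrophils"

# Inspect the merged label counts.
table(merged)
```

The merged vector can then be supplied as the labels argument instead of the original column.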

Cannot download sqlite3 file

Hi, when I read the SingleR HTML documentation, I noticed that SingleR is a powerful tool for cell type classification in scRNA-seq data, but I have run into a problem. Could anyone help me solve it?

My code:

library(SingleR)
hpca.se <- HumanPrimaryCellAtlasData()

Error message:

Error in curl::curl_fetch_memory(url, handle = handle) :
Timeout was reached: [experimenthub.bioconductor.org] Connection timed out after 10001 milliseconds

This is followed by another error:
Error in .local(x, i, j, ...) : 'i' must be length 1

I have tried resetting some options with the options() function, e.g. options(timeout = 50000), but it keeps reporting the same errors.
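
If the timeout is intermittent rather than caused by a blocked connection, retrying the download a few times may be enough. A minimal base R sketch; with_retries() is a hypothetical helper written for this example and is not part of SingleR or ExperimentHub:

```r
# Hypothetical helper: call fetch() up to max_tries times, pausing between attempts.
with_retries <- function(fetch, max_tries = 3, wait = 1) {
  for (attempt in seq_len(max_tries)) {
    # Capture an error as a value instead of aborting.
    result <- tryCatch(fetch(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < max_tries) Sys.sleep(wait)
  }
  stop("All ", max_tries, " attempts failed: ", conditionMessage(result))
}

# Usage (not run here; needs network access and SingleR installed):
# hpca.se <- with_retries(function() SingleR::HumanPrimaryCellAtlasData(), wait = 10)
```

Retrying will not help if experimenthub.bioconductor.org is unreachable from the machine altogether, for example behind a firewall or proxy.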

error: matrix object should have 'dim' attribute when trying to use scanpy data

Hi,
I am a Scanpy user trying to export and format my object for use in R with SingleR. I am struggling because, although the documentation says SingleR is applicable to other data formats, most people have used Seurat single-cell objects. I am using a CSV with genes as row names and barcodes as column names, subsetted to 100 cells as in the vignette to make it easier to work with ("MyData"). The dimensions of MyData after running

hpca.se <- hpca.se[common,]
MyData <- MyData[common,]

are 12501 and 100, as are the dimensions of hpca.se. However, I get the error

pred.hpca <- SingleR(test = MyData, ref = hpca.se, labels = hpca.se$label.main)
Error in (function (Exprs, scores, References, quantile, tune_thresh,  : 
  matrix object should have 'dim' attribute

Any suggestions for formatting my scanpy data so I can use SingleR?
Thanks,
Bridget
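
One possible cause, assuming MyData was loaded with read.csv() and is therefore a data.frame rather than a matrix, is that the function wants a numeric matrix (or a SummarizedExperiment-like object) as the test input. The sketch below uses a tiny made-up table in place of the real CSV; the gene and barcode names are hypothetical:

```r
# Tiny stand-in for the exported scanpy CSV (genes in rows, barcodes in columns).
csv_text <- "gene,AAAC-1,AAAG-1,AACT-1
GeneA,0,3,1
GeneB,2,0,0
GeneC,5,1,4"

# row.names = 1 uses the first column (gene names) as row names;
# check.names = FALSE keeps barcodes such as 'AAAC-1' unchanged.
MyData <- read.csv(text = csv_text, row.names = 1, check.names = FALSE)
class(MyData)  # a data.frame at this point, not a matrix

# Convert to a numeric matrix before subsetting to the common genes.
MyData <- as.matrix(MyData)
stopifnot(is.matrix(MyData), is.numeric(MyData))
dim(MyData)
```

After the conversion, MyData[common,] keeps its 'dim' attribute and can be passed as the test argument.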

Revamp docs to focus on marker expression diagnostics

I'd like to revamp the docs to focus on marker expression as the primary diagnostic. Marker expression is the easiest diagnostic to interpret from a human perspective, as it provides insight into how the decision was made to assign a particular label. (Though the scores are better for automated pruning.)

Also, any points I made here that aren't present in the vignette should be added to it. The same applies to points I made more eloquently in the book, in which case the vignette should be modified to match. The end goal is to cut down on redundancy by simply referring to the vignette for some of the discussions that are currently in the book.

Error in dataset function hpca.se <- HumanPrimaryCellAtlasData()

Hi there,
I am using SingleR to do cell annotation, but when starting to learn from the tutorial, it always gives me an error as follows:
> hpca.se <- HumanPrimaryCellAtlasData()
snapshotDate(): 2019-10-22
see ?SingleR and browseVignettes('SingleR') for documentation
downloading 0 resources
loading from cache
‘EH3090 : 3106’
see ?SingleR and browseVignettes('SingleR') for documentation
downloading 0 resources
loading from cache
‘EH3091 : 3107’
Error in `rownames<-`(tmp, value = c("GSM112490", "GSM112491", "GSM112540", :
invalid rownames length

Could you tell me how to fix it?

Bioconda version of SingleR

Hello,

Do you have any plans to create a conda version of SingleR? It seems to me like a large number of Bioconductor packages are available via Bioconda, so was wondering if that is something you're planning on.

Thank you - apologies if this is the wrong place to be asking.

Harish

Problems with Annotating TSNEPlot for Seurat Object with SingleR

I have an object of class Seurat named Integration, consisting of 8 merged datasets from the GSE122960 IPF scRNA data, which I want to annotate with cell types using SingleR on the TSNE plot.
1). First, I converted my Seurat object, Integration, to a SingleCellExperiment, which I named Integration1, using the function as.SingleCellExperiment(Integration).
2). I am using the Immgen database to annotate my cells.
3). I have successfully used the SingleR function to label my dataset with the reference dataset, Immgen.
4). I am trying to use the TSNEPlot("labels", object) function to map the reference labels onto my integration TSNE plot.

Here are my problems:

1). The output TSNE plot that is generated has no labels.
2). The "single" method usually works for SingleR, but when I use the "cluster" method, it says that I have to specify clusters, which I do not understand how to do.

I'll send in my code as a reference. Thank you!
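
On the second problem: SingleR() has a clusters argument that takes one cluster assignment per cell, and as far as I understand the "cluster" method then returns one label per cluster rather than per cell. To show those labels on a per-cell plot, they have to be mapped back onto the cells. A base R sketch of that mapping, with hypothetical cluster IDs and labels standing in for the Seurat clusters and the SingleR output:

```r
# Hypothetical per-cell cluster assignments (e.g. taken from the Seurat object).
clusters <- c("0", "1", "0", "2", "1", "2")

# Hypothetical per-cluster labels, named by cluster ID, standing in for
# the labels of a cluster-level SingleR result.
cluster.labels <- c("0" = "T cells", "1" = "B cells", "2" = "Macrophages")

# Look up each cell's label through its cluster ID.
cell.labels <- unname(cluster.labels[clusters])
cell.labels
```

The resulting per-cell vector can be stored as a metadata column and used to color or label the plot.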

Mysterious parallelism

Hi,

Even if I run SingleR with BPPARAM=BiocParallel::SerialParam(), I see multiple processes working. Since I'm running SingleR in docker with limited memory, this usually leads to a memory error and consequently the following error:

Error in result[[njob]] <- value : 
  attempt to select less than one element in OneIndex
Calls: <Anonymous> ... .local -> bplapply -> bplapply -> bploop -> bploop.lapply

In parallel::mccollect(wait = FALSE, timeout = 1) :
1 parallel job did not deliver a result

I was wondering if you know whether there is another source of parallelism in the code :/

'rownames(test)' does not contain all genes used in 'trained'

Hello

I have a problem applying the new version of SingleR.

I want to know how to fix the following error when executing the "classifySingleR" function:

'rownames(test)' does not contain all genes used in 'trained'

I think that, because detection sensitivity varies between library prep methods, this could be a common problem for others too.

Thank you
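
A common way around this, assuming the classifier can be retrained, is to restrict both datasets to their shared genes before calling trainSingleR(), so that every gene used in training is also present in the test data. A base R sketch with toy matrices; all gene and sample names are hypothetical:

```r
# Hypothetical toy matrices: genes in rows, cells or samples in columns.
test <- matrix(1:6, nrow = 3,
               dimnames = list(c("GeneA", "GeneB", "GeneC"), c("cell1", "cell2")))
ref <- matrix(1:8, nrow = 4,
              dimnames = list(c("GeneB", "GeneC", "GeneD", "GeneE"), c("ref1", "ref2")))

# Keep only the genes present in both datasets, in the same order.
common <- intersect(rownames(test), rownames(ref))
test <- test[common, , drop = FALSE]
ref <- ref[common, , drop = FALSE]

# Both matrices now share identical row names.
stopifnot(identical(rownames(test), rownames(ref)))
```

Training on the subsetted reference and classifying the subsetted test data keeps the two gene sets consistent.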

Make score quality check function

Currently, SingleR provides scores for all cells, but it would be nice to also have an idea of the quality of those calls.

Aaron's suggestions:

A simple approach would be to examine the distribution of max scores across all cells assigned with a single label, identify low outliers and ignore them.
We could ask for “delta correlation > 0.05” and no. mads < 3
Or we could ask that the top scoring label must be at least X above the median score across all labels for this cell.
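
These suggestions could be prototyped roughly as follows. This is only a sketch in base R, not SingleR's API: flag_low_quality() and the toy score matrix are hypothetical, "delta" is taken to mean the top score minus the median score across labels for that cell, and the 0.05 and 3-MAD cutoffs are simply the numbers floated above.

```r
# Hypothetical score matrix: rows are cells, columns are reference labels.
scores <- rbind(
  cell1 = c(A = 0.80, B = 0.40, C = 0.30),
  cell2 = c(A = 0.82, B = 0.40, C = 0.30),
  cell3 = c(A = 0.78, B = 0.40, C = 0.30),
  cell4 = c(A = 0.81, B = 0.40, C = 0.30),
  cell5 = c(A = 0.79, B = 0.40, C = 0.30),
  cell6 = c(A = 0.50, B = 0.30, C = 0.20)
)

flag_low_quality <- function(scores, nmads = 3, min_delta = 0.05) {
  # Assigned label and its score for each cell.
  best <- colnames(scores)[max.col(scores, ties.method = "first")]
  top <- apply(scores, 1, max)

  # Gap between the top score and the median score across all labels.
  delta <- top - apply(scores, 1, median)

  # Within each assigned label, flag cells whose top score is a low outlier.
  low_outlier <- logical(nrow(scores))
  for (lab in unique(best)) {
    idx <- best == lab
    cutoff <- median(top[idx]) - nmads * mad(top[idx])
    low_outlier[idx] <- top[idx] < cutoff
  }

  data.frame(label = best, top = top, delta = delta,
             keep = !low_outlier & delta > min_delta)
}

res <- flag_low_quality(scores)
res
```

In this toy example only cell6 is dropped, because its top score sits far below the other cells assigned to label A.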

issues about install

devtools::install_github('LTLA/SingleR')

@LTLA

Downloading GitHub repo LTLA/SingleR@master
Installing 8 packages: DelayedMatrixStats, BiocNeighbors, BiocFileCache, ExperimentHub, beachmat, HDF5Array, Rhdf5lib, rhdf5

package ‘DelayedMatrixStats’ successfully unpacked and MD5 sums checked
package ‘BiocNeighbors’ successfully unpacked and MD5 sums checked
package ‘BiocFileCache’ successfully unpacked and MD5 sums checked
package ‘ExperimentHub’ successfully unpacked and MD5 sums checked
package ‘beachmat’ successfully unpacked and MD5 sums checked
package ‘HDF5Array’ successfully unpacked and MD5 sums checked
package ‘Rhdf5lib’ successfully unpacked and MD5 sums checked
package ‘rhdf5’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
C:\Users\zhegu\AppData\Local\Temp\RtmpCqDKu6\downloaded_packages
√ checking for file 'C:\Users\zhegu\AppData\Local\Temp\RtmpCqDKu6\remotes19d058b1584d\LTLA-SingleR-a2a8921/DESCRIPTION' (988ms)

preparing 'SingleR': (363ms)
√ checking DESCRIPTION meta-information ...
cleaning src
checking for LF line-endings in source and make files and shell scripts (398ms)
checking for empty or unneeded directories
building 'SingleR_0.99.4.tar.gz'

installing source package 'SingleR' ...
** libs
C:/RBUILD~~1/3.4/mingw_64/bin/g++ -std=gnu++11 -I"D:/R/soft/R-35~~1.3/include" -DNDEBUG -I"D:/R/soft/R-3.5.3/library/Rcpp/include" -I"D:/R/soft/R-3.5.3/library/beachmat/include" -O2 -Wall -mtune=generic -c RcppExports.cpp -o RcppExports.o
C:/RBUILD~~1/3.4/mingw_64/bin/g++ -std=gnu++11 -I"D:/R/soft/R-35~~1.3/include" -DNDEBUG -I"D:/R/soft/R-3.5.3/library/Rcpp/include" -I"D:/R/soft/R-3.5.3/library/beachmat/include" -O2 -Wall -mtune=generic -c fine_tune_de.cpp -o fine_tune_de.o
In file included from D:/R/soft/R-3.5.3/library/beachmat/include/beachmat/all_readers.h:4:0,
from D:/R/soft/R-3.5.3/library/beachmat/include/beachmat/LIN_matrix.h:4,
from D:/R/soft/R-3.5.3/library/beachmat/include/beachmat/numeric_matrix.h:4,
from fine_tune_de.cpp:1:
D:/R/soft/R-3.5.3/library/beachmat/include/beachmat/beachmat.h:15:19: fatal error: H5Cpp.h: No such file or directory
#include "H5Cpp.h"
^
compilation terminated.
make: *** [fine_tune_de.o] Error 1
ERROR: compilation failed for package 'SingleR'
removing 'D:/R/soft/R-3.5.3/library/SingleR'
In R CMD INSTALL
Error: Failed to install 'SingleR' from GitHub:
(converted from warning) installation of package ‘C:/Users/zhegu/AppData/Local/Temp/RtmpCqDKu6/file19d06502ecf/SingleR_0.99.4.tar.gz’ had non-zero exit status


Algorithm

Functions

Documentation

Unit tests

Data

To do:

setup reference datasets

labels

run SingleR
