singler-inc / singler
Clone of the Bioconductor repository for the SingleR package.
Home Page: https://bioconductor.org/packages/devel/bioc/html/SingleR.html
License: GNU General Public License v3.0
Hi there,
I am using SingleR for cell annotation, but when loading the built-in Human Primary Cell Atlas reference data, I get the following error:
library(SingleR)
hpca.se <- HumanPrimaryCellAtlasData()
Error in HumanPrimaryCellAtlasData() :
could not find function "HumanPrimaryCellAtlasData"
I have installed the SingleR package correctly. Could you tell me how to fix it?
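One thing worth checking, offered as an assumption rather than a diagnosis: the installed SingleR may be a version that does not export the reference getters (newer Bioconductor releases ship them in the separate celldex package). A minimal base-R sketch for finding out which installed package, if any, exports a given function follows; find_exporter is a hypothetical helper, not part of SingleR.

```r
# Hypothetical helper: report which of the candidate packages (if installed)
# export a function with the given name.
find_exporter <- function(fun, pkgs = c("SingleR", "celldex")) {
  hits <- character(0)
  for (pkg in pkgs) {
    # requireNamespace() loads the namespace without attaching the package
    # and returns FALSE quietly if the package is not installed.
    if (requireNamespace(pkg, quietly = TRUE) &&
        fun %in% getNamespaceExports(pkg)) {
      hits <- c(hits, pkg)
    }
  }
  hits
}

# Which installed package provides the HPCA getter (empty if none does)?
find_exporter("HumanPrimaryCellAtlasData")

# Sanity check with a function that every R installation has.
find_exporter("median", pkgs = "stats")
```

An empty result for the first call would suggest that the installed version predates (or postdates) the getter, in which case calling it with an explicit prefix such as pkg::HumanPrimaryCellAtlasData() on whichever package is reported is the thing to try.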
Hello,
I realize this package is evolving quickly, but I'm wondering if you can comment on the following, which tries to start with a Seurat object and run SingleR using the HPCA data:
seuratObj <- readRDS(file = 'Test.seurat0.rds')
singler <- OOSAP:::GenerateSingleR(seuratObj = seuratObj0)
seuratObj <- readRDS(file = 'seurat.rds')
ref <- SingleR::HumanPrimaryCellAtlasData()
#Subset genes:
genesPresent <- intersect(rownames(seuratObj), rownames(ref))
ref <- ref[genesPresent,]
seuratObj <- seuratObj[genesPresent,]
sce <- as.SingleCellExperiment(seuratObj)
sce <- scater:::logNormCounts(sce, log = T)
pred <- SingleR::SingleR(test = sce, ref = ref, labels = ref$label.main, assay.type.ref = 'normcounts')
The HPCA data only contains the assay type 'normcounts', not 'logcounts' as the SingleR() default expects, so I assume we need to switch? I'm transforming the raw Seurat counts using logNormCounts(log=T), which puts them into the normcounts() assay slot.
Thanks for any help/suggestions.
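For readers unsure what the transformation produces, here is a base-R sketch of library-size log-normalization on a toy matrix. It mirrors my understanding of what logNormCounts() does by default (size factors centred at 1, pseudo-count of 1, log2), and as far as I know the result of log = TRUE is stored under 'logcounts' rather than 'normcounts'; treat both points as assumptions to verify against your installed scater version.

```r
# Toy count matrix: genes in rows, cells in columns.
counts <- matrix(c(10, 0, 5,
                   20, 4, 1,
                    0, 8, 2),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3)))

# Library-size factors, centred so that their mean is 1.
lib_sizes <- colSums(counts)
size_factors <- lib_sizes / mean(lib_sizes)

# Divide each cell (column) by its size factor, then log2-transform
# with a pseudo-count of 1.
norm_counts <- sweep(counts, 2, size_factors, "/")
log_counts <- log2(norm_counts + 1)

log_counts
```

Whichever assay the values end up in, the name passed for the test data and the name passed via assay.type.ref each have to match an assay that actually exists in the respective object.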
I have tried to keep the output as close as possible to the original. A few minor changes have been made to improve speed:
genes="sd" does not use the 500 gene cut-off, instead just taking all genes with standard deviations above the SD threshold. This was performed simply to make it easier to implement (and also to make it consistent with the initial marker gene detection prior to fine-tuning). Note that ease of implementation is important; C++ gets pretty intense.
In my hands, the original SingleR on the pancreas data example takes 500 seconds on 7 cores, while this modified SingleR takes 125 seconds on 1 core. It should be possible to achieve greater speed by parallelizing the fine-tuning here, I just haven't gotten around to it.
As discussed, I have removed all functions except the most critical ones. This made the entire package much simpler to test and document, and it avoids problems with feature creep.
SingleCellExperiment as the data container. Note that this doesn't preclude use with non-SCE-based workflows (e.g., Seurat); the functions will happily accept raw count/expression matrices.
(IMO, feature creep is a real problem - looking at https://github.com/dviraran/SingleR/issues suggests that most of the issues have nothing to do with the SingleR() function! Many of them seem to be problems with converting to/from data structures; better to shift the responsibility for answering those questions to the maintainers who defined those data structures.)
In addition, the SingleR() function itself can be called by advanced users as two separate components, trainSingleR() and classifySingleR(). This allows the training cost (i.e., marker gene detection, setting up the nearest neighbor indices) to be paid once for multiple rounds of classification with different test datasets.
There's now a lot more of it, concentrated in fewer functions. The vignette has also been stripped down so that it will run within the 5 minute timeframe for Bioconductor builds.
Now there are some.
This is the last remaining outstanding issue (we should open a new Issue to discuss this). The various reference datasets have been stripped out to reduce the repository size (cloning took minutes and 1 GB of space!) and the package size. Instead, I propose that data be stored in Bioconductor's ExperimentHub, where it is pulled down as needed by each user and cached locally. This separates code from data, reduces the package/repo size, and avoids cluttering the user's R installation with reference datasets they might not ever use.
To put data in ExperimentHub, we need to describe how the reference datasets were obtained. I don't want to have an entire treatise on scraping GEO for bulk RNA-seq data, but we should be able to point interested parties to a repository of code that did it. Is that code somewhere in the NI paper? If so, our description can simply point to the NI paper, pull objects from the data directory of https://github.com/dviraran/SingleR, and clean them up for EHub upload.
mclapply, but also allows users to parallelize with SNOW, Slurm, LSF, etc.
SingleR() slightly faster. Specifically, this avoids allocating a matrix of distances to neighbors when only the distance to the kth neighbor is of interest.
Hello,
The DataFrame returned by SingleR has labels, first.labels and scores. I find instances where a cell has its highest score for 'NK_cell', with labels=T_cells and first.labels=NK_cell. This was generated using HPCA as the reference. Do you have a suggestion on how I should interpret this?
Related to that: do you expect that users might interrogate the score matrix and set some kind of threshold, or are you assuming SingleR has internally done this and one would interpret any call to be of good quality?
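Some context that may help: first.labels is the label with the top score before fine-tuning and labels is the call after fine-tuning, so the two can legitimately differ when the top scores are close. For the thresholding question, one simple option is sketched below in base R on a toy score matrix. The "delta" statistic (best score minus the median score for that cell) and the 0.05 cut-off are illustrative assumptions, not a description of what SingleR does internally.

```r
# Toy score matrix: one row per cell, one column per reference label.
scores <- matrix(c(0.60, 0.58, 0.20,
                   0.70, 0.30, 0.25,
                   0.31, 0.30, 0.29),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("cell", 1:3),
                                 c("NK_cell", "T_cells", "B_cell")))

# Label with the highest score for each cell.
best <- colnames(scores)[max.col(scores, ties.method = "first")]

# How far the best score sits above the cell's median score.
delta <- apply(scores, 1, max) - apply(scores, 1, median)

# Hypothetical cut-off: calls with a small delta are set to NA.
min_delta <- 0.05
calls <- ifelse(delta >= min_delta, best, NA_character_)

calls
```

Cells whose best score barely exceeds the rest are the ones where label and first.label are most likely to disagree, so inspecting delta alongside the two columns is a reasonable first diagnostic.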
Hi Aaron,
cool to see you're involved in this project!
I want to create reference data to use with SingleR. Is it OK to generate pseudobulks from an annotated single-cell dataset? And would I need to convert the pseudobulks to TPM to create the reference, or what normalization should I use?
thanks!
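A minimal base-R sketch of the pseudobulk idea, assuming a genes-by-cells count matrix and one label per cell. The normalization shown (library-size scaling followed by log2) is just one common choice and an assumption on my part; whether TPM is preferable is a question for the maintainers.

```r
set.seed(1)

# Toy counts: 4 genes x 6 annotated cells.
counts <- matrix(rpois(4 * 6, lambda = 5), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:6)))
cell_type <- c("A", "A", "B", "B", "B", "C")

# rowsum() sums rows by group, so transpose to group the cells,
# then transpose back to genes x pseudobulk samples.
pseudobulk <- t(rowsum(t(counts), group = cell_type))

# Library-size normalization and log2 transform of the pseudobulk profiles.
lib <- colSums(pseudobulk)
ref_log <- log2(sweep(pseudobulk, 2, lib / mean(lib), "/") + 1)

ref_log
```

The resulting matrix has one column per cell type, so the labels vector for the reference would simply be colnames(ref_log).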
test and ref are SingleCellExperiment from seurat objects
scDWSpr <- as.SingleCellExperiment(ref)
sc10x.se <- as.SingleCellExperiment(test)
filtered by common genes
common <- intersect(rownames(sc10x.se),rownames(scDWSpr))
scDWSpr <- scDWSpr[common,]
sc10x.se <- sc10x.se[common,]
singler <- SingleR(sc10x.se,ref=scDWSpr,labels=scDWSpr$ident)
...gives error:
Error in nn.d^2 : non-numeric argument to binary operator
In addition: Warning message:
In (function (jobs, data, centers, info, distance, k, query, get.index, :
tied distances detected in nearest-neighbor calculation
Where can I find the expression matrices for the built-in references such as HumanPrimaryCellAtlasData and MouseRNAseqData? I'm interested in trying to add a few more cell types to the existing matrices to generate new references. It would be nice if you already have the matrices somewhere. Otherwise, I would have to grab the matrix from each data source.
Thanks!
I don't know where it came from, but we should filter this out.
Hi,
I created a SingleR object from Seurat3 integrated objects (converted with as.SingleCellExperiment) by:
ap.int.singler.im<-SingleR(ap.int.sc, ref=immgen.se, labels = immgen.se$label.main, assay.type.ref = "logcounts" )
Then I ran plotScoreHeatmap() and got the following error:
Error in UseMethod("depth") :
no applicable method for 'depth' applied to an object of class "NULL"
Here is str() of my singleR objects;
Formal class 'DFrame' [package "S4Vectors"] with 6 slots
..@ rownames : chr [1:18423] "AAACCCACACTGTGTA_1" "AAACCCACATTAAAGG_1" "AAACCCAGTGCATGTT_1" "AAACCCAGTTAGTTCG_1" ...
..@ nrows : int 18423
..@ listData :List of 4
.. ..$ scores : num [1:18423, 1:20] 0.334 0.133 0.246 0.107 0.128 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : chr [1:20] "Macrophages" "Monocytes" "B cells" "DC" ...
.. ..$ first.labels : chr [1:18423] "Fibroblasts" "Endothelial cells" "Fibroblasts" "Epithelial cells" ...
.. ..$ tuning.scores:Formal class 'DFrame' [package ""] with 6 slots
.. .. .. ..@ rownames : NULL
.. .. .. ..@ nrows : int 18423
.. .. .. ..@ listData :List of 2
.. .. .. .. ..$ first : num [1:18423] 0.336 0.34 0.542 0.217 0.115 ...
.. .. .. .. ..$ second: num [1:18423] 0.0716 0.2805 0.4848 -0.0373 0.1129 ...
.. .. .. ..@ elementType : chr "ANY"
.. .. .. ..@ elementMetadata: NULL
.. .. .. ..@ metadata : list()
.. ..$ labels : chr [1:18423] "Fibroblasts" "Endothelial cells" "Fibroblasts" "Epithelial cells" ...
..@ elementType : chr "ANY"
..@ elementMetadata: NULL
..@ metadata : list()
I see two slots having NULL. I'd appreciate any pointers on this.
Thanks.
I have some custom references that I made with the previous version of SingleR.
However when I try to use them with SingleR 1.0.4 from Bioconductor I get the following error:
Error in FUN(X[[i]], ...) : 'ref' must have row names
I tried to look if there's a way to build references using the new version of SingleR, but haven't found much.
Should I stick with the old legacy version?
Thanks!
I am trying to use plotScoreHeatmap to reproduce what I did with the previous version of SingleR, as below:
SingleR.DrawHeatmap(singler$singler[[1]]$SingleR.single.main, top.n = Inf,
clusters = singler$meta.data$orig.ident)
And how can I show the top 50 cell types, which would give me more detail:
SingleR.DrawHeatmap(singler$singler[[1]]$SingleR.single, top.n = 50,
clusters = singler$meta.data$orig.ident)
Hello,
An error occurs when I use "NovershternHematopoieticData()" as a reference.
It works fine with the "label.main" option, but when I set the "label.fine" option I get the following message:
Error in (function (Exprs, scores, References, quantile, tune_thresh, :
Not compatible with requested type: [type=NULL; target=integer].
Other references such as HumanPrimaryCellAtlasData() or BlueprintEncodeData() work fine with both "label.main" and "label.fine".
Thanks for your help.
Hi all,
In order to subject my data to a bigger reference set, I would like to combine some reference sets together: HumanPrimaryCellAtlasData(), BlueprintEncodeData() and MonacoImmuneData(). Is there an option to do this?
Best Regards,
Erik
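A naive way to pool references by hand is sketched below in base R with toy matrices standing in for the expression matrices of two references. It keeps only the shared genes and prefixes each label with its source, because the same cell type is often named differently across references. This is an illustration under those assumptions only: it does nothing about batch effects between references, and a later post in this thread mentions that SingleR 1.1.6 accepts a list of references directly.

```r
# Toy stand-ins for two reference expression matrices (genes x samples).
ref1 <- matrix(1:6, nrow = 3,
               dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
ref2 <- matrix(7:12, nrow = 3,
               dimnames = list(c("g2", "g3", "g4"), c("s3", "s4")))
labels1 <- c("B_cell", "T_cells")
labels2 <- c("B-cells", "NK cells")

# Keep only genes present in both references, in a shared order.
common <- intersect(rownames(ref1), rownames(ref2))
combined <- cbind(ref1[common, , drop = FALSE],
                  ref2[common, , drop = FALSE])

# Prefix labels with their source so that differently named (or identically
# named) cell types from different references stay distinguishable.
combined_labels <- c(paste0("ref1.", labels1), paste0("ref2.", labels2))

combined
combined_labels
```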
Hi,
I downloaded the reference data successfully on one server. I'd like to run SingleR on another server, which has failed to download the reference data several times. Can I copy the contents of ~/.cache/ExperimentHub to the failing server and use that cache in SingleR? How should I set the option when running?
Thank you very much.
Best,
Fei
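A sketch of how a copied cache might be pointed to, assuming that ExperimentHub picks up its cache directory from the EXPERIMENT_HUB_CACHE environment variable and that the variable is set before the package is loaded. The path below is a placeholder inside tempdir() so the snippet runs anywhere; on the real server it would be the directory the cache was copied into.

```r
# Placeholder for the directory holding the cache copied from the other server.
cache_dir <- file.path(tempdir(), "ExperimentHub")
dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)

# Point ExperimentHub at that directory (set this before loading the package).
Sys.setenv(EXPERIMENT_HUB_CACHE = cache_dir)

# With the variable set, the hub could then be opened without network access.
# This assumes an ExperimentHub version that supports the localHub argument:
# hub <- ExperimentHub::ExperimentHub(localHub = TRUE)

Sys.getenv("EXPERIMENT_HUB_CACHE")
```

Whether the reference getters then find the cached files depends on both servers running the same Bioconductor release, since the hub metadata is versioned; that part is untested here.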
Hi,
When providing a list of references for cellular annotation, are scores from fine-tuning considered?
I am trying to annotate cells in a Seurat object against multiple references (BlueprintEncode and HumanPrimaryCellAtlas). Previously, I have noticed ambiguous annotations for Tregs and Tcell-gamma delta cells, presumably since BlueprintEncode doesn't include the Tcell-gamma delta samples.
Reading from the current manual (SingleR_1.1.6 ), I see a convenient option to provide references in a list and obtain cellular annotation with the top score across references. Since scores from only the first step are comparable (before fine.tuning) across references, then how are we able to obtain label.fine annotations in the output? Is the fine.tune step even executed when providing a list of references?
If possible, kindly share further details.
Code:
list.ref <- list()
list.ref[["bp.encode"]] <- BlueprintEncodeData()
list.ref[["hpca"]] <- HumanPrimaryCellAtlasData()
list.labels <- lapply(list.ref, function(x) x$label.fine)
pred <- SingleR(test = SeuratObject@assays$SCT@counts,
ref = list.ref, labels = list.labels, fine.tune = T)
Best,
Namit
Howdy, can I get some input on some data I have please:
I have a few labels which confuse me a little. A good example is the ID at the far right of this heatmap. This data was run in cluster mode. The far right one gets the label LE after fine-tuning... the SM correlation seems so strong, though, that it seems unlikely that LE is the correct assignment. Even on the left there are a bunch of clusters which look like they should maybe be Hillock, but are getting an LE assignment.
FYI... I am trying to use a human reference to ID a mouse target, so the human genes are filtered and converted to orthologous mouse genes. That with filtering only for shared genes only leaves me about 14K genes. I know this is really reduced... but the first label is convincing and it seems unconvincing.
Thanks!
Gervaise
Happens on this line in .build_trained_index:
sr.out <- .scaled_colranks_safe(current[common, , drop = FALSE])
In my case, this occurs because common is not a subset of row.names(current). This occurs because I passed genes in as the result of scran::getTopMarkers and then the ref SCE gets subset to include genes in common with the test SCE.
PR on way.
Hello,
I've successfully used the SingleR function to assign cell-types using the "single" method, but am not succeeding with the "cluster" method. The clustering is done in Seurat and the object is imported as a SingleCellExperiment.
sample.pred.cluster.dice <- SingleR(test = sample.dice.sce, ref = dice.se.common, labels = dice.se.common$label.fine, fine.tune = TRUE, method = "cluster", clusters = "seurat_clusters")
Here is the error:
Error in .local(x, group, reorder, na.rm, ...) :
incorrect length for 'group'
Thanks for your help!
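One possible cause, stated as an assumption: clusters = "seurat_clusters" passes a single string, whereas the aggregation step needs one cluster assignment per cell. The base-R sketch below reproduces the same kind of failure with rowsum() on a toy matrix and shows that a per-cell vector works; in the call above the equivalent would presumably be clusters = sample.dice.sce$seurat_clusters.

```r
# Toy expression matrix: 3 genes x 4 cells.
expr <- matrix(1:12, nrow = 3,
               dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))
cell_clusters <- c("0", "0", "1", "1")  # one entry per cell

# A column name passed as a string has length 1, not one entry per cell,
# so summing the cells by cluster fails.
bad <- tryCatch(rowsum(t(expr), group = "seurat_clusters"),
                error = function(e) conditionMessage(e))
bad

# Passing the per-cell vector works: genes x clusters.
stopifnot(length(cell_clusters) == ncol(expr))
good <- t(rowsum(t(expr), group = cell_clusters))
good
```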
Setting plotScoreHeatmap(..., show_colnames = TRUE) to pass the pheatmap argument currently does not override the default. This makes interpreting the scores heatmap difficult without the colnames (although it is understandable why you'd want it off for single-cell based score heatmaps). To fix this, I had to go into the function and manually set show_colnames to TRUE.
## hotfix to make colnames appear
plotScoreHeatmap <- function(...) {
...
args <- list(mat = scores[, order, drop = FALSE], border_color = NA,
show_colnames = TRUE, clustering_method = "ward.D2",
cluster_cols = cluster_cols, breaks = breaks, ...)
...
}
Hi, using SingleR v1 from Bioconductor. When using a custom classifier, I run intersect() between the classifier and the set to annotate, but then when running
SingleR(test = logcounts(xxxxx)[ common_genes, ],
ref = loaded_classifier[ common_genes, ],
....)
I get the error
Error in SummarizedExperiment:::.SummarizedExperiment.charbound(subset, :
index out of bounds: gene1 gene2 gene3
Testing to make sure whether the rownames in the intersect really are in both the classifier and the dataset shows that yes, all rownames are present. Only when running the SingleR function does an index out of bounds error get thrown. Thoughts/suggestions??
Hi All!
I'm currently working on my first project of Single Cell analysis and it will be great to have your advice.
I did all the necessary analysis of mt data with Seurat, including combining 5 data sets, filtering the cells, normalizing and scaling the expression values, etc. I finally clustered the pre-processed data and found 25 cell clusters; now I need to annotate them.
As I understand from the vignette, I should provide SingleR with the counts. Two questions:
1. Is there any option to use the differentially expressed marker gene output of Seurat to ensure that the SingleR analysis will define the same 25 clusters?
2. Which pre-processing is required for the input data?
Best,
Erik
I am trying to use SingleR for analysis of my dataset and hitting a roadblock accessing the reference data. I cannot download within R from Bioconductor due to an expired license. Can you suggest a work around?
hpca.se <- HumanPrimaryCellAtlasData()
Error in .util_download(x, rid[i], proxy, config, "bfcadd()", ...) :
bfcadd() failed; see warnings()
In addition: Warning messages:
1: download failed
web resource path: ‘https://experimenthub.bioconductor.org/metadata/experimenthub.sqlite3’
local file path: ‘/wynton/home/fong/bkeenan/.cache/ExperimentHub/1de062c5e7265_experimenthub.sqlite3’
reason: Peer's Certificate has expired.
2: bfcadd() failed; resource removed
rid: BFC6
fpath: ‘https://experimenthub.bioconductor.org/metadata/experimenthub.sqlite3’
reason: download failed
Hi There!
I am currently attempting to add the SingleR cluster annotations back into my original Seurat Object (on which I've already run UMAP clustering and have 14 cell clusters), but unfortunately it does not seem to be working (the metadata column just outputs <NA>).
This is my current code:
#convert Seurat object into SingleCellExperiment
matrix <- as.SingleCellExperiment(p1_1)
#load immune cell database
imm <- DatabaseImmuneCellExpressionData()
#Grab the common genes between the sets.
common <- intersect(rownames(matrix), rownames(imm))
imm <- imm[common,]
matrix <- matrix[common,]
#Run SingleR
singler <- SingleR(test = matrix, ref = imm, labels = imm$label.fine, method = 'cluster', assay.type.ref = 'logcounts', clusters = matrix$seurat_clusters)
#Add SingleR clusters back into Seurat object
p1_1[['singler_clusters']] <- singler$labels[match(p1_1[[]][['seurat_clusters']], singler$first.labels)]
I believe the issue is with the last part where I am trying to add the SingleR assigned clusters back into my original Seurat object. The readme file has this piece of code:
seurat.obj[["SingleR.cluster.labels"]] <- singler.results$labels[match(seurat.obj[[]][["my.input.clusters"]], singler.results$clusts)]]
But when I look at my 'singler' object, I have no 'clusts' variable (which is why I just used 'first.labels'). And once again, after I run the above code, my newly defined 'singler_clusters' metadata column is blank!
Any help would be greatly appreciated! Many thanks.
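A likely explanation for the <NA> values: match() is looking up cluster IDs among singler$first.labels, which holds cell type names, so nothing ever matches. Assuming that the cluster-mode result has one row per cluster with the cluster IDs as row names (worth confirming with rownames(singler)), the lookup can be sketched in base R as follows, with a plain data.frame standing in for the SingleR output.

```r
# Stand-in for cluster-mode results: one row per cluster,
# row names are the cluster IDs.
singler_res <- data.frame(labels = c("T cells", "B cells", "NK cells"),
                          row.names = c("0", "1", "2"),
                          stringsAsFactors = FALSE)

# Per-cell cluster assignments, as they would sit in the Seurat metadata.
cell_clusters <- factor(c("2", "0", "0", "1", "2"))

# Look up each cell's cluster among the result row names;
# as.character() makes the comparison on cluster names explicit.
per_cell <- singler_res$labels[match(as.character(cell_clusters),
                                     rownames(singler_res))]

per_cell
```

The per_cell vector has one label per cell, in cell order, which is the shape a Seurat metadata column needs.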
Hi,
The CreateSinglerObject function has been removed. Could you tell me how to do the following with the new version?
singler <- CreateSinglerObject(counts=counts,
project.name="excelerate course", # choose
min.genes = 200, # ignore cells with fewer than 200 transcripts
technology = "CEL-Seq2", # choose
species = "Human",
citation = "Schelker et al. 2017", # choose
ref.list = list(hpca=hpca, bpe=blueprint_encode),
normalize.gene.length = FALSE, # needed for full-length
Commit I used: e08fcba
If I try hpca.se <- SingleR::BlueprintEncodeData()
I get the following error:
Using temporary cache /tmp/RtmpxUy46B/BiocFileCache
snapshotDate(): 2019-04-29
Using temporary cache /tmp/RtmpxUy46B/BiocFileCache
Error in .local(x, i, j, ...): 'i' must be length 1
Traceback:
1. SingleR::BlueprintEncodeData()
2. .create_se(file.path("blueprint_encode", version), assays = "logcounts",
. rm.NA = rm.NA, has.rowdata = FALSE, has.coldata = TRUE)
3. hub[hub$rdatapath == file.path(host, paste0(a, ".rds"))][[1]]
4. hub[hub$rdatapath == file.path(host, paste0(a, ".rds"))][[1]]
5. .local(x, i, j, ...)
6. stop("'i' must be length 1")
The same for hpca and the other references. As I can't update to the bioconductor devel I installed this directly from github. Can this be an issue?
Actually I want to apply SingleR to a Seurat object, but I am not quite sure how. In the old SingleR version I used CreateSinglerObject on the Seurat counts....
Remove NAs from incoming data in all *SingleR() functions.
Prune NAs from the reference data sets, tagging @friedue.
I used the old version of SingleR and I need the t-SNE plot function.
Can the new SingleR produce the same t-SNE plot as the old one?
If so, how can I do it? I use Seurat a lot, but I have no experience with scater.
Thank you very much.
mouse_1=ImmGenData()
Error in curl::curl_fetch_memory(url, handle = handle) :
Timeout was reached: [experimenthub.bioconductor.org] Connection timed out after 10001 milliseconds
How can I fix it?
I was trying to follow the vignette when I got an error because scRNAseq does not provide the function LaMannoBrainData. I then tried to install BiocManager::install("LTLA/scRNAseq"), but again I got an error:
ERROR: package installation failed
Error: Failed to install 'scRNAseq' from GitHub:
System command error, exit status: 1, stdout + stderr (last 10 lines):
E> ** R
E> ** data
E> ** inst
E> ** byte-compile and prepare package for lazy loading
E> Error: object ‘splitAltExps’ is not exported by 'namespace:SingleCellExperiment'
E> Execution halted
E> ERROR: lazy loading failed for package ‘scRNAseq’
E> * removing ‘/tmp/RtmpMWKJi7/Rinst290421337ff/scRNAseq’
E> -----------------------------------
E> ERROR: package installation failed
All these errors seem to arise from the development code using different versions of packages than those available through Bioconductor or CRAN. I am not using Bioconductor devel.
Are all the dependencies of the package already on Bioconductor devel? Or are there some dependencies on Github?
(I don't want to move to Bioconductor devel on my system but I would like to run some parts of the vignette)
Thanks
When SingleR() is passed SingleCellExperiment objects with sparse matrices as counts with method = 'cluster' and clusters defined, a rowsum error is thrown. As far as I can tell, nothing is wrong with the count matrix nor the cluster labels. The example code runs fine, and the only difference I can see is that the count matrix for the SCE object there is a base matrix, whereas the objects I'm using have sparse matrix formats (dgCMatrix). Running with method = 'single' works fine. Adding sparse matrix support would be excellent.
To reproduce:
library(Seurat)
library(SingleR)
library(scater)
scrna.sce <- as.SingleCellExperiment(pbmc_small)
hpca <- SingleR::getReferenceDataset(dataset = 'hpca')
common <- intersect(rownames(scrna.sce), rownames(hpca$data))
scrna.sce <- scrna.sce[common,]
scrna.sce <- scrna.sce[,colSums(counts(scrna.sce)) > 0]
scrna.sce <- scater::normalize(scrna.sce)
hpca$data <- hpca$data[common,]
# Works.
pred.hpca2 <- SingleR(test=scrna.sce, training=hpca$data, labels=hpca$main_types, assay.type = 2)
# Throws "rowsum 'x': must be numeric error" at 'colsum(test, clusters)' step of SingleR().
pred.hpca2 <- SingleR(test=scrna.sce, training=hpca$data, labels=hpca$main_types, assay.type = 2,
method = "cluster", clusters = as.character(scrna.sce$ident))
This can be worked around by manually converting the sparse matrix to a typical matrix and assigning that to counts:
class(counts(scrna.sce))
> 'dgCMatrix'
counts(scrna.sce) <- as.matrix(counts(scrna.sce))
class(counts(scrna.sce))
> 'matrix'
scrna.sce <- scater::normalize(scrna.sce)
# Now works.
pred.hpca <- SingleR(test=scrna.sce, training=hpca$data, labels=hpca$main_types, assay.type = 2,
method = "cluster", clusters = as.character(scrna.sce$ident))
The error isn't immediately intuitive and given the performance benefits of sparse matrices, it would be good to look further into this.
Like in the vignette, but as a trainSingleR() option rather than requiring users to do it all manually.
The current white->blue color scale sucks a bit, mostly because it's hard to distinguish between "blue" and "very blue". We should look for something with better evenness if possible. I guess I don't mind Suggests-ing a dedicated package for that. Good ol' viridis never let me down.
Also, the bidirectional color scale should reach its maximum color intensity at max(abs(scores)), so as to make use of the full range of colors when the largest correlation is not 1 or -1.
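The scaling described above can be sketched in base R as follows. The palettes are illustrative choices on my part: colorRampPalette() for a diverging scale and hcl.colors() (available in R 3.6 and later) for a viridis-like sequential scale without an extra dependency. The resulting breaks and colors would then be handed to the heatmap function, for example through pheatmap's breaks and color arguments.

```r
# Toy score matrix with no correlation anywhere near 1 or -1.
scores <- matrix(c(-0.20, 0.10, 0.45, -0.35, 0.05, 0.30), nrow = 2)

# Symmetric limits at the largest absolute score instead of fixed -1 and 1,
# so the full color range is used.
lim <- max(abs(scores))
n_colors <- 100
breaks <- seq(-lim, lim, length.out = n_colors + 1)

# Diverging palette for the bidirectional scale.
diverging <- colorRampPalette(c("blue", "white", "red"))(n_colors)

# Viridis-like sequential palette from base R for one-directional scores.
sequential <- hcl.colors(n_colors, palette = "viridis")

range(breaks)
```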
Hi,
I want to get the "DatabaseImmuneCellExpressionData" dataset using the "DatabaseImmuneCellExpressionData()" function, but the download fails with the following error:
Warning messages:
1: download failed
web resource path: ‘https://experimenthub.bioconductor.org/fetch/3112’
local file path: ‘/data/.cache/ExperimentHub/dbd05d1658b9_3112’
reason: Timeout was reached: [s3.amazonaws.com] Connection timed out after 10002 milliseconds
2: bfcadd() failed; resource removed
rid: BFC45
fpath: ‘https://experimenthub.bioconductor.org/fetch/3112’
reason: download failed
3: download failed
hub path: ‘https://experimenthub.bioconductor.org/fetch/3112’
cache resource: ‘EH3096 : 3112’
reason: bfcadd() failed; see warnings()
I wonder if you would consider uploading a copy (RData or RDS) for me.
Thanks!
Following up on the ideas discussed in another issue, it'd be great to have a way of wrangling the count matrix of a labelled scRNA-seq data set into an object that can be efficiently used with SingleR.
It is possible to have your cake and eat it too, via k-means clustering.
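A minimal base-R sketch of the k-means idea: within each label, cells are replaced by a handful of cluster centroids, which shrinks the reference while keeping some within-label heterogeneity. aggregate_by_kmeans is a hypothetical helper and the default of two centroids per label is an arbitrary assumption, not something stated in this thread.

```r
# Hypothetical helper: replace the cells of each label with k-means centroids.
aggregate_by_kmeans <- function(expr, labels, k = 2) {
  centroids <- list()
  out_labels <- character(0)
  for (lab in unique(labels)) {
    # kmeans() expects observations in rows, so put cells in rows.
    cells <- t(expr[, labels == lab, drop = FALSE])
    n_centers <- min(k, nrow(cells))
    fit <- kmeans(cells, centers = n_centers)
    # Transpose the centroids back to genes x centroids.
    centroids[[lab]] <- t(fit$centers)
    out_labels <- c(out_labels, rep(lab, n_centers))
  }
  data <- do.call(cbind, centroids)
  colnames(data) <- make.unique(out_labels)
  list(data = data, labels = out_labels)
}

set.seed(42)
# Toy log-expression matrix: 5 genes x 20 cells, 10 cells per label.
expr <- matrix(rnorm(5 * 20), nrow = 5,
               dimnames = list(paste0("gene", 1:5), NULL))
labels <- rep(c("A", "B"), each = 10)

agg <- aggregate_by_kmeans(expr, labels, k = 2)
dim(agg$data)
agg$labels
```

The returned data and labels have the same shape as a bulk reference, so they can be used wherever a matrix plus a label vector is accepted.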
Dear all,
Installed the developers version (1.1.0), but get the following error when trying to get the Immune Cell Expression values:
devtools::install_github('LTLA/SingleR')
library(SingleR)
immCellExpr <- DatabaseImmuneCellExpressionData()
snapshotDate(): 2019-04-29
Error in .local(x, i, j, ...) : 'i' must be length 1
Any ideas?
I have attached the sessionInfo:
sessionInfo_SingleR.txt
Thanks in advance!
kind regards,
Aldo
The old version of SingleR computed the quantiles exactly. The new version rounds up the relevant probability to the nearest cell, in order to identify the quantile via the NN search (can't have fractional neighbors). This results in some differences in the results between versions - these should have been negligible beyond very small numbers of samples per label.
The problem arises when you actually do only have very few samples per label, and the labels are very poorly separated (basically most of our fine labels), such that the approximation of the quantile results in a different label. One can debate whether the old behavior was correct in the first place - certainly taking the 80th percentile of scores when you only have two observations is kind of meaningless - but it would be better to have the exact same result as before, if for no other reason than to stop people asking questions about why the results have changed.
The fix is somewhat involved and requires modification to BiocNeighbors to retrieve distances to the last two neighbors, plus addition of some interpolation code to mimic quantile().
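To illustrate why the two approaches diverge for small sample sizes, here is a base-R sketch comparing linear interpolation between the two bracketing order statistics (which is what quantile() does by default, type 7) against picking a single order statistic at a rounded-up rank. The rounding rule in rounded_quantile is a hypothetical stand-in; the exact rule used by the package is not spelled out above.

```r
# Linear interpolation between the two bracketing order statistics,
# equivalent to quantile(x, p, type = 7), R's default.
interp_quantile <- function(x, p) {
  s <- sort(x)
  h <- (length(s) - 1) * p + 1   # fractional rank
  lo <- floor(h)
  hi <- ceiling(h)
  s[lo] + (h - lo) * (s[hi] - s[lo])
}

# Hypothetical rounding rule: take the order statistic at the rounded-up rank
# (at least rank 1), since a neighbor search cannot return fractional ranks.
rounded_quantile <- function(x, p) {
  sort(x)[max(1, ceiling(length(x) * p))]
}

x <- c(0.2, 0.6)            # only two samples for this label
interp_quantile(x, 0.8)     # interpolates between the two values
rounded_quantile(x, 0.8)    # picks one of the observed values
quantile(x, 0.8, names = FALSE)
```

With only two observations the two answers differ substantially, and the gap shrinks as the number of samples per label grows, which matches the behavior described above.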
Looks like MouseBulkData() referenced in README.md is actually MouseRNAseqData()
When I run the code plotScoreHeatmap(pred.grun), I get this error:
there is no package called ‘pheatmap’
Is there anything I should do?
Thank you.
Proposing an additional layer of annotations for the T cell population. Splitting on CD4 vs CD8 would be first and foremost (looks like it's already mostly doable by grepping on T.4 or T.8 from the fine labels). Adding the different subsets would be next; again, this can mostly be derived from the name already.
In case this is useful, leaving this here for consideration. Obviously way too many different ways to cut this data to really have it all in a single object and make everybody happy.
To add this to the current immgen dataset, below be some code. Please excuse the tidyverse coding.
library(SingleR)
library(tidyverse)
immgen <- ImmGenData()
manual <- tribble(
~label.manual, ~label.fine,
"CD4 Naive", c("T cells (T.4NVE)", "T cells (T.4NVE44-49D-11A-)", "T cells (T.4Nve)"),
"CD4 Effector", "T cells (T.4EFF49D+11A+.D8.LCMV)",
"CD4 Memory", c("T cells (T.4MEM)", "T cells (T.4MEM44H62L)",
"T cells (T.4MEM49D+11A+.D30.LCMV)", "T cells (T.4Mem)"),
"CD8 Naive", c("T cells (T.8NVE)", "T cells (T.8NVE.OT1)", "T cells (T.8Nve)"),
"CD8 Effector", c("T cells (T.8EFF.OT1.D10.LISOVA)", "T cells (T.8EFF.OT1.D10LIS)",
"T cells (T.8EFF.OT1.D8.LISOVA)", "T cells (T.8EFF.OT1.D8.VSVOVA)",
"T cells (T.8EFF.OT1.D8LISO)"),
"CD8 Memory", c("T cells (T.8MEM)", "T cells (T.8MEM.OT1.D100.LISOVA)",
"T cells (T.8MEM.OT1.D106.VSVOVA)", "T cells (T.8MEM.OT1.D45.LISOVA)",
"T cells (T.8Mem)"),
"Treg", "T cells (T.Tregs)"
) %>%
unnest(cols = label.fine)  # expand the list-column so each fine label gets its own row
manual.vec <- manual$label.manual
names(manual.vec) <- manual$label.fine
## Filter based on manual new annotation
immgen.tc <- immgen[, immgen$label.fine %in% manual$label.fine]
## Append new label
immgen.tc$label.manual <- manual.vec[immgen.tc$label.fine]
Probably one for @friedue.
table(SingleR::HumanPrimaryCellAtlasData()$label.main)
#>
#> Astrocyte B_cell BM
#> 2 26 7
#> BM & Prog. Chondrocytes CMP
#> 1 8 2
#> DC Embryonic_stem_cells Endothelial_cells
#> 88 17 64
#> Epithelial_cells Erythroblast Fibroblasts
#> 16 8 10
#> Gametocytes GMP Hepatocytes
#> 5 2 3
#> HSC_-G-CSF HSC_CD34+ iPS_cells
#> 10 6 42
#> Keratinocytes Macrophage MEP
#> 25 90 2
#> Monocyte MSC Myelocyte
#> 60 9 2
#> Neuroepithelial_cell Neurons Neutrophil
#> 1 16 3
#> Neutrophils NK_cell Osteoblasts
#> 18 5 15
#> Platelets Pre-B_cell_CD34- Pro-B_cell_CD34+
#> 5 2 2
#> Pro-Myelocyte Smooth_muscle_cells T_cells
#> 2 16 68
#> Tissue_stem_cells
#> 55
Created on 2019-08-30 by the reprex package (v0.3.0)
Looking at how they get broken down in the 'fine' label:
Neutrophil Neutrophils
Neutrophil 3 3
Neutrophil:commensal_E._coli_MG1655 0 2
Neutrophil:GM-CSF_IFNg 0 4
Neutrophil:inflam 0 4
Neutrophil:LPS 0 4
Neutrophil:uropathogenic_E._coli_UTI89 0 1
Should 'Neutrophil' and 'Neutrophils' be one 'main' label or is there a distinction I'm missing?
Hi, when I read the SingleR HTML documentation, I noted that SingleR is a powerful tool for cell type classification in scRNA-seq data, but I have run into some problems. Could anyone help?
my code:
library(SingleR)
hpca.se <- HumanPrimaryCellAtlasData()
error information:
Error in curl::curl_fetch_memory(url, handle = handle) :
Timeout was reached: [experimenthub.bioconductor.org] Connection timed out after 10001 milliseconds
and there is another error:
Error in .local(x, i, j, ...) : 'i' must be length 1
I have tried resetting some options with options(), e.g. options(timeout = 50000), but it keeps reporting the same errors.
Hi,
I am a Scanpy user trying to export and format my object for use in R with SingleR. I am struggling because, although SingleR is said to be applicable to other data formats, most people have used Seurat single cell objects. I am using a CSV with genes as row names and barcodes as column names, subsetted to 100 cells as in the vignette to make it easier to work with ("MyData"). The dimensions of MyData after running
hpca.se <- hpca.se[common,]
MyData <- MyData[common,]
are 12501 and 100, as are the dimensions of hpca.se. However, I get the error
pred.hpca <- SingleR(test = MyData, ref = hpca.se, labels = hpca.se$label.main)
Error in (function (Exprs, scores, References, quantile, tune_thresh, :
matrix object should have 'dim' attribute
Any suggestions for formatting my scanpy data so I can use SingleR?
Thanks,
Bridget
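One thing to check, offered as an assumption: read.csv() returns a data.frame, and a data.frame passed as test may be what triggers the 'dim' attribute error. The base-R sketch below writes a tiny stand-in CSV to a temporary file, reads it back with the gene names as row names, and converts it to a numeric matrix; the file and gene names are made up for illustration.

```r
# Write a small stand-in for the exported Scanpy CSV
# (genes in rows, barcodes in columns).
csv_path <- tempfile(fileext = ".csv")
toy <- data.frame(AAAC = c(1, 0, 3), AAAG = c(0, 2, 5),
                  row.names = c("GeneA", "GeneB", "GeneC"))
write.csv(toy, csv_path)

# read.csv() returns a data.frame; the first column holds the gene names.
df <- read.csv(csv_path, row.names = 1, check.names = FALSE)
class(df)

# Convert to a numeric matrix with gene row names before subsetting
# to the common genes and calling SingleR().
MyData <- as.matrix(df)
class(MyData)
dim(MyData)
```

If the CSV holds raw counts rather than log-normalized values, a normalization step would still be needed before annotation.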
I'd like to revamp the docs to focus on marker expression as the primary diagnostic. This is the easiest to interpret from a human perspective as they provide insight into how the decision was made to assign a particular label. (Though the scores are better for automated pruning.)
Also, if there are any points I made here that aren't present in the vignette, they should be added to the vignette. Same applies if I made them in a more eloquent way in the book, in which case the vignette should be modified. The end goal is to try to cut down on the redundancy by simply referring to the vignette for some of the discussions that are currently in the book.
Hi there,
I am using SingleR for cell annotation, but when following the tutorial, it always gives me the following error:
> hpca.se <- HumanPrimaryCellAtlasData()
snapshotDate(): 2019-10-22
see ?SingleR and browseVignettes('SingleR') for documentation
downloading 0 resources
loading from cache
‘EH3090 : 3106’
see ?SingleR and browseVignettes('SingleR') for documentation
downloading 0 resources
loading from cache
‘EH3091 : 3107’
Error in `rownames<-`(`*tmp*`, value = c("GSM112490", "GSM112491", "GSM112540", :
invalid rownames length
Could you tell me how to fix it?
Hello,
Do you have any plans to create a conda version of SingleR? It seems to me like a large number of Bioconductor packages are available via Bioconda, so was wondering if that is something you're planning on.
Thank you - apologies if this is the wrong place to be asking.
Harish
I have an object of class Seurat named Integration, consisting of 8 merged datasets from GSE 122960 IPF scRNA data, which I want to annotate with cell types using SingleR on the TSNE plot
1). First, I converted my Seurat object, Integration, to a SingleCellExperiment, which I named Integration1 using the function as.SingleCellExperiment(Integration)
2). I am using the Immgen database to annotate my cells
3). I have successfully used the SingleR function to label my dataset with the reference dataset, Immgen.
4). I am trying to use the TSNEPlot("labels", object) function to map the reference labels onto my integration TSNE plot
Here are my problems
I'll send in my code as a reference. Thank you!
Hi,
Even if I run SingleR with BPPARAM=BiocParallel::SerialParam(), I see multiple processes working. Since I'm running SingleR in docker with limited memory, this usually leads to a memory error and consequently the following error:
Error in result[[njob]] <- value :
attempt to select less than one element in OneIndex
Calls: <Anonymous> ... .local -> bplapply -> bplapply -> bploop -> bploop.lapply
In parallel::mccollect(wait = FALSE, timeout = 1) :
1 parallel job did not deliver a result
I was wondering if you know whether there is another source of parallelism in the code :/
Hello
I have a problem applying the new version of SingleR.
I want to know how to fix the following error when executing the "classifySingleR" function:
'rownames(test)' does not contain all genes used in 'trained'
I think that, due to differences in detection sensitivity between library prep modalities, this could be a common problem for others.
Thank you
Currently, SingleR provides scores for all cells, but it would be nice to also have an idea of the quality of those calls.
Aaron's suggestions:
devtools::install_github('LTLA/SingleR')
Downloading GitHub repo LTLA/SingleR@master
Installing 8 packages: DelayedMatrixStats, BiocNeighbors, BiocFileCache, ExperimentHub, beachmat, HDF5Array, Rhdf5lib, rhdf5
package ‘DelayedMatrixStats’ successfully unpacked and MD5 sums checked
package ‘BiocNeighbors’ successfully unpacked and MD5 sums checked
package ‘BiocFileCache’ successfully unpacked and MD5 sums checked
package ‘ExperimentHub’ successfully unpacked and MD5 sums checked
package ‘beachmat’ successfully unpacked and MD5 sums checked
package ‘HDF5Array’ successfully unpacked and MD5 sums checked
package ‘Rhdf5lib’ successfully unpacked and MD5 sums checked
package ‘rhdf5’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\zhegu\AppData\Local\Temp\RtmpCqDKu6\downloaded_packages
√ checking for file 'C:\Users\zhegu\AppData\Local\Temp\RtmpCqDKu6\remotes19d058b1584d\LTLA-SingleR-a2a8921/DESCRIPTION' (988ms)