dviraran / singler
SingleR: Single-cell RNA-seq cell types Recognition (legacy version)
License: GNU General Public License v3.0
Hi,
I'm attempting to use SingleR within an R script submitted via qsub. It appears to fail at:
[1] "Fine-tunning round on top cell types (using 16 CPU cores):"
Have you seen this before and do you know a work around?
Thanks,
Alex
I realized that, within that function, the line "orig.ident = sc.data$orig.ident[N >= min.genes]" should be
"orig.ident = sc.data$orig.ident[N >= min.genes,]". My understanding is that orig.ident is a data.frame, and you are trying to fetch the 'rows' with enough genes. But your current code treats it as a 'vector', which causes an error when incorporating custom data.
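To illustrate the indexing difference described above, here is a small sketch with made-up data. It only borrows the names `N`, `min.genes` and `orig.ident` from the report. It does not reproduce the SingleR function, and which form `orig.ident` actually takes depends on how the data were read in.

```r
# Toy per-cell gene counts and threshold (illustrative values only)
N <- c(150, 320, 800, 90, 450)
min.genes <- 200
keep <- N >= min.genes

# Case 1: orig.ident stored as a plain vector, one entry per cell
orig.ident.vec <- c("s1", "s1", "s2", "s2", "s3")
vec.kept <- orig.ident.vec[keep]

# Case 2: orig.ident stored as a data.frame, one row per cell.
# Rows need the trailing comma; df[keep] would index columns instead.
orig.ident.df <- data.frame(orig.ident = orig.ident.vec,
                            batch = c(1, 1, 2, 2, 3))
df.kept <- orig.ident.df[keep, , drop = FALSE]

print(vec.kept)
print(nrow(df.kept))
```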
Btw, I am giving SingleR a shot to annotate my single cells. Hope it works.
I would appreciate it if you could post the published paper soon so that I can cite it in the near future (or at least I hope it is close to publication).
Regards,
I ran the following lines:
library("Seurat")
library("dplyr")
library("SingleR")
SingleR.numCores <- 31
exp.data <- Read10X_h5("/gfs/work/avoda/ibd/data/rna/HCA/immune_census/cord_blood/ica_cord_blood_h5.h5")
singler <- CreateBigSingleRObject(exp.data, annot = NULL, project.name = "28Jan_HCA_CB", xy = NULL, clusters = NULL,
N = 10000, min.genes = 200, technology = "10X",
species = "Human", citation = "", ref.list = hpca,
normalize.gene.length = F, variable.genes = "de", fine.tune = F,
reduce.file.size = T, do.signatures = F, do.main.types = T,
temp.dir = getwd(), numCores = SingleR.numCores)
I ran it without fine-tuning for speed and debugging purposes.
And I get the following error:
[1] "Dimensions of counts data: 33694x10000"
[1] "Annotating data with HPCA..."
[1] "Variable genes method: de"
[1] "Number of DE genes:4394"
[1] "Number of cells: 5620"
[1] "Fine-tuning round on top cell types (using 31 CPU cores):"
Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) :
write error, closing pipe to the master
Calls: CreateBigSingleRObject ... pbmclapply -> mclapply -> lapply -> FUN -> sendMaster
Error in dimnames(x) <- dn :
length of 'dimnames' [1] not equal to array extent
Calls: CreateBigSingleRObject ... SingleR.CreateObject -> SingleR -> SingleR.FineTune -> rownames<-
Execution halted
The expression matrix is available here: https://s3.amazonaws.com/preview-ica-expression-data/ica_cord_blood_h5.h5
Hello, I found in the source code that the ref.list option is not passed to the CreateSinglerObject function when I use the CreateBigSingleRObject function. Therefore, if I provide a customized reference list to this function, it will not be used to annotate my cells.
Thanks for your help!
Best
Hi,
Thanks for developing such a great package! I was hoping you could elaborate on the decision-making process for the 'quantile.use' argument. I have been playing around with different values and am getting widely different results for each run. I looked at the documentation:
'correlation coefficients are aggregated for multiple cell types in the reference data set. This parameter allows to choose how to sort the cell types scores, by median (0.5) or any other number between 0 and 1. The default is 0.9.'
What does a quantile of 0.9 mean, and is there any intuition for how to choose this value for a given dataset?
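As we read the quoted documentation, each reference cell type is represented by several reference samples. A single query cell therefore gets several correlation coefficients per type, and quantile.use decides which point of that distribution becomes the type's score. The toy sketch below uses made-up correlations and plain base R, not SingleR's own code. It shows how the top-ranked type can flip between 0.9 and 0.5:

```r
# Hypothetical correlations between ONE query cell and 9 reference samples,
# each reference sample labelled with a cell type (toy numbers)
cors  <- c(0.50, 0.52, 0.80,   # T_cell: one very close sample, two weaker ones
           0.62, 0.63, 0.64,   # NK_cell: consistently moderate
           0.40, 0.41, 0.42)   # B_cell: consistently low
types <- rep(c("T_cell", "NK_cell", "B_cell"), each = 3)

# Aggregate the correlations per cell type at quantile q
score_by_type <- function(cors, types, q) {
  tapply(cors, types, function(x) unname(quantile(x, probs = q)))
}

s90 <- score_by_type(cors, types, 0.9)  # driven by the best-matching samples of each type
s50 <- score_by_type(cors, types, 0.5)  # median: favours types whose samples agree

print(round(s90, 3))
print(round(s50, 3))
```

With a high quantile, a type with even a few closely matching reference samples scores well. With the median, most of a type's samples have to match. That may partly explain why results shift noticeably between runs with different values.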
Thank you so much and look forward to hearing from you!
Hi! I'm still trying to combine my different data sets in SingleR.Combine. That part works now, but convertSingleR2Browser is complaining. I updated SingleR since I saw someone post a similar issue here a couple of days ago, but I still got the same error message.
> list_samples<- list(singler_ctrl, singler_stim)
> list_xy <- list(singler_ctrl[["seurat"]]@dr[["tsne"]]@cell.embeddings,singler_stim[["seurat"]]@dr[["tsne"]]@cell.embeddings)
>
> singler=SingleR.Combine(list_samples)
> saveRDS(singler, file=paste0(ctrl_samp.name,"_singlertemp.rds"))
> singler_ctrl.stim.new = convertSingleR2Browser(singler, use.singler.cluster.annot = F)
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
invalid 'row.names' length
> #saveRDS(singler_ctrl.stim.new, 'ctrl_stim.rds')
> traceback()
5: stop("invalid 'row.names' length")
4: `row.names<-.data.frame`(`*tmp*`, value = value)
3: `row.names<-`(`*tmp*`, value = value)
2: `rownames<-`(`*tmp*`, value = cell.names)
1: convertSingleR2Browser(singler, use.singler.cluster.annot = F)
I noticed that in my SingleR.Combine object (called singler), some parts do not seem to have been combined properly (?). For example, in the screenshot below there are always 1464 cells, except in pval, where there are 708 values (the same number of cells as in singler_ctrl). "cell names" in SingleR.single.main also has an odd number, and so on. In the original singler_ctrl and singler_stim lists, these numbers are consistent throughout each list (708 and 756, respectively).
The full code I use if you need to look at it:
SingleR.pdf
I have tried everything to install the SingleR package, but it still does not work. My R is updated to version 3.5.2, and Seurat is updated to version 3.0.
Method 1:
library(curl)
library(devtools)
install_github('dviraran/SingleR')
Method 2:
devtools::install_github('dviraran/SingleR')
Method 3:
I downloaded the package file and installed it by using local files.
Can anyone help me solve this problem? Thank you.
Hi, I have 2 datasets, which I merged into 1 Seurat object for integrated analysis using this pipeline: https://satijalab.org/seurat/immune_alignment.html
Then I tried to use singleR for analysis:
immune.combined = AlignSubspace(pbmc.combined, reduction.type = "cca", grouping.var = "stim", dims.align = 1:20)
singler = CreateSinglerObject([email protected], annot = NULL, project.name = "PBMC", min.genes = 0,
technology = "10X", species = "Human", citation = "",
ref.list = list(), normalize.gene.length = F, variable.genes = "de",
fine.tune = T, do.signatures = T, clusters = NULL, do.main.types = T,
reduce.file.size = T, numCores = 3)
out = SingleR.PlotTsne(singler$singler[[1]]$SingleR.single,
singler$meta.data$xy,do.label = F,
do.letters = T,labels = singler$meta.data$orig.ident,
dot.size = 1.3,alpha=0.5,label.size = 6)
out$p
But it resulted in this error:
Error in FUN(X[[i]], ...) : object 'x' not found
I also tried to draw TSNE plot using seurat identity
out = SingleR.PlotTsne(singler$singler[[1]]$SingleR.single,
singler$meta.data$xy,do.label = T,
do.letters = F,labels=singler$seurat@ident,
dot.size = 1.3,label.size = 5,alpha=0.5)
out$p
but it has error:
Error in SingleR.PlotTsne(singler$singler[[1]]$SingleR.single, singler$meta.data$xy, :
trying to get slot "ident" from an object of a basic class ("NULL") with no slots
Can you please point out what's wrong with my analysis? Thank you very much
Is there a feature to add custom expression data to the algorithm, for example if I have tumor signatures?
Hello,
I've successfully made a new reference, and now I'm interested in creating more specific subtypes in this reference. How would I go about doing this? Is there a way to see how the built-in reference objects are structured? Does each main type require subtypes?
Thanks!
Hi,
I can run SingleR using the following commands:
singler = CreateSinglerObject(seu@data, project.name=pname, min.genes=as.integer(nGene),technology="10X",species=species, normalize.gene.length=F,variable.genes="de",fine.tune=T,do.signatures=T,do.main.types=T,reduce.file.size = T,numCores = 30)
singler$meta.data$xy = seu@dr$[email protected] # the UMAP coordinates
singler$meta.data$clusters = as.character(seu@meta.data$DBclust.ident)
However, because some samples have a large number of cells, I then run SingleR using "CreateBigSingleRObject":
singler = CreateBigSingleRObject(seu@data, annot=NULL, xy=seu@dr$[email protected], clusters=as.character([email protected]$DBclust.ident),project.name=pname,min.genes=as.integer(nGene), technology="10X", species=species, normalize.gene.length=F,variable.genes="de",fine.tune=T,reduce.file.size=T,do.signatures=T,do.main.types=T,temp.dir=getwd(), numCores = 30)
Then I'm getting this error:
Error in singler.list[[i]] : subscript out of bounds
Calls: CreateBigSingleRObject -> SingleR.Combine
In addition: Warning messages:
1: In .local(expr, gset.idx.list, ...) :
4553 genes with constant expression values throuhgout the samples.
2: In .local(expr, gset.idx.list, ...) :
5227 genes with constant expression values throuhgout the samples.
3: In .local(expr, gset.idx.list, ...) :
5094 genes with constant expression values throuhgout the samples.
4: In .local(expr, gset.idx.list, ...) :
5696 genes with constant expression values throuhgout the samples.
Execution halted
Does this error result from running "CreateBigSingleRObject" on samples with fewer than 10000 cells?
When I use
singler.new = convertSingleR2Browser(singler)
I get the following error:
Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length
My code is:
library(Seurat)
library(SingleR)
data<-Read10X(data.dir = '/Users/su/Documents/Single/hg38/filtered_gene_bc_matrices/GRCh38')
singler = CreateSinglerObject(counts=data, annot = NULL, project.name="ovary", min.genes = 0,
technology = "10X", species = "Human", citation = "",
ref.list = list(), normalize.gene.length = F, variable.genes = "de",
fine.tune = F, do.signatures = F, clusters = NULL, do.main.types = T,
reduce.file.size = T, numCores = SingleR.numCores)
ovary<-readRDS("/Users/su/Documents/Single/hg38/ovary.rds")
singler$meta.data$orig.ident = ovary@meta.data$orig.ident
singler$meta.data$xy = ovary@[email protected] # the umap coordinates
singler$meta.data$clusters = [email protected]
singler.new = convertSingleR2Browser(singler)
Everything works except creating the new SingleR object (singler.new).
Hi,
Thanks for developing such a wonderful tool. I want to use SingleR to identify cell types in my 10X data. The result shows a cluster of "Monocyte" in my data. However, there should not be any monocytes, because our cells are from liver tissue; I think these monocytes should be macrophages. So I want to know if I can exclude Monocyte from the ENCODE/Blueprint dataset, so that these cells will be annotated as macrophages.
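Here is a sketch of dropping one type from a reference before annotating. It rests on an explicit assumption: that a built-in reference (such as the ENCODE/Blueprint one) is an R list holding an expression matrix `data` (genes x reference samples) plus per-sample `types` and `main_types` vectors. That structure is our assumption, so check it with `str()` on the real object first. `drop_ref_types()` is a hypothetical helper, and the toy object is made up:

```r
# Toy stand-in for a reference object (ASSUMED structure: data, types, main_types)
set.seed(1)
ref <- list(
  name = "toy_ref",
  data = matrix(rexp(5 * 6), nrow = 5,
                dimnames = list(paste0("gene", 1:5), paste0("sample", 1:6))),
  types = c("Monocytes", "Monocytes", "Macrophages", "Macrophages M1",
            "T cells", "B cells"),
  main_types = c("Monocytes", "Monocytes", "Macrophages", "Macrophages",
                 "T cells", "B cells")
)

# Hypothetical helper: drop every reference sample whose main type is in `exclude`,
# keeping the expression columns and both label vectors in sync
drop_ref_types <- function(ref, exclude) {
  keep <- !(ref$main_types %in% exclude)
  ref$data <- ref$data[, keep, drop = FALSE]
  ref$types <- ref$types[keep]
  ref$main_types <- ref$main_types[keep]
  ref
}

ref.noMono <- drop_ref_types(ref, exclude = "Monocytes")
print(table(ref.noMono$main_types))
```

Any DE gene lists stored in the reference were computed with the monocyte samples included, so they would probably need recomputing. Another issue on this page shows `ref$de.genes = CreateVariableGeneSet(expr,types,200)` for that step, but we have not checked its exact arguments. Removing a type also forces those cells onto their next-best type, which is not necessarily macrophage.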
Meanwhile, SingleR will eventually assign each cell to a cell type, but there may be some doublets in 10X data; for example, some cells express the markers for T cells and macrophages simultaneously. Could you give me some advice on how to remove these cells using the SingleR results?
Thanks a lot!
Yang
Another problem: I successfully produced a single-cell identification for my data set (fine.tune = T) and uploaded it to SingleRbrowseR. The upload completes, but then I get disconnected from the server. I've tried with both .rds and .Rdata. Have I created the S4 object properly? My raw_DGE has 10000 cells and 18264 genes.
code: (Sorry, I deleted the output, but I found no apparent errors)
#fine tune = TRUE
library(SingleR)
raw_DGE <- read.table(file = "/home/proj/data/DGE/CP1_DGE.txt", header = TRUE, row.names = 1, colClasses =c("character", rep("numeric", 10000)))
singler7 <- CreateSinglerSeuratObject(counts = raw_DGE, project.name = 'CP1', species = "Mouse", fine.tune = T)
singler.new = convertSingleR2Browser(singler7)
saveRDS(singler7, 'CP1_singler7.rds')
#accidentally cleared my environment so had to load it again
singler8<-readRDS('/home/proj/tools/SingleR/CP1_singler7.rds')
save(singler8, file="CP1_singler7.RData")
Hi,
I want to visualize the top 30 most abundant cell types in a heatmap, so I set top.n=30 in SingleR.DrawHeatmap. However, some top cell types (such as macrophages) are missing from the heatmap.
So how are the top.n cell types selected in SingleR.DrawHeatmap?
Best,
Danshu
Hi,
Here is the error message:
ref$de.genes = CreateVariableGeneSet(expr,types,200)
Error in if (sum(A) == 1) { : missing value where TRUE/FALSE needed
Best,
danshu
Hi
When running SingleR.Combine on multiple singler objects, I get:
1: In rbind(singler$singler[[j]]$SingleR.single.main$pval, singler.list[[i]]$singler[[j]]$SingleR.single.main$pval) : number of columns of result is not a multiple of vector length (arg 2)
I've updated to the latest version and am still getting the same problem. Any suggestions?
All I'm running is:
singler_combined = SingleR.Combine( singler_objects, order = colnames(combined_normalized_counts) )
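The warning above is what base R's rbind() emits when it combines a matrix with a plain vector whose length does not divide the matrix's column count. Here is a toy reproduction with made-up values. We have not confirmed which piece loses its matrix shape inside SingleR.Combine. `can_rbind()` is a hypothetical guard, not part of SingleR:

```r
# Toy pval pieces: a 2-cell x 3-column matrix and a plain vector of length 2
pval.a <- matrix(c(0.01, 0.20, 0.03, 0.50, 0.04, 0.07), nrow = 2)
pval.b <- c(0.02, 0.30)

# rbind() recycles the vector across 3 columns and warns, as in the report
msg <- tryCatch(rbind(pval.a, pval.b),
                warning = function(w) conditionMessage(w))
print(msg)

# Hypothetical guard: only bind two pieces that are matrices with equal column counts
can_rbind <- function(a, b) is.matrix(a) && is.matrix(b) && ncol(a) == ncol(b)
print(can_rbind(pval.a, pval.b))
```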
Hello,
I am using SingleR package v0.2.2 on Windows.
Then I get the following error message:
If I manually set the number of CPU cores to 1, I still get the following error message:
Is there no other way to increase the number of cores used on Windows?
Also, can you take a look at the error message regarding the missing TRUE/FALSE value?
Thank you.
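Tracebacks elsewhere on this page show SingleR's parallel steps running through pbmclapply() and mclapply(). In base R, mclapply() relies on forking, which Windows does not support, so mc.cores values above 1 are rejected there. The sketch below picks a platform-safe core count. The resulting value would then be passed as numCores, but we have not verified that every SingleR code path honours it:

```r
library(parallel)

# Forking is unavailable on Windows, so mclapply() can only use one core there
n.cores <- if (.Platform$OS.type == "windows") {
  1L
} else {
  dc <- detectCores()
  if (is.na(dc)) 1L else max(1L, dc - 1L)
}

# Small smoke test of the chosen setting
res <- mclapply(1:4, function(i) i^2, mc.cores = n.cores)
print(n.cores)
print(unlist(res))
```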
Hi, I was trying to install SingleR and got the following error message:
configure: error: The version of hdf5 installed on your system is not sufficient. Please ensure that at least version 1.8.13 is installed.
However, I already have the Seurat development version (3.0) installed, and it works fine. Is there any workaround or permanent solution? Thanks.
Hi,
Just a quick question regarding the counts input when I already have a Seurat object. Should I use the raw counts (SeuratObject@raw.data) or the log-normalized counts (SeuratObject@data)? I tried using raw.data, but I wasn't able to integrate the created SingleR object with the original t-SNE coordinates and cluster IDs because I had filtered some cells out. I get errors like this when I try to plot a new t-SNE:
Error in `$<-.data.frame`(`*tmp*`, "x", value = c(AAACCTGCAAAGGAAG = -11.8154689265475, :
replacement has 6444 rows, data has 6476
Any advice is highly appreciated! Thank you!
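One way to keep raw counts and the existing embedding aligned is to subset the raw matrix to the cells that survived filtering, in the same order. Here is a toy sketch in base R with made-up matrices. In a Seurat v2 object, the retained cell names would presumably come from `colnames(SeuratObject@data)`, which we assume matches the cells in the t-SNE embedding:

```r
# Toy raw count matrix (genes x cells) and the cells that survived filtering
set.seed(1)
raw <- matrix(rpois(4 * 6, lambda = 3), nrow = 4,
              dimnames = list(paste0("g", 1:4), paste0("cell", 1:6)))
kept.cells <- c("cell2", "cell3", "cell5", "cell6")  # e.g. colnames of the filtered object

# Keep only the retained cells, in the same order as the embedding
raw.kept <- raw[, kept.cells, drop = FALSE]

# Toy 2-D embedding for the retained cells (rows named by cell barcode)
xy <- matrix(rnorm(2 * length(kept.cells)), ncol = 2,
             dimnames = list(kept.cells, c("tSNE_1", "tSNE_2")))

# The counts and the coordinates now describe the same cells in the same order
stopifnot(identical(colnames(raw.kept), rownames(xy)))
```

Other reports on this page that follow this route also pass min.genes = 0, presumably so that SingleR does not drop further cells. That is an observation, not a documented guarantee.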
Thanks for developing this tool. I am following Case 2 by providing the member of the Seurat object (X). Do you have any suggestions?
singler <- CreateSinglerObject(as.matrix([email protected]), annot = NULL, project.name="Lung10X",min.genes = 0,
technology = "10X", species = "Mouse", citation = "",
ref.list = list(), normalize.gene.length = F, variable.genes = "de",
fine.tune = T, do.signatures = T, clusters = NULL, do.main.types = T,
reduce.file.size = T, numCores = 32)
[1] "Dimensions of counts data: 17620x16043"
[1] "Annotating data with Immgen..."
[1] "Variable genes method: de"
[1] "Number of DE genes:3400"
[1] "Number of cells: 16043"
[1] "Fine-tuning round on top cell types (using 32 CPU cores):"
|======================================================================================================================| 100%, Elapsed 25:06
[1] "Number of DE genes:3400"
[1] "Number of clusters: 10"
[1] "Fine-tuning round on top cell types (using 32 CPU cores):"
|======================================================================================================================| 100%, Elapsed 00:01
[1] "Annotating data with Immgen (Main types)..."
[1] "Number of DE genes:2231"
[1] "Number of cells: 16043"
[1] "Fine-tuning round on top cell types (using 32 CPU cores):"
|======================================================================================================================| 100%, Elapsed 02:10
[1] "Number of DE genes:2231"
[1] "Number of clusters: 10"
[1] "Fine-tuning round on top cell types (using 32 CPU cores):"
|======================================================================================================================| 100%, Elapsed 00:01
[1] "Annotating data with Mouse-RNAseq..."
[1] "Variable genes method: de"
[1] "Number of DE genes:3630"
[1] "Number of cells: 16043"
[1] "Fine-tuning round on top cell types (using 32 CPU cores):"
|======================================================================================================================| 100%, Elapsed 00:34
[1] "Number of DE genes:3630"
[1] "Number of clusters: 10"
[1] "Fine-tuning round on top cell types (using 32 CPU cores):"
|======================================================================================================================| 100%, Elapsed 00:01
[1] "Annotating data with Mouse-RNAseq (Main types)..."
[1] "Number of DE genes:2871"
[1] "Number of cells: 16043"
[1] "Fine-tuning round on top cell types (using 32 CPU cores):"
|======================================================================================================================| 100%, Elapsed 00:30
[1] "Number of DE genes:2871"
[1] "Number of clusters: 10"
[1] "Fine-tuning round on top cell types (using 32 CPU cores):"
|======================================================================================================================| 100%, Elapsed 00:01
Error in dimnames(x) <- dn :
length of 'dimnames' [1] not equal to array extent
Hi, I got an error below following "Case 2: Already have a single-cell object" in
http://comphealth.ucsf.edu/sample-apps/SingleR/SingleR_create.html
counts is a 10X directory which has a matrix file.
On the other hand, CreateSinglerSeuratObject function doesn't give the error with the counts.
Could you advise how I can resolve the issue without creating a Seurat object?
Thanks!
singler = CreateSinglerObject(counts, annot = NULL, project.name, min.genes = 500,
+ technology = "10X", species = "Human", citation = "",
+ ref.list = NULL, normalize.gene.length = F, variable.genes = "de",
+ fine.tune = F, do.signatures = T, clusters = NULL, do.main.types = T,
+ reduce.file.size = T, numCores = 8)
[1] "Dimensions of counts data: 32734x16194"
Error in base::colSums(x, na.rm = na.rm, dims = dims, ...) :
'x' must be an array of at least two dimensions
Hi,
After combining singler objects using SingleR.Combine() and converting with convertSingleR2Browser(), the "other" slot is empty. I also cannot find a slot containing the SingleR p-values.
Best,
Danshu
In the line:
singler$singler[[j]]$SingleR.single.main$labels1 = c(singler$singler[[j]]$SingleR.single.main$labels1,
singler.list[[i]]$singler[[j]]$SingleR.single.main$labels1)
the result has the wrong number of dimensions for labels1.
I changed the concatenation (c()) to rbind(), and now the function works as expected and SingleR.Subset works.
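Here is a toy illustration of why swapping c() for rbind() matters. It assumes that labels1 is stored as a one-column matrix with cell names as row names; that assumption comes from the report, not from the SingleR source. c() strips the dim attribute, while rbind() keeps a matrix and its row names:

```r
# Toy labels1 matrices for two samples (cells x 1 column)
labels1.a <- matrix(c("T cells", "B cells"), ncol = 1,
                    dimnames = list(c("a_cell1", "a_cell2"), NULL))
labels1.b <- matrix(c("NK cells", "Monocytes", "T cells"), ncol = 1,
                    dimnames = list(c("b_cell1", "b_cell2", "b_cell3"), NULL))

# c() drops the dim attribute and returns a plain character vector
combined.c <- c(labels1.a, labels1.b)

# rbind() stacks rows and keeps a one-column matrix with the cell names
combined.rbind <- rbind(labels1.a, labels1.b)

print(dim(combined.c))      # NULL
print(dim(combined.rbind))
print(rownames(combined.rbind))
```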
I have tried to install SingleR in all of the following ways, in different combinations:
devtools::install_github('dviraran/SingleR')
devtools::install_github("dviraran/SingleR")
library(curl)
library(devtools)
install_github('dviraran/SingleR')
I have also reinstalled curl, reinstalled devtools with dependencies, restarted R (3.4.4), updated everything, and installed libcurl4-gnutls-dev instead of libcurl4-openssl-dev (they always replace each other), but after a loooooong wait I always get the following error:
Downloading GitHub repo dviraran/SingleR@master
Error in utils::download.file(url, path, method = download_method(), quiet = quiet, :
download from 'https://api.github.com/repos/dviraran/SingleR/tarball/master' failed
And at one point I got an error in curl (sorry, I did not save the full curl error message!):
Error in curl::curl_fetch_memory(url, handle = h) :
Timeout was reached:
I also tried with and without proxy. Same problem.
This might be unrelated to SingleR, but in fact I used devtools::install_github earlier today and it worked. I already have a working version of Seurat installed; could this cause issues? If possible, can I install SingleR without Seurat? Or is there any other way to install SingleR?
Hi Dvir,
I was wondering how to regress out covariates other than nUMI.
My code is below. When I try to regress out orig.ident, I get an error.
A = sample(1:ncol(combined$sc.data), 2000)
annot=data.frame(orig.ident = combined$orig.ident[A],cancer_type =cancer_type[A] )
singler = CreateSinglerSeuratObject(combined$sc.data[,rownames(annot)], annot = annot[,1], project.name="test",
min.genes = 200, technology = "10X", species = "Human", citation = "",
ref.list = list(), normalize.gene.length = F, variable.genes = "de",
fine.tune = T, reduce.file.size = T, do.signatures = T, min.cells = 2,
npca = 10, regress.out = "orig.ident", do.main.types = T,
reduce.seurat.object = T, numCores = SingleR.numCores)
Regressing out: orig.ident
| | 0%
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
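This error message comes from R's model-fitting code when a factor in the formula has only one level. The sketch below reproduces it with a plain lm() on made-up data. It also adds a hypothetical guard that drops covariates which do not vary. We assume, without having checked Seurat's internals, that the regression step fits a linear model per gene, and that orig.ident ended up with a single level for the 2000 sampled cells (for instance, if the annotation passed via annot did not populate it):

```r
# Toy per-cell metadata: orig.ident happens to contain a single level
meta <- data.frame(
  nUMI = c(1200, 3400, 2100, 5000),
  orig.ident = factor(c("sampleA", "sampleA", "sampleA", "sampleA"))
)
expr <- c(0.5, 1.2, 0.9, 1.8)  # toy expression of one gene across the cells

# Reproduce the reported failure with a plain lm() on a one-level factor
err <- tryCatch(lm(expr ~ nUMI + orig.ident, data = meta),
                error = function(e) conditionMessage(e))
print(err)

# Hypothetical guard: keep only covariates that actually vary across the cells
usable_covariates <- function(meta, vars) {
  vars[vapply(vars, function(v) length(unique(meta[[v]])) >= 2, logical(1))]
}
print(usable_covariates(meta, c("nUMI", "orig.ident")))
```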
Hi,
I know you recently changed the calculation to use singscore instead of ssGSEA. When I tried to run CreateSinglerObject with do.signatures=T, I got the following error
[1] "Number of DE genes:2377"
[1] "Number of clusters: 10"
[1] "Fine-tuning round on top cell types (using 16 CPU cores):"
Error in rownames(a) : object 'a' not found
Calls: CreateSinglerObject -> calculateSingScores -> rownames
Execution halted
I think this "a" may be a holdover from your previous calculateSignatures function that doesn't get defined in your current function.
Thanks
Hi,
I've been using the CreateSinglerObject to run SingleR for a number of different samples that I'm working with. What I've observed is that for some cell line samples, I come across an error when it gets to the calculateSignatures step.
Estimating ssGSEA scores for 4 gene sets.
|
| | 0%Using parallel with 15 cores
|======================================================================| 100%
Error in scores[, i:last] <- a :
number of items to replace is not a multiple of replacement length
Calls: CreateSinglerObject -> calculateSignatures
In addition: Warning message:
In .local(expr, gset.idx.list, ...) :
1471 genes with constant expression values throuhgout the samples.
Execution halted
Upon digging further into the code, it seems this is because the number of rows of the scores matrix is based on the length of egc. However, possibly because of the number of genes with constant expression values, the results returned by the gsva function are actually shorter. (In my case, "Cytotoxicity" was missing.)
Do you have any advice on how to avoid or fix this error?
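Following the analysis in the report, one defensive pattern is to fill the preallocated score matrix by gene-set name rather than by position. A set that the scoring step drops then simply stays NA instead of breaking the assignment. The sketch uses made-up values. The names scores, a, i and last echo the error message, and the set names other than "Cytotoxicity" are hypothetical placeholders. This is not SingleR's actual calculateSignatures code:

```r
# Hypothetical gene-set names (placeholders except "Cytotoxicity", named in the report)
egc <- c("setA", "setB", "Cytotoxicity", "setD")

# Preallocate sets x cells with NA
scores <- matrix(NA_real_, nrow = length(egc), ncol = 2,
                 dimnames = list(egc, c("cell1", "cell2")))

# Toy result for one chunk of cells from a scoring step that dropped "Cytotoxicity"
i <- 1
last <- 2
a <- matrix(c(0.1, 0.4,
              0.3, 0.2,
              0.5, 0.6), nrow = 3, byrow = TRUE,
            dimnames = list(c("setA", "setB", "setD"), c("cell1", "cell2")))

# Fill rows by NAME instead of by position; the missing set stays NA
scores[rownames(a), i:last] <- a
print(scores)
```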
Thanks
Hi,
After generating singler, there is an error running "convertSingleR2Browser".
Here is the error message:
Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length
convertSingleR2Browser
function (singler, use.singler.cluster.annot = T)
{
ref.names = unlist(lapply(singler$singler, FUN = function(x) x$about$RefData))
cell.names = rownames(singler$singler[[1]]$SingleR.single$labels)
labels = as.data.frame(sapply(singler$singler, FUN = function(x) x$SingleR.single$labels))
if (!is.null(singler$singler[[1]]$SingleR.single.main)) {
labels.main = as.data.frame(sapply(singler$singler, FUN = function(x) x$SingleR.single.main$labels))
labels = cbind(labels, labels.main)
colnames(labels) = c(ref.names, paste0(ref.names, ".main"))
}
else {
colnames(labels) = c(ref.names)
}
rownames(labels) = cell.names
labels1 = data.frame()
if (!is.null(singler$singler[[1]]$SingleR.single$labels1)) {
labels1 = as.data.frame(sapply(singler$singler, FUN = function(x) x$SingleR.single$labels1))
if (!is.null(singler$singler[[1]]$SingleR.single.main)) {
labels1.main = as.data.frame(sapply(singler$singler,
FUN = function(x) x$SingleR.single.main$labels1))
labels1 = cbind(labels1, labels1.main)
colnames(labels1) = c(ref.names, paste0(ref.names,
".main"))
}
else {
colnames(labels1) = c(ref.names)
}
rownames(labels1) = cell.names
}
labels.clusters = data.frame()
labels.clusters1 = data.frame()
if (use.singler.cluster.annot == T) {
if (length(levels(singler$meta.data$clusters)) > 1) {
if (!is.null(singler$singler[[1]]$SingleR.clusters)) {
labels.clusters = as.data.frame(sapply(singler$singler,
FUN = function(x) x$SingleR.clusters$labels))
if (!is.null(singler$singler[[1]]$SingleR.clusters.main)) {
labels.clusters.main = as.data.frame(sapply(singler$singler,
FUN = function(x) x$SingleR.clusters.main$labels))
labels.clusters = cbind(labels.clusters, labels.clusters.main)
colnames(labels.clusters) = c(ref.names, paste0(ref.names,
".main"))
}
else {
colnames(labels.clusters) = c(ref.names)
}
rownames(labels.clusters) = levels(singler$meta.data$clusters)
}
if (!is.null(singler$singler[[1]]$SingleR.cluster$labels1)) {
if (!is.null(singler$singler[[1]]$SingleR.clusters)) {
labels.clusters1 = as.data.frame(sapply(singler$singler,
FUN = function(x) x$SingleR.clusters$labels1))
if (!is.null(singler$singler[[1]]$SingleR.clusters.main)) {
labels.clusters.main = as.data.frame(sapply(singler$singler,
FUN = function(x) x$SingleR.clusters.main$labels1))
labels.clusters1 = cbind(labels.clusters1,
labels.clusters.main)
colnames(labels.clusters1) = c(ref.names,
paste0(ref.names, ".main"))
}
else {
colnames(labels.clusters1) = c(ref.names)
}
rownames(labels.clusters1) = levels(singler$meta.data$clusters)
}
}
}
}
scores = lapply(singler$singler, FUN = function(x) x$SingleR.single$scores)
if (!is.null(singler$singler[[1]]$SingleR.single.main)) {
scores.main = lapply(singler$singler, FUN = function(x) x$SingleR.single.main$scores)
scores = c(scores, scores.main)
names(scores) = c(ref.names, paste0(ref.names, ".main"))
}
else {
names(scores) = c(ref.names)
}
clusters = data.frame(clusters = singler$meta.data$clusters)
rownames(clusters) = cell.names
ident = data.frame(orig.ident = singler$meta.data$orig.ident)
rownames(ident) = cell.names
singler.small = new("SingleR", project.name = singler$meta.data$project.name,
xy = singler$meta.data$xy, labels = labels, labels.NFT = labels1,
labels.clusters = labels.clusters, labels.clusters.NFT = labels.clusters1,
scores = scores, clusters = clusters, ident = ident,
other = data.frame(singler$signatures), expr = singler$seurat@data,
meta.data = c(Citation = singler$singler[[1]]$about$Citation,
Organism = singler$singler[[1]]$about$Organism, Technology = singler$singler[[1]]$about$Technology))
singler.small
}
<bytecode: 0x561ba82a9c18>
<environment: namespace:SingleR>
Best,
Danshu
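Several reports on this page hit invalid 'row.names' length inside convertSingleR2Browser(). Judging from the function body printed above, the cell names come from the first reference's labels. The clusters and orig.ident tables are then given those row names, so any per-cell field of a different length will fail. Below is a hypothetical pre-flight check. It is our own helper, it uses the field paths from the printed function, and the toy object is made up:

```r
# Hypothetical diagnostic: list per-cell fields whose length differs from the
# number of annotated cells (field paths follow the printed function)
check_singler_lengths <- function(singler) {
  n.cells <- nrow(singler$singler[[1]]$SingleR.single$labels)
  lens <- c(
    labels     = n.cells,
    clusters   = length(singler$meta.data$clusters),
    orig.ident = length(singler$meta.data$orig.ident),
    xy         = NROW(singler$meta.data$xy)
  )
  lens[lens != n.cells]
}

# Toy object: 3 annotated cells, but clusters were copied from a 4-cell object
toy <- list(
  singler = list(list(SingleR.single = list(
    labels = matrix(c("T cells", "B cells", "NK cells"), ncol = 1,
                    dimnames = list(c("c1", "c2", "c3"), NULL))))),
  meta.data = list(clusters = factor(c(1, 1, 2, 2)),
                   orig.ident = factor(rep("s1", 3)),
                   xy = matrix(0, nrow = 3, ncol = 2))
)
print(check_singler_lengths(toy))
```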
Hi,
Thanks for the great tool. I have some questions about building the SingleR object, as I found the README not very detailed.
I used the CreateSinglerObject function to make the SingleR object, and the R commands ran fine. However, the web tool gave me the following errors: "error: cannot coerce type 'closure' to vector of type 'character'" (it is not clear what this means) and "object 'x' not found".
What is the size limit for uploading a SingleR object to the web tool? What should I do to avoid such messages if I have a big dataset?
What does the function CreateSinglerSeuratObject do? Does the SingleR package make a Seurat object in addition to the SingleR object?
Is it necessary to use the following lines if I have a Seurat object?
singler$meta.data$orig.ident = seuratObj@meta.data$orig.ident
singler$meta.data$xy = seuratObj@[email protected]
singler$meta.data$clusters = seuratObj@ident
In the README I found "annot can be a tab delimited text file or a data.frame. Rownames correspond to column names in the counts data." It is not clear to me what the structure of this file is. Rows by columns with empty cells?
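This is our reading of the README sentence: one row per cell, with the row names equal to the column names (cell barcodes) of the counts matrix, and the annotation values in the column(s). The sketch builds such a table with made-up data and writes it as a tab-delimited file whose first column holds the cell names. Whether annot may carry more than one column is not stated in the quoted text:

```r
# Toy counts: 3 genes x 4 cells (column names are cell barcodes)
set.seed(1)
counts <- matrix(rpois(12, 2), nrow = 3,
                 dimnames = list(c("CD3E", "MS4A1", "LYZ"),
                                 c("AAAC-1", "AAAG-1", "AACT-1", "AAGT-1")))

# Per-cell annotation: one row per cell, rownames = colnames(counts)
annot <- data.frame(orig.ident = c("ctrl", "ctrl", "stim", "stim"),
                    row.names = colnames(counts))

# Same table as a tab-delimited text file (header row, first column = cell names)
annot.file <- tempfile(fileext = ".txt")
write.table(annot, annot.file, sep = "\t", quote = FALSE, col.names = NA)

# Reading it back with row.names = 1 recovers the same structure
annot2 <- read.delim(annot.file, row.names = 1)
stopifnot(identical(rownames(annot2), colnames(counts)))
```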
Thanks a lot,
HM
I'm having trouble converting the SingleR object so that I can upload it to the browser. This is raw DGE data that I've downloaded. It has worked in Seurat previously, but never in SingleR. I've tried a couple of different ways to read in the data, but the same issue occurs. What have I missed?
> > raw_DGE_TEST <- read.table( file ="/home/proj/TEST/kidney2/GSM2906426_Kidney2_dge.txt.gz", header=TRUE)
> >
> > singler6 <- CreateSinglerSeuratObject(counts = raw_DGE_TEST, project.name = 'Kidney2', species = "Mouse", fine.tune = F)
> [1] "Kidney2"
> [1] "Reading single-cell data..."
> [1] "Create Seurat object..."
> Performing log-normalization
> 0% 10 20 30 40 50 60 70 80 90 100%
> [----|----|----|----|----|----|----|----|----|----|
> **************************************************|
> Calculating gene means
> 0% 10 20 30 40 50 60 70 80 90 100%
> [----|----|----|----|----|----|----|----|----|----|
> **************************************************|
> Calculating gene variance to mean ratios
> 0% 10 20 30 40 50 60 70 80 90 100%
> [----|----|----|----|----|----|----|----|----|----|
> **************************************************|
> Regressing out: nUMI
> |===================================================================================================================================================| 100%
> Time Elapsed: 33.442033290863 secs
> Scaling data matrix
> |===================================================================================================================================================| 100%
> [1] "Creat SingleR object..."
> [1] "Dimensions of counts data: 19697x6202"
> [1] "Annotating data with Immgen..."
> [1] "Variable genes method: de"
> [1] "Number of DE genes:3389"
> [1] "Number of cells: 6202"
> [1] "Number of DE genes:3389"
> [1] "Number of clusters: 13"
> [1] "Annotating data with Immgen (Main types)..."
> [1] "Number of DE genes:2188"
> [1] "Number of cells: 6202"
> [1] "Number of DE genes:2188"
> [1] "Number of clusters: 13"
> [1] "Annotating data with Mouse-RNAseq..."
> [1] "Variable genes method: de"
> [1] "Number of DE genes:3555"
> [1] "Number of cells: 6202"
> [1] "Number of DE genes:3555"
> [1] "Number of clusters: 13"
> [1] "Annotating data with Mouse-RNAseq (Main types)..."
> [1] "Number of DE genes:2796"
> [1] "Number of cells: 6202"
> [1] "Number of DE genes:2796"
> [1] "Number of clusters: 13"
> > singler.new = convertSingleR2Browser(singler6)
> Error in names(x) <- value :
> 'names' attribute [4] must be the same length as the vector [0]
> > traceback()
> 2: `colnames<-`(`*tmp*`, value = c(ref.names, paste0(ref.names,
> ".main")))
> 1: convertSingleR2Browser(singler6)
Loading the data directly from the file path doesn't work at all, I guess because it doesn't recognize the column names (?):
> singler7 <- CreateSinglerSeuratObject(counts = "/home/proj/TEST/kidney2/GSM2906426_Kidney2_dge.txt", project.name = '7CP1', species = "Mouse", fine.tune = F)
[1] "7CP1"
[1] "Reading single-cell data..."
Error in make.unique(colnames(counts)) :
'names' must be a character vector
> traceback()
3: make.unique(colnames(counts))
2: ReadSingleCellData(counts, annot)
1: CreateSinglerSeuratObject(counts = "/home/proj/TEST/kidney2/GSM2906426_Kidney2_dge.txt",
project.name = "7CP1", species = "Mouse", fine.tune = F)
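The traceback above shows make.unique(colnames(counts)) failing, which suggests the counts had no column names. The sketch below reads a DGE table into a genes x cells matrix with both row and column names set. The layout (a header row of barcodes, gene symbols in the first column, tab-separated) is an assumption about the GEO file, and the inline text stands in for it:

```r
# Toy DGE text: header of cell barcodes, then one row per gene (tab-separated)
dge.txt <- "GENE\tAAACCTGA-1\tAAACGGGT-1\tAAAGATGC-1
Cd3e\t0\t2\t1
Lyz2\t5\t0\t0
Ptprc\t3\t1\t4"

# row.names = 1 turns the gene column into row names; check.names = FALSE keeps
# barcodes exactly as written (otherwise "-1" would be rewritten as ".1")
dge <- read.delim(text = dge.txt, row.names = 1, check.names = FALSE)

# A numeric genes x cells matrix with both dimnames set
counts <- as.matrix(dge)
stopifnot(is.numeric(counts), !is.null(colnames(counts)), !is.null(rownames(counts)))
```

For the real file, the same call would presumably be `read.delim("/home/proj/TEST/kidney2/GSM2906426_Kidney2_dge.txt.gz", row.names = 1, check.names = FALSE)`, since read.delim() opens gzip-compressed files transparently. The resulting matrix could then be passed as counts. Whether this also resolves the convertSingleR2Browser() error above is untested.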
Hi,
Previously I was using SingleR on my own laptop, which can handle around 5000 cells with fine-tuning. However, I want to analyze a dataset of 50000 cells, so I installed SingleR on an HPC cluster running R/3.5.0. After running for 3 to 4 hours, it always gives me the following error:
my output:
[1] "Dimensions of counts data: 16127x52693"
[1] "Annotating data with Immgen..."
[1] "Variable genes method: de"
[1] "Number of DE genes:3121"
[1] "Number of cells: 52693"
[1] "Fine-tuning round on top cell types (using 8 CPU cores):"
[1] "Number of DE genes:3121"
[1] "Number of clusters: 10"
[1] "Fine-tuning round on top cell types (using 8 CPU cores):"
[1] "Annotating data with Immgen (Main types)..."
[1] "Number of DE genes:2013"
[1] "Number of cells: 52693"
[1] "Fine-tuning round on top cell types (using 8 CPU cores):"
System output file:
------------------------------------------------------------
Successfully completed.
Resource usage summary:
CPU time : 70723.40 sec.
Max Memory : 104 GB
Average Memory : 57.05 GB
Total Requested Memory : 180.00 GB
Delta Memory : 76.00 GB
Max Swap : -
Max Processes : 12
Max Threads : 13
Run time : 15481 sec.
Turnaround time : 16104 sec.
The output (if any) follows:
During startup - Warning message:
Setting LC_CTYPE failed, using "C"
Loading required package: ggplot2
Loading required package: cowplot
Attaching package: 'cowplot'
The following object is masked from 'package:ggplot2':
ggsave
Loading required package: Matrix
Warning message:
replacing previous import 'BiocGenerics::dims' by 'Biobase::dims' when loading 'AnnotationDbi'
*** caught segfault ***
address 0x2aadc852ee90, cause 'memory not mapped'
Traceback:
1: hclust(dist(scores, method = "euclidean"), method = "ward.D2")
2: SingleR.Cluster(singler$SingleR.single.main, 10)
3: SingleR.CreateObject(sc.data.gl, x, clusters, species, citation, technology, do.main.types = do.main.types, variable.genes = variable.genes, fine.tune = fine.tune, numCores = numCores)
4: FUN(X[[i]], ...)
5: lapply(ref.list, FUN = function(x) { SingleR.CreateObject(sc.data.gl, x, clusters, species, citation, technology, do.main.types = do.main.types, variable.genes = variable.genes, fine.tune = fine.tune, numCores = numCores)})
6: CreateSinglerObject(counts = treated.cca@data, annot = NULL, project.name = "treated", technology = "10X", species = "Mouse", citation = "", ref.list = list(immgen), normalize.gene.length = F, variable.genes = "de", fine.tune = T, do.signatures = T, clusters = NULL, do.main.types = T, reduce.file.size = T, numCores = 8)
An irrecoverable exception occurred. R is aborting now ...
/home/jhan6/.lsbatch/1549232611.536704.shell: line 26: 29140 Segmentation fault (core dumped) Rscript run_singler.R > run_singler.out
My R jobs are done.
Sun Feb 3 20:51:55 CST 2019
My job ran on the following host:
cdragon096
How can I solve this problem?
Sincerely,
Han.
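The traceback ends in `hclust(dist(scores, method = "euclidean"), method = "ward.D2")`, which builds a full pairwise distance object over all cells. A back-of-the-envelope estimate, assuming one 8-byte double per unique cell pair and ignoring any working copies `hclust` makes, shows how quickly this grows. The helper name `dist_gib` is hypothetical.

```r
# Approximate size (GiB) of the dist() object for n cells:
# one double (8 bytes) for each unique pair of cells.
dist_gib <- function(n_cells) {
  n_pairs <- n_cells * (n_cells - 1) / 2
  n_pairs * 8 / 1024^3
}

laptop_run <- dist_gib(5000)   # about 0.09 GiB
hpc_run    <- dist_gib(52693)  # about 10.3 GiB for the distance vector alone
print(c(laptop = laptop_run, hpc = hpc_run))
```

Because the cost is quadratic, a tenfold increase in cells means roughly a hundredfold increase in this one allocation. That makes the clustering step a plausible pressure point for a 52,693-cell run, although the segfault may have other causes.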
Dear Dvir,
Recently I have been working with Seurat, and I ran into problems with single-marker identification. Even before I knew about SingleR, I thought it would be useful to have a reference for scoring the cells, so I am really glad that you have done this. Thank you very much.
Compared to the Seurat tutorial, SingleR's tutorial and troubleshooting documentation are not detailed enough. When I started with Seurat with zero experience, I was productive within just one week. SingleR seems vaguer to me, maybe because of my poor understanding.
The first problem I ran into was fine-tuning. I analyzed my data with Seurat and subset the T/NK cells. Then I used the data frame as the SingleR count data.
`#Try SingleR
singler_Lym = CreateSinglerObject(counts = Lym_repca@data[,[email protected]], annot = NULL, project.name = "Lym",
technology = "10X", species = "Mouse", citation = "",
ref.list = list(immgen), normalize.gene.length = F, variable.genes = "de",
fine.tune = T, do.signatures = F, clusters = NULL, do.main.types = T,
reduce.file.size = T, numCores = 4)`
My 2017 MacBook Pro ran for about 2 hours and returned a SingleR object without an error message, but singler_Lym$singler[[1]]$SingleR.single$labels, which I assume holds the fine-tuned labels, contains errors:
X2_TACTCGCTCTGAGTGT-2 "Error in if (sd(sc_data.filtered) > 0) { : \n missing value where TRUE/FALSE needed\n"
I retried several times, but every attempt failed, and I cannot find the reason. Also, can I add fine-tuning to this SingleR object without rerunning the create function?
Thank you very much!
Sincerely,
Han.
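The label text is a base R error: `sd()` returns `NA` when it receives fewer than two non-missing values (or any `NA` without `na.rm = TRUE`), and `if (NA > 0)` cannot be evaluated. A guess, not confirmed from SingleR's source, is that filtering left such a vector for these cells during fine-tuning. The sketch below reproduces the message and shows a guarded comparison; `safe_has_variance` is a hypothetical helper.

```r
# sd() of a single value is NA, and if() cannot branch on NA.
x_one <- c(3.2)
res <- tryCatch(
  if (sd(x_one) > 0) "varies" else "constant",
  error = function(e) conditionMessage(e)
)
print(res)  # the "missing value where TRUE/FALSE needed" message

# A guarded check treats an undefined sd as "no variance" instead of erroring.
safe_has_variance <- function(x) isTRUE(sd(x) > 0)
print(safe_has_variance(x_one))       # FALSE
print(safe_has_variance(c(1, 2, 4)))  # TRUE
```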
Hi,
I prefer using UMAP for dimension reduction. Would you consider updating `SingleR.PlotTsne` to allow specifying other axis labels instead of tSNE_1 and tSNE_2?
Best,
Danshu
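Until such an option exists, a possible workaround is to relabel the axes after plotting. This assumes that the `p` element returned by `SingleR.PlotTsne` (printed as `out$p` elsewhere in this thread) is a ggplot2 object, which is not confirmed here. The toy coordinates below are hypothetical stand-ins for UMAP embeddings.

```r
library(ggplot2)

# Hypothetical 2-D embedding standing in for UMAP coordinates.
set.seed(1)
xy <- data.frame(
  x = rnorm(20), y = rnorm(20),
  label = rep(c("T cells", "B cells"), each = 10)
)

# A plot built with tSNE axis titles, as a stand-in for out$p.
p <- ggplot(xy, aes(x, y, colour = label)) +
  geom_point() +
  labs(x = "tSNE_1", y = "tSNE_2")

# Overriding the axis titles on the existing plot object.
p_umap <- p + labs(x = "UMAP_1", y = "UMAP_2")
```

With a real SingleR result, the equivalent would be `out$p + ggplot2::labs(x = "UMAP_1", y = "UMAP_2")`, provided `out$p` is indeed a ggplot.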
Hello, thanks so much for this extremely helpful method for annotating cell types. I am wondering whether it is possible to add a reference built from a single-cell expression matrix (with annotated cell types), and if so, whether the code for constructing it differs from that for a bulk RNA-seq reference.
Thanks so much for your time!
Best Regards
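One common way to turn an annotated single-cell matrix into a bulk-like reference is to average log expression within each annotated type, giving one "pseudo-bulk" profile per type. The sketch below does this in base R on a hypothetical toy matrix; whether SingleR's reference objects expect exactly this layout is an assumption.

```r
# Hypothetical single-cell matrix: genes x cells, one annotated type per cell.
set.seed(42)
counts <- matrix(
  rpois(5 * 6, lambda = 5), nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:6))
)
cell_types <- c("T", "T", "B", "B", "NK", "NK")

# Log-transform, then average within each cell type (pseudo-bulk profiles).
log_expr <- log2(counts + 1)
ref_mat <- sapply(
  split(seq_along(cell_types), cell_types),
  function(idx) rowMeans(log_expr[, idx, drop = FALSE])
)
print(ref_mat)  # genes x cell types
```

An alternative design is to keep individual cells (or small groups of cells) as separate reference samples, which preserves within-type variability at the cost of a larger reference.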
I wanted to combine a control and a stimulated dataset to look at DE genes in the SingleRbrowseR. My plan was to give "ctrl" or "stim" as the original identities for the two sets, create a SinglerSeurat object for each set, and combine them. I managed to view both datasets, ctrl.rds and stim.rds, in the browser, so that part of the code works fine. I get an error at SingleR.Combine:
> comb_ctrl.stim=SingleR.Combine(c(singler_ctrl, singler_ctrl))
Error in if (singler.list[[i]]$singler[[j]]$about$RefData != singler.list[[1]]$singler[[j]]$about$RefData) { :
argument is of length zero
> traceback()
1: SingleR.Combine(c(singler_ctrl, singler_ctrl))
The RefData is automatically set to Immgen for both ctrl and stim. In the example below I've also tried to define xy, with the same result. (Please note that I'm creating a Seurat object from an object that is already a Seurat object; I will change this soon.) Is my idea doable at all? If so, what is the problem with RefData?
#creating original identity text documents for CTRL and STIM
write_ctrl=rep("ctrl", length(colnames(ctrl@data)))
write.table(write_ctrl, file="ctrl_orig.ident.txt", sep="\t", eol ="\n",row.names=colnames(ctrl@data))
write_stim=rep("stim", length(colnames(stim@data)))
write.table(write_stim, file="stim_orig.ident.txt", sep="\t", eol ="\n",row.names=colnames(stim@data))
#creating SinglerSeurat objects for CTRL and STIM
ctrl_raw <- as.matrix(x=ctrl@data)
singler_ctrl <- CreateSinglerSeuratObject(counts = ctrl_raw, annot="ctrl_orig.ident.txt", project.name = 'CP1 small', min.genes = 500, min.cells = 1, technology = "Microwell-seq", species = "Mouse", npca = 10, fine.tune = T)
singler_ctrl.new = convertSingleR2Browser(singler_ctrl)
saveRDS(singler_ctrl.new, 'ctrl.rds')
stim_raw <- as.matrix(x=stim@data)
singler_stim <- CreateSinglerSeuratObject(counts = stim_raw, annot= "stim_orig.ident.txt", project.name = 'CS1 small', min.genes = 500, min.cells = 1, technology = "Microwell-seq", species = "Mouse", npca = 10, fine.tune = T)
singler_stim.new = convertSingleR2Browser(singler_stim)
saveRDS(singler_stim.new, 'stim.rds')
#Combining CTRL and STIM
comb_ctrl.stim=SingleR.Combine(c(singler_ctrl, singler_stim), xy = c(singler_ctrl[["seurat"]]@dr[["tsne"]]@cell.embeddings,singler_stim[["seurat"]]@dr[["tsne"]]@cell.embeddings))
singler_ctrl.stim.new = convertSingleR2Browser(comb_ctrl.stim)
saveRDS(singler_ctrl.stim.new, 'ctrl_stim.rds')
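One base R behaviour may explain the error. The error message names the argument `singler.list`, which suggests `SingleR.Combine` expects a list of SingleR objects (an inference, not confirmed from the documentation). `c()` on two lists splices their elements together instead of keeping two objects, so `singler.list[[i]]$singler` becomes `NULL`, the `RefData` comparison has length zero, and `if()` fails with "argument is of length zero". Similarly, `c()` on two embedding matrices flattens them into a vector, whereas `rbind()` stacks the cells. The objects below are hypothetical stand-ins.

```r
# Hypothetical stand-ins shaped like SingleR result lists.
obj_a <- list(singler = list(list(about = list(RefData = "Immgen"))),
              meta.data = list())
obj_b <- obj_a

flat <- c(obj_a, obj_b)        # 4 elements: the contents are spliced together
kept <- list(obj_a, obj_b)     # 2 elements: each object stays intact

# With the spliced version, the RefData lookup yields NULL on both sides.
cmp <- flat[[2]]$singler[[1]]$about$RefData != flat[[1]]$singler[[1]]$about$RefData
err <- tryCatch(if (cmp) TRUE, error = function(e) conditionMessage(e))
print(err)  # "argument is of length zero"

# Embeddings: c() flattens matrices; rbind() keeps a cells x 2 matrix.
emb_a <- matrix(1:4, ncol = 2)
emb_b <- matrix(5:8, ncol = 2)
flat_xy <- c(emb_a, emb_b)
stacked_xy <- rbind(emb_a, emb_b)
```

Under that assumption, `list(singler_ctrl, singler_stim)` and `rbind(...)` of the two `cell.embeddings` matrices would be the shapes to try.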
Looking at the web application's incorporated data sets, it seems that every cell is assigned to a cell type. SingleR should be able to report "Unknown" for cells that correlate poorly with every cell type in the reference database.
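A simple post-processing step along these lines is to keep the best-scoring type only when its score clears a cutoff. This is a swapped-in heuristic, not SingleR's own method; the score matrix, the function `label_with_unknown`, and the 0.3 cutoff are all hypothetical and would need calibrating per reference.

```r
# Hypothetical score matrix: cells x reference types (e.g. correlations).
scores <- matrix(
  c(0.62, 0.31, 0.15,
    0.18, 0.21, 0.19,
    0.25, 0.58, 0.40),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("cell1", "cell2", "cell3"),
                  c("T cells", "B cells", "Monocytes"))
)

# Best type per cell, replaced by "Unknown" when the top score is too low.
label_with_unknown <- function(scores, min_score = 0.3) {
  best <- colnames(scores)[max.col(scores, ties.method = "first")]
  top <- apply(scores, 1, max)
  ifelse(top >= min_score, best, "Unknown")
}

labels <- label_with_unknown(scores)
print(labels)  # cell2 has no score above 0.3, so it becomes "Unknown"
```

An absolute cutoff is crude because correlation ranges differ between references; a relative criterion (for example, the gap between the top two scores) is another option.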
Hi! I've got the following error when running this tutorial (http://comphealth.ucsf.edu/sample-apps/SingleR/SingleR.MCA.html) line by line:
> for (i in s) {
+ print(i)
+ A = seq(i,min(i+20000-1,length([email protected])))
+ [email protected]$Tissue[A]
+ names(annot) = rownames([email protected])[A]
+
+ singler = CreateSinglerObject([email protected][,A], annot = annot, project.name='MCA',
+ min.genes = 0, technology = "Microwell-Seq",
+ species = "Mouse", citation = "Han et al. 2018",
+ do.signatures = F, clusters = mca@ident[A])
+
+ save(singler,file=paste0('/gfs/work/avoda/repli_mca/MCA/singler.partial.mca.',i,'.RData'))
+ }
[1] 1
[1] "Dimensions of counts data: 39855x20000"
[1] "Annotating data with Immgen..."
[1] "Variable genes method: de"
[1] "Number of DE genes:4171"
[1] "Number of cells: 20000"
Error in foreach(i = 0:length(s)) %dopar% { :
could not find function "%dopar%"
This seems to be caused by not loading the required parallel-computing packages beforehand (https://stackoverflow.com/questions/33250475/r-could-not-find-function-dopar).
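`%dopar%` is exported by the foreach package, and it runs in parallel only once a backend such as doParallel is registered. A minimal sketch of attaching both before running the tutorial loop, assuming these are the packages SingleR relies on here (the tutorial does not state this); the worker count of 2 is arbitrary.

```r
# foreach provides %dopar%; doParallel provides a backend for it.
library(foreach)
library(doParallel)

cl <- parallel::makeCluster(2)   # arbitrary number of workers
registerDoParallel(cl)

# Toy loop standing in for the parallel work SingleR performs.
squares <- foreach(i = 1:4, .combine = c) %dopar% i^2

parallel::stopCluster(cl)
print(squares)
```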
Hi,
I am using SingleR, but when I run CreateSinglerSeuratObject():
singler <- CreateSinglerSeuratObject(counts = file_count,
annot = NULL,
project.name = 'HCC',
min.genes = 0,
min.cells = 0,
technology = '10X',
species = 'Human',
regress.out = 'nUMI',
variable.genes = 'de',
normalize.gene.length = FALSE)
It successfully creates the Seurat object (NormalizeData, FindVariableGenes, ScaleData), but throws an error during fine-tuning:
[1] "HCC"
[1] "Reading single-cell data..."
[1] "Create Seurat object..."
Time Elapsed: 41.8454532623291 secs
[1] "Creat SingleR object..."
[1] "Dimensions of counts data: 15924x9927"
[1] "Annotating data with HPCA..."
[1] "Variable genes method: de"
[1] "Number of DE genes:3376"
[1] "Number of cells: 9927"
[1] "Fine-tuning round on top cell types (using 15 CPU cores):"
Error in dimnames(x) <- dn :
length of 'dimnames' [1] not equal to array extent
Calls: CreateSinglerSeuratObject ... SingleR.CreateObject -> SingleR -> SingleR.FineTune -> rownames<-
This error doesn't occur in all samples. Sometimes, if I rerun the command on a failed sample, it works and no error occurs. Could you help figure out what causes this error? Could it be related to parallel computing or memory?
Thanks!
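The message itself means that a vector of names was assigned to a matrix whose number of rows differs from the vector's length. Inside `SingleR.FineTune`, that would suggest fewer (or more) rows came back than expected; an intermittent failure is consistent with, but does not prove, a parallel worker failing under memory pressure. The sketch reproduces the error and shows a hypothetical helper, `set_rownames_checked`, that fails with a more informative message.

```r
# Reproduce the error: 3 names for a 2-row matrix.
m <- matrix(0, nrow = 2, ncol = 3)
genes <- c("g1", "g2", "g3")
err <- tryCatch({ rownames(m) <- genes; "ok" },
                error = function(e) conditionMessage(e))
print(err)  # "length of 'dimnames' [1] not equal to array extent"

# Hypothetical helper: check lengths first and report both counts.
set_rownames_checked <- function(m, nm) {
  if (length(nm) != nrow(m)) {
    stop(sprintf("expected %d row names, got %d", nrow(m), length(nm)))
  }
  rownames(m) <- nm
  m
}
```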
Hello,
I recently tried to use SingleR, but it failed after the step that creates the SingleR object.
My Seurat object contains 14793 genes and 1378 cells. I use:
singler = CreateSinglerObject(syno@data, annot = NULL, syno, min.genes = 500,
technology = "10X", species = "Human", citation = "",
ref.list = list(), normalize.gene.length = F, variable.genes = "de",
fine.tune = T, do.signatures = F, clusters = NULL, do.main.types = T,
reduce.file.size = T, numCores = SingleR.numCores)
singler$seurat <- syno
singler$meta.data$xy <- ra.control@dr$[email protected]
singler$meta.data$clusters <- ra.control@ident
to create the singleR object, and:
SingleR.DrawHeatmap(singler$singler[[1]]$SingleR.single.main,top.n=Inf,
clusters = singler$meta.data$clusters)
out = SingleR.PlotTsne(singler$singler[[1]]$SingleR.single.main,
singler$meta.data$xy,do.label=FALSE,
do.letters =T,labels=singler$singler[[1]]$SingleR.single.main$labels,
dot.size = 2, font.size = 12)
out$p
for analysis. However, it returns an error saying 'replacement has 1378 rows, data has 1345'. Do you know why the data has 33 fewer rows?
Thank you very much.
Best,
Shurui.
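The call above uses `min.genes = 500`, and a likely explanation (not confirmed) is that SingleR dropped the 33 cells detecting fewer than 500 genes, while `xy` and `clusters` still cover all 1378 cells. A sketch of aligning coordinates to the retained cells by name, assuming cells are filtered on the number of detected genes; the toy matrix and threshold are hypothetical.

```r
# Hypothetical genes x cells matrix: cells c1-c3 detect only 5 genes.
counts <- matrix(1, nrow = 50, ncol = 10,
                 dimnames = list(paste0("g", 1:50), paste0("c", 1:10)))
counts[6:50, 1:3] <- 0

# Keep cells detecting at least min_genes genes.
min_genes <- 20
kept <- colnames(counts)[colSums(counts > 0) >= min_genes]

# Coordinates for all cells, then subset by name so rows match the kept cells.
xy <- matrix(seq_len(20), ncol = 2,
             dimnames = list(colnames(counts), c("tSNE_1", "tSNE_2")))
xy_kept <- xy[kept, , drop = FALSE]
print(nrow(xy_kept))  # 7
```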
Hi,
thanks for developing SingleR. Great tool!
When I created the object with CreateSinglerSeuratObject:
singler = CreateSinglerObject("dir/10X/mm10", annot = NULL, "test", min.genes = 0,technology = "10X", species = "Mouse", citation = "", normalize.gene.length = F, variable.genes = "de",fine.tune = T, do.signatures = T, clusters = NULL, do.main.types = T, reduce.file.size = T, numCores = SingleR.numCores)
[1] "test"
[1] "Reading single-cell data..."
[1] "Create Seurat object..."
Performing log-normalization
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating gene means
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating gene variance to mean ratios
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Regressing out: nUMI
|======================================================================| 100%Error in { : task 1 failed - "incorrect number of dimensions"
I get the error 'task 1 failed - "incorrect number of dimensions"'.
I also tried a counts file and got the same error.
Best Wishes
I'm getting the following output when I try to run CreateSinglerSeuratObject:
> singler = CreateSinglerSeuratObject(org18_singleR,
+ min.genes = 500, technology = "10X", project.name = "org18",
+ species = "Human",fine.tune=F,
+ normalize.gene.length = F, min.cells = 5, npca = 10,
+ regress.out = "nUMI", reduce.seurat.object = T)
[1] "org18"
[1] "Reading single-cell data..."
[1] "Create Seurat object..."
Performing log-normalization
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating gene means
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating gene variance to mean ratios
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Regressing out: nUMI
|======================================================================================| 100%
Time Elapsed: 48.5627040863037 secs
Scaling data matrix
|======================================================================================| 100%
[1] "Creat SingleR object..."
[1] "Dimensions of counts data: 17636x12871"
[1] "Annotating data with HPCA..."
[1] "Variable genes method: de"
[1] "Number of DE genes:3342"
[1] "Number of cells: 12871"
Error in makeCluster(numCores) : could not find function "makeCluster"
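`makeCluster()` is part of the parallel package that ships with R, so this message usually means the function was called without parallel being attached or imported. A quick check that the function works in your session, assuming attaching parallel before calling `CreateSinglerSeuratObject` is enough (a workaround, not a confirmed fix); `num_cores` is a hypothetical value.

```r
# parallel ships with base R and provides makeCluster(), parLapply(), stopCluster().
library(parallel)

num_cores <- 2  # hypothetical; SingleR uses its own numCores argument
cl <- makeCluster(num_cores)
res <- unlist(parLapply(cl, 1:4, function(i) i * 10))
stopCluster(cl)
print(res)
```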
I just reinstalled SingleR to convert to the new S4 format and use the browser interface. During installation I got this "possible error", which might be related:
> devtools::install_github('dviraran/SingleR')
Downloading GitHub repo dviraran/SingleR@master
Skipping 3 packages not available: GSEABase, GSVA, singscore
✔ checking for file ‘/private/var/folders/sh/qplsh1hj1rncjz_h9wtk4j8c0000gn/T/RtmpHMjK4z/remotes12ee22a6238ec/dviraran-SingleR-f53ff05/DESCRIPTION’ ...
─ preparing ‘SingleR’: (416ms)
✔ checking DESCRIPTION meta-information
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ looking to see if a ‘data/datalist’ file should be added
─ building ‘SingleR_0.2.2.tar.gz’ (17s)
* installing *source* package ‘SingleR’ ...
** R
** data
*** moving datasets to lazyload DB
** byte-compile and prepare package for lazy loading
Note: possible error in 'calculateSingScores(sc.data.gl, ': unused argument (numCores = numCores)
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (SingleR)
Any help would be appreciated, love the package otherwise! Thanks!
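The log shows that GSEABase, GSVA, and singscore were skipped; these are Bioconductor packages, which `install_github` does not fetch from CRAN. Installing them first with BiocManager is a common remedy, though whether it also clears the `unused argument (numCores = numCores)` note depends on the installed singscore version and is not confirmed. The helper `missing_pkgs` is hypothetical, and the install runs only in an interactive session so that sourcing the snippet has no side effects.

```r
# Bioconductor packages reported as "not available" during installation.
bioc_deps <- c("GSEABase", "GSVA", "singscore")

# Return the packages from pkgs that cannot be loaded in this R session.
missing_pkgs <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

to_install <- missing_pkgs(bioc_deps)
if (length(to_install) > 0 && interactive()) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
  BiocManager::install(to_install)
}
```

After the dependencies are present, reinstalling SingleR with `devtools::install_github('dviraran/SingleR')` should no longer skip them.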
Hello, there are two questions I'm wondering whether you can help with.
First, I want to replace "DC: monocyte-derived" in the reference dataset with my own bulk RNA-seq data of tissue-resident DCs, but keep all other cell categories. Is there a function in your package that retrieves the reference matrix SingleR uses by default (for human), so that I can append my own gene expression data?
Second, I noticed that for cell types that do not exist in the reference dataset (for example, mast cells in one of my samples), SingleR still runs the iterations and returns cell types with low correlation values. Is it possible for SingleR to return "no cell match" when the correlation is below a certain value, so that novel cell types are not forced into existing types?
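For the first question, a sketch of swapping one type for new bulk samples, assuming a reference is a list holding a genes x samples `data` matrix and a per-sample `types` vector (inspect the bundled object with `str(hpca)` to confirm; real references may carry extra fields such as main types that would also need updating). The toy reference, the helper `replace_type`, and the label "DC: tissue-resident" are hypothetical, and the new samples should be normalized comparably to the reference.

```r
# Hypothetical stand-in for a bundled reference.
ref <- list(
  data = matrix(1:12, nrow = 4,
                dimnames = list(paste0("g", 1:4), c("s1", "s2", "s3"))),
  types = c("DC: monocyte-derived", "T_cells", "B_cells")
)
# Hypothetical own bulk samples, sharing only some genes with the reference.
own <- matrix(c(10, 11, 12, 13, 14, 15), nrow = 3,
              dimnames = list(c("g2", "g3", "g5"), c("tr1", "tr2")))

# Drop one type, restrict to shared genes, and append the new samples.
replace_type <- function(ref, drop_type, new_data, new_type) {
  keep <- ref$types != drop_type
  genes <- intersect(rownames(ref$data), rownames(new_data))
  list(
    data = cbind(ref$data[genes, keep, drop = FALSE],
                 new_data[genes, , drop = FALSE]),
    types = c(ref$types[keep], rep(new_type, ncol(new_data)))
  )
}

new_ref <- replace_type(ref, "DC: monocyte-derived", own, "DC: tissue-resident")
```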
Hi Dvir,
I had this working previously, but now when I run SingleR.Subset, it runs for a minute and then I get:
Error in s$singler[[i]]$SingleR.single$labels[subsetdata, ] :
incorrect number of dimensions
I've checked and I have the same number of cells in the SingleR and Seurat objects. Could this have to do with the way singleR stores list information for the labels?
Thanks for your attention!
Best,
Ryan
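The error is raised by base R when `x[i, ]` is applied to a plain vector instead of a matrix. If the labels are stored as a vector in this object but as a one-column matrix in others (for example, objects created by different SingleR versions, which is only a guess), a two-index subset breaks. The sketch reproduces the error and shows a hypothetical helper, `subset_labels`, that handles both shapes.

```r
labels_vec <- c(a = "T cells", b = "B cells", c = "NK cells")
keep <- c("a", "c")

# Two indices on a plain vector reproduces the error.
err <- tryCatch(labels_vec[keep, ], error = function(e) conditionMessage(e))
print(err)  # "incorrect number of dimensions"

# The same labels as a one-column matrix subset fine with drop = FALSE.
labels_mat <- matrix(labels_vec, ncol = 1,
                     dimnames = list(names(labels_vec), "labels"))

# Hypothetical helper that works for either storage shape.
subset_labels <- function(labels, cells) {
  if (is.null(dim(labels))) labels[cells] else labels[cells, , drop = FALSE]
}
```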
Hi,
I am working on single-cell analysis using Seurat. I am new to SingleR, and it is really useful for cell type prediction at the single-cell level. I have read the SingleR documentation, but I am still a little confused about how a cell is assigned a cell type. The documentation says it uses the correlation of each cell with the reference samples. Are signature genes available for all of the cell types (at the single-cell level)? I want to understand the score calculation for the cell types better; it would be really helpful for my analysis.
Thanks
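A simplified illustration of correlation-based scoring: correlate one cell with each reference profile over shared genes and take the best match. Spearman correlation is used here on the understanding that SingleR's scoring is rank-based, but this sketch omits its variable-gene selection, aggregation over multiple reference samples per type, and fine-tuning rounds. The reference, the query cell, and `score_cell` are hypothetical.

```r
# Hypothetical reference: genes x cell-type profiles.
set.seed(3)
genes <- paste0("g", 1:100)
ref <- cbind(T_cell = rnorm(100), B_cell = rnorm(100), Monocyte = rnorm(100))
rownames(ref) <- genes

# Hypothetical query cell: the B-cell profile plus noise.
cell <- ref[, "B_cell"] + rnorm(100, sd = 0.3)

# Spearman correlation of the cell with every reference profile.
score_cell <- function(cell, ref) {
  common <- intersect(names(cell), rownames(ref))
  apply(ref[common, , drop = FALSE], 2, function(profile)
    cor(cell[common], profile, method = "spearman"))
}

scores <- score_cell(cell, ref)
best <- names(which.max(scores))
print(round(scores, 2))
print(best)
```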
Hi,
Great tool!
I have the same problem with SingleR.DrawHeatmap.
My code is as follows.
First problem:
singler = CreateSinglerObject(cell.meta.data, annot = NULL, "test", min.genes = 0,technology = "10X", species = "Mouse",citation = "",normalize.gene.length = F, variable.genes = "de",fine.tune = T, do.signatures = T, clusters = NULL, do.main.types = T,reduce.file.size = T, numCores = SingleR.numCores)
immune.combined <- readRDS(rds_file)
singler$seurat = immune.combined
singler$meta.data$orig.ident = [email protected]$orig.ident
singler$meta.data$xy = immune.combined@dr$[email protected]
singler$meta.data$clusters = immune.combined@ident
#single cell type
SingleR.DrawHeatmap(singler$singler[[2]]$SingleR.single,top.n=37,clusters = singler$meta.data$orig.ident,fontsize_row=15)
But I get the error 'Error in annotation_colors[[colnames(annotation)[i]]] : subscript out of bounds'.
Second problem:
#cluster cell type
SingleR.DrawHeatmap(singler$singler[[2]]$SingleR.clusters,clusters = c(row.names(singler$singler[[2]]$SingleR.clusters$scores)),order.by.clusters=T,fontsize_row=15,cells_order=c(row.names(singler$singler[[2]]$SingleR.clusters$scores)))
My Seurat object has 12 clusters, but I get only 10 clusters when using the cluster cell types. How can I get the same clusters as in my Seurat object?
Best,
Dvir
The set names have hyperlinks whose URLs concatenate the GEO ID with a textual description, such as https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM2889481_Prostate1, which leads to an invalid GEO accession error on the GEO website.
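A possible fix on the link-building side is to keep only the accession before constructing the URL. This assumes the accession is always the part of the set name before the first underscore, which holds for the example above but is not verified for every set.

```r
# Set name as shown in the browser (from the example URL above).
set_names <- c("GSM2889481_Prostate1")

# Keep the text before the first underscore as the GEO accession.
geo_id <- sub("_.*$", "", set_names)
geo_url <- paste0("https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=", geo_id)
print(geo_url)
```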