plger / scDblFinder
Methods for detecting doublets in single-cell sequencing data
Home Page: https://plger.github.io/scDblFinder/
License: GNU General Public License v3.0
I have 18 samples stored as a list of 18 SCE objects, on which I ran emptyDrops() and scDblFinder() in loops. One of the samples (loop 11) failed at the scDblFinder() step with the error "Error in ecdf(d$cxds_score[w]) : 'x' must have 1 or more non-missing values". I know the scripts work because the other 17 samples ran fine, so I'm not sure what is wrong with this one object, which was constructed the same way as all the others. I think the line that fails is 2*ecdf(d$cxds_score[w])(d$cxds_score[w]) in the function, but I don't know how to fix my file.
I reran all the scripts, thinking that maybe something didn't get saved correctly in between on the HPC, but I got the same error at the same spot.
Update: if I run with use.cxds = FALSE, it finishes. I'm still uncertain why this one SCE object has this problem.
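library(DropletUtils) # provides read10xCounts() and emptyDrops()
library(scDblFinder)
Samples = list.files(path='./mats')
# initialise the list before filling it in the loop
RawDat = vector(mode='list',length=length(Samples))
names(RawDat) = Samples
for(i in Samples){
RawDat[[i]] <- read10xCounts(paste("/PATH/RAW/mats",i,sep='/'))
cat(paste(i,', ',sep=''))
}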
emp_out = vector(mode='list',length=length(Samples))
names(emp_out) = Samples
for(i in Samples){
emp_out[[i]] <- emptyDrops(counts(RawDat[[i]]))
cat(paste(i,', ',sep=''))
}
CellDat = vector(mode='list',length=length(Samples))
names(CellDat) = Samples
for(i in Samples){
CellDat[[i]] <- RawDat[[i]][,which(emp_out[[i]]$FDR <= 0.001)]
cat(paste(i,', ',sep=''))
}
set.seed(101)
for(i in Samples){
CellDat[[i]] <- scDblFinder(CellDat[[i]])
cat(paste(i,', ',sep=''))
}
# ERROR at loop 11 (1-10 and 12 - 18 finishes)
Clustering cells...
11 clusters
Creating ~8883 artifical doublets...
Dimensional reduction
Finding KNN...
Evaluating cell neighborhoods...
Training model...
Error in ecdf(d$cxds_score[w]) : 'x' must have 1 or more non-missing values
> CellDat[[11]]
class: SingleCellExperiment
dim: 32285 14805
metadata(1): Samples
assays(1): counts
rownames(32285): ENSMUSG00000051951 ENSMUSG00000089699 ...
ENSMUSG00000095019 ENSMUSG00000095041
rowData names(3): ID Symbol Type
colnames: NULL
colData names(2): Sample Barcode
reducedDimNames(0):
altExpNames(0):
> CellDat[[12]]
class: SingleCellExperiment
dim: 32285 2131
metadata(2): Samples scDblFinder.stats
assays(1): counts
rownames(32285): ENSMUSG00000051951 ENSMUSG00000089699 ...
ENSMUSG00000095019 ENSMUSG00000095041
rowData names(4): ID Symbol Type scDblFinder.selected
colnames(2131): cell1 cell2 ... cell2130 cell2131
colData names(12): Sample Barcode ... scDblFinder.mostLikelyOrigin
scDblFinder.originAmbiguous
reducedDimNames(0):
altExpNames(0):
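The error message itself can be reproduced in base R without scDblFinder: ecdf() stops with this message whenever the vector it receives is empty or contains only missing values. The sketch below illustrates that, together with a guard that returns NA instead of failing. The `scores` vector and the `safe_ecdf_rank()` helper are hypothetical stand-ins, not part of scDblFinder, and the sketch does not explain why the cxds scores would be missing for this particular sample.

```r
# Hypothetical stand-in for the cxds scores of one group of cells:
# every value is missing, which is the situation ecdf() rejects.
scores <- c(NA_real_, NA_real_, NA_real_)

# Capture the error message instead of stopping the script
res <- tryCatch(ecdf(scores), error = function(e) conditionMessage(e))
print(res)

# Hypothetical guard: compute the empirical CDF value of each score,
# leaving NA where the score is missing (or when nothing is observed)
safe_ecdf_rank <- function(x) {
  ok <- !is.na(x)
  out <- rep(NA_real_, length(x))
  if (any(ok)) out[ok] <- ecdf(x[ok])(x[ok])
  out
}

print(safe_ecdf_rank(scores))
print(safe_ecdf_rank(c(1, NA, 2)))
```

If the same pattern applies to sample 11, counting how many of its cells have a non-missing cxds score would be a reasonable next check.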
Hi, I have run scDblFinder in "split" sample mode to detect doublets with the following code (since the data is large, I only provide the code):
set.seed(221113L)
sce_qc <- scDblFinder::scDblFinder(
sce_raw[, !sce_raw$low_lib_size],
clusters = TRUE, dims = 50L,
samples = "Sample", multiSampleMode = "split",
returnType = "sce"
)
When I check the results, the scDblFinder.sample column seems strange:
data.frame(colData(sce_qc)) %>%
dplyr::select(Sample, scDblFinder.sample) %>%
dplyr::filter(Sample != scDblFinder.sample)
# here is some output
Sample scDblFinder.sample
AAACCCAAGCCTCTCT-1 B4T B16T2
AAACCCAAGTGTAGAT-1 B4T B1T
AAACGCTGTGTATTGC-1 B4T B14T2
AAAGTGAGTAGATCGG-1 B4T B16U
AACAAAGGTGGATCGA-1 B4T B1U
AACAAGAGTCTACATG-1 B4T B14T1
AACCAACAGGTAAACT-1 B4T B1T
AACGGGAGTGAGATCG-1 B4T B14T2
AAGAACATCTCTCGCA-1 B4T B12T
AAGATAGAGCCTCATA-1 B4T B1U
AAGATAGAGTAAGACT-1 B4T B1T
AAGATAGCAAATGGCG-1 B4T B16U
AAGGAATGTTGAATCC-1 B4T B12U
I don't know why they differ when I used "split" mode. According to the help page of scDblFinder, "split" mode runs the whole process separately for each sample, so I think they should be the same. Is that right?
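One thing that may be worth ruling out, as an assumption rather than a confirmed diagnosis, is whether cell barcodes are duplicated across samples: the output above shows plain 10x-style barcodes as row names, and identical barcodes in different samples could make per-sample results hard to match back to cells. A minimal base R sketch with made-up barcodes and sample labels:

```r
# Made-up barcodes and sample labels; not taken from the poster's data
barcodes  <- c("AAACCCAAGCCTCTCT-1", "AAACGCTGTGTATTGC-1",
               "AAACCCAAGCCTCTCT-1", "AACAAAGGTGGATCGA-1")
sample_id <- c("B4T", "B4T", "B1T", "B1T")

# TRUE if at least one barcode occurs more than once across all cells
has_duplicates <- anyDuplicated(barcodes) > 0
print(has_duplicates)

# Prefixing each barcode with its sample label gives unique cell names
unique_names <- paste(sample_id, barcodes, sep = "_")
print(anyDuplicated(unique_names) == 0)
```

With a SingleCellExperiment, the same check would be run on colnames(sce) and sce$Sample before calling scDblFinder.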
Probably something to do with I():
http://bioconductor.org/checkResults/devel/bioc-LATEST/scDblFinder/malbec2-checksrc.html
Also note the many other complaints in the CHECK report. Some of these are mine, some of these are for @plger.
Trying to update gives the following error
ERROR: this R is version 4.0.5, package 'scDblFinder' requires R >= 4.1
Isn't 4.1 still in development?
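A quick way to confirm the mismatch from within R is to compare the running version against the minimum quoted in the error message. This is a base R sketch; the `required` value is simply copied from the message above.

```r
# Minimum R version quoted in the error message above
required <- "4.1"

# Version of the R session that is currently running
current <- getRversion()

# TRUE if this session satisfies the requirement
meets <- current >= required
message("Running R ", as.character(current),
        "; requirement R >= ", required,
        if (meets) " is met" else " is NOT met")

# The version named in the report is below the requirement
print(package_version("4.0.5") < required)
```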
Hello,
Thank you so much for writing this tool. I have used it on some single-cell datasets and it has worked nicely. But when trying a different dataset, prepared in the same way as the previous ones, I get an error, and I was wondering if you have seen this before.
These are my commands:
#this first one works fine
seurat.sce <- as.SingleCellExperiment(seurat)
#This is the one that gives me the error
seurat.sce <- scDblFinder(seurat.sce,clusters = 'seurat_clusters')
The error is:
19 clusters
Creating ~10468 artifical doublets...
Error in sample.int(length(x), size, replace, prob) :
invalid 'replace' argument
Here is my session info:
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.3
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] grid stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] scDblFinder_1.4.0 UpSetR_1.4.0 eulerr_6.1.0 SeuratObject_4.0.0
[5] Seurat_4.0.0 ggvenn_0.1.8 ggplot2_3.3.3 dplyr_1.0.4
loaded via a namespace (and not attached):
[1] VGAM_1.1-5 plyr_1.8.6
[3] igraph_1.2.6 lazyeval_0.2.2
[5] enrichR_3.0 polylabelr_0.2.0
[7] splines_4.0.3 BiocParallel_1.24.1
[9] densityClust_0.3 listenv_0.8.0
[11] scattermore_0.7 scater_1.18.6
[13] GenomeInfoDb_1.26.2 fastICA_1.2-2
[15] digest_0.6.27 htmltools_0.5.1.1
[17] viridis_0.5.1 fansi_0.4.2
[19] magrittr_2.0.1 tensor_1.5
[21] cluster_2.1.0 ROCR_1.0-11
[23] limma_3.46.0 globals_0.14.0
[25] matrixStats_0.58.0 docopt_0.7.1
[27] colorspace_2.0-0 ggrepel_0.9.1
[29] xfun_0.20 sparsesvd_0.2
[31] crayon_1.4.1 RCurl_1.98-1.2
[33] jsonlite_1.7.2 spatstat_1.64-1
[35] spatstat.data_2.0-0 survival_3.2-7
[37] zoo_1.8-8 glue_1.4.2
[39] polyclip_1.10-0 gtable_0.3.0
[41] zlibbioc_1.36.0 XVector_0.30.0
[43] leiden_0.3.7 DelayedArray_0.16.1
[45] BiocSingular_1.6.0 future.apply_1.7.0
[47] SingleCellExperiment_1.12.0 BiocGenerics_0.36.0
[49] abind_1.4-5 scales_1.1.1
[51] pheatmap_1.0.12 edgeR_3.32.1
[53] DBI_1.1.1 miniUI_0.1.1.1
[55] Rcpp_1.0.6 viridisLite_0.3.0
[57] xtable_1.8-4 dqrng_0.2.1
[59] reticulate_1.18 rsvd_1.0.3
[61] stats4_4.0.3 htmlwidgets_1.5.3
[63] httr_1.4.2 FNN_1.1.3
[65] RColorBrewer_1.1-2 ellipsis_0.3.1
[67] ica_1.0-2 scuttle_1.0.4
[69] pkgconfig_2.0.3 farver_2.0.3
[71] uwot_0.1.10 deldir_0.2-9
[73] locfit_1.5-9.4 utf8_1.1.4
[75] tidyselect_1.1.0 labeling_0.4.2
[77] rlang_0.4.10 reshape2_1.4.4
[79] later_1.1.0.1 munsell_0.5.0
[81] tools_4.0.3 xgboost_1.3.2.1
[83] cli_2.3.0 generics_0.1.0
[85] ggridges_0.5.3 stringr_1.4.0
[87] fastmap_1.1.0 goftest_1.2-2
[89] fitdistrplus_1.1-3 DDRTree_0.1.5
[91] purrr_0.3.4 RANN_2.6.1
[93] sparseMatrixStats_1.2.1 pbapply_1.4-3
[95] future_1.21.0 nlme_3.1-152
[97] mime_0.9 monocle_2.18.0
[99] slam_0.1-48 scran_1.18.6
[101] compiler_4.0.3 rstudioapi_0.13
[103] beeswarm_0.3.1 plotly_4.9.3
[105] png_0.1-7 testthat_3.0.1
[107] spatstat.utils_2.0-0 statmod_1.4.35
[109] tibble_3.0.6 stringi_1.5.3
[111] desc_1.2.0 bluster_1.0.0
[113] lattice_0.20-41 Matrix_1.3-2
[115] HSMMSingleCell_1.10.0 vctrs_0.3.6
[117] pillar_1.4.7 lifecycle_0.2.0
[119] combinat_0.0-8 lmtest_0.9-38
[121] BiocNeighbors_1.8.2 RcppAnnoy_0.0.18
[123] bitops_1.0-6 data.table_1.13.6
[125] cowplot_1.1.1 irlba_2.3.3
[127] GenomicRanges_1.42.0 httpuv_1.5.5
[129] patchwork_1.1.1 R6_2.5.0
[131] promises_1.2.0.1 KernSmooth_2.23-18
[133] gridExtra_2.3 vipor_0.4.5
[135] IRanges_2.24.1 parallelly_1.23.0
[137] codetools_0.2-18 MASS_7.3-53
[139] assertthat_0.2.1 pkgload_1.1.0
[141] SummarizedExperiment_1.20.0 rprojroot_2.0.2
[143] rjson_0.2.20 withr_2.4.1
[145] qlcMatrix_0.9.7 sctransform_0.3.2
[147] GenomeInfoDbData_1.2.4 S4Vectors_0.28.1
[149] mgcv_1.8-33 parallel_4.0.3
[151] beachmat_2.6.4 rpart_4.1-15
[153] tidyr_1.1.2 DelayedMatrixStats_1.12.3
[155] MatrixGenerics_1.2.1 Rtsne_0.15
[157] Biobase_2.50.0 shiny_1.6.0
[159] ggbeeswarm_0.6.0 tinytex_0.30
Thank you so much for your help!
Oops:
library(scDblFinder)
example(computeDoubletDensity, echo=FALSE)
library(DelayedArray)
scores <- computeDoubletDensity(DelayedArray(counts))
## Error in .check_Ops_vector_arg_length(e, x_nrow, e_what = e_what, x_what = x_what) :
## when the right operand is not a DelayedArray object (or derivative),
## its length (250000) cannot be greater than the first dimension of the
## left operand (10000)
Should be a very simple matter of slapping @importFrom DelayedArray sweep on top of .spawn_doublet_pcs(). Still spawns a warning but I think that's a DelayedArray problem rather than anything on our end.
Good day,
I want to use scDblFinder on my scATAC data. For RNA, you advise users to perform initial QC first so that it does not distort the modeling of doublets and doublet calling is more precise.
I have checked the ATAC vignette but could not find information about this particular point. What would you recommend?
I appreciate any suggestions you can give me.
Looking at the code, I think that if you specify multiSampleMode = 'split', then you always get an augmented sce back, regardless of the specified returnType. Is that correct? Unless I'm missing something :)
Will
Hi, thanks for developing this cool tool.
When I try to install it in R 3.6.0 using "BiocManager::install("scDblFinder")" , it says "package ‘scDblFinder’ is not available (for R version 3.6.0)".
Could you give any advice? Thanks.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("scDblFinder")
Hello,
I found that scDblFinder is not deterministic when running it with multiple batches and with the BPPARAM argument, producing quite different results. Below is a MWE, do you have any idea what is going on? Output is deterministic when either setting BPPARAM to SerialParam or removing the batch argument.
library(BiocParallel)
library(SingleCellExperiment)
library(scDblFinder)
library(parallel)
library(scRNAseq)
library(scran)
sce <- scRNAseq::LawlorPancreasData()
sce$batch <- factor(c(rep("A", 100), rep("B", 200), rep("C", 100), rep("D", 238)))
sce$cluster <- as.character(scran::quickCluster(sce))
k <-
mclapply(1:3, mc.cores=3, function(x){
set.seed(123)
m <-
scDblFinder::scDblFinder(sce=sce,
clusters=as.character(sce$cluster),
samples=sce$batch,
BPPARAM=MulticoreParam(workers = 3))
return(m$scDblFinder.score)
}); names(k) <- paste0("run_",1:length(k))
par(mfrow=c(2,2))
plot(k$run_1, k$run_2)
plot(k$run_1, k$run_3)
plot(k$run_2, k$run_3)
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 12.0.1
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices datasets utils methods base
other attached packages:
[1] scran_1.18.3 scRNAseq_2.4.0 scDblFinder_1.4.0 SingleCellExperiment_1.12.0
[5] SummarizedExperiment_1.20.0 Biobase_2.50.0 GenomicRanges_1.42.0 GenomeInfoDb_1.26.2
[9] IRanges_2.24.1 S4Vectors_0.28.1 BiocGenerics_0.36.0 MatrixGenerics_1.2.0
[13] matrixStats_0.57.0 BiocParallel_1.24.1
loaded via a namespace (and not attached):
[1] ggbeeswarm_0.6.0 colorspace_2.0-0 ellipsis_0.3.1 scuttle_1.0.4
[5] bluster_1.0.0 XVector_0.30.0 BiocNeighbors_1.8.2 rstudioapi_0.13
[9] bit64_4.0.5 interactiveDisplayBase_1.28.0 AnnotationDbi_1.52.0 fansi_0.4.2
[13] xml2_1.3.2 sparseMatrixStats_1.2.0 cachem_1.0.1 scater_1.18.3
[17] Rsamtools_2.6.0 dbplyr_2.1.1 shiny_1.6.0 BiocManager_1.30.10
[21] compiler_4.0.3 httr_1.4.2 dqrng_0.2.1 lazyeval_0.2.2
[25] assertthat_0.2.1 Matrix_1.3-2 fastmap_1.1.0 limma_3.46.0
[29] later_1.1.0.1 BiocSingular_1.6.0 htmltools_0.5.1.1 prettyunits_1.1.1
[33] tools_4.0.3 rsvd_1.0.3 igraph_1.2.6 gtable_0.3.0
[37] glue_1.4.2 GenomeInfoDbData_1.2.4 dplyr_1.0.5 rappdirs_0.3.1
[41] Rcpp_1.0.7 vctrs_0.3.6 Biostrings_2.58.0 ExperimentHub_1.16.1
[45] rtracklayer_1.50.0 DelayedMatrixStats_1.12.2 stringr_1.4.0 beachmat_2.6.4
[49] mime_0.9 lifecycle_1.0.0 irlba_2.3.3 ensembldb_2.14.0
[53] renv_0.13.2 statmod_1.4.35 XML_3.99-0.5 AnnotationHub_2.22.0
[57] edgeR_3.32.1 zlibbioc_1.36.0 scales_1.1.1 ProtGenerics_1.22.0
[61] hms_1.0.0 promises_1.1.1 AnnotationFilter_1.14.0 yaml_2.2.1
[65] curl_4.3 memoise_2.0.0 gridExtra_2.3 ggplot2_3.3.5
[69] biomaRt_2.46.1 stringi_1.5.3 RSQLite_2.2.3 BiocVersion_3.12.0
[73] GenomicFeatures_1.42.1 rlang_0.4.12 pkgconfig_2.0.3 bitops_1.0-6
[77] lattice_0.20-41 purrr_0.3.4 GenomicAlignments_1.26.0 bit_4.0.4
[81] tidyselect_1.1.0 magrittr_2.0.1 R6_2.5.0 generics_0.1.0
[85] DelayedArray_0.16.1 DBI_1.1.1 withr_2.4.2 pillar_1.6.0
[89] RCurl_1.98-1.2 tibble_3.1.1 crayon_1.4.1 xgboost_1.3.2.1
[93] utf8_1.1.4 BiocFileCache_1.14.0 viridis_0.5.1 progress_1.2.2
[97] locfit_1.5-9.4 grid_4.0.3 data.table_1.13.6 blob_1.2.1
[101] digest_0.6.27 xtable_1.8-4 httpuv_1.5.5 openssl_1.4.3
[105] munsell_0.5.0 beeswarm_0.2.3 viridisLite_0.3.0 vipor_0.4.5
[109] askpass_1.1
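A likely contributing factor, stated here as an assumption rather than a verified diagnosis of scDblFinder's internals, is that set.seed() in the calling session does not by itself fix the random number streams used by parallel workers. The sketch below shows the general principle with base R's parallel package: results become reproducible once the workers' streams are seeded explicitly. In BiocParallel, the corresponding mechanism is the RNGseed argument of the param constructors, for example MulticoreParam(workers = 3, RNGseed = 123).

```r
library(parallel)

# Two local worker processes
cl <- makeCluster(2)

# Each task draws one random number on a worker
draw <- function(i) runif(1)

# Seed the workers' random streams, then run the tasks
clusterSetRNGStream(cl, iseed = 123)
run1 <- parSapply(cl, 1:4, draw)

# Re-seeding with the same value before the second run
# puts the workers back into the same state
clusterSetRNGStream(cl, iseed = 123)
run2 <- parSapply(cl, 1:4, draw)

stopCluster(cl)

# Compare the two runs
print(identical(run1, run2))
```

Whether seeding the BPPARAM object alone removes the run-to-run differences reported above would still need to be checked against the MWE.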
bcmvn.MM482 <- find.pK(sweep.stats.MM482)
DimPlot(object = MM482.BM, reduction = 'umap', group.by = "RNA_snn_res.0.5", label = TRUE, repel = TRUE, raster=FALSE) + NoLegend()
FeaturePlot(MM482.BM, features = "scDblFinder.score", cols = c("yellow", "red"), reduction = 'umap', raster = FALSE) + DarkTheme()
FeaturePlot(MM482.BM, features = "pANN_0.25_0.005_555",cols = c("yellow", "red"), reduction = 'umap', raster=FALSE) + DarkTheme()
DimPlot(MM482.BM,pt.size = 1,label=FALSE, label.size = 5,reduction = "umap",group.by = "DF.classifications_0.25_0.005_555")
DimPlot(MM482.BM,pt.size = 1,label=FALSE, label.size = 5,reduction = "umap",group.by = "DF.classifications_0.25_0.005_483")
MM482.BM <- doubletFinder_v3(MM482.BM, PCs = use.pcs, pN = 0.25, pK = mpk.MM482, nExp = nExp_poi.adj.MM482, reuse.pANN = "pANN_0.25_0.005_555", sct = FALSE)
MM482.singlet <- subset(x = MM482.BM, subset = DF.classifications_0.25_0.005_483 == "Singlet")
MM482.singlet
FeaturePlot(MM482.BM, features = "scDblFinder.score", cols = c("yellow", "red"), reduction = 'umap', raster = FALSE) + DarkTheme()
Error: None of the requested features were found: scDblFinder.score in slot data
In addition: Warning message:
In FetchData(object = object, vars = c(dims, "ident", features), :
The following requested variables were not found: scDblFinder.score
Describe the bug
When I ran library(scDblFinder) in Jupyter, this error appeared: Error: package or namespace load failed for ‘scDblFinder’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/nguyen/R/x86_64-pc-linux-gnu-library/4.1/xgboost/libs/xgboost.so':
/home/nguyen/anaconda3/lib/python3.10/site-packages/zmq/backend/cython/../../../../.././libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /home/nguyen/R/x86_64-pc-linux-gnu-library/4.1/xgboost/libs/xgboost.so)
I tried updating libstdc++.so.6 and removing and reinstalling the package, but it still does not work.
Please help me.
I am using R version 4.1.2
I set up scDblFinder through BiocManager: version 1.13.13
Thank you very much.
I'm having a new issue where scDblFinder.class is no longer added to my SCE. I am using the development version. It seems to run with no issue, so I'm at a bit of a loss.
> sce <- scDblFinder(sce, samples = "Sample", BPPARAM = BP, verbose = TRUE)
Training model...
Error in base::table(...) : all arguments must have the same length
> names(colData(sce))
[1] "Sample" "Barcode" "Group"
[4] "Batch" "sum" "detected"
[7] "subsets_Mito_sum" "subsets_Mito_detected" "subsets_Mito_percent"
[10] "total" "discard" "Phase"
[13] "G1.score" "S.score" "G2M.score"
[16] "scDblFinder.sample" "scDblFinder.cluster" "scDblFinder.distanceToNearest"
[19] "scDblFinder.nearestClass" "scDblFinder.difficulty" "scDblFinder.ratio"
[22] "scDblFinder.cxds_score" "scDblFinder.weighted" "scDblFinder.score"
[25] "scDblFinder.mostLikelyOrigin" "scDblFinder.originAmbiguous"
sessionInfo:
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.7 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] BiocStyle_2.17.1 openxlsx_4.2.2 pheatmap_1.0.12 PCAtools_2.1.22
[5] lattice_0.20-41 reshape2_1.4.4 ggrepel_0.8.2 BiocParallel_1.23.2
[9] scDblFinder_1.3.9 cowplot_1.1.0 scuttle_0.99.15 celldex_0.99.1
[13] dittoSeq_1.1.9 SingleR_1.3.8 scran_1.17.18 scater_1.17.5
[17] ggplot2_3.3.2 DropletUtils_1.9.12 SingleCellExperiment_1.11.7 SummarizedExperiment_1.19.7
[21] DelayedArray_0.15.11 matrixStats_0.57.0 Matrix_1.2-18 Biobase_2.49.1
[25] GenomicRanges_1.41.6 GenomeInfoDb_1.25.11 IRanges_2.23.10 S4Vectors_0.27.13
[29] BiocGenerics_0.35.4
loaded via a namespace (and not attached):
[1] ggbeeswarm_0.6.0 colorspace_1.4-1 ellipsis_0.3.1 ggridges_0.5.2
[5] bluster_0.99.1 XVector_0.29.3 BiocNeighbors_1.7.0 yaImpute_1.0-32
[9] rstudioapi_0.11 farver_2.0.3 bit64_4.0.5 interactiveDisplayBase_1.27.5
[13] AnnotationDbi_1.51.3 R.methodsS3_1.8.1 knitr_1.30 pROC_1.16.2
[17] dbplyr_1.4.4 R.oo_1.24.0 shiny_1.5.0 HDF5Array_1.17.11
[21] BiocManager_1.30.10 compiler_4.0.2 httr_1.4.2 dqrng_0.2.1
[25] assertthat_0.2.1 fastmap_1.0.1 limma_3.45.14 later_1.1.0.1
[29] BiocSingular_1.5.1 htmltools_0.5.0 tools_4.0.2 rsvd_1.0.3
[33] igraph_1.2.5 gtable_0.3.0 glue_1.4.2 GenomeInfoDbData_1.2.3
[37] dplyr_1.0.2 rappdirs_0.3.1 Rcpp_1.0.5 vctrs_0.3.4
[41] rhdf5filters_1.1.3 ExperimentHub_1.15.3 DelayedMatrixStats_1.11.1 xfun_0.18
[45] stringr_1.4.0 mime_0.9 lifecycle_0.2.0 irlba_2.3.3
[49] statmod_1.4.34 AnnotationHub_2.21.5 edgeR_3.31.4 zlibbioc_1.35.0
[53] scales_1.1.1 promises_1.1.1 rhdf5_2.33.10 RColorBrewer_1.1-2
[57] yaml_2.2.1 curl_4.3 memoise_1.1.0 gridExtra_2.3
[61] stringi_1.5.3 RSQLite_2.2.1 BiocVersion_3.12.0 zip_2.1.1
[65] rlang_0.4.7 pkgconfig_2.0.3 bitops_1.0-6 evaluate_0.14
[69] purrr_0.3.4 Rhdf5lib_1.11.3 labeling_0.3 bit_4.0.4
[73] tidyselect_1.1.0 plyr_1.8.6 magrittr_1.5 R6_2.4.1
[77] generics_0.0.2 DBI_1.1.0 pillar_1.4.6 withr_2.3.0
[81] RCurl_1.98-1.2 tibble_3.0.3 crayon_1.3.4 intrinsicDimension_1.2.0
[85] xgboost_1.2.0.1 BiocFileCache_1.13.1 rmarkdown_2.4 viridis_0.5.1
[89] locfit_1.5-9.4 grid_4.0.2 data.table_1.13.0 blob_1.2.1
[93] digest_0.6.25 xtable_1.8-4 httpuv_1.5.4 R.utils_2.10.1
[97] scds_1.5.0 munsell_0.5.0 beeswarm_0.2.3 viridisLite_0.3.0
Dear all,
with arbitrary samples (until recently I did not know which property was decisive) I got 'Error in .local(x, ...) : size factors should be positive' as error from computeDoubletDensity.
In some cases increasing subset.row was helpful (e.g. selecting the top 5000 highly variable features, instead of 2000). Sometimes though this could not resolve the issue.
I saw the other, similar, issue raised here (#32): My data set though was cleaned for cells with very low total read counts. The error persisted.
Solution:
One has to exclude cells which have zero total reads for features provided in subset.row:
factors <- scuttle::librarySizeFactors(expr_mat[subset.row,])
which(factors == 0)
Would it be acceptable to add a more meaningful error message? E.g. inform the user about cell names which have zero as library size?
I expect handling such error inside your function is not in your interest. If it is though:
(i) What would happen if library sizes were increased by a common value to avoid zeros? I mean adding 1 or 0.0001 or so, similar to log1p.
(ii) If such cells are excluded, an NA could be returned as their doublet score. Or a -1 or so? Or they could be excluded completely from the return value, which would cause other problems though.
Thanks.
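The check described above can be sketched in base R. The matrix, its dimnames, and the choice of subset.row are made up for illustration; colSums() is used in place of scuttle::librarySizeFactors(), which rescales the same column sums, so a zero column sum corresponds to a zero size factor.

```r
# Made-up count matrix: 3 genes (rows) x 4 cells (columns)
expr_mat <- matrix(c(1, 0, 0, 2,
                     0, 3, 0, 1,
                     5, 4, 2, 0),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("g1", "g2", "g3"),
                                   c("c1", "c2", "c3", "c4")))

# Hypothetical feature selection passed as subset.row
subset.row <- c("g1", "g2")

# Library size of each cell restricted to the selected features
lib <- colSums(expr_mat[subset.row, , drop = FALSE])

# Cells with no counts in the selected features; their total count
# over all genes can still be positive, as for c3 here
zero_cells <- names(lib)[lib == 0]
print(zero_cells)

# Drop those cells before computing doublet scores
filtered <- expr_mat[, lib > 0, drop = FALSE]
print(colnames(filtered))
```

Cell c3 has counts in g3 and would therefore pass a filter on total counts, which matches the situation described above where cleaning for low total counts did not remove the error.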
Hi,
Thanks for maintaining this tool. I ran into a problem when using it with MulticoreParam.
Code:
library(scDblFinder)
library(BiocParallel)
sce = as.SingleCellExperiment(seurat_filtered)
sce = scDblFinder(sce, samples="sample_label", BPPARAM=MulticoreParam(4))
Error in serialize(data, node$con, xdr = FALSE) :
error writing to connection
Error in manager$availability[[as.character(result$node)]] <- TRUE :
wrong args for environment subassignment
In addition: Warning messages:
1: In serialize(data, node$con, xdr = FALSE) :
'package:stats' may not be available when loading
2: In serialize(data, node$con, xdr = FALSE) :
'package:stats' may not be available when loading
3: In serialize(data, node$con, xdr = FALSE) :
'package:stats' may not be available when loading
Error in serialize(data, node$con, xdr = FALSE) :
error writing to connection
When I remove BPPARAM=MulticoreParam(4), the code runs without error (although slowly), so I guess the problem is related to the parallel processing. The object I am dealing with is 4.3 GB, while the server has more than 140 GB of memory, so I don't think it is a memory issue. Do you have any idea what causes this problem and how it could be solved?
Thanks,
Hello,
I am trying to run scDblFinder to find doublets in my scATAC-seq data but run into the following error early on:
Error in names(res) <- nms :
'names' attribute [4] must be the same length as the vector [2]
In addition: Warning message:
stop worker failed:
attempt to select less than one element in OneIndex
From preliminary google searches, this problem seems external to scDblFinder, but any insight you may have will be very helpful.
I am running the following code:
cancer_2_h_new
#An object of class Seurat
#251195 features across 23665 samples within 1 assay
#Active assay: ATAC (251195 features, 251195 variable features)
#4 dimensional reductions calculated: lsi, umap, harmony, umap_harmony
cancer_sce = as.SingleCellExperiment(cancer_2_h_new)
set.seed(123)
library(scDblFinder)
library(BiocParallel)
sce <- scDblFinder(cancer_sce, samples="Mouse", aggregateFeatures=TRUE, nfeatures=25,BPPARAM=MulticoreParam(3), processing = "normFeatures")
Hello,
I am able to run the developer version of scDblFinder with one sample, but when using an SCE with multiple samples (named in colData) I run into the following error (true whether I load in an SCE or matrix with a vector of sample IDs):
masterSCE = scDblFinder(sce = sce, samples = "sample_ID", nfeatures = 1000, score = 'xgb',verbose = TRUE)
Error in .format_mismatch_message(x_colnames, object_colnames) :
the DataFrame objects to rbind do not have the same column names ('ratio.k20' is unique)
sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_0.5.0 stringr_1.4.0 dplyr_1.0.2 purrr_0.3.4 readr_1.4.0 tidyr_1.1.2 tibble_3.0.3 ggplot2_3.3.2 tidyverse_1.3.0 DropletUtils_1.9.13 SingleCellExperiment_1.11.8
[12] SummarizedExperiment_1.19.9 Biobase_2.49.1 GenomicRanges_1.41.6 GenomeInfoDb_1.25.11 IRanges_2.23.10 S4Vectors_0.27.13 BiocGenerics_0.35.4 MatrixGenerics_1.1.3 matrixStats_0.57.0 scDblFinder_1.3.9
loaded via a namespace (and not attached):
[1] ggbeeswarm_0.6.0 colorspace_1.4-1 ellipsis_0.3.1 rprojroot_1.3-2 scuttle_0.99.18 bluster_0.99.1 XVector_0.29.3 BiocNeighbors_1.7.0 fs_1.5.0 yaImpute_1.0-32 rstudioapi_0.11 remotes_2.2.0
[13] fansi_0.4.1 lubridate_1.7.9 xml2_1.3.2 R.methodsS3_1.8.1 scater_1.17.5 jsonlite_1.7.1 pROC_1.16.2 broom_0.7.1 dbplyr_1.4.4 R.oo_1.24.0 HDF5Array_1.17.13 BiocManager_1.30.10
[25] compiler_4.0.2 httr_1.4.2 dqrng_0.2.1 backports_1.1.10 assertthat_0.2.1 Matrix_1.2-18 limma_3.45.14 cli_2.0.2 BiocSingular_1.5.2 prettyunits_1.1.1 tools_4.0.2 rsvd_1.0.3
[37] igraph_1.2.5 gtable_0.3.0 glue_1.4.2 GenomeInfoDbData_1.2.4 Rcpp_1.0.5 cellranger_1.1.0 vctrs_0.3.4 rhdf5filters_1.1.3 DelayedMatrixStats_1.11.1 ps_1.3.4 rvest_0.3.6 beachmat_2.5.8
[49] lifecycle_0.2.0 irlba_2.3.3 statmod_1.4.34 edgeR_3.31.4 zlibbioc_1.35.0 scales_1.1.1 hms_0.5.3 rhdf5_2.33.10 yaml_2.2.1 curl_4.3 gridExtra_2.3 stringi_1.5.3
[61] scran_1.17.20 pkgbuild_1.1.0 BiocParallel_1.23.2 rlang_0.4.7 pkgconfig_2.0.3 bitops_1.0-6 lattice_0.20-41 Rhdf5lib_1.11.3 processx_3.4.4 tidyselect_1.1.0 plyr_1.8.6 magrittr_1.5
[73] R6_2.4.1 generics_0.0.2 DelayedArray_0.15.15 DBI_1.1.0 pillar_1.4.6 haven_2.3.1 withr_2.3.0 RCurl_1.98-1.2 modelr_0.1.8 crayon_1.3.4 intrinsicDimension_1.2.0 xgboost_1.2.0.1
[85] viridis_0.5.1 locfit_1.5-9.4 grid_4.0.2 readxl_1.3.1 data.table_1.13.0 blob_1.2.1 callr_3.4.4 reprex_0.3.0 R.utils_2.10.1 scds_1.5.0 munsell_0.5.0 beeswarm_0.2.3
[97] viridisLite_0.3.0 vipor_0.4.5
Hi Pierre-Luc,
fyi, the Matrix package >= 1.5.0 has deprecated the as(., "dgCMatrix") syntax, now erroring if that is used. It now must be as(., "CsparseMatrix"). I would therefore suggest updating the respective lines in the source and requiring Matrix >= 1.5.0 in the DESCRIPTION.
I do not have a MRE at hand now, but once you update Matrix you get something like:
> bp <- BiocParallel::MulticoreParam(mc_workers, RNGseed=1234)
> sce <- scDblFinder::scDblFinder(sce, clusters="cluster", samples="zt", BPPARAM=bp)
Error: BiocParallel errors
2 remote errors, element index: 1, 2
0 unevaluated and other errors
first remote error:
Error in value[[3L]](cond): An error occured while processing sample 'zt1':
Error: as(<dgeMatrix>, "dgCMatrix") is deprecated since Matrix 1.5-0; do as(., "CsparseMatrix") instead
best,
-Alex
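For reference, a minimal sketch of the replacement syntax using only the Matrix package. The small dense matrix is made up, and this is not the actual scDblFinder source.

```r
library(Matrix)

# Made-up dense 2 x 3 matrix
dense <- matrix(c(0, 1, 0, 2, 0, 0), nrow = 2)

# Dense Matrix-package object (a dgeMatrix for this input)
dge <- Matrix(dense, sparse = FALSE)
print(class(dge))

# Coerce via the virtual class instead of naming "dgCMatrix" directly
sp <- as(dge, "CsparseMatrix")
print(class(sp))

# The values are unchanged by the conversion
print(all(as.matrix(sp) == dense))
```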
In R version 4.1.2
Error: package or namespace load failed for ‘scDblFinder’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]): there is no package called ‘xgboost’
I have tried
devtools::install_github("plger/scDblFinder")
rjson (0.2.20 -> 0.2.21 ) [CRAN]
xgboost (NA -> 1.5.2.1) [CRAN]
Skipping 33 packages ahead of CRAN: BiocGenerics, S4Vectors, DelayedArray, Biobase, MatrixGenerics, Rhtslib, zlibbioc, GenomeInfoDbData, XVector, BiocParallel, Rsamtools, Biostrings, SummarizedExperiment, GenomicRanges, GenomeInfoDb, IRanges, BiocNeighbors, beachmat, ScaledMatrix, sparseMatrixStats, limma, DelayedMatrixStats, SingleCellExperiment, BiocIO, GenomicAlignments, BiocSingular, scuttle, metapod, bluster, edgeR, rtracklayer, scater, scran
Installing 2 packages: rjson, xgboost
Installing packages into ‘/home/jovyan/R/x86_64-pc-linux-gnu-library/4.1’
(as ‘lib’ is unspecified)
Warning message in i.p(...):
“installation of package ‘xgboost’ had non-zero exit status”
✔ checking for file ‘/tmp/RtmprOot6n/remotes3b5e71218072/plger-scDblFinder-fec63bf/DESCRIPTION’ (454ms)
─ preparing ‘scDblFinder’:
✔ checking DESCRIPTION meta-information
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ looking to see if a ‘data/datalist’ file should be added
─ building ‘scDblFinder_1.9.5.tar.gz’
Installing package into ‘/home/jovyan/R/x86_64-pc-linux-gnu-library/4.1’
(as ‘lib’ is unspecified)
Warning message in i.p(...):
“installation of package ‘/tmp/RtmprOot6n/file3b5e6d7b9987/scDblFinder_1.9.5.tar.gz’ had non-zero exit status”
and
install.packages("xgboost")
(as ‘lib’ is unspecified)
Warning message in install.packages("xgboost"):
“installation of package ‘xgboost’ had non-zero exit status”
any suggestions for ways round this short of rolling back my R version?
Hi! Thanks for your great software.
I am using your package with some Multiome data (calling doublets separately for RNA and ATAC). I have multiple samples and was following your recommendation of creating a SingleCellExperiment with all samples together. This is straightforward for the RNA data (they all quantify the same rows/genes), but for the ATAC data, peaks/rows are unique for each library. I could do a merge of the ATAC peaks for each sample, and re-quantify in those regions (e.g. like this), but it seems like a lot of pre-processing and modifying the raw count data prior to running the doublet finder algorithm.
So, what I'm asking is, how preferable is it to merge samples instead of doing the doublet finding separately per each sample? If results are (more or less) robust, maybe it is OK to run the samples separately?
thanks for your help
Hi,
Trying the function on the Kumar dataset I got:
> scDblFinder(sce, trans = "scran", verbose=FALSE)
Error in `$<-.data.frame`(`*tmp*`, "classification", value = logical(0)) :
replacement has 0 rows, data has 457
While debugging the function, I noticed that doubletThresholding() returns NULL, which causes the error with the empty values at this line in the code of scDblFinder:
d$classification <- ifelse(d$ratio >= th, "doublet", "singlet")
given that th is NULL.
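The mechanism can be reproduced in base R: comparing against NULL yields a zero-length result, and assigning that to a data frame column fails in the way reported. The data frame and the `classify()` guard below are hypothetical illustrations, not the package's code.

```r
# Made-up ratios standing in for d$ratio
d <- data.frame(ratio = c(0.1, 0.8, 0.4))
th <- NULL   # what the report says doubletThresholding() returned

# Comparison with NULL gives a zero-length vector, and so does ifelse()
cls <- ifelse(d$ratio >= th, "doublet", "singlet")
print(length(cls))

# Assigning a zero-length vector to a column of a 3-row data frame fails
res <- tryCatch({ d$classification <- cls; "ok" },
                error = function(e) conditionMessage(e))
print(res)

# Hypothetical guard: fail early with a clearer message
classify <- function(ratio, th) {
  if (is.null(th) || length(th) != 1L || is.na(th))
    stop("no usable threshold was returned")
  ifelse(ratio >= th, "doublet", "singlet")
}
print(classify(d$ratio, 0.5))
```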
Thanks for this tool! I ran into this problem when I tried to set custom clusters = '...' when running:
scDblFinder(sce, samples = "batch", clusters = 'my_cluster_label')
Error in scDblFinder(sce[, x], artificialDoublets = artificialDoublets, :
Only one cluster generated
my_cluster_label is a colname in the colData(sce)
I tried converting sce@colData$my_cluster_label to factor/numeric/character, and setting clusters = sce@colData$my_cluster_label directly, but none of this helped.
I could get the 'scDblFinder.class' label in my sce without setting clusters.
I was using scDblFinder V1.1.8.
Thanks for any help!!!
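One possible explanation, offered as an assumption and not confirmed for version 1.1.8, is that with samples = "batch" each batch is processed on its own (the error is raised from scDblFinder(sce[, x], ...)), so the message would appear if any single batch contains cells from only one of the supplied clusters. Counting distinct cluster labels per batch is a quick check; the vectors below are made up and stand in for sce$batch and sce$my_cluster_label.

```r
# Made-up labels standing in for sce$batch and sce$my_cluster_label
batch   <- c("A", "A", "A", "B", "B", "B")
cluster <- c("c1", "c2", "c1", "c3", "c3", "c3")

# Number of distinct cluster labels within each batch
n_clusters <- tapply(cluster, batch, function(x) length(unique(x)))
print(n_clusters)

# Batches that contain fewer than two clusters
single <- names(n_clusters)[n_clusters < 2]
print(single)
```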
Hi, not sure if this is an issue with scDblFinder, BiocParallel, or me, but this worked in the past but isn't working any more for some reason.
library(Seurat)
#> Attaching SeuratObject
library(scDblFinder)
library(BiocParallel)
l <- c(pbmc_small, pbmc_small)
l[[1]][["batch"]] = "A"
l[[2]][["batch"]] = "B"
seu <- merge(x=l[[1]], y=l[[2]])
#> Warning in CheckDuplicateCellNames(object.list = objects): Some cell names are
#> duplicated across objects provided. Renaming to enforce unique cell names.
sce <- as.SingleCellExperiment(seu)
out <- scDblFinder(sce, samples = "batch", BPPARAM=MulticoreParam(2))
#> Warning in parallel::mccollect(wait = FALSE, timeout = 1): 1 parallel job did
#> not deliver a result
#> Error in result[[njob]] <- value: attempt to select less than one element in OneIndex
Created on 2021-05-02 by the reprex package (v2.0.0)
sessioninfo::session_info()
─ Session info ─────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.0.5 (2021-03-31)
os macOS Big Sur 10.16
system x86_64, darwin17.0
ui RStudio
language (EN)
collate en_GB.UTF-8
ctype en_GB.UTF-8
tz Europe/London
date 2021-05-02
─ Packages ─────────────────────────────────────────────────────────────────────────────────────────────
package * version date lib source
abind 1.4-5 2016-07-21 [1] CRAN (R 4.0.2)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.2)
backports 1.2.1 2020-12-09 [1] CRAN (R 4.0.2)
beachmat 2.6.4 2020-12-20 [1] Bioconductor
beeswarm 0.3.1 2021-03-07 [1] CRAN (R 4.0.3)
Biobase 2.50.0 2020-10-27 [1] Bioconductor
BiocGenerics 0.36.1 2021-04-16 [1] Bioconductor
BiocNeighbors 1.8.2 2020-12-07 [1] Bioconductor
BiocParallel * 1.24.1 2020-11-06 [1] Bioconductor
BiocSingular 1.6.0 2020-10-27 [1] Bioconductor
bitops 1.0-7 2021-04-24 [1] CRAN (R 4.0.2)
bluster 1.0.0 2020-10-27 [1] Bioconductor
callr 3.7.0 2021-04-20 [1] CRAN (R 4.0.2)
cli 2.5.0 2021-04-26 [1] CRAN (R 4.0.2)
clipr 0.7.1 2020-10-08 [1] CRAN (R 4.0.2)
cluster 2.1.2 2021-04-17 [1] CRAN (R 4.0.2)
codetools 0.2-18 2020-11-04 [1] CRAN (R 4.0.5)
colorspace 2.0-0 2020-11-11 [1] CRAN (R 4.0.2)
cowplot 1.1.1 2020-12-30 [1] CRAN (R 4.0.2)
crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.2)
data.table 1.14.0 2021-02-21 [1] CRAN (R 4.0.2)
DBI 1.1.1 2021-01-15 [1] CRAN (R 4.0.3)
DelayedArray 0.16.3 2021-03-24 [1] Bioconductor
DelayedMatrixStats 1.12.3 2021-02-03 [1] Bioconductor
deldir 0.2-10 2021-02-16 [1] CRAN (R 4.0.2)
digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.2)
dplyr 1.0.5 2021-03-05 [1] CRAN (R 4.0.2)
dqrng 0.2.1 2019-05-17 [1] CRAN (R 4.0.2)
edgeR 3.32.1 2021-01-14 [1] Bioconductor
ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.2)
evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.1)
fansi 0.4.2 2021-01-15 [1] CRAN (R 4.0.2)
fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.0.2)
fitdistrplus 1.1-3 2020-12-05 [1] CRAN (R 4.0.2)
fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
future 1.21.0 2020-12-10 [1] CRAN (R 4.0.3)
future.apply 1.7.0 2021-01-04 [1] CRAN (R 4.0.2)
generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.2)
GenomeInfoDb 1.26.7 2021-04-08 [1] Bioconductor
GenomeInfoDbData 1.2.4 2021-01-17 [1] Bioconductor
GenomicRanges 1.42.0 2020-10-27 [1] Bioconductor
ggbeeswarm 0.6.0 2017-08-07 [1] CRAN (R 4.0.2)
ggplot2 3.3.3 2020-12-30 [1] CRAN (R 4.0.2)
ggrepel 0.9.1 2021-01-15 [1] CRAN (R 4.0.2)
ggridges 0.5.3 2021-01-08 [1] CRAN (R 4.0.2)
globals 0.14.0 2020-11-22 [1] CRAN (R 4.0.2)
glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
goftest 1.2-2 2019-12-02 [1] CRAN (R 4.0.2)
gridExtra 2.3 2017-09-09 [1] CRAN (R 4.0.2)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.2)
highr 0.9 2021-04-16 [1] CRAN (R 4.0.5)
htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.2)
htmlwidgets 1.5.3 2020-12-10 [1] CRAN (R 4.0.3)
httpuv 1.6.0 2021-04-23 [1] CRAN (R 4.0.2)
httr 1.4.2 2020-07-20 [1] CRAN (R 4.0.2)
ica 1.0-2 2018-05-24 [1] CRAN (R 4.0.2)
igraph 1.2.6 2020-10-06 [1] CRAN (R 4.0.2)
IRanges 2.24.1 2020-12-12 [1] Bioconductor
irlba 2.3.3 2019-02-05 [1] CRAN (R 4.0.2)
jsonlite 1.7.2 2020-12-09 [1] CRAN (R 4.0.2)
KernSmooth 2.23-18 2020-10-29 [1] CRAN (R 4.0.5)
knitr 1.33 2021-04-24 [1] CRAN (R 4.0.2)
later 1.2.0 2021-04-23 [1] CRAN (R 4.0.2)
lattice 0.20-41 2020-04-02 [1] CRAN (R 4.0.5)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 4.0.2)
leiden 0.3.7 2021-01-26 [1] CRAN (R 4.0.3)
lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.2)
limma 3.46.0 2020-10-27 [1] Bioconductor
listenv 0.8.0 2019-12-05 [1] CRAN (R 4.0.2)
lmtest 0.9-38 2020-09-09 [1] CRAN (R 4.0.2)
locfit 1.5-9.4 2020-03-25 [1] CRAN (R 4.0.2)
magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.2)
MASS 7.3-53.1 2021-02-12 [1] CRAN (R 4.0.5)
Matrix 1.3-2 2021-01-06 [1] CRAN (R 4.0.5)
MatrixGenerics 1.2.1 2021-01-30 [1] Bioconductor
matrixStats 0.58.0 2021-01-29 [1] CRAN (R 4.0.2)
mgcv 1.8-35 2021-04-18 [1] CRAN (R 4.0.2)
mime 0.10 2021-02-13 [1] CRAN (R 4.0.2)
miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.0.2)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.2)
nlme 3.1-152 2021-02-04 [1] CRAN (R 4.0.5)
parallelly 1.24.0 2021-03-14 [1] CRAN (R 4.0.2)
patchwork 1.1.1 2020-12-17 [1] CRAN (R 4.0.2)
pbapply 1.4-3 2020-08-18 [1] CRAN (R 4.0.2)
pillar 1.6.0 2021-04-13 [1] CRAN (R 4.0.5)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.2)
plotly 4.9.3 2021-01-10 [1] CRAN (R 4.0.2)
plyr 1.8.6 2020-03-03 [1] CRAN (R 4.0.2)
png 0.1-7 2013-12-03 [1] CRAN (R 4.0.2)
polyclip 1.10-0 2019-03-14 [1] CRAN (R 4.0.2)
processx 3.5.1 2021-04-04 [1] CRAN (R 4.0.2)
promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.0.2)
ps 1.6.0 2021-02-28 [1] CRAN (R 4.0.3)
purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.2)
R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.2)
RANN 2.6.1 2019-01-08 [1] CRAN (R 4.0.2)
RColorBrewer 1.1-2 2014-12-07 [1] CRAN (R 4.0.2)
Rcpp 1.0.6 2021-01-15 [1] CRAN (R 4.0.2)
RcppAnnoy 0.0.18 2020-12-15 [1] CRAN (R 4.0.2)
RCurl 1.98-1.3 2021-03-16 [1] CRAN (R 4.0.2)
reprex * 2.0.0 2021-04-02 [1] CRAN (R 4.0.2)
reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.0.2)
reticulate 1.19 2021-04-21 [1] CRAN (R 4.0.2)
rlang 0.4.10 2020-12-30 [1] CRAN (R 4.0.2)
rmarkdown 2.7 2021-02-19 [1] CRAN (R 4.0.2)
ROCR 1.0-11 2020-05-02 [1] CRAN (R 4.0.2)
rpart 4.1-15 2019-04-12 [1] CRAN (R 4.0.5)
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.0.2)
rsvd 1.0.5 2021-04-16 [1] CRAN (R 4.0.5)
Rtsne 0.15 2018-11-10 [1] CRAN (R 4.0.2)
S4Vectors 0.28.1 2020-12-09 [1] Bioconductor
scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.2)
scater 1.18.6 2021-02-26 [1] Bioconductor
scattermore 0.7 2020-11-24 [1] CRAN (R 4.0.2)
scDblFinder * 1.5.16 2021-04-19 [1] Github (plger/scDblFinder@d11467b)
scran 1.18.7 2021-04-16 [1] Bioconductor
sctransform 0.3.2.9006 2021-04-01 [1] Github (ChristophH/sctransform@73e2e3e)
scuttle 1.0.4 2020-12-17 [1] Bioconductor
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2)
Seurat * 4.0.1 2021-04-13 [1] Github (satijalab/seurat@4e868fc)
SeuratObject * 4.0.0 2021-01-15 [1] CRAN (R 4.0.2)
shiny 1.6.0 2021-01-25 [1] CRAN (R 4.0.3)
SingleCellExperiment 1.12.0 2020-10-27 [1] Bioconductor
sparseMatrixStats 1.2.1 2021-02-02 [1] Bioconductor
spatstat.core 2.1-2 2021-04-18 [1] CRAN (R 4.0.2)
spatstat.data 2.1-0 2021-03-21 [1] CRAN (R 4.0.3)
spatstat.geom 2.1-0 2021-04-15 [1] CRAN (R 4.0.2)
spatstat.sparse 2.0-0 2021-03-16 [1] CRAN (R 4.0.2)
spatstat.utils 2.1-0 2021-03-15 [1] CRAN (R 4.0.2)
statmod 1.4.35 2020-10-19 [1] CRAN (R 4.0.2)
stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
styler 1.4.1 2021-03-30 [1] CRAN (R 4.0.2)
SummarizedExperiment 1.20.0 2020-10-27 [1] Bioconductor
survival 3.2-11 2021-04-26 [1] CRAN (R 4.0.2)
tensor 1.5 2012-05-05 [1] CRAN (R 4.0.2)
tibble 3.1.1 2021-04-18 [1] CRAN (R 4.0.2)
tidyr 1.1.3 2021-03-03 [1] CRAN (R 4.0.3)
tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.2)
utf8 1.2.1 2021-03-12 [1] CRAN (R 4.0.2)
uwot 0.1.10 2020-12-15 [1] CRAN (R 4.0.2)
vctrs 0.3.7 2021-03-29 [1] CRAN (R 4.0.2)
vipor 0.4.5 2017-03-22 [1] CRAN (R 4.0.2)
viridis 0.6.0 2021-04-15 [1] CRAN (R 4.0.5)
viridisLite 0.4.0 2021-04-13 [1] CRAN (R 4.0.5)
withr 2.4.2 2021-04-18 [1] CRAN (R 4.0.5)
xfun 0.22 2021-03-11 [1] CRAN (R 4.0.2)
xgboost 1.4.1.1 2021-04-22 [1] CRAN (R 4.0.2)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.0.2)
XVector 0.30.0 2020-10-28 [1] Bioconductor
yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.2)
zlibbioc 1.36.0 2020-10-28 [1] Bioconductor
zoo 1.8-9 2021-03-09 [1] CRAN (R 4.0.2)
[1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
Hi,
I installed scDblFinder with BiocManager::install("scDblFinder"). When I ran library(scDblFinder), I got this error message:
Error: package or namespace load failed for ‘scDblFinder’:
object ‘colBlockApply’ is not exported by 'namespace:beachmat'
I tried updating the beachmat package, but the error remains.
Could you suggest a fix?
Thank you for your help!
Here is where cxds2 is being called:
Line 406 in 30090a0
Here is an example ctype that I came across (assigned just above):
# dimensions of various things
ncol_sce <- 26657
len_wDbl <- 0
ncol_ad <- 21325
knownUse <- 'discard'
# using above dimensions to make ctype
ctype_bad <- factor(
rep(
c(1L, ifelse(knownUse=="positive", 2L, 1L), 2L),
c(ncol_sce, len_wDbl, ncol_ad),
),
labels = c("real", "doublet")
)
# ctype looks reasonable
table(ctype_bad)
#ctype_bad
# real doublet
# 26657 21325
# given above, and call to cxds2: artificial doublets not added to whichDbls
which(ctype_bad == 2L)
# integer(0)
Just wondering if this is a bug or intentional. Thanks!
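The empty result can be reproduced in base R with a much smaller vector. The sketch below uses made-up sizes (3 real cells, 2 artificial doublets) and only illustrates how comparing a factor with an integer behaves; it says nothing about what the package intends.

```r
# Toy version of the ctype construction above (sizes are illustrative only).
ctype <- factor(rep(c(1L, 2L), c(3L, 2L)), labels = c("real", "doublet"))

# Comparing a factor with 2L compares against the level labels
# ("real"/"doublet"), so nothing matches.
by_label_mismatch <- which(ctype == 2L)

# Comparing the underlying integer codes, or the label itself, finds them.
by_code  <- which(as.integer(ctype) == 2L)
by_label <- which(ctype == "doublet")

print(by_label_mismatch)
print(by_code)
print(by_label)
```

If the artificial doublets are meant to be selected, comparing against the label "doublet" or against the integer codes would be two ways to do so.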
Thanks for providing this good AI tool!
I have a question: when I ran it on a larger number of cells, I got the following message:
did not converge in 20 iterations
I couldn't find a parameter to increase the number of iterations.
Thanks
It seems that you could replace Lines 45 to 46 in 6bfd3b0 with something like:
x <- t(assay(sumCountsAcrossCells(x, k, average=TRUE)))
using scuttle's sumCountsAcrossCells function. This implements parallelization if required, and is also safer when x does not have any names, in which case names(k) is probably not going to do the right thing.
Also, scran::buildKNNGraph can be swapped for the lower-level bluster::makeKNNGraph, which does the same thing but doesn't need the d= and transposed= arguments because your input should directly match up to what it wants. Again, this function can be augmented with parallelization via BPPARAM=, if one so pleases.
Hi there,
I realize this may be an issue external to scDblFinder, but I figured I would post it in case you had an insight. Feel free to close if you're not sure.
When running scDblFinder with multiple samples, I'm getting a BiocParallel error that I can't seem to figure out. Any help is appreciated!
sce <- scDblFinder(sce, samples="orig.ident", BPPARAM=MulticoreParam(2))
Error: BiocParallel errors
element index: 1, 2, 3, 4
first error: An error occured while processing sample 'Control':
Error in rowVars(DelayedArray(x)): could not find symbol "useNames" in environment of the generic function
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.1
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] scran_1.20.1 scuttle_1.2.1 BiocParallel_1.26.2
[4] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7
[7] purrr_0.3.4 readr_2.0.2 tidyr_1.1.4
[10] tibble_3.1.5 ggplot2_3.3.5 tidyverse_1.3.1
[13] SingleCellExperiment_1.14.1 SummarizedExperiment_1.22.0 Biobase_2.52.0
[16] GenomicRanges_1.44.0 GenomeInfoDb_1.28.4 IRanges_2.26.0
[19] S4Vectors_0.30.2 BiocGenerics_0.38.0 MatrixGenerics_1.5.1
[22] matrixStats_0.61.0 scDblFinder_1.6.0 SeuratObject_4.0.2
[25] Seurat_4.0.5
loaded via a namespace (and not attached):
[1] utf8_1.2.2 reticulate_1.22 tidyselect_1.1.1
[4] htmlwidgets_1.5.4 grid_4.1.1 Rtsne_0.15
[7] munsell_0.5.0 ScaledMatrix_1.0.0 codetools_0.2-18
[10] ica_1.0-2 statmod_1.4.36 xgboost_1.4.1.1
[13] future_1.22.1 miniUI_0.1.1.1 withr_2.4.2
[16] colorspace_2.0-2 knitr_1.36 rstudioapi_0.13
[19] ROCR_1.0-11 tensor_1.5 listenv_0.8.0
[22] GenomeInfoDbData_1.2.6 polyclip_1.10-0 parallelly_1.28.1
[25] vctrs_0.3.8 generics_0.1.0 xfun_0.27
[28] R6_2.5.1 ggbeeswarm_0.6.0 rsvd_1.0.5
[31] locfit_1.5-9.4 bitops_1.0-7 spatstat.utils_2.2-0
[34] DelayedArray_0.18.0 assertthat_0.2.1 promises_1.2.0.1
[37] scales_1.1.1 beeswarm_0.4.0 gtable_0.3.0
[40] beachmat_2.8.1 globals_0.14.0 goftest_1.2-3
[43] rlang_0.4.12 splines_4.1.1 lazyeval_0.2.2
[46] spatstat.geom_2.3-0 broom_0.7.9 yaml_2.2.1
[49] reshape2_1.4.4 abind_1.4-5 modelr_0.1.8
[52] backports_1.2.1 httpuv_1.6.3 tools_4.1.1
[55] ellipsis_0.3.2 spatstat.core_2.3-0 RColorBrewer_1.1-2
[58] ggridges_0.5.3 Rcpp_1.0.7 plyr_1.8.6
[61] sparseMatrixStats_1.4.2 zlibbioc_1.38.0 RCurl_1.98-1.5
[64] rpart_4.1-15 deldir_1.0-5 pbapply_1.5-0
[67] viridis_0.6.2 cowplot_1.1.1 zoo_1.8-9
[70] haven_2.4.3 ggrepel_0.9.1 cluster_2.1.2
[73] fs_1.5.0 magrittr_2.0.1 data.table_1.14.2
[76] scattermore_0.7 lmtest_0.9-38 reprex_2.0.1
[79] RANN_2.6.1 fitdistrplus_1.1-6 hms_1.1.1
[82] patchwork_1.1.1 mime_0.12 evaluate_0.14
[85] xtable_1.8-4 readxl_1.3.1 gridExtra_2.3
[88] compiler_4.1.1 scater_1.20.1 KernSmooth_2.23-20
[91] crayon_1.4.1 htmltools_0.5.2 mgcv_1.8-36
[94] later_1.3.0 tzdb_0.1.2 lubridate_1.8.0
[97] DBI_1.1.1 dbplyr_2.1.1 MASS_7.3-54
[100] Matrix_1.3-4 cli_3.0.1 metapod_1.0.0
[103] igraph_1.2.7 pkgconfig_2.0.3 plotly_4.10.0
[106] spatstat.sparse_2.0-0 xml2_1.3.2 vipor_0.4.5
[109] dqrng_0.3.0 XVector_0.32.0 rvest_1.0.2
[112] digest_0.6.28 sctransform_0.3.2 RcppAnnoy_0.0.19
[115] spatstat.data_2.1-0 rmarkdown_2.11 cellranger_1.1.0
[118] leiden_0.3.9 uwot_0.1.10 edgeR_3.34.1
[121] DelayedMatrixStats_1.14.3 shiny_1.7.1 lifecycle_1.0.1
[124] nlme_3.1-152 jsonlite_1.7.2 BiocNeighbors_1.10.0
[127] viridisLite_0.4.0 limma_3.48.3 fansi_0.5.0
[130] pillar_1.6.4 lattice_0.20-44 fastmap_1.1.0
[133] httr_1.4.2 survival_3.2-11 glue_1.4.2
[136] png_0.1-7 bluster_1.2.1 stringi_1.7.5
[139] BiocSingular_1.8.1 irlba_2.3.3 future.apply_1.8.1
Hi! Thank you for this great tool. I am encountering the error in the title when running scDblFinder on a large dataset (CellRanger estimated ~20,000 cells):
Assuming the input to be a matrix of counts or expected counts.
Aggregating features...
Warning message:
"Quick-TRANSfer stage steps exceeded maximum (= 1905250)"
Creating ~11084 artificial doublets...
Dimensional reduction
Evaluating kNN...
Training model...
Error in if (length(expected) > 1 && x > min(expected) && x < max(expected)) return(0): missing value where TRUE/FALSE needed
I have not encountered this error in several other (much smaller) samples I have tried, so is this related to the dataset being too large?
Traceback:
1. scDblFinder(peak_assay, aggregateFeatures = TRUE, nfeatures = 25,
. processing = "normFeatures")
2. .scDblscore(d, scoreType = score, addVals = pca[, includePCs,
. drop = FALSE], threshold = threshold, dbr = dbr, dbr.sd = dbr.sd,
. nrounds = nrounds, max_depth = max_depth, iter = iter, BPPARAM = BPPARAM,
. features = trainingFeatures, verbose = verbose, metric = metric,
. filterUnidentifiable = removeUnidentifiable, unident.th = unident.th)
3. which((d$type == "real" & doubletThresholding(d, dbr = dbr, dbr.sd = dbr.sd,
. stringency = 0.7, perSample = perSample, returnType = "call") ==
. "doublet") | (d$type == "doublet" & d$score < unident.th &
. filterUnidentifiable) | !d$include.in.training)
4. doubletThresholding(d, dbr = dbr, dbr.sd = dbr.sd, stringency = 0.7,
. perSample = perSample, returnType = "call")
5. .optimThreshold(d, dbr = .gdbr(d, dbr), dbr.sd = dbr.sd, stringency = stringency)
6. optimize(totfn, c(0, 1), maximum = FALSE)
7. (function (arg)
. f(arg, ...))(0.381966011250105)
8. f(arg, ...)
9. .prop.dev(d$type, d$score, expected, x)
Session info
R version 4.2.2 (2022-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS
Matrix products: default
BLAS/LAPACK: /opt/conda/envs/NET_R_env/lib/libopenblasp-r0.3.21.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] BSgenome.Hsapiens.UCSC.hg38_1.4.5 BSgenome_1.66.3
[3] Biostrings_2.66.0 XVector_0.38.0
[5] CopyscAT_0.40 MASS_7.3-60
[7] jsonlite_1.8.4 sp_1.6-1
[9] rtracklayer_1.58.0 gplots_3.1.3
[11] tibble_3.2.1 tidyr_1.3.0
[13] edgeR_3.40.2 limma_3.54.2
[15] stringr_1.5.0 mclust_6.0.0
[17] changepoint_2.2.4 zoo_1.8-12
[19] data.table_1.14.8 igraph_1.4.3
[21] FNN_1.1.3.2 Rtsne_0.16
[23] biomaRt_2.54.0 fastcluster_1.2.3
[25] NMF_0.26 cluster_2.1.4
[27] rngtools_1.5.2 registry_0.5-1
[29] viridis_0.6.3 viridisLite_0.4.2
[31] dplyr_1.1.2 RColorBrewer_1.1-3
[33] scDblFinder_1.13.13 SingleCellExperiment_1.20.1
[35] SummarizedExperiment_1.28.0 MatrixGenerics_1.10.0
[37] matrixStats_0.63.0 glue_1.6.2
[39] ggplot2_3.4.1 EnsDb.Hsapiens.v86_2.99.0
[41] ensembldb_2.22.0 AnnotationFilter_1.22.0
[43] GenomicFeatures_1.50.2 AnnotationDbi_1.60.0
[45] Biobase_2.58.0 GenomicRanges_1.50.2
[47] GenomeInfoDb_1.34.9 IRanges_2.32.0
[49] S4Vectors_0.36.2 BiocGenerics_0.44.0
[51] Signac_1.10.0 SeuratObject_4.1.3
[53] Seurat_4.3.0
loaded via a namespace (and not attached):
[1] rappdirs_0.3.3 pbdZMQ_0.3-9
[3] scattermore_1.0 bit64_4.0.5
[5] irlba_2.3.5.1 DelayedArray_0.24.0
[7] KEGGREST_1.38.0 RCurl_1.98-1.12
[9] doParallel_1.0.17 generics_0.1.3
[11] ScaledMatrix_1.6.0 cowplot_1.1.1
[13] RSQLite_2.2.20 RANN_2.6.1
[15] future_1.32.0 bit_4.0.5
[17] spatstat.data_3.0-1 xml2_1.3.4
[19] httpuv_1.6.11 hms_1.1.3
[21] evaluate_0.21 promises_1.2.0.1
[23] fansi_1.0.4 restfulr_0.0.15
[25] progress_1.2.2 caTools_1.18.2
[27] dbplyr_2.3.1 DBI_1.1.3
[29] htmlwidgets_1.6.2 spatstat.geom_3.2-1
[31] purrr_1.0.1 ellipsis_0.3.2
[33] gridBase_0.4-7 deldir_1.0-9
[35] sparseMatrixStats_1.10.0 vctrs_0.6.2
[37] ROCR_1.0-11 abind_1.4-5
[39] cachem_1.0.8 withr_2.5.0
[41] progressr_0.13.0 sctransform_0.3.5
[43] GenomicAlignments_1.34.1 prettyunits_1.1.1
[45] scran_1.26.2 goftest_1.2-3
[47] IRdisplay_1.1 lazyeval_0.2.2
[49] crayon_1.5.2 spatstat.explore_3.2-1
[51] pkgconfig_2.0.3 nlme_3.1-162
[53] vipor_0.4.5 ProtGenerics_1.30.0
[55] rlang_1.1.0 globals_0.16.2
[57] lifecycle_1.0.3 miniUI_0.1.1.1
[59] filelock_1.0.2 BiocFileCache_2.6.0
[61] rsvd_1.0.5 polyclip_1.10-4
[63] lmtest_0.9-40 Matrix_1.5-4
[65] IRkernel_1.3.2 base64enc_0.1-3
[67] beeswarm_0.4.0 ggridges_0.5.4
[69] png_0.1-8 rjson_0.2.21
[71] bitops_1.0-7 KernSmooth_2.23-21
[73] blob_1.2.3 DelayedMatrixStats_1.20.0
[75] parallelly_1.35.0 spatstat.random_3.1-5
[77] beachmat_2.14.2 scales_1.2.1
[79] memoise_2.0.1 magrittr_2.0.3
[81] plyr_1.8.8 ica_1.0-3
[83] zlibbioc_1.44.0 compiler_4.2.2
[85] dqrng_0.3.0 BiocIO_1.8.0
[87] fitdistrplus_1.1-11 Rsamtools_2.14.0
[89] cli_3.6.1 listenv_0.9.0
[91] patchwork_1.1.2 pbapply_1.7-0
[93] tidyselect_1.2.0 stringi_1.7.12
[95] yaml_2.3.7 BiocSingular_1.14.0
[97] locfit_1.5-9.7 ggrepel_0.9.3
[99] grid_4.2.2 fastmatch_1.1-3
[101] tools_4.2.2 future.apply_1.10.0
[103] parallel_4.2.2 uuid_1.1-0
[105] bluster_1.8.0 foreach_1.5.2
[107] metapod_1.6.0 gridExtra_2.3
[109] digest_0.6.31 BiocManager_1.30.20
[111] shiny_1.7.4 Rcpp_1.0.10
[113] scuttle_1.8.4 later_1.3.1
[115] RcppAnnoy_0.0.20 httr_1.4.5
[117] colorspace_2.1-0 XML_3.99-0.14
[119] tensor_1.5 reticulate_1.28
[121] splines_4.2.2 uwot_0.1.14
[123] RcppRoll_0.3.0 statmod_1.5.0
[125] spatstat.utils_3.0-3 scater_1.26.1
[127] xgboost_1.7.5.1 plotly_4.10.1
[129] xtable_1.8-4 R6_2.5.1
[131] pillar_1.9.0 htmltools_0.5.5
[133] mime_0.12 fastmap_1.1.1
[135] BiocParallel_1.32.6 BiocNeighbors_1.16.0
[137] codetools_0.2-19 utf8_1.2.3
[139] lattice_0.21-8 spatstat.sparse_3.0-1
[141] curl_4.3.3 ggbeeswarm_0.7.2
[143] leiden_0.4.3 gtools_3.9.4
[145] survival_3.5-5 repr_1.1.6
[147] munsell_0.5.0 GenomeInfoDbData_1.2.9
[149] iterators_1.0.14 reshape2_1.4.4
[151] gtable_0.3.3
Hi there,
Aaron pointed me towards this package and I think there's something I can help with. In the doubletThresholding function, the "griffiths" method is calculating deviations from the per-cluster medians. However, this is a problem when a cluster consists only of doublets, as cells may not deviate from the cluster average, which will be high. These clusters are quite common if you have very different cell types in your experiment.
Rather, in the paper I used this approach for, I calculated deviations per-sample. The idea being that different samples have different characteristics (timepoint, level of digestion etc.) and therefore should be handled separately.
I thought about submitting a PR to tweak this, but I wasn't sure if you wanted the function to be used across samples (in which case $cluster can more or less be replaced with $sample), or on one sample at a time (in which case the code just gets very much simpler).
Cheers,
Jonny.
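The difference between the two groupings can be seen on a toy example. This is a simplified sketch in base R, not the package's "griffiths" implementation: the scores, the cluster labels, and the helper dev_from_median are all made up for illustration, and the deviation is a plain difference from the group median.

```r
# Toy scores: higher means more doublet-like (values are made up).
score   <- c(0.10, 0.12, 0.11, 0.90, 0.92, 0.91)
cluster <- c("a", "a", "a", "b", "b", "b")  # cluster "b" is all doublets
sample  <- rep("s1", 6)                      # all cells come from one sample

# Hypothetical helper: deviation of each cell from the median of its group.
dev_from_median <- function(x, group) x - ave(x, group, FUN = median)

dev_cluster <- dev_from_median(score, cluster)
dev_sample  <- dev_from_median(score, sample)

print(round(dev_cluster, 2))
print(round(dev_sample, 2))
```

With per-cluster medians the all-doublet cluster shows almost no deviation, whereas per-sample medians separate the high-scoring cells clearly.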
Dear developers,
Thank you very much for developing this useful tool. I tried it on my dataset with the samples = sampleID argument. However, I still get a doublet rate of about 10%, which seems unreasonably high. Could you help, please?
Here is my code:
bp <- SnowParam(8, RNGseed=1234) #to make the results reproducible. Unix use MulticoreParam()
bpstart(bp)
split_D<- scDblFinder(split_D,samples = 'sampleID',BPPARAM = bp) #splitD is my SCE object.
bpstop(bp)
split_D@colData$scDblFinder.class %>% table
singlet doublet
31037 3260
Here are the numbers of cells for each sampleID:
split_D@colData$sampleID
4210 5831 6486 2981 5037 5525 1424 2803.
I double checked in the resulting SCE object and the scDblFinder.sample equals the sampleID.
According to 10X, each sample at this cell number should contain <5% doublets: https://kb.10xgenomics.com/hc/en-us/articles/360001378811-What-is-the-maximum-number-of-cells-that-can-be-profiled-
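For a rough comparison, the arithmetic can be worked through in base R. The sketch below assumes a rule of thumb of about 1% doublets per 1000 cells captured; that rate is an assumption for illustration, not a figure taken from this thread, and the true rate depends on the protocol and loading.

```r
# Cells per sample, as listed above.
n_cells <- c(4210, 5831, 6486, 2981, 5037, 5525, 1424, 2803)

# ASSUMPTION: the doublet rate grows by about 1% per 1000 cells captured.
rate_per_cell <- 0.01 / 1000
expected_rate <- n_cells * rate_per_cell
expected_doublets <- n_cells * expected_rate

# Observed call counts from the table above.
observed_doublets <- 3260
observed_rate <- observed_doublets / sum(n_cells)

print(round(expected_rate, 3))
print(round(sum(expected_doublets)))
print(round(observed_rate, 3))
```

Under that assumption roughly 1,685 doublets (about 4.9% of 34,297 cells) would be expected, against 3,260 (about 9.5%) called.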
sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)
Matrix products: default
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] BiocParallel_1.32.5 scDblFinder_1.13.7 SingleCellExperiment_1.20.0 SummarizedExperiment_1.28.0
[5] Biobase_2.58.0 GenomicRanges_1.50.2 GenomeInfoDb_1.34.6 IRanges_2.32.0
[9] S4Vectors_0.36.1 BiocGenerics_0.44.0 MatrixGenerics_1.10.0 matrixStats_0.63.0
[13] future_1.31.0 dittoSeq_1.10.0 forcats_0.5.2 stringr_1.5.0
[17] dplyr_1.0.10 purrr_1.0.1 readr_2.1.3 tidyr_1.2.1
[21] tibble_3.1.8 ggplot2_3.4.0 tidyverse_1.3.2 plyr_1.8.8
[25] data.table_1.14.6 SeuratObject_4.1.3 Seurat_4.3.0
Hi,
I have 10X Multiome data where we have scATAC and scGEX data from the same nucleus. As I understand it, scDblFinder can use either ATAC or gene expression data to call doublets. Would it be possible to use both assays together for doublet calling?
Thanks
Hi scDblFinder team,
Thanks for such a great package. I have been using it to find doublets in my Seurat object. I converted the Seurat object to a SingleCellExperiment, but when I ran scDblFinder I got an error: Size factors should be positive. My Seurat object was not log-transformed, but even after log-transforming it I still got the same error. My other datasets run without problems. The only difference is that the failing datasets have a mouse cell spike-in, but I removed those cells before running scDblFinder. Is there a solution for this issue?
Thanks,
Yale
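One possible cause, offered as an assumption rather than a confirmed diagnosis, is cells whose total count is zero, since a size factor based on library size would then be zero for those cells. A minimal base R check on a toy matrix (all values made up) might look like this:

```r
# Toy count matrix (genes x cells); cell "c3" has no counts at all.
counts <- matrix(c(1, 0, 2,
                   3, 1, 0,
                   0, 0, 0,
                   4, 2, 1),
                 nrow = 3,
                 dimnames = list(paste0("g", 1:3), paste0("c", 1:4)))

# Total counts per cell, and the names of cells with none.
lib_size <- colSums(counts)
zero_cells <- names(lib_size)[lib_size == 0]
print(zero_cells)

# Keep only cells with at least one count before further processing.
counts_ok <- counts[, lib_size > 0, drop = FALSE]
print(dim(counts_ok))
```

The same colSums() check can be applied to the counts matrix of the real object to see whether any such cells remain after removing the spike-in cells.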
I'm somewhat new to using this so I am not sure how to fix it.
Org_nodoub <- processing_seurat_sctransform(Org_nodoub,
vars_to_regress = c("nCount_RNA","percent.mito","percent.ribo"),
npcs = 30,
res = 0.5)
Error in qr.resid(qr = qr, y = data.expr[x, ]) :
'qr' and 'y' must have the same number of rows
Hi scDblFinder team!
The paper mentions that scDblFinder uses multiple features obtained from the kNN network, such as projections on principal components, library size, the number of detected features, and co-expression scores. But I can only find scDblFinder.weighted and scDblFinder.cxds_score in the output R object. Could you tell me how to obtain all the features used to train the gradient-boosted trees from the R object?
Thanks
Hello, when combining samples in a single SCE, I find that I get an error during model training. This does not occur when running scDblFinder on single samples. The error is independent of cluster method (it shows up with "overcluster" as well):
masterSCE1 = scDblFinder(sce = masterSCE, samples = "sample_ID", nfeatures = 1000, clust.method = "fastcluster", score='xgb',verbose = TRUE, use.cxds = TRUE)
Training model...
Error in quantile.default(d$score[w], 1 - dbr) : 'probs' outside [0,1]
########################
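The same message can be produced in base R whenever the probability passed to quantile() falls outside [0, 1]. Whether that is what happens here is an assumption; it would be consistent with an expected doublet rate above 1, but nothing in this report confirms it. The scores and the rate of 1.2 below are made up.

```r
set.seed(1)
scores <- runif(100)

# A rate within [0, 1] gives a valid probability for quantile().
ok <- quantile(scores, 1 - 0.05)

# ASSUMPTION for illustration: a rate above 1, e.g. 1.2.
dbr_too_large <- 1.2
msg <- tryCatch(
  {
    quantile(scores, 1 - dbr_too_large)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Printing the rate that is actually used for the pooled object would show whether it exceeds 1.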
sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] DropletUtils_1.9.13 SingleCellExperiment_1.11.8 SummarizedExperiment_1.19.9 Biobase_2.49.1 GenomicRanges_1.41.6 GenomeInfoDb_1.25.11 IRanges_2.23.10 S4Vectors_0.27.13
[9] BiocGenerics_0.35.4 MatrixGenerics_1.1.3 matrixStats_0.57.0 scDblFinder_1.3.13 future_1.19.1 Seurat_3.2.2 forcats_0.5.0 stringr_1.4.0
[17] dplyr_1.0.2 purrr_0.3.4 readr_1.4.0 tidyr_1.1.2 tibble_3.0.3 ggplot2_3.3.2 tidyverse_1.3.0
loaded via a namespace (and not attached):
[1] R.utils_2.10.1 reticulate_1.16 tidyselect_1.1.0 htmlwidgets_1.5.2 grid_4.0.2 BiocParallel_1.23.2 Rtsne_0.15 pROC_1.16.2
[9] munsell_0.5.0 codetools_0.2-16 ica_1.0-2 statmod_1.4.34 scran_1.17.21 xgboost_1.2.0.1 miniUI_0.1.1.1 withr_2.3.0
[17] colorspace_1.4-1 rstudioapi_0.11 intrinsicDimension_1.2.0 ROCR_1.0-11 tensor_1.5 listenv_0.8.0 labeling_0.3 GenomeInfoDbData_1.2.4
[25] polyclip_1.10-0 farver_2.0.3 rhdf5_2.33.11 rprojroot_1.3-2 vctrs_0.3.4 generics_0.0.2 R6_2.4.1 ggbeeswarm_0.6.0
[33] rsvd_1.0.3 locfit_1.5-9.4 rhdf5filters_1.1.3 bitops_1.0-6 spatstat.utils_1.17-0 DelayedArray_0.15.16 assertthat_0.2.1 promises_1.1.1
[41] scales_1.1.1 beeswarm_0.2.3 gtable_0.3.0 beachmat_2.5.8 globals_0.13.0 processx_3.4.4 goftest_1.2-2 rlang_0.4.8
[49] splines_4.0.2 lazyeval_0.2.2 broom_0.7.1 BiocManager_1.30.10 yaml_2.2.1 reshape2_1.4.4 abind_1.4-5 modelr_0.1.8
[57] backports_1.1.10 httpuv_1.5.4 tools_4.0.2 ellipsis_0.3.1 RColorBrewer_1.1-2 ggridges_0.5.2 Rcpp_1.0.5 plyr_1.8.6
[65] zlibbioc_1.35.0 RCurl_1.98-1.2 ps_1.4.0 prettyunits_1.1.1 rpart_4.1-15 deldir_0.1-29 pbapply_1.4-3 viridis_0.5.1
[73] cowplot_1.1.0 zoo_1.8-8 haven_2.3.1 ggrepel_0.8.2 cluster_2.1.0 fs_1.5.0 magrittr_1.5 data.table_1.13.0
[81] RSpectra_0.16-0 lmtest_0.9-38 reprex_0.3.0 RANN_2.6.1 fitdistrplus_1.1-1 hms_0.5.3 patchwork_1.0.1 mime_0.9
[89] xtable_1.8-4 scds_1.5.0 readxl_1.3.1 gridExtra_2.3 compiler_4.0.2 scater_1.17.5 KernSmooth_2.23-17 crayon_1.3.4
[97] R.oo_1.24.0 htmltools_0.5.0 mgcv_1.8-31 later_1.1.0.1 lubridate_1.7.9 DBI_1.1.0 dbplyr_1.4.4 MASS_7.3-51.6
[105] rappdirs_0.3.1 Matrix_1.2-18 cli_2.0.2 R.methodsS3_1.8.1 igraph_1.2.5 pkgconfig_2.0.3 plotly_4.9.2.1 scuttle_0.99.18
[113] xml2_1.3.2 yaImpute_1.0-32 vipor_0.4.5 dqrng_0.2.1 XVector_0.29.3 rvest_0.3.6 callr_3.4.4 digest_0.6.25
[121] sctransform_0.3.1 RcppAnnoy_0.0.16 spatstat.data_1.4-3 cellranger_1.1.0 leiden_0.3.3 edgeR_3.31.4 uwot_0.1.8 DelayedMatrixStats_1.11.1
[129] curl_4.3 shiny_1.5.0 lifecycle_0.2.0 nlme_3.1-148 jsonlite_1.7.1 Rhdf5lib_1.11.3 BiocNeighbors_1.7.0 limma_3.45.14
[137] viridisLite_0.3.0 fansi_0.4.1 pillar_1.4.6 lattice_0.20-41 fastmap_1.0.1 httr_1.4.2 pkgbuild_1.1.0 survival_3.1-12
[145] glue_1.4.2 remotes_2.2.0 spatstat_1.64-1 png_0.1-7 bluster_0.99.1 HDF5Array_1.17.14 stringi_1.5.3 blob_1.2.1
[153] BiocSingular_1.5.2 irlba_2.3.3 future.apply_1.6.0
Hi,
I tested your algorithm and doubletFinder on a single 10X PBMC sample of about 7600 cells (after some basic filtering).
I do not see much overlap between the two. Below is the output of table(doubletFinder, scDblFinder):
               scDblFinder
doubletFinder  doublet singlet
  Doublet          124     317
  Singlet          340    6983
scDblFinder call:
pbmc.sce <- scDblFinder(pbmc.sce, clusters="res.1.2",dbr=0.06, dims=50)
doubletFinder_v3 call:
pbmc <- doubletFinder_v3(pbmc, PCs = 1:50, pN = 0.25, pK = 0.02, nExp = nExp_poi.adj, reuse.pANN = F, sct = T)
pbmc <- doubletFinder_v3(pbmc, PCs = 1:50, pN = 0.25, pK = 0.02, nExp = nExp_poi.adj, reuse.pANN = "pANN_0.25_0.02_466", sct = T)
Is this expected?
Did I make a mistake in the calls?
Thanks
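The overlap in the table above can be put into numbers with base R. The sketch below computes the Jaccard index of the two doublet sets and Cohen's kappa from the reported counts; these two measures are chosen here for illustration and are not something either tool reports.

```r
# Contingency table from above: rows = doubletFinder, columns = scDblFinder.
tab <- matrix(c(124, 340, 317, 6983), nrow = 2,
              dimnames = list(doubletFinder = c("Doublet", "Singlet"),
                              scDblFinder   = c("doublet", "singlet")))

# Jaccard index: cells called by both / cells called by either.
both    <- tab["Doublet", "doublet"]
either  <- both + tab["Doublet", "singlet"] + tab["Singlet", "doublet"]
jaccard <- both / either

# Cohen's kappa: observed agreement corrected for chance agreement.
n  <- sum(tab)
po <- sum(diag(tab)) / n
pe <- sum(rowSums(tab) * colSums(tab)) / n^2
cohen_kappa <- (po - pe) / (1 - pe)

print(round(c(jaccard = jaccard, kappa = cohen_kappa), 3))
```

Both values are low (Jaccard near 0.16, kappa near 0.23), which matches the impression of limited overlap despite the high raw agreement driven by singlets.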
Hello,
I noted that the following error comes up when there are empty factor levels in the clusters argument:
Error in value[[3L]](cond) :
An error occured while processing sample 'batch1':
Error in sample.int(length(x), size, replace, prob): invalid 'replace' argument
library(scDblFinder)
library(SingleCellExperiment)
sce <- mockDoubletSCE()
sce <- cbind(sce, sce)
sce <- sce[,!grepl("\\+", as.character(sce$cluster))]
sce$cluster <- as.character(sce$cluster)
#/ Create a cluster level not present in one of the batches by simply forming a cluster of one cell:
sce$cluster[ncol(sce)] <- "cluster3" # batch 1 (see below) will be empty for that factor level "cluster3"
sce$cluster <- factor(sce$cluster)
#/ simulate batch:
sce$batch <- c(rep("batch1", floor(ncol(sce)/2)), rep("batch2", ceiling(ncol(sce)/2)))
#/ will exit with error
scDblFinder(sce = sce, clusters = sce$cluster, samples = sce$batch)
#/ fine as empty factor level is removed
scDblFinder(sce = sce, clusters = as.character(sce$cluster), samples = sce$batch)
So I guess if you add a checkpoint that removes empty factor levels things should be fine.
Empty factor levels could happen if you work with multiple batches and/or an integrated dataset with clusters being specific for one batch/condition etc.
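How an empty level arises, and how droplevels() removes it, can be shown in base R without the package. The vectors below are toy data; where such a check would belong inside scDblFinder is left open.

```r
# Toy cluster labels and batch assignment; "cluster3" occurs only in batch2.
cluster <- factor(c("cluster1", "cluster2", "cluster1", "cluster2", "cluster3"))
batch   <- c("batch1", "batch1", "batch2", "batch2", "batch2")

# Subsetting keeps all levels, so batch1 carries an empty "cluster3" level.
cl_batch1 <- cluster[batch == "batch1"]
print(table(cl_batch1))

# droplevels() removes levels with no observations.
cl_batch1_clean <- droplevels(cl_batch1)
print(levels(cl_batch1_clean))
```

Applying droplevels() to the per-sample cluster vector is equivalent in effect to the as.character() workaround shown above.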
The knownDoublets option in scDblFinder throws an error when given both knownDoublets and samples.
Reproducible example:
library(scDblFinder)
sce <- mockDoubletSCE()
sce$type <- sce$type %in% "doublet"
sce$channel <- c(rep("sample1", floor(ncol(sce)/2)), rep("sample2", ceiling(ncol(sce)/2)))[1:ncol(sce)]
scldbl <- scDblFinder(sce = sce,
samples = "channel",
knownDoublets = "type")
yields
Error in value[[3L]](cond) :
An error occured while processing sample 'cluster1':
Error in .checkColArg(sce, knownDoublets): knownDoublets should have a length equal to the number of columns in sce.
but
scldbl <- scDblFinder(sce = sce,
#samples = "channel",
knownDoublets = "type")
succeeds
Clustering cells...
4 clusters
Creating ~5000 artifical doublets...
Dimensional reduction
Finding KNN...
Evaluating cell neighborhoods...
Training model...
Finding threshold...
Threshold found:0.425
19 (3.7%) doublets called
Dear Developers,
I'm including this awesome tool in my scRNA-seq analysis workflow but hope you could help clarify the correct procedures.
I noticed that, according to the GitHub README, the function takes the count matrix without empty droplets as input. My question is: do I need to apply the usual filters (such as lower/upper thresholds on the number of genes per cell or the total UMI counts per cell) before I feed the data to scDblFinder? I have seen people do different things, so I thought I would double-check with you.
Thanks,
Jp
Dear developers,
Thank you for the nice package.
I know doublet reproducibility has already been discussed a lot in the issues, and I have read them.
But when I adapt that code to my data, the results are still not reproducible; I get different results every time.
I checked my data using the code posted in issue #53.
This is the code I used and the results.
> sce <- as.SingleCellExperiment(my_seurat_object)
> bp <- MulticoreParam(2, RNGseed=123)
> bpstart(bp)
> m1 <- scDblFinder(sce, clusters=sce$cluster, BPPARAM=bp)$scDblFinder.score
Creating ~5000 artificial doublets...
Dimensional reduction
Evaluating kNN...
Training model...
iter=0, 83 cells excluded from training.
iter=1, 83 cells excluded from training.
iter=2, 80 cells excluded from training.
Threshold found:0.738
50 (4.7%) doublets called
> bpstop(bp)
> bpstart(bp)
> m2 <- scDblFinder(sce, clusters=sce$cluster, BPPARAM=bp)$scDblFinder.score
Creating ~5000 artificial doublets...
Dimensional reduction
Evaluating kNN...
Training model...
iter=0, 76 cells excluded from training.
iter=1, 89 cells excluded from training.
iter=2, 79 cells excluded from training.
Threshold found:0.784
44 (4.1%) doublets called
> bpstop(bp)
> identical(m1,m2)
[1] FALSE
Do you have any ideas about this? My BiocParallel package version is already 1.28.3.
I tried many times, but the results never match. Please help!
This is the sessionInfo of my R session.
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] rsvd_1.0.5 batchelor_1.10.0 remotes_2.4.2 Nebulosa_1.4.0 patchwork_1.1.1
[6] SeuratWrappers_0.3.0 harmony_0.1.0 Rcpp_1.0.8.3 cowplot_1.1.1 dplyr_1.0.9
[11] Seurat_4.1.0 SeuratObject_4.0.4 scDblFinder_1.11.4 SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0
[16] GenomicRanges_1.46.1 GenomeInfoDb_1.30.1 IRanges_2.28.0 S4Vectors_0.32.4 MatrixGenerics_1.6.0
[21] matrixStats_0.62.0 scaterlegacy_1.5.0 ggplot2_3.3.6 Biobase_2.54.0 BiocGenerics_0.40.0
[26] BiocParallel_1.28.3
loaded via a namespace (and not attached):
[1] utf8_1.2.2 shinydashboard_0.7.2 ks_1.13.5 R.utils_2.11.0 reticulate_1.24
[6] tidyselect_1.1.2 RSQLite_2.2.12 AnnotationDbi_1.56.2 htmlwidgets_1.5.4 grid_4.1.2
[11] Rtsne_0.16 munsell_0.5.0 ScaledMatrix_1.2.0 codetools_0.2-18 ica_1.0-2
[16] xgboost_1.6.0.1 statmod_1.4.36 scran_1.22.1 future_1.24.0 miniUI_0.1.1.1
[21] withr_2.5.0 spatstat.random_2.2-0 colorspace_2.0-3 filelock_1.0.2 rstudioapi_0.13
[26] ROCR_1.0-11 tensor_1.5 listenv_0.8.0 labeling_0.4.2 tximport_1.22.0
[31] GenomeInfoDbData_1.2.7 polyclip_1.10-0 farver_2.1.0 bit64_4.0.5 rhdf5_2.38.1
[36] parallelly_1.31.0 vctrs_0.4.1 generics_0.1.2 BiocFileCache_2.2.1 R6_2.5.1
[41] ggbeeswarm_0.6.0 locfit_1.5-9.5 bitops_1.0-7 rhdf5filters_1.6.0 spatstat.utils_2.3-0
[46] cachem_1.0.6 DelayedArray_0.20.0 assertthat_0.2.1 BiocIO_1.4.0 promises_1.2.0.1
[51] scales_1.2.0 beeswarm_0.4.0 gtable_0.3.0 beachmat_2.10.0 globals_0.14.0
[56] goftest_1.2-3 rlang_1.0.2 splines_4.1.2 rtracklayer_1.54.0 lazyeval_0.2.2
[61] spatstat.geom_2.4-0 BiocManager_1.30.16 yaml_2.3.5 reshape2_1.4.4 abind_1.4-5
[66] httpuv_1.6.5 tools_4.1.2 ellipsis_0.3.2 spatstat.core_2.4-2 RColorBrewer_1.1-3
[71] ggridges_0.5.3 plyr_1.8.7 sparseMatrixStats_1.6.0 progress_1.2.2 zlibbioc_1.40.0
[76] purrr_0.3.4 RCurl_1.98-1.6 prettyunits_1.1.1 rpart_4.1.16 deldir_1.0-6
[81] pbapply_1.5-0 viridis_0.6.2 zoo_1.8-10 ggrepel_0.9.1 cluster_2.1.3
[86] magrittr_2.0.3 data.table_1.14.2 scattermore_0.8 ResidualMatrix_1.4.0 lmtest_0.9-40
[91] RANN_2.6.1 mvtnorm_1.1-3 fitdistrplus_1.1-8 hms_1.1.1 mime_0.12
[96] xtable_1.8-4 XML_3.99-0.9 mclust_5.4.9 gridExtra_2.3 scater_1.22.0
[101] compiler_4.1.2 biomaRt_2.50.3 tibble_3.1.7 KernSmooth_2.23-20 crayon_1.5.1
[106] R.oo_1.24.0 htmltools_0.5.2 mgcv_1.8-40 later_1.3.0 tidyr_1.2.0
[111] DBI_1.1.2 dbplyr_2.1.1 MASS_7.3-56 rappdirs_0.3.3 Matrix_1.4-1
[116] cli_3.3.0 R.methodsS3_1.8.1 metapod_1.2.0 parallel_4.1.2 igraph_1.3.1
[121] pkgconfig_2.0.3 GenomicAlignments_1.30.0 scuttle_1.4.0 plotly_4.10.0 spatstat.sparse_2.1-1
[126] xml2_1.3.3 vipor_0.4.5 dqrng_0.3.0 XVector_0.34.0 stringr_1.4.0
[131] digest_0.6.29 pracma_2.3.8 sctransform_0.3.3 RcppAnnoy_0.0.19 spatstat.data_2.2-0
[136] Biostrings_2.62.0 leiden_0.3.9 uwot_0.1.11 edgeR_3.36.0 DelayedMatrixStats_1.16.0
[141] restfulr_0.0.13 curl_4.3.2 shiny_1.7.1 Rsamtools_2.10.0 rjson_0.2.21
[146] lifecycle_1.0.1 nlme_3.1-157 jsonlite_1.8.0 Rhdf5lib_1.16.0 BiocNeighbors_1.12.0
[151] viridisLite_0.4.0 limma_3.50.3 fansi_1.0.3 pillar_1.7.0 lattice_0.20-45
[156] ggrastr_1.0.1 KEGGREST_1.34.0 fastmap_1.1.0 httr_1.4.2 survival_3.3-1
[161] glue_1.6.2 png_0.1-7 bluster_1.4.0 bit_4.0.4 stringi_1.7.6
[166] blob_1.2.3 BiocSingular_1.10.0 memoise_2.0.1 irlba_2.3.5 future.apply_1.8.1
Would be useful to be able to specify a dimensionality reduction in scDblFinder
rather than automatically defaulting to PCA.
What if you have some other latent space calculated and want to work there?
What if you have corrected your PCA space with fastMNN or harmony and want to work there?
Hello,
First, thank you for a great package. It's been incredibly useful in analyzing our datasets. I'm very interested in the methods you've developed to look at over-represented doublets as a way to explore physically interacting cells in our data.
I have data from multiple samples run on a single lane, which I demultiplexed using a genotype-based method (souporcell in this case). I am using your tool to identify doublets derived from the same sample but two different cell types, so I ran scDblFinder with the knownDoublets argument set to the output from souporcell. This worked great, but now I'm trying to use the scDblFinder.stats table in the metadata to explore over-represented doublets and look for possible cell-cell interactions. The output, however, doesn't specify whether the identified doublets derive from the same sample or not and, if they do, from which sample. In other words, I can't tell whether doublets were formed before or after the samples were combined, and I can't tell whether the doublets are present in all samples or derive from just one condition. I would like to see whether doublets from a treated sample are also present in the untreated sample, or whether they occur in one but not the other. I know that the samples argument is available, but the documentation specifies that it is for multiple lanes/independently processed samples, not for multiplexed samples. Is there an argument to scDblFinder where I can provide the multiplexed sample assignment for samples run on the same 10X lane?
Thank you for your help!
Would it be possible to return the doublet threshold as part of the standard output, rather than only printing it to the console, so that it can be used programmatically?
The full output becomes very unwieldy in an R environment with multiple large datasets, so it would be very helpful to have the threshold in the standard output to help interpret the doublet scores.
Hi, thanks for a great package. When I run scDblFinder on a single cell experiment object with arguments knowns= and knownsUse="discard", the output sce$scDblFinder.class calls some of the known doublets as singlets.
The help for scDblFinder seems to state that with option "discard", the known doublets, while not used for training, should still be called as doublets, so I'm not sure why this is happening. I can of course just add those known doublets back in as doublets manually, but wondered if there was an issue with the scDblFinder code here?
Hello,
I am receiving an error when I try to use scDblFinder, which returns the following messages/error:
Clustering cells...
Identifying top genes per cluster...
Error in base::rowMeans(x, na.rm = na.rm, dims = dims, ...) :
'x' must be an array of at least two dimensions
Here is the code I used:
library(scDblFinder)
library(DropletUtils)
library(SingleCellExperiment)
tenX <- "/path/to/10x/filtered_gene_bc_matrices/"
counts <- read10xCounts(tenX)
sce <- SingleCellExperiment(list(counts=counts))
sce <- scDblFinder(sce, verbose = TRUE)
And this is what the sce object looks like:
class: SingleCellExperiment
dim: 32838 19330
metadata(0):
assays(1): counts
rownames(32838): ENSG00000243485 ENSG00000237613 ... ENSG00000198695
ENSG00000198727
rowData names(0):
colnames: NULL
colData names(0):
reducedDimNames(0):
spikeNames(0):
altExpNames(0):
Here is my session info if needed:
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
locale:
[1] C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] DropletUtils_1.6.1 SingleCellExperiment_1.8.0
[3] SummarizedExperiment_1.16.1 DelayedArray_0.12.3
[5] BiocParallel_1.20.1 matrixStats_0.57.0
[7] Biobase_2.46.0 GenomicRanges_1.38.0
[9] GenomeInfoDb_1.22.1 IRanges_2.20.2
[11] S4Vectors_0.24.4 BiocGenerics_0.32.0
[13] scDblFinder_1.1.8
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 rsvd_1.0.3 locfit_1.5-9.4
[4] lattice_0.20-40 R6_2.4.1 ggplot2_3.3.2
[7] pillar_1.4.6 zlibbioc_1.32.0 rlang_0.4.8
[10] data.table_1.13.2 irlba_2.3.3 R.oo_1.24.0
[13] R.utils_2.10.1 Matrix_1.2-18 BiocNeighbors_1.4.2
[16] statmod_1.4.34 igraph_1.2.6 RCurl_1.98-1.2
[19] munsell_0.5.0 HDF5Array_1.14.4 compiler_3.6.3
[22] vipor_0.4.5 BiocSingular_1.2.2 pkgconfig_2.0.3
[25] ggbeeswarm_0.6.0 tidyselect_1.1.0 tibble_3.0.4
[28] gridExtra_2.3 GenomeInfoDbData_1.2.2 edgeR_3.28.1
[31] randomForest_4.6-14 viridisLite_0.3.0 crayon_1.3.4
[34] dplyr_1.0.2 R.methodsS3_1.8.1 bitops_1.0-6
[37] grid_3.6.3 gtable_0.3.0 lifecycle_0.2.0
[40] magrittr_1.5 scales_1.1.1 dqrng_0.2.1
[43] XVector_0.26.0 viridis_0.5.1 limma_3.42.2
[46] scater_1.14.6 DelayedMatrixStats_1.8.0 ellipsis_0.3.1
[49] generics_0.0.2 vctrs_0.3.4 Rhdf5lib_1.8.0
[52] tools_3.6.3 glue_1.4.2 beeswarm_0.2.3
[55] purrr_0.3.4 scran_1.14.6 colorspace_1.4-1
[58] rhdf5_2.30.1
Any help is appreciated!
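One possible cause, offered as a hedged sketch rather than a confirmed diagnosis: read10xCounts() from DropletUtils already returns a SingleCellExperiment, so wrapping its result in SingleCellExperiment(list(counts=...)) would store a whole SCE object in the counts assay instead of a matrix. The example below writes a small simulated 10x directory (the matrix, gene names and barcodes are made up for illustration) and shows that the object read back is already an SCE that could be passed to scDblFinder directly.

```r
library(DropletUtils)
library(SingleCellExperiment)
library(Matrix)

# Toy count matrix standing in for real 10x data (hypothetical values).
set.seed(1)
m <- Matrix(matrix(rpois(2000, lambda = 2), nrow = 100, ncol = 20), sparse = TRUE)
rownames(m) <- paste0("GENE", seq_len(nrow(m)))
colnames(m) <- paste0("BARCODE-", seq_len(ncol(m)))

# Write it out in 10x sparse format, then read it back.
tmpdir <- tempfile()
write10xCounts(tmpdir, m)
sce <- read10xCounts(tmpdir)

# The result is already a SingleCellExperiment with a 'counts' assay,
# so no further call to SingleCellExperiment() is needed.
is(sce, "SingleCellExperiment")
dim(counts(sce))

# On a real dataset, the object would then be passed on as is:
# sce <- scDblFinder(sce, verbose = TRUE)
```

If this is indeed the problem, replacing the two lines that create counts and sce in the code above with a single sce <- read10xCounts(tenX) should be enough.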
Hello,
I was trying to run scDblFinder with the samples parameter and set.seed(), but without BPPARAM, and noticed that the results were not reproducible (the number of doublets found differed between runs).
Either removing the samples parameter or adding BPPARAM=MulticoreParam(1, RNGseed=seed) produced reproducible results.
However, I was looking for a way to run serially in RStudio (I keep having problems with BiocParallel) while still processing samples individually. After some testing I ended up using BPPARAM=SerialParam(RNGseed = seed), which seems to give the behaviour I was looking for.
I did not find any mention of SerialParam() in the documentation. Would this also be your suggested solution in my case, or is there a better alternative?
I'm grateful for any clarification.
Best wishes,
Christian
Hi,
I am trying to install scDblFinder through conda using:
conda install -c bioconda bioconductor-scdblfinder
However, this has not been successful. The error message follows:
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: /
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed
UnsatisfiableError:
I also tried to use BiocManager::install("scDblFinder"), again without success. I would appreciate any help on this matter, since scDblFinder seems to be the state-of-the-art software for doublet removal.
Thanks
Hi, thanks for the nice package!
I found that this package does not support DelayedArray,
so please consider adding support for it.
library("TENxPBMCData")
library("scRNAseq")
library("scDblFinder")
# Dense matrix
sce <- ZeiselBrainData()
is(counts(sce))
sce <- scDblFinder(sce)
# Sparse matrix
sce2 <- BaronPancreasData('human')
is(counts(sce2))
sce2 <- scDblFinder(sce2)
# DelayedArray
sce3 <- TENxPBMCData(dataset = "pbmc3k")
is(counts(sce3))
sce3 <- scDblFinder(sce3)
# Overclustering...
# clusters
# 1 2 3 4 5 6 7 8 9 10 11
# 354 345 175 162 150 193 276 293 301 312 139
# Creating ~2700 artifical doublets...
# Error in cbind(...) :
# missing 'cbind' method for DataTable class DelayedMatrix
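Until DelayedArray input is supported, a possible workaround (a sketch only, assuming the data fit in memory) is to convert the DelayedMatrix counts to an in-memory sparse matrix before calling scDblFinder. The toy matrix below stands in for counts(sce3); on the real object the equivalent step would be counts(sce3) <- as(counts(sce3), "dgCMatrix").

```r
library(DelayedArray)
library(Matrix)

# Toy counts wrapped as a DelayedMatrix (hypothetical stand-in for counts(sce3)).
set.seed(1)
m <- matrix(as.numeric(rpois(200, lambda = 1)), nrow = 20)
dm <- DelayedArray(m)

# Realize the delayed matrix in memory as a sparse matrix.
sm <- as(dm, "dgCMatrix")

# The values are unchanged; only the storage class differs.
stopifnot(is(sm, "dgCMatrix"), all(as.matrix(sm) == m))
```

This trades memory for compatibility, so it is only practical for datasets that fit in RAM once realized.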
Hi scDblFinder developers,
Thank you so much for such a wonderful package. It runs so fast!
I have a few questions regarding the usage of scDblFinder. Maybe because it is fairly new, there is no tutorial to follow.
I assumed the "sample" parameter could be used for batch information when dealing with data from multiple samples/batches. In that case, which would be better: detecting/removing doublets in each dataset individually and then merging for further analysis, or working on the merged dataset as a whole?
What is the "normal/common" doublet rate, based on experience?
Thank you.
Y