Bioconductor contributions
Package | BioC-devel | BioC-release |
---|---|---|
edgeR | ||
scater | ||
SingleCellExperiment | ||
iSEE | ||
iSEEu | ||
TSCAN | ||
TENxBrainData | ||
scDblFinder | ||
zellkonverter | ||
velociraptor | ||
snifter | ||
PCAtools | ||
DelayedMatrixStats | ||
Clone of the Bioconductor repository for the batchelor package.
Home Page: https://bioconductor.org/packages/devel/bioc/html/batchelor.html
fastMNN()
.mnnCorrect()
to use trees, strip out the distinction between input/output.
Hi,
I am trying to use multiple datasets as input to fastMNN(). I can run it successfully with the code below.
mnn.out <- fastMNN(sce1, sce2)
However, I have more than 10 datasets, so typing out all of their names would be a hassle. Secondly, the batch variable in the output is a numeric array rather than a character array.
I tried following the OSCA tutorial approach of first creating a list, processing the individual elements of the list, and then using that as input.
list.sce <-list(A=sce1,B=sce2)
mnn.out <- fastMNN(list.sce)
But this gave me an error saying that I need to provide batch information. However, in the tutorial example
mnn.pancreas <- fastMNN(normed.pancreas)
no batch information is given, and the batches still appear as characters.
When I provided batch information as a character or factor array, it gave me an error about a difference in dimensions both times. I created the array by repeating each sample ID as many times as there are cells in that sample.
Could you suggest what could be the solution?
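For what it's worth, a minimal sketch of how a named list of batches can be built programmatically and passed to fastMNN(); the object names sce1…sce10 are assumptions taken from the snippet above, and the list names become the character batch labels:

```r
library(batchelor)
# Collect the per-batch SingleCellExperiments into one named list;
# with list input, fastMNN() uses the list names as character batch labels.
batch.names <- paste0("sce", 1:10)              # hypothetical object names
list.sce <- setNames(mget(batch.names), batch.names)
mnn.out <- fastMNN(list.sce)
table(mnn.out$batch)                            # batches reported by name
```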
Hello, thank you for the amazing package! However, we ran into an issue when running reducedMNN() with my own PCA embeddings. I hit an error saying ERROR: C stack usage 7978148 is too close to the limit.
When we looked into the function, we realized that reducedMNN ultimately calls .create_tree_predefined, which in turn calls .binarize_tree and .fill_tree. Both of those functions are recursive.
Any thoughts on fixing this problem? Thanks!
I installed the "batchelor" package without any problems. But when I call the "quickCorrect" function, it shows
could not find function "quickCorrect"
It's very strange, as this is a very important function of this package. Where is it?
Thanks,
Cain
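In case it helps later readers: quickCorrect() only exists in more recent batchelor releases, so checking the installed version is a reasonable first step. A sketch, not a definitive diagnosis:

```r
# If the installed version predates quickCorrect(), updating should fix it
packageVersion("batchelor")
"quickCorrect" %in% getNamespaceExports("batchelor")
BiocManager::install("batchelor", update = TRUE)
```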
Hi Aaron,
Do you think it is appropriate to use MNN corrected data for diffusion map formation too?
Thank you for your help.
Hi MNN group,
I see in the MNN help pages that it is not recommended to use MNN-corrected gene expression matrix for quantitative analysis such as differential gene expression. But I wonder if it is reasonable to use it to calculate pathway scores in each cell (i.e. in each cell, transform the gene expression into pathway scores based on certain pre-defined gene-pathway mappings). This process does not involve any between-cell comparisons. Does this sound reasonable to you?
Thank you,
Jack
I tried to run fastMNN (installed today from Bioconductor), but I get more cells than expected in the corrected object. Any idea why?
sobj <- subset(data, features=hvg)
print('sobj')
print(sobj)
expr <- GetAssayData(object = sobj ,slot = "data")
print('expr')
print(dim(expr))
sce <- fastMNN(expr, batch = sobj@meta.data[[batch]])
print('sce')
print(dim(sce))
[1] "sobj"
An object of class Seurat
2000 features across 2730 samples within 1 assay
Active assay: originalexp (2000 features, 0 variable features)
[1] "expr"
[1] 2000 2730
[1] "sce"
[1] 2000 9815
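A hedged sanity check, using the names from the snippet above: the batch labels passed to fastMNN() must line up one-to-one with the columns of expr, so a mismatch (e.g. labels accidentally drawn from the full, unsubset object) would be worth ruling out first:

```r
# The batch vector must have one entry per column (cell) of 'expr'
batch.vec <- sobj@meta.data[[batch]]   # same source as in the call above
stopifnot(length(batch.vec) == ncol(expr))
table(batch.vec)                       # per-batch counts should sum to ncol(expr)
```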
Dear developer,
Thank you for the great tool! I have two datasets with very different cell compositions (one dataset contains only T cells, and the other has diverse cell types), and they benefited a lot from fastMNN integration.
Now I'd like to use the corrected data to understand how the cells from the 1st dataset communicate with malignant cells in the 2nd using CellPhoneDB/CellChat. However, both tools only accept normalized counts as input, and negative values are not allowed. Do you have advice on how to convert the corrected data to positive values?
Thank you very much!
Hi,
I was hoping you could give some advice on how to correct for multiple batches. I have batches from experimental procedures (samples processed in two batches) and, due to a sequencing error, the sequencing was also done in two batches. The breakdown is as follows (rows: experimental batches, columns: sequencing batches):

| | Seq. Batch1 | Seq. Batch2 |
|---|---|---|
| Exp. 1 | 19 | 23 |
| Exp. 2 | 15 | 5 |
Currently, I've read in the two sequencing runs separately, run correction on them (the correction worked beautifully!), extracted the data, and created two new separate objects based on the experimental batches. The second correction did not change things very much. I'm used to doing my scRNA-seq analysis in Seurat, so after batch correcting with batchelor I extracted the corrected counts to make a Seurat object. After the first correction, Seurat still shows an influence from the experimental batch, but when I look at the pre-correction PCA separated by experimental batch in the scater/batchelor workflow, it doesn't detect a difference.
I can provide some code and figures if it would help but don't want to bore you if the approach is completely wrong.
Your advice is greatly appreciated!
Thanks,
-Frances
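One option worth considering (a sketch, not a recommendation from the package authors): encode both factors in a single batch variable, so that fastMNN() corrects across all four experimental-by-sequencing combinations in one pass. The column names here (exp.batch, seq.batch) are assumptions about the colData:

```r
# Combine the two batch factors into one label per cell, then correct once
sce$combined.batch <- paste(sce$exp.batch, sce$seq.batch, sep = ".")
out <- fastMNN(sce, batch = sce$combined.batch)
```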
Hello,
I have an issue with the 'fastMNN' function in batchelor.
I can't access the assay of the generated object as I get an error.
out= batchelor::fastMNN(objects.sce)
out
class: SingleCellExperiment
dim: 2000 10000
metadata(2): merge.info pca.info
assays(1): reconstructed
rownames(2000): PTGDS S100B ... BCAS1 AGPAT5
rowData names(1): rotation
colnames(10000): AGAGCGAGTTAGATGA-1 ATTGGTGCAATCTACG-1 ... TTAGGCATCATGGTCA-1
TGGCCAGCAATGGACG-1
colData names(1): batch
reducedDimNames(1): corrected
mainExpName: NULL
altExpNames(0):
assay(out)
Error in (function (cond) : error evaluating argument 'x' while selecting a method for function 'type': object of type 'S4' is not subsettable
When I manually create an sce object, I don't encounter this problem.
Similarly, when I convert a Seurat object into an sce object, there are no issues accessing the assays.
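For reference, the two outputs that fastMNN() normally exposes on the printed object above are usually pulled out as follows; naming the assay explicitly may also sidestep the dispatch error:

```r
corrected <- reducedDim(out, "corrected")   # per-cell corrected coordinates
recon <- assay(out, "reconstructed")        # low-rank corrected expression values
dim(corrected)
dim(recon)
```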
Here's my SessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=French_France.utf8 LC_CTYPE=French_France.utf8
[3] LC_MONETARY=French_France.utf8 LC_NUMERIC=C
[5] LC_TIME=French_France.utf8
time zone: Europe/Paris
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] SeuratWrappers_0.3.1 genefilter_1.82.1 magick_2.8.0
[4] rgl_1.2.1 destiny_3.14.0 sinaplot_1.1.0
[7] plyr_1.8.8 RColorBrewer_1.1-3 ggplot2_3.4.3
[10] DESeq2_1.40.2 patchwork_1.1.3 UCell_2.4.0
[13] harmony_1.0.1 Rcpp_1.0.11 sscVis_0.1.0
[16] SingleCellExperiment_1.22.0 SummarizedExperiment_1.30.2 Biobase_2.60.0
[19] GenomicRanges_1.52.0 GenomeInfoDb_1.36.3 IRanges_2.34.1
[22] S4Vectors_0.38.2 BiocGenerics_0.46.0 MatrixGenerics_1.12.3
[25] matrixStats_1.0.0 dplyr_1.1.2 SeuratObject_4.1.3
[28] Seurat_4.3.0.1 markdown_1.8 knitr_1.44
loaded via a namespace (and not attached):
[1] spatstat.sparse_3.0-2 bitops_1.0-7 httr_1.4.7
[4] doParallel_1.0.17 dynamicTreeCut_1.63-1 tools_4.3.1
[7] sctransform_0.4.0 backports_1.4.1 ResidualMatrix_1.10.0
[10] utf8_1.2.3 R6_2.5.1 lazyeval_0.2.2
[13] uwot_0.1.16 GetoptLong_1.0.5 withr_2.5.0
[16] sp_2.0-0 gridExtra_2.3 progressr_0.14.0
[19] cli_3.6.1 spatstat.explore_3.2-3 sass_0.4.7
[22] mvtnorm_1.2-3 robustbase_0.99-0 spatstat.data_3.0-1
[25] proxy_0.4-27 ggridges_0.5.4 pbapply_1.7-2
[28] R.utils_2.12.2 parallelly_1.36.0 maps_3.4.1
[31] limma_3.56.2 TTR_0.24.3 RSQLite_2.3.1
[34] rstudioapi_0.15.0 impute_1.74.1 generics_0.1.3
[37] shape_1.4.6 ica_1.0-3 spatstat.random_3.1-6
[40] car_3.1-2 dendextend_1.17.1 Matrix_1.6-1.1
[43] ggbeeswarm_0.7.2 fansi_1.0.4 abind_1.4-5
[46] R.methodsS3_1.8.2 lifecycle_1.0.3 scatterplot3d_0.3-44
[49] yaml_2.3.7 carData_3.0-5 Rtsne_0.16
[52] blob_1.2.4 grid_4.3.1 promises_1.2.1
[55] crayon_1.5.2 miniUI_0.1.1.1 lattice_0.21-8
[58] beachmat_2.16.0 cowplot_1.1.1 annotate_1.78.0
[61] KEGGREST_1.40.0 pillar_1.9.0 ComplexHeatmap_2.16.0
[64] rjson_0.2.21 boot_1.3-28.1 future.apply_1.11.0
[67] codetools_0.2-19 leiden_0.4.3 glue_1.6.2
[70] remotes_2.4.2.1 pcaMethods_1.92.0 data.table_1.14.8
[73] vcd_1.4-11 vctrs_0.6.3 png_0.1-8
[76] spam_2.9-1 gtable_0.3.4 cachem_1.0.8
[79] ks_1.14.1 xfun_0.40 S4Arrays_1.0.6
[82] mime_0.12 RcppEigen_0.3.3.9.3 pracma_2.4.2
[85] survival_3.5-7 iterators_1.0.14 fields_15.2
[88] ellipsis_0.3.2 fitdistrplus_1.1-11 ROCR_1.0-11
[91] nlme_3.1-163 xts_0.13.1 bit64_4.0.5
[94] RcppAnnoy_0.0.21 bslib_0.5.1 irlba_2.3.5.1
[97] vipor_0.4.5 KernSmooth_2.23-22 DBI_1.1.3
[100] colorspace_2.1-0 nnet_7.3-19 smoother_1.1
[103] ggrastr_1.0.2 tidyselect_1.2.0 bit_4.0.5
[106] extrafontdb_1.0 curl_5.0.2 compiler_4.3.1
[109] BiocNeighbors_1.18.0 DelayedArray_0.26.7 plotly_4.10.2
[112] scales_1.2.1 hexbin_1.28.3 DEoptimR_1.1-2
[115] lmtest_0.9-40 stringr_1.5.0 digest_0.6.33
[118] goftest_1.2-3 spatstat.utils_3.0-3 rmarkdown_2.25
[121] XVector_0.40.0 RhpcBLASctl_0.23-42 base64enc_0.1-3
[124] htmltools_0.5.6 pkgconfig_2.0.3 extrafont_0.19
[127] sparseMatrixStats_1.12.2 fastmap_1.1.1 ggthemes_4.2.4
[130] rlang_1.1.1 GlobalOptions_0.1.2 htmlwidgets_1.6.2
[133] shiny_1.7.5 DelayedMatrixStats_1.22.6 jquerylib_0.1.4
[136] zoo_1.8-12 jsonlite_1.8.7 BiocParallel_1.34.2
[139] mclust_6.0.0 R.oo_1.25.0 BiocSingular_1.16.0
[142] RCurl_1.98-1.12 magrittr_2.0.3 scuttle_1.10.2
[145] GenomeInfoDbData_1.2.10 dotCall64_1.0-2 munsell_0.5.0
[148] viridis_0.6.4 reticulate_1.32.0 stringi_1.7.12
[151] zlibbioc_1.46.0 MASS_7.3-60 parallel_4.3.1
[154] listenv_0.9.0 ggrepel_0.9.3 deldir_1.0-9
[157] Biostrings_2.68.1 splines_4.3.1 tensor_1.5
[160] circlize_0.4.15 locfit_1.5-9.8 ranger_0.15.1
[163] igraph_1.5.1 ggpubr_0.6.0 spatstat.geom_3.2-5
[166] ggsignif_0.6.4 RcppHNSW_0.5.0 ScaledMatrix_1.8.1
[169] reshape2_1.4.4 XML_3.99-0.14 evaluate_0.21
[172] BiocManager_1.30.22 laeken_0.5.2 batchelor_1.16.0
[175] foreach_1.5.2 httpuv_1.6.11 Rttf2pt1_1.3.12
[178] VIM_6.2.2 RANN_2.6.1 tidyr_1.3.0
[181] purrr_1.0.2 polyclip_1.10-4 future_1.33.0
[184] clue_0.3-65 scattermore_1.2 gridBase_0.4-7
[187] rsvd_1.0.5 broom_1.0.5 xtable_1.8-4
[190] e1071_1.7-13 RSpectra_0.16-1 rstatix_0.7.2
[193] later_1.3.1 viridisLite_0.4.2 class_7.3-22
[196] tibble_3.2.1 moduleColor_1.8-4 memoise_2.0.1
[199] AnnotationDbi_1.62.2 beeswarm_0.4.0 cluster_2.1.4
[202] ggplot.multistats_1.0.0 globals_0.16.2
Thank you for your help
suppressPackageStartupMessages(library(batchelor))
d1 <- matrix(rnorm(5000), ncol=100)
d1[1:10,1:10] <- d1[1:10,1:10] + 2 # unique population in d1
d2 <- matrix(rnorm(2000), ncol=40)
d2[11:20,1:10] <- d2[11:20,1:10] + 2 # unique population in d2
d3 <- d2 + 5
mat <- cbind(d1, d2, d3)
b <- c(rep("D1", ncol(d1)), rep("D2", ncol(d2)), rep("D3", ncol(d3)))
w <- list("D1", list("D2", "D3"))
# multiBatchPCA()
# Ok
multiBatchPCA(mat, batch = b, weights = w, d = 10)
#> List of length 3
#> names(3): D1 D2 D3
# Ok
multiBatchPCA(mat, batch = factor(b), weights = w, d = 10)
#> List of length 3
#> names(3): D1 D2 D3
# Errors
multiBatchPCA(mat, batch = factor(b, levels = c("D2", "D1", "D3")), weights = w, d = 10)
#> Error in .construct_weight_vector(tab, weights): names in tree-like 'weights' do not match names in '...'
# Ok
multiBatchPCA(mat, batch = as.character(factor(b, levels = c("D2", "D1", "D3"))), weights = w, d = 10)
#> List of length 3
#> names(3): D1 D2 D3
# Source of error
# Ok
batchelor:::.construct_weight_vector(table(b), w)
#> D1 D2 D3
#> 0.50 0.25 0.25
# Ok
batchelor:::.construct_weight_vector(table(factor(b)), w)
#> D1 D2 D3
#> 0.50 0.25 0.25
# Errors
batchelor:::.construct_weight_vector(table(factor(b, levels = c("D2", "D1", "D3"))), w)
#> Error in batchelor:::.construct_weight_vector(table(factor(b, levels = c("D2", : names in tree-like 'weights' do not match names in '...'
# Ok
batchelor:::.construct_weight_vector(table(as.character(factor(b, levels = c("D2", "D1", "D3")))), w)
#> D1 D2 D3
#> 0.50 0.25 0.25
Created on 2023-07-11 with reprex v2.0.2
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.3.1 (2023-06-16)
#> os Ubuntu 22.04.2 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language en_AU:en
#> collate en_AU.UTF-8
#> ctype en_AU.UTF-8
#> tz Australia/Melbourne
#> date 2023-07-11
#> pandoc 3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> batchelor * 1.16.0 2023-04-25 [1] Bioconductor
#> beachmat 2.16.0 2023-04-25 [3] Bioconductor
#> Biobase * 2.60.0 2023-04-25 [3] Bioconductor
#> BiocGenerics * 0.46.0 2023-04-25 [3] Bioconductor
#> BiocNeighbors 1.18.0 2023-04-25 [3] Bioconductor
#> BiocParallel 1.34.2 2023-05-22 [1] Bioconductor
#> BiocSingular 1.16.0 2023-04-25 [3] Bioconductor
#> bitops 1.0-7 2021-04-24 [3] RSPM (R 4.2.0)
#> cli 3.6.1 2023-03-23 [3] RSPM (R 4.2.0)
#> codetools 0.2-19 2023-02-01 [3] RSPM (R 4.2.0)
#> crayon 1.5.2 2022-09-29 [3] RSPM (R 4.2.0)
#> DelayedArray 0.26.3 2023-05-22 [1] Bioconductor
#> DelayedMatrixStats 1.22.1 2023-06-09 [1] Bioconductor
#> digest 0.6.32 2023-06-26 [3] RSPM (R 4.2.0)
#> evaluate 0.21 2023-05-05 [3] RSPM (R 4.2.0)
#> fastmap 1.1.1 2023-02-24 [3] RSPM (R 4.2.0)
#> fs 1.6.2 2023-04-25 [3] RSPM (R 4.2.0)
#> GenomeInfoDb * 1.36.1 2023-06-21 [3] Bioconductor
#> GenomeInfoDbData 1.2.10 <NA> [3] Bioconductor
#> GenomicRanges * 1.52.0 2023-04-25 [3] Bioconductor
#> glue 1.6.2 2022-02-24 [3] RSPM (R 4.2.0)
#> htmltools 0.5.5 2023-03-23 [3] RSPM (R 4.2.0)
#> igraph 1.5.0 2023-06-16 [1] CRAN (R 4.3.0)
#> IRanges * 2.34.1 2023-06-22 [3] Bioconductor
#> irlba 2.3.5.1 2022-10-03 [3] RSPM (R 4.2.0)
#> knitr 1.43 2023-05-25 [3] RSPM (R 4.2.0)
#> lattice 0.21-8 2023-04-05 [3] RSPM (R 4.2.0)
#> lifecycle 1.0.3 2022-10-07 [3] RSPM (R 4.2.0)
#> magrittr 2.0.3 2022-03-30 [3] RSPM (R 4.2.0)
#> Matrix 1.5-4.1 2023-05-18 [3] RSPM (R 4.2.0)
#> MatrixGenerics * 1.12.2 2023-06-09 [1] Bioconductor
#> matrixStats * 1.0.0 2023-06-02 [3] RSPM (R 4.2.0)
#> pkgconfig 2.0.3 2019-09-22 [3] CRAN (R 4.0.1)
#> purrr 1.0.1 2023-01-10 [3] RSPM (R 4.2.0)
#> R.cache 0.16.0 2022-07-21 [3] RSPM (R 4.2.0)
#> R.methodsS3 1.8.2 2022-06-13 [3] RSPM (R 4.2.0)
#> R.oo 1.25.0 2022-06-12 [3] RSPM (R 4.2.0)
#> R.utils 2.12.2 2022-11-11 [3] RSPM (R 4.2.0)
#> Rcpp 1.0.10 2023-01-22 [3] RSPM (R 4.2.0)
#> RCurl 1.98-1.12 2023-03-27 [3] RSPM (R 4.2.0)
#> reprex 2.0.2 2022-08-17 [3] RSPM (R 4.2.0)
#> ResidualMatrix 1.4.0 2021-10-26 [3] Bioconductor
#> rlang 1.1.1 2023-04-28 [3] RSPM (R 4.2.0)
#> rmarkdown 2.23 2023-07-01 [3] RSPM (R 4.2.0)
#> rstudioapi 0.14 2022-08-22 [3] RSPM (R 4.2.0)
#> rsvd 1.0.5 2021-04-16 [3] RSPM (R 4.2.0)
#> S4Arrays 1.0.4 2023-05-14 [1] Bioconductor
#> S4Vectors * 0.38.1 2023-05-02 [3] Bioconductor
#> ScaledMatrix 1.8.1 2023-05-03 [1] Bioconductor
#> scuttle 1.10.1 2023-05-02 [1] Bioconductor
#> sessioninfo 1.2.2 2021-12-06 [3] RSPM (R 4.2.0)
#> SingleCellExperiment * 1.22.0 2023-04-25 [3] Bioconductor
#> sparseMatrixStats 1.12.0 2023-04-25 [3] Bioconductor
#> styler 1.10.1 2023-06-05 [1] CRAN (R 4.3.0)
#> SummarizedExperiment * 1.30.2 2023-06-06 [3] Bioconductor
#> vctrs 0.6.3 2023-06-14 [3] RSPM (R 4.2.0)
#> withr 2.5.0 2022-03-03 [3] RSPM (R 4.2.0)
#> xfun 0.39 2023-04-20 [3] RSPM (R 4.2.0)
#> XVector 0.40.0 2023-04-25 [3] Bioconductor
#> yaml 2.3.7 2023-01-23 [3] RSPM (R 4.2.0)
#> zlibbioc 1.46.0 2023-04-25 [3] Bioconductor
#>
#> [1] /home/peter/R/x86_64-pc-linux-gnu-library/4.3
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
I'm unsure if the fix is as simple as making the comparison be against sort(names(ncells)) in the following (Line 343 in 0a15bbf), because I get a bit lost tracking this one down through the internal function calls.
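A small illustration of the ordering issue the reprex demonstrates: table() on a factor orders its counts by the factor's levels, so the names stop being alphabetical once custom levels are supplied, which is presumably what trips up the name comparison against the weight tree:

```r
b <- c(rep("D1", 100), rep("D2", 40), rep("D3", 40))
names(table(b))
#> [1] "D1" "D2" "D3"
names(table(factor(b, levels = c("D2", "D1", "D3"))))
#> [1] "D2" "D1" "D3"
```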
Hi,
great package. I was wondering if this form of batch integration is also applicable to bulk RNA-seq data. Sure, the data is less sparse, but would that be an issue?
Happy about any feedback!
Best,
M
Hello there batchelor Team!
I'm trying to install batchelor. However, when I run devtools::install_github("LTLA/batchelor")
, I get the following error output:
Downloading GitHub repo LTLA/batchelor@master
Skipping 2 packages not available: BiocNeighbors, BiocSingular
Skipping 11 packages ahead of CRAN: beachmat, BiocGenerics, BiocParallel, DelayedArray, gtable, HDF5Array, IRanges, rhdf5, Rhdf5lib, rlang, S4Vectors
Installing 17 packages: Biobase, BiocSingular, DelayedMatrixStats, edgeR, GenomeInfoDb, GenomeInfoDbData, GenomicRanges, limma, locfit, rjson, scater, shinydashboard, SingleCellExperiment, SummarizedExperiment, tximport, XVector, zlibbioc
Installing packages into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
Error: (converted from warning) package ‘BiocSingular’ is not available (for R version 3.5.3)
I then tried to install BiocSingular directly. However, I get the following error:
In file included from /usr/local/lib/R/site-library/beachmat/include/beachmat/all_readers.h:4:0,
from /usr/local/lib/R/site-library/beachmat/include/beachmat/LIN_matrix.h:4,
from /usr/local/lib/R/site-library/beachmat/include/beachmat/numeric_matrix.h:4,
from compute_scale.cpp:2:
/usr/local/lib/R/site-library/beachmat/include/beachmat/beachmat.h:15:19: fatal error: H5Cpp.h: No such file or directory
#include "H5Cpp.h"
^
compilation terminated.
/usr/local/lib/R/etc/Makeconf:172: recipe for target 'compute_scale.o' failed
make: *** [compute_scale.o] Error 1
ERROR: compilation failed for package ‘BiocSingular’
* removing ‘/usr/local/lib/R/site-library/BiocSingular’
Error in i.p(...) :
(converted from warning) installation of package ‘/tmp/RtmpGJQkfa/file1977a4d39ec/BiocSingular_0.99.14.tar.gz’ had non-zero exit status
Does anyone know a way to solve this? Any help will be really appreciated.
Thanks in advance!
Davi
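The first error suggests the R version itself is the blocker: BiocSingular requires a newer Bioconductor release than R 3.5.3 supports, and the missing H5Cpp.h header usually points to HDF5 development files that a version-matched Rhdf5lib/beachmat would provide. A hedged sketch of the usual remedy, assuming R can be upgraded:

```r
# After upgrading R, install through BiocManager so that all Bioconductor
# dependencies (BiocSingular, beachmat, Rhdf5lib, ...) stay version-matched:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("batchelor")
```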
Hi batchelor team,
I have applied your fastMNN method to my six batches of 10x data. MNN is the best method for my data set, compared with CCA and scMerge. It worked quite well in terms of downstream clustering! But I am having some difficulty finding marker genes for clusters, since most of the methods prefer to use the counts/log-normalized counts matrix. Do you have any recommended method for finding marker genes based on your output matrix, which is scaled and centred at zero?
Regards,
Nelosn
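Not speaking for the authors, but a common pattern consistent with the advice against doing DE on corrected values is to cluster on the corrected space and then test markers on the original log-normalized counts, blocking on batch. A sketch with a hypothetical sce and cluster assignments:

```r
library(scran)
# 'clusters' would come from graph clustering on the corrected dimensions;
# marker testing then uses the uncorrected logcounts with batch as a block
markers <- findMarkers(logcounts(sce), groups = clusters, block = sce$batch)
```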
Hello,
Thank you for this great tool.
I'm working to analyze multiple samples from different conditions and compare to published data, in total there are 240k cells over ~40 samples. Four broad cell types (epithelial, stromal, vascular, and immune) are present in many but not all samples (e.g., samples are not balanced by cell type). I have normalized in Seurat with SCTv2 and then integrated with FastMNN via the SeuratWrappers function, and FastMNN does a great job. Other methods have led to significant misclassification of cell types.
My issue and request for help is related to subsetting. I generated cell-type-specific variable features using subsets from a few samples with 3-5k cells of that type, and then used them when rerunning FastMNN on a subsetted object as follows. Note that the DietSeurat function strips the Seurat object of all other assays/reductions, such as the original MNN. The issue is that my output object is now enormous: 140 GB for 55k cells. I tried with another cell type with only 5k cells and the object was 30 GB. The starting integrated object is about 14 GB, with all 240k cells. My guess is that this may be related to some samples having fewer than 50 cells after subsetting, but I am not sure. And note that a few samples only had 100-300 cells in the initial object creation, so perhaps a low cells/sample count isn't the issue after all.
mnn_integrated_object <- readRDS(file = "xxx")
mnn_integrated_object <- DietSeurat(mnn_integrated_object, counts = TRUE, data = TRUE, assays = c("RNA", "SCT"))
gc()
subset_mnn <- subset(mnn_integrated_object, cells = Cell_Type_1_List)
DefaultAssay(subset_mnn) <- "SCT"
subset_mnn <- RunFastMNN(object.list = SplitObject(subset_mnn, split.by = "orig.ident"), features = VariableFeatures, assay = "SCT", verbose = TRUE)
subset_mnn <- RunUMAP(subset_mnn, reduction = "mnn", dims = 1:50, min.dist = 0.3)
subset_mnn <- FindNeighbors(subset_mnn, reduction = "mnn", dims = 1:50)
subset_mnn <- FindClusters(subset_mnn, resolution = 1.0)
Any idea what is going on? Of course I could just skip rerunning FastMNN on the subset but I suspect there is interesting biology present and would prefer to redo the PCA/Integration steps.
Many thanks!
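One hedged way to narrow this down is to see which part of the returned object carries the bulk of the 140 GB; the slot names here are assumptions about a standard Seurat object:

```r
# Rough per-slot memory footprint of the oversized object
sizes <- sapply(slotNames(subset_mnn),
                function(s) object.size(slot(subset_mnn, s)))
sort(sizes, decreasing = TRUE)
# and, within reductions, whether the MNN output was densified:
sapply(subset_mnn@reductions, object.size)
```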
I want to implement the mnnCorrect() function in R (without C++), so I read the source code of smooth_gaussian_kernel.cpp. In the smooth_gaussian_kernel function:
const size_t ngenes=averaged.nrow();
const size_t nmnn=averaged.ncol();
if (nmnn!=index.size()) { throw std::runtime_error("'index' must have length equal to number of rows in 'averaged'"); }
the variable nmnn is the number of columns of "averaged", so why does the error message say index.size() must equal the number of rows of "averaged"?
When I debugged the .cpp source file with lldb, I found that I had to transpose "averaged" before calling the smooth_gaussian_kernel function, otherwise it throws the error 'index' must have length equal to number of rows in 'averaged'. But in the R code where mnnCorrect calls this C++ function:
function(data1, data2, mnn1, mnn2, tdata2, sigma) {
    vect <- data1[mnn1, , drop = FALSE] - data2[mnn2, , drop = FALSE]
    cell.vect <- .Call(cxx_smooth_gaussian_kernel, vect, mnn2 - 1L, tdata2, sigma)
    t(cell.vect)
}
it does not transpose vect (in my simulation data, the number of columns of vect is the number of genes), yet it gives the correct result without throwing an error, and I am confused about why. In short, I have two questions: should the condition compare ngenes to index.size() instead of using if (nmnn != index.size()), and why does the R wrapper work without the transpose?
Originally posted by @yuxiaokang-source in #18 (comment)
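For whoever attempts the pure-R port: a rough, hedged sketch of the Gaussian kernel smoothing step, written from the description above rather than from the exact C++ source, with rows of vect as MNN pairs and columns as genes (matching the R wrapper's orientation):

```r
# Smooth per-pair correction vectors over all cells of batch 2 using a
# Gaussian kernel on squared distances (a simplification, not the exact C++).
smooth_correction_sketch <- function(vect, mnn2, data2, sigma) {
    # vect:  nmnn x ngenes correction vectors (data1[mnn1,] - data2[mnn2,])
    # mnn2:  batch-2 cell indices of the MNN pairs (1-based)
    # data2: ncells2 x ngenes matrix for batch 2
    d2 <- as.matrix(dist(data2))^2                 # squared cell-cell distances
    w <- exp(-d2[, mnn2, drop = FALSE] / sigma)    # weight of each MNN cell
    sweep(w %*% vect, 1, rowSums(w), "/")          # weighted average per cell
}
```

Under this orientation no transpose is needed, which may be why the R wrapper works as written: vect and tdata2 are already laid out the way the kernel consumes them.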
I have tried installing batchelor with this command, but with error:
> devtools::install_github("LTLA/batchelor")
Using github PAT from envvar GITHUB_PAT
Downloading GitHub repo LTLA/batchelor@master
Skipping 1 packages not available: scuttle
✓ checking for file ‘/tmp/RtmpceMH6Z/remotes2424a91d4bf/LTLA-batchelor-115a0f4/DESCRIPTION’ ...
─ preparing ‘batchelor’:
✓ checking DESCRIPTION meta-information ...
─ cleaning src
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘batchelor_1.5.1.tar.gz’
Installing package into ‘/home/ubuntu/R/x86_64-pc-linux-gnu-library/3.6’
(as ‘lib’ is unspecified)
ERROR: dependency ‘scuttle’ is not available for package ‘batchelor’
* removing ‘/home/ubuntu/R/x86_64-pc-linux-gnu-library/3.6/batchelor’
Error: Failed to install 'batchelor' from GitHub:
(converted from warning) installation of package ‘/tmp/RtmpceMH6Z/file2424202709f/batchelor_1.5.1.tar.gz’ had non-zero exit status
To deal with the error, I tried to install scuttle with the following, but that also failed:
> devtools::install_github("LTLA/scuttle")
Using github PAT from envvar GITHUB_PAT
Downloading GitHub repo LTLA/scuttle@master
✓ checking for file ‘/tmp/RtmpceMH6Z/remotes2424384a9729/LTLA-scuttle-7120e64/DESCRIPTION’ ...
─ preparing ‘scuttle’:
✓ checking DESCRIPTION meta-information ...
─ cleaning src
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘scuttle_0.99.9.tar.gz’
Installing package into ‘/home/ubuntu/R/x86_64-pc-linux-gnu-library/3.6’
(as ‘lib’ is unspecified)
* installing *source* package ‘scuttle’ ...
** using staged installation
** libs
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -I../inst/include/ -I"/home/ubuntu/R/x86_64-pc-linux-gnu-library/3.6/Rcpp/include" -I"/home/ubuntu/R/x86_64-pc-linux-gnu-library/3.6/beachmat/include" -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c RcppExports.cpp -o RcppExports.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -I../inst/include/ -I"/home/ubuntu/R/x86_64-pc-linux-gnu-library/3.6/Rcpp/include" -I"/home/ubuntu/R/x86_64-pc-linux-gnu-library/3.6/beachmat/include" -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c combined_qc.cpp -o combined_qc.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -I../inst/include/ -I"/home/ubuntu/R/x86_64-pc-linux-gnu-library/3.6/Rcpp/include" -I"/home/ubuntu/R/x86_64-pc-linux-gnu-library/3.6/beachmat/include" -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c downsample_counts.cpp -o downsample_counts.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -I../inst/include/ -I"/home/ubuntu/R/x86_64-pc-linux-gnu-library/3.6/Rcpp/include" -I"/home/ubuntu/R/x86_64-pc-linux-gnu-library/3.6/beachmat/include" -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c fit_linear_model.cpp -o fit_linear_model.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -I../inst/include/ -I"/home/ubuntu/R/x86_64-pc-linux-gnu-library/3.6/Rcpp/include" -I"/home/ubuntu/R/x86_64-pc-linux-gnu-library/3.6/beachmat/include" -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c sum_counts.cpp -o sum_counts.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -I../inst/include/ -I"/home/ubuntu/R/x86_64-pc-linux-gnu-library/3.6/Rcpp/include" -I"/home/ubuntu/R/x86_64-pc-linux-gnu-library/3.6/beachmat/include" -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c utils.cpp -o utils.o
g++ -std=gnu++11 -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -Wl,-z,relro -o scuttle.so RcppExports.o combined_qc.o downsample_counts.o fit_linear_model.o sum_counts.o utils.o -llapack -lblas -lgfortran -lm -lquadmath -L/usr/lib/R/lib -lR
installing to /home/ubuntu/R/x86_64-pc-linux-gnu-library/3.6/00LOCK-scuttle/00new/scuttle/libs
** R
** inst
** byte-compile and prepare package for lazy loading
Error: object ‘make_zero_col_DFrame’ is not exported by 'namespace:S4Vectors'
Execution halted
ERROR: lazy loading failed for package ‘scuttle’
* removing ‘/home/ubuntu/R/x86_64-pc-linux-gnu-library/3.6/scuttle’
Error: Failed to install 'scuttle' from GitHub:
(converted from warning) installation of package ‘/tmp/RtmpceMH6Z/file242432a0e43f/scuttle_0.99.9.tar.gz’ had non-zero exit status
>
Please advise on how I can resolve this issue.
My R session is:
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] rstudioapi_0.11 magrittr_1.5 usethis_1.6.1 devtools_2.3.0 pkgload_1.1.0 R6_2.4.1
[7] rlang_0.4.6 fansi_0.4.1 tools_3.6.1 pkgbuild_1.0.8 sessioninfo_1.1.1 cli_2.0.2
[13] withr_2.2.0 ellipsis_0.3.1 remotes_2.1.1 assertthat_0.2.1 digest_0.6.25 rprojroot_1.3-2
[19] crayon_1.3.4 processx_3.4.2 BiocManager_1.30.10 callr_3.4.3 fs_1.4.1 ps_1.3.3
[25] curl_4.3 testthat_2.3.2 memoise_1.1.0 glue_1.4.1 compiler_3.6.1 desc_1.2.0
[31] backports_1.1.7 prettyunits_1.1.1
>
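For the record, make_zero_col_DFrame() is only exported by newer S4Vectors releases, so the GitHub-devel scuttle is being compiled against an older installed library. A hedged sketch of the usual way out is to let BiocManager bring everything to one consistent version rather than mixing GitHub sources with an old release:

```r
# Check for out-of-sync packages, then install matched release versions
BiocManager::valid()
BiocManager::install(c("S4Vectors", "scuttle", "batchelor"))
```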
Hi,
first of all I just wanted to say it's a pleasure to read your documentation, answers and even the code you write as they're always clear and full of opportunities for learning.
Now to the matter at hand: I have a quite large dataset of single nucleus RNA-seq from 8 individuals (8 separate 10X runs). These were prepped/sequenced in 2 different batches, but I am not interested in the inter-individual variability. In other words, I think it is safe to say that I can remove the batch effect (1 and 2) by removing the individual effect (1:8).
By reading on scran and batchelor, and looking at your workflow on the Bioconductor OSCA website, however, I am still undecided as to what is the best strategy to normalize my data and wanted to ask for advice.
I have already conducted all the necessary count-level QC (capping low/high library sizes, capping % mitochondrial content, removing empty droplets, removing non-identified genes, etc.). This was done on a merged count matrix, so there is no subsetting problem.
Now for normalization, I reckon I have the following options:
1. scran pooling and deconvolution normalization on all individuals in the merged object, ignoring the individual: quickCluster, then computeSumFactors, then logNormCounts. This appears to be what you did with the pancreas datasets in the OSCA tutorials (although I do not know whether in that case the different individuals were different 10X captures).
2. scran pooling and deconvolution as before, only done separately on each sample (i.e. subsetting each object by individual and running the normalization steps separately). It's very fast and easy to parallelize, which I don't dislike, and may make more sense in case clustering results are largely different for each individual. However, this may still introduce some bias, as size factors may have different scales. I have to say I do not have large differences in coverage across batches/individuals, so I do not expect this to be a big issue. Perhaps plotting the deconvolution size factors for each individual separately against the library size factors may shed some light on whether that's the case.
3. multiBatchNorm on the merged object, specifying the batch, then logNormCounts. This method should solve the size-factor scaling issue, but it is unclear to me whether it still uses the clustering + deconvolution approach (which I wanted to use given its success in some benchmarks) or whether it is a different method. Moreover, in the OSCA tutorials (chapter 13) the use of combineVar is suggested for HVG selection, whereas I thought it would be sufficient to model the mean-variance relationship with the blocked design.
I would then use fastMNN to remove any further batch effect that would still be present.
So, the question: what do you think is the most sensible approach? Is there something I'm missing?
Thanks for your time.
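Not an authoritative answer, but a sketch of how option 3 is typically wired together, assuming an sce with an individual column in colData; parameter choices such as n = 2000 are placeholders:

```r
library(batchelor)
library(scran)
# Rescale size factors across individuals, then log-normalize consistently
normed <- multiBatchNorm(sce, batch = sce$individual)
# Model the mean-variance trend with individual as the blocking factor
dec <- modelGeneVar(normed, block = normed$individual)
hvg <- getTopHVGs(dec, n = 2000)
# MNN correction on the shared HVG set
mnn.out <- fastMNN(normed, batch = normed$individual, subset.row = hvg)
```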
suppressPackageStartupMessages(library(batchelor))
B1 <- matrix(rnorm(10000), ncol = 40, dimnames = list(NULL, paste0("B1_", 1:40)))
B2 <- matrix(rnorm(10000), ncol = 40, dimnames = list(NULL, paste0("B2_", 1:40)))
batch <- c(rep(1, ncol(B1)), rep(2, ncol(B2)))
sce <- SingleCellExperiment(list(logcounts = cbind(B1, B2)))
assay(sce, "cosnormed") <- cosineNorm(logcounts(sce))
set.seed(666)
pcs <- multiBatchPCA(sce, batch = batch, assay.type = "cosnormed")
#> Warning in sweep(centered, 2, w, "/", check.margin = FALSE): 'check.margin' is ignored when 'x' is a DelayedArray object or
#> derivative
reducedDim(sce, "PCA") <- do.call(rbind, pcs)
set.seed(666)
named_sce <- fastMNN(sce, batch = batch) # Works
#> Warning in sweep(centered, 2, w, "/", check.margin = FALSE): 'check.margin' is ignored when 'x' is a DelayedArray object or
#> derivative
set.seed(666)
unnamed_sce <- fastMNN(unname(sce), batch = batch) # Works
#> Warning in sweep(centered, 2, w, "/", check.margin = FALSE): 'check.margin' is ignored when 'x' is a DelayedArray object or
#> derivative
all.equal(reducedDim(named_sce), reducedDim(unnamed_sce),
check.attributes = FALSE) # Sanity check
#> [1] TRUE
unnamed_pca <- fastMNN(unname(sce), batch = batch, use.dimred = "PCA") # Works
all.equal(unnamed_pca$corrected, reducedDim(unnamed_sce)) # Sanity check
#> [1] TRUE
named_pca <- fastMNN(sce, batch = batch, use.dimred = "PCA") # Errors
#> Error in dimnames(x) <- dn: length of 'dimnames' [2] not equal to array extent
Created on 2019-05-22 by the reprex package (v0.2.1)
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#> setting value
#> version R version 3.6.0 (2019-04-26)
#> os Ubuntu 18.04.2 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language en_AU:en
#> collate en_AU.UTF-8
#> ctype en_AU.UTF-8
#> tz Australia/Melbourne
#> date 2019-05-22
#>
#> ─ Packages ──────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [3] CRAN (R 3.5.3)
#> backports 1.1.4 2019-04-10 [3] CRAN (R 3.5.3)
#> batchelor * 1.0.0 2019-05-02 [1] Bioconductor
#> beeswarm 0.2.3 2016-04-25 [1] CRAN (R 3.6.0)
#> Biobase * 2.44.0 2019-05-02 [1] Bioconductor
#> BiocGenerics * 0.30.0 2019-05-02 [1] Bioconductor
#> BiocNeighbors 1.2.0 2019-05-02 [1] Bioconductor
#> BiocParallel * 1.18.0 2019-05-03 [1] Bioconductor
#> BiocSingular 1.0.0 2019-05-02 [1] Bioconductor
#> bitops 1.0-6 2013-08-17 [3] CRAN (R 3.5.0)
#> callr 3.2.0 2019-03-15 [3] CRAN (R 3.5.3)
#> cli 1.1.0 2019-03-19 [3] CRAN (R 3.5.3)
#> colorspace 1.4-1 2019-03-18 [3] CRAN (R 3.5.3)
#> crayon 1.3.4 2017-09-16 [3] CRAN (R 3.5.0)
#> DelayedArray * 0.10.0 2019-05-02 [1] Bioconductor
#> DelayedMatrixStats 1.6.0 2019-05-02 [1] Bioconductor
#> desc 1.2.0 2018-05-01 [3] CRAN (R 3.5.0)
#> devtools 2.0.2 2019-04-08 [1] CRAN (R 3.6.0)
#> digest 0.6.18 2018-10-10 [3] CRAN (R 3.5.1)
#> dplyr 0.8.0.1 2019-02-15 [3] CRAN (R 3.5.2)
#> evaluate 0.13 2019-02-12 [3] CRAN (R 3.5.2)
#> fs 1.3.1 2019-05-06 [3] CRAN (R 3.6.0)
#> GenomeInfoDb * 1.20.0 2019-05-02 [1] Bioconductor
#> GenomeInfoDbData 1.2.1 2019-05-07 [1] Bioconductor
#> GenomicRanges * 1.36.0 2019-05-02 [1] Bioconductor
#> ggbeeswarm 0.6.0 2017-08-07 [1] CRAN (R 3.6.0)
#> ggplot2 3.1.1 2019-04-07 [3] CRAN (R 3.5.3)
#> glue 1.3.1 2019-03-12 [3] CRAN (R 3.5.3)
#> gridExtra 2.3 2017-09-09 [1] CRAN (R 3.6.0)
#> gtable 0.3.0 2019-03-25 [3] CRAN (R 3.5.3)
#> highr 0.8 2019-03-20 [3] CRAN (R 3.5.3)
#> htmltools 0.3.6 2017-04-28 [3] CRAN (R 3.5.0)
#> IRanges * 2.18.0 2019-05-02 [1] Bioconductor
#> irlba 2.3.3 2019-02-05 [1] CRAN (R 3.6.0)
#> knitr 1.22 2019-03-08 [3] CRAN (R 3.5.2)
#> lattice 0.20-38 2018-11-04 [4] CRAN (R 3.5.1)
#> lazyeval 0.2.2 2019-03-15 [3] CRAN (R 3.5.3)
#> magrittr 1.5 2014-11-22 [3] CRAN (R 3.5.0)
#> Matrix 1.2-17 2019-03-22 [4] CRAN (R 3.5.3)
#> matrixStats * 0.54.0 2018-07-23 [1] CRAN (R 3.6.0)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
#> munsell 0.5.0 2018-06-12 [3] CRAN (R 3.5.0)
#> pillar 1.3.1 2018-12-15 [3] CRAN (R 3.5.2)
#> pkgbuild 1.0.3 2019-03-20 [3] CRAN (R 3.5.3)
#> pkgconfig 2.0.2 2018-08-16 [3] CRAN (R 3.5.1)
#> pkgload 1.0.2 2018-10-29 [3] CRAN (R 3.5.1)
#> plyr 1.8.4 2016-06-08 [3] CRAN (R 3.5.0)
#> prettyunits 1.0.2 2015-07-13 [3] CRAN (R 3.5.0)
#> processx 3.3.1 2019-05-08 [3] CRAN (R 3.6.0)
#> ps 1.3.0 2018-12-21 [3] CRAN (R 3.5.2)
#> purrr 0.3.2 2019-03-15 [3] CRAN (R 3.5.3)
#> R6 2.4.0 2019-02-14 [3] CRAN (R 3.5.2)
#> Rcpp 1.0.1 2019-03-17 [3] CRAN (R 3.5.3)
#> RCurl 1.95-4.12 2019-03-04 [3] CRAN (R 3.5.2)
#> remotes 2.0.4 2019-04-10 [1] CRAN (R 3.6.0)
#> rlang 0.3.4 2019-04-07 [3] CRAN (R 3.5.3)
#> rmarkdown 1.12 2019-03-14 [3] CRAN (R 3.5.3)
#> rprojroot 1.3-2 2018-01-03 [3] CRAN (R 3.5.3)
#> rsvd 1.0.0 2018-11-06 [1] CRAN (R 3.6.0)
#> S4Vectors * 0.22.0 2019-05-02 [1] Bioconductor
#> scales 1.0.0 2018-08-09 [3] CRAN (R 3.5.1)
#> scater 1.12.1 2019-05-15 [1] Bioconductor
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
#> SingleCellExperiment * 1.6.0 2019-05-02 [1] Bioconductor
#> stringi 1.4.3 2019-03-12 [3] CRAN (R 3.5.3)
#> stringr 1.4.0 2019-02-10 [3] CRAN (R 3.5.2)
#> SummarizedExperiment * 1.14.0 2019-05-02 [1] Bioconductor
#> testthat 2.1.1 2019-04-23 [1] CRAN (R 3.6.0)
#> tibble 2.1.1 2019-03-16 [3] CRAN (R 3.5.3)
#> tidyselect 0.2.5 2018-10-11 [3] CRAN (R 3.5.1)
#> usethis 1.5.0 2019-04-07 [1] CRAN (R 3.6.0)
#> vipor 0.4.5 2017-03-22 [1] CRAN (R 3.6.0)
#> viridis 0.5.1 2018-03-29 [1] CRAN (R 3.6.0)
#> viridisLite 0.3.0 2018-02-01 [3] CRAN (R 3.5.0)
#> withr 2.1.2 2018-03-15 [3] CRAN (R 3.5.0)
#> xfun 0.6 2019-04-02 [3] CRAN (R 3.5.3)
#> XVector 0.24.0 2019-05-02 [1] Bioconductor
#> yaml 2.2.0 2018-07-25 [3] CRAN (R 3.5.1)
#> zlibbioc 1.30.0 2019-05-02 [1] Bioconductor
#>
#> [1] /home/peter/R/x86_64-pc-linux-gnu-library/3.6
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
The last line should work, and all.equal(named_pca$corrected, reducedDim(named_sce))
should be TRUE
, right?
I got a bit lost in the traceback()
trying to figure this one out.
Similar to LTLA/BiocNeighbors#21, LTLA/csaw#21, and LTLA/metapod#1.
See details of the failure on the Mac arm64 semiweekly report: https://bioconductor.org/checkResults/3.18/bioc-mac-arm64-LATEST/batchelor/kjohnson1-checksrc.html
As a consequence, there's no Mac arm64 binary in BioC 3.18: https://bioconductor.org/packages/3.18/batchelor
Thanks,
H.
This should also take care of setting correct.all=TRUE
.
Hi,
Hopefully this is the right place for this.....
....I'm a big fan of fastMNN, but would like to be able to apply it directly to a set of reduced dimensions (PCs, LSI, etc) rather than a single-cell experiment object. I think this was an option in an earlier version of scran, but seems to have vanished more recently.
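For what it's worth, batchelor does expose this directly via reducedMNN(), which accepts low-dimensional coordinates rather than a SingleCellExperiment. A minimal sketch, assuming pca1 and pca2 are hypothetical cell-by-dimension matrices (one per batch):

```r
library(batchelor)

# pca1, pca2: hypothetical matrices of reduced dimensions (cells in rows),
# e.g. PCs or LSI components computed on the combined data.
out <- reducedMNN(pca1, pca2, k = 20)

corrected <- out$corrected   # merged, MNN-corrected coordinates
batch.of.origin <- out$batch # batch label for each row of 'corrected'
```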
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("batchelor")
'getOption("repos")' replaces Bioconductor standard repositories, see '?repositories' for details
replacement repositories:
CRAN: https://cran.rstudio.com/
Bioconductor version 3.14 (BiocManager 1.30.16), R 4.1.2 (2021-11-01)
Installing package(s) 'batchelor'
also installing the dependency ‘scuttle’
Packages which are only available in source form, and may need compilation of C/C++/Fortran: ‘scuttle’
‘batchelor’
Do you want to attempt to install these from sources? (Yes/no/cancel) yes
installing the source packages ‘scuttle’, ‘batchelor’
downloaded 956 KB
downloaded 1.9 MB
The downloaded source packages are in
‘/private/var/folders/mq/j68g448j3j59cct30c1wb9hh0000gp/T/Rtmpp9wICn/downloaded_packages’
Old packages: 'Matrix'
Update all/some/none? [a/s/n]:
n
Warning messages:
1: In .inet_warning(msg) :
unable to access index for repository https://bioconductor.org/packages/3.14/bioc/bin/macosx/big-sur-arm64/contrib/4.1:
cannot open URL 'https://bioconductor.org/packages/3.14/bioc/bin/macosx/big-sur-arm64/contrib/4.1/PACKAGES'
2: In .inet_warning(msg) :
unable to access index for repository https://bioconductor.org/packages/3.14/data/annotation/bin/macosx/big-sur-arm64/contrib/4.1:
cannot open URL 'https://bioconductor.org/packages/3.14/data/annotation/bin/macosx/big-sur-arm64/contrib/4.1/PACKAGES'
3: In .inet_warning(msg) :
unable to access index for repository https://bioconductor.org/packages/3.14/data/experiment/bin/macosx/big-sur-arm64/contrib/4.1:
cannot open URL 'https://bioconductor.org/packages/3.14/data/experiment/bin/macosx/big-sur-arm64/contrib/4.1/PACKAGES'
4: In .inet_warning(msg) :
unable to access index for repository https://bioconductor.org/packages/3.14/workflows/bin/macosx/big-sur-arm64/contrib/4.1:
cannot open URL 'https://bioconductor.org/packages/3.14/workflows/bin/macosx/big-sur-arm64/contrib/4.1/PACKAGES'
5: In .inet_warning(msg) :
unable to access index for repository https://bioconductor.org/packages/3.14/books/bin/macosx/big-sur-arm64/contrib/4.1:
cannot open URL 'https://bioconductor.org/packages/3.14/books/bin/macosx/big-sur-arm64/contrib/4.1/PACKAGES'
6: In .inet_warning(msg) :
installation of package ‘scuttle’ had non-zero exit status
7: In .inet_warning(msg) :
installation of package ‘batchelor’ had non-zero exit status
Dear author,
I have seen that there was a compute.variances parameter in a previous version of fastMNN in the scran package. I want to check whether any biologically meaningful "batch effect" is lost by the fastMNN function; how can I achieve this now that the current fastMNN is missing the compute.variances parameter? Thanks.
In detail.
I have three samples of tissue from the same donor and find that one specific type of cell shows an obviously larger "batch effect" than the others. These cells can't be clustered together without batch removal, while they can be merged into one cluster with fastMNN. I wonder if any biologically meaningful "batch effect" is lost during the fastMNN correction. I think maybe the compute.variances parameter could give me some tips about this.
I previously posted this on bio-conductor support, but later realized here would be better. Also as this is another question, so I made a separate post.
I am trying to use the fastMNN approach to integrate multiple datasets. I went through the vignettes for scran and batchelor and the OSCA tutorial to understand how to pre-process my data and then perform fastMNN-based correction. I also followed the advice in this post #12
But I just wanted to be sure I am getting this right: I should normalize using clustering-based size factors individually (using scran), which would then be adjusted across batches by multiBatchNorm. So it will use the pre-computed normalized data to make the adjustment? Also, for marker analysis using findMarkers and conversion to edgeR, will it be using the adjusted normalized values or the raw counts? The reason is that, as Vieth et al. suggested, scran normalization with clustering is the best way to get good DE estimates, so I want to use the clustering-based normalized data for the merged dataset to perform DE analysis.
Also, after performing fastMNN, could I check how much variance the corrected PCs explain, just like the elbow plot of a normal PCA?
Thanks! Piyush
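For context, the workflow being asked about can be sketched roughly as below; sce1 and sce2 are hypothetical per-batch SingleCellExperiments that already carry scran deconvolution size factors, and the HVG count is an arbitrary choice:

```r
library(batchelor)
library(scran)

# Rescale the precomputed deconvolution size factors so they are comparable
# across batches, then recompute log-counts per batch.
normed <- multiBatchNorm(sce1, sce2)

# Model the mean-variance trend per batch and combine the results for HVG selection.
decs <- lapply(normed, modelGeneVar)
chosen <- getTopHVGs(do.call(combineVar, decs), n = 2000)

# MNN correction on the rescaled log-counts, restricted to the chosen HVGs.
mnn.out <- do.call(fastMNN, c(normed, list(subset.row = chosen)))
```

Note that the corrected values are intended for clustering and visualization; DE analyses with edgeR should still use the raw counts, with batch as a blocking factor.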
That does the intersection, multiBatchNorm and combining of variances, prior to calling batchCorrect()
. While calling them separately is great for pedagogical value, actually having to type them all out hurts my fingers and is a pain to read.
The same function should also be in charge of merging disparate SCE objects (via the hypothetical combineCols
) and attaching the correction results onto that merged object.
Hi Aaron, thank you for developing and maintaining a great package!
I have time course data from 4 different time points, and each time point has multiple batches. There are biological differences between time points, but there are common cells too.
How would you recommend selecting HVGs, running multiBatchNorm
, and batch correcting for such a dataset?
If I batch correct within each time point first then batch correct across time points, can I use MNN corrected matrix as an input for MNN correction?
Thank you so much for your help!
Should emit a message naming the problematic batch, especially when people typo an argument name and get this completely different inconsistency error.
I have a list of SingleCellExperiment
objects from 10X and non-10X experiments, and the counts
and logcounts
assays have a mixture of DelayedMatrix
(10X) & dgCMatrix
(non-10X) data types.
In such case I noticed fastMNN
will go into some kind of never-ending loop and does not complete even after a very long time. However, fastMNN
will run perfectly and very quickly if I specifically change the logcounts
assays from DelayedMatrix
to dgCMatrix
so that all the inputs have the dgCMatrix
data type.
R version 4.0.3 (2020-10-10)
batchelor_1.6.3
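A hedged workaround for the mixed-assay hang described above (assuming the underlying data really are sparse) is to coerce all the assays to dgCMatrix up front:

```r
library(SingleCellExperiment)

# sce.list: hypothetical list of SingleCellExperiment objects with a
# mixture of DelayedMatrix and dgCMatrix logcounts assays.
sce.list <- lapply(sce.list, function(sce) {
  logcounts(sce) <- as(logcounts(sce), "dgCMatrix")
  sce
})

mnn.out <- do.call(batchelor::fastMNN, sce.list)
```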
Hi, when I use multiBatchNorm
to do normalization and adjust for sequencing depth, I ran into an error. I tried to work around it by changing the argument min.mean
, but that did not work. This problem seems not to have been reported before, as I found nothing on Google or GitHub.
When we encounter this error, does it mean multiBatchNorm
is not fit for our data and we should do something else, like the cosineNorm
function?
Just to avoid some unnecessary headaches.
Hello,
I'm trying to use multiBatchNorm with two assays (B10 and B01).
I've checked multiple times for naming and number of rows to be equal but still get this error:
universe <- intersect(rownames(B10), rownames(B01))
rescaled <- multiBatchNorm(B10[universe,], B01[universe,])
Error in `assays<-`(`*tmp*`, ..., value = `*vtmp*`) :
current and replacement dimnames() differ
Please advise.
Hi, I am new to data analysis and I am trying to do batch correction for three batches of single-cell data [counts data obtained from alignment] as part of my project, and I tried using mnnCorrect as follows:
##Execute the below in R version 3.6.2
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("batchelor")
##I have counts from alignment of three batches
batch1<-read.csv("file1.csv", skip=1,row.names = 1,header=FALSE, sep='\t')
batch2<-read.csv("file2.csv", skip=1,row.names = 1, header=FALSE,sep='\t')
batch3<-read.csv("file3.csv", skip=1,row.names = 1, header=FALSE,sep='\t')
##and I modified the code below at only two places:
##1. mnnCorrect <- function(batch1,batch2,batch3, batch=NULL, restrict=3, k=20,..
##2. original <- batches <- .unpack_batches(batch1,batch2,batch3)
##----code modified as mentioned above and executed in R-----
mnnCorrect <- function(batch1,batch2,batch3, batch=NULL, restrict=3, k=20, prop.k=NULL, sigma=0.1,
cos.norm.in=TRUE, cos.norm.out=TRUE, svd.dim=0L, var.adj=TRUE,
subset.row=NULL, correct.all=FALSE, merge.order=NULL, auto.merge=FALSE,
assay.type="logcounts", BSPARAM=ExactParam(), BNPARAM=KmknnParam(), BPPARAM=SerialParam())
{
original <- batches <- .unpack_batches(batch1,batch2,batch3)
checkBatchConsistency(batches)
restrict <- checkRestrictions(batches, restrict)
is.sce <- checkIfSCE(batches)
if (any(is.sce)) {
batches[is.sce] <- lapply(batches[is.sce], assay, i=assay.type, withDimnames=FALSE)
}
do.split <- length(batches)==1L
if (do.split) {
divided <- divideIntoBatches(batches[[1]], batch=batch, restrict=restrict[[1]])
batches <- divided$batches
restrict <- divided$restrict
}
if (.bpNotSharedOrUp(BPPARAM)) {
bpstart(BPPARAM)
on.exit(bpstop(BPPARAM), add=TRUE)
}
output <- do.call(.mnn_correct, c(batches,
list(k=k, prop.k=prop.k, sigma=sigma, cos.norm.in=cos.norm.in, cos.norm.out=cos.norm.out, svd.dim=svd.dim,
var.adj=var.adj, subset.row=subset.row, correct.all=correct.all, restrict=restrict,
merge.order=merge.order, auto.merge=auto.merge,
BSPARAM=BSPARAM, BNPARAM=BNPARAM, BPPARAM=BPPARAM)))
if (do.split) {
d.reo <- divided$reorder
output <- output[,d.reo,drop=FALSE]
metadata(output)$merge.info$pairs <- .reindex_pairings(metadata(output)$merge.info$pairs, d.reo)
}
.rename_output(output, original, subset.row=subset.row)
}
-------------------------end of execution
There was no output from the above code (though everything executed and no error was thrown).
I tried to access the 'output' object but got [Error: object 'output' not found].
Can you please help me understand where I am going wrong? or if my execution is wrong?
Please guide me to how I can obtain the batch corrected count reads. Thanks a lot for your help in advance.
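For reference, mnnCorrect() is meant to be called, not redefined: pasting the function body only defines a new function, so output only ever exists inside it. A minimal sketch of the intended call, assuming batch1/batch2/batch3 are the count data frames read above:

```r
library(batchelor)
library(scuttle)

# Restrict all batches to their common genes, in the same order.
common <- Reduce(intersect, list(rownames(batch1), rownames(batch2), rownames(batch3)))

# mnnCorrect() expects log-expression values, so log-normalize the counts first.
mats <- lapply(list(batch1, batch2, batch3),
               function(m) normalizeCounts(as.matrix(m)[common, ]))

out <- mnnCorrect(mats[[1]], mats[[2]], mats[[3]])
corrected <- assay(out, "corrected")  # batch-corrected expression matrix
```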
(First, thanks Aaron for the development and maintenance of this awesome package!)
After reading this preprint, I was wondering if there would be the possibility for such a semi-supervised correction with fastMNN()
?
For example filtering MNN pairs could be done based on the prior annotation of different batches, based on the labels inferred from a SingleR run, based on the matching clusters after a clusterMNN() run... What do you think?
Hello, developers
when using mnnCorrect, I set the parameter subset.row to the 5000 HVGs and correct.all to TRUE for better determination of MNN pairs. However, the corrected SingleCellExperiment object shows NAs in the rownames (gene names). Looking deeper into the corrected matrix, I find that all the gene names are NA except for the 5000 HVGs, while the corrected values exist.
I can't understand why the gene names are NA after I set subset.row to the 5000 HVGs and correct.all to TRUE. What should I do to fix it?
Best wishes
Below are my code and session info:
# set the sce data and logNorm
luad_a_sce <- SingleCellExperiment(assay = list("counts" = as.matrix(luad_a@assays$Spatial@counts))) %>%
logNormCounts()
colnames(luad_a_sce) <- paste0(colnames(luad_a_sce), "_1")
luad_b_sce <- SingleCellExperiment(assay = list("counts" = as.matrix(luad_b@assays$Spatial@counts))) %>%
logNormCounts()
colnames(luad_b_sce) <- paste0(colnames(luad_b_sce), "_2")
luad_c_sce <- SingleCellExperiment(assay = list("counts" = as.matrix(luad_c@assays$Spatial@counts))) %>%
logNormCounts()
colnames(luad_c_sce) <- paste0(colnames(luad_c_sce), "_3")
#find the highly-variable genes for downstream MNN
dec_luad_a_sce <- modelGeneVar(luad_a_sce)
dec_luad_b_sce <- modelGeneVar(luad_b_sce)
dec_luad_c_sce <- modelGeneVar(luad_c_sce)
combined_dec_abc <- combineVar(dec_luad_a_sce, dec_luad_b_sce, dec_luad_c_sce)
HVGs_abc_sce <- getTopHVGs(combined_dec_abc, n=5000)
luad_abc_batch_corrected_sce_MNN <- mnnCorrect(luad_a_sce, luad_b_sce, luad_c_sce,
cos.norm.out = FALSE, subset.row = HVGs_abc_sce,
correct.all = TRUE, merge.order = c(1,3,2))
R version 4.0.5 (2021-03-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /public/home/maintain/miniconda3/lib/libopenblasp-r0.3.20.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_0.5.1 stringr_1.4.1 dplyr_1.0.9 purrr_0.3.4
[5] readr_2.1.2 tidyr_1.2.0 tibble_3.1.7 tidyverse_1.3.2
[9] sp_1.5-0 SeuratObject_4.1.0 Seurat_4.1.1 scuttle_1.0.4
[13] scater_1.18.6 ggplot2_3.3.6 scran_1.18.7 batchelor_1.6.3
[17] SingleCellExperiment_1.12.0 SummarizedExperiment_1.20.0 Biobase_2.50.0 GenomicRanges_1.42.0
[21] GenomeInfoDb_1.26.7 IRanges_2.24.1 S4Vectors_0.28.1 BiocGenerics_0.43.4
[25] MatrixGenerics_1.2.1 matrixStats_0.62.0
loaded via a namespace (and not attached):
[1] utf8_1.2.2 reticulate_1.26 tidyselect_1.1.2 htmlwidgets_1.5.4
[5] grid_4.0.5 BiocParallel_1.24.1 Rtsne_0.16 munsell_0.5.0
[9] codetools_0.2-18 ica_1.0-2 statmod_1.4.36 future_1.28.0
[13] miniUI_0.1.1.1 withr_2.5.0 spatstat.random_2.2-0 colorspace_2.0-3
[17] progressr_0.10.1 rstudioapi_0.14 ROCR_1.0-11 tensor_1.5
[21] listenv_0.8.0 GenomeInfoDbData_1.2.4 polyclip_1.10-0 parallelly_1.32.1
[25] vctrs_0.4.1 generics_0.1.3 R6_2.5.1 ggbeeswarm_0.6.0
[29] rsvd_1.0.5 locfit_1.5-9.4 bitops_1.0-7 spatstat.utils_2.3-1
[33] DelayedArray_0.16.3 assertthat_0.2.1 promises_1.2.0.1 scales_1.2.1
[37] googlesheets4_1.0.1 rgeos_0.5-8 beeswarm_0.4.0 gtable_0.3.1
[41] beachmat_2.6.4 globals_0.16.1 goftest_1.2-3 rlang_1.0.2
[45] splines_4.0.5 lazyeval_0.2.2 gargle_1.2.1 spatstat.geom_2.4-0
[49] broom_0.8.0 reshape2_1.4.4 abind_1.4-5 modelr_0.1.9
[53] backports_1.4.1 httpuv_1.6.5 tools_4.0.5 ellipsis_0.3.2
[57] spatstat.core_2.4-4 RColorBrewer_1.1-3 ggridges_0.5.3 Rcpp_1.0.8.3
[61] plyr_1.8.7 sparseMatrixStats_1.2.1 zlibbioc_1.36.0 RCurl_1.98-1.6
[65] rpart_4.1.16 deldir_1.0-6 pbapply_1.5-0 viridis_0.6.2
[69] cowplot_1.1.1 zoo_1.8-10 haven_2.5.0 ggrepel_0.9.1
[73] cluster_2.1.3 fs_1.5.2 magrittr_2.0.3 data.table_1.14.2
[77] scattermore_0.8 ResidualMatrix_1.0.0 lmtest_0.9-40 reprex_2.0.2
[81] RANN_2.6.1 googledrive_2.0.0 fitdistrplus_1.1-8 hms_1.1.1
[85] patchwork_1.1.1 mime_0.12 xtable_1.8-4 readxl_1.4.0
[89] gridExtra_2.3 compiler_4.0.5 KernSmooth_2.23-20 crayon_1.5.2
[93] htmltools_0.5.2 mgcv_1.8-40 later_1.3.0 tzdb_0.3.0
[97] lubridate_1.8.0 DBI_1.1.3 dbplyr_2.2.1 MASS_7.3-57
[101] Matrix_1.4-1 cli_3.3.0 parallel_4.0.5 igraph_1.3.1
[105] pkgconfig_2.0.3 plotly_4.10.0 spatstat.sparse_2.1-1 xml2_1.3.3
[109] vipor_0.4.5 dqrng_0.3.0 XVector_0.30.0 rvest_1.0.3
[113] digest_0.6.29 sctransform_0.3.3 RcppAnnoy_0.0.19 spatstat.data_2.2-0
[117] cellranger_1.1.0 leiden_0.4.2 uwot_0.1.11 edgeR_3.32.1
[121] DelayedMatrixStats_1.12.3 shiny_1.7.3 lifecycle_1.0.1 nlme_3.1-157
[125] jsonlite_1.8.0 BiocNeighbors_1.8.2 viridisLite_0.4.1 limma_3.46.0
[129] fansi_1.0.3 pillar_1.8.1 lattice_0.20-45 fastmap_1.1.0
[133] httr_1.4.4 survival_3.3-1 glue_1.6.2 png_0.1-7
[137] bluster_1.0.0 stringi_1.7.6 BiocSingular_1.6.0 irlba_2.3.5
[141] future.apply_1.9.0
I am testing fastMNN() on our single-cell dataset to see how well it removes batch effects. I obtained matrices for the groups I wanted to test by extracting the specific cells by batch from a Seurat object, making sure they were in the right sparse matrix format. I then ran it overnight, set to use multiple cores.
However, it did not appear to work: it was still processing and did not appear to be using the multiple cores as I specified. I used the same method of selecting multiple cores as in a previous run-through of scran, so I don't think that was the problem. I have also run Seurat CCA overnight on a less powerful computer and got results, so I don't think processing power is the issue.
What could have gone wrong that it wasn't able to complete processing? Or is it actually that intensive of a function that it would require more than a day to finish running?
Hello,
I am using RunFastMNN which can be used directly with 'Seurat object'.
There were no problem with it but all of the sudden, I met this error below.
Would there be any suggestion to solve this??
Thank you!
im <- RunFastMNN(object.list = SplitObject(a, split.by = "batch"))
Computing 2000 integration features
Error in SummarizedExperiment::SummarizedExperiment(assays = assays) :
the rownames and colnames of the supplied assay(s) must be NULL or
identical to those of the SummarizedExperiment object (or derivative)
to construct
Dear all,
I am asking about the functions fastMNN and mnnCorrect,
I pass the two gene expression matrices (genes in rows and cells in columns) to functions like mnnCorrect or fastMNN, but I always get
the same error : Error in checkBatchConsistency(batches) :
row names are not the same across batches
although I kept only the common genes in the two matrices.
Also, from the output object, how can I tell which cells from the two matrices match each other?
Thanks
Ahmed
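This error usually means the genes are present in both matrices but not in the same order; checkBatchConsistency() requires identical row names, position by position. A hedged sketch of aligning two hypothetical matrices mat1 and mat2:

```r
# Subset AND reorder both matrices to the same common genes.
common <- intersect(rownames(mat1), rownames(mat2))
mat1 <- mat1[common, , drop = FALSE]
mat2 <- mat2[common, , drop = FALSE]
stopifnot(identical(rownames(mat1), rownames(mat2)))

out <- batchelor::fastMNN(mat1, mat2)
```

As for the matching cells: fastMNN() records the MNN pairs used at each merge step in metadata(out)$merge.info$pairs.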
Hello,
I am running fastMNN on a data set that has two batch variables, 'dataset' and 'condition'.
I tried to correct for both at once by specifying batch
fastMNN(combined,batch=c(combined$dataset, combined$condition), subset.row=chosen.hvgs)
but this does not work.
Is there a way to correct for multiple batch variables?
Thank you :)
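fastMNN() takes a single batch factor with one entry per cell, so c(combined$dataset, combined$condition) produces a vector of twice the expected length. One option is to collapse the two factors into a single per-cell label:

```r
# Hypothetical: combine two per-cell factors into one batch label.
combined$batch <- paste(combined$dataset, combined$condition, sep = ".")

mnn.out <- batchelor::fastMNN(combined, batch = combined$batch,
                              subset.row = chosen.hvgs)
```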
For those people who work with cbind
'd batches in a single object.
Hi,
I'm trying to perform batch correction using fastMNN with a hierarchical merge order which integrates within each developmental stage before merging across stages. I've run the following below but I'm running into a subscript out of bounds error.
sce <- loadWagner2018()
gene_var <- modelGeneVar(sce)
hvgs <- getTopHVGs(gene_var,n=5000)
meta <- colData(sce)
order_df = meta[!duplicated(sce$library_id), c("stage", "library_id")]
order_df$ncells = sapply(order_df$library_id, function(x) sum(meta$library_id == x))
order_df$stage = factor(order_df$stage,
levels = rev(c("24hpf",
"18hpf",
"14hpf",
"10hpf",
"8hpf",
"6hpf",
"4hpf")))
order_df$library_id <- as.factor(order_df$library_id)
order_df = order_df[order(order_df$stage, order_df$ncells, decreasing = TRUE),]
merge_order <- lapply(split(order_df,order_df$stage), function(x) list(x$library_id))
names(merge_order)<- NULL
order_df$stage = as.character(order_df$stage)
out <- fastMNN(sce,batch=sce$library_id,subset.row = hvgs,
merge.order = rev(merge_order))
Error message:
Error: subscript contains out-of-bounds indices
After a bit of digging, it looks like the error is thrown while trying to reorder the MNN output at line 387.
# Reordering by the input order.
d.reo <- divided$reorder
output <- output[d.reo,,drop=FALSE]
I was wondering if you could suggest what I might be doing wrong...?
The dataset has 63,530 cells and so d.reo
is a permutation of these; however, the dimensions of output
seem to be (29106, 2)
which I was a bit confused by.
The merge.order list specified contains a nested list of the library IDs:
[[1]]
[[1]][[1]]
[1] DEW106 DEW108 DEW105 DEW103 DEW102 DEW107 DEW109 DEW164 DEW101 DEW166 DEW104 DEW163 DEW110
[14] DEW165 DEW055 DEW162 DEW057 DEW056 DEW053 DEW054 DEW052 DEW168 DEW169 DEW021 DEW159 DEW158
[27] DEW167 DEW160 DEW161
54 Levels: DEW001 DEW003 DEW010 DEW011 DEW012 DEW021 DEW032 DEW033 DEW034 DEW035 ... DEW169
[[2]]
[[2]][[1]]
[1] DEW003 DEW038 DEW039 DEW041 DEW040 DEW012 DEW001
54 Levels: DEW001 DEW003 DEW010 DEW011 DEW012 DEW021 DEW032 DEW033 DEW034 DEW035 ... DEW169
[[3]]
[[3]][[1]]
[1] DEW035 DEW036 DEW037 DEW011
54 Levels: DEW001 DEW003 DEW010 DEW011 DEW012 DEW021 DEW032 DEW033 DEW034 DEW035 ... DEW169
[[4]]
[[4]][[1]]
[1] DEW033 DEW034 DEW010 DEW032
54 Levels: DEW001 DEW003 DEW010 DEW011 DEW012 DEW021 DEW032 DEW033 DEW034 DEW035 ... DEW169
[[5]]
[[5]][[1]]
[1] DEW048 DEW047 DEW049 DEW046
54 Levels: DEW001 DEW003 DEW010 DEW011 DEW012 DEW021 DEW032 DEW033 DEW034 DEW035 ... DEW169
[[6]]
[[6]][[1]]
[1] DEW043 DEW044 DEW045 DEW042
54 Levels: DEW001 DEW003 DEW010 DEW011 DEW012 DEW021 DEW032 DEW033 DEW034 DEW035 ... DEW169
[[7]]
[[7]][[1]]
[1] DEW050 DEW051
54 Levels: DEW001 DEW003 DEW010 DEW011 DEW012 DEW021 DEW032 DEW033 DEW034 DEW035 ... DEW169
Any hints would be appreciated!
Best,
Dan
Session info:
R version 4.0.4 (2021-02-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] batchelor_1.6.2 SingleCellExperiment_1.12.0 SummarizedExperiment_1.20.0
[4] Biobase_2.50.0 GenomicRanges_1.42.0 GenomeInfoDb_1.26.4
[7] BiocNeighbors_1.8.2 BiocParallel_1.24.1 DelayedArray_0.16.2
[10] IRanges_2.24.1 S4Vectors_0.28.1 MatrixGenerics_1.2.1
[13] matrixStats_0.58.0 BiocGenerics_0.36.0 Matrix_1.3-2
[16] BiocSingular_1.6.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.6 BiocManager_1.30.10 compiler_4.0.4
[4] bluster_1.0.0 XVector_0.30.0 bitops_1.0-6
[7] tools_4.0.4 DelayedMatrixStats_1.12.3 zlibbioc_1.36.0
[10] digest_0.6.27 statmod_1.4.35 evaluate_0.14
[13] lattice_0.20-41 rlang_0.4.10 pkgconfig_2.0.3
[16] igraph_1.2.6 yaml_2.2.1 xfun_0.21
[19] GenomeInfoDbData_1.2.4 knitr_1.31 locfit_1.5-9.4
[22] grid_4.0.4 scuttle_1.0.4 rmarkdown_2.7
[25] limma_3.46.0 irlba_2.3.3 magrittr_2.0.1
[28] edgeR_3.32.1 htmltools_0.5.1.1 sparseMatrixStats_1.2.1
[31] beachmat_2.6.4 rsvd_1.0.3 dqrng_0.2.1
[34] ResidualMatrix_1.0.0 RCurl_1.98-1.2 scran_1.18.5
Dear Developer,
I want to ask about an error that occurred when I ran fastMNN:
out <- fastMNN(sce, batch = sce$Batch,
auto.merge = TRUE,
subset.row = rowData(sce)$use_channel,
assay.type = "exprs")
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth = TRUE, :
You're computing too large a percentage of total singular values, use a standard svd instead.
Warning message:
In check_numbers(k = k, nu = nu, nv = nv, limit = min(dim(x)) - :
more singular values/vectors requested than available
Can you help me figure that out?
Thanks in advance!
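These are warnings from the underlying truncated SVD (irlba) rather than errors; they typically appear when the requested number of dimensions d (default 50) is close to the number of features used, as can happen with small feature panels. A hedged workaround is to lower d or switch to an exact SVD:

```r
library(BiocSingular)

out <- fastMNN(sce, batch = sce$Batch,
               auto.merge = TRUE,
               subset.row = rowData(sce)$use_channel,
               assay.type = "exprs",
               d = 20,                  # fewer dimensions than features
               BSPARAM = ExactParam())  # exact SVD avoids the irlba warnings
```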
Hello,
I have integrated multiple datasets successfully using fastMNN. I now need to subset the data to focus the analysis on specific clusters. My questions are as follows:
In other discussion threads about integration methods such as Seurat's CCA, it is not recommended to rerun the integration after subsetting an integrated dataset. However, Seurat conducts its correction in gene expression space, whereas fastMNN works in PCA space, so it is unclear to me what the best approach should be.
Any advice would be greatly appreciated!
I get the following error. I appreciate your help:
Hi,
There is a problem using correctExperiments
with subset.row=
flag to subset assays to the defined genes. I am using the function on a list of SingleCellExperiments.
batchelor/R/correctExperiments.R
Lines 100 to 102 in ea09e98
Error in eval(subscript, envir = eframe, enclos = eframe): '...' used in an incorrect context
Traceback:
1. lapply(raw.ass, "[", i = subset.row, , drop = FALSE)
2. lapply(raw.ass, "[", i = subset.row, , drop = FALSE)
3. FUN(X[[i]], ...)
4. FUN(X[[i]], ...)
5. extract_Nindex_from_syscall(sys.call(), parent.frame())
6. lapply(seq_len(length(call) - 2L), function(i) {
. subscript <- call[[2L + i]]
. if (missing(subscript))
. return(NULL)
. subscript <- eval(subscript, envir = eframe, enclos = eframe)
. if (is.null(subscript))
. return(integer(0))
. subscript
. })
7. lapply(seq_len(length(call) - 2L), function(i) {
. subscript <- call[[2L + i]]
. if (missing(subscript))
. return(NULL)
. subscript <- eval(subscript, envir = eframe, enclos = eframe)
. if (is.null(subscript))
. return(integer(0))
. subscript
. })
8. FUN(X[[i]], ...)
9. eval(subscript, envir = eframe, enclos = eframe)
10. eval(subscript, envir = eframe, enclos = eframe)
11. eval(subscript, envir = eframe, enclos = eframe)
I propose changing it as below, which should work.
if (!is.null(subset.row) && !correct.all) {
raw.ass <- lapply(raw.ass, function(x) x[subset.row,,drop=FALSE])
}
R version 4.1.0 (2021-05-18)
batchelor_1.8.0
Hi Aaron,
I would like to use MNN for batch correction and have a few questions regarding the data pre-processing for MNN.
At the moment my steps are 1. Filter out empty droplets/low quality cells (for each sample), 2. merge samples in the same batch, 3. normalization within a batch, 4. get HVG, normalize across batches and run MNN.
(My experimental design in detail is described here (#21).)
In step 2, which would you recommend: normalising within a batch, or normalising within a sample?
At which point should I filter out lowly expressed genes (calculateAverage(sce) > 0.1)? I am setting min.mean=0.1 for computeSumFactors and using top 5000 highly variable genes for multiBatchNorm/MNN. Do I still have to worry about filtering out genes before within-batch normalization or batch correction?
Thank you very much for your help.
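For reference, the filtering step mentioned above can be sketched per batch as follows (sce being one batch's SingleCellExperiment; the 0.1 threshold is the questioner's own choice):

```r
library(scuttle)

# Drop lowly expressed genes before within-batch normalization.
keep <- calculateAverage(sce) > 0.1
sce <- sce[keep, ]
```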