The escape from ncborcherding

Differentially expressed genes in enrichment pathways

Dear developer,

This software is so powerful and useful to annotate cell functions.
However, I would like to understand whether this software can further identify differentially expressed genes in certain enriched pathways.
Thank you so much!

Sincerely

YK

Available categories in GetGeneSets

Hello, I was wondering how to get gene sets other than H, C1, C2, ....

I'm interested in the REACTOME subset of CP. What should the input be?

Thank you.

Error in running enrichIt() according to the vignette

http://www.bioconductor.org/packages/release/bioc/vignettes/escape/inst/doc/vignette.html
https://ncborcherding.github.io/vignettes/escape_vignette.html

I am trying to run the above vignettes. Errors are reported when running the function "enrichIt()".
Here is the report:
ES <- enrichIt(obj = seurat_ex, gene.sets = GS, groups = 1000, cores = 4)
[1] "Using sets of 1000 cells. Running 2 times."
Error in .mapGeneSetsToFeatures(mapped.gset.idx.list, rownames(expr)) :
No identifiers in the gene sets could be matched to the identifiers in the expression data.

It seems to be a problem with GSVA.

Here is sessionInfo:
R version 4.0.3 (2020-10-10)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] dittoSeq_1.2.3 ggplot2_3.3.3 SeuratObject_4.0.0 Seurat_4.0.0
[5] escape_1.1.1

loaded via a namespace (and not attached):
[1] circlize_0.4.12 plyr_1.8.6
[3] igraph_1.2.6 lazyeval_0.2.2
[5] GSEABase_1.52.1 splines_4.0.3
[7] BiocParallel_1.24.1 listenv_0.8.0
[9] scattermore_0.7 GenomeInfoDb_1.26.2
[11] digest_0.6.27 htmltools_0.5.1.1
[13] magrittr_2.0.1 memoise_2.0.0
[15] tensor_1.5 cluster_2.1.0
[17] ROCR_1.0-11 limma_3.46.0
[19] ComplexHeatmap_2.6.2 globals_0.14.0
[21] annotate_1.68.0 matrixStats_0.58.0
[23] colorspace_2.0-0 blob_1.2.1
[25] ggrepel_0.9.1 xfun_0.20
[27] dplyr_1.0.4 crayon_1.4.0
[29] RCurl_1.98-1.2 jsonlite_1.7.2
[31] graph_1.68.0 spatstat_1.64-1
[33] spatstat.data_1.7-0 survival_3.2-7
[35] zoo_1.8-8 glue_1.4.2
[37] polyclip_1.10-0 gtable_0.3.0
[39] zlibbioc_1.36.0 XVector_0.30.0
[41] leiden_0.3.7 GetoptLong_1.0.5
[43] DelayedArray_0.16.1 future.apply_1.7.0
[45] shape_1.4.5 SingleCellExperiment_1.12.0
[47] BiocGenerics_0.36.0 abind_1.4-5
[49] scales_1.1.1 msigdbr_7.2.1
[51] pheatmap_1.0.12 edgeR_3.32.1
[53] DBI_1.1.1 miniUI_0.1.1.1
[55] Rcpp_1.0.6 viridisLite_0.3.0
[57] xtable_1.8-4 clue_0.3-58
[59] reticulate_1.18 bit_4.0.4
[61] stats4_4.0.3 GSVA_1.38.1
[63] htmlwidgets_1.5.3 httr_1.4.2
[65] RColorBrewer_1.1-2 ellipsis_0.3.1
[67] ica_1.0-2 farver_2.0.3
[69] pkgconfig_2.0.3 XML_3.99-0.5
[71] uwot_0.1.10 deldir_0.2-9
[73] locfit_1.5-9.4 labeling_0.4.2
[75] tidyselect_1.1.0 rlang_0.4.10
[77] reshape2_1.4.4 later_1.1.0.1
[79] AnnotationDbi_1.52.0 munsell_0.5.0
[81] tools_4.0.3 cachem_1.0.3
[83] generics_0.1.0 RSQLite_2.2.3
[85] ggridges_0.5.3 stringr_1.4.0
[87] fastmap_1.1.0 goftest_1.2-2
[89] bit64_4.0.5 fitdistrplus_1.1-3
[91] purrr_0.3.4 RANN_2.6.1
[93] pbapply_1.4-3 future_1.21.0
[95] nlme_3.1-152 mime_0.9
[97] compiler_4.0.3 plotly_4.9.3
[99] png_0.1-7 spatstat.utils_2.0-0
[101] tibble_3.0.6 stringi_1.5.3
[103] lattice_0.20-41 Matrix_1.3-2
[105] vctrs_0.3.6 pillar_1.4.7
[107] lifecycle_0.2.0 lmtest_0.9-38
[109] GlobalOptions_0.1.2 RcppAnnoy_0.0.18
[111] data.table_1.13.6 cowplot_1.1.1
[113] bitops_1.0-6 irlba_2.3.3
[115] httpuv_1.5.5 patchwork_1.1.1
[117] GenomicRanges_1.42.0 R6_2.5.0
[119] promises_1.1.1 KernSmooth_2.23-18
[121] gridExtra_2.3 IRanges_2.24.1
[123] parallelly_1.23.0 codetools_0.2-18
[125] MASS_7.3-53 assertthat_0.2.1
[127] SummarizedExperiment_1.20.0 rjson_0.2.20
[129] withr_2.4.1 sctransform_0.3.2
[131] S4Vectors_0.28.1 GenomeInfoDbData_1.2.4
[133] mgcv_1.8-33 parallel_4.0.3
[135] grid_4.0.3 rpart_4.1-15
[137] tidyr_1.1.2 MatrixGenerics_1.2.1
[139] Cairo_1.5-12.2 Rtsne_0.15
[141] Biobase_2.50.0 shiny_1.6.0
[143] tinytex_0.29
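
One quick check for the "No identifiers in the gene sets could be matched" error is to measure how many gene set identifiers actually appear among the row names of the expression matrix. The sketch below is base R only and uses made-up data; `expr`, `gene_sets` and `overlap_report` are hypothetical names, not part of escape or GSVA.

```r
# Toy expression matrix: rows are genes, columns are cells (hypothetical data).
expr <- matrix(rpois(12, lambda = 5), nrow = 4,
               dimnames = list(c("CD3E", "CD8A", "MS4A1", "LYZ"),
                               paste0("cell", 1:3)))

# Toy gene sets: one uses gene symbols, the other Ensembl-style identifiers.
gene_sets <- list(
  T_CELL  = c("CD3E", "CD8A", "CD4"),
  ENSEMBL = c("ENSG00000198851", "ENSG00000153563")
)

# Fraction of each set's identifiers that are found in rownames(expr).
overlap_report <- function(sets, features) {
  vapply(sets, function(ids) mean(ids %in% features), numeric(1))
}

frac <- overlap_report(gene_sets, rownames(expr))
print(frac)
```

A fraction of 0 for every set would point to an identifier mismatch (for example symbols versus Ensembl IDs, or human versus mouse capitalization) rather than to a fault in GSVA itself.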

Question regarding enrichIt

I have a question regarding the output of enrichIt(). This type of analysis is new to me, so maybe the question is quite simple:
I tried to run enrichIt on a seurat object (13 clusters) with two genesets and then afterwards with only one of the gene sets. When I look at the enrichment scores for the geneset I tried twice, they are not the same (although the overall picture is the same), and I am not sure why?
Does some sort of normalization over the genesets take place when running enrichIt() and can it explain the difference I see?
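
One possible explanation, offered as an assumption rather than a statement about escape's internals: GSVA's ssGSEA method can normalize scores by the range of values across all gene sets and cells (its `ssgsea.norm` option), so the divisor depends on which sets are scored together. The numbers below are made up and only illustrate the arithmetic; `range_normalize` is a hypothetical helper.

```r
# Hypothetical raw (unnormalized) scores: rows are gene sets, columns are cells.
raw <- rbind(A = c(100, 300, 500),
             B = c(-400, 900, 200))

# Divide by the global range (max - min) of whatever matrix is passed in.
range_normalize <- function(es) es / (max(es) - min(es))

only_A   <- range_normalize(raw["A", , drop = FALSE])  # range = 500 - 100 = 400
together <- range_normalize(raw)                       # range = 900 - (-400) = 1300

print(only_A["A", ])
print(together["A", ])
```

In this toy case the ordering of cells for set A is unchanged and only the scale differs, which would fit the observation that the overall picture stays the same.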

Error when running enrichIt

I was trying to run escape on a list of genes (Formal class GeneSet) I supplied to an expression matrix. I received the following error:

[1] "Using sets of 1000 cells. Running 1 times."
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘gsva’ for signature ‘"matrix", "character"’

I would be very grateful for any advice you can provide. Installed packages and versions below:

                             Package    Version

abind abind 1.4-5
annotate annotate 1.68.0
AnnotationDbi AnnotationDbi 1.52.0
askpass askpass 1.1
AUCell AUCell 1.12.0
base64enc base64enc 0.1-3
BH BH 1.75.0-0
Biobase Biobase 2.50.0
BiocGenerics BiocGenerics 0.36.1
BiocManager BiocManager 1.30.12
BiocParallel BiocParallel 1.24.1
BiocVersion BiocVersion 3.12.0
bit bit 4.0.4
bit64 bit64 4.0.5
bitops bitops 1.0-6
blob blob 1.2.1
brew brew 1.0-6
brio brio 1.1.1
bslib bslib 0.2.4
cachem cachem 1.0.4
callr callr 3.6.0
caTools caTools 1.18.2
cli cli 2.4.0
clipr clipr 0.7.1
colorspace colorspace 2.0-0
commonmark commonmark 1.7
cowplot cowplot 1.1.1
cpp11 cpp11 0.2.7
crayon crayon 1.4.1
credentials credentials 1.3.0
crosstalk crosstalk 1.1.1
curl curl 4.3
data.table data.table 1.14.0
DBI DBI 1.1.1
DelayedArray DelayedArray 0.16.3
deldir deldir 0.2-10
desc desc 1.3.0
devtools devtools 2.4.0
diffobj diffobj 0.3.4
digest digest 0.6.27
dplyr dplyr 1.0.5
dqrng dqrng 0.2.1
ellipsis ellipsis 0.3.1
escape escape 1.1.1
evaluate evaluate 0.14
fansi fansi 0.4.2
farver farver 2.1.0
fastmap fastmap 1.1.0
fitdistrplus fitdistrplus 1.1-3
FNN FNN 1.1.3
formatR formatR 1.9
fs fs 1.5.0
futile.logger futile.logger 1.4.3
futile.options futile.options 1.0.1
future future 1.21.0
future.apply future.apply 1.7.0
generics generics 0.1.0
GenomeInfoDb GenomeInfoDb 1.26.7
GenomeInfoDbData GenomeInfoDbData 1.2.4
GenomicRanges GenomicRanges 1.42.0
gert gert 1.3.0
ggplot2 ggplot2 3.3.3
ggrepel ggrepel 0.9.1
ggridges ggridges 0.5.3
gh gh 1.2.1
gitcreds gitcreds 0.1.1
globals globals 0.14.0
glue glue 1.4.2
goftest goftest 1.2-2
gplots gplots 3.1.1
graph graph 1.68.0
gridExtra gridExtra 2.3
GSEABase GSEABase 1.52.1
GSVA GSVA 1.38.2
gtable gtable 0.3.0
gtools gtools 3.8.2
highr highr 0.9
htmltools htmltools 0.5.1.1
htmlwidgets htmlwidgets 1.5.3
httpuv httpuv 1.5.5
httr httr 1.4.2
ica ica 1.0-2
igraph igraph 1.2.6
ini ini 0.3.1
IRanges IRanges 2.24.1
irlba irlba 2.3.3
isoband isoband 0.2.4
jquerylib jquerylib 0.1.3
jsonlite jsonlite 1.7.2
kernlab kernlab 0.9-29
knitr knitr 1.32
labeling labeling 0.4.2
lambda.r lambda.r 1.2.4
later later 1.1.0.1
lazyeval lazyeval 0.2.2
leiden leiden 0.3.7
lifecycle lifecycle 1.0.0
limma limma 3.46.0
listenv listenv 0.8.0
lmtest lmtest 0.9-38
magrittr magrittr 2.0.1
markdown markdown 1.1
MatrixGenerics MatrixGenerics 1.2.1
matrixStats matrixStats 0.58.0
memoise memoise 2.0.0
mime mime 0.10
miniUI miniUI 0.1.1.1
mixtools mixtools 1.2.0
msigdbr msigdbr 7.2.1
munsell munsell 0.5.0
openssl openssl 1.4.3
parallelly parallelly 1.24.0
patchwork patchwork 1.1.1
pbapply pbapply 1.4-3
pillar pillar 1.6.0
pkgbuild pkgbuild 1.2.0
pkgconfig pkgconfig 2.0.3
pkgload pkgload 1.2.1
plogr plogr 0.2.0
plotly plotly 4.9.3
plyr plyr 1.8.6
png png 0.1-7
polyclip polyclip 1.10-0
praise praise 1.0.0
prettyunits prettyunits 1.1.1
processx processx 3.5.1
promises promises 1.2.0.1
ps ps 1.6.0
purrr purrr 0.3.4
R.methodsS3 R.methodsS3 1.8.1
R.oo R.oo 1.24.0
R.utils R.utils 2.10.1
R6 R6 2.5.0
RANN RANN 2.6.1
rappdirs rappdirs 0.3.3
rcmdcheck rcmdcheck 1.3.3
RColorBrewer RColorBrewer 1.1-2
Rcpp Rcpp 1.0.6
RcppAnnoy RcppAnnoy 0.0.18
RcppArmadillo RcppArmadillo 0.10.4.0.0
RcppEigen RcppEigen 0.3.3.9.1
RcppProgress RcppProgress 0.4.2
RCurl RCurl 1.98-1.3
rematch2 rematch2 2.1.2
remotes remotes 2.3.0
reshape2 reshape2 1.4.4
reticulate reticulate 1.18
rlang rlang 0.4.10
ROCR ROCR 1.0-11
roxygen2 roxygen2 7.1.1
rprojroot rprojroot 2.0.2
RSpectra RSpectra 0.16-0
RSQLite RSQLite 2.2.6
rstudioapi rstudioapi 0.13
Rtsne Rtsne 0.15
rversions rversions 2.0.2
S4Vectors S4Vectors 0.28.1
sass sass 0.3.1
scales scales 1.1.1
scattermore scattermore 0.7
sctransform sctransform 0.3.2
segmented segmented 1.3-3
sessioninfo sessioninfo 1.1.1
Seurat Seurat 4.0.1
SeuratObject SeuratObject 4.0.0
shiny shiny 1.6.0
SingleCellExperiment SingleCellExperiment 1.12.0
sitmo sitmo 2.0.1
snow snow 0.4-3
sourcetools sourcetools 0.1.7
spatstat.core spatstat.core 2.0-0
spatstat.data spatstat.data 2.1-0
spatstat.geom spatstat.geom 2.1-0
spatstat.sparse spatstat.sparse 2.0-0
spatstat.utils spatstat.utils 2.1-0
stringi stringi 1.5.3
stringr stringr 1.4.0
SummarizedExperiment SummarizedExperiment 1.20.0
sys sys 3.4
tensor tensor 1.5
testthat testthat 3.0.2
tibble tibble 3.1.1
tidyr tidyr 1.1.3
tidyselect tidyselect 1.1.0
usethis usethis 2.0.1
utf8 utf8 1.2.1
uwot uwot 0.1.10
vctrs vctrs 0.3.7
viridisLite viridisLite 0.4.0
waldo waldo 0.2.5
whisker whisker 0.4
withr withr 2.4.1
xfun xfun 0.22
XML XML 3.99-0.6
xml2 xml2 1.3.2
xopen xopen 1.0.0
xtable xtable 1.8-4
XVector XVector 0.30.0
yaml yaml 2.2.1
zip zip 2.1.1
zlibbioc zlibbioc 1.36.0
zoo zoo 1.8-9

Heatmap colours

Hello,
I have been following this vignette: http://www.bioconductor.org/packages/release/bioc/vignettes/escape/inst/doc/vignette.html which has been very helpful, thank you.

However, when following your script with the pbmc_small data set I am unable to reproduce the heatmap colours you have shown.
Below is the heatmap you show:

By copying in the same code and running it I get the following:

Where the expression level colour scale is clearly very different and very confusing. Do you know why this is happening?

Thanks for your time,
Naomi

Visualization of Enrichment

Hello,

Thank you for this package and the vignette - they are extremely helpful!

I was curious if there is a way to visualize the enrichment results as a dot plot, as with clusterProfiler? (https://datascience.bioturing.com/app/repo/how-to-do-gene-set-enrichment-analysis-using-clusterprofiler-b6643ddde97f4896882d6e16565045ac/v1.0.3-rc1)
I have two groups that I want to know what KEGG and Hallmark pathways are differentially enriched between the clusters.

Thank you,
Brooke

enrichIT values-negative scores

Hello!
I am new to this package and comp biology in general and was just applying the enrichIT code that was published in a paper on my dataset. Turns out that most of the enrichIT values for the geneset I used from the paper on my single cell data were negative. I just wanted to check if I interpret it as downregulation of this geneset in my data.
Thanks!
Sam

getSignificance very memory intensive when fit = ANOVA

Hi Nick!

Amazing tool! I was wondering if you could help me with an issue I am having owing to a large dataset.
I have a Seurat object which consists of sample data integrated across 15 individuals: 50k+ subsetted, high-quality, doublet-free cells.
In this dataset I have metadata for cell type (15 types), which is further broken down by race (2 types) and sex (2 types), giving a joined metadata slot with 60 cell types on the basis of sex and race.

Now I have already downloaded the entire msigdb for cat C5 (gene ontologies), using:

# molecular Signature database for msigdb
gene.sets <- getGenesets(org = "hsa",
                         db = c("msigdb"), #https://rdrr.io/bioc/EnrichmentBrowser/man/getGenesets.html
                         gene.id.type = "SYMBOL", #idTypes(org = "hsa")
                         cat = c("C5"), # C5 is gene ontology
                         cache = TRUE, 
                         return.type = "list")

and successfully ran enrichIt:

# Enrichment calculation
ES <- enrichIt(obj = pancreas.combined.h.s, 
               gene.sets = gene.sets, 
               groups = 1000, 
               cores = 30, 
               min.size = NULL)

At this point, I have successfully reproduced all of the plotting parameters that I have tested and which are outlined in your vignette, after transferring an enormous amount of C5 metadata, or the ES file, to the Seurat object.

Now I am interested in running a statistical test across all 60 samples in various configurations to test for pathways that are more activated in some sex_ancestral_celltype vs another sex_ancestral_celltype (obviously the comparison is for the same cell type but differing sex and ancestry). In order to achieve this I run:

# Significance testing
significant_pathways <- getSignificance(ES2, group = "celltype.sample", fit = "ANOVA")

As I write this I just crossed 450GB of RAM utilization. Is there any way I can reduce this colossal computational cost?
I am thinking that probably the only way is to subset cell types individually, calculate the significance, and then add that metadata iteratively back to the primary Seurat file. Or is there some way of using getSignificance to specify the testing combinations?

Thanks again for the wonderful tool.

Cheers,

🐉
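
The subsetting idea in the post above can be sketched in base R: split the enrichment table by cell type and test each subset on its own, so only one subset is processed at a time. This is a sketch under stated assumptions, not how `getSignificance()` works; the data are simulated, the column names and `test_within_celltype` are hypothetical, and a two-group Wilcoxon test stands in for the ANOVA fit.

```r
set.seed(1)
# Hypothetical enrichment table: one row per cell, pathway scores plus metadata.
ES2 <- data.frame(
  PATHWAY_1 = rnorm(200),
  PATHWAY_2 = rnorm(200),
  celltype  = rep(c("alpha", "beta"), each = 100),
  sex       = rep(c("F", "M"), times = 100)
)
pathways <- c("PATHWAY_1", "PATHWAY_2")

# Test one cell type at a time: compare the two levels of `group` per pathway.
test_within_celltype <- function(df, pathways, group) {
  p <- vapply(pathways, function(pw) {
    by_group <- split(df[[pw]], df[[group]])
    wilcox.test(by_group[[1]], by_group[[2]])$p.value
  }, numeric(1))
  data.frame(pathway = pathways, p.value = p,
             FDR = p.adjust(p, method = "BH"), row.names = NULL)
}

results <- lapply(split(ES2, ES2$celltype), test_within_celltype,
                  pathways = pathways, group = "sex")
print(results$alpha)
```

Restricting the columns to pathways of interest before testing would shrink the problem further.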

Issue Getting Mouse Gene Sets

Some of you might notice the difference in pulling, say, the hallmark library for human (50 gene sets) versus mouse (11 gene sets). This is a product of the annotation available in the msigdbr R package. A fix would be to use the human library and then convert to the mouse nomenclature:


##Load Escape
library(escape)

#Load Human Gene Sets
GS <- getGeneSets(library = "H")

##Get ensembl annotation
library("biomaRt")
human = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
mouse = useMart("ensembl", dataset = "mmusculus_gene_ensembl")

##Define a function for converting human to mouse genes
convertHumanGeneList <- function(x){

genesV2 = getLDS(attributes = c("hgnc_symbol"), filters = "hgnc_symbol", values = x , mart = human, attributesL = c("mgi_symbol"), martL = mouse, uniqueRows=T)

humanx <- unique(genesV2[, 2])

return(humanx)
}

##loop the function from above
for (i in seq_along(GS)) {
    GS[[i]]@geneIds <- convertHumanGeneList(GS[[i]]@geneIds)
}

Although this is feasible for smaller libraries - the code is a little slow for larger gene set collections. Credit for the conversion goes to radiaj.
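
Since the slow part is likely the repeated remote query, one option is to build a single human-to-mouse lookup table once and then map every gene set against it locally. The sketch below assumes such a table already exists (for example the two columns returned by one `getLDS()` call over all unique genes); the table contents, `human_sets` and `to_mouse` are hypothetical and the pairings are made up for illustration.

```r
# Hypothetical human-to-mouse lookup table (pairings are illustrative only).
orthologs <- data.frame(
  human = c("TP53", "MYC", "CD8A", "CD8A"),
  mouse = c("Trp53", "Myc", "Cd8a", "Cd8b1"),
  stringsAsFactors = FALSE
)

# Gene sets as plain character vectors (hypothetical contents).
human_sets <- list(SET_1 = c("TP53", "MYC"),
                   SET_2 = c("CD8A", "GENE_WITHOUT_ORTHOLOG"))

# Map one set against the shared table; genes without a match are dropped.
to_mouse <- function(genes, table) unique(table$mouse[table$human %in% genes])

mouse_sets <- lapply(human_sets, to_mouse, table = orthologs)
print(mouse_sets)
```

The converted vectors could then be written back into the gene set objects in the same way as the loop above does.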

Getting Gene Sets Is Slow

Hey All,

Just writing because I got a great email question about having to wait for getGeneSets() to pull whole libraries. Here is my response with some code and ideas.

Unfortunately part of the submission to Bioconductor required us to remove the built in GSEA library (which was much faster) and rely on the msigdbr R package for downloading the gene sets each time.

To bypass the wait, I think you could use msigdbr directly, where category is the library and subcategory is the subset you are interested in, then run the for loop below to get everything formatted to work with the escape enrichIt() function.

library(msigdbr)
library(GSEABase) #for GeneSet() and GeneSetCollection()

m_df <- msigdbr(species = "Homo sapiens", category = NULL, subcategory = NULL)

#Loop convert to gene set collection for work with escape
gs <- unique(m_df$gs_name)
ls <- list()
for (i in seq_along(gs)) {
    tmp <- m_df[m_df$gs_name == gs[i],]
    tmp <- tmp$gene_symbol
    tmp <- unique(tmp)
    tmp <- GeneSet(tmp, setName=paste(gs[i]))
    ls[[i]] <- tmp
}
gsc <- GeneSetCollection(ls)

An alternative that I use is to wait the extra time once: pull in the whole C5 library and save it as an rda or rds file. Then all I need to do is load it and subset it for the gene sets of interest. This takes far longer initially, but saves time in the long run if you are going to be calling multiple subsets of the same library.
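
A minimal sketch of that save-once, subset-later pattern, using a plain named list as a stand-in for the downloaded library. The set contents, `cache_file` and the `REACTOME_` prefix filter are assumptions for illustration.

```r
# Hypothetical stand-in for a full library pulled once (names mimic MSigDB naming).
all_sets <- list(
  REACTOME_APOPTOSIS  = c("BAX", "BCL2", "CASP3"),
  REACTOME_CELL_CYCLE = c("CDK1", "CCNB1"),
  KEGG_APOPTOSIS      = c("BAX", "CASP8")
)

cache_file <- file.path(tempdir(), "gene_sets.rds")  # hypothetical cache location
saveRDS(all_sets, cache_file)                         # slow step, done once

# Later sessions: load the cached object and keep only the sets of interest.
cached   <- readRDS(cache_file)
reactome <- cached[grepl("^REACTOME_", names(cached))]
print(names(reactome))
```

The same name-based subsetting should in principle carry over to a GeneSetCollection, although that is not tested here.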

Error in enrichIt()

Hello,

Thank you for creating such a useful package.

I am following your vignette and trying to perform ssGSEA on my single cell RNA-Seq data. However, I am running into this error - Error in if (attr(class(egc), "package") == "GSEABase") { : argument is of length zero

Here's my code:

gene.sets <- list(MES=c(genes1),
                  ADRN=c(genes2))
ES <- enrichIt(obj = seurat_integrated, 
               gene.sets = gene.sets, 
               groups = 1000, cores = 4)

Where genes1 and genes2 are vectors containing list of genes.

I also tried running ES <- enrichIt(obj = seurat_integrated, method = "UCell", gene.sets = gene.sets) but that throws an error too - Error in enrichIt(obj = seurat_integrated, method = "UCell", gene.sets = gene.sets) : unused argument (method = "UCell")

My R version is 4.0.3. I am not sure what I am missing here. Any help is greatly appreciated.

Thanks,
Khushbu

Having two condition in Seurat

Hi

This is not an issue with your package but rather a question of mine.

I have a Seurat object with two conditions and several clusters

> unique(object@meta.data$Condition)
[1] "cancer"  "control"
> unique(object@meta.data$integrated_snn_res.0.5)
 [1] 0  4  12 2  6  9  1  3  10 7  11 5  14 13 8 
Levels: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
>

I want to get pathways by comparing cluster 11 in cancer versus control samples

What is your suggestion please?

Thanks a lot

Understanding the groups parameter in escape

Hi Nick,

Thanks for the great tool. I was looking through the code you have for escape to better understand how the groups parameter works in the enrichIT function. If I understand it correctly you are splitting the data into n number of groups of cells and then plugging these groups of cells into GSVA. My question is, since GSVA calculates a distribution for each gene, and a rank for that gene based on the inferred distribution, would this not be an issue when generating the gene rankings if you have split the cells into multiple groups? I may be over thinking this, but I was just curious to hear what you thought.

Best,
Dylan
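
To make the question concrete, here is a sketch of what splitting cells into consecutive chunks of `groups` cells looks like. How escape actually assigns cells to groups is not shown in this thread, so the chunking rule below is an assumption and `n_cells` is a made-up value.

```r
n_cells <- 2000   # hypothetical number of cells
groups  <- 1000   # chunk size, as in enrichIt(groups = 1000)

# Assign consecutive cells to chunks of at most `groups` cells.
chunk_id <- ceiling(seq_len(n_cells) / groups)
chunks   <- split(seq_len(n_cells), chunk_id)

print(length(chunks))
print(lengths(chunks))
```

Under this scheme each chunk would be scored separately, so any statistic estimated across cells would be estimated within a chunk only.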

Info ES

Hi,
Thanks for this package! I have a question regarding how you get the ES. I used enrichIt on a SingleCellExperiment object without asking for normalisation, and I get very high values, between 3000 and 5000, for the 3 MSigDB sets that I am interested in. Do you think this makes sense? When I used normalisation I got values between 0 and 1, as you mention in another post. Then when I used dittoHeatmap the scale is between -4 and 4 in both cases. I was wondering if dittoHeatmap is normalising the data by itself? I also tried getSignificance, but it looks like it needs a Seurat object. Is it possible to use it with my SingleCellExperiment object?

Thank you very much in advance
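
If the heatmap applies a row-wise z-score before plotting (an assumption about the plotting defaults, not something confirmed in this thread), raw and normalised scores would end up on the same scale. The sketch below shows the arithmetic with made-up numbers; `zscore` is a hypothetical helper.

```r
# Hypothetical scores for one pathway on two very different scales.
raw_scores  <- c(3000, 3500, 4200, 5000)
norm_scores <- (raw_scores - min(raw_scores)) / (max(raw_scores) - min(raw_scores))

# Row z-score: subtract the mean and divide by the standard deviation.
zscore <- function(x) (x - mean(x)) / sd(x)

print(zscore(raw_scores))
print(zscore(norm_scores))
```

Both calls give the same values because a z-score is unchanged by shifting and rescaling the input.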

masterPCAPlot

Dear Nick,

I'm trying this wonderful package on our data. When running masterPCAPlot I find the following error:

GS.hallmark <- getGeneSets(library = "H")
masterPCAPlot(PCA_IP_TEST, gene.sets = names(GS.hallmark), PCx = "PC1", PCy = "PC2", top.contribution = 10)

Error in svd(x, nu = 0, nv = k) : a dimension is zero

I checked and there are no zeros or missing values in my input dataset. I leave the test dataset attached.

Thanks for your help.

Best,

Juanlu.

PCA_IP_TEST.zip

> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.7 (Santiago)

Matrix products: default
BLAS/LAPACK: /home/devel/jtrincado/anaconda3/envs/Seurat4/lib/libopenblasp-r0.3.17.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] escape_1.4.1       dittoSeq_1.2.6     ggplot2_3.3.5      SeuratObject_4.0.4
[5] Seurat_4.0.2      

loaded via a namespace (and not attached):
  [1] backports_1.4.1             plyr_1.8.6                 
  [3] igraph_1.2.9                lazyeval_0.2.2             
  [5] GSEABase_1.52.1             splines_4.1.3              
  [7] BiocParallel_1.24.1         listenv_0.8.0              
  [9] scattermore_0.7             GenomeInfoDb_1.30.0        
 [11] digest_0.6.29               htmltools_0.5.2            
 [13] fansi_1.0.3                 magrittr_2.0.2             
 [15] memoise_2.0.1               tensor_1.5                 
 [17] cluster_2.1.3               ROCR_1.0-11                
 [19] limma_3.46.0                globals_0.14.0             
 [21] Biostrings_2.62.0           annotate_1.68.0            
 [23] matrixStats_0.61.0          spatstat.sparse_2.0-0      
 [25] colorspace_2.0-3            blob_1.2.2                 
 [27] ggrepel_0.9.1               dplyr_1.0.8                
 [29] hexbin_1.28.2               crayon_1.5.1               
 [31] RCurl_1.98-1.6              jsonlite_1.8.0             
 [33] graph_1.68.0                spatstat.data_2.1-0        
 [35] survival_3.3-1              zoo_1.8-9                  
 [37] glue_1.6.2                  polyclip_1.10-0            
 [39] gtable_0.3.0                zlibbioc_1.40.0            
 [41] XVector_0.34.0              leiden_0.3.9               
 [43] DelayedArray_0.20.0         future.apply_1.8.1         
 [45] SingleCellExperiment_1.12.0 BiocGenerics_0.40.0        
 [47] msigdbr_7.4.1               abind_1.4-5                
 [49] scales_1.1.1                pheatmap_1.0.12            
 [51] DBI_1.1.2                   edgeR_3.32.1               
 [53] miniUI_0.1.1.1              Rcpp_1.0.8.3               
 [55] isoband_0.2.5               viridisLite_0.4.0          
 [57] xtable_1.8-4                reticulate_1.22            
 [59] spatstat.core_2.1-2         bit_4.0.4                  
 [61] GSVA_1.38.2                 stats4_4.1.3               
 [63] htmlwidgets_1.5.4           httr_1.4.2                 
 [65] RColorBrewer_1.1-2          ellipsis_0.3.2             
 [67] ica_1.0-2                   farver_2.1.0               
 [69] pkgconfig_2.0.3             XML_3.99-0.8               
 [71] uwot_0.1.11                 deldir_1.0-6               
 [73] locfit_1.5-9.4              utf8_1.2.2                 
 [75] labeling_0.4.2              tidyselect_1.1.2           
 [77] rlang_1.0.2                 reshape2_1.4.4             
 [79] later_1.3.0                 AnnotationDbi_1.56.1       
 [81] munsell_0.5.0               tools_4.1.3                
 [83] cachem_1.0.6                cli_3.2.0                  
 [85] generics_0.1.2              RSQLite_2.2.8              
 [87] broom_0.7.12                ggridges_0.5.3             
 [89] stringr_1.4.0               fastmap_1.1.0              
 [91] goftest_1.2-3               babelgene_21.4             
 [93] bit64_4.0.5                 fitdistrplus_1.1-6         
 [95] purrr_0.3.4                 RANN_2.6.1                 
 [97] KEGGREST_1.34.0             pbapply_1.5-0              
 [99] future_1.23.0               nlme_3.1-157               
[101] mime_0.12                   compiler_4.1.3             
[103] plotly_4.10.0               png_0.1-7                  
[105] spatstat.utils_2.1-0        tibble_3.1.6               
[107] stringi_1.7.6               lattice_0.20-45            
[109] Matrix_1.4-1                vctrs_0.3.8                
[111] pillar_1.7.0                lifecycle_1.0.1            
[113] spatstat.geom_2.1-0         lmtest_0.9-39              
[115] RcppAnnoy_0.0.18            data.table_1.14.2          
[117] cowplot_1.1.1               bitops_1.0-7               
[119] irlba_2.3.5                 httpuv_1.6.3               
[121] patchwork_1.1.1             GenomicRanges_1.42.0       
[123] R6_2.5.1                    promises_1.2.0.1           
[125] KernSmooth_2.23-20          gridExtra_2.3              
[127] IRanges_2.28.0              parallelly_1.29.0          
[129] codetools_0.2-18            MASS_7.3-56                
[131] assertthat_0.2.1            SummarizedExperiment_1.20.0
[133] withr_2.5.0                 sctransform_0.3.2          
[135] S4Vectors_0.32.3            GenomeInfoDbData_1.2.4     
[137] mgcv_1.8-40                 parallel_4.1.3             
[139] grid_4.1.3                  rpart_4.1.16               
[141] tidyr_1.2.0                 MatrixGenerics_1.6.0       
[143] Rtsne_0.15                  Biobase_2.54.0             
[145] shiny_1.7.1

I have problem with the format of my input data

I'm trying to run the code as below
suppressPackageStartupMessages(library(escape))

suppressPackageStartupMessages(library(dittoSeq))
suppressPackageStartupMessages(library(SingleCellExperiment))
suppressPackageStartupMessages(library(Seurat))
suppressPackageStartupMessages(library(SeuratObject))
als <- readRDS("/home/paria/scratch/als-project/analysis/OCU_MED/clustering/new/full_sample_k_100_resolution_0.8_PC_50.rds")
als <- DietSeurat(suppressMessages(UpdateSeuratObject(als)))
Error in slot(object = object, name = x) :
no slot of name "median_umi" for this object of class "SCTModel"

I was trying to extract the count matrix from the Seurat object but I'm still getting the same error. I'm not sure where I'm going wrong, or whether I actually need to have median_umi or it is not necessary. The als object is as below
als
An object of class Seurat
63224 features across 92297 samples within 3 assays
Active assay: SCT (30612 features, 3000 variable features)
2 other assays present: RNA, integrated
1 dimensional reduction calculated: pca

I would really appreciate it if you could let me know how I can solve this. By using the count matrix only? If yes, what is the best way of extracting it?
Thanks,
Paria

Error in if (attr(class(egc), "package") == "GSEABase") { : argument is of length zero

Hi!
When I tried to run enrichIt and provided my custom gene sets, I received the following error:
Error in if (attr(class(egc), "package") == "GSEABase") { :
argument is of length zero
This error was also mentioned in #31 but did not seem to have a solution. I had no problem running enrichIt using the public gene sets from the database.
Here's a toy example of my custom gene set stored in list, and mat is my expression matrix with sampleId in the colnames and genes in the rownames:
list=list('l1'=c('MAN2B2','HSPG2','TREH'),'l2'=c('PRPS1L1','LALBA','OGN'))
ES.seurat <- enrichIt(obj = mat, gene.sets = list, groups = 1000, cores = 2)
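
The condition quoted in the error message can be reproduced in base R, which suggests why a plain list triggers it. The sketch only evaluates the same expression on a plain list; it does not inspect the escape source.

```r
gene.sets <- list(l1 = c("MAN2B2", "HSPG2", "TREH"),
                  l2 = c("PRPS1L1", "LALBA", "OGN"))

# A plain list has class "list" with no "package" attribute, so this is NULL.
pkg <- attr(class(gene.sets), "package")
print(pkg)

# Comparing NULL to a string yields logical(0), which if() rejects.
cond <- pkg == "GSEABase"
print(length(cond))

msg <- tryCatch(if (cond) "GSEABase object" else "other",
                error = function(e) conditionMessage(e))
print(msg)
```

If that reading is right, wrapping each vector with GeneSet() and combining the results with GeneSetCollection(), as done in other threads here, may avoid the error.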

Absence of method parameter in master branch as compared to the vignette

I noticed that the vignette found here:
https://ncborcherding.github.io/vignettes/escape_vignette.html

provides function parameters to change the method for GSEA (both UCell and singscore are shown as alternative options).

However, in the source code on the main branch for enrichIt, this parameter is not included.

The method seems to have this parameter in the dev branch. If this is the case, is this parameter still in a beta state/untested?

Statistical significance of pathways across clusters

I am trying to compare pathway/gene sets enrichment across the clusters of my scRNA-seq data obtained earlier from Seurat V3 package. But for the significance test, when I try to use ANOVA on my seurat clusters, I get an error:

`output <- getSignificance(ES, group="seurat_clusters", fit = "ANOVA")`

`Error in [.data.frame(enriched, , group) : undefined columns selected`
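
The message indicates that a data frame was indexed by a column name it does not contain, so one thing to check is whether the grouping column is present in the object passed in. The sketch below uses made-up data and assumes the cluster labels live in a separate metadata table keyed by cell name; `ES`, `meta` and `ES2` are hypothetical stand-ins.

```r
# Hypothetical enrichment scores (rows are cells) and cell metadata.
ES <- data.frame(HALLMARK_APOPTOSIS = c(0.2, 0.5, 0.1),
                 row.names = c("cell1", "cell2", "cell3"))
meta <- data.frame(seurat_clusters = factor(c(0, 1, 0)),
                   row.names = c("cell1", "cell2", "cell3"))

group <- "seurat_clusters"
print(group %in% colnames(ES))    # FALSE: the grouping column is missing

# Attach the metadata column by matching cell names, then check again.
ES2 <- data.frame(ES, meta[rownames(ES), group, drop = FALSE])
print(group %in% colnames(ES2))
```

Passing a table that carries both the scores and the grouping column may be what the function expects.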

Saving heatmap

Hello,

Thanks for this great package. I am having an issue saving the dittoHeatmap. I have been able to save all other plots from Escape with ggsave("__.pdf"), but for some reason it does not work for dittoHeatmap. I have also tried to save it with pdf(".pdf") dittoHeatmap() then dev.off() and it also doesn't work.

Any advice for saving the heatmap?

Thanks in advance!
Brooke

Contour lines on tSNE/UMAP

Hi,

Thank you for developing Escape! I am using it for ssGSEA analysis. But I also am interested in adding contour plots to tSNE/UMAP embedding similarly as in figure 1b in your Communications Biology manuscript (congratulations btw, very nice paper!) I am interested in adding contour lines to highlight 1) cell density on a cluster basis (i.e., add a contour on cluster X) and 2) add a contour on cells expressing gene Y (to discriminate between clusters of cells expressing gene Y vs. those that do not).

Do you have any examples of how to do this? Would really appreciate it!

Thank you!

Cannot install package through devtools::install_github

devtools::install_github("ncborcherding/escape")

installing source package ‘escape’ ...
** using staged installation
** R
** byte-compile and prepare package for lazy loading
Warning: replacing previous import ‘vctrs::data_frame’ by ‘tibble::data_frame’ when loading ‘dplyr’
Error: object ‘msigdbr_species’ is not exported by 'namespace:msigdbr'
Execution halted
ERROR: lazy loading failed for package ‘escape’

Cannot get vignette to run fully

Hi, thank you for your vignette http://www.bioconductor.org/packages/release/bioc/vignettes/escape/inst/doc/vignette.html

I am trying to understand the analysis so have followed this vignette using the pbmc_small dataset.

After running:
dittoHeatmap(sce, genes = NULL,
metas = c("HALLMARK_APOPTOSIS", "HALLMARK_DNA_REPAIR", "HALLMARK_P53_PATHWAY"),
annot.by = "groups",
fontsize = 7,
heatmap.colors = colors(50))

I get this error: Error: subscript contains invalid names

What I am really interested in is assigning the pathways a significance. When I run this I get the following error:
output <- getSignificance(ES2, group = "cluster", fit = "linear.model")
Error in match.arg(fit, choices = c("T.test", "ANOVA", "Wilcoxon", "LR")) :
'arg' should be one of “T.test”, “ANOVA”, “Wilcoxon”, “LR”

When I run this:
dittoHeatmap(pbmc_small, genes = NULL, metas = names(ES.seurat),
annot.by = "groups",
fontsize = 7,
cluster_cols = TRUE,
heatmap.colors = colors(50))
It works fine. However, when I run it on my own dataset using the Reactome dataset I get so many pathways that the heatmap is unreadable. Is there a way to only show the most differential pathways across my groups?

Thanks for your time,
Naomi

GeneSet and GeneSetCollection

Hi!

Can you please help me understand why the enrichment is different for gene set A?

gs.A <- GeneSet( c("gene.A1", "gene.A2", "gene.A3", ......), setName="A")
gs.B <- GeneSet( c("gene.B1", "gene.B2", "gene.B3", ......), setName="B")

es.A <- enrichIt(obj=Seurat.obj, gene.sets = gs.A)
es.B <- enrichIt(obj=Seurat.obj, gene.sets = gs.B)

head(es.A)
>
               A
cell.1    -0.34535
cell.2     0.88888
cell.3     0.66354
cell.4     0.43214
cell.5     0.42526

gsc <- GeneSetCollection(gs.A, gs.B)
es <- enrichIt(obj=Seurat.obj, gene.sets = gsc)
head(es)
>
               A           B
cell.1     0.85475        xxxx
cell.2     0.04346        xxxx
cell.3    -0.21353        xxxx
cell.4     0.98514        xxxx
cell.5     0.02526        xxxx

And how should I interpret the values above? Does -0.21353 mean less enriched in Set A while 0.85475 means more enriched in Set A?

Making Gene Sets from Top X differential Genes

Hey Everyone,

Got a question on how to make gene signatures/sets based on the top genes of a cluster. Thought I would share the code I have used to quickly make a gene set collection that will be compatible with the escape package.

Loading and filtering output from FindAllMarkers()

library(dplyr) #for %>%, group_by() and top_n()
markers <- read.delim("~/Google Drive/CurrentProjects/CTCL_V2/data/CTCL_markers.txt")
markers <- markers[markers$p_val_adj < 0.05,]
top50 <- markers %>% group_by(cluster) %>% top_n(n = 50, wt = avg_logFC)

Extracting the top 50 genes for each cluster

cluster <- c("C1", "C2", "C3", "C6", "C10", "C12", "C15") #These are the clusters I was interested in
out <- NULL
for (i in seq_along(cluster)) {
  tmp <- top50[top50$cluster == cluster[i], ]
  tmp <- as.character(tmp$gene)
  out <- cbind(out, tmp)
}
colnames(out) <- cluster

Creating the gene set collection

library(GSEABase)
list <- list()
names <- colnames(out)
for (i in seq_along(names)) {
  out2 <- out[,i]
  out2 <- as.character(out2) #ensuring everything is a character
  out2 <- out2[out2 != ""]  #ensuring no empty strings
  out2 <- unique(out2) #ensuring no repeats
  y <- GeneSet(out2, setName = names[i])
  list[[i]] <- y
}
names(list) <- names

Enrichment Analysis

enrichIt(obj, gene.sets = list, groups = 1000, cores = 2)
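
For anyone who prefers to skip the intermediate matrix, the same top-N selection can be sketched with `split()` in base R. The marker table below is made up, and `top_n_per_cluster()` is a hypothetical helper; the column names follow what the code above assumes (`gene`, `cluster`, `avg_logFC`, `p_val_adj`).

```r
# Made-up marker table with the columns used in the code above
markers <- data.frame(
  gene      = c("CD3E", "CD4", "CD8A", "MS4A1", "CD79A", "CD3E"),
  cluster   = c("C1", "C1", "C2", "C3", "C3", "C2"),
  avg_logFC = c(2.1, 1.4, 1.9, 2.5, 2.2, 0.8),
  p_val_adj = c(0.001, 0.01, 0.002, 0.0001, 0.003, 0.2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: top n significant genes per cluster as a named list
top_n_per_cluster <- function(markers, n = 50, alpha = 0.05) {
  sig <- markers[markers$p_val_adj < alpha, ]
  # Sort by cluster, then by decreasing fold change within each cluster
  sig <- sig[order(sig$cluster, -sig$avg_logFC), ]
  lapply(split(sig$gene, sig$cluster), function(g) unique(head(g, n)))
}

gene_list <- top_n_per_cluster(markers, n = 2)
```

Each element of `gene_list` is a character vector named after its cluster, so it can be wrapped with `GeneSet(gene_list[[nm]], setName = nm)` exactly as in the loop above. Because no matrix is built, clusters with fewer than n significant genes do not get padded by recycling.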

Calculating average ssGSEA score for easy-to-see heatmap?

Hi!

I'm working on some pathway heatmaps, and have gotten up to this:

I know the best part about scRNA is the ability to see things on the single-cell level, but is there a way to display the scoring data from each cluster as an average? So it's not individual ticks for each cell and rather just one color for the average of the cluster? (Or any other suggestions for visualization. As you can see, I turned down the colors to 10 to see the differences a little better.)

I'm looking for something along the lines of the AverageExpression() function in Seurat that allows one to create a heatmap like below:

Thanks!
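
A minimal sketch of the averaging step in base R, assuming the enrichment scores sit in a data frame with one row per cell and the cluster labels are available as a vector. The objects `ES` and `cluster` below are simulated stand-ins, not output from enrichIt().

```r
# Simulated per-cell scores for two gene sets across three clusters
set.seed(42)
cluster <- factor(rep(c("C1", "C2", "C3"), times = c(4, 3, 5)))
ES <- data.frame(
  HALLMARK_A = rnorm(12),
  HALLMARK_B = rnorm(12)
)

# Mean score per cluster for every gene set (one row per cluster)
avg <- aggregate(ES, by = list(cluster = cluster), FUN = mean)
avg_mat <- as.matrix(avg[, -1])
rownames(avg_mat) <- as.character(avg$cluster)

# Gene sets in rows, clusters in columns, the usual heatmap orientation
avg_mat <- t(avg_mat)
# heatmap(avg_mat, scale = "row")  # base R heatmap; uncomment to draw
```

Swapping `mean` for `median` in the `aggregate()` call gives a more outlier-resistant summary; the resulting matrix can go into any heatmap function that accepts a numeric matrix.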

Limit the number of pathways to the top 10/20?

Hi there,

First, great program! Easy to work with.

From what I understand, ssGSEA calculates the gene set scores for each individual cell, and then you can group them with the "annot.by" argument.

I see there is a way to call individual gene sets, but is there a way to compare the top 10/20 enrichment scored gene sets that are most different between the annotations? I'm still in the discovery phase (so don't have a specific gene set i want to look at), but even the largest image being produced by dittoHeatmap is too small to read what the gene sets are.

enrichmentPlot did not finish

Hi, I am using this package for GSEA on scRNA data extracted from an ArchR project as a SingleCellExperiment object. I was able to use enrichIt; however, when I run enrichmentPlot it does not finish even after 12h and it uses a lot of memory (around 50G). Is this normal? Do you have any idea why it takes so long?
Here is my code:

RNA_data=getMatrixFromProject(MM_OnlyPre_peak,useMatrix = "GeneIntegrationMatrix")
RNA_data=as(RNA_data, "SingleCellExperiment")
row.names(RNA_data)=rowData(RNA_data)$name
assayNames(RNA_data)[1]="counts"

m_df_C8<- msigdbr(species = "Homo sapiens", category = "C8") #Homo sapien and Cell type signature gene sets
fgsea_sets_C8<- m_df_C8 %>% split(x = .$gene_symbol, f = .$gs_name)
fgsea_sets_C8=fgsea_sets_C8[grep("HAY_BONE_MARROW",names(fgsea_sets_C8))]#selection of bone marrow cells
fgsea_sets_C8_bis=c(fgsea_sets_C8[grep("PLASMA",names(fgsea_sets_C8))], fgsea_sets_C8[grep("_B_CELL",names(fgsea_sets_C8))],fgsea_sets_C8[grep("PRO_B",names(fgsea_sets_C8))]) #selection of B and plasma cells from bone marrow cells

ES.MM_OnlyPre_peak <- enrichIt(obj = RNA_data, gene.sets = fgsea_sets_C8_bis, method = "UCell",groups = 1000, cores = 8)
met.data <- merge(colData(RNA_data), ES.MM_OnlyPre_peak, by = "row.names", all=TRUE)
row.names(met.data) <- met.data$Row.names
met.data$Row.names <- NULL
colData(RNA_data) <- met.data
RNA_data
class: SingleCellExperiment
dim: 18097 48042
metadata(0):
assays(1): counts
rownames(18097): FAM87B LINC00115 ... TMLHE-AS1 TMLHE
rowData names(6): seqnames start ... name idx
colnames(48042): P1217_MM_ATAC#AAACTCGAGAATACTG-1 P1217_MM_ATAC#AAACTCGGTCAGGCTC-1 ... P1752_MM_ATAC#TTTGTGTTCGGGAATG-1
P1752_MM_ATAC#TTTGTGTTCTCGGCGA-1
colData names(52): BlacklistRatio DoubletEnrichment ... HAY_BONE_MARROW_FOLLICULAR_B_CELL HAY_BONE_MARROW_PRO_B
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):

p=enrichmentPlot(RNA_data,
gene.set = "HAY_BONE_MARROW_FOLLICULAR_B_CELL",
gene.sets = fgsea_sets_C8_bis,
group = "Translocation") +
scale_color_manual(values = colorblind_vector(5)[c(1,4)])

Is it possible to dig into relative contributions of various genes to the enrichit score

This is a great package - thank you for making it!

I was wondering whether it would be possible to look at the relative contribution of individual genes to the enrichIt score by any chance? Thank you.

Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 102

Hi!
I am trying to run the package and I get an error as indicated below:

ES.seurat <- enrichIt(obj = rds_file, gene.sets = GS.hallmark_and_c2, groups = 50, cores = 2)
Error in asMethod(object) : 
  Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 102

This is a large sparse matrix.
dim(rds_file)
2000 136048
I have used this tool before, so I know it works, but this matrix is much larger!
I am using version ‘1.2.0’
Thanks,
Rini

Median of getSignificance and multi_dittoPlot() don't match

Hey Nick,

I am testing escape, and I love it! I want to convert the previous analysis I did with the module score from Seurat and use the enrichment analysis from escape instead.

I have a problem, and I'm not sure what the cause is. In short, I used enrichIt() on a gene set and getSignificance().
When I plotted with multi_dittoPlot() (but the same happens if I use geom_violin() and geom_boxplot() ), the median of the three groups doesn't match the median calculated by getSignificance():

The group "M" should have the lowest median (0.44), while "S" has the highest (0.47). However, from the plot M is higher than S.

getSignificance() also replicated what I observed before with Seurat module score. This also happens with other gene sets, so I am puzzled.

I really appreciate any help you can provide.
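
A quick cross-check that may help narrow this down is to compute the group medians directly from the column being plotted and compare them with both the plot and the getSignificance() output. The sketch below uses simulated scores; `df`, `group`, and `score` are hypothetical names standing in for the enrichment data frame and its columns.

```r
# Simulated scores for three groups (made-up data)
set.seed(7)
df <- data.frame(
  group = rep(c("M", "S", "T"), each = 20),
  score = c(rnorm(20, 0.44, 0.02), rnorm(20, 0.47, 0.02), rnorm(20, 0.45, 0.02))
)

# Median of the plotted column within each group
group_medians <- tapply(df$score, df$group, median)
```

If these medians agree with the plot but not with the table, the two steps are probably reading different columns or differently ordered group labels; that is a guess worth checking rather than a diagnosis.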

Improve computational time for EnrichIt(), performPCA(), and masterPCAplot

Hi there! Great package, I am a huge fan!

I am wondering if there is a way to reduce the time it takes to run performPCA() and masterPCAplot() when considering larger collections such as MSigDB's curated collection C2, which has ~6000+ gene sets to do GSEA on. For example, it took 8 hours to finish running just performPCA(). Is there a way to specify or increase the number of cores or threads?

Thanks!

cholmod error 'problem too large' at file

Had an email with this error.

The issue here is that the count matrix within the Seurat or SingleCellExperiment object is too large to convert to a traditional matrix using as.matrix(). This will be an issue if you have a ton of cells or, conversely, a ton of features.

Methods to circumvent

1. Summarize and Filter
Filter the sparse matrix for features that are expressed (or change 0 to percent of sample expressed)

expr_matrix <- integrated@assays$RNA@counts #sparsematrix
expr_matrix <- expr_matrix[tabulate(summary(expr_matrix)$i, nbins = nrow(expr_matrix)) != 0, , drop = FALSE] #remove any feature without a single count; nbins keeps one entry per row
expr_matrix <- as.matrix(expr_matrix) #convert to matrix

2. Split Seurat Object and Loop

Although not ideal - there is also a possibility of looping through the seurat objects or initial count matrices.

Seurat_loop <- SplitObject(integrated, split.by = "ident")
ES_list <- list() #empty list to collect the results
for (i in seq_along(Seurat_loop)) {
           tmp <- enrichIt(Seurat_loop[[i]], GS) #score one split object at a time
           ES_list[[i]] <- tmp
}
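
The same idea can be sketched without Seurat by walking over column blocks of the sparse matrix, so that only one block is ever converted to a dense matrix. This is an illustration on simulated counts; `score_chunk()` is a placeholder for the real scoring step and is not an escape function, and the approach assumes the score for a cell depends only on that cell's own column.

```r
library(Matrix)

# Made-up sparse count matrix: 200 features x 1000 cells
set.seed(3)
counts <- rsparsematrix(200, 1000, density = 0.05,
                        rand.x = function(n) rpois(n, 2) + 1)

# Placeholder for the real per-cell scoring step (NOT an escape function)
score_chunk <- function(dense_block) colSums(dense_block)

chunk_size <- 250
starts <- seq(1, ncol(counts), by = chunk_size)
scores <- unlist(lapply(starts, function(s) {
  cols <- s:min(s + chunk_size - 1, ncol(counts))
  # Only this block of columns is densified at any one time
  score_chunk(as.matrix(counts[, cols, drop = FALSE]))
}))
```

Peak memory is then driven by `chunk_size` rather than by the total number of cells, at the cost of some looping overhead.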

Some genesets NES score correlate with nCount, nFeatures

Hello,

Many thanks for your package, I wanted to ask some questions regarding the results.

In my dataset it looks like some of the gene set scores correlate very strongly with the number of features detected per cell or the number of counts (I attached some examples below).

Thus, I would like to ask if you have encountered such an issue before? And if so, how did you handle it?

Thank you very much,

Andrés

Brief description of what I did in case it is useful:

I processed the datasets with the standard Seurat protocol and used the resulting object as input to the enrichIt function with the H library. I then added the resulting ES object to the metadata of the Seurat object as per the tutorial instructions. To obtain the correlation with nFeature_RNA I used the FeatureScatter function of Seurat.

Examples:

Please Select Compatible Species

Had a user reach out with the following error:

This is an issue with the msigdbr package updating its function names. Both the Bioconductor and GitHub versions of escape have been updated to reflect this change; please reinstall.

Error when running dittoHeatmap

Hello,

Thanks for this wonderful package!

I was encountering some errors when running the following scripts with a seurat project:

dittoHeatmap(seurat, genes = NULL, metas = names(ES.seurat),
annot.by = "groups",
fontsize = 7,
cluster_cols = TRUE,
heatmap.colors = colors(50))

Here is the error message generated after,

Error in .which_data(assay, slot, object)[genes, cells.use] :
invalid or not-yet-implemented 'Matrix' subsetting

Can you help me figure this out?

Thanks!
Hous

Error in running enrichIt

Hi
I'm trying to run enrichIt, but the same error keeps repeating.
This is the code I am trying:
gene.sets <- getGeneSets(library = "H")
ES <- enrichIt(obj = seurat, gene.sets = gene.sets, groups = 1000, cores = 4)

but the error message is:
[1] "Using sets of 1000 cells. Running 51 times."
Setting parallel calculations through a SnowParam back-end
with workers=4 and tasks=100.
Estimating ssGSEA scores for 50 gene sets.
Registered S3 method overwritten by 'spatstat.geom':
method from
print.boxx cli
Registered S3 method overwritten by 'spatstat.geom':
method from
print.boxx cli
Registered S3 method overwritten by 'spatstat.geom':
method from
print.boxx cli
Registered S3 method overwritten by 'spatstat.geom':
method from
print.boxx cli
Error: BiocParallel errors
element index: 1, 2, 3, 4, 5, 6, ...
first error: object '.fastRndWalk' not found

How can I solve this problem?
I'm using escape version 1.3.1 and BiocParallel version 1.26.2, the same as in the escape vignette.

plotpathway() : sort by seurat clusters? Also, can I just keep the violin plot, but not the individual dots.

Hi,
Is there a way to sort by Seurat clusters? Also, can I plot the boxplot with no individual dots?
Current command:
plotPathway(sce_rp, resultName = "VAM_H_CDF", geneset = hallmark, groupby = "seurat_clusters", boxplot=TRUE)

Thanks,
Rini

about using a subset of groups in visualization

Hi. I am using Visium data to do ssGSEA. I followed the tutorial, but when it comes to creating the heatmap, I cannot restrict it to a limited number of groups (here, my clusters):
dittoHeatmap(obj, genes = NULL, metas = names(ES.seurat), annot.by = "seurat_clusters", fontsize = 7, cluster_cols = TRUE, heatmap.colors = colors(50))
With the above command, I get the heatmap for all clusters; however, I want it for only a couple of them.
Do you have any idea how I can plot just the clusters of interest?

statistics

Thank you a lot for creating a tool that is compatible with Seurat and SCE objects.
I used escape and got some interesting results, however, in the vignette I didn't see any section on how to find the statistically significant pathways. Is there a way to do so using escape or any other tools?

Thank you again!

Error in running enrichIt

Hi
I'm trying the enrichIt function with my Seurat data.

When I run this code:
gene.sets <- getGeneSets(library = "H")
ES <- enrichIt(obj = CD4T_naive, gene.sets = gene.sets, groups = 1000, cores = 4)

the error message repeats:
[1] "Using sets of 1000 cells. Running 51 times."
Setting parallel calculations through a SnowParam back-end
with workers=4 and tasks=100.
Estimating ssGSEA scores for 50 gene sets.
Error in names(res)[compute_element] <- nms : replacement has length zero

Why does this message repeat? Does this message mean there are no enriched gene sets in my data?
I'm using escape version 1.3.1

What does "groups" mean in the enrichIt function?

Thanks for the useful package. I am using it on my own data, but I am a little confused about what "groups" means. I noticed the message "Using sets of 1000 cells. Running 2 times" when I run it with the defaults, but my raw data contain more than 1000 cells. Should I set it to the actual number of cells?
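
Going only by the printed message, `groups` looks like a batch size: the cells appear to be scored in blocks of at most that many cells, so the number of runs would be the cell count divided by `groups`, rounded up. The arithmetic below is a sketch of that reading and is an inference from the message, not taken from the escape source; the cell count is made up.

```r
n_cells <- 2500   # made-up number of cells in the object
groups  <- 1000   # batch size, as passed to enrichIt(groups = ...)

# Number of batches needed to cover every cell
n_batches <- ceiling(n_cells / groups)

# Batch index for every cell, and the resulting batch sizes
batch_id    <- ceiling(seq_len(n_cells) / groups)
batch_sizes <- as.vector(table(batch_id))
```

Under this reading every cell still receives a score, and `groups` would mainly trade memory per run against the number of runs.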

Subsetting objects by cluster for visualization

Hey Nick,

Thank you so much for your contributions to the single-cell analysis field.

Sometimes when I am making visualizations from a Seurat object, I only want to focus on a subset of clusters.

I have been doing this by subsetting the whole object, while passing it to a visualization function.

dittoHeatmap(subset(seurat_ex, idents = c("C1","C2")),
             genes = NULL, metas = names(ES),
             heatmap.colors = rev(colorblind_vector(50)),
             annot.by = c("active.idents", "Type"),
             cluster_cols = TRUE,
             fontsize = 7)

The visualizations using the ES2 object can be subset the following way too.

ridgeEnrichment(ES2[ES2$cluster %in% c("C1","C2"), ], gene.set = "HALLMARK_DNA_REPAIR",
                group = "cluster", facet = "Type", add.rug = TRUE)

Anyway, I thought this was an interesting additional functionality in case you wanted to add this to the vignette for anyone else that might want to focus in on subsets of clusters, as it is something that isn't very clear on how to do in Seurat or the dittoSeq vignettes.

Heatmap with all Nominal variables using R

I have a dataset that contains only nominal variables. I'd like to use R to create a heatmap similar to the one shown in the attached image, where each variable has its own color scale. Please help me; I have been struggling for days.

Error while running getSignificance()

Hello, thank you for developing this easy to use package.

I am running into an error while trying to run the getSignificance() function on a portion of my dataset. I was able to run this successfully on some other celltypes but with CD8 cells I am getting this error:

output <- getSignificance(ES2, group = "immune_status", fit = "T.test")
Error in t.test.default(x = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, : data are essentially constant

Any thoughts/suggestions on how to proceed ? Thanks a lot.
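
For context, base R's t.test() raises this error when the values being compared have no variance at all. A hedged workaround sketch, on made-up data, is to drop constant columns before testing. The column names below are hypothetical, and whether a constant or non-score column is what triggers the error inside getSignificance() is an assumption to verify on your own data.

```r
# Made-up scores: one gene set varies, one is constant across all cells
ES2 <- data.frame(
  immune_status = rep(c("healthy", "disease"), each = 5),
  SET_VARIES    = c(0.1, 0.2, 0.15, 0.3, 0.25, 0.6, 0.7, 0.65, 0.8, 0.75),
  SET_CONSTANT  = rep(1, 10)
)

score_cols <- setdiff(names(ES2), "immune_status")

# Flag columns that hold a single repeated value
is_constant <- vapply(ES2[score_cols], function(x) length(unique(x)) < 2, logical(1))
testable <- score_cols[!is_constant]

# Run the two-group t-test only on columns that actually vary
p_values <- vapply(testable, function(col) {
  parts <- split(ES2[[col]], ES2$immune_status)
  t.test(parts[[1]], parts[[2]])$p.value
}, numeric(1))
```

It may also be worth checking that only numeric score columns (and the grouping column) are present in the data frame before calling getSignificance().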

NES comparison between cells

Thanks for this great package!

I have a question regarding the comparison of genesets within a cell. I want to use the C8 collection and determine which of those gene sets is the most enriched in a particular cell.

Currently, I am doing a z-transformation on the enrichIt output for each gene set and then determining which gene set has the highest z-score within a particular cell. Could you comment on this approach and perhaps advise on a more streamlined method?

Best,

Jeron
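
The approach described above can be sketched in a few lines of base R: z-score each gene set across cells with `scale()`, then take the column with the largest z-score in each row. The matrix below is a tiny made-up example with cells in rows and gene sets in columns; it is not real enrichIt() output.

```r
# Made-up enrichment scores: 3 cells x 2 gene sets
ES <- rbind(
  cell.1 = c(SET_A = 0.9, SET_B = 0.1),
  cell.2 = c(SET_A = 0.1, SET_B = 0.9),
  cell.3 = c(SET_A = 0.5, SET_B = 0.6)
)

# z-score each gene set (column) across cells
z <- scale(ES)

# For every cell, the gene set with the highest z-score
top_set <- colnames(z)[max.col(z, ties.method = "first")]
names(top_set) <- rownames(z)
```

Note that `max.col()` breaks ties at random by default, so `ties.method = "first"` is set here to keep the result reproducible.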

Subsetting? and Methodological differences for enrichIt...Due to sample makeup?

Hello again! Two questions, roughly related...regarding method differences with enrichIt()

So just like you said, UCell is much faster. I'm having a hard time getting through my script with my Seurat objects without R crashing when using the regular method, but the regular method can work if I do one object at a time and have patience.

I've been comparing default method vs the Ucell method, using the pct difference between my conditions, and i've gotten different results. Just wondering what your thoughts are on this, and if my samples may have an impact on this. (to clarify, top 30 are kinda the same, just always in a different order).

For background, I have 7 human samples of 2 different conditions (4 MSE, 3 FT). NONE are biological replicates (the unfortunate aspect of human samples). All 7 are different, but taken at certain timepoints so they should be roughly similar.

Just wondering if you could shed some light on the differences, and if the point that my samples are all different makes an impact?

Follow-up question: if you prefer the default statistical method, I'm imagining that subsetting 1k cells from the original Seurat object will make the function run faster, but then it kinda defeats the purpose of assigning a score to every cell (although I guess it would be a random 1k, so it should be representative)? Thoughts?

masterPCAplot bug

Just a small bug. It seems the variable top.contribution in masterPCAplot is not used inside the function. Replace

names <- tbl %>% top_n(n = 10, (factors.x + factors.y)/2)
with
names <- tbl %>% top_n(n = top.contribution, (factors.x + factors.y)/2)

Thanks for your amazing work.

Best,
JL

All gene sets highly significant with "getSignificance"

I successfully ran escape on my scRNA data. However, when checking which gene sets are significant using getSignificance, I get p_adjust == 0 for all gene sets.

I tried ANOVA and linear.model; both report all gene sets as highly significant. Running it on different groupings did not change the outcome either.

Do you have an idea what could be the problem?
