igordot / msigdbr

MSigDB gene sets for multiple organisms in a tidy data format

Home Page: https://igordot.github.io/msigdbr

License: Other

R 100.00%
genomics msigdb gene-sets pathways gsea pathway-analysis enrichment-analysis

msigdbr's People

Contributors

igordot, smped, vreuter

msigdbr's Issues

msigdbr package, category C2, subcategory CP

Hello,
I'm currently running a GSEA using the msigdbr package.
I've noticed that subcategory CP of category C2 contains only 29 gene sets, as displayed by msigdbr_collections(), whereas this subcategory should include all of its dependent gene sets (KEGG, Reactome, WikiPathways, ...) and contains 2982 gene sets according to the original website: http://www.gsea-msigdb.org/gsea/msigdb/genesets.jsp?collection=CP

Any recommendations for running all of the gene sets under the CP subcategory?

Thank you!

 

Save the 'entrez_gene' columns in character mode

First, thanks for this great package! I especially like that it directly outputs three different gene ID types, which saves a lot of time when switching between them.

I have a small suggestion. In the output table, the columns related to "entrez_gene" are stored as integers. I would suggest changing them to characters, as other Bioconductor annotation packages do (e.g. org.Hs.eg.db).

gene_sets
# A tibble: 8,209 × 15
   gs_cat gs_su…¹ gs_name gene_…² entre…³ ensem…⁴ human…⁵ human…⁶ human…⁷ gs_id gs_pmid gs_ge…⁸
   <chr>  <chr>   <chr>   <chr>     <int> <chr>   <chr>     <int> <chr>   <chr> <chr>   <chr>  
 1 H      ""      HALLMA… ABCA1        19 ENSG00… ABCA1        19 ENSG00… M5905 267710… ""     
 2 H      ""      HALLMA… ABCB8     11194 ENSG00… ABCB8     11194 ENSG00… M5905 267710… ""     
 3 H      ""      HALLMA… ACAA2     10449 ENSG00… ACAA2     10449 ENSG00… M5905 267710… ""     
 4 H      ""      HALLMA… ACADL        33 ENSG00… ACADL        33 ENSG00… M5905 267710… ""     
 5 H      ""      HALLMA… ACADM        34 ENSG00… ACADM        34 ENSG00… M5905 267710… ""     
 6 H      ""      HALLMA… ACADS        35 ENSG00… ACADS        35 ENSG00… M5905 267710… ""     
 7 H      ""      HALLMA… ACLY         47 ENSG00… ACLY         47 ENSG00… M5905 267710… ""     
 8 H      ""      HALLMA… ACO2         50 ENSG00… ACO2         50 ENSG00… M5905 267710… ""     
 9 H      ""      HALLMA… ACOX1        51 ENSG00… ACOX1        51 ENSG00… M5905 267710… ""     
10 H      ""      HALLMA… ADCY6       112 ENSG00… ADCY6       112 ENSG00… M5905 267710… ""     
# … with 8,199 more rows, 3 more variables: gs_exact_source <chr>, gs_url <chr>,
#   gs_description <chr>, and abbreviated variable names ¹​gs_subcat, ²​gene_symbol,
#   ³​entrez_gene, ⁴​ensembl_gene, ⁵​human_gene_symbol, ⁶​human_entrez_gene, ⁷​human_ensembl_gene,
#   ⁸​gs_geoid
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

Imagine we want to convert Entrez IDs to Refseq IDs, and we have a mapping vector (map) where Entrez IDs are the names and Refseq IDs are the values. Then naturally, to convert, we can do:

map[gene_sets$entrez_gene]

This causes a problem because gene_sets$entrez_gene holds integers, which are treated as numeric indices into the map vector rather than being matched against the names of map.

To do it correctly, we need to explicitly convert gene_sets$entrez_gene to characters:

map[as.character(gene_sets$entrez_gene)]

A more severe consequence: if the maximal numeric value in gene_sets$entrez_gene is smaller than the length of map, executing map[gene_sets$entrez_gene] does not generate any warning or error message, and it silently returns wrong results.
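A minimal sketch of the pitfall described above, using a made-up mapping vector (the values are placeholders, not real RefSeq IDs). The vector is deliberately ordered so that positional and name-based lookup disagree:

```r
# Hypothetical mapping: names are Entrez IDs, values are placeholder target IDs
map <- c("2" = "NM_two", "1" = "NM_one", "19" = "NM_nineteen")

# Integer Entrez IDs, as stored in the entrez_gene column
ids <- c(1L, 2L, 19L)

# Integer subscripts are positions: element 1, element 2, element 19 (out of range)
by_position <- unname(map[ids])

# Character subscripts are matched against names(map)
by_name <- unname(map[as.character(ids)])

print(by_position)
print(by_name)
```

The first two positional results are wrong but look perfectly plausible, which is the silent failure the issue is about; only the out-of-range index gives a visible NA.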

Some orthologs are missing

Hi,

I am trying to use msigdbr for a GSEA analysis for the GENESET - HSF1_01 in MSigDB.

Now this geneset contains a gene SHFM3 in MSigDB but it is missing in your list of orthologs for the same geneset.

I did a search for this gene - https://uswest.ensembl.org/Multi/Search/Results?q=SHFM3;site=ensembl

And found out that this gene has an alias/synonym - FBXW4 (as shown here - > https://uswest.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000107829;r=10:101610664-101695295 )

This particular alias (FBXW4) does have ORTHOLOG information for mus musculus (Fbxw4) as shown at - https://uswest.ensembl.org/Homo_sapiens/Gene/Compara_Ortholog?db=core;g=ENSG00000107829;r=10:101610664-101695295

There are many such cases, and I was wondering whether that is intentional or could be fixed in future releases?

Much appreciated!

Ashu

getting error

Hello and thank you for your work,

I have this piece of code

library(msigdbr)

all_gene_sets <- msigdbr(species = "Mus musculus")
head(all_gene_sets)

but I am having the following error:

Error in parse(text = elt): <text>:1:5: unexpected symbol
1: Use of
        ^
Traceback:

1. msigdbr(species = "Mus musculus")
2. orthologs(genes = genesets_subset$human_ensembl_gene, species = species) %>% 
 .     select(-any_of(c("human_symbol", "human_entrez"))) %>% rename(human_ensembl_gene = .data$human_ensembl, 
 .     gene_symbol = .data$symbol, entrez_gene = .data$entrez, ensembl_gene = .data$ensembl, 
 .     ortholog_sources = .data$support, num_ortholog_sources = .data$support_n)
3. rename(., human_ensembl_gene = .data$human_ensembl, gene_symbol = .data$symbol, 
 .     entrez_gene = .data$entrez, ensembl_gene = .data$ensembl, 
 .     ortholog_sources = .data$support, num_ortholog_sources = .data$support_n)
4. rename.data.frame(., human_ensembl_gene = .data$human_ensembl, 
 .     gene_symbol = .data$symbol, entrez_gene = .data$entrez, ensembl_gene = .data$ensembl, 
 .     ortholog_sources = .data$support, num_ortholog_sources = .data$support_n)
5. tidyselect::eval_rename(expr(c(...)), .data)
6. rename_impl(data, names(data), as_quosure(expr, env), strict = strict, 
 .     name_spec = name_spec, allow_predicates = allow_predicates, 
 .     error_call = error_call)
7. eval_select_impl(x, names, {
 .     {
 .         sel
 .     }
 . }, strict = strict, name_spec = name_spec, type = "rename", allow_predicates = allow_predicates, 
 .     error_call = error_call)
8. with_subscript_errors(out <- vars_select_eval(vars, expr, strict = strict, 
 .     data = x, name_spec = name_spec, uniquely_named = uniquely_named, 
 .     allow_rename = allow_rename, allow_empty = allow_empty, allow_predicates = allow_predicates, 
 .     type = type, error_call = error_call), type = type)
9. try_fetch(expr, vctrs_error_subscript = function(cnd) {
 .     cnd$subscript_action <- subscript_action(type)
 .     cnd$subscript_elt <- "column"
 .     cnd_signal(cnd)
 . })
10. withCallingHandlers(expr, vctrs_error_subscript = function(cnd) {
  .     {
  .         .__handler_frame__. <- TRUE
  .         .__setup_frame__. <- frame
  .     }
  .     out <- handlers[[1L]](cnd)
  .     if (!inherits(out, "rlang_zap")) 
  .         throw(out)
  . })
11. vars_select_eval(vars, expr, strict = strict, data = x, name_spec = name_spec, 
  .     uniquely_named = uniquely_named, allow_rename = allow_rename, 
  .     allow_empty = allow_empty, allow_predicates = allow_predicates, 
  .     type = type, error_call = error_call)
12. walk_data_tree(expr, data_mask, context_mask)
13. eval_c(expr, data_mask, context_mask)
14. reduce_sels(node, data_mask, context_mask, init = init)
15. walk_data_tree(new, data_mask, context_mask)
16. expr_kind(expr, context_mask, error_call)
17. call_kind(expr, context_mask, error_call)
18. lifecycle::deprecate_soft("1.2.0", what, details = cli::format_inline("Please use {.code {str}} instead of `.data${var}`"), 
  .     user_env = env)
19. signal_stage("deprecated", what)
20. spec(what, env = env)
21. spec_what(spec, "spec", signaller)
22. parse_expr(what)
23. parse_exprs(x)
24. chr_parse_exprs(x)
25. map(x, function(elt) as.list(parse(text = elt)))
26. lapply(.x, .f, ...)
27. FUN(X[[i]], ...)
28. as.list(parse(text = elt))
29. parse(text = elt)

Could you provide help to solve this issue?
Thank you in advance

Ensembl Gene IDs

Are Ensembl gene sets supported?

I have just started using msigdbr and I cannot find any in the gene sets I have seen so far

Thanks!

Adding the "EXACT_SOURCE" column to the MsigDB C5 entries

Thanks for the very useful package,
would it be possible to add the EXACT_SOURCE attribute to GENESET record attributes for msigdb C5 gene sets? It would make it much easier to convert msigdb accession numbers into GO IDs. Thanks!

Run KEGG in Seurat object

@igordot @smped @vreuter @actions-user

Hello msigdbr team,

I am running GSEA analysis in 10X spatial and scRNA-seq data and I would like to use KEGG dataset.
Which function/category should I run?
For Hallmark, I run m_df<- msigdbr(species = "Homo sapiens", category = "H")

but category = "KEGG" does not work. I would greatly appreciate your advice.
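One possible way to approach this, sketched on a toy table rather than real msigdbr output: the subcategory value "CP:KEGG" is mentioned in another issue on this page, which suggests that KEGG sits under category C2 rather than being a category itself. The rows, set names, and gene assignments below are invented; only the column names gs_cat, gs_subcat, gs_name, and gene_symbol follow the msigdbr output shown earlier.

```r
# Toy stand-in for an msigdbr result (rows are invented for illustration)
gene_sets <- data.frame(
  gs_cat      = c("H", "C2", "C2", "C2", "C2"),
  gs_subcat   = c("", "CGP", "CP:KEGG", "CP:KEGG", "CP:KEGG"),
  gs_name     = c("HALLMARK_X", "SET_Y", "KEGG_SET_1", "KEGG_SET_1", "KEGG_SET_2"),
  gene_symbol = c("ABCA1", "ACLY", "ACO2", "ACOX1", "ADCY6"),
  stringsAsFactors = FALSE
)

# List the distinct category/subcategory pairs to see where KEGG lives
collections <- unique(gene_sets[, c("gs_cat", "gs_subcat")])
kegg_collections <- collections[grepl("KEGG", collections$gs_subcat), ]
print(kegg_collections)

# Keep the KEGG rows and turn them into a named list of gene vectors,
# a shape that many enrichment tools accept
kegg <- gene_sets[gene_sets$gs_cat == "C2" & gene_sets$gs_subcat == "CP:KEGG", ]
kegg_list <- split(kegg$gene_symbol, kegg$gs_name)
print(kegg_list)
```

With the real package, the equivalent call would presumably be msigdbr(species = "Homo sapiens", category = "C2", subcategory = "CP:KEGG"), assuming the installed version still uses that subcategory name.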

Thank you.

enricher result is different from msigDB web "investigate Gene Sets"

Hi,

Many thanks for the msigdbr package.
Can I ask a question about the result of enricher please?

msigdbr_t2g = msigdbr_df %>% dplyr::select(gs_name, gene_symbol) %>% as.data.frame()
enricher(gene = gene_symbols_vector, TERM2GENE = msigdbr_t2g, ...)

I am using the code above, but I've found that the enriched MSigDB signatures differ from those reported by "Investigate Gene Sets" on the MSigDB website. I thought the result was based on the number of genes shared between the user's genes and the genes in the gene set. But the overlap count from enricher seems smaller than the real overlap count (i.e. what I get if I use intersect to see how many genes are shared between my list and the MSigDB gene set). Did I misunderstand the function of enricher here? And if possible, how can I get the same results as the MSigDB website?
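For reference, here is a generic over-representation test written in base R. It is only a sketch of the usual hypergeometric calculation and is not claimed to be exactly what enricher or the MSigDB website does; the function name ora_pvalue and all gene names are made up. It shows how the choice of background universe can change both the number of query genes that count and the resulting p-value.

```r
# Toy inputs (all invented): a background universe, one gene set, a query list
universe <- paste0("G", 1:100)
gene_set <- paste0("G", 1:10)
query    <- paste0("G", c(1:5, 50:54, 200:204))  # G200..G204 are outside `universe`

# Hypothetical helper: hypergeometric over-representation test
ora_pvalue <- function(query, gene_set, universe) {
  # Genes that are not part of the universe are dropped before counting
  query    <- intersect(unique(query), universe)
  gene_set <- intersect(unique(gene_set), universe)
  k <- length(intersect(query, gene_set))  # overlap size
  m <- length(gene_set)                    # genes in the set
  n <- length(universe) - m                # genes not in the set
  # P(X >= k) when drawing length(query) genes from the universe
  p <- phyper(k - 1, m, n, length(query), lower.tail = FALSE)
  list(overlap = k, query_size = length(query), p_value = p)
}

res_small <- ora_pvalue(query, gene_set, universe)
res_large <- ora_pvalue(query, gene_set, c(universe, paste0("G", 101:20000)))
str(res_small)
str(res_large)
```

With the small universe, five query genes are discarded before the test; with the larger one they are kept and the p-value shrinks. Differences like this between tools are one plausible source of mismatched results.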

Thanks in advance!

Best,
Wei

Retrieve all C2 canonical pathways using option subcategory = "CP"?

Dear Igordot,

Thanks for this wonderful tool! I understand it can be used to retrieve subcategory pathways by setting subcategory = "CP:KEGG". But I was wondering if I can extract all canonical pathways as follows:

library(msigdbr)
m_df = msigdbr(species = "Homo sapiens", category = 'C2', subcategory = 'CP')
length(unique(m_df$gs_name))
[1] 29
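A possible workaround, sketched on a toy table: request the whole C2 category and then keep every row whose subcategory is "CP" or starts with "CP:". The toy rows below are invented, and the subcategory labels other than "CP" and "CP:KEGG" are assumptions used for illustration.

```r
# Toy stand-in for msigdbr(species = "Homo sapiens", category = "C2")
c2 <- data.frame(
  gs_subcat = c("CP", "CP:KEGG", "CP:REACTOME", "CP:WIKIPATHWAYS", "CGP"),
  gs_name   = c("SET_A", "KEGG_SET_B", "REACTOME_SET_C", "WP_SET_D", "SET_E"),
  stringsAsFactors = FALSE
)

# Exact match: only the sets labelled plain "CP"
exact_cp <- c2[c2$gs_subcat == "CP", ]

# Prefix match: plain "CP" plus every "CP:<source>" subcategory
all_cp <- c2[grepl("^CP(:|$)", c2$gs_subcat), ]

print(length(unique(exact_cp$gs_name)))
print(length(unique(all_cp$gs_name)))
```

The anchored pattern avoids accidentally matching unrelated subcategories such as "CGP".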

Looking forward to your comments!

Best,
Lei

`unused argument (.data$species_name == species)` error

Hi,
I've just got an unused argument (.data$species_name == species) error, and I don't know how to proceed. Is it a bug or am I doing something wrong?

> library(msigdbr)
> msigdbr(species = "Homo sapiens")
Error in filter.tbl(msigdbr_orthologs, .data$species_name == species) : 
  unused argument (.data$species_name == species)
> msigdbr(species = "Mus musculus", category = "C2", subcategory = "CGP")
Error in filter.tbl(msigdbr_orthologs, .data$species_name == species) : 
  unused argument (.data$species_name == species)
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /opt/R-4.0.2/lib64/R/lib/libRblas.so
LAPACK: /opt/R-4.0.2/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] grid      stats4    parallel  stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] msigdbr_7.2.1               DESeq2_1.28.1              
 [3] SummarizedExperiment_1.18.2 DelayedArray_0.14.1        
 [5] matrixStats_0.57.0          Biobase_2.48.0             
 [7] rtracklayer_1.48.0          genomation_1.20.0          
 [9] gProfileR_0.7.0             ChIPpeakAnno_3.22.4        
[11] Biostrings_2.56.0           XVector_0.28.0             
[13] VennDiagram_1.6.20          futile.logger_1.4.3        
[15] rGREAT_1.20.0               methylKit_1.14.2           
[17] GenomicRanges_1.40.0        GenomeInfoDb_1.24.2        
[19] IRanges_2.22.2              S4Vectors_0.26.1           
[21] BiocGenerics_0.34.0         gprofiler2_0.2.0           
[23] reshape2_1.4.4              ggplot2_3.3.2              
[25] gridExtra_2.3               data.table_1.13.0          
[27] biomaRt_2.44.4              igraph_1.2.6               
[29] STRINGdb_2.0.2             

loaded via a namespace (and not attached):
  [1] circlize_0.4.10          BiocFileCache_1.12.1     plyr_1.8.6              
  [4] lazyeval_0.2.2           splines_4.0.2            BiocParallel_1.22.0     
  [7] gridBase_0.4-7           digest_0.6.25            ensembldb_2.12.1        
 [10] htmltools_0.5.0          GO.db_3.11.4             magrittr_1.5            
 [13] memoise_1.1.0            BSgenome_1.56.0          limma_3.44.3            
 [16] annotate_1.66.0          readr_1.4.0              R.utils_2.10.1          
 [19] askpass_1.1              bdsmatrix_1.3-4          prettyunits_1.1.1       
 [22] colorspace_1.4-1         blob_1.2.1               rappdirs_0.3.1          
 [25] xfun_0.18                dplyr_1.0.2              crayon_1.3.4            
 [28] RCurl_1.98-1.2           jsonlite_1.7.1           graph_1.66.0            
 [31] genefilter_1.70.0        impute_1.62.0            survival_3.1-12         
 [34] glue_1.4.2               hash_2.2.6.1             gtable_0.3.0            
 [37] zlibbioc_1.34.0          seqinr_4.2-4             GetoptLong_1.0.3        
 [40] shape_1.4.5              scales_1.1.1             futile.options_1.0.1    
 [43] mvtnorm_1.1-1            DBI_1.1.0                Rcpp_1.0.5              
 [46] plotrix_3.7-8            xtable_1.8-4             viridisLite_0.3.0       
 [49] progress_1.2.2           emdbook_1.3.12           bit_4.0.4               
 [52] mclust_5.4.6             sqldf_0.4-11             htmlwidgets_1.5.2       
 [55] httr_1.4.2               gplots_3.1.0             RColorBrewer_1.1-2      
 [58] ellipsis_0.3.1           pkgconfig_2.0.3          XML_3.99-0.5            
 [61] R.methodsS3_1.8.1        farver_2.0.3             dbplyr_1.4.4            
 [64] locfit_1.5-9.4           tidyselect_1.1.0         labeling_0.3            
 [67] rlang_0.4.7              AnnotationDbi_1.50.3     munsell_0.5.0           
 [70] tools_4.0.2              gsubfn_0.7               generics_0.0.2          
 [73] RSQLite_2.2.1            ade4_1.7-15              fastseg_1.34.0          
 [76] evaluate_0.14            stringr_1.4.0            yaml_2.2.1              
 [79] knitr_1.30               bit64_4.0.5              caTools_1.18.0          
 [82] purrr_0.3.4              AnnotationFilter_1.12.0  RBGL_1.64.0             
 [85] formatR_1.7              R.oo_1.24.0              xml2_1.3.2              
 [88] compiler_4.0.2           rstudioapi_0.11          plotly_4.9.2.1          
 [91] curl_4.3                 png_0.1-7                geneplotter_1.66.0      
 [94] tibble_3.0.3             idr_1.2                  stringi_1.5.3           
 [97] GenomicFeatures_1.40.1   lattice_0.20-41          ProtGenerics_1.20.0     
[100] Matrix_1.2-18            multtest_2.44.0          vctrs_0.3.4             
[103] pillar_1.4.6             lifecycle_0.2.0          BiocManager_1.30.10     
[106] GlobalOptions_0.1.2      bitops_1.0-6             qvalue_2.20.0           
[109] R6_2.4.1                 KernSmooth_2.23-17       lambda.r_1.2.4          
[112] MASS_7.3-51.6            gtools_3.8.2             assertthat_0.2.1        
[115] chron_2.3-56             proto_1.0.0              openssl_1.4.3           
[118] rjson_0.2.20             withr_2.3.0              regioneR_1.20.1         
[121] GenomicAlignments_1.24.0 Rsamtools_2.4.0          GenomeInfoDbData_1.2.3  
[124] hms_0.5.3                tidyr_1.1.2              coda_0.19-4             
[127] rmarkdown_2.4            seqPattern_1.20.0        bbmle_1.0.23.1          
[130] numDeriv_2016.8-1.1      tinytex_0.26

Best,
Kasia

Inconsistent gene set contents with MSigDB

First, thanks for the great package! It's really convenient to be able to pull in these gene sets from MSigDB. I've been using it to pull gene sets for about a year now, and only recently noticed that some of the gene sets are different than what's on MSigDB (e.g., GOBP_Keratinization from msigdbr includes 279 genes, but on MSigDB it only has 83 genes).

I thought it might be a difference of versions (as msigdbr pulls MSigDB 7.5.1), but GOBP_Keratinization actually contains fewer genes in this version (n = 59): https://data.broadinstitute.org/gsea-msigdb/msigdb/release/7.5.1/c5.go.bp.v7.5.1.symbols.gmt

I used this line to pull all GO BP sets:

m_df_BP = msigdbr(species = "Homo sapiens",subcategory=c("BP"))

here is my session info:

R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] parallel stats4 stats graphics grDevices
[6] datasets utils methods base

other attached packages:
[1] scales_1.1.1 msigdbr_7.4.1
[3] biomartr_0.9.2 data.table_1.14.0
[5] GSEABase_1.54.0 graph_1.70.0
[7] annotate_1.70.0 XML_3.99-0.6
[9] reactome.db_1.76.0 GO.db_3.13.0
[11] fgsea_1.18.0 dplyr_1.0.7
[13] EnhancedVolcano_1.10.0 ggrepel_0.9.1
[15] rlist_0.4.6.1 pheatmap_1.0.12
[17] org.Hs.eg.db_3.13.0 AnnotationDbi_1.54.1
[19] readxl_1.3.1 ggplot2_3.3.5
[21] ashr_2.2-47 DESeq2_1.32.0
[23] SummarizedExperiment_1.22.0 Biobase_2.52.0
[25] MatrixGenerics_1.4.0 matrixStats_0.59.0
[27] GenomicRanges_1.44.0 GenomeInfoDb_1.28.1
[29] IRanges_2.26.0 S4Vectors_0.30.0
[31] BiocGenerics_0.38.0 rmarkdown_2.14
[33] here_1.0.1

loaded via a namespace (and not attached):
[1] snow_0.4-3 circlize_0.4.14
[3] fastmatch_1.1-3 BiocFileCache_2.0.0
[5] splines_4.1.0 BiocParallel_1.26.1
[7] digest_0.6.27 invgamma_1.1
[9] foreach_1.5.2 htmltools_0.5.2
[11] SQUAREM_2021.1 fansi_0.5.0
[13] magrittr_2.0.1 memoise_2.0.0
[15] cluster_2.1.2 doParallel_1.0.17
[17] ComplexHeatmap_2.8.0 Biostrings_2.60.1
[19] extrafont_0.17 extrafontdb_1.0
[21] prettyunits_1.1.1 colorspace_2.0-2
[23] rappdirs_0.3.3 blob_1.2.2
[25] xfun_0.30 crayon_1.4.1
[27] RCurl_1.98-1.3 genefilter_1.74.0
[29] survival_3.3-1 iterators_1.0.14
[31] glue_1.6.2 gtable_0.3.0
[33] zlibbioc_1.38.0 XVector_0.32.0
[35] GetoptLong_1.0.5 DelayedArray_0.18.0
[37] proj4_1.0-10.1 Rttf2pt1_1.3.9
[39] shape_1.4.6 maps_3.3.0
[41] DBI_1.1.1 Rcpp_1.0.7
[43] progress_1.2.2 xtable_1.8-4
[45] clue_0.3-60 bit_4.0.4
[47] truncnorm_1.0-8 httr_1.4.2
[49] RColorBrewer_1.1-2 ellipsis_0.3.2
[51] pkgconfig_2.0.3 farver_2.1.0
[53] dbplyr_2.1.1 locfit_1.5-9.4
[55] utf8_1.2.1 tidyselect_1.1.1
[57] labeling_0.4.2 rlang_0.4.11
[59] munsell_0.5.0 cellranger_1.1.0
[61] tools_4.1.0 cachem_1.0.5
[63] cli_3.3.0 generics_0.1.0
[65] RSQLite_2.2.7 evaluate_0.14
[67] stringr_1.4.0 fastmap_1.1.0
[69] yaml_2.2.1 babelgene_21.4
[71] knitr_1.33 bit64_4.0.5
[73] purrr_0.3.4 KEGGREST_1.32.0
[75] ash_1.0-15 ggrastr_0.2.3
[77] xml2_1.3.2 biomaRt_2.48.2
[79] compiler_4.1.0 rstudioapi_0.13
[81] filelock_1.0.2 curl_4.3.2
[83] beeswarm_0.4.0 png_0.1-8
[85] tibble_3.1.3 geneplotter_1.70.0
[87] stringi_1.7.3 highr_0.10
[89] ggalt_0.4.0 lattice_0.20-45
[91] Matrix_1.3-4 vctrs_0.3.8
[93] pillar_1.6.1 lifecycle_1.0.0
[95] BiocManager_1.30.16 GlobalOptions_0.1.2
[97] bitops_1.0-7 irlba_2.3.3
[99] R6_2.5.0 renv_0.15.4
[101] KernSmooth_2.23-20 gridExtra_2.3
[103] vipor_0.4.5 codetools_0.2-19
[105] MASS_7.3-55 assertthat_0.2.1
[107] rprojroot_2.0.2 rjson_0.2.21
[109] withr_2.4.2 GenomeInfoDbData_1.2.6
[111] hms_1.1.0 grid_4.1.0
[113] Cairo_1.5-12.2 mixsqp_0.3-43
[115] tinytex_0.37 ggbeeswarm_0.6.0

Problem with loading several categories

In our work we often want to test our gene lists against several categories of gene sets at once.
Until now we would load the gene sets like this:

msigdb.genes.sets <-msigdbr(species="Homo sapiens", category=c("H","C2"))

We noticed that in doing so, the gene sets are truncated, with a remaining number of genes in a gene set varying with the number of categories or their order.
After looking at the R code, it seems the problem is that the categories are filtered with "==" and not "%in%", which means we cannot pass a vector in our command. But no warning or error is thrown and everything downstream works, although the background ratio values are obviously wrong.
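The difference between the two operators can be reproduced in base R without the package. This is a sketch on an invented category column, not the package's actual filtering code:

```r
# Invented category column standing in for gs_cat
gs_cat <- c("H", "H", "C2", "C2", "C5", "C5")
wanted <- c("H", "C2")

# `==` recycles `wanted` along gs_cat: H==H, H==C2, C2==H, C2==C2, C5==H, C5==C2
keep_eq <- gs_cat == wanted

# `%in%` tests each element of gs_cat for membership in `wanted`
keep_in <- gs_cat %in% wanted

print(gs_cat[keep_eq])
print(gs_cat[keep_in])
```

Because the column length here is a multiple of length(wanted), R recycles silently and gives no warning, and reversing the order of wanted changes which rows survive.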

Would it be possible to correct this or to forbid requesting more than one category in the command?

Update to MSIGDB

Hello!

I was wondering if there are plans to synchronize msigdbr with the latest release of MSigDB (Aug 2019)? The new MSigDB has added and removed hundreds of gene sets, so I've been finding that the information pages for most of my top GSEA hits using msigdbr annotations no longer exist.

Thank you for your time!
Best,
Henry

No gene sets from KEGG, REACTOME or BIOCARTA

It looks like it's no longer possible to get gene sets from KEGG, REACTOME or BIOCARTA:

c2_reactome <- msigdbr(category = "C2", subcategory = "REACTOME") %>%
  split(x = .$gene_symbol, f = .$gs_name)
> length(c2_reactome)
[1] 0

Can these be restored? Thank you.

Methodology details, and `write.gmt` helper functions?

Hi, I came across your package, which could potentially save me a lot of work, so thank you.

Could you publish the details of your methods for converting from human to other species? I need this information in order to be able to cite you in my research.

Also, would you consider adding helper functions to convert the data.frame output to a type that can easily be written as a .gmt pathway file?
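In the meantime, something along these lines might work. It is a sketch, not part of msigdbr: the function name write_gmt is hypothetical, the toy table is invented, and "NA" is used as a placeholder in the description field. It assumes the usual GMT layout of one tab-separated line per gene set: name, description, then the genes.

```r
# Hypothetical helper: write a long-format gene set table as a GMT file
write_gmt <- function(df, path, name_col = "gs_name", gene_col = "gene_symbol") {
  # One character vector of genes per gene set (sets come out sorted by name)
  genes_by_set <- split(df[[gene_col]], df[[name_col]])
  lines <- vapply(names(genes_by_set), function(set_name) {
    # Fields: set name, placeholder description, then the unique genes
    paste(c(set_name, "NA", unique(genes_by_set[[set_name]])), collapse = "\t")
  }, character(1))
  writeLines(lines, path)
  invisible(path)
}

# Toy table using the gs_name and gene_symbol column names from msigdbr output
toy <- data.frame(
  gs_name     = c("SET_A", "SET_A", "SET_B", "SET_B", "SET_B"),
  gene_symbol = c("G1", "G2", "G2", "G3", "G4"),
  stringsAsFactors = FALSE
)

gmt_path <- tempfile(fileext = ".gmt")
write_gmt(toy, gmt_path)
print(readLines(gmt_path))
```

Passing gene_col = "entrez_gene" would write Entrez IDs instead of symbols, assuming that column is present in the table.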

Add shorter GO descriptions?

The entries in the gs_description column for GO terms are rather long and not ideal for use as human-readable identifiers when plotting ORA or GSEA results. Would it be possible to add a gs_brief_description column that uses the names from the appropriate GO database release? I have been getting the data using the code below and then left-joining it to ORA and GSEA results tables made with fgsea. For other databases, I just use the entries in gs_description.

# install.packages(c("ontologyIndex", "dplyr"))
library(ontologyIndex)
library(dplyr)

# Brief GO term descriptions (use same data from MSigDB release notes)
file <- "http://release.geneontology.org/2021-12-15/ontology/go-basic.obo"
go_basic_list <- get_OBO(file,
                         propagate_relationships = "is_a",
                         extract_tags = "minimal")

# Convert to data.frame with fewer columns
go_basic_df <- as.data.frame(go_basic_list) %>%
  filter(!obsolete) %>%
  select(pathway = id, name)

2023 update?

Thank you for developing this useful tool. Do you have any plans to update it based on the 2023 release of MSigDB?

Problem with dplyr dependency (I think)

I am getting this error when trying to use msigdbr:

> msigdbr(species = "Homo sapiens")
Error in `select()`:
! <text>:1:5: unexpected symbol
1: Use of
        ^
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<simpleError in select(., .data$human_ensembl_gene, gene_symbol = .data$human_gene_symbol,     entrez_gene = .data$human_entrez_gene): <text>:1:5: unexpected symbol
1: Use of
        ^>

session info:

> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] EnrichmentBrowser_2.26.0    graph_1.74.0               
 [3] SummarizedExperiment_1.26.1 Biobase_2.56.0             
 [5] GenomicRanges_1.48.0        GenomeInfoDb_1.32.4        
 [7] IRanges_2.30.1              S4Vectors_0.34.0           
 [9] BiocGenerics_0.42.0         MatrixGenerics_1.8.1       
[11] matrixStats_0.63.0          msigdbr_7.5.1              
[13] fgsea_1.22.0                biomaRt_2.52.0             
[15] dplyr_1.0.10                clusterProfiler_4.4.4

Any ideas...?
