Comments (12)
Thanks a lot, Denes, for this; I think it is helpful for the whole community! For the Bioconductor release, the newest version is currently only 3.4.0, whereas you mentioned 3.4.3, so I think installation from GitHub via devtools::install_github('saezlab/OmnipathR') is necessary, and it worked like a charm.
from decoupler-py.
Most of the mouse database knowledge is orthology translated from human, and I believe MSigDB is no different. They write here:
an orthology converted version of these sets is being provided here to allow analysis in the mouse gene-space alongside other, mouse-native, sets
However, they don't say which ones are the mouse-native sets. I think M1 and M8 definitely are, but the rest are more likely to be orthology translated, either by MSigDB or by its primary resources.
The two options here:
- Load the human MSigDB and translate to mouse by orthology as shown in my first comment.
- Using our database builder module pypath, process the MSigDB mouse data and write a little custom code to extract the desired data frame from the dictionaries provided by pypath. Something like this:
from pypath.inputs import msigdb
import pandas as pd
msigdb_mouse = msigdb.msigdb_annotations(organism = 'mouse')
msigdb_mouse_df = pd.DataFrame(
    [(k,) + a for k, v in msigdb_mouse.items() for a in v],
    columns = ['uniprot', 'collection', 'geneset']
)
msigdb_mouse_df
uniprot collection geneset
0 Q9WVC6 mirna_targets_mirdb MIR_322_5P
1 Q9WVC6 mirna_targets_mirdb MIR_497A_5P
2 Q9WVC6 chemical_and_genetic_perturbations CADWELL_ATG16L1_TARGETS_DN
3 Q9WVC6 chemical_and_genetic_perturbations LEIN_OLIGODENDROCYTE_MARKERS
4 Q9WVC6 chemical_and_genetic_perturbations GRAESSMANN_APOPTOSIS_BY_SERUM_DEPRIVATION_UP
... ... ... ...
568859 Q80T03 reactome_pathways REACTOME_O_LINKED_GLYCOSYLATION
568860 Q80T03 reactome_pathways REACTOME_TERMINATION_OF_O_GLYCAN_BIOSYNTHESIS
568861 Q80T03 reactome_pathways REACTOME_O_LINKED_GLYCOSYLATION_OF_MUCINS
568862 Q80T03 reactome_pathways REACTOME_POST_TRANSLATIONAL_PROTEIN_MODIFICATION
568863 Q80T03 reactome_pathways REACTOME_METABOLISM_OF_PROTEINS
[568864 rows x 3 columns]
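The list comprehension above works because each annotation record is a tuple, so `(k,) + a` prepends the UniProt ID to it. A minimal sketch with toy data, assuming the dictionary maps each UniProt ID to a set of named tuples with `collection` and `geneset` fields; the `Annotation` type and the helper `flatten_annotations` are stand-ins made up for illustration, not pypath's actual objects:

```python
from collections import namedtuple

# Hypothetical stand-in for the records in the dictionary returned by pypath.
Annotation = namedtuple('Annotation', ['collection', 'geneset'])

toy = {
    'Q9WVC6': {
        Annotation('mirna_targets_mirdb', 'MIR_322_5P'),
        Annotation('mirna_targets_mirdb', 'MIR_497A_5P'),
    },
    'Q80T03': {
        Annotation('reactome_pathways', 'REACTOME_METABOLISM_OF_PROTEINS'),
    },
}


def flatten_annotations(annotations):
    """Return one (uniprot, collection, geneset) row per annotation record."""
    return sorted(
        # a named tuple is a tuple, so it can be concatenated with (uniprot,)
        (uniprot,) + tuple(record)
        for uniprot, records in annotations.items()
        for record in records
    )


rows = flatten_annotations(toy)
for row in rows:
    print(row)
```

The rows are sorted only to make the toy output deterministic, since sets have no fixed order.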
If you prefer gene symbols instead of UniProts, use the msigdb_download_collections function:
from pypath.inputs import msigdb
import pandas as pd
msigdb_mouse_raw = msigdb.msigdb_download_collections(organism = 'mouse')
msigdb_mouse_raw_df = pd.DataFrame(
    [
        (collname, collcode, gset, gene)
        for (collname, collcode), coll in msigdb_mouse_raw.items()
        for gset, genes in coll.items()
        for gene in genes
    ],
    columns = ['collection', 'code', 'geneset', 'genesymbol']
)
msigdb_mouse_raw_df
collection code geneset genesymbol
0 hallmark mh.all HALLMARK_TNFA_SIGNALING_VIA_NFKB Dusp1
1 hallmark mh.all HALLMARK_TNFA_SIGNALING_VIA_NFKB Tnfaip3
2 hallmark mh.all HALLMARK_TNFA_SIGNALING_VIA_NFKB Sqstm1
3 hallmark mh.all HALLMARK_TNFA_SIGNALING_VIA_NFKB Rcan1
4 hallmark mh.all HALLMARK_TNFA_SIGNALING_VIA_NFKB Egr2
... ... ... ... ...
667940 cell_type_signatures m8.all TABULA_MURIS_SENISTRACHEA_SMOOTH_MUSCLE_CELL_O... S100a1
667941 cell_type_signatures m8.all TABULA_MURIS_SENISTRACHEA_SMOOTH_MUSCLE_CELL_O... Jund
667942 cell_type_signatures m8.all TABULA_MURIS_SENISTRACHEA_SMOOTH_MUSCLE_CELL_O... Msn
667943 cell_type_signatures m8.all TABULA_MURIS_SENISTRACHEA_SMOOTH_MUSCLE_CELL_O... Tle5
667944 cell_type_signatures m8.all TABULA_MURIS_SENISTRACHEA_SMOOTH_MUSCLE_CELL_O... Dcxr
[667945 rows x 4 columns]
Note: by default the c5 or m5 geneset collections (mostly gene ontology) are disabled, see the exclude argument. MSigDB recently changed a few things on their web page, and until now the pypath.inputs.msigdb module didn't explicitly support mouse. Hence I had to update the code in pypath. For this reason, the example above works only with the current head of the master branch (v0.14.17):
pip3 install 'git+https://github.com/saezlab/pypath.git'
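To use the gene symbol table as a gene set network, it probably has to be narrowed down to one collection and deduplicated first. A minimal sketch with a toy data frame, assuming the column layout of the gene symbol example above; the helper name `to_net`, the placeholder set `EXAMPLE_CELL_TYPE_SET` and the `source`/`target` column names are assumptions for illustration, so check which column names your decoupler version expects:

```python
import pandas as pd

# Toy rows in the same layout as the gene symbol data frame above.
# EXAMPLE_CELL_TYPE_SET is a made-up placeholder, not a real MSigDB set.
toy = pd.DataFrame(
    [
        ('hallmark', 'mh.all', 'HALLMARK_TNFA_SIGNALING_VIA_NFKB', 'Dusp1'),
        ('hallmark', 'mh.all', 'HALLMARK_TNFA_SIGNALING_VIA_NFKB', 'Dusp1'),
        ('hallmark', 'mh.all', 'HALLMARK_TNFA_SIGNALING_VIA_NFKB', 'Tnfaip3'),
        ('cell_type_signatures', 'm8.all', 'EXAMPLE_CELL_TYPE_SET', 'Jund'),
    ],
    columns = ['collection', 'code', 'geneset', 'genesymbol'],
)


def to_net(df, collection):
    """Keep one collection and return unique gene set to gene pairs."""
    net = df.loc[df['collection'] == collection, ['geneset', 'genesymbol']]
    net = net.drop_duplicates().reset_index(drop = True)
    # Assumed naming: gene set as 'source', member gene as 'target'.
    return net.rename(columns = {'geneset': 'source', 'genesymbol': 'target'})


hallmark_net = to_net(toy, 'hallmark')
print(hallmark_net)
```

Dropping duplicate pairs matters because the same gene can be listed more than once for a set after merging or translating tables.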
What's your Python version? Not a downgrade, but an upgrade should be necessary. Currently 3.9 is the minimum required version for pypath.
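The version constraint can be checked before attempting the installation. A minimal sketch; the helper name `python_supported` is made up for illustration, and the bounds are the ones quoted in the pip error message in this thread (`<4.0,>=3.9`):

```python
import sys


def python_supported(version = None, minimum = (3, 9), below = (4, 0)):
    """Return True if minimum <= (major, minor) < below.

    With no argument, the running interpreter's version is checked.
    """
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return minimum <= major_minor < below


print(sys.version_info[:3], python_supported())
```

For example, `python_supported((3, 8, 13))` is false, which matches the error reported for Python 3.8.13.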
Hi,
PROGENy and other annotation resources are not yet available for organisms other than human. However, you can easily translate them by orthology. Running this for the first time might take a long time, as it requires many downloads from Ensembl, HomoloGene and UniProt. Subsequent runs work from the cache and take only a few seconds. Here we use the modules pypath and omnipath, which can be installed with pip:
pip install 'git+https://github.com/saezlab/omnipath.git'
pip install 'git+https://github.com/saezlab/pypath.git'
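import omnipath
from pypath.utils import homology, mapping
# download the human PROGENy annotations in wide format
progeny = omnipath.requests.Annotations.get(resources = 'PROGENy', wide = True)
# translate each human UniProt ID to mouse (NCBI Taxonomy ID 10090)
progeny['mouse_uniprot'] = [homology.translate(u, 10090) for u in progeny.uniprot]
# one row per mouse ortholog
progeny = progeny.explode('mouse_uniprot')
# look up the gene symbol for each mouse UniProt ID
progeny['mouse_genesymbol'] = [mapping.label(u, ncbi_tax_id = 10090) for u in progeny.mouse_uniprot]
progeny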
# uniprot genesymbol entity_type p_value pathway weight mouse_uniprot mouse_genesymbol
# 0 P35250 RFC2 protein 0.624086 Trail -0.800677 Q9WUK4 Rfc2
# 1 P35250 RFC2 protein 0.000704 Hypoxia -2.049501 Q9WUK4 Rfc2
# 2 P35250 RFC2 protein 0.001655 EGFR 1.470647 Q9WUK4 Rfc2
# 3 P35250 RFC2 protein 0.833456 TNFa -0.124993 Q9WUK4 Rfc2
# 4 P35250 RFC2 protein 0.630460 TGFb -0.430508 Q9WUK4 Rfc2
# ... ... ... ... ... ... ... ... ...
# 233402 Q96A11 GAL3ST3 protein 0.236295 PI3K -0.228038 P61315 Gal3st3
# 233403 Q96A11 GAL3ST3 protein 0.705764 JAK-STAT 0.052601 P61315 Gal3st3
# 233404 Q96A11 GAL3ST3 protein 0.575544 EGFR 0.070407 P61315 Gal3st3
# 233405 Q96A11 GAL3ST3 protein 0.988972 Trail -0.005215 P61315 Gal3st3
# 233406 Q96A11 GAL3ST3 protein 0.607089 Hypoxia 0.063501 P61315 Gal3st3
#
# [237671 rows x 8 columns]
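The `explode` step is there because one human protein can have several mouse orthologs, or none at all. A minimal sketch with toy data, assuming the translation step yields a list of IDs per row; the IDs below are made-up placeholders, not real UniProt entries:

```python
import pandas as pd

# Placeholder IDs for illustration: one, two and zero orthologs respectively.
toy = pd.DataFrame({
    'uniprot': ['P00001', 'P00002', 'P00003'],
    'mouse_uniprot': [['M00001'], ['M00002', 'M00003'], []],
})

# One row per ortholog; an empty list becomes a single row with NaN.
exploded = toy.explode('mouse_uniprot')

# Optionally drop the rows that have no ortholog.
translated = exploded.dropna(subset = ['mouse_uniprot']).reset_index(drop = True)
print(translated)
```

Whether to drop the untranslated rows depends on the analysis; keeping them makes it easier to count how many human entries have no ortholog.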
I hope this helps.
Best,
Denes
Thanks a lot, that should indeed help me.
I tried to run your example but it throws an error.
I installed pypath-omnipath via pip and everything is fine.
But when I want to import via:
from pypath.utils import homology,mapping
I have an error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [2], in <module>
1 import omnipath
----> 2 from pypath.utils import homology,mapping
File ~/opt/miniconda3/envs/scanpy/lib/python3.9/site-packages/pypath/utils/homology.py:41, in <module>
37 import pickle
39 import timeloop
---> 41 import pypath.utils.mapping as mapping
42 import pypath.share.common as common
43 import pypath.internals.intera as intera
File ~/opt/miniconda3/envs/scanpy/lib/python3.9/site-packages/pypath/utils/mapping.py:73, in <module>
71 import pypath.inputs.uniprot as uniprot_input
72 import pypath.inputs.pro as pro_input
---> 73 import pypath.inputs.biomart as biomart_input
74 import pypath.inputs.unichem as unichem_input
75 import pypath.internals.input_formats as input_formats
File ~/opt/miniconda3/envs/scanpy/lib/python3.9/site-packages/pypath/inputs/biomart.py:36, in <module>
34 import pypath.share.curl as curl
35 import pypath.resources.urls as urls
---> 36 import pypath.utils.taxonomy as taxonomy
38 _logger = session_mod.Logger(name = 'biomart_input')
41 # for mouse homologues: Filter name = "with_mmusculus_homolog"
File ~/opt/miniconda3/envs/scanpy/lib/python3.9/site-packages/pypath/utils/taxonomy.py:88, in <module>
49 # XXX: Shouldn't we keep all functions and variables separated
50 # (together among them)?
51 taxids = {
52 9606: 'human',
53 10090: 'mouse',
(...)
80 9544: 'rhesus macaque',
81 }
83 taxids2 = dict(
84 (
85 t.taxon_id,
86 t.common_name.lower()
87 )
---> 88 for t in ensembl_input.ensembl_organisms()
89 )
91 taxa = common.swap_dict_simple(taxids)
92 taxa2 = common.swap_dict_simple(taxids2)
File ~/opt/miniconda3/envs/scanpy/lib/python3.9/site-packages/pypath/inputs/ensembl.py:52, in ensembl_organisms()
49 c = curl.Curl(url)
50 soup = bs4.BeautifulSoup(c.result, 'html.parser')
---> 52 for r in soup.find('table').find_all('tr'):
54 if not record:
56 record = collections.namedtuple(
57 'EnsemblOrganism',
58 [c.text.lower().replace(' ', '_') for c in r] +
59 ['ensembl_name']
60 )
AttributeError: 'NoneType' object has no attribute 'find_all'
I tried to clear the cache in ~/.pypath and rerun but no effect.
Did you already see this error?
Thanks
It seems the problem is from the Ensembl website. The server at https://www.ensembl.org/info/about/species.html is down and so the package cannot be loaded.
Yes, the Ensembl server is having issues today; it's up again now but still slow. Unfortunately, without Ensembl the homology translation won't work, but this kind of error doesn't happen often. Ensembl has 4 mirrors; maybe I will later add an option for choosing a mirror.
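Such a mirror option could amount to trying a list of hosts in turn. A minimal sketch, not pypath's implementation: the function name `fetch_from_mirrors`, the `fetch` callback and the mirror host names are assumptions for illustration, so verify the hosts against the Ensembl documentation before relying on them:

```python
# Assumed host names; check Ensembl's own list of mirrors.
MIRRORS = ('www.ensembl.org', 'useast.ensembl.org', 'asia.ensembl.org')
PATH = '/info/about/species.html'


def fetch_from_mirrors(fetch, mirrors = MIRRORS, path = PATH):
    """Return (url, content) from the first mirror where fetch succeeds.

    fetch is any callable that takes a URL and returns the page content;
    a raised exception or an empty result counts as a failure.
    """
    errors = []
    for host in mirrors:
        url = 'https://%s%s' % (host, path)
        try:
            content = fetch(url)
        except Exception as e:
            errors.append((url, repr(e)))
            continue
        if content:
            return url, content
        errors.append((url, 'empty response'))
    raise RuntimeError('All mirrors failed: %r' % errors)


def fake_fetch(url):
    """Stand-in downloader: the main site is down, the mirrors respond."""
    if '//www.' in url:
        raise ConnectionError('server down')
    return '<table></table>'


url, page = fetch_from_mirrors(fake_fetch)
print(url)
```

Passing the downloader in as a callable keeps the sketch independent of any particular HTTP library and makes it testable offline.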
Hi, is there also an R-only solution to create all necessary databases and networks for the rat genome? pypath has an issue it seems, and cannot be installed via python, and I generally like to stay within R ;). Thanks!
Hi @chrarnold ,
In R it could look like this:
library(OmnipathR)
library(dplyr)
progeny <- import_omnipath_annotations(resources = 'PROGENy', wide = TRUE)
human_rat <- homologene_uniprot_orthology(target = 10116L, by = genesymbol)
progeny_rat <-
    progeny %>%
    inner_join(human_rat, by = c('uniprot' = 'source')) %>%
    mutate(uniprot = target) %>%
    select(-target, -genesymbol) %>%
    translate_ids(uniprot, genesymbol, organism = 10116L) %>%
    relocate(genesymbol, .after = uniprot)
progeny_rat
# A tibble: 86,038 × 6
uniprot genesymbol entity_type pathway weight p_value
<chr> <chr> <chr> <chr> <dbl> <dbl>
1 Q641W4 Rfc2 protein Hypoxia -2.05 7.04e- 4
2 Q641W4 Rfc2 protein TGFb -0.431 6.30e- 1
3 Q641W4 Rfc2 protein NFkB -0.410 3.72e- 1
4 Q641W4 Rfc2 protein p53 -3.35 9.86e- 4
5 Q641W4 Rfc2 protein TNFa -0.125 8.33e- 1
6 Q641W4 Rfc2 protein EGFR 1.47 1.66e- 3
7 Q641W4 Rfc2 protein Trail -0.801 6.24e- 1
8 Q641W4 Rfc2 protein JAK-STAT 0.00122 9.98e- 1
9 Q641W4 Rfc2 protein MAPK 2.28 3.32e-11
10 Q641W4 Rfc2 protein VEGF -0.157 8.48e- 1
# … with 86,028 more rows
If you already have OmnipathR installed, please update it to the most recent version (3.4.3 or 3.5.6): due to the recent UniProt URL and API update, the above example won't work with earlier versions.
Best,
Denes
Is there a way to add the mouse MSigDB database as a resource to pull from with the dc.get_resource function? It exists on the GSEA site but is not built into the wrapper. If this is not possible, how can I load it in myself? I am trying to run a functional enrichment analysis of biological terms. Thanks!
Downloading pypath with the code above, I received an error:
ERROR: Package 'pypath-omnipath' requires a different Python: 3.8.13 not in '<4.0,>=3.9'
Note: you may need to restart the kernel to use updated packages.
Is there a way to use this package without having to downgrade my Python?
Closing this issue since it is now implemented as the function translate_net in decoupler-1.3.0.
Here is a vignette showcasing how to do it: https://decoupler-py.readthedocs.io/en/latest/notebooks/translate.html