Comments (4)
I am not quite sure whether this approach is stable enough, but I found that we can search for the synonym in the synonym field of the external_synonym table, which also contains an xref_id field. This xref_id can then be matched to the display_xref_id field of the gene table.
See the example below for the aforementioned WISP2 gene. This returns the "correct" gene, CCN5, for which WISP2 is a synonym.
MySQL [homo_sapiens_core_109_38]> select * from external_synonym where synonym="WISP2";
+---------+---------+
| xref_id | synonym |
+---------+---------+
| 2781993 | WISP2   |
+---------+---------+
1 row in set (0.083 sec)
MySQL [homo_sapiens_core_109_38]> select gene_id,stable_id from gene where display_xref_id=2781993;
+---------+-----------------+
| gene_id | stable_id       |
+---------+-----------------+
|  106801 | ENSG00000064205 |
+---------+-----------------+
1 row in set (0.109 sec)
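The two lookups above can be folded into a single join. The following is a minimal sketch, not gget's implementation: it uses Python's sqlite3 module with an in-memory toy copy of the two tables, holding only the columns and the WISP2 row shown above, so it runs without access to the Ensembl MySQL server. The function name find_genes_by_synonym is hypothetical.

```python
import sqlite3

# Toy in-memory stand-in for the two Ensembl core tables queried above.
# Only the columns and the single WISP2 row from those queries are included.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE external_synonym (xref_id INTEGER, synonym TEXT);
    CREATE TABLE gene (gene_id INTEGER, stable_id TEXT, display_xref_id INTEGER);
    INSERT INTO external_synonym VALUES (2781993, 'WISP2');
    INSERT INTO gene VALUES (106801, 'ENSG00000064205', 2781993);
    """
)


def find_genes_by_synonym(connection, synonym):
    """Return (gene_id, stable_id) rows whose display xref carries `synonym`."""
    query = """
        SELECT g.gene_id, g.stable_id
        FROM external_synonym AS s
        JOIN gene AS g ON g.display_xref_id = s.xref_id
        WHERE s.synonym = ?
    """
    return connection.execute(query, (synonym,)).fetchall()


print(find_genes_by_synonym(conn, "WISP2"))
# [(106801, 'ENSG00000064205')]
```

The same SELECT ... JOIN statement should work unchanged against the MySQL database, with the placeholder style adapted to whichever MySQL driver is used.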
I hope this helps.
from gget.
I played around with this approach a bit and now I am unsure whether it should be used.
Searching the external_synonym table for the gene CCR9 (ENSG00000173585) returns the xref_id 2766033.
But this xref_id links to the gene ACKR2 (ENSG00000144648), which is not CCR9 itself but a gene that lists CCR9 as one of its synonyms.
MySQL [homo_sapiens_core_109_38]> select * from external_synonym where synonym="CCR9";
+---------+---------+
| xref_id | synonym |
+---------+---------+
| 2766033 | CCR9    |
+---------+---------+
1 row in set (0.029 sec)
MySQL [homo_sapiens_core_109_38]> select gene_id,stable_id from gene where display_xref_id=2766033;
+---------+-----------------+
| gene_id | stable_id       |
+---------+-----------------+
|  124089 | ENSG00000144648 |
+---------+-----------------+
1 row in set (0.058 sec)
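One way to make this ambiguity visible is to report whether a hit came from a gene's own display name or from a synonym, and to rank name matches first. The sketch below assumes that the display name is stored in the display_label column of the xref table, which is not shown in this thread, and again uses a toy in-memory sqlite3 database. The gene_id and xref_id values for the real CCR9 gene are placeholders, since they were not looked up above, and search_symbol is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE xref (xref_id INTEGER, display_label TEXT);
    CREATE TABLE external_synonym (xref_id INTEGER, synonym TEXT);
    CREATE TABLE gene (gene_id INTEGER, stable_id TEXT, display_xref_id INTEGER);

    -- ACKR2: IDs taken from the queries above.
    INSERT INTO xref VALUES (2766033, 'ACKR2');
    INSERT INTO external_synonym VALUES (2766033, 'CCR9');
    INSERT INTO gene VALUES (124089, 'ENSG00000144648', 2766033);

    -- CCR9: stable ID from the thread; gene_id 1 and xref_id 1 are placeholders.
    INSERT INTO xref VALUES (1, 'CCR9');
    INSERT INTO gene VALUES (1, 'ENSG00000173585', 1);
    """
)


def search_symbol(connection, symbol):
    """Return (stable_id, display_label, match_type) rows, name matches first."""
    query = """
        SELECT g.stable_id AS stable_id, x.display_label AS display_label,
               'name' AS match_type, 0 AS priority
        FROM gene AS g
        JOIN xref AS x ON x.xref_id = g.display_xref_id
        WHERE x.display_label = :symbol
        UNION ALL
        SELECT g.stable_id AS stable_id, x.display_label AS display_label,
               'synonym' AS match_type, 1 AS priority
        FROM gene AS g
        JOIN xref AS x ON x.xref_id = g.display_xref_id
        JOIN external_synonym AS s ON s.xref_id = g.display_xref_id
        WHERE s.synonym = :symbol
        ORDER BY priority, stable_id
    """
    rows = connection.execute(query, {"symbol": symbol}).fetchall()
    # Drop the helper priority column before returning.
    return [row[:3] for row in rows]


for row in search_symbol(conn, "CCR9"):
    print(row)
# ('ENSG00000173585', 'CCR9', 'name')
# ('ENSG00000144648', 'ACKR2', 'synonym')
```

Labelling each row with its match type keeps the synonym hit available without letting it masquerade as the gene that was actually searched for.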
Another possible approach could be to search the gene_attrib table, as shown below for the WISP2 gene.
This retrieves the gene_id directly instead of the xref_id.
MySQL [homo_sapiens_core_109_38]> select * from gene_attrib where value="WISP2";
+---------+----------------+-------+
| gene_id | attrib_type_id | value |
+---------+----------------+-------+
|  106801 |              4 | WISP2 |
+---------+----------------+-------+
1 row in set (0.075 sec)
MySQL [homo_sapiens_core_109_38]> select gene_id,stable_id from gene where gene_id=106801;
+---------+-----------------+
| gene_id | stable_id       |
+---------+-----------------+
|  106801 | ENSG00000064205 |
+---------+-----------------+
1 row in set (0.028 sec)
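As with the synonym lookup, the two gene_attrib queries can be combined into one join. This is again only a sketch on a toy in-memory sqlite3 database holding the WISP2 row shown above, and genes_by_attrib_value is a hypothetical helper name. It returns attrib_type_id alongside the gene because a bare match on value does not say which kind of attribute was hit.

```python
import sqlite3

# Toy in-memory copy of the two tables, with only the row shown above.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE gene_attrib (gene_id INTEGER, attrib_type_id INTEGER, value TEXT);
    CREATE TABLE gene (gene_id INTEGER, stable_id TEXT);
    INSERT INTO gene_attrib VALUES (106801, 4, 'WISP2');
    INSERT INTO gene VALUES (106801, 'ENSG00000064205');
    """
)


def genes_by_attrib_value(connection, value):
    """Return (gene_id, stable_id, attrib_type_id) rows matching `value`."""
    query = """
        SELECT g.gene_id, g.stable_id, a.attrib_type_id
        FROM gene_attrib AS a
        JOIN gene AS g ON g.gene_id = a.gene_id
        WHERE a.value = ?
    """
    return connection.execute(query, (value,)).fetchall()


print(genes_by_attrib_value(conn, "WISP2"))
# [(106801, 'ENSG00000064205', 4)]
```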
from gget.
Hi Samuel,
Thank you for your suggestion. The new release of gget, v0.27.9, now also searches Ensembl synonyms (in addition to gene descriptions and names) to return more comprehensive search results. You can install the new version using the command
pip install --upgrade gget
Let us know if there are any issues.
from gget.
Hi Samuel,
I agree that it would be great to expand gget search to also include results based on synonyms. Unfortunately, the Ensembl SQL database does not include a synonyms field (which is why I am getting the synonyms from UniProt for gget info). Their website search does not use the publicly accessible SQL database, so it is extremely difficult to reproduce all of the results that a website search would return (hence the disclaimer on searching name and description only). I agree, though, that this is not optimal, and I will look for a workaround when I have some time.
from gget.