biolink / ontobio
Python library for working with ontologies and ontology associations
Home Page: https://ontobio.readthedocs.io/en/latest/
License: BSD 3-Clause "New" or "Revised" License
Add a new method called something like parse_iter; this should yield results, returning a generator. Code can then call:
for a in p.parse_iter():
    ...
and have the file be lazily parsed.
Note: the current implementation of parse() should be moved to the iter method, yields added, and then:
def parse(self, ...):
    return list(self.parse_iter(...
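A minimal sketch of the refactor (the real parse_line signature and return value may differ):

class AssocParser:
    def parse_iter(self, file):
        """Lazily yield associations one line at a time."""
        for line in file:
            # assumes parse_line returns (parsed_line, new_assocs),
            # as in the entityparser traceback further below
            parsed_line, new_assocs = self.parse_line(line)
            for assoc in new_assocs:
                yield assoc

    def parse(self, file):
        """Eager wrapper kept for backward compatibility."""
        return list(self.parse_iter(file))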
A GAF file should produce a graph URI based on the group name. So for GAF group 'goa' and dataset 'chicken_complex', produce the URI http://www.ebi.ac.uk/GOA/chicken_complex/.
For a GAF line, we can produce a unique id with a hash function on that line, after any "piped" columns are sorted (alphanumerically?). Then for an instance of a term in a column of the GAF line, we use the id of the final identifier: {graphUri}/{lineHash}/{termId}.
For example, for goa chicken_complex:
http://www.ebi.ac.uk/GOA/chicken_complex/123abc/GO:0016021
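A minimal sketch of how the line hash could be computed (the hash function and truncation are illustrative assumptions, as is the function name):

import hashlib

def instance_uri(graph_uri, gaf_line, term_id):
    """Sort any piped columns, hash the normalized line, append the term id."""
    cols = gaf_line.rstrip("\n").split("\t")
    normalized = "\t".join(
        "|".join(sorted(c.split("|"))) if "|" in c else c
        for c in cols
    )
    line_hash = hashlib.md5(normalized.encode("utf-8")).hexdigest()[:8]
    return "{}{}/{}".format(graph_uri, line_hash, term_id)

# e.g. instance_uri("http://www.ebi.ac.uk/GOA/chicken_complex/", gaf_line, "GO:0016021")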
See monarch-initiative/biolink-api#145 (comment)
Every time we have an ARG={} or ARG=[] default in an __init__, it should be changed to None, and the body should explicitly check for None arguments.
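The standard pitfall and fix, using an illustrative class:

class Parser:
    # BAD: the same dict object is shared across every call and instance
    # def __init__(self, config={}):
    #     self.config = config

    # GOOD: default to None and check explicitly in the body
    def __init__(self, config=None):
        self.config = config if config is not None else {}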
GafWriter and GafParser need to know how to filter out evidence associations, and put headers with versions in each file.
This is on macOS. pandoc is installed, but this seems like a heavyweight requirement.
pip install ontobio
Collecting ontobio
Downloading ontobio-0.2.10.tar.gz (78kB)
100% |████████████████████████████████| 81kB 1.3MB/s
Complete output from command python setup.py egg_info:
pandoc: /private/var/folders/_z/t1f6j2317kq4fcspgcyd2_780000gn/T/pip-build-iry9_6/ontobio/README.md: openFile: does not exist (No such file or directory)
README.md conversion to reStructuredText failed. Error:
Command '('pandoc', '--from', 'markdown', '--to', 'rst', '/private/var/folders/_z/t1f6j2317kq4fcspgcyd2_780000gn/T/pip-build-iry9_6/ontobio/README.md')' returned non-zero exit status 1
Traceback (most recent call last):
File "", line 1, in
File "/private/var/folders/_z/t1f6j2317kq4fcspgcyd2_780000gn/T/pip-build-iry9_6/ontobio/setup.py", line 28, in
with open(readme_path) as read_file:
IOError: [Errno 2] No such file or directory: '/private/var/folders/_z/t1f6j2317kq4fcspgcyd2_780000gn/T/pip-build-iry9_6/ontobio/README.md'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/_z/t1f6j2317kq4fcspgcyd2_780000gn/T/pip-build-iry9_6/ontobio/
The 'relation' argument of GolrAssociationQuery() appears to filter on the names of relations, whereas each such relation generally has a CURIE attached to it (e.g. "interacts with" is ontology term "RO:0002434"). It would generally be more precise to be able to filter on relation_id (i.e. "RO:0002434"), or even on an array of such relation_ids.
Each file should start with:
!gaf-version: 2.1
Note: both the main output and the evidence-filtered version.
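A minimal sketch (function and file names are illustrative):

GAF_VERSION_HEADER = "!gaf-version: 2.1\n"

def write_gaf(lines, path):
    """Every output file starts with the version header."""
    with open(path, "w") as out:
        out.write(GAF_VERSION_HEADER)
        out.writelines(lines)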
diskcache (on PyPI)
For this query:
http://localhost:8888/api/bioentity/gene/HGNC:11593/homologs/?homology_type=O&fetch_objects=false
The JSON returned is below, and clearly the taxon for the ZFIN gene is missing:
{'id': '11d97a53-bf91-477e-89e2-340f94e8264e',
 'subject': {'id': 'NCBIGene:347853', 'label': 'TBX10', 'taxon': {'id': 'NCBITaxon:9606', 'label': 'Homo sapiens'}},
 'object': {'id': 'ZFIN:ZDB-GENE-121228-1', 'label': 'tbx10'},
 'negated': False,
 'relation': {'id': 'RO:HOM0000017', 'label': 'in orthology relationship with'},
 'publications': [{'id': 'PMID:22915831'}],
 'provided_by': ['https://data.monarchinitiative.org/ttl/zfin.ttl'],
 'evidence': ['MONARCH:ba21515e9db29392', 'PMID:22915831', 'ECO:0000031', 'NCBIGene:347853', 'ZFIN:ZDB-GENE-121228-1'],
 'evidence_graph': {
   'nodes': [{'id': 'ECO:0000031', 'lbl': 'protein BLAST evidence used in manual assertion', 'meta': {}},
             {'id': 'MONARCH:ba21515e9db29392', 'lbl': None, 'meta': {}},
             {'id': 'ZFIN:ZDB-GENE-121228-1', 'lbl': 'tbx10', 'meta': {}},
             {'id': 'PMID:22915831', 'lbl': 'Ahn et al; Evolution of the Tbx6/16 Subfamily Genes in Vertebrates: Insights from Zebrafish; Mol. Biol. Evol.; 2012; 29(12); 3959-3983', 'meta': {}},
             {'id': 'NCBIGene:347853', 'lbl': 'TBX10', 'meta': {}}],
   'edges': [{'sub': 'MONARCH:ba21515e9db29392', 'obj': 'PMID:22915831', 'pred': 'dc:source', 'meta': {'isDefinedBy': ['https://data.monarchinitiative.org/ttl/zfin.ttl'], 'lbl': ['Source'], 'equivalentOriginalNodeTarget': ['http://zfin.org/ZDB-PUB-120830-8']}},
             {'sub': 'ZFIN:ZDB-GENE-121228-1', 'obj': 'NCBIGene:347853', 'pred': 'RO:HOM0000017', 'meta': {'owlType': ['Annotation'], 'isDefinedBy': ['https://data.monarchinitiative.org/ttl/zfin.ttl'], 'lbl': ['in orthology relationship with']}},
             {'sub': 'MONARCH:ba21515e9db29392', 'obj': 'NCBIGene:347853', 'pred': 'OBAN:association_has_object', 'meta': {'isDefinedBy': ['https://data.monarchinitiative.org/ttl/zfin.ttl'], 'lbl': ['association has object']}},
             {'sub': 'MONARCH:ba21515e9db29392', 'obj': 'PMID:22915831', 'pred': 'dc:source', 'meta': {'isDefinedBy': ['https://data.monarchinitiative.org/ttl/zfin.ttl'], 'lbl': ['Source']}},
             {'sub': 'MONARCH:ba21515e9db29392', 'obj': 'ZFIN:ZDB-GENE-121228-1', 'pred': 'OBAN:association_has_subject', 'meta': {'isDefinedBy': ['https://data.monarchinitiative.org/ttl/zfin.ttl'], 'lbl': ['association has subject']}},
             {'sub': 'MONARCH:ba21515e9db29392', 'obj': 'ECO:0000031', 'pred': 'RO:0002558', 'meta': {'isDefinedBy': ['https://data.monarchinitiative.org/ttl/zfin.ttl'], 'lbl': ['has evidence']}}],
   'meta': {'query': 'monarch:cypher/gene-homology.yaml'}}}
I think dictionary literals look clearer.
Ontologies can be local (filesystem) or remote (currently a SPARQL service).
Currently most remote access is eager: given a parameter such as an ontology (e.g. go) or a focus node, fetch the graph into memory (and cache it for future access).
This is intended to support core use cases which require processing of entire ontologies; it's most efficient to do this in-memory.
But for other use cases this tradeoff doesn't make sense - e.g. a search query across all ontologies, or a quick ad-hoc ancestor query on ncbitaxon.
We should add better support for lazy access to remote services, focusing on SPARQL and Neo4j (bolt or the SciGraph service layer), as sketched below.
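For example, a minimal sketch of a lazy ancestor query over SPARQL, using SPARQLWrapper (already a dependency); the endpoint URL is an illustrative assumption, not necessarily the one ontobio would use:

from SPARQLWrapper import SPARQLWrapper, JSON

def ancestors(term_iri, endpoint="http://sparql.hegroup.org/sparql/"):
    """Fetch only the ancestors of one term instead of the whole ontology."""
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery("""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?anc WHERE {{ <{term}> rdfs:subClassOf* ?anc }}
    """.format(term=term_iri))
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return [b['anc']['value'] for b in results['results']['bindings']]

# e.g. ancestors("http://purl.obolibrary.org/obo/NCBITaxon_9606")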
Currently the model assumes extensions are a conjunctive list of relational expressions. Annotations with disjunctive expressions are translated to multiple annotations. This is semantically equivalent but problematic, as the number of annotations changes.
The model will change to support disjunctions in GPAD, GAF, and HPOA.
If no positional arguments (subcommands) are given, the script just fails. We should instead either fail politely or have a default subcommand that just runs (perhaps validate?).
See "Mike's Script" go-site/pipeline/utils/new-filter-gaf.pl, the --noiea-file option.
We should have this be a --filter-{evidence code}
and we just filter out any lines of the gaf with the given evidence code from the output file.
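A minimal sketch of the filtering (the evidence code is column 7 in GAF 2.x, index 6; names are illustrative):

def filter_gaf(lines, evidence_code="IEA"):
    """Drop annotation lines whose evidence column matches the given code; keep headers."""
    for line in lines:
        if line.startswith("!"):
            yield line
            continue
        cols = line.split("\t")
        if len(cols) > 6 and cols[6] == evidence_code:
            continue
        yield line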
So my code using ontobio started throwing this error about an hour ago:
KeyError: 'GO:0001005'
I traced it to the subontology method in sparql_ontology.py looking up "text definitions" and then trying to set metadata for the definition's corresponding node in the GO graph (created using: ontology = OntologyFactory().create("go")).
This seems to be caused by a discrepancy between terms contained in the go ontology graph and terms returned by the SPARQL query in sparql_ontol_util.fetchall_textdefs("go"), e.g. 'GO:0001005' gets returned in the text definitions list but isn't returned when the GO graph was first initialized.
Since this term GO:0001005 and three others were merged into GO:0001004 just fourteen days ago (geneontology/go-ontology#14852), I'm suspecting this is a data issue, but I could be way off about that. Any ideas what's going on? Let me know if you guys want me to push my current code up to my github.
Thanks!
-Dustin
BTW, here's the full trace:
Traceback (most recent call last):
  File "pombase_golr_query.py", line 137, in <module>
    tad = TermAnnotationDictionary(aset)
  File "/Users/ebertdu/go/go-pombase/pombase_direct_bp_annots_query.py", line 92, in __init__
    ancestor_bps = get_ancestor_bps(object_id)
  File "/Users/ebertdu/go/go-pombase/pombase_direct_bp_annots_query.py", line 77, in get_ancestor_bps
    for ancestor in get_ancestors(mf_go_term):
  File "/Users/ebertdu/go/go-pombase/pombase_direct_bp_annots_query.py", line 44, in get_ancestors
    subont = onto.subontology(all_ancestors)
  File "/Users/ebertdu/go/go-pombase/ontobio/sparql/sparql_ontology.py", line 112, in subontology
    self.all_text_definitions()
  File "/Users/ebertdu/go/go-pombase/ontobio/sparql/sparql_ontology.py", line 68, in all_text_definitions
    self.add_text_definition(td)
  File "/Users/ebertdu/go/go-pombase/ontobio/ontol.py", line 777, in add_text_definition
    self._add_meta_element(textdef.subject, 'definition', textdef.as_dict())
  File "/Users/ebertdu/go/go-pombase/ontobio/ontol.py", line 785, in _add_meta_element
    n = self.node(id)
  File "/Users/ebertdu/go/go-pombase/ontobio/ontol.py", line 336, in node
    return self.get_graph().node[id]
KeyError: 'GO:0001005'
Very interested to try using ontobio for generating graph visualisations. The docs seem to be lacking.
e.g. nothing much here:
http://ontobio.readthedocs.io/en/latest/outputs.html#graphviz-output
or here:
In [60]: GraphRenderer?
Init signature: GraphRenderer(outfile=None, config=None, **args)
Docstring: base class for writing networkx graphs
File: /repos/ontobio/ontobio/io/ontol_renderers.py
Type: type
Where should I start?
Seems to work differently, but will check again when my glucose levels are higher.
When ontobio is imported the user gets
> UserWarning: Cachier warning: pymongo was not found. MongoDB cores will
> not work.
> "Cachier warning: pymongo was not found. MongoDB cores will not work.")
and
> ERROR:root:Empty graph for 'cache/ontologies/pato.json' - did you use
> the correct id?
The former may be an upstream issue for cachier.
The scenario is that the client initiates a bunch of queries in parallel. One of them was the initial query, prior to collecting a set of related bioentities (e.g. orthologs of the query gene). When the batch of responses is received for the parallel queries, the only way to pick out the initial query is by matching on the IDs, including the namespace, but since this has been swapped out it can't be string matched. Thoughts?
I'm running on Ubuntu 14.04, using python3:
(venv) lance@Lance-PC:~/git/2017/ontobio_test$ python --version
Python 3.4.3
(venv) lance@Lance-PC:~/git/2017/ontobio_test$ pip --version
pip 9.0.1 from /home/lance/git/2017/ontobio_test/venv/lib/python3.4/site-packages (python 3.4)
(venv) lance@Lance-PC:~/git/2017/ontobio_test$ pip install ontobio
Collecting ontobio
Using cached ontobio-0.2.10-py3-none-any.whl
Collecting scipy (from ontobio)
Using cached scipy-0.19.1-cp34-cp34m-manylinux1_x86_64.whl
Collecting cachier (from ontobio)
Using cached cachier-1.2.1.tar.gz
Collecting networkx (from ontobio)
Using cached networkx-1.11-py2.py3-none-any.whl
Collecting pyyaml (from ontobio)
Collecting marshmallow (from ontobio)
Using cached marshmallow-2.13.5-py2.py3-none-any.whl
Collecting requests (from ontobio)
Using cached requests-2.18.1-py2.py3-none-any.whl
Collecting prefixcommons (from ontobio)
Using cached prefixcommons-0.1.4-py3-none-any.whl
Collecting pysolr (from ontobio)
Using cached pysolr-3.6.0-py2.py3-none-any.whl
Collecting sparqlwrapper (from ontobio)
Collecting numpy>=1.8.2 (from scipy->ontobio)
Using cached numpy-1.13.0-cp34-cp34m-manylinux1_x86_64.whl
Collecting watchdog (from cachier->ontobio)
Collecting portalocker (from cachier->ontobio)
Using cached portalocker-1.1.0-py2.py3-none-any.whl
Collecting decorator>=3.4.0 (from networkx->ontobio)
Using cached decorator-4.0.11-py2.py3-none-any.whl
Collecting certifi>=2017.4.17 (from requests->ontobio)
Using cached certifi-2017.4.17-py2.py3-none-any.whl
Collecting urllib3<1.22,>=1.21.1 (from requests->ontobio)
Using cached urllib3-1.21.1-py2.py3-none-any.whl
Collecting chardet<3.1.0,>=3.0.2 (from requests->ontobio)
Using cached chardet-3.0.4-py2.py3-none-any.whl
Collecting idna<2.6,>=2.5 (from requests->ontobio)
Using cached idna-2.5-py2.py3-none-any.whl
Collecting rdflib>=4.0 (from sparqlwrapper->ontobio)
Using cached rdflib-4.2.2-py3-none-any.whl
Collecting pathtools>=0.1.1 (from watchdog->cachier->ontobio)
Collecting argh>=0.24.1 (from watchdog->cachier->ontobio)
Using cached argh-0.26.2-py2.py3-none-any.whl
Collecting pyparsing (from rdflib>=4.0->sparqlwrapper->ontobio)
Using cached pyparsing-2.2.0-py2.py3-none-any.whl
Collecting isodate (from rdflib>=4.0->sparqlwrapper->ontobio)
Building wheels for collected packages: cachier
Running setup.py bdist_wheel for cachier ... error
Complete output from command /home/lance/git/2017/ontobio_test/venv/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-r5b9tydd/cachier/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpwthrce5opip-wheel- --python-tag cp34:
running bdist_wheel
error: invalid truth value ''
----------------------------------------
Failed building wheel for cachier
Running setup.py clean for cachier
Failed to build cachier
Installing collected packages: numpy, scipy, pathtools, argh, pyyaml, watchdog, portalocker, cachier, decorator, networkx, marshmallow, certifi, urllib3, chardet, idna, requests, prefixcommons, pysolr, pyparsing, isodate, rdflib, sparqlwrapper, ontobio
Running setup.py install for cachier ... done
Successfully installed argh-0.26.2 cachier-1.2.1 certifi-2017.4.17 chardet-3.0.4 decorator-4.0.11 idna-2.5 isodate-0.5.4 marshmallow-2.13.5 networkx-1.11 numpy-1.13.0 ontobio-0.2.10 pathtools-0.1.2 portalocker-1.1.0 prefixcommons-0.1.4 pyparsing-2.2.0 pysolr-3.6.0 pyyaml-3.12 rdflib-4.2.2 requests-2.18.1 scipy-0.19.1 sparqlwrapper-1.8.0 urllib3-1.21.1 watchdog-0.8.3
Now I try running app.py, which imports ontobio.
(venv) lance@Lance-PC:~/git/2017/ontobio_test$ python app.py
/home/lance/git/2017/ontobio_test/venv/lib/python3.4/site-packages/cachier/mongo_core.py:24: UserWarning: Cachier warning: pymongo was not found. MongoDB cores will not work.
"Cachier warning: pymongo was not found. MongoDB cores will not work.")
Traceback (most recent call last):
File "app.py", line 10, in <module>
results = q.exec()
File "/home/lance/git/2017/ontobio_test/venv/lib/python3.4/site-packages/ontobio/golr/golr_query.py", line 349, in exec
params = self.solr_params()
File "/home/lance/git/2017/ontobio_test/venv/lib/python3.4/site-packages/ontobio/golr/golr_query.py", line 304, in solr_params
self._set_solr(self.get_config().solr_search)
AttributeError: 'Session' object has no attribute 'solr_search'
And this is app.py:
from ontobio.golr.golr_query import GolrSearchQuery
q = GolrSearchQuery(
term='diabetes',
category=['disease', 'gene'],
rows=5,
start=1
)
results = q.exec()
for d in results['docs']:
print(d)
As you can see when I run app.py I get a warning about pymongo, but installing it doesn't make a difference (the warning just goes away).
When traversing the graph looking for the proteins that encode "WB:WBGene00017137", one of the nodes returned has an identifier with the value ":.well-known/genid/NCBIGene65018product"
Looks like a parsing error to me. Plus, this is the human homolog, PTEN induced putative kinase 1 [Homo sapiens (human)], not the WormBase protein.
Calling:
results = GolrAssociationQuery(
    ...
    non_null_fields=['subject', 'relation', 'object']
)
(other args omitted for clarity), I still get some results with subject: null.
I am trying to modify GolrAssociationQuery to allow for searching for associations where either the subject or the object matches an ID, and then also filtering by semantic categories. This filtering should only apply to the concept that doesn't match an ID. The logic looks like this:
filter_queries.append(
    '(' + subject_id_filter + ' AND ' + object_category_filter + ')' \
    ' OR ' \
    '(' + object_id_filter + ' AND ' + subject_category_filter + ')'
)
Here I have implemented this, and you can see how I'm building these filters: https://github.com/lhannest/ontobio/blob/4b32b28ac83fe2e8aad0366c799490e520b4b87a/ontobio/golr/golr_query.py#L765-L783
My problem is that filtering on categories (and I'm removing the second disjunct for simplicity) isn't working. When I set the filter query to be ['(subject:"NCBIGene:84570" AND object_category:"biological process")'], I expect to get associations where the subject is NCBIGene:84570 and the object_category contains "biological process", but instead I'm getting a lot of associations where the object_category is simply "cellular component".
Even more odd, when I instead set the object_category to "cellular component", no associations are returned.
On the other hand, when I instead search for the object_category being "gene", the filter seems to work perfectly well.
Hello,
I got an error when I tried to pip-install the ontobio package. My environment is python3.4 on macOS Sierra. pip install -r requirements.txt ended up with the same error. Any suggestions would be appreciated.
Here is the full log. Thank you.
kf$ pip install ontobio
Collecting ontobio
Using cached ontobio-0.2.15-py3-none-any.whl
Requirement already satisfied: scipy in ./anaconda/envs/ete3/lib/python3.5/site-packages (from ontobio)
Requirement already satisfied: pandas in ./anaconda/envs/ete3/lib/python3.5/site-packages (from ontobio)
Requirement already satisfied: sparqlwrapper in ./anaconda/envs/ete3/lib/python3.5/site-packages (from ontobio)
Collecting cachier (from ontobio)
Using cached cachier-1.2.1.tar.gz
Requirement already satisfied: requests in ./anaconda/envs/ete3/lib/python3.5/site-packages (from ontobio)
Requirement already satisfied: marshmallow in ./anaconda/envs/ete3/lib/python3.5/site-packages (from ontobio)
Requirement already satisfied: pysolr in ./anaconda/envs/ete3/lib/python3.5/site-packages (from ontobio)
Collecting prefixcommons (from ontobio)
Using cached prefixcommons-0.1.4-py3-none-any.whl
Requirement already satisfied: networkx in ./anaconda/envs/ete3/lib/python3.5/site-packages (from ontobio)
Requirement already satisfied: pyyaml in ./anaconda/envs/ete3/lib/python3.5/site-packages (from ontobio)
Requirement already satisfied: python-dateutil>=2 in ./anaconda/envs/ete3/lib/python3.5/site-packages (from pandas->ontobio)
Requirement already satisfied: pytz>=2011k in ./anaconda/envs/ete3/lib/python3.5/site-packages (from pandas->ontobio)
Requirement already satisfied: numpy>=1.7.0 in ./anaconda/envs/ete3/lib/python3.5/site-packages (from pandas->ontobio)
Requirement already satisfied: rdflib>=4.0 in ./anaconda/envs/ete3/lib/python3.5/site-packages (from sparqlwrapper->ontobio)
Collecting watchdog (from cachier->ontobio)
Using cached watchdog-0.8.3.tar.gz
Collecting portalocker (from cachier->ontobio)
Using cached portalocker-1.1.0-py2.py3-none-any.whl
Requirement already satisfied: urllib3<1.23,>=1.21.1 in ./anaconda/envs/ete3/lib/python3.5/site-packages (from requests->ontobio)
Requirement already satisfied: certifi>=2017.4.17 in ./anaconda/envs/ete3/lib/python3.5/site-packages (from requests->ontobio)
Requirement already satisfied: idna<2.6,>=2.5 in ./anaconda/envs/ete3/lib/python3.5/site-packages (from requests->ontobio)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in ./anaconda/envs/ete3/lib/python3.5/site-packages (from requests->ontobio)
Requirement already satisfied: decorator>=3.4.0 in ./anaconda/envs/ete3/lib/python3.5/site-packages (from networkx->ontobio)
Requirement already satisfied: six>=1.5 in ./anaconda/envs/ete3/lib/python3.5/site-packages (from python-dateutil>=2->pandas->ontobio)
Requirement already satisfied: pyparsing in ./anaconda/envs/ete3/lib/python3.5/site-packages (from rdflib>=4.0->sparqlwrapper->ontobio)
Requirement already satisfied: isodate in ./anaconda/envs/ete3/lib/python3.5/site-packages (from rdflib>=4.0->sparqlwrapper->ontobio)
Requirement already satisfied: argh>=0.24.1 in ./anaconda/envs/ete3/lib/python3.5/site-packages (from watchdog->cachier->ontobio)
Requirement already satisfied: pathtools>=0.1.1 in ./anaconda/envs/ete3/lib/python3.5/site-packages (from watchdog->cachier->ontobio)
Building wheels for collected packages: cachier, watchdog
Running setup.py bdist_wheel for cachier ... error
Complete output from command /Users/kf/anaconda/envs/ete3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/6r/5jx5t6994xs90pl8s4_vddy00000gq/T/pip-build-ockkm040/cachier/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /var/folders/6r/5jx5t6994xs90pl8s4_vddy00000gq/T/tmplcmlpqh6pip-wheel- --python-tag cp35:
running bdist_wheel
error: invalid truth value ''
----------------------------------------
Failed building wheel for cachier
Running setup.py clean for cachier
Running setup.py bdist_wheel for watchdog ... error
Complete output from command /Users/kf/anaconda/envs/ete3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/6r/5jx5t6994xs90pl8s4_vddy00000gq/T/pip-build-ockkm040/watchdog/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /var/folders/6r/5jx5t6994xs90pl8s4_vddy00000gq/T/tmpd1p82ytopip-wheel- --python-tag cp35:
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.7-x86_64-3.5
creating build/lib.macosx-10.7-x86_64-3.5/watchdog
copying src/watchdog/__init__.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog
copying src/watchdog/events.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog
copying src/watchdog/version.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog
copying src/watchdog/watchmedo.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog
creating build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/__init__.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/api.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/fsevents.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/fsevents2.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/inotify.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/inotify_buffer.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/inotify_c.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/kqueue.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/polling.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/read_directory_changes.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/winapi.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
creating build/lib.macosx-10.7-x86_64-3.5/watchdog/tricks
copying src/watchdog/tricks/__init__.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/tricks
creating build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/__init__.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/bricks.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/compat.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/decorators.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/delayed_queue.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/dirsnapshot.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/echo.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/event_backport.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/importlib2.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/platform.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/unicode_paths.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/win32stat.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
running egg_info
writing entry points to src/watchdog.egg-info/entry_points.txt
writing top-level names to src/watchdog.egg-info/top_level.txt
writing src/watchdog.egg-info/PKG-INFO
writing requirements to src/watchdog.egg-info/requires.txt
writing dependency_links to src/watchdog.egg-info/dependency_links.txt
warning: manifest_maker: standard file '-c' not found
reading manifest file 'src/watchdog.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.h' under directory 'src'
writing manifest file 'src/watchdog.egg-info/SOURCES.txt'
running build_ext
building '_watchdog_fsevents' extension
creating build/temp.macosx-10.7-x86_64-3.5
creating build/temp.macosx-10.7-x86_64-3.5/src
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/kf/anaconda/envs/ete3/include -arch x86_64 -DWATCHDOG_VERSION_STRING="0.8.3" -DWATCHDOG_VERSION_MAJOR=0 -DWATCHDOG_VERSION_MINOR=8 -DWATCHDOG_VERSION_BUILD=3 -I/Users/kf/anaconda/envs/ete3/include/python3.5m -c src/watchdog_fsevents.c -o build/temp.macosx-10.7-x86_64-3.5/src/watchdog_fsevents.o -std=c99 -pedantic -Wall -Wextra -fPIC -Wno-error=unused-command-line-argument-hard-error-in-future
cc1: error: -Werror=unused-command-line-argument-hard-error-in-future: no option -Wunused-command-line-argument-hard-error-in-future
error: command 'gcc' failed with exit status 1
----------------------------------------
Failed building wheel for watchdog
Running setup.py clean for watchdog
Failed to build cachier watchdog
Installing collected packages: watchdog, portalocker, cachier, prefixcommons, ontobio
Running setup.py install for watchdog ... error
Complete output from command /Users/kf/anaconda/envs/ete3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/6r/5jx5t6994xs90pl8s4_vddy00000gq/T/pip-build-ockkm040/watchdog/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /var/folders/6r/5jx5t6994xs90pl8s4_vddy00000gq/T/pip-p7lswqcr-record/install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build/lib.macosx-10.7-x86_64-3.5
creating build/lib.macosx-10.7-x86_64-3.5/watchdog
copying src/watchdog/__init__.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog
copying src/watchdog/events.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog
copying src/watchdog/version.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog
copying src/watchdog/watchmedo.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog
creating build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/__init__.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/api.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/fsevents.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/fsevents2.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/inotify.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/inotify_buffer.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/inotify_c.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/kqueue.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/polling.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/read_directory_changes.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/winapi.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
creating build/lib.macosx-10.7-x86_64-3.5/watchdog/tricks
copying src/watchdog/tricks/__init__.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/tricks
creating build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/__init__.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/bricks.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/compat.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/decorators.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/delayed_queue.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/dirsnapshot.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/echo.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/event_backport.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/importlib2.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/platform.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/unicode_paths.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/win32stat.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
running egg_info
writing entry points to src/watchdog.egg-info/entry_points.txt
writing top-level names to src/watchdog.egg-info/top_level.txt
writing requirements to src/watchdog.egg-info/requires.txt
writing dependency_links to src/watchdog.egg-info/dependency_links.txt
writing src/watchdog.egg-info/PKG-INFO
warning: manifest_maker: standard file '-c' not found
reading manifest file 'src/watchdog.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.h' under directory 'src'
writing manifest file 'src/watchdog.egg-info/SOURCES.txt'
running build_ext
building '_watchdog_fsevents' extension
creating build/temp.macosx-10.7-x86_64-3.5
creating build/temp.macosx-10.7-x86_64-3.5/src
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/kf/anaconda/envs/ete3/include -arch x86_64 -DWATCHDOG_VERSION_STRING="0.8.3" -DWATCHDOG_VERSION_MAJOR=0 -DWATCHDOG_VERSION_MINOR=8 -DWATCHDOG_VERSION_BUILD=3 -I/Users/kf/anaconda/envs/ete3/include/python3.5m -c src/watchdog_fsevents.c -o build/temp.macosx-10.7-x86_64-3.5/src/watchdog_fsevents.o -std=c99 -pedantic -Wall -Wextra -fPIC -Wno-error=unused-command-line-argument-hard-error-in-future
cc1: error: -Werror=unused-command-line-argument-hard-error-in-future: no option -Wunused-command-line-argument-hard-error-in-future
error: command 'gcc' failed with exit status 1
----------------------------------------
Command "/Users/kf/anaconda/envs/ete3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/6r/5jx5t6994xs90pl8s4_vddy00000gq/T/pip-build-ockkm040/watchdog/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /var/folders/6r/5jx5t6994xs90pl8s4_vddy00000gq/T/pip-p7lswqcr-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/6r/5jx5t6994xs90pl8s4_vddy00000gq/T/pip-build-ockkm040/watchdog/
I guess it's used in AssocsParser, in parse(), but what is it that parse_line() is building up? And then who consumes the assocs array in parse()? I see that ogr-parser-assocs.py uses it, but then only the report is looked at, and the assocs array seems to be thrown out. We could make the dictionary that parse_line() returns some kind of class/object so that it's easier to read what its place is, maybe?
Right now, if someone makes an ontobio.golr.GolrSearchQuery and there is no config file present (default location or otherwise), the "no config file" default config code gets run, ultimately in ontobio.config.get_config(). This has the code:
if os.path.isfile(path):
    logging.info("LOADING FROM: {}".format(path))
    session.config = load_config(path)
else:
    session.config = Session()
    logging.info("using default session: {}, path does not exist: {}".format(session, path))
and we see that in the first line of the else block, session.config is being set to a blank Session object. The actual error shows up occasionally in tests and in ticket #64: AttributeError: 'Session' object has no attribute 'solr_search'. The ontobio.config.config.Config class does have solr_search, so I assume that line is supposed to read session.config = Config().
We should instead assign a new Config() rather than a Session(), as in the sketch below.
It would be good to have a bit more documentation on each (@kltm and @kshefchek, respectively).
The Monarch concept of an evidence graph generalizes the GO GAF evidence model, which allows only one link in a chain. This is actually a frequent issue for GO, where we have long needed to represent chains of two or more pieces of evidence. This is also unsatisfactory when we collapse a GO-CAM to a GPAD: we have to find the one link the curator finds most pertinent (cc @balhoff). If GPADs had supported chains from the beginning this would be easier. The Monarch concept generalizes chains further, allowing arbitrary graphs connecting a source/subject to an object/sink (though in practice many such graphs are chains of length 2).
The Monarch evidence graph is represented as a bbop graph, which is stored as a string in Solr. For example, in this query we see the link between TBX5 and atrial fibrillation. This is actually inferred from the asserted graph below:
As can be seen, there are 2 links in the chain of inference. Only one has evidence asserted (the link between the variant and the gene is taken as true here).
In addition to the full evidence graph, the Monarch Solr schema pattern has convenience fields that list all the nodes in the evidence graph:
"evidence_object": [
"MONARCH:e95f810c91005264",
"PMID:28416818",
"dbSNP:rs883079",
"ECO:0000213",
"PMID:28416822",
"EFO:0000275",
"NCBIGene:6910"
],
"evidence_object_label": [
"rs883079-C",
"MONARCH:e95f810c91005264",
"atrial fibrillation",
"PMID:28416818",
"TBX5",
"PMID:28416822",
"combinatorial evidence used in automatic assertion"
],
There isn't a convenience field specifically for the ECO class; this could be extracted formally from the graph using RO:0002558, or hackily by looking at the evidence_object list and taking the ECO: class, as sketched below. This would map to the single 'evidence' field used in AmiGO-GOlr.
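For example, a minimal sketch of the hacky extraction (function name is illustrative):

def eco_classes(evidence_objects):
    """Pick out ECO classes from the evidence_object convenience field by CURIE prefix."""
    return [e for e in evidence_objects if e.startswith("ECO:")]

# eco_classes(["MONARCH:e95f810c91005264", "ECO:0000213", "PMID:28416818"])
# -> ["ECO:0000213"]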
When comparing two ontologies A & B, the lexmap script sometimes outputs term A then term B, and sometimes term B then term A. It's unexpected, although not a big problem, if this just works better for lexmap. Example:
UBERON:2001805 articular bone http://purl.obolibrary.org/obo/NCIT_C13044 Joint 14.404102898078806 articular Articular 0.07228915662650602 0.0945945945945946 0 0.08561485166101274 4
http://purl.obolibrary.org/obo/NCIT_C13044 Joint UBERON:0004744 articular/anguloarticular 7.781658062556939 articular Articular 0.0945945945945946 0.06741573033707865 2 0.04625248135317029 4
Should run within pyvenv
So on an ontobio-parse-assocs.py run with the --filter-out IEA option, we should build a file with all the lines included, and one called group_name_noiea.gaf, as in the sketch below.
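Something like this single pass over the input (file names are illustrative, and the column check reuses the --filter-{evidence code} idea above):

with open("group_name.gaf", "w") as full, \
     open("group_name_noiea.gaf", "w") as noiea:
    for line in open("input.gaf"):
        full.write(line)  # the full output keeps every line
        cols = line.split("\t")
        is_iea = not line.startswith("!") and len(cols) > 6 and cols[6] == "IEA"
        if not is_iea:
            noiea.write(line)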
In _validate_taxon(), if there are no valid taxa in the config, or self.config.valid_taxa is None, we accept all taxa. This seems counterintuitive, since wouldn't that mean that there does not exist a taxon that is valid?
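Presumably the check in question looks something like this (a reconstruction for discussion, not the actual ontobio source):

def _validate_taxon(self, taxon):
    if self.config.valid_taxa is None:
        return True  # no whitelist configured -> accept everything
    return taxon in self.config.valid_taxa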
Instead of just a warning?
Different ontologies have different blacklist subsets; GO has 2
Documented here:
https://github.com/geneontology/go-site/blob/master/metadata/rules/gorule-0000008.md
Let's start with a GO-specific check at parse time. It won't affect other ontologies as only GO uses these subsets.
We'd like to do this in SPARTA, but until that mechanism is in place let's do this at parse time.
Perhaps related to #64 - I have the same error when installing with wheel and cachier, but installation finishes successfully.
However when trying to run the notebooks, I get the following error:
/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/cachier/mongo_core.py:24: UserWarning: Cachier warning: pymongo was not found. MongoDB cores will not work.
"Cachier warning: pymongo was not found. MongoDB cores will not work.")
I have not tried the workaround described by @lhannest at #64 (comment) as this is indicated as being unnecessary later in the thread.
What is the significance of 15 values? And then if we do have 15, why do we add two empty strings to the list, and always on the end?
from ontobio.ontol_factory import OntologyFactory
ofactory = OntologyFactory()
ont = ofactory.create("~/Downloads/go-plus.json")
=>
ERROR:root:Empty graph for '~/Downloads/go-plus.json' - did you use the correct id?
json freshly pulled from http://purl.obolibrary.org/obo/go/extensions/go-plus.json. Here's the head:
{
"graphs" : [ {
"nodes" : [ {
"id" : "http://purl.obolibrary.org/obo/GO_0099593",
"meta" : {
"definition" : {
"val" : "Fusion of an endocytosed synaptic vesicle with an endosome.",
"xrefs" : [ "GOC:dos" ]
},
"basicPropertyValues" : [ {
"pred" : "http://www.geneontology.org/formats/oboInOwl#hasOBONamespace",
"val" : "biological_process"
} ]
},
"type" : "CLASS",
"lbl" : "endocytosed synaptic vesicle to endosome fusion"
}, {
"id" : "http://purl.obolibrary.org/obo/GO_0099592",
"meta" : {
"definition" : {
"val" : "The process in which endocytosed synaptic vesicles fuse to the presynaptic endosome followed by sorting of synaptic vesicle components and budding of new synaptic vesicles.",
"xrefs" : [ "GOC:dos" ]
},
...
}
If one creates a GpiParser and attempts to parse the uncompressed attached zfin.gpi file below, the following error occurs:
Traceback (most recent call last):
File "tests/test_gpiparser.py", line 43, in <module>
run_the_zfin_thing()
File "tests/test_gpiparser.py", line 36, in run_the_zfin_thing
results = p.parse(open("zfin.gpi", "r"))
File "/Users/edouglass/lbl/biolink/ontobio/ontobio/io/entityparser.py", line 41, in parse
parsed_line, new_ents = self.parse_line(line)
File "/Users/edouglass/lbl/biolink/ontobio/ontobio/io/entityparser.py", line 107, in parse_line
properties] = vals
I used this code to run it:
from ontobio.ontol_factory import OntologyFactory
from ontobio.io.entityparser import GpiParser

def run_the_zfin_thing():
    ont = OntologyFactory().create("go-ontology.json")
    p = GpiParser()
    p.config.remove_double_prefixes = True
    results = p.parse(open("zfin.gpi", "r"))
    for r in results:
        print(r)
    print(p.report.to_markdown())
It's not clear yet if the file is at fault or if it's the parser. But in any case the parser should handle the wrong number of columns more gracefully.
Attached:
zfin.gpi.zip
I'm getting an error when trying to create a new AssociationSetFactory:
from ontobio.ontol_factory import OntologyFactory
from ontobio.assoc_factory import AssociationSetFactory

HUMAN = 'NCBITaxon:9606'

ofactory = OntologyFactory()
afactory = AssociationSetFactory()
ont = ofactory.create('mondo')
aset = afactory.create(ontology=ont,
                       subject_category='disease',
                       object_category='phenotype',
                       taxon=HUMAN)
Error:
/Users/marcin/anaconda/lib/python3.6/site-packages/cachier/mongo_core.py:24: UserWarning: Cachier warning: pymongo was not found. MongoDB cores will not work.
"Cachier warning: pymongo was not found. MongoDB cores will not work.")
Traceback (most recent call last):
File "enrichment.py", line 16, in
taxon=HUMAN)
File "/Users/marcin/anaconda/lib/python3.6/site-packages/ontobio/assoc_factory.py", line 62, in create
taxon=taxon)
File "/Users/marcin/anaconda/lib/python3.6/site-packages/cachier/core.py", line 178, in func_wrapper
return _calc_entry(core, key, func, args, kwds)
File "/Users/marcin/anaconda/lib/python3.6/site-packages/cachier/core.py", line 78, in _calc_entry
func_res = func(*args, **kwds)
File "/Users/marcin/anaconda/lib/python3.6/site-packages/ontobio/assoc_factory.py", line 152, in bulk_fetch_cached
return bulk_fetch(**args)
File "/Users/marcin/anaconda/lib/python3.6/site-packages/ontobio/golr/golr_associations.py", line 146, in bulk_fetch
**kwargs)
File "/Users/marcin/anaconda/lib/python3.6/site-packages/ontobio/golr/golr_associations.py", line 77, in search_associations_compact
**kwargs
File "/Users/marcin/anaconda/lib/python3.6/site-packages/ontobio/golr/golr_associations.py", line 33, in search_associations
return q.exec()
File "/Users/marcin/anaconda/lib/python3.6/site-packages/ontobio/golr/golr_query.py", line 846, in exec
params = self.solr_params()
File "/Users/marcin/anaconda/lib/python3.6/site-packages/ontobio/golr/golr_query.py", line 571, in solr_params
self._set_solr(self.get_config().solr_assocs)
AttributeError: 'Session' object has no attribute 'solr_assocs'
This should work:
ofact.create('http://purl.obolibrary.org/obo/go/go.json')
Currently it assumes any remote file is OWL.
Need to support the directionless association search required for beacons. For inputs sources, targets, relationships, semanticGroups, and keywords, all being lists of strings, the output should be a set of associations where either the subject or object is identified by one of the sources.
The other parameters are filters. The relatum that is not identified as a source must be identified by one of the targets, if any are provided. And all relata not identified by one of the sources must have their semantic type filtered on semanticGroups and their name filtered on keywords, if provided. The source is not filtered by semanticGroups. And the association name itself is filtered by relationships, if provided.
Previously I had added the subjects_or_objects parameter to GolrAssociationQuery to get directionless association searches, but this didn't achieve the target filtering. A sketch of the filter-query construction is below.
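A minimal sketch of how the disjunctive filter query could be built, using the subject/object_category Solr fields from the snippet earlier in this page (helper names are illustrative; keywords and relationships handling is omitted):

def directionless_fq(sources, targets=None, semantic_groups=None):
    """Build an fq string: (source-as-subject clauses) OR (source-as-object clauses)."""
    def any_of(field, values):
        return "(" + " OR ".join('{}:"{}"'.format(field, v) for v in values) + ")"

    clauses = []
    for id_field, cat_field in [("subject", "object_category"),
                                ("object", "subject_category")]:
        parts = [any_of(id_field, sources)]
        if targets:
            other = "object" if id_field == "subject" else "subject"
            parts.append(any_of(other, targets))
        if semantic_groups:
            # category filter applies only to the non-source relatum
            parts.append(any_of(cat_field, semantic_groups))
        clauses.append("(" + " AND ".join(parts) + ")")
    return " OR ".join(clauses)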
It seems that the if and the elif on line 264 are a little redundant, since the elif will always accept anything that the if accepts. And they're two different ways of downloading URLs. Should we just pick the one that uses requests and saves the data into an in-memory file, as that's more general and less prone to failure (since there's no writing to disk)? A sketch of that variant is below.
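A minimal sketch of the requests-based variant (function name is illustrative):

import io
import requests

def fetch_as_file(url):
    """Download into an in-memory file object instead of writing to disk."""
    resp = requests.get(url)
    resp.raise_for_status()  # fail loudly on HTTP errors
    return io.StringIO(resp.text)

# handle = fetch_as_file("http://purl.obolibrary.org/obo/go/go.json")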
I'm trying to get a count of how many entities fall into each semantic category with GolrSearchQuery(rows=0, facet_fields=['category']).exec(), without actually returning any of the entities. I'm getting this as a response:
{
"docs": [],
"facet_counts": {
"category": {
"Phenotype": 10,
"disease": 89,
"gene": 8
}
},
"highlighting": {},
"pagination": {}
}
This must be incorrect; there's much more data than that. Maybe I am using the class incorrectly?
Tests should be split into integration tests and unit tests.
Tests should be written so as not to be affected by cachier (a caching module). Behavior seems to change when caching is turned on or off.
MGI GAF:
MGI MGI:1354727 Smtn GO:0060452 MGI:MGI:4437553|PMID:18678771 IMP MGI:MGI:3613292 P smoothelin protein taxon:10090 20170906 MGI occurs_in(EMAPA:35342),occurs_in(MA:0002039 TS28)
MGI MGI:1354727 Smtn GO:0060452 MGI:MGI:4437553|PMID:18678771 IMP MGI:MGI:3613293 P smoothelin protein taxon:10090 20170906 MGI occurs_in(EMAPA:35342),occurs_in(MA:0002039 TS28)
The contents of the parentheses should be an ID, with no spaces.
In these cases we can assume the core annotation is good, but either the entire extension conjunction or the offending element must be removed, with a suitable warning reported. A validation sketch follows.
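A minimal validation sketch (the regex is illustrative, not the official GAF extension grammar):

import re

# relation(FILLER) where FILLER is a bare CURIE with no spaces
EXTENSION_RE = re.compile(r'^(\w+)\(([A-Za-z]\w*:[\w\-]+)\)$')

def valid_extension(ext):
    return EXTENSION_RE.match(ext) is not None

# valid_extension("occurs_in(EMAPA:35342)")      -> True
# valid_extension("occurs_in(MA:0002039 TS28)")  -> False (space in the ID)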