biolink / ontobio
Python library for working with ontologies and ontology associations
Home Page: https://ontobio.readthedocs.io/en/latest/
License: BSD 3-Clause "New" or "Revised" License
Add a new method called something like parse_iter; this should yield results, returning a generator. Code can then call:
for a in p.parse_iter():
    ...
and have the file be lazily parsed.
Note: the current implementation of parse() should be moved to the iter method, yields added, and then:
def parse(self, ...):
    return list(self.parse_iter(...
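A minimal sketch of the refactor (the real parse_line signature and return value may differ):

class AssocParser:
    def parse_iter(self, file):
        """Lazily yield associations one line at a time."""
        for line in file:
            # assumes parse_line returns (parsed_line, new_assocs),
            # as in the entityparser traceback further below
            parsed_line, new_assocs = self.parse_line(line)
            for assoc in new_assocs:
                yield assoc

    def parse(self, file):
        """Eager wrapper kept for backward compatibility."""
        return list(self.parse_iter(file))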
A GAF file should produce a graph URI based on the group name. So for GAF group 'goa' and dataset 'chicken_complex', produce the URI http://www.ebi.ac.uk/GOA/chicken_complex/.
For a GAF line, we can produce a unique id with a hash function on that line, after any "piped" columns are sorted (alphanumerically?). Then for an instance of a term in a column of the GAF line, we use the id of the final identifier: {graphUri}/{lineHash}/{termId}.
For example, for goa chicken_complex:
http://www.ebi.ac.uk/GOA/chicken_complex/123abc/GO:0016021
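A minimal sketch of how the line hash could be computed (the hash function and truncation are illustrative assumptions, as is the function name):

import hashlib

def instance_uri(graph_uri, gaf_line, term_id):
    """Sort any piped columns, hash the normalized line, append the term id."""
    cols = gaf_line.rstrip("\n").split("\t")
    normalized = "\t".join(
        "|".join(sorted(c.split("|"))) if "|" in c else c
        for c in cols
    )
    line_hash = hashlib.md5(normalized.encode("utf-8")).hexdigest()[:8]
    return "{}{}/{}".format(graph_uri, line_hash, term_id)

# e.g. instance_uri("http://www.ebi.ac.uk/GOA/chicken_complex/", gaf_line, "GO:0016021")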
See monarch-initiative/biolink-api#145 (comment)
Every time we have an ARG={} or ARG=[] default in an __init__, it should be changed to None, and the body should explicitly check for None arguments.
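The standard pitfall and fix, using an illustrative class:

class Parser:
    # BAD: the same dict object is shared across every call and instance
    # def __init__(self, config={}):
    #     self.config = config

    # GOOD: default to None and check explicitly in the body
    def __init__(self, config=None):
        self.config = config if config is not None else {}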
GafWriter and GafParser need to know how to filter out evidence associations, and put headers with versions in each file.
This is on macOS. pandoc is installed, but this seems like a heavyweight requirement.
pip install ontobio
Collecting ontobio
Downloading ontobio-0.2.10.tar.gz (78kB)
100% |████████████████████████████████| 81kB 1.3MB/s
Complete output from command python setup.py egg_info:
pandoc: /private/var/folders/_z/t1f6j2317kq4fcspgcyd2_780000gn/T/pip-build-iry9_6/ontobio/README.md: openFile: does not exist (No such file or directory)
README.md conversion to reStructuredText failed. Error:
Command '('pandoc', '--from', 'markdown', '--to', 'rst', '/private/var/folders/_z/t1f6j2317kq4fcspgcyd2_780000gn/T/pip-build-iry9_6/ontobio/README.md')' returned non-zero exit status 1
Traceback (most recent call last):
File "", line 1, in
File "/private/var/folders/_z/t1f6j2317kq4fcspgcyd2_780000gn/T/pip-build-iry9_6/ontobio/setup.py", line 28, in
with open(readme_path) as read_file:
IOError: [Errno 2] No such file or directory: '/private/var/folders/_z/t1f6j2317kq4fcspgcyd2_780000gn/T/pip-build-iry9_6/ontobio/README.md'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/_z/t1f6j2317kq4fcspgcyd2_780000gn/T/pip-build-iry9_6/ontobio/
The 'relation' argument of GolrAssociationQuery() appears to filter on the names of relations, whereas each such relation generally has a CURIE attached to it (e.g. "interacts with" is ontology term "RO:0002434"). It would generally be more precise to be able to filter on relation_id (i.e. "RO:0002434"), or even on an array of such relation_ids.
Each file should start with:
!gaf-version: 2.1
Note: both the main output and the evidence-filtered version.
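A minimal sketch (function and file names are illustrative):

GAF_VERSION_HEADER = "!gaf-version: 2.1\n"

def write_gaf(lines, path):
    """Every output file starts with the version header."""
    with open(path, "w") as out:
        out.write(GAF_VERSION_HEADER)
        out.writelines(lines)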
diskcache (on PyPI)
For this query:
http://localhost:8888/api/bioentity/gene/HGNC:11593/homologs/?homology_type=O&fetch_objects=false
The JSON returned is below, and clearly the taxon for the ZFIN gene is missing:
{'id': '11d97a53-bf91-477e-89e2-340f94e8264e',
 'subject': {'id': 'NCBIGene:347853', 'label': 'TBX10', 'taxon': {'id': 'NCBITaxon:9606', 'label': 'Homo sapiens'}},
 'object': {'id': 'ZFIN:ZDB-GENE-121228-1', 'label': 'tbx10'},
 'negated': False,
 'relation': {'id': 'RO:HOM0000017', 'label': 'in orthology relationship with'},
 'publications': [{'id': 'PMID:22915831'}],
 'provided_by': ['https://data.monarchinitiative.org/ttl/zfin.ttl'],
 'evidence': ['MONARCH:ba21515e9db29392', 'PMID:22915831', 'ECO:0000031', 'NCBIGene:347853', 'ZFIN:ZDB-GENE-121228-1'],
 'evidence_graph': {
   'nodes': [{'id': 'ECO:0000031', 'lbl': 'protein BLAST evidence used in manual assertion', 'meta': {}},
             {'id': 'MONARCH:ba21515e9db29392', 'lbl': None, 'meta': {}},
             {'id': 'ZFIN:ZDB-GENE-121228-1', 'lbl': 'tbx10', 'meta': {}},
             {'id': 'PMID:22915831', 'lbl': 'Ahn et al; Evolution of the Tbx6/16 Subfamily Genes in Vertebrates: Insights from Zebrafish; Mol. Biol. Evol.; 2012; 29(12); 3959-3983', 'meta': {}},
             {'id': 'NCBIGene:347853', 'lbl': 'TBX10', 'meta': {}}],
   'edges': [{'sub': 'MONARCH:ba21515e9db29392', 'obj': 'PMID:22915831', 'pred': 'dc:source', 'meta': {'isDefinedBy': ['https://data.monarchinitiative.org/ttl/zfin.ttl'], 'lbl': ['Source'], 'equivalentOriginalNodeTarget': ['http://zfin.org/ZDB-PUB-120830-8']}},
             {'sub': 'ZFIN:ZDB-GENE-121228-1', 'obj': 'NCBIGene:347853', 'pred': 'RO:HOM0000017', 'meta': {'owlType': ['Annotation'], 'isDefinedBy': ['https://data.monarchinitiative.org/ttl/zfin.ttl'], 'lbl': ['in orthology relationship with']}},
             {'sub': 'MONARCH:ba21515e9db29392', 'obj': 'NCBIGene:347853', 'pred': 'OBAN:association_has_object', 'meta': {'isDefinedBy': ['https://data.monarchinitiative.org/ttl/zfin.ttl'], 'lbl': ['association has object']}},
             {'sub': 'MONARCH:ba21515e9db29392', 'obj': 'PMID:22915831', 'pred': 'dc:source', 'meta': {'isDefinedBy': ['https://data.monarchinitiative.org/ttl/zfin.ttl'], 'lbl': ['Source']}},
             {'sub': 'MONARCH:ba21515e9db29392', 'obj': 'ZFIN:ZDB-GENE-121228-1', 'pred': 'OBAN:association_has_subject', 'meta': {'isDefinedBy': ['https://data.monarchinitiative.org/ttl/zfin.ttl'], 'lbl': ['association has subject']}},
             {'sub': 'MONARCH:ba21515e9db29392', 'obj': 'ECO:0000031', 'pred': 'RO:0002558', 'meta': {'isDefinedBy': ['https://data.monarchinitiative.org/ttl/zfin.ttl'], 'lbl': ['has evidence']}}],
   'meta': {'query': 'monarch:cypher/gene-homology.yaml'}}}
I think dictionary literals look clearer.
Ontologies can be local (filesystem) or remote (currently a SPARQL service).
Currently most remote access is eager: given a parameter such as an ontology (e.g. go) or a focus node, fetch the graph into memory (and cache it for future access).
This is intended to support core use cases which require processing of entire ontologies; it's most efficient to do this in-memory.
But for other use cases this tradeoff doesn't make sense - e.g. a search query across all ontologies, or a quick ad-hoc ancestor query on ncbitaxon.
We should add better support for lazy access to remote services, focusing on SPARQL and Neo4j (bolt or the SciGraph service layer), as sketched below.
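For example, a minimal sketch of a lazy ancestor query over SPARQL, using SPARQLWrapper (already a dependency); the endpoint URL is an illustrative assumption, not necessarily the one ontobio would use:

from SPARQLWrapper import SPARQLWrapper, JSON

def ancestors(term_iri, endpoint="http://sparql.hegroup.org/sparql/"):
    """Fetch only the ancestors of one term instead of the whole ontology."""
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery("""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?anc WHERE {{ <{term}> rdfs:subClassOf* ?anc }}
    """.format(term=term_iri))
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return [b['anc']['value'] for b in results['results']['bindings']]

# e.g. ancestors("http://purl.obolibrary.org/obo/NCBITaxon_9606")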
Currently the model assumes extensions are a conjunctive list of relational expressions. Annotations with disjunctive expressions are translated to multiple annotations. This is semantically equivalent but problematic, as the number of annotations changes.
The model will change to support disjunctions in GPAD, GAF, and HPOA.
If no positional arguments (subcommands) are given, the script just fails. We should instead either fail politely or have a default subcommand that just runs (perhaps validate?).
See "Mike's Script" go-site/pipeline/utils/new-filter-gaf.pl, the --noiea-file option.
We should have this be a --filter-{evidence code}
and we just filter out any lines of the gaf with the given evidence code from the output file.
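A minimal sketch of the filtering (the evidence code is column 7 in GAF 2.x, index 6; names are illustrative):

def filter_gaf(lines, evidence_code="IEA"):
    """Drop annotation lines whose evidence column matches the given code; keep headers."""
    for line in lines:
        if line.startswith("!"):
            yield line
            continue
        cols = line.split("\t")
        if len(cols) > 6 and cols[6] == evidence_code:
            continue
        yield line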
So my code using ontobio started throwing this error about an hour ago:
KeyError: 'GO:0001005'
I traced it to the subontology method in sparql_ontology.py looking up "text definitions" and then trying to set metadata for the definition's corresponding node in the GO graph (created using: ontology = OntologyFactory().create("go")).
This seems to be caused by a discrepancy between terms contained in the go ontology graph and terms returned by the SPARQL query in sparql_ontol_util.fetchall_textdefs("go"), e.g. 'GO:0001005' gets returned in the text definitions list but isn't returned when the GO graph was first initialized.
Since this term GO:0001005 and three others were merged into GO:0001004 just fourteen days ago (geneontology/go-ontology#14852), I'm suspecting this is a data issue, but I could be way off about that. Any ideas what's going on? Let me know if you guys want me to push my current code up to my github.
Thanks!
-Dustin
BTW, here's the full trace:
Traceback (most recent call last):
  File "pombase_golr_query.py", line 137, in <module>
    tad = TermAnnotationDictionary(aset)
  File "/Users/ebertdu/go/go-pombase/pombase_direct_bp_annots_query.py", line 92, in __init__
    ancestor_bps = get_ancestor_bps(object_id)
  File "/Users/ebertdu/go/go-pombase/pombase_direct_bp_annots_query.py", line 77, in get_ancestor_bps
    for ancestor in get_ancestors(mf_go_term):
  File "/Users/ebertdu/go/go-pombase/pombase_direct_bp_annots_query.py", line 44, in get_ancestors
    subont = onto.subontology(all_ancestors)
  File "/Users/ebertdu/go/go-pombase/ontobio/sparql/sparql_ontology.py", line 112, in subontology
    self.all_text_definitions()
  File "/Users/ebertdu/go/go-pombase/ontobio/sparql/sparql_ontology.py", line 68, in all_text_definitions
    self.add_text_definition(td)
  File "/Users/ebertdu/go/go-pombase/ontobio/ontol.py", line 777, in add_text_definition
    self._add_meta_element(textdef.subject, 'definition', textdef.as_dict())
  File "/Users/ebertdu/go/go-pombase/ontobio/ontol.py", line 785, in _add_meta_element
    n = self.node(id)
  File "/Users/ebertdu/go/go-pombase/ontobio/ontol.py", line 336, in node
    return self.get_graph().node[id]
KeyError: 'GO:0001005'
Very interested to try using ontobio for generating graph visualisations. The docs seem to be lacking.
e.g. nothing much here:
http://ontobio.readthedocs.io/en/latest/outputs.html#graphviz-output
or here:
In [60]: GraphRenderer?
Init signature: GraphRenderer(outfile=None, config=None, **args)
Docstring: base class for writing networkx graphs
File: /repos/ontobio/ontobio/io/ontol_renderers.py
Type: type
Where should I start?
Seems to work differently, but will check again when my glucose levels are higher.
When ontobio is imported the user gets
> UserWarning: Cachier warning: pymongo was not found. MongoDB cores will
> not work.
> "Cachier warning: pymongo was not found. MongoDB cores will not work.")
and
> ERROR:root:Empty graph for 'cache/ontologies/pato.json' - did you use
> the correct id?
The former may be an upstream issue for cachier.
The scenario is that the client initiates a bunch of queries in parallel. One of them was the initial query, prior to collecting a set of related bioentities (e.g. orthologs of the query gene). When the batch of responses is received for the parallel queries, the only way to pick out the initial query is by matching on the IDs, including the namespace, but since this has been swapped out it can't be string matched. Thoughts?
I'm running on Ubuntu 14.04, using python3:
(venv) lance@Lance-PC:~/git/2017/ontobio_test$ python --version
Python 3.4.3
(venv) lance@Lance-PC:~/git/2017/ontobio_test$ pip --version
pip 9.0.1 from /home/lance/git/2017/ontobio_test/venv/lib/python3.4/site-packages (python 3.4)
(venv) lance@Lance-PC:~/git/2017/ontobio_test$ pip install ontobio
Collecting ontobio
Using cached ontobio-0.2.10-py3-none-any.whl
Collecting scipy (from ontobio)
Using cached scipy-0.19.1-cp34-cp34m-manylinux1_x86_64.whl
Collecting cachier (from ontobio)
Using cached cachier-1.2.1.tar.gz
Collecting networkx (from ontobio)
Using cached networkx-1.11-py2.py3-none-any.whl
Collecting pyyaml (from ontobio)
Collecting marshmallow (from ontobio)
Using cached marshmallow-2.13.5-py2.py3-none-any.whl
Collecting requests (from ontobio)
Using cached requests-2.18.1-py2.py3-none-any.whl
Collecting prefixcommons (from ontobio)
Using cached prefixcommons-0.1.4-py3-none-any.whl
Collecting pysolr (from ontobio)
Using cached pysolr-3.6.0-py2.py3-none-any.whl
Collecting sparqlwrapper (from ontobio)
Collecting numpy>=1.8.2 (from scipy->ontobio)
Using cached numpy-1.13.0-cp34-cp34m-manylinux1_x86_64.whl
Collecting watchdog (from cachier->ontobio)
Collecting portalocker (from cachier->ontobio)
Using cached portalocker-1.1.0-py2.py3-none-any.whl
Collecting decorator>=3.4.0 (from networkx->ontobio)
Using cached decorator-4.0.11-py2.py3-none-any.whl
Collecting certifi>=2017.4.17 (from requests->ontobio)
Using cached certifi-2017.4.17-py2.py3-none-any.whl
Collecting urllib3<1.22,>=1.21.1 (from requests->ontobio)
Using cached urllib3-1.21.1-py2.py3-none-any.whl
Collecting chardet<3.1.0,>=3.0.2 (from requests->ontobio)
Using cached chardet-3.0.4-py2.py3-none-any.whl
Collecting idna<2.6,>=2.5 (from requests->ontobio)
Using cached idna-2.5-py2.py3-none-any.whl
Collecting rdflib>=4.0 (from sparqlwrapper->ontobio)
Using cached rdflib-4.2.2-py3-none-any.whl
Collecting pathtools>=0.1.1 (from watchdog->cachier->ontobio)
Collecting argh>=0.24.1 (from watchdog->cachier->ontobio)
Using cached argh-0.26.2-py2.py3-none-any.whl
Collecting pyparsing (from rdflib>=4.0->sparqlwrapper->ontobio)
Using cached pyparsing-2.2.0-py2.py3-none-any.whl
Collecting isodate (from rdflib>=4.0->sparqlwrapper->ontobio)
Building wheels for collected packages: cachier
Running setup.py bdist_wheel for cachier ... error
Complete output from command /home/lance/git/2017/ontobio_test/venv/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-r5b9tydd/cachier/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpwthrce5opip-wheel- --python-tag cp34:
running bdist_wheel
error: invalid truth value ''
----------------------------------------
Failed building wheel for cachier
Running setup.py clean for cachier
Failed to build cachier
Installing collected packages: numpy, scipy, pathtools, argh, pyyaml, watchdog, portalocker, cachier, decorator, networkx, marshmallow, certifi, urllib3, chardet, idna, requests, prefixcommons, pysolr, pyparsing, isodate, rdflib, sparqlwrapper, ontobio
Running setup.py install for cachier ... done
Successfully installed argh-0.26.2 cachier-1.2.1 certifi-2017.4.17 chardet-3.0.4 decorator-4.0.11 idna-2.5 isodate-0.5.4 marshmallow-2.13.5 networkx-1.11 numpy-1.13.0 ontobio-0.2.10 pathtools-0.1.2 portalocker-1.1.0 prefixcommons-0.1.4 pyparsing-2.2.0 pysolr-3.6.0 pyyaml-3.12 rdflib-4.2.2 requests-2.18.1 scipy-0.19.1 sparqlwrapper-1.8.0 urllib3-1.21.1 watchdog-0.8.3
Now I try running app.py, which imports ontobio.
(venv) lance@Lance-PC:~/git/2017/ontobio_test$ python app.py
/home/lance/git/2017/ontobio_test/venv/lib/python3.4/site-packages/cachier/mongo_core.py:24: UserWarning: Cachier warning: pymongo was not found. MongoDB cores will not work.
"Cachier warning: pymongo was not found. MongoDB cores will not work.")
Traceback (most recent call last):
File "app.py", line 10, in <module>
results = q.exec()
File "/home/lance/git/2017/ontobio_test/venv/lib/python3.4/site-packages/ontobio/golr/golr_query.py", line 349, in exec
params = self.solr_params()
File "/home/lance/git/2017/ontobio_test/venv/lib/python3.4/site-packages/ontobio/golr/golr_query.py", line 304, in solr_params
self._set_solr(self.get_config().solr_search)
AttributeError: 'Session' object has no attribute 'solr_search'
And this is app.py:
from ontobio.golr.golr_query import GolrSearchQuery
q = GolrSearchQuery(
term='diabetes',
category=['disease', 'gene'],
rows=5,
start=1
)
results = q.exec()
for d in results['docs']:
print(d)
As you can see when I run app.py I get a warning about pymongo, but installing it doesn't make a difference (the warning just goes away).
When traversing the graph looking for the proteins that encode "WB:WBGene00017137", one of the nodes returned has an identifier with the value ":.well-known/genid/NCBIGene65018product"
Looks like a parsing error to me. Plus, this is the human homolog, PTEN induced putative kinase 1 [Homo sapiens (human)], not the WormBase protein.
Calling:
results = GolrAssociationQuery(
    ...
    non_null_fields=['subject', 'relation', 'object']
)
(other args omitted for clarity), I still get some results with subject: null.
I am trying to modify GolrAssociationQuery to allow for searching for associations where either the subject or the object matches an ID, and then also filtering by semantic categories. This filtering should only apply to the concept that doesn't match an ID. The logic looks like this:
filter_queries.append(
    '(' + subject_id_filter + ' AND ' + object_category_filter + ')' \
    ' OR ' \
    '(' + object_id_filter + ' AND ' + subject_category_filter + ')'
)
Here I have implemented this, and you can see how I'm building these filters: https://github.com/lhannest/ontobio/blob/4b32b28ac83fe2e8aad0366c799490e520b4b87a/ontobio/golr/golr_query.py#L765-L783
My problem is that filtering on categories (and I'm removing the second disjunct for simplicity) isn't working. When I set the filter query to be ['(subject:"NCBIGene:84570" AND object_category:"biological process")'], I expect to get associations where the subject is NCBIGene:84570 and the object_category contains "biological process", but instead I'm getting a lot of associations where the object_category is simply "cellular component".
Even more odd, when I instead set the object_category to "cellular component", no associations are returned.
On the other hand, when I instead search for the object_category being "gene", the filter seems to work perfectly well.
Hello,
I got an error when I tried to pip-install the ontobio package. My environment is python3.4 on macOS Sierra. pip install -r requirements.txt ended up with the same error. Any suggestions would be appreciated.
Here is the full log. Thank you.
kf$ pip install ontobio
Collecting ontobio
Using cached ontobio-0.2.15-py3-none-any.whl
Requirement already satisfied: scipy in ./anaconda/envs/ete3/lib/python3.5/site-packages (from ontobio)
Requirement already satisfied: pandas in ./anaconda/envs/ete3/lib/python3.5/site-packages (from ontobio)
Requirement already satisfied: sparqlwrapper in ./anaconda/envs/ete3/lib/python3.5/site-packages (from ontobio)
Collecting cachier (from ontobio)
Using cached cachier-1.2.1.tar.gz
Requirement already satisfied: requests in ./anaconda/envs/ete3/lib/python3.5/site-packages (from ontobio)
Requirement already satisfied: marshmallow in ./anaconda/envs/ete3/lib/python3.5/site-packages (from ontobio)
Requirement already satisfied: pysolr in ./anaconda/envs/ete3/lib/python3.5/site-packages (from ontobio)
Collecting prefixcommons (from ontobio)
Using cached prefixcommons-0.1.4-py3-none-any.whl
Requirement already satisfied: networkx in ./anaconda/envs/ete3/lib/python3.5/site-packages (from ontobio)
Requirement already satisfied: pyyaml in ./anaconda/envs/ete3/lib/python3.5/site-packages (from ontobio)
Requirement already satisfied: python-dateutil>=2 in ./anaconda/envs/ete3/lib/python3.5/site-packages (from pandas->ontobio)
Requirement already satisfied: pytz>=2011k in ./anaconda/envs/ete3/lib/python3.5/site-packages (from pandas->ontobio)
Requirement already satisfied: numpy>=1.7.0 in ./anaconda/envs/ete3/lib/python3.5/site-packages (from pandas->ontobio)
Requirement already satisfied: rdflib>=4.0 in ./anaconda/envs/ete3/lib/python3.5/site-packages (from sparqlwrapper->ontobio)
Collecting watchdog (from cachier->ontobio)
Using cached watchdog-0.8.3.tar.gz
Collecting portalocker (from cachier->ontobio)
Using cached portalocker-1.1.0-py2.py3-none-any.whl
Requirement already satisfied: urllib3<1.23,>=1.21.1 in ./anaconda/envs/ete3/lib/python3.5/site-packages (from requests->ontobio)
Requirement already satisfied: certifi>=2017.4.17 in ./anaconda/envs/ete3/lib/python3.5/site-packages (from requests->ontobio)
Requirement already satisfied: idna<2.6,>=2.5 in ./anaconda/envs/ete3/lib/python3.5/site-packages (from requests->ontobio)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in ./anaconda/envs/ete3/lib/python3.5/site-packages (from requests->ontobio)
Requirement already satisfied: decorator>=3.4.0 in ./anaconda/envs/ete3/lib/python3.5/site-packages (from networkx->ontobio)
Requirement already satisfied: six>=1.5 in ./anaconda/envs/ete3/lib/python3.5/site-packages (from python-dateutil>=2->pandas->ontobio)
Requirement already satisfied: pyparsing in ./anaconda/envs/ete3/lib/python3.5/site-packages (from rdflib>=4.0->sparqlwrapper->ontobio)
Requirement already satisfied: isodate in ./anaconda/envs/ete3/lib/python3.5/site-packages (from rdflib>=4.0->sparqlwrapper->ontobio)
Requirement already satisfied: argh>=0.24.1 in ./anaconda/envs/ete3/lib/python3.5/site-packages (from watchdog->cachier->ontobio)
Requirement already satisfied: pathtools>=0.1.1 in ./anaconda/envs/ete3/lib/python3.5/site-packages (from watchdog->cachier->ontobio)
Building wheels for collected packages: cachier, watchdog
Running setup.py bdist_wheel for cachier ... error
Complete output from command /Users/kf/anaconda/envs/ete3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/6r/5jx5t6994xs90pl8s4_vddy00000gq/T/pip-build-ockkm040/cachier/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /var/folders/6r/5jx5t6994xs90pl8s4_vddy00000gq/T/tmplcmlpqh6pip-wheel- --python-tag cp35:
running bdist_wheel
error: invalid truth value ''
----------------------------------------
Failed building wheel for cachier
Running setup.py clean for cachier
Running setup.py bdist_wheel for watchdog ... error
Complete output from command /Users/kf/anaconda/envs/ete3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/6r/5jx5t6994xs90pl8s4_vddy00000gq/T/pip-build-ockkm040/watchdog/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /var/folders/6r/5jx5t6994xs90pl8s4_vddy00000gq/T/tmpd1p82ytopip-wheel- --python-tag cp35:
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.7-x86_64-3.5
creating build/lib.macosx-10.7-x86_64-3.5/watchdog
copying src/watchdog/__init__.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog
copying src/watchdog/events.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog
copying src/watchdog/version.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog
copying src/watchdog/watchmedo.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog
creating build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/__init__.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/api.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/fsevents.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/fsevents2.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/inotify.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/inotify_buffer.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/inotify_c.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/kqueue.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/polling.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/read_directory_changes.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/winapi.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
creating build/lib.macosx-10.7-x86_64-3.5/watchdog/tricks
copying src/watchdog/tricks/__init__.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/tricks
creating build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/__init__.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/bricks.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/compat.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/decorators.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/delayed_queue.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/dirsnapshot.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/echo.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/event_backport.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/importlib2.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/platform.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/unicode_paths.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/win32stat.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
running egg_info
writing entry points to src/watchdog.egg-info/entry_points.txt
writing top-level names to src/watchdog.egg-info/top_level.txt
writing src/watchdog.egg-info/PKG-INFO
writing requirements to src/watchdog.egg-info/requires.txt
writing dependency_links to src/watchdog.egg-info/dependency_links.txt
warning: manifest_maker: standard file '-c' not found
reading manifest file 'src/watchdog.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.h' under directory 'src'
writing manifest file 'src/watchdog.egg-info/SOURCES.txt'
running build_ext
building '_watchdog_fsevents' extension
creating build/temp.macosx-10.7-x86_64-3.5
creating build/temp.macosx-10.7-x86_64-3.5/src
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/kf/anaconda/envs/ete3/include -arch x86_64 -DWATCHDOG_VERSION_STRING="0.8.3" -DWATCHDOG_VERSION_MAJOR=0 -DWATCHDOG_VERSION_MINOR=8 -DWATCHDOG_VERSION_BUILD=3 -I/Users/kf/anaconda/envs/ete3/include/python3.5m -c src/watchdog_fsevents.c -o build/temp.macosx-10.7-x86_64-3.5/src/watchdog_fsevents.o -std=c99 -pedantic -Wall -Wextra -fPIC -Wno-error=unused-command-line-argument-hard-error-in-future
cc1: error: -Werror=unused-command-line-argument-hard-error-in-future: no option -Wunused-command-line-argument-hard-error-in-future
error: command 'gcc' failed with exit status 1
----------------------------------------
Failed building wheel for watchdog
Running setup.py clean for watchdog
Failed to build cachier watchdog
Installing collected packages: watchdog, portalocker, cachier, prefixcommons, ontobio
Running setup.py install for watchdog ... error
Complete output from command /Users/kf/anaconda/envs/ete3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/6r/5jx5t6994xs90pl8s4_vddy00000gq/T/pip-build-ockkm040/watchdog/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /var/folders/6r/5jx5t6994xs90pl8s4_vddy00000gq/T/pip-p7lswqcr-record/install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build/lib.macosx-10.7-x86_64-3.5
creating build/lib.macosx-10.7-x86_64-3.5/watchdog
copying src/watchdog/__init__.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog
copying src/watchdog/events.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog
copying src/watchdog/version.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog
copying src/watchdog/watchmedo.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog
creating build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/__init__.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/api.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/fsevents.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/fsevents2.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/inotify.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/inotify_buffer.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/inotify_c.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/kqueue.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/polling.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/read_directory_changes.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
copying src/watchdog/observers/winapi.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/observers
creating build/lib.macosx-10.7-x86_64-3.5/watchdog/tricks
copying src/watchdog/tricks/__init__.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/tricks
creating build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/__init__.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/bricks.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/compat.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/decorators.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/delayed_queue.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/dirsnapshot.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/echo.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/event_backport.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/importlib2.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/platform.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/unicode_paths.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
copying src/watchdog/utils/win32stat.py -> build/lib.macosx-10.7-x86_64-3.5/watchdog/utils
running egg_info
writing entry points to src/watchdog.egg-info/entry_points.txt
writing top-level names to src/watchdog.egg-info/top_level.txt
writing requirements to src/watchdog.egg-info/requires.txt
writing dependency_links to src/watchdog.egg-info/dependency_links.txt
writing src/watchdog.egg-info/PKG-INFO
warning: manifest_maker: standard file '-c' not found
reading manifest file 'src/watchdog.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.h' under directory 'src'
writing manifest file 'src/watchdog.egg-info/SOURCES.txt'
running build_ext
building '_watchdog_fsevents' extension
creating build/temp.macosx-10.7-x86_64-3.5
creating build/temp.macosx-10.7-x86_64-3.5/src
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/kf/anaconda/envs/ete3/include -arch x86_64 -DWATCHDOG_VERSION_STRING="0.8.3" -DWATCHDOG_VERSION_MAJOR=0 -DWATCHDOG_VERSION_MINOR=8 -DWATCHDOG_VERSION_BUILD=3 -I/Users/kf/anaconda/envs/ete3/include/python3.5m -c src/watchdog_fsevents.c -o build/temp.macosx-10.7-x86_64-3.5/src/watchdog_fsevents.o -std=c99 -pedantic -Wall -Wextra -fPIC -Wno-error=unused-command-line-argument-hard-error-in-future
cc1: error: -Werror=unused-command-line-argument-hard-error-in-future: no option -Wunused-command-line-argument-hard-error-in-future
error: command 'gcc' failed with exit status 1
----------------------------------------
Command "/Users/kf/anaconda/envs/ete3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/6r/5jx5t6994xs90pl8s4_vddy00000gq/T/pip-build-ockkm040/watchdog/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /var/folders/6r/5jx5t6994xs90pl8s4_vddy00000gq/T/pip-p7lswqcr-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/6r/5jx5t6994xs90pl8s4_vddy00000gq/T/pip-build-ockkm040/watchdog/
I guess it's used in AssocsParser, in parse(), but what is it that parse_line() is building up? And then who consumes the assocs array in parse()? I see that ogr-parser-assocs.py uses it, but then only the report is looked at, and the assocs array seems to be thrown out. We could make the dictionary that parse_line() returns some kind of class/object so that it's easier to read what its place is, maybe?
Right now, if someone makes an ontobio.golr.GolrSearchQuery and there is no config file present (default location or otherwise), the "no config file" default config code gets run, ultimately in ontobio.config.get_config(). This has the code:
if os.path.isfile(path):
    logging.info("LOADING FROM: {}".format(path))
    session.config = load_config(path)
else:
    session.config = Session()
    logging.info("using default session: {}, path does not exist: {}".format(session, path))
and we see that in the first line of the else block, session.config is being set to a blank Session object. The actual error shows up occasionally in tests and in ticket #64: AttributeError: 'Session' object has no attribute 'solr_search'. The ontobio.config.config.Config class does have solr_search, so I assume that line is supposed to read session.config = Config().
We should instead assign a new Config() rather than a Session(), as in the sketch below.
It would be good to have a bit more documentation on each (@kltm and @kshefchek, respectively).
The Monarch concept of an evidence graph generalizes the GO GAF evidence model, which allows only one link in a chain. This is actually a frequent issue for GO, where we have long needed to represent chains of two or more pieces of evidence. This is also unsatisfactory when we collapse a GO-CAM to a GPAD: we have to find the one link the curator finds most pertinent (cc @balhoff). If GPADs had supported chains from the beginning this would be easier. The Monarch concept generalizes chains further, allowing arbitrary graphs connecting a source/subject to an object/sink (though in practice many such graphs are chains of length 2).
The Monarch evidence graph is represented as a bbop graph, which is stored as a string in Solr. For example, in this query we see the link between TBX5 and atrial fibrillation. This is actually inferred from the asserted graph below:
As can be seen, there are 2 links in the chain of inference. Only one has evidence asserted (the link between the variant and the gene is taken as true here).
In addition to the full evidence graph, the Monarch Solr schema pattern has convenience fields that list all the nodes in the evidence graph:
"evidence_object": [
"MONARCH:e95f810c91005264",
"PMID:28416818",
"dbSNP:rs883079",
"ECO:0000213",
"PMID:28416822",
"EFO:0000275",
"NCBIGene:6910"
],
"evidence_object_label": [
"rs883079-C",
"MONARCH:e95f810c91005264",
"atrial fibrillation",
"PMID:28416818",
"TBX5",
"PMID:28416822",
"combinatorial evidence used in automatic assertion"
],
There isn't a convenience field specifically for the ECO class; this could be extracted formally from the graph using RO:0002558, or hackily by looking at the evidence_object list and taking the ECO: class, as sketched below. This would map to the single 'evidence' field used in AmiGO-GOlr.
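For example, a minimal sketch of the hacky extraction (function name is illustrative):

def eco_classes(evidence_objects):
    """Pick out ECO classes from the evidence_object convenience field by CURIE prefix."""
    return [e for e in evidence_objects if e.startswith("ECO:")]

# eco_classes(["MONARCH:e95f810c91005264", "ECO:0000213", "PMID:28416818"])
# -> ["ECO:0000213"]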
When comparing two ontologies A & B, the lexmap script sometimes outputs term A then term B, and sometimes term B then term A. It's unexpected, although not a big problem, if this just works better for lexmap. Example:
UBERON:2001805 articular bone http://purl.obolibrary.org/obo/NCIT_C13044 Joint 14.404102898078806 articular Articular 0.07228915662650602 0.0945945945945946 0 0.08561485166101274 4
http://purl.obolibrary.org/obo/NCIT_C13044 Joint UBERON:0004744 articular/anguloarticular 7.781658062556939 articular Articular 0.0945945945945946 0.06741573033707865 2 0.04625248135317029 4
Should run within pyvenv
So on an ontobio-parse-assocs.py run with the --filter-out IEA option, we should build a file with all the lines included, and one called group_name_noiea.gaf, as in the sketch below.
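Something like this single pass over the input (file names are illustrative, and the column check reuses the --filter-{evidence code} idea above):

with open("group_name.gaf", "w") as full, \
     open("group_name_noiea.gaf", "w") as noiea:
    for line in open("input.gaf"):
        full.write(line)  # the full output keeps every line
        cols = line.split("\t")
        is_iea = not line.startswith("!") and len(cols) > 6 and cols[6] == "IEA"
        if not is_iea:
            noiea.write(line)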
In _validate_taxon(), if there are no valid taxa in the config, or self.config.valid_taxa is None, we accept all taxa. This seems counterintuitive, since wouldn't that mean that there does not exist a taxon that is valid?
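Presumably the check in question looks something like this (a reconstruction for discussion, not the actual ontobio source):

def _validate_taxon(self, taxon):
    if self.config.valid_taxa is None:
        return True  # no whitelist configured -> accept everything
    return taxon in self.config.valid_taxa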
Instead of just a warning?
Different ontologies have different blacklist subsets; GO has 2
Documented here:
https://github.com/geneontology/go-site/blob/master/metadata/rules/gorule-0000008.md
Let's start with a GO-specific check at parse time. It won't affect other ontologies as only GO uses these subsets.
We'd like to do this in SPARTA, but until that mechanism is in place let's do this at parse time.
Perhaps related to #64 - I have the same error when installing with wheel and cachier, but installation finishes successfully.
However when trying to run the notebooks, I get the following error:
/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/cachier/mongo_core.py:24: UserWarning: Cachier warning: pymongo was not found. MongoDB cores will not work.
"Cachier warning: pymongo was not found. MongoDB cores will not work.")
I have not tried the workaround described by @lhannest at #64 (comment) as this is indicated as being unnecessary later in the thread.
What is the significance of 15 values? And then if we do have 15, why do we add two empty strings to the list, and always on the end?
from ontobio.ontol_factory import OntologyFactory
ofactory = OntologyFactory()
ont = ofactory.create("~/Downloads/go-plus.json")
=>
ERROR:root:Empty graph for '~/Downloads/go-plus.json' - did you use the correct id?
json freshly pulled from http://purl.obolibrary.org/obo/go/extensions/go-plus.json. Here's the head:
{
"graphs" : [ {
"nodes" : [ {
"id" : "http://purl.obolibrary.org/obo/GO_0099593",
"meta" : {
"definition" : {
"val" : "Fusion of an endocytosed synaptic vesicle with an endosome.",
"xrefs" : [ "GOC:dos" ]
},
"basicPropertyValues" : [ {
"pred" : "http://www.geneontology.org/formats/oboInOwl#hasOBONamespace",
"val" : "biological_process"
} ]
},
"type" : "CLASS",
"lbl" : "endocytosed synaptic vesicle to endosome fusion"
}, {
"id" : "http://purl.obolibrary.org/obo/GO_0099592",
"meta" : {
"definition" : {
"val" : "The process in which endocytosed synaptic vesicles fuse to the presynaptic endosome followed by sorting of synaptic vesicle components and budding of new synaptic vesicles.",
"xrefs" : [ "GOC:dos" ]
},
...
}
If one creates a GpiParser and attempts to parse the uncompressed attached zfin.gpi file below, the following error occurs:
Traceback (most recent call last):
File "tests/test_gpiparser.py", line 43, in <module>
run_the_zfin_thing()
File "tests/test_gpiparser.py", line 36, in run_the_zfin_thing
results = p.parse(open("zfin.gpi", "r"))
File "/Users/edouglass/lbl/biolink/ontobio/ontobio/io/entityparser.py", line 41, in parse
parsed_line, new_ents = self.parse_line(line)
File "/Users/edouglass/lbl/biolink/ontobio/ontobio/io/entityparser.py", line 107, in parse_line
properties] = vals
I used this code to run it:
from ontobio.ontol_factory import OntologyFactory
from ontobio.io.entityparser import GpiParser

def run_the_zfin_thing():
    ont = OntologyFactory().create("go-ontology.json")
    p = GpiParser()
    p.config.remove_double_prefixes = True
    results = p.parse(open("zfin.gpi", "r"))
    for r in results:
        print(r)
    print(p.report.to_markdown())
It's not clear yet if the file is at fault or if it's the parser. But in any case the parser should handle the wrong number of columns more gracefully.
Attached:
zfin.gpi.zip
I'm getting an error when trying to create a new AssociationSetFactory:
from ontobio.ontol_factory import OntologyFactory
from ontobio.assoc_factory import AssociationSetFactory

HUMAN = 'NCBITaxon:9606'

ofactory = OntologyFactory()
afactory = AssociationSetFactory()
ont = ofactory.create('mondo')
aset = afactory.create(ontology=ont,
                       subject_category='disease',
                       object_category='phenotype',
                       taxon=HUMAN)
Error:
/Users/marcin/anaconda/lib/python3.6/site-packages/cachier/mongo_core.py:24: UserWarning: Cachier warning: pymongo was not found. MongoDB cores will not work.
"Cachier warning: pymongo was not found. MongoDB cores will not work.")
Traceback (most recent call last):
File "enrichment.py", line 16, in
taxon=HUMAN)
File "/Users/marcin/anaconda/lib/python3.6/site-packages/ontobio/assoc_factory.py", line 62, in create
taxon=taxon)
File "/Users/marcin/anaconda/lib/python3.6/site-packages/cachier/core.py", line 178, in func_wrapper
return _calc_entry(core, key, func, args, kwds)
File "/Users/marcin/anaconda/lib/python3.6/site-packages/cachier/core.py", line 78, in _calc_entry
func_res = func(*args, **kwds)
File "/Users/marcin/anaconda/lib/python3.6/site-packages/ontobio/assoc_factory.py", line 152, in bulk_fetch_cached
return bulk_fetch(**args)
File "/Users/marcin/anaconda/lib/python3.6/site-packages/ontobio/golr/golr_associations.py", line 146, in bulk_fetch
**kwargs)
File "/Users/marcin/anaconda/lib/python3.6/site-packages/ontobio/golr/golr_associations.py", line 77, in search_associations_compact
**kwargs
File "/Users/marcin/anaconda/lib/python3.6/site-packages/ontobio/golr/golr_associations.py", line 33, in search_associations
return q.exec()
File "/Users/marcin/anaconda/lib/python3.6/site-packages/ontobio/golr/golr_query.py", line 846, in exec
params = self.solr_params()
File "/Users/marcin/anaconda/lib/python3.6/site-packages/ontobio/golr/golr_query.py", line 571, in solr_params
self._set_solr(self.get_config().solr_assocs)
AttributeError: 'Session' object has no attribute 'solr_assocs'
This should work:
ofact.create('http://purl.obolibrary.org/obo/go/go.json')
Currently it assumes any remote file is OWL.
Need to support the directionless association search required for beacons. For inputs sources, targets, relationships, semanticGroups, and keywords, all being lists of strings, the output should be a set of associations where either the subject or object is identified by one of the sources.
The other parameters are filters. The relatum that is not identified as a source must be identified by one of the targets, if any are provided. And all relata not identified by one of the sources must have their semantic type filtered on semanticGroups and their name filtered on keywords, if provided. The source is not filtered by semanticGroups. And the association name itself is filtered by relationships, if provided.
Previously I had added the subjects_or_objects parameter to GolrAssociationQuery to get directionless association searches, but this didn't achieve the target filtering. A sketch of the filter-query construction is below.
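A minimal sketch of how the disjunctive filter query could be built, using the subject/object_category Solr fields from the snippet earlier in this page (helper names are illustrative; keywords and relationships handling is omitted):

def directionless_fq(sources, targets=None, semantic_groups=None):
    """Build an fq string: (source-as-subject clauses) OR (source-as-object clauses)."""
    def any_of(field, values):
        return "(" + " OR ".join('{}:"{}"'.format(field, v) for v in values) + ")"

    clauses = []
    for id_field, cat_field in [("subject", "object_category"),
                                ("object", "subject_category")]:
        parts = [any_of(id_field, sources)]
        if targets:
            other = "object" if id_field == "subject" else "subject"
            parts.append(any_of(other, targets))
        if semantic_groups:
            # category filter applies only to the non-source relatum
            parts.append(any_of(cat_field, semantic_groups))
        clauses.append("(" + " AND ".join(parts) + ")")
    return " OR ".join(clauses)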
It seems that the if and the elif on line 264 are a little redundant, since the elif will always accept anything that the if accepts. And they're two different ways of downloading URLs. Should we just pick the one that uses requests and saves the data into an in-memory file, as that's more general and less prone to failure (since there's no writing to disk)? A sketch of that variant is below.
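A minimal sketch of the requests-based variant (function name is illustrative):

import io
import requests

def fetch_as_file(url):
    """Download into an in-memory file object instead of writing to disk."""
    resp = requests.get(url)
    resp.raise_for_status()  # fail loudly on HTTP errors
    return io.StringIO(resp.text)

# handle = fetch_as_file("http://purl.obolibrary.org/obo/go/go.json")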
I'm trying to get a count of how many entities fall into each semantic category with GolrSearchQuery(rows=0, facet_fields=['category']).exec(), without actually returning any of the entities. I'm getting this as a response:
{
"docs": [],
"facet_counts": {
"category": {
"Phenotype": 10,
"disease": 89,
"gene": 8
}
},
"highlighting": {},
"pagination": {}
}
This must be incorrect; there's much more data than that. Maybe I am using the class incorrectly?
Tests should be split into integration tests and unit tests.
Tests should be written so as not to be affected by cachier (a caching module). Behavior seems to change when caching is turned on or off.
MGI GAF:
MGI MGI:1354727 Smtn GO:0060452 MGI:MGI:4437553|PMID:18678771 IMP MGI:MGI:3613292 P smoothelin protein taxon:10090 20170906 MGI occurs_in(EMAPA:35342),occurs_in(MA:0002039 TS28)
MGI MGI:1354727 Smtn GO:0060452 MGI:MGI:4437553|PMID:18678771 IMP MGI:MGI:3613293 P smoothelin protein taxon:10090 20170906 MGI occurs_in(EMAPA:35342),occurs_in(MA:0002039 TS28)
The contents of the parentheses should be an ID, with no spaces.
In these cases we can assume the core annotation is good, but either the entire extension conjunction or the offending element must be removed, with a suitable warning reported. A validation sketch follows.
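A minimal validation sketch (the regex is illustrative, not the official GAF extension grammar):

import re

# relation(FILLER) where FILLER is a bare CURIE with no spaces
EXTENSION_RE = re.compile(r'^(\w+)\(([A-Za-z]\w*:[\w\-]+)\)$')

def valid_extension(ext):
    return EXTENSION_RE.match(ext) is not None

# valid_extension("occurs_in(EMAPA:35342)")      -> True
# valid_extension("occurs_in(MA:0002039 TS28)")  -> False (space in the ID)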