
A Python frontend to (Open Biomedical) Ontologies.

Home Page: https://pronto.readthedocs.io

License: MIT License

Python 100.00%
obo owl ontology parser python bioinformatics semantic-web obofoundry obo-graphs

pronto's Introduction

pronto

A Python frontend to ontologies.



πŸ—ΊοΈ Overview

Pronto is a Python library to parse, browse, create, and export ontologies, supporting several ontology languages and formats. It implements the specifications of Open Biomedical Ontologies 1.4 in the form of a safe, high-level interface. If you're only interested in parsing OBO or OBO Graphs documents, you may wish to consider fastobo instead.

🏳️ Supported Languages

🔧 Installing

Installing with pip is the easiest:

# pip install pronto          # if you have the admin rights
$ pip install pronto --user   # install it in a user-site directory

There is also a conda recipe in the bioconda channel:

$ conda install -c bioconda pronto

Finally, a development version can be installed from GitHub using setuptools, provided you have the right dependencies installed already:

$ git clone https://github.com/althonos/pronto
$ cd pronto
# python setup.py install

💡 Examples

If you're only reading ontologies, you'll only use the Ontology class, which is the main entry point.

>>> from pronto import Ontology

It can be instantiated from a path to an ontology in one of the supported formats, even if the file is compressed:

>>> go = Ontology("tests/data/go.obo.gz")

Loading a file from a persistent URL is also supported, although you may also want to use the Ontology.from_obo_library method if you're using persistent URLs a lot:

>>> cl = Ontology("http://purl.obolibrary.org/obo/cl.obo")
>>> stato = Ontology.from_obo_library("stato.owl")

🏷️ Get a term by accession

Ontology objects can be used as mappings to access any entity they contain from their identifier in compact form:

>>> cl['CL:0002116']
Term('CL:0002116', name='B220-low CD38-positive unswitched memory B cell')

Note that when loading an OWL ontology, URIs will be compacted to CURIEs whenever possible:

>>> aeo = Ontology.from_obo_library("aeo.owl")
>>> aeo["AEO:0000078"]
Term('AEO:0000078', name='lumen of tube')

πŸ–ŠοΈ Create a new term from scratch

We can load an ontology, and edit it locally. Here, we add a new protein class to the Protein Ontology.

>>> pr = Ontology.from_obo_library("pr.obo")
>>> brh = pr.create_term("PR:XXXXXXXX")
>>> brh.name = "Bacteriorhodopsin"
>>> brh.superclasses().add(pr["PR:000001094"])  # is a rhodopsin-like G-protein
>>> brh.disjoint_from.add(pr["PR:000036194"])   # disjoint from eukaryotic proteins

✏️ Convert an OWL ontology to OBO format

The Ontology.dump method can be used to serialize an ontology to any of the supported formats (currently OBO and OBO JSON):

>>> edam = Ontology("http://edamontology.org/EDAM.owl")
>>> with open("edam.obo", "wb") as f:
...     edam.dump(f, format="obo")

🌿 Find ontology terms without subclasses

The terms method of Ontology instances can be used to iterate over all the terms in the ontology (including the imported ones). We can then use the is_leaf method of Term objects to check if the term is a leaf in the class inclusion graph.

>>> ms = Ontology("ms.obo")
>>> for term in ms.terms():
...     if term.is_leaf():
...         print(term.id)
MS:0000000
MS:1000001
...

🤫 Silence warnings

pronto is explicit about the parts of the code that make non-standard assumptions or lack the capability to handle certain constructs. It does so by raising warnings with the warnings module, which can get quite verbose.

If you are fine with the inconsistencies, you can manually disable warning reports in your consumer code with the filterwarnings function:

import warnings
import pronto
warnings.filterwarnings("ignore", category=pronto.warnings.ProntoWarning)
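When suppression should be scoped rather than global, the filter can be confined to a single call; a sketch, where `parse_quietly` is a hypothetical helper and not part of pronto:

```python
import warnings

def parse_quietly(path):
    """Load an ontology with ProntoWarning suppressed for this call only.

    catch_warnings restores the previous filter state on exit, so the
    rest of the program still sees pronto's warnings.
    """
    import pronto  # imported here so the sketch stands alone
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=pronto.warnings.ProntoWarning)
        return pronto.Ontology(path)

# go = parse_quietly("tests/data/go.obo.gz")
```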

📖 API Reference

A complete API reference can be found in the online documentation, or directly from the command line using pydoc:

$ pydoc pronto.Ontology

📜 License

This library is provided under the open-source MIT license. If you use it in a scientific context, please cite it using the following DOI: 10.5281/zenodo.595572

pronto's People

Contributors

alexhenrie, althonos, chrishmorris, davmlaw, dependabot-preview[bot], dependabot[bot], dhimmel, ebakke, flying-sheep, imgbot[bot], ravwojdyla, spenceforce, ttyskg


pronto's Issues

'str' object has no attribute 'synonyms' Error

Hi,

I stumbled across another issue. I tried to iterate over concepts of an OBO-Ontology:

import pronto
ont = pronto.Ontology('data/apo.obo')

for concept in ont:
    for syn in concept.synonyms:
        ...

Since I updated to pronto 2.0.1, I get the following error

AttributeError: 'str' object has no attribute 'synonyms'

Is there documentation where I could look these things up, in case I'm just using it wrongly?

All the best
Philipp

pronto.Ontology() gives inconsistent lengths

Using the input test.obo:

[Term]
id: ONT0:CHILD_0
name: child_0
is_a: ONT0:ROOT ! root

[Term]
id: ONT0:CHILD_1
name: child_1
is_a: ONT0:ROOT ! root

[Term]
id: ONT0:CHILD_2
name: child_2
is_a: ONT0:ROOT ! root

[Term]
id: ONT0:ROOT
name: root

and running test.py:

import pronto

print(len(pronto.Ontology("./test.obo")))
print(len(pronto.Ontology("./test.obo")))
print(len(pronto.Ontology("./test.obo")))
print(len(pronto.Ontology("./test.obo")))

Using pronto 0.3.3 I get the output:

1
2
4
2

The expected output as given by pronto 0.3.2 is:

4
4
4
4

Option to cache ontologies

I've been debugging the CHIRO (CHEBI Integrated Role Ontology) OBO export, and it had a few issues. First, I had to manually add some Typedef stanzas for its ad-hoc relations. Second, I had to switch the imports from its slimmed versions to the originals, since the slim versions were missing several entities.

This led me to the problem that it has to download each ontology file every time, and this takes a loooong time. Therefore, I'd like to request an option to cache OBO files (either the source .obo or a pre-compiled version as a pickle or OBO JSON).

I understand there could be problems keeping the caches up to date, but maybe there's a simple way to add a dictionary argument to Ontology.__init__ so I can specify where I have my own copies, like

from pronto import Ontology

Ontology.from_obo_library('chiro.obo', cache_files={
    'http://purl.obolibrary.org/obo/chiro/imports/chebi_import.owl': '/Users/cthoyt/obo/chebi.owl',
    'http://purl.obolibrary.org/obo/chiro/imports/envo_import.owl' : '/Users/cthoyt/obo/envo.owl',
    ...
})

Or alternatively, maybe you have an idea that could take care of this kind of caching for me so I don't have to look into the imports of the OBO file specifically.
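Until such an option exists, a minimal workaround is to download each remote file once and load from the local copy afterwards; a sketch, where the cache location and helper name are purely illustrative:

```python
import os
import urllib.request

CACHE_DIR = os.path.expanduser("~/.cache/obo")  # illustrative location

def cached_fetch(url: str) -> str:
    """Download `url` on first use; return the path to the local copy."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    local = os.path.join(CACHE_DIR, url.rsplit("/", 1)[-1])
    if not os.path.exists(local):
        urllib.request.urlretrieve(url, local)
    return local

# ont = pronto.Ontology(cached_fetch("http://purl.obolibrary.org/obo/chebi.obo"))
```

This does not solve caching of an ontology's *imports*, which pronto resolves internally, but it avoids re-downloading the files you load directly.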

Definition string contains cross-references

Hi again,

I'm importing the OBO file from NCIT, and when I extract the definition, I get the whole string unprocessed (i.e. entry.desc contains everything after the quotes). Here's an example entry:

[Term]
id: NCIT:C72909
name: Pentopril
def: "A non-sulfhydryl angiotensin converting enzyme (ACE) inhibitor with antihypertensive activity. As a prodrug, pentopril is hydrolyzed into its active form pentoprilat, which competitively inhibits ACE, thereby blocking the conversion of angiotensin I to angiotensin II. This prevents the potent vasoconstrictive actions of angiotensin II and results in vasodilation. Pentoprilat also decreases angiotensin II-induced aldosterone secretion by the adrenal cortex, which leads to an increase in sodium excretion and subsequently increases water outflow." [] {http://purl.obolibrary.org/obo/NCIT_P378="NCI"}

So according to the spec (https://owlcollab.github.io/oboformat/doc/GO.format.obo-1_4.html#S.2.2), a "def" entry can consist of a string and a dbxref term. This is done correctly for the synonyms, could you do it for the definition as well?
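For reference, the quoted text and the dbxref list of a `def:` value can be separated with a small regex; a simplified sketch of the format described in the spec (this is not pronto's parser, and it does not handle escaped brackets or commas inside xref descriptions):

```python
import re

# "quoted definition" [xref1, xref2] {optional trailing qualifiers}
_DEF = re.compile(r'^"((?:[^"\\]|\\.)*)"\s*\[([^\]]*)\]')

def split_def(value: str):
    """Return (definition text, list of dbxrefs) from an OBO `def:` value."""
    match = _DEF.match(value)
    if match is None:
        raise ValueError("not a valid def clause")
    text = match.group(1)
    xrefs = [x.strip() for x in match.group(2).split(",") if x.strip()]
    return text, xrefs
```

For example, `split_def('"A drug." [PMID:123, PMID:456]')` yields the bare definition text plus the two xrefs, and a trailing `{...}` qualifier block like the NCIT one above is simply left out.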

pronto==0.4.0 fails when printing obo

Using the test.obo

[Term]
id: ONT0:CHILD_0
name: child_0
is_a: ONT0:ROOT ! root

[Term]
id: ONT0:CHILD_1
name: child_1
is_a: ONT0:ROOT ! root

[Term]
id: ONT0:CHILD_2
name: child_2
is_a: ONT0:ROOT ! root

[Term]
id: ONT0:ROOT
name: root

when I run the following Python script with pronto 0.4.0:

import pronto

for i in xrange(100):
    assert(len(pronto.Ontology("./test.obo")) == 4)

obo = pronto.Ontology("./test.obo")
print(obo.obo)

I get the error message

Traceback (most recent call last):
  File "test.py", line 7, in <module>
    print(obo.obo)
  File "/local/scratch/winni/EchiDNA/phos_env/lib/python2.7/site-packages/pronto/ontology.py", line 129, in obo
    return "\n\n".join( [self._obo_meta()] + [t.obo for t in self if t.id.startswith(self.meta['namespace'][0])])
  File "/local/scratch/winni/EchiDNA/phos_env/lib/python2.7/site-packages/pronto/ontology.py", line 163, in _obo_meta
    ] if "ontology" in self.meta else ["ontology: {}".format(self.meta["namespace"][0].lower())]
KeyError: 'namespace'

I'm surprised there isn't an acceptance test for this?

Support for OBO comments

Thanks for the new version! It seems to have many improvements.

It would be great to have support for comments in OBO files (lines that begin with !).

[Possible non-issue] SyntaxError: expected QuotedString

To reproduce:

do = Ontology("https://raw.githubusercontent.com/DiseaseOntology/HumanDiseaseOntology/master/src/ontology/releases/2019-10-30/doid-merged.obo")

I'm not sure this is a pronto issue, as the file's URL is indeed corrupted. If so, please close this.


Basically, a URL inside the file is corrupted with a stray space, which certainly isn't correct. But if the URL were treated as a plain string, maybe this would not have raised a SyntaxError?

I have, however, asked the Disease Ontology people to take a look; the broken URL is certainly something they should fix.

Thanks!

Error with printing nodes with utf-8 characters in python 2.7

Using the test.obo

[Term]
id: ONT0:ROOT
name: °

when I run the following Python script with Python 2.7 and pronto 0.4.0:

import pronto

obo = pronto.Ontology("./test.obo")
for node in obo:
    print(node)

I get the error message

Traceback (most recent call last):
  File "test.py", line 5, in <module>
    print(node)
  File "/local/scratch/PopulationAnalysis/env-phos/lib/python2.7/site-packages/pronto/term.py", line 55, in __repr__
    return "<{}: {}>".format(self.id, self.name)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in position 0: ordinal not in range(128)

I'm guessing that the repr should have some of the strings marked as unicode.

Turtle format?

Hi,
I was wondering if Pronto can handle the Turtle (ttl) format?
Cheers

OBO Parser Error

Hi,

first of all, thanks a lot for your great work.

I tried to parse the attached little OBO terminology, but it fails. I guess the parser mismatches "EXACT" and our custom synonym type "EXACT_MODE". Would it be possible for you to change the parser so that it accepts the synonym type "EXACT_MODE"?

Thanks a lot and all the best
Philipp

MappingModi.obo.txt

Make pronto work with https

pronto does not work with https. Can anybody add such support? Thanks.

ont = pronto.Ontology('https://raw.githubusercontent.com/althonos/pronto/master/tests/resources/elo.obo')

Error when parsing MONDO

I was using pronto to work with MONDO found here. However I get this error:

Traceback (most recent call last):
  File "/Users/gonzalezma/.pyenv/versions/nexus/lib/python3.5/site-packages/pronto/parser/obo.py", line 222, in _classify
    s = _cached_synonyms[obo_header]
KeyError: '"CFS" EXACT ABBREVIATION [https://en.wikipedia.org/wiki/Chronic_fatigue_syndrome#Naming]'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/gonzalezma/.pyenv/versions/nexus/lib/python3.5/site-packages/pronto/synonym.py", line 115, in __init__
    self.syn_type = SynonymType._instances[syn_type]
KeyError: 'ABBREVIATION'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "manage.py", line 29, in <module>
    execute_from_command_line(sys.argv)
  File "/Users/gonzalezma/.pyenv/versions/nexus/lib/python3.5/site-packages/django/core/management/__init__.py", line 371, in execute_from_command_line
    utility.execute()
  File "/Users/gonzalezma/.pyenv/versions/nexus/lib/python3.5/site-packages/django/core/management/__init__.py", line 365, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/Users/gonzalezma/.pyenv/versions/nexus/lib/python3.5/site-packages/django/core/management/base.py", line 288, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/Users/gonzalezma/.pyenv/versions/nexus/lib/python3.5/site-packages/django/core/management/base.py", line 335, in execute
    output = self.handle(*args, **options)
  File "/Users/gonzalezma/.pyenv/versions/nexus/lib/python3.5/site-packages/phenotype_ontologies/management/commands/sync_ontology.py", line 36, in handle
    data = Ontology(purl, timeout=10)
  File "/Users/gonzalezma/.pyenv/versions/nexus/lib/python3.5/site-packages/pronto/ontology.py", line 109, in __init__
    self.parse(handle, parser)
  File "/Users/gonzalezma/.pyenv/versions/nexus/lib/python3.5/site-packages/pronto/ontology.py", line 224, in parse
    self.meta, self.terms, self.imports = p.parse(stream)
  File "/Users/gonzalezma/.pyenv/versions/nexus/lib/python3.5/site-packages/pronto/parser/obo.py", line 69, in parse
    terms = cls._classify(_rawtypedef, _rawterms)
  File "/Users/gonzalezma/.pyenv/versions/nexus/lib/python3.5/site-packages/pronto/parser/obo.py", line 224, in _classify
    s = Synonym.from_obo(obo_header, scope)
  File "/Users/gonzalezma/.pyenv/versions/nexus/lib/python3.5/site-packages/pronto/synonym.py", line 138, in from_obo
    return cls(**groupdict)
  File "/Users/gonzalezma/.pyenv/versions/nexus/lib/python3.5/site-packages/pronto/synonym.py", line 118, in __init__
    raise ValueError("Undefined synonym type: {}".format(syn_type))
ValueError: Undefined synonym type: ABBREVIATION

ValueError: could not find `owl:Ontology` element

Hi,

when I try to parse APO Ontology (http://ontologies.berkeleybop.org/apo.obo) doing

import pronto
ont = pronto.Ontology.from_obo_library('data/apo.obo')

I get a ValueError: could not find owl:Ontology element.

Do you know what I do wrong?
All the best
Philipp


ValueError Traceback (most recent call last)
in
4
5 #ont = pronto.Ontology.from_obo_library('data/apo.obo')
----> 6 ont = pronto.Ontology.from_obo_library('data/abda_sto.obo')
7
8 fout = open('data/abda_sto_enriched.obo', "w", encoding="utf-8")

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pronto\ontology.py in from_obo_library(cls, slug, import_depth, timeout)
93
94 """
---> 95 return cls(f"http://purl.obolibrary.org/obo/{slug}", import_depth, timeout)
96
97 def init(

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pronto\ontology.py in init(self, handle, import_depth, timeout)
159 for cls in BaseParser.subclasses():
160 if cls.can_parse(typing.cast(str, self.path), buffer):
--> 161 cls(self).parse_from(_handle)
162 break
163 else:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pronto\parsers\rdfxml.py in parse_from(self, handle)
91 owl_ontology = tree.find(_NS["owl"]["Ontology"])
92 if owl_ontology is None:
---> 93 raise ValueError("could not find owl:Ontology element")
94 self.ont.metadata = self._extract_meta(owl_ontology)
95

ValueError: could not find owl:Ontology element

Comments get parsed if on same line as tag

I'm importing the obi.owl file from the OBI ontology (http://obi-ontology.org/) and after many tags they add a comment on the same line. Here's an example from line 5:

<owl:versionIRI rdf:resource="http://purl.obolibrary.org/obo/obi/2017-02-22/obi.owl"/><!-- OBI Release 2017-02-22 -->

This causes an error in the parser which thinks that the comment tag is a real element:

File "/home/pplettner/venv/lib/python3.5/site-packages/pronto/ontology.py", line 80, in __init__
    self.parse(handle, parser)
  File "/home/pplettner/venv/lib/python3.5/site-packages/pronto/ontology.py", line 197, in parse
    self.meta, self.terms, self.imports = p.parse(stream)
  File "/home/pplettner/venv/lib/python3.5/site-packages/pronto/parser/owl.py", line 108, in parse
    meta, imports = cls._parse_meta(tree)
  File "/home/pplettner/venv/lib/python3.5/site-packages/pronto/parser/owl.py", line 127, in _parse_meta
    basename = elem.tag.split('}', 1)[-1]
AttributeError: 'cython_function_or_method' object has no attribute 'split'

Comment tags on their own lines are ignored, and if I delete all of these postfix comments in the OWL file it imports successfully. Could you fix this to ignore these comments as well?

Thanks in advance.
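Until that is fixed, one possible workaround is to strip the comments before handing the file to pronto; a sketch using only the standard library (file names are illustrative, and note that the stdlib serializer may rewrite namespace prefixes, which is harmless for namespace-aware parsers):

```python
import xml.etree.ElementTree as ET

def strip_xml_comments(src, dst):
    """Copy an XML/OWL file, dropping all comments.

    The stdlib parser discards comments by default, so a plain
    parse/re-serialize round trip yields a comment-free copy.
    """
    tree = ET.parse(src)
    tree.write(dst, xml_declaration=True, encoding="utf-8")

# strip_xml_comments("obi.owl", "obi.clean.owl")
# ont = pronto.Ontology("obi.clean.owl")
```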

it doesn't work on many ontologies, both for obo and for owl formats.

It seems as though it is quite sensitive to any errors in the OWL code or OBO representation?

import pronto
sd = pronto.Ontology('https://github.com/SDG-InterfaceOntology/sdgio/blob/master/sdgio.owl')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build/bdist.linux-x86_64/egg/pronto/ontology.py", line 109, in __init__
  File "build/bdist.linux-x86_64/egg/pronto/ontology.py", line 224, in parse
  File "build/bdist.linux-x86_64/egg/pronto/utils.py", line 62, in new_func
  File "build/bdist.linux-x86_64/egg/pronto/parser/owl.py", line 63, in parse
  File "src/lxml/etree.pyx", line 3426, in lxml.etree.parse
  File "src/lxml/parser.pxi", line 1861, in lxml.etree._parseDocument
  File "src/lxml/parser.pxi", line 1881, in lxml.etree._parseFilelikeDocument
  File "src/lxml/parser.pxi", line 1776, in lxml.etree._parseDocFromFilelike
  File "src/lxml/parser.pxi", line 1187, in lxml.etree._BaseParser._parseDocFromFilelike
  File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 640, in lxml.etree._raiseParseError
  File "https://github.com/SDG-InterfaceOntology/sdgio/blob/master/sdgio.owl", line 44
lxml.etree.XMLSyntaxError: Specification mandate value for attribute data-pjax-transient, line 44, column 91
envo = pronto.Ontology('https://raw.githubusercontent.com/EnvironmentOntology/envo/master/envo.obo')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build/bdist.linux-x86_64/egg/pronto/ontology.py", line 109, in __init__
  File "build/bdist.linux-x86_64/egg/pronto/ontology.py", line 224, in parse
  File "build/bdist.linux-x86_64/egg/pronto/parser/obo.py", line 70, in parse
  File "build/bdist.linux-x86_64/egg/pronto/parser/obo.py", line 236, in _classify
  File "build/bdist.linux-x86_64/egg/pronto/synonym.py", line 138, in from_obo
  File "build/bdist.linux-x86_64/egg/pronto/synonym.py", line 118, in __init__
ValueError: Undefined synonym type: synonym

and this one times out:

gaz = pronto.Ontology('http://purl.obolibrary.org/obo/gaz.obo')

Windows: missing six module

  • Missing module 'six' when pip install pronto 0.3.3
  • Easy fix by installing six prior to pronto installation but would be best if we add six to the dependencies in the setup.py file


Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\tnl495>pip install mzml2isa
Collecting mzml2isa
  Downloading mzml2isa-0.4.25.tar.gz (128kB)
    100% |################################| 131kB 2.6MB/s
Collecting lxml (from mzml2isa)
  Downloading lxml-3.6.3.tar.gz (3.7MB)
    100% |################################| 3.7MB 145kB/s
Collecting pronto (from mzml2isa)
  Downloading pronto-0.3.3.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 20, in <module>
      File "c:\users\tnl495\appdata\local\temp\pip-build-b_tcry\pronto\setup.py"
, line 9, in <module>
        import pronto
      File "pronto\__init__.py", line 14, in <module>
        from pronto.ontology import Ontology
      File "pronto\ontology.py", line 25, in <module>
        import six
    ImportError: No module named six

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in c:\users\tnl495\a
ppdata\local\temp\pip-build-b_tcry\pronto
You are using pip version 7.1.2, however version 8.1.2 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' comm
and.

C:\Users\tnl495>



retrieval of part_of children?

Hello,

I am new to pronto, and am trying to work out how to retrieve all is_a and part_of relations of a term.

Here is an example term:

[Term]
id: PLANA:0002034
name: ER membrane
def: "The lipid bilayer surrounding the endoplasmic reticulum." []
xref: GO:0005789
is_a: PLANA:0000521 ! bounding membrane of organelle
relationship: part_of PLANA:0007513 ! endoplasmic reticulum

Code that I have tried:

>>> import pronto
>>> from pronto.relationship import Relationship
>>> 
>>> ont = pronto.Ontology('plana.owl')
>>> term = ont['PLANA:0002034']
>>> term.relations
{Relationship('is_a'): [<PLANA:0000521: bounding membrane of organelle>]}

I see that part_of is a known relationship.

>>> for r in Relationship.bottomup():
...   print(r)
...
Relationship('is_a')
Relationship('part_of')
Relationship('develops_from')

How do I include relationship: part_of PLANA:0007513 ! endoplasmic reticulum in my report of the relations of my term?

Thank you,
Sofia
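Based on the `term.relations` mapping shown in the session above, a sketch that reports every relationship of a term (`iter_relations` is a hypothetical helper; whether `part_of` shows up in the mapping depends on how the file was parsed):

```python
def iter_relations(term):
    """Yield (subject id, relationship, object id) triples for a term.

    Relies on `term.relations` being a mapping from each Relationship
    to a list of target terms, as in the interpreter session above.
    """
    for relationship, targets in term.relations.items():
        for target in targets:
            yield term.id, relationship, target.id

# ont = pronto.Ontology('plana.owl')
# for triple in iter_relations(ont['PLANA:0002034']):
#     print(*triple)
```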

Not retrieving "PRO-short-label" synonym type

Hi,

glancing over the code base it looks like when you scan for synonyms it should be able to pick up the synonym type. In this example entry I got from Protein Ontology, I should expect one of the exact synonyms to be labelled as "PRO-short-label".

However, if I import this all of the synonyms have type = None. Is this an unexpected format for the parser?

    <owl:Class rdf:about="http://purl.obolibrary.org/obo/PR_000001007">
        <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">integrin alpha-1</rdfs:label>
        <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/PR_000001005"/>
        <obo:IAO_0000115 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">An integrin alpha with A domain that is a translation product of the human ITGA1 gene or a 1:1 ortholog thereof.</obo:IAO_0000115>
        <oboInOwl:hasExactSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">CD49 antigen-like family member A</oboInOwl:hasExactSynonym>
        <oboInOwl:hasExactSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">CD49a</oboInOwl:hasExactSynonym>
        <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Category=gene.</rdfs:comment>
        <oboInOwl:hasExactSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ITGA1</oboInOwl:hasExactSynonym>
        <oboInOwl:id rdf:datatype="http://www.w3.org/2001/XMLSchema#string">PR:000001007</oboInOwl:id>
        <oboInOwl:hasExactSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">VLA-1</oboInOwl:hasExactSynonym>
        <oboInOwl:hasExactSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">laminin and collagen receptor</oboInOwl:hasExactSynonym>
        <oboInOwl:hasOBONamespace rdf:datatype="http://www.w3.org/2001/XMLSchema#string">protein</oboInOwl:hasOBONamespace>
    </owl:Class>
    <owl:Axiom>
        <owl:annotatedTarget rdf:datatype="http://www.w3.org/2001/XMLSchema#string">An integrin alpha with A domain that is a translation product of the human ITGA1 gene or a 1:1 ortholog thereof.</owl:annotatedTarget>
        <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">PRO:CNA</oboInOwl:hasDbXref>
        <owl:annotatedProperty rdf:resource="http://purl.obolibrary.org/obo/IAO_0000115"/>
        <owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/PR_000001007"/>
    </owl:Axiom>
    <owl:Axiom>
        <owl:annotatedTarget rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ITGA1</owl:annotatedTarget>
        <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">PRO:DNx</oboInOwl:hasDbXref>
        <owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/PR_000001007"/>
        <oboInOwl:hasSynonymType rdf:resource="http://purl.obolibrary.org/obo/pr#PRO-short-label"/>
        <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasExactSynonym"/>
    </owl:Axiom>

OwlXMLTargetParser not extracting imports properly

Currently OwlXMLTargetParser does not extract imports properly: it only extracts the last import instead of all of them.

e.g. with cl.owl:

<owl:Ontology rdf:about="http://purl.obolibrary.org/obo/cl.owl">
        <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">See PMID:15693950, PMID:12799354, PMID:20123131, PMID:21208450; Contact Alexander Diehl, [email protected], University at Buffalo.</rdfs:comment>
        <dc:license rdf:resource="http://creativecommons.org/licenses/by/4.0/"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/cl/imports/chebi_import.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/cl/imports/clo_import.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/cl/imports/go_import.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/cl/imports/ncbitaxon_import.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/cl/imports/pato_import.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/cl/imports/pr_import.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/cl/imports/ro_import.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/cl/imports/uberon_import.owl"/>
        <owl:versionIRI rdf:resource="http://purl.obolibrary.org/obo/cl/releases/2016-02-01/cl.owl"/>
    </owl:Ontology>

OwlXMLTreeParser extracts:

["http://purl.obolibrary.org/obo/cl/imports/chebi_import.owl", 
  "http://purl.obolibrary.org/obo/cl/imports/clo_import.owl", 
  "http://purl.obolibrary.org/obo/cl/imports/go_import.owl", 
  "http://purl.obolibrary.org/obo/cl/imports/ncbitaxon_import.owl", 
  "http://purl.obolibrary.org/obo/cl/imports/pato_import.owl", 
  "http://purl.obolibrary.org/obo/cl/imports/pr_import.owl", 
  "http://purl.obolibrary.org/obo/cl/imports/ro_import.owl", 
  "http://purl.obolibrary.org/obo/cl/imports/uberon_import.owl"]

where OwlXMLTargetParser only extracts:

["http://purl.obolibrary.org/obo/cl/imports/uberon_import.owl"]
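Collecting every import comes down to using `findall` rather than `find` over the `owl:Ontology` node; a standalone sketch with the stdlib (not the project's actual parser code):

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def extract_imports(xml_text):
    """Return the rdf:resource of every owl:imports in an ontology header."""
    root = ET.fromstring(xml_text)
    tag = f"{{{OWL}}}Ontology"
    node = root if root.tag == tag else root.find(tag)
    if node is None:
        raise ValueError("could not find owl:Ontology element")
    # findall returns *all* matching children, where find stops at the first
    return [el.get(f"{{{RDF}}}resource") for el in node.findall(f"{{{OWL}}}imports")]
```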

No module name pronto -UBUNTU 16.04 lts ROS

Hi,
I'm trying to use a pronto ontology in a robot simulation; is it possible to use pronto with ROS? I'm not sure if I did something wrong.
I installed using the third step (git clone + setup.py). When I tried it, I got "no module named pronto" on the import line.

import pronto

ont = pronto.Ontology('/home/antoniobatistast1/CORA/cora-fullaxiom.owl')
print(ont.obo)
print(ont.json)

Is it wrong?

Comments in metadata area of OBO cause the parser to fail

According to this OBO format guide:

An OBO file may contain any number of lines beginning with !, at any point in the file. These lines are ignored by parsers.

Further, any line may end with a ! comment. Parsers that encounter an unescaped ! will ignore the ! and all data until the end of the line. <newline> sequences are not allowed in ! comments (see escape characters).

However, the pronto parser apparently does not work well with some OBO files with comments:

from pronto import Ontology
Ontology('ftp://ftp.expasy.org/databases/cellosaurus/cellosaurus.obo')

leads to ValueError exception for me:

ValueError                                Traceback (most recent call last)
      1 from pronto import Ontology
----> 2 Ontology('ftp://ftp.expasy.org/databases/cellosaurus/cellosaurus.obo')

pronto/ontology.py in __init__(self, handle, imports, import_depth, timeout, parser)
    107             self.path = handle
    108             with self._get_handle(handle, timeout) as handle:
--> 109                 self.parse(handle, parser)
    110         else:
    111             actual = type(handle).__name__

pronto/ontology.py in parse(self, stream, parser)
    222         for p in parsers:
    223             if p.hook(path=self.path, force=force, lookup=lookup):
--> 224                 self.meta, self.terms, self.imports, self.typedefs = p.parse(stream)
    225                 self._parsed_by = p.__name__
    226                 break

pronto/parser/obo.py in parse(cls, stream)
     61 
     62             if _section is OboSection.meta:
---> 63                 cls._parse_metadata(streamline, meta)
     64             elif _section is OboSection.typedef:
     65                 cls._parse_typedef(streamline, _rawtypedef)

pronto/parser/obo.py in _parse_metadata(cls, line, meta, parse_remarks)
    116 
    117         """
--> 118         key, value = line.split(':', 1)
    119         key, value = key.strip(), value.strip()
    120         if parse_remarks and "remark" in key:                        # Checking that the ':' is not

ValueError: not enough values to unpack (expected 2, got 1)

Editing out the comments manually solved the problem for me.
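That manual edit can be automated with a small preprocessing step; a simplified sketch (it treats every `!` as starting a comment, without honouring quoting or escape sequences, so it is only an approximation of the spec):

```python
def strip_obo_comments(text: str) -> str:
    """Remove '!' comment lines and trailing '!' comments from OBO text.

    Simplification: a '!' inside a quoted string would be treated as a
    comment too; a real parser must honour quoting and escapes.
    """
    cleaned = []
    for line in text.splitlines():
        if line.lstrip().startswith("!"):
            continue  # whole-line comment
        cleaned.append(line.split("!", 1)[0].rstrip())
    return "\n".join(cleaned)
```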

defunct gene ontology items

This is really strange, I can read the latest GO and most of the items seem fine but e.g. this entry:

[Term]
id: GO:0016301
name: kinase activity
namespace: molecular_function
def: "Catalysis of the transfer of a phosphate group, usually from ATP, to a substrate molecule." [ISBN:0198506732]
comment: Note that this term encompasses all activities that transfer a single phosphate group; although ATP is by far the most common phosphate donor, reactions using other phosphate donors are included in this term.
subset: goslim_chembl
subset: goslim_drosophila
subset: goslim_generic
subset: goslim_metagenomics
subset: goslim_plant
subset: goslim_yeast
synonym: "phosphokinase activity" EXACT []
is_a: GO:0016772 ! transferase activity, transferring phosphorus-containing groups
relationship: part_of GO:0016310 ! phosphorylation

...will lose its name, synonyms, and description:

In [6]: ont = pronto.Ontology('go-edit.obo')                                    
/usr/lib/python3.6/site-packages/pronto/ontology.py:326: ProntoWarning: XMLSyntaxError occured during import of http://purl.obolibrary.org/obo/go/extensions/go-bridge.owl
  ProntoWarning)
/usr/lib/python3.6/site-packages/pronto/ontology.py:326: ProntoWarning: XMLSyntaxError occured during import of http://purl.obolibrary.org/obo/go/extensions/go-gci.owl
  ProntoWarning)

In [7]: term = ont.terms.get('GO:0016301')                                      

In [8]: term                                                                    
Out[8]: <GO:0016301: >

In [12]: term.synonyms                                                          
Out[12]: set()

In [14]: term.desc                                                              
Out[14]: Description('', [])

In [15]: term.obo                                                               
Out[15]: '[Term]\nid: GO:0016301\nname: \nis_a: GO:0016772 ! transferase activity, transferring phosphorus-containing groups\nrelationship: part_of GO:0016310 ! phosphorylation'

In [17]: term.other                                                             
Out[17]: {}

In [18]: term.name                                                              
Out[18]: ''

Counting the items with GO id and an empty name:

In [24]: c=0  
    ...: for id,t in ont.terms.items(): 
    ...:     if id[:2] == 'GO' and t.name == '': 
    ...:         c = c+1 
    ...:                                                                        

In [25]: c                                                                      
Out[25]: 1332
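The count above can be written as a single `sum()`; a sketch over a plain mapping standing in for `ont.terms` (the toy data here is illustrative):

```python
# toy stand-in for ont.terms: term id -> parsed name
terms = {"GO:0016301": "", "GO:0016310": "phosphorylation", "BFO:0000001": ""}

# count GO terms whose name came back empty after parsing
empty = sum(1 for tid, name in terms.items() if tid.startswith("GO:") and not name)
print(empty)  # 1
```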

Can you confirm this, or do I have a local problem? I'm using current master of both GO and pronto, with Python3.

TypeError: sequence item 0: expected str instance, NoneType found

To reproduce the issue, run:

from pronto import Ontology
duo = Ontology("https://raw.githubusercontent.com/EBISPOT/DUO/master/duo.owl")

Error traceback to:

File "pronto/parsers/rdfxml.py", line 529, in _extract_object_property
reldata.comment = "\n".join(comments)
TypeError: sequence item 0: expected str instance, NoneType found

Upon further inspection, it was found that line 522 does a type check:

    if comments:

But since comments is a list, this always evaluates to True. In this case, it was found that comments is a list containing a single None: [None].

Proposed fix:
Change line 529 to reldata.comment = "\n".join(str(c) for c in comments)
Similar changes can be applied to line 365.
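For comparison, a join that drops None entries rather than stringifying them (a sketch, not the project's actual patch):

```python
comments = [None, "first comment", None]

# filter out None before joining; stringifying the whole list would instead
# embed the list's repr in the output
joined = "\n".join(c for c in comments if c is not None)
print(joined)  # first comment
```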

Proposed fix 2:
See comment below

Remove redundant type check [This is not true, see comment below]

names: List[str] = []
comments: List[str] = []

You initialized names and comments to be lists, so a check like "if comments" or "if names" will always evaluate to True. This is not needed.

There are similar type checks elsewhere in the code, such as on line 513. The type checks there are probably redundant or incomplete, so feel free to take a look there as well.

PS. Is there a recommended way to suppress warning in terminal and print them to a separate log file?

Issues with the new Pronto API

I've been upgrading some of my Python 2 code to Python 3 and just noticed today that you completely overhauled Pronto in October.

Unfortunately it seems to have made things more complex for me. I had some nice simple code that did this:

def get_psimi_terms():
    psimi_root = read_psimi()[ROOT_DETMETHOD_ID]
    psimi_terms = [psimi_root]
    current_terms = psimi_root.children

    while current_terms:
        psimi_terms.extend(current_terms)
        current_terms = current_terms.children

    return psimi_terms

I see that you've got a subclasses iterator that grabs all of a term's descendants, but that's not quite what I need (the fact that some terms might repeat themselves here is intended). And elsewhere in the code I also need to grab each term's immediate parent. Just the parent, not every ancestor. Is there a nice way to implement this now without having to wrestle with the new API?
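The level-by-level walk with intended repeats can be sketched independently of the library, assuming only a children() accessor (all names here are illustrative):

```python
def collect_with_repeats(root, children):
    """Breadth-first collection; a term reachable through two parents appears twice."""
    collected = [root]
    frontier = children(root)
    while frontier:
        collected.extend(frontier)
        frontier = [grandchild for term in frontier for grandchild in children(term)]
    return collected

# toy hierarchy: "c" has two parents, so it is collected twice (intended)
kids = {"root": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
print(collect_with_repeats("root", lambda t: kids[t]))  # ['root', 'a', 'b', 'c', 'c']
```

In recent pronto versions, Term.subclasses(distance=1, with_self=False) should yield only the immediate children, and superclasses() is the analogue for the immediate parent, which would serve as the children() accessor above.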

Another thing (and maybe I should have mentioned this first) is that I'm no longer able to even load the OBO files I was previously using. I'm working with the PSI-MI.OBO from here: https://github.com/HUPO-PSI/psi-mi-CV/blob/master/psi-mi.obo

When I try to load that, I get a SyntaxError: expected QuotedString exception from some random term near the end. I also tried GO.OBO and had the same problem. According to your compatibility chart these files are supposed to work, so I'm not sure what's going on here. I didn't have time to investigate this deeply before leaving work today so maybe it's something dumb, but perhaps you can help me.

Ignore errors per term when parsing

I am trying to parse the GO database:

format-version: 1.2
data-version: releases/2020-01-01

However, when I read the file via:

from pronto import Ontology
GO = Ontology("/home/dbs/GO_new/go-basic.obo")

Traceback (most recent call last):

  File "/scratch0/software/miniconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3319, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-2-0166872dad99>", line 1, in <module>
    GO = Ontology("/home/dbs/GO_new/go-basic.obo")

  File "/scratch0/software/miniconda3/lib/python3.7/site-packages/pronto/ontology.py", line 174, in __init__
    self._build_subclassing_cache()

  File "/scratch0/software/miniconda3/lib/python3.7/site-packages/contexter.py", line 96, in __exit__
    reraise(exc)

  File "/scratch0/software/miniconda3/lib/python3.7/site-packages/contexter.py", line 26, in reraise
    raise exc[1].with_traceback(exc[2])

  File "/scratch0/software/miniconda3/lib/python3.7/site-packages/pronto/ontology.py", line 167, in __init__
    cls(self).parse_from(_handle)

  File "/scratch0/software/miniconda3/lib/python3.7/site-packages/pronto/parsers/obo.py", line 36, in parse_from
    raise SyntaxError(s.args[0], location) from None

  File "/home/dbs/GO_new/go-basic.obo", line 284367
    def: "Catalysis of the reaction: (6S)-6-hydroxyhyoscyamine + 2-oxoglutarate + O(2) = CO(2) + H(2)O + H(+) + scopolamine + succinate." [EC: 1.14.20.13, RHEA:12797]
                                                                                                                                               ^
SyntaxError: expected QuotedString
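pronto currently aborts on the first malformed frame. A library-agnostic sketch of the requested behavior splits the document into frames and collects per-frame errors instead of raising (the parser here is a toy stand-in):

```python
def parse_leniently(text, parse_frame):
    """Parse each frame independently; collect errors instead of aborting."""
    terms, errors = [], []
    for frame in text.split("\n\n"):
        try:
            terms.append(parse_frame(frame))
        except SyntaxError as err:
            errors.append((frame.splitlines()[0], err))
    return terms, errors

def toy_parser(frame):
    # stand-in parser: reject frames containing an unquoted "EC: " xref
    if "EC: " in frame:
        raise SyntaxError("expected QuotedString")
    return frame.splitlines()[0]

document = "[Term]\nid: GO:1\n\n[Term]\nid: GO:2\nxref: [EC: 1.14.20.13]"
terms, errors = parse_leniently(document, toy_parser)
print(terms)        # ['[Term]']
print(len(errors))  # 1
```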

recommended way to fetch #shorthand

[Term]
id: DUO:0000004
name: no restriction
def: "This consent code primary category indicates there is no restriction on use."
property_value: http://purl.obolibrary.org/obo/DUO_0000041 "no restriction" xsd:string
property_value: http://www.geneontology.org/formats/oboInOwl#shorthand "NRES" xsd:string
is_a: DUO:0000002

Above just an example of a obo file snippet, I wanted to read the #shorthand from one of the property_value, any recommended way of doing this? In this way, I want to be able to read NRES. Thanks!
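If nothing in the API surfaces the shorthand directly, the raw property_value line can be pulled apart by hand; a self-contained sketch over the snippet above:

```python
frame = '''[Term]
id: DUO:0000004
name: no restriction
property_value: http://purl.obolibrary.org/obo/DUO_0000041 "no restriction" xsd:string
property_value: http://www.geneontology.org/formats/oboInOwl#shorthand "NRES" xsd:string
is_a: DUO:0000002'''

shorthand = None
for line in frame.splitlines():
    if line.startswith("property_value:") and "oboInOwl#shorthand" in line:
        shorthand = line.split('"')[1]  # the quoted literal after the property IRI
print(shorthand)  # NRES
```

In pronto 2.x the same data should also surface on term.annotations as literal property values, although the exact spelling of the property identifier there may vary.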

Obo exporting does not work properly

Two errors occur when exporting to OBO (example with the Human Phenotype Ontology):

  • repeated attributes are added as lists, not as repeated key:value pairs
  • synonyms are not added to the export

Example:

[Term]
id: HP:0000003
name: Multicystic kidney dysplasia
alt_id: [u'HP:0004715']
def: "Multicystic dysplasia of the kidney is characterized by multiple cysts of varying size in the kidney and the absence of a normal pelvocaliceal system. The condition is associated with ureteral or ureteropelvic atresia, and the affected kidney is nonfunctional." [HPO:curators]
xref: [u'MeSH:D021782 "Multicystic Kidney Dysplasia"', u'UMLS:C0345335 "Multicystic Kidney Dysplasia"']
is_a: HP:0000107 ! Renal cyst

instead of:

[Term]
id: HP:0000003
name: Multicystic kidney dysplasia
alt_id: HP:0004715
def: "Multicystic dysplasia of the kidney is characterized by multiple cysts of varying size in the kidney and the absence of a normal pelvocaliceal system. The condition is associated with ureteral or ureteropelvic atresia, and the affected kidney is nonfunctional." [HPO:curators]
comment: Multicystic kidney dysplasia is the result of abnormal fetal renal development in which the affected kidney is replaced by multiple cysts and has little or no residual function. The vast majority of multicystic kidneys are unilateral. Multicystic kidney can be diagnosed on prenatal ultrasound.
synonym: "Multicystic dysplastic kidney" EXACT []
synonym: "Multicystic kidneys" EXACT []
synonym: "Multicystic renal dysplasia" EXACT []
xref: MeSH:D021782 "Multicystic Kidney Dysplasia"
xref: UMLS:C0345335 "Multicystic Kidney Dysplasia"
is_a: HP:0000107 ! Renal cyst
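The expected serialization writes one key: value line per element; a sketch of the rule (helper name is illustrative):

```python
def emit_repeated(tag, values):
    """One 'tag: value' line per element, never a Python list repr."""
    return [f"{tag}: {value}" for value in values]

print(emit_repeated("alt_id", ["HP:0004715"]))
print(emit_repeated("xref", ['MeSH:D021782 "Multicystic Kidney Dysplasia"']))
```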

Merge metadata when merging ontologies

When merging ontologies, metadata should be merged too.

Tags with cardinality "any" should simply be merged together, and those with cardinality "zero or one" or "one" should be treated separately:

  • format-version: 1.2 if any is 1.2 else 1.4
  • date: date of the merge
  • saved-by: None
  • default-relationship-id-prefix: that of the merge-receiving ontology
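The rules above can be sketched as a pure function over header dicts (all names here are illustrative, not pronto's API):

```python
def merge_header(receiving, incoming, today):
    """Merge two OBO header dicts following the cardinality rules above."""
    versions = (receiving.get("format-version"), incoming.get("format-version"))
    return {
        "format-version": "1.2" if "1.2" in versions else "1.4",
        "date": today,   # date of the merge
        "saved-by": None,
        "default-relationship-id-prefix": receiving.get("default-relationship-id-prefix"),
    }

merged = merge_header(
    {"format-version": "1.4", "default-relationship-id-prefix": "OBO_REL"},
    {"format-version": "1.2"},
    "22:04:2024 12:00",
)
print(merged["format-version"])  # 1.2
```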

Windows multiprocessing: AttributeError: 'Queue' object has no attribute 'size'

After testing mzml2isa on Windows 7, the following error message is displayed from the pronto package

Installed from pip (mzml2isa 0.4.25)

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\tnl495>mzml2isa
Process _OboClassifier-1:
Traceback (most recent call last):
File "c:\python27\lib\multiprocessing\process.py", line 258, in _bootst
self.run()
File "c:\python27\lib\site-packages\pronto\parser\obo.py", line 33, in
term = self.queue.get()
File "c:\python27\lib\site-packages\pronto\utils.py", line 257, in get
self.size.increment(-1)
AttributeError: 'Queue' object has no attribute 'size'
(the same traceback then repeats, interleaved line by line, from the remaining _OboClassifier worker processes, each ending in the same AttributeError: 'Queue' object has no attribute 'size')

Synonym scope after each synonym necessary?

Hi,

I have another question. When I load my obo terminology and print it out via "ont.obo", all terms automatically get a synonym scope "RELATED", even if I already defined the scope in the synonym type definition: (synonymtypedef: DEFAULT_MODE "Default Mapping Mode" RELATED)

in:
synonym: "First Concept" DEFAULT_MODE []

out:
synonym: "First Concept" RELATED DEFAULT_MODE []

Is there a way to avoid this behavior?

All the best
Philipp

rchildren() not working?

Hi,

I tried to load the attachted terminology and access the subtree of term with id 1 via:

term = ont['1']
children = term.rchildren()
for child in children:
    do_something(child)

The recursion doesn't end and finally stops with:
RecursionError: maximum recursion depth exceeded while calling a Python object

Do you know why?
All the best
Philipp

MappingModi.obo.txt
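A cycle in the is_a graph would produce exactly this unbounded recursion; a visited-set guard stops it (a library-agnostic sketch, not pronto's implementation):

```python
def safe_rchildren(term, children, seen=None):
    """Recursive descendant collection that terminates even if the graph has a cycle."""
    if seen is None:
        seen = set()
    result = []
    for child in children(term):
        if child not in seen:
            seen.add(child)
            result.append(child)
            result.extend(safe_rchildren(child, children, seen))
    return result

# '1' -> '2' -> '1' is a cycle; the guard stops the descent
graph = {"1": ["2"], "2": ["1"]}
print(safe_rchildren("1", lambda t: graph[t]))  # ['2', '1']
```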

Ontology import fails occasionally

Hi, thanks for fixing the inline comments bug so quickly. I have another issue to report, but this is harder to pin down since it doesn't happen all the time (it happens maybe 10-20% of the time?)

If I import the Uberon ontology:

uberon_ontology = Ontology(path='http://purl.obolibrary.org/obo/uberon.owl', imports=False)

I sometimes get this exception and it fails:

Traceback (most recent call last):
  File "/home/pplettner/venv/lib/python3.5/site-packages/pronto/parser/owl.py", line 441, in _classify
    new_term[owl_to_obo[k]] = rawterm[k]['data']
KeyError: 'data'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./import_ontology.py", line 61, in <module>
    uberon_ontology = Ontology(path='http://purl.obolibrary.org/obo/uberon.owl', imports=False)
  File "/home/pplettner/venv/lib/python3.5/site-packages/pronto/ontology.py", line 80, in __init__
    self.parse(handle, parser)
  File "/home/pplettner/venv/lib/python3.5/site-packages/pronto/ontology.py", line 197, in parse
    self.meta, self.terms, self.imports = p.parse(stream)
  File "/home/pplettner/venv/lib/python3.5/site-packages/pronto/parser/owl.py", line 368, in parse
    terms = cls._classify(_rawterms)
  File "/home/pplettner/venv/lib/python3.5/site-packages/pronto/parser/owl.py", line 443, in _classify
    new_term[k] = rawterm[k]['data']
KeyError: 'data'

The specific line in the ontology that makes this fail is this:

<obo:UBPROP_0000008 rdf:datatype="http://www.w3.org/2001/XMLSchema#string"></obo:UBPROP_0000008>

which makes:
k = 'UBPROP_0000008'
rawterm[k] = {'datatype': 'http://www.w3.org/2001/XMLSchema#string'}

so the empty element produces no 'data' key, hence the KeyError.

It's strange because it can succeed if I run it again, but not always. Thanks in advance.

Grabbing comments from Terms

Is there a way to grab comments from individual terms if they exist? I couldn’t locate anything on it in the documentation
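In pronto 2.x a term's comment should be exposed as an attribute (term.comment); failing that, the tag can be read from the raw OBO frame. A self-contained sketch:

```python
frame = '''[Term]
id: GO:0016301
name: kinase activity
comment: Note that this term encompasses all activities that transfer a single phosphate group.'''

comment = next(
    (line.partition(": ")[2] for line in frame.splitlines() if line.startswith("comment:")),
    None,  # terms without the tag yield None
)
print(comment)
```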

Make Ontologies object serializable

Attempting to pickle a pronto.ontology.Ontology

Results in the error:

TypeError: cannot serialize '_io.BufferedReader' object

Being able to serialize these objects would help, given that OWL is slow to parse.

Strange NoneType errors with Terms when not accessed directly from an Ontology object

I'm getting some very unusual errors with the latest version of Pronto.

I've got this method for reading the ontology:

def read_psimi():
    return pronto.Ontology(DATA_DIRECTORY + PSIMI_FILENAME)

If I access a term in the ontology like this:

psimi = read_psimi()
psimi_root = psimi[ROOT_DETMETHOD_ID]

Then I can use psimi_root however I want.

However, if I do this:

psimi_root = read_psimi()[ROOT_DETMETHOD_ID]

I get a broken psimi_root that gives errors if I try to print it or get its subclasses or superclasses.

AttributeError: 'NoneType' object has no attribute 'name'
AttributeError: 'NoneType' object has no attribute 'get_relationship'

It seems this problem might be happening whenever the Ontology object is not in the local scope. So even if I do get a functional psimi_root, if I return it from a method to a scope where the original Ontology is not available, it becomes broken. Some kind of garbage collection thing?
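That diagnosis is plausible: in pronto 2.x each Term appears to hold only a weak reference back to its Ontology, so the ontology must outlive any terms taken from it. The mechanism in miniature, using a stand-in class rather than pronto's:

```python
import weakref

class Ontology:  # stand-in class, not pronto's
    pass

ont = Ontology()
handle = weakref.ref(ont)   # roughly what a Term would hold
assert handle() is ont      # resolvable while the ontology is alive
del ont                     # drop the last strong reference...
print(handle())             # None: dereferencing now fails, hence the NoneType errors
```

Keeping the Ontology object bound to a variable for as long as its terms are in use avoids the breakage.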

Term.replaced_by raises KeyError

Hello,

I'm using pronto to parse the Human Phenotype Ontology (HPO) which is in obo format (file here). When I'm given an obsolete term, I would like to automatically fall back to the term that replaced it via the replaced_by field.

However, whenever I try to access the replaced_by property of a Term, I get a strange KeyError.

Example of an obsolete term in HPO:

[Term]
id: HP:0000284
name: obsolete Abnormality of the ocular region
is_obsolete: true
replaced_by: HP:0000315

Example of code used with the error:

>>> from pronto import Ontology
>>> ontology = Ontology("http://purl.obolibrary.org/obo/hp.obo")
__main__:1: UnicodeWarning: unsound encoding, assuming ISO-8859-1 (73% confidence)
>>> term = ontology["HP:0000284"]
>>> term
Term('HP:0000284', name='obsolete Abnormality of the ocular region')
>>> term.obsolete
True
>>> term.replaced_by
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "XXX/lib/python3.7/site-packages/pronto/term.py", line 402, in replaced_by
    return frozenset({ontology.get_term(t) for t in termdata.replaced_by})
  File "XXX/lib/python3.7/site-packages/pronto/term.py", line 402, in <setcomp>
    return frozenset({ontology.get_term(t) for t in termdata.replaced_by})
  File "XXX/lib/python3.7/site-packages/pronto/utils/meta.py", line 81, in newfunc
    return func(*args, **kwargs)
  File "XXX/lib/python3.7/site-packages/pronto/ontology.py", line 373, in get_term
    raise KeyError(id)
KeyError: 'H'

The target term exists in the ontology and is correctly loaded:

>>> ontology["HP:0000315"]
Term('HP:0000315', name='Abnormality of the orbital region')

Is it a problem with the ontology file or with pronto ? And can I just get the raw value of the replaced_by field and handle it myself ?
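As a stopgap, the raw replaced_by value can be recovered from the OBO frame directly; a sketch over the term shown above:

```python
frame = '''[Term]
id: HP:0000284
name: obsolete Abnormality of the ocular region
is_obsolete: true
replaced_by: HP:0000315'''

replacement_ids = [
    line.partition(": ")[2].strip()
    for line in frame.splitlines()
    if line.startswith("replaced_by:")
]
print(replacement_ids)  # ['HP:0000315']
```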

Thank you in advance

Versions used : Python 3.7.5, pronto 1.1.2

Synonym names are mistakenly considered unique

Hi Martin,

I found another issue. When I set the synonym type of a synonym, it sets this synonym type for all synonyms in the ontology with the same descriptor, regardless of whether they are in the same term or not.

Example: when applying the code below, it sets erroneously both "Second Synonym" to IGNORE_MODE.

All the best
Philipp

term = ont['2']
for syn in term.synonyms:
    syn.syn_type = synTypeList['IGNORE_MODE']  

synonymtypedef: EXACT_MODE "Exact Mapping Mode" EXACT
synonymtypedef: IGNORE_MODE "Ignore Mapping Mode" EXACT

[Term]
id: 1
name: First Concept
synonym: "Second Synonym" EXACT_MODE []

[Term]
id: 2
name: First Child
synonym: "Second Synonym" EXACT_MODE []

AttributeError: module 'pronto' has no attribute 'Ontology'

When I try to use the Ontology class I get an attribute error, "module pronto has no attribute Ontology'

After much tracking, we found that the module 'six' is required, and that the package's __init__.py catches the ImportError but silently passes, so the real cause is never reported.

In summary, install six and all is well.

Thanks!
Sofia and Jessen

deepcopy doesn't work properly

deepcopy should be transitive. But this test fails:

import copy
import unittest
import pronto

class TermEqualityTestCase(unittest.TestCase):
    def test_demo_deepcopy_bug(self):
        obo = pronto.Ontology("http://purl.obolibrary.org/obo/doid.obo")
        term = obo['DOID:0001816']
        term_copy = copy.deepcopy(term)
        self.assertEqual(term, term_copy)
        term_copy_copy = copy.deepcopy(term_copy)
        # pronto doesn't properly implement deepcopy()
        # this assert should succeed, but it fails
        self.assertEqual(term, term_copy_copy)

Thanks, Arthur
