marvinm2 / aopwikirdf

This repository contains code for the AOP-Wiki XML-to-RDF conversion and guidance for deploying a Virtuoso SPARQL endpoint Docker image loaded with the AOP-Wiki RDF.

License: MIT License

Jupyter Notebook 99.81% Python 0.19%

aopwikirdf's Issues

Ontologies without linkout

Ontologies that are not currently marked with a URI for biological objects

  • FMA (Foundational Model of Anatomy)
  • PCO (Population and Community Ontology)
  • TAIR (The Arabidopsis Information Resource) -- 3 options for this

Ontologies that are not currently marked with a URI for biological processes

  • VT (Vertebrate Trait)
  • NBO (Neuro Behaviour Ontology)
  • PCO (Population and Community Ontology)

For cell terms, organ terms and biological actions, the source WIKI (meaning AOP-Wiki) is used; a URI still needs to be minted for these terms.

  • WIKI
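One way to close these gaps is a single table from source name to URI base. The minimal sketch below assumes the OBO PURL pattern `http://purl.obolibrary.org/obo/<PREFIX>_<ID>` for FMA, PCO, VT and NBO. `WIKI_BASE` is a hypothetical placeholder for the namespace that still has to be minted. TAIR is deliberately left unmapped until one of its options is chosen.

```python
from typing import Optional

OBO = "http://purl.obolibrary.org/obo/"
# Hypothetical placeholder: the real namespace for WIKI terms still has to be minted.
WIKI_BASE = "https://example.org/aopwiki/term/"

# Assumed URI bases; TAIR is deliberately absent until a pattern is chosen.
URI_BASES = {
    "FMA": OBO + "FMA_",
    "PCO": OBO + "PCO_",
    "VT": OBO + "VT_",
    "NBO": OBO + "NBO_",
    "WIKI": WIKI_BASE,
}


def term_uri(source: str, accession: str) -> Optional[str]:
    """Return a URI for an ontology term, or None if the source has no known base."""
    base = URI_BASES.get(source.upper())
    if base is None:
        return None
    # Accept both "FMA:7088" and a bare "7088"
    local = accession.split(":", 1)[1] if ":" in accession else accession
    return base + local
```

Returning `None` for unmapped sources lets the converter skip or log those terms instead of emitting a half-formed URI.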

rbo != RBO in prefixes

The prefix:

@prefix rbo: <http://purl.obolibrary.org/obo/RBO_>.

Does not match its usage:

    go:0008150      RBO:00015021 ;
    pato:0000001    "WIKI:1" ;

because prefixed names are case-sensitive in Turtle (see https://www.w3.org/TR/turtle/).

Turtle's @prefix and @base directives are case-sensitive, whereas the SPARQL-derived PREFIX and BASE keywords are case-insensitive.
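A mismatch like this could be caught before loading. The rough, stdlib-only sketch below collects the declared `@prefix` labels and reports prefixed names whose label was never declared with exactly that case. It is a regex heuristic, not a Turtle parser. It assumes single-line, double-quoted literals and ignores SPARQL-style `PREFIX` lines; comments are not stripped.

```python
import re

# "@prefix label: <iri>" directives (the @prefix keyword itself is case-sensitive)
PREFIX_DECL = re.compile(r"@prefix\s+([A-Za-z][\w.-]*)?:\s*<[^>]*>")
STRING = re.compile(r'"(?:[^"\\]|\\.)*"')   # single-line double-quoted literals
IRI = re.compile(r"<[^>\s]*>")              # full IRIs such as <http://...>
PNAME = re.compile(r"(?<![\w.:-])([A-Za-z][\w.-]*):(?=\w)")  # label of label:local


def undeclared_prefixes(ttl: str) -> set[str]:
    """Return prefix labels used in prefixed names but never declared (case-sensitive)."""
    declared = set(PREFIX_DECL.findall(ttl))
    used = set()
    for line in ttl.splitlines():
        if line.lstrip().startswith("@prefix"):
            continue
        # Remove literals and IRIs first so "WIKI:1" or "http:" are not counted
        cleaned = IRI.sub(" ", STRING.sub(" ", line))
        used.update(PNAME.findall(cleaned))
    return used - declared
```

Run on the snippet above (with `go` and `pato` declared), this reports `RBO` but not `rbo`, and ignores the `"WIKI:1"` literal.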

new prefix: RBO

Add the new prefix to the loader and to the Turtle prefix declarations.
RBO: Radiation Biology Ontology
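For the Turtle side, one option is to keep every prefix in one table and generate the `@prefix` header from it, so the declared label and the label used in triples cannot drift apart. In the sketch below, the table and the `go`/`pato` namespaces are assumptions. Only the RBO namespace comes from the issue above. The loader side is not covered.

```python
# Hypothetical central prefix table. Labels are case-sensitive in Turtle, so the
# label declared here must match the case used in the triples (here: "RBO").
PREFIXES = {
    "go": "http://purl.obolibrary.org/obo/GO_",       # assumed namespace
    "pato": "http://purl.obolibrary.org/obo/PATO_",   # assumed namespace
    "RBO": "http://purl.obolibrary.org/obo/RBO_",     # Radiation Biology Ontology
}


def turtle_header(prefixes: dict) -> str:
    """Render one @prefix directive per entry, in insertion order."""
    return "".join(f"@prefix {label}: <{iri}> .\n" for label, iri in prefixes.items())
```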

Literals used as subjects

In AOPWikiRDF.ttl there is:

"N/A"   a       pato:0001241 ;
        dc:identifier   "N/A" ;
        dc:title        "Acute phase proteins" ;
        dc:source       "N/A".

This fails to parse with raptor2:

$ curl -L https://github.com/marvinm2/AOPWikiRDF/raw/master/data/AOPWikiRDF.ttl | rapper -i turtle -o ntriples - http://aopwiki.org/ | gzip > AOPWikiRDF.n3.gz
rapper: Error - URI http://aopwiki.org/:165632 - syntax error, unexpected string literal
rapper: Failed to parse file <stdin> turtle content
rapper: Parsing returned 71090 triples

Similarly, there are many entries like:

6500    a       cheminf:000405 ;
        cheminf:000405  "";
        dc:identifier   "6500";
        dc:source       "ChemSpider".

213108  a       cheminf:000405 ;
        cheminf:000405  "";
        dc:identifier   "213108";
        dc:source       "ChemSpider".

I think the literal on the left should be an IRI.
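A minimal sketch of the fix for the ChemSpider case follows. It emits an IRI as the subject and skips placeholder IDs such as `""` or `"N/A"` entirely. The identifiers.org base is an assumption; any stable IRI base would work. The empty `cheminf:000405 ""` triple is left out for brevity.

```python
from typing import Optional

# Assumed resolver pattern for ChemSpider IDs; any stable IRI base would do.
CHEMSPIDER_BASE = "https://identifiers.org/chemspider:"


def chemspider_block(cs_id: Optional[str]) -> Optional[str]:
    """Render one ChemSpider entry with an IRI subject, or None for unusable IDs."""
    cs_id = (cs_id or "").strip()
    # Placeholders such as "" or "N/A" must never end up in subject position
    if not cs_id.isdigit():
        return None
    return (
        f"<{CHEMSPIDER_BASE}{cs_id}> a cheminf:000405 ;\n"
        f'    dc:identifier "{cs_id}" ;\n'
        '    dc:source "ChemSpider" .\n'
    )
```

The `dc:identifier` literal keeps the bare ID, so queries that match on it keep working while the subject becomes a dereferenceable IRI.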

InChIKey Identifiers

# Build identifiers.org InChIKey URIs from the AOP-Wiki XML chemical record
if che.find('{http://www.aopkb.org/aop-xml}jchem-inchi-key') is not None:
    chedict[che.get('id')]['jchem-inchi-key'] = 'http://identifiers.org/inchikey/' + str(che.find('{http://www.aopkb.org/aop-xml}jchem-inchi-key').text)
if che.find('{http://www.aopkb.org/aop-xml}indigo-inchi-key') is not None:
    chedict[che.get('id')]['indigo-inchi-key'] = 'http://identifiers.org/inchikey/' + che.find('{http://www.aopkb.org/aop-xml}indigo-inchi-key').text[:-1]

We need a proper, resolvable link for the InChIKeys present in the AOP-Wiki.
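A minimal sketch of a stricter approach reads the key with `findtext`, validates it against the standard InChIKey layout (14, 10 and 1 uppercase letters separated by hyphens), and only then builds a link. It assumes the identifiers.org compact form `https://identifiers.org/inchikey:<key>` is the desired resolver. It also assumes the character that the original `[:-1]` removes is trailing whitespace.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

AOP_NS = "{http://www.aopkb.org/aop-xml}"
# Standard InChIKey layout: 14 letters, 10 letters, 1 letter, hyphen-separated
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
# Assumed resolver pattern (identifiers.org compact-identifier form)
INCHIKEY_BASE = "https://identifiers.org/inchikey:"


def inchikey_uri(che: ET.Element, tag: str) -> Optional[str]:
    """Return a URI for the InChIKey stored in <tag>, or None if absent or malformed."""
    text = che.findtext(AOP_NS + tag)
    if text is None:
        return None
    key = text.strip()  # strip surrounding whitespace such as a trailing newline
    if not INCHIKEY_RE.match(key):
        return None
    return INCHIKEY_BASE + key
```

Validating first means malformed or placeholder values are dropped instead of producing links that do not resolve.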

DSSTox IDs without linkout

# Store the DSSTox ID from the AOP-Wiki XML chemical record (no linkout yet)
if che.find('{http://www.aopkb.org/aop-xml}dsstox-id') is not None:
    chedict[che.get('id')]['dsstox-id'] = che.find('{http://www.aopkb.org/aop-xml}dsstox-id').text[:-1]

DSSTox IDs are not registered on identifiers.org, so we need to find another way to link to the right URLs.
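One possible workaround is to link directly to the EPA CompTox Chemicals Dashboard. The sketch below rests on two assumptions. First, the IDs are DSSTox substance IDs of the form `DTXSID` followed by digits. Second, the dashboard serves one details page per ID under the base URL shown. Both should be checked against the IDs actually present in the export before the pattern is adopted.

```python
import re
from typing import Optional

# Assumptions: substance IDs look like "DTXSID" + digits, and the CompTox
# Chemicals Dashboard has a details page per ID under this base URL.
COMPTOX_BASE = "https://comptox.epa.gov/dashboard/chemical/details/"
DTXSID_RE = re.compile(r"^DTXSID\d+$")


def dsstox_url(raw: Optional[str]) -> Optional[str]:
    """Return a dashboard URL for a DSSTox substance ID, or None if it does not match."""
    dsstox_id = (raw or "").strip()
    if not DTXSID_RE.match(dsstox_id):
        return None
    return COMPTOX_BASE + dsstox_id
```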

Add output in Jupyter notebook

Currently the notebook is essentially a script. To make the code and the procedure more informative, each code block should produce some output.
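For example, a small helper like the hypothetical `summarize` below could be called at the end of each parsing cell (on `chedict`, say) so the notebook shows how many records each step produced and what a few of them look like. In Jupyter, the last expression of a cell is also displayed automatically, which is an alternative for simple cases.

```python
def summarize(name: str, records: dict, sample: int = 3) -> str:
    """Print and return a one-line summary: entry count plus the first few keys."""
    keys = list(records)[:sample]
    line = f"{name}: {len(records)} entries, first {len(keys)}: {keys}"
    print(line)
    return line
```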

Create GH action for daily update

A GitHub Action to:

  • Produce the RDF based on nightly XML export of AOP-Wiki
  • Log into the server
  • Load the data

Additionally, automatically push the new RDF to Zenodo.
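The workflow file itself would be YAML. The Python sketch below only illustrates the ordering the action needs: steps run in sequence, and the first failure stops the rest, so a broken conversion is never loaded or pushed to Zenodo. The step labels and callables are hypothetical placeholders.

```python
from typing import Callable, List, Tuple

# A step is a (label, zero-argument callable) pair.
Step = Tuple[str, Callable[[], None]]


def run_pipeline(steps: List[Step]) -> List[str]:
    """Run steps in order and return the labels that completed.

    Any exception raised by a step propagates immediately, so later steps
    (e.g. loading or publishing) never run after a failed conversion.
    """
    completed: List[str] = []
    for label, action in steps:
        print(f"[pipeline] running: {label}")
        action()
        completed.append(label)
    return completed
```

Usage would look like `run_pipeline([("produce RDF", produce_rdf), ("load into Virtuoso", load_data), ("push to Zenodo", push_to_zenodo)])`. All three callables are hypothetical wrappers around the notebook conversion, the server login and load, and the Zenodo upload.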
