drugbank-to-biopax

This project originated from https://bitbucket.org/armish/gsoc14 and development will continue here (ToDo).

DrugBank to BioPAX Level3 data converter.

Most of the drug-target interactions in DrugBank do not contain detailed process information, i.e. how the drug actually potentiates or inhibits the protein. Although the DrugBank XML data export is easy to parse, converting it to BioPAX requires discussion with Pathway Commons to decide on the best way to represent these interactions.

Data source

Implementation details

The DrugBank XML file contains sufficient information to create SmallMolecule and Protein instances with proper external identifiers. The database presents multiple types of relationships between SmallMolecules and Proteins: Transporters, Targets, and Enzymes. For the purpose of this project, we are interested in capturing drug-target relationships, hence we parse only this information and convert it to BioPAX BiochemicalReactions.

These binary relationships do not contain structured mechanism information, e.g. how a drug inhibits or potentiates a target. We have decided to encode this information in BioPAX by using SequenceModificationFeatures with active and inactive terms and associating these features with the target proteins. Therefore, the final BioPAX model contains BiochemicalReactions in which the participant protein gets inactivated and the drug regulates the reaction.

The type of regulation is based on the controlled vocabulary adopted by DrugBank. If the interaction type is one of substrate, agonist, inducer, potentiator, stimulator, cofactor or ligand, then we represent it as a negative regulation, meaning that the drug inhibits the inactivation of the protein (double negative). Otherwise, the drug positively regulates the inactivation reaction.
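The regulation rule described above can be illustrated with a minimal Java sketch. This is not the converter's actual code: the class name `DrugTargetRegulation`, the `Regulation` enum and the `classify` method are hypothetical names introduced only for this example, and the case-insensitive matching is an assumption. Only the list of interaction types comes from the description above.

```java
import java.util.Locale;
import java.util.Set;

public class DrugTargetRegulation {

    /** Hypothetical enum for this sketch; the converter's own types may differ. */
    enum Regulation {
        INHIBITS_INACTIVATION,  // negative regulation of the inactivation reaction
        PROMOTES_INACTIVATION   // positive regulation of the inactivation reaction
    }

    // Interaction types that the README lists as negative regulation.
    private static final Set<String> NEGATIVE_REGULATION_TYPES = Set.of(
            "substrate", "agonist", "inducer", "potentiator",
            "stimulator", "cofactor", "ligand");

    /** Maps a DrugBank interaction type to the regulation of the inactivation reaction. */
    static Regulation classify(String interactionType) {
        // Assumption: matching ignores case and surrounding whitespace.
        String key = interactionType == null
                ? ""
                : interactionType.trim().toLowerCase(Locale.ROOT);
        return NEGATIVE_REGULATION_TYPES.contains(key)
                ? Regulation.INHIBITS_INACTIVATION
                : Regulation.PROMOTES_INACTIVATION;
    }

    public static void main(String[] args) {
        for (String type : new String[] {"agonist", "inhibitor", "cofactor", "antagonist"}) {
            System.out.println(type + " -> " + classify(type));
        }
    }
}
```

Note that under this rule every interaction type outside the list, including unknown or missing ones, falls through to positive regulation of the inactivation reaction.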

The following screenshot, for example, shows positive (green edges) and negative (red edges) regulations by some drugs:

https://bitbucket.org/armish/gsoc14/downloads/goal3_drugbank_screenshot_20140730.jpg

For some drugs, the XML file also contains information about how the drug is metabolized by various enzymes, but we currently do not capture this in the final model. This is partly due to incomplete knowledge (especially regarding the intermediate chemicals) and partly because the SMPDB knowledgebase already has these in BioPAX.

Usage

Check out the repository (git clone) and change into its directory:

$ cd drugbank-to-biopax

Build with Maven:

$ mvn clean package

This will create a single executable JAR file under the target/ directory, named drugbank-to-biopax.jar. Once you have the JAR file, you can run it without any command line options to see the help text:

$ java -jar drugbank-to-biopax.jar 
usage: DrugbankToBiopax
 -d,--drugs <arg>    structured DrugBank data (XML) [required]
 -o,--output <arg>   Output (BioPAX file) [required]

The only input file required for this conversion is the XML file that is distributed by DrugBank. Once downloaded, you can then convert the model as follows:

$ java -jar drugbank-to-biopax.jar -d drugbank.xml -o drugbank.owl

Validation

The (OLD) validation report is available under Downloads: goal3_drugbank_validationResults_20140730.zip. The converted model does not have errors, but it produces warnings for a few known cases.

The first noticeable problem has to do with the active and inactive terms of the modification features. These terms are not registered in Miriam, but they have been used in BioPAX models, especially in NCI-PID. Ideally, instead of using these terms, we would encode the activation reaction with all mechanistic details; but since this information is not available, we have to ignore these warnings for the time being.

Another problem that the validator captures has to do with the external links. During conversion, we create RelationshipXrefs for SmallMolecules as they are provided by DrugBank. The database names for some of these links are not registered in Miriam, hence we get unknown.db warnings in the validation report. One way to deal with this is to manually fix the database names to match the existing Miriam identifiers, but ideally we will work with DrugBank to get these names standardized.
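The manual fix mentioned above could take the form of a small lookup table applied to each database name before the RelationshipXref is created. The following Java sketch is only an illustration of that idea: the class `XrefDbNameNormalizer`, the `normalize` method and the two table entries are assumptions made for this example and have not been checked against the Miriam registry or against the names DrugBank actually exports.

```java
import java.util.Locale;
import java.util.Map;

public class XrefDbNameNormalizer {

    // Illustrative entries only (assumed, not verified): keys are lower-cased
    // names as they might appear in the source data, values are the names
    // we would like to emit instead.
    private static final Map<String, String> OVERRIDES = Map.of(
            "pubchem compound", "PubChem-compound",
            "pubchem substance", "PubChem-substance");

    /** Returns the overridden name if one is configured, otherwise the trimmed input. */
    static String normalize(String dbName) {
        if (dbName == null) {
            return null;
        }
        String trimmed = dbName.trim();
        return OVERRIDES.getOrDefault(trimmed.toLowerCase(Locale.ROOT), trimmed);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"PubChem Compound", "ChEBI"}) {
            System.out.println(name + " -> " + normalize(name));
        }
    }
}
```

Names without an entry pass through unchanged, so the table can grow incrementally as unknown.db warnings are reviewed.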

drugbank-to-biopax's Issues

Acetylation events are not necessarily activation/inhibition

(B. Arman Aksoy created an issue in the original gsoc14 Bitbucket project)

From Ozgun:

While looking at the DrugBank data in PC2v7 using ChiBE, I noticed the below graph.

It says acetylsalicylic acid inhibits TP53. When I go to DrugBank and read the referenced paper, I see that aspirin acetylates TP53. And it is known that TP53 is not inhibited but activated by acetylation. So this conversion is wrong.

I am not sure how to correct this though. Yes, many drug targets are inhibited, but not all of them. Just assuming inactivation does not work for TP53. The effect type information is not on the DrugBank website. It is probably too much missing info for a BioPAX level model.
