dgidb / dgidb-v5

Providing interactions between drugs and genes sourced from a variety of publications and knowledgebases

Home Page: https://dgidb.org

License: MIT License

Ruby 61.85% JavaScript 0.01% HTML 0.55% TypeScript 26.28% SCSS 3.97% PLpgSQL 7.09% CSS 0.24% Procfile 0.01%
bioinformatics biomedical-data-science drug-target-interactions genomics

dgidb-v5's Introduction

Rails frontend to the Drug-Gene Interaction database.

Publicly accessible instance

To use DGIdb, please first visit the public instance at https://dgidb.org.

Installation

If you would like to install a local instance of DGIdb to work on the code or maintain a private database, refer to the INSTALL-OSX or INSTALL-LINUX files for installation instructions. Additional installation and developer documentation can be found in the dgi-db wiki. If the public version of DGIdb is missing a datasource that you would like to see added, please Contact Us.

Implementation

DGIdb is built in Ruby on Rails with PostgreSQL as the primary data store. Memcached is used heavily for caching, since the data is largely static between new source imports. The site is served with Apache and Phusion Passenger on a server running Ubuntu 12.04 LTS (Precise Pangolin). The code is divided into two primary components: the web application and the libraries that handle importing and normalizing new sources.

The web application follows a fairly traditional Model-View-Controller (MVC) architecture, with two notable exceptions. First, to keep application logic out of the view templates, presenter objects decorate domain models with view logic while still exposing the underlying models through delegation. Second, most domain logic is pulled out into command and helper classes, which separates the persistence layer (data model) from the business logic of the application. This architecture also simplifies the API: the same back-end code produces the result for both the API and the web site, and at render time the result is simply wrapped in a different presenter object and sent to a JSON template instead of an HTML template.
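A minimal sketch of the presenter arrangement described above, in plain Ruby rather than Rails. The names Gene, LookupGenes, GenePresenter and GeneApiPresenter are hypothetical and not taken from the DGIdb codebase, and SimpleDelegator from Ruby's standard delegate library stands in for whatever delegation mechanism the real presenters use. One command object produces the result; only the wrapping presenter differs between the HTML and JSON paths.

```ruby
require 'delegate'
require 'json'

# Hypothetical domain model standing in for an ActiveRecord gene record.
Gene = Struct.new(:name, :aliases)

# Hypothetical command object: lookup logic shared by the web and API paths.
class LookupGenes
  def initialize(genes)
    @genes = genes
  end

  # Returns the genes whose symbol matches one of the requested names.
  def call(names)
    wanted = names.map(&:upcase)
    @genes.select { |gene| wanted.include?(gene.name) }
  end
end

# Web presenter: adds view logic; any other call is delegated to the model.
class GenePresenter < SimpleDelegator
  def display_name
    "#{name} (aliases: #{aliases.join(', ')})"
  end
end

# API presenter: wraps the same model but shapes it for a JSON response.
class GeneApiPresenter < SimpleDelegator
  def as_json
    { 'name' => name, 'aliases' => aliases }
  end
end

if __FILE__ == $PROGRAM_NAME
  genes = [Gene.new('BRAF', ['BRAF1']), Gene.new('KRAS', ['KRAS2'])] # toy data
  result = LookupGenes.new(genes).call(['braf'])
  puts result.map { |g| GenePresenter.new(g).display_name }
  puts JSON.generate(result.map { |g| GeneApiPresenter.new(g).as_json })
end
```

Wrapping the same lookup result in GenePresenter gives view helpers such as display_name, while GeneApiPresenter#as_json yields a hash ready for JSON.generate; any method a presenter does not define (such as name) falls through to the wrapped model.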

Two of the web application's primary pieces of functionality are its gene name matching algorithm and its implementation of filtering. The gene name matching process attempts to account for potential ambiguity in user search terms. It first attempts an exact match on Entrez gene symbols; if it finds one, it assumes that is the gene the user meant. If it cannot find an exact Entrez match for a search term, it falls back to searching all reported aliases for the gene clusters in the system. If more than one gene cluster matches the search term, the result is classified as ambiguous and all potential gene group matches are returned. The ambiguity is surfaced in both the user interface and the API responses to help the user decide which gene they meant.
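The matching steps above can be sketched as follows. This is an illustration under assumptions, not DGIdb's implementation: GeneCluster and MatchResult are hypothetical in-memory structs, and the case-insensitive comparison is an assumed convenience (the text only says "exact match").

```ruby
# Hypothetical stand-ins for gene clusters and match results.
GeneCluster = Struct.new(:entrez_symbol, :aliases)
MatchResult = Struct.new(:term, :status, :matches)

# Matches one search term against gene clusters.
# status is :definite, :ambiguous, or :none.
def match_gene(term, clusters)
  query = term.strip.upcase # case-insensitivity is an assumption

  # 1. An exact Entrez symbol match is taken as what the user meant.
  exact = clusters.find { |c| c.entrez_symbol.upcase == query }
  return MatchResult.new(term, :definite, [exact]) if exact

  # 2. Otherwise fall back to every alias of every cluster.
  hits = clusters.select { |c| c.aliases.any? { |a| a.upcase == query } }

  # 3. More than one matching cluster is ambiguous; return all candidates.
  status = case hits.size
           when 0 then :none
           when 1 then :definite
           else :ambiguous
           end
  MatchResult.new(term, status, hits)
end
```

Returning every candidate cluster for the ambiguous case, rather than guessing, is what lets both the UI and the API show the user all the possibilities.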

Application Programming Interface (API)

The DGIdb API can be used to query for drug-gene interactions in your own applications through a simple JSON-based interface. Extensive documentation of the API, including functioning code examples, is maintained at: http://dgidb.org/api

Citations

DGIdb 3.0: a redesign and expansion of the drug-gene interaction database. Kelsy C Cotto*, Alex H Wagner*, Yang-Yang Feng, Susanna Kiwala, Adam C Coffman, Gregory Spies, Alex Wollam, Nicholas C Spies, Obi L Griffith, Malachi Griffith. Nucleic Acids Research. 2017 Nov 16. doi: 10.1093/nar/gkx1143. *These authors contributed equally to this work.

DGIdb 2.0: mining clinically relevant drug-gene interactions. Alex H Wagner, Adam C Coffman, Benjamin J Ainscough, Nicholas C Spies, Zachary L Skidmore, Katie M Campbell, Kilannin Krysiak, Deng Pan, Joshua F McMichael, James M Eldred, Jason R Walker, Richard K Wilson, Elaine R Mardis, Malachi Griffith, Obi L Griffith. Nucleic Acids Research. 2016 Jan 4;44(D1):D1036-44. doi: 10.1093/nar/gkv1165. PMID: 26531824.

DGIdb: mining the druggable genome. Malachi Griffith*, Obi L Griffith*, Adam C Coffman, James V Weible, Josh F McMichael, Nicholas C Spies, James Koval, Indraniel Das, Matthew B Callaway, James M Eldred, Christopher A Miller, Janakiraman Subramanian, Ramaswamy Govindan, Runjun D Kumar, Ron Bose, Li Ding, Jason R Walker, David E Larson, David J Dooling, Scott M Smith, Timothy J Ley, Elaine R Mardis, Richard K Wilson. Nature Methods. 2013 Dec;10(12):1209-10. doi: 10.1038/nmeth.2689. PMID: 24122041. *These authors contributed equally to this work.

License

Copyright (c) 2017 The Griffith Lab [www.griffithlab.com]

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

dgidb-v5's People

Contributors

acoffman, cjosu, jsstevenson, katiestahl, kcotto, korikuzma, mcannon068nw, nairod2000, rbasu101

dgidb-v5's Issues

Clean up gene and drug aliases

In particular, try to identify cases where non-namespaced ID numbers are getting grouped into genes and drugs, and fix the importer code accordingly.
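One way to start finding the problem cases is to look for bare numeric aliases attached to more than one record, since an unprefixed number such as "1234" could come from any ID namespace. This sketch is hypothetical: it assumes aliases arrive as a hash of record name to alias strings, and it treats any all-digit alias as non-namespaced.

```ruby
# An alias made only of digits carries no namespace prefix.
BARE_NUMBER = /\A\d+\z/

# records: { record_name => [alias, ...] } (hypothetical input shape)
# Returns { numeric_alias => [record_name, ...] } for each bare numeric alias
# attached to more than one record, i.e. candidates for incorrect grouping.
def colliding_numeric_aliases(records)
  owners = Hash.new { |hash, key| hash[key] = [] }
  records.each do |name, aliases|
    aliases.each do |raw|
      value = raw.strip
      owners[value] << name if value.match?(BARE_NUMBER)
    end
  end
  owners.transform_values(&:uniq).select { |_, names| names.size > 1 }
end
```

Prefixed forms such as "ncbigene:1234" are ignored, so the report only lists the unprefixed numbers that are actually shared across records.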

Prototype drug & gene record pages

From the results page, clicking an individual drug or gene should route to that drug or gene record. Design these layouts following the design goals laid out in the user story exercise.

Bring over remaining data models for drugs and genes

Similar to what we did for GeneClaims, bring over the other data models from the old version of DGIdb.

For now, as a learning exercise and for general progress, we can just use the old data. We can refactor these as needed if the data structure changes.

Ensembl importer

Figure out how to approach this. It may be replaced by the VICC gene normalizer, so it is lower priority.

Add importers

Big picture

Specific sources

  • BaderLab
  • CancerCommons
  • Caris
  • CGI
  • Clearity Foundation Biomarkers
  • Clearity Foundation Clinical Trial: Non-existent interaction claim type issue (immunotherapy) #49
  • CIViC
  • COSMIC
  • dGene
  • DrugBank: Figure out data source, #52
  • DoCM
  • DTC
  • Ensembl: #55
  • Entrez: #56
  • FDA
  • Foundation One Genes
  • GO
  • #57
  • Hingorani Casas
  • Hopkins Groom: Add gene claim category "DNA DIRECTED DNA POLYMERASE" #49
  • Human Protein Atlas
  • IDG: Could probably be refactored as an API importer. See #48
  • #54
  • MSK-IMPACT
  • My Cancer Genome: Non-existent interaction claim type issue (immunotherapy) #49
  • My Cancer Genome clinical trial
  • NCI: #51
  • OncoKB: #47
  • Oncomine
  • PharmGKB
  • Pharos
  • Russ-Lampel
  • TALC: #46
  • Tdg
  • Tempus
  • Tend
  • TTD

Drug attributes

  • Examine overlap/conflicts with therapy normalizer
  • Double-check current DB structure
  • Write any needed migrations

Render sample data on a front-end page

Can be anything (such as a single GeneClaim or Source citation) and doesn't have to be pretty for now.

This is essentially to learn how to link everything together and to show that we can render something on the top-layer front end that's stored in the bottom-layer database.

Drug Approval

Evaluate DGIdb's current filtering strategy and language against the planned expansion of the approval enum for all sources:

CHEMBL_1
CHEMBL_2
CHEMBL_3
CHEMBL_4
CHEMBL_WITHDRAWN
FDA_DISCONTINUED
FDA_PRESCRIPTION
FDA_OTC
FDA_TENTATIVE
GTOPDB_APPROVED
GTOPDB_WITHDRAWN
HEMONC_APPROVED
RXNORM_PRESCRIBABLE
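To make the evaluation concrete, here is a hedged sketch of how a single "approved" filter could be expressed over the expanded enum. The enum values are copied from the list above. The HYPOTHETICAL_APPROVED set and the Drug struct are assumptions for illustration only, since deciding which values should count as approved is exactly what this issue is meant to settle.

```ruby
require 'set'

# The planned approval enum values, copied from the list above.
APPROVAL_VALUES = %w[
  CHEMBL_1 CHEMBL_2 CHEMBL_3 CHEMBL_4 CHEMBL_WITHDRAWN
  FDA_DISCONTINUED FDA_PRESCRIPTION FDA_OTC FDA_TENTATIVE
  GTOPDB_APPROVED GTOPDB_WITHDRAWN HEMONC_APPROVED RXNORM_PRESCRIBABLE
].map(&:to_sym).freeze

# ASSUMPTION: one possible meaning of "approved" for a single boolean filter.
# This set is illustrative, not a decision.
HYPOTHETICAL_APPROVED = Set[:CHEMBL_4, :FDA_PRESCRIPTION, :FDA_OTC,
                            :GTOPDB_APPROVED, :HEMONC_APPROVED].freeze

# Hypothetical drug record carrying per-source approval values.
Drug = Struct.new(:name, :approval_ratings)

def approved?(drug, accepted = HYPOTHETICAL_APPROVED)
  drug.approval_ratings.any? { |rating| accepted.include?(rating) }
end

# Rejects values outside the enum, then keeps drugs with any accepted value.
def filter_drugs(drugs, accepted = HYPOTHETICAL_APPROVED)
  unknown = drugs.flat_map(&:approval_ratings).uniq - APPROVAL_VALUES
  raise ArgumentError, "unknown approval values: #{unknown}" unless unknown.empty?

  drugs.select { |drug| approved?(drug, accepted) }
end
```

Passing the accepted set as a parameter keeps the filter language separate from the policy, so alternative definitions (for example, counting FDA_TENTATIVE) can be compared without changing the filter itself.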

Process remaining new interaction claim types and gene categories

InteractionClaimType -- normalization defined in interaction claim type model
Clearity Biomarkers: "Biomarker"
Clearity Clinical Trials: "immunostimulator", "natigen", "radioimmunotherapy"
My Cancer Genome: "immunotherapy"

GeneCategory
Hopkins/Groom: "DNA DIRECTED DNA POLYMERASE"

Design a Front-end UI mock-up

We've talked about designing a new front-end UI. It would be useful for us to come up with a mock-up for what that should look like and what components it should have.

Spin up staging box on AWS

Will remain on WUSTL AWS resources.

  • Add environment-specific hostnames to request URLs in client (#93)
  • Add GitHub -> S3 deployment pipeline (of some kind) for client
  • Write CloudFormation templates for Beanstalk and RDS
  • Add some kind of deployment pipeline for server -> Beanstalk
  • Add CloudFront to templates

Prototype results page

Design the layout for the results page following the design goals laid out in the user story exercise.

Identify way to filter drug claims by type/stage of development

From conversations with scientists in clinical research pipelines, it would be extremely useful for them to have a way to filter or sort drug claims by type or stage of development (e.g. FDA-approved drugs, drugs in clinical trials, research compounds, natural products).

Add updaters

Additionally, it would be nice to implement better per-source deleters (so that you don't have to delete every grouping in order to delete and re-add a single source, unless this work is already done and I didn't copy the deleters over correctly) and more optimized interaction grouping in this issue.

Add interaction type and gene claim category if not already in DB

Currently, the base Importer class will raise an error if it encounters an interaction type or gene claim category that isn't already in the corresponding tables (see, e.g., create_interaction_claim_type(interaction_claim, type)).

We should (in separate issues) ensure that normalization of the values going into those fields produces satisfactory results, but I don't think a normalized value should have to be added to any table manually. So the constraints above should be removed, and if a value isn't already in the table, the importer should add it.
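A minimal sketch of the proposed behavior, using an in-memory table instead of ActiveRecord: unknown values are normalized and added rather than raising. LookupTable and its default normalizer (strip, collapse whitespace, downcase) are hypothetical; in the Rails importer the equivalent would presumably be a find_or_create_by call on the relevant model, and the real normalization is the one defined in the interaction claim type model.

```ruby
# Hypothetical in-memory stand-in for a lookup table such as interaction
# claim types or gene claim categories.
class LookupTable
  # ASSUMPTION: placeholder normalization, not the model's actual rules.
  DEFAULT_NORMALIZER = ->(raw) { raw.to_s.strip.gsub(/\s+/, ' ').downcase }

  def initialize(values = [], normalizer: DEFAULT_NORMALIZER)
    @normalizer = normalizer
    @rows = {} # normalized value => id
    values.each { |value| fetch_or_add(value) }
  end

  # Returns the id for raw, adding the normalized value first if it is new,
  # instead of raising on values that are not already in the table.
  def fetch_or_add(raw)
    key = @normalizer.call(raw)
    raise ArgumentError, 'blank value' if key.empty?

    @rows[key] ||= @rows.size + 1
  end

  def values
    @rows.keys
  end
end
```

The normalizer is passed in so that, for example, gene categories could be stored upper-case (as with "DNA DIRECTED DNA POLYMERASE") while interaction types follow a different convention. Blank values still raise, since adding an empty type would hide an importer bug rather than record real data.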

Check and update source citation data

For TALC, the citation differs in three places, but the 'most correct' citation appears to be the one from the website:

Morgensztern D, Campo MJ, Dahlberg SE, Doebele RC, Garon E, Gerber DE, Goldberg SB, Hammerman PS, Heist RS, Hensing T, et al. Molecularly targeted therapies in non-small-cell lung cancer annual update 2014. J Thorac Oncol 2015; 10: S1-63. PMID: 25535693

Some other sources also appear to have dead or incorrect links, or old or odd source citation data.
