License: MIT

denguedrug

Deep Learning and Molecular Docking Prediction of Potential Inhibitors against Dengue Virus

Overview

In this project, we are building an in silico pipeline to identify novel dengue virus inhibitors. The pipeline combines Deep/Machine Learning (DL/ML) with molecular modelling techniques.

Dengue virus (DENV) is a member of the Flaviviridae family and causes the most prevalent mosquito-borne viral hemorrhagic fever. It is transmitted to humans primarily through the bites of Aedes aegypti and Aedes albopictus mosquitoes, which are widespread in tropical and subtropical climates in both urban and rural regions. Some people infected with DENV develop the severe and sometimes fatal dengue hemorrhagic fever (DHF) or dengue shock syndrome (DSS). The spread of dengue fever has resulted in several medical emergencies and deaths, yet no drug is currently available and, despite the disease's prevalence, treatment remains symptomatic. The structural information available for DENV presents an opportunity to discover potent antiviral agents capable of disrupting the early stages of DENV infection.

Our approach trains several machine learning models on the anti-dengue dataset from PubChem to distinguish potential anti-dengue compounds from inactive compounds. The predicted compounds are then screened against a dengue protein target for downstream analysis. Details of the pipeline can be found in the workflow below.

Please cite and star the repository if you use the pipeline for research or commercial purposes.

Table of contents

  1. Objectives
  2. Description
  3. Manuscript
  4. Results
  5. How to use
  6. Data availability
  7. Credits

Objectives

  • Identify dengue virus protein target.
  • Identify dengue virus ligand database for DL/ML training and molecular modeling method validation.
  • Determine DL/ML algorithm to be utilised in project.
  • Process ligand database and train DL/ML model.
  • Evaluate DL/ML performance.
  • Validate molecular modeling method using prepared ligand database (Actives vs non-actives).
  • Virtually screen the predicted actives against the identified protein crystal structures.
  • Assess and identify hits using two criteria: docking score and interactions with important residues.
  • Assess the hits' ADMET properties.
  • Conduct MD simulations to determine the compounds' binding-mode stability and binding free energy.

Description


The figure illustrates the proposed DengueDrug pipeline for identifying dengue virus inhibitors.

Proposed Dengue Drug Identification Pipeline

Step 1: Identification of Dengue Virus inhibitors database for ML training

The ligand database was obtained from PubChem (BioAssay ID: 651640). It was generated experimentally by the Broad Institute using the (in vivo) "DENV2 CPE-Based HTS Measured in Cell-Based and Microorganism Combination System" method. A total of 347,136 compounds were tested for dengue virus inhibition, of which 5,946 were identified as actives and 324,845 as non-actives. An active is defined as a compound that exhibits an ATP activity level above 20% at 10 $\mu$M.

Step 2: Preprocessing

  • The unprocessed database can be found here.

  • The molecular descriptors of the actives and inactives were calculated with PaDEL-Descriptor, using the DescriptorCalculator.py script.

  • The actives and inactives databases were combined and all missing descriptor values were filled with 0. Next, dimensionality reduction was performed using a variance filter (scikit-learn's VarianceThreshold).

  • The data was then standardized using the mean and standard deviation of each descriptor.
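
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the function name `preprocess_descriptors`, the `label` column, and the default variance threshold of 0.0 are assumptions made for the example, and the inputs are assumed to be pandas DataFrames of PaDEL descriptors with identical columns.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold
from sklearn.preprocessing import StandardScaler


def preprocess_descriptors(actives, inactives, variance_threshold=0.0):
    """Combine, zero-fill, variance-filter and standardize descriptor tables.

    The function name and the default threshold are hypothetical.
    """
    # Label the two classes before merging (1 = active, 0 = inactive).
    data = pd.concat(
        [actives.assign(label=1), inactives.assign(label=0)],
        ignore_index=True,
    )
    y = data.pop("label").to_numpy()

    # Fill every missing descriptor value with 0.
    X = data.fillna(0.0)

    # Drop descriptors whose variance does not exceed the threshold.
    selector = VarianceThreshold(threshold=variance_threshold)
    X_reduced = selector.fit_transform(X)
    kept = X.columns[selector.get_support()]

    # Standardize each remaining descriptor to zero mean and unit variance.
    X_scaled = StandardScaler().fit_transform(X_reduced)
    return pd.DataFrame(X_scaled, columns=kept), y
```

With the default threshold, only constant descriptors are removed; a larger value would prune more aggressively.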

Step 3: Model construction

  • The data was split into training, test and external datasets. The training dataset comprised 70% of the data (14,875 compounds: 3,105 actives and 11,770 inactives), and the test and external datasets comprised 15% each (~3,188 compounds).

  • The ML models were constructed using the LazyPredict Python package. The models with the highest Accuracy, F1-score, Balanced Accuracy and ROC AUC were selected for validation.

  • The models chosen for further validation were K-Nearest Neighbours, Naive Bayes, Support Vector Machine, Random Forest and Logistic Regression. The models can be found here. The models were cross-validated using K-fold splitting of the training data, and each model's suitability was evaluated using Accuracy, F1-score, Precision, Recall, Specificity, and the true and false positive and negative rates.

  • The models' predictive ability was assessed on the test data using the Accuracy, F1-score, Precision and Recall metrics.

  • The logistic regression (LR) model achieved the best results on the test dataset and was therefore evaluated on the external dataset, where it obtained 82% accuracy on actives and 98% accuracy on inactives.
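
As a rough sketch of the split and validation scheme described above, the snippet below performs a 70/15/15 split, cross-validates a logistic regression model with K-fold splitting, and reports per-class accuracy on the external set. The data is synthetic, and several details are assumptions made for illustration: the stratified splitting, the five folds, the random seeds, the helper names `split_70_15_15` and `per_class_accuracy`, and the reading of "active" and "inactive" accuracy as recall and specificity.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import KFold, cross_val_score, train_test_split


def split_70_15_15(X, y, seed=0):
    """Split into training (70%), test (15%) and external (15%) sets."""
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed
    )
    # Halve the remaining 30% into the test and external sets.
    X_test, X_ext, y_test, y_ext = train_test_split(
        X_rest, y_rest, test_size=0.50, stratify=y_rest, random_state=seed
    )
    return (X_train, y_train), (X_test, y_test), (X_ext, y_ext)


def per_class_accuracy(y_true, y_pred):
    """Return (accuracy on actives, accuracy on inactives)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return tp / (tp + fn), tn / (tn + fp)


# Synthetic stand-in for the standardized descriptor matrix (not project data).
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.8, 0.2], random_state=0
)
train, test, external = split_70_15_15(X, y)

# K-fold cross-validation on the training data only (5 folds is an assumption).
model = LogisticRegression(max_iter=1000)
cv = KFold(n_splits=5, shuffle=True, random_state=0)
f1_scores = cross_val_score(model, *train, cv=cv, scoring="f1")

# Fit on the full training set, then evaluate on the held-out external set.
model.fit(*train)
active_acc, inactive_acc = per_class_accuracy(
    external[1], model.predict(external[0])
)
print(f"CV F1: {f1_scores.mean():.2f}")
print(f"External: actives {active_acc:.2f}, inactives {inactive_acc:.2f}")
```

Reporting accuracy per class matters here because the classes are imbalanced: a model that labels everything inactive would still score high overall accuracy.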

Step 4: Prediction

Step 5: Molecular Docking

  • The crystal structure of the dengue 2 virus envelope protein (PDB: 1OKE) was identified for structure-based virtual screening.

  • AutoDock Vina was used to dock the 7,722 compounds against the dengue 2 virus envelope protein.

  • The potential hits were selected using the following criteria:

    • AutoDock Vina binding score
    • Presence of binding interactions between important binding site residues and the ligand (LigPlot+ v1.4.5).
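
The first selection criterion could be automated along the lines of the sketch below, which reads the `REMARK VINA RESULT:` lines that AutoDock Vina writes into its output PDBQT files and keeps compounds whose best pose meets a score cutoff. The cutoff of -8.0 kcal/mol, the function names, and the example compounds are hypothetical; the source does not state the threshold that was used.

```python
import re

# Vina writes one "REMARK VINA RESULT:" line per pose; the first number
# on the line is the predicted binding affinity in kcal/mol.
VINA_RESULT = re.compile(
    r"^REMARK VINA RESULT:\s+(-?\d+(?:\.\d+)?)", re.MULTILINE
)


def best_vina_score(pdbqt_text):
    """Return the most favourable (lowest) score, or None if none is found."""
    scores = [float(s) for s in VINA_RESULT.findall(pdbqt_text)]
    return min(scores) if scores else None


def select_hits(docked, cutoff=-8.0):
    """Keep compounds scoring at or below the cutoff, best first.

    `docked` maps a compound name to the text of its Vina output file.
    The default cutoff is a hypothetical value chosen for illustration.
    """
    scored = ((name, best_vina_score(text)) for name, text in docked.items())
    hits = [(n, s) for n, s in scored if s is not None and s <= cutoff]
    return sorted(hits, key=lambda item: item[1])


# Made-up docking output for two hypothetical compounds.
example = {
    "cmpd_A": (
        "MODEL 1\nREMARK VINA RESULT:      -9.1      0.000      0.000\nENDMDL\n"
        "MODEL 2\nREMARK VINA RESULT:      -8.4      1.912      3.250\nENDMDL\n"
    ),
    "cmpd_B": "MODEL 1\nREMARK VINA RESULT:      -6.3      0.000      0.000\nENDMDL\n",
}
print(select_hits(example))
```

Compounds that pass the score filter would still need the interaction check from the second criterion.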

Step 6: ADMET prediction

  • The ADMET properties of the identified hits will be predicted using SwissADME.
  • Hits containing moieties associated with poor pharmacokinetics or toxicity will be removed.

Step 7: Molecular Dynamics (MD) Simulations

  • The stability of each hit's binding mode will be assessed through 100-nanosecond (ns) MD simulations using GROMACS.
  • Stability will be assessed using metrics such as root-mean-square deviation (RMSD), root-mean-square fluctuation (RMSF) and radius of gyration.
  • The retention of each compound's binding interactions with important residues throughout the MD simulations will be assessed with the ProLIF Python library.
  • Each compound's binding free energy over the MD simulation will be calculated using Molecular Mechanics Poisson-Boltzmann Surface Area (MMPBSA).
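
To make the stability metrics concrete, here is a minimal NumPy sketch of how RMSD, RMSF and radius of gyration are defined. It is illustrative only: in practice these would likely be computed with GROMACS tools such as `gmx rms`, `gmx rmsf` and `gmx gyrate`. The function names are hypothetical, and the sketch assumes the frames have already been superimposed on the reference structure.

```python
import numpy as np


def rmsd(coords, reference):
    """RMSD between two (n_atoms, 3) coordinate arrays, without fitting."""
    diff = coords - reference
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def rmsf(trajectory):
    """Per-atom RMSF for a (n_frames, n_atoms, 3) trajectory.

    Fluctuations are measured about each atom's time-averaged position.
    """
    mean_pos = trajectory.mean(axis=0)
    sq_dev = ((trajectory - mean_pos) ** 2).sum(axis=2)
    return np.sqrt(sq_dev.mean(axis=0))


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of one (n_atoms, 3) frame."""
    centre = np.average(coords, axis=0, weights=masses)
    sq_dist = ((coords - centre) ** 2).sum(axis=1)
    return float(np.sqrt(np.average(sq_dist, weights=masses)))
```

A binding mode is usually considered stable when the ligand RMSD plateaus over the simulation rather than drifting upward.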

Manuscript

When using the pipeline for research or commercial purposes, please cite our research.

How to use

The documentation and tutorial give a general overview of how the pipeline can be utilized for identifying novel Dengue Virus inhibitors.

Tutorial 1

DL/ML pipeline describes how the models were constructed, validated and selected.

Tutorial 2

Molecular docking and dynamics

Data availability

The data used for the project can be found here.

Prerequisites

The code and scripts were run on Python 3.8.

Credits

The team members are:

  1. George Hanson
  2. Joseph Adams
  3. Kepgang Daveson Innocento Brank
  4. Andy Asante
  5. Emmanuel Israel Nsedu
  6. Hem Bondarwad
  7. Kisaakye Maureen
  8. Lewis Tem
  9. Luke Zondagh
  10. Soham Amod Shirolkar
