denguedrug

Deep Learning and Molecular Docking Prediction of Potential Inhibitors against Dengue Virus

Overview

In this project, we are building an in silico pipeline to identify novel dengue virus inhibitors. The pipeline combines Deep/Machine Learning (DL/ML) and molecular modelling techniques.

Dengue virus (DENV) is a member of the Flaviviridae family and is responsible for the most prevalent mosquito-borne viral hemorrhagic fever. DENV is transmitted to humans primarily through the bites of mosquitoes such as Aedes aegypti and Aedes albopictus, which are widespread in tropical and subtropical climates, in both urban and rural regions. Some infected people develop the severe and sometimes fatal diseases known as dengue hemorrhagic fever (DHF) and dengue shock syndrome (DSS). The spread of dengue fever has resulted in several medical emergencies and deaths, yet no drug is currently available and treatment remains symptomatic. The structural information available for DENV presents an opportunity to discover potent antiviral agents capable of disrupting the early stages of DENV infection.

Our approach is to train different machine learning models on the anti-Dengue dataset from PubChem to distinguish potential anti-Dengue compounds from non-anti-Dengue compounds. The predicted compounds will then be screened against a Dengue protein target for downstream analysis. Details of the pipeline can be found in the workflow below.

Please cite and star the repository if you use the pipeline for research or commercial purposes.

Objectives
Description
Manuscript
Results
How to use
Data availability
Credits

Objectives

Description

The figure illustrates the proposed DengueDrug pipeline for identifying Dengue virus inhibitors.

Proposed Dengue Drug Identification Pipeline

Step 1: Identification of Dengue Virus inhibitors database for ML training

The ligand database was obtained from PubChem (BioAssay ID: 651640). It was generated experimentally by the Broad Institute using the (in vivo) DENV2 CPE-Based HTS Measured in Cell-Based and Microorganism Combination System method. A total of 347,136 compounds were tested for Dengue virus inhibition; 5,946 actives and 324,845 non-actives were identified. An active is defined as a compound that exhibits an ATP activity level above 20% at 10 $\mu$M.

Step 2: Preprocessing

The unprocessed database can be found here.
The molecular descriptors of the actives and inactives were calculated with PaDEL-Descriptor using the DescriptorCalculator.py script.
The actives and inactives databases were combined and all missing descriptor values were filled with 0. Next, dimensionality reduction was conducted using a variance filter (scikit-learn VarianceThreshold class).
The data was then standardized using the mean and standard deviation of each descriptor.
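The three preprocessing operations above can be sketched as follows. This is a minimal illustration, not the project's script: the function name `preprocess` and the default `variance_cutoff` are assumptions, and the descriptor matrix is assumed to be already loaded as numbers.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.preprocessing import StandardScaler


def preprocess(descriptors, variance_cutoff=0.0):
    """Fill missing values with 0, drop low-variance columns, standardize.

    `preprocess` and `variance_cutoff` are hypothetical names; the cutoff
    used in the project is not stated in this README.
    """
    matrix = np.asarray(descriptors, dtype=float)
    # Missing descriptor values (NaN) are replaced with 0.
    filled = np.where(np.isnan(matrix), 0.0, matrix)

    # Keep only columns whose variance exceeds the cutoff.
    selector = VarianceThreshold(threshold=variance_cutoff)
    reduced = selector.fit_transform(filled)

    # Subtract the column mean and divide by the column standard deviation.
    scaler = StandardScaler()
    scaled = scaler.fit_transform(reduced)
    return scaled, selector, scaler
```

The fitted `selector` and `scaler` are returned so that the same column selection and scaling can later be applied to other compound sets with `transform`.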

Step 3: Model construction

The data was split into training, test and external datasets. The training dataset comprised 70% of the data (14,875 compounds), and the test and external datasets comprised 15% each (~3,188 compounds). The training dataset contained 3,105 actives and 11,770 inactives.
The ML models were constructed using the Lazy Predict Python package. The models with the best Accuracy, F1-score, Balanced Accuracy and ROC AUC were selected for validation.
The models chosen for further validation were K-Nearest Neighbours, Naive Bayes, Support Vector Machine, Random Forest and Logistic Regression. The models can be found here. The models were cross-validated using K-fold splitting of the training data, and each model's suitability was evaluated using Accuracy, F1-score, Precision, Recall, Specificity, and the true and false positive and negative rates.
The models' predictive ability was then assessed on the test data using Accuracy, F1-score, Precision and Recall.
The logistic regression (LR) model performed best on the test dataset and was therefore evaluated on the external dataset. The LR model achieved 82% accuracy on actives and 98% accuracy on inactives.
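The 70/15/15 split and the evaluation metrics named above can be illustrated with a short, dependency-free sketch. It assumes a stratified split (the README does not say whether the split preserved class ratios), and the function names `stratified_split` and `confusion_metrics` are hypothetical. If the reported per-class accuracies are read as the fraction of actives and of inactives classified correctly, they correspond to `recall` and `specificity` below.

```python
import random


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return (train, test, external) index lists that keep class ratios.

    Stratification is an assumption; it is not confirmed by the README.
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train, test, external = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_test = round(fractions[1] * len(indices))
        train.extend(indices[:n_train])
        test.extend(indices[n_train:n_train + n_test])
        external.extend(indices[n_train + n_test:])  # remainder
    return train, test, external


def confusion_metrics(y_true, y_pred, positive=1):
    """Compute the metrics listed in Step 3 from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # fraction of actives recovered
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),  # fraction of inactives recovered
        "f1": ratio(2 * precision * recall, precision + recall),
    }
```

For example, `stratified_split([1] * 200 + [0] * 800)` yields 700, 150 and 150 indices, with 140 actives in the training part.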

Step 4: Prediction

The LR model was employed to screen the Northern African Natural Products Database (NANPD), the East African Natural Products Database (EANPD), AfroDB and the Traditional Chinese Medicine (TCM) database.
The natural products' chemical structures were prepared in a similar manner as for the training dataset, and ~43,000 compounds were screened using the LR model.
7,722 compounds were predicted to be active and were subsequently used for molecular docking.
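One way to make sure a screening library is prepared like the training data is to bundle the preprocessing and the classifier in a single scikit-learn pipeline, so the transformations fitted on the training set are reused unchanged at prediction time. The sketch below uses synthetic stand-in data; the helper names, the compound identifiers and the `max_iter` setting are assumptions, not the project's actual configuration.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_screening_model():
    """Variance filter, standardization and LR chained together (hypothetical helper)."""
    return make_pipeline(
        VarianceThreshold(),
        StandardScaler(),
        LogisticRegression(max_iter=1000),
    )


def screen_library(model, library_descriptors, names):
    """Return the names of library compounds predicted active (label 1)."""
    predictions = model.predict(library_descriptors)
    return [name for name, label in zip(names, predictions) if label == 1]


# Synthetic stand-in for the training descriptors: the label depends on
# the first column only, so the example is easy to follow.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
y_train = (X_train[:, 0] > 0).astype(int)
model = build_screening_model().fit(X_train, y_train)

# Two made-up library compounds; missing values are assumed to have been
# filled with 0 beforehand, as in Step 2.
library = np.array([[3.0, 0.0, 0.0, 0.0, 0.0],
                    [-3.0, 0.0, 0.0, 0.0, 0.0]])
hits = screen_library(model, library, ["np_001", "np_002"])
```

Here `hits` contains only `"np_001"`, the compound on the active side of the synthetic decision boundary.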

Step 5: Molecular Docking

The crystal structure of the dengue 2 virus envelope protein (PDB: 1OKE) was selected for structure-based virtual screening.
AutoDock Vina was used to dock the 7,722 compounds into the dengue 2 virus envelope protein.
The potential hits were selected using the following criteria:
- AutoDock Vina binding score
- Presence of binding interactions between important binding site residues and the ligand (LigPlot+ v1.4.5).
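The first criterion can be applied programmatically. AutoDock Vina writes one `REMARK VINA RESULT:` line per pose in its output PDBQT file, with the binding affinity (kcal/mol) as the first number. The sketch below extracts the best score per ligand and ranks ligands against a cutoff. The function names and the -7.0 kcal/mol cutoff are illustrative assumptions, since the README does not state the threshold used. The interaction criterion still requires inspecting the poses, for example with LigPlot+.

```python
def best_vina_score(pdbqt_text):
    """Return the lowest (most favourable) affinity in a Vina output, or None."""
    scores = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            # Fields: REMARK, VINA, RESULT:, affinity, RMSD l.b., RMSD u.b.
            scores.append(float(line.split()[3]))
    return min(scores) if scores else None


def select_hits(scores_by_ligand, cutoff=-7.0):
    """Rank ligands whose score is at or below the cutoff, best first.

    The cutoff value is a placeholder, not the project's threshold.
    """
    kept = [(name, score) for name, score in scores_by_ligand.items()
            if score is not None and score <= cutoff]
    return sorted(kept, key=lambda item: item[1])
```

A typical use would read each docked ligand's output file, call `best_vina_score` on its contents, and pass the resulting dictionary to `select_hits`.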

Step 6: ADMET prediction

The ADMET properties of the identified hits will be predicted using SwissADME.
Hits containing moieties associated with poor pharmacokinetics or toxicity will be removed.

Step 7: Molecular Dynamics (MD) Simulations

The stability of each hit's binding mode will be assessed through 100-nanosecond (ns) MD simulations using GROMACS.
Stability will be assessed using metrics such as root-mean-square deviation (RMSD), root-mean-square fluctuation (RMSF) and radius of gyration.
The retention of each compound's binding interactions with important residues throughout the MD simulations will be assessed with the ProLIF Python library.
Each compound's binding free energy over the MD simulation will be calculated using Molecular Mechanics Poisson-Boltzmann Surface Area (MMPBSA).
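For readers unfamiliar with the stability metrics, the sketch below shows their standard definitions in plain Python. In practice these quantities are usually computed with GROMACS analysis tools. The code here is only an illustration with hypothetical function names, and it assumes the frames have already been superimposed on a reference structure (no fitting is performed).

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized coordinate sets."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must be non-empty and equally sized")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def rmsf(trajectory):
    """Per-atom fluctuation about the mean position.

    trajectory[f][i] is the (x, y, z) position of atom i in frame f.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    values = []
    for i in range(n_atoms):
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames
                for k in range(3)]
        mean_square = sum(math.dist(frame[i], mean) ** 2
                          for frame in trajectory) / n_frames
        values.append(math.sqrt(mean_square))
    return values


def radius_of_gyration(coords, masses):
    """Mass-weighted RMS distance of the atoms from their centre of mass."""
    total = sum(masses)
    centre = [sum(m * c[k] for m, c in zip(masses, coords)) / total
              for k in range(3)]
    squared = sum(m * math.dist(c, centre) ** 2
                  for m, c in zip(masses, coords))
    return math.sqrt(squared / total)
```

A low, plateauing RMSD of the ligand relative to its docked pose is commonly taken as a sign that the binding mode is retained during the simulation.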

Manuscript

When using the pipeline for research or commercial purposes please cite our research.

How to use

The documentation and tutorial give a general overview of how the pipeline can be utilized for identifying novel Dengue Virus inhibitors.

Tutorial 1

The DL/ML pipeline tutorial describes how the models were constructed, validated and selected.

Tutorial 2

Molecular docking and dynamics

Data availability

The data used for the project can be found here.

Prerequisites

The code and scripts were run with Python 3.8.

Credits

The team members include:

George Hanson
Joseph Adams
Kepgang Daveson Innocento Brank
Andy Asante
Emmanuel Israel Nsedu
Hem Bondarwad
Kisaakye Maureen
Lewis Tem
Luke Zondagh
Soham Amod Shirolkar
