DDAffinity-network


Description

This repo contains the code for "Predicting the changes in binding affinity of multiple point mutations using protein three-dimensional structure" by Guanglei Yu, Qichang Zhao, Xuehua Bi and Jianxin Wang.

We propose a ProteinMPNN-inspired $\Delta\Delta G$ predictor that takes the 3D structures and 2D sequences of the wild-type ($\mathcal{WT}$) and mutant ($\mathcal{MT}$) protein complexes as input. The mutant structure is generated with the BuildModel and Optimize modules of FoldX 5.0.

  • Clipped patches: given $\mathcal{WT}$ and $\mathcal{MT}$, we clip each of them into a patch of 256 residues. The patch consists of the 256 residues nearest to the mutated residues, measured by inter-residue $C_{\beta}$ distances, and includes the mutated residues themselves.
  • Two-step additive Gaussian noising strategy: to improve the performance and generalization of DDAffinity, we add Gaussian noise to the atomic coordinates of residues in two steps. First, Gaussian noise ($std=0.2\,\mathring{\mathrm A}$) is added to all input atomic coordinates, which yields perturbed backbone dihedrals $(\phi,\psi,\omega)$ and sidechain dihedrals $(\chi^{(1)},\chi^{(2)},\chi^{(3)},\chi^{(4)})$. Second, following ProteinMPNN, where coordinate noise improves predictive performance and robustness, we also add Gaussian noise ($std=0.2\,\mathring{\mathrm A}$) to the coordinates of the backbone atom set $\boldsymbol{A}=\{N,C_\alpha,C,O,C_\beta\}$. This second perturbation does not update the backbone or sidechain dihedrals. The two-step noising strategy is applied only during training.
  • Construction of the $k$-nearest neighbor graph: we use three kinds of neighbor residues. (1) Spatial distance $k_1$: a residue is connected to its $k_1$ nearest neighbors by spatial Euclidean distance, which keeps the spatial densities of different proteins comparable. (2) Sequential distance $k_2$: a residue $r_i$ is connected to its sequence neighbors whose sequential distance from $r_i$ is no more than $(k_2-1)/2$. (3) Long-range distance $k_3$: to capture dependencies that are long-range in sequence but local in 3D Euclidean space, the neighbors of $r_i$ are ranked in ascending order of Euclidean distance, and those whose sequential distance is not greater than $(k_2-1)/2$ are discarded. The $k_3$ nearest neighbors are then selected from the remaining ordered list. In total, $k=k_1+k_2+k_3$.
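
The three steps above can be sketched in plain Python on a toy structure. This is an illustrative sketch under stated assumptions, not the repository's implementation: the function names `clip_patch`, `two_step_noise` and `build_neighbors` are hypothetical, residues are assumed to belong to a single chain with consecutive sequence indices, and the second noising step is assumed to be applied on top of the coordinates from the first step.

```python
import math
import random

BACKBONE = ("N", "CA", "C", "O", "CB")  # backbone atom set A from the text


def clip_patch(cb_coords, mutated, patch_size=256):
    """Indices of the patch_size residues closest (by C-beta) to any mutated residue."""
    def dist_to_mutation(i):
        return min(math.dist(cb_coords[i], cb_coords[m]) for m in mutated)
    # Mutated residues have distance 0, so they are always inside the patch.
    nearest = sorted(range(len(cb_coords)), key=dist_to_mutation)[:patch_size]
    return sorted(nearest)


def two_step_noise(atoms, std=0.2, training=True, rng=random):
    """atoms maps atom name -> (x, y, z) for one residue.

    Returns (coords used for dihedrals, coords used for backbone features).
    Noise is applied only when training is True.
    """
    if not training:
        return dict(atoms), dict(atoms)

    def jitter(xyz):
        return tuple(x + rng.gauss(0.0, std) for x in xyz)

    # Step 1: perturb every atom; dihedrals would be computed from these.
    step1 = {name: jitter(xyz) for name, xyz in atoms.items()}
    # Step 2: perturb backbone atoms again (assumed: on top of step 1),
    # leaving the dihedrals from step 1 untouched.
    step2 = {name: jitter(xyz) if name in BACKBONE else xyz
             for name, xyz in step1.items()}
    return step1, step2


def build_neighbors(coords, seq_index, k1, k2, k3):
    """Per-residue spatial, sequential and long-range neighbor lists."""
    half = (k2 - 1) // 2
    n = len(coords)
    graph = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        by_dist = sorted(others, key=lambda j: math.dist(coords[i], coords[j]))
        # (1) k1 nearest residues in space.
        spatial = by_dist[:k1]
        # (2) residues within (k2 - 1) / 2 positions in sequence (includes i).
        sequential = [j for j in range(n)
                      if abs(seq_index[j] - seq_index[i]) <= half]
        # (3) nearest residues in space that are far apart in sequence.
        long_range = [j for j in by_dist
                      if abs(seq_index[j] - seq_index[i]) > half][:k3]
        graph.append({"spatial": spatial,
                      "sequential": sequential,
                      "long_range": long_range})
    return graph
```

In this sketch the spatial and long-range lists may overlap, and residues near a chain end have fewer than $k_2$ sequential neighbors; how such cases are padded or deduplicated is left open here.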

An overview of the DDAffinity architecture is shown below.

[Figure: overview of the DDAffinity architecture]

Install

DDAffinity Environment

conda env create -f env.yml -n DDAffinity
conda activate DDAffinity

The default PyTorch version is 1.12.1 and the default cudatoolkit version is 11.3. Both can be changed in env.yml.

Preparation of processed dataset

We generate the wild-type and mutant complex PDB files from the PDB files in data/SKEMPI2/PDBs and the entries in data/SKEMPI2/SKEMPI2.csv, using rde/datasets/PDB_generate.py and the FoldX tool. We then use rde/datasets/skempi_parallel.py to transform the wild-type and mutant PDB files into the processed dataset SKEMPI2_cache.

python PDB_generate.py 
python skempi_parallel.py --reset

Datasets

Dataset      Download Script          Processed Dataset
SKEMPI v2    data/get_skempi_v2.sh    data/SKEMPI2/SKEMPI2_cache

SKEMPI2.csv    SKEMPI2_cache
M1707.csv      M1707_cache
S1131.csv      S1131_cache
M1340.csv      M1340_cache
M595.csv       M595_cache
S494.csv       S494_cache
S285.csv       S285_cache
Ssys.csv       Ssys_cache

Trained Weights

The weights trained on the overall SKEMPI2 dataset are located in: DDAffinity

The weights trained on M1340 are located in: M1340

Usage

Evaluate DDAffinity

python test_DDAffinity.py ./configs/train/mpnn_ddg.yml --device cuda:0

Blind testing: non-redundant blind testing on the multiple point mutation dataset M595

python case_study.py ./configs/inference/blind_testing.yml --device cuda:0

Case Study 1: Predict Mutation Effects for SARS-CoV-2 RBD

python case_study.py ./configs/inference/case_study_1.yml --device cuda:0

Case Study 2: Human Antibody Optimization

python case_study.py ./configs/inference/case_study_2.yml --device cuda:0

Train DDAffinity

python train_DDAffinity.py ./configs/train/mpnn_ddg.yml --num_cvfolds 10 --device cuda:0

Acknowledgements

We acknowledge that parts of our code are adapted from Rotamer Density Estimator (RDE). Thanks to the authors for sharing their code.
