Atom2D

This repository takes the ATOM3D benchmark and applies DiffusionNet to a surface representation of proteins.

Installation

This project depends on PyMesh, which can be installed by following the instructions on its homepage:

git clone https://github.com/PyMesh/PyMesh.git
cd PyMesh
git submodule update --init
export PYMESH_PATH=`pwd`

# Install Pymesh dependencies with apt-get
apt-get install libeigen3-dev libgmp-dev libgmpxx4ldbl libmpfr-dev libboost-dev \
    libboost-thread-dev libtbb-dev python3-dev
# Or, on the Jean Zay cluster, load the equivalent modules:
module load gmp/6.1.2 eigen/3.3.7-mpi cmake/3.21.3 mpfr/4.0.2 boost/1.70.0



./setup.py build
./setup.py install
# Check that everything works:
python -c "import pymesh; pymesh.test()"

You can then install the remaining dependencies in the usual way:

conda create -n atom2d -y
conda activate atom2d
conda install python=3.8
conda install pytorch=1.13 torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
conda install pyg -c pyg
pip install -r requirements.txt

Data

PIP (Protein Interface Prediction)

In this task, we are given two residues, one from each of two proteins, and must predict whether they interact. To do so, we learn a separate embedding for each protein and combine these embeddings pairwise.

To prepare the data, download the appropriate archive from the ATOM3D Zenodo repository. You should get the DIPS_split dataset; extract it and save it at the following path: data/DIPS-split/data/train
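Before preprocessing, it can help to confirm that the extracted data landed where the scripts expect it. A minimal sketch using only the standard library; the helper name check_pip_data is hypothetical (not part of this repository), and it only checks that the directory exists and is non-empty, not that its LMDB contents are valid.

```python
from pathlib import Path
from typing import List

# Location of the extracted DIPS split, as given in this README.
EXPECTED_TRAIN_DIR = Path("data/DIPS-split/data/train")


def check_pip_data(root: Path = Path(".")) -> List[str]:
    """Return the sorted file names found in the expected training directory.

    Hypothetical helper. Raises FileNotFoundError if the directory is missing
    or empty, which usually means the archive was extracted elsewhere.
    """
    train_dir = root / EXPECTED_TRAIN_DIR
    if not train_dir.is_dir():
        raise FileNotFoundError(f"missing directory: {train_dir}")
    names = sorted(p.name for p in train_dir.iterdir())
    if not names:
        raise FileNotFoundError(f"directory is empty: {train_dir}")
    return names


if __name__ == "__main__":
    # Run from the repository root, before `python preprocess_data.py`.
    print(check_pip_data())
```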

Then run:

cd pip_task
python preprocess_data.py

This processes the protein surfaces and precomputes the operators used by DiffusionNet.
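Among the operators DiffusionNet precomputes are a truncated eigenbasis of the surface Laplacian and a per-vertex mass vector, which let it apply heat diffusion cheaply in the spectral domain. The sketch below only illustrates what that basis is used for, under simplifying assumptions: a toy path-graph Laplacian stands in for the Laplacian of a real surface mesh, the mass is uniform, and the full eigenbasis is kept. The names path_graph_laplacian and spectral_diffusion are illustrative, not DiffusionNet's API.

```python
import numpy as np


def path_graph_laplacian(n: int) -> np.ndarray:
    """Combinatorial Laplacian L = D - A of a path graph with n nodes (toy mesh)."""
    adj = np.zeros((n, n))
    idx = np.arange(n - 1)
    adj[idx, idx + 1] = 1.0
    adj[idx + 1, idx] = 1.0
    return np.diag(adj.sum(axis=1)) - adj


def spectral_diffusion(x, evals, evecs, mass, t):
    """Diffuse per-vertex features x (shape (n, C)) for time t.

    Computes evecs @ (exp(-evals * t) * (evecs.T @ (mass * x))), assuming the
    columns of evecs are orthonormal with respect to the diagonal mass.
    """
    coeffs = evecs.T @ (mass[:, None] * x)         # project onto the basis
    coeffs = np.exp(-evals * t)[:, None] * coeffs  # damp each mode by its eigenvalue
    return evecs @ coeffs                          # map back to the vertices


n = 6
evals, evecs = np.linalg.eigh(path_graph_laplacian(n))  # toy "precomputed operators"
mass = np.ones(n)          # uniform mass, so plain orthonormality holds
x = np.zeros((n, 1))
x[0, 0] = 1.0              # a spike of signal at one end of the path

for t in (0.0, 1.0, 100.0):
    print(t, spectral_diffusion(x, evals, evecs, mass, t).ravel().round(3))
```

Increasing t spreads the spike along the path until every vertex carries the average value; in DiffusionNet the diffusion time is learned per feature channel rather than fixed.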

To train the PPI residue predictor, run:

python train.py

MSP (Mutation Stability Prediction)

In this task, the goal is to predict whether a mutation stabilizes a protein-protein interaction, based on the mutated structure. The task is framed as binary classification, where a label of 1 indicates that the mutated structure is more stable.
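For a binary framing like this one, a common training objective is binary cross-entropy on one raw score (logit) per mutation. A minimal plain-Python sketch, assuming that setup; it is not taken from this repository, and PyTorch's BCEWithLogitsLoss computes the same quantity.

```python
import math
from typing import Sequence


def bce_with_logits(logits: Sequence[float], labels: Sequence[int]) -> float:
    """Mean binary cross-entropy computed stably from raw scores.

    labels: 1 if the mutated structure is more stable, 0 otherwise.
    Uses max(z, 0) - z * y + log(1 + exp(-|z|)), which equals
    -[y log sigmoid(z) + (1 - y) log(1 - sigmoid(z))] without overflow.
    """
    if not logits:
        raise ValueError("need at least one prediction")
    total = 0.0
    for z, y in zip(logits, labels):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# Hypothetical scores for three mutations and their stability labels.
print(bce_with_logits([2.0, -1.0, 0.5], [1, 0, 0]))
```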

PSR (Protein Structure Ranking)

The goal is to predict the TM-score of each protein structure model with respect to the ground-truth structure released at CASP. The task is framed as regression on the TM-score.
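For reference, TM-score compares a model with the native structure as the maximum, over superpositions, of (1/L) * sum_i 1 / (1 + (d_i / d0)^2), where L is the length of the native structure, d_i is the distance between the i-th pair of aligned residues, and d0 = 1.24 * (L - 15)^(1/3) - 1.8 (clamped to 0.5 for short chains). The sketch below skips the superposition search and scores one fixed, already superposed alignment, so it only illustrates the regression target; the labels themselves come from dedicated tools such as TM-score/TM-align. The function names are hypothetical.

```python
import numpy as np


def tm_d0(length: int) -> float:
    """Distance scale d0 used by TM-score for a native structure of `length` residues."""
    d0 = 1.24 * np.cbrt(length - 15) - 1.8
    return max(d0, 0.5)  # lower clamp applied to short chains


def tm_score_fixed(model_xyz, native_xyz) -> float:
    """TM-score for ONE fixed superposition (a lower bound on the true TM-score).

    model_xyz, native_xyz: (L, 3) CA coordinates, already superposed and
    in one-to-one residue correspondence.
    """
    model_xyz = np.asarray(model_xyz, dtype=float)
    native_xyz = np.asarray(native_xyz, dtype=float)
    length = native_xyz.shape[0]
    d = np.linalg.norm(model_xyz - native_xyz, axis=1)  # per-residue deviations
    d0 = tm_d0(length)
    return float(np.sum(1.0 / (1.0 + (d / d0) ** 2)) / length)


rng = np.random.default_rng(0)
native = rng.normal(size=(50, 3)) * 10.0                 # toy native CA trace
model = native + rng.normal(scale=1.0, size=native.shape)  # toy perturbed model
print(tm_score_fixed(model, native))
```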

Structure of the project

Data processing is organized as a pipeline. Its starting point is an ATOM3D DataFrame, the item stored in ATOM3D's LMDB datasets. These DataFrames are then processed:

  • by ATOM3D, to produce the coordinates of positive and negative pairs of CA atoms;
  • by our pipeline, to produce the geometry objects needed by DiffusionNet.

The steps of our pipeline are the following:

  • (DF -> PDB) in df_utils.py
  • (PDB -> surface mesh .ply file) in surf_utils.py
  • (surface -> precomputed operators) in build_surfaces.py
  • (.ply mesh and PDB -> npz features) in point_cloud_utils.py
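The first step (DF -> PDB) amounts to writing each atom row as a fixed-width PDB ATOM record. A minimal sketch, assuming rows use ATOM3D's standard atom-DataFrame column names (serial_number, name, resname, chain, residue, x, y, z, element, and optionally insertion_code, occupancy, bfactor) and are passed as dictionaries, e.g. from df.to_dict("records"). It is not the code in df_utils.py and ignores HETATM records, alternate locations and multiple models.

```python
from typing import Iterable, Mapping


def atom_record(row: Mapping) -> str:
    """Format one atom as a fixed-width PDB ATOM line (columns 1-78)."""
    name = str(row["name"])
    # Atom names shorter than 4 characters conventionally start in column 14.
    name_field = f" {name:<3}" if len(name) < 4 else name[:4]
    icode = str(row.get("insertion_code", " ")).strip() or " "
    return (
        f"ATOM  {int(row['serial_number']):>5} {name_field} "   # cols 1-17
        f"{row['resname']:>3} {row['chain']:1}{int(row['residue']):>4}{icode}   "
        f"{row['x']:8.3f}{row['y']:8.3f}{row['z']:8.3f}"        # cols 31-54
        f"{row.get('occupancy', 1.0):6.2f}{row.get('bfactor', 0.0):6.2f}"
        f"{'':10}{row['element']:>2}"                           # cols 67-78
    )


def rows_to_pdb(rows: Iterable[Mapping]) -> str:
    """Join ATOM lines into PDB text terminated by END."""
    lines = [atom_record(r) for r in rows]
    lines.append("END")
    return "\n".join(lines) + "\n"


example = {"serial_number": 1, "name": "CA", "resname": "ALA", "chain": "A",
           "residue": 12, "x": 1.0, "y": -2.5, "z": 3.25, "element": "C"}
print(rows_to_pdb([example]), end="")
```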

The features are obtained through RBF interpolation from the protein residues to the vertices of the mesh. We can then use the mesh and operators to embed our protein surface with DiffusionNet.
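The residue-to-vertex step can be illustrated with a normalized Gaussian kernel: each vertex receives a weighted average of residue features, with weights decaying with distance. This is an assumption for illustration; point_cloud_utils.py may use a different kernel, width or neighborhood, and an exact RBF interpolant would instead solve a linear system for the weights. The function name and the default sigma (in the coordinate units, e.g. Angstrom) are hypothetical.

```python
import numpy as np


def rbf_interpolate(src_xyz, src_feats, dst_xyz, sigma=2.0):
    """Spread features from source points (residues) onto target points (vertices).

    Each target gets sum_j w_ij * f_j with
    w_ij proportional to exp(-||dst_i - src_j||^2 / (2 sigma^2)), rows summing to 1.
    """
    src_xyz = np.asarray(src_xyz, dtype=float)
    dst_xyz = np.asarray(dst_xyz, dtype=float)
    sq_dist = ((dst_xyz[:, None, :] - src_xyz[None, :, :]) ** 2).sum(-1)
    logits = -sq_dist / (2.0 * sigma**2)
    logits -= logits.max(axis=1, keepdims=True)  # avoid underflow for far-away targets
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ np.asarray(src_feats, dtype=float)


residues = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])  # toy residue positions
feats = np.array([[1.0, 0.0], [0.0, 1.0]])                # one-hot toy features
verts = np.array([[0.1, 0.0, 0.0], [9.9, 0.0, 0.0], [5.0, 0.0, 0.0]])
print(rbf_interpolate(residues, feats, verts).round(3))
```

The dense (vertices x residues) distance matrix is fine for a sketch; for full meshes, restricting each vertex to nearby residues (e.g. with a KD-tree) keeps memory bounded.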

Then we use RBF interpolation from the vertices onto selected CA atoms of each protein to get a feature vector for each of these atoms. Finally, we feed these feature vectors to an MLP that discriminates interacting CA pairs from non-interacting ones.
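A minimal sketch of these last two steps, assuming the same Gaussian-weighted averaging as for vertex features and a two-layer MLP applied to the concatenated features of each candidate CA pair (one CA from each protein). The feature and hidden sizes, the kernel width, the random weights and the toy coordinates are all hypothetical; the repository's PyTorch model will differ in architecture and is trained rather than randomly initialized.

```python
import numpy as np


def gaussian_pool(vert_xyz, vert_feats, query_xyz, sigma=2.0):
    """Gaussian-weighted average of vertex features around each query CA."""
    d2 = ((query_xyz[:, None, :] - vert_xyz[None, :, :]) ** 2).sum(-1)
    logits = -d2 / (2.0 * sigma**2)
    logits -= logits.max(axis=1, keepdims=True)
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    return w @ vert_feats


def mlp_pair_scores(feat_a, feat_b, params):
    """Interaction probability for row-aligned CA pairs (a_k, b_k)."""
    x = np.concatenate([feat_a, feat_b], axis=1)          # (P, 2C) pair features
    h = np.maximum(x @ params["w1"] + params["b1"], 0.0)  # ReLU hidden layer
    logits = (h @ params["w2"] + params["b2"]).ravel()    # (P,)
    return 1.0 / (1.0 + np.exp(-logits))                  # sigmoid


rng = np.random.default_rng(0)
C, H = 8, 16  # hypothetical feature and hidden sizes
params = {
    "w1": rng.normal(scale=0.1, size=(2 * C, H)), "b1": np.zeros(H),
    "w2": rng.normal(scale=0.1, size=(H, 1)), "b2": np.zeros(1),
}
# Toy surfaces: vertex coordinates and per-vertex features (e.g. DiffusionNet outputs).
verts_a, feats_a = rng.normal(size=(200, 3)), rng.normal(size=(200, C))
verts_b, feats_b = rng.normal(size=(150, 3)), rng.normal(size=(150, C))
ca_a, ca_b = rng.normal(size=(5, 3)), rng.normal(size=(5, 3))  # 5 candidate pairs
probs = mlp_pair_scores(gaussian_pool(verts_a, feats_a, ca_a),
                        gaussian_pool(verts_b, feats_b, ca_b), params)
print(probs.round(3))
```

Concatenation makes the score depend on which protein comes first; combining the two feature vectors with a symmetric operation (sum or elementwise product) would remove that ordering, if it is not wanted.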
