bhattacharya-lab / equipnas

pLM-informed E(3) equivariant deep graph neural networks for protein-nucleic acid binding site prediction

License: GNU General Public License v3.0

Python 100.00%
protein-dna-interactions protein-language-model protein-rna-interactions graph-neural-netowrks

equipnas's Introduction

EquiPNAS: improved protein-nucleic acid binding site prediction using protein-language-model-informed equivariant deep graph neural networks

by Rahmatullah Roche, Bernard Moussad, Md Hossain Shuvo, Sumit Tarafder, and Debswapna Bhattacharya

published in Nucleic Acids Research

Codebase for EquiPNAS, our improved protein-nucleic acid binding site prediction approach.

Workflow

Installation

1.) We recommend using a conda virtual environment to install the dependencies for EquiPNAS. The following command creates a virtual environment named 'EquiPNAS':

conda env create -f EquiPNAS_env.yml

2.) Then activate the virtual environment

conda activate EquiPNAS

3.) Download the trained models from here

  • For protein-DNA binding site prediction, use models/EquiPNAS-DNA model
  • For protein-RNA binding site prediction, use models/EquiPNAS-RNA model

That's it! EquiPNAS is ready to be used.

Usage

To see usage instructions, run python EquiPNAS.py -h

usage: EquiPNAS.py [-h] [--model_state_dict MODEL_STATE_DICT] [--indir INDIR] [--outdir OUTDIR] [--num_workers NUM_WORKERS]

options:
  -h, --help            show this help message and exit
  --model_state_dict MODEL_STATE_DICT
                        Saved model
  --indir INDIR         Path to input data containing distance maps and input features (default 'datasets/DNA_test_129_Preprocessing_using_AlphaFold2/')
  --outdir OUTDIR       Prediction output directory
  --num_workers NUM_WORKERS
                        Number of workers (default=4)
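
The help text above can be mirrored with a small argparse parser. This is a minimal sketch reconstructed only from the printed options, not the repository's actual code; the function name build_parser and the integer type for --num_workers are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the documented options (hypothetical reconstruction from the help text)."""
    parser = argparse.ArgumentParser(prog="EquiPNAS.py")
    parser.add_argument("--model_state_dict", help="Saved model")
    parser.add_argument(
        "--indir",
        default="datasets/DNA_test_129_Preprocessing_using_AlphaFold2/",
        help="Path to input data containing distance maps and input features",
    )
    parser.add_argument("--outdir", help="Prediction output directory")
    # Assumption: the worker count is parsed as an integer.
    parser.add_argument("--num_workers", type=int, default=4, help="Number of workers")
    return parser


# Parse an example command line; options that are not given fall back to their defaults.
args = build_parser().parse_args(
    ["--model_state_dict", "models/EquiPNAS-DNA/E-l12-768.pt", "--outdir", "output/"]
)
print(args.indir, args.num_workers)
```

Only --indir and --num_workers have documented defaults, so a typical run still needs --model_state_dict and --outdir on the command line.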

Here is an example of running EquiPNAS:

1.) The input target list and all input files should be inside the input preprocessing directory (examples can be found here: Preprocessing/). Detailed preprocessing instructions can be found here

2.) Make an output directory mkdir output

3.) Run python EquiPNAS.py --model_state_dict models/EquiPNAS-DNA/E-l12-768.pt --indir Preprocessing/ --outdir output/

4.) The residue-level protein-DNA or protein-RNA binding site predictions are generated at output/.
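
Steps 2 and 3 above can also be driven from Python. The sketch below is one possible wrapper, assuming it is run from the repository root; prediction_command and run_prediction are hypothetical helper names, and only the flags shown in the usage text are used.

```python
import os
import subprocess


def prediction_command(model_state_dict, indir, outdir):
    """Return the EquiPNAS.py command line as an argument list."""
    return [
        "python", "EquiPNAS.py",
        "--model_state_dict", model_state_dict,
        "--indir", indir,
        "--outdir", outdir,
    ]


def run_prediction(model_state_dict, indir, outdir):
    """Check the inputs, create the output directory, then launch EquiPNAS.py."""
    if not os.path.isfile(model_state_dict):
        raise FileNotFoundError(f"model file not found: {model_state_dict}")
    if not os.path.isdir(indir):
        raise FileNotFoundError(f"input directory not found: {indir}")
    os.makedirs(outdir, exist_ok=True)  # same effect as 'mkdir output' in step 2
    subprocess.run(prediction_command(model_state_dict, indir, outdir), check=True)


# Build (but do not execute) the command from step 3.
cmd = prediction_command("models/EquiPNAS-DNA/E-l12-768.pt", "Preprocessing/", "output/")
print(" ".join(cmd))
```

Checking the model file and input directory first turns a missing download into an immediate, readable error instead of a failure inside the prediction script.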

Training

For protein-DNA binding site prediction, we obtain the training targets from here, and for protein-RNA binding site prediction, we obtain the training targets from here. Our full training dataset, containing the training code, target list, and features for both protein-DNA and protein-RNA binding site prediction, can be found here. The training procedure is as follows:

Train scripts

  • Download the train scripts from here

  • Extract the train scripts and move them to the current directory

    tar -xzvf train_scripts.tar.gz

    mv train_scripts/* .

Train model for protein-DNA binding site

To train a protein-DNA binding site prediction model on your own dataset, place the train target list and all input files inside the train data directory; the inputs can be preprocessed as described earlier here. Example train data for protein-DNA binding site prediction can be found here.

To retrain the protein-DNA binding site prediction model with our dataset, download the train features and data from here.

  • Extract the train features

    tar -xzvf DNA_train_data.tar.gz

  • Run the train scripts:

    python train_model.py --indir DNA_train_data/ --save_dir model/DNA/

The trained model will be saved inside: model/DNA/

Train model for protein-RNA binding site

To train a protein-RNA binding site prediction model on your own dataset, place the train target list and all input files inside the train data directory; the inputs can be preprocessed as described earlier here. Example train data for protein-RNA binding site prediction can be found here.

To retrain the protein-RNA binding site prediction model with our dataset, download the train features and data from here.

  • Extract the train features

    tar -xzvf RNA_train_data.tar.gz

  • Run the train scripts:

    python train_model.py --indir RNA_train_data/ --save_dir model/RNA/

The trained model will be saved inside: model/RNA/

Test set benchmarking

For protein-DNA binding site prediction, we obtain the test targets for Test_129 from here, and for Test_181 from here. For protein-RNA binding site prediction, we obtain the test targets from here. Our full test dataset, containing the test list and features for all the benchmarking datasets, can be found here. The procedure for test set benchmarking is as follows:

Pretrained model

  • First download the trained models from here

  • Extract the models

    tar -xzvf models.tar.gz

Protein-DNA

Test_129

Prediction using AlphaFold2 predicted structural models
  • Download the test list, data, and features from here

  • Extract the features

    tar -xzvf DNA_test_129_Preprocessing_using_AlphaFold2.tar.gz
    
  • Create output prediction directory

    mkdir -p outputs/DNA_test_129_predictions_using_AlphaFold2/
    
  • Run EquiPNAS prediction using the pretrained protein-DNA model

     python EquiPNAS.py --model_state_dict models/EquiPNAS-DNA/E-l12-768.pt --indir DNA_test_129_Preprocessing_using_AlphaFold2/ --outdir outputs/DNA_test_129_predictions_using_AlphaFold2/
    
Prediction using experimental structures
  • Download the test list, data, and features from here

  • Extract the features

    tar -xzvf DNA_test_129_Preprocessing_using_native.tar.gz
    
  • Create output prediction directory

    mkdir -p outputs/DNA_test_129_predictions_using_native/
    
  • Run EquiPNAS prediction using the pretrained protein-DNA model

     python EquiPNAS.py --model_state_dict models/EquiPNAS-DNA/E-l12-768.pt --indir DNA_test_129_Preprocessing_using_native/ --outdir outputs/DNA_test_129_predictions_using_native/
    

Test_181

Prediction using AlphaFold2 predicted structural models
  • Download the test list, data, and features from here

  • Extract the features

    tar -xzvf DNA_test_181_Preprocessing_using_AlphaFold2.tar.gz
    
  • Create output prediction directory

    mkdir -p outputs/DNA_test_181_predictions_using_AlphaFold2/
    
  • Run EquiPNAS prediction using the pretrained protein-DNA model

     python EquiPNAS.py --model_state_dict models/EquiPNAS-DNA/E-l12-768.pt --indir DNA_test_181_Preprocessing_using_AlphaFold2/ --outdir outputs/DNA_test_181_predictions_using_AlphaFold2/
    
Prediction using experimental structures
  • Download the test list, data, and features from here

  • Extract the features

    tar -xzvf DNA_test_181_Preprocessing_using_native.tar.gz
    
  • Create output prediction directory

    mkdir -p outputs/DNA_test_181_predictions_using_native/
    
  • Run EquiPNAS prediction using the pretrained protein-DNA model

     python EquiPNAS.py --model_state_dict models/EquiPNAS-DNA/E-l12-768.pt --indir DNA_test_181_Preprocessing_using_native/ --outdir outputs/DNA_test_181_predictions_using_native/
    

Protein-RNA

Test_117

Prediction using AlphaFold2 predicted structural models
  • Download the test list, data, and features from here

  • Extract the features

    tar -xzvf RNA_test_117_Preprocessing_using_AlphaFold2.tar.gz
    
  • Create output prediction directory

    mkdir -p outputs/RNA_test_117_predictions_using_AlphaFold2/
    
  • Run EquiPNAS prediction using the pretrained protein-RNA model

     python EquiPNAS.py --model_state_dict models/EquiPNAS-RNA/E-l12-768.pt --indir RNA_test_117_Preprocessing_using_AlphaFold2/ --outdir outputs/RNA_test_117_predictions_using_AlphaFold2/
    
Prediction using experimental structures
  • Download the test list, data, and features from here

  • Extract the features

    tar -xzvf RNA_test_117_Preprocessing_using_native.tar.gz
    
  • Create output prediction directory

    mkdir -p outputs/RNA_test_117_predictions_using_native/
    
  • Run EquiPNAS prediction using the pretrained protein-RNA model

     python EquiPNAS.py --model_state_dict models/EquiPNAS-RNA/E-l12-768.pt --indir RNA_test_117_Preprocessing_using_native/ --outdir outputs/RNA_test_117_predictions_using_native/
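
The six benchmark runs above differ only in molecule type, test set size, and structure source, so their command lines can be generated in a loop. This is a sketch that assumes the directory naming shown in this section; the names BENCHMARKS, SOURCES, and benchmark_commands are hypothetical.

```python
# (molecule, number of test targets) for each benchmark set listed above.
BENCHMARKS = [("DNA", 129), ("DNA", 181), ("RNA", 117)]
# Structure source: AlphaFold2 predicted models or experimental (native) structures.
SOURCES = ["AlphaFold2", "native"]


def benchmark_commands():
    """Return one EquiPNAS.py argument list per benchmark set and structure source."""
    commands = []
    for molecule, size in BENCHMARKS:
        # The DNA sets use the protein-DNA model, the RNA set the protein-RNA model.
        model = f"models/EquiPNAS-{molecule}/E-l12-768.pt"
        for source in SOURCES:
            indir = f"{molecule}_test_{size}_Preprocessing_using_{source}/"
            outdir = f"outputs/{molecule}_test_{size}_predictions_using_{source}/"
            commands.append([
                "python", "EquiPNAS.py",
                "--model_state_dict", model,
                "--indir", indir,
                "--outdir", outdir,
            ])
    return commands


for command in benchmark_commands():
    print(" ".join(command))
```

The printed lines can be compared against the commands in this section before any of them is executed, for example with subprocess.run after the features are extracted and the output directories exist.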
    

equipnas's People

Contributors

debswapna, roche78

equipnas's Issues

Issues Encountered During Feature Generation

Hi,

I've encountered a couple of issues while generating features:

It seems that the batch_save_msa_feat.py script is not compatible with the latest ColabFold 1.5.5. Could you please specify which version of ColabFold your batch_save_msa_feat.py script is compatible with?

It appears that there is no script in the repository to generate the distmap.

To address these issues, I modified the ColabFold 1.5.5 batch.py script to extract the MSA first row features, and I wrote a script to extract the distmap based on the provided distmap example. However, the AUC obtained with my modified code on the DNA_test_129 dataset is 0.9386, which is lower than the 0.9428 AUC obtained with your provided features. I suspect there might be an issue with my modifications.

Could you please provide some guidance on these issues?

Thank you!
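
For readers comparing AUC values like the ones quoted in this issue, the following is a minimal pure-Python sketch of ROC AUC computed as the fraction of (binding, non-binding) residue pairs that are ranked correctly. It is not the evaluation code used for the published results; the function name roc_auc and the 0/1 label convention are assumptions, and the prediction file format is not covered here.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one binding and one non-binding residue")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = binding residue, 0 = non-binding residue.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(example_auc)  # 3 of the 4 positive/negative pairs are ordered correctly: 0.75
```

The pairwise loop is quadratic in the number of residues, which is fine for a sanity check; a rank-based implementation is preferable for large test sets.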

Unknown error using test dataset

Hello.

I want to use EquiPNAS to check my TF binding sites (picked up from an RNA-seq experiment).
After creating the environment with conda, I downloaded the published models and tested the operation.
However, an unknown error occurred.
Is there any solution?

git clone https://github.com/Bhattacharya-Lab/EquiPNAS.git
cd EquiPNAS/
mamba env create -f EquiPNAS_env.yml
conda activate EquiPNAS

Python version 3.10.4

$ python EquiPNAS.py --model_state_dict models/EquiPNAS-RNA/E-l12-768.pt --indir Preprocessing/ --outdir output/


*                        EquiPNAS                                          *
*       Improved protein-nucleic binding site prediction                   *
* using pretrained protein language model and equivariant deep graph learning *
*     For comments, please email to [email protected]                   *

Residie-level predictions for each target is being saved at output//

Traceback (most recent call last):
  File "/home/kazu/Documents/EquiPNAS/EquiPNAS.py", line 155, in <module>
    main(PARS)
  File "/home/kazu/Documents/EquiPNAS/EquiPNAS.py", line 92, in main
    model.load_state_dict(torch.load(PARS.model_state_dict, map_location=device))
UnboundLocalError: local variable 'model' referenced before assignment
(EquiPNAS) kazu@kazu:~/Documents/EquiPNAS$
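
The source of EquiPNAS.py is not shown here, so the following does not diagnose this report. It is only a generic illustration of how Python raises UnboundLocalError when a variable is assigned in some branches and then used unconditionally, plus one defensive alternative. All names (pick_model_unsafe, pick_model, the "DNA"/"RNA" keys) are hypothetical.

```python
def pick_model_unsafe(kind):
    """'model' is only assigned in the branches that match, then used unconditionally."""
    if kind == "DNA":
        model = "protein-DNA network"
    elif kind == "RNA":
        model = "protein-RNA network"
    return model  # UnboundLocalError when neither branch ran


def pick_model(kind):
    """Look the model up explicitly and fail with a readable message otherwise."""
    known = {"DNA": "protein-DNA network", "RNA": "protein-RNA network"}
    if kind not in known:
        raise ValueError(f"unsupported model kind {kind!r}; expected one of {sorted(known)}")
    return known[kind]


try:
    pick_model_unsafe("other")
except UnboundLocalError as error:
    print("unsafe version failed:", type(error).__name__)

try:
    pick_model("other")
except ValueError as error:
    print("safe version failed:", error)
```

When a traceback like the one above appears, the branch conditions that guard the assignment of the variable are the first place to look.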
