MSA-Augmentor codebase

Codebase for the paper "Enhancing the Protein Tertiary Structure Prediction by Multiple Sequence Alignment Generation" (arXiv:2306.01824).

Pretrain

All commands are designed for a Slurm cluster. We use the Hugging Face Trainer to pretrain the model; more details can be found here

  1. Construct a local binary dataset. Loading training data from the cluster is too slow, so it is better to first convert your whole dataset to .bin files, as shown in datasets:

    python utils.py \
       --output_dir ./datasets/ \
       --random_src \
       --src_seq_per_msa_l 5 \
       --src_seq_per_msa_u 10 \
       --total_seq_per_msa 25 \
       --local_file_path path_to_pretrained_dataset
    
  2. Install the dependencies: pip install -r requirements.txt

  3. Launch pretraining: bash run.sh
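
The flags in step 1 are not documented here. One plausible reading is that each training example keeps 25 sequences from an MSA (--total_seq_per_msa) and randomly designates between 5 and 10 of them as source sequences (--src_seq_per_msa_l, --src_seq_per_msa_u), leaving the rest as generation targets. The sketch below illustrates that reading only; split_msa and its arguments are hypothetical names and are not taken from utils.py.

```python
import random


def split_msa(sequences, src_lower=5, src_upper=10, total=25, rng=None):
    """Hypothetical helper: split one MSA into source and target sequences.

    This mirrors an assumed meaning of the command-line flags; the real
    utils.py may sample differently.
    """
    rng = rng or random.Random()
    # Keep at most `total` sequences from the alignment, in random order.
    chosen = rng.sample(sequences, min(total, len(sequences)))
    # Clamp the bounds so small alignments do not produce an empty range.
    upper = min(src_upper, len(chosen))
    lower = min(src_lower, upper)
    # Draw the number of source sequences uniformly from [lower, upper].
    n_src = rng.randint(lower, upper)
    return chosen[:n_src], chosen[n_src:]
```

Passing a seeded random.Random instance as rng makes the split reproducible, which helps when the same .bin files have to be rebuilt later.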

Inference

  1. Download the checkpoints.
  2. Run inference with bash scripts/inference.sh

Note: all inference code is in inference.py.

Evaluation

DATASET   MSA                                 STRUCTURE
CASP15    https://zenodo.org/record/8126538   google drive

AlphaFold2 Prediction

  1. Please refer to the AlphaFold2 GitHub repository to learn how to set up AF2.

  2. We provide a script that launches AlphaFold2 structure prediction: bash scripts/run_af2. You need to modify the MSA directory before running it.

LDDT

  1. Download the lDDT evaluation tool by following the OpenStructure documentation: https://www.openstructure.org/
  2. For usage, follow this document: https://www.openstructure.org/docs/2.4/mol/alg/lddt/
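
For intuition about what the score measures, here is a minimal sketch of a simplified lDDT. It is not the OpenStructure implementation: it uses only one point per residue (for example the C-alpha atom), assumes the two structures are already matched residue by residue, and skips the stereochemistry checks. It uses the standard 15 Å inclusion radius and the 0.5, 1, 2 and 4 Å tolerances; the function name ca_lddt is hypothetical.

```python
import math


def ca_lddt(ref, model, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified one-atom-per-residue lDDT between two coordinate lists.

    `ref` and `model` are equal-length sequences of (x, y, z) tuples in
    Angstrom, with residue i of `ref` corresponding to residue i of `model`.
    """
    if len(ref) != len(model):
        raise ValueError("ref and model must have the same number of residues")
    preserved = 0
    total = 0
    for i in range(len(ref)):
        for j in range(i + 1, len(ref)):
            d_ref = math.dist(ref[i], ref[j])
            # Only pairs that are close in the reference are scored.
            if d_ref >= radius:
                continue
            diff = abs(d_ref - math.dist(model[i], model[j]))
            # A pair counts once per tolerance it satisfies.
            total += len(thresholds)
            preserved += sum(diff <= t for t in thresholds)
    # Fraction of preserved distances, averaged over all tolerances.
    return preserved / total if total else 0.0
```

Because the score compares internal distances only, it does not require superimposing the predicted structure onto the reference. For reported numbers, use the OpenStructure tool linked above.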

Ensemble

Run the following command to get a .json file with the final results:

python ensemble.py --predicted_pdb_root_dir ./af2/casp15/orphan/A1T3R1.5/

📎 Citation

@misc{zhang2023enhancing,
      title={Enhancing the Protein Tertiary Structure Prediction by Multiple Sequence Alignment Generation}, 
      author={Le Zhang and Jiayang Chen and Tao Shen and Yu Li and Siqi Sun},
      year={2023},
      eprint={2306.01824},
      archivePrefix={arXiv},
      primaryClass={q-bio.QM}
}

📧 Contact

Please let us know if you have further questions or comments by reaching out to [email protected].


