ijayden-lung / evoef2
This project forked from tommyhuangthu/evoef2
EvoEF2: fast and accurate program for de novo protein sequence design
License: MIT License
README for EvoEF2
------------------------------------------------------------------------------

What's EvoEF?
--------------
EvoEF is the abbreviation of EvoDesign physical Energy Function, where
EvoDesign is an evolution-based approach to de novo protein sequence design.
EvoDesign combines an evolution-based and a physics-based scoring function
for protein design simulation. In the original EvoDesign, FoldX is used as
the physics-based score. We found that FoldX is very slow and is not well
optimized for modeling protein-protein interactions. To improve the accuracy
and speed of the physical score for EvoDesign, we developed EvoEF to replace
FoldX. Our benchmark test showed that EvoEF is more accurate than FoldX and
about three to five times faster when both were benchmarked in the same way
to predict the thermodynamic change upon amino acid mutations (see reference
2 for more details).

What's EvoEF2?
----------------
EvoEF2 is an updated version of EvoEF. EvoEF was developed for protein
design. However, we found that EvoEF alone, without an evolutionary profile,
performed poorly at producing native-like sequences, which makes it
unsuitable for de novo protein sequence design. The reason is that we
optimized EvoEF against experimental thermodynamic change data (i.e. ddG),
and a potential specifically optimized for ddG prediction does not do well
for de novo protein design. We therefore extended EvoEF into EvoEF2 by
introducing a few new energy terms and reoptimizing all the energy weights.
EvoEF2 performs much better on sequence design than EvoEF. Although we regard
EvoEF2 here as an energy function, it can also serve as a framework for
different protein modeling tasks, such as computing energy, repairing
incomplete protein side chains, building mutant models, optimizing the
positions of rotatable hydrogens, and, most importantly, protein design.

What can EvoEF2 do?
---------------------
The following useful functions are supported in EvoEF2:

  o ComputeStability -- compute the stability of a given protein molecule in
    PDB format.

  o ComputeBinding -- compute the binding affinity of protein-protein dimeric
    complexes.

  o RepairStructure -- repair incomplete side chains of a user-provided model
    and minimize the energy of the given model to reduce possible steric
    clashes.

  o BuildMutant -- build mutant models, which are useful for ddG assessment
    in combination with the wild type. Before running the BuildMutant
    module, we suggest users run the RepairStructure module to minimize the
    structure using the EvoEF2 force field.

  o OptimizeHydrogen -- optimize the positions of hydrogens in the hydroxyl
    groups (e.g. Ser, Thr, and Tyr).

  o ProteinDesign -- de novo protein sequence design using a fixed backbone.

  o and some other commands (please see the source code for details)

Installation
------------
First of all, download the EvoEF2 package.

  o If you are using a Windows system, you can directly run the executable
    EvoEF2.exe program.

  o If you are working in a Linux system, go to the EvoEF2 main directory and
    run:

      ./build.sh

    or run:

      g++ -O3 --fast-math -o EvoEF2 src/*.cpp

    to build the EvoEF2 program. If it does not work, try without the
    '--fast-math' option.

  o If you want to build a new EvoEF2.exe in Windows, you need to install a
    g++ compiler first. You can install it with MinGW. After you complete
    the installation of the g++ compiler and set the environment variable,
    you can test whether the compiler has been successfully installed by
    running 'g++ -v' in the cmd.exe console. If successful, it will show
    messages similar to:

      "Using built-in specs.
      COLLECT_GCC=g++
      COLLECT_LTO_WRAPPER=c:/mingw/bin/../libexec/gcc/mingw32/8.2.0/lto-wrapper.exe
      Target: mingw32
      Configured with: ../src/gcc-8.2.0/configure --build=x86_64-pc-linux-gnu \
      --host=mingw32 --target=mingw32 --prefix=/mingw --disable-win32-registry \
      --with-arch=i586 --with-tune=generic --enable-languages=c,c++,objc,obj-c++,fortran,ada \
      --with-pkgversion='MinGW.org GCC-8
      Thread model: win32
      gcc version 8.2.0 (MinGW.org GCC-8.2.0-3)"

    Otherwise, it will say that g++ cannot be found, and you should check
    your installation.

If you try many times but still cannot install the program successfully,
please report the problem to the email addresses listed below.

Usage
-----
  o To compute protein stability, you can run:

      ./EvoEF2 --command=ComputeStability --pdb=protein.pdb

  o To compute the protein-protein binding energy of a dimer complex, you can
    run:

      ./EvoEF2 --command=ComputeBinding --pdb=complex.pdb

    Sometimes you may have a multi-chain complex structure. EvoEF2 can still
    handle it, but you need to specify how to split the chains for the
    binding calculation. For example, for a four-chain complex (A, B, C, and
    D), you can split the chains with:

      ./EvoEF2 --command=ComputeBinding --pdb=multi_chain_complex.pdb --split_chains=AB,CD

    which calculates the binding between partners 'AB' and 'CD', or:

      ./EvoEF2 --command=ComputeBinding --pdb=multi_chain_complex.pdb --split_chains=AC,BD

    which calculates the binding between partners 'AC' and 'BD'. In short,
    users can split the chains however they choose.

  o To repair the structure model and do energy minimization:

      ./EvoEF2 --command=RepairStructure --pdb=xxxx.pdb

    A new structure model named "xxxx_Repair.pdb" will be built in the
    directory where you run the command.

  o To build mutant models, you can run:

      ./EvoEF2 --command=BuildMutant --pdb=xxxx.pdb --mutant_file=individual_list.txt

    where the "individual_list.txt" file lists the mutants that you want to
    build. It has the following format:

      CA171A;
      CA171A,DB180E;

    Each mutant is written on one line ending with ";", and multiple
    mutations within one mutant are separated by ",". Note that there is no
    gap/space between single mutations. For each single mutation, the first
    letter is the wild-type amino acid, the second is the identifier of the
    chain that the amino acid belongs to, the number is the position of the
    amino acid in the chain, and the last letter is the amino acid after
    mutation. Running the command successfully should generate mutant models
    named "xxxx_Model_0001.pdb", "xxxx_Model_0002.pdb", etc. The mutant
    models also include optimized polar hydrogen coordinates.

  o To design a new sequence de novo, you can run:

      ./EvoEF2 --command=ProteinDesign --monomer --pdb=monomer.pdb

    to design a monomer, or run:

      ./EvoEF2 --command=ProteinDesign --ppint --design_chains=A --pdb=complex.pdb

    to design chain A of a complex. For a multi-chain complex (e.g.,
    ABCD.pdb), if you want to design the first two chains, you can run:

      ./EvoEF2 --command=ProteinDesign --ppint --design_chains=AB --pdb=ABCD.pdb

    Both backbone-dependent and backbone-independent rotamer libraries are
    supported by EvoEF2, and you can specify the library that you want to
    use:

      ------------------------------------
      Backbone-dependent rotamer library:
      ------------------------------------
      >> dun2010bb.lib
      >> dun2010bb1per.lib (exclude the rotamers with probability < 0.01 from dun2010bb.lib)
      >> dun2010bb3per.lib (exclude the rotamers with probability < 0.03 from dun2010bb.lib)

      ------------------------------------
      Backbone-independent rotamer library:
      ------------------------------------
      >> honig984.lib (984 rotamers for all 20 amino acids)
      >> honig3222.lib (3222 rotamers for all 20 amino acids)
      >> honig7421.lib (7421 rotamers for all 20 amino acids)
      >> honig11810.lib (11810 rotamers for all 20 amino acids)

    By default, dun2010bb3per.lib is used for speed. If you want to switch
    to another backbone-dependent rotamer library, you can specify:

      ./EvoEF2 --command=ProteinDesign --monomer --rotlib=dun2010bb --pdb=monomer.pdb

    Note that you do not have to add the suffix '.lib' when you specify the
    rotamer library. To switch to a backbone-independent library, use:

      ./EvoEF2 --command=ProteinDesign --monomer --bbdep=disable --rotlib=honig984 --pdb=monomer.pdb

    We strongly suggest you use the default settings to achieve both good
    performance and speed.

Cost and Availability
---------------------
The standalone EvoEF2 program and its source code are freely available to
users. However, unauthorized copying of the source code files via any medium
is strictly prohibited.

Disclaimer and Copyright
------------------------
EvoEF2 is copyright (c) Xiaoqiang Huang ([email protected]; [email protected])

Bugs, comments and suggestions
------------------------------
If you find bugs, or have comments and suggestions for improving EvoEF,
please contact Dr. Xiaoqiang Huang ([email protected] or [email protected]).

References
----------
If EvoEF2 is important to your work, please cite:

[1] Xiaoqiang Huang, Robin Pearce, Yang Zhang. EvoEF2: accurate and fast
    energy function for computational protein design. Bioinformatics (2020),
    36:1135-1142. https://doi.org/10.1093/bioinformatics/btz740

[2] Robin Pearce, Xiaoqiang Huang, Dani Setiawan, Yang Zhang. EvoDesign:
    Designing protein-protein binding interactions using evolutionary
    interface profiles in conjunction with an optimized physical energy
    function. Journal of Molecular Biology (2019), 431:2467-2476.
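
Appendix: checking a mutant file before running BuildMutant
-----------------------------------------------------------
The mutant file format described under Usage (for example "CA171A,DB180E;") is easy to get wrong by hand, so it may help to validate each line before passing the file to BuildMutant. The following is a minimal sketch in Python and is not part of EvoEF2. The names `Mutation` and `parse_mutant_line` are hypothetical, and the sketch assumes one-letter codes for the 20 standard amino acids, a single-character chain identifier, and an integer residue position; EvoEF2 itself may accept more or less than this.

```python
import re
from typing import List, NamedTuple


class Mutation(NamedTuple):
    """One single mutation, e.g. 'CA171A' (hypothetical helper type)."""
    wild_type: str   # wild-type amino acid, one-letter code
    chain: str       # chain identifier
    position: int    # residue position in the chain
    mutant: str      # amino acid after mutation, one-letter code


# Assumption: only the 20 standard amino acids are allowed.
_AA = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: the chain ID is one alphanumeric character and the
# position is a (possibly negative) integer without insertion code.
_MUTATION_RE = re.compile(rf"^([{_AA}])([A-Za-z0-9])(-?\d+)([{_AA}])$")


def parse_mutant_line(line: str) -> List[Mutation]:
    """Parse one mutant line such as 'CA171A,DB180E;' into Mutation records.

    Raises ValueError if the line does not end with ';' or if any
    single mutation is malformed (including stray spaces).
    """
    text = line.strip()
    if not text.endswith(";"):
        raise ValueError(f"mutant line must end with ';': {line!r}")
    mutations = []
    # Drop the trailing ';' and split the mutant into single mutations.
    for token in text[:-1].split(","):
        match = _MUTATION_RE.match(token)
        if match is None:
            raise ValueError(f"malformed mutation {token!r} in line {line!r}")
        wild_type, chain, position, mutant = match.groups()
        mutations.append(Mutation(wild_type, chain, int(position), mutant))
    return mutations
```

To check a whole file, call `parse_mutant_line` on every non-empty line of "individual_list.txt" and fix any line that raises `ValueError` before running the BuildMutant command.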