akashnk / acpsicov Goto Github PK
View Code? Open in Web Editor NEWThis project forked from etaijacob/acpsicov
Protein Sparse Inverse COVariance analysis program for nucleotide and amino acid sequences
License: GNU General Public License v3.0
This project forked from etaijacob/acpsicov
Protein Sparse Inverse COVariance analysis program for nucleotide and amino acid sequences
License: GNU General Public License v3.0
Protein Sparse Inverse COVariance analysis program for nucleotide and amino acid sequences ------------------------------------------------------------------------------------------ Installation: ------------ 1. Use the make command to compile the psicov21* c code. 2. Modify the path to the two executables in the aacodonPsicov.py. This code was used in the following paper: Codon-level information improves predictions of inter-residue contacts in proteins by correlated mutation analysis Etai Jacob, Ron Unger, Amnon Horovitz Bar-Ilan University, Israel; Weizmann Institute of Science, Israel DOI: http://dx.doi.org/10.7554/eLife.08932 Published September 15, 2015 Cite as eLife 2015;4:e08932 - See more at: http://elifesciences.org/content/4/e08932 aacodonPsicov.py: ---------------- Wrapper script for the application: Protein Sparse Inverse COVariance analysis program for nucleotide and amino acid sequences By Etai Jacob, September 2015 - Copyright (C) 2015 Bar-Ilan University and Weizmann Institute of Science This code is licensed under the terms of GNU General Public License v2 or later psicov21codon.c: --------------- PSICOVcodon - Protein Sparse Inverse COVariance analysis program for nucleotide sequences This code is a modified version of PSICOV for amino acid sequences. See details below. The following modifications were implemented: 1) Analysis of codon alphabet 2) 'codonnum' function No optimizations were performed to the code. psicov21.c: ---------- From the original PSICOV code by David T. Jones: USAGE NOTES FOR PSICOV V2.1b3 (By David T. Jones August 2011 - Copyright (C) 2011 University College London) (See LICENSE file for details on licensing terms) This release of PSICOV implements a number of optimizations and speedups over the original code. Firstly a multithreaded C implementation of the glassofast code has been added. Secondly, the sequence weighting steps are now also multithreaded. The multithreading will produce small rounding discrepancies compared to the non-threaded code, but the differences should not be significant. Compiling ========= Either use the supplied makefile or compile manually: gcc -O3 -march=native -ffast-math -m64 -ftree-vectorize -fopenmp psicov21.c -lm -o psicov Note, compilation requires GCC 4.4 or later - in some Linux distributions, gcc 4.4 can be installed as "gcc44". If you don't want to use multithreading to speed up calculations, then simply omit the "-fopenmp" option or use the runtime option "-z 1" to set the maximum number of threads to 1. If you prefer to use the Intel compiler suite, then use the following compile line: icc -fast -fopenmp psicov21.c -lm -o psicov Typical usage ============= With the speedups implemented in version 2, you may as well use the slower but slightly better option below. You will probably only need to select the faster options for very large proteins. *********************************************************************************************************** psicov -p -d 0.03 demo.aln > output (Run with target precision matrix density of 3% rather than fixed rho and estimate PPVs - takes 2-3x longer than using a fixed regularization parameter, but this is now the recommended default for Version 2) *********************************************************************************************************** Some other command line examples are shown below: psicov -p -r 0.001 demo.aln > output (Run with fixed default rho and estimate precision - this was the sensible default for most cases) psicov -z 4 -p -r 0.001 -j 12 demo.aln > output (Only output contacts separated by 12 or more in the sequence and run with at most 4 threads) psicov -p -r 0.001 -g 0.3 demo.aln > output (Ignore alignment columns with > 30% gaps) Recommended options for quicker results (quick enough for large proteins): psicov -p -r 0.001 demo.aln > output Recommended (but not highly recommended) option for fastest possible approximate results (quick enough for very large proteins/large alignments): psicov -a -r 0.001 -i 62 demo.aln > output Example using HHBLITS ===================== Here is a simple 3-step example of predicting contacts for a target sequence using HHBLITS as the alignment method (using the UniRef20 database). HHBLITS can be obtained from ftp://toolkit.genzentrum.lmu.de/pub/HH-suite See HHBLITS documentation for installation instructions. 1. Search UniRef20 database and produce alignment in A3M format: hhblits -i example.fasta -d uniprot20_2012_03 -oa3m example.a3m -n 3 -diff inf -cov 60 2. Quick command line conversion from A3M to PSICOV alignment format (and remove identical rows): egrep -v "^>" example.a3m | sed 's/[a-z]//g' | sort -u > example.aln 3. Run PSICOV as previously described: psicov -p -d 0.03 example.aln > example.psicov IMPORTANT NOTES =============== Please note that a large number of _diverse_ homologous sequences are needed for contact prediction to succeed. In the original paper, the worst example had 511 homologous sequences and the best had 74836. If you try to calculate contacts with very few sequences, then at best the results will be poor, or at worst psicov will struggle to converge on a solution and keep running for ages! In the latter case, convergence can be achieved by increasing the rho parameter to 0.005 or 0.01 say (default is 0.001) e.g. psicov -r 0.005 demo.aln > output However, even if PSICOV does converge this will not necessarily produce good contact predictions - you cannot avoid the issue of needing a lot of sequence data to properly compute the initial covariance matrix. PSICOV will give an error if an alignment with too few sequences is used, and will also print a warning if there is insufficient sequence variation. You can override these checks by using the -o flag or by modifying the code, but bear in mind that PSICOV can only work well with VERY LARGE sequence families. If you only have a handful of sequences, or if the sequences are all very similar, then you simply do not have enough data to analyse. For best results you need a sequence alignment with >1000 sequences and a large amount of sequence variation across the sequences. Also note that for long sequences, PSICOV will require a large amount of RAM (and a 64-bit system) to handle the covariance matrices. For example, a target sequence of 300 residues will require nearly 3 Gb of RAM. The number of sequences in the alignment, on the other hand, has relatively little effect on memory usage.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.