
mircare / brewery


State-of-the-art ab initio prediction of 1D protein structure annotations

Home Page: http://distilldeep.ucd.ie/brewery/

License: Other

Python 18.90% C++ 61.22% Makefile 0.44% Perl 16.41% Shell 2.54% Dockerfile 0.49%
proteins machine-learning deep-learning hhblits psi-blast protein-structure protein-sequences solvent-accessible-surface-area secondary-structure contact-density


Brewery: prediction of 1D protein structure annotations

The web server, train and test sets of Brewery are available at http://distilldeep.ucd.ie/brewery/.
The docker container is available at https://hub.docker.com/r/mircare/brewery (HOWTO).

The predictions of the UniProtKB entries for COVID-19 are available at http://distilldeep.ucd.ie/brewery/.
See https://github.com/mircare/Porter5 to predict protein secondary structure only.

Pipeline of Brewery: diagram of the pipeline we propose to gather and exploit deeper profiles.

Setup

$ git clone https://github.com/mircare/Brewery/ --depth 1 && rm -rf Brewery/.git

Requirements

  1. Python3 (https://www.python.org/downloads/);
  2. NumPy (https://www.scipy.org/scipylib/download.html);
  3. HHblits (https://github.com/soedinglab/hh-suite/);
  4. uniprot20 (http://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/old-releases/uniprot20_2016_02.tgz).

Optionally (for more accurate predictions):

  1. PSI-BLAST (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/);
  2. UniRef90 (ftp://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref90/uniref90.fasta.gz).
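Before the first run, it can help to confirm that the programs listed above are reachable. The sketch below is a hypothetical helper, not part of Brewery: it assumes the standard executable names installed by HH-suite (`hhblits`) and BLAST+ (`psiblast`), and it does not check the uniprot20 or UniRef90 databases, whose locations depend on your installation.

```python
import importlib.util
import shutil

# Hypothetical preflight check (not part of Brewery). Executable names are the
# standard ones installed by HH-suite and BLAST+; database paths are NOT checked.
REQUIRED_TOOLS = ["hhblits"]   # needed for every run
OPTIONAL_TOOLS = ["psiblast"]  # only needed when running without --fast


def missing_tools(names, which=shutil.which):
    """Return the executables in `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


def numpy_available():
    """True if the NumPy package can be imported by this interpreter."""
    return importlib.util.find_spec("numpy") is not None


def report():
    """Print a short summary; return True if the required dependencies are present."""
    missing = missing_tools(REQUIRED_TOOLS)
    if not numpy_available():
        missing.append("numpy")
    if missing:
        print("Missing required dependencies:", ", ".join(missing))
    else:
        print("Required dependencies found.")
    if missing_tools(OPTIONAL_TOOLS):
        print("psiblast not found: use --fast (HHblits only).")
    return not missing


if __name__ == "__main__":
    report()
```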

How to run Brewery with/without PSI-BLAST

# To exploit HHblits only (for fast and accurate predictions)
$ python3 Brewery/Brewery.py -i Brewery/example/2FLGA.fasta --cpu 4 --fast

# To exploit both PSI-BLAST and HHblits (for very accurate predictions)
$ python3 Brewery/Brewery.py -i Brewery/example/2FLGA.fasta --cpu 4
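To drive the same two commands from a Python script, one option is to build the argument list and pass it to `subprocess.run`. This is a sketch under our own assumptions: `brewery_command` and `run_brewery` are hypothetical names, and the only flags used are those shown above (`-i`, `--cpu`, `--fast`).

```python
import subprocess
from pathlib import Path


def brewery_command(fasta, cpu=4, fast=True, brewery_dir="Brewery"):
    """Hypothetical helper: argv for one Brewery run, mirroring the commands above."""
    cmd = ["python3", str(Path(brewery_dir) / "Brewery.py"),
           "-i", str(fasta), "--cpu", str(cpu)]
    if fast:
        cmd.append("--fast")  # HHblits only, PSI-BLAST is skipped
    return cmd


def run_brewery(fasta, **options):
    """Run Brewery; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(brewery_command(fasta, **options), check=True)
```

For example, `run_brewery("Brewery/example/2FLGA.fasta", cpu=4, fast=False)` would run the PSI-BLAST + HHblits variant. Passing a list rather than a shell string avoids quoting problems with paths that contain spaces.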

How to run Brewery on multiple sequences

# To split a FASTA file with multiple sequences (optional)
$ python3 Brewery/split_fasta.py many_sequences.fasta

# To predict all the FASTA files in a given directory (Fastas)
$ python3 Brewery/multiple_fasta.py -i Fastas/ --cpu 4 --fast

# To run multiple predictions in parallel (using a total of 8 cores)
$ python3 Brewery/multiple_fasta.py -i Fastas/ --cpu 4 --parallel 2 --fast
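If you prefer to prepare the input directory yourself, a multi-sequence FASTA file can be split with a few lines of standard-library Python. This is a hypothetical sketch, not the repository's `split_fasta.py`: naming each output file after the first word of its header is our own choice, and records whose names collide overwrite each other.

```python
import re
from pathlib import Path


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(src, out_dir="Fastas"):
    """Write each record of `src` to its own file in `out_dir`; return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, (header, seq) in enumerate(parse_fasta(Path(src).read_text())):
        # Hypothetical naming scheme: first word of the header, made filesystem-safe.
        name = re.sub(r"[^A-Za-z0-9_.-]", "_", header.split()[0]) if header else f"seq{i}"
        path = out / f"{name}.fasta"
        path.write_text(f">{header}\n{seq}\n")
        paths.append(path)
    return paths
```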

How to display Brewery's help

$ python3 Brewery/Brewery.py --help
usage: Brewery.py [-h] [-input fasta_file] [--cpu CPU] [--fast] [--noSS]
                  [--noTA] [--noSA] [--noCD] [--distill] [--setup]

This is the standalone of Brewery5. Run it on a FASTA file to predict its
Secondary Structure in 3- and 8-classes (Porter5), Solvent Accessibility in 4
classes (PaleAle5), Torsional Angles in 14 classes (Porter+5) and Contact
Density in 4 classes (BrownAle).

optional arguments:
  -h, --help         show this help message and exit
  -input fasta_file  FASTA file containing the protein to predict
  --cpu CPU          How many cores to perform this prediction
  --fast             Use only HHblits (skip PSI-BLAST)
  --bfd              Harness also the BFD database (https://bfd.mmseqs.com/)
  --noSS             Skip Secondary Structure prediction with Porter5
  --noTA             Skip Torsional Angles prediction with Porter+5
  --noSA             Skip Solvent Accessibility prediction with PaleAle5
  --noCD             Skip Contact Density prediction with BrownAle5
  --distill          Generate useful outputs for 3D protein structure prediction
  --setup            Initialize Brewery5 from scratch (it is recommended when
                     there has been any change involving PSI-BLAST, HHblits,
                     Brewery itself, etc).

E.g., run Brewery on 4 cores: python3 Brewery5.py -i example/2FLGA --cpu 4

Use the docker image

# Set the absolute PATHs for databases and query sequences (stored locally)
$ docker run --name brewery -v /**PATH_to_uniprot20_2016_02**:/uniprot20 \
-v /**PATH_to_UniRef90_optional**:/uniref90 -v /**PATH_to_fasta_to_predict**:/Brewery/query \
--cap-add IPC_LOCK mircare/brewery sleep infinity &

# Run a prediction using 5 cores and HHblits only
$ docker exec brewery python3 Brewery.py -i query/2FLGA.fasta --cpu 5 --fast
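Because the container started above keeps running (`sleep infinity`), several queries can be predicted by issuing one `docker exec` per FASTA file placed in the mounted query directory. A minimal sketch, assuming the container name `brewery` and the `/Brewery/query` mount from the `docker run` command above; `docker_commands` and `run_all` are hypothetical helpers.

```python
import subprocess
from pathlib import Path


def docker_commands(local_query_dir, container="brewery", cpu=5, fast=True):
    """Hypothetical helper: one `docker exec` argv per *.fasta file in the
    host directory mounted at /Brewery/query inside the container."""
    commands = []
    for fasta in sorted(Path(local_query_dir).glob("*.fasta")):
        cmd = ["docker", "exec", container, "python3", "Brewery.py",
               "-i", f"query/{fasta.name}", "--cpu", str(cpu)]
        if fast:
            cmd.append("--fast")  # HHblits only
        commands.append(cmd)
    return commands


def run_all(local_query_dir, **options):
    """Run the predictions one after another, stopping at the first failure."""
    for cmd in docker_commands(local_query_dir, **options):
        subprocess.run(cmd, check=True)
```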

Performance of Secondary Structure Predictors in 3 classes

Method | Q3 per AA | SOV'99 per AA | Q3 per protein | SOV'99 per protein
Brewery | 83.81% | 80.41% | 84.32% | 81.05%
SPIDER 3 | 83.15% | 79.43% | 83.42% | 79.79%
Brewery (HHblits only) | 83.06% | 79.49% | 83.68% | 80.26%
SSpro 5.1 with templates | 82.58% | 78.54% | 83.94% | 80.29%
PSIPRED 4.01 | 81.88% | 77.36% | 82.48% | 78.22%
RaptorX-Property | 81.86% | 78.08% | 82.57% | 78.99%
Porter 4 | 81.66% | 78.05% | 82.29% | 78.61%
SSpro 5.1 ab initio | 81.17% | 76.87% | 81.10% | 76.92%
DeepCNF | 81.04% | 76.74% | 81.16% | 76.99%

Reference: Table 1 in https://doi.org/10.1101/289033.
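The tables report each accuracy both per AA and per protein. Under the usual convention (assumed here; the exact protocol is in the referenced paper), per-AA accuracy pools every residue of the test set, while per-protein accuracy averages the accuracy of each chain, so short chains weigh as much as long ones. A minimal sketch of the two Q3 variants on toy data; SOV'99 is a segment-based score and is not covered.

```python
def _correct(pred, true):
    """Number of positions where the predicted and observed labels agree."""
    if len(pred) != len(true):
        raise ValueError("prediction and reference must have the same length")
    return sum(p == t for p, t in zip(pred, true))


def q_per_residue(pairs):
    """Percentage of correctly predicted residues, pooled over all proteins."""
    correct = sum(_correct(pred, true) for pred, true in pairs)
    total = sum(len(true) for _, true in pairs)
    return 100.0 * correct / total


def q_per_protein(pairs):
    """Mean of the per-protein percentages of correctly predicted residues."""
    scores = [100.0 * _correct(pred, true) / len(true) for pred, true in pairs]
    return sum(scores) / len(scores)


# (predicted, observed) 3-class strings for two toy proteins
pairs = [("HHHCC", "HHHCE"), ("EE", "CC")]
print(q_per_residue(pairs))  # 4 of 7 residues correct
print(q_per_protein(pairs))  # mean of 80% and 0%, i.e. 40.0
```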

Performance of Secondary Structure Predictors in 8 classes

Method | Q8 per AA | SOV8_refine per AA | Q8 per protein | SOV8_refine per protein
Brewery | 73.02% | 72.09% | 73.92% | 72.64%
SSpro 5.1 with templates | 71.91% | 70.72% | 74.46% | 73.45%
Brewery (HHblits only) | 71.8% | 71.16% | 72.83% | 71.74%
RaptorX-Property | 70.74% | 69.65% | 71.78% | 70.03%
DeepCNF | 69.76% | 68.5% | 70.14% | 68.06%
SSpro 5.1 ab initio | 68.85% | 67.54% | 69.27% | 67.91%

Reference: Table 2 in https://doi.org/10.1101/289033.
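The 8-class results use the DSSP alphabet (H, G, I, E, B, T, S and coil), and 3-class labels are commonly obtained by collapsing it. The mapping below (H, G, I to helix; E, B to strand; everything else to coil) is one common reduction and is an assumption here; see the referenced paper for the exact mapping behind the 3-class tables.

```python
# One common DSSP 8-to-3 reduction (an assumption, not necessarily Brewery's):
# helices H, G, I -> H; strands E, B -> E; everything else (T, S, coil) -> C.
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_to_three(ss8):
    """Collapse an 8-class secondary structure string into 3 classes."""
    return "".join(EIGHT_TO_THREE.get(state, "C") for state in ss8)


print(reduce_to_three("HGIEBTS-"))  # HHHEECCC
```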

Performance of Solvent Accessibility Predictors in up to 4 classes

Method | Q2 per AA | Q3 per AA | Q4 per AA
ACCpro 5.1 with templates | 80.5% | N.A. | N.A.
Brewery | 80.48% | 66.41% | 56.46%
PaleAle 4 | 78.21% | N.A. | 52.53%
SPIDER 3 | 77.91% | 61.19% | 49.01%
ACCpro 5.1 ab initio | 76.6% | N.A. | N.A.
RaptorX-Property | N.A. | 63.25% | N.A.
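Q2, Q3 and Q4 measure accuracy after binning relative solvent accessibility (RSA) into 2, 3 or 4 classes. The sketch shows such binning with `bisect` and a matching accuracy score; the thresholds are illustrative placeholders, not the cut-offs used by Brewery or PaleAle5 (those are given in the respective papers), and `rsa_class` and `q_k` are hypothetical names.

```python
from bisect import bisect_right

# Illustrative thresholds only (fractions of maximum accessibility);
# NOT the cut-offs used by Brewery/PaleAle5.
TWO_CLASSES = (0.25,)
FOUR_CLASSES = (0.04, 0.25, 0.50)


def rsa_class(rsa, thresholds):
    """Class index 0..len(thresholds) for an RSA value; a value equal to a
    threshold falls into the upper class."""
    return bisect_right(sorted(thresholds), rsa)


def q_k(pred_classes, true_rsa, thresholds):
    """Percentage of residues whose predicted class matches the observed RSA class."""
    hits = sum(p == rsa_class(t, thresholds) for p, t in zip(pred_classes, true_rsa))
    return 100.0 * hits / len(true_rsa)
```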

Performance of Torsion Angle Predictors in 14 classes

Method | Q14 per AA | Q14 per protein
Brewery | 69.93% | 70.59%
SPIDER 3 | 66.58% | 66.27%
Porter+ | 64.73% | 66%
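Torsion angles are periodic, so any class assignment over (phi, psi) has to treat -179 and 179 degrees as neighbours. The sketch assigns a pair of angles to its nearest centroid using a wrap-around distance; the three centroids are illustrative placeholders, not the 14 classes used by Porter+5, and nearest-centroid assignment is our own assumption about how continuous angles relate to such classes.

```python
import math

# Illustrative (phi, psi) centroids in degrees, roughly alpha-helix,
# beta-strand and left-handed helix. NOT the 14 classes used by Porter+5.
EXAMPLE_CENTROIDS = [(-63.0, -43.0), (-120.0, 130.0), (60.0, 45.0)]


def angular_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_class(phi, psi, centroids=EXAMPLE_CENTROIDS):
    """Index of the centroid closest to (phi, psi), respecting the 360-degree wrap."""
    def distance(centroid):
        return math.hypot(angular_diff(phi, centroid[0]),
                          angular_diff(psi, centroid[1]))
    return min(range(len(centroids)), key=lambda i: distance(centroids[i]))
```

For example, `nearest_class(179.0, 130.0, [(-179.0, 130.0), (100.0, 130.0)])` returns 0, because -179 and 179 degrees are only 2 degrees apart. Under this assumption, Q14 would be the percentage of residues whose predicted class matches the class of their observed angles.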

Performance of Contact Density Predictors in 4 classes

Method | Q4 per AA | Q4 per protein
Brewery | 50.01% | 48%
BrownAle | 46.5% | N.A.

Citation

If you use Brewery, please cite our Bioinformatics paper:

@article{torrisi_brewery_2020,
	title = {Brewery: Deep Learning and deeper profiles for the prediction of 1D protein structure annotations},
	doi = {10.1093/bioinformatics/btaa204},
	journal = {Bioinformatics},
	author = {Torrisi, Mirko and Pollastri, Gianluca}
}

References

Brewery: Deep Learning and deeper profiles for the prediction of 1D protein structure annotations,
Bioinformatics, Oxford University Press; Mirko Torrisi and Gianluca Pollastri;
Guest link: https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaa204/5811232?guestAccessKey=9a73ae2a-2cb6-4fe1-b333-a4f3261f02cf.

Protein Structure Annotations, Essentials of Bioinformatics, Volume I, Springer Nature;
Mirko Torrisi and Gianluca Pollastri; Post-print: https://www.researchgate.net/publication/332048741_Protein_Structure_Annotations.

Deeper Profiles and Cascaded Recurrent and Convolutional Neural Networks for state-of-the-art Protein Secondary Structure Prediction, Scientific Reports, Nature Publishing Group; Mirko Torrisi, Manaz Kaleel and Gianluca Pollastri;
doi: https://doi.org/10.1038/s41598-019-48786-x.

PaleAle 5.0: prediction of protein relative solvent accessibility by deep learning, Amino Acids, Springer;
Manaz Kaleel, Mirko Torrisi, Catherine Mooney and Gianluca Pollastri; Guest link: https://rdcu.be/bNlXS.

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Email us at gianluca[dot]pollastri[at]ucd[dot]ie if you wish to use it for purposes not permitted by the CC BY-NC-SA 4.0.


brewery's Issues

Missing license.

Hi, I would really like to use Brewery for a project I will be starting soon. I noticed it does not have a license, and one is crucial for my project. Could you please add one?
I know licensing is not always simple in academia, but I can only recommend FSF licenses (https://www.gnu.org/licenses/licenses.html) to keep your code open.
Thanks in advance!
Regards, Louis

The results in the output files are 0

I ran the command below, but most of the time the content of the output files (.ss, .ss3, .ss8) is 0:
python3 Brewery/Brewery.py -i Brewery/example/2FLGA.fasta --cpu 4
How can I solve this issue? Thank you.

[Screenshot from 2022-10-31 16-55-18]

Brewery to evaluate conservation of secondary structure

I wonder if Brewery tools would help me solve my problem, described in brief below:

I have predicted ~20K protein domain sequences using an HMM query from Pfam, with proteomes as the queried databases.

Amongst the putative matches reported:

  • most are shorter in length (peak at ~37aa, as short as 17aa),
  • a small fraction are much longer (up to 87aa), and
  • rarely they match canonical length of the HMM query (48aa).

I want to check how many of these predicted domain sequences have conserved protein secondary structure, so one idea was to use hhblits to generate an HMM-HMM pairwise alignment and parse it.

But that idea also comes with more questions that I am looking to answer. (https://www.biostars.org/p/387607/#387854)

While looking for those answers, I came across your GitHub account, and wonder if you have any advice about using any of your tools (Brewery) or other tools to help me "evaluate degree of conservation of predicted SS for my input sequences"?

Thank you, in advance!
