gardner-binflab / razor

A tool to predict eukaryotic signal peptides. It also predicts toxin and fungal signal peptides.

Home Page: https://tisigner.com/razor

License: Other

Python 100.00%
signal-peptide cleavage-sites toxic-proteins fungi

Razor

Razor is a tool to detect signal peptides in eukaryotic protein sequences. In addition to signal peptide detection, it also detects:

  • Whether the signal peptide carries a toxic protein.
  • Whether the signal peptide is from fungi.

Installation

Prerequisite

  • Python 3.6+

Download or clone the source code to your device. In the source code directory, run:

pip3 install -r requirements.txt

We highly recommend creating a virtual environment (for example with venv) and installing the dependencies into it. If you are interested in a web server version of this tool, see the TISIGNER_ReactJS repository.

Usage

Description of available options:

usage: Razor [-h] [-v] -f FASTAFILE [-o OUTPUT] [-m MAXSCAN] [-n NCORES]

A tool to detect signal peptide

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         Show program's version number and exit.
  -f FASTAFILE, --fastafile FASTAFILE
                        Input fasta file
  -o OUTPUT, --output OUTPUT
                        Output file name.
  -m MAXSCAN, --maxscan MAXSCAN
                        Check for cleavage site upto this residue. Default: 80
  -n NCORES, --ncores NCORES
                        Number of cores to use. Default: 1/4 of total cores.
  -q QUIET, --quiet QUIET
                        Do not show warnings. (yes/no). Default: yes

(c) Authors
  • m is the maximum position up to which Razor scans for a possible cleavage site. For example, -m 50 scans up to the 50th residue for a cleavage site. By default, the scan runs up to the 80th residue.

  • n is the number of CPU cores used for the computation. Parallel processing is turned off if the number of sequences is less than 100. Above that, one fourth of the available CPU cores are used by default.
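The core-selection rule described for n can be sketched as follows. This is a minimal illustration of the stated rule, not Razor's actual code: the helper name `choose_ncores` and its parameters are hypothetical.

```python
import os


def choose_ncores(n_sequences, requested=None, threshold=100):
    """Hypothetical helper mirroring the rule described in the README.

    - Fewer than `threshold` sequences: run serially (one core).
    - Otherwise use the requested core count, or one fourth of the
      available cores when nothing was requested.
    """
    if n_sequences < threshold:
        return 1  # too few sequences for parallel processing to pay off
    if requested is not None:
        return max(1, requested)
    total = os.cpu_count() or 1  # cpu_count() can return None
    return max(1, total // 4)


print(choose_ncores(50, requested=8))    # serial despite the request
print(choose_ncores(1000, requested=8))  # honours the request
```

Clamping the result to at least one core keeps the sketch safe on machines with fewer than four cores.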

Sample usage:

python3 razor.py -f example_fasta.fa

Description of results

Razor detects a signal peptide in the given sequence. If a signal peptide is found, Razor also checks whether it carries a toxic protein or is from fungi. Each detection step uses five random forest models, so each step produces five scores. These scores are described below:

| | Signal peptide | Toxin | Fungi |
|---|---|---|---|
| Scores from 5 models | Y_score | Toxin_Scores | Fungi_Scores |
| Prediction from 5 models | True/False | True/False | True/False |
| Final scores (median of scores above) | SP_score | Toxin_scores_Median | Fungi_scores_Median |

The result of interest is often the final scores (SP_score, Toxin_scores_Median, Fungi_scores_Median).
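As a small worked example of how the final scores relate to the per-model scores, the snippet below takes the median of five scores. The numbers are the Q07310 values from the example results in this README; the function name `final_score` is only illustrative and is not part of Razor.

```python
from statistics import median

# Per-model scores for Q07310, copied from the example results file.
y_score = [0.81, 0.75, 0.34, 0.75, 0.76]
fungi_scores = [0.06, 0.07, 0.14, 0.09, 0.06]
toxin_scores = [0.07, 0.03, 0.04, 0.16, 0.02]


def final_score(scores):
    """Return the median of the five model scores (illustrative helper)."""
    return median(scores)


print("SP_score:", final_score(y_score))                  # 0.75
print("Fungi_scores_Median:", final_score(fungi_scores))  # 0.07
print("Toxin_scores_Median:", final_score(toxin_scores))  # 0.04
```

The medians match the SP_score, Fungi_scores_Median and Toxin_scores_Median values reported for that sequence.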

Cleavage site identification

A probable cleavage site is the residue where the C-score is maximum. There are five probable cleavage sites, one from each of the five models. The location of the median of these maximum C-scores is taken as the final cleavage site. If all of the signal peptide predictions are False, the final cleavage site is 0 regardless of the probable cleavage sites.

The final cleavage site is labelled Cleavage after residue in the results.
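One way to read this rule is sketched below. It assumes that "the location of the median" means the probable cleavage site reported by the model whose maximum C-score is the median of the five. That interpretation, and the function name `final_cleavage_site`, are assumptions rather than Razor's confirmed implementation.

```python
def final_cleavage_site(max_c, sites, sp_predictions):
    """Pick the final cleavage site from five per-model results.

    max_c          -- maximum C-score from each model
    sites          -- residue position of that maximum, per model
    sp_predictions -- per-model signal peptide predictions (True/False)
    """
    # No model predicts a signal peptide: report 0 regardless of sites.
    if not any(sp_predictions):
        return 0
    # Rank the models by their maximum C-score and take the middle one.
    order = sorted(range(len(max_c)), key=lambda i: max_c[i])
    median_model = order[len(order) // 2]
    return sites[median_model]


# Values for Q07310 from the example results file.
print(final_cleavage_site(
    [0.87, 0.81, 0.54, 0.8, 0.81],
    [27, 27, 27, 27, 27],
    [True, True, False, True, True],
))  # 27
```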

Example results file

For an example protein, Q07310, with cleavage after the 27th residue, the results file looks like this. It has an SP_score of 0.75, with 4 out of 5 models returning True in SP_Prediction. All five fungi and toxin predictions are False, so we are certain that it does not carry a toxic protein and is not of fungal origin.

| Accession | Sequence | Y_score | SP_Prediction | Max_C | Probable Cleavage after | Cleavage after residue | SP_score | Fungi_Scores | Fungi_Prediction | Fungi_scores_Median | Toxin_Scores | Toxin_Prediction | Toxin_scores_Median |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Q07310 | MSFTLHSVFFTLKVSSFLGSLV... | [0.81, 0.75, 0.34, 0.75, 0.76] | [True, True, False, True, True] | [0.87, 0.81, 0.54, 0.8, 0.81] | [27, 27, 27, 27, 27] | 27 | 0.75 | [0.06, 0.07, 0.14, 0.09, 0.06] | [False, False, False, False, False] | 0.07 | [0.07, 0.03, 0.04, 0.16, 0.02] | [False, False, False, False, False] | 0.04 |

Cite

If you find Razor useful, please cite the following paper:

  • Bikash K Bhandari, Paul P Gardner, Chun Shen Lim. (2020). Annotating eukaryotic and toxin-specific signal peptides using Razor. bioRxiv. DOI:10.1101/2020.11.30.405613

Contributors

bkb3

razor's Issues

ValueError: Unknown residues in the input sequence.

Hi,
I am running razor on my proteins as:

python3 razor.py -f proteins.fasta -o test

They come from an assembled transcriptome/ORFs called by transdecoder.

I am getting the following error:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/apps/python3/3.7.0/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/apps/python3/3.7.0/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/00_scripts/py3-venv/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 64, in global_worker
    return _func(x)
  File "/home/00_scripts/py3-venv/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 116, in wrapper
    **kwargs
  File "/home/00_scripts/py3-venv/lib/python3.7/site-packages/pandarallel/data_types/series.py", line 20, in worker
    return series.apply(func, *args, **kwargs)
  File "/home/00_scripts/py3-venv/lib/python3.7/site-packages/pandas/core/series.py", line 3848, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/lib.pyx", line 2327, in pandas._libs.lib.map_infer
  File "razor.py", line 139, in <lambda>
    df['Analysis_'] = df['Sequence'].parallel_apply(lambda x: razor_predict(x, m))
  File "razor.py", line 71, in razor_predict
    newObj = detector.RAZOR(seq=seq, max_scan=max_scan)
  File "/scratch/user/razor/Razor/libs/detector.py", line 31, in __init__
    self.seq = functions.validate(seq, self.max_scan)
  File "/scratch/user/razor/Razor/libs/functions.py", line 77, in validate
    "Unknown residues in the input "
ValueError: Unknown residues in the input sequence.
 Only standard amino acid codes are allowed.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "razor.py", line 159, in <module>
    main()
  File "razor.py", line 139, in main
    df['Analysis_'] = df['Sequence'].parallel_apply(lambda x: razor_predict(x, m))
  File "/home/00_scripts/py3-venv/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 462, in closure
    map_result,
  File "/home/00_scripts/py3-venv/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 396, in get_workers_result
    results = map_result.get()
  File "/apps/python3/3.7.0/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
ValueError: Unknown residues in the input sequence.
 Only standard amino acid codes are allowed.

I tried to run a check using seqkit seq -v -V proteins.fasta, but that doesn't find the culprit residue. Do you have any other idea what I could try?
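One thing to try is scanning the FASTA file directly for any character outside the 20 standard amino acid letters. The sketch below assumes that Razor's validation accepts only those 20 letters; the exact accepted set is defined in libs/functions.py and may differ. The function names are illustrative. Translated ORFs sometimes end with a stop symbol such as `*`, or contain ambiguity codes such as `X`, which a general-purpose validator may accept.

```python
# Assumed set of accepted residues: the 20 standard amino acids.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_unknown_residues(lines):
    """Map each offending header to its (1-based position, character) list."""
    report = {}
    for header, seq in parse_fasta(lines):
        bad = [(i + 1, ch) for i, ch in enumerate(seq.upper())
               if ch not in STANDARD]
        if bad:
            report[header] = bad
    return report


# Small in-memory demo; for a real file, pass an open file handle instead,
# e.g. find_unknown_residues(open("proteins.fasta")).
demo = [">orf1", "MKTLLV*", ">orf2", "MSFTLHSV"]
for name, bad in find_unknown_residues(demo).items():
    print(name, bad)
```

If the report shows only trailing `*` characters, stripping them from the input before running Razor is a reasonable next step.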
