fteufel / signalp-6.0

Multi-class signal peptide prediction and structure decoding model.

Home Page: https://services.healthtech.dtu.dk/service.php?SignalP-6.0

License: Other

Languages: Python 77.36%, Jupyter Notebook 22.64%

signalp-6.0's People

Contributors: fteufel

signalp-6.0's Issues

time to run signalp6/6.0h

Hi there,

I'm running this software as part of a workflow that eventually generates a trinotate.xls.

However, the issue I'm facing is time: my script has been running for a bit over 4 days, and the maximum walltime is 7 days.

Is there a way to run this in sections so it can keep making progress,

or,

if I submit the script again, will it remember where it left off and continue?
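One workaround sketch (not a built-in SignalP feature): split the input FASTA into fixed-size chunks and submit each chunk as its own job, so chunks that have already finished never need to be rerun. The chunk size and file names below are made up for illustration.

def split_fasta(path, records_per_chunk=5000):
    """Write chunk_000.fasta, chunk_001.fasta, ... each holding up to records_per_chunk records."""
    chunk_lines, n_records, chunk_idx = [], 0, 0
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                if n_records == records_per_chunk:
                    _write_chunk(chunk_lines, chunk_idx)
                    chunk_lines, n_records, chunk_idx = [], 0, chunk_idx + 1
                n_records += 1
            chunk_lines.append(line)
    if chunk_lines:
        _write_chunk(chunk_lines, chunk_idx)

def _write_chunk(lines, idx):
    with open(f"chunk_{idx:03d}.fasta", "w") as out:
        out.writelines(lines)

split_fasta("transcripts.pep")  # then run signalp6 on each chunk_*.fasta as a separate job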

kind regards

George

your prediction server

Dear author,
Your prediction server is constantly returning this message when requested for analysis:
Exception: WebfaceSystemError
Package: Webface::service : 712
Message: Unable to create /var/www/webface/tmp/server/SignalP-6.0/659D1C91000040FA921FC790 : No space left on device
Kindly look into it.
Regards.

SignalP parameter --bsize

Hi, I’m currently working with SignalP. I have some questions and I would very much appreciate it if you can please help me with them.

There’s a parameter named “--bsize”, which according to the instructions: “--bsize, -bs is the integer batch size used for prediction. When running on GPU, this should be adjusted to maximize usage of the available memory. On CPU, the choice usually has only a limited effect on performance. Defaults to 10.”

What would be the best way to figure out the value that maximizes usage of the available memory?

We are running SignalP on CPU, but we have seen significant improvements from increasing the bsize value. We just need to figure out how to choose the ideal value based on dataset size, number of CPUs, etc.
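For what it's worth, one empirical way to pick a value is a small timing sweep over candidate batch sizes on a fixed subsample of the data, keeping the fastest setting. This is just a sketch: the sample file name and candidate sizes are arbitrary, and only CLI flags already shown in this thread are used.

import subprocess, time

# Time signalp6 on the same sample file with different --bsize values and print the runtimes.
for bsize in (10, 32, 64, 128):
    start = time.time()
    subprocess.run(
        ["signalp6", "--fastafile", "sample_1000.fasta", "--organism", "euk",
         "--output_dir", f"bench_bsize_{bsize}", "--mode", "fast", "--bsize", str(bsize)],
        check=True,
    )
    print(f"bsize={bsize}: {time.time() - start:.1f} s")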

Installation Error for nvidia-cusparse-cu11==11.7.4.91

Hi,

I'm trying to install signalp6 in a Singularity/Apptainer container to be able to run it on an HPC. When running the installation, the following error occurs:

[...]
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 21.1 MB/s eta 0:00:00
Collecting triton==2.0.0
  Downloading triton-2.0.0-1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (63.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.3/63.3 MB 14.7 MB/s eta 0:00:00
Collecting nvidia-cusparse-cu11==11.7.4.91
  Downloading nvidia_cusparse_cu11-11.7.4.91-py3-none-manylinux1_x86_64.whl (173.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸     153.8/173.2 MB 19.6 MB/s eta 0:00:01
ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
    nvidia-cusparse-cu11==11.7.4.91 from https://files.pythonhosted.org/packages/ea/6f/6d032cc1bb7db88a989ddce3f4968419a7edeafda362847f42f614b1f845/nvidia_cusparse_cu11-11.7.4.91-py3-none-manylinux1_x86_64.whl#sha256=a3389de714db63321aa11fbec3919271f415ef19fda58aed7f2ede488c32733d (from torch>1.7.0->signalp6==6.0+g):
        Expected sha256 a3389de714db63321aa11fbec3919271f415ef19fda58aed7f2ede488c32733d
             Got        b253327205118db8b9d6ef5e7257f310d1024e115b8a4f6e21f1bc5b02b9a598

FATAL:   While performing build: while running engine: exit status 1

The OS is ubuntu:22.04.
I used the non-dev version from https://services.healthtech.dtu.dk/services/SignalP-6.0/

The complete installation routine is:

apt-get update -y
apt-get upgrade -y
apt-get install --no-install-recommends -y curl bash vim git ca-certificates tar unzip gzip wget gcc build-essential cpanminus
apt-get install --no-install-recommends -y sqlite3 ncbi-blast+ hmmer infernal infernal-doc
apt-get install --no-install-recommends -y python3 python3-pip
cd /usr/local/src
tar -xvzf signalp-6.0g.fast.tar.gz && cd signalp6_fast
pip3 install signalp-6-package/
Processing ./signalp-6-package
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Collecting torch>1.7.0
  Downloading torch-2.0.1-cp310-cp310-manylinux1_x86_64.whl (619.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 619.9/619.9 MB 831.6 kB/s eta 0:00:00
Collecting matplotlib>3.3.2
  Downloading matplotlib-3.7.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.6/11.6 MB 23.1 MB/s eta 0:00:00
Collecting tqdm>4.46.1
  Downloading tqdm-4.65.0-py3-none-any.whl (77 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 77.1/77.1 KB 13.8 MB/s eta 0:00:00
Collecting numpy>1.19.2
  Downloading numpy-1.24.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.3/17.3 MB 21.1 MB/s eta 0:00:00
Collecting pillow>=6.2.0
  Downloading Pillow-9.5.0-cp310-cp310-manylinux_2_28_x86_64.whl (3.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.4/3.4 MB 23.6 MB/s eta 0:00:00
Collecting pyparsing>=2.3.1
  Downloading pyparsing-3.0.9-py3-none-any.whl (98 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.3/98.3 KB 19.3 MB/s eta 0:00:00
Collecting fonttools>=4.22.0
  Downloading fonttools-4.39.4-py3-none-any.whl (1.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/1.0 MB 22.8 MB/s eta 0:00:00
Collecting contourpy>=1.0.1
  Downloading contourpy-1.0.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (300 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 300.3/300.3 KB 19.1 MB/s eta 0:00:00
Collecting cycler>=0.10
  Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib>3.3.2->signalp6==6.0+g) (23.1)
Collecting python-dateutil>=2.7
  Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 247.7/247.7 KB 7.9 MB/s eta 0:00:00
Collecting kiwisolver>=1.0.1
  Downloading kiwisolver-1.4.4-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 18.2 MB/s eta 0:00:00
Collecting nvidia-nvtx-cu11==11.7.91
  Downloading nvidia_nvtx_cu11-11.7.91-py3-none-manylinux1_x86_64.whl (98 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.6/98.6 KB 16.9 MB/s eta 0:00:00
Collecting nvidia-cuda-runtime-cu11==11.7.99
  Downloading nvidia_cuda_runtime_cu11-11.7.99-py3-none-manylinux1_x86_64.whl (849 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 849.3/849.3 KB 16.0 MB/s eta 0:00:00
Collecting nvidia-cuda-cupti-cu11==11.7.101
  Downloading nvidia_cuda_cupti_cu11-11.7.101-py3-none-manylinux1_x86_64.whl (11.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.8/11.8 MB 16.2 MB/s eta 0:00:00
Collecting sympy
  Downloading sympy-1.12-py3-none-any.whl (5.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.7/5.7 MB 24.9 MB/s eta 0:00:00
Collecting nvidia-cublas-cu11==11.10.3.66
  Downloading nvidia_cublas_cu11-11.10.3.66-py3-none-manylinux1_x86_64.whl (317.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 317.1/317.1 MB 2.8 MB/s eta 0:00:00
Collecting nvidia-curand-cu11==10.2.10.91
  Downloading nvidia_curand_cu11-10.2.10.91-py3-none-manylinux1_x86_64.whl (54.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54.6/54.6 MB 13.1 MB/s eta 0:00:00
Collecting nvidia-cusolver-cu11==11.4.0.1
  Downloading nvidia_cusolver_cu11-11.4.0.1-2-py3-none-manylinux1_x86_64.whl (102.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 102.6/102.6 MB 11.1 MB/s eta 0:00:00
Collecting networkx
  Downloading networkx-3.1-py3-none-any.whl (2.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 21.1 MB/s eta 0:00:00
Collecting triton==2.0.0
  Downloading triton-2.0.0-1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (63.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.3/63.3 MB 14.7 MB/s eta 0:00:00
Collecting nvidia-cusparse-cu11==11.7.4.91
  Downloading nvidia_cusparse_cu11-11.7.4.91-py3-none-manylinux1_x86_64.whl (173.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸     153.8/173.2 MB 19.6 MB/s eta 0:00:01
ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
    nvidia-cusparse-cu11==11.7.4.91 from https://files.pythonhosted.org/packages/ea/6f/6d032cc1bb7db88a989ddce3f4968419a7edeafda362847f42f614b1f845/nvidia_cusparse_cu11-11.7.4.91-py3-none-manylinux1_x86_64.whl#sha256=a3389de714db63321aa11fbec3919271f415ef19fda58aed7f2ede488c32733d (from torch>1.7.0->signalp6==6.0+g):
        Expected sha256 a3389de714db63321aa11fbec3919271f415ef19fda58aed7f2ede488c32733d
             Got        b253327205118db8b9d6ef5e7257f310d1024e115b8a4f6e21f1bc5b02b9a598

FATAL:   While performing build: while running engine: exit status 1

It would be great if you could check what might be wrong with the hash.
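For what it's worth, the download log above stops at 153.8/173.2 MB, so a truncated or corrupted download (rather than a genuinely wrong hash in the requirements file) seems plausible; clearing pip's cache and retrying may help. A quick way to check a locally downloaded wheel against the hash pip expects (the "Expected sha256" from the error above) is a sketch like this; the wheel path is assumed:

import hashlib

EXPECTED = "a3389de714db63321aa11fbec3919271f415ef19fda58aed7f2ede488c32733d"

h = hashlib.sha256()
with open("nvidia_cusparse_cu11-11.7.4.91-py3-none-manylinux1_x86_64.whl", "rb") as fh:
    # hash the wheel in 1 MiB blocks to avoid loading 173 MB into memory at once
    for block in iter(lambda: fh.read(1 << 20), b""):
        h.update(block)

print("hash matches" if h.hexdigest() == EXPECTED else "hash mismatch (likely a truncated download)")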
Thank you
Best,
Nadine

Unknown sequence behaviour

Hello, I got the following warnings when running signalp6 with "--organism euk":

Unknown behaviour encountered for sequence no. 12101. Please check outputs.
Unknown behaviour encountered for sequence no. 16029. Please check outputs.
Unknown behaviour encountered for sequence no. 25368. Please check outputs.

Are these the sequence numbers in the original input file? If so, these are the sequences:

>BSUD.16781.1.p1
MISLLTTFFLLLSPKVAGDCYGDTIARQKRLLGEMLDMSPAILSMEKLHADSVQQTLHEM
EIFQKYQFAEITPYEYKKLLLSRYLVFSASFALNICNQYGARLVEIMEEDERKAIAVMLD
LSDTPVDCLVIGTRFDKGDWTYWYSTRPAYRTLSTEENQQKGNCMILENQKSWNMTRVPC
LRTYFCHFMCEISMK*

>BSUD.18903.3.p1
MKPDHHAKMAAQMKERLKVEELAENIDELEDVVENAFPVLLVTVLVSIFLLAIFLVRMYL
RYTVENPSKNRMDGKTVLITGATSGLGKATAIELARKNARVLITGRDKIKVEAVARNIRK
KTGNQHVNALVLDLANLRGIREFCEAFCKDEKYLHVLINNAAYMGPKAATDDNLERCFGV
NYLGHFYLTYLLSDKLKKNAPSRVINVVSDSYAIGQLDFDDIALNKGYDVFKAYARSKYA
MMLWNLEHHRRTYSSCIWTFAVHPGACATELLRNYPGLTGNLLRIVSRIMFKAPEDGCQT
IVYLAVADGLREFSGKTFANCKVIKTQDRIKDKEVAKELWNISAHLCGFEPDTPYEEQES
TEAKETTTSDSPTADIAAAAAVSEQKKDK*

>BSUD.2410.1.p1
MKNRPSAAFRASAKPPTYCKMESQKEDEEDDGKGSRTMVLAGGGDGSNTEGAVPAKGGCG
EGRVLIFLVLGVLTLVFSGVLIGIYMNIRTLTSSLDVIEVMPSFVPAAAGGLAGLFLLGL
FWKRCVVLVYPVLVLCAVSTGLSIIIAVLTGTHVLQPLLSVSGCVYTRKGNICQCLTQFK
RDKLDLERVNAGETVYLALHNVSSCEDVQTVIPTMLYTMIGIYGLLALVSAVAGIISFLV
YRTERNRNYLDDTDYDEDEDSSPSTPSSNTDNYTEHQNMLSSRQANVTTAASVGNIYTNT
NATTNDEETNNDDGNTTPSDLTYNPSDAPMGYTEACKMRRCQSFTQPHKGAAREGSPGSS
ESGQTASSLMSSDRVADGGAIRLKENRKKGRRAVTLHGLDRDQLLLILSLQMRYLQESEQ
LAKKECQSALNLNNINKPNRTNVKNPNATSSDTSQENIDTSNTFSHFQRRAMTPTPRQER
LSSSRATNDDLDYKPAKQVRSHTPQPYHFKVQHNGVPATMGPVLLPNIPAQYQVEQQLQP
LQVQHLQQQQLQQQQQFLQFQQQQQLAQLQQQQQKIQMQQQLQLQQHLQVQQMSQHLPMQ
QPILVQQPALQSANHNMENESLTMITYDLRSVQTTGPIVYENVPSSRTSLNYYSPDGSCS
SNSSSLLRATNLPGYPSLSSNSVPTQTQPMPQPVESSPIKVLGGNIQTTTPNSPGQISRQ
TSEISSPSHSGNDSINTEQKQPGKSPESTPETQPAKAKGKKKLSKKEKNAKKEEEKTATT
DSTKTSTLGRTDSNASKAGSVSDRWQAVLPDGKAQAQTLWENVQRKIVSDPQTPDSTLTF
PHSSHPTSQTSVQPSAPNHFPNSSLVVPNGILKKTKSVPYSQQSSPNLAPVSPQSLPQFD
SQVPSNIYSSYNQMTNTPSYNAFIPIPTSSTDNYEDIDDFSRANQHAQQVQQDVPPPKPA
RLHARKPAPGAEAGDTLDSQGQQLRPKSYLSAVDRESMAAASLTSMGNVPCQPAYEGTNK
QSLTHMQMEGPVRDERSGGIVPAGGNDQLIMIYGDLYAQPRRKSIPTNLPLLPSLNQSHQ
MYQNSDIDHNLMDNQARAQYRLDVHQPQRSAFHMLGHTYHGDRSGHLPSNEICDTDLDEL
PIPRWNSRYHRSQSFSPPPYTPPPVYQSLESVGKYPSIRSTSSSSSDPHNSSSEGSTLDN
IQPGFTNNRLQGPPLSMSRGRQPAHVTDRRFCLTQQRQPPPNLYNHSGRRLNLKSDYDSF
RDRRVDQEPPPQVAIRRCQSVEEGNRKRLTSGQHLVNGKMVPHNNYYPNIDNIRTTSDGQ
NLENNLKMGPSVNRVRPIHNGSVPNTEQPAFQKGVQNFQNEAYIPNKGCQARKFPGDVSE
EDLSCSIDTDSVISDSSSQEVCPNKELNGFITHGRALESSDSDKDDYAETVI*

What is the cause? Thanks
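For reference while this is being answered, here is a small sketch to pull out the N-th record from the input FASTA, assuming the warning numbers refer to 1-based record order in the input file (that assumption is exactly what the question is about):

def nth_record(path, n):
    """Return (header, sequence) of the n-th record (1-based) in a FASTA file, or None."""
    idx, header, seq = 0, None, []
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                if idx == n:        # we just finished collecting record n
                    break
                idx += 1
                header, seq = line.strip(), []
            else:
                seq.append(line.strip())
    return (header, "".join(seq)) if idx == n else None

print(nth_record("input.fasta", 12101))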

Feature request: Include the current package version

Hi there!

Thanks again for your great work.
It would be really helpful if the distributed packages could include the current version number!

I maintain a pipeline that runs SignalP6 (https://github.com/ccdmb/predector), and I've been getting bug reports from people.
It would be helpful to know exactly which version of SignalP 6 is actually installed, so that I know whether to report a new bug to you or ask them to update.

It also helps me to cache pre-computed results, and automatically skip repeated analysis if an identical version has been run before.

Currently the -h/--help flag lists "usage: SignalP 6.0 Signal peptide prediction tool [-h] --fastafile FASTAFILE".
And setup.py lists version='1.0'.

It would be helpful if you could include the full version info, e.g. 6.0e, in these places.
Alternatively, a dedicated --version flag is becoming more common.

I use bump2version to do this automatically from a git tag.
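As an illustration of the --version idea, a minimal argparse sketch (not the actual SignalP code; the version string is a placeholder):

import argparse

parser = argparse.ArgumentParser(prog="signalp6")
# argparse's built-in "version" action prints the string and exits
parser.add_argument("--version", action="version", version="%(prog)s 6.0e")
parser.add_argument("--fastafile", help="Input FASTA file")
args = parser.parse_args()

Running "signalp6 --version" would then print "signalp6 6.0e".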

I know this is pretty low priority so feel absolutely free to say no.
It would just be helpful.

Thanks and all the best,

Darcy

Web version and linux give different output

Hi,

The web version does not detect a signal peptide, but the Linux version (--mode slow-sequential) does.

The web version:
[screenshot of the web server output]

signalp6 -fasta /path/to/input.fasta -org euk --output_dir path/to_be_saved --format txt --mode slow-sequential
output:
ID Prediction OTHER SP(Sec/SPI) CS Position
test1 SP 0.317355 0.68265 CS pos: 59-60. Pr: 0.7032

input.fasta
>test1
MVITHLPAKIWRTLVIIRCVGLSESLKIFSGEALCVHHTLPKHIFLLLLLVLLADLVPGQPVGGRTCPKINTRKEWRQLS
RESQASYLKAVKCLTTKPTTLRTRFRLRHYDDFQYVHSTLYMQGTDIWYSSTSKVSVNVVIRMVFRTGIGA

What is the threshold for selection?

Thanks

Convert the model to .onnx

Hello, how do you pass in the model and parameters when converting this model into an onnx model?
The operation code is as follows:
model = torch.load("best_weight.pth", map_location="cpu")
dummy_input = (1, 3, 224, 224)
torch.onnx.export(model, dummy_input, f, verbose=False, input_names=input_names, output_names=output_names)

Is this right?
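Not necessarily the right recipe for this particular model, but a generic torch.onnx.export call looks like the sketch below: the dummy input must be an actual tensor (or tuple of tensors) of the shape the model expects, not a shape tuple. The (1, 3, 224, 224) shape above looks like an image-model example; for SignalP the input would presumably be a tensor of token IDs instead. Output path and names here are arbitrary.

import torch

model = torch.load("best_weight.pth", map_location="cpu")
model.eval()

# dummy_input must be a real tensor, not a tuple of dimensions
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=13,
)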

Can SignalP-6.0 be applied to prediction of SRP-dependent signal sequences?

Hi, I found that SignalP-6.0 was trained to identify signal peptides of the Sec or Tat pathways. However, when I tested SignalP-6.0 on the secreted protein DSBA_ECOLI, which is reported to be SRP-dependent, I get the following output:
[screenshot of SignalP output]

It surprised me that the signal sequence was identified quite well, consistent with the reported one. But the signal peptide of DSBA_ECOLI is also classified to the Sec pathway. I think this phenomenon is interesting. Could you give some possible explanations?

RuntimeError: set_num_threads expects an int, but got str

Hi fteufel,
SignalP-6.0 is an outstanding piece of software for signal peptide prediction. I ran it with the parameter --torch_num_thread 25, but it raised an error:

Traceback (most recent call last):
  File "/data/conda_envs/signalp6/bin/signalp6", line 33, in <module>
    sys.exit(load_entry_point('signalp6==6.0+h', 'console_scripts', 'signalp6')())
  File "/data/conda_envs/signalp6/lib/python3.7/site-packages/signalp6-6.0+h-py3.7.egg/signalp/__init__.py", line 6, in predict
    main()
  File "/data/conda_envs/signalp6/lib/python3.7/site-packages/signalp6-6.0+h-py3.7.egg/signalp/predict.py", line 144, in main
    torch.set_num_threads(args.torch_num_threads)
RuntimeError: set_num_threads expects an int, but got str

Then I checked line 144 in predict.py

torch.set_num_threads(args.torch_num_threads)

and I found the error was raised because the parameter was passed as a string, so I wrapped it with int() like this:

torch.set_num_threads(int(args.torch_num_threads))

and it worked. I wonder if you could fix this in a later version of SignalP-6.0. Thanks again for the useful tool!
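The root cause is presumably that the argument is declared without type=int, so argparse hands the value through as a string; declaring the type would fix it at the source. A minimal sketch (the actual definition in predict.py may look different):

import argparse

parser = argparse.ArgumentParser()
# with type=int, argparse converts "25" to 25 before it ever reaches torch.set_num_threads
parser.add_argument("--torch_num_threads", type=int, default=8)
args = parser.parse_args(["--torch_num_threads", "25"])
assert isinstance(args.torch_num_threads, int)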

Best,
Zewei

Numpy error with 6.0b for some sequences

Hi there!

Thanks for your great work.
I was testing out the update and came across an issue during the marginal conflict resolution step while running on the signalp5 benchmark set.

This sequence:

>A0R1E8|POSITIVE|LIPO|2
MTQNCVAPVAIIGMACRLPGAINSPQQLWEALLRGDDFVTEIPTGRWDAEEYYDPEPGVPGRSVSKWGAF

from https://services.healthtech.dtu.dk/services/SignalP-6.0/public_data/benchmark_set_sp5.fasta
appears to be the issue.

Running version 6.0b in "fast" mode with this sequence, with the organism set to either "other" or "euk", causes the following error.

$ signalp6 --output_dir test --format txt --organism euk --mode fast --fastafile test.fasta

/home/ubuntu/miniconda3/envs/sp6/lib/python3.6/site-packages/torch/nn/modules/module.py:1051: UserWarning: where received a uint8 condition tensor. This behavior is deprecated and will be removed in a future version of PyTorch. Use a boolean condition instead. (Triggered internally at  /tmp/pip-req-build-1ky46svp/aten/src/ATen/native/TensorCompare.cpp:255.)
  return forward_call(*input, **kwargs)
Predicting: 100%|| 1/1 [00:00<00:00,  1.53batch/s]
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/sp6/bin/signalp6", line 8, in <module>
    sys.exit(predict())
  File "/home/ubuntu/miniconda3/envs/sp6/lib/python3.6/site-packages/signalp/__init__.py", line 6, in predict
    main()
  File "/home/ubuntu/miniconda3/envs/sp6/lib/python3.6/site-packages/signalp/predict.py", line 235, in main
    resolve_viterbi_marginal_conflicts(global_probs, marginal_probs, cleavage_sites, viterbi_paths)
  File "/home/ubuntu/miniconda3/envs/sp6/lib/python3.6/site-packages/signalp/utils.py", line 254, in resolve_viterbi_marginal_conflicts
    cleavage_sites[i] = sp_idx.max() +1
  File "/home/ubuntu/miniconda3/envs/sp6/lib/python3.6/site-packages/numpy/core/_methods.py", line 39, in _amax
    return umr_maximum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation maximum which has no identity

This doesn't appear to be an issue with the previous version available for download.
Both have identical main dependency versions:

python 3.6.13
numpy 1.19.5
pytorch 1.9.1
tqdm 4.62.3
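In case it helps triage: the crash happens when sp_idx is empty before the .max() call in resolve_viterbi_marginal_conflicts (utils.py line 254 in the traceback above). A defensive guard along these lines would avoid the hard failure; whether returning a sentinel or emitting the "Unknown behaviour" warning is the right behaviour is an assumption on my part:

import numpy as np

def safe_cleavage_site(sp_idx):
    """Return the 1-based cleavage site, or -1 (hypothetical sentinel) when no SP positions exist."""
    sp_idx = np.asarray(sp_idx)
    return int(sp_idx.max()) + 1 if sp_idx.size > 0 else -1

print(safe_cleavage_site([56, 57, 58]))  # 59
print(safe_cleavage_site([]))            # -1 instead of the ValueError above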

Thanks in advance,
Darcy

Why do the cleavage sites have negative values?

Hello, when I used SignalP 6.0 to predict signal peptides, I found that there are negative values for cleavage sites in the output.gff3 file. What is the reason?

Example:
"RS_3_bin.18_00081 hypothetical protein SignalP-6.0 signal_peptide 1 -1 0.56286603 . . Note=TAT"

Looking forward to your reply.

David

Mode `slow` not found, running `slow-sequential` instead.

The slow mode does not seem to work.
My command is:
signalp6 --fastafile /home/barthelemy/beta/A.fasta --organism other --output_dir /home/barthelemy/beta/signalP_out --format txt --mode slow

Does slow-sequential produce the same output as slow?

best

barthelemy

Model error?

Hi,

SignalP 6 is out and it looks great. I started to work with it but am getting the following error.

Installed as follows:
python3 -m pip install signalp-6-package/

SIGNALP_DIR=$(python3 -c "import signalp; import os; print(os.path.dirname(signalp.__file__))")
cp -r signalp-6-package/models/* $SIGNALP_DIR/model_weights/

signalp6 -fasta /path/to/input.fasta -org euk --output_dir path/to/be/saved --format txt --mode slow

Traceback (most recent call last):
  File "/home/.local/lib/python3.6/site-packages/signalp/predict.py", line 207, in main
    model = torch.jit.load(SLOW_MODEL_PATH)
  File "/home/.local/lib/python3.6/site-packages/torch/jit/_serialization.py", line 151, in load
    raise ValueError("The provided filename {} does not exist".format(f))  # type: ignore[str-bytes-safe]
ValueError: The provided filename /home/.local/lib/python3.6/site-packages/signalp/model_weights/ensemble_model_signalp6.pt does not exist

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/.local/bin/signalp6", line 8, in <module>
    sys.exit(predict())
  File "/home/.local/lib/python3.6/site-packages/signalp/__init__.py", line 6, in predict
    main()
  File "/home/.local/lib/python3.6/site-packages/signalp/predict.py", line 211, in main
    raise FileNotFoundError(f"Slow mode requires the full model to be installed at {SLOW_MODEL_PATH}. Missing from this installation or incorrectly configured.")
FileNotFoundError: Slow mode requires the full model to be installed at /home/.local/lib/python3.6/site-packages/signalp/model_weights/ensemble_model_signalp6.pt. Missing from this installation or incorrectly configured.

I could not find the model "ensemble_model_signalp6.pt". Where can I download it from?
The model present is "sequential_models_signalp6".

Thanks

Installation error

The package is not installing via the command "pip install signalp-6-package/". When I installed it via the "python setup.py install" command, no "signalp-6-package/models/" directory was found and no model files were present in any folder of the directory. Consequently, the package is not running properly.
It used to work fine when I installed it on my system a few months ago: the package was installed via "pip install signalp-6-package/", the "signalp-6-package/models/" directory was created, I copied the model file "distilled_model_signalp6.pt" into the "signalp/model_weights/" directory, and the package started running. I am facing errors at the same steps now!
I would appreciate any help you can provide, as I need this Python package. Awaiting your kind response.

output processed_entries contains all the proteins with or without the signal

Hello,

I am using SignalP 6 but I am having a hard time retrieving the proteins that are predicted to have a signal peptide from the files named "processed_entries". As I understood from previous versions of the program, these files were supposed to contain only the proteins predicted to have a signal peptide, with their signal peptide sequences removed. Instead I get a FASTA (although without the ">" in the header names) that contains all the proteins I provided as input, even though the "prediction_results.txt" file makes it clear that not all of them carry a signal peptide. This file looks like this:

SignalP-6.0 Organism: Eukarya Timestamp: 20240212172549

ID Prediction OTHER SP(Sec/SPI) CS Position

FRA0004_1 NO_SP 1.000040 0.000000
FRA0004_2 NO_SP 1.000039 0.000004
FRA0004_3 SP 0.000306 0.999679 CS pos: 18-19. Pr: 0.9627
FRA0004_4 NO_SP 1.000064 0.000000
.
.
.

But the "fasta" file looks like this:

FRA0004_1
MKDIVSAISHRFHGLSKSKAAV...
FRA0004_2
MEGLRRSLIEGSSSEKYSVYNP...
FRA0004_3
MKTSSVLALASFAAFATASPIAP...
FRA0004_4
MKFGTTLRKSVYAPWKDKYID...
FRA0004_5
MDTPPRVFETAVGKFWR ...
.
.
.
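A small post-processing sketch that keeps only the records called SP, assuming the prediction_results.txt layout shown above (ID in the first column, prediction class in the second) and whitespace-separated columns; the file names are illustrative:

sp_ids = set()
with open("prediction_results.txt") as results:
    for line in results:
        fields = line.split()
        # prediction class is the 2nd column; extend the check to LIPO/TAT/... for non-euk runs
        if len(fields) >= 2 and fields[1] == "SP":
            sp_ids.add(fields[0])

keep = False
with open("input.fasta") as fasta, open("sp_only.fasta", "w") as out:
    for line in fasta:
        if line.startswith(">"):
            keep = line[1:].split()[0] in sp_ids
        if keep:
            out.write(line)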

I really appreciate any help that you can provide to solve this,

Thank you so much

CPU and GPU disagree with pytorch 2.0

I installed SignalP as follows:

Dockerfile

FROM nvidia/cuda:12.1.1-base-ubuntu20.04
MAINTAINER Thomas Roder

ADD software/signalp-6.0g.fast.tar.gz /opt/signalp

# install system dependencies
RUN apt update
RUN apt install -y python3.9 python3-distutils curl zip unzip less
RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.9

# install python dependencies
RUN pip install -r /opt/signalp/signalp6_fast/signalp-6-package/requirements.txt
RUN pip install /opt/signalp/signalp6_fast/signalp-6-package
RUN mv /opt/signalp/signalp6_fast/signalp-6-package/models/* /usr/local/lib/python3.9/dist-packages/signalp/model_weights/

# replace python 3.8 with 3.9
RUN \
    update-alternatives --install /usr/bin/python python /usr/bin/python3.8 1 && \
    update-alternatives --install /usr/bin/python python /usr/bin/python3.9 2 && \
    update-alternatives --set python /usr/bin/python3.9 && \
    update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 1 && \
    update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.9 2 && \
    update-alternatives --set python3 /usr/bin/python3.9

# convert to GPU if GPU_MODE
ARG GPU_MODE
RUN if [ -n "$GPU_MODE" ]; then \
        echo "GPU_MODE is ON: converting models..."; \
        signalp6_convert_models gpu; \
    else \
        echo "GPU_MODE is OFF: do nothing"; \
    fi

WORKDIR /data

Build

Build with podman/CUDA.

  • CPU mode: podman build . --tag signalp_cpu
  • GPU mode: podman build --build-arg GPU_MODE=TRUE --security-opt=label=disable --hooks-dir=/usr/share/containers/oci/hooks.d/ . --tag signalp_gpu

Run

  • CPU mode:
podman run --rm \
    -v ./:/data:Z \
    signalp_cpu \
    signalp6 --fastafile /data/test/predicted.faa --organism other --output_dir /data/test/out_cpu --format txt --mode fast
  • GPU mode:
podman run --rm \
    -v ./:/data:Z \
    --security-opt=label=disable \
    --hooks-dir=/usr/share/containers/oci/hooks.d/ \
    signalp_gpu \
    signalp6 --fastafile /data/test/predicted.faa --organism other --output_dir /data/test/out_gpu --format txt --mode fast

Result:

You can download the output here: signalp-output.tar.xz.

GPU mode gives many warnings like this:

Unknown behaviour encountered for sequence no. 11. Please check outputs.

Also, many proteins are classified differently, e.g.:

[CPU] 1_12 # 13606 # 13803 # -1 # ID=1_12;partial=00;start_type=ATG;rbs_motif=GGAG/GAGG;rbs_spacer=5-10bp;gc_cont=0.424	PILIN	0.004577	0.000000	0.000000	0.000000	0.000000	60031319932928.000000	CS pos: 10-11. Pr: 0.0000
[GPU] 1_12 # 13606 # 13803 # -1 # ID=1_12;partial=00;start_type=ATG;rbs_motif=GGAG/GAGG;rbs_spacer=5-10bp;gc_cont=0.424	OTHER	0.999923	0.000075	0.000001	0.000000	0.000000	0.000002	
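To quantify the disagreement, a small sketch that compares the two prediction_results.txt files, assuming both are tab-separated with the ID in the first column and the predicted class in the second (the paths are assumed from the output directories above):

def load_predictions(path):
    preds = {}
    with open(path) as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                preds[fields[0]] = fields[1]
    return preds

cpu = load_predictions("test/out_cpu/prediction_results.txt")
gpu = load_predictions("test/out_gpu/prediction_results.txt")

for seq_id in sorted(set(cpu) & set(gpu)):
    if cpu[seq_id] != gpu[seq_id]:
        print(f"{seq_id}: CPU={cpu[seq_id]}  GPU={gpu[seq_id]}")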

Question

I suppose something is wrong with the GPU predictions. However, I want to annotate thousands of genomes and require the algorithm to be faster. Do you have an idea what went wrong? Can you help me?

My system:

$ cat /etc/os-release
NAME="Fedora Linux"
VERSION="38 (Workstation Edition)"
...
$ nvidia-smi 
Wed May 17 20:47:27 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.41.03              Driver Version: 530.41.03    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce MX150            Off| 00000000:3B:00.0 Off |                  N/A |
| N/A   93C    P0               N/A /  N/A|   1659MiB /  2048MiB |     95%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2530      G   /usr/bin/gnome-shell                          0MiB |
|    0   N/A  N/A    102059      C   /usr/bin/python3.9                         1656MiB |
+---------------------------------------------------------------------------------------+

Reproduce the result in Supplementary Table 1

Hi, I found it extremely slow and also problematic when I ran train_model.py with the basic training command (shown in the screenshot below). I want to reproduce the results in your Supplementary Table 1; could you please tell me how I can do that?
[screenshot of the training command]

slow model download website is not available

Hi,
thanks for this wonderful tool.

Unfortunately, when trying to download the slow model from the provided link, I get the error shown in the screenshot below;
the fast model download works fine.
Is it possible to update the download site or provide the download from a different source?
Thanks!
[screenshot of the download error]

Still having issues with `resolve_viterbi_marginal_conflicts`

Hi there!

Sorry to bother you again.
I'm still running into issues with the decoding step.

Running this sequence with SignalP 6.0e raises an error:

>P000004B9
MAFRLFAGITGRQLLAGGAALGGTGLAGSLIQTESERLQATEAQVQFHTSSIHPTPVGFS
PWQIRNDYPTSDILKARLKAQKDDSLPNAPSPLIPAPGLPGDFEGENAPWFKYDYEKEPE
KFAEAIREYCFDGNVDKGFRLNENKIRDWYHAPWMHYRDPNSMCTEREPINGFTFERATP
AGEFAKTQNVTLQNWAIGFYNATGATVFGDMWKDPDNPDFSQNKEFPVGTCVFKILLNNS
TPEQMPIQDGAPTMHAVISKSTSNGKERNDFASPLRLIQVDFAVVDKRSPIGWVFGTFMY
NKDQPGKGPWDRLTLVGLQWGNDHWLTNQVYDETKAEGRVAKPRECYIHKKAEDIRKREG
GTRPSWGWNGRMNGPADNFISACASCHSTSTSHPMYNGKVKDGVKQTYGMVPPLNMKPLP
PQPKEGNTFSDVMIYFRNVMGGVPFDEGVNPNNPDEYDPTYKSKVKSADYSLQLQVGWAN
YKKWKEDHETVLQSIFRKTRYVIGSELAGASDLSQRDQGRQEPTDDGPVE
$signalp6  --fastafile "${1}" --output_dir "${TMPDIR}" --format none --organism eukarya --mode fast  --bsize "32" --write_procs 1
Predicting: 100%|██████████| 1/1 [00:04<00:00,  4.87s/sequences]
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/predector/bin/signalp6", line 8, in <module>
    sys.exit(predict())
  File "/home/ubuntu/miniconda3/envs/predector/lib/python3.6/site-packages/signalp/__init__.py", line 6, in predict
    main()
  File "/home/ubuntu/miniconda3/envs/predector/lib/python3.6/site-packages/signalp/predict.py", line 239, in main
    resolve_viterbi_marginal_conflicts(global_probs, marginal_probs, cleavage_sites, viterbi_paths)
  File "/home/ubuntu/miniconda3/envs/predector/lib/python3.6/site-packages/signalp/utils.py", line 311, in resolve_viterbi_marginal_conflicts
    cleavage_sites[i] = sp_idx.max() +1
  File "/home/ubuntu/miniconda3/envs/predector/lib/python3.6/site-packages/numpy/core/_methods.py", line 39, in _amax
    return umr_maximum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation maximum which has no identity

I haven't had a huge amount of time to debug it (or decipher how it all works), but it seems as though the marginal probabilities in type_marginal_probs are all assigning it to the PAD token, so you end up with a zero-length array at np.where(np.isin(marginal_region_preds, [5, 10, 19, 25, 31]))[0].

I wonder if a property-based testing framework (like https://hypothesis.readthedocs.io/en/latest/) would be helpful for finding all of these edge cases and handling them appropriately?
It seems to have become a troublesome issue.
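Along the lines of the hypothesis suggestion, a rough end-to-end property test could drive the CLI with randomly generated protein sequences and assert that it exits cleanly. The flags are the ones used elsewhere in this thread; everything else (alphabet, lengths, example count) is arbitrary:

import os
import subprocess
import tempfile

from hypothesis import given, settings, strategies as st

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

@settings(max_examples=25, deadline=None)
@given(seq=st.text(alphabet=AMINO_ACIDS, min_size=20, max_size=200))
def test_signalp6_exits_cleanly(seq):
    with tempfile.TemporaryDirectory() as tmp:
        fasta = os.path.join(tmp, "one.fasta")
        with open(fasta, "w") as fh:
            fh.write(">generated\n" + seq + "\n")
        result = subprocess.run(
            ["signalp6", "--fastafile", fasta, "--output_dir", tmp,
             "--organism", "eukarya", "--mode", "fast", "--format", "none"],
            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        )
        assert result.returncode == 0, result.stderr.decode()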

Wrong column name (typo) in `prediction_results.txt`

Hello,

I just ran the downloaded version of SignalP 6.0g and noticed that the TATLIPO column in prediction_results.txt (printed by make_output_files.py:21) reads "TATLIPO(Sec/SPII)" where it supposedly should output "TATLIPO(Tat/SPII)".

Best regards

Can not find ensemble_model_signalp6.pt file

I've installed SignalP-6.0, but when running the command on my FASTA file, it gives an error: "FileNotFoundError: Slow mode requires the full model to be installed at .....". I'm looking for ensemble_model_signalp6.pt to resolve this error.

If you have any opinion or solution for this error, please let me know.

Limit the number of threads

Users are running signalp-6.0 on our compute cluster. Each signalp process takes as many CPU cores as the hardware provides. The processes scale poorly if using >8 threads, especially if using more than 64 threads. Please provide a command line parameter that allows the user to define the number of CPU threads that signalp uses.
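Until a dedicated flag exists, one workaround sketch is to cap the common thread-related environment variables when launching signalp6. Whether the OpenMP/MKL variables are honoured depends on the PyTorch build, so this is an assumption rather than a guaranteed fix (another issue in this list also mentions a --torch_num_threads parameter):

import os
import subprocess

n_threads = "8"  # arbitrary cap for illustration
env = dict(os.environ,
           OMP_NUM_THREADS=n_threads,
           MKL_NUM_THREADS=n_threads)

subprocess.run(
    ["signalp6", "--fastafile", "proteins.fasta", "--organism", "other",
     "--output_dir", "signalp_out", "--mode", "fast"],
    env=env, check=True,
)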
