
alphapeptdeep's Introduction

AlphaPeptDeep (PeptDeep)



About

AlphaPeptDeep (peptdeep for short) aims to make it easy to build new deep learning models for shotgun proteomics studies. Transfer learning is also easy to apply with AlphaPeptDeep.

It contains built-in models such as retention time (RT), collision cross section (CCS), and tandem mass spectrum (MS2) prediction for given peptides. With these models, one can easily generate a predicted library from fasta files.

For details, check out our publications.

For documentation, see readthedocs.


Subsequent projects of AlphaPeptDeep

  • peptdeep_hla: the DL model that predicts whether a peptide is presented by individual HLA alleles or not.

Other pre-trained MS2/RT/CCS models

  • Dimethyl: the MS2/RT/CCS models for Dimethyl-labeled peptides.

Citations

Wen-Feng Zeng, Xie-Xuan Zhou, Sander Willems, Constantin Ammar, Maria Wahle, Isabell Bludau, Eugenia Voytik, Maximillian T. Strauss & Matthias Mann. AlphaPeptDeep: a modular deep learning framework to predict peptide properties for proteomics. Nat Commun 13, 7238 (2022). https://doi.org/10.1038/s41467-022-34904-3


License

AlphaPeptDeep was developed by the Mann Labs at the Max Planck Institute of Biochemistry and the University of Copenhagen and is freely available with an Apache License. External Python packages (available in the requirements folder) have their own licenses, which can be consulted on their respective websites.


Installation

AlphaPeptDeep can be installed and used on all major operating systems (Windows, macOS and Linux).

There are three different types of installation possible:

  • One-click GUI installer: Choose this installation if you only want the GUI and/or keep things as simple as possible.
  • Pip installer: Choose this installation if you want to use peptdeep as a Python package in an existing Python (recommended Python 3.8 or 3.9) environment (e.g. a Jupyter notebook). If needed, the GUI and CLI can be installed with pip as well.
  • Developer installer: Choose this installation if you are familiar with CLI tools, conda and Python. This installation allows access to all available features of peptdeep and even allows you to modify its source code directly. Generally, the developer version of peptdeep outperforms the precompiled versions, which makes this the installation of choice for high-throughput experiments.

One-click GUI

The GUI of peptdeep is a completely stand-alone tool that requires no knowledge of Python or CLI tools. Download the latest release for your operating system (Windows, macOS or Linux) from the release page.

Older releases remain available on the release page, but no backwards compatibility is guaranteed.

Note that, as GitHub does not allow large release files, these installers do not have GPU support. To create GPU-enabled installers, clone the source code, install the GPU version of PyTorch (see Use GPU below), and then use release/one_click_xxx_gui/create_installer_xxx.sh to build the installer locally. For example, on Windows, run

cd release/one_click_windows_gui
. ./create_installer_windows.sh

Pip

PythonNET must be installed to access Thermo or Sciex raw data. This is legacy functionality and should be replaced by AlphaRaw in the near future.

PythonNET in Windows

PythonNET is installed automatically on Windows.

PythonNET in Linux

  1. Install Mono from the mono-project website (Mono Linux). NOTE: the installed Mono version should be at least 6.10, which requires you to add the ppa to your trusted sources!
  2. Install PythonNET with pip install pythonnet.

PythonNET in macOS

  1. Install brew and pkg-config: brew install pkg-config
  2. Install Mono from the mono-project website (Mono Mac).
  3. Register the Mono path in your system. For macOS Catalina, open the configuration of zsh via the terminal:
  • Type nano ~/.zshrc to open the configuration of the terminal
  • Append the Mono path to your PKG_CONFIG_PATH: export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:/usr/lib/pkgconfig:/Library/Frameworks/Mono.framework/Versions/Current/lib/pkgconfig:$PKG_CONFIG_PATH
  • Save everything and execute . ~/.zshrc
  4. Install PythonNET with pip install pythonnet.

peptdeep can be installed in an existing Python environment with a single bash command. This bash command can also be run directly from within a Jupyter notebook by prepending it with a !:

pip install peptdeep

Installing peptdeep like this avoids conflicts when integrating it with other tools, as it does not enforce strict versioning of dependencies. However, if new versions of dependencies are released, they are not guaranteed to be fully compatible with peptdeep. This should only occur in rare cases where dependencies are not backwards compatible.

You can always force peptdeep to use dependency versions which are known to be compatible with:

pip install "peptdeep[stable]"

NOTE: You might need to run pip install --upgrade pip before installing peptdeep like this. Also note the double quotes ".

For those who are really adventurous, it is also possible to directly install any branch (e.g. @development) with any extras (e.g. #egg=peptdeep[stable,development-stable]) from GitHub with e.g.

pip install "git+https://github.com/MannLabs/alphapeptdeep.git@development#egg=peptdeep[stable,development-stable]"

Use GPU

To enable GPU support, the GPU version of PyTorch is required; it can be installed with:

pip install torch --extra-index-url https://download.pytorch.org/whl/cu116 --upgrade

Note that the correct version may depend on your NVIDIA driver. Run the following command to check your NVIDIA driver:

nvidia-smi

For the latest PyTorch version, see pytorch.org.

Developer

peptdeep can also be installed in editable (i.e. developer) mode with a few bash commands. This allows you to fully customize the software and even modify the source code to your specific needs. When an editable Python package is installed, its source code is stored in a transparent location of your choice. While optional, it is advised to first (create and) navigate to e.g. a general software folder:

mkdir ~/alphapeptdeep/project/folder
cd ~/alphapeptdeep/project/folder

The following commands assume you do not perform any additional cd commands.

Next, download the peptdeep repository from GitHub either directly or with a git command. This creates a new peptdeep subfolder in your current directory.

git clone https://github.com/MannLabs/alphapeptdeep.git

For any Python package, it is highly recommended to use a separate conda virtual environment, as otherwise dependency conflicts can occur with already existing packages.

conda create --name peptdeep python=3.9 -y
conda activate peptdeep

Finally, peptdeep and all its dependencies need to be installed. To take advantage of all features and allow development (with the -e flag), this is best done by also installing the development dependencies instead of only the core dependencies:

pip install -e ".[development]"

By default this installs loose dependencies (no explicit versioning), although it is also possible to use stable dependencies (e.g. pip install -e ".[stable,development-stable]").

By using the editable flag -e, all modifications to the peptdeep source code folder are directly reflected when running peptdeep. Note that the peptdeep folder cannot be moved and/or renamed if an editable version is installed. In case of confusion, you can always retrieve the location of any Python module with the command import module followed by module.__file__.
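For example, to locate an editable peptdeep installation:

import peptdeep
print(peptdeep.__file__)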


Usage

There are three ways to use peptdeep: the GUI, the CLI, and Python (scripts and Jupyter notebooks).

NOTE: The first time you use a fresh installation of peptdeep, it is often quite slow because some functions might still need compilation on your local operating system and architecture. Subsequent use should be a lot faster.

GUI

If the GUI was not installed through a one-click GUI installer, it can be launched with the following bash command:

peptdeep gui

This command will start a web server and automatically open the default browser.

There are several options in the GUI (left panel):

  • Server: Start/stop the task server, check tasks in the task queue
  • Settings: Configure common settings, load/save current settings
  • Model: Configure DL models for prediction or transfer learning
  • Transfer: Refine the models
  • Library: Predict a library
  • Rescore: Perform ML feature extraction and rescoring with Percolator

CLI

The CLI can be run with the following command (after activating the conda environment with conda activate peptdeep or if an alias was set to the peptdeep executable):

peptdeep -h

It is possible to get help about each command and its (required) parameters by using the -h flag. AlphaPeptDeep provides several commands for different tasks:

Run a command with the -h flag to check its usage:

peptdeep $command -h

For example:

peptdeep library -h

export-settings

peptdeep export-settings C:/path/to/settings.yaml

This command exports the default settings into settings.yaml as a template; users can edit the yaml file to run other commands.

Here is a section of the yaml file which controls global parameters for different tasks:

model_url: "https://github.com/MannLabs/alphapeptdeep/releases/download/pre-trained-models/pretrained_models.zip"

task_type: library
task_type_choices:
  - library
  - train
  - rescore
thread_num: 8
torch_device:
  device_type: gpu
  device_type_choices:
    - gpu
    - mps
    - cpu
  device_ids: []

log_level: info
log_level_choices:
  - debug
  - info
  - warning
  - error
  - critical

common:
  modloss_importance_level: 1.0
  user_defined_modifications: {}
  # For example,
  # user_defined_modifications:
  #   "Dimethyl2@Any N-term":
  #     composition: "H(2)2H(2)C(2)"
  #     modloss_composition: "H(0)" # can be without if no modloss
  #   "Dimethyl2@K":
  #     composition: "H(2)2H(2)C(2)"
  #   "Dimethyl6@Any N-term":
  #     composition: "2H(4)13C(2)"
  #   "Dimethyl6@K":
  #     composition: "2H(4)13C(2)"

peak_matching:
  ms2_ppm: True
  ms2_tol_value: 20.0
  ms1_ppm: True
  ms1_tol_value: 20.0

model_mgr:
  default_nce: 30.0
  default_instrument: Lumos
  mask_modloss: True
  model_type: generic
  model_choices:
  - generic
  - phos
  - hla # same as generic
  - digly
  external_ms2_model: ''
  external_rt_model: ''
  external_ccs_model: ''
  instrument_group:
    ThermoTOF: ThermoTOF
    Astral: ThermoTOF
    Lumos: Lumos
    QE: QE
    timsTOF: timsTOF
    SciexTOF: SciexTOF
    Fusion: Lumos
    Eclipse: Lumos
    Velos: Lumos # not important
    Elite: Lumos # not important
    OrbitrapTribrid: Lumos
    ThermoTribrid: Lumos
    QE+: QE
    QEHF: QE
    QEHFX: QE
    Exploris: QE
    Exploris480: QE
  predict:
    batch_size_ms2: 512
    batch_size_rt_ccs: 1024
    verbose: True
    multiprocessing: True

The model_mgr section in the yaml defines the common settings for MS2/RT/CCS prediction.


cmd-flow

peptdeep cmd-flow ...

This command exposes CLI parameters to control global_settings for CLI users. It supports three workflows: train, library, or train library, controlled by the CLI parameter --task_workflow, for example --task_workflow train library. All settings in global_settings are converted to CLI parameters using -- as the dict-level indicator; for example, global_settings["library"]["var_mods"] corresponds to --library--var_mods. See test_cmd_flow.sh for an example.

There are three kinds of parameter types:

  1. value type (int, float, bool, str): The CLI parameter has a single value, for instance --model_mgr--default_nce 30.0.
  2. list type (list): The CLI parameter has a list of values separated by spaces, for instance --library--var_mods "Oxidation@M" "Acetyl@Protein_N-term".
  3. dict type (dict): Only three parameters are dict type:
  • --library--labeling_channels: labeling channels for the library. Example: --library--labeling_channels "0:Dimethyl@Any_N-term;Dimethyl@K" "4:xx@Any_N-term;xx@K"
  • --model_mgr--transfer--psm_modification_mapping: converts other search engines' modification names to alphabase modifications for transfer learning. Example: --model_mgr--transfer--psm_modification_mapping "Dimethyl@Any_N-term:_(Dimethyl-n-0);_(Dimethyl)" "Dimethyl@K:K(Dimethyl-K-0);K(Dimethyl)". Note that the X(UniMod:id) format can be directly recognized by alphabase.
  • --common--user_defined_modification: user-defined modifications. Example: --common--user_defined_modification "NewMod1@Any_N-term:H(2)2H(2)C(2)" "NewMod2@K:H(100)O(2)C(2)"
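Combining the three parameter types, a hypothetical invocation could look like this (all paths and modification names are placeholders):

peptdeep cmd-flow \
  --task_workflow train library \
  --model_mgr--default_nce 30.0 \
  --model_mgr--transfer--psm_files /path/to/psms1.tsv /path/to/psms2.tsv \
  --library--var_mods "Oxidation@M" "Acetyl@Protein_N-term" \
  --library--labeling_channels "0:Dimethyl@Any_N-term;Dimethyl@K"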

library

peptdeep library settings_yaml

This command predicts a spectral library for the given settings_yaml file (exported by export-settings). All the essential settings are in the library section of the settings_yaml file:

library:
  infile_type: fasta
  infile_type_choices:
  - fasta
  - sequence_table
  - peptide_table # sequence with mods and mod_sites
  - precursor_table # peptide with charge state
  infiles:
  - xxx.fasta
  fasta:
    protease: 'trypsin'
    protease_choices:
    - 'trypsin'
    - '([KR])'
    - 'trypsin_not_P'
    - '([KR](?=[^P]))'
    - 'lys-c'
    - 'K'
    - 'lys-n'
    - '\w(?=K)'
    - 'chymotrypsin'
    - 'asp-n'
    - 'glu-c'
    max_miss_cleave: 2
    add_contaminants: False
  fix_mods:
  - Carbamidomethyl@C
  var_mods:
  - Acetyl@Protein N-term
  - Oxidation@M
  special_mods: [] # normally for Phospho or GlyGly@K
  special_mods_cannot_modify_pep_n_term: False
  special_mods_cannot_modify_pep_c_term: False
  labeling_channels: {}
  # For example,
  # labeling_channels:
  #   0: ['Dimethyl@Any N-term','Dimethyl@K']
  #   4: ['Dimethyl:2H(2)@Any N-term','Dimethyl:2H(2)@K']
  #   8: [...]
  min_var_mod_num: 0
  max_var_mod_num: 2
  min_special_mod_num: 0
  max_special_mod_num: 1
  min_precursor_charge: 2
  max_precursor_charge: 4
  min_peptide_len: 7
  max_peptide_len: 35
  min_precursor_mz: 200.0
  max_precursor_mz: 2000.0
  decoy: pseudo_reverse
  decoy_choices:
  - pseudo_reverse
  - diann
  - None
  max_frag_charge: 2
  frag_types:
  - b
  - y
  rt_to_irt: True
  generate_precursor_isotope: False
  output_folder: "{PEPTDEEP_HOME}/spec_libs"
  output_tsv:
    enabled: False
    min_fragment_mz: 200
    max_fragment_mz: 2000
    min_relative_intensity: 0.001
    keep_higest_k_peaks: 12
    translate_batch_size: 1000000
    translate_mod_to_unimod_id: False

peptdeep loads sequence data for library prediction based on library:infile_type and library:infiles. library:infiles contains the list of input files, whose type is given by library:infile_type (one of library:infile_type_choices):

  • fasta: Protein fasta files, peptdeep will digest the protein sequences into peptide sequences.
  • sequence_table: Tab/comma-delimited txt/tsv/csv (text) files which contain the column sequence for peptide sequences.
  • peptide_table: Tab/comma-delimited txt/tsv/csv (text) files which contain the columns sequence, mods, and mod_sites. peptdeep will not add modifications for peptides of this file type.
  • precursor_table: Tab/comma-delimited txt/tsv/csv (text) files which contain the columns sequence, mods, mod_sites, and charge. peptdeep will not add modifications and charge states for peptides of this file type.

See examples:

import pandas as pd
df = pd.DataFrame({
    'sequence': ['ACDEFGHIK','LMNPQRSTVK','WYVSTR'],
    'mods': ['Carbamidomethyl@C','Acetyl@Protein N-term;Phospho@S',''],
    'mod_sites': ['2','0;7',''],
    'charge': [2,3,1],
})
sequence_table (df[['sequence']]):

    sequence
  0 ACDEFGHIK
  1 LMNPQRSTVK
  2 WYVSTR

peptide_table (df[['sequence','mods','mod_sites']]):

    sequence     mods                             mod_sites
  0 ACDEFGHIK    Carbamidomethyl@C                2
  1 LMNPQRSTVK   Acetyl@Protein N-term;Phospho@S  0;7
  2 WYVSTR

precursor_table (df):

    sequence     mods                             mod_sites  charge
  0 ACDEFGHIK    Carbamidomethyl@C                2          2
  1 LMNPQRSTVK   Acetyl@Protein N-term;Phospho@S  0;7        3
  2 WYVSTR                                                   1

The proteins and genes columns are optional for these txt/tsv/csv files.
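For instance, a minimal sequence_table can be written with pandas (the file name is a placeholder):

import pandas as pd

# three peptide sequences, one per row, tab-delimited
df = pd.DataFrame({'sequence': ['ACDEFGHIK', 'LMNPQRSTVK', 'WYVSTR']})
df.to_csv('sequence_table.tsv', sep='\t', index=False)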

peptdeep supports multiple files for library prediction, for example (in the yaml file):

library:
  ...
  infile_type: fasta
  infiles:
  - /path/to/fasta/human.fasta
  - /path/to/fasta/yeast.fasta
  ...

The library in HDF5 (.hdf) format will be saved into library:output_folder. If library:output_tsv:enabled is True, a TSV spectral library that can be processed by DIA-NN and Spectronaut will also be saved into library:output_folder.


transfer

peptdeep transfer settings_yaml

This command applies transfer learning to refine the RT/CCS/MS2 models based on model_mgr:transfer:psm_files and model_mgr:transfer:psm_type. All yaml settings (exported by export-settings) related to this command are:

model_mgr:
  transfer:
    model_output_folder: "{PEPTDEEP_HOME}/refined_models"
    epoch_ms2: 20
    warmup_epoch_ms2: 10
    batch_size_ms2: 512
    lr_ms2: 0.0001
    epoch_rt_ccs: 40
    warmup_epoch_rt_ccs: 10
    batch_size_rt_ccs: 1024
    lr_rt_ccs: 0.0001
    verbose: False
    grid_nce_search: False
    grid_nce_first: 15.0
    grid_nce_last: 45.0
    grid_nce_step: 3.0
    grid_instrument: ['Lumos']
    psm_type: alphapept
    psm_type_choices:
      - alphapept
      - pfind
      - maxquant
      - diann
      - speclib_tsv
    psm_files: []
    ms_file_type: alphapept_hdf
    ms_file_type_choices:
      - alphapept_hdf
      - thermo_raw
      - mgf
      - mzml
    ms_files: []
    psm_num_to_train_ms2: 100000000
    psm_num_per_mod_to_train_ms2: 50
    psm_num_to_test_ms2: 0
    psm_num_to_train_rt_ccs: 100000000
    psm_num_per_mod_to_train_rt_ccs: 50
    psm_num_to_test_rt_ccs: 0
    top_n_mods_to_train: 10
    psm_modification_mapping: {}
    # alphabase modification to modifications of other search engines
    # For example,
    # psm_modification_mapping:
    #   Dimethyl@Any N-term:
    #     - _(Dimethyl-n-0)
    #     - _(Dimethyl)
    #   Dimethyl:2H(2)@K:
    #     - K(Dimethyl-K-2)
    #   ...

For DDA data, peptdeep can also extract MS2 intensities for all PSMs from the spectrum files given by model_mgr:transfer:ms_files and model_mgr:transfer:ms_file_type. This enables transfer learning of the MS2 model.

For DIA data, only RT and CCS (if timsTOF) models will be refined.

For example, in the settings yaml:

model_mgr:
  transfer:
    ...
    psm_type: pfind
    psm_files:
    - /path/to/pFind.spectra
    - /path/to/other/pFind.spectra

    ms_file_type: thermo_raw
    ms_files:
    - /path/to/raw1.raw
    - /path/to/raw2.raw
    ...

The refined models will be saved in model_mgr:transfer:model_output_folder. After transfer learning, users can apply the new models by pointing model_mgr:external_ms2_model, model_mgr:external_rt_model and model_mgr:external_ccs_model to the saved ms2.pth, rt.pth and ccs.pth in model_mgr:transfer:model_output_folder. This is useful for sample-specific library prediction.
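For example, a sketch of the corresponding yaml edit, assuming the default model_output_folder:

model_mgr:
  external_ms2_model: "{PEPTDEEP_HOME}/refined_models/ms2.pth"
  external_rt_model: "{PEPTDEEP_HOME}/refined_models/rt.pth"
  external_ccs_model: "{PEPTDEEP_HOME}/refined_models/ccs.pth"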


rescore

This command applies Percolator to rescore the DDA PSMs given by percolator:input_files:psm_files and percolator:input_files:psm_type. All yaml settings (exported by export-settings) related to this command are:

percolator:
  require_model_tuning: True
  raw_num_to_tune: 8

  require_raw_specific_tuning: True
  raw_specific_ms2_tuning: False
  psm_num_per_raw_to_tune: 200
  epoch_per_raw_to_tune: 5

  multiprocessing: True

  top_k_frags_to_calc_spc: 10
  calibrate_frag_mass_error: False
  max_perc_train_sample: 1000000
  min_perc_train_sample: 100

  percolator_backend: sklearn
  percolator_backend_choices:
    - sklearn
    - pytorch
  percolator_model: linear
  percolator_model_choices:
    pytorch_as_backend:
      - linear # not fully tested, performance may be unstable
      - mlp # not implemented yet
    sklearn_as_backend:
      - linear # logistic regression
      - random_forest
  lr_percolator_torch_model: 0.1 # learning rate, only used when percolator_backend==pytorch
  percolator_iter_num: 5 # percolator iteration number
  cv_fold: 1
  fdr: 0.01
  fdr_level: psm
  fdr_level_choices:
    - psm
    - precursor
    - peptide
    - sequence
  use_fdr_for_each_raw: False
  frag_types: ['b_z1','b_z2','y_z1','y_z2']
  input_files:
    psm_type: alphapept
    psm_type_choices:
      - alphapept
      - pfind
    psm_files: []
    ms_file_type: alphapept_hdf
    ms_file_type_choices:
      - alphapept_hdf
      - thermo_raw # if alpharaw is installed
      - mgf
      - mzml
    ms_files: []
    other_score_column_mapping:
      alphapept: {}
      pfind:
        raw_score: Raw_Score
      msfragger:
        hyperscore: hyperscore
        nextscore: nextscore
      maxquant: {}
  output_folder: "{PEPTDEEP_HOME}/rescore"

Transfer learning will be applied when rescoring if percolator:require_model_tuning is True.

The corresponding MS files (percolator:input_files:ms_files and percolator:input_files:ms_file_type) must be provided to extract experimental fragment intensities.
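For example, a minimal input_files sketch (paths are placeholders):

percolator:
  input_files:
    psm_type: pfind
    psm_files:
    - /path/to/pFind.spectra
    ms_file_type: thermo_raw
    ms_files:
    - /path/to/raw1.raw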


install-models

peptdeep install-models [--model-file url_or_local_model_zip] --overwrite True

When peptdeep runs for the first time, it will download and install the pretrained models from the GitHub release defined by model_url in the default yaml settings. This command updates pretrained_models.zip from --model-file url_or_local_model_zip.

It is also possible to use other models instead of the pretrained_models by providing model_mgr:external_ms2_model, model_mgr:external_rt_model and model_mgr:external_ccs_model.


Python and Jupyter notebooks

Using peptdeep from a Python script or notebook provides the most flexible way to access all features of peptdeep.

The following subsections introduce several usages of peptdeep from Python:


global_settings

Most of the default parameters and attributes of peptdeep functions and classes are controlled by peptdeep.settings.global_settings, which is a dict.

from peptdeep.settings import global_settings

The default values of global_settings are defined in default_settings.yaml.
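Settings can be overridden from Python before running a task; a minimal sketch, with keys mirroring the yaml shown above:

from peptdeep.settings import global_settings

# global_settings is a dict mirroring default_settings.yaml
global_settings['model_mgr']['default_instrument'] = 'timsTOF'
global_settings['model_mgr']['predict']['batch_size_ms2'] = 256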

Pipeline APIs

The pipeline APIs provide the same functionality as the CLI, including library prediction, transfer learning, and rescoring.

from peptdeep.pipeline_api import (
    generate_library,
    transfer_learn,
    rescore,
)

All these functions take a settings_dict as input; the dict structure is the same as the settings yaml file. See the documentation of generate_library, transfer_learn and rescore at https://alphapeptdeep.readthedocs.io/en/latest/module_pipeline_api.html.
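A minimal sketch of a library run from Python, using global_settings as the settings_dict (the fasta path is a placeholder):

from peptdeep.pipeline_api import generate_library
from peptdeep.settings import global_settings

settings_dict = global_settings  # same structure as the settings yaml
settings_dict['library']['infile_type'] = 'fasta'
settings_dict['library']['infiles'] = ['/path/to/human.fasta']
generate_library(settings_dict)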

ModelManager

from peptdeep.pretrained_models import ModelManager

The ModelManager class is the main entry point to the MS2/RT/CCS models. It provides functionality to train/refine the models and then use the new models to predict the data.

Check tutorial_model_manager.ipynb for details.
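A minimal sketch of instantiating the manager; the mask_modloss keyword is an assumption mirroring the model_mgr:mask_modloss yaml setting, see the tutorial notebook for the full API:

from peptdeep.pretrained_models import ModelManager

# load the installed pretrained MS2/RT/CCS models
# (mask_modloss mirrors model_mgr:mask_modloss; an assumption)
model_mgr = ModelManager(mask_modloss=True)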

Library Prediction

from peptdeep.protein.fasta import PredictSpecLibFasta

The PredictSpecLibFasta class provides functionality to work with fasta files or protein sequences and to build spectral libraries from them.

Check out tutorial_speclib_from_fasta.ipynb for details.

DDA Rescoring

from peptdeep.rescore.percolator import Percolator

The Percolator class provides functionality to rescore DDA PSMs searched by pFind and AlphaPept (and MaxQuant if the output FDR is 100%).

Check out test_percolator.ipynb for details.

HLA Peptide Prediction

from peptdeep.model.model_interface import ModelInterface
import peptdeep.model.generic_property_prediction # model shop

Building new DL models for peptide property prediction is one of the key features of AlphaPeptDeep. The key functionalities are ModelInterface and the pre-designed models and model interfaces in the model shop (module peptdeep.model.generic_property_prediction).

For example, we can build an HLA classifier that distinguishes HLA peptides from non-HLA peptides; see https://github.com/MannLabs/PeptDeep-HLA for details.


Troubleshooting

In case of issues, check out the following:

  • Issues. Try a few different search terms to find out if a similar problem has been encountered before.

  • Discussions. Check if your problem or feature request has been discussed before.


How to contribute

If you like this software, you can give us a star to boost our visibility! All direct contributions are also welcome. Feel free to post a new issue or clone the repository and create a pull request with a new branch. For an even more interactive participation, check out the discussions and the Contributors License Agreement.


Changelog

See the HISTORY.md for a full overview of the changes made in each version.

alphapeptdeep's People

Contributors

ammarcsj, dependabot[bot], georgwa, ibludau, jalew188, mo-sameh, mschwoer, straussmaximilian, swillems, yangkl96, zhouxiexuan


alphapeptdeep's Issues

Observability Prediction

Dear Developers,
I am new to the Alpha environment and have recently come across a project in which I have to select peptides for targeted detection.
Does your framework provide the possibility to predict relative ionization efficiency (MS1 intensity) between peptides, or could it be trained to do that? Or is there another resource that you can recommend? d::POP, PeptideRank, PREGO, MaxDB manual selection and PeptideProphet manual selection are all not really satisfying when I compare them to real-life data.

Any help / insight would be appreciated!

Best Klemens

Errors in Transfer

Hi

I used the command 'peptdeep transfer setting.yaml' to refine a model with my own data. The params look like this:

Screenshot from 2024-01-30 15-36-52

I tried two different psm types. One was 'msfragger_pepxml' and the other was 'maxquant', but I got the same error both times:
Screenshot from 2024-01-30 15-40-27

I do not know what is wrong with it. Could you please help me figure this out?
As you listed 'pfind, diann, speclib_tsv' in psm_type in the settings yaml file, I wonder if you could provide some example files to show the exact columns they must contain to be compatible with peptdeep.

Thanks in advance!

Best regards
Xiaoxiang

Library translate tsv: IndexError: index 4 is out of bounds for axis 1 with size 4

Hi, hope you are well!

Describe the bug
I get an error that an index is out of bounds when using the library command with mostly default settings, apart from newly added Phospho modifications. The spectral library generation works up to the HDF, but translating to TSV errors out and hangs with no progress being made, yet the process is not killed.

To Reproduce
Steps to reproduce the behavior:

  1. Install developer mode alphapeptdeep.
  2. conda activate peptdeep
  3. peptdeep export-settings /home/james.burgess/projects/JB240206_AlphaPeptDeep/AlphaPeptDeep/configs/default-settings.yaml
  4. Make some changes to the default-settings.yaml to add Phospho modifications and adjust file paths.
    default-settings.txt
  5. peptdeep library /home/james.burgess/projects/JB240206_AlphaPeptDeep/AlphaPeptDeep/configs/default-settings.yaml

Expected behavior
A tsv spectral library should also be produced that can be used in DIA-NN or Spectronaut.

Logs
Provided the error message from the terminal, which was the same as the log file. See context below.

Version (please complete the following information):

  • Installation Type: Developer
    2024-02-06 15:59:24> [PeptDeep] Running library task ...
    2024-02-06 15:59:24> Input files (fasta): ['/ibm/hpcfs1/tmp/JB230123_DATA/DATA/MaxDIA/fasta/JB230123_quantms_database.fasta']
    2024-02-06 15:59:24> Platform information:
    2024-02-06 15:59:24> system - Linux
    2024-02-06 15:59:24> release - 4.18.0-513.5.1.el8_9.x86_64
    2024-02-06 15:59:24> version - #1 SMP Fri Nov 17 03:31:10 UTC 2023
    2024-02-06 15:59:24> machine - x86_64
    2024-02-06 15:59:24> processor - x86_64
    2024-02-06 15:59:24> cpu count - 128
    2024-02-06 15:59:24> ram - 325.7/503.0 Gb (available/total)
    2024-02-06 15:59:24>
    2024-02-06 15:59:24> Python information:
    2024-02-06 15:59:24> alphabase - 1.2.1
    2024-02-06 15:59:24> alphabase> -
    2024-02-06 15:59:24> alpharaw - 0.4.2
    2024-02-06 15:59:24> alpharaw> -
    2024-02-06 15:59:24> biopython - 1.83
    2024-02-06 15:59:24> click - 8.1.7
    2024-02-06 15:59:24> lxml - 5.1.0
    2024-02-06 15:59:24> numba - 0.59.0
    2024-02-06 15:59:24> numpy - 1.26.3
    2024-02-06 15:59:24> pandas - 2.2.0
    2024-02-06 15:59:24> peptdeep - 1.1.5
    2024-02-06 15:59:24> psutil - 5.9.8
    2024-02-06 15:59:24> pyteomics - 4.6.3
    2024-02-06 15:59:24> python - 3.9.18
    2024-02-06 15:59:24> scikit-learn - 1.4.0
    2024-02-06 15:59:24> streamlit - 1.30.0
    2024-02-06 15:59:24> streamlit-aggrid - 0.3.4.post3
    2024-02-06 15:59:24> streamlit> -
    2024-02-06 15:59:24> torch - 2.2.0
    2024-02-06 15:59:24> tqdm - 4.66.1
    2024-02-06 15:59:24> transformers - 4.37.2

Additional context
error_message.txt

Error in 'peptdeep --install-model'

Hi

Sorry to bother you.
I have peptdeep installed on my Linux system.
When I ran the command 'peptdeep library settings_yaml', I got an error like this:
Screenshot from 2024-01-22 20-38-53

So, I downloaded 'pretrained_models.zip' manually and then ran the command "peptdeep install-models --model-file ./pretrained_models.zip --overwrite True", but I got the same error.

All I did was follow the instructions.
Please help to fix the problem, thanks very much!

Best
Xiaoxiang

Redundant protein names for one precursor entry in the spectral library

If a protein sequence contains the same peptide at different positions, the peptide/precursor entry will include redundant protein ids/names. For example:

>prot_1
ABCKABCK
>prot_2
ABCK

The 'ABCK' entry will have protein_idxes '0;0;1' and protein names 'prot_1;prot_1;prot_2'. The correct ones should be '0;1' and 'prot_1;prot_2'.

This has been fixed in the development branch of alphabase (6f8f333f1aa6e2c661b8b58ab65845a278ea75c1) and will be released soon for pypi and the installers.

Crashing on prediction from peptide list

Describe the bug
Hi, the GUI outputs an error when submitting a job from a sequence_table:

2023-08-24 09:41:14> Downloading pretrained_models.zip ...
2023-08-24 09:41:16> The pretrained models had been downloaded in C:\Users\slavat.WISMAIN/peptdeep\pretrained_models\pretrained_models.zip
Starting PeptDeep Web Server on port 10077 ...

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:10077
  Network URL: http://132.77.89.166:10077

*********************************
[PeptDeep] Waiting for tasks ...
*********************************
[PeptDeep] Starting a new job 'C:\Users\slavat.WISMAIN/peptdeep/tasks/queue\peptdeep_library_2023-08-24--11-03-00.912749.yaml'...
[PeptDeep] Predicting library ...
2023-08-24 11:03:01> Platform information:
2023-08-24 11:03:01> system        - Windows
2023-08-24 11:03:01> release       - 10
2023-08-24 11:03:01> version       - 10.0.19044
2023-08-24 11:03:01> machine       - AMD64
2023-08-24 11:03:01> processor     - AMD64 Family 23 Model 96 Stepping 1, AuthenticAMD
2023-08-24 11:03:01> cpu count     - 16
2023-08-24 11:03:01> ram           - 23.5/31.4 Gb (available/total)
2023-08-24 11:03:01>
2023-08-24 11:03:01> Python information:
2023-08-24 11:03:01> alphabase        - 1.0.2
2023-08-24 11:03:01> biopython        - 1.81
2023-08-24 11:03:01> click            - 8.1.3
2023-08-24 11:03:01> lxml             - 4.9.2
2023-08-24 11:03:01> numba            - 0.56.4
2023-08-24 11:03:01> numpy            - 1.23.5
2023-08-24 11:03:01> pandas           - 1.5.3
2023-08-24 11:03:01> peptdeep         - 1.0.2
2023-08-24 11:03:01> psutil           - 5.9.4
2023-08-24 11:03:01> python           - 3.9.16
2023-08-24 11:03:01> scikit-learn     - 1.2.1
2023-08-24 11:03:01> streamlit        - 1.19.0
2023-08-24 11:03:01> streamlit-aggrid - 0.3.3
2023-08-24 11:03:01> torch            - 1.13.1
2023-08-24 11:03:01> tqdm             - 4.64.1
2023-08-24 11:03:01> transformers     - 4.26.1
2023-08-24 11:03:01>
2023-08-24 11:03:05> Generating the spectral library ...
2023-08-24 11:03:05> Traceback (most recent call last):
  File "peptdeep\pipeline_api.py", line 308, in generate_library
    lib_maker.make_library(df)
  File "peptdeep\spec_lib\library_factory.py", line 101, in make_library
    self._input(_input)
  File "peptdeep\spec_lib\library_factory.py", line 210, in _input
    self.spec_lib.append_decoy_sequence()
  File "alphabase\spectral_library\base.py", line 166, in append_decoy_sequence
    decoy_lib.decoy_sequence()
  File "alphabase\spectral_library\decoy.py", line 65, in decoy_sequence
    self._decoy_seq()
  File "alphabase\spectral_library\decoy.py", line 74, in _decoy_seq
    ) = self._precursor_df.sequence.apply(
  File "pandas\core\generic.py", line 5902, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'sequence'

'DataFrame' object has no attribute 'sequence'

To Reproduce
Steps to reproduce the behavior:

  1. Start GUI
  2. Put a sequence_table (attached) in the library
  3. Keep everything else default and submit

[peptide_list.txt](https://github.com/MannLabs/alphapeptdeep/files/12427127/peptide_list.txt)
peptide_list.zip

Retrain model from a spectral library tsv file containing custom modifications

Hi

I am trying to retrain the MS2 model using a spectral library file (tsv format) which contains a custom modification on cysteine.

Here is the output I got; it looks like most of the spectra were skipped because of the unknown modification. Could you help me figure out which part I did wrong? I have attached the yaml file and a trimmed version of the library tsv file.
Thank you very much in advance.

peptdeep_transfer_2024-01-05--11-10-14.400893.yaml.txt
NCIH_KRAS_GPF_Library_1_plus_2.report-lib_trimmed.tsv.txt

C:\>peptdeep transfer C:\Users\Chih-ChiangTsou\peptdeep\peptdeep_transfer_2024-01-05--11-10-14.400893.yaml

     ____             __  ____
    / __ \___  ____  / /_/ __ \___  ___  ____
   / /_/ / _ \/ __ \/ __/ / / / _ \/ _ \/ __ \
  / ____/  __/ /_/ / /_/ /_/ /  __/  __/ /_/ /
 /_/    \___/ .___/\__/_____/\___/\___/ .___/
           /_/                       /_/
....................................................
.                      1.1.1                       .
.       https://github.com/MannLabs/peptdeep       .
.                    Apache 2.0                    .
....................................................

2024-01-05 11:25:58> [PeptDeep] Running train task ...
2024-01-05 11:25:58> Platform information:
2024-01-05 11:25:58> system        - Windows
2024-01-05 11:25:58> release       - 10
2024-01-05 11:25:58> version       - 10.0.22631
2024-01-05 11:25:58> machine       - AMD64
2024-01-05 11:25:58> processor     - Intel64 Family 6 Model 141 Stepping 1, GenuineIntel
2024-01-05 11:25:58> cpu count     - 16
2024-01-05 11:25:58> ram           - 56.0/79.7 Gb (available/total)
2024-01-05 11:25:58>
2024-01-05 11:25:58> Python information:
2024-01-05 11:25:58> alphabase        - 1.2.0
2024-01-05 11:25:58> alpharaw         - 0.4.0
2024-01-05 11:25:58> biopython        - 1.82
2024-01-05 11:25:58> click            - 8.1.7
2024-01-05 11:25:58> lxml             - 5.0.0
2024-01-05 11:25:58> numba            - 0.58.1
2024-01-05 11:25:58> numpy            - 1.22.3
2024-01-05 11:25:58> pandas           - 1.4.2
2024-01-05 11:25:58> peptdeep         - 1.1.1
2024-01-05 11:25:58> psutil           - 5.9.7
2024-01-05 11:25:58> pyteomics        - 4.6.3
2024-01-05 11:25:58> python           - 3.10.4
2024-01-05 11:25:58> scikit-learn     - 1.3.2
2024-01-05 11:25:58> streamlit        - 1.29.0
2024-01-05 11:25:58> streamlit-aggrid - 0.3.4.post3
2024-01-05 11:25:58> torch            - 2.1.2
2024-01-05 11:25:58> tqdm             - 4.66.1
2024-01-05 11:25:58> transformers     - 4.36.2
2024-01-05 11:25:58>
2024-01-05 11:26:00> Loading PSMs and extracting fragments ...
794604 Entries with unknown modifications are removed
100%|████████████████████████████████████████████████████████████████████████████| 4052/4052 [00:01<00:00, 3027.46it/s]
2024-01-05 11:26:21> Loaded 4052 PSMs for training and testing
2024-01-05 11:26:21> Training RT model ...
2024-01-05 11:26:21> 3862 PSMs for RT model training/transfer learning
2024-01-05 11:26:21> Training with fixed sequence length: 0
[Training] Epoch=1, lr=1e-05, loss=0.13977318226049343

Error Occurs When Setting 'Min Number of Variable Modifications' to 1

Describe the bug
I am encountering an error when attempting to set the 'Min number of variable modifications' to 1 on the Library page of v1.0.2 of the Windows GUI.

Steps to Reproduce

  1. Open the Windows GUI version 1.0.2.
  2. Navigate to the Library page.
  3. Upload any FASTA file.
  4. Leave all settings at their default values, except for the 'Min number of variable modifications', which should be set to 1.

Logs

2023-11-08 17:18:26> Traceback (most recent call last):
  File "peptdeep\pipeline_api.py", line 302, in generate_library
    lib_maker.make_library(lib_settings['infiles'])
  File "peptdeep\spec_lib\library_factory.py", line 103, in make_library
    self._predict()
  File "peptdeep\spec_lib\library_factory.py", line 67, in _predict
    self.spec_lib.predict_all()
  File "peptdeep\spec_lib\predict_lib.py", line 111, in predict_all
    self.calc_precursor_mz()
  File "alphabase\spectral_library\base.py", line 186, in calc_precursor_mz
    fragment.update_precursor_mz(self._precursor_df)
  File "alphabase\peptide\precursor.py", line 108, in update_precursor_mz
    pep_mzs = calc_peptide_masses_for_same_len_seqs(
  File "alphabase\peptide\mass_calc.py", line 164, in calc_peptide_masses_for_same_len_seqs
    if len(mods) > 0:
TypeError: object of type 'float' has no len()

Version

  2023-11-08 17:11:51> Platform information:
  2023-11-08 17:11:51> system        - Windows
  2023-11-08 17:11:51> release       - 10
  2023-11-08 17:11:51> version       - 10.0.19045
  2023-11-08 17:11:51> machine       - AMD64
  2023-11-08 17:11:51> processor     - Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
  2023-11-08 17:11:51> cpu count     - 8
  2023-11-08 17:11:51> ram           - 11.8/31.7 Gb (available/total)
  2023-11-08 17:11:51>
  2023-11-08 17:11:51> Python information:
  2023-11-08 17:11:51> alphabase        - 1.0.0
  2023-11-08 17:11:51> biopython        - 1.80
  2023-11-08 17:11:51> click            - 8.1.3
  2023-11-08 17:11:51> lxml             - 4.9.2
  2023-11-08 17:11:51> numba            - 0.56.4
  2023-11-08 17:11:51> numpy            - 1.23.5
  2023-11-08 17:11:51> pandas           - 1.5.2
  2023-11-08 17:11:51> peptdeep         - 1.0.2
  2023-11-08 17:11:51> psutil           - 5.9.4
  2023-11-08 17:11:51> python           - 3.9.16
  2023-11-08 17:11:51> scikit-learn     - 1.2.0
  2023-11-08 17:11:51> streamlit        - 1.16.0
  2023-11-08 17:11:51> streamlit-aggrid - 0.3.3
  2023-11-08 17:11:51> torch            - 1.13.1
  2023-11-08 17:11:51> tqdm             - 4.64.1
  2023-11-08 17:11:51> transformers     - 4.25.1

Transfer learning to refine predicted library

Hello,
I am learning to refine our predicted spectral library to better match various samples (various proteolytic peptides, novel modifications, etc.). I am starting with standard unmodified tryptic peptides to ensure I am using AlphaPeptDeep correctly, but I have run into issues with the file uploads. I have tried importing various speclib_tsv libraries (initially with, then without, modifications) as well as diann reports, but am running into the same errors. I did not locate any log output, but attached is the output from my Windows PowerShell:
2024-02-22_log.txt

I suspect I am loading inappropriate/invalid files. Are there any example file formats that I can test with?

Is it possible to get 1/K0 mobility instead of CCS from AlphaPept CCS predictions?

I am using the AlphaPept_ccs predictor through the Koina server.
https://koina.proteomicsdb.org/docs/#post-/AlphaPept_ccs_generic/infer

The only described output is 'ccs'. I am seeking to compare the predictions with measured ion mobilities in units of 1/K0 from our Bruker timsTOF SCP instrument. Is there an undocumented option to get the output directly as 1/K0? Or do I need to do that myself?

The following sentence is in the methods section of your paper (Zeng et al., Nat Commun 2022):
The predicted CCS values are converted to mobilities of Bruker timsTOF using the Mason Schamp equation.

Thanks,
--Karl

How to include 3+ fragments?

Hi,

We are trying to identify longer peptides, so higher charge states and 3+ fragments become important to us.
I have a few DDA libraries with 5+ and 6+ peptide ions included, along with all 1+, 2+, and 3+ fragments. I changed the yaml file

max_frag_charge: 3

and I was able to perform transfer learning without problems, but after I generated a predicted library from a FASTA file, still only 1+ and 2+ fragments were included in the tsv file.

Could you please advise me on how to include 3+ fragments?
Here is my yaml file:

model:
  frag_types:
  - b
  - y
  - b_modloss
  - y_modloss
  max_frag_charge: 3
PEPTDEEP_HOME: C:\Users\Administrator/peptdeep
local_model_zip_name: pretrained_models.zip
model_url: https://github.com/MannLabs/alphapeptdeep/releases/download/pre-trained-models/pretrained_models.zip
task_workflow:
- library
task_choices:
- train
- library
thread_num: 40
MAX_THREADS: 60
torch_device:
  device_type: gpu
  device_type_choices:
  - get_available
  - gpu
  - mps
  - cpu
  device_ids: []
log_level: info
log_level_choices:
- debug
- info
- warning
- error
- critical
common:
  modloss_importance_level: 1.0
  user_defined_modifications:
    IADTB@C:
      composition: H(24)C(14)N(4)O(3)
      modloss_composition: ''
peak_matching:
  ms2_ppm: true
  ms2_tol_value: 20.0
  ms1_ppm: true
  ms1_tol_value: 20.0
model_mgr:
  default_nce: 30.0
  default_instrument: Lumos
  mask_modloss: true
  model_type: generic
  model_choices:
  - generic
  - phos
  - hla
  - digly
  external_ms2_model: C:/Users/Administrator/peptdeep/refined_models_v3/ms2.pth
  external_rt_model: C:/Users/Administrator/peptdeep/refined_models_v3/rt.pth
  external_ccs_model: C:/Users/Administrator/peptdeep/refined_models_v3/ccs.pth
  instrument_group:
    ThermoTOF: ThermoTOF
    Astral: ThermoTOF
    Lumos: Lumos
    QE: QE
    timsTOF: timsTOF
    SciexTOF: SciexTOF
    Fusion: Lumos
    Eclipse: Lumos
    Velos: Lumos
    Elite: Lumos
    OrbitrapTribrid: Lumos
    ThermoTribrid: Lumos
    QE+: QE
    QEHF: QE
    QEHFX: QE
    Exploris: QE
    Exploris480: QE
    THERMOTOF: ThermoTOF
    ASTRAL: ThermoTOF
    LUMOS: Lumos
    TIMSTOF: timsTOF
    SCIEXTOF: SciexTOF
    FUSION: Lumos
    ECLIPSE: Lumos
    VELOS: Lumos
    ELITE: Lumos
    ORBITRAPTRIBRID: Lumos
    THERMOTRIBRID: Lumos
    EXPLORIS: QE
    EXPLORIS480: QE
  predict:
    batch_size_ms2: 512
    batch_size_rt_ccs: 1024
    verbose: true
    multiprocessing: true
  transfer:
    model_output_folder: C:/Users/Administrator/peptdeep/refined_models
    epoch_ms2: 20
    warmup_epoch_ms2: 10
    batch_size_ms2: 512
    lr_ms2: 0.0001
    epoch_rt_ccs: 40
    warmup_epoch_rt_ccs: 10
    batch_size_rt_ccs: 1024
    lr_rt_ccs: 0.0001
    verbose: false
    grid_nce_search: false
    grid_nce_first: 15.0
    grid_nce_last: 45.0
    grid_nce_step: 3.0
    grid_instrument:
    - Lumos
    psm_type: alphapept
    psm_type_choices:
    - alphapept
    - pfind
    - maxquant
    - diann
    - speclib_tsv
    - msfragger_pepxml
    - spectronaut_report
    dda_psm_types:
    - alphapept
    - pfind
    - maxquant
    - msfragger_pepxml
    psm_files: []
    ms_file_type: alphapept_hdf
    ms_file_type_choices:
    - alphapept_hdf
    - thermo_raw
    - mgf
    - mzml
    ms_files: []
    psm_num_to_train_ms2: 100000000
    psm_num_per_mod_to_train_ms2: 50
    psm_num_to_test_ms2: 0
    psm_num_to_train_rt_ccs: 100000000
    psm_num_per_mod_to_train_rt_ccs: 50
    psm_num_to_test_rt_ccs: 0
    top_n_mods_to_train: 10
    psm_modification_mapping: {}
library:
  infile_type: fasta
  infile_type_choices:
  - fasta
  - sequence_table
  - peptide_table
  - precursor_table
  - all_other_psm_reader_types
  infiles:
  - C:/Users/Administrator/peptdeep/uniprot_swissprot_20200903_can_iso_with_LOH_altAlleles_Ver3_20230808_fragpipe.fasta
  fasta:
    protease: ([DE])
    protease_choices:
    - trypsin
    - ([KR])
    - trypsin_not_P
    - ([KR](?=[^P]))
    - lys-c
    - K
    - lys-n
    - \w(?=K)
    - chymotrypsin
    - asp-n
    - glu-c
    max_miss_cleave: 5
    add_contaminants: false
  fix_mods:
  - IADTB@C
  var_mods:
  - Acetyl@Protein_N-term
  - Oxidation@M
  special_mods: []
  special_mods_cannot_modify_pep_n_term: false
  special_mods_cannot_modify_pep_c_term: false
  labeling_channels: {}
  min_var_mod_num: 0
  max_var_mod_num: 2
  min_special_mod_num: 0
  max_special_mod_num: 1
  min_precursor_charge: 2
  max_precursor_charge: 6
  min_peptide_len: 5
  max_peptide_len: 50
  min_precursor_mz: 200.0
  max_precursor_mz: 2000.0
  decoy: None
  decoy_choices:
  - protein_reverse
  - pseudo_reverse
  - diann
  - None
  max_frag_charge: 3
  frag_types:
  - b
  - y
  rt_to_irt: false
  irt_library: xxx/library.tsv
  irt_library_type: speclib_tsv
  generate_precursor_isotope: false
  output_folder: C:/Users/Administrator/peptdeep/spec_libs_uniprot_swissprot_20200903_can_iso_with_LOH_altAlleles_Ver3_20230808_fragpipe_gluc_v3_nodecoy
  output_tsv:
    enabled: true
    min_fragment_mz: 200.0
    max_fragment_mz: 2000.0
    min_relative_intensity: 0.001
    keep_higest_k_peaks: 12
    translate_batch_size: 100000
    translate_mod_to_unimod_id: false

Is it possible to predict 'a, x, c,z' fragment ions?

Hi

I tried to predict a spectral library from a precursor table, and I set frag_types to 'b,y,a' in setting.yaml. The hdf file was saved successfully, but translation to tsv failed.

I wonder if it is possible to predict 'a, x, c, z' fragment ions in the spectral library? If so, how can I translate the hdf to tsv? Thanks!

Following is the log:
Screenshot from 2024-01-29 16-04-49

Best regards
Xiaoxiang

Generic questions about pretrained models

Hello,

I have two questions about the pretrained models.

  1. Were they only trained on HCD data?
  2. Were the "phos" models only trained on ProteomeTools Phospho@Y or also Phospho@ST? And were the other 20 PTMs not considered in the training?

Thanks,
Kevin

Peptdeep GUI error in new version

Hi,

When I try to start a GUI from CLI, I get the following error. Could you please tell me what is wrong here?

Thanks
Maithy

Here is the error

(peptdeep) C:\Users\m284y>peptdeep gui

     ____             __  ____
    / __ \___  ____  / /_/ __ \___  ___  ____
   / /_/ / _ \/ __ \/ __/ / / / _ \/ _ \/ __ \
  / ____/  __/ /_/ / /_/ /_/ /  __/  __/ /_/ /
 /_/    \___/ .___/\__/_____/\___/\___/ .___/
           /_/                       /_/
....................................................
.                      1.1.0                       .
.       https://github.com/MannLabs/peptdeep       .
.                    Apache 2.0                    .
....................................................

Traceback (most recent call last):
  File "C:\Users\m284y\.conda\envs\peptdeep\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\m284y\.conda\envs\peptdeep\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\m284y\.conda\envs\peptdeep\Scripts\peptdeep.exe\__main__.py", line 7, in <module>
  File "C:\Users\m284y\.conda\envs\peptdeep\lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\m284y\.conda\envs\peptdeep\lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "C:\Users\m284y\.conda\envs\peptdeep\lib\site-packages\click\core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Users\m284y\.conda\envs\peptdeep\lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\m284y\.conda\envs\peptdeep\lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "C:\Users\m284y\.conda\envs\peptdeep\lib\site-packages\peptdeep\cli.py", line 56, in _gui
    from peptdeep.webui.server import _server
  File "C:\Users\m284y\.conda\envs\peptdeep\lib\site-packages\peptdeep\webui\server.py", line 7, in <module>
    from peptdeep.pipeline_api import (
ImportError: cannot import name 'rescore' from 'peptdeep.pipeline_api' (C:\Users\m284y\.conda\envs\peptdeep\lib\site-packages\peptdeep\pipeline_api.py)

Error in installing by Developer installer

Hi

Thanks for your work in alphapeptdeep!

I followed the instructions in the Developer installer section to install alphapeptdeep on my Linux server. Because the /home folder is very small, I intend to install into the /tools folder. Here is my command:

(peptdeep) user@linux:/tools/alphapeptdeep/project/alphapeptdeep$ sudo pip install -e ".[development]" -t /tools/anaconda3/envs/peptdeep/bin

then I got an error like this:
Screenshot from 2024-01-19 22-29-27

Please help how to fix this problem. Thanks very much!

Best
Xiaoxiang

normalize_fragment_intensities not normalizing intensities

Describe the bug
On my local machine, training losses looked regular. When I used a linux server, the training losses were huge (see attached screenshot). I pinned it down to this line in peptdeep.model.ms2.normalize_fragment_intensities:
frag_intensity_df.values[frag_start_idx:frag_stop_idx,:] = intens

The use of values instead of iloc did not update frag_intensity_df. Could I recommend iloc here, along with any other places where values are updated in a pandas DataFrame?
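A sketch of the fix suggested above, assigning through iloc so the write lands in the DataFrame itself:

# suggested fix: assign through iloc instead of .values
frag_intensity_df.iloc[frag_start_idx:frag_stop_idx, :] = intens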

Screenshots
image

Version (please complete the following information):
On local machine
2023-05-21 19:09:47> Python information:
2023-05-21 19:09:47> alphabase - 1.0.2
2023-05-21 19:09:47> biopython - 1.81
2023-05-21 19:09:47> click - 8.1.3
2023-05-21 19:09:47> lxml - 4.9.1
2023-05-21 19:09:47> numba - 0.57.0
2023-05-21 19:09:47> numpy - 1.24.3
2023-05-21 19:09:47> pandas - 1.5.3
2023-05-21 19:09:47> peptdeep - 1.0.2
2023-05-21 19:09:47> psutil - 5.9.0
2023-05-21 19:09:47> python - 3.9.16
2023-05-21 19:09:47> scikit-learn - 1.2.2
2023-05-21 19:09:47> streamlit - 1.20.0
2023-05-21 19:09:47> streamlit-aggrid - 0.3.4.post3
2023-05-21 19:09:47> torch - 1.12.1
2023-05-21 19:09:47> tqdm - 4.65.0
2023-05-21 19:09:47> transformers - 4.24.0

On Linux server
2023-05-21 19:32:42> Python information:
2023-05-21 19:32:42> alphabase - 1.0.2
2023-05-21 19:32:42> biopython - 1.79
2023-05-21 19:32:42> click - 8.1.3
2023-05-21 19:32:42> lxml - 4.9.2
2023-05-21 19:32:42> numba - 0.53.0
2023-05-21 19:32:42> numpy - 1.20.3
2023-05-21 19:32:42> pandas - 2.0.1
2023-05-21 19:32:42> peptdeep - 1.0.2
2023-05-21 19:32:42> psutil - 5.9.5
2023-05-21 19:32:42> python - 3.9.12
2023-05-21 19:32:42> scikit-learn - 1.2.2
2023-05-21 19:32:42> streamlit - 1.12.0
2023-05-21 19:32:42> streamlit-aggrid - 0.3.4.post3
2023-05-21 19:32:42> torch - 1.12.1
2023-05-21 19:32:42> tqdm - 4.65.0
2023-05-21 19:32:42> transformers - 4.23.1

problem of reading pFind.spectra

Describe the bug
Hello, I am a student at Zhejiang University. After reading the paper, I want to use it to facilitate better performance in DDA identification of HLA peptides. But when I used the transfer function to fine-tune the model with pFind PSMs, I met a problem like this:
but when I use its transfer function to fine tune the model with pFind psm, I met problem like this.
Screenshot 2023-01-03 20 43 16

Here is the head of my pFind spectra file:
Screenshot 2023-01-03 20 50 35

here is my "transfer" parameters in setting.yaml
截屏2023-01-03 20 44 55

I am sure I installed it in the conda virtual environment successfully, because I can use the library function and it runs well.

Any idea of how to deal with this would be much appreciated.

Transfer learning failed (GUI and CLI)

match_psms() failed during transfer learning in peptdeep.pipeline_api. In the latest version, match_psms() does not take arguments anymore; global_settings controls all parameters for all pipeline APIs.

Missing dependencies for nbs

Hi,
I installed alphapeptdeep locally and tried running some of the notebooks (test_ccs_rt).
Here, some dependencies were missing:

  • lxml (for reading the results-html with pandas)
  • wget (for downloading the files)

wget is in the development requirements.

As a potential solution, one could do one of the following:
(1) include lxml in the dev requirements and make a note at the beginning of the notebooks that dev needs to be installed
(2) include lxml and wget in the default requirements
(3) Download the additional requirements in the notebooks (!pip install lxml)

Adding mods into AlphaBase

Hi,

I would like to train a model to predict RT/IM for peptides with a custom modification from DDA data (~10,000 modified peptide IDs output from MSFragger into a tsv library). I'm using PeptDeep via the GUI and am able to add my modification into "User-defined modifications" (and have it successfully update in the yaml file). I click to add the mod into AlphaBase but when I go back to the "Transfer" section to set up the training, the modification is not found in the AlphaBase mod search bar.

Please let me know if I'm misunderstanding the process to train a custom model for DIA library prediction of custom modifications and thanks in advance for any help!

Matt

Move unrelated functionalities to other packages

This issue is to make APD cleaner, as APD is now doing too much unrelated work:

  • Move MS data readers to AlphaRaw.
  • Move alphapeptdeep.spec_lib.translate.py to AlphaBase.
  • Move alphapeptdeep.mass_spec to AlphaRaw.
  • Change utils as a folder
  • TODO move alphapeptdeep.rescore to somewhere else.

Windows release failed

pyinstaller:
Unable to find "C:\Miniconda\envs\peptdeep_installer\Library\bin\libcrypto-1_1-x64.dll" when adding binary and data files

how to create evidence and msms.txt files for maxquant search from a fasta

I am trying to create evidence and msms.txt files for a MaxQuant search from a fasta file, but I am not sure how to go about it.

I have tried psm_type: maxquant in peptdeep library settings.yaml:

model:
  frag_types:
  - b
  - y
  - b_modloss
  - y_modloss
  max_frag_charge: 2
PEPTDEEP_HOME: /home/ash022/peptdeep
local_model_zip_name: pretrained_models.zip
model_url: https://github.com/MannLabs/alphapeptdeep/releases/download/pre-trained-models/pretrained_models.zip
task_workflow:
- library
task_choices:
- train
- library
thread_num: 16
torch_device:
  device_type: gpu
  device_type_choices:
  - get_available
  - gpu
  - mps
  - cpu
  device_ids: []
log_level: info
log_level_choices:
- debug
- info
- warning
- error
- critical
common:
  modloss_importance_level: 1.0
  user_defined_modifications: {}
peak_matching:
  ms2_ppm: true
  ms2_tol_value: 20.0
  ms1_ppm: true
  ms1_tol_value: 20.0
model_mgr:
  default_nce: 30.0
  default_instrument: Lumos
  mask_modloss: true
  model_type: generic
  model_choices:
  - generic
  - phos
  - hla
  - digly
  external_ms2_model: ''
  external_rt_model: ''
  external_ccs_model: ''
  instrument_group:
    ThermoTOF: ThermoTOF
    Astral: ThermoTOF
    Lumos: Lumos
    QE: QE
    timsTOF: timsTOF
    SciexTOF: SciexTOF
    Fusion: Lumos
    Eclipse: Lumos
    Velos: Lumos
    Elite: Lumos
    OrbitrapTribrid: Lumos
    ThermoTribrid: Lumos
    QE+: QE
    QEHF: QE
    QEHFX: QE
    Exploris: QE
    Exploris480: QE
    THERMOTOF: ThermoTOF
    ASTRAL: ThermoTOF
    LUMOS: Lumos
    TIMSTOF: timsTOF
    SCIEXTOF: SciexTOF
    FUSION: Lumos
    ECLIPSE: Lumos
    VELOS: Lumos
    ELITE: Lumos
    ORBITRAPTRIBRID: Lumos
    THERMOTRIBRID: Lumos
    EXPLORIS: QE
    EXPLORIS480: QE
  predict:
    batch_size_ms2: 512
    batch_size_rt_ccs: 1024
    verbose: true
    multiprocessing: true
  transfer:
    model_output_folder: /home/ash022/peptdeep/refined_models
    epoch_ms2: 20
    warmup_epoch_ms2: 10
    batch_size_ms2: 512
    lr_ms2: 0.0001
    epoch_rt_ccs: 40
    warmup_epoch_rt_ccs: 10
    batch_size_rt_ccs: 1024
    lr_rt_ccs: 0.0001
    verbose: false
    grid_nce_search: false
    grid_nce_first: 15.0
    grid_nce_last: 45.0
    grid_nce_step: 3.0
    grid_instrument:
    - Lumos
    psm_type: maxquant
    psm_type_choices:
    - alphapept
    - pfind
    - maxquant
    - diann
    - speclib_tsv
    - msfragger_pepxml
    - spectronaut_report
    dda_psm_types:
    - alphapept
    - pfind
    - maxquant
    - msfragger_pepxml
    psm_files: []
    ms_file_type: alphapept_hdf
    ms_file_type_choices:
    - alphapept_hdf
    - thermo_raw
    - mgf
    - mzml
    ms_files: []
    psm_num_to_train_ms2: 100000000
    psm_num_per_mod_to_train_ms2: 50
    psm_num_to_test_ms2: 0
    psm_num_to_train_rt_ccs: 100000000
    psm_num_per_mod_to_train_rt_ccs: 50
    psm_num_to_test_rt_ccs: 0
    top_n_mods_to_train: 10
    psm_modification_mapping: {}
library:
  infile_type: fasta
  infile_type_choices:
  - fasta
  - sequence_table
  - peptide_table
  - precursor_table
  - all_other_psm_reader_types
  infiles:
  - /home/ash022/FastaDB/UP000005640_9606.fasta
  fasta:
    protease: trypsin
    protease_choices:
    - trypsin
    - ([KR])
    - trypsin_not_P
    - ([KR](?=[^P]))
    - lys-c
    - K
    - lys-n
    - \w(?=K)
    - chymotrypsin
    - asp-n
    - glu-c
    max_miss_cleave: 2
    add_contaminants: false
  fix_mods:
  - Carbamidomethyl@C
  var_mods:
  - Acetyl@Protein_N-term
  - Oxidation@M
  special_mods: []
  special_mods_cannot_modify_pep_n_term: false
  special_mods_cannot_modify_pep_c_term: false
  labeling_channels: {}
  min_var_mod_num: 0
  max_var_mod_num: 2
  min_special_mod_num: 0
  max_special_mod_num: 1
  min_precursor_charge: 2
  max_precursor_charge: 4
  min_peptide_len: 7
  max_peptide_len: 35
  min_precursor_mz: 200.0
  max_precursor_mz: 2000.0
  decoy: pseudo_reverse
  decoy_choices:
  - protein_reverse
  - pseudo_reverse
  - diann
  - None
  max_frag_charge: 2
  frag_types:
  - b
  - y
  rt_to_irt: false
  generate_precursor_isotope: false
  output_folder: /home/ash022/peptdeep/spec_libs
  output_tsv:
    enabled: false
    min_fragment_mz: 200.0
    max_fragment_mz: 2000.0
    min_relative_intensity: 0.001
    keep_higest_k_peaks: 12
    translate_batch_size: 100000
    translate_mod_to_unimod_id: false

but I am still getting the HDF output, which I am not sure how to convert to evidence.txt and msms.txt for MaxQuant.

The generated log is the following:

2023-11-23 14:47:11> [PeptDeep] Running library task ...
2023-11-23 14:47:11> Input files (fasta): ['/home/ash022/FastaDB/UP000005640_9606.fasta']
2023-11-23 14:47:11> Platform information:
2023-11-23 14:47:11> system        - Linux
2023-11-23 14:47:11> release       - 4.18.0-372.9.1.el8.x86_64
2023-11-23 14:47:11> version       - #1 SMP Tue May 10 14:48:47 UTC 2022
2023-11-23 14:47:11> machine       - x86_64
2023-11-23 14:47:11> processor     - x86_64
2023-11-23 14:47:11> cpu count     - 255
2023-11-23 14:47:11> ram           - 846.4/1007.4 Gb (available/total)
2023-11-23 14:47:11> 
2023-11-23 14:47:11> Python information:
2023-11-23 14:47:11> alphabase        - 1.1.1
2023-11-23 14:47:11> alpharaw         - 0.2.0
2023-11-23 14:47:11> biopython        - 1.81
2023-11-23 14:47:11> click            - 8.1.3
2023-11-23 14:47:11> lxml             - 4.9.1
2023-11-23 14:47:11> numba            - 0.55.2
2023-11-23 14:47:11> numpy            - 1.22.0
2023-11-23 14:47:11> pandas           - 1.4.3
2023-11-23 14:47:11> peptdeep         - 1.1.0
2023-11-23 14:47:11> psutil           - 5.9.1
2023-11-23 14:47:11> pyteomics        - 4.6.3
2023-11-23 14:47:11> python           - 3.10.5
2023-11-23 14:47:11> scikit-learn     - 1.1.2
2023-11-23 14:47:11> streamlit        - 1.28.2
2023-11-23 14:47:11> streamlit-aggrid - 0.3.4.post3
2023-11-23 14:47:11> torch            - 1.12.1
2023-11-23 14:47:11> tqdm             - 4.64.0
2023-11-23 14:47:11> transformers     - 4.35.2
2023-11-23 14:47:11> 
2023-11-23 14:47:16> Generating the spectral library ...
2023-11-23 14:50:08> Predicting RT/IM/MS2 for 20537685 precursors ...
2023-11-23 14:50:08> Predicting RT ...
2023-11-23 14:54:17> Predicting mobility ...
2023-11-23 15:01:02> Predicting MS2 ...
2023-11-23 15:11:07> End predicting RT/IM/MS2
2023-11-23 15:11:07> Predicting the spectral library with 20537685 precursors and 1439.50M fragments used 19.5068 GB memory
2023-11-23 15:11:07> Saving HDF library to /home/ash022/peptdeep/spec_libs/predict.speclib.hdf ...
2023-11-23 15:13:06> Library generated!!
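
For reference, the HDF library itself can at least be inspected from Python, and setting library.output_tsv.enabled: true in the settings above makes peptdeep additionally write a TSV library; as far as I can tell there is no evidence.txt/msms.txt writer. A minimal sketch, assuming alphabase's SpecLibBase.load_hdf API:

from alphabase.spectral_library.base import SpecLibBase

lib = SpecLibBase()
lib.load_hdf('/home/ash022/peptdeep/spec_libs/predict.speclib.hdf')
print(lib.precursor_df.head())           # precursor-level table
print(lib.fragment_intensity_df.head())  # predicted fragment intensities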

Data integration

Hello,
Thank you for your great contribution.
I would like to ask: after analyzing raw PXD data from different ddaPASEF or diaPASEF runs with search software, how can the RT, ion intensity, and CCS data be integrated into one data set for training?
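
A minimal sketch of one way to pool such results before training, assuming the PSM tables have already been read into pandas DataFrames with matching alphabase column conventions; the ModelManager training method names are assumptions that may differ by version:

import pandas as pd
from peptdeep.pretrained_models import ModelManager

def train_on_pooled_psms(psm_dfs: list[pd.DataFrame]) -> ModelManager:
    # concatenate PSM tables from the different runs/searches into one set
    psm_df = pd.concat(psm_dfs, ignore_index=True)
    model_mgr = ModelManager()
    model_mgr.train_rt_model(psm_df)   # assumed method name
    model_mgr.train_ccs_model(psm_df)  # assumed method name
    return model_mgr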

Exporting library results from hdf5 to tsv uses a single thread (and therefore takes 95% of processing time, i.e. ~30 min)

When exporting library results from hdf to tsv, only one thread is used. As a result, on my 212-core Google Cloud VM, fewer than 2 minutes are spent on the "real" work (i.e. predicting RT and fragmentation intensity), and about 30 minutes are spent, single-threaded, on the export from hdf to tsv.

Is there any way that export can be done in parallel? I'm a pure Java programmer with near-zero Python skills, but in Java I've solved a similar problem by running 99% of the export logic in parallel and adding a synchronization statement around the actual file write, since multiple simultaneous writes would corrupt the file. If that's too hard or too risky, would it be possible to export n tsv files, where n is the number of threads, with each tsv suffixed by a thread id/number? This has the added advantage that the files are easy to read in parallel as well.

Thoughts?

(FYI: In case you're interested, I tried reading the hdf file in Java directly, but there appears to be some Python-only convention for the string fields in hdf that causes problems in non-Python languages. I've documented the issue on SO: https://stackoverflow.com/questions/74995561/cant-view-string-fields-in-an-hdf5-file/75034940#75034940 It sounds like someone on the HDF/Python/Java side may need to update their libraries, so it'll probably be a long while, which is why I'm hoping it's easy and safe to export the tsv files in parallel.)
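
A minimal sketch of the sharded-export idea (plain pandas + multiprocessing; the shard naming and the in-memory DataFrame are illustrative, not peptdeep's actual export path):

import multiprocessing as mp
import numpy as np
import pandas as pd

def _write_shard(args):
    shard_df, path = args
    shard_df.to_csv(path, sep='\t', index=False)
    return path

def export_tsv_shards(df: pd.DataFrame, out_prefix: str, n_shards: int):
    # one TSV per worker avoids synchronizing writes to a single file
    shards = np.array_split(df, n_shards)
    jobs = [(shard, f'{out_prefix}.part{i:03d}.tsv') for i, shard in enumerate(shards)]
    with mp.Pool(processes=n_shards) as pool:
        return pool.map(_write_shard, jobs)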

Generate a library with nonspecific digestion rule?

I am trying to generate a library using a nonspecific digestion rule; the cutting rule I added was

([ARNCEQGHDILKMFPSTWYV])

Is it correct? I ran it and it has been stuck here for 20 hours with 100% memory usage (140 GB). Any advice?

2024-02-20 16:22:22> xxx/library.tsv does not exist, use default IRT_PEPTIDE_DF to translate irt
2024-02-20 16:22:22> Generating the spectral library ..
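
For scale, a minimal back-of-the-envelope sketch of why a cut-everywhere rule is so heavy: with every residue a cleavage site, a single protein already yields thousands of peptides before modifications, charge states, and decoys are applied.

def count_nonspecific_peptides(seq_len: int, min_len: int = 7, max_len: int = 35) -> int:
    # count all substrings of length min_len..max_len, i.e. all peptides
    # produced when every position is a valid cleavage site
    total = 0
    for start in range(seq_len):
        longest = min(max_len, seq_len - start)
        if longest >= min_len:
            total += longest - min_len + 1
    return total

print(count_nonspecific_peptides(500))  # ~14,000 peptides from one 500-residue protein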

Use refined model that contains user-defined mod to generate a predicted library

A follow-up question: I was able to get the refined model and I am trying to use it to predict a new library from a FASTA file, but I encountered the following error message; it looks like it has something to do with the user-defined mod. Any clue?
[PeptDeep] Starting a new job 'C:\Users\Chih-ChiangTsou/peptdeep/tasks/queue\peptdeep_library_2024-01-06--12-47-56.749038.yaml'...
[PeptDeep] Predicting library ...
2024-01-06 12:47:59> [PeptDeep] Running library task ...
2024-01-06 12:47:59> Input files (fasta): ['D:/fasta/test.fasta']
2024-01-06 12:47:59> Platform information:
2024-01-06 12:47:59> system        - Windows
2024-01-06 12:47:59> release       - 10
2024-01-06 12:47:59> version       - 10.0.22631
2024-01-06 12:47:59> machine       - AMD64
2024-01-06 12:47:59> processor     - Intel64 Family 6 Model 141 Stepping 1, GenuineIntel
2024-01-06 12:47:59> cpu count     - 16
2024-01-06 12:47:59> ram           - 56.2/79.7 Gb (available/total)
2024-01-06 12:47:59>
2024-01-06 12:47:59> Python information:
2024-01-06 12:47:59> alphabase        - 1.2.0
2024-01-06 12:47:59> alpharaw         - 0.4.0
2024-01-06 12:47:59> biopython        -
2024-01-06 12:47:59> click            - 8.1.7
2024-01-06 12:47:59> lxml             - 4.9.4
2024-01-06 12:47:59> numba            - 0.58.1
2024-01-06 12:47:59> numpy            - 1.26.2
2024-01-06 12:47:59> pandas           - 2.1.4
2024-01-06 12:47:59> peptdeep         - 1.1.3
2024-01-06 12:47:59> psutil           - 5.9.7
2024-01-06 12:47:59> pyteomics        - 4.6.3
2024-01-06 12:47:59> python           - 3.9.18
2024-01-06 12:47:59> scikit-learn     - 1.3.2
2024-01-06 12:47:59> streamlit        - 1.29.0
2024-01-06 12:47:59> streamlit-aggrid -
2024-01-06 12:47:59> torch            - 2.1.2
2024-01-06 12:47:59> tqdm             - 4.66.1
2024-01-06 12:47:59> transformers     - 4.36.2
2024-01-06 12:47:59>
2024-01-06 12:48:01> Using external ms2 model: 'C:/Users/Chih-ChiangTsou/peptdeep/refined_models/ms2.pth'
2024-01-06 12:48:01> Using external rt model: 'C:/Users/Chih-ChiangTsou/peptdeep/refined_models/rt.pth'
2024-01-06 12:48:01> Using external ccs model: 'C:/Users/Chih-ChiangTsou/peptdeep/refined_models/ccs.pth'
2024-01-06 12:48:01> xxx/library.tsv does not exist, use default IRT_PEPTIDE_DF to translate irt
2024-01-06 12:48:01> Generating the spectral library ...
2024-01-06 12:48:01> Loaded 17865 precursors.
2024-01-06 12:48:01> Predicting RT/IM/MS2 for 16892 precursors ...
2024-01-06 12:48:01> Using multiprocessing with 16 processes ...
2024-01-06 12:48:01> Predicting rt,mobility,ms2 ...
  0%|                                                                                           | 0/31 [00:15<?, ?it/s]
2024-01-06 12:48:18> multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "multiprocessing\pool.py", line 125, in worker
  File "peptdeep\pretrained_models.py", line 914, in _predict_func_for_mp
    return self.predict_all(
  File "peptdeep\pretrained_models.py", line 1084, in predict_all
    self.predict_rt(precursor_df,
  File "peptdeep\pretrained_models.py", line 877, in predict_rt
    df = self.rt_model.predict(precursor_df,
  File "peptdeep\model\model_interface.py", line 388, in predict
    features = self._get_features_from_batch_df(
  File "peptdeep\model\rt.py", line 161, in _get_features_from_batch_df
    self._get_mod_features(batch_df)
  File "peptdeep\model\model_interface.py", line 812, in _get_mod_features
    get_batch_mod_feature(batch_df)
  File "peptdeep\model\featurize.py", line 86, in get_batch_mod_feature
    mod_features_list = batch_df.mods.str.split(';').apply(
  File "pandas\core\series.py", line 4757, in apply
    return SeriesApply(
  File "pandas\core\apply.py", line 1209, in apply
    return self.apply_standard()
  File "pandas\core\apply.py", line 1289, in apply_standard
    mapped = obj._map_values(
  File "pandas\core\base.py", line 921, in _map_values
    return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)
  File "pandas\core\algorithms.py", line 1814, in map_array
    return lib.map_infer(values, mapper, convert=convert)
  File "lib.pyx", line 2926, in pandas._libs.lib.map_infer
  File "peptdeep\model\featurize.py", line 87, in <lambda>
    lambda mod_names: [
  File "peptdeep\model\featurize.py", line 88, in <listcomp>
    MOD_TO_FEATURE[mod] for mod in mod_names
KeyError: 'IADTB@C'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "peptdeep\pipeline_api.py", line 416, in generate_library
    lib_maker.make_library(lib_settings['infiles'])
  File "peptdeep\spec_lib\library_factory.py", line 105, in make_library
    self._predict()
  File "peptdeep\spec_lib\library_factory.py", line 68, in _predict
    self.spec_lib.predict_all()
  File "peptdeep\spec_lib\predict_lib.py", line 121, in predict_all
    res = self.model_manager.predict_all(
  File "peptdeep\pretrained_models.py", line 1127, in predict_all
    return self.predict_all_mp(
  File "peptdeep\pretrained_models.py", line 964, in predict_all_mp
    for ret_dict in process_bar(
  File "peptdeep\utils.py", line 27, in process_bar
    for i,iter in enumerate(iterator):
  File "multiprocessing\pool.py", line 870, in next
KeyError: 'IADTB@C'

'IADTB@C'

Originally posted by @cctsou in #125 (comment)
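
A minimal sketch of the workaround direction: register the custom mod before prediction, since the KeyError above is raised from MOD_TO_FEATURE inside a multiprocessing worker. Both the settings key and the helper name are assumptions that may differ between peptdeep versions:

from peptdeep.settings import global_settings, add_user_defined_modifications  # helper name is an assumption

global_settings['common']['user_defined_modifications']['IADTB@C'] = {
    'composition': 'C10H15N3O2',  # hypothetical elemental composition
}
add_user_defined_modifications()  # rebuild mod tables so MOD_TO_FEATURE knows 'IADTB@C'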

Missing the documentation on how to use the GUI to train a model for in-silico prediction

Dear AlphaPeptDeep developers,

I am wondering if I could train a model to predict CCS and RT for diaPASEF-based data analysis of samples that include a non-common PTM.

I have a FragPipe-derived ddaPASEF-based library at hand, as well as some DIA-NN reports (based on the legacy predictor or a FragPipe library).

What are the essential steps required to train a predictor and use it for in-silico library prediction?
Which files are needed as input? (I am not sure what the "PSM" DIA-NN output, selectable in the GUI, exactly refers to.)
Are raw instrument data actually required to train the model? (There is a section in the "transfer" part, but *.d files are not a selectable option there.)

I would highly appreciate any help or a link to documentation I am simply missing here.

Thanks in advance!
Michael

Value error + Peptdeep is trying to use multiprocess even when it is not selected

Hi,

Thanks for the latest update.

I am trying it, but it seems to get stuck at one point with no output. I have pasted the log here and uploaded the settings file. Please let me know if there is any mistake in the settings.

peptdeep_library_2023-12-29--10-08-25.862954.yaml.zip

Starting PeptDeep Web Server on port 10077 ...

You can now view your Streamlit app in your browser.

Local URL: http://localhost:10077
Network URL: http://193.174.62.4:10077

[PeptDeep] Starting a new job 'C:\Users\m284y/peptdeep/tasks/queue\peptdeep_library_2023-10-05--19-40-40.124913.yaml'...
[PeptDeep] Predicting library ...
2023-12-29 10:07:32> [PeptDeep] Running library task ...
2023-12-29 10:07:32> Input files (fasta): ['E:/FASTA/HUMAN_070321/7March21_uniprot-proteome_UP000005640+reviewed_yes.fasta']
2023-12-29 10:07:32> Platform information:
2023-12-29 10:07:32> system - Windows
2023-12-29 10:07:32> release - 10
2023-12-29 10:07:32> version - 10.0.19045
2023-12-29 10:07:32> machine - AMD64
2023-12-29 10:07:32> processor - AMD64 Family 23 Model 49 Stepping 0, AuthenticAMD
2023-12-29 10:07:32> cpu count - 128
2023-12-29 10:07:32> ram - 234.3/255.9 Gb (available/total)
2023-12-29 10:07:32>
2023-12-29 10:07:32> Python information:
2023-12-29 10:07:32> alphabase - 1.2.0
2023-12-29 10:07:32> alpharaw - 0.4.0
2023-12-29 10:07:32> biopython -
2023-12-29 10:07:32> click - 8.1.7
2023-12-29 10:07:32> lxml - 4.9.4
2023-12-29 10:07:32> numba - 0.58.1
2023-12-29 10:07:32> numpy - 1.26.2
2023-12-29 10:07:32> pandas - 2.1.4
2023-12-29 10:07:32> peptdeep - 1.1.1
2023-12-29 10:07:32> psutil - 5.9.7
2023-12-29 10:07:32> pyteomics - 4.6.3
2023-12-29 10:07:32> python - 3.9.18
2023-12-29 10:07:32> scikit-learn - 1.3.2
2023-12-29 10:07:32> streamlit - 1.29.0
2023-12-29 10:07:32> streamlit-aggrid -
2023-12-29 10:07:32> torch - 2.1.2
2023-12-29 10:07:32> tqdm - 4.66.1
2023-12-29 10:07:32> transformers - 4.36.2
2023-12-29 10:07:32>
2023-12-29 10:07:34> xxx/library.tsv does not exist, use default IRT_PEPTIDE_DF to translate irt
2023-12-29 10:07:34> Generating the spectral library ...
2023-12-29 10:11:37> Loaded 23643549 precursors.
2023-12-29 10:13:50> Predicting RT/IM/MS2 for 19882736 precursors ...
2023-12-29 10:13:50> Using multiprocessing with 100 processes ...
2023-12-29 10:13:50> Predicting rt,mobility,ms2 ...
Exception in thread Thread-1:
Traceback (most recent call last):
  File "threading.py", line 980, in _bootstrap_inner
  File "threading.py", line 917, in run
  File "multiprocessing\pool.py", line 519, in _handle_workers
  File "multiprocessing\pool.py", line 499, in _wait_for_updates
  File "multiprocessing\connection.py", line 879, in wait
  File "multiprocessing\connection.py", line 811, in _exhaustive_wait
ValueError: need at most 63 handles, got a sequence of length 102
47%|████████████████████████████████████████████████████████▍ | 101/213 [1:11:24<1:31:48, 49.18s/it]
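
The 63-handle ValueError matches the Windows WaitForMultipleObjects limit, which caps how many worker handles one multiprocessing pool can wait on. A minimal sketch of a guard that would avoid it (61 leaves headroom for the pool's own handles; the function is illustrative, not peptdeep code):

import multiprocessing as mp
import sys

def safe_process_num(requested: int) -> int:
    # Windows can only wait on ~61 worker handles per pool
    if sys.platform == 'win32':
        return min(requested, 61)
    return min(requested, mp.cpu_count())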

error using gui: 'PredictFastaSpecLib' object has no attribute 'from_fasta'

Hello, I'm getting the following error when trying to generate a library from a fasta input file via the GUI version of the tool:

AttributeError: 'PredictFastaSpecLib' object has no attribute 'from_fasta'

Here's the full trace:

2022-07-29 13:44:55.893 Uncaught app exception
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/peptdeep/lib/python3.8/site-packages/streamlit/scriptrunner/script_runner.py", line 557, in _run_script
    exec(code, module.__dict__)
  File "/home/ubuntu/miniconda3/envs/peptdeep/lib/python3.8/site-packages/peptdeep/webui/main_ui.py", line 48, in <module>
    sidebar[menu]()
  File "/home/ubuntu/miniconda3/envs/peptdeep/lib/python3.8/site-packages/peptdeep/webui/library_ui.py", line 198, in show
    generate_library()
  File "/home/ubuntu/miniconda3/envs/peptdeep/lib/python3.8/site-packages/peptdeep/pipeline_api.py", line 252, in generate_library
    raise e
  File "/home/ubuntu/miniconda3/envs/peptdeep/lib/python3.8/site-packages/peptdeep/pipeline_api.py", line 219, in generate_library
    lib_maker.make_library(lib_settings['input']['paths'])
  File "/home/ubuntu/miniconda3/envs/peptdeep/lib/python3.8/site-packages/peptdeep/spec_lib/library_factory.py", line 67, in make_library
    self._input(_input)
  File "/home/ubuntu/miniconda3/envs/peptdeep/lib/python3.8/site-packages/peptdeep/spec_lib/library_factory.py", line 181, in _input
    self.spec_lib.from_fasta(fasta)
AttributeError: 'PredictFastaSpecLib' object has no attribute 'from_fasta'

I'm getting the same error on a macOS install as well as on Linux.

Am I missing something simple in the use of the tool?
Thanks!

model_mgr_nce? what is nce?

Hello,

Thank you for providing .ipynb examples! I have been creating spectral libraries from FASTA files on Colab.

However, I was wondering what nce means. It seems that it has to be specified when predicting MS2, RT and CCS.

The default value appears to be 30.

Could you let me know what nce is?

Thank you.

Issues when running transfer_learn()

I am trying to fine-tune peptdeep using the function transfer_learn(), following the example given here. Of course, I have changed the paths to the MS and PSM files. I have also changed mgr_settings['transfer']['psm_type'] to 'maxquant' and mgr_settings['transfer']['ms_file_type'] to 'thermo'. When I run transfer_learn() I face the following two issues. Could you please help me with these?

  1. If the name of the raw file, given in the 'maxquant' file 'evidence.txt', contains uppercase letters, the program does not read the raw file at all and crashes with the following error message:

Traceback (most recent call last):
  File "/gpfs/gsfs9/users/qmbp_ms/Mehdi/alphapeptdeep/peptdeep/pipeline_api.py", line 205, in transfer_learn
    psm_df, frag_df = match_psms()
  File "/gpfs/gsfs9/users/qmbp_ms/Mehdi/alphapeptdeep/peptdeep/pipeline_api.py", line 133, in match_psms
    return concat_precursor_fragment_dataframes(
  File "/gpfs/gsfs9/users/qmbp_ms/Mehdi/conda/envs/alpha/lib/python3.8/site-packages/alphabase/peptide/fragment.py", line 367, in concat_precursor_fragment_dataframes
    pd.concat(precursor_df_list, ignore_index=True),
  File "/gpfs/gsfs9/users/qmbp_ms/Mehdi/conda/envs/alpha/lib/python3.8/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/gpfs/gsfs9/users/qmbp_ms/Mehdi/conda/envs/alpha/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 368, in concat
    op = _Concatenator(
  File "/gpfs/gsfs9/users/qmbp_ms/Mehdi/conda/envs/alpha/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 425, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate

  2. I was able to work around the previous issue by changing the names of the raw files in the MaxQuant file to lowercase, but then I faced another error. It seems that GetMSOrderForScanNum() requires a standard integer as input, but it is getting a numpy int64. Here is the error message:

Python.Runtime.PythonException: an integer is required

The above exception was the direct cause of the following exception:

System.ArgumentException: an integer is required in method ThermoFisher.CommonCore.Data.Interfaces.IScanEvent GetScanEventForScanNumber(Int32) ---> Python.Runtime.PythonException: an integer is required
--- End of inner exception stack trace ---

The above exception was the direct cause of the following exception:

System.AggregateException: One or more errors occurred. (an integer is required in method ThermoFisher.CommonCore.Data.Interfaces.IScanEvent GetScanEventForScanNumber(Int32)) ---> System.ArgumentException: an integer is required in method ThermoFisher.CommonCore.Data.Interfaces.IScanEvent GetScanEventForScanNumber(Int32) ---> Python.Runtime.PythonException: an integer is required
--- End of inner exception stack trace ---
--- End of inner exception stack trace ---
---> (Inner Exception #0) System.ArgumentException: an integer is required in method ThermoFisher.CommonCore.Data.Interfaces.IScanEvent GetScanEventForScanNumber(Int32) ---> Python.Runtime.PythonException: an integer is required
--- End of inner exception stack trace ---<---

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/gpfs/gsfs9/users/qmbp_ms/Mehdi/alphapeptdeep/peptdeep/pipeline_api.py", line 205, in transfer_learn
    psm_df, frag_df = match_psms()
  File "/gpfs/gsfs9/users/qmbp_ms/Mehdi/alphapeptdeep/peptdeep/pipeline_api.py", line 122, in match_psms
    ) = match_one_raw(
  File "/gpfs/gsfs9/users/qmbp_ms/Mehdi/alphapeptdeep/peptdeep/rescore/feature_extractor.py", line 41, in match_one_raw
    ) = match.match_ms2_one_raw(
  File "/gpfs/gsfs9/users/qmbp_ms/Mehdi/alphapeptdeep/peptdeep/mass_spec/match.py", line 282, in match_ms2_one_raw
    ms2_reader.load(ms2_file)
  File "/gpfs/gsfs9/users/qmbp_ms/Mehdi/alphapeptdeep/peptdeep/mass_spec/ms_reader.py", line 335, in load
    ms_order = rawfile.GetMSOrderForScanNum(i)
  File "/gpfs/gsfs9/users/qmbp_ms/Mehdi/alphapeptdeep/peptdeep/legacy/thermo_raw/pyrawfilereader.py", line 487, in GetMSOrderForScanNum
    return IScanEventBase(self.source.GetScanEventForScanNumber(scanNumber)).MSOrder
TypeError: No method matches given arguments for IRawDataPlus.GetScanEventForScanNumber: (<class 'numpy.int64'>)
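
A minimal sketch of the cast that would avoid the second error, i.e. converting the numpy scalar to a plain Python int before it crosses into the .NET layer (patching GetMSOrderForScanNum in pyrawfilereader.py, shown in the traceback above):

def GetMSOrderForScanNum(self, scanNumber):
    # pythonnet cannot match numpy.int64 to the Int32 overload, so cast first
    return IScanEventBase(
        self.source.GetScanEventForScanNumber(int(scanNumber))
    ).MSOrder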

Best regards,
Mehdi

GUI failed when cpu count < global_settings['thread_num']

Traceback (most recent call last):
  File "streamlit\runtime\scriptrunner\script_runner.py", line 556, in _run_script
    exec(code, module.__dict__)
  File "C:\Users\marth\AppData\Local\Programs\peptdeep\peptdeep\webui\main_ui.py", line 55, in <module>
    sidebar[menu]()
  File "peptdeep\webui\settings_ui.py", line 31, in show
    thread_num = st.number_input('Thread number',
  File "streamlit\elements\number_input.py", line 113, in number_input
    return self._number_input(
  File "streamlit\elements\number_input.py", line 225, in _number_input
    raise StreamlitAPIException(
streamlit.errors.StreamlitAPIException: The default `value` of 8 must lie between the `min_value` of None and the `max_value` of 4, inclusively.
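
A minimal sketch of a fix in settings_ui.py (the surrounding Streamlit code is assumed): clamp the default thread number to the host's CPU count so the default never exceeds max_value:

import multiprocessing

import streamlit as st
from peptdeep.settings import global_settings

cpu_count = multiprocessing.cpu_count()
thread_num = st.number_input(
    'Thread number',
    value=min(global_settings['thread_num'], cpu_count),  # default never exceeds the max
    max_value=cpu_count,
)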

Transfer of alphapeptdeep predicted HLA library to DIANN and test with acquired raw files (from Q Exactive™ HF-X mass spectrometer): precursor matrix is blank

Hello,

We ultimately generated an HLA library, using PeptDeep-HLA to predict the peptides and alphapeptdeep for library generation and conversion to .tsv for compatibility with DIANN. The file was readable by DIANN and we used it with some already acquired sample spectra to test peptide identification. Although the test.log file reports identified IDs at 1% FDR for both tested sample spectra after the second pass, the pr_matrix.tsv is eventually empty. Any idea why this might be happening? Could you maybe provide us with the exact options to enable in DIANN?

Best,
Martha

test.log.txt

Multiprocessing causes hanging for library creation; also, multiprocessing flag in yaml is IGNORED

When trying to run peptdeep library creation on an 8-core Linux docker, it hangs; please see the attached logs. Also, when I tried to disable multiprocessing in the yaml, it still continued to use multiprocessing. I've attached the output log, the settings*.yaml file, and the input file. My current workaround: I manually edited the pretrained_models.py file to force DISABLING of multiprocessing, and then it appeared to work.

peptdeep.log
settingsLibrary.yaml.gz
cyto_all.tsv_forPred.tsv.gz

Ideally, multiprocessing should just work, but what's really odd is that the multiprocessing flag in the yaml file is ignored.

MSFragger speclib incompatible with alphaPeptDeep

Describe the bug
Loading an MSFragger lib and predicting spectra works with the main branch of alphaBase but fails with the development branch of alphaBase.

To Reproduce

%reload_ext autoreload
%autoreload 2
import logging
logging.getLogger().setLevel(logging.INFO)

from alphabase.spectral_library.reader import LibraryReaderBase

lib_location = '/Users/georgwallmann/Documents/data/alphadia_benchmarking/libraries/patricia_hela/21min_Evosep_HeLa_BR14_48fractions_diaPASEF_py_diAID_2.tsv'

# create dense library from the MSFragger speclib tsv
target_lib = LibraryReaderBase()
target_lib.add_modification_mapping(
    {'Oxidation@M':['M(Oxidation)'],
     'Dimethyl@K':['K(Dimethyl)'],
     'Dimethyl@R':['R(Dimethyl)'],
     'Dimethyl@Any N-term':['(Dimethyl)']
    }
)
psm = target_lib.import_file(lib_location)

from peptdeep.pretrained_models import ModelManager
from alphabase.peptide.fragment import get_charged_frag_types

model_mgr = ModelManager(device='cpu')

model_mgr.nce = 25
model_mgr.instrument = 'orbitrap'

target_lib.precursor_df['instrument'] = model_mgr.instrument
target_lib.precursor_df['nce'] = model_mgr.nce

frag_types = get_charged_frag_types(
    ['b','y'],
    2
)

res = model_mgr.predict_all(
    target_lib.precursor_df,
    predict_items=['ms2'],
    frag_types = frag_types,
)
target_lib._precursor_df = res['precursor_df']
target_lib._fragment_mz_df = res['fragment_mz_df']
target_lib._fragment_intensity_df = res['fragment_intensity_df']

Additional context

Error Message
---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/georgwallmann/Documents/git/alphapeptdeep/peptdeep/pretrained_models.py", line 887, in _predict_func_for_mp
    return self.predict_all(
  File "/Users/georgwallmann/Documents/git/alphapeptdeep/peptdeep/pretrained_models.py", line 1065, in predict_all
    fragment_mz_df = create_fragment_mz_dataframe(
  File "/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/fragment.py", line 913, in create_fragment_mz_dataframe
    return create_fragment_mz_dataframe(
  File "/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/fragment.py", line 975, in create_fragment_mz_dataframe
    return mask_fragments_for_charge_greater_than_precursor_charge(
  File "/Users/georgwallmann/Documents/git/alphabase/alphabase/peptide/fragment.py", line 502, in mask_fragments_for_charge_greater_than_precursor_charge
    fragment_df.loc[
  File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexing.py", line 815, in __setitem__
    indexer = self._get_setitem_indexer(key)
  File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexing.py", line 698, in _get_setitem_indexer
    return self._convert_tuple(key)
  File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexing.py", line 897, in _convert_tuple
    keyidx = [self._convert_to_indexer(k, axis=i) for i, k in enumerate(key)]
  File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexing.py", line 897, in 
    keyidx = [self._convert_to_indexer(k, axis=i) for i, k in enumerate(key)]
  File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexing.py", line 1394, in _convert_to_indexer
    key = check_bool_indexer(labels, key)
  File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexing.py", line 2567, in check_bool_indexer
    return check_array_indexer(index, result)
  File "/Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/pandas/core/indexers/utils.py", line 553, in check_array_indexer
    raise IndexError(
IndexError: Boolean index has wrong length: 51864 instead of 3084552
"""

The above exception was the direct cause of the following exception:

IndexError                                Traceback (most recent call last)
/Users/georgwallmann/Downloads/create_msfragger_lib.ipynb Cell 4 in 1
     10 target_lib.precursor_df['nce'] = model_mgr.nce
     12 frag_types = get_charged_frag_types(
     13     ['b','y'],
     14     2
     15 )
---> 17 res = model_mgr.predict_all(
     18     target_lib.precursor_df,
     19     predict_items=['ms2'],
     20     frag_types = frag_types,
     21 )
     22 target_lib._precursor_df = res['precursor_df']
     23 target_lib._fragment_mz_df = res['fragment_mz_df']

File ~/Documents/git/alphapeptdeep/peptdeep/pretrained_models.py:1098, in ModelManager.predict_all(self, precursor_df, predict_items, frag_types, multiprocessing, min_required_precursor_num_for_mp, process_num, mp_batch_size)
   1096 else:
   1097     logging.info(f"Using multiprocessing with {process_num} processes ...")
-> 1098     return self.predict_all_mp(
   1099         precursor_df, 
   1100         predict_items=predict_items,
   1101         process_num = process_num,
   1102         mp_batch_size=mp_batch_size,
   1103     )

File ~/Documents/git/alphapeptdeep/peptdeep/pretrained_models.py:937, in ModelManager.predict_all_mp(self, precursor_df, predict_items, frag_types, process_num, mp_batch_size)
    934 self.verbose = False
    936 with mp.get_context('spawn').Pool(process_num) as p:
--> 937     for ret_dict in process_bar(
    938         p.imap_unordered(
    939             self._predict_func_for_mp, 
    940             mp_param_generator(df_groupby)
    941         ), 
    942         get_batch_num_mp(df_groupby)
    943     ):
    944         precursor_df_list.append(ret_dict['precursor_df'])
    945         if fragment_mz_df_list is not None:

File ~/Documents/git/alphapeptdeep/peptdeep/utils.py:27, in process_bar(iterator, len_iter)
     25 with tqdm.tqdm(total=len_iter) as bar:
     26     i = 0
---> 27     for i,iter in enumerate(iterator):
     28         yield iter
     29         bar.update()

File ~/miniconda3/envs/alpha/lib/python3.9/multiprocessing/pool.py:870, in IMapIterator.next(self, timeout)
    868 if success:
    869     return value
--> 870 raise value

IndexError: Boolean index has wrong length: 51864 instead of 3084552
Conda Env
# packages in environment at /Users/georgwallmann/miniconda3/envs/alpha:
#
# Name                    Version                   Build  Channel
alabaster                 0.7.12                   pypi_0    pypi
alphabase                 0.2.0                     dev_0    <develop>
alphadia                  1.2.0                     dev_0    <develop>
alphapept                 0.4.8                    pypi_0    pypi
alpharaw                  0.1.0                     dev_0    <develop>
alphatims                 1.0.6                     dev_0    <develop>
altair                    4.2.0                    pypi_0    pypi
anyio                     3.6.1              pyhd8ed1ab_1    conda-forge
appnope                   0.1.3              pyhd8ed1ab_0    conda-forge
argon2-cffi               21.3.0             pyhd8ed1ab_0    conda-forge
argon2-cffi-bindings      21.2.0           py39hb18efdd_2    conda-forge
arrow                     1.2.3                    pypi_0    pypi
asteval                   0.9.28                   pypi_0    pypi
asttokens                 2.0.8              pyhd8ed1ab_0    conda-forge
astunparse                1.6.3              pyhd8ed1ab_0    conda-forge
attrs                     22.1.0             pyh71513ae_1    conda-forge
autodocsumm               0.2.9                    pypi_0    pypi
babel                     2.10.3             pyhd8ed1ab_0    conda-forge
backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
backports                 1.0                        py_2    conda-forge
backports.functools_lru_cache 1.6.4              pyhd8ed1ab_0    conda-forge
beautifulsoup4            4.11.1             pyha770c72_0    conda-forge
biopython                 1.79                     pypi_0    pypi
bleach                    5.0.1              pyhd8ed1ab_0    conda-forge
blinker                   1.5                      pypi_0    pypi
bokeh                     2.4.3              pyhd8ed1ab_3    conda-forge
boto3                     1.26.37                  pypi_0    pypi
botocore                  1.29.37                  pypi_0    pypi
bravado                   11.0.3                   pypi_0    pypi
bravado-core              5.17.1                   pypi_0    pypi
brotli                    1.0.9                h1c322ee_7    conda-forge
brotli-bin                1.0.9                h1c322ee_7    conda-forge
brotlipy                  0.7.0           py39hb18efdd_1004    conda-forge
bump2version              1.0.1                    pypi_0    pypi
bumpversion               0.6.0                    pypi_0    pypi
bzip2                     1.0.8                h3422bc3_4    conda-forge
c-ares                    1.18.1               h3422bc3_0    conda-forge
ca-certificates           2023.05.30           hca03da5_0  
cachetools                5.2.0                    pypi_0    pypi
certifi                   2023.7.22        py39hca03da5_0  
cffi                      1.15.1           py39h04d3946_0    conda-forge
chardet                   5.1.0                    pypi_0    pypi
charset-normalizer        2.1.1              pyhd8ed1ab_0    conda-forge
click                     8.1.3                    pypi_0    pypi
cloudpickle               2.2.0              pyhd8ed1ab_0    conda-forge
clr-loader                0.2.6                    pypi_0    pypi
colorama                  0.4.5              pyhd8ed1ab_0    conda-forge
colorcet                  3.0.1              pyhd8ed1ab_0    conda-forge
commonmark                0.9.1                    pypi_0    pypi
conda                     22.9.0           py39h2804cbe_1    conda-forge
conda-package-handling    1.9.0            py39h02fc5c5_0    conda-forge
contextlib2               21.6.0                   pypi_0    pypi
contourpy                 1.0.5            py39haaf3ac1_0    conda-forge
coverage                  7.2.0                    pypi_0    pypi
coverage-badge            1.1.0                    pypi_0    pypi
cryptography              38.0.4           py39he2a39a8_0    conda-forge
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
dask-core                 2022.10.0          pyhd8ed1ab_0    conda-forge
datashader                0.14.2             pyh6c4a22f_0    conda-forge
datashape                 0.5.4                      py_1    conda-forge
debugpy                   1.6.3            py39h3c22d25_0    conda-forge
decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
defusedxml                0.7.1              pyhd8ed1ab_0    conda-forge
docutils                  0.19                     pypi_0    pypi
easygui                   0.98.3                   pypi_0    pypi
entrypoints               0.4                pyhd8ed1ab_0    conda-forge
execnb                    0.1.4              pyhd8ed1ab_0    conda-forge
executing                 1.1.1              pyhd8ed1ab_0    conda-forge
fastcore                  1.5.27                   pypi_0    pypi
filelock                  3.8.0                    pypi_0    pypi
flit-core                 3.7.1              pyhd8ed1ab_0    conda-forge
fmt                       9.1.0                hffc8910_0    conda-forge
fonttools                 4.37.4           py39h02fc5c5_0    conda-forge
fqdn                      1.5.1                    pypi_0    pypi
freetype                  2.12.1               hd633e50_0    conda-forge
fsspec                    2022.8.2           pyhd8ed1ab_0    conda-forge
furo                      2022.12.7                pypi_0    pypi
future                    0.18.2                   pypi_0    pypi
ghapi                     1.0.3              pyhd8ed1ab_1    conda-forge
gitdb                     4.0.9                    pypi_0    pypi
gitpython                 3.1.29                   pypi_0    pypi
h5py                      3.7.0                    pypi_0    pypi
hdf5                      1.12.2          nompi_h33dac16_100    conda-forge
hilbertcurve              2.0.5                    pypi_0    pypi
holoviews                 1.15.1             pyhd8ed1ab_0    conda-forge
huggingface-hub           0.10.1                   pypi_0    pypi
hvplot                    0.8.1              pyhd8ed1ab_0    conda-forge
icu                       70.1                 h6b3803e_0    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
imageio                   2.27.0                   pypi_0    pypi
imagesize                 1.4.1                    pypi_0    pypi
imbalanced-learn          0.11.0                   pypi_0    pypi
importlib-metadata        4.11.4           py39h2804cbe_0    conda-forge
importlib_resources       5.10.0             pyhd8ed1ab_0    conda-forge
iniconfig                 1.1.1              pyhd3eb1b0_0  
ipykernel                 6.16.0             pyh736e0ef_0    conda-forge
ipympl                    0.9.3                    pypi_0    pypi
ipython                   8.5.0              pyhd1c38e8_1    conda-forge
ipython_genutils          0.2.0                      py_1    conda-forge
ipywidgets                7.6.5              pyhd3eb1b0_1  
isoduration               20.11.0                  pypi_0    pypi
jedi                      0.18.1             pyhd8ed1ab_2    conda-forge
jinja2                    3.1.2              pyhd8ed1ab_1    conda-forge
jmespath                  1.0.1                    pypi_0    pypi
joblib                    1.2.0              pyhd8ed1ab_0    conda-forge
jpeg                      9e                   he4db4b2_2    conda-forge
json5                     0.9.5              pyh9f0ad1d_0    conda-forge
jsonpointer               2.3                      pypi_0    pypi
jsonref                   1.0.1                    pypi_0    pypi
jsonschema                4.16.0             pyhd8ed1ab_0    conda-forge
jupyter_client            7.4.2              pyhd8ed1ab_0    conda-forge
jupyter_core              4.11.1           py39h2804cbe_0    conda-forge
jupyter_server            1.21.0             pyhd8ed1ab_0    conda-forge
jupyterlab                3.4.8              pyhd8ed1ab_0    conda-forge
jupyterlab_pygments       0.2.2              pyhd8ed1ab_0    conda-forge
jupyterlab_server         2.16.0             pyhd8ed1ab_0    conda-forge
jupyterlab_widgets        1.0.0              pyhd3eb1b0_1  
kiwisolver                1.4.4            py39hab5e169_0    conda-forge
krb5                      1.19.3               he492e65_0    conda-forge
lazy-loader               0.2                      pypi_0    pypi
lcms2                     2.12                 had6a04f_0    conda-forge
lerc                      4.0.0                h9a09cb3_0    conda-forge
libarchive                3.5.2                hdd7f49f_3    conda-forge
libblas                   3.9.0           16_osxarm64_openblas    conda-forge
libbrotlicommon           1.0.9                h1c322ee_7    conda-forge
libbrotlidec              1.0.9                h1c322ee_7    conda-forge
libbrotlienc              1.0.9                h1c322ee_7    conda-forge
libcblas                  3.9.0           16_osxarm64_openblas    conda-forge
libcurl                   7.86.0               h1c293e1_1    conda-forge
libcxx                    14.0.6               h2692d47_0    conda-forge
libdeflate                1.14                 h1a8c8d9_0    conda-forge
libedit                   3.1.20191231         hc8eb9b7_2    conda-forge
libev                     4.33                 h642e427_1    conda-forge
libffi                    3.4.2                h3422bc3_5    conda-forge
libgfortran               5.0.0           11_3_0_hd922786_25    conda-forge
libgfortran5              11.3.0              hdaf2cc0_25    conda-forge
libiconv                  1.17                 he4db4b2_0    conda-forge
liblapack                 3.9.0           16_osxarm64_openblas    conda-forge
libllvm11                 11.1.0               hfa12f05_4    conda-forge
libmamba                  1.1.0                he1bf84c_2    conda-forge
libmambapy                1.1.0            py39h6901ea2_2    conda-forge
libnghttp2                1.47.0               h519802c_1    conda-forge
libopenblas               0.3.21          openmp_hc731615_3    conda-forge
libpng                    1.6.38               h76d750c_0    conda-forge
libsodium                 1.0.18               h27ca646_1    conda-forge
libsolv                   0.7.22               h1280f1d_0    conda-forge
libsqlite                 3.39.4               h76d750c_0    conda-forge
libssh2                   1.10.0               h7a5bd25_3    conda-forge
libtiff                   4.4.0                hfa0b094_4    conda-forge
libwebp-base              1.2.4                h57fd34a_0    conda-forge
libxcb                    1.13              h9b22ae9_1004    conda-forge
libxml2                   2.10.3               h87b0503_0    conda-forge
libzlib                   1.2.13               h03a7124_4    conda-forge
line-profiler             3.5.1                    pypi_0    pypi
llvm-openmp               14.0.6               hc6e5704_0  
llvmlite                  0.39.1           py39h8ca5d33_0    conda-forge
lmfit                     1.1.0                    pypi_0    pypi
locket                    1.0.0              pyhd8ed1ab_0    conda-forge
lxml                      4.9.1                    pypi_0    pypi
lz4-c                     1.9.3                hbdafb3b_1    conda-forge
lzo                       2.10              h642e427_1000    conda-forge
mamba                     1.1.0            py39ha55b623_2    conda-forge
markdown                  3.4.1              pyhd8ed1ab_0    conda-forge
markdown-it-py            2.1.0                    pypi_0    pypi
markupsafe                2.1.1            py39hb18efdd_1    conda-forge
matplotlib-base           3.5.3            py39hc377ac9_0  
matplotlib-inline         0.1.6              pyhd8ed1ab_0    conda-forge
matplotlib-venn           0.11.7                   pypi_0    pypi
maturin                   0.13.7           py39ha6e5c4f_0  
mdit-py-plugins           0.3.3                    pypi_0    pypi
mdurl                     0.1.2                    pypi_0    pypi
mistune                   2.0.4              pyhd8ed1ab_0    conda-forge
mmh3                      3.0.0                    pypi_0    pypi
monotonic                 1.6                      pypi_0    pypi
msgpack                   1.0.4                    pypi_0    pypi
multipledispatch          0.6.0                      py_0    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
myst-parser               0.18.1                   pypi_0    pypi
nb_conda_kernels          2.3.1            py39h2804cbe_1    conda-forge
nbclassic                 0.4.5              pyhd8ed1ab_0    conda-forge
nbclient                  0.7.0              pyhd8ed1ab_0    conda-forge
nbconvert                 7.2.1              pyhd8ed1ab_0    conda-forge
nbconvert-core            7.2.1              pyhd8ed1ab_0    conda-forge
nbconvert-pandoc          7.2.1              pyhd8ed1ab_0    conda-forge
nbdev                     2.2.10             pyhd8ed1ab_0    conda-forge
nbformat                  5.7.0              pyhd8ed1ab_0    conda-forge
nbsphinx                  0.8.10                   pypi_0    pypi
ncurses                   6.3                  h07bb92c_1    conda-forge
neptune                   1.6.1                    pypi_0    pypi
nest-asyncio              1.5.6              pyhd8ed1ab_0    conda-forge
networkx                  3.1                      pypi_0    pypi
notebook                  6.5.1              pyha770c72_0    conda-forge
notebook-shim             0.1.0              pyhd8ed1ab_0    conda-forge
numba                     0.56.4           py39h78102c4_0  
numexpr                   2.8.3                    pypi_0    pypi
numpy                     1.23.3           py39hcb4b507_0    conda-forge
oauthlib                  3.2.2                    pypi_0    pypi
openjpeg                  2.5.0                h5d4e404_1    conda-forge
openssl                   3.0.10               h1a28f6b_0  
packaging                 21.3               pyhd8ed1ab_0    conda-forge
pandas                    1.5.0                    pypi_0    pypi
pandoc                    2.12                 hca03da5_0  
pandocfilters             1.5.0              pyhd8ed1ab_0    conda-forge
panel                     0.14.0             pyhd8ed1ab_0    conda-forge
param                     1.12.2             pyh6c4a22f_0    conda-forge
parso                     0.8.3              pyhd8ed1ab_0    conda-forge
partd                     1.3.0              pyhd8ed1ab_0    conda-forge
peptdeep                  0.1.7                     dev_0    <develop>
pexpect                   4.8.0              pyh9f0ad1d_2    conda-forge
pickleshare               0.7.5                   py_1003    conda-forge
pillow                    9.2.0            py39he45c975_2    conda-forge
pip                       22.3               pyhd8ed1ab_0    conda-forge
pkgutil-resolve-name      1.3.10             pyhd8ed1ab_0    conda-forge
pluggy                    1.0.0            py39hca03da5_1  
pprofile                  2.1.0                    pypi_0    pypi
progressbar               2.5                      pypi_0    pypi
prometheus_client         0.15.0             pyhd8ed1ab_0    conda-forge
prompt-toolkit            3.0.31             pyha770c72_0    conda-forge
protobuf                  3.20.3                   pypi_0    pypi
psutil                    5.9.2            py39h02fc5c5_0    conda-forge
pthread-stubs             0.4               h27ca646_1001    conda-forge
ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
pure_eval                 0.2.2              pyhd8ed1ab_0    conda-forge
py                        1.11.0             pyhd3eb1b0_0  
py-lmd                    1.0.0                    pypi_0    pypi
py-rs-playground          0.1.0                    pypi_0    pypi
pyarrow                   9.0.0                    pypi_0    pypi
pybind11-abi              4                    hd8ed1ab_3    conda-forge
pycosat                   0.6.3           py39hb18efdd_1010    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pyct                      0.4.6                      py_0    conda-forge
pyct-core                 0.4.6                      py_0    conda-forge
pydeck                    0.8.0b4                  pypi_0    pypi
pydivsufsort              0.0.6                    pypi_0    pypi
pygments                  2.13.0             pyhd8ed1ab_0    conda-forge
pygount                   1.5.1                    pypi_0    pypi
pyjwt                     2.6.0                    pypi_0    pypi
pympler                   1.0.1                    pypi_0    pypi
pyopenssl                 22.1.0             pyhd8ed1ab_0    conda-forge
pyparsing                 3.0.9              pyhd8ed1ab_0    conda-forge
pyrsistent                0.18.1           py39hb18efdd_1    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
pyteomics                 4.5.5                    pypi_0    pypi
pytest                    7.1.2            py39hca03da5_0  
python                    3.9.13          h96fcbfb_0_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-fastjsonschema     2.16.2             pyhd8ed1ab_0    conda-forge
python_abi                3.9                      2_cp39    conda-forge
pythonnet                 3.0.1                    pypi_0    pypi
pytz                      2022.4             pyhd8ed1ab_0    conda-forge
pytz-deprecation-shim     0.1.0.post0              pypi_0    pypi
pyviz_comms               2.2.1              pyhd8ed1ab_1    conda-forge
pywavelets                1.4.1                    pypi_0    pypi
pyyaml                    6.0              py39hb18efdd_4    conda-forge
pyzmq                     24.0.1           py39h0553236_0    conda-forge
pyzstd                    0.15.3                   pypi_0    pypi
readline                  8.1.2                h46ed386_0    conda-forge
regex                     2022.9.13                pypi_0    pypi
reproc                    14.2.3               h3422bc3_0    conda-forge
reproc-cpp                14.2.3               hbdafb3b_0    conda-forge
requests                  2.28.1             pyhd8ed1ab_1    conda-forge
requests-oauthlib         1.3.1                    pypi_0    pypi
rfc3339-validator         0.1.4                    pypi_0    pypi
rfc3987                   1.3.8                    pypi_0    pypi
rich                      12.6.0                   pypi_0    pypi
rocket-fft                0.1.5                    pypi_0    pypi
ruamel_yaml               0.15.80         py39h9eb174b_1007    conda-forge
s3transfer                0.6.0                    pypi_0    pypi
scikit-image              0.20.0                   pypi_0    pypi
scikit-learn              1.3.0            py39h46d7db6_0  
scipy                     1.9.1            py39h737da60_0    conda-forge
seaborn                   0.12.0           py39hca03da5_0  
semver                    2.13.0                   pypi_0    pypi
send2trash                1.8.0              pyhd8ed1ab_0    conda-forge
setuptools                65.5.0             pyhd8ed1ab_0    conda-forge
simplejson                3.18.0                   pypi_0    pypi
six                       1.16.0             pyh6c4a22f_0    conda-forge
smmap                     5.0.0                    pypi_0    pypi
sniffio                   1.3.0              pyhd8ed1ab_0    conda-forge
snowballstemmer           2.2.0                    pypi_0    pypi
soupsieve                 2.3.2.post1        pyhd8ed1ab_0    conda-forge
sphinx                    5.3.0                    pypi_0    pypi
sphinx-basic-ng           1.0.0b1                  pypi_0    pypi
sphinxcontrib-applehelp   1.0.2                    pypi_0    pypi
sphinxcontrib-devhelp     1.0.2                    pypi_0    pypi
sphinxcontrib-htmlhelp    2.0.0                    pypi_0    pypi
sphinxcontrib-jsmath      1.0.1                    pypi_0    pypi
sphinxcontrib-qthelp      1.0.3                    pypi_0    pypi
sphinxcontrib-serializinghtml 1.1.5                    pypi_0    pypi
sqlalchemy                1.4.42                   pypi_0    pypi
sqlite                    3.39.4               h2229b38_0    conda-forge
stack_data                0.5.1              pyhd8ed1ab_0    conda-forge
streamlit                 1.13.0                   pypi_0    pypi
svgelements               1.9.1                    pypi_0    pypi
swagger-spec-validator    3.0.3                    pypi_0    pypi
tables                    3.7.0                    pypi_0    pypi
tbb                       2021.5.0             h525c30c_0  
terminado                 0.16.0             pyhd1c38e8_0    conda-forge
threadpoolctl             3.1.0              pyh8a188c0_0    conda-forge
tifffile                  2023.3.21                pypi_0    pypi
tinycss2                  1.1.1              pyhd8ed1ab_0    conda-forge
tk                        0.1.0                    pypi_0    pypi
tokenizers                0.13.1                   pypi_0    pypi
toml                      0.10.2                   pypi_0    pypi
tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
toolz                     0.12.0             pyhd8ed1ab_0    conda-forge
torch                     1.12.1                   pypi_0    pypi
torchaudio                0.14.0.dev20221025          pypi_0    pypi
torchvision               0.15.0.dev20221026          pypi_0    pypi
tornado                   6.2              py39h9eb174b_0    conda-forge
tqdm                      4.64.1             pyhd8ed1ab_0    conda-forge
traitlets                 5.4.0              pyhd8ed1ab_0    conda-forge
transformers              4.23.1                   pypi_0    pypi
typing_extensions         4.4.0              pyha770c72_0    conda-forge
tzdata                    2022.5                   pypi_0    pypi
tzlocal                   4.2                      pypi_0    pypi
uncertainties             3.1.7                    pypi_0    pypi
unicodedata2              14.0.0           py39hb18efdd_1    conda-forge
uri-template              1.2.0                    pypi_0    pypi
urllib3                   1.26.11            pyhd8ed1ab_0    conda-forge
validators                0.20.0                   pypi_0    pypi
vulture                   2.7                      pypi_0    pypi
wcwidth                   0.2.5              pyh9f0ad1d_2    conda-forge
webcolors                 1.12                     pypi_0    pypi
webencodings              0.5.1                      py_1    conda-forge
websocket-client          1.4.1              pyhd8ed1ab_0    conda-forge
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
widgetsnbextension        3.5.2            py39hca03da5_0  
xarray                    2022.10.0          pyhd8ed1ab_0    conda-forge
xorg-libxau               1.0.9                h27ca646_0    conda-forge
xorg-libxdmcp             1.1.3                h27ca646_0    conda-forge
xxhash                    3.2.0                    pypi_0    pypi
xz                        5.2.6                h57fd34a_0    conda-forge
yaml                      0.2.5                h3422bc3_2    conda-forge
yaml-cpp                  0.7.0                hb7217d7_2    conda-forge
zeromq                    4.3.4                hbdafb3b_1    conda-forge
zipp                      3.9.0              pyhd8ed1ab_0    conda-forge
zstd                      1.5.2                h8128057_4    conda-forge

I'm using the dev branch of alphaPeptDeep.
I placed a notebook and library at pool-mann-projects\0_Georg\for_people\Feng.
