Giter Site home page Giter Site logo

peng113028 / deepdia Goto Github PK

View Code? Open in Web Editor NEW

This project forked from lmsac/deepdia

0.0 0.0 0.0 38.53 MB

Using deep learning to generate in silico spectral libraries for data-independent acquisition analysis.

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

deepdia's Introduction

DeepDIA

Using deep learning to generate in silico spectral libraries for data-independent acquisition (DIA) analysis.

Updates

1.1.0

  • Dependency of R removed
  • FASTA digestion
  • Ion mobility prediction (experimental)

For the version of the Nat Commun 2020 publication, please refer to the commit #674e2fb.

Dependency

The following software and packages are required:

  • Python (version 3.7 or later, Anaconda distribution is recommended)
  • TensorFlow (version 2.0 or later)
  • Keras (packaged with TensorFlow)

For spectral library generation from FASTA files and data preprocessing for training detectability models, the following package is required:

DeepDIA requires the following Python packages integrated in Anaconda:

  • numpy (version 1.18.5)
  • pandas (version 0.25.3)
  • scipy (version 1.4.1)
  • statsmodels (version 0.13.2)

Later versions may be compatible, but have not been tested.

For model training, NVIDIA graphics cards with CUDA are recommended.

Installation

1. Install Python (Anaconda)

Download and install Anaconda.

Check successful installation by in the Anaconda Prompt:

pip list

Ensure that the following Python packages are installed: numpy, pandas, scipy, and statsmodels. If not, install the missing packages using the following command (as an example for statsmodels):

pip install statsmodels

2. Install TensorFlow

Ensure that NVIDIA GPU driver has been installed. Install the CUDA and cuDNN with conda. This step can be skipped if you run TensorFlow on CPU only.

conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0

Install TensorFlow using pip:

pip install tensorflow

3. Install Biopython

Install Biopython using pip:

pip install biopython

or conda:

conda install -c conda-forge biopython

Getting Started

1. Prepare a Peptide List

A peptide list is stored in a comma-separated values (CSV) file including columns named protein and sequence.

"protein","sequence"
"O43504","HDGITVAVHK"
"P56470","VGSSGDIALHINPR"
"Q9UHL4","LDHFNFER"
"P68371","IREEYPDR"
"P01024","AKDQLTCNK"

Peptides can be collected from public resources. From the Pan Human Library (Rosenberger, G. et al. Sci. Data 2014, 1, 140031, doi:10.1038/sdata.2014.31), peptide lists have been collected and provided as an example in data\peptide folder:

  • Pan_human.peptide.csv
  • Pan_human_charge2.peptide.csv
  • Pan_human_charge3.peptide.csv

DeepDIA only supports peptide sequences with standard amino acids (ACDEFGHIKLMNPQRSTVWY) and length <= 50.

2. Predict MS/MS Spectra

Prepare a model for MS/MS prediction. You can use pre-trained models or train your own models. A model trained with HeLa data on Q Exactive HF (Bruderer, R. et al. Mol. Cell. Proteomics 2017, 16, 2296-2309, doi:10.1074/mcp.RA117.000314) is provided as an example in data\models folder:

  • data\models\charge2\epoch_035.hdf5
  • data\models\charge3\epoch_034.hdf5

Run predict_ms2.py to predict MS/MS ion intensities for peptide precursors with charge 2+.

python src\predict_ms2.py `
--in data\peptide\Pan_human_charge2.peptide.csv `
--model data\models\charge2\epoch_035.hdf5 `
--charge 2 `
--out data\Pan_human_charge2.prediction.ions.json

The predicted MS/MS ion intensities are saved in a JSON file (*.prediction.ions.json).

Predict MS/MS for charge 3+ following the same steps.

python src\predict_ms2.py `
--in data\peptide\Pan_human_charge3.peptide.csv `
--model data\models\charge3\epoch_034.hdf5 `
--charge 3 `
--out data\Pan_human_charge3.prediction.ions.json

3. Predict iRT

Prepare a model for iRT prediction. You can use pre-trained models or train your own models. A pretrained model is provided as an example in data\models folder:

  • data\models\irt\epoch_082.hdf5

Run predict_rt.py.

python src\predict_rt.py `
--in data\peptide\Pan_human.peptide.csv `
--model data\models\irt\epoch_082.hdf5 `
--out data\Pan_human.prediction.irt.csv

The predicted iRT values are saved in a CSV file (*.prediction.irt.csv).

4. Generate Spectral Library

Ensure that the predicted MS/MS and iRT files are present in the data folder.

Run build_assays_from_prediction.py.

python src\build_assays_from_prediction.py `
--peptide data\peptide\Pan_human.peptide.csv `
--ions data\Pan_human_charge2.prediction.ions.json `
       data\Pan_human_charge3.prediction.ions.json `
--rt data\Pan_human.prediction.irt.csv `
--out data\Pan_human.prediction.assay.pickle

The generated spectral library is saved in a Python binary file (*.assay.pickle).

Run convert_assays_to_Spectronaut_library.py.

python src\convert_assays_to_Spectronaut_library.py `
--in data\Pan_human.prediction.assay.pickle `
--out data\Pan_human.prediction.library.xls

The generated spectral library is converted to a speadsheet file (*.library.xls) that is compatible with Spectronaut and DIA-NN.

Tutorial

Tutorials are avaliable in the docs folder.

Spectral Library Pretiction

DeepDIA Tutorial: Spectral Library Generation From Peptide Lists describes the workflow to generate in silico spectral libraries from peptide lists.

Detectability Prediction

DeepDIA Tutorial: Spectral Library Generation with Detectability Prediction describes the complete workflow to generate in silico spectral libraries from proteome databases with detectability filtering.

Model Training

DeepDIA Tutorial: Training New Models for MS/MS and iRT Prediction describes the workflow for training new models for MS/MS and iRT prediction using data-dependent acquisition (DDA) data.

DeepDIA Tutorial: Training a New Model for Detectability Prediction describes the workflow for training a new model for MS detectability prediction using data-dependent acquisition DDA data.

Publications

Yang, Y., Liu, X., Shen, C., Lin, Y., Yang, P., Qiao, L. In silico spectral libraries by deep learning facilitate data-independent acquisition proteomics. Nat Commun 11, 146 (2020). https://doi.org/10.1038/s41467-019-13866-z.

License

DeepDIA is distributed under a BSD license. See the LICENSE file for details.

Contacts

Please report any problems directly to the github issue tracker. Also, you can send feedback to [email protected].

deepdia's People

Contributors

yyi17 avatar lapisdev avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.