pnnl-comp-mass-spec / informed-proteomics

Top down / bottom up, MS/MS analysis tool for DDA and DIA mass spectrometry data

C# 97.95% Inno Setup 0.41% Batchfile 0.21% Rich Text Format 1.43%

mass-spectrometry ms-datasets mzml

informed-proteomics's Issues

Why does the number of identified PrSMs drop sharply after adding a modification file?

The histone H3.1 dataset I used contains a total of 3460 spectra, and the database human_proteome_database.fasta contains 20410 entries.
Without a modification file, the search output 1310 PrSMs. The parameters were as follows:
SpecFile 2DLC_H3_1.pbf
DatabaseFile human_proteome_database.fasta
FeatureFile 2DLC_H3_1.ms1ft
InternalCleavageMode SingleInternalCleavage
Tag-based search True
Tda Target+Decoy
PrecursorIonTolerancePpm 10
ProductIonTolerancePpm 10
MinSequenceLength 21
MaxSequenceLength 300
MinPrecursorIonCharge 2
MaxPrecursorIonCharge 30
MinProductIonCharge 1
MaxProductIonCharge 20
MinSequenceMass 3000
MaxSequenceMass 50000
ActivationMethod Unknown
MaxDynamicModificationsPerSequence 0

With the modification file, only 59 PrSMs are output. The parameters were as follows:
SpecFile 2DLC_H3_1.pbf
DatabaseFile human_proteome_database.fasta
FeatureFile 2DLC_H3_1.ms1ft
InternalCleavageMode SingleInternalCleavage
Tag-based search True
Tda Target+Decoy
PrecursorIonTolerancePpm 10
ProductIonTolerancePpm 10
MinSequenceLength 21
MaxSequenceLength 500
MinPrecursorIonCharge 2
MaxPrecursorIonCharge 50
MinProductIonCharge 1
MaxProductIonCharge 20
MinSequenceMass 3000
MaxSequenceMass 50000
ActivationMethod Unknown
MaxDynamicModificationsPerSequence 4
Modification C(2) H(2) N(0) O(1) S(0),R,opt,Everywhere,Acetyl
Modification C(2) H(2) N(0) O(1) S(0),K,opt,Everywhere,Acetyl
Modification C(1) H(2) N(0) O(0) S(0),R,opt,Everywhere,Methyl
Modification C(1) H(2) N(0) O(0) S(0),K,opt,Everywhere,Methyl
Modification C(2) H(4) N(0) O(0) S(0),R,opt,Everywhere,Dimethyl
Modification C(2) H(4) N(0) O(0) S(0),K,opt,Everywhere,Dimethyl
Modification C(3) H(6) N(0) O(0) S(0),R,opt,Everywhere,Trimethyl
Modification C(0) H(1) N(0) O(3) S(0) P(1),S,opt,Everywhere,Phospho
Modification C(0) H(1) N(0) O(3) S(0) P(1),T,opt,Everywhere,Phospho
Modification C(0) H(1) N(0) O(3) S(0) P(1),Y,opt,Everywhere,Phospho

The modification file I am using is as follows:

# This file is used to specify modifications for MSPathFinder

# Max Number of Modifications per peptide

NumMods=4

# Static mods

# None

# Dynamic mods

C2H2O1,RK,opt,any,Acetyl # Acetylation RK
CH2,RK,opt,any,Methyl # Methylation RK
C2H4,RK,opt,any,Dimethyl
C3H6,R,opt,any,Trimethyl
HO3P,STY,opt,any,Phospho # Phosphorylation STY

Is there a problem with my parameter settings that leads to this result?
Or is it normal that only 59 PrSMs are identified for this input?

Improper documentation

The wiki documentation for running the tutorial lists the wrong file names for the tutorial files, or the tutorial files themselves are wrong.

Why is there no data in the RP4H_P32_WHIM2_biorep1_techrep3.ms1ft file generated by ProMex?

I am using versions 1.0.6619 and 1.0.7017.

ProMex generates the RP4H_P32_WHIM2_biorep1_techrep3.ms1ft file as input for MSPathFinderT, but the file is only 1 KB, which may affect the identification performance of MSPathFinder.

What is the reason for this?

Reported proteoform mass is always ~9 Da too low

I have noticed that the reported proteoform masses are always ~9 Da lower than the mass calculated from the reported m/z and z.

As far as I know, since I am using nESI as the ion source, the mass should be calculated as (m/z * z - z).

For the attached example,
a precursor ion at m/z 661 with charge 23+ should have a mass of 15180 Da.
Yet the reported proteoform mass was 15171.5 Da.
The unmodified mass of my protein is 15143 Da. Adding K9Me1 and K27Me1, as MSPathFinder reported, gives 15171 Da.

My search parameters are as follows:
'''
SpecFile W05C48_H3_UVPD.raw
DatabaseFile SOYBN_H3.fasta
FeatureFile W05C48_H3_UVPD.ms1ft
InternalCleavageMode NoInternalCleavage
Tag-based search False
Tda Target
PrecursorIonTolerancePpm 10
ProductIonTolerancePpm 10
MinSequenceLength 21
MaxSequenceLength 500
MinPrecursorIonCharge 2
MaxPrecursorIonCharge 50
MinProductIonCharge 1
MaxProductIonCharge 20
MinSequenceMass 3000
MaxSequenceMass 50000
ActivationMethod Unknown
MaxDynamicModificationsPerSequence 5
Modification C(0) H(0) N(0) O(1) S(0),M,opt,Everywhere,Oxidation
Modification C(0) H(0) N(0) O(1) S(0),Y,opt,Everywhere,Oxidation
Modification C(0) H(0) N(0) O(1) S(0),C,opt,Everywhere,Oxidation
Modification C(0) H(0) N(0) O(1) S(0),K,opt,Everywhere,Oxidation
Modification C(0) H(1) N(0) O(3) S(0) P(1),S,opt,Everywhere,Phospho
Modification C(0) H(1) N(0) O(3) S(0) P(1),T,opt,Everywhere,Phospho
Modification C(0) H(1) N(0) O(3) S(0) P(1),Y,opt,Everywhere,Phospho
Modification C(2) H(2) N(0) O(1) S(0),K,opt,Everywhere,Acetyl
Modification C(1) H(2) N(0) O(0) S(0),K,opt,Everywhere,Methyl
Modification C(2) H(4) N(0) O(0) S(0),K,opt,Everywhere,Dimethyl
Modification C(3) H(6) N(0) O(0) S(0),K,opt,Everywhere,Trimethyl
'''
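
The arithmetic in this report can be checked with a short sketch. It uses the proton mass (1.00728 Da) rather than 1 Da per charge, and the m/z, charge, and reported mass are the values quoted above. The closing remark about isotopes is an assumption, not something confirmed in this thread.

```python
PROTON_MASS = 1.00727646688  # mass of a proton in Da

def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass of an [M + zH]^z+ ion observed at the given m/z."""
    return charge * (mz - PROTON_MASS)

observed = neutral_mass(661.0, 23)  # m/z and charge quoted in the report
reported = 15171.5                  # proteoform mass quoted in the report

print(f"mass from m/z and z:  {observed:.2f} Da")
print(f"gap to reported mass: {observed - reported:.2f} Da")
```

The gap comes out at roughly 8.3 Da. For a protein of about 15 kDa, a difference of this size is about what separates the most abundant isotopic peak from the monoisotopic peak. One possible reading, stated here only as an assumption, is that the 661 m/z value refers to an abundant isotope peak while the reported mass is monoisotopic.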

EThcD activation type

Which activation type is most suitable for EThcD fragmentation, which generates both c/z- and b/y-type product ions? Thank you very much.

Add argument to prevent halting when processing a batch

When processing a set of files by specifying a directory to -s, it would be helpful if the process did not halt when a single raw file fails to run, e.g. due to the target or decoy file being empty. If the current halting behaviour is preferred as the default, then an additional -skipOnFail argument could be added so that MSPathFinder continues to the next raw file.
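
Until such an option exists, the same effect can be approximated from outside the tool by running one file at a time. The sketch below is a hypothetical wrapper, not part of Informed-Proteomics: `run_batch` and `run_mspathfinder` are illustrative names, only the -s and -d flags are taken from the command lines on this page, and it assumes MSPathFinderT.exe exits with a non-zero code on failure.

```python
import subprocess
from typing import Callable, Dict, Iterable

def run_mspathfinder(raw_file: str, fasta: str, exe: str = "MSPathFinderT.exe") -> None:
    """Run one search; raises CalledProcessError if the exit code is non-zero
    (assumption: the tool signals failure through its exit code)."""
    subprocess.run([exe, "-s", raw_file, "-d", fasta], check=True)

def run_batch(files: Iterable[str], run_one: Callable[[str], None]) -> Dict[str, str]:
    """Process every file, recording failures instead of stopping at the first one."""
    results = {}
    for path in files:
        try:
            run_one(path)
            results[path] = "ok"
        except Exception as exc:  # keep going, but remember what went wrong
            results[path] = f"failed: {exc}"
    return results
```

A call might look like `run_batch(raw_files, lambda f: run_mspathfinder(f, "db.fasta"))`, where `raw_files` and `db.fasta` are placeholders. Passing the runner in as a function keeps the loop testable without the executable.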

ProMex feature determination: Mass

Hi, I have a question about how ProMex determines feature masses.

According to my understanding of the Informed-Proteomics publication
(Park, J., Piehowski, P. D., Wilkins, C., Zhou, M., Mendoza, J., Fujimoto, G. M., Gibbons, B. C., Shaw, J. B., Shen, Y., Shukla, A. K., Moore, R. J., Liu, T., Petyuk, V. A., Tolić, N., Paša-Tolić, L., Smith, R. D., Payne, S. H., & Kim, S. (2017). Informed-Proteomics: Open-source software package for top-down proteomics. Nature Methods, 14(9), 909–914. https://doi.org/10.1038/nmeth.4388)
ProMex obtains features by clustering isotopic envelopes across different charge states and LC elution times, for each monoisotopic mass within the specified mass range. So, as I understand it, ProMex iterates over each possible monoisotopic mass within the mass range and carries out the clustering to obtain feature information.

My question is: how fine-grained is ProMex's list of possible monoisotopic masses? In my data, I noticed that ProMex can group near-isobaric proteoforms, such as trimethylated vs. acetylated forms, into the same feature, although they have slightly different monoisotopic masses.

I noticed that ProMex divides the mass range into bins during analysis, and the publication says that a tolerance can be given to ProMex. So my guess is that ProMex divides the mass range into bins according to the tolerance, and then generates a theoretical isotopic envelope for matching, using the averagine model at the bin mass. Therefore, the monoisotopic masses of near-isobaric proteoforms fall into the same bin and are grouped into the same feature if they co-elute.

Is my understanding correct?
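
To put a number on how close the two modifications are, the sketch below computes the monoisotopic mass difference between trimethylation (C3H6) and acetylation (C2H2O1) and expresses it in ppm. The 15,000 Da proteoform mass and the 10 ppm tolerance are assumptions for illustration. 10 ppm is the precursor tolerance in the parameter listings on this page, which is not necessarily the tolerance ProMex uses for binning.

```python
# Monoisotopic element masses in Da (standard values).
MONO = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}

def mono_mass(composition: dict) -> float:
    """Monoisotopic mass of an elemental composition such as {'C': 2, 'H': 2, 'O': 1}."""
    return sum(MONO[element] * count for element, count in composition.items())

def ppm(delta: float, mass: float) -> float:
    """Express a mass difference as parts per million of a reference mass."""
    return delta / mass * 1e6

ACETYL = {"C": 2, "H": 2, "O": 1}  # C2H2O1
TRIMETHYL = {"C": 3, "H": 6}       # C3H6

delta = mono_mass(TRIMETHYL) - mono_mass(ACETYL)
proteoform_mass = 15000.0  # assumed, roughly histone-sized
tolerance_ppm = 10.0       # assumed tolerance

print(f"trimethyl - acetyl = {delta:.5f} Da")
print(f"relative difference at {proteoform_mass:.0f} Da = {ppm(delta, proteoform_mass):.2f} ppm")
print("within tolerance:", ppm(delta, proteoform_mass) <= tolerance_ppm)
```

Under these assumptions the two forms differ by about 0.036 Da, or roughly 2.4 ppm. Any grouping step that works at a 10 ppm tolerance would therefore be unable to separate them by mass alone.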

"An item with the same key has already been added"

Hello,

I'm trying to run PbfGen on this mzML file, and getting the error "An item with the same key has already been added". Could you help me understand why and how I might fix this?

https://www.dropbox.com/s/9y1of85g5ju9252/091817_mix3_dda_20min.mzML?dl=0

I created the mzML file using MSConvert from Waters RAW format.

Thanks,
Gabriel

How to use Agilent .d folder formats in PbfGen and ProMex?

How can I use Agilent .d folders in PbfGen? I have installed ProteoWizard and used msconvertGUI to convert the .d folder into .mzML files so that I can use them in PbfGen. However, when I use the resulting .pbf file in ProMex, it reports an "Index out of range" exception. How can I fix it?

Hex(2) treated as an unknown modification

Lactosylation / Hex(2) is treated as an unknown modification:
http://www.unimod.org/modifications_view.php?editid1=512

C12H20O10,RK,opt,any,Hex(2) # Lactosylation

Current output:
<SearchModification fixedMod="false" massDelta="324.105652" residues="R">
  <cvParam cvRef="MS" accession="MS:1001460" name="unknown modification" value="Hex(2)" />
</SearchModification>

Expected output:
<SearchModification fixedMod="false" massDelta="324.105652" residues="R">
  <cvParam cvRef="UNIMOD" accession="UNIMOD:512" name="Hex(2)" />
</SearchModification>

Export MS/MS sequence tags before scoring

Is it possible to write the MS/MS sequence tag results to a file before running the scoring algorithms? I'm seeing many scans that have sequence tags but end up with 0 matches after the final scoring step. It would be useful to obtain the sequence tag information for MS2 scans, so one could determine why these are not significant matches (poor MS1 envelope similarity, low number of matching fragments, etc.).

Crash if NumMods >= 28

Input:
NumMods=28
HO3P,STY,opt,any,Phospho # Phosphorylation
O1,M,opt,any,Oxidation # Oxidation M
C12H20O10,RK,opt,any,Hex(2) # Lactosylation
C2H2O,K,opt,Prot-N-term,Acetyl # Acetylation Protein N-term (C2H2O can be replaced with "H(2) C(2) O")

On execute:

Exception parsing the file for parameter -mod: The given key was not present in the dictionary.
Exception while processing: The given key was not present in the dictionary.
at System.ThrowHelper.ThrowKeyNotFoundException()
at System.Collections.Generic.Dictionary`2.get_Item(TKey key)
at InformedProteomics.Backend.Data.Sequence.ModificationParams.GenerateModCombMap()
at InformedProteomics.Backend.Data.Sequence.AminoAcidSet..ctor(IEnumerable`1 searchModifications, Int32 maxNumModsPerSequence)
at MSPathFinderT.TopDownInputParameters.LoadModsFile(String modFilePath)
at MSPathFinderT.TopDownInputParameters.Parse(Dictionary`2 parameters)
at MSPathFinderT.Program.Main(String[] args)

Can ProMex be used to extract features from bottom-up data?

The algorithm may not be optimized for bottom-up data.
What problems or limitations could arise if ProMex is used to extract features from bottom-up data?

Analysis of profile data MSPathFinder

Hi all,
I have acquired profile data (MS1 and MS2) on a Thermo instrument.
I have now tested the following two MSPathFinder pipelines:

1. Use the raw file as input for pbf generation via PbfGen and ProMex deconvolution.
2. Convert the raw file to mzML with msconvert peak picking, then use this mzML file as input for PbfGen and ProMex.

Both workflows run successfully and give a similar number of identifications, BUT the overlap of the identified proteoforms is only 45% (comparing sequences).
I am very unsure which results I can trust.
Looking forward to your feedback!
Cheers,
Konrad
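
An overlap percentage depends on how overlap is defined, so it helps to state the definition alongside the number. The sketch below computes the shared count, the Jaccard index (shared / union), and the shared fraction of each run for two sets of identified sequences. The function name and the toy sequences are made up for illustration, and which definition the 45% above refers to is not stated in the post.

```python
def overlap_stats(run_a, run_b):
    """Compare two collections of identified sequences as sets."""
    a, b = set(run_a), set(run_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "jaccard": len(shared) / len(union) if union else 0.0,
        "fraction_of_a": len(shared) / len(a) if a else 0.0,
        "fraction_of_b": len(shared) / len(b) if b else 0.0,
    }

# Toy data: placeholder sequences, not real identifications.
raw_run = ["AAAK", "CCCR", "DDDK", "EEEK"]
mzml_run = ["AAAK", "CCCR", "FFFK", "GGGR"]
print(overlap_stats(raw_run, mzml_run))
```

With this toy input the two runs share half of their identifications, while the Jaccard index is only one third. The same pair of result lists can therefore yield quite different "overlap" figures.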

Create .mzid file if -tda 0

An .mzid file is currently not created if -tda 0 is used. Can this be changed so that an .mzid file is always produced?

Sequence contains no elements

Hi, all.

I have reproduced a problem when I run the 1.1.8305 release on my test file.

My script for execution looks like this:

"C:\Program Files\Informed-Proteomics-1.1.8305\ProMex.exe" -i 20151116_F_MaD_Ecolik12pool_ACNplate_ET5hcD10_01bis.raw
"C:\Program Files\Informed-Proteomics-1.1.8305\MSPathFinderT.exe" -i 20151116_F_MaD_Ecolik12pool_ACNplate_ET5hcD10_01bis.raw -d uniprot-proteome_UP000000625.fasta -ic 0 -mod IP-PTMs.txt -tda 0

(This RAW file isn't public, but any of the experiments from PXD019247 would probably be a fine replacement for it.)

My IP-PTMs.txt file looks like this:

NumMods=3

# Static
# C2H3NO,C,opt,any,Carbamidomethyl

# Variable
O1,    M, opt, any,         Oxidation
H-2,   C, opt, any,         Disulfide
C2H2O1,*, opt, Prot-N-term, Acetyl

The error output looks like the following:
Calculating spectral E-values for target-spectrum matches
Estimated matched sequences: 673
Processing, 0 proteins done, 0.0% complete, 1.4 sec elapsed
Processing, 54 proteins done, 8.0% complete, 17.2 sec elapsed
Processing, 104 proteins done, 15.5% complete, 32.9 sec elapsed
Processing, 152 proteins done, 22.6% complete, 48.0 sec elapsed
Processing, 195 proteins done, 29.0% complete, 63.0 sec elapsed
Processing, 243 proteins done, 36.1% complete, 78.2 sec elapsed
Processing, 294 proteins done, 43.7% complete, 94.1 sec elapsed
Processing, 341 proteins done, 50.7% complete, 109.4 sec elapsed
Processing, 434 proteins done, 64.5% complete, 140.7 sec elapsed
Processing, 504 proteins done, 74.9% complete, 170.8 sec elapsed
Processing, 572 proteins done, 85.0% complete, 200.8 sec elapsed
Processing, 646 proteins done, 96.0% complete, 231.0 sec elapsed
Total Progress: 53.33%, 0d 0h 5.00m elapsed, Current Task: Calculating spectral E-values for target-spectrum matches, estimated remaining: 0d 0h 4.37m
Target-spectrum match E-value calculation elapsed Time: 262.8 sec

Exception while processing: Sequence contains no elements

Stack trace:
MSPathFinderT.Program.ProcessFiles
InformedProteomics.TopDown.Execution.IcTopDownLauncher.RunSearch
InformedProteomics.TopDown.Execution.MzidResultsWriter.WriteResultsToMzid
InformedProteomics.TopDown.Execution.MzidResultsWriter.CreateMzidSettings
System.Linq.Enumerable.First[TSource]

Exception while processing: Sequence contains no elements
Stack trace: MSPathFinderT.Program.ProcessFiles-:-InformedProteomics.TopDown.Execution.IcTopDownLauncher.RunSearch-:-InformedProteomics.TopDown.Execution.MzidResultsWriter.WriteResultsToMzid-:-InformedProteomics.TopDown.Execution.MzidResultsWriter.CreateMzidSettings-:-System.Linq.Enumerable.First[TSource]

What does this warning mean?

Warning: scores is null for index 0 in scan 1448

Warning: scores is null for index 0 in scan 1479

Why is there no data in the generated 2DLC_H3_1.ms1ft file?

PBFGen.exe -s 2DLC_H3_1.mzML > MSpathfinderResult201902221_search2.log

ProMex.exe -i 2DLC_H3_1.pbf -minCharge 2 -score n -csv n > MSpathfinderResult201902222_search2.log

Complete MS1 feature extraction.

Elapsed time = 1045.317 sec
Number of extracted features = 0
Start selecting mutually independent features from feature network graph
Complete feature filtration
Elapsed time = 0.068 sec
Number of filtered features = 0
ProMex output: G:\SYS\new_mspathfinder\Informed-Proteomics1\2DLC_H3_1.ms1ft
Feature map image output: G:\SYS\new_mspathfinder\Informed-Proteomics1\2DLC_H3_1_ms1ft.png

Is there any way to batch export XIC data points (time & intensity) of all PrSMs?

I am currently trying to create some customized XICs, which requires the retention times and intensities of all the precursors in all the PrSMs. Is there any way to batch export these data? I know I can view the XIC of a PrSM in LCMSSpectator, but it does not let me export the data points. Even if it did, I would have to do it one by one.
I think the required information may be stored in the pbf file, but I do not know how to parse it.

.param/.mzid file does not include activation method

The .param file does not appear to show the -act value that was specified, and as far as I can see it isn't shown in the .mzid file either.
It would be handy to record it in both files for results validation purposes. It would certainly be good to know what was actually chosen when the default -act 6 is used.

Why is MSPathFinder running so slowly for me?

I read the paper "Informed-Proteomics: open-source software package for top-down proteomics", which describes MSPathFinder as a faster proteoform identification tool. But I don't know what I am doing wrong that makes identification so slow. First, I used the msconvert tool in the ProteoWizard package to convert the original spectrum file to .mzML format. Then, with the following parameters, MSPathFinderT.exe took a total of 8d 8h 12m to run.
SpecFile 2DLC_H3_1.pbf
DatabaseFile human_proteome_database.fasta
FeatureFile 2DLC_H3_1.ms1ft
InternalCleavageMode SingleInternalCleavage
Tag-based search True
Tda Target+Decoy
PrecursorIonTolerancePpm 10
ProductIonTolerancePpm 10
MinSequenceLength 21
MaxSequenceLength 300
MinPrecursorIonCharge 2
MaxPrecursorIonCharge 30
MinProductIonCharge 1
MaxProductIonCharge 20
MinSequenceMass 3000
MaxSequenceMass 50000
ActivationMethod Unknown
MaxDynamicModificationsPerSequence 0

When I used the default parameters below and added a modification file, the search became even slower: after 0d 22h 35.02m it had completed only 0.4%. Where is the problem?

SpecFile 2DLC_H3_1.pbf
DatabaseFile human_proteome_database.fasta
FeatureFile 2DLC_H3_1.ms1ft
InternalCleavageMode SingleInternalCleavage
Tag-based search True
Tda Target+Decoy
PrecursorIonTolerancePpm 10
ProductIonTolerancePpm 10
MinSequenceLength 21
MaxSequenceLength 500
MinPrecursorIonCharge 2
MaxPrecursorIonCharge 50
MinProductIonCharge 1
MaxProductIonCharge 20
MinSequenceMass 3000
MaxSequenceMass 50000
ActivationMethod Unknown
MaxDynamicModificationsPerSequence 4
Modification C(2) H(2) N(0) O(1) S(0),R,opt,Everywhere,Acetyl
Modification C(2) H(2) N(0) O(1) S(0),K,opt,Everywhere,Acetyl
Modification C(1) H(2) N(0) O(0) S(0),R,opt,Everywhere,Methyl
Modification C(1) H(2) N(0) O(0) S(0),K,opt,Everywhere,Methyl
Modification C(2) H(4) N(0) O(0) S(0),R,opt,Everywhere,Dimethyl
Modification C(2) H(4) N(0) O(0) S(0),K,opt,Everywhere,Dimethyl
Modification C(3) H(6) N(0) O(0) S(0),R,opt,Everywhere,Trimethyl
Modification C(0) H(1) N(0) O(3) S(0) P(1),S,opt,Everywhere,Phospho
Modification C(0) H(1) N(0) O(3) S(0) P(1),T,opt,Everywhere,Phospho
Modification C(0) H(1) N(0) O(3) S(0) P(1),Y,opt,Everywhere,Phospho

Processing, 93499 proteins done, 0.4% complete, 80266.5 sec elapsed
Total Progress: 42.58%, 0d 22h 20.02m elapsed, Current Task: Searching the target database
Processing, 93950 proteins done, 0.4% complete, 80566.8 sec elapsed
Total Progress: 42.58%, 0d 22h 25.02m elapsed, Current Task: Searching the target database
Processing, 94352 proteins done, 0.4% complete, 80881.6 sec elapsed
Total Progress: 42.58%, 0d 22h 30.02m elapsed, Current Task: Searching the target database
Processing, 94955 proteins done, 0.4% complete, 81189.5 sec elapsed
Total Progress: 42.58%, 0d 22h 35.02m elapsed, Current Task: Searching the target database
Another question: the .fasta file I used contains 20410 entries, so why does the search report 94352 proteins done?
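
A rough projection of the total run time can be read off any "Processing" line by dividing the elapsed time by the completed fraction. The sketch below does that for the last progress line in this log. The regular expression and function name are illustrative, and because the log rounds the percentage to one decimal place, the result is only an order-of-magnitude estimate.

```python
import re

# Matches lines such as:
# "Processing, 94955 proteins done, 0.4% complete, 81189.5 sec elapsed"
PROGRESS = re.compile(
    r"Processing, (\d+) proteins done, ([\d.]+)% complete, ([\d.]+) sec elapsed"
)

def estimate_total_seconds(line: str):
    """Projected total duration of the current stage, or None if it cannot be estimated."""
    match = PROGRESS.search(line)
    if match is None:
        return None
    percent = float(match.group(2))
    elapsed = float(match.group(3))
    if percent <= 0:
        return None
    return elapsed / (percent / 100.0)

line = "Processing, 94955 proteins done, 0.4% complete, 81189.5 sec elapsed"
total = estimate_total_seconds(line)
print(f"projected total: {total / 86400:.0f} days")
```

At the rate shown, this stage alone would take on the order of 200 days. On the counter itself: in another log on this page, "Estimated Sequences: 37,834,838" is followed by "37390107 proteins done, 98.8% complete". That suggests the "proteins done" figure is counted against candidate sequences rather than FASTA entries. The thread itself does not confirm this interpretation.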

Promex csv error

Hi, when I ask ProMex to write a csv file, I get "System.Int32[]" in every cell of the FeatureID field.

best

Artur Pirog

How to install?

Hi,

I am pretty sure I am doing something wrong, but I would like to know how to install Informed-Proteomics (and LCMS-Spectator) on Windows.

I have downloaded Informed-Proteomics-master and LCMS-Spectator-master from GitHub, but I was not able to install either. I downloaded Inno Setup to compile the .ISS files, but both installers gave me the error "The [Setup] section must include an AppVersion or AppVerName directive". I got past these errors by removing a ; sign (line 98 for LCMS-Spectator, line 110 for Informed-Proteomics). When I tried to compile again, another error appeared about a file not being found in the BIN folder (LcmsSpectator\bin\Release\LcmsSpectator.exe;). For Informed-Proteomics, the error is about a .DLL not found, again in the BIN folder. I think the problem is that there is no BIN folder in the .ZIP from GitHub. Could you help me install these programs? Thanks a lot.

General help

Hey,
I'm trying to set up MSPathFinderT in our lab and am having difficulty producing a pbf file that contains both the MS1 and MS2 spectra. I managed to convert the raw files using PbfGen, but when I try to run ProMex or MSPathFinderT I get an error saying "Datafile has no MS1 spectra".
I'm sorry for the basic question, but I couldn't find a solution in the tutorial.
Thanks in advance.

FDR filtering

Hi all,
I have a question about filtering MSPathFinder results after target/decoy search.
The PrSMs in the *IcTda.tsv file are probably not filtered at a certain FDR, so I have to do it afterwards.
I am now wondering which value I should use: what is the difference between QValue and PepQValue?
And if I want a final FDR of 1%, do I just exclude all PrSMs with a value higher than 0.01?
Thanks a lot!
Cheers,
Konrad
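
The filtering step described in this question can be sketched as a threshold on a q-value column of a tab-separated file. By the usual convention, keeping rows with a q-value of at most 0.01 corresponds to a 1% FDR. The `QValue` and `PepQValue` column names come from the question. The `Scan` and `Sequence` columns and all values in the sample are made up, and the thread does not say which column is the right one to use.

```python
import csv
import io

def filter_by_qvalue(tsv_text: str, column: str = "QValue", threshold: float = 0.01):
    """Return the rows whose value in `column` is at or below `threshold`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if float(row[column]) <= threshold]

# Made-up sample standing in for a results table.
SAMPLE = (
    "Scan\tSequence\tQValue\tPepQValue\n"
    "101\tPEPTIDEA\t0.000\t0.000\n"
    "102\tPEPTIDEB\t0.008\t0.012\n"
    "103\tPEPTIDEC\t0.030\t0.045\n"
)

kept = filter_by_qvalue(SAMPLE)
print([row["Scan"] for row in kept])
```

For a real results file, the text would come from `open(path, newline="").read()`. Passing `column="PepQValue"` applies the same cut to the other column.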

Is MSPathFinderT able to report isobaric proteoforms?

I have been using Informed-Proteomics to analyze histone top-down spectra, and as you may already know, there are a lot of near-isobaric or isobaric proteoforms.

My question is: can MSPathFinderT report multiple proteoforms from a single MS2 spectrum? According to my understanding of the Informed-Proteomics article and my experience with the software, it reports only the best PrSM for each MS2 spectrum. In other words, if multiple (near-)isobaric proteoforms were co-isolated and co-fragmented in an MS2 scan, only the one with the best score is reported.

Am I right about this?

Specify scan start and scan end in ProMex

Is there a way to run ProMex on a subset of a .pbf file? Currently I have to generate two different .pbf files to accomplish this.

Issues in searching Thermo EThcD raw data

There are some issues when using MSPathFinderT to search Thermo .raw EThcD data.

I understand that Informed Proteomics handles EThcD spectra as ETD spectra, since ETD is the main fragmentation mechanism in EThcD.

Yet I found that when a Thermo .raw file is used to generate the .pbf file, MSPathFinderT automatically assigns HCD to EThcD spectra and only considers b and y ions. Because these ions are not the majority, most of the PrSMs had bad scores.
I have tried passing -act ETD (i.e. specifying ETD as the activation method) on the command line, but the same problem remains. (Search result and raw file here)

I suspected that this problem may be due to the .raw format. Therefore, I first converted the .raw file to .mzML, used that to generate a .pbf file, and ran the search again. This time it worked: MSPathFinderT handled the MS2 spectra as ETD spectra (i.e. searched for c and z ions), with or without -act ETD. (Search result and mzml file here)

Another issue is that the number of matched fragment ions in the IcTda or IcTarget file does not seem to match the number reported in LCMSSpectator. For example, in this PrSM, the reported #Matching ions is 74.

However, LCMSSpectator shows only 68 ions. In some cases, LCMSSpectator reports more ions than the IcTda or IcTarget file.

Lastly, since MSPathFinderT now supports UVPD, which has even more ion types than EThcD, could MSPathFinderT support EThcD (i.e. search for all b, y, c, and z ions) in the near future? I know it is a lot to ask, but for highly modified proteins like histones, thorough fragment information is important for localizing PTMs and sequence variants. Adding this function would help the field a lot.

Help to specify modification

I want to add some modifications to the mod.txt file. Can you point me to the guidance how to do that? e.g. what is the proper format of the text in the .txt file, and how do I specify the number of atoms for each element?

Thanks!
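
Judging only from the modification files pasted elsewhere on this page, each modification line has five comma-separated fields followed by an optional # comment: an elemental composition, the target residues, opt (or fix), a location such as any or Prot-N-term, and a name. The sketch below parses that shape. The field labels in the returned dictionary are my own, and the parser is an illustration of the format as it appears here, not the tool's actual reader.

```python
import re

def parse_composition(formula: str) -> dict:
    """Turn a compact formula such as 'C2H2O1', 'HO3P' or 'H-2' into element counts."""
    composition = {}
    for element, count in re.findall(r"([A-Z][a-z]?)(-?\d*)", formula):
        composition[element] = composition.get(element, 0) + (int(count) if count else 1)
    return composition

def parse_mod_line(line: str):
    """Parse one modification line; returns None for comments, blanks and other lines."""
    text = line.split("#", 1)[0].strip()  # drop the trailing comment
    fields = [field.strip() for field in text.split(",")]
    if len(fields) != 5:
        return None
    formula, residues, mod_type, location, name = fields
    return {
        "composition": parse_composition(formula),
        "residues": residues,
        "mod_type": mod_type,
        "location": location,
        "name": name,
    }

print(parse_mod_line("C2H2O1,RK,opt,any,Acetyl # Acetylation RK"))
print(parse_mod_line("H-2,   C, opt, any,         Disulfide"))
```

In this compact notation an element without a number counts once (HO3P is one H, three O, one P), and a negative number removes atoms (H-2). The parenthesised form "H(2) C(2) O" mentioned in one of the reports is not handled by this sketch.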

MSPathFinderT.exe crash

Hello,

MSPathFinderT.exe crashes with the error below.
What could have gone wrong?

Total Progress: 94.97%, 2d 14h 45.49m elapsed, Current Task:
Collected candidate matches: 0
Decoy database search elapsed Time: 109057.9 sec
Calculating spectral E-values for decoy-spectrum matches
Estimated matched sequences: 0
Decoy-spectrum match E-value calculation elapsed Time: 0.0 sec
Error computing FDR: Cannot compute FDR Scores; target file is empty
Error processing Frac4_140kR_01.raw: Cannot compute FDR Scores; target file is empty

Error computing FDR: Cannot compute FDR Scores; target file is empty

I ran MSPathFinderT directly on my data (Thermo .raw), but it fails. I don't know how to fix it. Could anybody help me?

C:\Users\Neal\Documents>MSPathFinderT.exe -s D:\10_26data\Chem2-2-1.raw -d D:\10_26data\all.fasta -o D:\10_26data\new -t 10 -f 10 -m 1 -tda 1 -minLength 21 -maxLength 300 -minCharge 2 -maxCharge 30 -minFragCharge 1 -maxFragCharge 15 -minMass 3000 -maxMass 50000 -mod D:\10_26data\MSPathFinder_Mods.txt
MSPathFinderT version 1.0.6510 (Oct. 28, 2017)
MaxThreads: 6
SpectrumFilePath: D:\10_26data\Chem2-2-1.raw
DatabaseFilePath: D:\10_26data\all.fasta
FeatureFilePath: N/A
OutputDir: D:\10_26data\new
InternalCleavageMode: SingleInternalCleavage
Tag-based search: True
Tda: Target+Decoy
PrecursorIonTolerancePpm: 10
ProductIonTolerancePpm: 10
MinSequenceLength: 21
MaxSequenceLength: 300
MinPrecursorIonCharge: 2
MaxPrecursorIonCharge: 30
MinProductIonCharge: 1
MaxProductIonCharge: 15
MinSequenceMass: 3000
MaxSequenceMass: 50000
MaxDynamicModificationsPerSequence: 4
Modifications:
C(0) H(0) N(0) O(1) S(0),M,opt,Everywhere,Oxidation
C(0) H(-1) N(0) O(0) S(0),C,opt,Everywhere,Dehydro
C(2) H(2) N(0) O(1) S(0),*,opt,ProteinNTerm,Acetyl
Creating and loading pbf file...
Total Progress: 0.00%, 0d 0h 0.00m elapsed, Current Task: Reading spectra file
Elapsed Time: 5.2 sec
Reading Fasta File
Generating D:\10_26data\all.icseq and
Generating D:\10_26data\all.icanno ... Done
Reading ProMex results...
3155/3155 features loaded...Elapsed Time: 2.0 sec
Generating deconvoluted spectra for MS/MS spectra...
Elapsed Time: 0.0 sec
Generating sequence tags for MS/MS spectra...
Number of spectra: 0
Generated sequence tags: 0
Elapsed Time: 0.0 sec
Caching peaks in MS1 spectra: 25329 scans
Sorting MS1 peaks: 54,689,404 peaks
Reading the target database...
Elapsed Time: 0.3 sec
Tag-based searching the target database
Number of spectra containing sequence tags: 0
Collected candidate matches: 0
Target database tag-based search elapsed Time: 35.5 sec
Searching the target database
Generating D:\10_26data\all.icplcp ... Done
Estimated Sequences: 37,834,838
Processing, 0 proteins done, 0.0% complete, 0.0 sec elapsed
Processing, 106315 proteins done, 0.3% complete, 15.1 sec elapsed
Processing, 118028 proteins done, 0.3% complete, 30.1 sec elapsed
Processing, 216527 proteins done, 0.6% complete, 45.1 sec elapsed
Processing, 323031 proteins done, 0.9% complete, 60.1 sec elapsed
Processing, 427459 proteins done, 1.1% complete, 75.1 sec elapsed
Total Progress: 42.70%, 0d 0h 5.00m elapsed, Current Task: Searching the target database
Processing, 532183 proteins done, 1.4% complete, 90.1 sec elapsed
Processing, 637148 proteins done, 1.7% complete, 105.1 sec elapsed
Processing, 768883 proteins done, 2.0% complete, 135.1 sec elapsed
Processing, 953651 proteins done, 2.5% complete, 165.1 sec elapsed
Processing, 1101862 proteins done, 2.9% complete, 195.9 sec elapsed
Processing, 1285452 proteins done, 3.4% complete, 225.9 sec elapsed
Processing, 1422493 proteins done, 3.8% complete, 255.9 sec elapsed
Processing, 1510383 proteins done, 4.0% complete, 285.9 sec elapsed
Processing, 1672899 proteins done, 4.4% complete, 346.0 sec elapsed
Total Progress: 43.36%, 0d 0h 10.00m elapsed, Current Task: Searching the target database
Processing, 2041036 proteins done, 5.4% complete, 406.0 sec elapsed
Processing, 2368429 proteins done, 6.3% complete, 466.0 sec elapsed
Processing, 2683408 proteins done, 7.1% complete, 527.5 sec elapsed
Processing, 2991146 proteins done, 7.9% complete, 587.5 sec elapsed
Processing, 3309509 proteins done, 8.7% complete, 647.5 sec elapsed
Total Progress: 44.12%, 0d 0h 15.04m elapsed, Current Task: Searching the target database
Processing, 3681335 proteins done, 9.7% complete, 707.5 sec elapsed
Processing, 4036809 proteins done, 10.7% complete, 767.5 sec elapsed
Processing, 4308919 proteins done, 11.4% complete, 827.6 sec elapsed
Processing, 4548002 proteins done, 12.0% complete, 887.8 sec elapsed
Processing, 4752700 proteins done, 12.6% complete, 947.8 sec elapsed
Total Progress: 44.75%, 0d 0h 20.09m elapsed, Current Task: Searching the target database
Processing, 4939771 proteins done, 13.1% complete, 1007.8 sec elapsed
Processing, 4939772 proteins done, 13.1% complete, 1007.8 sec elapsed
Processing, 4939773 proteins done, 13.1% complete, 1007.8 sec elapsed
Processing, 5122628 proteins done, 13.5% complete, 1069.0 sec elapsed
Processing, 5122627 proteins done, 13.5% complete, 1069.0 sec elapsed
Processing, 5328186 proteins done, 14.1% complete, 1129.0 sec elapsed
Processing, 5535302 proteins done, 14.6% complete, 1191.3 sec elapsed
Total Progress: 45.21%, 0d 0h 25.09m elapsed, Current Task: Searching the target database
Processing, 6754622 proteins done, 17.9% complete, 1492.6 sec elapsed
Total Progress: 45.77%, 0d 0h 30.09m elapsed, Current Task: Searching the target database
Processing, 7896082 proteins done, 20.9% complete, 1792.6 sec elapsed
Total Progress: 46.38%, 0d 0h 35.14m elapsed, Current Task: Searching the target database
Processing, 9470041 proteins done, 25.0% complete, 2092.6 sec elapsed
Total Progress: 47.03%, 0d 0h 40.14m elapsed, Current Task: Searching the target database
Processing, 10584900 proteins done, 28.0% complete, 2393.7 sec elapsed
Total Progress: 47.50%, 0d 0h 45.19m elapsed, Current Task: Searching the target database
Processing, 11487978 proteins done, 30.4% complete, 2694.0 sec elapsed
Total Progress: 47.98%, 0d 0h 50.21m elapsed, Current Task: Searching the target database
Processing, 12217755 proteins done, 32.3% complete, 2994.0 sec elapsed
Total Progress: 48.23%, 0d 0h 55.22m elapsed, Current Task: Searching the target database
Processing, 12725499 proteins done, 33.6% complete, 3296.6 sec elapsed
Total Progress: 48.39%, 0d 1h 0.26m elapsed, Current Task: Searching the target database
Total Progress: 48.39%, 0d 1h 5.27m elapsed, Current Task: Searching the target database
Total Progress: 48.39%, 0d 1h 10.29m elapsed, Current Task: Searching the target database
Total Progress: 48.39%, 0d 1h 15.31m elapsed, Current Task: Searching the target database
Total Progress: 48.39%, 0d 1h 20.32m elapsed, Current Task: Searching the target database
Total Progress: 48.39%, 0d 1h 25.34m elapsed, Current Task: Searching the target database
Processing, 12725500 proteins done, 33.6% complete, 5000.6 sec elapsed
Total Progress: 48.84%, 0d 1h 30.34m elapsed, Current Task: Searching the target database
Processing, 14227281 proteins done, 37.6% complete, 5300.6 sec elapsed
Total Progress: 49.45%, 0d 1h 35.38m elapsed, Current Task: Searching the target database
Processing, 15403529 proteins done, 40.7% complete, 5600.7 sec elapsed
Total Progress: 49.87%, 0d 1h 40.38m elapsed, Current Task: Searching the target database
Processing, 16221767 proteins done, 42.9% complete, 5900.7 sec elapsed
Total Progress: 50.28%, 0d 1h 45.39m elapsed, Current Task: Searching the target database
Processing, 17104889 proteins done, 45.2% complete, 6203.2 sec elapsed
Total Progress: 50.69%, 0d 1h 50.39m elapsed, Current Task: Searching the target database
Processing, 18226901 proteins done, 48.2% complete, 6505.3 sec elapsed
Total Progress: 51.20%, 0d 1h 55.39m elapsed, Current Task: Searching the target database
Processing, 19098473 proteins done, 50.5% complete, 6806.7 sec elapsed
Total Progress: 51.62%, 0d 2h 0.42m elapsed, Current Task: Searching the target database
Processing, 20256051 proteins done, 53.5% complete, 7106.7 sec elapsed
Total Progress: 52.17%, 0d 2h 5.42m elapsed, Current Task: Searching the target database
Processing, 21211621 proteins done, 56.1% complete, 7406.7 sec elapsed
Total Progress: 52.57%, 0d 2h 10.43m elapsed, Current Task: Searching the target database
Processing, 22091816 proteins done, 58.4% complete, 7708.7 sec elapsed
Total Progress: 53.12%, 0d 2h 15.48m elapsed, Current Task: Searching the target database
Processing, 23624794 proteins done, 62.4% complete, 8008.7 sec elapsed
Total Progress: 53.93%, 0d 2h 20.48m elapsed, Current Task: Searching the target database
Processing, 25145852 proteins done, 66.5% complete, 8311.5 sec elapsed
Total Progress: 54.50%, 0d 2h 25.51m elapsed, Current Task: Searching the target database
Processing, 26387047 proteins done, 69.7% complete, 8613.3 sec elapsed
Total Progress: 55.08%, 0d 2h 30.54m elapsed, Current Task: Searching the target database
Processing, 27652102 proteins done, 73.1% complete, 8913.3 sec elapsed
Total Progress: 55.66%, 0d 2h 35.54m elapsed, Current Task: Searching the target database
Processing, 28895951 proteins done, 76.4% complete, 9215.1 sec elapsed
Total Progress: 56.24%, 0d 2h 40.58m elapsed, Current Task: Searching the target database
Processing, 30139855 proteins done, 79.7% complete, 9515.2 sec elapsed
Total Progress: 56.80%, 0d 2h 45.59m elapsed, Current Task: Searching the target database
Processing, 31342528 proteins done, 82.8% complete, 9815.2 sec elapsed
Total Progress: 57.34%, 0d 2h 50.59m elapsed, Current Task: Searching the target database
Processing, 32435944 proteins done, 85.7% complete, 10117.8 sec elapsed
Total Progress: 57.87%, 0d 2h 55.59m elapsed, Current Task: Searching the target database
Processing, 33930982 proteins done, 89.7% complete, 10417.8 sec elapsed
Total Progress: 58.68%, 0d 3h 0.61m elapsed, Current Task: Searching the target database
Processing, 35360872 proteins done, 93.5% complete, 10717.8 sec elapsed
Total Progress: 59.19%, 0d 3h 5.62m elapsed, Current Task: Searching the target database
Processing, 36518054 proteins done, 96.5% complete, 11017.8 sec elapsed
Total Progress: 59.74%, 0d 3h 10.67m elapsed, Current Task: Searching the target database
Processing, 37390107 proteins done, 98.8% complete, 11319.5 sec elapsed
Collected candidate matches: 0
Target database search elapsed Time: 11399.8 sec
Calculating spectral E-values for target-spectrum matches
Estimated matched sequences: 0
Target-spectrum match E-value calculation elapsed Time: 0.1 sec
Creating D:\10_26data\all.icsfldecoy.fasta
Generating D:\10_26data\all.icsfldecoy.icseq and
Generating D:\10_26data\all.icsfldecoy.icanno ... Done
Reading the decoy database...
Elapsed Time: 0.5 sec
Tag-based searching the decoy database
Number of spectra containing sequence tags: 0
Collected candidate matches: 0
Decoy database tag-based search elapsed Time: 12.1 sec
Searching the decoy database
Generating D:\10_26data\all.icsfldecoy.icplcp ... Done
Estimated Sequences: 37,834,838
Processing, 0 proteins done, 0.0% complete, 0.0 sec elapsed
Processing, 91058 proteins done, 0.2% complete, 15.0 sec elapsed
Processing, 104777 proteins done, 0.3% complete, 36.5 sec elapsed
Processing, 104775 proteins done, 0.3% complete, 36.5 sec elapsed
Processing, 104780 proteins done, 0.3% complete, 36.5 sec elapsed
Processing, 199050 proteins done, 0.5% complete, 51.5 sec elapsed
Processing, 299404 proteins done, 0.8% complete, 66.5 sec elapsed
Processing, 399679 proteins done, 1.1% complete, 81.6 sec elapsed
Processing, 473324 proteins done, 1.3% complete, 99.3 sec elapsed
Total Progress: 77.72%, 0d 3h 15.70m elapsed, Current Task: Searching the decoy database
Processing, 473827 proteins done, 1.3% complete, 116.2 sec elapsed
Processing, 473826 proteins done, 1.3% complete, 116.2 sec elapsed
Processing, 473828 proteins done, 1.3% complete, 116.2 sec elapsed
Processing, 474920 proteins done, 1.3% complete, 147.2 sec elapsed
Processing, 576058 proteins done, 1.5% complete, 177.2 sec elapsed
Processing, 776854 proteins done, 2.1% complete, 207.2 sec elapsed
Processing, 978487 proteins done, 2.6% complete, 237.2 sec elapsed
Processing, 1122059 proteins done, 3.0% complete, 267.2 sec elapsed
Processing, 1322696 proteins done, 3.5% complete, 297.2 sec elapsed
Processing, 1707488 proteins done, 4.5% complete, 357.2 sec elapsed
Total Progress: 78.42%, 0d 3h 20.70m elapsed, Current Task: Searching the decoy database
Processing, 2026830 proteins done, 5.4% complete, 417.3 sec elapsed
Processing, 2367141 proteins done, 6.3% complete, 478.8 sec elapsed
Processing, 2721653 proteins done, 7.2% complete, 538.8 sec elapsed
Processing, 3082872 proteins done, 8.1% complete, 598.8 sec elapsed
Processing, 3434130 proteins done, 9.1% complete, 658.8 sec elapsed
Total Progress: 79.22%, 0d 3h 25.70m elapsed, Current Task: Searching the decoy database
Processing, 3745457 proteins done, 9.9% complete, 718.8 sec elapsed
Processing, 3996486 proteins done, 10.6% complete, 778.8 sec elapsed
Processing, 4350716 proteins done, 11.5% complete, 838.9 sec elapsed
Processing, 4716227 proteins done, 12.5% complete, 904.6 sec elapsed
Processing, 4716229 proteins done, 12.5% complete, 904.6 sec elapsed
Processing, 5132499 proteins done, 13.6% complete, 964.6 sec elapsed
Total Progress: 80.00%, 0d 3h 30.71m elapsed, Current Task: Searching the decoy database
Processing, 5520902 proteins done, 14.6% complete, 1024.6 sec elapsed
Processing, 6057630 proteins done, 16.0% complete, 1084.6 sec elapsed
Processing, 6560208 proteins done, 17.3% complete, 1144.6 sec elapsed
Total Progress: 81.24%, 0d 3h 35.71m elapsed, Current Task: Searching the decoy database
Processing, 9229408 proteins done, 24.4% complete, 1444.6 sec elapsed
Total Progress: 82.43%, 0d 3h 40.71m elapsed, Current Task: Searching the decoy database
Processing, 11711076 proteins done, 31.0% complete, 1744.7 sec elapsed
Total Progress: 83.50%, 0d 3h 45.71m elapsed, Current Task: Searching the decoy database
Processing, 13778610 proteins done, 36.4% complete, 2044.7 sec elapsed
Total Progress: 84.34%, 0d 3h 50.71m elapsed, Current Task: Searching the decoy database
Processing, 15581306 proteins done, 41.2% complete, 2344.7 sec elapsed
Total Progress: 85.13%, 0d 3h 55.71m elapsed, Current Task: Searching the decoy database
Processing, 17190416 proteins done, 45.4% complete, 2644.7 sec elapsed
Total Progress: 85.85%, 0d 4h 0.71m elapsed, Current Task: Searching the decoy database
Processing, 18745544 proteins done, 49.5% complete, 2944.7 sec elapsed
Total Progress: 86.40%, 0d 4h 5.71m elapsed, Current Task: Searching the decoy database
Processing, 19900230 proteins done, 52.6% complete, 3244.7 sec elapsed
Total Progress: 87.10%, 0d 4h 10.71m elapsed, Current Task: Searching the decoy database
Processing, 21381784 proteins done, 56.5% complete, 3546.7 sec elapsed
Total Progress: 87.72%, 0d 4h 15.76m elapsed, Current Task: Searching the decoy database
Processing, 22653945 proteins done, 59.9% complete, 3846.7 sec elapsed
Total Progress: 88.33%, 0d 4h 20.76m elapsed, Current Task: Searching the decoy database
Processing, 24074766 proteins done, 63.6% complete, 4147.1 sec elapsed
Total Progress: 88.99%, 0d 4h 25.76m elapsed, Current Task: Searching the decoy database
Processing, 25388499 proteins done, 67.1% complete, 4447.1 sec elapsed
Total Progress: 89.56%, 0d 4h 30.81m elapsed, Current Task: Searching the decoy database
Processing, 26602039 proteins done, 70.3% complete, 4748.9 sec elapsed
Total Progress: 90.13%, 0d 4h 35.85m elapsed, Current Task: Searching the decoy database
Processing, 27836037 proteins done, 73.6% complete, 5050.5 sec elapsed
Total Progress: 90.70%, 0d 4h 40.86m elapsed, Current Task: Searching the decoy database
Processing, 29086254 proteins done, 76.9% complete, 5351.3 sec elapsed
Total Progress: 91.28%, 0d 4h 45.88m elapsed, Current Task: Searching the decoy database
Processing, 30335458 proteins done, 80.2% complete, 5651.3 sec elapsed
Total Progress: 91.84%, 0d 4h 50.88m elapsed, Current Task: Searching the decoy database
Processing, 31534228 proteins done, 83.3% complete, 5951.9 sec elapsed
Processing, 31534229 proteins done, 83.3% complete, 5951.9 sec elapsed
Total Progress: 92.36%, 0d 4h 55.91m elapsed, Current Task: Searching the decoy database
Processing, 32562446 proteins done, 86.1% complete, 6254.3 sec elapsed
Total Progress: 92.87%, 0d 5h 0.93m elapsed, Current Task: Searching the decoy database
Processing, 33775607 proteins done, 89.3% complete, 6557.1 sec elapsed
Total Progress: 93.44%, 0d 5h 5.98m elapsed, Current Task: Searching the decoy database
Processing, 34961566 proteins done, 92.4% complete, 6858.7 sec elapsed
Total Progress: 93.96%, 0d 5h 11.00m elapsed, Current Task: Searching the decoy database
Processing, 36073780 proteins done, 95.3% complete, 7160.8 sec elapsed
Total Progress: 94.51%, 0d 5h 16.05m elapsed, Current Task: Searching the decoy database
Processing, 37283875 proteins done, 98.5% complete, 7460.8 sec elapsed
Collected candidate matches: 0
Decoy database search elapsed Time: 7626.0 sec
Calculating spectral E-values for decoy-spectrum matches
Estimated matched sequences: 0
Decoy-spectrum match E-value calculation elapsed Time: 0.0 sec

Warning: Error computing FDR: Cannot compute FDR Scores; target file is empty
Error processing Chem2-2-1.raw: Cannot compute FDR Scores; target file is empty
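
The log above shows a long stall in the target search: the protein count stays near 12.7 million between roughly 3297 and 5001 seconds. A minimal sketch for locating such stalls automatically, assuming the console output was saved as text and follows the "Processing, N proteins done, ..." line format shown above. The function names and thresholds are hypothetical and not part of MSPathFinderT.

```python
import re

# Matches lines such as:
# "Processing, 12725499 proteins done, 33.6% complete, 3296.6 sec elapsed"
PROGRESS = re.compile(
    r"Processing, (\d+) proteins done, ([\d.]+)% complete, ([\d.]+) sec elapsed"
)


def parse_progress(lines):
    """Return (proteins_done, seconds_elapsed) for every progress line."""
    points = []
    for line in lines:
        match = PROGRESS.search(line)
        if match:
            points.append((int(match.group(1)), float(match.group(3))))
    return points


def find_stalls(points, min_gap_sec=600.0, max_new_proteins=1000):
    """Flag consecutive progress reports that are far apart in time
    but made almost no progress. Thresholds are illustrative defaults."""
    stalls = []
    for (p0, t0), (p1, t1) in zip(points, points[1:]):
        if t1 - t0 >= min_gap_sec and p1 - p0 <= max_new_proteins:
            stalls.append((t0, t1, p1 - p0))
    return stalls


# Three progress lines and one status line copied from the log above.
sample = [
    "Processing, 12725499 proteins done, 33.6% complete, 3296.6 sec elapsed",
    "Total Progress: 48.39%, 0d 1h 0.26m elapsed, Current Task: Searching the target database",
    "Processing, 12725500 proteins done, 33.6% complete, 5000.6 sec elapsed",
    "Processing, 14227281 proteins done, 37.6% complete, 5300.6 sec elapsed",
]

stalls = find_stalls(parse_progress(sample))
print(stalls)  # [(3296.6, 5000.6, 1)]
```

Because the elapsed-seconds counter restarts when the decoy search begins, the time gap across that boundary is negative and is not reported as a stall.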

limiting search by mass, charge and sequence length

After running MSPathFinderT, I see that the whole mass and charge range is searched, rather than only the range allowed by the minimum mass or charge settings. Can I restrict this somehow, or is it a bug in the software? I would like to shorten the running time.

Output deconvoluted spectra for external visualization/analysis

Hello, it would be useful to output deconvoluted mass spectra in mzML format.