proteowizard / pwiz

The ProteoWizard Library is a set of software libraries and tools for rapid development of mass spectrometry and proteomic data analysis software.

Home Page: http://proteowizard.sourceforge.net/

License: Apache License 2.0

Batchfile 0.09% Shell 0.07% Python 1.07% Makefile 0.01% C++ 31.79% C 11.47% Yacc 0.03% Objective-C 0.01% CMake 0.01% HTML 1.31% Gherkin 0.01% Java 0.14% Gnuplot 0.01% C# 53.61% XSLT 0.01% R 0.36% Objective-C++ 0.01% JavaScript 0.02% CSS 0.02% DIGITAL Command Language 0.01%

pwiz's Introduction

The ProteoWizard Library and Tools are a set of modular and extensible open-source, cross-platform tools and software libraries that facilitate proteomics data analysis.

The libraries enable rapid tool creation by providing a robust, pluggable development framework that simplifies and unifies data file access, and performs standard chemistry and LCMS dataset computations.

Core code and libraries are under the Apache open source license; the vendor libraries fall under various vendor-specific licenses.

Features

reference implementation of HUPO-PSI mzML standard mass spectrometry data format
supports HUPO-PSI mzIdentML 1.1 standard mass spectrometry analysis format
supports reading directly from many vendor raw data formats (on Windows)
modern C++ techniques and design principles
cross-platform with native compilers (MSVC on Windows, gcc on Linux, darwin on OSX)
modular design, for testability and extensibility
framework for rapid development of data analysis tools
open source license suitable for both academic and commercial projects (Apache v2)

Official build status

OS	Status
Windows
Native Linux
Wine Linux

Click here to visit the official download page.

Unofficial toolsets

OS	Toolset
Linux	GCC 9.3
OS X	Clang 12

pwiz's Issues

Filtering ITMS/FTMS from Lumos data files losing all scan data

Dear Proteowizard developers,

I hope you are well. I'm currently having some trouble with msconvert regarding filtering of a Thermo Lumos raw data file containing multiple mass analyzer steps. I am using ProteoWizard 3.0.8725 x64. Direct and non-filtered conversion from raw to mzML, then subsequent import into Skyline works fine and the filter string names are preserved in the unfiltered mzML file.

Curiously, however, when I filter for FTMS only (--filter "analyzer FT" OR --filter "analyzer orbi" OR --filter "analyzerType FTMS"), all scans are lost and only the chromatogram and metadata are preserved. A copy of the command-line run is below:

`W:\1_RAW_Data\4_Lumos>>
msconvert Lumos_data.raw --filter "analyzer FT"
format: mzML
m/z: Compression-None, 64-bit
intensity: Compression-None, 32-bit
rt: Compression-None, 64-bit
ByteOrder_LittleEndian
indexed="true"
outputPath: .
extension: .mzML
contactFilename:

filters:
analyzer FT

filenames:
Lumos_data.raw

processing file: Lumos_data.raw
writing output file: .\Lumos_data.mzML
`

This same process works fine with my Velos Orbitrap data so I don't believe my commands to be the issue.

Cheers,
Chris

2019 Updates for Absolute Quant tutorial

This issue is for the 2019 updates to the Absolute Quant tutorial

msconvert 'loadlocale.c:129' error

Hello,

I'm working on an Ubuntu computer and trying to access msconvert via the terminal. I'm getting an error message that I'm having a tough time interpreting:

msconvert: loadlocale.c:129: _nl_intern_locale_data: Assertion `cnt < (sizeof (_nl_value_type_LC_TIME) / sizeof (_nl_value_type_LC_TIME[0]))' failed.
Aborted (core dumped)

Other command line functions in ProteoWizard, like msaccess, do not give the same error and work fine.

I'm running:

Distributor ID: Ubuntu
Description: Ubuntu Bionic Beaver (development branch)
Release: 18.04

Any thoughts are appreciated!

Output mzML file violates schema/spec with empty chromatogram element for non-MS channel

I am trying to convert a Thermo RAW file to mzML using msconvert. The conversion creates an “empty” chromatogram element, i.e. one that lacks a binaryDataArrayList element, for the ECD channel in the data file. This causes problems when using the file in Mascot, because it violates the mzML schema.

Any help would be greatly appreciated.

Here is the snippet from the output mzML file.

      <chromatogramList count="2" defaultDataProcessingRef="pwiz_Reader_Thermo_conversion">
        <chromatogram index="0" id="TIC" defaultArrayLength="13209">
          <cvParam cvRef="MS" accession="MS:1000235" name="total ion current chromatogram" value=""/>
          <binaryDataArrayList count="2">
            <binaryDataArray encodedLength="105924">
...
          </binaryDataArrayList>
        </chromatogram>
        <chromatogram index="1" id="ECD" defaultArrayLength="0">
          <cvParam cvRef="MS" accession="MS:1000813" name="emission chromatogram" value=""/>
        </chromatogram>
      </chromatogramList>

Filter runs in msconvert

Hello and first of all thank you for this amazing tool. The following question is regarding msconvert.exe:

I am trying to convert big (AB SCIEX) .WIFF files containing multiple thousands of single runs. I was wondering if there is any possibility to select or filter the run index as you can do with spectra/chromatograms.

something like:

--runFilter "index 1-500"

Thank you for your time!

Cannot Find MS3 SPS data

ProteoWizard Devs,

I've been trying to find a way to extract SPS data from mzML files made with msconvert but so far am not having any luck. My raw files are from a Thermo Fusion Lumos, and looking through the mzML files by hand I can't find any tag or content that looks like SPS data. I know the data is in the raw somewhere because I can see it when I open the files in Thermo's software, but they seem to be getting lost in the conversion. My only guess so far is that this might be related to the filter strings. I've seen examples online where the sps designation seems to be included in the filter string, but in both my mzML and when opened in Thermo's software my filter strings look like the following: "FTMS + c NSI d Full ms3 [email protected] [email protected] [100.0000-500.0000]".

Is there some setting I'm missing maybe? I apologize if this is not a software issue and is just a user error.

Regards,
Trent

msconvert.exe: The ignoreUnknownInstrumentError true/false assignment is backwards when a config file is used

This is a really minor issue but I wanted to document it. Adding the --ignoreUnknownInstrumentError flag on the command line bypasses the error as expected. I would expect when using a config file that adding a line of ignoreUnknownInstrumentError=true would give the same behavior. However, I need to add ignoreUnknownInstrumentError=false instead.

Contents of my config file:

zlib=true
mz64=true
inten64=true
simAsSpectra=true
ignoreUnknownInstrumentError=false
filter="peakPicking vendor msLevel=1-2"

Using ProteoWizard 3.0.18250.994311be0

I noticed in the code there is a line that is used to flip the logic for use with the unknownInstrumentIsError variable when using the command line flag.

pwiz/pwiz_tools/commandline/msconvert.cpp

Lines 358 to 359 in 2f477cf

    
    // negate unknownInstrumentIsError value since command-line parameter (ignoreUnknownInstrumentError) and the Config parameters use inverse semantics
    config.unknownInstrumentIsError = !config.unknownInstrumentIsError;

I didn't see an obvious way to fix this without breaking the command line flag logic, so I'm leaving this here for later.
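
One possible direction, sketched under assumptions: read the option into a variable that means exactly what its name says, whether the value came from the command line or from a config file, and derive the internal flag in a single place. `ParsedOptions`, `Config`, and `applyOptions` are hypothetical names for illustration only; they are not msconvert code, and the sketch does not claim to reproduce how msconvert parses its options.

```cpp
#include <cassert>

// Hypothetical: the value exactly as the user expressed it, either via the
// command-line flag or via a config-file line such as
// ignoreUnknownInstrumentError=true.
struct ParsedOptions {
    bool ignoreUnknownInstrumentError = false;
};

// Hypothetical internal configuration that uses the inverse meaning.
struct Config {
    bool unknownInstrumentIsError = true;
};

// Derive the internal flag once, so both input paths share one inversion.
Config applyOptions(const ParsedOptions& parsed) {
    Config config;
    config.unknownInstrumentIsError = !parsed.ignoreUnknownInstrumentError;
    return config;
}
```

With this shape, `ignoreUnknownInstrumentError=true` would disable the error no matter which input path supplied it.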

header/metadata from Waters raw files

Hello,

I am working on a project to automatically do some diagnostics on files as soon as a run finishes.
The problem I am running into is getting relevant metadata from the converted file.
For example, I need the file description (from the MassLynx sample list) and the project name (in our lab I can only get that from the sample list name).

The info I need is available in the _HEADER.TXT file but is not added to the converted file's metadata.
The lines I am interested in are:

$$ Job Code: Some project
$$ Sample Description: Important sample

I was looking through the mzML specs and examples and from what I could gather a possible solution would be to add something like:

    <sampleList count="1">
      <sample id="org_filename.raw" name="Important sample">
        <userParam name="Job Code" value="Some project"/>
      </sample>
    </sampleList>

Or alternatively:

    <sampleList count="1">
      <sample id="org_filename.raw" name="org_filename.raw">
        <userParam name="Job Code" value="Some project"/>
        <userParam name="Sample Description" value="Important sample"/>
      </sample>
    </sampleList>

Would this be something you could consider?
If not, would there be anything wrong about injecting the above into the file?
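
As a rough illustration of the injection idea, the sketch below reads `$$ Key: Value` lines like the two shown above and renders them as userParam elements. It is a minimal sketch, not part of ProteoWizard: `trim`, `parseHeader`, and `toUserParam` are hypothetical helper names, and the code assumes the values contain no characters that need XML escaping.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Trim leading and trailing spaces, tabs, and carriage returns.
std::string trim(const std::string& s) {
    const char* ws = " \t\r";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Collect "$$ Key: Value" lines into a key/value map.
// The first colon separates key and value, so "08:09:08" stays intact.
std::map<std::string, std::string> parseHeader(const std::string& text) {
    std::map<std::string, std::string> fields;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, 2, "$$") != 0) continue;  // not a header field
        const auto colon = line.find(':');
        if (colon == std::string::npos) continue;     // no key/value separator
        fields[trim(line.substr(2, colon - 2))] = trim(line.substr(colon + 1));
    }
    return fields;
}

// Render one field as a userParam element (value assumed to be XML-safe).
std::string toUserParam(const std::string& name, const std::string& value) {
    return "<userParam name=\"" + name + "\" value=\"" + value + "\"/>";
}
```

Feeding it the two header lines from this report yields the keys "Job Code" and "Sample Description", which can then be written inside the sample element.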

Confusing Requirements for BiblioSpec Compilation

The Download and Build webpage states "You will need Visual Studio 2013 to build." However, this doesn't seem applicable to installing it when the operating system is Linux. Could it be changed? There is also no branch named "trunk", so I downloaded skyline_4_1 and ran quickbuild.sh which ended with:

Building pwiz...
Jamroot.jam:446: in modules.load
*** argument error
* rule constant ( name : value + )
* called with: ( PWIZ_GIT_BRANCH :  )
* missing argument value

Something went wrong when building pwiz on Ubuntu 16.04

Hello,
I downloaded the pwiz source (file name: pwiz-src-3_0_19309_1dc16c9.tar.bz2). When I build it on Ubuntu 16.04, some problems occur.

...failed updating 1 target...
...skipped 1 target...
...updated 1776 targets...
At least one pwiz target failed to build.

Before building pwiz, I installed the Boost library, OpenJDK, and git. However, I found some strange log output.

================================================================
[MSDataAnalyzerApplication] no files found matching "file0"
[MSDataAnalyzerApplication] no files found matching "file1"
[MSDataAnalyzerApplication] no files found matching "file2"
[MSDataAnalyzerApplication] no files found matching "file3"
[MSDataAnalyzerApplication] no files found matching "file4"
[MSDataAnalyzerApplication] no files found matching "MSDataAnalyzerApplicationTest.temp.txt"
[SpectrumListFactory] Ignoring wrapper: coffee
[SpectrumListFactory] Ignoring wrapper: news media

====== BEGIN OUTPUT ======
no vendor test data found (try running without --incremental)

EXIT STATUS: 1
====== END OUTPUT ======

LD_LIBRARY_PATH="/usr/bin:/usr/lib:/usr/lib32:/usr/lib64:$LD_LIBRARY_PATH"

export LD_LIBRARY_PATH

status=0
if test $status -ne 0 ; then
    echo Skipping test execution due to testing.execute=off
    exit 0
fi
name=Reader_Bruker_Test
echo > /dev/null

[SpectrumList_PeakPicker] Warning: vendor peakPicking requested, but peakPicking is not the first filter. Since the vendor DLLs can only operate directly on raw data, this filter will likely not have any effect.
Warning: vendor peakPicking was requested, but is unavailable as it depends on Windows DLLs. Using ProteoWizard centroiding algorithm instead.
High-quality peak-picking can be enabled using the cwt flag.

When I use msconvert to convert a tdf file to mzML, the following message appears on screen:

Command:
msconvert test.d --mzML --mz32 --inten32 --zlib

Result:
[ReaderFail] [Reader_Bruker::read()] Bruker Analysis reader not implemented: requires CompassXtract which only works on Windows
Error processing file test.d

How can I fix this problem?
Thank you very much!!

MSConvertGUI doesn't sanitize file+samplename for WIFFs

User reported that a pipe | in their sample names caused an error during conversion.

MSConvert batch processing issue

Hi,

When I run the latest version of MSConvert GUI on 64-bit Windows 7 on SWATH data, some of the scans remain empty with no recorded data. I do not have this issue when I convert the files one at a time. Do you know what could cause such a problem?

Cheers,
Saer

Compressed mgf files

msconvert seems to have an inconsistency with regard to mgf compression: in contrast to e.g. mzML, it can produce mgf.gz files but cannot read them (easily). Some other formats might be affected by this issue as well (didn't test).

Generate the inputs:

$ ls
test.mgf
$ msconvert --mzML test.mgf
$ gzip -k test.mgf test.mzML
$ ls
test.mgf test.mgf.gz test.mzML test.mzML.gz

Test them by converting to mzXML:

$ msconvert --mzXML test.mzML
# OK
$ msconvert --mzXML test.mzML.gz
# OK
$ msconvert --mzXML test.mgf
# OK
$ msconvert --mzXML test.mgf.gz
format: mzXML 
    m/z: Compression-None, 64-bit
    intensity: Compression-None, 32-bit
    rt: Compression-None, 64-bit
ByteOrder_LittleEndian
 indexed="true"
outputPath: .
extension: .mzXML
contactFilename: 

spectrum list filters:
  
chromatogram list filters:
  
filenames:
  test.mgf.gz
  
processing file: test.mgf.gz
[ReaderFail]  don't know how to read test.mgf.gz
Error processing file test.mgf.gz

msconvert can be made to read compressed mgf if we remove the .gz extension:

$ mv test.mgf.gz test.mgf.gz.mgf
$ msconvert --mzXML test.mgf.gz.mgf
# OK

... but only if the file was not created with msconvert itself:

$ msconvert --mgf test.mgf -g
format: MGF
outputPath: .
extension: .mgf.gz
contactFilename: 

spectrum list filters:
  
chromatogram list filters:
  
filenames:
  test.mgf
  
processing file: test.mgf
writing output file: ./test.mgf.gz

$ mv test.mgf.gz test.mgf.gz.mgf
$ msconvert --mzXML test.mgf.gz.mgf
format: mzXML 
    m/z: Compression-None, 64-bit
    intensity: Compression-None, 32-bit
    rt: Compression-None, 64-bit
ByteOrder_LittleEndian
 indexed="true"
outputPath: .
extension: .mzXML
contactFilename: 

spectrum list filters:
  
chromatogram list filters:
  
filenames:
  test.mgf.gz.mgf
  
processing file: test.mgf.gz.mgf
writing output file: ./test.mgf.gz.mzXML
Error writing run 1 in "test.mgf.gz.mgf":
[SpectrumList_MGF::spectrum] Error seeking to BEGIN IONS tag
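
The first failure (test.mgf.gz is not recognized at all) looks like a name-based detection problem. The following is a minimal sketch of how a format check could look through one trailing .gz before inspecting the extension. `endsWith` and `formatExtension` are hypothetical helpers, not ProteoWizard functions, and the sketch does not address the second failure, where an msconvert-written mgf.gz cannot be read back.

```cpp
#include <cassert>
#include <string>

// True if `name` ends with `suffix`.
bool endsWith(const std::string& name, const std::string& suffix) {
    return name.size() >= suffix.size() &&
           name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Return the format extension, looking through one trailing ".gz".
// Returns an empty string when the name has no extension.
std::string formatExtension(std::string name) {
    if (endsWith(name, ".gz")) name.erase(name.size() - 3);
    const auto dot = name.find_last_of('.');
    return dot == std::string::npos ? "" : name.substr(dot);
}
```

Under this scheme test.mgf.gz and test.mgf would both map to the same reader, which is the behaviour the report already observes for test.mzML.gz.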

tandem mass spec functions

Hi,
my lab has a Waters triple quad instrument, and I ran a neutral loss scan that involves two functions (at two different collision energies). Does ProteoWizard MSConvert convert the two functions together into one mzML file?
Thanks

the source tarball downloaded from sourceforge decompresses to the current directory

Greetings,

just a note on the fact that the source tarball decompresses in the current directory, not in a subdirectory. That is not the standard way of crafting compressed tarballs as it generates an unexpected mess in the current directory.

Thanks, Filippo

empty sourceFileList violates schema

Hi, writing an MSData object without sourceFiles results in

      <sourceFileList count="0">
      </sourceFileList>

which violates the schema (I wrongly reported that in HUPO-PSI/mzML#2,
and it is biting us in sneumann/mzR#192)

I think the <sourceFileList> element should be skipped
in https://github.com/ProteoWizard/pwiz/blob/master/pwiz/data/msdata/IO.cpp#L462
if count==0. I'd happily create a PR, but wouldn't know how to add a good test case to IOTest.cpp.

Yours, Steffen
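
The proposed change amounts to guarding the list writer on an empty collection. Below is a minimal sketch of that idea with plain strings. `SourceFile` and `writeSourceFileList` are hypothetical stand-ins, not the pwiz IO.cpp code, and the emitted sourceFile element is simplified (a real mzML sourceFile carries more attributes and child elements).

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical minimal stand-in for a source file entry.
struct SourceFile {
    std::string id;
    std::string name;
};

// Write <sourceFileList> only when there is at least one entry,
// so an empty list never produces a count="0" element.
std::string writeSourceFileList(const std::vector<SourceFile>& files) {
    if (files.empty()) return "";  // skip the element entirely
    std::ostringstream out;
    out << "<sourceFileList count=\"" << files.size() << "\">\n";
    for (const auto& file : files)
        out << "  <sourceFile id=\"" << file.id << "\" name=\"" << file.name << "\"/>\n";
    out << "</sourceFileList>\n";
    return out.str();
}
```

A test case along the same lines could serialize an object without source files and check that the output contains no sourceFileList element at all.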

Waters synapt2 mobility data: failure to read the scan number

Greetings,

With the latest version (downloaded and built today), and with previous ones as well, I fail to get the scan number out of the mzML file generated from Waters Synapt2 raw data. I do get the scan number when reading Thermo Orbitrap data.

This is what I code:

pwiz::msdata::CVID native_id_format = pwiz::msdata::id::getDefaultNativeIDFormat(*mp_msDataFile);

(Note that the value that is returned is indeed for Waters files, so that id format is registered in pwiz)

Then, for each spectrum I iterate in, I do:

std::size_t scan_num = QString(pwiz::msdata::id::translateNativeIDToScanNumber(
native_id_format, spectrum->id).c_str()).toULong();

But this does not provide proper scan_num values, I systematically get a value of 0.

I saw in the documentation that the getDefaultNativeIDFormat only works for selected file formats, and I could see that Waters Synapt2 files were not supported.

My question: since what I want is an absolutely unique identifier for the various mass spectra in the mzML file, I thought that I could standardize on the spectrum index (that seems to be more general than the scan number in the various file formats).

Am I right that there is always the same number of scans in an mzML file as there are spectra? If so, I will be able to retrieve a mass spectrum from an mzML file by its index with the following call: spectrum(size_t index, bool getBinaryData=false). True or false?

Thank you for your kind attention,
Cheers,
Filippo

Wine linux version

Hello! On your readme it says that there is a wine linux version (in a docker container?). I just spent half of today trying to get msconvert to work in a wine env in ubuntu so I'm very interested! How can I obtain it?

startTimeStamp in Waters RAW file conversion

Hi,

When using MSConvert on Waters .raw files to mzML, the startTimeStamp field is populated using the HEADER.TXT acquisition date and time fields.
However, it appears that if the acquired time contains '08' or '09' as the hour, minute, or second, the value is replaced by '00' in the startTimeStamp. For example:

$$ Acquired Date: 05-Sep-2017
$$ Acquired Time: 08:09:08
Results in:
startTimeStamp="2017-09-05T00:00:00Z" instead of startTimeStamp="2017-09-05T08:09:08Z"

We experienced this issue with multiple datasets and it can be simulated by simply changing the acquisition time in HEADER.TXT in any raw data. (tested in today's build ProteoWizard 3.0.18182.5406cdbf0)

Thanks and apologies for cross-posting this in the support-mailing list

Best regards
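
The pattern (only 08 and 09 become 00) matches a classic parsing pitfall: when a numeric string is parsed with automatic base detection, a leading zero selects octal, and 8 and 9 are not octal digits. Whether this is the actual cause in the Waters reader is an assumption, not something confirmed here. The sketch uses the standard `std::strtol`; `parseAutoBase` and `parseDecimal` are hypothetical wrapper names.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Parse with base auto-detection (base 0): a leading "0" selects octal,
// so parsing stops at the first digit that is not in the range 0-7.
long parseAutoBase(const std::string& field) {
    return std::strtol(field.c_str(), nullptr, 0);
}

// Parse explicitly as decimal; leading zeros are harmless.
long parseDecimal(const std::string& field) {
    return std::strtol(field.c_str(), nullptr, 10);
}
```

With base detection, "07" still parses as 7, while "08" and "09" stop after the leading zero and yield 0, which mirrors the reported symptom. Parsing time fields explicitly as base 10 avoids the problem.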

Clarification of msconvert command line interface

According to the issue "msconvert outputs help into stderr", msconvert writes its help text to stderr instead of stdout. Apart from this design decision, is the command line interface implemented as one would usually expect? Is stderr used for all output? Are there return codes, and what do they mean?

Missing Precursor Mass when Converting Bruker TimsTOF to mzXML using pwiz

Dear sir,

I am using pwiz to extract TimsTOF signals from a .d directory.

However, I found that those signals lack precursor mass and charge. Just like this one:

Is that because of pwiz or the file itself?

msconvert command line tool missing on Windows 10 Pro

I downloaded Proteowizard, Windows 64-bit Installer (able to convert vendor files except T2D) and installed it on Windows 10 Pro. After installation the MSConvertGUI works just fine. But I miss the msconvert command line tool. Where is the command line tool located and how can I run it via e.g. command prompt?

Error converting raw files in Docker and Singularity

Hello,

I have been trying to construct a pipeline making use of msconvert in a Singularity image. I have tried the images from proteowizard/pwiz-skyline-i-agree-to-the-vendor-licenses and chambm/pwiz-skyline-i-agree-to-the-vendor-licenses running as converted singularity images as well as in Docker and I am running into the same issue with all of them:

The process encounters this error:
[SpectrumWorkerThreads::work] error in thread: [SpectrumList_Thermo::spectrum()] Unknown exception retrieving spectrum "controllerType=0 controllerNumber=1 scan=XXXX",
where XXXX is some scan number. It seems to be random in nature, as sometimes a particular file will finish and other times not. When a file fails, the scan number is not reproducible.

Here is the complete output from a docker run:

root@cfdbe56b4176:/Data# wine msconvert ./2017-04-21_BK10_EM_MA_DR3_03b.raw --mzML --filter "peakPicking vendor"
format: mzML 
    m/z: Compression-None, 64-bit
    intensity: Compression-None, 32-bit
    rt: Compression-None, 64-bit
ByteOrder_LittleEndian
 indexed="true"
outputPath: .
extension: .mzML
contactFilename: 
runIndexSet: 

spectrum list filters:
  peakPicking vendor
  
chromatogram list filters:
  
filenames:
  .\2017-04-21_BK10_EM_MA_DR3_03b.raw
  
processing file: .\2017-04-21_BK10_EM_MA_DR3_03b.raw
0039:err:combase:RoGetActivationFactory Failed to find library for L"Windows.Foundation.Diagnostics.AsyncCausalityTracer"
calculating source file checksums
writing output file: .\2017-04-21_BK10_EM_MA_DR3_03b.mzML
[SpectrumList_PeakPicker]: one or more spectra are already centroided, no processing needed
[SpectrumWorkerThreads::work] error in thread: [SpectrumList_Thermo::spectrum()] Unknown exception retrieving spectrum "controllerType=0 controllerNumber=1 scan=2900"

The 0039:err:combase:RoGetActivationFactory Failed to find library for L"Windows.Foundation.Diagnostics.AsyncCausalityTracer" in there seems like it might be an issue as it has something to do with async process control and the error is coming up in a SpectrumThreadWorker, though that is just a guess on my part. Possibly that is something that is always there and is suppressed when passing the -e WINEDEBUG=-all to Docker.

I have tried using many different .raw files. I have only tested on Orbitrap files. As an example, I have tried files from this dataset: "https://www.ebi.ac.uk/pride/archive/projects/PXD004451". They are larger files, but I have also had this problem with small files (<400 MB).

Thanks for any assistance!
Kevin

MSConvert: mzXML dataProcessing elements fail schema validation

Context

I'm testing schema validation of the mzML and mzXML output from msconvert and finding that the mzXML writer produces validation errors when multiple processing steps occur. This seems to be an issue with the mzXML 3.2 schema itself, which disallows multiple occurrences of the software element within a dataProcessing group. Based on the code in pwiz/data/msdata/Serializer_mzXML.cpp, this seems to be an intentionally flexible field where processingOperation or comment elements can serve to document the processing done.

pwiz/pwiz/data/msdata/Serializer_mzXML.cpp

Lines 358 to 385 in c71da0d

    
    xmlWriter.startElement("dataProcessing", attributes);
    BOOST_FOREACH(const ProcessingMethod& pm, dpPtr->processingMethods)
    {
        CVParam fileFormatConversion = pm.cvParamChild(MS_file_format_conversion);
        string softwareType = fileFormatConversion.empty() ? "processing" : "conversion";
        if (pm.softwarePtr.get())
            writeSoftware(xmlWriter, pm.softwarePtr, msd, cvTranslator, softwareType);
        write_processingOperation(xmlWriter, pm, MS_file_format_conversion);
        write_processingOperation(xmlWriter, pm, MS_peak_picking);
        write_processingOperation(xmlWriter, pm, MS_deisotoping);
        write_processingOperation(xmlWriter, pm, MS_charge_deconvolution);
        write_processingOperation(xmlWriter, pm, MS_thresholding);
        xmlWriter.pushStyle(XMLWriter::StyleFlag_InlineInner);
        BOOST_FOREACH(const UserParam& param, pm.userParams)
        {
            xmlWriter.startElement("comment");
            xmlWriter.characters(param.name + (param.value.empty() ? string() : ": " + param.value));
            xmlWriter.endElement(); // comment
        }
        xmlWriter.popStyle();
    }
    xmlWriter.endElement(); // dataProcessing

However, the schema itself seems to enforce unusually rigid

To reproduce, I converted a thermo raw file with the following msconvert config file:

mzXML=true
zlib=true
mz64=true
inten64=true
simAsSpectra=true
filter="peakPicking vendor msLevel=1-2"
filter="scanNumber 22289-22486"

This produced an mzXML containing the following lines:

<dataProcessing centroided="1">
    <software type="conversion" name="ProteoWizard software" version="3.0.18342"/>
    <processingOperation name="Conversion to mzML"/>
    <software type="processing" name="ProteoWizard software" version="3.0.18342"/>
    <comment>Thermo/Xcalibur peak picking</comment>
</dataProcessing>

Running a validator on the full mzXML with the appropriate schema gives the following validation error:

austin@austin-vm-ubuntu:~/gdrive/data/20181208_fix_demux_mzXML_schema$ xmllint --schema raw/mzXML_schema/schema_revision/mzXML_3.2/mzXML_idx_3.2.xsd data/01_trimmed/23aug2017_hela_serum_timecourse_4mz_narrow_1.mzXML --noout
data/01_trimmed/23aug2017_hela_serum_timecourse_4mz_narrow_1.mzXML:20: element software: Schemas validity error : Element '{http://sashimi.sourceforge.net/schema_revision/mzXML_3.2}software': This element is not expected. Expected is one of ( {http://sashimi.sourceforge.net/schema_revision/mzXML_3.2}processingOperation, {http://sashimi.sourceforge.net/schema_revision/mzXML_3.2}comment ).
data/01_trimmed/23aug2017_hela_serum_timecourse_4mz_narrow_1.mzXML fails to validate

Here's the xmllint validator version:

austin@austin-vm-ubuntu:~/gdrive/data/20181208_fix_demux_mzXML_schema$ xmllint -version
xmllint: using libxml version 20908
   compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer XInclude Iconv ICU ISO8859X Unicode Regexps Automata Expr Schemas Schematron Modules Debug Zlib Lzma

The problem from the error seems to be that only one software element is allowed per dataProcessing element. Here is the visual XSD diagram from XMLSpy:

Editing the XML manually to ensure that there was at least one processingOperation element per software element did not fix the issue. E.g., the following still produces the same validation error:

    <dataProcessing centroided="1">
      <software type="conversion" name="ProteoWizard software" version="3.0.18342"/>
      <processingOperation name="Conversion to mzML"/>
      <software type="processing" name="ProteoWizard software" version="3.0.18342"/>
      <processingOperation name="Dummy processing op"/>
      <comment>Thermo/Xcalibur peak picking</comment>
    </dataProcessing>

Splitting the dataProcessing group into multiple groups, each with its own software element still fails validation in cases where there is a comment without a specific processingOperation. Here is the XML snippet and corresponding error:

    <dataProcessing centroided="1">
      <software type="conversion" name="ProteoWizard software" version="3.0.18342"/>
      <processingOperation name="Conversion to mzML"/>
    </dataProcessing>
    <dataProcessing>
      <software type="processing" name="ProteoWizard software" version="3.0.18342"/>
      <comment>Thermo/Xcalibur peak picking</comment>
    </dataProcessing>

austin@austin-vm-ubuntu:~/gdrive/data/20181208_fix_demux_mzXML_schema$ xmllint --schema raw/mzXML_schema/schema_revision/mzXML_3.2/mzXML_idx_3.2.xsd data/01_trimmed/23aug2017_hela_serum_timecourse_4mz_narrow_1_edited.mzXML --noout
data/01_trimmed/23aug2017_hela_serum_timecourse_4mz_narrow_1_edited.mzXML:23: element comment: Schemas validity error : Element '{http://sashimi.sourceforge.net/schema_revision/mzXML_3.2}comment': This element is not expected. Expected is ( {http://sashimi.sourceforge.net/schema_revision/mzXML_3.2}processingOperation ).
data/01_trimmed/23aug2017_hela_serum_timecourse_4mz_narrow_1_edited.mzXML fails to validate

However, by adding in a dummy processingOperation element then the XML validates:

    <dataProcessing centroided="1">
      <software type="conversion" name="ProteoWizard software" version="3.0.18342"/>
      <processingOperation name="Conversion to mzML"/>
    </dataProcessing>
    <dataProcessing>
      <software type="processing" name="ProteoWizard software" version="3.0.18342"/>
      <processingOperation name="User-defined"/>
      <comment>Thermo/Xcalibur peak picking</comment>
    </dataProcessing>

austin@austin-vm-ubuntu:~/gdrive/data/20181208_fix_demux_mzXML_schema$ xmllint --schema raw/mzXML_schema/schema_revision/mzXML_3.2/mzXML_idx_3.2.xsd data/01_trimmed/23aug2017_hela_serum_timecourse_4mz_narrow_1_edited.mzXML --noout
data/01_trimmed/23aug2017_hela_serum_timecourse_4mz_narrow_1_edited.mzXML validates

Problem

Tools such as OpenSWATH that perform schema validation during mzXML import can fail on MSConvert mzXML files. Does the 3.2 schema need to be updated to allow the kind of flexibility that Serializer_mzXML permits? Or can we constrain Serializer_mzXML instead?
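
If the serializer were constrained, one option would be to reproduce the shape of the manually edited snippet that validated above: one dataProcessing element per software entry, with a placeholder processingOperation whenever a step has only comments. The following is a sketch of that layout with plain strings. `Step` and `writeDataProcessing` are hypothetical, the software name and version are copied from the example above, and attributes such as centroided are left out for brevity. It is not the Serializer_mzXML implementation.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical description of one processing step.
struct Step {
    std::string softwareType;             // "conversion" or "processing"
    std::vector<std::string> operations;  // processingOperation names
    std::vector<std::string> comments;    // free-text comments
};

// Emit one <dataProcessing> per step: a single <software>, then at least
// one <processingOperation>, then any <comment> elements.
std::string writeDataProcessing(const std::vector<Step>& steps) {
    std::ostringstream out;
    for (const auto& step : steps) {
        out << "<dataProcessing>\n"
            << "  <software type=\"" << step.softwareType
            << "\" name=\"ProteoWizard software\" version=\"3.0.18342\"/>\n";
        if (step.operations.empty())  // placeholder, as in the manual edit
            out << "  <processingOperation name=\"User-defined\"/>\n";
        for (const auto& op : step.operations)
            out << "  <processingOperation name=\"" << op << "\"/>\n";
        for (const auto& comment : step.comments)
            out << "  <comment>" << comment << "</comment>\n";
        out << "</dataProcessing>\n";
    }
    return out.str();
}
```

Whether a placeholder operation name is acceptable to downstream tools is a separate question that this sketch does not answer.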

[Reader_Thermo::fillInMetadata] unable to parse instrument model

Hi,
we're running a "GC (TraceGC Ultra, Thermo) - MS (SQ, Thermo) with Atlas Injector"
(see https://www.ipb-halle.de/en/research/cell-and-metabolic-biology/technical-resources/)
and get

[Reader_Thermo::fillInMetadata] unable to parse instrument model; 
  please report this error to the ProteoWizard developers with this information: 
  model(ISQ Series) name(ISQ Series)

I guess this needs to be added somewhere near

pwiz/pwiz_aux/msrc/utility/vendor_api/thermo/RawFileTypes.h

Line 168 in 994311b

else if (type == "ISQ") return InstrumentModelType_ISQ;

but unsure if that simply requires to add

else if (type == "ISQ SERIES") return InstrumentModelType_ISQ;

If so, happy to send a PR.
Yours, Steffen
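
A minimal sketch of the proposed mapping follows. It assumes the reported model string is upper-cased before comparison, which the upper-case literal in the suggested line implies but which is not confirmed here. `InstrumentModel`, `toUpper`, and `parseInstrumentModel` are hypothetical stand-ins, not the RawFileTypes.h code.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical subset of instrument model identifiers.
enum class InstrumentModel { Unknown, ISQ };

// Return an upper-cased copy of the reported model name.
std::string toUpper(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}

// Map a reported model name to an identifier, accepting both spellings.
InstrumentModel parseInstrumentModel(const std::string& reported) {
    const std::string type = toUpper(reported);
    if (type == "ISQ" || type == "ISQ SERIES") return InstrumentModel::ISQ;
    return InstrumentModel::Unknown;
}
```

With this mapping, the "ISQ Series" string from the error message resolves to the same identifier as "ISQ".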

Correct cmd parameters

Excuse me, I wonder how to start this process in cmd in order to run it on linux.
I have tried some parameters, like "C:\Program Files\ProteoWizard\ProteoWizard 3.0.11856\msconvert.exe" --filter "peakPicking true 1-" -z *.wiff
however, the outputs are not exactly the same as those generated by GUI,
Thank you for your time!!

Build failure on mingw64

Greetings, Fellow Developers,

I build libpwiz on MS Windows (7 and 10) using the excellent MSYS2/MINGW64 development environment, which mimics a GNU/Linux machine.

When I build the 3.0.18342 version in this environment with gcc 8.2.1 20181207, I get the following errors:

pwiz/utility/misc/Filesystem.cpp: In function 'void* {anonymous}::GetLibraryProcAddress(PSTR, PSTR)':
pwiz/utility/misc/Filesystem.cpp:151:30: error: invalid conversion from 'FARPROC' {aka 'long long int (*)()'} to 'PVOID' {aka 'void*'} [-fpermissive]
return GetProcAddress(GetModuleHandleA(LibraryName), ProcName);
~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

pwiz/utility/misc/Filesystem.cpp:273:27: error: reference to 'SYSTEM_HANDLE_INFORMATION' is ambiguous
DWORD dwSize = sizeof(SYSTEM_HANDLE_INFORMATION);
^~~~~~~~~~~~~~~~~~~~~~~~~
pwiz/utility/misc/Filesystem.cpp:102:12: note: candidates are: 'struct {anonymous}::SYSTEM_HANDLE_INFORMATION'
struct SYSTEM_HANDLE_INFORMATION {
^~~~~~~~~~~~~~~~~~~~~~~~~
In file included from pwiz/utility/misc/Filesystem.cpp:35:
C:/msys64/mingw64/x86_64-w64-mingw32/include/winternl.h:837:5: note: 'typedef struct _SYSTEM_HANDLE_INFORMATION SYSTEM_HANDLE_INFORMATION'
} SYSTEM_HANDLE_INFORMATION, *PSYSTEM_HANDLE_INFORMATION;
^~~~~~~~~~~~~~~~~~~~~~~~~
pwiz/utility/misc/Filesystem.cpp:301:35: error: reference to 'SYSTEM_HANDLE_INFORMATION' is ambiguous
auto pInfo = reinterpret_cast<SYSTEM_HANDLE_INFORMATION*>(pInfoBytes.data());
^~~~~~~~~~~~~~~~~~~~~~~~~
pwiz/utility/misc/Filesystem.cpp:102:12: note: candidates are: 'struct {anonymous}::SYSTEM_HANDLE_INFORMATION'
struct SYSTEM_HANDLE_INFORMATION {
^~~~~~~~~~~~~~~~~~~~~~~~~
In file included from pwiz/utility/misc/Filesystem.cpp:35:
C:/msys64/mingw64/x86_64-w64-mingw32/include/winternl.h:837:5: note: 'typedef struct _SYSTEM_HANDLE_INFORMATION SYSTEM_HANDLE_INFORMATION'
} SYSTEM_HANDLE_INFORMATION, *PSYSTEM_HANDLE_INFORMATION;
^~~~~~~~~~~~~~~~~~~~~~~~~
pwiz/utility/misc/Filesystem.cpp:301:60: error: expected '>' before '*' token
auto pInfo = reinterpret_cast<SYSTEM_HANDLE_INFORMATION*>(pInfoBytes.data());
^
pwiz/utility/misc/Filesystem.cpp:301:60: error: expected '(' before '*' token
auto pInfo = reinterpret_cast<SYSTEM_HANDLE_INFORMATION*>(pInfoBytes.data());
^
(
pwiz/utility/misc/Filesystem.cpp:301:61: error: expected primary-expression before '>' token
auto pInfo = reinterpret_cast<SYSTEM_HANDLE_INFORMATION*>(pInfoBytes.data());
^
pwiz/utility/misc/Filesystem.cpp:301:81: error: expected ')' before ';' token
auto pInfo = reinterpret_cast<SYSTEM_HANDLE_INFORMATION*>(pInfoBytes.data());
^
)
make: *** [Makefile:5079: pwiz/utility/misc/Filesystem.lo] Error 1

Some errors seem to be specific to the MS Windows platform (FARPROC, I guess).

Could somebody be kind enough to shed some light on this?

Thank you for your attention,
Cheers,
Filippo

EDIT: I have added a screenshot because the ^ markers no longer point to the right location in the text pasted into this editor.

Inaccurate MS2 precursor value = center of isolation window

Setup

I am reporting a problem with MsConvert from Proteowizard version 3.0.19077-506c48e9c 64bit and would like to ask for help with this issue.

I am using Windows 10 64bit as operating system.

The MS data was acquired on a Bruker qTOF, calibrated with an internal standard.

Problem description

When exporting Bruker .d-files with MsConvert, the resulting open format files (mzML, mzXML) display imprecise MS2 precursor values that do not match the values in the corresponding MS1 scans:

Interestingly, the displayed precursor value of scan #2160 is exactly the centre of the isolation window.

However, in the original data, the values are calibrated and correct:

This behaviour can be reproduced using the command-line version of MsConvert.

The parameters used in both the GUI and the command line were:
PeakPicking: vendor, MS-level=1-2
Binary encoding precision: 32bit
Output format: mzML

Applying the precursorRecalculation filter in the command line version of msconvert was not successful:

C:\Program Files\ProteoWizard\ProteoWizard 3.0.19077.506c48e9c>msconvert.exe d:6132_6136_BD3_01_31134.d --32 --filter "peakPicking [vendor[msLevel=1-2]]" --filter "precursorRecalculation" -o d:
format: mzML
    m/z: Compression-None, 32-bit
    intensity: Compression-None, 32-bit
    rt: Compression-None, 32-bit
ByteOrder_LittleEndian
 indexed="true"
outputPath: d:
extension: .mzML
contactFilename:
runIndexSet:

spectrum list filters:
  peakPicking [vendor[msLevel=1-2]]
  precursorRecalculation

chromatogram list filters:

filenames:
  d:6132_6136_BD3_01_31134.d

processing file: d:6132_6136_BD3_01_31134.d
calculating source file checksums
Error writing run 1 in "6132_6136_BD3_01_31134.d":
[SpectrumList_PrecursorRecalculator] Mass analyzer not supported: time-of-flight

Interestingly, Bruker's Compass Export tool also produced the same problem (inaccurate MS2 precursor values).

I would appreciate any suggestions on how to solve this problem. I have no clue how to proceed from this point except for ignoring the inaccuracy.

The original file can be obtained upon request.

Thank you for your help!

Metadata errors with new RawFileReader MSConvert

MSConvert Version (x64): 3.0.18243

We just ran a test with the new MSConvert that uses RawFileReader, and noticed a few errors comparing older results vs. new results when converting to mzML. The peak data is the same, as is most of the metadata, but I'm going to add replies to this with the issues we have found.

To start off:

MS1 spectra have precursors displayed in the mzML, with default values for the items required under the precursorList. (This is on a data file from a QExactive HF with Xcalibur 2.8.1.2806, and doesn't happen with data from an LTQ Orbitrap Elite with Xcalibur 2.7.0 SP1)
HCD is being interpreted as ECD+CID. It appears to be caused by the pwiz Thermo ActivationType Enum using flag values and directly casting from the RawFileReader ActivationType enum, which uses sequential values (RawFileReader: HCD=5, RawFileTypes.h: CID=1, ECD=4)
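
The mismatch described in the second item can be reproduced in a few lines. The sketch below is not pwiz code: it uses two hypothetical Python enums that carry only the three values quoted above (RawFileReader: HCD=5; RawFileTypes.h: CID=1, ECD=4) to show why casting a sequential value directly into a flag enum decomposes HCD into CID plus ECD.

```python
from enum import IntEnum, IntFlag


class SequentialActivation(IntEnum):
    """Hypothetical stand-in for a sequentially numbered enum (only HCD=5 is from the report)."""
    HCD = 5


class FlagActivation(IntFlag):
    """Hypothetical stand-in for a bit-flag enum (CID=1 and ECD=4 are from the report)."""
    CID = 1
    ECD = 4


def cast_directly(value: SequentialActivation) -> FlagActivation:
    # Reinterpret the integer as a set of flags: 5 == 0b101 == CID | ECD.
    return FlagActivation(int(value))


def decoded_names(flags: FlagActivation) -> list[str]:
    # List the individual flags that are set in the combined value.
    return [member.name for member in FlagActivation if member in flags]


print(decoded_names(cast_directly(SequentialActivation.HCD)))  # ['CID', 'ECD']
```

If this is indeed the cause, an explicit lookup from one enum to the other, rather than a cast, would avoid this class of error.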

MSConvert: Demultiplexing gives mzML and mzXML output that violate the schema/spec

Context

Brian Searle reported that OpenSWATH interprets the wrong isolation scheme of demuxed files. This appears to be because the selected ion m/z is not updated. Brian confirmed that manually updating the selected ion m/z values with centers corresponding to the synthetic demuxed windows fixes the import issue for OpenSWATH.

Here is an example of a line of the mzML file that should be updated:

<cvParam cvRef="MS" accession="MS:1000744" name="selected ion m/z" value="500.4774" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>

Brian provided an awk script written to repair his mzML:

#!/usr/bin/awk -f
{
	if ($0 ~ /<processingMethod order=\"2\">/) {
		# fix the missing softwareRef attribute in prism
		print "        <processingMethod order=\"2\" softwareRef=\"pwiz\">";

	} else if ($0 ~ /<spectrum index=/) {
		# fix the scan numbers so they are consecutive, remember to save them in ids[]
		current=$0;
		split($0, a, "\"");
		num=a[2]+1;
		id=a[4];
		split(id, b, "scan=");
		idpre=b[1];
		newid=idpre"scan="num;
		ids[id]=newid;
		print a[1] "\"" a[2] "\"" a[3] "\"" newid "\"" a[5] "\"" a[6] "\"" a[7];

	} else if ($0 ~ /accession=\"MS:1000827\"/) {
		# grab the correct scan center, save it as target
		split($0, a, "\"");
		target=a[8];
		print $0;

	} else if ($0 ~ /accession=\"MS:1000744\"/) {
		# insert the previous target to correct the incorrect center
		split($0, a, "\"");
		print a[1] "\"" a[2] "\"" a[3] "\"" a[4] "\"" a[5] "\"" a[6] "\"" a[7] "\"" target "\"" a[9] "\"" a[10] "\"" a[11] "\"" a[12] "\"" a[13] "\"" a[14] "\"" a[15];

	} else if ($0 ~ /<precursor spectrumRef=/) {
		# fix the scan index to reflect the new scan numbers
		# for some reason, these DO NOT include the "demux=0" tag. Thanks Austin!
		split($0, a, "\"");
		key=a[2]" demux=0";
		print a[1] "\"" ids[key] "\"" a[3];

	} else if ($0 ~ /<offset idRef=/) {
		# fix the scan index to reflect the new scan numbers
		split($0, a, "\"");
		print a[1] "\"" ids[a[2]] "\"" a[3];

	} else {
		print $0;
	}
}

Import of mzXML files also failed in other tools that perform schema validation during import.

Problem

The demultiplexing filter fails to uphold the mzML and mzXML schema specs. Because pwiz uses mzML as its internal data model, fixing the mzML schema issues should be the first step. I expect this to fix the mzXML issues as well.

Potential Solution

Run an output file from demultiplexing through an mzML schema validator and fix any issues. Then verify that no issues remain by converting to mzXML and running through an mzXML schema validator.
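
The selected ion m/z repair that the awk script performs (copying the isolation window target m/z, MS:1000827, into the selected ion m/z, MS:1000744) can also be sketched with an XML parser instead of line-based splitting. This is only an illustration of that one step, not the pwiz fix: the function name `sync_selected_ion_mz` is hypothetical, and the sketch assumes the usual mzML layout in which `isolationWindow` and `selectedIonList` are children of each `precursor` element. It does not address the missing softwareRef attribute or the scan renumbering.

```python
import xml.etree.ElementTree as ET

MZML_URI = "http://psi.hupo.org/ms/mzml"
MZML_NS = {"mz": MZML_URI}
ISOLATION_TARGET = "MS:1000827"  # isolation window target m/z
SELECTED_ION_MZ = "MS:1000744"   # selected ion m/z


def sync_selected_ion_mz(root: ET.Element) -> int:
    """Copy each precursor's isolation window target m/z into its selected ion m/z.

    Returns the number of cvParam elements whose value was changed.
    """
    changed = 0
    target_path = f"mz:isolationWindow/mz:cvParam[@accession='{ISOLATION_TARGET}']"
    ion_path = (
        "mz:selectedIonList/mz:selectedIon/"
        f"mz:cvParam[@accession='{SELECTED_ION_MZ}']"
    )
    for precursor in root.iter(f"{{{MZML_URI}}}precursor"):
        target = precursor.find(target_path, MZML_NS)
        if target is None:
            continue  # nothing to copy from; leave this precursor untouched
        for ion in precursor.findall(ion_path, MZML_NS):
            if ion.get("value") != target.get("value"):
                ion.set("value", target.get("value"))
                changed += 1
    return changed
```

A check built the same way (counting mismatches without modifying them) could serve as a quick regression test alongside full schema validation.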

License of the pwiz/utility/misc/pinned_gcroot.h file

Greetings,

I am currently packaging libpwiz for Debian and would like to know which license governs the redistribution of the pwiz/utility/misc/pinned_gcroot.h file, which is apparently copyrighted by Microsoft. I need to list the various copyright holders of the files in the distribution for libpwiz to enter the Debian servers.

I also wonder whether this file is needed at all in the GNU/Linux build environment.

Thank you for your kind attention,
Sincerely
Filippo

running in wine in singularity - 'Caught unknown exception'

Hi team! I am struggling with a bit of a meta-meta problem: I want to run msconvert on a Linux-based HPC that cannot run Docker (fun to begin with; it's a permissions issue), and together with a colleague I have managed to install the Docker container through Singularity.
I managed to get this command to work once, but it no longer does. When I run it through Singularity with a command akin to:

singularity exec -B /my/home /my/home/pwiz_singularity_container wine msconvert /my/home/data/file.raw -o /my/home/data/converted_files –-mzXML

I see the following error:

[msconvert] no files found matching "–-mzXML"
format: mzML
    m/z: Compression-None, 64-bit
    intensity: Compression-None, 32-bit
    rt: Compression-None, 64-bit
ByteOrder_LittleEndian
 indexed="true"
outputPath: /my/home/data/converted/file.mzXML
extension: .mzML
contactFilename:
runIndexSet:

spectrum list filters:

chromatogram list filters:

filenames:
/my/home/data/file.raw

processing file: 
/my/home/data/file.raw
[C:\pwiz\msconvert.exe] Caught unknown exception.
Please report this error to [email protected].
Attach the command output and this version information in your report:

ProteoWizard release: 3.0.19252 (aa45583de)
Build date: Sep  9 2019 07:06:22

Please excuse me for not showing the actual file paths; it's a bit of a privacy issue.
Two problems - one: not recognising the --mzXML flag at all, it seems?
And two: the unknown exception.
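
A possible explanation for the first problem, judging only from the text as pasted here: the flag in the command and in the "no files found" message begins with an en dash (U+2013) followed by a hyphen, not with two ASCII hyphens, so it may be treated as a filename. This could equally be an artifact of how the text was pasted. The sketch below is a generic check, unrelated to msconvert itself, that flags arguments containing non-ASCII dash characters; the function name is hypothetical.

```python
import unicodedata


def suspicious_dash_args(args):
    """Return (argument, character name) pairs for arguments with a non-ASCII dash."""
    found = []
    for arg in args:
        for ch in arg:
            # Unicode category "Pd" is dash punctuation; "-" is the ASCII hyphen-minus.
            if ch != "-" and unicodedata.category(ch) == "Pd":
                found.append((arg, unicodedata.name(ch)))
                break  # one report per argument is enough
    return found


print(suspicious_dash_args(["msconvert", "file.raw", "\u2013-mzXML"]))
```

Retyping the flag by hand in the terminal, rather than copying it from a document or web page, is the simplest way to rule this out.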

I'd love to include logs, but I am not sure where to find them. Should I have certain environment settings in place? Since I'm in a wrapper in a wrapper, I hope there's still a fix, and I hope I just missed something simple!

Thanks so much :)
Kind regards,
Joanna

msconvert outputs help into stderr

When I run msconvert --help | less on Ubuntu, most of the output is not captured by the pipe. It works only when I run msconvert --help 2> ~/msconvert.help
Not a big deal, of course, but it is a bit awkward, especially given the amount of information it outputs.
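
Until the help text goes to stdout, the two streams can be merged by the caller. The sketch below shows the general technique with Python's `subprocess` module; `capture_all_output` is a hypothetical helper name, and the demo runs a small child Python process rather than msconvert so that it works on any machine.

```python
import subprocess
import sys


def capture_all_output(command):
    """Run a command and return its stdout and stderr merged into one string."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # send stderr into the same pipe as stdout
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    # Self-contained demo: a child process that writes only to stderr.
    demo = [sys.executable, "-c", "import sys; sys.stderr.write('help text\\n')"]
    print(capture_all_output(demo), end="")
```

In a POSIX shell the equivalent redirection is msconvert --help 2>&1 | less.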

Documentation not up-to-date on sourceforge.net

Greetings,

I wanted to build my software with the latest pwiz version and discovered that the API has changed with respect to the encoding/decoding features (BinaryDataEncoder class): some functions are gone (the ones with std::vector & and std::string & params). I wanted to peruse the docs and saw that they are not up to date.

On the page http://proteowizard.sourceforge.net/dox/namespacepwiz_1_1util.html, for example, there is no documentation for the BinaryData class, which is located at

pwiz/utility/misc/BinaryData.hpp in the source tree.

Sincerely,
Filippo

Compilation on linux

I am trying to compile pwiz on Ubuntu 16.04.
I get these messages at the end:
'''
...failed updating 48 targets...
...skipped 233 targets...
...updated 2939 targets...
At least one pwiz target failed to build.
'''
Is this expected? I don't see any binary files generated.

Creation of the autotools-based build starting from the bjam-based build system

Greetings,

I am currently working to streamline the autotools-based build of pwiz using the clever python-based "translation" of the bjam-based build log.

My experience is that the autotools-based build fails because the svm.h header is not found. I could fix that by modifying the python script file generate_autoconf.py to read like this:

	if ("libraries/boost_aux" in line) : # forward looking boostiness
		includes.add("libraries/boost_aux")
	if ("libraries/Eigen" in line) : #  make sure we ship Eigen and include it
		includes.add("libraries/Eigen")
	if ("libraries/CSpline" in line) : #  make sure we ship CSpline and include it
		includes.add("libraries/CSpline")
	if ("libraries/libsvm-3.0" in line) : #  make sure we ship libsvm-3.0 and include it
		includes.add("libraries/libsvm-3.0")

by adding the last two lines.
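
The chain of `if` tests above grows by one pair of lines for every bundled library. A table-driven variant is sketched below as one possible way to keep it manageable; `BUNDLED_LIBRARY_DIRS` and `collect_includes` are hypothetical names that do not exist in generate_autoconf.py, and only the four directories quoted above are listed.

```python
# Hypothetical table of bundled library directories (taken from the snippet above).
BUNDLED_LIBRARY_DIRS = (
    "libraries/boost_aux",
    "libraries/Eigen",
    "libraries/CSpline",
    "libraries/libsvm-3.0",
)


def collect_includes(log_lines, includes=None):
    """Add every bundled library directory mentioned in the build log to the include set."""
    includes = set() if includes is None else includes
    for line in log_lines:
        for directory in BUNDLED_LIBRARY_DIRS:
            if directory in line:
                includes.add(directory)
    return includes


log = ["g++ -Ilibraries/libsvm-3.0 -c svm.cpp", "g++ -c unrelated.cpp"]
print(sorted(collect_includes(log)))  # ['libraries/libsvm-3.0']
```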

Then the build fails because BOOST_LDPATH is not defined, and the linker command line is erroneous:

/bin/sh ./libtool --tag=CXX --mode=link g++ -g -O2 -L -version-info 3:0:0 [.....]

which thus fails with the following error:

libtool: error: require no space between '-L' and '-version-info'

Apparently, the boost.m4 file does not fill in that information bit. I tried looking into that boost.m4 file, but I found it cryptic...

Sincerely,
Filippo

boost::filesystem::create_directories: Invalid argument

Thank you for the amazing tools.
I am trying to use msconvert on:

No LSB modules are available.    
Distributor ID: Ubuntu     
Description:    Ubuntu 18.04.1 LTS     
Release:        18.04     
Codename:       bionic

But I got the following error:

format: mzML 
    m/z: Compression-None, 64-bit
    intensity: Compression-None, 32-bit
    rt: Compression-None, 64-bit
ByteOrder_LittleEndian
 indexed="true"
outputPath: .
extension: .mzML
contactFilename: 

filters:
  
filenames:
  Library_fr1_nonlin_1hour_200518.mzML
  
boost::filesystem::create_directories: Invalid argument
Please report this error to [email protected].
Attach the command output and this version information in your report:

ProteoWizard release: 3.0.10827 (2017-5-11)
ProteoWizard MSData: 3.0.10827 (2017-5-11)
ProteoWizard Analysis: 3.0.10827 (2017-5-11)
Build date: Nov  1 2017 15:08:32

Could you please advise?

Support for Thermo TSQ Altis QQQ

MSConvert Version (x64): 3.0.18230.1e0a1ad53

Trying to convert a RAW file produced by a Thermo TSQ Altis QQQ I get the error:

[Reader_Thermo::fillInMetadata] unable to parse instrument model; please report this error 
to the ProteoWizard developers with this information: model(TSQ Altis) name(TSQ Altis); if 
want to convert the file anyway, use the ignoreUnknownInstrumentError flag

metadata.txt is the sanitized output from ThermoRawMetaDump.exe. This is SRM data, but in separate processing I have learned that the instrument method is for some reason unreadable by MSFileReader 3.0, but can be read by MSFileReader 3.1 (which is 64-bit only) and also by RawFileReader.

metadata.txt

Intensity scale changes of Sciex data after centroiding data

Hello,

When I use msconvert to convert TripleTOF 5600 Sciex data (GUI or command line, using a recent build) to mzML files the intensity scale seems to change to much lower values (several orders of magnitude):

Interestingly, the TIC/BPC values reported for each scan seem to be unaffected.

If I use qtofpeakpicker instead of vendor centroiding this doesn't occur. Does anyone else have experience with this?

Thanks,
Rick

RawFileReader update?

I’m not sure which version of Thermo’s RawFileReader is currently in use by ProteoWizard (I assume 4.0.26), but they do have 2 newer versions (4.0.89 and 5.0.6) available on their sharepoint site, with release notes mentioning unspecified “bug fixes in the libraries” and a change to a single set of binaries (compared to the 3 previous sets for each of Windows, MacOS, and Linux).

Error building on Windows

Hello, I am trying to build pwiz on Windows 10 with Visual Studio 2019 Community Edition installed. When I run quickbuild.bat, it terminates early with the following error. Am I missing something obvious? I usually work on Linux/Unix so please forgive me if there is something really obvious that I need to do prior to building.

Thanks very much for your help!

// greg //

tar.list expat-2.0.1.tar.bz2
tar.extract expat-2.0.1.tar.bz2 (already extracted)
[pwiz_tools/BiblioSpec] Changing code page to UTF-8 (65001)
tar.list inputs.tar.bz2
tar.extract inputs.tar.bz2 (already extracted)
error: Name clash for '<pC:\Users\ghassett\pwiz\build-nt-x86\pwiz_tools\BiblioSpec\src\msvc-14.2\release\asynch-exceptions-on\link-static\threading-multi>BlibBuild.exe'
error:
error: Tried to build the target twice, with property sets having
error: these incompatible properties:
error:
error:     -  none
error:     -  <assembly>object(file-target)@1290 <define>PWIZ_READER_ABI_T2D
error:
error: Please make sure to have consistent requirements for these
error: properties everywhere in your project, especially for install
error: targets.

Build finished at 17:50:40.77
Elapsed time: 0:0:46

"mode" attribute in the audit log breaks Panorama upload

I tried to run a random Skyline tutorial (Small Molecules) and upload the resulting file into Panorama, and the upload breaks on a new audit_log_entry attribute. In this case the attribute is mode for the small molecule document, but the general problem is that Skyline has no well-defined schema for the audit log structure it produces, while Panorama expects the log file to conform to a schema because the audit log class database structure depends on it.
We need either to limit such additions and modifications to the necessary minimum or to revise the way Panorama stores the audit log.

Keeping metadata from Thermo .raw files in .mzML conversion?

Hi there,

I've noticed that Thermo .raw files contain (in a binary format) useful metadata such as the sequence (.sld) information associated with the sample injection and the instrument method used, as well as metadata like the instrument parameters (e.g. turbopump speed) over the course of the run. This metadata does not make it into an mzML file converted with MSConvert. Would it be possible for MSConvert to preserve this metadata to the best of its ability in the conversion to mzML? The sample injection volume and the instrument method used would be especially useful pieces of metadata to transfer. If it helps to evaluate how difficult this would be to implement, here is a third-party project that interfaces with the ProteoWizard code to pull the instrument-related metadata out of .raw files:
https://bitbucket.org/proteinspector/imondb/wiki/ThermoCompiling

All the best,
-Tim

MSConvert freezes up when using the parameter: DefaultArrayLength 30-

When including "Subset: Number of Data Points: 30-" (DefaultArrayLength 30-), the software never gets past the "processing" step. Either this slows MSConvert down significantly or it freezes; I am not sure which.

I have tried the parameter on Thermo and Waters files.

PeakPicking Vendor files & empty scan

Dear Developers,

Thank you very much for your amazing work. I have some problems running msconvert from the Windows command line; I hope you can help me and point out what I am doing wrong.

I have a Thermo .raw file acquired on a Thermo Fusion Lumos instrument. It contains MS2 and SPS-MS3 spectra in profile mode, and the goal is to convert it to the .mgf file format.

I am using Windows 10 command line and ProteoWizard release: 3.0.19248 (37b2e98)
Build date: Sep 5 2019 21:34:03

running

msconvert test_file.raw --mgf --filter "peakPicking vendor" --filter "titleMaker <RunId>.<ScanNumber>.<ScanNumber>.<ChargeState>"

gives an error message while writing the output file:

[SpectrumWorkerThreads::work] error in thread: [SpectrumList_Thermo::spectrum()] Error
retrieving spectrum "controllerType=0 controllerNumber=1 scan=29316"
[RawFileThreadImpl::getMassList()] failed to centroid scan

The output file is created, but only scans up to 29315 are written; the later scans are missing. The command line freezes and does not return to the prompt.
However, the first scan reported looks normal:

BEGIN IONS
TITLE=test_file.628.628.2
RTINSECONDS=2960.344929
PEPMASS=1220.495361328125 102622.8203125
CHARGE=2+
145.0496674 3544.6081542969
147.7623596 1837.2149658203
156.7328644 1640.818359375
156.8530579 1448.8192138672
156.9769287 1674.1666259766
158.9644928 3548.841796875
163.0605774 10220.19140625
166.9441833 2255.6904296875
[not showing all ions]
1220.99585 34895.9140625
1221.495728 28782.04296875
1221.975342 8339.6279296875
1305.610474 2594.3481445313
1629.73645 3305.6770019531
1630.727905 3509.3596191406
1953.827148 2065.6315917969
END IONS

Scan 29316 looks empty when I check it in the Thermo Xcalibur browser. I am not sure why it was created and not filtered out.

Using the msconvert GUI, the conversion runs through without an issue and the output file is created, but the result is rather strange:

BEGIN IONS
test_file.628.628.2 File:"test_file.raw", NativeID:"controllerType=0 controllerNumber=1 scan=628"
RTINSECONDS=2960.344929
PEPMASS=1220.495361328125 377777.30212399998
CHARGE=2+
130.6846985 0.0
130.6863028 0.0
130.6879071 0.0
145.0370789 0.0
145.0389546 0.0
145.0408303 0.0
145.0502095 3478.3859863281
145.0595887 0.0
145.0614649 0.0
145.0633411 0.0
147.7511843 0.0
147.7531129 0.0
147.7550416 0.0
147.7627565 1817.9001464844
147.770469 0.0
147.7723979 0.0
[...]
158.9494898 0.0
158.9516417 0.0
158.9537938 0.0
158.9645544 3546.9838867188
158.9757711 0.0
158.9779236 0.0
1.968864375e-312 0.0
0 0.0
0 0.0
0 0.0
0 0.0
0 0.0
0 0.0
0 0.0
[...]
0 0.0
0 0.0
0 0.0
0 0.0
END IONS

First, there are flanking zero-intensity points around each centroided peak; second, the ions above ~160 m/z are missing or reported as 0.
The troublesome scan 29316 and all scans after it are reported:

TITLE=test_file.29316.29316.2 File:"test_file.raw", NativeID:"controllerType=0 controllerNumber=1 scan=29316"
RTINSECONDS=13276.56953
PEPMASS=1878.165405273438
CHARGE=2+
130.6847357 0.0
130.68634 0.0
130.6879443 0.0
9.911105182e-307 0.0
0 0.0
0 0.0
0 0.0
0 0.0
0 0.0
0 0.0
0 0.0
0 0.0
0 0.0
0 0.0
0 0.0
0 0.0
0 0.0
0 0.0
0 0.0
0 0.0
0 0.0
0 0.0
0 0.0
0 0.0
0 0.0
0 0.0
0 0.0
0 0.0
0 0.0
0 0.0
0 0.0
END IONS

running
msconvert test_file.raw --mgf --filter "peakPicking cwt" --filter "titleMaker <RunId>.<ScanNumber>.<ScanNumber>.<ChargeState>"

creates an output very similar to the one from the GUI, but with fewer zeros reported, so the output file is much smaller.

Adding zeroSamples to either the command line or the GUI results in the following error:
--filter "zeroSamples removeExtra"

[SpectrumList_ZeroSamplesFilter] Error filtering intensity data: managed and native storage have different sizes

So far I am lost: I cannot explain the difference between the output from the GUI and the command line, nor do I know how to proceed. Am I missing something when running msconvert from the command line? I would appreciate any help!

Thank you very much in advance,
Ivan

Version releases

We received a request to install pwiz in the software stack of our HPC cluster. For reproducibility reasons we require all the installed software releases to be identifiable through a version number.
Do you have any plan to start releasing versions of your source code through GitHub? I am sure that this would definitely benefit not just us but the user community at large.

MSConvert: SHA1 hashing excessive memory usage

Context

I came across this issue while processing large (>50 GB) mzML files for profile demultiplexing. Updating to a newer SHA1 library version, which does not include memory mapping, seemed to fix the problem.
See #332 (comment)
However, we would like to keep the memory mapping functionality for its speed benefits.

Problem

File conversion of my large files failed after all of my physical memory was consumed during the hashing step. The SHA1 implementation uses memory mapping to load the entire file into virtual memory, but this memory should be released as the file is hashed. This doesn't seem to be the case, as all physical memory is consumed when the file size is roughly greater than the available memory.

Possible solution

Identify where memory is not being released and fix in the SHA1 library:
https://github.com/ProteoWizard/pwiz/blob/62eb002bed1f169cd1d3a73b9885bab31e7f4f69/pwiz/utility/misc/SHA1.cpp
https://github.com/ProteoWizard/pwiz/blob/62eb002bed1f169cd1d3a73b9885bab31e7f4f69/pwiz/utility/misc/SHA1.h
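
For comparison, hashing in fixed-size chunks keeps memory use bounded regardless of file size, at the cost of the memory-mapping speed benefit mentioned above. The sketch below uses Python's `hashlib` rather than the pwiz SHA1 library, so it only illustrates the access pattern; the function name and the 1 MiB default chunk size are arbitrary choices.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading at most chunk_size bytes at a time."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Only one chunk is held in memory at a time, so peak usage depends on `chunk_size` rather than on the size of the file.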

Quameter error when first spectrum is not MS1

We run Quameter in automation and it generally works well. However, we occasionally get a Thermo .raw file where the first spectrum (or spectra) is MS2, then later an MS1 spectrum appears, followed by more MS2 spectra. For data files that start with an MS2 spectrum, Quameter reports
Error processing ID-free metrics: error reading spectrum index 0 (No MS1 spectrum found before controllerType=0 controllerNumber=1 scan=1)

I found a workaround -- update the default filter to include a scan range:
-SpectrumListFilters: "peakPicking true 1-;threshold absolute 0.00000000001 most-intense; scanNumber [14,1000000]"

However, it would be nice if Quameter could automatically skip the initial MS2 spectra and resume normal operation once an MS1 spectrum is found. If the entire file is MS2 spectra, it should report an error. Is this something that could be added?
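
The requested behaviour amounts to finding the first MS1 spectrum and starting from there, while still failing when the file contains no MS1 spectrum at all. A minimal sketch of that logic, independent of Quameter's actual code and using a hypothetical function name, could look like this:

```python
def first_ms1_index(ms_levels):
    """Return the index of the first MS1 spectrum in a sequence of MS levels.

    Raises ValueError when there is no MS1 spectrum, so that a file consisting
    only of MS2 spectra is still reported as an error.
    """
    for index, level in enumerate(ms_levels):
        if level == 1:
            return index
    raise ValueError("no MS1 spectrum found in file")


# Two leading MS2 spectra are skipped; processing would start at index 2.
print(first_ms1_index([2, 2, 1, 2, 2]))  # 2
```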

MSConvert: MS2Deisotope annotate charge

When using the MS2Deisotope filter in MSConvert, would it be possible to add an inferred charge array to the .mzML file to annotate the resulting peaks?

And maybe a side question: Which are the recommended parameters for this filter for Orbitrap data?

Issues compiling on Windows with VS2013

When working on #165 in August, I was able to build the master branch without issues on VS2013. Now that @chambm has added 64 bit support for Thermo RAW files using RawFileReader, I'd like to pick my PR back up and finish. However, I can no longer build. I am getting a ton of noexcept errors like the following:
.\pwiz/utility/misc/Filesystem.hpp(75) : error C3646: 'noexcept' : unknown override specifier

A quick look seems to indicate that this keyword wasn't supported on Windows until VS2015 (https://msdn.microsoft.com/en-us/library/wfa0edys.aspx?f=255&MSPPError=-2147217396). Has the preferred Windows build setup changed? I am running the following command from PowerShell.

.\quickbuild.bat -j4 --toolset=msvc-12.0 --i-agree-to-the-vendor-licenses address-model=64 msconvert msbenchmark

Thanks,
Ryan

	// negate unknownInstrumentIsError value since command-line parameter (ignoreUnknownInstrumentError) and the Config parameters use inverse semantics
	config.unknownInstrumentIsError = !config.unknownInstrumentIsError;

	xmlWriter.startElement("dataProcessing", attributes);

	BOOST_FOREACH(const ProcessingMethod& pm, dpPtr->processingMethods)
	{
		CVParam fileFormatConversion = pm.cvParamChild(MS_file_format_conversion);

		string softwareType = fileFormatConversion.empty() ? "processing" : "conversion";

		if (pm.softwarePtr.get())
			writeSoftware(xmlWriter, pm.softwarePtr, msd, cvTranslator, softwareType);

		write_processingOperation(xmlWriter, pm, MS_file_format_conversion);
		write_processingOperation(xmlWriter, pm, MS_peak_picking);
		write_processingOperation(xmlWriter, pm, MS_deisotoping);
		write_processingOperation(xmlWriter, pm, MS_charge_deconvolution);
		write_processingOperation(xmlWriter, pm, MS_thresholding);

		xmlWriter.pushStyle(XMLWriter::StyleFlag_InlineInner);
		BOOST_FOREACH(const UserParam& param, pm.userParams)
		{
			xmlWriter.startElement("comment");
			xmlWriter.characters(param.name + (param.value.empty() ? string() : ": " + param.value));
			xmlWriter.endElement(); // comment
		}
		xmlWriter.popStyle();
	}

	xmlWriter.endElement(); // dataProcessing
