nf-core / quantms

Quantitative mass spectrometry workflow. Currently supports proteomics experiments with complex experimental designs for DDA-LFQ, DDA-Isobaric and DIA-LFQ quantification.

Home Page: https://nf-co.re/quantms

License: MIT License

Languages: HTML 0.85%, Python 22.58%, Nextflow 51.75%, Groovy 8.97%, R 15.85%
Topics: nf-core, nextflow, workflow, pipeline, proteomics, mass-spectrometry, proteogenomics, tmt, lfq, mass-spec

quantms's People

Contributors

daichengxin, drpatelh, fabianegli, jpfeuffer, jspaezp, mashehu, nf-core-bot, sminot, wanghong007, ypriverol


quantms's Issues

proteomicsLFQ removes features with few measurements from MSstats input

Description of the bug

I ran quantms v1.1.1 on a DDA-LFQ experiment with the option "msstatslfq_removeFewMeasurements": false to keep features with 2 measurements, but proteomicsLFQ removes them from the *_msstats_in.csv file anyway. I want to keep features with 2 measurements since I have only 3 replicates per condition in this experiment.

The msstats.log (attached) says both that:

  • Features with less than 3 measurements across runs will be kept.

    and that

  • INFO [2023-05-26 10:45:53] ** Features with one or two measurements across runs are removed.

I tried importing the sdrf_openms_design_msstats_in_comparisons.csv into MSstats to ensure these features are not removed, but noticed that proteomicsLFQ had already removed them:
library(dplyr); library(MSstats)  # imports needed by the snippet below
data <- read.csv("proteomicslfq/sdrf_openms_design_msstats_in.csv", header = TRUE, sep = ",")
data <- data %>% filter(!grepl("CONTAMINANT_", ProteinName))
data_msstats <- OpenMStoMSstatsFormat(data, useUniquePeptide = TRUE, removeFewMeasurements = FALSE,
                                      removeProtein_with1Feature = FALSE, summaryforMultipleRows = max, use_log_file = FALSE)
data_msstats %>% group_by(PeptideSequence, PrecursorCharge) %>% summarise(RunCount = sum(is.na(Intensity))) %>%
  ungroup() %>% count(RunCount) %>% mutate(Frequency = n / sum(n))
  RunCount    n Frequency
1        3  106    0.0484
2        4  368    0.168
3        5  639    0.292
4        6 1077    0.492

multiQC
msstats.log

Command used and terminal output

No response

Relevant files

No response

System information

Nextflow version 22.10.6
Hardware Desktop
Executor local
Container engine: Docker
OS Linux
Version of nf-core/quantms 1.1.1

Clarify/Unify exported scores

Description of feature

In the documentation we should explain which score is exported in which situation.
We should perhaps also unify the scores, in the sense that we always perform an FDR calculation in PLFQ, even if the cutoff is set to 1.0. (Currently the calculation is omitted and 1.0 is effectively fixed, since max_psm_fdr does not seem to be passed to PLFQ.)
This can be a bit confusing, since you might then have some q-values from the per-file FDR calculation in there. Maybe we should also name the scores differently (e.g. per-file q-value vs. experiment-wide q-value).
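
For reference, a minimal sketch of plain target-decoy q-value estimation (simple decoy counting with monotonization; illustrative only, not the OpenMS/Percolator implementation):

def qvalues(scores, is_decoy):
    """Plain target-decoy q-values: FDR = decoys/targets above each score cutoff, then monotonized."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    q, decoys, targets = [0.0] * len(scores), 0, 0
    for i in order:                      # walk from best to worst score
        decoys += is_decoy[i]
        targets += not is_decoy[i]
        q[i] = decoys / max(targets, 1)  # FDR estimate at this cutoff
    running = 1.0
    for i in reversed(order):            # q-value: best FDR at an equal-or-looser cutoff
        running = min(running, q[i])
        q[i] = running
    return q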

test configs

Description of the bug

Missing test config

Hello, I wanted to mention that this pipeline is missing the standard test configuration. It would be great to include a test profile, since it is an important part of all nf-core pipelines and would help users test the pipeline before actually using it with their own data.

Errors

I then wanted to test the pipeline with two of the existing test profiles and there I encountered some errors.

  1. test_dia
    the command that I used:
    nextflow run nf-core/quantms -profile test_dia,cfc --outdir results_dia

The test produced an output, but I think in the msstats_comparisons.tsv the column names do not match the actual column values. The column names might be in the wrong order. I attached the output tsv.

  2. test_lfq
    the command that I used:
    nextflow run nf-core/quantms -r 1.0 -profile test_lfq,cfc --outdir results_lfq

The test_lfq didn't run through and gave the following error report:
nextflow_tower_error

Thanks in advance :) !

Command used and terminal output

No response

Relevant files

msstats_comparisons.csv

System information

No response

SAGE run in a failing job

Description of the bug

When SAGE is run after a failed execution, it gives the following error:

Command wrapper:
  ln: failed to create symbolic link '01524_A01_P015424_S00_N01_R1.mzML': File exists

Work dir:
  /hps/nobackup/juan/pride/reanalysis/absolute-expression/tissues/PXD010154/work/51/e706bd1fd9ebdf269ed3c6d378a3c6

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
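
A likely fix is to make the link step idempotent so that a resumed task can replace the stale link; a minimal sketch of that logic, in Python for illustration (the module itself would do the equivalent of ln -sf in its shell template):

from pathlib import Path

def force_symlink(target: str, link_name: str) -> None:
    """Create a symlink, replacing any stale link left behind by a failed run."""
    link = Path(link_name)
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(target)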

Command used and terminal output

No response

Relevant files

No response

System information

No response

Allow passing through `raw` files if DIA is used

Description of feature

Currently we always convert to mzML, although some tools can read raw files directly (e.g. DIA-NN, and I think even Comet).
We only need to make sure that the mzML statistics module can also read raw files; there are Thermo raw parsing libraries for Python out there.
Alternatively, we convert just for QC while DIA-NN is running and discard the intermediate results.
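
For context, a minimal sketch of the kind of summary the statistics step computes, here for mzML via pyopenms (a raw passthrough would need one of the Python Thermo raw readers in place of MzMLFile; this is an illustration, not the actual module):

import pyopenms as oms

def spectrum_counts(mzml_path: str) -> dict:
    """Count spectra per MS level in an mzML file."""
    exp = oms.MSExperiment()
    oms.MzMLFile().load(mzml_path, exp)
    levels = [spec.getMSLevel() for spec in exp]
    return {"ms1": levels.count(1), "ms2": levels.count(2), "total": len(levels)}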

Problems with OpenMS-style experimental design input

Description of the bug

Following advice from @jpfeuffer, I'm making the switch from the "nf-core/proteomicslfq" workflow to "nf-core/quantms" (dev version in both cases). I'm surprised to find that my "experimental_design.tsv" file that works for "proteomicslfq" doesn't work for "quantms". Contents of the file are included below.

If I name the file "experimental_design.tsv" and place it in the directory where I run the workflow, the file gets cleared (!) and I get the first error message (see below). The cause appears to be the sed command that redirects its output to its own input file, truncating it.

If I give the file a different name (e.g. "experimental_design_test.tsv"), I get the second error message (see below).

Command used and terminal output

$ nextflow run nf-core/quantms -r dev -profile docker --input experimental_design.tsv --database ../Human_reference_proteome_TD.fasta
[...]
Error executing process > 'NFCORE_QUANTMS:QUANTMS:CREATE_INPUT_CHANNEL:PREPROCESS_EXPDESIGN'

Caused by:
  Process `NFCORE_QUANTMS:QUANTMS:CREATE_INPUT_CHANNEL:PREPROCESS_EXPDESIGN` terminated with an error exit status (1)

Command executed:

  sed 's/.raw\t/.mzML\t/I' experimental_design.tsv > experimental_design.tsv
  a=$(grep -n '^$' experimental_design.tsv | head -n1| awk -F":" '{print $1}'); sed -e ''"${a}"',$d' experimental_design.tsv > process_experimental_design.tsv

Command exit status:
  1

Command output:
  (empty)


$ nextflow run nf-core/quantms -r dev -profile docker --input experimental_design_test.tsv --database ../Human_reference_proteome_TD.fasta
[...]
Cannot invoke method contains() on null object

 -- Check script '/home/hendrik.weisser/.nextflow/assets/nf-core/quantms/./workflows/../subworkflows/local/create_input_channel.nf' at line: 137 or see '.nextflow.log' file for more details

Relevant files

experimental_design.tsv:

Fraction_Group	Fraction	Spectra_Filepath	Label	Sample
1	1	/home/hendrik.weisser/Data/Proteomics/SST/mzML_files/20220223e_JR_SST_02.mzML	1	1
2	1	/home/hendrik.weisser/Data/Proteomics/SST/mzML_files/20220228a_JR_SST_02.mzML	1	2
3	1	/home/hendrik.weisser/Data/Proteomics/SST/mzML_files/20220301a_JR_SST_03.mzML	1	3
4	1	/home/hendrik.weisser/Data/Proteomics/SST/mzML_files/20220301a_JR_SST_05.mzML	1	4

Sample	MSstats_Condition	MSstats_BioReplicate
1	before	1
2	before	2
3	after	1
4	after	2

System information

Nextflow version: 21.10.6
nf-core/quantms revision: 63dd266 [dev]
OS: Ubuntu 20.04.4 LTS

Tests for the Python scripts in `./bin`

Description of feature

It would be great to have tests for all the quantms pipeline's Python code.

Automated tests would make development much more enjoyable and would also speed it up.

There are a couple of considerations:

  • Which test framework to use? I think pytest is the logical choice.
  • How to organize tests? Tests in the same file? Tests in a separate file per script, but in the same folder? tests folder in ./bin/?

Some example tests for ./bin/diann_convert.py:

"""Tests for test_diann_convert.py
"""

from diann_convert import calculate_coverage

def test_calculate_coverage_multiple_incomplete():
    assert calculate_coverage("WATEROVERTHEDUCKSBACK", {"WATER", "DUCK"}) == 0.42857142857142855
def test_calculate_coverage_repeated_complete():
    assert calculate_coverage("DUCKDUCKDUCK", {"DUCK"}) == 1.0
def test_calculate_coverage_incomplete():
    assert calculate_coverage("WATEROVERTHEDUCK", {"DUCK"}) == 0.25
def test_calculate_coverage_overlap_partial_complete():
    assert calculate_coverage("WATER", {"WAT", "TER"}) == 1.0
def test_calculate_coverage_overlap_partial_incomplete():
    assert calculate_coverage("WATERGLASS", {"WAT", "TER"}) == 0.5

NB: the tests require all dependencies of the scripts and hence cannot be run on all infrastructure; e.g. there is no OpenMS build for macOS. So tests might have to be run in containers, or some other solution would have to be found.
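
To make the tests above concrete, here is a hypothetical implementation that is consistent with them (a sketch of what calculate_coverage computes, not the actual code in ./bin/diann_convert.py):

def calculate_coverage(sequence: str, peptides: set) -> float:
    """Fraction of positions in sequence covered by at least one peptide occurrence."""
    covered = [False] * len(sequence)
    for pep in peptides:
        start = sequence.find(pep)
        while start != -1:          # mark every (possibly overlapping) occurrence
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = sequence.find(pep, start + 1)
    return sum(covered) / len(sequence)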

The run-command needs to be updated in documentation

Description of the bug

The commands in the documentation on nf-co.re and in this repository are wrong.

On the banner on the right-hand side of https://nf-co.re/quantms, it says this pipeline can be run with:

$ nextflow run nf-core/quantms -r 1.0 -profile test --outdir ../outdir
N E X T F L O W  ~  version 22.10.4
Unknown configuration profile: 'test'

Did you mean one of these?
    oist

In the README.md in this repository, it says it can be run with:

$ nextflow run nf-core/quantms -profile test,singularity --input project.sdrf.tsv --database protein.fasta --outdir ../outdir
N E X T F L O W  ~  version 22.10.4
Project `nf-core/quantms` is currently stickied on revision: 1.0 -- you need to explicitly specify a revision with the option `-r` in order to use it

Passing -r makes it fail like the previous one.

Perhaps, going forward, the config should alias one of the test_* profiles as test, which is what other workflows with multiple test profiles do.

URGENT: pin nf-validation version

Description of the bug

To prevent breaking this pipeline in the near future, the nf-validation version should be pinned to version 1.1.3 like:

plugins {
    id 'nf-validation@1.1.3'
}

Command used and terminal output

No response

Relevant files

No response

System information

No response

Can't assign 4 names to a 0 column data.table MSstats error

Description of the bug

I get the following error when running quantms using an SDRF (.tsv) file:

Error in setnames(x, value) :
Can't assign 4 names to a 0 column data.table
Calls: OpenMStoMSstatsFormat ... colnames<- -> names<- -> names<-.data.table -> setnames
Execution halted

Command error:
Loading required package: MSstats

Attaching package: ‘MSstats’

The following object is masked from ‘package:grDevices’:

  savePlot

Loading required package: tibble
Loading required package: data.table
INFO [2023-09-01 12:20:06] ** Raw data from OpenMS imported successfully.
INFO [2023-09-01 12:20:06] ** Raw data from OpenMS cleaned successfully.
INFO [2023-09-01 12:20:06] ** Using annotation extracted from quantification data.
INFO [2023-09-01 12:20:06] ** Run labels were standardized to remove symbols such as '.' or '%'.
INFO [2023-09-01 12:20:06] ** The following options are used:
- Features will be defined by the columns: PeptideSequence, PrecursorCharge, FragmentIon, ProductCharge
- Shared peptides will be removed.
- Proteins with a single feature will be removed.
- Features with less than 3 measurements across runs will be removed.
INFO [2023-09-01 12:20:06] ** Features with all missing measurements across runs are removed.
INFO [2023-09-01 12:20:06] ** Shared peptides are removed.
INFO [2023-09-01 12:20:07] ** Multiple measurements in a feature and a run are summarized by summaryforMultipleRows: max
INFO [2023-09-01 12:20:07] ** Features with one or two measurements across runs are removed.
INFO [2023-09-01 12:20:07] Proteins with a single feature are removed.
INFO [2023-09-01 12:20:07] ** Run annotation merged with quantification data.
INFO [2023-09-01 12:20:07] ** Multiple fractionations exist: 36 fractionations per MS replicate.
INFO [2023-09-01 12:20:27] ** Features with one or two measurements across runs are removed.
INFO [2023-09-01 12:20:27] ** Fractionation handled.
Error in setnames(x, value) :
Can't assign 4 names to a 0 column data.table
Calls: OpenMStoMSstatsFormat ... colnames<- -> names<- -> names<-.data.table -> setnames
Execution halted

Work dir:
/home/kcoetzer/nf/nf-core-quantms-1.1.1/workflow/work/76/86b9b6fe2a2b16cf1173bf95dc7275

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

Execution cancelled -- Finishing pending tasks before exit

Command used and terminal output

NXF_VER=22.10.1 nextflow run main.nf --outdir /home/kcoetzer/nf/nf-core-quantms-1.1.1/ -profile singularity -params-file /home/kcoetzer/nfparams_quantms.json -resume

Relevant files

PXD010154.sdrf_openms_design_msstats_in.csv
PXD010154.sdrf.csv
nfparams.txt

System information

Hardware: local server
Container: singularity
Version of nf-core/quantms: 1.1.1

Add more QC metrics for pmultiQC and use mzQC files

Description of feature

My plan is to additionally run OpenMS' QCCalculator in (almost) every step to create small mzQC files with additional summaries.
Those mzQC files should contain only information that cannot be read from the final mzTab.
This would also allow skipping the copying of the input mzMLs to the pmultiqc step, since it would only need to read the already summarized data in the mzQC.
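
As a sketch of the target format, a small mzQC-style JSON summary can be written with the standard library alone (the metric accessions and field layout below are illustrative placeholders, not a validated mzQC document):

import json

summary = {
    "mzQC": {
        "version": "1.0.0",
        "runQualities": [{
            "metadata": {"inputFiles": [{"name": "run1.mzML"}]},
            "qualityMetrics": [
                # accession/name pairs in the style of the QC controlled vocabulary
                {"accession": "QC:4000059", "name": "number of MS1 spectra", "value": 12345},
                {"accession": "QC:4000060", "name": "number of MS2 spectra", "value": 67890},
            ],
        }],
    }
}
print(json.dumps(summary, indent=2))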

Please list places and metrics that we need to extract in the comments @ypriverol @timosachsenberg

MzMLs (run QCCalculator during mzML Indexing step?):

  • Export all metrics that our QC classes can do
  • Export number of spectra per file

idXMLs (per Search engines):

  • score distributions target vs decoy
    • Which scores to export?
    • Best hit only?
    • histogram or full density?
  • nr targets vs decoys
  • hits per psm?

idXMLs (after Perc/IDPEP):

  • target vs decoy distribution again

idXMLs (after consensusID):

  • overlap between search engines (e.g. 2D plot for every pair of search engines)
  • histogram of number of times a psm was identified with same, with different, ...
  • nr targets vs decoys
  • hits per psm?

idXMLs (after filtering):

  • do we need anything here?

idXMLs (after inference):

  • see #27
  • depends a bit on the order of FDR filtering whether this can be inferred by comparing the mzTab with the raw IDs per file (but currently we FDR-filter before quantification, so it might indeed be helpful to know whether a protein is missing because of filtering after inference or because of missing quant data)
  • in any case, we need that information since we per-default also filter out decoys and a target-decoy score distribution plot would be helpful for proteins as well.
  • for TMT the inference idXML is easily accessible

features:

  • since we only generate features internally for ProteomicsLFQ, we must export summarized feature QC metrics during execution (or write out the temporary featureXMLs even without debug mode).
  • for TMT this does not really exist because the "consensus" features are not really 2D features

consensus features:

  • is there anything important that is not available in the mzTab?

Running this on Kubernetes or docker

Hi,

I would like to install quantms on the Kubernetes service on AWS. Are there any instructions on how to get started with this, if it is possible? I cannot find any documentation on how to install all the components in an AWS environment or on any other cloud.

Config loaded twice

Description of the bug

@drpatelh reported this in Slack

https://nfcore.slack.com/archives/C02Q3FL29PD/p1661255290163269

Hola! Just noticed you have a bug in the dev code where you are trying to load the same config twice:

quantms/nextflow.config

Lines 291 to 293 in 6ffb0c9

// Load module config after profile, so they can depend on overwritten input parameters specific for each profile.
// Load modules.config for DSL2 module specific options
includeConfig 'conf/modules.config'

quantms/nextflow.config

Lines 337 to 338 in 6ffb0c9

// Load modules.config for DSL2 module specific options
includeConfig 'conf/modules.config'

You should keep the one defined later in the config.

Command used and terminal output

No response

Relevant files

No response

System information

No response

Move most modules from local into nf-core

Description of feature

Especially the ones in the openms/ subfolder, so they can be reused by other pipelines.
They should be mostly nf-core compatible already, but might require some tests and minor adaptations.

Error fixed: file not found: 'MSGFPlus.jar' in google cloud

Description of the bug

Error executing process > 'NFCORE_QUANTMS:QUANTMS:LFQ:ID:DATABASESEARCHENGINES:SEARCHENGINEMSGF (BSA2_F2)'

Caused by:
Process NFCORE_QUANTMS:QUANTMS:LFQ:ID:DATABASESEARCHENGINES:SEARCHENGINEMSGF (BSA2_F2) terminated with an error exit status (1)

Command executed:

MSGFPlusAdapter \
    -protocol automatic \
    -in BSA2_F2.mzML \
    -out BSA2_F2_msgf.idXML \
    -threads 2 \
    -java_memory 6144 \
    -database "18Protein_SoCe_Tr_detergents_trace_target_decoy.fasta" \
    -instrument high_res \
    -matches_per_spec 1 \
    -min_precursor_charge 2 \
    -max_precursor_charge 4 \
    -min_peptide_length 6 \
    -max_peptide_length 40 \
    -max_missed_cleavages 2 \
    -isotope_error_range 0,1 \
    -enzyme "Trypsin/P" \
    -tryptic fully \
    -precursor_mass_tolerance 5 \
    -precursor_error_units ppm \
    -fixed_modifications 'Carbamidomethyl (C)' \
    -variable_modifications 'Oxidation (M)' \
    -max_mods 3 \
    -PeptideIndexing:IL_equivalent \
    -PeptideIndexing:unmatched_action warn \
    -debug 0 \
    2>&1 | tee BSA2_F2_msgf.log

cat <<-END_VERSIONS > versions.yml
"NFCORE_QUANTMS:QUANTMS:LFQ:ID:DATABASESEARCHENGINES:SEARCHENGINEMSGF":
    MSGFPlusAdapter: $(MSGFPlusAdapter 2>&1 | grep -E '^Version(.)' | sed 's/Version: //g' | cut -d ' ' -f 1)
    msgf_plus: $(msgf_plus 2>&1 | grep -E '^MS-GF\+ Release.*')
END_VERSIONS

Command exit status:
1

Command output:
Input file 'MSGFPlus.jar' could not be found (by searching on PATH). Either provide a full filepath or fix your PATH environment!
Error: File not found (the file 'MSGFPlus.jar' could not be found)

Command used and terminal output

No response

Relevant files

No response

System information

No response

DIA test data SDRF file is missing required column

Description of the bug

I was trying to run the test data set using test_full_data.config and got an error saying there was a missing required column in SDRF file. It needs a characteristics[biological replicate] column.

I posted in the nf-core/test-data slack channel and they suggested I make the bug report here first.

Command used and terminal output

Error message:

 The following columns are mandatory and not present in the SDRF: characteristics[biological replicate] -- ERROR
  The column characteristics[biological replicate] is not present in the SDRF -- ERROR
  There were validation errors!


Relevant files

No response

System information

No response

Misspelled "search_engines" argument runs workflow without search

Description of the bug

Not a huge problem, but still worth fixing in my opinion: if an invalid value is passed to the "search_engines" parameter (e.g. "mgsf" instead of "msgf"), the workflow will run and "complete successfully", but no searches or subsequent steps will be performed.
Aborting with an informative error message (e.g. "Error: invalid value for parameter 'search_engines': mgsf") would be more useful in such a case.
It may be worth checking that all parameters that require specific values are validated, to avoid unexpected results.
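
For illustration, the kind of allow-list check meant here, shown as a Python sketch (the pipeline would implement this in its Groovy parameter validation, and the set of valid engine names is an assumption):

VALID_SEARCH_ENGINES = {"comet", "msgf", "sage"}  # assumed allow-list, for illustration only

def validate_search_engines(value: str) -> list:
    """Fail fast on unknown search engine names instead of silently skipping the search."""
    engines = [e.strip() for e in value.split(",")]
    invalid = [e for e in engines if e not in VALID_SEARCH_ENGINES]
    if invalid:
        raise ValueError("invalid value for parameter 'search_engines': " + ",".join(invalid))
    return engines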

Command used and terminal output

$ nextflow run nf-core/quantms -r dev -profile docker --input ../experimental_design_mzML.tsv --database ../Human_reference_proteome_TD.fasta --search_engines mgsf --outdir results_test --labelling_type "label free sample" --acquisition_method dda
[...]
executor >  local (13)
[a6/835132] process > NFCORE_QUANTMS:QUANTMS:INPUT_CHECK:SAMPLESHEET_CHECK                                      [100%] 1 of 1 ✔
[c3/d7052e] process > NFCORE_QUANTMS:QUANTMS:CREATE_INPUT_CHANNEL:PREPROCESS_EXPDESIGN                          [100%] 1 of 1 ✔
[-        ] process > NFCORE_QUANTMS:QUANTMS:FILE_PREPARATION:THERMORAWFILEPARSER                               -
[f9/21a528] process > NFCORE_QUANTMS:QUANTMS:FILE_PREPARATION:MZMLINDEXING (20220301a_JR_SST_05)                [100%] 10 of 10 ✔
[-        ] process > NFCORE_QUANTMS:QUANTMS:TMT:ID:PSMRESCORING:EXTRACTPSMFEATURES                             -
[-        ] process > NFCORE_QUANTMS:QUANTMS:TMT:ID:PSMRESCORING:PERCOLATOR                                     -
[-        ] process > NFCORE_QUANTMS:QUANTMS:TMT:ID:PSMFDRCONTROL:IDSCORESWITCHER                               -
[-        ] process > NFCORE_QUANTMS:QUANTMS:TMT:ID:PSMFDRCONTROL:IDFILTER                                      -
[-        ] process > NFCORE_QUANTMS:QUANTMS:TMT:FEATUREMAPPER:ISOBARICANALYZER                                 -
[-        ] process > NFCORE_QUANTMS:QUANTMS:TMT:FEATUREMAPPER:IDMAPPER                                         -
[-        ] process > NFCORE_QUANTMS:QUANTMS:TMT:FILEMERGE                                                      -
[-        ] process > NFCORE_QUANTMS:QUANTMS:TMT:PROTEININFERENCE:PROTEININFERENCER                             -
[-        ] process > NFCORE_QUANTMS:QUANTMS:TMT:PROTEININFERENCE:IDFILTER                                      -
[-        ] process > NFCORE_QUANTMS:QUANTMS:TMT:PROTEINQUANT:IDCONFLICTRESOLVER                                -
[-        ] process > NFCORE_QUANTMS:QUANTMS:TMT:PROTEINQUANT:PROTEINQUANTIFIER                                 -
[-        ] process > NFCORE_QUANTMS:QUANTMS:TMT:PROTEINQUANT:MSSTATSCONVERTER                                  -
[-        ] process > NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMRESCORING:EXTRACTPSMFEATURES                             -
[-        ] process > NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMRESCORING:PERCOLATOR                                     -
[-        ] process > NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMFDRCONTROL:IDSCORESWITCHER                               -
[-        ] process > NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMFDRCONTROL:IDFILTER                                      -
[-        ] process > NFCORE_QUANTMS:QUANTMS:LFQ:PROTEOMICSLFQ                                                  -
[-        ] process > NFCORE_QUANTMS:QUANTMS:LFQ:MSSTATS                                                        -
[-        ] process > NFCORE_QUANTMS:QUANTMS:DIA:DIANNCFG                                                       -
[-        ] process > NFCORE_QUANTMS:QUANTMS:DIA:LIBRARYGENERATION                                              -
[-        ] process > NFCORE_QUANTMS:QUANTMS:DIA:DIANNSEARCH                                                    -
[-        ] process > NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT                                                   -
[c4/6c40ab] process > NFCORE_QUANTMS:QUANTMS:CUSTOM_DUMPSOFTWAREVERSIONS (1)                                    [100%] 1 of 1 ✔
[-        ] process > NFCORE_QUANTMS:QUANTMS:SUMMARYPIPELINE                                                    -
-[nf-core/quantms] Pipeline completed successfully-

Relevant files

No response

System information

Nextflow version: 21.10.6
nf-core/quantms revision: 1193c6e [dev]
OS: Ubuntu 20.04.4 LTS

[Discussion] modification specification propagation

Hello! I was hoping we could discuss why the lookup of a modification uses the 'NT' name when a UniMod accession is available. It sincerely feels like the human-readable name should be a fallback, or at least be checked for consistency when validating the input SDRF.

LMK what you think!

The mod was specified as:

NT=Carbamidomethylation; MT=Fixed; TA=C; AC=Unimod:4

I am assuming it should have been specified under its PSI name:

NT=Carbamidomethyl; MT=Fixed; TA=C; AC=Unimod:4

BUT I believe the name should not be used anyway, since the UniMod accession is available.

(Should this value be validated in sdrf_pipelines ?)

LMK if we could improve the documentation on the pipeline to know what fields/sub-fields actually matter for the execution.
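
For reference, a minimal sketch of the lookup behaviour in question, assuming pyopenms is installed (it mirrors the getModification call used in diann_convert.py):

import pyopenms as oms

mods_db = oms.ModificationsDB()
mod = mods_db.getModification("Carbamidomethyl")  # PSI-MS name: resolves fine
print(mod.getUniModAccession())                   # UniMod:4
# mods_db.getModification("Carbamidomethylation (C)") raises the RuntimeError shown below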

Error trace at the NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT stage:

Error executing process > 'NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT (input_files.sdrf)'

Caused by:
  Essential container in task exited

Command executed:

  diann_convert.py convert \
      --folder ./ \
      --exp_design input_files.sdrf_openms_design.tsv \
      --diann_version ./version/versions.yml \
      --dia_params "0.02;Da;10;ppm;Trypsin;Carbamidomethylation (C);Oxidation (M)" \
      --charge 4 \
      --missed_cleavages 2 \
      --qvalue_threshold 0.01 \
      2>&1 | tee convert_report.log
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT":
      pyopenms: $(pip show pyopenms | grep "Version" | awk -F ': ' '{print $2}')
  END_VERSIONS

Command exit status:
  1


line 422, in MTD_mod_info
      mod_obj = mods_db.getModification(mod)
    File "pyopenms/pyopenms_2.pyx", line 9057, in pyopenms.pyopenms_2.ModificationsDB.getModification
    File "pyopenms/pyopenms_2.pyx", line 9016, in pyopenms.pyopenms_2.ModificationsDB._getModification_1
  RuntimeError: the value 'Carbamidomethylation (C)' was used but is not valid; Retrieving the modification failed. It is not available for the residue '' and term specificity 'none'.

const OpenMS::ResidueModification* OpenMS::ModificationsDB::searchModificationsFast(const OpenMS::String&, bool&, const OpenMS::String&, OpenMS::ResidueModification::TermSpecificity) constModification not found: Carbamidomethylation (C)
  Warning: OPENMS_DATA_PATH environment variable already exists. pyOpenMS will use it (/usr/local/share/OpenMS/) to locate data in the OpenMS share folder (e.g., the unimod database), instead of the default (/usr/local/lib/python3.10/site-packages/pyopenms/share/OpenMS).

Msgf not working on AWS!

nxf-scratch-dir ip-172-31-26-118.eu-west-1.compute.internal:/tmp/nxf.XXXXSM8v5U

Input file 'MSGFPlus.jar' could not be found (by searching on PATH). Either provide a full filepath or fix your PATH environment!

Error: File not found (the file 'MSGFPlus.jar' could not be found)

DIA-NN Analysis parameters always trigger DDA-LFQ jobs?

Description of the bug

Hi, thanks for all the useful docs and work so far!

I am currently trying to run DIA data through quantms. We are primarily interested in phosphoproteomics, but I would like to run the data through DIA-NN to see how usable our outputs are. From the flowchart on the homepage, it appears that in DIA-LFQ mode the pipeline should not run Comet or Percolator, and should instead go straight into DIA-NN after mzML indexing.

However, in my run ThermoRawFileParser completes successfully, and yet ProteomicsLFQ, among other steps such as Percolator, is carried out. It always errors out at NFCORE_QUANTMS:QUANTMS:LFQ:PROTEOMICSLFQ with a "no proteins remaining after FDR" message. I am new to proteomics (and Nextflow!), but from the docs I can see that the Comet search engine and these steps are not appropriate for DIA data. Could this be the cause of the error?

Based on the argument parameters, I do not think my command line is triggering any DDA-LFQ jobs, but perhaps it is something in the SDRF?
In addition: WARN: There's no process matching config selector: MULTIQC -- Did you mean: PMULTIQC?

Thanks in advance!

Kind regards,
James

Command used and terminal output

nextflow run nf-core/quantms -r 1.2.0 --input '/home/james.burgess/projects/JB240122_MaxQuant_MSQuant_Benchmarking/MaxQuant_MSquant_Linux_Benchmarking/quantms/JB230130_quantms_input.sdrf.tsv' --root_folder '/ibm/hpcfs1/tmp/JB230123_DATA/DATA/raw' --database '/ibm/hpcfs1/tmp/JB230123_DATA/DATA/MaxDIA/fasta/JB230123_quantms_database.fasta' --outdir '/ibm/hpcfs1/tmp/JB230123_DATA/DATA/JB230123_quantms_proc' --acquisition_method 'dia' --max_cpus 60 --max_memory 150GB  -profile singularity -resume


[8e/b7eb83] process > NFCORE_QUANTMS:QUANTMS:INPUT_CHECK:SAMPLESHEET_CHECK (JB230130_quantms_input.sdrf.tsv) [  0%] 0 of 1
[-        ] process > NFCORE_QUANTMS:QUANTMS:CREATE_INPUT_CHANNEL:SDRFPARSING                                -
[-        ] process > NFCORE_QUANTMS:QUANTMS:FILE_PREPARATION:DECOMPRESS                                     -
[-        ] process > NFCORE_QUANTMS:QUANTMS:FILE_PREPARATION:MZMLINDEXING                                   -
[-        ] process > NFCORE_QUANTMS:QUANTMS:FILE_PREPARATION:THERMORAWFILEPARSER                            -
[-        ] process > NFCORE_QUANTMS:QUANTMS:FILE_PREPARATION:MZMLSTATISTICS                                 -
[-        ] process > NFCORE_QUANTMS:QUANTMS:TMT:ID:DATABASESEARCHENGINES:SEARCHENGINECOMET                  -
[-        ] process > NFCORE_QUANTMS:QUANTMS:TMT:ID:PSMRESCORING:EXTRACTPSMFEATURES                          -
[-        ] process > NFCORE_QUANTMS:QUANTMS:TMT:ID:PSMRESCORING:PERCOLATOR                                  -
[-        ] process > NFCORE_QUANTMS:QUANTMS:TMT:ID:PSMFDRCONTROL:IDSCORESWITCHER                            -
[-        ] process > NFCORE_QUANTMS:QUANTMS:TMT:ID:PSMFDRCONTROL:IDFILTER                                   -
[-        ] process > NFCORE_QUANTMS:QUANTMS:TMT:FEATUREMAPPER:ISOBARICANALYZER                              -
[-        ] process > NFCORE_QUANTMS:QUANTMS:TMT:FEATUREMAPPER:IDMAPPER                                      -
[-        ] process > NFCORE_QUANTMS:QUANTMS:TMT:FILEMERGE                                                   -
[-        ] process > NFCORE_QUANTMS:QUANTMS:TMT:PROTEININFERENCE:PROTEININFERENCER                          -
[-        ] process > NFCORE_QUANTMS:QUANTMS:TMT:PROTEININFERENCE:IDFILTER                                   -
[-        ] process > NFCORE_QUANTMS:QUANTMS:TMT:PROTEINQUANT:IDCONFLICTRESOLVER                             -
[-        ] process > NFCORE_QUANTMS:QUANTMS:TMT:PROTEINQUANT:PROTEINQUANTIFIER                              -
[-        ] process > NFCORE_QUANTMS:QUANTMS:TMT:PROTEINQUANT:MSSTATSCONVERTER                               -
[-        ] process > NFCORE_QUANTMS:QUANTMS:TMT:MSSTATSTMT                                                  -
[-        ] process > NFCORE_QUANTMS:QUANTMS:LFQ:ID:DATABASESEARCHENGINES:SEARCHENGINECOMET                  -
[-        ] process > NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMRESCORING:EXTRACTPSMFEATURES                          -
[-        ] process > NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMRESCORING:PERCOLATOR                                  -
[-        ] process > NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMFDRCONTROL:IDSCORESWITCHER                            -
[-        ] process > NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMFDRCONTROL:IDFILTER                                   -
[-        ] process > NFCORE_QUANTMS:QUANTMS:LFQ:PROTEOMICSLFQ                                               -
[-        ] process > NFCORE_QUANTMS:QUANTMS:LFQ:MSSTATS                                                     -
[-        ] process > NFCORE_QUANTMS:QUANTMS:DIA:DIANNCFG                                                    -
[-        ] process > NFCORE_QUANTMS:QUANTMS:DIA:SILICOLIBRARYGENERATION                                     -
[-        ] process > NFCORE_QUANTMS:QUANTMS:DIA:DIANN_PRELIMINARY_ANALYSIS                                  -
[-        ] process > NFCORE_QUANTMS:QUANTMS:DIA:ASSEMBLE_EMPIRICAL_LIBRARY                                  -
[-        ] process > NFCORE_QUANTMS:QUANTMS:DIA:INDIVIDUAL_FINAL_ANALYSIS                                   -
[-        ] process > NFCORE_QUANTMS:QUANTMS:DIA:DIANNSUMMARY                                                -
[-        ] process > NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT                                                -
[-        ] process > NFCORE_QUANTMS:QUANTMS:DIA:MSSTATS                                                     -
[-        ] process > NFCORE_QUANTMS:QUANTMS:CUSTOM_DUMPSOFTWAREVERSIONS                                     -
[-        ] process > NFCORE_QUANTMS:QUANTMS:SUMMARYPIPELINE                                                 -
WARN: There's no process matching config selector: MULTIQC -- Did you mean: PMULTIQC?

Relevant files

JB230130_quantms_input.txt

System information

Nextflow version: 23.10.1.5891
Hardware: HPC
Executor: local
Container engine: Singularity, Conda
OS: Linux 4.18.0-513.5.1.el8_9.x86_64
Version of nf-core/quantms: 1.2.0

fix links to TOPP tools

Description of the bug

We removed the distinction between UTIL_ and TOPP_ tools in OpenMS 3.1.
This means we need to update the documentation links in quantms to reflect that change.
Replacing "UTIL_" with "TOPP_" should suffice.

Command used and terminal output

No response

Relevant files

No response

System information

No response

Do not use nextflow readLine() since it downloads files on cluster head nodes

Description of the bug

I'm getting an odd error when trying to run 40 samples through the pipeline on AWS Batch.

Everything proceeds normally until the MZMLINDEXING step, when the head node crashes with the java error "Failed to acquire stream chunk".

The log where the error happens:

| 2022-10-05T11:20:19.906-07:00 | [51/a3778a] Submitted process > NFCORE_QUANTMS:QUANTMS:FILE_PREPARATION:MZMLINDEXING (file_32)
  | 2022-10-05T11:20:23.240-07:00 | Failed to acquire stream chunk
  | 2022-10-05T11:20:23.240-07:00 | -- Check script '/root/.nextflow/assets/[users_name]/nf-quantms/./workflows/../subworkflows/local/file_preparation.nf' at line: 32 or see '.nextflow.log' file for more details
  | 2022-10-05T11:20:23.263-07:00 | -�[0;35m[nf-core/quantms]�[0;31m Pipeline completed with errors�[0m-
  | 2022-10-05T11:20:23.267-07:00 | WARN: Killing running tasks (39)
  | 2022-10-05T11:20:23.469-07:00CopyWARN: Unable to get file attributes file: s3://[users_bucket]/versions.yml -- Cause: com.amazonaws.SdkClientException: Failed to sanitize XML document destined for handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler | WARN: Unable to get file attributes file: s3://[users_bucket]/_nextflow/runs/39/df0b79873b070c19eddd53c33b8288/versions.yml -- Cause: com.amazonaws.SdkClientException: Failed to sanitize XML document destined for handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
  | 2022-10-05T11:20:27.115-07:00 | Failed to acquire stream chunk
  | 2022-10-05T11:20:39.624-07:00 | === Running Cleanup ===

The code referenced:

nonIndexedMzML: file(it[1]).withReader {
    f = it; 1.upto(5) {
        if (f.readLine().contains("indexedmzML")) return false;
    }
    return true;
}
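
The check itself just peeks at the first few lines for an indexedmzML tag; the same logic in Python, for illustration only (the fix suggested by the issue title is to run such a check inside a task instead of via readLine() in workflow code on the head node):

def is_indexed_mzml(path: str, max_lines: int = 5) -> bool:
    """Return True if an indexedmzML tag appears within the first few lines of the file."""
    with open(path, errors="ignore") as fh:
        for _ in range(max_lines):
            if "indexedmzML" in fh.readline():
                return True
    return False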

The head node seems to crash at a different file number if I change the amount of memory I assign to the head node. Is all the data passing through the head node somewhere? I've never had this problem with any of my NGS pipelines, which use more and larger files, so I am a little confused by this crash.

Command used and terminal output

No response

Relevant files

The log file where it crashes: (it is dated different, but this is the same error that always shows)

Oct-10 23:03:07.744 [Actor Thread 15] ERROR nextflow.extension.DataflowHelper - @unknown
java.io.IOException: Failed to acquire stream chunk
	at com.upplication.s3fs.ng.FutureInputStream.nextBuffer(FutureInputStream.java:78)
	at com.upplication.s3fs.ng.FutureInputStream.read(FutureInputStream.java:63)
	at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:270)
	at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:313)
	at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:188)
	at java.base/java.io.InputStreamReader.read(InputStreamReader.java:177)
	at java.base/java.io.BufferedReader.fill(BufferedReader.java:162)
	at java.base/java.io.BufferedReader.readLine(BufferedReader.java:329)
	at java.base/java.io.BufferedReader.readLine(BufferedReader.java:396)
	at java_io_BufferedReader$readLine.call(Unknown Source)
	at Script_d4bc0d6a$_runScript_closure1$_closure2$_closure5$_closure8$_closure9$_closure11.doCall(Script_d4bc0d6a:32)
	at jdk.internal.reflect.GeneratedMethodAccessor276.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:274)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
	at groovy.lang.Closure.call(Closure.java:412)
	at groovy.lang.Closure.call(Closure.java:428)
	at org.codehaus.groovy.runtime.DefaultGroovyMethods.upto(DefaultGroovyMethods.java:16406)
	at org.codehaus.groovy.runtime.dgm$875.doMethodInvoke(Unknown Source)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1268)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
	at groovy.runtime.metaclass.NumberDelegatingMetaClass.invokeMethod(NumberDelegatingMetaClass.java:60)
	at org.codehaus.groovy.runtime.callsite.PojoMetaClassSite.call(PojoMetaClassSite.java:44)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:148)
	at Script_d4bc0d6a$_runScript_closure1$_closure2$_closure5$_closure8$_closure9.doCall(Script_d4bc0d6a:31)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:274)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
	at groovy.lang.Closure.call(Closure.java:412)
	at groovy.lang.Closure.call(Closure.java:428)
	at org.codehaus.groovy.runtime.IOGroovyMethods.withReader(IOGroovyMethods.java:1160)
	at org.apache.groovy.nio.extensions.NioExtensions.withReader(NioExtensions.java:1434)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.codehaus.groovy.runtime.metaclass.ReflectionMetaMethod.invoke(ReflectionMetaMethod.java:54)
	at org.codehaus.groovy.runtime.metaclass.NewInstanceMetaMethod.invoke(NewInstanceMetaMethod.java:54)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1268)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
	at groovy.runtime.metaclass.NextflowDelegatingMetaClass.invokeMethod(NextflowDelegatingMetaClass.java:66)
	at org.codehaus.groovy.runtime.callsite.PojoMetaClassSite.call(PojoMetaClassSite.java:44)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
	at Script_d4bc0d6a$_runScript_closure1$_closure2$_closure5$_closure8.doCall(Script_d4bc0d6a:30)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:274)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
	at groovy.lang.Closure.call(Closure.java:412)
	at groovy.lang.Closure.call(Closure.java:428)
	at nextflow.extension.BranchOp.doNext(BranchOp.groovy:55)
	at jdk.internal.reflect.GeneratedMethodAccessor258.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1268)
	at groovy.lang.MetaClassImpl.invokeMethodClosure(MetaClassImpl.java:1048)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1142)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
	at groovy.lang.Closure.call(Closure.java:412)
	at groovy.lang.Closure.call(Closure.java:428)
	at groovy.lang.Closure$call.call(Unknown Source)
	at nextflow.extension.DataflowHelper$_subscribeImpl_closure2.doCall(DataflowHelper.groovy:285)
	at jdk.internal.reflect.GeneratedMethodAccessor202.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:274)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
	at groovy.lang.Closure.call(Closure.java:412)
	at groovyx.gpars.dataflow.operator.DataflowOperatorActor.startTask(DataflowOperatorActor.java:120)
	at groovyx.gpars.dataflow.operator.DataflowOperatorActor.onMessage(DataflowOperatorActor.java:108)
	at groovyx.gpars.actor.impl.SDAClosure$1.call(SDAClosure.java:43)
	at groovyx.gpars.actor.AbstractLoopingActor.runEnhancedWithoutRepliesOnMessages(AbstractLoopingActor.java:293)
	at groovyx.gpars.actor.AbstractLoopingActor.access$400(AbstractLoopingActor.java:30)
	at groovyx.gpars.actor.AbstractLoopingActor$1.handleMessage(AbstractLoopingActor.java:93)
	at groovyx.gpars.util.AsyncMessagingCore.run(AsyncMessagingCore.java:132)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Cannot reserve 10,485,760 bytes of direct buffer memory (allocated: 1070363393, limit: 1,073,741,824)
	at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
	at com.upplication.s3fs.ng.FutureInputStream.nextBuffer(FutureInputStream.java:75)
	... 94 common frames omitted
Caused by: java.lang.OutOfMemoryError: Cannot reserve 10485760 bytes of direct buffer memory (allocated: 1070363393, limit: 1073741824)
	at java.base/java.nio.Bits.reserveMemory(Bits.java:178)
	at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:121)
	at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:332)
	at com.upplication.s3fs.ng.ChunkBuffer.<init>(ChunkBuffer.java:41)
	at com.upplication.s3fs.ng.ChunkBufferFactory.create(ChunkBufferFactory.java:65)
	at com.upplication.s3fs.ng.S3ParallelDownload.doDownload(S3ParallelDownload.java:136)
	at com.upplication.s3fs.ng.S3ParallelDownload.lambda$safeDownload$1(S3ParallelDownload.java:127)
	at dev.failsafe.Functions.lambda$toCtxSupplier$11(Functions.java:236)
	at dev.failsafe.Functions.lambda$get$0(Functions.java:46)
	at dev.failsafe.internal.RetryPolicyExecutor.lambda$apply$0(RetryPolicyExecutor.java:75)
	at dev.failsafe.SyncExecutionImpl.executeSync(SyncExecutionImpl.java:176)
	at dev.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:437)
	at dev.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:115)
	at com.upplication.s3fs.ng.S3ParallelDownload.safeDownload(S3ParallelDownload.java:127)
	at com.upplication.s3fs.ng.FutureIterator.lambda$init$0(FutureIterator.java:59)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	... 3 common frames omitted
Oct-10 23:03:07.773 [Actor Thread 15] DEBUG nextflow.Session - Session aborted -- Cause: Failed to acquire stream chunk

System information

Nextflow version (eg. 22.04.5)
Hardware AWS
Executor awsbatch
Container engine: default
OS AWSLinux
Version of nf-core/quantms v1.1dev

Feature request: output search results after protein inference and filtering

Description of feature

In the DDA/label-free workflow (perhaps in others as well), it would be useful to get an output of the identification results at the end of the search/inference/filtering pipeline, but before quantification. This would help for QC purposes and give a fuller understanding of the data at the ID level.

Currently the latest "ID-only" output is in the "idfilter" directory, containing results after database search, rescoring, (consensus ID, if applicable) and filtering according to "psm_pep_fdr_cutoff" - but before protein inference and protein-level filtering ("protein_level_fdr_cutoff"). Later outputs (after ProteomicsLFQ) only contain results for quantified IDs - omitting those that were identified, but never quantified. (Is that correct?)

Some problem in learning quantms source code and OpenMS

Thank you to all the developers for their hard work! Recently, I have been learning the basics of OpenMS, using the dataset PXD015828 from PRIDE for practice. During this process, I referenced quantms (https://github.com/nf-core/quantms/tree/master) to build a set of scripts, and I encountered three issues during testing (OpenMS 3.1 installed from Bioconda).

1. Difference between the quantms pipeline figure and the code

The position of the ConsensusID step is inconsistent between the figure (https://github.com/nf-core/quantms/blob/master/docs/images/quantms_metro.png) and the source code: in the code it is placed after rescoring, while in the figure it is placed before rescoring. Does this inconsistency affect the results? Which approach is better? Shouldn't they be consistent?

2. Where is the best position for PeptideIndexer?

I faced an error in ProteomicsLFQ (also submitted to OpenMS) and found that the PeptideIndexer step is omitted in quantms; I want to put it back. I tried adding PeptideIndexer before ConsensusID - is that right?

3. IDScoreSwitcher error (also submitted to OpenMS)

I faced an error when using IDScoreSwitcher, so I just skipped it. Is it allowed to skip IDScoreSwitcher?
Error log:

IDScoreSwitcher -in work/20170510_Joshi_9119/consensusID.idXML -out work/20170510_Joshi_9119/idscoreswitcher.idXML -new_score_orientation higher_better -threads 48 -debug 0 
Error: Unexpected internal error (Meta value '' not found for peptide hit with sequence 'SETNDS(Phospho)S(Phospho)S(Phospho)GS(Phospho)QS(Phospho)HQDASAASSAPPR', charge 2, score 1.0)

I have also run into some other OpenMS errors, which I reported in the OpenMS repository. Thank you for your assistance; I hope to contribute to this project.

Real full size data sets for AWS full tests

Description of feature

Right now we are using our small test data, which is kind of enough to show the capabilities, but the results do not make much sense. It would be nice to e.g. see reasonable proteins in the MSstats volcano plots, or similar.
We could then write documentation on how to interpret the results, etc.

Add the Zenodo Id properly

Description of feature

I saw you added the Zenodo ID; however, it's better to use the 'global' one that points to the latest version (so you don't have to update it all the time):
"Cite all versions? You can cite all versions by using the DOI 10.5281/zenodo.7754148. This DOI represents all versions, and will always resolve to the latest one. Read more." (that's yours)
You also need to update the DOI in the manifest section of nextflow.config.

What is singularity_pull_docker_container?

And can we remove it?
It certainly should not be part of the modules since it does not seem to be nf-core specific.
If you want to keep it, it should go into a config file.

error in reference_channel

Description of feature

I get an error when setting reference_channel for my TMT run. I use TMT131N as my reference and run with --reference_channel 131N. It reports:

ERROR: Validation of pipeline parameters failed!

  • --reference_channel: expected type: Number, found: String (131N)

Usability tests/improvements

I think we have the basic features ready in this pipeline.

@ypriverol @timosachsenberg

Unfortunately most bug reports that we receive are from people struggling with SDRF or getting errors in the middle of the pipeline because of erroneous SDRFs.
I think we should significantly extend and improve both the checks and error messages in multiple steps, starting with SDRF parsing before focusing on even more complicated designs such as TMT with MBR. We can add erroneous SDRFs to the sdrf-pipelines repository and make sure that reasonable error messages are given. We can add several validation levels: for openms, for quantms, for quantms with MSstats (or downstream statistics), with each level requiring more information. We maybe should even invest in a basic UI component to more easily complete an SDRF in a way that is suitable for this pipeline. We can build on lesSDRF.

Other steps that would benefit from error handling are ID and filtering steps, where we could double-check that ID files are not empty.

Maintainers should also make sure to forward bad error handling to the upstream tools such as OpenMS.
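
As one concrete example of the early checks described above, a minimal sketch of a mandatory-column validation for an SDRF file (the required set below is an illustrative subset; sdrf-pipelines defines the authoritative list):

import csv

# illustrative subset of mandatory columns; sdrf-pipelines defines the real requirements
REQUIRED_COLUMNS = {"source name", "characteristics[biological replicate]", "comment[data file]"}

def validate_sdrf_columns(path: str) -> None:
    """Fail early with a readable message instead of erroring in the middle of the pipeline."""
    with open(path, newline="") as fh:
        header = {c.strip().lower() for c in next(csv.reader(fh, delimiter="\t"))}
    missing = REQUIRED_COLUMNS - header
    if missing:
        raise ValueError("mandatory SDRF columns missing: " + ", ".join(sorted(missing)))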

Singularity image not available

Description of the bug

Hi all,

As the title says, I was running the LFQ test pipeline on dev, and it failed due to a missing Singularity container which couldn't be downloaded: ghcr.io-openms-openms-executables-latest.img. Maybe relevant: the test pipeline ran successfully with `-r 1.0`.

Cheers.

Command used and terminal output

`nextflow run nf-core/quantms -r 1.0  -profile singularity,test_lfq --outdir test_pipeline`

Error executing process > 'NFCORE_QUANTMS:QUANTMS:LFQ:PROTEOMICSLFQ (BSA_design_urls_openms_design)'

Caused by:
  Failed to pull singularity image
  command: singularity pull  --name ftp.pride.ebi.ac.uk-pride-resources-tools-ghcr.io-openms-openms-executables-latest.img.img.pulling.1672748804878 https://ftp.pride.ebi.ac.uk/pride/resources/tools/ghcr.io-openms-openms-executables-latest.img > /dev/null
  status : 255
  message:
    INFO:    Downloading network image
    FATAL:   the requested image was not found

Relevant files

nextflow.log

System information

No response

`DIANNCONVERT` crashes when `dia_params` has empty parameters

Description of the bug

I'm having some trouble testing v1.1 with my data: I am getting a crash when it hits DIANNCONVERT.
Looking into it, diann_convert.py is passed --dia_params "20;ppm;10;ppm;Trypsin;;Oxidation (M)", so this block

if fix_mod != "null":
    fix_flag = 1
    for mod in fix_mod.split(","):
        mod_obj = mods_db.getModification(mod)
        mod_name = mod_obj.getId()
        mod_accession = mod_obj.getUniModAccession()
        site = mod_obj.getOrigin()
        fix_ptm.append(("[UNIMOD, " + mod_accession.upper() + ", " + mod_name + ", ]", site))
else:
    fix_flag = 0
    fix_ptm.append("[MS, MS:1002453, No fixed modifications searched, ]")

doesn't work: mods_db.getModification(mod) says there is no mod for "".

Diving into that, I found my seed.sdrf_config has blanks for FixedModifications. This is probably because my original input SDRF file did not have a column for it. Is that the expected result if I don't have any FixedModifications in my SDRF file, or should it put null?

One possible solution is just to change L215 to check for "null" or "" (see the sketch below), but I wonder if a better solution is to change it upstream where seed.sdrf_config is generated, as it probably isn't supposed to have blanks in it.
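
For illustration, the L215 guard could treat an empty string like "null" (a sketch of the suggested check only; the real fix may belong upstream):

# sketch: treat an empty string the same as "null" in the excerpt above
if fix_mod and fix_mod != "null":
    fix_flag = 1
    # ... existing per-modification lookup from the block above ...
else:
    fix_flag = 0
    fix_ptm.append("[MS, MS:1002453, No fixed modifications searched, ]")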

I can't share my original seed file, but the bug should be reproducible if you delete the comment[modification parameters] columns from nf-core/test-datasets/quantms/testdata-aws/dia_full/PXD004684.sdrf.tsv

Command used and terminal output

No response

Relevant files

No response

System information

Nextflow version (eg. 22.04.5)
Hardware AWS
Executor awsbatch
Container engine: default
OS AWSLinux
Version of nf-core/quantms v1.1dev
