
NanoRTax

Real-time analysis pipeline for nanopore 16S rRNA data.

Introduction

NanoRTax is a taxonomic and diversity analysis pipeline originally built for Nanopore 16S rRNA data with real-time analysis support in mind. It combines state-of-the-art classifiers such as Kraken2, Centrifuge and BLAST with downstream analysis steps to provide a framework for the analysis of in-progress sequencing runs. NanoRTax produces final output files in the same structure and format for every classifier, which enables more comprehensive tool/database comparison and better benchmarking. Additionally, NanoRTax includes a web application (./viz_webapp/) for visualizing complete or partial pipeline outputs.

The NanoRTax pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with conda environments and docker containers making installation trivial and results highly reproducible.

Quick Start

i. Install nextflow

ii. Install either Docker (for full pipeline reproducibility) or Conda

iii. Download the pipeline and example databases and test it on a minimal dataset with a single command

# Create the database directories
mkdir -p db db/taxdb
# Taxonomy data required by taxonkit
wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz && tar -xzvf taxdump.tar.gz -C db/
# BLAST database
wget https://ftp.ncbi.nlm.nih.gov/blast/db/16S_ribosomal_RNA.tar.gz && tar -xzvf 16S_ribosomal_RNA.tar.gz -C db
wget https://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz && tar -xzvf taxdb.tar.gz -C db/taxdb
# Kraken2 RDP database
wget ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/16S_RDP11.5_20200326.tgz && tar -xzvf 16S_RDP11.5_20200326.tgz -C db
# Centrifuge P_COMPRESSED database (more information: https://ccb.jhu.edu/software/centrifuge/manual.shtml#database-download-and-index-building)
wget https://genome-idx.s3.amazonaws.com/centrifuge/p_compressed_2018_4_15.tar.gz && tar -xzvf p_compressed_2018_4_15.tar.gz -C db
nextflow run main.nf -profile test,<docker/conda>

iv. Start running your own analysis!

We provide an example configuration profile with the default parameters for running the pipeline (conf/default.config); it is a good starting point for customizing your NanoRTax workflow. This configuration is loaded by adding "default" to the -profile list of the pipeline command.

a. Run classification on a single FASTQ file

nextflow run main.nf -profile <default,docker/conda> --reads '/seq_path/sample.fastq'

b. Run classification on an entire sequencing run directory. NanoRTax will detect the barcode directories and analyze all samples:

nextflow run main.nf -profile <default,docker/conda> --reads '/seq_path/fastq_pass/**/*.fastq'

c. Real-time mode.

nextflow run main.nf -profile <default,docker/conda> --reads_rt '/seq_path/fastq_pass/**/*.fastq'

Similar to the normal mode, but using --reads_rt for input. Partial results are stored in the output directory and are accessible as .csv files and web app visualization files. In this mode, the workflow runs indefinitely, so it must be stopped manually with Ctrl+C once all FASTQ files have been completely processed.

Note: This mode is intended to work with non-bulk FASTQ files (i.e. ~500 reads per file) in order to provide fluid real-time analysis of the generated reads. This can be configured in the MinKNOW sequencing software before starting the experiment.

v. Visualize partial/complete outputs using NanoRTax web application (./viz_webapp)

Before running the web application, make sure to have the necessary dependencies installed or use the provided viz_webapp/environment.yml file to build a conda environment (recommended):

conda env create -f environment.yml
conda activate nanortax_webapp

Start the web application server with the command below and access the interface with a web browser (http://127.0.0.1:8050/ by default).

cd viz_webapp && python dashboard.py

See usage docs for all of the available options when running the pipeline.

Documentation

The NanoRTax pipeline comes with documentation about the pipeline, found in the docs/ directory:

Running the pipeline
Output and how to interpret the results

Credits

Rodríguez-Pérez H, Ciuffreda L, Flores C. NanoRTax, a real-time pipeline for taxonomic and diversity analysis of nanopore 16S rRNA amplicon sequencing data. Comput Struct Biotechnol J. 2022;20:5350-5354. doi: https://doi.org/10.1016/j.csbj.2022.09.024

This work was supported by Instituto de Salud Carlos III [PI14/00844, PI17/00610, and FI18/00230] and co-financed by the European Regional Development Funds, “A way of making Europe” from the European Union; Ministerio de Ciencia e Innovación [RTC-2017–6471-1, AEI/FEDER, UE]; Cabildo Insular de Tenerife [CGIEU0000219140]; Fundación Canaria Instituto de Investigación Sanitaria de Canarias [PIFUN48/18]; and by the agreement with Instituto Tecnológico y de Energías Renovables (ITER) to strengthen scientific and technological education, training, research, development and innovation in Genomics, Personalized Medicine and Biotechnology [OA17/008].

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Issues

Error: Cannot cast array data from dtype('O') to dtype('int64') according to the rule 'safe'

Hello

This is my first time trying NanoRTax, and I am new to nanopore data analysis. I am getting an error that I don't see how to fix or work around. To move past the bottleneck created by the error, I can move the related directories out of the work folder. The error occurs in the kraken_push process and seems to be related to the tabulation of counts for the diversity calculations.

Any help is appreciated.

I'm not sure if this is related, but for the read_binning process I had to add the --drop flag. That error appeared to be caused by rows in kraken_report_full that had a count value but were otherwise empty: no kingdom, phylum, class, order, family, genus, or species information, unlike the rest of the rows. (A possible cleanup sketch is included after the error log below.)

Command error:
executor > local (12)
[d6/648673] process > QC (61) [ 14%] 61 of 434, cached: 59
[79/f5a961] process > qc_reporting (33) [ 55%] 33 of 60, cached: 31
[e3/41d78b] process > read_binning_kraken (60) [100%] 60 of 60, cached: 58
[b6/4d0d4a] process > agg_kraken (49) [ 84%] 49 of 58, cached: 46
[52/cba34b] process > kraken_push (24) [ 52%] 25 of 48, cached: 23, failed: 1
[4e/030042] process > agg_kraken_diversity (23) [100%] 23 of 23, cached: 22
[bc/b73cab] process > output_documentation [100%] 1 of 1, cached: 1 ✔
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/rtnanopipeline] Pipeline completed with errors-
WARN: There's no process matching config selector: get_software_versions
Error executing process > 'kraken_push (23)'

Caused by:
Process kraken_push (23) terminated with an error exit status (1)

Command executed [/athena/home/beatond/Tools/NanoRTax/templates/kraken_push.py]:

#!/usr/bin/env python3

import datetime
import re
import pandas as pd
import skbio

df = pd.read_csv("kraken_report_full.txt", delimiter=" ", names=['seq_id', 'tax_id', 'kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species'])
tax_table = df['class'].value_counts()
tax_table_class = pd.DataFrame(list(zip(tax_table.index.tolist(), tax_table.tolist())), columns =['tax_id', 'read_count'])
tax_table_class.to_csv("kraken_report_class.csv", header=1, columns=["tax_id", "read_count"], index=False)

tax_table = df['order'].value_counts()
tax_table_order = pd.DataFrame(list(zip(tax_table.index.tolist(), tax_table.tolist())), columns =['tax_id', 'read_count'])
tax_table_order.to_csv("kraken_report_order.csv", header=1, columns=["tax_id", "read_count"], index=False)

tax_table = df['family'].value_counts()
tax_table_family = pd.DataFrame(list(zip(tax_table.index.tolist(), tax_table.tolist())), columns =['tax_id', 'read_count'])
tax_table_family.to_csv("kraken_report_family.csv", header=1, columns=["tax_id", "read_count"], index=False)

tax_table = df['genus'].value_counts()
tax_table_genus = pd.DataFrame(list(zip(tax_table.index.tolist(), tax_table.tolist())), columns =['tax_id', 'read_count'])
tax_table_genus.to_csv("kraken_report_genus.csv", header=1, columns=["tax_id", "read_count"], index=False)

tax_table = df['species'].value_counts()
tax_table_species = pd.DataFrame(list(zip(tax_table.index.tolist(), tax_table.tolist())), columns =['tax_id', 'read_count'])
tax_table_species.to_csv("kraken_report_species.csv", header=1, columns=["tax_id", "read_count"], index=False)

shannon = skbio.diversity.alpha.shannon(tax_table_class['read_count'])
simpson = skbio.diversity.alpha.simpson(tax_table_class['read_count'])
file1 = open("kraken_diversity_class.csv","w")
file1.write(str(tax_table_class['read_count'].sum()) + "," + str(round(shannon,3)) + "," + str(round(simpson,3)) + "\n")
file1.close()

shannon = skbio.diversity.alpha.shannon(tax_table_order['read_count'])
simpson = skbio.diversity.alpha.simpson(tax_table_order['read_count'])
file1 = open("kraken_diversity_order.csv","w")
file1.write(str(tax_table_order['read_count'].sum()) + "," + str(round(shannon,3)) + "," + str(round(simpson,3)) + "\n")
file1.close()

shannon = skbio.diversity.alpha.shannon(tax_table_family['read_count'])
simpson = skbio.diversity.alpha.simpson(tax_table_family['read_count'])
file1 = open("kraken_diversity_family.csv","w")
file1.write(str(tax_table_family['read_count'].sum()) + "," + str(round(shannon,3)) + "," + str(round(simpson,3)) + "\n")
file1.close()

shannon = skbio.diversity.alpha.shannon(tax_table_genus['read_count'])
simpson = skbio.diversity.alpha.simpson(tax_table_genus['read_count'])
file1 = open("kraken_diversity_genus.csv","w")
file1.write(str(tax_table_genus['read_count'].sum()) + "," + str(round(shannon,3)) + "," + str(round(simpson,3)) + "\n")
file1.close()

shannon = skbio.diversity.alpha.shannon(tax_table_species['read_count'])
simpson = skbio.diversity.alpha.simpson(tax_table_species['read_count'])
file1 = open("kraken_diversity_species.csv","w")
file1.write(str(tax_table_species['read_count'].sum()) + "," + str(round(shannon,3)) + "," + str(round(simpson,3)) + "\n")
file1.close()

#diversity = pd.DataFrame(data={"Shannon index": [round(shannon,3)], "Simpson index": [round(simpson, 3)]})
#diversity.to_csv("kraken_diversity.csv", sep=',',index=False, header=0)

Command exit status:
1

Command output:
(empty)

Command error:
/athena/home/beatond/Data_analysis/Nanopore_data/crosswise/NanoRTax_test/work/conda/nanortax-8a6750e7c29e704fe98c2ea094c85a6b/lib/python3.6/site-packages/skbio/util/_testing.py:16: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
import pandas.util.testing as pdt
Traceback (most recent call last):
File ".command.sh", line 30, in
shannon = skbio.diversity.alpha.shannon(tax_table_class['read_count'])
File "/athena/home/beatond/Data_analysis/Nanopore_data/crosswise/NanoRTax_test/work/conda/nanortax-8a6750e7c29e704fe98c2ea094c85a6b/lib/python3.6/site-packages/skbio/diversity/alpha/_base.py", line 868, in shannon
counts = _validate_counts_vector(counts)
File "/athena/home/beatond/Data_analysis/Nanopore_data/crosswise/NanoRTax_test/work/conda/nanortax-8a6750e7c29e704fe98c2ea094c85a6b/lib/python3.6/site-packages/skbio/diversity/_util.py", line 26, in _validate_counts_vector
counts = counts.astype(int, casting='safe', copy=False)
TypeError: Cannot cast array data from dtype('O') to dtype('int64') according to the rule 'safe'

Work dir:
/athena/home/beatond/Data_analysis/Nanopore_data/crosswise/NanoRTax_test/work/71/d8b8dda7cc3212135eda44829a8a67

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out
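
A possible workaround, based on the behavior described above (rows with a count but no rank assignments), is to drop unassigned rows and force an integer dtype before calling the skbio functions. This is only a hedged sketch against the kraken_report_full.txt layout shown in the script, not the official NanoRTax fix; the same pattern would apply at the order, family, genus and species levels.

#!/usr/bin/env python3
# Hedged workaround sketch: clean the per-read report before computing
# alpha diversity, so skbio receives a plain integer counts vector.
import pandas as pd
import skbio

df = pd.read_csv("kraken_report_full.txt", delimiter=" ",
                 names=['seq_id', 'tax_id', 'kingdom', 'phylum', 'class',
                        'order', 'family', 'genus', 'species'])

# Ignore reads with no assignment at the chosen rank, then tabulate counts.
counts = df['class'].dropna().value_counts()
tax_table = counts.rename_axis('tax_id').reset_index(name='read_count')

# Force an integer dtype so the 'safe' cast inside skbio succeeds.
read_counts = pd.to_numeric(tax_table['read_count'], errors='coerce').dropna().astype(int)

shannon = skbio.diversity.alpha.shannon(read_counts)
simpson = skbio.diversity.alpha.simpson(read_counts)
print(len(read_counts), round(shannon, 3), round(simpson, 3))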

Problem running test

After installing NanoRTax according to the instructions, I fail to run the test with the test profile.
There seems to be something missing in main.nf:

Missing workflow definition - DSL2 requires at least a workflow block in the main script

Please find the verbose output below.

with kind regards,

Thierry

$ nextflow run main.nf -profile test,docker
N E X T F L O W ~ version 22.04.0
Launching main.nf [serene_davinci] DSL2 - revision: 5faf521cd0
WARN: Access to undefined parameter multiqc_config -- Initialise it to a default value eg. params.multiqc_config = some_value
WARN: Access to undefined parameter reads_rt -- Initialise it to a default value eg. params.reads_rt = some_value

                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
nf-core/rtnanopipeline v1.0dev

Run Name : test-run-32-noBLAST
Reads : /media/minion/Data/Data_analyses/NanoRtax/NanoRTax/test_data/minimock.fastq
Max Resources : 128 GB memory, 16 cpus, 10d time per job
Container : docker - hecrp/nanortax:latest
Output dir : ./results
Launch dir : /media/minion/Data/Data_analyses/NanoRtax/NanoRTax
Working dir : /media/minion/Data/Data_analyses/NanoRtax/NanoRTax/work
Script dir : /media/minion/Data/Data_analyses/NanoRtax/NanoRTax
User : minion
Config Profile : test,docker

WARN: Access to undefined parameter hostnames -- Initialise it to a default value eg. params.hostnames = some_value
Missing workflow definition - DSL2 requires at least a workflow block in the main script

Issue with default config file

Error:
N E X T F L O W ~ version 22.10.6
Unknown configuration profile: 'default'

while running the command:
nextflow run main.nf -profile default,conda --reads '/home/devuser/omega_idold/NanoRTax/test_data/barcode_82.fastq'

Kindly let us know how to rectify this problem.

Errors while trying to run the webapp

Hi, I'm trying to set up the viz web application provided with this pipeline for visualizing the output. I have successfully created a new Conda environment based on the provided 'environment.yml' file, but when I run
$ python dashboard.py I get this error:

Traceback (most recent call last):
  File "dashboard.py", line 2, in <module>
    import dash
ModuleNotFoundError: No module named 'dash'

This is strange, because the 'environment.yml' file looks like this:

name: nanortax_webapp
channels:
  - conda-forge
  - defaults
  - anaconda
dependencies:
  - conda-forge::python=3.6.13
  - conda-forge::typing_extensions=3.10.0.0
  - conda-forge::dash=1.19.0
  - conda-forge::dash-core-components=1.15.0
  - conda-forge::dash-html-components=1.1.2
  - conda-forge::dash-bootstrap-components=0.12.0
  - conda-forge::dash-table=4.11.2
  - anaconda::scikit-bio=0.5.4
  - plotly::plotly=4.14.3
  - numpy=1.19.5
  - pandas=0.22.0

The dash module should thus be in the Conda environment, but the script can't seem to use it. I installed dash manually with
$ sudo pip install dash==1.19.0 and tried to run the 'dashboard.py' script again. I now got this error message:

Traceback (most recent call last):
  File "dashboard.py", line 6, in <module>
    import dash
  File "/usr/local/lib/python3.8/dist-packages/dash/__init__.py", line 5, in <module>
    from .dash import Dash, no_update  # noqa: F401,E402
  File "/usr/local/lib/python3.8/dist-packages/dash/dash.py", line 22, in <module>
    from werkzeug.debug.tbtools import get_current_traceback
ImportError: cannot import name 'get_current_traceback' from 'werkzeug.debug.tbtools' (/usr/local/lib/python3.8/dist-packages/werkzeug/debug/tbtools.py)

This indicates that another package that should already be in the Conda environment can't be used.

I am not sure whether I should manually install all the missing packages and modules that produce errors, or if there is another solution to this problem. I would gladly hear from anybody who has experienced the same or a similar issue, or has some tips for me.
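
One hint in the second traceback is the path /usr/local/lib/python3.8/dist-packages: it points to the system interpreter rather than the environment's pinned Python 3.6, so the conda-installed dash may simply not be visible to the interpreter that is actually running the script. A minimal, hypothetical check (not part of NanoRTax) to confirm which interpreter and dash build are active from inside the activated environment:

# Hypothetical diagnostic, run inside the activated nanortax_webapp environment.
import sys

print("interpreter:", sys.executable)      # should point into the conda env
print("python:", sys.version.split()[0])   # expected 3.6.x per environment.yml

import dash
print("dash:", dash.__version__, "from", dash.__file__)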

kraken2: database ("/tmp/db/krakendb/16S_RDP_k2db/") does not contain necessary file taxo.k2d

Hi there!

I am trying to run NanoRTax using conda (my computational allocation is incompatible with docker and singularity). I used the '-profile conda' flag and it didn't work. It gave me this error:

(nanortax) [dorojas@dribe-06 2-nanortax]$ nextflow run main.nf --reads 'data/fetuccini1.fastq' -profile conda --outdir prueba/
N E X T F L O W  ~  version 22.10.6
Launching `main.nf` [elated_watson] DSL1 - revision: 5faf521cd0
WARN: Access to undefined parameter `multiqc_config` -- Initialise it to a default value eg. `params.multiqc_config = some_value`
WARN: Access to undefined parameter `reads_rt` -- Initialise it to a default value eg. `params.reads_rt = some_value`
WARN: Access to undefined parameter `kraken` -- Initialise it to a default value eg. `params.kraken = some_value`
WARN: Access to undefined parameter `centrifuge` -- Initialise it to a default value eg. `params.centrifuge = some_value`
WARN: Access to undefined parameter `blast` -- Initialise it to a default value eg. `params.blast = some_value`
----------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/rtnanopipeline v1.0dev
----------------------------------------------------

Run Name          : elated_watson
Reads             : data/fetuccini1.fastq
Max Resources     : 128 GB memory, 16 cpus, 10d time per job
Output dir        : prueba/
Launch dir        : /work/dorojas/6-semen/2-nanortax
Working dir       : /work/dorojas/6-semen/2-nanortax/work
Script dir        : /work/dorojas/6-semen/2-nanortax
User              : dorojas
Config Profile    : conda
----------------------------------------------------
WARN: Access to undefined parameter `hostnames` -- Initialise it to a default value eg. `params.hostnames = some_value`
executor >  local (2)
[0c/5c9c55] process > QC (1)               [  0%] 0 of 1
[-        ] process > qc_reporting         -
[-        ] process > read_binning_kraken  -
[-        ] process > agg_kraken           -
[-        ] process > kraken_push          -
[-        ] process > agg_kraken_diversity -
[32/fa2163] process > output_documentation [  0%] 0 of 1
WARN: There's no process matching config selector: get_software_versions
Error executing process > 'QC (1)'

Caused by:
  Process `QC (1)` terminated with an error exit status (127)
executor >  local (2)
[0c/5c9c55] process > QC (1)               [100%] 1 of 1, failed: 1 ✘
[-        ] process > qc_reporting         -
[-        ] process > read_binning_kraken  -
[-        ] process > agg_kraken           -
[-        ] process > kraken_push          -
[-        ] process > agg_kraken_diversity -
[32/fa2163] process > output_documentation [  0%] 0 of 1
Execution cancelled -- Finishing pending tasks before exit
WARN: There's no process matching config selector: get_software_versions
Error executing process > 'QC (1)'

Caused by:
  Process `QC (1)` terminated with an error exit status (127)
executor >  local (2)
[0c/5c9c55] process > QC (1)               [100%] 1 of 1, failed: 1 ✘
[-        ] process > qc_reporting         -
[-        ] process > read_binning_kraken  -
[-        ] process > agg_kraken           -
[-        ] process > kraken_push          -
[-        ] process > agg_kraken_diversity -
[32/fa2163] process > output_documentation [100%] 1 of 1, failed: 1 ✘
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/rtnanopipeline] Pipeline completed with errors-
WARN: There's no process matching config selector: get_software_versions
Error executing process > 'QC (1)'

Caused by:
  Process `QC (1)` terminated with an error exit status (127)

Command executed:

  barcode=$(basename $(dirname /work/dorojas/6-semen/2-nanortax/data/fetuccini1.fastq))
  fastp -i /work/dorojas/6-semen/2-nanortax/data/fetuccini1.fastq -q 8 -l 1400 --length_limit 1700 -o $barcode\_qced_reads.fastq --json $barcode\_qc_report.txt
  head -n30 $barcode\_qc_report.txt | sed '30s/,/\n}/' > $barcode\_qc_report.json
  echo "}" >> $barcode\_qc_report.json

Command exit status:
  127

Command output:
  (empty)

Command error:
  .command.sh: line 3: fastp: command not found

Work dir:
  /work/dorojas/6-semen/2-nanortax/work/0c/5c9c55cd01df5977db1e05b85fa605

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

It seemed to me that the conda environment didn't include 'fastp', but it is specified in the .yml file.

I couldn't find a solution for this, so I decided to create my own conda environment with the .yml file using the command

conda create -n nanortax --file=environment.yml

I excluded the '-profile' flag so that the code would call each of the tools from the working directory (which worked correctly). I know this is not recommended; it was my last resort to try to run this workflow.

The solution worked, but it is outputting this new error about the kraken db:

(nanortax) [dorojas@dribe-06 2-nanortax]$ nextflow run main.nf --reads 'data/fetuccini1.fastq' --outdir prueba/
N E X T F L O W  ~  version 22.10.6
Launching `main.nf` [hungry_bartik] DSL1 - revision: 5faf521cd0
WARN: Access to undefined parameter `multiqc_config` -- Initialise it to a default value eg. `params.multiqc_config = some_value`
WARN: Access to undefined parameter `reads_rt` -- Initialise it to a default value eg. `params.reads_rt = some_value`
WARN: Access to undefined parameter `kraken` -- Initialise it to a default value eg. `params.kraken = some_value`
WARN: Access to undefined parameter `centrifuge` -- Initialise it to a default value eg. `params.centrifuge = some_value`
WARN: Access to undefined parameter `blast` -- Initialise it to a default value eg. `params.blast = some_value`
----------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/rtnanopipeline v1.0dev
----------------------------------------------------

Run Name          : hungry_bartik
Reads             : data/fetuccini1.fastq
Max Resources     : 128 GB memory, 16 cpus, 10d time per job
Output dir        : prueba/
Launch dir        : /work/dorojas/6-semen/2-nanortax
Working dir       : /work/dorojas/6-semen/2-nanortax/work
Script dir        : /work/dorojas/6-semen/2-nanortax
User              : dorojas
Config Profile    : standard
----------------------------------------------------
WARN: Access to undefined parameter `hostnames` -- Initialise it to a default value eg. `params.hostnames = some_value`
executor >  local (4)
[b3/043f94] process > QC (1)                  [100%] 1 of 1 ✔
[c2/256413] process > qc_reporting (1)        [  0%] 0 of 1
[8b/337657] process > read_binning_kraken (1) [  0%] 0 of 1
[-        ] process > agg_kraken              -
[-        ] process > kraken_push             -
[-        ] process > agg_kraken_diversity    -
[d9/08c47e] process > output_documentation    [100%] 1 of 1 ✔
Error executing process > 'read_binning_kraken (1)'

Caused by:
  Process `read_binning_kraken (1)` terminated with an error exit status (2)

Command executed:

  sed '/^@/s/. ./_/g' data_qced_reads.fastq > krkinput.fastq
  kraken2 --db /tmp/db/krakendb/16S_RDP_k2db/ --use-names --threads 1 krkinput.fastq > krakenreport.txt
  echo "seq_id" > seq_ids.txt
executor >  local (4)
[b3/043f94] process > QC (1)                  [100%] 1 of 1 ✔
[c2/256413] process > qc_reporting (1)        [  0%] 0 of 1
[8b/337657] process > read_binning_kraken (1) [100%] 1 of 1, failed: 1 ✘
[-        ] process > agg_kraken              -
[-        ] process > kraken_push             -
[-        ] process > agg_kraken_diversity    -
[d9/08c47e] process > output_documentation    [100%] 1 of 1 ✔
Execution cancelled -- Finishing pending tasks before exit
Error executing process > 'read_binning_kraken (1)'

Caused by:
  Process `read_binning_kraken (1)` terminated with an error exit status (2)

Command executed:

  sed '/^@/s/. ./_/g' data_qced_reads.fastq > krkinput.fastq
  kraken2 --db /tmp/db/krakendb/16S_RDP_k2db/ --use-names --threads 1 krkinput.fastq > krakenreport.txt
  echo "seq_id" > seq_ids.txt
executor >  local (4)
[b3/043f94] process > QC (1)                  [100%] 1 of 1 ✔
[c2/256413] process > qc_reporting (1)        [100%] 1 of 1 ✔
[8b/337657] process > read_binning_kraken (1) [100%] 1 of 1, failed: 1 ✘
[-        ] process > agg_kraken              -
[-        ] process > kraken_push             -
[-        ] process > agg_kraken_diversity    -
[d9/08c47e] process > output_documentation    [100%] 1 of 1 ✔
Execution cancelled -- Finishing pending tasks before exit
Error executing process > 'read_binning_kraken (1)'

Caused by:
  Process `read_binning_kraken (1)` terminated with an error exit status (2)

Command executed:

  sed '/^@/s/. ./_/g' data_qced_reads.fastq > krkinput.fastq
  kraken2 --db /tmp/db/krakendb/16S_RDP_k2db/ --use-names --threads 1 krkinput.fastq > krakenreport.txt
  echo "seq_id" > seq_ids.txt
executor >  local (4)
[b3/043f94] process > QC (1)                  [100%] 1 of 1 ✔
[c2/256413] process > qc_reporting (1)        [100%] 1 of 1 ✔
[8b/337657] process > read_binning_kraken (1) [100%] 1 of 1, failed: 1 ✘
[-        ] process > agg_kraken              -
[-        ] process > kraken_push             -
[-        ] process > agg_kraken_diversity    -
[d9/08c47e] process > output_documentation    [100%] 1 of 1 ✔
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/rtnanopipeline] Pipeline completed with errors-
Error executing process > 'read_binning_kraken (1)'

Caused by:
  Process `read_binning_kraken (1)` terminated with an error exit status (2)

Command executed:

  sed '/^@/s/. ./_/g' data_qced_reads.fastq > krkinput.fastq
  kraken2 --db /tmp/db/krakendb/16S_RDP_k2db/ --use-names --threads 1 krkinput.fastq > krakenreport.txt
  echo "seq_id" > seq_ids.txt
  awk -F "\t" '{print $2}' krakenreport.txt >> seq_ids.txt
  gawk -F "\t" 'match($0, /\(taxid ([0-9]+)\)/, ary) {print ary[1]}' krakenreport.txt | taxonkit lineage --data-dir /tmp/db/ > lineage.txt
  cat lineage.txt | taxonkit reformat  --data-dir /tmp/db/ | csvtk -H -t cut -f 1,3 | csvtk -H -t sep -f 2 -s ';' -R > seq_tax.txt
  cat lineage.txt | taxonkit reformat -P  --data-dir /tmp/db/ | csvtk -H -t cut -f 1,3 > seq_tax_otu.txt
  paste seq_ids.txt seq_tax.txt > kraken_report_annotated.txt
  paste seq_ids.txt seq_tax_otu.txt > kraken_report_annotated_otu.txt

Command exit status:
  2

Command output:
  (empty)

Command error:
  kraken2: database ("/tmp/db/krakendb/16S_RDP_k2db/") does not contain necessary file taxo.k2d

Work dir:
  /work/dorojas/6-semen/2-nanortax/work/8b/337657f6151ad5f128d8b542457769

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

My database directories are not the same, so I changed the paths in the nextflow.config file. The modifications and the directory tree are below:

# config file modifications:
  taxonkit_db = "db/"
  blast_db = "db/blastdb/"
  blast_taxdb = "db/blastdb/"
  kraken_db = "db/krakendb/16S_RDP_k2db/"
  centrifuge_db = "db/centrifugedb/"

# my db directory
(nanortax) [dorojas@login-1 2-nanortax]$ tree db/
db/
├── blastdb
│   ├── 16S_ribosomal_RNA.ndb
│   ├── 16S_ribosomal_RNA.nhr
│   ├── 16S_ribosomal_RNA.nin
│   ├── 16S_ribosomal_RNA.nnd
│   ├── 16S_ribosomal_RNA.nni
│   ├── 16S_ribosomal_RNA.nog
│   ├── 16S_ribosomal_RNA.nos
│   ├── 16S_ribosomal_RNA.not
│   ├── 16S_ribosomal_RNA.nsq
│   ├── 16S_ribosomal_RNA.ntf
│   ├── 16S_ribosomal_RNA.nto
│   ├── 16S_ribosomal_RNA.tar.gz
│   ├── taxdb.btd
│   ├── taxdb.bti
│   ├── taxdb.tar.gz
│   └── taxonomy4blast.sqlite3
├── centrifugedb
│   ├── p_compressed.1.cf
│   ├── p_compressed_2018_4_15.tar.gz
│   ├── p_compressed.2.cf
│   ├── p_compressed.3.cf
│   └── p_compressed.4.cf
├── citations.dmp
├── delnodes.dmp
├── division.dmp
├── gc.prt
├── gencode.dmp
├── images.dmp
├── krakendb
│   ├── 16S_RDP11.5_20200326.tgz
│   └── 16S_RDP_k2db
│       ├── 16S_RDP11.5_20200326.tgz
│       ├── database100mers.kmer_distrib
│       ├── database150mers.kmer_distrib
│       ├── database200mers.kmer_distrib
│       ├── database250mers.kmer_distrib
│       ├── database50mers.kmer_distrib
│       ├── database75mers.kmer_distrib
│       ├── hash.k2d
│       ├── opts.k2d
│       ├── README.md
│       ├── seqid2taxid.map
│       └── taxo.k2d
├── merged.dmp
├── names.dmp
├── nodes.dmp
├── readme.txt
└── taxdump.tar.gz

4 directories, 45 files

The error says that 'taxo.k2d' is not present in the temporary directory, but it is present in my database path.

Does anybody have an idea of how to solve this issue? Or the first issue about the '-profile conda' command?

I am new to bioinformatics, so help is very much appreciated!
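
For what it's worth, the failing command points at /tmp/db/krakendb/16S_RDP_k2db/ while the edited config uses relative db/ paths, so the run may be resolving the database location differently than expected. A small, hypothetical sanity check (not part of the pipeline) for the Kraken2 index files in the directory you believe is being used:

# Hypothetical sanity check: confirm the configured Kraken2 database directory
# contains the index files Kraken2 needs (hash.k2d, opts.k2d, taxo.k2d).
from pathlib import Path

kraken_db = Path("db/krakendb/16S_RDP_k2db")  # assumed path from nextflow.config
for fname in ("hash.k2d", "opts.k2d", "taxo.k2d"):
    target = kraken_db / fname
    print(f"{target}: {'present' if target.exists() else 'MISSING'}")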

Problem running test

Hi everyone! I have problems running the test, specifically with Kraken I think; this is the report:

N E X T F L O W ~ version 22.10.2
Launching main.nf [disturbed_heisenberg] DSL1 - revision: 3ef505f614
WARN: Access to undefined parameter multiqc_config -- Initialise it to a default value eg. params.multiqc_config = some_value
WARN: Access to undefined parameter reads_rt -- Initialise it to a default value eg. params.reads_rt = some_value

                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
nf-core/rtnanopipeline v1.0dev

Run Name : test-run-32-noBLAST
Reads : /Users/vicentearriagada/nanotar/NanoRTax/test_data/minimock.fastq
Max Resources : 128 GB memory, 16 cpus, 10d time per job
Container : docker - hecrp/nanortax:latest
Output dir : ./results
Launch dir : /Users/vicentearriagada/nanotar/NanoRTax
Working dir : /Users/vicentearriagada/nanotar/NanoRTax/work
Script dir : /Users/vicentearriagada/nanotar/NanoRTax
User : vicentearriagada
Config Profile : test,docker

WARN: Access to undefined parameter hostnames -- Initialise it to a default value eg. params.hostnames = some_value
executor > local (4)
executor > local (4)
executor > local (4)
[8f/c5f421] process > QC (1) [100%] 1 of 1 ✔
[- ] process > qc_reporting [ 0%] 0 of 1
[a9/dcf93b] process > read_binning_cntrf (1) [100%] 1 of 1, failed: 1 ✘
[- ] process > agg_centrifuge -
[- ] process > cntrf_push -
[- ] process > agg_cntrf_diversity -
[ee/4cdfe6] process > read_binning_kraken (1) [100%] 1 of 1, failed: 1 ✘
[- ] process > agg_kraken -
[- ] process > kraken_push -
[- ] process > agg_kraken_diversity -
[- ] process > read_binning_blast [ 0%] 0 of 1
[- ] process > agg_blast -
[- ] process > blast_push -
[- ] process > agg_blast_diversity -
[6a/90f7e1] process > output_documentation [100%] 1 of 1 ✔
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/rtnanopipeline] Pipeline completed with errors-
WARN: There's no process matching config selector: get_software_versions
Error executing process > 'read_binning_kraken (1)'

Caused by:
Process read_binning_kraken (1) terminated with an error exit status (2)

Command executed:

sed '/^@/s/. ./_/g' test_data_qced_reads.fastq > krkinput.fastq
kraken2 --db /tmp/db/16S_RDP_k2db --use-names --threads 4 krkinput.fastq > krakenreport.txt
echo "seq_id" > seq_ids.txt
awk -F "\t" '{print $2}' krakenreport.txt >> seq_ids.txt
gawk -F "\t" 'match($0, /\(taxid ([0-9]+)\)/, ary) {print ary[1]}' krakenreport.txt | taxonkit lineage > lineage.txt
cat lineage.txt | taxonkit reformat | csvtk -H -t cut -f 1,3 | csvtk -H -t sep -f 2 -s ';' -R > seq_tax.txt
cat lineage.txt | taxonkit reformat -P | csvtk -H -t cut -f 1,3 > seq_tax_otu.txt
paste seq_ids.txt seq_tax.txt > kraken_report_annotated.txt
paste seq_ids.txt seq_tax_otu.txt > kraken_report_annotated_otu.txt

Command exit status:
2

Command output:
(empty)

Command error:
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
kraken2: database ("/tmp/db/16S_RDP_k2db") does not contain necessary file taxo.k2d

Work dir:
/Users/vicentearriagada/nanotar/NanoRTax/work/ee/4cdfe6e558e047b71f91f85d3ecbf7

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

please help!

Have a nice day

Unable to access the webapp

Hey, I can't access the webapp.
It says Dash is running on http://127.0.0.1:8050/, but the browser refuses to connect.


(nanortax_webapp) -bash-4.2$ python dashboard.py
Dash is running on http://127.0.0.1:8050/

 * Serving Flask app 'dashboard' (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: on

Help is really appreciated!
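
The prompt in the log suggests the app is running on a remote shell, and by default the Dash development server only listens on 127.0.0.1, so a browser on another machine cannot reach it. Two hedged options: tunnel the port over SSH (e.g. ssh -L 8050:127.0.0.1:8050 user@server and then browse to http://127.0.0.1:8050/ locally), or bind the server to all interfaces. A minimal sketch of the latter, assuming dashboard.py starts the server with run_server (the actual file may differ):

# Minimal Dash 1.x app bound to all interfaces; dashboard.py itself may differ.
import dash
import dash_html_components as html

app = dash.Dash(__name__)
app.layout = html.Div("placeholder layout")

if __name__ == "__main__":
    # host="0.0.0.0" makes the dev server reachable from other machines.
    app.run_server(host="0.0.0.0", port=8050, debug=True)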

csvtk chokes when first line of the (collapsed) lineage has no OTU(s) [FIX INSIDE]

Thank you for providing this utility. As the title suggests, when the first line of lineage.txt has no lineages, csvtk assumes the rest of the file has only 2 columns (see error below), and subsequently cannot parse.

~/temp$cat not_ok.csv | head -n 5
1105
2772    Eukaryota;Rhodophyta;Florideophyceae;Rhodymeniales;Champiaceae;Gastroclonium;
2773    Eukaryota;Rhodophyta;Florideophyceae;Rhodymeniales;Champiaceae;Gastroclonium;Gastroclonium coulteri
2772    Eukaryota;Rhodophyta;Florideophyceae;Rhodymeniales;Champiaceae;Gastroclonium;
2773    Eukaryota;Rhodophyta;Florideophyceae;Rhodymeniales;Champiaceae;Gastroclonium;Gastroclonium coulteri

Running csvtk leads to the error:

~/temp$cat not_ok.csv | head -n 5 | csvtk -H -t sep -f 2 -s ';' -R   
[ERRO] [line 2] number of new columns (7) exceeds that of first row (1), please increase -N (--num-cols) or drop extra data using --drop

FIX:
Several of the nextflow functions in main.nf have the offending line:
cat lineage.txt | taxonkit reformat --data-dir $taxondb | csvtk -H -t cut -f 1,3 | csvtk -H -t sep -f 2 -s ';' -R > seq_tax.txt
This should be replaced with:
cat lineage.txt | taxonkit reformat --data-dir $taxondb | csvtk -H -t cut -f 1,3 | csvtk -N 10 -H -t sep -f 2 -s ';' -R > seq_tax.txt

Error executing QC

Hi! I have this error where the .fastq files are not recognized:

Execution cancelled -- Finishing pending tasks before exit
-[nf-core/rtnanopipeline] Pipeline completed with errors-
WARN: There's no process matching config selector: get_software_versions
Error executing process > 'QC (5)'

Caused by:
Process QC (5) terminated with an error exit status (255)

Command executed:

barcode=$(basename $(dirname /Users/vicentearriagada/centrigufe/porechop_tim/BC15.fastq))
fastp -i /Users/vicentearriagada/centrigufe/porechop_tim/BC15.fastq -q 8 -l 1000 --length_limit 2000 -o $barcode\_qced_reads.fastq --json $barcode\_qc_report.txt
head -n30 $barcode\_qc_report.txt | sed '30s/,/\n}/' > $barcode\_qc_report.json
echo "}" >> $barcode\_qc_report.json

Command exit status:
255

Command output:
(empty)

Command error:
ERROR: Failed to open file: /Users/vicentearriagada/centrigufe/porechop_tim/BC15.fastq

Work dir:
/Users/vicentearriagada/nanotar/NanoRTax/work/20/b04195af89e9561ed0edcd321e4489

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

Using this pipeline:

nextflow run main.nf -profile docker --reads '/Users/vicentearriagada/centrigufe/porechop_tim/*.fastq' --outdir '/Users/vicentearriagada/centrigufe/results_seg_bar' --blast --centrifuge --kraken -name bar

Any idea what the problem could be?

Error executing process > 'QC (1)' (docker)

Hello, I'm having trouble running this pipeline with a Docker profile. Although I receive no errors when running the provided test, errors occur when I use my own reads (sequenced sputum) with the Docker profile.

This issue is the same as #2 and may be due to a faulty creation of the Conda environment. I'm opening a new issue on this error since the previous one was not responded to.

This is the error message I get:

Error executing process > 'QC (1)'

Caused by:
  Process `QC (1)` terminated with an error exit status (255)

Command executed:

  barcode=$(basename $(dirname /Volumes/NanoporeM1C/Nanopore_sequencer_data/Cobalt/sediment_16S/20220613_1441_MC-112933_FAS49702_5a81a876/fastq_pass/barcode09/consolidated09.fastq))
  fastp -i /Volumes/NanoporeM1C/Nanopore_sequencer_data/Cobalt/sediment_16S/20220613_1441_MC-112933_FAS49702_5a81a876/fastq_pass/barcode09/consolidated09.fastq -q 8 -l 1400 --length_limit 1700 -o $barcode\_qced_reads.fastq --json $barcode\_qc_report.txt
  head -n30 $barcode\_qc_report.txt | sed '30s/,/\n}/' > $barcode\_qc_report.json
  echo "}" >> $barcode\_qc_report.json

Command exit status:
  255

Command output:
  (empty)

Command error:
  ERROR: Failed to open file: /home/brijvers/data/volume_2/subsets/bc09_10/bc09_10.fastq

I triple checked the path to the input reads and other tools can open the file and use it. Nextflow is installed (and works) in the Conda environment I'm working in, Docker is also properly installed.

In the work directory, the '.command.begin', '.command.out', '.command.trace' and '.exitcode' files are empty.

If anyone could provide me with some guidance on how to resolve this issue or suggest a starting point for fixing it, I would be grateful!

Suspect that the conda environment is not being created within the docker profile

Hello

I am attempting to run the pipeline on my Mac using the docker profile. The process fails with the warning:

WARN: There's no process matching config selector: fastqc

The environment.yml file provided does not include the fastqc package.

name: nanortax
channels:
  - conda-forge
  - bioconda
  - defaults
  - anaconda
dependencies:
  - conda-forge::python
  - conda-forge::markdown
  - conda-forge::pymdown-extensions
  - conda-forge::pygments
  - anaconda::scikit-bio
  - bioconda::csvtk
  - bioconda::taxonkit
  - fastp
  - kraken2
  - centrifuge
  - pandas
  - blast
  - gawk

daniellebeaton2@danielles-iMac sediment_16S % /Users/daniellebeaton2/nextflow/nextflow /Users/daniellebeaton2/git/NanoRTax/main.nf -qs 1 -profile docker --reads '/Volumes/NanoporeM1C/Nanopore_sequencer_data/Cobalt/sediment_16S/20220613_1441_MC-112933_FAS49702_5a81a876/fastq_pass/barcode09/consolidated09.fastq' --blast_db --blast_taxdb
N E X T F L O W  ~  version 21.10.6
Launching `/Users/daniellebeaton2/git/NanoRTax/main.nf` [ridiculous_stallman] - revision: 8698db181b
WARN: Access to undefined parameter `multiqc_config` -- Initialise it to a default value eg. `params.multiqc_config = some_value`
WARN: Access to undefined parameter `reads_rt` -- Initialise it to a default value eg. `params.reads_rt = some_value`
WARN: Access to undefined parameter `kraken` -- Initialise it to a default value eg. `params.kraken = some_value`
WARN: Access to undefined parameter `centrifuge` -- Initialise it to a default value eg. `params.centrifuge = some_value`
WARN: Access to undefined parameter `blast` -- Initialise it to a default value eg. `params.blast = some_value`
----------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/rtnanopipeline v1.0dev
----------------------------------------------------

Run Name          : ridiculous_stallman
Reads             : /Volumes/NanoporeM1C/Nanopore_sequencer_data/Cobalt/sediment_16S/20220613_1441_MC-112933_FAS49702_5a81a876/fastq_pass/barcode09/consolidated09.fastq
Max Resources     : 128 GB memory, 16 cpus, 10d time per job
Container         : docker - hecrp/nanortax:latest
Output dir        : ./results
Launch dir        : /Volumes/NanoporeM1C/Nanopore_sequencer_data/Cobalt/sediment_16S
Working dir       : /Volumes/NanoporeM1C/Nanopore_sequencer_data/Cobalt/sediment_16S/work
Script dir        : /Users/daniellebeaton2/git/NanoRTax
User              : daniellebeaton2
Config Profile    : docker
----------------------------------------------------
WARN: Access to undefined parameter `hostnames` -- Initialise it to a default value eg. `params.hostnames = some_value`
executor >  local (2)
[99/029dbb] process > QC (1)               [  0%] 0 of 1
[-        ] process > qc_reporting         -
[-        ] process > read_binning_blast   -
[-        ] process > agg_blast            -
executor >  local (2)
[99/029dbb] process > QC (1)               [100%] 1 of 1, failed: 1 ✘
[-        ] process > qc_reporting         -
[-        ] process > read_binning_blast   -
[-        ] process > agg_blast            -
[-        ] process > blast_push           -
[-        ] process > agg_blast_diversity  -
[ee/b1a023] process > output_documentation [100%] 1 of 1 ✔
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/rtnanopipeline] Pipeline completed with errors-
WARN: There's no process matching config selector: fastqc
Error executing process > 'QC (1)'

Caused by:
  Process `QC (1)` terminated with an error exit status (255)

Command executed:

  barcode=$(basename $(dirname /Volumes/NanoporeM1C/Nanopore_sequencer_data/Cobalt/sediment_16S/20220613_1441_MC-112933_FAS49702_5a81a876/fastq_pass/barcode09/consolidated09.fastq))
  fastp -i /Volumes/NanoporeM1C/Nanopore_sequencer_data/Cobalt/sediment_16S/20220613_1441_MC-112933_FAS49702_5a81a876/fastq_pass/barcode09/consolidated09.fastq -q 8 -l 1400 --length_limit 1700 -o $barcode\_qced_reads.fastq --json $barcode\_qc_report.txt
  head -n30 $barcode\_qc_report.txt | sed '30s/,/\n}/' > $barcode\_qc_report.json
  echo "}" >> $barcode\_qc_report.json

Command exit status:
  255

Command output:
  (empty)

Command error:
  ERROR: Failed to open file: /Volumes/NanoporeM1C/Nanopore_sequencer_data/Cobalt/sediment_16S/20220613_1441_MC-112933_FAS49702_5a81a876/fastq_pass/barcode09/consolidated09.fastq

Work dir:
  /Volumes/NanoporeM1C/Nanopore_sequencer_data/Cobalt/sediment_16S/work/99/029dbba425a071ed77419df3da574f

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

