genomicsiter / nanoclust Goto Github PK

View Code? Open in Web Editor NEW

93.0 5.0 44.0 6.86 MB

NanoCLUST is an analysis pipeline for UMAP-based classification of amplicon-based full-length 16S rRNA nanopore reads

License: MIT License

Dockerfile 5.80% HTML 2.87% Python 32.06% Nextflow 57.99% R 1.27%

nanoclust's Introduction

NanoCLUST

De novo clustering and consensus building for ONT 16S sequencing data.

Introduction

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with docker containers making installation trivial and results highly reproducible.

Quick Start

i. Install nextflow

ii. Install docker or conda

iii. Clone the NanoCLUST repository and test the pipeline on a minimal dataset with a single command and docker/conda profiles.

*Download a BLAST database in the NanoCLUST dir for cluster sequence classification. For NCBI 16S rRNA database:

mkdir db db/taxdb
wget https://ftp.ncbi.nlm.nih.gov/blast/db/16S_ribosomal_RNA.tar.gz && tar -xzvf 16S_ribosomal_RNA.tar.gz -C db
wget https://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz && tar -xzvf taxdb.tar.gz -C db/taxdb

#Using docker profile with container-based dependencies (recommended).
nextflow run main.nf -profile test,docker

iv. Start running your own analysis!

Run a single sample analysis inside NanoCLUST dir using default parameters:

nextflow run main.nf \ 
             -profile docker \ 
             --reads 'sample.fastq' \ 
             --db "db/16S_ribosomal_RNA" \ 
             --tax "db/taxdb/"

See usage and output sections in the documentation (/docs) for all of the available options when running the pipeline.

Computing requirements note

Clustering step uses up to 32-36GB RAM when working with a real dataset analysis and default parameters (umap_set_size = 100000). Setting umap_set_size to 50000, will diminish memory consumption to 10-13GB RAM. When running the pipeline, kmer_freqs or mostly read_clustering processes could be terminated with status 137 when not enough RAM.

Nextflow automatically uses all available resources in your machine. More cpu threads enable the pipeline to compute and classify the different clusters at the same time and hence reduces the overall execution time.

Using the -with-trace option, it is possible to get an execution trace file which includes computing times and memory consumption metrics for all pipeline processes.

*The execution of the test profile (minimum testing dataset and default parameters) can be done with a regular 4 cores and 16GB RAM machine.

Troubleshooting

Using conda profile, some issues can arise due to unknown problems with the read_clustering and kmer_freq conda environments. If it is the case, we recommend using the docker profile to ensure all dependencies run in the right environments and these are tested and available in the cloud (automatically downloaded when using docker profile).
In some machines, the read_clustering process exits with error status(RuntimeError: cannot cache function '...'). We have seen that this condition can be avoided running the pipeline with sudo privileges (even if Docker was previously available without sudo permissions).

Credits

Rodríguez-Pérez H, Ciuffreda L, Flores C. NanoCLUST: a species-level analysis of 16S rRNA nanopore sequencing data. Bioinformatics. 2021;37(11):1600-1601. doi:https://doi.org/10.1093/bioinformatics/btaa900

This work was supported by Instituto de Salud Carlos III [PI14/00844, PI17/00610, and FI18/00230] and co-financed by the European Regional Development Funds, “A way of making Europe” from the European Union; Ministerio de Ciencia e Innovación [RTC-2017-6471-1, AEI/FEDER, UE]; Cabildo Insular de Tenerife [CGIEU0000219140]; Fundación Canaria Instituto de Investigación Sanitaria de Canarias [PIFUN48/18]; and by the agreement with Instituto Tecnológico y de Energías Renovables (ITER) to strengthen scientific and technological education, training, research, development and innovation in Genomics, Personalized Medicine and Biotechnology [OA17/008].

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

nanoclust's People

Contributors

Stargazers

Watchers

Forkers

piroonj druvus vikash84 luciauy krobison13 thomieh73 foi-bioinformatics ivanbh214 jianshu93 deniribicic jmtsuji ximenesuk sheffield-bioinformatics-core zxgsy520 simaafshar vindeez serlandson norwegianveterinaryinstitute ssyamoako loganwisteard ctmlab4 danbeaton tetukas srisvs33 treetopunder nobelex iferres guanguiwensy imagoxv liu5796796 simonelupini currocam liuweisu hkunerth tankmermaid pcrxn nyamota jennyfothergill xiaolw95 env-gen iracooke danpal96 michaela-murillo

nanoclust's Issues

Error running test at read_clustering

$ nextflow run ~/SequencingData/NanoCLUST/main.nf -profile test,docker
N E X T F L O W  ~  version 20.04.1
Launching `/home/james/SequencingData/NanoCLUST/main.nf` [dreamy_fourier] - revision: 15218921b7
----------------------------------------------------
      _   __                     ________    __  _____________
     / | / /___ _____  ____     / ____/ /   / / / / ___/_  __/
    /  |/ / __ `/ __ \/ __ \   / /   / /   / / / /\__ \ / /
   / /|  / /_/ / / / / /_/ /  / /___/ /___/ /_/ /___/ // /
  /_/ |_/\__,_/_/ /_/\____/   \____/_____/\____//____//_/

  NanoCLUST v1.0dev
----------------------------------------------------
Run Name          : dreamy_fourier
Reads             : /home/james/SequencingData/NanoCLUST/test_datasets/mock4_run3bc08_5000.fastq
Max Resources     : 128 GB memory, 16 cpus, 10d time per job
Container         : docker - [:]
Output dir        : ./results
Launch dir        : /home/james/SequencingData/NanoCLUST/templates
Working dir       : /home/james/SequencingData/NanoCLUST/templates/work
Script dir        : /home/james/SequencingData/NanoCLUST
User              : james
Config Profile    : test,docker
Config Description: Minimal test dataset to check pipeline function
----------------------------------------------------
executor >  local (5)
[5b/f73956] process > QC (1)                   [100%] 1 of 1 ✔
[01/a019f4] process > fastqc (1)               [100%] 1 of 1 ✔
[81/e663de] process > kmer_freqs (1)           [100%] 1 of 1 ✔
[eb/f5325e] process > read_clustering (1)      [  0%] 0 of 1
[-        ] process > split_by_cluster         -
[-        ] process > read_correction          -
[-        ] process > draft_selection          -
[-        ] process > racon_pass               -
[-        ] process > medaka_pass              -
[-        ] process > consensus_classification -
[-        ] process > join_results             -
[-        ] process > get_abundances           -
[-        ] process > plot_abundances          -
[d2/3d9a68] process > output_documentation     [100%] 1 of 1 ✔
Error executing process > 'read_clustering (1)'

Caused by:
  Process `read_clustering (1)` terminated with an error exit status (1)

Command executed [/home/james/SequencingData/NanoCLUST/templates/umap_hdbscan.py]:

  #!/usr/bin/env python

executor >  local (5)
[5b/f73956] process > QC (1)                   [100%] 1 of 1 ✔
[01/a019f4] process > fastqc (1)               [100%] 1 of 1 ✔
[81/e663de] process > kmer_freqs (1)           [100%] 1 of 1 ✔
[eb/f5325e] process > read_clustering (1)      [100%] 1 of 1, failed: 1 ✘
[-        ] process > split_by_cluster         -
[-        ] process > read_correction          -
[-        ] process > draft_selection          -
[-        ] process > racon_pass               -
[-        ] process > medaka_pass              -
[-        ] process > consensus_classification -
[-        ] process > join_results             -
[-        ] process > get_abundances           -
[-        ] process > plot_abundances          -
[d2/3d9a68] process > output_documentation     [100%] 1 of 1 ✔
Error executing process > 'read_clustering (1)'

Caused by:
  Process `read_clustering (1)` terminated with an error exit status (1)

Command executed [/home/james/SequencingData/NanoCLUST/templates/umap_hdbscan.py]:

  #!/usr/bin/env python

executor >  local (5)
[5b/f73956] process > QC (1)                   [100%] 1 of 1 ✔
[01/a019f4] process > fastqc (1)               [100%] 1 of 1 ✔
[81/e663de] process > kmer_freqs (1)           [100%] 1 of 1 ✔
[eb/f5325e] process > read_clustering (1)      [100%] 1 of 1, failed: 1 ✘
[-        ] process > split_by_cluster         -
[-        ] process > read_correction          -
[-        ] process > draft_selection          -
[-        ] process > racon_pass               -
[-        ] process > medaka_pass              -
[-        ] process > consensus_classification -
[-        ] process > join_results             -
[-        ] process > get_abundances           -
[-        ] process > plot_abundances          -
[d2/3d9a68] process > output_documentation     [100%] 1 of 1 ✔
[0;35m[nf-core/nanoclust] Pipeline completed with errors
Error executing process > 'read_clustering (1)'

Caused by:
  Process `read_clustering (1)` terminated with an error exit status (1)

Command executed [/home/james/SequencingData/NanoCLUST/templates/umap_hdbscan.py]:

  #!/usr/bin/env python

executor >  local (5)
[5b/f73956] process > QC (1)                   [100%] 1 of 1 ✔
[01/a019f4] process > fastqc (1)               [100%] 1 of 1 ✔
[81/e663de] process > kmer_freqs (1)           [100%] 1 of 1 ✔
[eb/f5325e] process > read_clustering (1)      [100%] 1 of 1, failed: 1 ✘
[-        ] process > split_by_cluster         -
[-        ] process > read_correction          -
[-        ] process > draft_selection          -
[-        ] process > racon_pass               -
[-        ] process > medaka_pass              -
[-        ] process > consensus_classification -
[-        ] process > join_results             -
[-        ] process > get_abundances           -
[-        ] process > plot_abundances          -
[d2/3d9a68] process > output_documentation     [100%] 1 of 1 ✔
[0;35m[nf-core/nanoclust] Pipeline completed with errors
Error executing process > 'read_clustering (1)'

Caused by:
  Process `read_clustering (1)` terminated with an error exit status (1)

Command executed [/home/james/SequencingData/NanoCLUST/templates/umap_hdbscan.py]:

  #!/usr/bin/env python

  import numpy as np
  import umap
  import matplotlib.pyplot as plt
  from sklearn import decomposition
  import random
  import pandas as pd
  import hdbscan

  df = pd.read_csv("freqs.txt", delimiter="     ")

  #UMAP
  motifs = [x for x in df.columns.values if x not in ["read", "length"]]
  X = df.loc[:,motifs]
  X_embedded = umap.UMAP(n_neighbors=15, min_dist=0.1, verbose=2).fit_transform(X)

  df_umap = pd.DataFrame(X_embedded, columns=["D1", "D2"])
  umap_out = pd.concat([df["read"], df["length"], df_umap], axis=1)

  #HDBSCAN
  X = umap_out.loc[:,["D1", "D2"]]
  umap_out["bin_id"] = hdbscan.HDBSCAN(min_cluster_size=int(50), cluster_selection_epsilon=int(0.5)).fit_predict(X)

  #PLOT
  plt.figure(figsize=(20,20))
  plt.scatter(X_embedded[:, 0], X_embedded[:, 1], c=umap_out["bin_id"], cmap='Spectral', s=1)
  plt.xlabel("UMAP1", fontsize=18)
  plt.ylabel("UMAP2", fontsize=18)
  plt.gca().set_aspect('equal', 'datalim')
  plt.title("Projecting " + str(len(umap_out['bin_id'])) + " reads. " + str(len(umap_out['bin_id'].unique())) + " clusters generated by HDBSCAN", fontsize=18)

  for cluster in np.sort(umap_out['bin_id'].unique()):
      read = umap_out.loc[umap_out['bin_id'] == cluster].iloc[0]
      plt.annotate(str(cluster), (read['D1'], read['D2']), weight='bold', size=14)

  plt.savefig('hdbscan.output.png')
  umap_out.to_csv("hdbscan.output.tsv", sep="/py", index=False)

Command exit status:
  1

Command output:
  (empty)

Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
  Traceback (most recent call last):
    File ".command.sh", line 4, in <module>
      import umap
    File "/opt/conda/envs/read_clustering/lib/python3.8/site-packages/umap/__init__.py", line 1, in <module>
      from .umap_ import UMAP
    File "/opt/conda/envs/read_clustering/lib/python3.8/site-packages/umap/umap_.py", line 53, in <module>
      from umap.layouts import (
    File "/opt/conda/envs/read_clustering/lib/python3.8/site-packages/umap/layouts.py", line 39, in <module>
      def rdist(x, y):
    File "/opt/conda/envs/read_clustering/lib/python3.8/site-packages/numba/decorators.py", line 193, in wrapper
      disp.enable_caching()
    File "/opt/conda/envs/read_clustering/lib/python3.8/site-packages/numba/dispatcher.py", line 679, in enable_caching
      self._cache = FunctionCache(self.py_func)
    File "/opt/conda/envs/read_clustering/lib/python3.8/site-packages/numba/caching.py", line 614, in __init__
      self._impl = self._impl_class(py_func)
    File "/opt/conda/envs/read_clustering/lib/python3.8/site-packages/numba/caching.py", line 348, in __init__
      raise RuntimeError("cannot cache function %r: no locator available "
  RuntimeError: cannot cache function 'rdist': no locator available for file '/opt/conda/envs/read_clustering/lib/python3.8/site-packages/umap/layouts.py'

Work dir:
  /home/james/SequencingData/NanoCLUST/templates/work/eb/f5325ea476439e9002400a2c0daecb

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

I have checked the error
RuntimeError: cannot cache function 'rdist': no locator available for file '/opt/conda/envs/read_clustering/lib/python3.8/site-packages/umap/layouts.py'
and found someone with a similar issue here:
https://stackoverflow.com/questions/56995232/runtimeerror-cannot-cache-function-jaccard-no-locator-available-for-file

I am not sure this will fix it, please help

Checked some of the previous posted issues, adding the following information:
Ubuntu: v18.04.4 LTS (Bionic Beaver)
Perl: v5.26.1
Docker: v18.09.7

No such variable: barcode

Hello, I've been trying to test the pipeline with one of my samples (fastq file) but every time I tried this error kept appearing (no matter if I use conda or docker ) :

ERROR ~ No such variable: barcode -- Check script 'main.nf' at line: 202 or see '.nextflow.log' file for more details.

Run Name : high_torvalds Reads : /home/nutrigenomics/bioinfo_soco/barcodes_4run_vsearch/barcode12.fastq Max Resources : 128 GB memory, 16 cpus, 10d time per job Output dir : ./results Launch dir : /home/nutrigenomics/NanoCLUST-master/ Working dir : /home/nutrigenomics/NanoCLUST-master/work Script dir : /home/nutrigenomics/NanoCLUST-master User : nutrigenomics Config Profile : conda Config Description: Minimal test dataset to check pipeline function ---------------------------------------------------- ERROR ~ No such variable: barcode -- Check script 'main.nf' at line: 202 or see '.nextflow.log' file for more details

The input file is a fastq file after nanopore sequencing. I would appreciate your response

Installing NanoCLUST as lmod module

Hi,
I have a couple users who requested that NanoCLUST be installed but I would like to ask for your help to understand how I can get it installed as an lmod module so that I can make this and future versions available on our compute nodes. I am not very familiar with the nextflow and also trying to install all the dependencies in a single conda environment that is loadable from our software stack onto our different machines is taking unusually long.
I also tried to install a local instance using your standard nextflow method, but I don't see any way to make the NanoCLUST git clone + work directories read-only prior to porting into our lmod software stack. This crashed any attempt to run NanoCLUST from a fresh working folder because either the entire work/ or just the conda/ were made read+execute-only.
Thank you for your help.
Erika

Fail to create conda environment and fail to process QC run

Hi,

I have installed the NanoCLUST, but errors occur when running the commands. When we tested the command line with our sample fastq file, error message was given that the path to some libgcc-ng packages could not be found. Also, it could not pass the QC(1). The problem could not be solved after the update of the package.

I have attached the error message for your reference.

Taxon_id output from http://api.unipept.ugent.be/api/v1/taxonomy.json?input have no output ,even for mock4_run3bc08_5000.fastq !, t

Thank you for the latest update with the get_abundance.py function, but a remaining issue is that some tax id's appear to be breaking the get_abundance.py function

nanoclust running problem read_clustering

Hi,
i'm trying to use NanoCLUST for bacteria diversity analysis, unfortunately i cannot solve this problem:

Command exit status:
1

Command output:
(empty)

Command error:
retval = self._compile_core(args, return_type)
File "/home/matias/Luca/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/dispatcher.py", line 106, in _compile_core
cres = compiler.compile_extra(self.targetdescr.typing_context,
File "/home/matias/Luca/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 606, in compile_extra
return pipeline.compile_extra(func)
File "/home/matias/Luca/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 353, in compile_extra
return self._compile_bytecode()
File "/home/matias/Luca/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 415, in _compile_bytecode
return self._compile_core()
File "/home/matias/Luca/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 395, in _compile_core
raise

I also attached the whole log file of the operations.

THANKS FOR HELPING ME!!!!

.nextflow.log

SOLVED: how to solve "ERROR ~ No such variable: barcode" ?

Hello,

I'm trying to run the NanoCLUST test using following command:

nextflow run main.nf -profile test,conda

However, when I run this, I receive error "No such variable: barcode".

I also tried to analyse my own dataset using following command:

nextflow run ../main.nf -profile conda --reads 'HAC_run_all.fastq' --db "./../db/16S_ribosomal_RNA" --tax "./../db/taxdb/"

However, I recieve the same error. The full output is given below.

Could you advise me on how to solve this problem?

OUTPUT:

~/Documents/Software/NanoCLUST/NanoCLUST$
nextflow run main.nf -profile test,conda
N E X T F L O W  ~  version 19.01.0
Launching `main.nf` [nauseous_ptolemy] - revision: c3b5ee2f3d

----------------------------------------------------
      _   __                     ________    __  _____________
     / | / /___ _____  ____     / ____/ /   / / / / ___/_  __/
    /  |/ / __ `/ __ \/ __ \   / /   / /   / / / /\__ \ / /   
   / /|  / /_/ / / / / /_/ /  / /___/ /___/ /_/ /___/ // /    
  /_/ |_/\__,_/_/ /_/\____/   \____/_____/\____//____//_/     

  NanoCLUST v1.0dev
----------------------------------------------------

Run Name          : nauseous_ptolemy
Reads             : /home/datura-workstation/Documents/Software/NanoCLUST/NanoCLUST/test_datasets/mock4_run3bc08_5000.fastq
Max Resources     : 128 GB memory, 16 cpus, 10d time per job
Output dir        : ./results
Launch dir        : /home/datura-workstation/Documents/Software/NanoCLUST/NanoCLUST
Working dir       : /home/datura-workstation/Documents/Software/NanoCLUST/NanoCLUST/work
Script dir        : /home/datura-workstation/Documents/Software/NanoCLUST/NanoCLUST
User              : datura-workstation
Config Profile    : test,conda
Config Description: Minimal test dataset to check pipeline function
----------------------------------------------------
ERROR ~ No such variable: barcode

 -- Check script 'main.nf' at line: 200 or see '.nextflow.log' file for more details

Error executing process > 'read_correction (2)'

Hi, i ran the pipeline on my oqn data set using following command

nextflow /home/hamlab/NanoCLUST/main.nf --reads '/home/hamlab/Nanopore/nanofilt_trimmed_barecode01.fastq' --db '/home/hamlab/NanoCLUST/db/blastdb' --tax '/home/hamlab/NanoCLUST/db/taxdb' -profile conda

the pipeline works as
Run Name : awesome_plateau
Reads : /home/hamlab/Nanopore/nanofilt_trimmed_barecode01.fastq
Max Resources : 128 GB memory, 16 cpus, 10d time per job
Output dir : ./results
Launch dir : /home/hamlab/Nanopore
Working dir : /home/hamlab/Nanopore/work
Script dir : /home/hamlab/NanoCLUST
User : hamlab
Config Profile : conda

[- ] process > QC -
[- ] process > fastqc -
[- ] process > kmer_freqs -
executor > local (1)
[be/62e2a0] process > QC (1) [100%] 1 of 1 ✔
[- ] process > fastqc -
[- ] process > kmer_freqs -
executor > local (2)
[be/62e2a0] process > QC (1) [100%] 1 of 1 ✔
[- ] process > fastqc -
[- ] process > kmer_freqs -
executor > local (2)
[be/62e2a0] process > QC (1) [100%] 1 of 1 ✔
[- ] process > fastqc -
[- ] process > kmer_freqs -
executor > local (3)
[be/62e2a0] process > QC (1) [100%] 1 of 1 ✔
executor > local (4)
[be/62e2a0] process > QC (1) [100%] 1 of 1 ✔
executor > local (4)
[be/62e2a0] process > QC (1) [100%] 1 of 1 ✔
executor > local (8)
[be/62e2a0] process > QC (1) [100%] 1 of 1 ✔
[ce/6e7260] process > fastqc (1) [100%] 1 of 1 ✔
[52/c6994a] process > kmer_freqs (1) [100%] 1 of 1 ✔
[65/c93866] process > read_clustering (1) [100%] 1 of 1 ✔
[1a/98e4b2] process > split_by_cluster (1) [100%] 1 of 1 ✔
[f1/f29d75] process > read_correction (2) [ 0%] 0 of 2
[- ] process > draft_selection -
[- ] process > racon_pass -
[- ] process > medaka_pass -
[- ] process > consensus_classification -
[- ] process > join_results -
[- ] process > get_abundances -
[- ] process > plot_abundances -
[52/0c3775] process > output_documentation [100%] 1 of 1 ✔

After that it has this error
Error executing process > 'read_correction (2)'
Caused by:
Process read_correction (2) terminated with an error exit status (1)
Command executed:
head -n$(( 1004 )) 1.fastq > subset.fastq
canu -correct -p corrected_reads -nanopore-raw subset.fastq genomeSize=1.5k stopOnLowCoverage=1 minInputCoverage=2 minReadLength=500 minOverlapLength=200
gunzip corrected_reads.correctedReads.fasta.gz
READ_COUNT=$(( $(awk '{print $1/2}' <(wc -l corrected_reads.correctedReads.fasta)) ))
cat 1.log > 1_racon.log
echo -n ";100;$READ_COUNT;" >> 1_racon.log && cp 1_racon.log 1_racon_.log
Command exit status:
1
Command output:
(empty)
Command error:
/home/hamlab/Nanopore/work/conda/read_correction--9838d2d8e8d06bacfda88b353ba23513/bin/sqStoreDumpMetaData
-S ./corrected_reads.seqStore ***
executor > local (8)
[be/62e2a0] process > QC (1) [100%] 1 of 1 ✔
[ce/6e7260] process > fastqc (1) [100%] 1 of 1 ✔
[52/c6994a] process > kmer_freqs (1) [100%] 1 of 1 ✔
[65/c93866] process > read_clustering (1) [100%] 1 of 1 ✔
[1a/98e4b2] process > split_by_cluster (1) [100%] 1 of 1 ✔
[f1/f29d75] process > read_correction (2) [ 50%] 1 of 2, failed: 1
[- ] process > draft_selection -
[- ] process > racon_pass -
[- ] process > medaka_pass -
[- ] process > consensus_classification -
[- ] process > join_results -
[- ] process > get_abundances -
[- ] process > plot_abundances -
[52/0c3775] process > output_documentation [100%] 1 of 1 ✔
can you help me in resolving the issue
Thanks

Docker profiles

The documentation says that the Docker profiles are pulled from nfcore/nanoclust, but these containers don't exist on DockerHub. They are pulled from hecrp (e.g. hecrp/nanoclust-read_correction). The documentation ought to be updated.

Also, all of the hecrp containers are tagged with "linux", but most of the nfcore containers are just tagged with "container". I can run the test profile no problem on an ubuntu EC2 instance on AWS, but when I try to run it on my MacBook Pro I run into problems (these vary, right now it is timing out on read_correction). I'm not familiar with the nuances of running docker profiles on different OS environments, but it seems like you maybe need to either mention in the documentation that it works on linux and may or may not work on other OSes, and/or do more testing on MacOS.

Failed to create Conda environment

Hi there,

I haven't been able to complete a successful run. It seems to have an issue creating a conda environment (conda otherwise seems to be working perfectly).

Output as follows:

`
NanoCLUST v1.0dev

Run Name : amazing_jepsen
Reads : Larvae_waste.fastq
Max Resources : 128 GB memory, 16 cpus, 10d time per job
Output dir : ./results
Launch dir : /exports/eddie/scratch/tregan/NanoCLUST
Working dir : /exports/eddie/scratch/tregan/NanoCLUST/work
Script dir : /exports/eddie/scratch/tregan/NanoCLUST
User : tregan
Config Profile : conda

executor > local (1)
[7f/4a6a0e] process > QC (1) [100%] 1 of 1 ✔
[- ] process > fastqc -
[- ] process > kmer_freqs -
[- ] process > read_clustering -
[- ] process > split_by_cluster -
[- ] process > read_correction -
[- ] process > draft_selection -
[- ] process > racon_pass -
[- ] process > medaka_pass -
[- ] process > consensus_classification -
[- ] process > join_results -
[- ] process > get_abundances -
[- ] process > plot_abundances -
[- ] process > output_documentation -
Creating Conda env: /exports/eddie/scratch/tregan/NanoCLUST/conda_envs/kmer_freqs/environment.yml [cache /exports/eddie/scratch/tregan/NanoCLUST/work/conda/kmer_freq-999fbd7473cd6106e662a85d832c592d]
executor > local (1)
executor > local (1)
[7f/4a6a0e] process > QC (1) [100%] 1 of 1 ✔
[- ] process > fastqc -
[- ] process > kmer_freqs -
[- ] process > read_clustering -
[- ] process > split_by_cluster -
[- ] process > read_correction -
[- ] process > draft_selection -
[- ] process > racon_pass -
[- ] process > medaka_pass -
[- ] process > consensus_classification -
[- ] process > join_results -
[- ] process > get_abundances -
[- ] process > plot_abundances -
[- ] process > output_documentation -
Error executing process > 'output_documentation'

Caused by:
Failed to create Conda environment
command: conda env create --prefix /exports/eddie/scratch/tregan/NanoCLUST/work/conda/output-documentation-f87367ab9ec78f3fc18fcdbf466b1827 --file /exports/eddie/scratch/tregan/NanoCLUST/conda_envs/output_documentation/environment.yml
status : 143
message:
`

Many thanks,
Tim

use of other database

Is there any option to use Silva or other curated database for the classification? if yes how do we create taxid and taxbtd files.

REFERENCE: Running the pipeline with 50 samples.

hi， the problems is that run the pipline with 1 sample is perfectct，but my data has 50 samples,it always occurs error ，when i run the 50 samples with the parameter "--reads 'my path/*.fastq" .

Originally posted by @HaiyangDu in #1 (comment)

abundance plot error

[22/e6402c] process > QC (1) [100%] 1 of 1 ✔
[64/24fe3f] process > fastqc (1) [100%] 1 of 1 ✔
[c0/ad2b7c] process > kmer_freqs (1) [100%] 1 of 1 ✔
[f2/0a2839] process > read_clustering (1) [100%] 1 of 1 ✔
[f7/cadd4e] process > split_by_cluster (1) [100%] 1 of 1 ✔
[25/d2be15] process > read_correction (17) [100%] 17 of 17 ✔
[3a/1f7328] process > draft_selection (17) [100%] 17 of 17 ✔
[bc/056b51] process > racon_pass (17) [100%] 17 of 17 ✔
[aa/ca1f63] process > medaka_pass (17) [100%] 17 of 17 ✔
[35/61343c] process > consensus_classification (17) [100%] 17 of 17 ✔
[b1/6f6045] process > join_results (1) [100%] 1 of 1 ✔
[00/9e1975] process > get_abundances (1) [100%] 1 of 1, failed: 1 ✘
[- ] process > plot_abundances -
[a1/a41915] process > output_documentation [100%] 1 of 1 ✔
Execution cancelled -- Finishing pending tasks before exit
[0;35m[nf-core/nanoclust] Pipeline completed with errors
Error executing process > 'get_abundances (1)'

Caused by:
Process get_abundances (1) terminated with an error exit status (1)

Command executed [/NanoporeTools/NanoCLUST/templates/get_abundance.py]:

#!/usr/bin/env python

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import rc
import pandas as pd
from functools import reduce
import requests
import json
#https://unipept.ugent.be/apidocs/taxonomy

def get_taxname(tax_id,tax_level):
tags = {"S": "species_name","G": "genus_name","F": "family_name","O":'order_name', "C": "class_name"}
tax_level_tag = tags[tax_level]

  path = 'http://api.unipept.ugent.be/api/v1/taxonomy.json?input[]=' + str(int(tax_id)) + '&extra=true&names=true'
  complete_tax = requests.get(path).text
  return json.loads(complete_tax)[0][tax_level_tag]

def get_abundance_values(names,paths):
dfs = []
for name,path in zip(names,paths):
data = pd.read_csv(path, index_col=False, sep=';').iloc[:,1:]

      total = sum(data['reads_in_cluster'])
      rel_abundance=[]

      for index,row in data.iterrows():
          rel_abundance.append(row['reads_in_cluster'] / total)
          
      data['rel_abundance'] = rel_abundance
      dfs.append(pd.DataFrame({'taxid': data['taxid'], 'rel_abundance': rel_abundance}))
      data.to_csv("" + name + "_nanoclust_out.txt")

  return dfs

def merge_abundance(dfs,tax_level):
df_final = reduce(lambda left,right: pd.merge(left,right,on='taxid',how='outer').fillna(0), dfs)
df_final["taxid"] = [get_taxname(row["taxid"], tax_level) for index, row in df_final.iterrows()]
df_final_grp = df_final.groupby(["taxid"], as_index=False).sum()
return df_final_grp

def get_abundance(names,paths,tax_level):
if(not isinstance(paths, list)):
paths = [paths]
names = [names]

  dfs = get_abundance_values(names,paths)
  df_final_grp = merge_abundance(dfs, tax_level)
  df_final_grp.to_csv("rel_abundance_"+ names[0] + "_" + tax_level + ".csv", index = False)

paths = "barcode10_chimfilt.nanoclust_out.txt"
names = "barcode10_chimfilt"

get_abundance(names,paths, "G")
get_abundance(names,paths, "S")
get_abundance(names,paths, "O")
get_abundance(names,paths, "F")

Command exit status:
1

Command output:
(empty)

Command error:
Traceback (most recent call last):
File ".command.sh", line 55, in
get_abundance(names,paths, "G")
File ".command.sh", line 49, in get_abundance
df_final_grp = merge_abundance(dfs, tax_level)
File ".command.sh", line 39, in merge_abundance
df_final["taxid"] = [get_taxname(row["taxid"], tax_level) for index, row in df_final.iterrows()]
File ".command.sh", line 39, in
df_final["taxid"] = [get_taxname(row["taxid"], tax_level) for index, row in df_final.iterrows()]
File ".command.sh", line 16, in get_taxname
path = 'http://api.unipept.ugent.be/api/v1/taxonomy.json?input[]=' + str(int(tax_id)) + '&extra=true&names=true'
ValueError: cannot convert float NaN to integer

Work dir:
/NanoporeTools/NanoCLUST/work/00/9e19751bb522057d633880bce32a6a

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

(base) bash-4.2$

Fail process > fastqc (1)

Hello,

I am getting the following error running (nextflow run main.nf -profile test,conda). Can you please help me to fix this issue?

Thank you,

Irmarie

N E X T F L O W ~ version 21.04.0
Launching main.nf [fabulous_darwin] - revision: 5e0f88a799

  _   __                     ________    __  _____________
 / | / /___ _____  ____     / ____/ /   / / / / ___/_  __/
/  |/ / __ `/ __ \/ __ \   / /   / /   / / / /\__ \ / /

/ /| / // / / / / // / / // // // // // /
// |/_,// //_/ _/__/_//___///

NanoCLUST v1.0dev

Run Name : fabulous_darwin
Reads : /scratch/icotto25/shotgun_time_series/16S_May2021/NanoCLUST/test_datasets/mock4_run3bc08_5000.fastq
Max Resources : 128 GB memory, 16 cpus, 10d time per job
Output dir : ./results
Launch dir : /scratch/icotto25/shotgun_time_series/16S_May2021/NanoCLUST
Working dir : /scratch/icotto25/shotgun_time_series/16S_May2021/NanoCLUST/work
Script dir : /scratch/icotto25/shotgun_time_series/16S_May2021/NanoCLUST
User : icotto25
Config Profile : test,conda
Config Description: Minimal test dataset to check pipeline function

executor > local (5)
executor > local (5)
executor > local (5)
[c8/177787] process > QC (1) [100%] 1 of 1 ✔
executor > local (5)
[c8/177787] process > QC (1) [100%] 1 of 1 ✔
[1a/4e6673] process > fastqc (1) [100%] 1 of 1, failed: 1 ✘
[49/bc700b] process > kmer_freqs (1) [100%] 1 of 1 ✔
[55/c689e9] process > read_clustering (1) [100%] 1 of 1, failed: 1 ✘
[- ] process > split_by_cluster -
[- ] process > read_correction -
[- ] process > draft_selection -
[- ] process > racon_pass -
[- ] process > medaka_pass -
[- ] process > consensus_classification -
[- ] process > join_results -
[- ] process > get_abundances -
[- ] process > plot_abundances -
[07/3f35b0] process > output_documentation [100%] 1 of 1 ✔
Execution cancelled -- Finishing pending tasks before exit
[nf-core/nanoclust] Pipeline completed with errors
Error executing process > 'fastqc (1)'

Caused by:
Process fastqc (1) terminated with an error exit status (127)

Command executed:

fastqc -q mock4_run3bc08_5000_qced_reads_set.fastq

Command exit status:
127

Command output:
(empty)

Command error:
.command.sh: line 2: fastqc: command not found

Work dir:
/scratch/icotto25/shotgun_time_series/16S_May2021/NanoCLUST/work/1a/4e6673c56c84743a12d46b1aaf57aa

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

Error at read-correction step

Running the script with the test data provided, errors at the read-correction stage:

Starting command on Wed May 20 12:05:03 2020 with 39.037 GB free disk space

  cd /home/ubuntu/bin/NanoCLUST/work/0c/cdad9cbbc33ab4512480bf701f33cb
  sbatch \
    --cpus-per-task=1 \
    --mem-per-cpu=4g   \
    -D `pwd` \
    -J 'canu_corrected_reads' \
    -o canu-scripts/canu.01.out  canu-scripts/canu.01.sh

Finished on Wed May 20 12:05:03 2020 (like a bat out of hell) with 39.037 GB free disk space

gzip: corrected_reads.correctedReads.fasta.gz: No such file or directory

Error executing process > read_correction

Hello!

I have been trying out NanoCLUST for 16S analysis of a few of my data and it has been wonderful so far.
But, a few datasets always abort during read_correction.

Even though I have 47 GB of RAM and a 24 CPU machine, I thought maybe there were too many data so I played around with --umap_set_size and --polishing_reads but the same Error executing process > read_correction error comes up with an error exit status (1).

I'll post the whole error along with the information I got from running .command.sh in the workdir. I don't think it is anything wrong with Canu itself as I was able to complete the pipeline on most of my datas.

The command that I used:

nextflow run main.nf -profile docker --reads 'part3/10000_lq8BC03.fastq' --db "db/16S_ribosomal_RNA" --tax "db/taxdb/" --outdir "results/part3" --umap_set_size 25000 --polishing_reads 50

NanoCLUST

Error executing process > 'read_correction (7)'

Caused by:
  Process `read_correction (7)` terminated with an error exit status (1)

Command executed:

  head -n$(( 100*4 )) 6.fastq > subset.fastq
  canu -correct -p corrected_reads -nanopore-raw subset.fastq genomeSize=1.5k stopOnLowCoverage=1 minInputCoverage=2 minReadLength=500 minOverlapLength=200
  gunzip corrected_reads.correctedReads.fasta.gz
  READ_COUNT=$(( $(awk '{print $1/2}' <(wc -l corrected_reads.correctedReads.fasta)) ))
  cat 6.log > 6_racon.log
  echo -n ";100;$READ_COUNT;" >> 6_racon.log && cp 6_racon.log 6_racon_.log

Command exit status:
  1

Command output:
  (empty)

Command error:
      /opt/conda/envs/read_correction-/bin/sqStoreDumpMetaData \
        -S ./corrected_reads.seqStore \
        -corrected \
        -histogram \
      > ./corrected_reads.seqStore/readlengths-obt.txt \
      2> ./corrected_reads.seqStore/readlengths-obt.err 
      
  
  -- Finished on Thu Jun 24 09:07:56 2021 (like a bat out of hell) with 1108.475 GB free disk space
  ----------------------------------------
  --
  -- In sequence store './corrected_reads.seqStore':
  --   Found 0 reads.
  --   Found 0 bases (0 times coverage).
  --
  -- Purging correctReads output after loading into stores.
  -- Purged 1 .cns outputs.
  -- Purged 2 .out job log outputs.
  --
  -- Purging overlaps used for correction.
  -- Report changed.
  -- Finished stage 'cor-loadCorrectedReads', reset canuIteration.
  --
  -- Yikes!  No corrected reads generated!
  -- Can't proceed!
  --
  -- Generating empty outputs.
  -- No change in report.
  -- Finished stage 'generateOutputs', reset canuIteration.
  --
  -- Assembly 'corrected_reads' finished in '/home2/adham/NanoCLUST/work/6e/f3e0a8c687663fe25421e3ee7ea11d'.
  --
  -- Summary saved in 'corrected_reads.report'.
  --
  -- Sequences saved:
  --   Contigs       -> 'corrected_reads.contigs.fasta'
  --   Unassembled   -> 'corrected_reads.unassembled.fasta'
  --   Unitigs       -> 'corrected_reads.unitigs.fasta'
  --
  -- Read layouts saved:
  --   Contigs       -> 'corrected_reads.contigs.layout'.
  --   Unitigs       -> 'corrected_reads.unitigs.layout'.
  --
  -- Graphs saved:
  --   Unitigs       -> 'corrected_reads.unitigs.gfa'.
  -- No change in report.
  -- Finished stage 'cor-dumpCorrectedReads', reset canuIteration.
  --
  -- Bye.
  gzip: corrected_reads.correctedReads.fasta.gz: No such file or directory

Work dir:
  /home2/adham/NanoCLUST/work/6e/f3e0a8c687663fe25421e3ee7ea11d

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

.command.sh Report

-- WARNING:
-- WARNING:  Option '-nanopore-raw <files>' is deprecated.
-- WARNING:  Use option '-nanopore <files>' in the future.
-- WARNING:
-- canu 2.1.1
--
-- CITATIONS
--
-- For 'standard' assemblies of PacBio or Nanopore reads:
--   Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
--   Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
--   Genome Res. 2017 May;27(5):722-736.
--   http://doi.org/10.1101/gr.215087.116
-- 
-- Read and contig alignments during correction and consensus use:
--   Šošic M, Šikic M.
--   Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
-- 
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
-- 
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
-- 
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
-- 
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
-- 
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '10.0.2' (from '/home/morilab/anaconda3/bin/java') without -d64 support.
--
-- WARNING:
-- WARNING:  Failed to run gnuplot using command 'gnuplot'.
-- WARNING:  Plots will be disabled.
-- WARNING:
--
-- Detected 24 CPUs and 47 gigabytes of memory.
-- No grid engine detected, grid and staging disabled.
--
--                                (tag)Concurrency
--                         (tag)Threads          |
--                (tag)Memory         |          |
--        (tag)             |         |          |       total usage      algorithm
--        -------  ----------  --------   --------  --------------------  -----------------------------
-- Local: meryl      7.000 GB    4 CPUs x   6 jobs    42.000 GB  24 CPUs  (k-mer counting)
-- Local: hap        7.000 GB    4 CPUs x   6 jobs    42.000 GB  24 CPUs  (read-to-haplotype assignment)
-- Local: cormhap    6.000 GB   12 CPUs x   2 jobs    12.000 GB  24 CPUs  (overlap detection with mhap)
-- Local: obtovl     4.000 GB    8 CPUs x   3 jobs    12.000 GB  24 CPUs  (overlap detection)
-- Local: utgovl     4.000 GB    8 CPUs x   3 jobs    12.000 GB  24 CPUs  (overlap detection)
-- Local: cor        8.000 GB    4 CPUs x   5 jobs    40.000 GB  20 CPUs  (read correction)
-- Local: ovb        4.000 GB    1 CPU  x  11 jobs    44.000 GB  11 CPUs  (overlap store bucketizer)
-- Local: ovs        8.000 GB    1 CPU  x   5 jobs    40.000 GB   5 CPUs  (overlap store sorting)
-- Local: red        9.000 GB    4 CPUs x   5 jobs    45.000 GB  20 CPUs  (read error detection)
-- Local: oea        8.000 GB    1 CPU  x   5 jobs    40.000 GB   5 CPUs  (overlap error adjustment)
-- Local: bat       16.000 GB    4 CPUs x   1 job     16.000 GB   4 CPUs  (contig construction with bogart)
-- Local: cns        -.--- GB    4 CPUs x   - jobs     -.--- GB   - CPUs  (consensus)
--
-- In 'corrected_reads.seqStore', found Nanopore reads:
--   Nanopore:                 1
--
--   Raw:                      1
--
-- Generating assembly 'corrected_reads' in '/home2/adham/NanoCLUST/work/6e/f3e0a8c687663fe25421e3ee7ea11d':
--    - only correct raw reads.
--
-- Parameters:
--
--  genomeSize        1500
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.3200 ( 32.00%)
--    obtOvlErrorRate 0.1200 ( 12.00%)
--    utgOvlErrorRate 0.1200 ( 12.00%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.5000 ( 50.00%)
--    obtErrorRate    0.1200 ( 12.00%)
--    utgErrorRate    0.1200 ( 12.00%)
--    cnsErrorRate    0.2000 ( 20.00%)
--
--
-- BEGIN CORRECTION
--
--
-- Creating overlap store correction/corrected_reads.ovlStore using:
--      1 bucket
--      2 slices
--        using at most 1 GB memory each
-- Finished stage 'cor-overlapStoreConfigure', reset canuIteration.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'ovB' concurrent execution on Fri Jun 25 10:15:52 2021 with 1108.477 GB free disk space (1 processes; 11 concurrently)

    cd correction/corrected_reads.ovlStore.BUILDING
    ./scripts/1-bucketize.sh 1 > ./logs/1-bucketize.000001.out 2>&1

-- Finished on Fri Jun 25 10:15:52 2021 (lickety-split) with 1108.477 GB free disk space
----------------------------------------
--
-- Overlap store bucketizer jobs failed, retry.
--   job correction/corrected_reads.ovlStore.BUILDING/bucket0001 FAILED.
--
--
-- Running jobs.  Second attempt out of 2.
----------------------------------------
-- Starting 'ovB' concurrent execution on Fri Jun 25 10:15:52 2021 with 1108.477 GB free disk space (1 processes; 11 concurrently)

    cd correction/corrected_reads.ovlStore.BUILDING
    ./scripts/1-bucketize.sh 1 > ./logs/1-bucketize.000001.out 2>&1

-- Finished on Fri Jun 25 10:15:52 2021 (like a bat out of hell) with 1108.477 GB free disk space
----------------------------------------
--
-- Overlap store bucketizer jobs failed, tried 2 times, giving up.
--   job correction/corrected_reads.ovlStore.BUILDING/bucket0001 FAILED.
--

ABORT:
ABORT: canu 2.1.1
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:
gzip: corrected_reads.correctedReads.fasta.gz: No such file or directory
wc: corrected_reads.correctedReads.fasta: No such file or directory

I do hope that someone can shed some light on this.

I like to thank everyone who reads this in advance.
Adham

Edit: I do not know how to fix the styling of the post so I am sorry.
Edit 2: I was able to make the post easier to read and I added the command I used.

Link to nf-core slack

Hi there,

Congratulations on the new pipeline and preprint - it looks great!

It seems that you used the @nf-core pipeline template to make the workflow. Absolutely no problem with this at all (in fact it's encouraged!), however it looks like the readme still has the following at the bottom:

For further information or help, don't hesitate to get in touch on Slack (you can join with this invite).

I think this is causing some confusion - we've had a couple of people asking for help with the pipeline on the nf-core slack over the past days, presumably finding it because of the above links.

If it's ok, would you mind removing the above text to avoid the confusion? Of course if you're interested in bringing the pipeline into the @nf-core community then we would be more than happy to discuss that too!

Thanks,

Phil

How about metagenomics data using NanoCLUST workflow

If I replace 16s db with bac+fungi+virus genome db, will this workflow appliable to metagenomics Nanopore data?

Noting toxonomy information for taxid from api.unipet using get_abundance.py

Hi,
I have an issue with using database. When I am calling for example "http://api.unipept.ugent.be/api/v1/taxonomy.json?input[]=817&extra=true&names=true" using NanoClust, some items like "species_name","genus_name", "family_name",'order_name' and "class_name" are empty, however; regarding the document at your site those should not be empty." https://unipept.ugent.be/apidocs/taxonomy". I also checked it up on mock4_run3bc08_5000.fastq as a sample test in NanoClust dataset.
I would appreciate it if you could let me know how to fix it.
Sima

Missing headers in consensus classification

Hi!
Thank you for this tool.
There is one thing which is not very clear to me.
When you get the consensus_classification file you see the blast classification output.
I was wondering whether you could tell me what the different columns stand for?

thank you in advance!

read clustering error

executor > local (5)
[d8/6fd53d] process > QC (1) [100%] 1 of 1 ✔
[80/44cf34] process > fastqc (1) [100%] 1 of 1 ✔
[0c/7b75e3] process > kmer_freqs (1) [100%] 1 of 1 ✔
[cb/16e8b0] process > read_clustering (1) [100%] 1 of 1, failed: 1 ✘
[- ] process > split_by_cluster -
[- ] process > read_correction -
[- ] process > draft_selection -
[- ] process > racon_pass -
[- ] process > medaka_pass -
[- ] process > consensus_classification -
[- ] process > join_results -
[- ] process > get_abundances -
[- ] process > plot_abundances -
[12/0efcc7] process > output_documentation [100%] 1 of 1 ✔
Error executing process > 'read_clustering (1)'

Caused by:
Process read_clustering (1) terminated with an error exit status (1)

Command executed [/NanoporeTools/NanoCLUST/templates/umap_hdbscan.py]:

#!/usr/bin/env python

import numpy as np
import umap
import matplotlib.pyplot as plt
from sklearn import decomposition
import random
import pandas as pd
import hdbscan

df = pd.read_csv("freqs.txt", delimiter="is_Pr")

#UMAP
motifs = [x for x in df.columns.values if x not in ["read", "length"]]
X = df.loc[:,motifs]
X_embedded = umap.UMAP(n_neighbors=15, min_dist=0.1, verbose=2).fit_transform(X)

df_umap = pd.DataFrame(X_embedded, columns=["D1", "D2"])
umap_out = pd.concat([df["read"], df["length"], df_umap], axis=1)

#HDBSCAN
X = umap_out.loc[:,["D1", "D2"]]
umap_out["bin_id"] = hdbscan.HDBSCAN(min_cluster_size=int(200), cluster_selection_epsilon=int(0.5)).fit_predict(X)

#PLOT
plt.figure(figsize=(20,20))
plt.scatter(X_embedded[:, 0], X_embedded[:, 1], c=umap_out["bin_id"], cmap='Spectral', s=1)
plt.xlabel("UMAP1", fontsize=18)
plt.ylabel("UMAP2", fontsize=18)
plt.gca().set_aspect('equal', 'datalim')
plt.title("Projecting " + str(len(umap_out['bin_id'])) + " reads. " + str(len(umap_out['bin_id'].unique())) + " clusters generated by HDBSCAN", fontsize=18)

for cluster in np.sort(umap_out['bin_id'].unique()):
read = umap_out.loc[umap_out['bin_id'] == cluster].iloc[0]
plt.annotate(str(cluster), (read['D1'], read['D2']), weight='bold', size=14)

plt.savefig('hdbscan.output.png')
umap_out.to_csv("hdbscan.output.tsv", sep=" ", index=False)

Command exit status:
1

Command output:
UMAP(verbose=2)
Construct fuzzy simplicial set
Fri Jan 29 12:06:46 2021 Finding Nearest Neighbors
Fri Jan 29 12:06:46 2021 Building RP forest with 21 trees
Fri Jan 29 12:06:49 2021 NN descent for 17 iterations
1 / 17
2 / 17
3 / 17
4 / 17
5 / 17
6 / 17
7 / 17
8 / 17
Stopping threshold met -- exiting after 8 iterations
Fri Jan 29 12:07:08 2021 Finished Nearest Neighbor Search
Fri Jan 29 12:07:10 2021 Construct embedding
completed 0 / 200 epochs
completed 20 / 200 epochs
completed 40 / 200 epochs
completed 60 / 200 epochs
completed 80 / 200 epochs
completed 100 / 200 epochs
completed 120 / 200 epochs
completed 140 / 200 epochs
completed 160 / 200 epochs
completed 180 / 200 epochs
Fri Jan 29 12:08:08 2021 Finished embedding

Command error:
Traceback (most recent call last):
File "/home/administrator/Desktop/Bovine_Mastitis_Project/Mastitis_nanopore_data/Project_1/Project1/Project1/20190612_1214_MN26935_FAK72557_229de4aa/work/conda/read_clustering-998d6264058a39a660addfff9962d1f9/lib/python3.8/site-packages/joblib/parallel.py", line 820, in dispatch_one_batch
tasks = self._ready_batches.get(block=False)
File "/home/administrator/Desktop/Bovine_Mastitis_Project/Mastitis_nanopore_data/Project_1/Project1/Project1/20190612_1214_MN26935_FAK72557_229de4aa/work/conda/read_clustering-998d6264058a39a660addfff9962d1f9/lib/python3.8/queue.py", line 167, in get
raise Empty
_queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File ".command.sh", line 23, in
umap_out["bin_id"] = hdbscan.HDBSCAN(min_cluster_size=int(200), cluster_selection_epsilon=int(0.5)).fit_predict(X)
File "/home/administrator/Desktop/Bovine_Mastitis_Project/Mastitis_nanopore_data/Project_1/Project1/Project1/20190612_1214_MN26935_FAK72557_229de4aa/work/conda/read_clustering-998d6264058a39a660addfff9962d1f9/lib/python3.8/site-packages/hdbscan/hdbscan_.py", line 941, in fit_predict
self.fit(X)
File "/home/administrator/Desktop/Bovine_Mastitis_Project/Mastitis_nanopore_data/Project_1/Project1/Project1/20190612_1214_MN26935_FAK72557_229de4aa/work/conda/read_clustering-998d6264058a39a660addfff9962d1f9/lib/python3.8/site-packages/hdbscan/hdbscan_.py", line 919, in fit
self.min_spanning_tree) = hdbscan(X, **kwargs)
File "/home/administrator/Desktop/Bovine_Mastitis_Project/Mastitis_nanopore_data/Project_1/Project1/Project1/20190612_1214_MN26935_FAK72557_229de4aa/work/conda/read_clustering-998d6264058a39a660addfff9962d1f9/lib/python3.8/site-packages/hdbscan/hdbscan.py", line 610, in hdbscan
(single_linkage_tree, result_min_span_tree) = memory.cache(
File "/home/administrator/Desktop/Bovine_Mastitis_Project/Mastitis_nanopore_data/Project_1/Project1/Project1/20190612_1214_MN26935_FAK72557_229de4aa/work/conda/read_clustering-998d6264058a39a660addfff9962d1f9/lib/python3.8/site-packages/joblib/memory.py", line 352, in call
return self.func(*args, **kwargs)
File "/home/administrator/Desktop/Bovine_Mastitis_Project/Mastitis_nanopore_data/Project_1/Project1/Project1/20190612_1214_MN26935_FAK72557_229de4aa/work/conda/read_clustering-998d6264058a39a660addfff9962d1f9/lib/python3.8/site-packages/hdbscan/hdbscan_.py", line 275, in _hdbscan_boruvka_kdtree
alg = KDTreeBoruvkaAlgorithm(tree, min_samples, metric=metric,
File "hdbscan/_hdbscan_boruvka.pyx", line 375, in hdbscan._hdbscan_boruvka.KDTreeBoruvkaAlgorithm.init
File "hdbscan/_hdbscan_boruvka.pyx", line 411, in hdbscan._hdbscan_boruvka.KDTreeBoruvkaAlgorithm._compute_bounds
File "/home/administrator/Desktop/Bovine_Mastitis_Project/Mastitis_nanopore_data/Project_1/Project1/Project1/20190612_1214_MN26935_FAK72557_229de4aa/work/conda/read_clustering-998d6264058a39a660addfff9962d1f9/lib/python3.8/site-packages/joblib/parallel.py", line 1041, in call
if self.dispatch_one_batch(iterator):
File "/home/administrator/Desktop/Bovine_Mastitis_Project/Mastitis_nanopore_data/Project_1/Project1/Project1/20190612_1214_MN26935_FAK72557_229de4aa/work/conda/read_clustering-998d6264058a39a660addfff9962d1f9/lib/python3.8/site-packages/joblib/parallel.py", line 831, in dispatch_one_batch
islice = list(itertools.islice(iterator, big_batch_size))
File "hdbscan/_hdbscan_boruvka.pyx", line 412, in genexpr
TypeError: delayed() got an unexpected keyword argument 'check_pickle'

Work dir:
/home/administrator/Desktop/Bovine_Mastitis_Project/Mastitis_nanopore_data/Project_1/Project1/Project1/20190612_1214_MN26935_FAK72557_229de4aa/work/cb/16e8b0a8d65a824ccac0a1378149f9

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

[test,conda]: read_clustering error

Dear NANOClust team, I wanted to give a shot to your pipeline. I proceed with a fresh install of Miniconda and nextflow. After clonning this repo and the NCBI database, I just launched a test with `nextflow run main.nf -profile test,conda` and get the following error. Could you help me? Thanks:

nextflow run main.nf -profile test,conda
N E X T F L O W ~ version 21.04.0
Launching main.nf [compassionate_bose] - revision: 5e0f88a799

  _   __                     ________    __  _____________
 / | / /___ _____  ____     / ____/ /   / / / / ___/_  __/
/  |/ / __ `/ __ \/ __ \   / /   / /   / / / /\__ \ / /

/ /| / // / / / / // / / // // // // // /
// |/_,// //_/ _/__/_//___///

NanoCLUST v1.0dev

Run Name : compassionate_bose
Reads : /home/omnia/Downloads/NanoCLUST/test_datasets/mock4_run3bc08_5000.fastq
Max Resources : 128 GB memory, 16 cpus, 10d time per job
Output dir : ./results
Launch dir : /home/omnia/Downloads/NanoCLUST
Working dir : /home/omnia/Downloads/NanoCLUST/work
Script dir : /home/omnia/Downloads/NanoCLUST
User : omnia
Config Profile : test,conda
Config Description: Minimal test dataset to check pipeline function

[- ] process > QC -
[- ] process > fastqc -
[- ] process > kmer_freqs -
[- ] process > read_clustering -
executor > local (1)
[88/4947f9] process > QC (1) [100%] 1 of 1 ✔
[- ] process > fastqc -
[- ] process > kmer_freqs -
executor > local (2)
[88/4947f9] process > QC (1) [100%] 1 of 1 ✔
[- ] process > fastqc -
[- ] process > kmer_freqs -
executor > local (2)
[88/4947f9] process > QC (1) [100%] 1 of 1 ✔
[- ] process > fastqc -
[- ] process > kmer_freqs -
executor > local (3)
[88/4947f9] process > QC (1) [100%] 1 of 1 ✔
[- ] process > fastqc -
[ce/aa2011] process > kmer_freqs (1) [ 0%] 0 of 1
executor > local (3)
[88/4947f9] process > QC (1) [100%] 1 of 1 ✔
executor > local (5)
[88/4947f9] process > QC (1) [100%] 1 of 1 ✔
[08/1dd3e6] process > fastqc (1) [100%] 1 of 1 ✔
[ce/aa2011] process > kmer_freqs (1) [100%] 1 of 1 ✔
[30/ae4788] process > read_clustering (1) [ 0%] 0 of 1
[- ] process > split_by_cluster -
[- ] process > read_correction -
[- ] process > draft_selection -
[- ] process > racon_pass -
[- ] process > medaka_pass -
[- ] process > consensus_classification -
[- ] process > join_results -
[- ] process > get_abundances -
[- ] process > plot_abundances -
[a1/ef749f] process > output_documentation [100%] 1 of 1 ✔
Error executing process > 'read_clustering (1)'

Caused by:
Process read_clustering (1) terminated with an error exit status (1)

Command executed [/home/omnia/Downloads/NanoCLUST/templates/umap_hdbscan.py]:

#!/usr/bin/env python

import numpy as np
import umap
import matplotlib.pyplot as plt
from sklearn import decomposition
import random
import pandas as pd
import hdbscan

df = pd.read_csv("freqs.txt", delimiter=" ")

#UMAP
motifs = [x for x in df.columns.values if x not in ["read", "length"]]
X = df.loc[:,motifs]
X_embedded = umap.UMAP(n_neighbors=15, min_dist=0.1, verbose=2).fit_transform(X)

df_umap = pd.DataFrame(X_embedded, columns=["D1", "D2"])
umap_out = pd.concat([df["read"], df["length"], df_umap], axis=1)

#HDBSCAN
X = umap_out.loc[:,["D1", "D2"]]
umap_out["bin_id"] = hdbscan.HDBSCAN(min_cluster_size=int(50), cluster_selection_epsilon=int(0.5)).fit_predict(X)

for cluster in np.sort(umap_out['bin_id'].unique()):
read = umap_out.loc[umap_out['bin_id'] == cluster].iloc[0]
plt.annotate(str(cluster), (read['D1'], read['D2']), weight='bold', size=14)

plt.savefig('hdbscan.output.png')
umap_out.to_csv("hdbscan.output.tsv", sep=" ", index=False)

Command exit status:
1

Command output:
(empty)

Command error:
retval = self._compile_core(args, return_type)
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/dispatcher.py", line 106, in _compile_core
cres = compiler.compile_extra(self.targetdescr.typing_context,
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 606, in compile_extra
return pipeline.compile_extra(func)
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 353, in compile_extra
return self._compile_bytecode()
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 415, in _compile_bytecode
return self._compile_core()
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 395, in _compile_core
raise e
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 386, in _compile_core
pm.run(self.state)
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 339, in run
raise patched_exception
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 330, in run
self._runPass(idx, pass_inst, state)
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
return func(*args, **kwargs)
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 289, in _runPass
mutated |= check(pss.run_pass, internal_state)
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 262, in check
mangled = func(compiler_state)
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/typed_passes.py", line 463, in run_pass
NativeLowering().run_pass(state)
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/typed_passes.py", line 384, in run_pass
lower.lower()
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/lowering.py", line 136, in lower
self.lower_normal_function(self.fndesc)
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/lowering.py", line 190, in lower_normal_function
entry_block_tail = self.lower_function_body()
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/lowering.py", line 216, in lower_function_body
self.lower_block(block)
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/lowering.py", line 230, in lower_block
self.lower_inst(inst)
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/contextlib.py", line 131, in exit
self.gen.throw(type, value, traceback)
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/errors.py", line 751, in new_error_context
raise newerr.with_traceback(tb)
numba.core.errors.LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
Storing i64 to ptr of i32 ('dim'). FE type int32
executor > local (5)
[88/4947f9] process > QC (1) [100%] 1 of 1 ✔
[08/1dd3e6] process > fastqc (1) [100%] 1 of 1 ✔
[ce/aa2011] process > kmer_freqs (1) [100%] 1 of 1 ✔
[30/ae4788] process > read_clustering (1) [100%] 1 of 1, failed: 1 ✘
[- ] process > split_by_cluster -
[- ] process > read_correction -
[- ] process > draft_selection -
[- ] process > racon_pass -
[- ] process > medaka_pass -
[- ] process > consensus_classification -
[- ] process > join_results -
[- ] process > get_abundances -
[- ] process > plot_abundances -
[a1/ef749f] process > output_documentation [100%] 1 of 1 ✔
Error executing process > 'read_clustering (1)'

Caused by:
Process read_clustering (1) terminated with an error exit status (1)

Command executed [/home/omnia/Downloads/NanoCLUST/templates/umap_hdbscan.py]:

#!/usr/bin/env python

import numpy as np
import umap
import matplotlib.pyplot as plt
from sklearn import decomposition
import random
import pandas as pd
import hdbscan

df = pd.read_csv("freqs.txt", delimiter=" ")

#UMAP
motifs = [x for x in df.columns.values if x not in ["read", "length"]]
X = df.loc[:,motifs]
X_embedded = umap.UMAP(n_neighbors=15, min_dist=0.1, verbose=2).fit_transform(X)

df_umap = pd.DataFrame(X_embedded, columns=["D1", "D2"])
umap_out = pd.concat([df["read"], df["length"], df_umap], axis=1)

#HDBSCAN
X = umap_out.loc[:,["D1", "D2"]]
umap_out["bin_id"] = hdbscan.HDBSCAN(min_cluster_size=int(50), cluster_selection_epsilon=int(0.5)).fit_predict(X)

for cluster in np.sort(umap_out['bin_id'].unique()):
read = umap_out.loc[umap_out['bin_id'] == cluster].iloc[0]
plt.annotate(str(cluster), (read['D1'], read['D2']), weight='bold', size=14)

plt.savefig('hdbscan.output.png')
umap_out.to_csv("hdbscan.output.tsv", sep=" ", index=False)

Command exit status:
1

Command output:
(empty)

Command error:
retval = self._compile_core(args, return_type)
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/dispatcher.py", line 106, in _compile_core
cres = compiler.compile_extra(self.targetdescr.typing_context,
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 606, in compile_extra
return pipeline.compile_extra(func)
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 353, in compile_extra
return self._compile_bytecode()
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 415, in _compile_bytecode
return self._compile_core()
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 395, in _compile_core
raise e
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 386, in _compile_core
pm.run(self.state)
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 339, in run
raise patched_exception
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 330, in run
self._runPass(idx, pass_inst, state)
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
return func(*args, **kwargs)
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 289, in _runPass
mutated |= check(pss.run_pass, internal_state)
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 262, in check
mangled = func(compiler_state)
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/typed_passes.py", line 463, in run_pass
NativeLowering().run_pass(state)
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/typed_passes.py", line 384, in run_pass
lower.lower()
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/lowering.py", line 136, in lower
self.lower_normal_function(self.fndesc)
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/lowering.py", line 190, in lower_normal_function
entry_block_tail = self.lower_function_body()
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/lowering.py", line 216, in lower_function_body
self.lower_block(block)this
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/lowering.py", line 230, in lower_block
self.lower_inst(inst)
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/contextlib.py", line 131, in exit
self.gen.throw(type, value, traceback)
File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/errors.py", line 751, in new_error_context
raise newerr.with_traceback(tb)
numba.core.errors.LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
Storing i64 to ptr of i32 ('dim'). FE type int32

File "../../conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/umap/layouts.py", line 52:
def rdist(x, y):

result = 0.0
dim = x.shape[0]
^

During: lowering "dim = static_getitem(value=$8load_attr.2, index=0, index_var=$const10.3, fn=)" at /home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/umap/layouts.py (52)

Work dir:
/home/omnia/Downloads/NanoCLUST/work/30/ae4788f916916db4d30751000e564a

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

Error executing process > 'read_clustering (1)'

Hi,
I have run a fastq.gz sample file on NanoCLUST using following command
nextflow run NanoCLUST/main.nf -profile conda --reads barcode01_filt.fastq.gz --db db/16S_ribosomal_RNA --tax db/taxdb/

Run Name : tiny_wescoff
Config Profile : conda

Getting error in read_clustering :

_Error executing process > 'read_clustering (1)'

Caused by:
Process read_clustering (1) terminated with an error exit status (1)

Command executed [/userdata/Punit/Rashmita_data/nanoclust/NanoCLUST/templates/umap_hdbscan.py]:

#!/usr/bin/env python

import numpy as np
import umap
import matplotlib.pyplot as plt
from sklearn import decomposition
import random
import pandas as pd
import hdbscan

df = pd.read_csv("freqs.txt", delimiter=" ")

#UMAP
motifs = [x for x in df.columns.values if x not in ["read", "length"]]
X = df.loc[:,motifs]
X_embedded = umap.UMAP(n_neighbors=15, min_dist=0.1, verbose=2).fit_transform(X)

df_umap = pd.DataFrame(X_embedded, columns=["D1", "D2"])
umap_out = pd.concat([df["read"], df["length"], df_umap], axis=1)

#HDBSCAN
X = umap_out.loc[:,["D1", "D2"]]
umap_out["bin_id"] = hdbscan.HDBSCAN(min_cluster_size=int(50), cluster_selection_epsilon=int(0.5)).fit_predict(X)

for cluster in np.sort(umap_out['bin_id'].unique()):
read = umap_out.loc[umap_out['bin_id'] == cluster].iloc[0]
plt.annotate(str(cluster), (read['D1'], read['D2']), weight='bold', size=14)

plt.savefig('hdbscan.output.png')
umap_out.to_csv("hdbscan.output.tsv", sep=" ", index=False)

Command exit status:
1

Command output:
(empty)

Command error:
retval = self._compile_core(args, return_type)
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/dispatcher.py", line 106, in _compile_core
cres = compiler.compile_extra(self.targetdescr.typing_context,
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 606, in compile_extra
return pipeline.compile_extra(func)
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 353, in compile_extra
return self._compile_bytecode()
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 415, in _compile_bytecode
return self._compile_core()
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 395, in _compile_core
raise e
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 386, in _compile_core
pm.run(self.state)
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 339, in run
raise patched_exception
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 330, in run
self._runPass(idx, pass_inst, state)
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
return func(*args, **kwargs)
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 289, in _runPass
mutated |= check(pss.run_pass, internal_state)
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 262, in check
mangled = func(compiler_state)
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/typed_passes.pexecutor > local (5)
[bf/efa57f] process > QC (1) [100%] 1 of 1 ✔
[7e/0435bb] process > fastqc (1) [100%] 1 of 1 ✔
[e0/371311] process > kmer_freqs (1) [100%] 1 of 1 ✔
[59/28559d] process > read_clustering (1) [100%] 1 of 1, failed: 1 ✘
[- ] process > split_by_cluster -
[- ] process > read_correction -
[- ] process > draft_selection -
[- ] process > racon_pass -
[- ] process > medaka_pass -
[- ] process > consensus_classification -
[- ] process > join_results -
[- ] process > get_abundances -
[- ] process > plot_abundances -
[e3/c6f429] process > output_documentation [100%] 1 of 1 ✔
Error executing process > 'read_clustering (1)'

Caused by:
Process read_clustering (1) terminated with an error exit status (1)

Command executed [/userdata/Punit/Rashmita_data/nanoclust/NanoCLUST/templates/umap_hdbscan.py]:

#!/usr/bin/env python

import numpy as np
import umap
import matplotlib.pyplot as plt
from sklearn import decomposition
import random
import pandas as pd
import hdbscan

df = pd.read_csv("freqs.txt", delimiter="lue=$")

#UMAP
motifs = [x for x in df.columns.values if x not in ["read", "length"]]
X = df.loc[:,motifs]
X_embedded = umap.UMAP(n_neighbors=15, min_dist=0.1, verbose=2).fit_transform(X)

df_umap = pd.DataFrame(X_embedded, columns=["D1", "D2"])
umap_out = pd.concat([df["read"], df["length"], df_umap], axis=1)

#HDBSCAN
X = umap_out.loc[:,["D1", "D2"]]
umap_out["bin_id"] = hdbscan.HDBSCAN(min_cluster_size=int(50), cluster_selection_epsilon=int(0.5)).fit_predict(X)

for cluster in np.sort(umap_out['bin_id'].unique()):
read = umap_out.loc[umap_out['bin_id'] == cluster].iloc[0]
plt.annotate(str(cluster), (read['D1'], read['D2']), weight='bold', size=14)

plt.savefig('hdbscan.output.png')
umap_out.to_csv("hdbscan.output.tsv", sep=" ", index=False)

Command exit status:
1

Command output:
(empty)

Command error:
retval = self._compile_core(args, return_type)
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/dispatcher.py", line 106, in _compile_core
cres = compiler.compile_extra(self.targetdescr.typing_context,
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 606, in compile_extra
return pipeline.compile_extra(func)
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 353, in compile_extra
return self._compile_bytecode()
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 415, in _compile_bytecode
return self._compile_core()
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 395, in _compile_core
raise e
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 386, in _compile_core
pm.run(self.state)
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 339, in run
raise patched_exception
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 330, in run
self._runPass(idx, pass_inst, state)
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
return func(*args, **kwargs)
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 289, in _runPass
mutated |= check(pss.run_pass, internal_state)
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 262, in check
mangled = func(compiler_state)
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/typed_passes.py", line 463, in run_pass
NativeLowering().run_pass(state)
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/typed_passes.py", line 384, in run_pass
lower.lower()
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/lowering.py", line 136, in lower
self.lower_normal_function(self.fndesc)
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/lowering.py", line 190, in lower_normal_function
entry_block_tail = self.lower_function_body()
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/lowering.py", line 216, in lower_function_body
self.lower_block(block)
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/lowering.py", line 230, in lower_block
self.lower_inst(inst)
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/contextlib.py", line 131, in exit
self.gen.throw(type, value, traceback)
File "/userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/errors.py", line 751, in new_error_context
raise newerr.with_traceback(tb)
numba.core.errors.LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
Storing i64 to ptr of i32 ('dim'). FE type int32

File "../../conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/umap/layouts.py", line 52:
def rdist(x, y):

result = 0.0
dim = x.shape[0]
^

During: lowering "dim = static_getitem(value=$8load_attr.2, index=0, index_var=$const10.3, fn=)" at /userdata/Punit/Rashmita_data/nanoclust/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/umap/layouts.py (52)

Work dir:
/userdata/Punit/Rashmita_data/nanoclust/work/59/28559dcb43230a9ddfc92eef9ab981

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line_

can you help me in resolving the issue
Thanks

Error executing process > consensus_classification

Hello,
I'm trying to run NanoCLUST with my 16S sequence data. I run it on a Linux CentOS 7 machine.
Also with the test data I get an error reaching the 'consensus_classification' module:

When I use the command:
nextflow run main.nf -profile test,docker

I get the following terminal output:

N E X T F L O W ~ version 21.04.1
Launching main.nf [nostalgic_mestorf] - revision: 5e0f88a799

Run Name : nostalgic_mestorf
Reads : /home/bcl2fastq/Schreibtisch/Metagenomics/NanoCLUST/NanoCLUST/test_datasets/mock4_run3bc08_5000.fastq
Max Resources : 128 GB memory, 16 cpus, 10d time per job
Container : docker - [:]
Output dir : ./results
Launch dir : /home/bcl2fastq/Schreibtisch/Metagenomics/NanoCLUST/NanoCLUST
Working dir : /home/bcl2fastq/Schreibtisch/Metagenomics/NanoCLUST/NanoCLUST/work
Script dir : /home/bcl2fastq/Schreibtisch/Metagenomics/NanoCLUST/NanoCLUST
User : bcl2fastq
Config Profile : test,docker
Config Description: Minimal test dataset to check pipeline function

executor > local (80)
[59/8242a2] process > QC (1) [100%] 1 of 1 ✔
[ae/0c905f] process > fastqc (1) [100%] 1 of 1 ✔
[65/245dd8] process > kmer_freqs (1) [100%] 1 of 1 ✔
[75/ee543b] process > read_clustering (1) [100%] 1 of 1 ✔
[11/ba1911] process > split_by_cluster (1) [100%] 1 of 1 ✔
[6e/f6a7b9] process > read_correction (1) [100%] 8 of 8 ✔
[e5/038ac3] process > draft_selection (8) [100%] 8 of 8 ✔
[f5/4c1169] process > racon_pass (8) [100%] 8 of 8 ✔
[da/75dc38] process > medaka_pass (8) [100%] 8 of 8 ✔
[5b/acba47] process > consensus_classification (4) [ 95%] 36 of 38, failed: 36, retries: 35
[- ] process > join_results -
[- ] process > get_abundances -
[- ] process > plot_abundances -
[62/bce7f9] process > output_documentation [100%] 1 of 1 ✔
[d6/d15ac5] NOTE: Process consensus_classification (5) terminated with an error exit status (2) -- Execution is retried (4)
[a2/3f7b37] NOTE: Process consensus_classification (3) terminated with an error exit status (2) -- Execution is retried (5)
[13/0144d3] NOTE: Process consensus_classification (8) terminated with an error exit status (2) -- Execution is retried (2)
[6a/d909d2] NOTE: Process consensus_classification (2) terminated with an error exit status (2) -- Execution is retried (5)
[df/88a4ed] NOTE: Process consensus_classification (1) terminated with an error exit status (2) -- Execution is retried (5)
[7d/9aba43] NOTE: Process consensus_classification (7) terminated with an error exit status (2) -- Execution is retried (3)
[97/865ac9] NOTE: Process consensus_classification (6) terminated with an error exit status (2) -- Execution is retried (5)
[a7/819c20] NOTE: Process consensus_classification (4) terminated with an error exit status (2) -- Execution is retried (5)
[55/9ada5e] NOTE: Process consensus_classification (5) terminated with an error exit status (2) -- Execution is retried (5)
Error executing process > 'consensus_classification (3)'

Caused by:
Process consensus_classification (3) terminated with an error exit status (2)

Command executed:

export BLASTDB=
export BLASTDB=$BLASTDB:/tmp/db/taxdb/
blastn -query consensus.fasta -db /tmp/db/16S_ribosomal_RNA -task blastn -dust no -outfmt "10 sscinames staxids evalue length pident" -evalue 11 -max_hsps 50 -max_target_seqs 5 | sed 's/,/;/g' > consensus_classification.csv
#DECIDE FINAL CLASSIFFICATION
cat 2_draft.log > 2_blast.log
echo -n ";" >> 2_blast.log
BLAST_OUT=$(cut -d";" -f1,2,4,5 consensus_classification.csv | head -n1)
echo $BLAST_OUT >> 2_blast.log

Command exit status:
2

Command output:
(empty)

Command error:
BLAST Database error: No alias or index file found for nucleotide database [/tmp/db/16S_ribosomal_RNA] in search path [/home/bcl2fastq/Schreibtisch/Metagenomics/NanoCLUST/NanoCLUST/work/4d/1076da07aa671eabdee31a99520c1b::/tmp/db/taxdb:]

Work dir:
/home/bcl2fastq/Schreibtisch/Metagenomics/NanoCLUST/NanoCLUST/work/4d/1076da07aa671eabdee31a99520c1b

Many thanks,
Fritz

pipeline error

executor > local (5)
[b6/f83e29] process > demultiplex_porechop (1) [100%] 1 of 1, cached: 1 ✔
[46/63d408] process > QC (2) [100%] 12 of 12, cached: 12 ✔
[a2/a381f3] process > fastqc (8) [100%] 12 of 12, cached: 12 ✔
[7b/f93ed0] process > kmer_freqs (7) [100%] 12 of 12, cached: 12 ✔
[b4/723722] process > read_clustering (5) [100%] 11 of 11, cached: 9
[01/ed4cdd] process > split_by_cluster (3) [ 89%] 8 of 9, cached: 6, failed: 2
[- ] process > read_correction [ 0%] 0 of 70
[- ] process > draft_selection -
[- ] process > racon_pass -
[- ] process > medaka_pass -
[- ] process > consensus_classification -
[- ] process > join_results -
[- ] process > get_abundances -
[- ] process > plot_abundances -
[d6/8e92ba] process > output_documentation [100%] 1 of 1, cached: 1 ✔
Error executing process > 'split_by_cluster (9)'

Caused by:
Missing output file(s) *[0-9]*.log expected by process split_by_cluster (9)

Command executed:

sed 's/\srunid.*//g' BC07_qced_reads_set.fastq > only_id_header_readfile.fastq
CLUSTERS_CNT=$(awk '($5 ~ /[0-9]/) {print $5}' hdbscan.output.tsv | sort -nr | uniq | head -n1)

for ((i = 0 ; i <= $CLUSTERS_CNT ; i++));
do
cluster_id=$i
awk -v cluster="$cluster_id" '($5 == cluster) {print $1}' hdbscan.output.tsv > $cluster_id_ids.txt
seqtk subseq only_id_header_readfile.fastq $cluster_id_ids.txt > $cluster_id.fastq
READ_COUNT=$(( $(awk '{print $1/4}' <(wc -l $cluster_id.fastq)) ))
echo -n "$cluster_id;$READ_COUNT" > $cluster_id.log
done

Command exit status:
0

Command output:
(empty)

Work dir:
/home/administrator/Desktop/Bovine_Mastitis_Project/Mastitis_nanopore_data/Project_1/Project1/Project1/20190612_1214_MN26935_FAK72557_229de4aa/work/ba/711a8ad94186576f7e4db76c222be8

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz

trouble installing

I have tried installing nextflow through both methods cited on their page: using curl, and using bioconda. I have then tried to test the nextflow pipeline for Nanoclust using 3 different conda environments for 3 different python versions. For python versions 3.4 or 3.7 the environment fails to resolve the kmer_freqs package. For python version 2.7, the pipeline gets further, but still fails to resolve the environment for the read_correction module.

For both I used the following command:

nextflow run main.nf -profile test,conda

I am pasting links for my error logs here:
https://vedantabio.box.com/s/ocuwb22128serr59tob3r3se7bzw8aez
https://vedantabio.box.com/s/ccmpuqez10vqd3gmpn6aramx760dwpog

Missing output file(s) `consensus_medaka.fasta/consensus.fasta` expected by process `medaka_pass (3)`

I have just met a problem.
Please help me.

executor > local (28)
[76/736bf5] process > QC (1) [100%] 1 of 1 ✔
[74/e820fa] process > fastqc (1) [100%] 1 of 1 ✔
[c5/0fca5a] process > kmer_freqs (1) [100%] 1 of 1 ✔
[8d/4fc55f] process > read_clustering (1) [100%] 1 of 1 ✔
[d1/a50527] process > split_by_cluster (1) [100%] 1 of 1 ✔
[42/5f1770] process > read_correction (6) [100%] 6 of 6 ✔
[3a/e0fc08] process > draft_selection (6) [100%] 6 of 6 ✔
[73/3a41d1] process > racon_pass (6) [100%] 6 of 6 ✔
[5d/fe60c2] process > medaka_pass (3) [ 50%] 3 of 6, failed: 1
[- ] process > consensus_classification [ 0%] 0 of 2
[- ] process > join_results -
[- ] process > get_abundances -
[- ] process > plot_abundances -
[18/a8674b] process > output_documentation [100%] 1 of 1 ✔
Error executing process > 'medaka_pass (3)'

Caused by:
Missing output file(s) consensus_medaka.fasta/consensus.fasta expected by process medaka_pass (3)

Command executed:

if medaka_consensus -i corrected_reads.correctedReads.fasta -d racon_consensus.fasta -o consensus_medaka.fasta -t 4 -m r941_min_high_g303 ; then
echo "Command succeeded"
else
cat racon_consensus.fasta > consensus_medaka.fasta
fi

Command exit status:
0

Command output:
(empty)

Command error:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/envs/medaka_pass/lib/python3.6/site-packages/medaka/medaka.py", line 39, in call
model_fp = medaka.models.resolve_model(val)
File "/opt/conda/envs/medaka_pass/lib/python3.6/site-packages/medaka/models.py", line 58, in resolve_model
" the internet and try again.".format(model))
medaka.models.DownloadError: The model file for r941_min_high_g303 is not already installed and could not be downloaded. Check you are connected to the internet and try again.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/envs/medaka_pass/bin/medaka", line 11, in
sys.exit(main())
File "/opt/conda/envs/medaka_pass/lib/python3.6/site-packages/medaka/medaka.py", line 612, in main
args = parser.parse_args()
File "/opt/conda/envs/medaka_pass/lib/python3.6/argparse.py", line 1734, in parse_args
args, argv = self.parse_known_args(args, namespace)
File "/opt/conda/envs/medaka_pass/lib/python3.6/argparse.py", line 1766, in parse_known_args
namespace, args = self._parse_known_args(args, namespace)
File "/opt/conda/envs/medaka_pass/lib/python3.6/argparse.py", line 1954, in _parse_known_args
positionals_end_index = consume_positionals(start_index)
File "/opt/conda/envs/medaka_pass/lib/python3.6/argparse.py", line 1931, in consume_positionals
take_action(action, args)
File "/opt/conda/envs/medaka_pass/lib/python3.6/argparse.py", line 1840, in take_action
action(self, namespace, argument_values, option_string)
File "/opt/conda/envs/medaka_pass/lib/python3.6/argparse.py", line 1137, in call
subnamespace, arg_strings = parser.parse_known_args(arg_strings, None)
File "/opt/conda/envs/medaka_pass/lib/python3.6/argparse.py", line 1766, in parse_known_args
namespace, args = self._parse_known_args(args, namespace)
File "/opt/conda/envs/medaka_pass/lib/python3.6/argparse.py", line 1954, in _parse_known_args
positionals_end_index = consume_positionals(start_index)
File "/opt/conda/envs/medaka_pass/lib/python3.6/argparse.py", line 1931, in consume_positionals
take_action(action, args)
File "/opt/conda/envs/medaka_pass/lib/python3.6/argparse.py", line 1840, in take_action
action(self, namespace, argument_values, option_string)
File "/opt/conda/envs/medaka_pass/lib/python3.6/argparse.py", line 1137, in call
subnamespace, arg_strings = parser.parse_known_args(arg_strings, None)
File "/opt/conda/envs/medaka_pass/lib/python3.6/argparse.py", line 1766, in parse_known_args
namespace, args = self._parse_known_args(args, namespace)
File "/opt/conda/envs/medaka_pass/lib/python3.6/argparse.py", line 1972, in _parse_known_args
start_index = consume_optional(start_index)
File "/opt/conda/envs/medaka_pass/lib/python3.6/argparse.py", line 1912, in consume_optional
take_action(action, args, option_string)
File "/opt/conda/envs/medaka_pass/lib/python3.6/argparse.py", line 1840, in take_action
action(self, namespace, argument_values, option_string)
File "/opt/conda/envs/medaka_pass/lib/python3.6/site-packages/medaka/medaka.py", line 42, in call
raise RuntimeError(msg.format(self.dest, str(e)))
RuntimeError: Error validating model from '--model' argument: The model file for r941_min_high_g303 is not already installed and could not be downloaded. Check you are connected to the internet and try again..

Work dir:
/home/syu/nano/results/work/5d/fe60c2c93072c4fdeab8de9a735c6c

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

Global OTU table

Dear all,

thanks for providing this workflow. I'm wondering if the pipeline does global clustering
on overall sequences when using wildcards in --reads option (*.fastq) ?
I have tested and i'm getting independent results folders for each fastq files.
Is there option to get unified results as a classic OTU table that allow to compare samples.

Thanks for your help
Etienne

read clustering error

Error executing process > 'read_clustering (1)'

Caused by:
Process read_clustering (1) terminated with an error exit status (1)

Command executed [/NanoporeTools/NanoCLUST/templates/umap_hdbscan.py]:

#!/usr/bin/env python

import numpy as np
import umap
import matplotlib.pyplot as plt
from sklearn import decomposition
import random
import pandas as pd
import hdbscan

df = pd.read_csv("freqs.txt", delimiter=" ")

#UMAP
motifs = [x for x in df.columns.values if x not in ["read", "length"]]
X = df.loc[:,motifs]
X_embedded = umap.UMAP(n_neighbors=15, min_dist=0.1, verbose=2).fit_transform(X)

df_umap = pd.DataFrame(X_embedded, columns=["D1", "D2"])
umap_out = pd.concat([df["read"], df["length"], df_umap], axis=1)

#HDBSCAN
X = umap_out.loc[:,["D1", "D2"]]
umap_out["bin_id"] = hdbscan.HDBSCAN(min_cluster_size=int(50), cluster_selection_epsilon=int(0.5)).fit_predict(X)

for cluster in np.sort(umap_out['bin_id'].unique()):
read = umap_out.loc[umap_out['bin_id'] == cluster].iloc[0]
plt.annotate(str(cluster), (read['D1'], read['D2']), weight='bold', size=14)

plt.savefig('hdbscan.output.png')
umap_out.to_csv("hdbscan.output.tsv", sep=" ", index=False)

Command exit status:
1

Command output:
UMAP(verbose=2)
Construct fuzzy simplicial set
Wed Jul 22 18:14:19 2020 Finding Nearest Neighbors
Wed Jul 22 18:14:19 2020 Building RP forest with 15 trees
Wed Jul 22 18:14:23 2020 NN descent for 15 iterations
0 / 15
1 / 15
2 / 15
3 / 15
4 / 15
5 / 15
6 / 15
7 / 15
8 / 15
9 / 15
10 / 15
11 / 15
12 / 15
13 / 15
14 / 15
Wed Jul 22 18:14:58 2020 Finished Nearest Neighbor Search

Command error:
Traceback (most recent call last):
File ".command.sh", line 16, in
X_embedded = umap.UMAP(n_neighbors=15, min_dist=0.1, verbose=2).fit_transform(X)
File "/NanoporeTools/Bovine_Mastitis_Project/P4_LSK109/p4_LSK109/nanoclust/work/conda/read_clustering-5be9e67ee159b7a180825e360a9ef1d5/lib/python3.8/site-packages/umap/umap_.py", line 1960, in fit_transform
self.fit(X, y)
File "/NanoporeTools/Bovine_Mastitis_Project/P4_LSK109/p4_LSK109/nanoclust/work/conda/read_clustering-5be9e67ee159b7a180825e360a9ef1d5/lib/python3.8/site-packages/umap/umap_.py", line 1781, in fit
self._search_graph.transpose()
File "/NanoporeTools/Bovine_Mastitis_Project/P4_LSK109/p4_LSK109/nanoclust/work/conda/read_clustering-5be9e67ee159b7a180825e360a9ef1d5/lib/python3.8/site-packages/scipy/sparse/lil.py", line 437, in transpose
return self.tocsr(copy=copy).transpose(axes=axes, copy=False).tolil(copy=False)
File "/NanoporeTools/Bovine_Mastitis_Project/P4_LSK109/p4_LSK109/nanoclust/work/conda/read_clustering-5be9e67ee159b7a180825e360a9ef1d5/lib/python3.8/site-packages/scipy/sparse/lil.py", line 462, in tocsr
_csparsetools.lil_get_lengths(self.rows, indptr[1:])
File "_csparsetools.pyx", line 109, in scipy.sparse._csparsetools.lil_get_lengths
File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
File "stringsource", line 349, in View.MemoryView.memoryview.cinit
TypeError: a bytes-like object is required, not 'list'

Work dir:
/NanoporeTools/Bovine_Mastitis_Project/P4_LSK109/p4_LSK109/nanoclust/work/bd/409eb66eca121aada74eafb4a2e7e3

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

Default BLAST DB location

Should the default DB location be set to /tmp/ as below?

NanoCLUST/main.nf

Line 430 in 26dc468

blast_dir = "/tmp/"

When I manually specify the DB location at the command line and use the Docker profile, the '/tmp' is appended to my input and classification fails.

Installation of NanoCLUST

Hi,

I would like to install NanoCLUST, but I am not sure how to install it as I cannot find the command for installation of NanoCLUST from either Github or Conda. I have installed the Nextflow and conda in my desktop. What should I do next to install NanoCLUST? Thank you.

ValueError: invalid literal for int() with base 10: 'Bradyrhizobium mercantei'

Samples that contain reads that are identified as "Bradyrhizobium mercantei" appear to be breaking the pipeline during the estimate abundance phase. When I look into the consensus_classification.csv file I see that the reported name is reported twice as below

Bradyrhizobium viridifuturi;Bradyrhizobium mercantei;1654716;1904807;0.0;1412;99.079

Is this caused by the DB or does this happen as a result of the api call? Any way to fix it?

-Devin

More error output for context below

Run Name: nasty_almeida

####################################################
## nf-core/nanoclust execution completed unsuccessfully! ##
####################################################
The exit status of the task that caused the workflow execution to fail was: 1.
The full error message was:

Error executing process > 'get_abundances (6)'

Caused by:
  Process `get_abundances (6)` terminated with an error exit status (1)

Command executed [/data/NanoCLUST/templates/get_abundance.py]:

Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
  Traceback (most recent call last):
    File ".command.sh", line 55, in <module>
      get_abundance(names,paths, "C")
    File ".command.sh", line 49, in get_abundance
      df_final_grp = merge_abundance(dfs, tax_level)
    File ".command.sh", line 39, in merge_abundance
      df_final["taxid"] = [get_taxname(row["taxid"], tax_level) for index, row in df_final.iterrows()]
    File ".command.sh", line 39, in <listcomp>
      df_final["taxid"] = [get_taxname(row["taxid"], tax_level) for index, row in df_final.iterrows()]
    File ".command.sh", line 16, in get_taxname
      path = 'http://api.unipept.ugent.be/api/v1/taxonomy.json?input[]=' + str(int(tax_id)) + '&extra=true&names=true'
  ValueError: invalid literal for int() with base 10: 'Bradyrhizobium mercantei'

Work dir:
  /data/NanoCLUST/work/ae/a2eb5c00b570f7810d8dc2a60a7b5e

Pipeline Configuration:
-----------------------
 - Run Name: nasty_almeida
 - Reads: /data/PERM/PERM16S_20201028/sample_fasta/PERM16S_20201028.barcode*.qcreads.fastq
 - Max Resources: 128 GB memory, 16 cpus, 10d time per job
 - Container: docker - [:]
 - Output dir: /data/PERM/PERM16S_20201028/NanoCLUST.all
 - Launch dir: /data/NanoCLUST
 - Working dir: /data/NanoCLUST/work
 - Script dir: /data/NanoCLUST
 - User: dmdrown
 - Config Profile: docker
 - Date Started: 2020-12-03T06:51:55.682511-09:00
 - Date Completed: 2020-12-03T09:04:28.976429-09:00
 - Pipeline script file path: /data/NanoCLUST/main.nf
 - Pipeline script hash ID: 41a36e29b6db0c14a411b4f911c51f5e
 - Nextflow Version: 20.10.0
 - Nextflow Build: 5430
 - Nextflow Compile Timestamp: 01-11-2020 15:14 UTC

--

unable to specify hdbscan option "-min_samples"

Hi,

in the manual of hdbscan (https://hdbscan.readthedocs.io/en/latest/parameter_selection.html) it is specified that the "min_samples" option is a essential parameter. However, checking umap_hdbscan.py and the usage file, it does not seem to be possible to edit this parameter in NanoCLUST. Why is this not possible and are you planning to implement this option in the future?

Thank you in advance.

Regards,
Rob

Error in the get abundances step, by taxa not giving a response from Unipept.

Hi, I am running the nanoclust pipeline and I get an error in the get_abundances step that I am not understanding. I noticed it for multiple samples in my dataset. When I removed 12 samples from my analysis, my pipeline finished.

This is the input file for one of the samples: barcode26.trimmed.fastq.nanoclust_out.txt

id;reads_in_cluster;used_for_consensus;reads_after_corr;draft_id;sciname;taxid;length;per_ident
0;181;100;42;13ab7554-0823-4a77-a740-fb92763d8bda id=43;Candidatus Pelagibacter ubique HTCC1062;335992;1332;98.874
2;52;100;43;4d30559a-0e2e-4a4b-8fb2-464c37a509b9 id=10;Taeania maliponensis;1819564;1414;89.321
1;179;100;43;250f2279-6f02-47c1-9214-5d85b77c3376 id=92;Actinomarinicola tropica;2789776;1353;83.444
14;113;100;43;e730d27b-e8ac-4c97-b95c-d181d3624b8a id=60;Arenibacter nanhaiticus;558155;1417;92.802
18;63;100;43;2318b187-9a34-4ec0-94f0-185f27dbd9d0 id=34;Vicingus serpentipes;1926625;1417;87.862
22;84;100;43;37d2db52-9fdd-42d9-89f7-36e97cb5d13f id=4;Thalassotalea atypica;2054316;1434;92.678
12;72;100;43;52d92061-2e07-4e69-a78a-812ddc2f47e5 id=32;Gimesia maris;122;1459;87.731
4;57;100;44;5d2f6aec-09e3-47f0-b834-5e1e9e2d55cc id=17;Mesobacillus rigiliprofundi;1523158;1406;79.943
21;106;100;45;a87c285d-d8e0-46cb-b01e-df7945fec255 id=60;Coraliomargarita akajimensis;395922;1461;86.858
3;390;100;43;2e9306dd-f766-4b85-a273-a518dbaa2033 id=57;Methylotenera mobilis JLW8;583345;1443;93.763
24;186;100;43;4e907808-91a5-4700-a053-3194a2a38fde id=9;Alkalimarinus sediminis;1632866;1437;91.858
11;268;100;43;7f343204-08f0-45cc-b460-56ab9ff3b7d5 id=60;Alkalimarinus sediminis;1632866;1455;86.460
28;653;100;43;e9b91a9d-570d-4e6b-aba4-1989476e2fec id=27;Thalassomonas haliotis;485448;1439;95.483
9;60;100;43;747d1e37-e8ad-4a3f-af37-d2db083523c5 id=3;Altibacter lentus;1223410;1422;90.717
10;139;100;43;3dc2ee32-fff8-4e16-ab7a-6bd1fd68f971 id=50;Thiolapillus brandeum;1076588;1449;86.888
20;79;100;43;69f61715-71f1-443a-bdca-757af76e6235 id=23;Longimicrobium terrae;1639882;1402;83.024
27;105;100;43;dea0c225-44fb-447d-90c0-3b0966c6c067 id=48;Colwellia psychrerythraea;28229;1446;97.925
13;225;100;43;2360e51e-64ae-4bb1-a460-fb90dfd67d53 id=56;Polaribacter atrinae;1333662;1423;96.767
17;108;100;43;188ee7a4-ee57-4768-89c8-f776b65089fe id=94;Owenweeksia hongkongensis DSM 17368;926562;1412;87.890
26;60;100;44;7a0650e3-5ee5-4d28-9da6-61cb3c109162 id=51;Butyratibacter algicola;2029869;1184;92.230
25;151;100;42;a619e085-8431-49cb-afcf-2a95bafb9bf1 id=45;Pelagimonas varians;696760;1021;96.572
29;1573;100;43;6751acaa-f6a6-4f54-8390-74892f37d560 id=78;Thalassomonas haliotis;485448;1447;95.508
6;145;100;43;7ee9dbec-bbef-4115-9766-680fb3f80083 id=85;Owenweeksia hongkongensis DSM 17368;926562;1423;88.335
8;59;100;43;2426ac8d-0d30-4546-90d9-bc69ebc59995 id=37;Poseidonibacter lekithochrous;1904463;1407;95.736
16;1144;100;43;8f7269e2-4bfa-4e66-93a6-f04f6f583365 id=67;Glaciecola amylolytica;2489595;1421;96.904
19;140;100;44;ada6c9bd-7ab0-407a-b074-a48aafeb48df id=65;Gemmatimonas aurantiaca T-27;379066;1222;84.943
23;83;100;43;35f5b040-c792-463c-8528-630b82aea3bd id=10;Pseudohongiella nitratireducens;1768907;1436;94.777
15;2382;100;43;04c44679-e020-4e24-a05a-cbf3c3cdad62 id=80;Glaciecola amylolytica;2489595;1416;96.398
7;68;100;43;a2a507f7-e93c-40bc-bc59-5ac68aabf671 id=30;Poseidonibacter lekithochrous;1904463;1402;92.939
5;130;100;43;d5a1a6c7-6464-4ea7-be4c-be42d07e0c4f id=10;Phaeocystidibacter marisrubri;1577780;1416;90.042

And this is the .command.log output:

Traceback (most recent call last):
  File "/cluster/work/users/thhaverk/nanoclust_tmp/c0/7c1d408ddcb9bded4d2821d2232ea0/.command.sh", line 65, in <module>
    get_abundance(names,paths, "G")
  File "/cluster/work/users/thhaverk/nanoclust_tmp/c0/7c1d408ddcb9bded4d2821d2232ea0/.command.sh", line 59, in get_abundance
    df_final_grp = merge_abundance(dfs, tax_level)
  File "/cluster/work/users/thhaverk/nanoclust_tmp/c0/7c1d408ddcb9bded4d2821d2232ea0/.command.sh", line 49, in merge_abundance
    df_final["taxid"] = [get_taxname(row["taxid"], tax_level) for index, row in df_final.iterrows()]
  File "/cluster/work/users/thhaverk/nanoclust_tmp/c0/7c1d408ddcb9bded4d2821d2232ea0/.command.sh", line 49, in <listcomp>
    df_final["taxid"] = [get_taxname(row["taxid"], tax_level) for index, row in df_final.iterrows()]
  File "/cluster/work/users/thhaverk/nanoclust_tmp/c0/7c1d408ddcb9bded4d2821d2232ea0/.command.sh", line 28, in get_taxname
    return json.loads(complete_tax)[0][tax_level_tag]
IndexError: list index out of range

It seems as the genus step of the script is causing the error, but I am not understanding why?

This is the .command.sh file for the process:

#!/usr/bin/env python

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import rc
import pandas as pd
from functools import reduce
import requests
import json
#https://unipept.ugent.be/apidocs/taxonomy

def get_taxname(tax_id,tax_level):
    tags = {"S": "species_name","G": "genus_name","F": "family_name","O":'order_name', "C": "class_name"}
    tax_level_tag = tags[tax_level]
    #Avoids pipeline crash due to "nan" classification output. Thanks to Qi-Maria from Github
    if str(tax_id) == "nan":
        tax_id = 1

    path = 'http://api.unipept.ugent.be/api/v1/taxonomy.json?input[]=' + str(int(tax_id)) + '&extra=true&names=true'
    complete_tax = requests.get(path).text

    #Checks for API correct response (field containing the tax name). Thanks to devinbrown from Github
    try:
        name = json.loads(complete_tax)[0][tax_level_tag]
    except:
        name = str(int(tax_id))

    return json.loads(complete_tax)[0][tax_level_tag]

def get_abundance_values(names,paths):
 dfs = []
    for name,path in zip(names,paths):
        data = pd.read_csv(path, index_col=False, sep=';').iloc[:,1:]

        total = sum(data['reads_in_cluster'])
        rel_abundance=[]

        for index,row in data.iterrows():
            rel_abundance.append(row['reads_in_cluster'] / total)

        data['rel_abundance'] = rel_abundance
        dfs.append(pd.DataFrame({'taxid': data['taxid'], 'rel_abundance': rel_abundance}))
        data.to_csv("" + name + "_nanoclust_out.txt")

    return dfs

def merge_abundance(dfs,tax_level):
    df_final = reduce(lambda left,right: pd.merge(left,right,on='taxid',how='outer').fillna(0), dfs)
    df_final["taxid"] = [get_taxname(row["taxid"], tax_level) for index, row in df_final.iterrows()]
    df_final_grp = df_final.groupby(["taxid"], as_index=False).sum()
    return df_final_grp

def get_abundance(names,paths,tax_level):
    if(not isinstance(paths, list)):
        paths = [paths]
        names = [names]

    dfs = get_abundance_values(names,paths)
    df_final_grp = merge_abundance(dfs, tax_level)
    df_final_grp.to_csv("rel_abundance_"+ names[0] + "_" + tax_level + ".csv", index = False)

paths = "barcode26.trimmed.fastq.nanoclust_out.txt"
names = "barcode26.trimmed.fastq"

get_abundance(names,paths, "G")
get_abundance(names,paths, "S")
get_abundance(names,paths, "O")
get_abundance(names,paths, "F")

swap memory issues

Im having an issue running the pipeline on an Ubuntu 18.04 LTS machine.
Im running it through a Docker profile as suggested, however it looks like there is a problem in memory allocation as the whole process stops as soon as the kmer_freqs process initiates (see the error message below)

Error executing process > 'kmer_freqs (172)'

Caused by:
  Process `kmer_freqs (172)` terminated with an error exit status (1)

Command executed:

  kmer_freq.py -r BC11_qced_reads_set.fastq > freqs.txt

Command exit status:
  1

Command output:
  (empty)

Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
  Traceback (most recent call last):
    File "/path/to/NanoCLUST/bin/kmer_freq.py", line 244, in <module>
      main(args)
    File "/path7to/NanoCLUST/bin/kmer_freq.py", line 216, in main
      ftype  = check_input_format(args.qced_reads)
    File "/path/to/NanoCLUST/bin/kmer_freq.py", line 207, in check_input_format
      if line[0]=="@":
  UnboundLocalError: local variable 'line' referenced before assignment

Work dir:
  /path/to/NanoCLUST/work/3d/6cfe8a5e749c3a98f56d1ce51b0d91

Now Ive tried to look at how to fix memory swap issues on Docker, but I don't seem to come across any good tutorial for a newbie like myself.
Has anyone got a clue what I need to do in order to fix this, or if there is an option in nanoCLUST to set a limit on the amount of processes to run simultaneously?

Appreciate any help

Martin

Using slurm instead of local

Hi, I have forked this repo and in order to get it to work on our hpc cluster, I would like to implement slurm usage in the pipeline. But I see that some steps in the pipeline want to download things from the internet. On our cluster it not possible to download things once a slurm job is running. It is only possible on the login nodes. So I think of deciding that for each step in the pipeline (a bit messy, I think)

But in order to do that, I think I need to modify the nextflow.config file further.

A suggestion would be to create something like this, with the profiles: test, conda, docker and singularity.
I have wrapped the processes just for clarity
Singularity is used on our cluster, so for me it makes sense to include the slurm configuration in the singularity process, by pointing to a slurm configuration file. like this: includeConfig 'conf/slurm.config'
But it might also be possible to make another profile called slurm, which includes 1) what the slurm configuration is and 2) which processes should use slurm to run. And then run the workflow with the profile settings slurm,singularity, just like test,singularity.

One issue with slurm is that you have to indicate an account. That has to be modified by users that would like to use it on a different platform. So if this is added, then I also need to add a little section on slurm to the main readme documentation.

Let me know what you think

profiles {
  test { includeConfig 'conf/test.config' }
  conda {
    process {
      withName: demultiplex { conda = "$baseDir/conda_envs/demultiplex/environment.yml" }
      ...
      withName: output_documentation { conda = "$baseDir/conda_envs/output_documentation/environment.yml" }
    }
  }
  docker {
    docker.enabled = true
    //process.container = 'nf-core/nanoclust:latest'
    process {
      withName: demultiplex { container = 'hecrp/nanoclust-demultiplex' }
      ...
      withName: output_documentation { container = 'hecrp/nanoclust-output_documentation' }
    }
    }
    singularity {
      includeConfig 'conf/slurm.config' ### Pointer to slurm config file.
      singularity.enabled = true
      singularity.autoMounts = true
      //process.container = 'nf-core/nanoclust:latest'
      process {
        withName: demultiplex { container = 'docker://hecrp/nanoclust-demultiplex' }
        ...
        withName: output_documentation { container = 'docker://hecrp/nanoclust-output_documentation' }
      }
      }
}

Error consensus classification

Hey!

For our project we need to make our own pipeline for 16S Nanopore data and comparing it with an existing pipeline. The data is trimmed with Porechop. We are using NanoCLUST to compare output. We are experiencing still the same issue with NanoCLUST namely with classifying the consensus of our reads.

The following command is used in shell:

nextflow run nanoclust/NanoCLUST/main.nf -profile docker --reads test_data/testdata_bc01_115000reads.fastq --db nanoclust/NanoCLUST/db/16S_ribosomal_RNA --tax nanoclust/NanoCLUST/db/taxdb

The following error is occuring:

Error executing process > 'consensus_classification (1)'

Caused by:
Process consensus_classification (1) terminated with an error exit status (2)

Command executed:

export BLASTDB=
export BLASTDB=$BLASTDB:/tmp/nanoclust/NanoCLUST/db/taxdb
blastn -query consensus.fasta -db /tmp/nanoclust/NanoCLUST/db/16S_ribosomal_RNA -task blastn -dust no -outfmt "10 sscinames staxids evalue length pident" -evalue 11 -max_hsps 50 -max_target_seqs 5 | sed 's/,/;/g' > consensus_classification.csv
#DECIDE FINAL CLASSIFFICATION
cat 33_draft.log > 33_blast.log
echo -n ";" >> 33_blast.log
BLAST_OUT=$(cut -d";" -f1,2,4,5 consensus_classification.csv | head -n1)
echo $BLAST_OUT >> 33_blast.log

Command exit status:
2

Command output:
(empty)

Command error:
BLAST Database error: No alias or index file found for nucleotide database [/tmp/nanoclust/NanoCLUST/db/16S_ribosomal_RNA] in search path [/mnt/StudentFiles/2020-21/Project11/project/work/64/555f24ca05d9c9f4c0807c78fb671c::/tmp/nanoclust/NanoCLUST/db/taxdb:]

Work dir:
/mnt/StudentFiles/2020-21/Project11/project/work/64/555f24ca05d9c9f4c0807c78fb671c

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

We already read a closed issue which had the same issue as we have, but we still do not understand where the issue lies and how to solve the issue.

Thank you in advance!

Kind Regards,

Nadine

Taxid output "Nan"

Hi,
When I run NanoCLUST, I got a taxid "nan" and the whole program crushed. I added few lines code in get_abundance.py by assigning taxids to the root. Is that correct way to solve the problem? or Is there any other way you suggest?
Like this:
def get_taxname(tax_id,tax_level):
tags = {"S": "species_name","G": "genus_name","F": "family_name","O":'order_
name', "C": "class_name", "P": "phylum_name"}
tax_level_tag = tags[tax_level]
if str(tax_id) == "nan":
tax_id = 1

Phylum level

I noticed in the test output that there is no phylum level output file. Is that not supported? I couldn't find an option for it.
thanks :)

#Cannot read project manifest

Hello!

I met a problem when I tried to use this pipeline to analyze my data.
The code typed is :
nextflow run nf-core/nanoclust -profile conda --reads 'mergelength.fastq' --db "db/16S_ribosomal_RNA" --tax "db/taxdb/"

While the system reported:
N E X T F L O W ~ version 20.04.1
Pulling nf-core/nanoclust ...
WARN: Cannot read project manifest -- Cause: Remote resource not found: https://api.github.com/repos/nf-core/nanoclust/contents/nextflow.config
Remote resource not found: https://api.github.com/repos/nf-core/nanoclust/contents/main.nf

Is there something wrong?
Thanks!

Add full NCBI taxonomy to .nanoclust_out.txt file [Feature request]

Thanks for making this helpful analysis tool! Currently, the .nanoclust_out.txt looks like (providing the sciname and taxid of each read cluster):

id;reads_in_cluster;used_for_consensus;reads_after_corr;draft_id;sciname;taxid;length;per_ident
3;688;20;19;d16baee5-a0ca-4b2d-9b6c-8fe2f704a892 id=20;Staphylococcus aureus;1280;1467;99.387
1;686;20;19;6f3fee71-2ec6-4472-b668-f55cbfa125da id=16;Bacillus halotolerans;260554;1473;99.457
7;674;20;19;51c4bec0-ca0a-4b1d-878e-c986dca6736b id=9;Shigella sonnei;624;1462;99.453

Then, rel_abundance_ ... .csv files are made that summarize relative abundances of taxa at various taxonomic levels, I assume based on mapping the classifications of individual clusters to the NCBI taxdump.

I am wondering if it is possible to add the full NCBI taxonomy (rather than just the sciname and taxid) to the .nanoclust_out.txt file, rather than just including this information in the rel_abundance_ ... .csv files. This change would be very helpful for downstream analyses. Alternatively, even adding the full taxonomy path to the individual cluster__ output directories could work.

I imagine this is a low priority feature request if you are doing other major development to the code, but I thought I would create this feature request in case the code update is relatively straightforward. Again, thank you for your work on this tool.

Launching 'main.nf' [nauseous_albattani] - revision: 4b892cc54e

This is the message that appears as I put the command nextflow run main.nf -profile test, conda,

How can be solved?

Agnese

read clustering issue

Hi,

I keep getting errors while clustering reads, see error code below. Does anyone know how to troubleshoot this?

Thanks in advance.

#!/usr/bin/env python

import numpy as np
import umap
import os
os.environ['MPLCONFIGDIR'] = "~/NanoCLUST"
import matplotlib.pyplot as plt
from sklearn import decomposition
import random
import pandas as pd
import hdbscan

df = pd.read_csv("freqs.txt", delimiter=" ")

#UMAP
motifs = [x for x in df.columns.values if x not in ["read", "length"]]
X = df.loc[:,motifs]
X_embedded = umap.UMAP(n_neighbors=15, min_dist=0.1, verbose=2).fit_transform(X)

df_umap = pd.DataFrame(X_embedded, columns=["D1", "D2"])
umap_out = pd.concat([df["read"], df["length"], df_umap], axis=1)

#HDBSCAN
X = umap_out.loc[:,["D1", "D2"]]
umap_out["bin_id"] = hdbscan.HDBSCAN(min_cluster_size=int(50), cluster_selection_epsilon=int(0.5)).fit_predict(X)

for cluster in np.sort(umap_out['bin_id'].unique()):
read = umap_out.loc[umap_out['bin_id'] == cluster].iloc[0]
plt.annotate(str(cluster), (read['D1'], read['D2']), weight='bold', size=14)

plt.savefig('hdbscan.output.png')
umap_out.to_csv("hdbscan.output.tsv", sep=" ", index=False)

Command exit status:
139

Command output:
(empty)

Command error:
.command.run: line 159: 23 Segmentation fault (core dumped) /usr/bin/env python .command.sh

Work dir:
/home/ec2-user/NanoCLUST/work/0c/e056c7f5bc18e9ca237da8653175f5

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

Error in read_clustering step

Hi,

I have run about 40 fastq files on NanoCLUST. However, there were 3 fastq files showed error in read_clustering step as shown in the below picture.

I am wondering why just certain fastq files have this problem. Does anyone know how to solve the error in read_clustering step?
Thank you.

BLAST Database error: No alias or index file found for nucleotide database

Hello,

I have followed the instructions to download the required databases, but this problems still persists:

Command executed:

export BLASTDB=
export BLASTDB=$BLASTDB:/home/ubuntu/bin/NanoCLUST/db/taxdb
blastn -query consensus.fasta -db /home/ubuntu/bin/NanoCLUST/db/16S_ribosomal_RNA -task blastn -dust no -outfmt "10 sscinames staxids evalue length pident" -evalue 11 -max_hsps 50 -max_target_seqs 5 | sed 's/,/;/g' > consensus_classification.csv
#DECIDE FINAL CLASSIFFICATION
cat 11_draft.log > 11_blast.log
echo -n ";" >> 11_blast.log
BLAST_OUT=$(cut -d";" -f1,2,4,5 consensus_classification.csv | head -n1)
echo $BLAST_OUT >> 11_blast.log

Command exit status:
2

Command output:
(empty)

Command error:
BLAST Database error: No alias or index file found for nucleotide database [/home/ubuntu/bin/NanoCLUST/db/16S_ribosomal_RNA] in search path [/media/kraken/NanoCLUST/work/ff/483a7d41cb9ec59d392dda702d6293::/home/ubuntu/bin/NanoCLUST/db/taxdb:]

Work dir:
/media/kraken/NanoCLUST/work/ff/483a7d41cb9ec59d392dda702d6293

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

Performing differential OTU using nanoclust data

The pipeline ran after removing the samples which was resulting in error 
executor >  local (6046)
[a2/6ffd4c] process > QC (69)                         [100%] 73 of 73 ✔
[c6/b1e07e] process > fastqc (73)                     [100%] 73 of 73 ✔
[0b/655df7] process > kmer_freqs (67)                 [100%] 73 of 73 ✔
[1c/1ca964] process > read_clustering (71)            [100%] 73 of 73 ✔
[79/e62849] process > split_by_cluster (73)           [100%] 73 of 73 ✔
[a9/c29c19] process > read_correction (1046)          [100%] 1048 of 1048 ✔
[27/40b6e0] process > draft_selection (1048)          [100%] 1048 of 1048 ✔
[3a/e5d1e2] process > racon_pass (1048)               [100%] 1048 of 1048 ✔
[8b/936ff4] process > medaka_pass (1048)              [100%] 1048 of 1048 ✔
[a2/d78685] process > consensus_classification (1048) [100%] 1050 of 1050, failed: 2, retries: 2 ✔
[42/532a74] process > join_results (73)               [100%] 73 of 73 ✔
[b3/746fed] process > get_abundances (73)             [100%] 73 of 73 ✔
[db/dd86da] process > plot_abundances (292)           [100%] 292 of 292 ✔
[84/796ce6] process > output_documentation            [100%] 1 of 1 ✔
[nf-core/nanoclust] Pipeline completed successfully
WARN: [nf-core/nanoclust] Could not attach MultiQC report to summary email
Completed at: 20-May-2021 20:14:01
Duration    : 1h 57m 23s
CPU hours   : 97.6 (0% failed)
Succeeded   : 6'044
Failed      : 2

Previously i ran kraken2 where I would generate OTU table from various class and then perform differential OTU using deseq2 as it was raw counts.

How to do the same with the nanoclust output? It gives relative abundances.

Any suggestion how to go about this

kmer_freq error

Dear NanoCLUST team, the system reported an error when I analysis my data. Do you have any advice or suggestion?
Thanks a lot!

run test fail

when run
nextflow run ~/software/NanoCLUST/main.nf -profile test,docker
fails with:

N E X T F L O W ~ version 20.10.0
Launching `/home/azshara/software/NanoCLUST/main.nf` [lonely_sinoussi] - revision: c3b5ee2f3d
WARNING: Could not load nf-core/config profiles: https://raw.githubusercontent.com/nf-core/configs/master/nfcore_custom.config

  _   __                     ________    __  _____________
 / | / /___ _____  ____     / ____/ /   / / / / ___/_  __/
/  |/ / __ `/ __ \/ __ \   / /   / /   / / / /\__ \ / /

/ /| / // / / / / // / / // // // // // /
// |/_,// //_/ _/__/_//___///

NanoCLUST v1.0dev

Run Name : lonely_sinoussi
Reads : /home/azshara/software/NanoCLUST/test_datasets/mock4_run3bc08_5000.fastq
Max Resources : 128 GB memory, 16 cpus, 10d time per job
Container : docker - [:]
Output dir : ./results
Launch dir : /home/azshara
Working dir : /home/azshara/work
Script dir : /home/azshara/software/NanoCLUST
User : azshara
Config Profile : test,docker
Config Description: Minimal test dataset to check pipeline function

executor > local (38)
[f8/8f1bd5] process > QC (1) [100%] 1 of 1 ✔
[dd/348f4d] process > fastqc (1) [100%] 1 of 1 ✔
[a2/fef066] process > kmer_freqs (1) [100%] 1 of 1 ✔
[a9/d714f6] process > read_clustering (1) [100%] 1 of 1 ✔
[93/54a059] process > split_by_cluster (1) [100%] 1 of 1 ✔
[7c/acb0eb] process > read_correction (4) [100%] 8 of 8 ✔
[01/f7adb8] process > draft_selection (6) [100%] 8 of 8 ✔
[d6/9be817] process > racon_pass (7) [100%] 8 of 8 ✔
[77/ce919c] process > medaka_pass (8) [ 0%] 0 of 8
[- ] process > consensus_classification -
[- ] process > join_results -
[- ] process > get_abundances -
[- ] process > plot_abundances -
[db/83f799] process > output_documentation [100%] 1 of 1 ✔
Error executing process > 'medaka_pass (3)'

Caused by:
Missing output file(s) consensus_medaka.fasta/consensus.fasta expected by process medaka_pass (3)

Command executed:

Command exit status:
0

Command output:
(empty)

Command error:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/envs/medaka_pass/bin/medaka", line 11, in
sys.exit(main())
File "/opt/conda/envs/medaka_pass/lib/python3.6/site-packages/medaka/medaka.py", line 612, in main
args = parser.parse_args()
File "/opt/conda/envs/medaka_pass/lib/python3.6/argparse.py", line 1734, in parse_args
args, argv = self.parse_known_args(args, namespace)
File "/opt/conda/envs/medaka_pass/lib/python3.6/argparse.py", line 1766, in parse_known_args
namespace, args = self._parse_known_args(args, namespace)
File "/opt/conda/envs/medaka_pass/lib/python3.6/argparse.py", line 1954, in _parse_known_args
positionals_end_index = consume_positionals(start_index)
File "/opt/conda/envs/medaka_pass/lib/python3.6/argparse.py", line 1931, in consume_positionals
take_action(action, args)
File "/opt/conda/envs/medaka_pass/lib/python3.6/argparse.py", line 1840, in take_action
action(self, namespace, argument_values, option_string)
File "/opt/conda/envs/medaka_pass/lib/python3.6/argparse.py", line 1137, in call
subnamespace, arg_strings = parser.parse_known_args(arg_strings, None)
File "/opt/conda/envs/medaka_pass/lib/python3.6/argparse.py", line 1766, in parse_known_args
namespace, args = self._parse_known_args(args, namespace)
File "/opt/conda/envs/medaka_pass/lib/python3.6/argparse.py", line 1954, in _parse_known_args
positionals_end_index = consume_positionals(start_index)
File "/opt/conda/envs/medaka_pass/lib/python3.6/argparse.py", line 1931, in consume_positionals
take_action(action, args)
File "/opt/conda/envs/medaka_pass/lib/python3.6/argparse.py", line 1840, in take_action
action(self, namespace, argument_values, option_string)
File "/opt/conda/envs/medaka_pass/lib/python3.6/argparse.py", line 1137, in call
subnamespace, arg_strings = parser.parse_known_args(arg_strings, None)
File "/opt/conda/envs/medaka_pass/lib/python3.6/argparse.py", line 1766, in parse_known_args
namespace, args = self._parse_known_args(args, namespace)
executor > local (38)
[f8/8f1bd5] process > QC (1) [100%] 1 of 1 ✔
[dd/348f4d] process > fastqc (1) [100%] 1 of 1 ✔
[a2/fef066] process > kmer_freqs (1) [100%] 1 of 1 ✔
[a9/d714f6] process > read_clustering (1) [100%] 1 of 1 ✔
[93/54a059] process > split_by_cluster (1) [100%] 1 of 1 ✔
[7c/acb0eb] process > read_correction (4) [100%] 8 of 8 ✔
[01/f7adb8] process > draft_selection (6) [100%] 8 of 8 ✔
[d6/9be817] process > racon_pass (7) [100%] 8 of 8 ✔
[ba/db44e5] process > medaka_pass (1) [ 14%] 1 of 7, failed: 1
[- ] process > consensus_classification -
[- ] process > join_results -
[- ] process > get_abundances -
[- ] process > plot_abundances -
[db/83f799] process > output_documentation [100%] 1 of 1 ✔
Error executing process > 'medaka_pass (3)'

Caused by:
Missing output file(s) consensus_medaka.fasta/consensus.fasta expected by process medaka_pass (3)

Command executed:

Command exit status:
0

Command output:
(empty)

Command error:

During handling of the above exception, another exception occurred:

Work dir:
/home/azshara/work/ce/72aa71bf8bf6c6d1161b33d8892b72

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

Not a valid project name: main.nf

Ive cloned the repo and ran the single sample analysis in docker, successfully. When I go to try and run my own samples the pipeline stops immediately with this error:
Not a valid project name: main.nf
If I try to run the single sample analysys again, with
nextflow run main.nf -profile test,docker
I now get the same error.

Also, on https://nf-co.re/pipelines I can't find the nanoCLUST pipeline and its documentation, is this deprecated now?

Thanks for any help

Martin

Error executing process > 'output_documentation'

Dear Everyone,

I just have failed to run this program. /(ㄒoㄒ)/~~
The command I used is like this "nextflow run ../NanoCLUST-master/main.nf -profile test,conda"

'''

[nf-core/nanoclust] Pipeline completed with errors
Error executing process > 'output_documentation'

Caused by:
Process output_documentation terminated with an error exit status (1)

Command executed:

markdown_to_html.py 3pipeline_output.md -o results_description.html

Command exit status:
1

Command output:
(empty)

Command error:
.command.sh: line 2: /home/wuxiaoyun/ztz/20210623/NanoCLUST-master/bin/markdown_to_html.py: Permission denied

Work dir:
/home/wuxiaoyun/ztz/20210623/result/work/fe/10ff2b21a20e21eeefe3622fd92c80

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

'''
'''

[- ] process > QC -
[- ] process > fastqc -
executor > local (1)
executor > local (1)
[51/737004] process > QC (1) [100%] 1 of 1 ✔
executor > local (2)
[51/737004] process > QC (1) [100%] 1 of 1 ✔
executor > local (2)
[51/737004] process > QC (1) [100%] 1 of 1 ✔
executor > local (2)
[51/737004] process > QC (1) [100%] 1 of 1 ✔
executor > local (2)
[51/737004] process > QC (1) [100%] 1 of 1 ✔
executor > local (2)
[51/737004] process > QC (1) [100%] 1 of 1 ✔
executor > local (2)
[51/737004] process > QC (1) [100%] 1 of 1 ✔
[- ] process > fastqc -
[- ] process > kmer_freqs -
[- ] process > read_clustering -
[- ] process > split_by_cluster -
[- ] process > read_correction -
[- ] process > draft_selection -
[- ] process > racon_pass -
[- ] process > medaka_pass -
[- ] process > consensus_classification -
[- ] process > join_results -
[- ] process > get_abundances -
[- ] process > plot_abundances -
[fe/10ff2b] process > output_documentation [100%] 1 of 1, failed: 1 ✘
Creating Conda env: /home/wuxiaoyun/ztz/20210623/NanoCLUST-master/conda_envs/kmer_freqs/environment.yml [cache /home/wuxiaoyun/ztz/20210623/result/work/conda/kmer_freq-999fbd7473cd6106e662a85d832c592d]
Creating Conda env: /home/wuxiaoyun/ztz/20210623/NanoCLUST-master/conda_envs/fastqc/environment.yml [cache /home/wuxiaoyun/ztz/20210623/result/work/conda/fastqc_multiqc-fddb8d2dd7b722375e53387e3daf8470]
Execution cancelled -- Finishing pending tasks before exit

'''

I am new in bioinformatics.can you help me to solve this problem?
Thank you very much! (^_^)

genomicsiter / nanoclust Goto Github PK

nanoclust's Introduction

NanoCLUST

Introduction

Quick Start

Computing requirements note

Troubleshooting

Credits

Contributions and Support

nanoclust's People

Contributors

Stargazers

Watchers

Forkers

nanoclust's Issues

` NanoCLUST v1.0dev

NanoCLUST v1.0dev

Dear NANOClust team, I wanted to give a shot to your pipeline. I proceed with a fresh install of Miniconda and nextflow. After clonning this repo and the NCBI database, I just launched a test with nextflow run main.nf -profile test,conda and get the following error. Could you help me? Thanks:

NanoCLUST v1.0dev

N E X T F L O W ~ version 20.10.0 Launching /home/azshara/software/NanoCLUST/main.nf [lonely_sinoussi] - revision: c3b5ee2f3d WARNING: Could not load nf-core/config profiles: https://raw.githubusercontent.com/nf-core/configs/master/nfcore_custom.config

NanoCLUST v1.0dev

Recommend Projects

Recommend Topics

Recommend Org

`
NanoCLUST v1.0dev

Dear NANOClust team, I wanted to give a shot to your pipeline. I proceed with a fresh install of Miniconda and nextflow. After clonning this repo and the NCBI database, I just launched a test with `nextflow run main.nf -profile test,conda` and get the following error. Could you help me? Thanks:

N E X T F L O W ~ version 20.10.0
Launching `/home/azshara/software/NanoCLUST/main.nf` [lonely_sinoussi] - revision: c3b5ee2f3d
WARNING: Could not load nf-core/config profiles: https://raw.githubusercontent.com/nf-core/configs/master/nfcore_custom.config