bu-isciii / plasmidid

PlasmidID is a mapping-based, assembly-assisted plasmid identification tool that analyzes sequencing data and provides a graphical solution for plasmid identification.

Home Page: https://github.com/BU-ISCIII/plasmidID/wiki

License: GNU General Public License v3.0

Shell 84.76% Dockerfile 0.28% Python 14.97%
plasmid ngs ngs-analysis microbiology whole-genome-sequencing

plasmidid's Introduction

Introduction

PlasmidID is a mapping-based, assembly-assisted plasmid identification tool that analyzes sequencing data and provides a graphical solution for plasmid identification.

PlasmidID is a computational pipeline implemented in Bash that maps Illumina reads against plasmid database sequences. The k-mer filtered, most covered sequences are clustered by identity to avoid redundancy, and the longest sequence of each cluster is used as a scaffold for plasmid reconstruction. Reads are assembled, and the contigs are annotated with both automatic and user-specified annotation. All information generated from the mapping, assembly, annotation and local alignment analyses is gathered and represented in a circular image that allows the user to determine the plasmid composition of any bacterial sample.

Requirements

Software

Plasmid database

Since v1.5.1, the plasmid database can be downloaded with the following command:

 download_plasmid_database.py -o FOLDER

Installation

Install from source

Install all dependencies and add them to $PATH

git clone https://github.com/BU-ISCIII/plasmidID.git

Add plasmidID and ./bin to $PATH

Install using conda

This option is recommended.

Install Anaconda3

conda install -c conda-forge -c bioconda plasmidid

Wait for the environment to solve

Ignore warnings/errors

Use Docker

Example: Clone the repo:

git clone git@github.com:BU-ISCIII/plasmidID.git
cd plasmidID

Run it with the test data using docker:

Note that the input files MUST be in your present working directory or in a folder inside it. For example, if the command is run from /home/smonzon, the folder with the files could be /home/smonzon/test.

docker run -v $PWD:$PWD -w $PWD buisciii/plasmidid plasmidID \
     -1 test/KPN_TEST_R1.fastq.gz  \
     -2 test/KPN_TEST_R2.fastq.gz \
     -d test/plasmids_TEST_database.fasta \
     -c test/contigs_KPN_TEST.fasta \
     --no-trim \
     -s KPN
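
The `-v $PWD:$PWD` mount only exposes the current directory to the container, which is why the inputs have to live under it. A minimal Python sketch of a pre-flight check follows; `inputs_outside_cwd` is a hypothetical helper name and is not part of plasmidID.

```python
from pathlib import Path


def inputs_outside_cwd(paths, cwd=None):
    """Return the input paths that do not live under the working directory.

    Hypothetical helper, not part of plasmidID: with `-v $PWD:$PWD`, only
    files below the current directory are visible inside the container.
    """
    base = (Path(cwd) if cwd is not None else Path.cwd()).resolve()
    outside = []
    for p in paths:
        # Relative paths are interpreted against the working directory.
        resolved = (base / p).resolve()
        # Visible only if the working directory is the path itself or a parent.
        if resolved != base and base not in resolved.parents:
            outside.append(str(p))
    return outside
```

Calling `inputs_outside_cwd(["test/KPN_TEST_R1.fastq.gz"])` before `docker run` should return an empty list when the file is reachable from the mounted directory.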

Quick usage

Illumina paired-end

plasmidID \
-1 SAMPLE_R1.fastq.gz  \
-2 SAMPLE_R2.fastq.gz \
-d YYYY-MM-DD_plasmids.fasta \
-c SAMPLE_assembled_contigs.fasta \
--no-trim \
-s SAMPLE

SMRT sequencing (only contigs)

plasmidID \
-d YYYY-MM-DD_plasmids.fasta \
-c SAMPLE_contigs.fasta \
-s SAMPLE

Annotate any fasta you want

plasmidID \
-d YYYY-MM-DD_plasmids.fasta \
-c SAMPLE_assembled_contigs.fasta \
-a annotation_file \
-s SAMPLE

More info about annotation file

If there are several samples in the same GROUP folder

summary_report_pid.py -i NO_GROUP/

Usage

usage : plasmidID <-1 R1> <-2 R2> <-d database(fasta)> <-s sample_name> [-g group_name] [options]

	Mandatory input data:
	-1 | --R1	<filename>	reads corresponding to paired-end R1 (mandatory)
	-2 | --R2	<filename>	reads corresponding to paired-end R2 (mandatory)
	-d | --database	<filename>	database to map and reconstruct (mandatory)
	-s | --sample	<string>	sample name (mandatory), less than 37 characters

	Optional input data:
	-g | --group	<string>	group name (optional). If unset, samples will be gathered in NO_GROUP group
	-c | --contigs	<filename>	file with contigs. If supplied, plasmidID will not assemble reads
	-a | --annotate <filename>	file with configuration file for specific annotation
	-o 		<output_dir>	output directory, by default is the current directory

	Pipeline options:
	--explore	Relaxes default parameters to find less reliable relationships within data supplied and database
	--only-reconstruct	Database supplied will not be filtered and all sequences will be used as scaffold
						This option does not require R1 and R2, instead a contig file can be supplied
	-w 			Undo winner takes it all algorithm when clustering by kmer - QUICKER MODE

	Trimming:
	--trimmomatic-directory Indicate directory holding trimmomatic .jar executable
	--no-trim	Reads supplied will not be quality trimmed

	Coverage and Clustering:
	-C | --coverage-cutoff	<int>	minimum coverage percentage to select a plasmid as scaffold (0-100), default 80
	-S | --coverage-summary	<int>	minimum coverage percentage to include plasmids in summary image (0-100), default 90
	-f | --cluster	<int>	kmer identity to cluster plasmids into the same representative sequence (0 means identical) (0-1), default 0.5
	-k | --kmer	<int>	identity to filter plasmids from the database with kmer approach (0-1), default 0.95

	Contig local alignment
	-i | --alignment-identity <int>	minimum identity percentage aligned for a contig to annotate, default 90
	-l | --alignment-percentage <int>	minimum length percentage aligned for a contig to annotate, default 20
	-L | --length-total	<int>	minimum alignment length to filter blast analysis
	--extend-annotation <int>	look for annotation over regions with no homology found (base pairs), default 500bp

	Draw images:
	--config-directory <dir>	directory holding config files, default config_files/
	--config-file-individual <file-name> file name of the individual file used to reconstruct

	Additional options:
	-M | --memory	<int>	max memory allowed to use
	-T | --threads	<int>	number of threads
	-v | --version		version
	-h | --help		display usage message

example: ./plasmidID.sh -1 ecoli_R1.fastq.gz -2 ecoli_R2.fastq.gz -d database.fasta -s ECO_553 -g ENTERO
	./plasmidID.sh -1 ecoli_R1.fastq.gz -2 ecoli_R2.fastq.gz -d PacBio_sample.fasta -c scaffolds.fasta -C 60 -s ECO_60 -g ENTERO --no-trim
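
When plasmidID is launched from a script, the argument list can be assembled and checked before anything runs. The sketch below uses only flags from the usage text above; the function name is hypothetical, and the rule "reads, contigs, or both" is an assumption inferred from the Quick usage examples rather than something plasmidID itself enforces in this way.

```python
def build_plasmidid_command(sample, database, r1=None, r2=None, contigs=None,
                            group=None, no_trim=False, coverage_cutoff=None):
    """Assemble a plasmidID argument list from the documented options."""
    if len(sample) >= 37:
        raise ValueError("sample name must be less than 37 characters")
    if (r1 is None) != (r2 is None):
        raise ValueError("R1 and R2 must be supplied together")
    # Assumption: a run needs paired reads, contigs, or both.
    if r1 is None and contigs is None:
        raise ValueError("supply paired reads, contigs, or both")

    cmd = ["plasmidID"]
    if r1 is not None:
        cmd += ["-1", r1, "-2", r2]
    cmd += ["-d", database]
    if contigs is not None:
        cmd += ["-c", contigs]
    if coverage_cutoff is not None:
        if not 0 <= coverage_cutoff <= 100:
            raise ValueError("coverage cutoff must be between 0 and 100")
        cmd += ["-C", str(coverage_cutoff)]
    if no_trim:
        cmd.append("--no-trim")
    cmd += ["-s", sample]
    if group is not None:
        cmd += ["-g", group]
    return cmd
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with file names.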

Examples

Under construction

Output

Since v1.6, the most relevant output is located in the GROUP/SAMPLE folder:

  • SAMPLE_final_results.html(.tab)
    • id: Name of the accession number of reference
    • length: length of the reference sequence
    • species: species of the reference sequence
    • description: rest of reference fasta header
    • contig_name: numbers of the contigs that align at least the minimum required for the complete contig track
    • SAMPLE:
      • Image of the reconstructed plasmid (click to open in new tab)
      • MAPPING % (percentage): percentage of reference covered with reads
        • X for contig mode (gray colour)
        • Indicative colouring (the closer to 100% the better)
      • ALIGN FR (fraction_covered): total length of contigs aligned (complete) / reference sequence length
        • Indicative colouring (the closer to 1 the better)
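
The `.tab` version of the report can also be filtered programmatically. The sketch below assumes a tab-separated file with a header row and columns named `id`, `percentage` and `fraction_covered`; these names are taken from the field descriptions above and may not match the real header exactly. The thresholds and the function name are illustrative only.

```python
import csv
import io


def well_supported_plasmids(tab_text, min_mapping=90.0, min_fraction=0.8):
    """Return reference ids whose mapping % and aligned fraction pass the thresholds.

    Assumes columns `id`, `percentage` and `fraction_covered`; the real
    column names in SAMPLE_final_results.tab may differ.
    """
    reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
    kept = []
    for row in reader:
        mapping = row["percentage"]
        # Contig-only runs show "X" instead of a mapping percentage.
        mapping_ok = mapping == "X" or float(mapping) >= min_mapping
        if mapping_ok and float(row["fraction_covered"]) >= min_fraction:
            kept.append(row["id"])
    return kept
```

For a real run, read the file content with `open(path).read()` and adjust the column names to whatever the header actually contains.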

Annotation file

Under construction

Illustrated pipeline

This image summarizes the PlasmidID pipeline, including the most important steps. For further details, including:

workflow_small

plasmidid's People

Contributors

migueljulia, pedroscampoy, saramonzon

plasmidid's Issues

Case with no plasmids found

I got this error:

.
.
.
DRAWING CIRCOS IMAGES (Thu Sep 17 09:31:05 CEST 2020)
 An image per putative plasmid will be drawn having into account all data supplied.
 Additionally a summary image will be created to determine redundancy within remaining plasmids

DONE, files can be found at /myhome/NO_GROUP/81009/images

CREATING SUMMARY REPORT (Thu Sep 17 09:31:24 CEST 2020)
 An html report with miniatures of the images will be generate with useful statistics to determine the correct plasmids in the sample.
Namespace(group=False, input_folder='/myhome/NO_GROUP/81009')
Creating summary
Wrong number of items passed 7, placement implies 1
Traceback (most recent call last):
  File "/myhome/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'images'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/myhome/.local/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 1071, in set
    loc = self.items.get_loc(item)
  File "/myhome/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'images'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/plasmidID/bin/summary_report_pid.py", line 465, in <module>
    main()
  File "/usr/local/plasmidID/bin/summary_report_pid.py", line 458, in main
    final_individual_dataframe = include_images(input_folder, summary_df)
  File "/usr/local/plasmidID/bin/summary_report_pid.py", line 135, in include_images
    summary_df['images'] = summary_df.apply(lambda x: image_finder(x, sample_folder), axis=1)
  File "/myhome/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2938, in __setitem__
    self._set_item(key, value)
  File "/myhome/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 3001, in _set_item
    NDFrame._set_item(self, key, value)
  File "/myhome/.local/lib/python3.6/site-packages/pandas/core/generic.py", line 3624, in _set_item
    self._data.set(key, value)
  File "/myhome/.local/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 1074, in set
    self.insert(len(self.items), item, value)
  File "/myhome/.local/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 1181, in insert
    block = make_block(values=value, ndim=self.ndim, placement=slice(loc, loc + 1))
  File "/myhome/.local/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 3047, in make_block
    return klass(values, ndim=ndim, placement=placement)
  File "/myhome/.local/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 2595, in __init__
    super().__init__(values, ndim=ndim, placement=placement)
  File "/myhome/.local/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 125, in __init__
    f"Wrong number of items passed {len(self.values)}, "
ValueError: Wrong number of items passed 7, placement implies 1

---------------------------------------

ERROR in Script plasmidID on or near line 1075; exiting with status 1
MESSAGE:

See /myhome/logs/plasmidID.log for more information.
command:
summary_report_pid.py -i /myhome/NO_GROUP/81009 -g

---------------------------------------

This error occurs because the files (81009.bedgraph_term, 81009.gff.coordinates, 81009.gff.forward.coordinates, 81009.gff.reverse.coordinates, 81009.plasmids.blast.links, 81009.plasmids.complete, 81009.plasmids.links, pID_highlights.conf and pID_text_annotation.coordinates) are empty.

If no plasmids are selected by the filters, it would be nice to handle this case.
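
One way to handle this case would be to look for empty track files before the report step is called. The following is only a sketch of that idea, not plasmidID's behaviour; `empty_files` is a hypothetical helper and the caller decides which folder and file names to check.

```python
import os


def empty_files(folder, names):
    """Return the names from `names` that are missing or zero bytes in `folder`."""
    problems = []
    for name in names:
        path = os.path.join(folder, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            problems.append(name)
    return problems
```

If the returned list covers every track file, a wrapper could print "no plasmids passed the filters" and skip `summary_report_pid.py` instead of failing with a pandas traceback.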

Incorrect number of sequence in mashclust.py

First I got this message:

.
.
.
SCREENING READS WITH KMERS (Wed Sep 16 14:01:18 CEST 2020)
 Reads will be screened against database supplied for further filtering and mapping,
 this will reduce the input sequences to map against 1351

CLUSTERING SEQUENCES BY KMER DISTANCE (Wed Sep 16 14:01:29 CEST 2020)
 Sequences obtained after screen will be clustered to reduce redundancy,
 one representative, the largest, will be considered for further analysis 1351

---------------------------------------

ERROR in Script plasmidID on or near line 754; exiting with status 1
MESSAGE:

See /myhome/logs/plasmidID.log for more information.
 command: mashclust.py -i /myhome/NO_GROUP/mystrain/kmer/database.filtered_0.95_term.fasta -d 0.5

---------------------------------------

In the log file, I got this:

.
.
.
#Executing /usr/local/plasmidID/bin/filter_fasta.sh 

Output directory is /myhome/NO_GROUP/1351/kmer
Wed Sep 16 14:01:28 CEST 2020
Filtering terms on file 2020-09-03_plasmids.fasta
Wed Sep 16 14:01:29 CEST 2020
DONE Filtering terms on file 2020-09-03_plasmids.fasta
File with filtered sequences can be found in /myhome/NO_GROUP/1351/kmer/database.filtered_0.95_term.fasta
Previous number of sequences= 686
Post number of sequences= 687


Namespace(distance=0.5, input_file='/myhome/NO_GROUP/mystrain/kmer/database.filtered_0.95_term.fasta', output=False, output_grouped=False)
Obtaining mash distance
Command mash FAILED
WITH PARAMETERS: dist -i -p 10 /myhome/NO_GROUP/mystrain/kmer/database.filtered_0.95_term.fasta /myhome/NO_GROUP/mystrain/kmer/database.filtered_0.95_term.fasta
EXIT-CODE: -11
ERROR:
Sketching /myhome/NO_GROUP/mystrain/kmer/database.filtered_0.95_term.fasta (provide sketch file made with "mash sketch" to skip)...
Obtaining cluster from distance
list index out of range
Traceback (most recent call last):
  File "/usr/local/plasmidID/bin/mashclust.py", line 396, in <module>
    main()
  File "/usr/local/plasmidID/bin/mashclust.py", line 386, in main
    cluster_df = big_pairwise_to_cluster(mash_file, threshold=args.distance)
  File "/usr/local/plasmidID/bin/mashclust.py", line 244, in big_pairwise_to_cluster
    cluster_df_return = cluster_df.stack().droplevel(1).reset_index().rename(columns={'index': 'group', 0: 'id'})
  File "/myhome/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 6251, in stack
    return stack(self, level, dropna=dropna)
  File "/myhome/.local/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 543, in stack
    dtype = dtypes[0]
IndexError: list index out of range

In the /myhome/NO_GROUP/mystrain/kmer/database.filtered_0.95_term.fasta file, I found 687 sequences, one of them empty (only ">"). I removed the empty sequence and the pipeline continued without error.
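
The manual workaround described above can be scripted. This is a minimal sketch that drops FASTA records with an empty header or no sequence; `drop_empty_records` is a hypothetical name, and it works on text in memory rather than on the pipeline's files.

```python
def drop_empty_records(fasta_text):
    """Return FASTA text without records that have an empty header or no sequence."""
    records = []  # list of (header line, [sequence lines])
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            records.append((line, []))
        elif records and line.strip():
            records[-1][1].append(line.strip())

    out = []
    for header, seq_lines in records:
        # Keep a record only if it has a name after ">" and at least one sequence line.
        if header[1:].strip() and seq_lines:
            out.append(header)
            out.extend(seq_lines)
    return "".join(line + "\n" for line in out)
```

Applied to the filtered database before `mashclust.py` runs, this would remove the bare ">" entry reported here.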

Relative image paths in HTML

Hi there,

Many thanks for this tool! We use it in the @nf-core / viralrecon pipeline. I'm new to this type of analysis and am just poking around my first proper run of the pipeline.

I noticed that in the HTML report, the image embeds aren't working for me. When I look at the HTML source, the <img> src is an absolute file path back to the location that the pipeline ran on the server. The above pipeline is written in Nextflow which executes each sub-task in its own directory before copying final results to a folder. So even if I hadn't moved the report from my cluster that would still break each report.

I wonder if the HTML report could use relative image paths instead of absolute ones? If I edit in the browser from this:

<a href="file:///cluster/path/analysis/work/0a/7489264fc24b66e4ef634ad280a2303/NO_GROUP/P1234_567/images/P1234_567_REF.png" target="_blank">
<img src="file:///cluster/path/analysis/work/0a/7489264fc24b66e4ef634ad280a2303/NO_GROUP/P1234_567/images/P1234_567_REF.png" alt="REF">
</a>

to this:

<a href="images/P1234_567_REF.png" target="_blank">
<img src="images/P1234_567_REF.png" alt="REF">
</a>

..then the report can be moved around and the images load properly.

A very quick poke around the package source code (it's late here) makes me think that it's probably the following line that needs to be tweaked:

return 'file://' + os.path.join(root, name)

I guess that using a relative path here will propagate through to the HTML report? e.g. something like this:

return os.path.relpath(os.path.join(root, name), sample_folder)

However as I've only looked at the code for about 5 minutes, I'm not sure if this will affect functionality elsewhere and break stuff..

Anyway, let me know what you think! Happy to prepare a PR for this change next week if you like (or please go ahead if you'd rather as it's such a small change).

Phil
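
The suggestion in this thread can be tried in isolation. The sketch below shows how `os.path.relpath` turns an absolute image path into one relative to the report's folder; `relative_image_href` is a hypothetical name, and whether the report code can use it unchanged is untested.

```python
import os


def relative_image_href(image_path, report_dir):
    """Return `image_path` relative to the directory holding the HTML report."""
    rel = os.path.relpath(image_path, start=report_dir)
    # HTML links use forward slashes regardless of the operating system.
    return rel.replace(os.sep, "/")
```

Because the link no longer contains the server path, the sample folder can be copied elsewhere as long as the `images` folder moves with the report.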

hard coded path

It assumes a specific directory for the Trimmomatic install.

lib/quality_trim.sh

trimmomatic_directory=/opt/Trimmomatic/

Also a suggestion: check all dependencies before running, not just before each step.
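
An up-front check along these lines could look like the sketch below. The executable names are the ones plasmidID prints in its own dependency table (visible in the test log further down this page); the function name is hypothetical, and Trimmomatic is left out because it is a .jar file rather than an executable on $PATH.

```python
import shutil

# Executables listed in plasmidID's dependency table.
DEPENDENCIES = [
    "blastn", "bowtie2-build", "bowtie2", "bedtools",
    "prokka", "samtools", "mash", "circos",
]


def missing_dependencies(names=DEPENDENCIES, which=shutil.which):
    """Return the executables from `names` that are not found on $PATH."""
    return [name for name in names if which(name) is None]
```

A wrapper script could call `missing_dependencies()` once at start-up and exit with a clear message if the list is not empty.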

issues while running docker command plasmidID

Please update the docker run command: the executable in the Docker image is plasmidID.sh, not plasmidID.

docker run -v $PWD:$PWD -w $PWD buisciii/plasmidid plasmidID.sh \
     -1 test/KPN_TEST_R1.fastq.gz \
     -2 test/KPN_TEST_R2.fastq.gz \
     -d test/plasmids_TEST_database.fasta \
     -c test/contigs_KPN_TEST.fasta \
     --no-trim \
     -s KPN

Can't create database with multiple threads

Hi there,

I'd like to give plasmidID a shot! Therefore I was following the instructions provided on the wiki to create a database. Last night, I executed the last step using the following command:

cdhit_cluster.sh -i plasmids_term.fasta -p -M 20000 -c 100 -T 16

However, while I expected it to run on our 16 threads, I just noticed that it is actually using only a single thread. The blastn commands that run after executing this script also show the option -num_threads 1.

I'm guessing that clustering the sequences can take quite a while on a single thread, so I'd rather have it run in parallel. Any insights on how this could happen?

Thank you!

error reading database.filtered

Hi,
I tried to use the program with this command:
plasmidID -1 forward_paired.fq.gz -2 reverse_paired.fq.gz -d 2021-11-15_plasmids.fasta -c assembly.fasta --no-trim -s prova

and this error appeared:

ERROR in Script plasmidID on or near line 610; exiting with status 1
MESSAGE:

See logs/plasmidID.log for more information.
command:mashclust.py -i kmer/database.filtered_0.95_term.fasta -d 0.5

Am I doing anything wrong?

Albert

Wrong Contigs in respective fasta file

Hi there and thank you for this amazing tool!
However, I found that sometimes some contigs are added to the "joined" fasta although they are not mentioned in the results.html.

For example:
Contigs from a sample matched perfectly to the plasmid NZ_CP048350.1.
The matching contigs shown in the results.html were:
[14, 23, 34, 47, 50, 56, 57, 58, 62, 65, 69, 71, 78, 82, 95, 101, 102, 113, 116, 117, 118, 131, 151, 155]
The contigs added to the fasta were:
[14, 23, 34, 47, 50, 56, 57, 58, 62, 65, 69, 71, 78, 82, 95, 101, 102, 113, 116, 117, 118, 131, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 151, 155]

This happens occasionally and mainly with smaller contigs. Why is that?
Thank you for your support in advance!
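
The discrepancy can be pinned down with a set difference over the two lists quoted above. This sketch only compares the numbers from the report; it does not explain why plasmidID adds the extra contigs.

```python
def unreported_contigs(reported, in_fasta):
    """Return contigs present in the FASTA but absent from the HTML report."""
    return sorted(set(in_fasta) - set(reported))


reported = [14, 23, 34, 47, 50, 56, 57, 58, 62, 65, 69, 71, 78, 82, 95,
            101, 102, 113, 116, 117, 118, 131, 151, 155]
in_fasta = [14, 23, 34, 47, 50, 56, 57, 58, 62, 65, 69, 71, 78, 82, 95,
            101, 102, 113, 116, 117, 118, 131, 140, 141, 142, 143, 144,
            145, 146, 147, 148, 149, 151, 155]

print(unreported_contigs(reported, in_fasta))
# prints [140, 141, 142, 143, 144, 145, 146, 147, 148, 149]
```

The extra entries form one consecutive block (140 to 149), and no reported contig is missing from the FASTA.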

Case with too many plasmids

I got this error at the end of circos log file:

debuggroup summary 0.57s karyotype has 612 chromosomes of total size 67,339,346
 at /usr/local/circos-0.69-2/bin/../lib/Circos/Error.pm line 423, <F> line 612.
	Circos::Error::fatal_error("ideogram", "max_number", 612, 200) called at /usr/local/circos-0.69-2/bin/../lib/Circos.pm line 337
	Circos::run("Circos", "_argv", "-conf /myhome/NO_G"..., "_cwd", "/myhome", "configfile", "/myhome/NO_GROUP/1"...) called at /usr/local/circos-0.69-2/bin/circos line 529

  *** CIRCOS ERROR ***

      cwd: /myhome

      command: /usr/local/circos-0.69-2/bin/circos -conf
      /myhome/NO_GROUP/1351/images/1351_summary.circos.conf

  You have asked to draw [612] ideograms, but the maximum is currently set at
  [200]. To increase this number change max_ideograms in etc/housekeeping.conf.
  Keep in mind that drawing that many ideograms may create an image that is too
  busy and uninterpretable.

  If you are having trouble debugging this error, first read the best practices
  tutorial for helpful tips that address many common problems

      http://www.circos.ca/documentation/tutorials/reference/best_practices

  The debugging facility is helpful to figure out what's happening under the
  hood

      http://www.circos.ca/documentation/tutorials/configuration/debugging

  If you're still stumped, get support in the Circos Google Group. Please
  include this error and all your configuration and data files.

      http://groups.google.com/group/circos-data-visualization

  Stack trace:

I ran the summary_report_pid.py command manually and got the final summary as expected.

Would it be possible to catch this error and ignore it so that the final summary is still produced?
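
One possible guard is to count the ideograms in the karyotype file before calling Circos and skip the summary image when the limit would be exceeded. The sketch below relies on the Circos karyotype format, where each chromosome definition is a line starting with `chr`; the function names are hypothetical and the limit of 200 is the value quoted in the error message above.

```python
MAX_IDEOGRAMS = 200  # limit reported in the Circos error message


def count_ideograms(karyotype_text):
    """Count chromosome definitions (lines starting with `chr`) in a karyotype."""
    count = 0
    for line in karyotype_text.splitlines():
        fields = line.split()
        # Band definitions start with "band" and are not ideograms.
        if fields and fields[0] == "chr":
            count += 1
    return count


def summary_image_is_drawable(karyotype_text, limit=MAX_IDEOGRAMS):
    """Return True when Circos should accept the number of ideograms."""
    return count_ideograms(karyotype_text) <= limit
```

With 612 chromosomes, as in this log, the check would return False, and a wrapper could go straight to the summary report.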

circos error and minor error

circos throws an error as it seemingly cannot find this file (which, from what I can tell, does not exist):

NO_GROUP//data/.annotation.coordinates

also after lib/prokka_annotation.sh

Removing unwanted files
rm: cannot remove 'NO_GROUP//data/.val': No such file or directory
rm: cannot remove 'NO_GROUP//data/.gbf': No such file or directory

which appears to be removed earlier.

Unable to copy files to container

When I run:

sudo singularity build plasmidid.simg Singularity

I get:

WARNING: Authentication token file not found : Only pulls of public images will succeed
INFO: Starting build...
Getting image source signatures
Skipping fetch of repeat blob sha256:8ba884070f611d31cb2c42eddb691319dc9facf5e0ec67672fcfa135181ab3df
Copying config sha256:b9a1b1f0b2baaec83946a26d7045e4028f11eccc8b0e5b3514568e56a391beb2
1.05 KiB / 1.05 KiB [======================================================] 0s
Writing manifest to image destination
Storing signatures
INFO: Copying ./scif_app_recipes/* to /opt/
FATAL: container creation failed: unable to copy files to container fs: While copying ./scif_app_recipes/* to /tmp/sbuild-141638742/fs/opt: exit status 1
FATAL: While performing build: while running engine: exit status 255

I have initialized and updated the submodules and confirmed that the files exist.
I have tried on several computers and get the same error message on all of them.

line 487: syntax error near unexpected token `>'

Hi,
I installed plasmidID using bioconda in a separate conda environment and I tried to run it using the below command in which I supplied file names with their directory path. Kindly advise
plasmidID \
-1 SAMPLE_R1.fastq.gz \
-2 SAMPLE_R2.fastq.gz \
-d YYYY-MM-DD_plasmids.fasta \
-c SAMPLE_assembled_contigs.fasta \
--no-trim \
-s SAMPLE
All dependencies are installed; however, I get this error:
line 487: syntax error near unexpected token `>'
I am attaching the log file
plasmidID.log

Update Singularity

In your Singularity file, update the line scif install /opt/plasmidid_v1.4.0_centos7.scif to scif install /opt/plasmidid_v1.4.1_centos7.scif.

Could you add a few lines to your README.md on how to build the Docker or Singularity image (for Singularity, sudo singularity build plasmidid.simg Singularity in the git repo), or on how to download it if you have a Docker Hub or other online repository?

I am just starting to explore and test your tool, and it looks very nice. Keep it up =)

ERROR in Script plasmidID on or near line 612

Hi, I am working with plasmidID version 1.5.1 in conda.
During the read-mapping step, the following error appears:

ERROR in Script plasmidID on or near line 612

To solve that, I have been trying to install the new release 1.6.5, without success.
Can you help me with this issue?

Thank you in advance.
Best regards,
Sara

test.sh blast_align.sh - misses KPN.fna file

Hello,

When executing the test script as in

$ PATH=$PATH:/home/moeller/git/med-team/mmmulti/plasmidid/bin TEST_DATA/test.sh || true
Executing:../plasmidID -1 KPN_TEST_R1.fastq.gz -2 KPN_TEST_R2.fastq.gz -d plasmids_TEST_database.fasta -c contigs_KPN_TEST.fasta -s KPN --no-trim
Forward reads: KPN_TEST_R1.fastq.gz
Reverse reads: KPN_TEST_R2.fastq.gz
PlasmidDatabase: plasmids_TEST_database.fasta
Contigs: contigs_KPN_TEST.fasta
Options: --no-trim


------------------
Starting plasmidID version:1.6.3
------------------


CHECKING DEPENDENCIES AND MANDATORY FILES

DEPENDENCY                    STATUS
----------                    ------
blastn                       INSTALLED 
bowtie2-build                INSTALLED 
bowtie2                      INSTALLED 
bedtools                     INSTALLED 
prokka                       INSTALLED 
samtools                     INSTALLED 
mash                         INSTALLED 
circos                       INSTALLED 

Default output directory is: /home/moeller/git/med-team/mmmulti/plasmidid

Log will be saved in: /home/moeller/git/med-team/mmmulti/plasmidid/NO_GROUP/KPN/logs/plasmidID.log


No trim selected, skipping trimming step

Contigs supplied, ommiting assembly step


------------------
#Pipeline summary#
------------------
Reads R1                               KPN_TEST_R1.fastq.gz
Reads R2                               KPN_TEST_R2.fastq.gz
Will be mapped with ddbb               plasmids_TEST_database.fasta
Entries covered more than              80 %
Will be clustered by                   0.5 % identity
And used to reconstruct contigs in     contigs_KPN_TEST.fasta

 STARTING KMER FILTERING, CLUSTERING and MAPPING 


SCREENING READS WITH KMERS (Fri Jun 26 18:49:54 CEST 2020)
 Reads will be screened against database supplied for further filtering and mapping,
 this will reduce the input sequences to map against KPN

CLUSTERING SEQUENCES BY KMER DISTANCE (Fri Jun 26 18:49:59 CEST 2020)
 Sequences obtained after screen will be clustered to reduce redundancy,
 one representative, the largest, will be considered for further analysis KPN

MAPPING READS (Fri Jun 26 18:49:59 CEST 2020)
 Reads will be mapped against database supplied for further coverage calculation,
 this will determine the most likely plasmids in the sample KPN

FILTERING DATABASE BY COVERAGE (Fri Jun 26 18:51:03 CEST 2020)
 Coverage will be calculated and the entries covered more than 80%
 will pass to further analysis

 STARTING CONTIG ALIGNMENT and ANNOTATON


OBTAINING KARYOTYPE TRACKS (Fri Jun 26 18:51:05 CEST 2020)
 A file with the informatin of putative plasmid and its length will be generated.


OBTAINING COVERAGE TRACK (Fri Jun 26 18:51:05 CEST 2020)
 A bedgraph file containing mapping information for filtered plasmids will be generated.



-------------------------
#Pipeline reconstruction#
-------------------------
Contigs                                contigs_KPN_TEST.fasta
Will be aligned to                     KPN.coverage_adapted_filtered_80_term.fasta
That contains                          7 plasmids
And each contig aligned more than      20 %
and have at least                      90 % identity
Will be represented and annotated       

ANNOTATING CONTIGS (Fri Jun 26 18:51:07 CEST 2020)
 A file including all automatic annotations on contigs will be generated.


ALIGNING CONTIGS TO FILTERED PLASMIDS (Fri Jun 26 18:51:07 CEST 2020)
 Contigs are aligned to filtered plasmids and those are selected by alignment identity and alignment percentage in order to create links, full length and annotation tracks


---------------------------------------

ERROR in Script plasmidID on or near line 836; exiting with status 1
MESSAGE:

See /home/moeller/git/med-team/mmmulti/plasmidid/logs/plasmidID.log for more information.
command:
blast_align.sh -i  /home/moeller/git/med-team/mmmulti/plasmidid/NO_GROUP/KPN/data/KPN".fna" -d /home/moeller/git/med-team/mmmulti/plasmidid/NO_GROUP/KPN/mapping/KPN.coverage_adapted_filtered_80_term.fasta -o  /home/moeller/git/med-team/mmmulti/plasmidid/NO_GROUP/KPN/data -p plasmids

and there is no log file from what I saw. Executing the blast_align directly, I get

$ PATH=$PATH:$(pwd)/bin blast_align.sh -i  /home/moeller/git/med-team/mmmulti/plasmidid/NO_GROUP/KPN/data/KPN".fna" -d /home/moeller/git/med-team/mmmulti/plasmidid/NO_GROUP/KPN/mapping/KPN.coverage_adapted_filtered_80_term.fasta -o  /home/moeller/git/med-team/mmmulti/plasmidid/NO_GROUP/KPN/data -p plasmids

#Executing /home/moeller/git/med-team/mmmulti/plasmidid/bin/blast_align.sh 

KPN.fna not supplied, please, introduce a valid file
ERROR: 1 missing files, aborting execution

And a log file says that it is all prokka's fault:


#Executing /home/moeller/git/med-team/mmmulti/plasmidid/bin/prokka_annotation.sh 


DEPENDENCY                    STATUS
----------                    ------
prokka                       INSTALLED 
PREFIX KPN
Output directory is /home/moeller/git/med-team/mmmulti/plasmidid/NO_GROUP/KPN/data
Fri Jun 26 18:51:07 CEST 2020
Annotating /home/moeller/git/med-team/mmmulti/plasmidid/TEST_DATA/contigs_KPN_TEST.fasta with prokka
[18:51:07] This is prokka 1.14.6
[18:51:07] Written by Torsten Seemann <[email protected]>
[18:51:07] Homepage is https://github.com/tseemann/prokka
[18:51:07] Local time is Fri Jun 26 18:51:07 2020
[18:51:07] You are moeller
[18:51:07] Operating system is linux
[18:51:07] You have BioPerl 1.7.7
Argument "1.7.7" isn't numeric in numeric lt (<) at /usr/bin/prokka line 259.
[18:51:07] System has 2 cores.
[18:51:07] Will use maximum of 1 cores.
[18:51:07] Annotating as >>> Bacteria <<<
[18:51:07] The sequence databases have not been indexed. Please run 'prokka --setupdb' first.
Fri Jun 26 18:51:07 CEST 2020
done annotating /home/moeller/git/med-team/mmmulti/plasmidid/TEST_DATA/contigs_KPN_TEST.fasta with prokka
Removing unwanted files
ls: cannot access '/home/moeller/git/med-team/mmmulti/plasmidid/NO_GROUP/KPN/data/KPN.???': No such file or directory



#Executing /home/moeller/git/med-team/mmmulti/plasmidid/bin/blast_align.sh 

KPN.fna not supplied, please, introduce a valid file

I then ran "prokka --setupdb", but this did not change anything. Is there anything you suggest I do or check?

Many thanks!

Steffen
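
The log above suggests the failure starts before blast_align.sh: prokka stops because its databases are not indexed, no KPN.fna is written, and the next step then reports the missing file. A minimal Python sketch of a guard that checks for the expected outputs right after an annotation step follows. The function name `missing_outputs` and the extension list are hypothetical and not part of plasmidID.

```python
from pathlib import Path
import tempfile


def missing_outputs(out_dir, prefix, extensions=(".fna",)):
    """Return the names of expected output files that are absent or empty.

    The extension list is illustrative (an assumption); adjust it to the
    files the next pipeline step actually reads.
    """
    missing = []
    for ext in extensions:
        path = Path(out_dir) / f"{prefix}{ext}"
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(path.name)
    return missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Simulate an annotation run that wrote nothing for prefix "KPN".
        problems = missing_outputs(tmp, "KPN")
        if problems:
            print("annotation step produced no usable output:", problems)
```

Failing at this point would attribute the error to the annotation step rather than to the alignment step that merely consumes its output.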

ERROR "there was a problem accessing the ftp"

Hi:

First, what a great pipeline this is.

I just re-installed it on a new machine and got this error when trying to download the database. After checking the address it points to, I found that "https://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/plasmids.txt" is empty.

Any other way to download the database?

EDIT
It solved itself. Sorry to bother you. Even when I tried to download it using wget, the plasmids.txt file was empty. It was probably some sort of connection issue, or they were updating it. It would be nice to have an alternative way, though.

Best regards,
Javier
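
Since the empty file turned out to be transient, one possible mitigation is to retry the download and treat an empty body as a failure. The sketch below is only an illustration under that assumption: `fetch_url` and `fetch_non_empty` are hypothetical helpers, not functions from download_plasmid_database.py.

```python
import time
import urllib.request

REPORT_URL = "https://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/plasmids.txt"


def fetch_url(url, timeout=60):
    """Download a URL and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_non_empty(url, fetch=fetch_url, attempts=3, wait_seconds=0.0):
    """Retry a download until it returns a non-empty body.

    Raises RuntimeError if every attempt fails or returns no data.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            body = fetch(url)
            if body.strip():
                return body
            last_error = ValueError("empty response")
        except OSError as error:  # urllib's URLError is a subclass of OSError
            last_error = error
        if attempt < attempts:
            time.sleep(wait_seconds)
    raise RuntimeError(
        f"no data from {url} after {attempts} attempts"
    ) from last_error
```

A call such as `fetch_non_empty(REPORT_URL, wait_seconds=30)` would then either return the report or stop with a clear message instead of continuing with an empty plasmids.txt.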

hard coded paths

I think there is a hard coded path that you missed, and may want to correct for the public release:

lib/draw_circos_images.sh

cdsDdbbFile="/processing_Data/bioinformatics/research/20160530_ANTIBIOTICS_PSMP_T/REFERENCES/PLASMIDS/plasmid.all.genomic.feb212017.bed"

Create Database cdhit_cluster.sh

Your script uses an outdated version of cd-hit (v4.6.6, from 2016). Unless you have a reason to keep it, please update the cdhit_cluster.sh script for the newest version of cd-hit: uncomment the line #cluster_cutoff=${cluster_cutoff%.*} #Remove float value to have a valid cutoff ('1.' to '1'), and replace '-core $threads' in the cd-hit call with '-para $threads' (I also specify -blp 4 for 8 threads).

Now I am looking into a dbblast problem...

#Executing cdhit_cluster.sh 


DEPENDENCY	              STATUS
----------	              ------
cd-hit-est                   INSTALLED 
Default output directory is /MYPATH/plasmidID
filename is plasmids_term.fasta
lundi 22 octobre 2018, 16:20:27 (UTC+0200)
Clustering sequences with identity 100% or higher
Using psi-cd-hit.pl with file /MYPATH/plasmidID/plasmids_term.fasta


DEPENDENCY	              STATUS
----------	              ------
psi-cd-hit.pl                INSTALLED 
psi-cd-hit.pl -i plasmids_term.fasta -o plasmids_term.fasta_100 -c 1 -G 1 -g 1 -prog blastn -circle 1 -para 8 -blp 4
Name "main::bl_dir" used only once: possible typo at /usr/local/cd-hit-v4.6.7/psi-cd-hit/psi-cd-hit.pl line 103.
BLAST options error: max_file_sz must be < 2 GiB
Can not formatdb at /usr/local/cd-hit-v4.6.7/psi-cd-hit//psi-cd-hit-local.pl line 999.

---------------------------------------

ERROR in Script cdhit_cluster.sh on or near line 275; exiting with status 1
MESSAGE:

PSI-CD-HIT command failed. See /MYPATH/plasmidID/logs for more information.

---------------------------------------

DOES IT TAKE LONG?

plasmidID has been running for a week, and it is still at the bowtie2 alignment stage.
Is this normal?

ValueError: You are trying to merge on object and float64 columns.

I'm trying to run PlasmidID via the Bioconda release, and am running into an issue with pandas. Might be user error though!

CREATING SUMMARY REPORT (Thu Jun 30 01:24:20 UTC 2022)
 An html report with miniatures of the images will be generate with useful statistics to determine the correct plasmids in the sample.
Namespace(group=False, input_folder='/home/robert_petit/temp/test/plasmid/NO_GROUP/SRX4563634')
Creating summary
You are trying to merge on object and float64 columns. If you wish to proceed you should use pd.concat
Traceback (most recent call last):
  File "/home/robert_petit/miniconda3/envs/test-plasmidid/bin/summary_report_pid.py", line 465, in <module>
    main()
  File "/home/robert_petit/miniconda3/envs/test-plasmidid/bin/summary_report_pid.py", line 457, in main
    summary_df = complete_report_df(complete_file, len_description_df, percentage_df)
  File "/home/robert_petit/miniconda3/envs/test-plasmidid/bin/summary_report_pid.py", line 116, in complete_report_df
    df = len_description_df.merge(covered_df, on='id', how='left')
  File "/home/robert_petit/miniconda3/envs/test-plasmidid/lib/python3.7/site-packages/pandas/core/frame.py", line 9203, in merge
    validate=validate,
  File "/home/robert_petit/miniconda3/envs/test-plasmidid/lib/python3.7/site-packages/pandas/core/reshape/merge.py", line 119, in merge
    validate=validate,
  File "/home/robert_petit/miniconda3/envs/test-plasmidid/lib/python3.7/site-packages/pandas/core/reshape/merge.py", line 703, in __init__
    self._maybe_coerce_merge_keys()
  File "/home/robert_petit/miniconda3/envs/test-plasmidid/lib/python3.7/site-packages/pandas/core/reshape/merge.py", line 1256, in _maybe_coerce_merge_keys
    raise ValueError(msg)
ValueError: You are trying to merge on object and float64 columns. If you wish to proceed you should use pd.concat

---------------------------------------

ERROR in Script plasmidID on or near line 1089; exiting with status 1
MESSAGE:

See /home/robert_petit/temp/test/plasmid/logs/plasmidID.log for more information.
command:
summary_report_pid.py -i /home/robert_petit/temp/test/plasmid/NO_GROUP/SRX4563634 -g

---------------------------------------

Command Used

plasmidID -d plasmidFinder_01_26_2018.fsa -s SRX4563634 -c SRX4563634.fna -T 4

Here are the files used (added .txt so GitHub would allow upload)
plasmidFinder_01_26_2018.fsa.txt
SRX4563634.fna.txt

Update 1.

Doing some digging, covered_df might be the issue. It looks like this:

print(covered_df)
            id  len_covered
0  500039.4128         2363

print(covered_df.dtypes)
id             float64
len_covered      int64
dtype: object

Going to play around with this some more

Update 2

Converted the ID to a string and now have this

Columns must be same length as key
Traceback (most recent call last):
  File "./summary_report_pid.py", line 470, in <module>
    main()
  File "./summary_report_pid.py", line 462, in main
    summary_df = complete_report_df(complete_file, len_description_df, percentage_df)
  File "./summary_report_pid.py", line 126, in complete_report_df
    df['contig_name'] = df.apply(lambda x: set_to_list(x), axis=1)
  File "/home/robert_petit/miniconda3/envs/test-plasmidid/lib/python3.7/site-packages/pandas/core/frame.py", line 3602, in __setitem__
    self._set_item_frame_value(key, value)
  File "/home/robert_petit/miniconda3/envs/test-plasmidid/lib/python3.7/site-packages/pandas/core/frame.py", line 3729, in _set_item_frame_value
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key

Update 3

Looks like the dataframe is empty

print(df)
Empty DataFrame
Columns: [id, length, species, description, fraction_covered, contig_name]
Index: []

    .... Code is below ... from complete_report_df()
    del df['len_covered']
    df = df.merge(contigs_df, on='id', how='left')
    df = df.dropna()
    print(df)
    df['contig_name'] = df.apply(lambda x: set_to_list(x), axis=1)

Not sure if it matters but the percentage_file (e.g. *.coverage_adapted_clustered_percentage) does not exist
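
The dtype mismatch from Update 1 can be reproduced and avoided in isolation. In the sketch below, the sequence ID 500039.4128 looks numeric, so pandas infers float64 unless the column type is forced to string when the table is read. The table contents and the `read_table` helper are hypothetical stand-ins (the `description` value is made up); how summary_report_pid.py actually loads its tables is not shown in this thread.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the two tables merged in complete_report_df().
DESCRIPTION_TSV = "id\tdescription\n500039.4128\texample entry\n"
COVERED_TSV = "id\tlen_covered\n500039.4128\t2363\n"


def read_table(text):
    """Read a tab-separated table, keeping the id column as text."""
    return pd.read_csv(io.StringIO(text), sep="\t", dtype={"id": str})


len_description_df = read_table(DESCRIPTION_TSV)
covered_df = read_table(COVERED_TSV)

# Both id columns hold strings, so the left merge can match the keys.
merged = len_description_df.merge(covered_df, on="id", how="left")
print(merged.dtypes)
print(merged)
```

Forcing the type at read time keeps IDs such as 500039.4128 intact, whereas converting a float back to a string afterwards may not reproduce the original text of the ID.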

ERROR in Script

Hi, I tried to use the TEST_DATA and I have some problems executing the program.

Here are the input and output:

recercangs@recercangs-MS-7B48:~$ docker run -v $PWD:$PWD -w $PWD buisciii/plasmidid plasmidID.sh -1 /home/recercangs/plasmidID/TEST_DATA/KPN_TEST_R1.fastq.gz -2 /home/recercangs/plasmidID/TEST_DATA/KPN_TEST_R2.fastq.gz -d /home/recercangs/plasmidID/TEST_DATA/plasmids_TEST_database.fasta -s /home/recercangs/contigs_KPN_TEST.fasta


Starting plasmidID version:1.3.3

CHECKING DEPENDENCIES AND MANDATORY FILES

DEPENDENCY STATUS


blastn INSTALLED
bowtie2-build INSTALLED
bowtie2 INSTALLED
cd-hit-est INSTALLED
bedtools INSTALLED
prokka INSTALLED
samtools INSTALLED
circos INSTALLED
ERROR: please, reduce the number of characters to equal or less than 37

plasmidID is a computational pipeline tha reconstruct and annotate the most likely plasmids present in one sample

Then I tried copying the files into the main directory:

recercangs@recercangs-MS-7B48:~$ docker run -v $PWD:$PWD -w $PWD buisciii/plasmidid plasmidID.sh -1 KPN_TEST_R1.fastq.gz -2 KPN_TEST_R2.fastq.gz -d plasmids_TEST_database.fasta -s contigs_KPN_TEST.fasta


Starting plasmidID version:1.3.3

CHECKING DEPENDENCIES AND MANDATORY FILES

DEPENDENCY STATUS


blastn INSTALLED
bowtie2-build INSTALLED
bowtie2 INSTALLED
cd-hit-est INSTALLED
bedtools INSTALLED
prokka INSTALLED
samtools INSTALLED
circos INSTALLED

Default output directory is: /home/recercangs

Log will be saved in: /home/recercangs/NO_GROUP/contigs_KPN_TEST.fasta/logs/plasmidID.log

TRIMMING READS Tue Apr 2 10:50:54 UTC 2019
Reads will be quality trimmed with a window of 4 and an average quality of 20

DEPENDENCY STATUS


java INSTALLED


ERROR in Script plasmidID.sh on or near line 465; exiting with status 1
MESSAGE:

See /home/recercangs/logs/plasmidID.log for more information.
command:
quality_trim.sh -1 KPN_TEST_R1.fastq.gz -2 KPN_TEST_R2.fastq.gz -s contigs_KPN_TEST.fasta -g NO_GROUP -o /home/recercangs/NO_GROUP/contigs_KPN_TEST.fasta/trimmed -d /opt/Trimmomatic/ -T 1


How can I solve the problem?
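
In the first run, the value passed to -s is a full file path of 39 characters, which matches the "equal or less than 37" error. Other reports in this list pass contigs with -c and a short sample name with -s, so the contigs file may simply have been given to the wrong option here. A small Python sketch of a pre-flight check on the sample name follows. The function `sample_name_problems` is hypothetical, and only the limit of 37 comes from the error message above.

```python
MAX_SAMPLE_NAME_LENGTH = 37  # limit quoted in the plasmidID error message


def sample_name_problems(name, max_length=MAX_SAMPLE_NAME_LENGTH):
    """Return a list of human-readable problems with a sample name."""
    problems = []
    if len(name) > max_length:
        problems.append(f"{len(name)} characters, limit is {max_length}")
    if "/" in name:
        # A slash suggests a file path was passed instead of a name.
        problems.append("looks like a path, not a sample name")
    return problems


if __name__ == "__main__":
    for candidate in ("/home/recercangs/contigs_KPN_TEST.fasta", "KPN"):
        print(candidate, sample_name_problems(candidate))
```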

Error with spades_assembly.sh

I am currently working with WGS data and would like to use your programme PlasmidID. After installation, I tried to run my fasta files and encountered this error.

logs.zip
Capture

Problems with `.` in sample names

Hi,
thanks for plasmidid, a very useful tool.

I encountered a problem when sample names contain a `.` character. The program crashes and returns the following error message:

ERROR in Script plasmidID on or near line 863; exiting with status 1
MESSAGE:

See results_plsdb_test/GCA_019841085/logs/plasmidID.log for more information.
command:
blast_to_link.sh -i  results_plsdb_test/GCA_019841085/E_marmotae/GCA_019841085.1/data/GCA_019841085.1".plasmids.blast" -I -l 20 -b 90

My command was

 plasmidID -T 20 -c GCA_019841085.1.fasta -d plasmids.fna -s GCA_019841085.1 -g 'E_coli' -o results_test/GCA_019841085.1

When I remove the . in the sample argument GCA_019841085.1 -> GCA_019841085 the program runs smoothly.

Still, I think it is important to be able to name samples like this, as it is the official NCBI RefSeq/GenBank naming convention (accession.version).

Thanks,
Carlus

P.S.:

$ plasmidID --version
1.6.4
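
The log path in the error message (results_plsdb_test/GCA_019841085/logs) has lost the ".1" suffix, which suggests that some step cuts the sample name at a dot. That is an assumption and the thread does not confirm the cause. The Python sketch below contrasts cutting at the first dot with removing only a known file extension. The helper name and the extension list are hypothetical.

```python
KNOWN_FASTA_EXTENSIONS = (".fasta", ".fna", ".fa", ".fsa")


def strip_known_extension(filename, extensions=KNOWN_FASTA_EXTENSIONS):
    """Remove one known extension, leaving other dots in the name intact."""
    for ext in extensions:
        if filename.endswith(ext):
            return filename[: -len(ext)]
    return filename


if __name__ == "__main__":
    name = "GCA_019841085.1.fasta"
    # Cutting at the first dot drops the accession version.
    print(name.split(".")[0])
    # Removing only the known extension keeps accession.version.
    print(strip_known_extension(name))
```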

get_coverage.sh error

Hi,
How are you? I hope this message finds you well!
I've been trying to run this tool to predict plasmids in E. coli, but I seem to have run into a problem. It appears to be related to the sorted.bam file (according to the log file). I installed plasmidID using conda, as detailed in the instructions.

Any help will be greatly appreciated! Thanks a lot for your help!

Below are the output from the tool and the log:

  1. OUTPUT FROM THE TOOL

(plasmid_id) [jpaganini@n0061 reads]$ plasmidID -1 reads/GCA_011404755.1_ASM1140475v1_genomic_R1.fq -2 reads/GCA_011404755.1_ASM1140475v1_genomic_R2.fq -d plasmidid_db/2020-08-19_plasmids.fasta -s GCA_011404755.1_ASM1140475v1_genomic -o 20200830_test_output


Starting plasmidID version:1.6.3

CHECKING DEPENDENCIES AND MANDATORY FILES

DEPENDENCY STATUS


blastn INSTALLED
bowtie2-build INSTALLED
bowtie2 INSTALLED
bedtools INSTALLED
prokka INSTALLED
samtools INSTALLED
mash INSTALLED
circos INSTALLED
GCA_011404755.1_ASM1140475v1_genomic_R1.fq not supplied, please, introduce a valid file
GCA_011404755.1_ASM1140475v1_genomic_R2.fq not supplied, please, introduce a valid file
2020-08-19_plasmids.fasta not supplied, please, introduce a valid file
ERROR: 3 missing files, aborting execution
(plasmid_id) [jpaganini@n0061 reads]$ cd ..
(plasmid_id) [jpaganini@n0061 testing_plasmidid]$ plasmidID -1 reads/GCA_011404755.1_ASM1140475v1_genomic_R1.fq -2 reads/GCA_011404755.1_ASM1140475v1_genomic_R2.fq -d plasmidid_db/2020-08-19_plasmids.fasta -s GCA_011404755.1_ASM1140475v1_genomic -o 20200830_test_output


Starting plasmidID version:1.6.3

CHECKING DEPENDENCIES AND MANDATORY FILES

DEPENDENCY STATUS


blastn INSTALLED
bowtie2-build INSTALLED
bowtie2 INSTALLED
bedtools INSTALLED
prokka INSTALLED
samtools INSTALLED
mash INSTALLED
circos INSTALLED

Output directory is: 20200830_test_output

Log will be saved in: 20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/logs/plasmidID.log

TRIMMING READS Sun Aug 30 13:46:12 CEST 2020
Reads will be quality trimmed with a window of 4 and an average quality of 20

DEPENDENCY STATUS


trimmomatic INSTALLED

ASSEMBLY READS (Sun Aug 30 13:52:39 CEST 2020)
Reads will be assembled using SPAdes with k-mers: 21,33,55,77,99,127. This might take a while.
I suggest compare other assembly methods and input the contigs with -c|--contigs option.

DEPENDENCY STATUS


spades.py INSTALLED


#Pipeline summary#

Reads R1 GCA_011404755.1_ASM1140475v1_genomic_R1.fq
Reads R2 GCA_011404755.1_ASM1140475v1_genomic_R2.fq
Will be mapped with ddbb 2020-08-19_plasmids.fasta
Entries covered more than 80 %
Will be clustered by 0.5 % identity
And used to reconstruct contigs in scaffolds.fasta

STARTING KMER FILTERING, CLUSTERING and MAPPING

SCREENING READS WITH KMERS (Sun Aug 30 14:40:09 CEST 2020)
Reads will be screened against database supplied for further filtering and mapping,
this will reduce the input sequences to map against GCA_011404755.1_ASM1140475v1_genomic

CLUSTERING SEQUENCES BY KMER DISTANCE (Sun Aug 30 14:45:57 CEST 2020)
Sequences obtained after screen will be clustered to reduce redundancy,
one representative, the largest, will be considered for further analysis GCA_011404755.1_ASM1140475v1_genomic

MAPPING READS (Sun Aug 30 14:46:09 CEST 2020)
Reads will be mapped against database supplied for further coverage calculation,
this will determine the most likely plasmids in the sample GCA_011404755.1_ASM1140475v1_genomic

FILTERING DATABASE BY COVERAGE (Sun Aug 30 14:52:56 CEST 2020)
Coverage will be calculated and the entries covered more than 80%
will pass to further analysis


ERROR in Script plasmidID on or near line 642; exiting with status 1
MESSAGE:

See 20200830_test_output/logs/plasmidID.log for more information.
command:
get_coverage.sh -i 20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/mapping/GCA_011404755.1_ASM1140475v1_genomic".sorted.bam" -d plasmidid_db/2020-08-19_plasmids.fasta


################################################################################################

  2. LOG FILE:

LOG FILE PLASMIDID
Sun Aug 30 13:46:12 CEST 2020

#Executing /home/dla_mm/jpaganini/data/miniconda3/envs/plasmid_id/bin/quality_trim.sh

DEPENDENCY STATUS


trimmomatic INSTALLED
Output directory is 20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed
Sun Aug 30 13:50:38 CEST 2020
Quality trimming:
R1 = reads/GCA_011404755.1_ASM1140475v1_genomic_R1.fq
R2 = reads/GCA_011404755.1_ASM1140475v1_genomic_R2.fq
TrimmomaticPE: Started with arguments:
-threads 1 reads/GCA_011404755.1_ASM1140475v1_genomic_R1.fq reads/GCA_011404755.1_ASM1140475v1_genomic_R2.fq 20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_1_paired.fastq.gz 20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_1_unpaired.fastq.gz 20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_2_paired.fastq.gz 20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_2_unpaired.fastq.gz ILLUMINACLIP:/hpc/dla_mm/jpaganini/data/miniconda3/pkgs/trimmomatic-0.39-1/share/trimmomatic-0.39-1/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:20 MINLEN:40
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Read Pairs: 1000000 Both Surviving: 999995 (100.00%) Forward Only Surviving: 4 (0.00%) Reverse Only Surviving: 1 (0.00%) Dropped: 0 (0.00%)
TrimmomaticPE: Completed successfully
Sun Aug 30 13:52:39 CEST 2020
DONE quality trimming, file can be fount at:
20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_1_paired.fastq.gz
20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_1_unpaired.fastq.gz
20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_2_paired.fastq.gz
20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_2_unpaired.fastq.gz

#Executing /home/dla_mm/jpaganini/data/miniconda3/envs/plasmid_id/bin/spades_assembly.sh

DEPENDENCY STATUS


spades.py INSTALLED
Reads directory for quick mode is 20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/
Output directory is 20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly
Entering QUICK MODE
Sun Aug 30 13:52:39 CEST 2020
Assembly:
R1 paired file = 20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_1_paired.fastq.gz
R2 paired file = 20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_2_paired.fastq.gz
Command line: /home/dla_mm/jpaganini/data/miniconda3/envs/plasmid_id/bin/spades.py --careful -t 1 -k 21,33,55,77,99,127 --pe1-1 /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_1_paired.fastq.gz --pe1-2 /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_2_paired.fastq.gz -o /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly

System information:
SPAdes version: 3.13.0
Python version: 3.6.6
OS: Linux-3.10.0-1062.4.1.el7.x86_64-x86_64-with-centos-7.7.1908-Core

Output dir: /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly
Mode: read error correction and assembling
Debug mode is turned OFF

Dataset parameters:
Multi-cell mode (you should set '--sc' flag if input data was obtained with MDA (single-cell) technology or --meta flag if processing metagenomic dataset)
Reads:
Library number: 1, library type: paired-end
orientation: fr
left reads: ['/hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_1_paired.fastq.gz']
right reads: ['/hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_2_paired.fastq.gz']
interlaced reads: not specified
single reads: not specified
merged reads: not specified
Read error correction parameters:
Iterations: 1
PHRED offset will be auto-detected
Corrected reads will be compressed
Assembly parameters:
k: [21, 33, 55, 77, 99, 127]
Repeat resolution is enabled
Mismatch careful mode is turned ON
MismatchCorrector will be used
Coverage cutoff is turned OFF
Other parameters:
Dir for temp files: /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/tmp
Threads: 1
Memory limit (in Gb): 250

======= SPAdes pipeline started. Log can be found here: /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/spades.log

===== Read error correction started.

== Running read error correction tool: /hpc/dla_mm/jpaganini/data/miniconda3/envs/plasmid_id/share/spades-3.13.0-0/bin/spades-hammer /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/corrected/configs/config.info

0:00:00.000 4M / 4M INFO General (main.cpp : 75) Starting BayesHammer, built from refs/heads/spades_3.13.0, git revision 8ea46659e9b2aca35444a808db550ac333006f8b
0:00:00.000 4M / 4M INFO General (main.cpp : 76) Loading config from /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/corrected/configs/config.info
0:00:00.001 4M / 4M INFO General (main.cpp : 78) Maximum # of threads to use (adjusted due to OMP capabilities): 1
0:00:00.001 4M / 4M INFO General (memory_limit.cpp : 49) Memory limit set to 40 Gb
0:00:00.001 4M / 4M INFO General (main.cpp : 86) Trying to determine PHRED offset
0:00:00.002 4M / 4M INFO General (main.cpp : 92) Determined value is 33
0:00:00.003 4M / 4M INFO General (hammer_tools.cpp : 36) Hamming graph threshold tau=1, k=21, subkmer positions = [ 0 10 ]
0:00:00.003 4M / 4M INFO General (main.cpp : 113) Size of aux. kmer data 24 bytes
=== ITERATION 0 begins ===
0:00:00.007 4M / 4M INFO K-mer Index Building (kmer_index_builder.hpp : 301) Building kmer index
0:00:00.007 4M / 4M INFO General (kmer_index_builder.hpp : 117) Splitting kmer instances into 16 files using 1 threads. This might take a while.
0:00:00.009 4M / 4M INFO General (file_limit.hpp : 32) Open file limit set to 4096
0:00:00.009 4M / 4M INFO General (kmer_splitters.hpp : 89) Memory available for splitting buffers: 13.332 Gb
0:00:00.009 4M / 4M INFO General (kmer_splitters.hpp : 97) Using cell size of 4194304
0:00:00.009 580M / 616M INFO K-mer Splitting (kmer_data.cpp : 97) Processing /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_1_paired.fastq.gz
0:00:14.156 580M / 616M INFO K-mer Splitting (kmer_data.cpp : 107) Processed 256726 reads
0:00:28.085 580M / 616M INFO K-mer Splitting (kmer_data.cpp : 107) Processed 513421 reads
0:00:41.771 580M / 616M INFO K-mer Splitting (kmer_data.cpp : 107) Processed 770008 reads
0:00:54.436 580M / 616M INFO K-mer Splitting (kmer_data.cpp : 107) Processed 999995 reads
0:00:54.436 580M / 616M INFO K-mer Splitting (kmer_data.cpp : 97) Processing /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_2_paired.fastq.gz
0:01:08.126 580M / 616M INFO K-mer Splitting (kmer_data.cpp : 107) Processed 1256743 reads
0:01:22.308 580M / 616M INFO K-mer Splitting (kmer_data.cpp : 107) Processed 1513407 reads
0:01:48.806 580M / 616M INFO K-mer Splitting (kmer_data.cpp : 112) Total 1999990 reads processed
0:01:48.878 4M / 616M INFO General (kmer_index_builder.hpp : 120) Starting k-mer counting.
0:01:57.428 4M / 616M INFO General (kmer_index_builder.hpp : 127) K-mer counting done. There are 44205208 kmers in total.
0:01:57.428 4M / 616M INFO General (kmer_index_builder.hpp : 133) Merging temporary buckets.
0:02:00.710 4M / 616M INFO K-mer Index Building (kmer_index_builder.hpp : 314) Building perfect hash indices
0:02:08.439 24M / 616M INFO General (kmer_index_builder.hpp : 150) Merging final buckets.
0:02:11.668 24M / 616M INFO K-mer Index Building (kmer_index_builder.hpp : 336) Index built. Total 20506240 bytes occupied (3.7111 bits per kmer).
0:02:11.671 24M / 616M INFO K-mer Counting (kmer_data.cpp : 356) Arranging kmers in hash map order
0:02:17.015 704M / 704M INFO General (main.cpp : 148) Clustering Hamming graph.
0:08:43.944 704M / 704M INFO General (main.cpp : 155) Extracting clusters
0:09:22.040 704M / 1G INFO General (main.cpp : 167) Clustering done. Total clusters: 11067046
0:09:22.073 364M / 1G INFO K-mer Counting (kmer_data.cpp : 376) Collecting K-mer information, this takes a while.
0:09:22.694 1G / 1G INFO K-mer Counting (kmer_data.cpp : 382) Processing /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_1_paired.fastq.gz
0:11:52.857 1G / 1G INFO K-mer Counting (kmer_data.cpp : 382) Processing /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_2_paired.fastq.gz
0:14:21.757 1G / 1G INFO K-mer Counting (kmer_data.cpp : 389) Collection done, postprocessing.
0:14:21.932 1G / 1G INFO K-mer Counting (kmer_data.cpp : 403) There are 44205208 kmers in total. Among them 34158964 (77.2736%) are singletons.
0:14:21.932 1G / 1G INFO General (main.cpp : 173) Subclustering Hamming graph
0:15:21.345 1G / 1G INFO Hamming Subclustering (kmer_cluster.cpp : 649) Subclustering done. Total 22 non-read kmers were generated.
0:15:21.345 1G / 1G INFO Hamming Subclustering (kmer_cluster.cpp : 650) Subclustering statistics:
0:15:21.345 1G / 1G INFO Hamming Subclustering (kmer_cluster.cpp : 651) Total singleton hamming clusters: 2471282. Among them 828412 (33.5215%) are good
0:15:21.345 1G / 1G INFO Hamming Subclustering (kmer_cluster.cpp : 652) Total singleton subclusters: 1719. Among them 1681 (97.7894%) are good
0:15:21.345 1G / 1G INFO Hamming Subclustering (kmer_cluster.cpp : 653) Total non-singleton subcluster centers: 8673370. Among them 8641493 (99.6325%) are good
0:15:21.345 1G / 1G INFO Hamming Subclustering (kmer_cluster.cpp : 654) Average size of non-trivial subcluster: 4.81173 kmers
0:15:21.345 1G / 1G INFO Hamming Subclustering (kmer_cluster.cpp : 655) Average number of sub-clusters per non-singleton cluster: 1.00923
0:15:21.345 1G / 1G INFO Hamming Subclustering (kmer_cluster.cpp : 656) Total solid k-mers: 9471586
0:15:21.345 1G / 1G INFO Hamming Subclustering (kmer_cluster.cpp : 657) Substitution probabilities: 4,4
0:15:21.361 1G / 1G INFO General (main.cpp : 178) Finished clustering.
0:15:21.361 1G / 1G INFO General (main.cpp : 197) Starting solid k-mers expansion in 1 threads.
0:16:56.911 1G / 1G INFO General (main.cpp : 218) Solid k-mers iteration 0 produced 471136 new k-mers.
0:18:32.504 1G / 1G INFO General (main.cpp : 218) Solid k-mers iteration 1 produced 6399 new k-mers.
0:20:07.929 1G / 1G INFO General (main.cpp : 218) Solid k-mers iteration 2 produced 92 new k-mers.
0:21:43.504 1G / 1G INFO General (main.cpp : 218) Solid k-mers iteration 3 produced 0 new k-mers.
0:21:43.504 1G / 1G INFO General (main.cpp : 222) Solid k-mers finalized
0:21:43.504 1G / 1G INFO General (hammer_tools.cpp : 220) Starting read correction in 1 threads.
0:21:43.504 1G / 1G INFO General (hammer_tools.cpp : 233) Correcting pair of reads: /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_1_paired.fastq.gz and /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_2_paired.fastq.gz
0:21:44.282 1G / 1G INFO General (hammer_tools.cpp : 168) Prepared batch 0 of 100000 reads.
0:21:55.033 1G / 1G INFO General (hammer_tools.cpp : 175) Processed batch 0
0:21:55.177 1G / 1G INFO General (hammer_tools.cpp : 185) Written batch 0
0:21:55.885 1G / 1G INFO General (hammer_tools.cpp : 168) Prepared batch 1 of 100000 reads.
0:22:06.635 1G / 1G INFO General (hammer_tools.cpp : 175) Processed batch 1
0:22:06.781 1G / 1G INFO General (hammer_tools.cpp : 185) Written batch 1
0:22:07.481 1G / 1G INFO General (hammer_tools.cpp : 168) Prepared batch 2 of 100000 reads.
0:22:18.261 1G / 1G INFO General (hammer_tools.cpp : 175) Processed batch 2
0:22:18.408 1G / 1G INFO General (hammer_tools.cpp : 185) Written batch 2
0:22:19.108 1G / 1G INFO General (hammer_tools.cpp : 168) Prepared batch 3 of 100000 reads.
0:22:29.872 1G / 1G INFO General (hammer_tools.cpp : 175) Processed batch 3
0:22:30.018 1G / 1G INFO General (hammer_tools.cpp : 185) Written batch 3
0:22:30.721 1G / 1G INFO General (hammer_tools.cpp : 168) Prepared batch 4 of 100000 reads.
0:22:41.559 1G / 1G INFO General (hammer_tools.cpp : 175) Processed batch 4
0:22:41.705 1G / 1G INFO General (hammer_tools.cpp : 185) Written batch 4
0:22:42.406 1G / 1G INFO General (hammer_tools.cpp : 168) Prepared batch 5 of 100000 reads.
0:22:53.146 1G / 1G INFO General (hammer_tools.cpp : 175) Processed batch 5
0:22:53.290 1G / 1G INFO General (hammer_tools.cpp : 185) Written batch 5
0:22:53.994 1G / 1G INFO General (hammer_tools.cpp : 168) Prepared batch 6 of 100000 reads.
0:23:04.727 1G / 1G INFO General (hammer_tools.cpp : 175) Processed batch 6
0:23:04.877 1G / 1G INFO General (hammer_tools.cpp : 185) Written batch 6
0:23:05.578 1G / 1G INFO General (hammer_tools.cpp : 168) Prepared batch 7 of 100000 reads.
0:23:16.342 1G / 1G INFO General (hammer_tools.cpp : 175) Processed batch 7
0:23:16.491 1G / 1G INFO General (hammer_tools.cpp : 185) Written batch 7
0:23:17.192 1G / 1G INFO General (hammer_tools.cpp : 168) Prepared batch 8 of 100000 reads.
0:23:27.956 1G / 1G INFO General (hammer_tools.cpp : 175) Processed batch 8
0:23:28.101 1G / 1G INFO General (hammer_tools.cpp : 185) Written batch 8
0:23:28.802 1G / 1G INFO General (hammer_tools.cpp : 168) Prepared batch 9 of 99995 reads.
0:23:39.562 1G / 1G INFO General (hammer_tools.cpp : 175) Processed batch 9
0:23:39.707 1G / 1G INFO General (hammer_tools.cpp : 185) Written batch 9
0:23:40.381 1G / 1G INFO General (hammer_tools.cpp : 274) Correction done. Changed 986783 bases in 805818 reads.
0:23:40.381 1G / 1G INFO General (hammer_tools.cpp : 275) Failed to correct 96864 bases out of 301903719.
0:23:40.485 4M / 1G INFO General (main.cpp : 255) Saving corrected dataset description to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/corrected/corrected.yaml
0:23:40.490 4M / 1G INFO General (main.cpp : 262) All done. Exiting.

== Compressing corrected reads (with pigz)

== Dataset description file was created: /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/corrected/corrected.yaml

===== Read error correction finished.

===== Assembling started.

== Running assembler: K21

0:00:00.000 4M / 4M INFO General (main.cpp : 74) Loaded config from /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/K21/configs/config.info
0:00:00.000 4M / 4M INFO General (main.cpp : 74) Loaded config from /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/K21/configs/careful_mode.info
0:00:00.000 4M / 4M INFO General (memory_limit.cpp : 49) Memory limit set to 40 Gb
0:00:00.000 4M / 4M INFO General (main.cpp : 87) Starting SPAdes, built from refs/heads/spades_3.13.0, git revision 8ea46659e9b2aca35444a808db550ac333006f8b
0:00:00.000 4M / 4M INFO General (main.cpp : 88) Maximum k-mer length: 128
0:00:00.000 4M / 4M INFO General (main.cpp : 89) Assembling dataset (/hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/dataset.info) with K=21
0:00:00.000 4M / 4M INFO General (main.cpp : 90) Maximum # of threads to use (adjusted due to OMP capabilities): 1
0:00:00.000 4M / 4M INFO General (launch.hpp : 51) SPAdes started
0:00:00.000 4M / 4M INFO General (launch.hpp : 58) Starting from stage: construction
0:00:00.000 4M / 4M INFO General (launch.hpp : 65) Two-step RR enabled: 0
0:00:00.001 4M / 4M INFO StageManager (stage.cpp : 132) STAGE == de Bruijn graph construction
0:00:00.004 4M / 4M INFO General (read_converter.hpp : 77) Converting reads to binary format for library #0 (takes a while)
0:00:00.004 4M / 4M INFO General (read_converter.hpp : 78) Converting paired reads
0:00:00.372 80M / 132M INFO General (binary_converter.hpp : 93) 16384 reads processed
0:00:00.555 96M / 132M INFO General (binary_converter.hpp : 93) 32768 reads processed
0:00:00.923 124M / 132M INFO General (binary_converter.hpp : 93) 65536 reads processed
0:00:01.661 180M / 180M INFO General (binary_converter.hpp : 93) 131072 reads processed
0:00:03.143 292M / 292M INFO General (binary_converter.hpp : 93) 262144 reads processed
0:00:06.101 520M / 520M INFO General (binary_converter.hpp : 93) 524288 reads processed
0:00:15.137 124M / 876M INFO General (binary_converter.hpp : 117) 999995 reads written
0:00:15.197 4M / 876M INFO General (read_converter.hpp : 87) Converting single reads
0:00:15.505 132M / 876M INFO General (binary_converter.hpp : 117) 0 reads written
0:00:15.518 4M / 876M INFO General (read_converter.hpp : 95) Converting merged reads
0:00:15.825 132M / 876M INFO General (binary_converter.hpp : 117) 0 reads written
0:00:16.694 4M / 876M INFO General (construction.cpp : 111) Max read length 151
0:00:16.695 4M / 876M INFO General (construction.cpp : 117) Average read length 150.933
0:00:16.695 4M / 876M INFO General (stage.cpp : 101) PROCEDURE == k+1-mer counting
0:00:16.697 4M / 876M INFO General (kmer_index_builder.hpp : 117) Splitting kmer instances into 1 files using 1 threads. This might take a while.
0:00:16.699 4M / 876M INFO General (file_limit.hpp : 32) Open file limit set to 4096
0:00:16.699 4M / 876M INFO General (kmer_splitters.hpp : 89) Memory available for splitting buffers: 13.332 Gb
0:00:16.699 4M / 876M INFO General (kmer_splitters.hpp : 97) Using cell size of 67108864
0:00:29.697 568M / 1G INFO General (kmer_splitters.hpp : 289) Processed 1032972 reads
0:00:42.515 568M / 1G INFO General (kmer_splitters.hpp : 289) Processed 2065962 reads
0:00:55.532 568M / 1G INFO General (kmer_splitters.hpp : 289) Processed 3098942 reads
0:01:07.105 568M / 1G INFO General (kmer_splitters.hpp : 289) Processed 3999980 reads
0:01:07.105 568M / 1G INFO General (kmer_splitters.hpp : 295) Adding contigs from previous K
0:01:07.148 4M / 1G INFO General (kmer_splitters.hpp : 308) Used 3999980 reads
0:01:07.148 4M / 1G INFO General (kmer_index_builder.hpp : 120) Starting k-mer counting.
0:01:08.357 4M / 1G INFO General (kmer_index_builder.hpp : 127) K-mer counting done. There are 6538497 kmers in total.
0:01:08.358 4M / 1G INFO General (kmer_index_builder.hpp : 133) Merging temporary buckets.
0:01:08.840 4M / 1G INFO General (stage.cpp : 101) PROCEDURE == Extension index construction
0:01:08.842 4M / 1G INFO K-mer Index Building (kmer_index_builder.hpp : 301) Building kmer index
0:01:08.842 4M / 1G INFO General (kmer_index_builder.hpp : 117) Splitting kmer instances into 16 files using 1 threads. This might take a while.
0:01:08.844 4M / 1G INFO General (file_limit.hpp : 32) Open file limit set to 4096
0:01:08.844 4M / 1G INFO General (kmer_splitters.hpp : 89) Memory available for splitting buffers: 13.332 Gb
0:01:08.844 4M / 1G INFO General (kmer_splitters.hpp : 97) Using cell size of 4194304
0:01:11.889 580M / 1G INFO General (kmer_splitters.hpp : 380) Processed 6538497 kmers
0:01:11.889 580M / 1G INFO General (kmer_splitters.hpp : 385) Used 6538497 kmers.
0:01:11.904 4M / 1G INFO General (kmer_index_builder.hpp : 120) Starting k-mer counting.
0:01:12.737 4M / 1G INFO General (kmer_index_builder.hpp : 127) K-mer counting done. There are 6494188 kmers in total.
0:01:12.737 4M / 1G INFO General (kmer_index_builder.hpp : 133) Merging temporary buckets.
0:01:13.314 4M / 1G INFO K-mer Index Building (kmer_index_builder.hpp : 314) Building perfect hash indices
0:01:14.434 4M / 1G INFO General (kmer_index_builder.hpp : 150) Merging final buckets.
0:01:14.922 4M / 1G INFO K-mer Index Building (kmer_index_builder.hpp : 336) Index built. Total 3019896 bytes occupied (3.72012 bits per kmer).
0:01:14.927 12M / 1G INFO DeBruijnExtensionIndexBu (kmer_extension_index_build: 99) Building k-mer extensions from k+1-mers
0:01:16.923 12M / 1G INFO DeBruijnExtensionIndexBu (kmer_extension_index_build: 103) Building k-mer extensions from k+1-mers finished.
0:01:16.925 12M / 1G INFO General (stage.cpp : 101) PROCEDURE == Early tip clipping
0:01:16.925 12M / 1G INFO General (construction.cpp : 253) Early tip clipper length bound set as (RL - K)
0:01:16.925 12M / 1G INFO Early tip clipping (early_simplification.hpp : 181) Early tip clipping
0:01:21.762 16M / 1G INFO Early tip clipping (early_simplification.hpp : 184) 997269 22-mers were removed by early tip clipper
0:01:21.762 16M / 1G INFO General (stage.cpp : 101) PROCEDURE == Condensing graph
0:01:21.765 16M / 1G INFO UnbranchingPathExtractor (debruijn_graph_constructor: 355) Extracting unbranching paths
0:01:24.640 24M / 1G INFO UnbranchingPathExtractor (debruijn_graph_constructor: 374) Extracting unbranching paths finished. 131049 sequences extracted
0:01:26.081 24M / 1G INFO UnbranchingPathExtractor (debruijn_graph_constructor: 310) Collecting perfect loops
0:01:26.769 24M / 1G INFO UnbranchingPathExtractor (debruijn_graph_constructor: 343) Collecting perfect loops finished. 0 loops collected
0:01:26.940 52M / 1G INFO General (stage.cpp : 101) PROCEDURE == Filling coverage indices (PHM)
0:01:26.940 52M / 1G INFO K-mer Index Building (kmer_index_builder.hpp : 301) Building kmer index
0:01:26.940 52M / 1G INFO K-mer Index Building (kmer_index_builder.hpp : 314) Building perfect hash indices
0:01:28.152 56M / 1G INFO K-mer Index Building (kmer_index_builder.hpp : 336) Index built. Total 3032416 bytes occupied (3.71023 bits per kmer).
0:01:28.165 84M / 1G INFO General (construction.cpp : 388) Collecting k-mer coverage information from reads, this takes a while.
0:02:09.252 80M / 1G INFO General (construction.cpp : 508) Filling coverage and flanking coverage from PHM
0:02:11.043 80M / 1G INFO General (construction.cpp : 464) Processed 262082 edges
0:02:11.086 40M / 1G INFO StageManager (stage.cpp : 132) STAGE == EC Threshold Finding
0:02:11.087 40M / 1G INFO General (kmer_coverage_model.cpp : 181) Kmer coverage valley at: 14
0:02:11.087 40M / 1G INFO General (kmer_coverage_model.cpp : 201) K-mer histogram maximum: 55
0:02:11.087 40M / 1G INFO General (kmer_coverage_model.cpp : 237) Estimated median coverage: 56. Coverage mad: 8.8956
0:02:11.087 40M / 1G INFO General (kmer_coverage_model.cpp : 259) Fitting coverage model
0:02:11.161 40M / 1G INFO General (kmer_coverage_model.cpp : 295) ... iteration 2
0:02:11.315 40M / 1G INFO General (kmer_coverage_model.cpp : 295) ... iteration 4
0:02:11.893 40M / 1G INFO General (kmer_coverage_model.cpp : 295) ... iteration 8
0:02:12.657 40M / 1G INFO General (kmer_coverage_model.cpp : 309) Fitted mean coverage: 56.5557. Fitted coverage std. dev: 7.58561
0:02:12.660 40M / 1G INFO General (kmer_coverage_model.cpp : 334) Probability of erroneous kmer at valley: 0.99998
0:02:12.660 40M / 1G INFO General (kmer_coverage_model.cpp : 358) Preliminary threshold calculated as: 35
0:02:12.660 40M / 1G INFO General (kmer_coverage_model.cpp : 362) Threshold adjusted to: 35
0:02:12.660 40M / 1G INFO General (kmer_coverage_model.cpp : 375) Estimated genome size (ignoring repeats): 4441797
0:02:12.660 40M / 1G INFO General (genomic_info_filler.cpp : 112) Mean coverage was calculated as 56.5557
0:02:12.660 40M / 1G INFO General (genomic_info_filler.cpp : 127) EC coverage threshold value was calculated as 35
0:02:12.660 40M / 1G INFO General (genomic_info_filler.cpp : 128) Trusted kmer low bound: 0
0:02:12.660 40M / 1G INFO StageManager (stage.cpp : 132) STAGE == Raw Simplification
0:02:12.660 40M / 1G INFO General (simplification.cpp : 128) PROCEDURE == InitialCleaning
0:02:12.660 40M / 1G INFO General (graph_simplification.hpp : 662) Flanking coverage based disconnection disabled
0:02:12.660 40M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Self conjugate edge remover
0:02:12.696 40M / 1G INFO Simplification (parallel_processing.hpp : 167) Self conjugate edge remover triggered 0 times
0:02:12.696 40M / 1G INFO StageManager (stage.cpp : 132) STAGE == Simplification
0:02:12.696 40M / 1G INFO General (simplification.cpp : 357) Graph simplification started
0:02:12.696 40M / 1G INFO General (graph_simplification.hpp : 634) Creating parallel br instance
0:02:12.696 40M / 1G INFO General (simplification.cpp : 362) PROCEDURE == Simplification cycle, iteration 1
0:02:12.696 40M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Tip clipper
0:02:12.741 40M / 1G INFO Simplification (parallel_processing.hpp : 167) Tip clipper triggered 174 times
0:02:12.741 40M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Bulge remover
0:02:14.111 40M / 1G INFO Simplification (parallel_processing.hpp : 167) Bulge remover triggered 41002 times
0:02:14.111 40M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Low coverage edge remover
0:02:14.179 32M / 1G INFO Simplification (parallel_processing.hpp : 167) Low coverage edge remover triggered 2083 times
0:02:14.179 32M / 1G INFO General (simplification.cpp : 362) PROCEDURE == Simplification cycle, iteration 2
0:02:14.179 32M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Tip clipper
0:02:14.183 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Tip clipper triggered 17 times
0:02:14.183 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Bulge remover
0:02:14.196 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Bulge remover triggered 76 times
0:02:14.196 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Low coverage edge remover
0:02:14.196 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Low coverage edge remover triggered 20 times
0:02:14.196 28M / 1G INFO General (simplification.cpp : 362) PROCEDURE == Simplification cycle, iteration 3
0:02:14.196 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Tip clipper
0:02:14.196 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Tip clipper triggered 0 times
0:02:14.196 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Bulge remover
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Bulge remover triggered 5 times
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Low coverage edge remover
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Low coverage edge remover triggered 0 times
0:02:14.199 28M / 1G INFO General (simplification.cpp : 362) PROCEDURE == Simplification cycle, iteration 4
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Tip clipper
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Tip clipper triggered 0 times
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Bulge remover
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Bulge remover triggered 0 times
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Low coverage edge remover
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Low coverage edge remover triggered 0 times
0:02:14.199 28M / 1G INFO General (simplification.cpp : 362) PROCEDURE == Simplification cycle, iteration 5
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Tip clipper
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Tip clipper triggered 0 times
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Bulge remover
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Bulge remover triggered 0 times
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Low coverage edge remover
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Low coverage edge remover triggered 0 times
0:02:14.199 28M / 1G INFO General (simplification.cpp : 362) PROCEDURE == Simplification cycle, iteration 6
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Tip clipper
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Tip clipper triggered 0 times
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Bulge remover
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Bulge remover triggered 0 times
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Low coverage edge remover
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Low coverage edge remover triggered 0 times
0:02:14.199 28M / 1G INFO General (simplification.cpp : 362) PROCEDURE == Simplification cycle, iteration 7
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Tip clipper
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Tip clipper triggered 0 times
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Bulge remover
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Bulge remover triggered 0 times
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Low coverage edge remover
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Low coverage edge remover triggered 0 times
0:02:14.199 28M / 1G INFO General (simplification.cpp : 362) PROCEDURE == Simplification cycle, iteration 8
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Tip clipper
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Tip clipper triggered 0 times
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Bulge remover
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Bulge remover triggered 0 times
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Low coverage edge remover
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Low coverage edge remover triggered 0 times
0:02:14.199 28M / 1G INFO General (simplification.cpp : 362) PROCEDURE == Simplification cycle, iteration 9
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Tip clipper
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Tip clipper triggered 0 times
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Bulge remover
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Bulge remover triggered 0 times
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Low coverage edge remover
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Low coverage edge remover triggered 0 times
0:02:14.199 28M / 1G INFO General (simplification.cpp : 362) PROCEDURE == Simplification cycle, iteration 10
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Tip clipper
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Tip clipper triggered 0 times
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Bulge remover
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Bulge remover triggered 0 times
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Low coverage edge remover
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Low coverage edge remover triggered 1 times
0:02:14.199 28M / 1G INFO General (simplification.cpp : 362) PROCEDURE == Simplification cycle, iteration 11
0:02:14.199 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Tip clipper
0:02:14.200 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Tip clipper triggered 1 times
0:02:14.200 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Bulge remover
0:02:14.211 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Bulge remover triggered 7 times
0:02:14.211 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Low coverage edge remover
0:02:14.212 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Low coverage edge remover triggered 0 times
0:02:14.212 28M / 1G INFO General (simplification.cpp : 362) PROCEDURE == Simplification cycle, iteration 12
0:02:14.212 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Tip clipper
0:02:14.212 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Tip clipper triggered 0 times
0:02:14.212 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Bulge remover
0:02:14.212 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Bulge remover triggered 0 times
0:02:14.212 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Low coverage edge remover
0:02:14.212 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Low coverage edge remover triggered 0 times
0:02:14.212 28M / 1G INFO StageManager (stage.cpp : 132) STAGE == Simplification Cleanup
0:02:14.212 28M / 1G INFO General (simplification.cpp : 196) PROCEDURE == Post simplification
0:02:14.212 28M / 1G INFO General (graph_simplification.hpp : 453) Disconnection of relatively low covered edges disabled
0:02:14.212 28M / 1G INFO General (graph_simplification.hpp : 489) Complex tip clipping disabled
0:02:14.212 28M / 1G INFO General (graph_simplification.hpp : 634) Creating parallel br instance
0:02:14.212 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Tip clipper
0:02:14.213 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Tip clipper triggered 0 times
0:02:14.213 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Bulge remover
0:02:14.223 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Bulge remover triggered 0 times
0:02:14.223 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Tip clipper
0:02:14.224 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Tip clipper triggered 0 times
0:02:14.224 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Bulge remover
0:02:14.234 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Bulge remover triggered 0 times
0:02:14.234 28M / 1G INFO General (simplification.cpp : 330) Disrupting self-conjugate edges
0:02:14.235 28M / 1G INFO Simplification (parallel_processing.hpp : 165) Running Removing isolated edges
0:02:14.235 28M / 1G INFO Simplification (parallel_processing.hpp : 167) Removing isolated edges triggered 0 times
0:02:14.235 28M / 1G INFO General (simplification.cpp : 470) Counting average coverage
0:02:14.236 28M / 1G INFO General (simplification.cpp : 476) Average coverage = 58.246
0:02:14.236 28M / 1G INFO StageManager (stage.cpp : 132) STAGE == Contig Output
0:02:14.236 28M / 1G INFO General (contig_output_stage.cpp : 40) Writing GFA to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly//K21/assembly_graph_with_scaffolds.gfa
0:02:14.279 28M / 1G INFO General (contig_output.hpp : 22) Outputting contigs to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly//K21/before_rr.fasta
0:02:14.370 28M / 1G INFO General (contig_output_stage.cpp : 51) Outputting FastG graph to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly//K21/assembly_graph.fastg
0:02:14.627 28M / 1G INFO General (contig_output.hpp : 22) Outputting contigs to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly//K21/simplified_contigs.fasta
0:02:14.720 28M / 1G INFO General (contig_output.hpp : 22) Outputting contigs to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly//K21/final_contigs.fasta
0:02:14.852 28M / 1G INFO StageManager (stage.cpp : 132) STAGE == Contig Output
0:02:14.852 28M / 1G INFO General (contig_output_stage.cpp : 40) Writing GFA to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly//K21/assembly_graph_with_scaffolds.gfa
0:02:14.897 28M / 1G INFO General (contig_output.hpp : 22) Outputting contigs to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly//K21/before_rr.fasta
0:02:14.992 28M / 1G INFO General (contig_output_stage.cpp : 51) Outputting FastG graph to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly//K21/assembly_graph.fastg
0:02:15.250 28M / 1G INFO General (contig_output.hpp : 22) Outputting contigs to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly//K21/simplified_contigs.fasta
0:02:15.344 28M / 1G INFO General (contig_output.hpp : 22) Outputting contigs to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly//K21/final_contigs.fasta
0:02:15.478 28M / 1G INFO General (launch.hpp : 149) SPAdes finished
0:02:15.492 8M / 1G INFO General (main.cpp : 109) Assembling time: 0 hours 2 minutes 15 seconds
Max read length detected as 151

== Running assembler: K33

0:00:00.000 4M / 4M INFO General (main.cpp : 74) Loaded config from /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/K33/configs/config.info
0:00:00.000 4M / 4M INFO General (main.cpp : 74) Loaded config from /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/K33/configs/careful_mode.info
0:00:00.000 4M / 4M INFO General (memory_limit.cpp : 49) Memory limit set to 40 Gb
0:00:00.000 4M / 4M INFO General (main.cpp : 87) Starting SPAdes, built from refs/heads/spades_3.13.0, git revision 8ea46659e9b2aca35444a808db550ac333006f8b
0:00:00.000 4M / 4M INFO General (main.cpp : 88) Maximum k-mer length: 128
0:00:00.000 4M / 4M INFO General (main.cpp : 89) Assembling dataset (/hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/dataset.info) with K=33
0:00:00.000 4M / 4M INFO General (main.cpp : 90) Maximum # of threads to use (adjusted due to OMP capabilities): 1
0:00:00.000 4M / 4M INFO General (launch.hpp : 51) SPAdes started

0:02:05.760 16M / 1G INFO StageManager (stage.cpp : 132) STAGE == Contig Output
0:02:05.760 16M / 1G INFO General (contig_output_stage.cpp : 40) Writing GFA to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly//K33/assembly_graph_with_scaffolds.gfa
0:02:05.800 16M / 1G INFO General (contig_output.hpp : 22) Outputting contigs to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly//K33/before_rr.fasta
0:02:05.885 16M / 1G INFO General (contig_output_stage.cpp : 51) Outputting FastG graph to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly//K33/assembly_graph.fastg
0:02:06.130 16M / 1G INFO General (contig_output.hpp : 22) Outputting contigs to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly//K33/simplified_contigs.fasta
0:02:06.218 16M / 1G INFO General (contig_output.hpp : 22) Outputting contigs to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly//K33/final_contigs.fasta
0:02:06.347 16M / 1G INFO StageManager (stage.cpp : 132) STAGE == Contig Output
0:02:06.347 16M / 1G INFO General (contig_output_stage.cpp : 40) Writing GFA to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly//K33/assembly_graph_with_scaffolds.gfa
0:02:06.390 16M / 1G INFO General (contig_output.hpp : 22) Outputting contigs to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly//K33/before_rr.fasta
0:02:06.480 16M / 1G INFO General (contig_output_stage.cpp : 51) Outputting FastG graph to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly//K33/assembly_graph.fastg
0:02:06.735 16M / 1G INFO General (contig_output.hpp : 22) Outputting contigs to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly//K33/simplified_contigs.fasta
0:02:06.826 16M / 1G INFO General (contig_output.hpp : 22) Outputting contigs to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly//K33/final_contigs.fasta
0:02:06.958 16M / 1G INFO General (launch.hpp : 149) SPAdes finished
0:02:06.964 8M / 1G INFO General (main.cpp : 109) Assembling time: 0 hours 2 minutes 6 seconds

== Running assembler: K55

0:00:00.000 4M / 4M INFO General (main.cpp : 74) Loaded config from /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/K55/configs/config.info
0:00:00.000 4M / 4M INFO General (main.cpp : 74) Loaded config from /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/K55/configs/careful_mode.info
0:00:00.000 4M / 4M INFO General (memory_limit.cpp : 49) Memory limit set to 40 Gb
0:00:00.000 4M / 4M INFO General (main.cpp : 87) Starting SPAdes, built from refs/heads/spades_3.13.0, git revision 8ea46659e9b2aca35444a808db550ac333006f8b
0:00:00.000 4M / 4M INFO General (main.cpp : 88) Maximum k-mer length: 128
0:00:00.000 4M / 4M INFO General (main.cpp : 89) Assembling dataset (/hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/..
..
0:02:31.939 232M / 1G INFO General (contig_output.hpp : 22) Outputting contigs to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly//K55/final_contigs.fasta
0:02:32.066 232M / 1G INFO General (launch.hpp : 149) SPAdes finished
0:02:32.418 12M / 1G INFO General (main.cpp : 109) Assembling time: 0 hours 2 minutes 32 seconds

== Running assembler: K77

0:00:00.000 4M / 4M INFO General (main.cpp : 74) Loaded config from /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/K77/configs/config.info
0:00:00.000 4M / 4M INFO General (main.cpp : 74) Loaded config from /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/K77/configs/careful_mode.info
0:00:00.000 4M / 4M INFO General (memory_limit.cpp : 49) Memory limit set to 40 Gb
0:00:00.000 4M / 4M INFO General (main.cpp : 87) Starting SPAdes, built from refs/heads/spades_3.13.0, git revision 8ea46659e9b2aca35444a808db550ac333006f8b
0:00:00.000 4M / 4M INFO General (main.cpp : 88) Maximum k-mer length: 128
0:00:00.000 4M / 4M INFO General (main.cpp : 89) Assembling dataset (/hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/dataset.info) with K=77
0:00:00.000 4M / 4M INFO General (main.cpp : 90) Maximum # of threads to use (adjusted due to OMP capabilities): 1
0:00:00.000 4M / 4M INFO General (launch.hpp : 51) SPAdes started
0:00:00.000 4M / 4M INFO General (launch.hpp : 58) Starting from stage: construction
0:00:00.000 4M / 4M INFO General (launch.hpp : 65) Two-step RR enabled: 0
..
0:02:28.441 128M / 1G INFO General (launch.hpp : 149) SPAdes finished
0:02:28.479 8M / 1G INFO General (main.cpp : 109) Assembling time: 0 hours 2 minutes 28 seconds

== Running assembler: K99

0:00:00.000 4M / 4M INFO General (main.cpp : 74) Loaded config from /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/K99/configs/config.info
0:00:00.000 4M / 4M INFO General (main.cpp : 74) Loaded config from /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/K99/configs/careful_mode.info
0:00:00.000 4M / 4M INFO General (memory_limit.cpp : 49) Memory limit set to 40 Gb
0:00:00.000 4M / 4M INFO General (main.cpp : 87) Starting SPAdes, built from refs/heads/spades_3.13.0, git revision 8ea46659e9b2aca35444a808db550ac333006f8b
0:00:00.000 4M / 4M INFO General (main.cpp : 88) Maximum k-mer length: 128
0:00:00.000 4M / 4M INFO General (main.cpp : 89) Assembling dataset (/hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/dataset.info) with K=99
0:00:00.000 4M / 4M INFO General (main.cpp : 90) Maximum # of threads to use (adjusted due to OMP capabilities): 1
0:00:00.000 4M / 4M INFO General (launch.hpp : 51) SPAdes started
0:00:00.000 4M / 4M INFO General (launch.hpp : 58) Starting from stage: construction
0:00:00.000 4M / 4M INFO General (launch.hpp : 65) Two-step RR enabled: 0
0:00:00.001 4M / 4M INFO General (launch.hpp : 76) Will need read mapping, kmer mapper will be attached
0:00:00.001 4M / 4M INFO StageManager (stage.cpp : 132) STAGE == de Bruijn graph construction
..
0:02:21.316 140M / 1G INFO General (contig_output.hpp : 22) Outputting contigs to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly//K99/simplified_contigs.fasta
0:02:21.402 140M / 1G INFO General (contig_output.hpp : 22) Outputting contigs to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly//K99/final_contigs.fasta
0:02:21.531 140M / 1G INFO General (launch.hpp : 149) SPAdes finished
0:02:21.588 8M / 1G INFO General (main.cpp : 109) Assembling time: 0 hours 2 minutes 21 seconds

== Running assembler: K127

0:00:00.000 4M / 4M INFO General (main.cpp : 74) Loaded config from /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/K127/configs/config.info
0:00:00.000 4M / 4M INFO General (main.cpp : 74) Loaded config from /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/K127/configs/careful_mode.info
0:00:00.000 4M / 4M INFO General (memory_limit.cpp : 49) Memory limit set to 40 Gb
0:00:00.000 4M / 4M INFO General (main.cpp : 87) Starting SPAdes, built from refs/heads/spades_3.13.0, git revision 8ea46659e9b2aca35444a808db550ac333006f8b
0:00:00.000 4M / 4M INFO General (main.cpp : 88) Maximum k-mer length: 128
0:00:00.000 4M / 4M INFO General (main.cpp : 89) Assembling dataset (/hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/dataset.info) with K=127
0:00:00.000 4M / 4M INFO General (main.cpp : 90) Maximum # of threads to use (adjusted due to OMP capabilities): 1
0:00:00.000 4M / 4M INFO General (launch.hpp : 51) SPAdes started
0:00:00.000 4M / 4M INFO General (launch.hpp : 58) Starting from stage: construction
0:00:00.000 4M / 4M INFO General (launch.hpp : 65) Two-step RR enabled: 0
0:00:00.001 4M / 4M INFO General (launch.hpp : 76) Will need read mapping, kmer mapper will be attached
0:00:00.001 4M / 4M INFO StageManager (stage.cpp : 132) STAGE == de Bruijn graph construction
0:00:00.003 4M / 4M INFO General (read_converter.hpp : 59) Binary reads detected
0:00:00.004 4M / 4M INFO General (construction.cpp : 111) Max read length 151
0:00:00.004 4M / 4M INFO General (construction.cpp : 117) Average read length 150.933
0:00:00.004 4M / 4M INFO General (stage.cpp : 101) PROCEDURE == k+1-mer counting
..
0:02:14.702 144M / 1G INFO General (contig_output.hpp : 22) Outputting contigs to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly//K127/before_rr.fasta
0:02:14.789 144M / 1G INFO General (contig_output_stage.cpp : 51) Outputting FastG graph to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly//K127/assembly_graph.fastg
0:02:15.033 144M / 1G INFO General (contig_output_stage.cpp : 20) Outputting FastG paths to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly//K127/final_contigs.paths
0:02:15.154 144M / 1G INFO General (contig_output_stage.cpp : 20) Outputting FastG paths to /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly//K127/scaffolds.paths
0:02:15.311 144M / 1G INFO General (launch.hpp : 149) SPAdes finished
0:02:15.354 8M / 1G INFO General (main.cpp : 109) Assembling time: 0 hours 2 minutes 15 seconds

===== Assembling finished. Used k-mer sizes: 21, 33, 55, 77, 99, 127

===== Mismatch correction started.

== Processing of contigs

== Running contig polishing tool: /hpc/dla_mm/jpaganini/data/miniconda3/envs/plasmid_id/share/spades-3.13.0-0/bin/spades-corrector-core /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/mismatch_corrector/contigs/configs/corrector.info /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/misc/assembled_contigs.fasta

== Dataset description file was created: /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/mismatch_corrector/contigs/configs/corrector.info

/hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/mismatch_corrector/contigs/configs/log.properties
0:00:00.000 4M / 4M INFO General (main.cpp : 58) Starting MismatchCorrector, built from refs/heads/spades_3.13.0, git revision 8ea46659e9b2aca35444a808db550ac333006f8b
0:00:00.000 4M / 4M INFO General (main.cpp : 59) Maximum # of threads to use (adjusted due to OMP capabilities): 1
0:00:00.000 4M / 4M INFO DatasetProcessor (dataset_processor.cpp : 195) Splitting assembly...
0:00:00.000 4M / 4M INFO DatasetProcessor (dataset_processor.cpp : 196) Assembly file: /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/misc/assembled_contigs.fasta
0:00:00.629 4M / 4M INFO DatasetProcessor (dataset_processor.cpp : 203) Processing paired sublib of number 0
0:00:00.629 4M / 4M INFO DatasetProcessor (dataset_processor.cpp : 206) /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_1_paired.fastq.gz /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_2_paired.fastq.gz
0:00:00.630 4M / 4M INFO DatasetProcessor (dataset_processor.cpp : 140) Running bwa index ...: /hpc/dla_mm/jpaganini/data/miniconda3/envs/plasmid_id/share/spades-3.13.0-0/bin/spades-bwa index -a is /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/misc/assembled_contigs.fasta
[bwa_index] Pack FASTA... 0.05 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 1.11 seconds elapse.
[bwa_index] Update BWT... 0.03 sec
[bwa_index] Pack forward-only FASTA... 0.01 sec
[bwa_index] Construct SA from BWT and Occ... 0.48 sec
[main] Version: 0.7.12-r1039
[main] CMD: /hpc/dla_mm/jpaganini/data/miniconda3/envs/plasmid_id/share/spades-3.13.0-0/bin/spades-bwa index -a is /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/misc/assembled_contigs.fasta
[main] Real time: 1.816 sec; CPU: 1.689 sec
0:00:02.491 4M / 4M INFO DatasetProcessor (dataset_processor.cpp : 149) Running bwa mem ...:/hpc/dla_mm/jpaganini/data/miniconda3/envs/plasmid_id/share/spades-3.13.0-0/bin/spades-bwa mem -v 1 -t 1 /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/misc/assembled_contigs.fasta /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_1_paired.fastq.gz /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_2_paired.fastq.gz > /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/tmp/corrector_u3yc490y/lib0_qEblzI/tmp.sam
[main] Version: 0.7.12-r1039
[main] CMD: /hpc/dla_mm/jpaganini/data/miniconda3/envs/plasmid_id/share/spades-3.13.0-0/bin/spades-bwa mem -v 1 -t 1 /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/misc/assembled_contigs.fasta /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_1_paired.fastq.gz /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_2_paired.fastq.gz
[main] Real time: 99.016 sec; CPU: 106.835 sec
0:01:41.517 4M / 4M INFO DatasetProcessor (dataset_processor.cpp : 209) Adding samfile /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/tmp/corrector_u3yc490y/lib0_qEblzI/tmp.sam
0:01:52.274 48M / 48M INFO DatasetProcessor (dataset_processor.cpp : 105) processed 1000000reads, flushing
0:02:03.447 48M / 48M INFO DatasetProcessor (dataset_processor.cpp : 105) processed 2000000reads, flushing
0:02:04.545 8M / 48M INFO DatasetProcessor (dataset_processor.cpp : 235) Processing contigs
0:02:10.606 36M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_1_length_276305_cov_10.791692 processed with 2 changes in thread 0
0:02:15.477 36M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_2_length_243149_cov_10.143588 processed with 0 changes in thread 0
0:02:19.350 32M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_3_length_197455_cov_10.080455 processed with 0 changes in thread 0
0:02:22.948 28M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_4_length_183139_cov_10.101064 processed with 0 changes in thread 0
0:02:26.215 24M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_5_length_165258_cov_10.112747 processed with 0 changes in thread 0
0:02:28.932 20M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_6_length_138875_cov_10.026631 processed with 0 changes in thread 0
0:02:31.608 20M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_7_length_135159_cov_10.135857 processed with 0 changes in thread 0
0:02:34.209 20M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_8_length_132800_cov_10.037302 processed with 0 changes in thread 0
0:02:36.800 20M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_9_length_132384_cov_10.035091 processed with 0 changes in thread 0
0:02:39.396 20M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_10_length_132014_cov_10.112847 processed with 0 changes in thread 0
0:02:41.920 20M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_11_length_128079_cov_10.032606 processed with 0 changes in thread 0
0:02:44.124 20M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_12_length_111557_cov_10.147160 processed with 0 changes in thread 0
0:02:46.230 20M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_13_length_105858_cov_10.213192 processed with 0 changes in thread 0
0:02:48.244 20M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_14_length_102135_cov_10.138920 processed with 0 changes in thread 0
0:02:49.972 16M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_15_length_88537_cov_10.042631 processed with 0 changes in thread 0

0:03:19.537 8M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_39_length_41772_cov_10.041662 processed with 0 changes in thread 0
0:03:20.339 8M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_40_length_40483_cov_10.150263 processed with 0 changes in thread 0
0:03:21.126 8M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_41_length_39822_cov_10.155889 processed with 0 changes in thread 0
0:03:21.856 8M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_42_length_37235_cov_10.057966 processed with 0 changes in thread 0
0:03:22.564 8M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_43_length_36180_cov_10.036557 processed with 0 changes in thread 0
0:03:23.198 8M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_44_length_32026_cov_10.125991 processed with 0 changes in thread 0
0:03:23.809 8M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_45_length_31111_cov_10.086980 processed with 0 changes in thread 0
0:03:24.409 8M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_46_length_30878_cov_9.944132 processed with 0 changes in thread 0
0:03:25.010 8M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_47_length_30822_cov_10.001662 processed with 0 changes in thread 0
0:03:25.598 8M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_48_length_29723_cov_10.141033 processed with 0 changes in thread 0
0:03:26.113 8M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_49_length_26457_cov_9.975997 processed with 0 changes in thread 0
0:03:26.524 8M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_50_length_20861_cov_9.972412 processed with 0 changes in thread 0
0:03:34.831 4M / 48M INFO DatasetProcessor (dataset_processor.cpp : 255) Gluing processed contigs
0:03:34.941 4M / 48M INFO General (main.cpp : 72) Correcting time: 0 hours 3 minutes 34 seconds

== Processing of scaffolds

== Running contig polishing tool: /hpc/dla_mm/jpaganini/data/miniconda3/envs/plasmid_id/share/spades-3.13.0-0/bin/spades-corrector-core /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/mismatch_corrector/scaffolds/configs/corrector.info /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/misc/assembled_scaffolds.fasta

== Dataset description file was created: /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/mismatch_corrector/scaffolds/configs/corrector.info

/hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/mismatch_corrector/scaffolds/configs/log.properties
0:00:00.000 4M / 4M INFO General (main.cpp : 58) Starting MismatchCorrector, built from refs/heads/spades_3.13.0, git revision 8ea46659e9b2aca35444a808db550ac333006f8b
0:00:00.000 4M / 4M INFO General (main.cpp : 59) Maximum # of threads to use (adjusted due to OMP capabilities): 1
0:00:00.000 4M / 4M INFO DatasetProcessor (dataset_processor.cpp : 195) Splitting assembly...
0:00:00.000 4M / 4M INFO DatasetProcessor (dataset_processor.cpp : 196) Assembly file: /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/misc/assembled_scaffolds.fasta
0:00:00.614 4M / 4M INFO DatasetProcessor (dataset_processor.cpp : 203) Processing paired sublib of number 0
0:00:00.614 4M / 4M INFO DatasetProcessor (dataset_processor.cpp : 206) /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_1_paired.fastq.gz /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_2_paired.fastq.gz
0:00:00.615 4M / 4M INFO DatasetProcessor (dataset_processor.cpp : 140) Running bwa index ...: /hpc/dla_mm/jpaganini/data/miniconda3/envs/plasmid_id/share/spades-3.13.0-0/bin/spades-bwa index -a is /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/misc/assembled_scaffolds.fasta
[bwa_index] Pack FASTA... 0.05 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 1.11 seconds elapse.
[bwa_index] Update BWT... 0.03 sec
[bwa_index] Pack forward-only FASTA... 0.03 sec
[bwa_index] Construct SA from BWT and Occ... 0.48 sec
[main] Version: 0.7.12-r1039
[main] CMD: /hpc/dla_mm/jpaganini/data/miniconda3/envs/plasmid_id/share/spades-3.13.0-0/bin/spades-bwa index -a is /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/misc/assembled_scaffolds.fasta
[main] Real time: 1.834 sec; CPU: 1.707 sec
0:00:02.459 4M / 4M INFO DatasetProcessor (dataset_processor.cpp : 149) Running bwa mem ...:/hpc/dla_mm/jpaganini/data/miniconda3/envs/plasmid_id/share/spades-3.13.0-0/bin/spades-bwa mem -v 1 -t 1 /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/misc/assembled_scaffolds.fasta /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_1_paired.fastq.gz /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_2_paired.fastq.gz > /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/tmp/corrector_ljrgi6px/lib0_hKba9g/tmp.sam
[main] Version: 0.7.12-r1039
[main] CMD: /hpc/dla_mm/jpaganini/data/miniconda3/envs/plasmid_id/share/spades-3.13.0-0/bin/spades-bwa mem -v 1 -t 1 /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/misc/assembled_scaffolds.fasta /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_1_paired.fastq.gz /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/trimmed/GCA_011404755.1_ASM1140475v1_genomic_2_paired.fastq.gz
[main] Real time: 99.052 sec; CPU: 106.864 sec
0:01:41.519 4M / 4M INFO DatasetProcessor (dataset_processor.cpp : 209) Adding samfile /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/tmp/corrector_ljrgi6px/lib0_hKba9g/tmp.sam
0:01:52.248 48M / 48M INFO DatasetProcessor (dataset_processor.cpp : 105) processed 1000000reads, flushing
0:02:03.381 48M / 48M INFO DatasetProcessor (dataset_processor.cpp : 105) processed 2000000reads, flushing
0:02:04.470 4M / 48M INFO DatasetProcessor (dataset_processor.cpp : 235) Processing contigs
0:02:10.441 36M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_1_length_276305_cov_10.791692 processed with 2 changes in thread 0
0:02:15.361 36M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_2_length_243149_cov_10.143588 processed with 0 changes in thread 0
0:02:19.346 32M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_3_length_197455_cov_10.080455 processed with 0 changes in thread 0
0:02:23.038 28M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_4_length_183139_cov_10.101064 processed with 0 changes in thread 0
0:02:26.373 24M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_5_length_165258_cov_10.112747 processed with 0 changes in thread 0
0:02:29.179 20M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_6_length_139437_cov_10.042115 processed with 5 changes in thread 0
0:02:31.954 20M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_7_length_138875_cov_10.026631 processed with 0 changes in thread 0
0:02:34.687 20M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_8_length_135159_cov_10.135857 processed with 0 changes in thread 0
0:02:37.327 20M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_9_length_132800_cov_10.037302 processed with 0 changes in thread 0
0:02:39.966 20M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_10_length_132014_cov_10.112847 processed with 0 changes in thread 0
0:02:42.510 20M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_11_length_128079_cov_10.032606 processed with 0 changes in thread 0
0:02:44.750 20M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_12_length_111557_cov_10.147160 processed with 0 changes in thread 0
0:02:46.888 20M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_13_length_105858_cov_10.213192 processed with 0 changes in thread 0
0:02:48.939 20M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_14_length_102135_cov_10.138920 processed with 0 changes in thread 0
0:02:50.693 16M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_15_length_88537_cov_10.042631 processed with 0 changes in thread 0
0:02:52.445 16M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_16_length_87789_cov_10.071525 processed with 0 changes in thread 0

0:03:23.850 8M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_43_length_36180_cov_10.036557 processed with 0 changes in thread 0
0:03:24.496 8M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_44_length_32026_cov_10.125991 processed with 0 changes in thread 0
0:03:25.117 8M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_45_length_31111_cov_10.086980 processed with 0 changes in thread 0
0:03:25.728 8M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_46_length_30878_cov_9.944132 processed with 0 changes in thread 0
0:03:26.339 8M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_47_length_30822_cov_10.001662 processed with 0 changes in thread 0
0:03:26.935 8M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_48_length_29723_cov_10.141033 processed with 0 changes in thread 0
0:03:27.458 8M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_49_length_26457_cov_9.975997 processed with 0 changes in thread 0
0:03:27.874 8M / 48M INFO DatasetProcessor (dataset_processor.cpp : 251) Contig NODE_50_length_20861_cov_9.972412 processed with 0 changes in thread 0
0:03:36.169 4M / 48M INFO DatasetProcessor (dataset_processor.cpp : 255) Gluing processed contigs
0:03:36.280 4M / 48M INFO General (main.cpp : 72) Correcting time: 0 hours 3 minutes 36 seconds

===== Mismatch correction finished.

  • Corrected reads are in /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/corrected/
  • Assembled contigs are in /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/contigs.fasta
  • Assembled scaffolds are in /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/scaffolds.fasta
  • Assembly graph is in /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/assembly_graph.fastg
  • Assembly graph in GFA format is in /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/assembly_graph_with_scaffolds.gfa
  • Paths in the assembly graph corresponding to the contigs are in /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/contigs.paths
  • Paths in the assembly graph corresponding to the scaffolds are in /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/scaffolds.paths

======= SPAdes pipeline finished.

SPAdes log can be found here: /hpc/dla_mm/jpaganini/data/recovering_ecoli_plasmids/testing_plasmidid/20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/spades.log

Thank you for using SPAdes!
Sun Aug 30 14:40:09 CEST 2020
DONE. Assembled contigs can be found at 20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/contigs.fasta:
DONE. Assembled scaffolds can be found at 20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/assembly/scaffolds.fasta:
Removing unnecesary folders
DONE removing unwanted folders

#Executing /home/dla_mm/jpaganini/data/miniconda3/envs/plasmid_id/bin/mash_screener.sh

DEPENDENCY STATUS


bash INSTALLED
mash INSTALLED
Output directory is 20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/kmer
creating sketch of 2020-08-19_plasmids.fasta
Sketching plasmidid_db/2020-08-19_plasmids.fasta...
Writing to 20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/kmer/database.msh...
Sun Aug 30 14:43:22 CEST 2020
screening reads/GCA_011404755.1_ASM1140475v1_genomic_R1.fq
Loading 20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/kmer/database.msh...
8782236 distinct hashes.
Streaming from reads/GCA_011404755.1_ASM1140475v1_genomic_R1.fq...
Estimated distinct k-mers in pool: 15336880
Summing shared...
Reallocating to winners...
Computing coverage medians...
Writing output...
Sun Aug 30 14:45:36 CEST 2020
DONE Screening GCA_011404755.1_ASM1140475v1_genomic of NO_GROUP Group

Retrieving sequences matching more than 0.95 identity

#Executing /home/dla_mm/jpaganini/data/miniconda3/envs/plasmid_id/bin/filter_fasta.sh

Output directory is 20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/kmer
Sun Aug 30 14:45:37 CEST 2020
Filtering terms on file 2020-08-19_plasmids.fasta
Sun Aug 30 14:45:57 CEST 2020
DONE Filtering terms on file 2020-08-19_plasmids.fasta
File with filtered sequences can be found in 20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/kmer/database.filtered_0.95_term.fasta
Previous number of sequences= 22845
Post number of sequences= 28

Namespace(distance=0.5, input_file='20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/kmer/database.filtered_0.95_term.fasta', output=False, output_grouped=False)
Obtaining mash distance
Obtaining cluster from distance
Calculating length
Filtering representative fasta
28 sequences clustered into 28
DONE

#Executing /home/dla_mm/jpaganini/data/miniconda3/envs/plasmid_id/bin/bowtie_mapper.sh

DEPENDENCY STATUS


bowtie2-build INSTALLED
bowtie2 INSTALLED
Output directory is 20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/mapping
Building index of database.filtered_0.95_term.0.5.representative.fasta
Settings:
Output files: "20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/kmer/database.filtered_0.95_term.0.5.representative.fasta.*.bt2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 1 (one in 2)
FTable chars: 10
Strings: unpacked
Max bucket size: default
Max bucket size, sqrt multiplier: default
Max bucket size, len divisor: 4
Difference-cover sample period: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/kmer/database.filtered_0.95_term.0.5.representative.fasta
Building a SMALL index
Reading reference sizes
Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:00
bmax according to bmaxDivN setting: 363379
Using parameters --bmax 272535 --dcv 1024
Doing ahead-of-time memory usage test
Passed! Constructing with these parameters: --bmax 272535 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
Building sPrime
Building sPrimeOrder
V-Sorting samples
V-Sorting samples time: 00:00:00
Allocating rank array
Ranking v-sort output
Ranking v-sort output time: 00:00:00
Invoking Larsson-Sadakane on ranks
Invoking Larsson-Sadakane on ranks time: 00:00:00
Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
(Using difference cover)
Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
Splitting and merging time: 00:00:00
Avg bucket size: 1.45352e+06 (target: 272534)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 1
No samples; assembling all-inclusive block
Sorting block of length 1453518 for bucket 1
(Using difference cover)
Sorting block time: 00:00:00
Returning block of 1453519 for bucket 1
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 358023
fchr[G]: 719961
fchr[T]: 1094992
fchr[$]: 1453518
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 4681688 bytes to primary EBWT file: 20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/kmer/database.filtered_0.95_term.0.5.representative.fasta.1.bt2
Wrote 2907044 bytes to secondary EBWT file: 20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/kmer/database.filtered_0.95_term.0.5.representative.fasta.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
len: 1453518
bwtLen: 1453519
sz: 363380
bwtSz: 363380
lineRate: 6
offRate: 1
offMask: 0xfffffffe
ftabChars: 10
eftabLen: 20
eftabSz: 80
ftabLen: 1048577
ftabSz: 4194308
offsLen: 726760
offsSz: 2907040
lineSz: 64
sideSz: 64
sideBwtSz: 48
sideBwtLen: 192
numSides: 7571
numLines: 7571
ebwtTotLen: 484544
ebwtTotSz: 484544
color: 0
reverse: 0
Total time for call to driver() for forward index: 00:00:00
Reading reference sizes
Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:00
Time to reverse reference sequence: 00:00:00
bmax according to bmaxDivN setting: 363379
Using parameters --bmax 272535 --dcv 1024
Doing ahead-of-time memory usage test
Passed! Constructing with these parameters: --bmax 272535 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
Building sPrime
Building sPrimeOrder
V-Sorting samples
V-Sorting samples time: 00:00:00
Allocating rank array
Ranking v-sort output
Ranking v-sort output time: 00:00:00
Invoking Larsson-Sadakane on ranks
Invoking Larsson-Sadakane on ranks time: 00:00:00
Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
(Using difference cover)
Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
Splitting and merging time: 00:00:00
Avg bucket size: 1.45352e+06 (target: 272534)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 1
No samples; assembling all-inclusive block
Sorting block of length 1453518 for bucket 1
(Using difference cover)
Sorting block time: 00:00:01
Returning block of 1453519 for bucket 1
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 358023
fchr[G]: 719961
fchr[T]: 1094992
fchr[$]: 1453518
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 4681688 bytes to primary EBWT file: 20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/kmer/database.filtered_0.95_term.0.5.representative.fasta.rev.1.bt2
Wrote 2907044 bytes to secondary EBWT file: 20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/kmer/database.filtered_0.95_term.0.5.representative.fasta.rev.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
len: 1453518
bwtLen: 1453519
sz: 363380
bwtSz: 363380
lineRate: 6
offRate: 1
offMask: 0xfffffffe
ftabChars: 10
eftabLen: 20
eftabSz: 80
ftabLen: 1048577
ftabSz: 4194308
offsLen: 726760
offsSz: 2907040
lineSz: 64
sideSz: 64
sideBwtSz: 48
sideBwtLen: 192
numSides: 7571
numLines: 7571
ebwtTotLen: 484544
ebwtTotSz: 484544
color: 0
reverse: 1
Total time for backward call to driver() for mirror index: 00:00:01
Sun Aug 30 14:46:10 CEST 2020
mapping reads/GCA_011404755.1_ASM1140475v1_genomic_R1.fq
mapping reads/GCA_011404755.1_ASM1140475v1_genomic_R2.fq
1000000 reads; of these:
  1000000 (100.00%) were paired; of these:
    748800 (74.88%) aligned concordantly 0 times
    210899 (21.09%) aligned concordantly exactly 1 time
    40301 (4.03%) aligned concordantly >1 times
    ----
    748800 pairs aligned concordantly 0 times; of these:
      444 (0.06%) aligned discordantly 1 time
    ----
    748356 pairs aligned 0 times concordantly or discordantly; of these:
      1496712 mates make up the pairs; of these:
        1485688 (99.26%) aligned 0 times
        4421 (0.30%) aligned exactly 1 time
        6603 (0.44%) aligned >1 times
25.72% overall alignment rate
Sun Aug 30 14:51:16 CEST 2020
DONE Mapping GCA_011404755.1_ASM1140475v1_genomic of NO_GROUP Group
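
For readers checking the mapping summary above, the 25.72% overall alignment rate can be reproduced from the reported counts. This is a minimal sketch; the function name `overall_alignment_rate` and its parameter names are illustrative and not part of plasmidID or Bowtie2. It assumes that every aligned pair (concordant or discordant) contributes two aligned reads, and that mates from unaligned pairs are counted individually.

```python
def overall_alignment_rate(total_pairs, concordant_once, concordant_multi,
                           discordant_once, mates_once, mates_multi):
    """Return the percentage of reads that aligned at least once."""
    # Pairs that aligned as a pair, concordantly or discordantly.
    aligned_pairs = concordant_once + concordant_multi + discordant_once
    # Two reads per aligned pair, plus single mates that aligned on their own.
    aligned_reads = 2 * aligned_pairs + mates_once + mates_multi
    total_reads = 2 * total_pairs
    return 100 * aligned_reads / total_reads


# Counts copied from the summary in the log above.
rate = overall_alignment_rate(
    total_pairs=1_000_000,
    concordant_once=210_899,
    concordant_multi=40_301,
    discordant_once=444,
    mates_once=4_421,
    mates_multi=6_603,
)
print(f"{rate:.2f}% overall alignment rate")  # prints "25.72% overall alignment rate"
```

In other words, 514312 of the 2000000 reads mapped to the filtered plasmid database, so the mapping step itself completed normally.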

#Executing /home/dla_mm/jpaganini/data/miniconda3/envs/plasmid_id/bin/sam_to_bam.sh

DEPENDENCY STATUS


samtools INSTALLED
Default output directory is 20200830_test_output/NO_GROUP/GCA_011404755.1_ASM1140475v1_genomic/mapping
Sun Aug 30 14:51:16 CEST 2020
Converting SAM to sorted indexed BAM in GCA_011404755
Sun Aug 30 14:52:04 CEST 2020
Sorting BAM file in GCA_011404755
Sun Aug 30 14:52:52 CEST 2020
Indexing BAM file in GCA_011404755
Sun Aug 30 14:52:56 CEST 2020
DONE Converting SAM to sorted indexed BAM in GCA_011404755
GCA_011404755.bam removed

#Executing /home/dla_mm/jpaganini/data/miniconda3/envs/plasmid_id/bin/get_coverage.sh

GCA_011404755.1_ASM1140475v1_genomic.sorted.bam not supplied, please, introduce a valid file
ERROR: 1 missing files, aborting execution
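
One thing the log above suggests, though it does not confirm it, is a naming mismatch: `sam_to_bam.sh` reports working on `GCA_011404755`, while `get_coverage.sh` asks for `GCA_011404755.1_ASM1140475v1_genomic.sorted.bam`. The sketch below only illustrates how cutting a file name at its first dot would produce that mismatch. The SAM file name and the helper `prefix_before_first_dot` are assumptions made for illustration; they are not taken from the plasmidID scripts.

```python
def prefix_before_first_dot(filename: str) -> str:
    """Hypothetical helper: keep only the text before the first dot."""
    return filename.split(".", 1)[0]


# Sample name as it appears in the log above.
sample = "GCA_011404755.1_ASM1140475v1_genomic"

# Assumed SAM file name (not shown in the log).
sam_file = sample + ".sam"

# Name a script would write if it truncated at the first dot.
produced_bam = prefix_before_first_dot(sam_file) + ".sorted.bam"

# Name requested in the get_coverage.sh error message.
expected_bam = sample + ".sorted.bam"

print(produced_bam)                  # GCA_011404755.sorted.bam
print(expected_bam)                  # GCA_011404755.1_ASM1140475v1_genomic.sorted.bam
print(produced_bam == expected_bam)  # False
```

If this reading is right, listing the contents of the `mapping` output directory would show which `.sorted.bam` name was actually written, and a sample name without dots would avoid the truncation.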
