dragonflye's Introduction

NOTE: Dragonflye is under active development, so any feedback will be very useful.

dragonflye

🐉 🪰 Assemble bacterial isolate genomes from Nanopore reads

A Quick Note

If you've worked with bacterial sequences, in all likelihood you have used one of Torsten Seemann's tools. One such tool is Shovill, which takes the bacterial genome assembly process and makes it quick and painless. Shovill was developed for paired-end Illumina reads, and there is a fork, shovill-se, which supports single-end reads.

Given the widespread usage of Shovill, and Torsten basically laying much of the groundwork, I decided to use Shovill as a framework for Dragonflye. Dragonflye can be considered a fork of Shovill that supports assembling Oxford Nanopore sequences. By going this route users will not have to relearn parameters, and will already be familiar with the outputs.

At this point, you might be wondering: so Robert you just hacked Shovill to work with ONT reads, why not just call it 'shovill-ont'?

That's because when I asked if there was interest in a "Shovill" for ONT reads, Curtis Kapsak (@kapsakcj) responded:

Curtis Kapsak (@kapsakcj): if wrapping flye , perhaps call it dragonflye (a very fast flye)?.

And honestly, how could I not go with that?!? It's an amazing play on words that I'm willing to bet Torsten would be proud of!

So to sum it up, thank you Torsten for Shovill and providing a framework for Dragonflye.

Introduction

Dragonflye is a pipeline that aims to make assembling Oxford Nanopore reads quick and easy. Still working on the quick part, but I think the easy part is there. Dragonflye currently supports Flye, Miniasm and Raven assemblers, and Racon and Medaka polishers.

Main Steps

  1. Estimate genome size and read length from reads (unless --gsize provided) (kmc)
  2. Filter reads by length (default --minreadlen 1000) (Nanoq)
  3. Reduce FASTQ files to a sensible depth (default --depth 150) (rasusa)
  4. Remove adapters (requires --trim be given) (Porechop)
  5. Assemble with Flye, Miniasm, or Raven
  6. Polish assembly with Racon and/or Medaka
  7. Polish assembly with short reads via Polypolish and/or Pilon
  8. Remove contigs that are too short, too low coverage, or pure homopolymers
  9. Produce final FASTA with nicer names and parsable annotations
  10. Reorient contigs from final FASTA using dnaapler
  11. Output parsable assembly statistics (assembly-scan)

Quick Start

dragonflye --reads my-ont.fastq.gz --outdir dragonflye --gsize 5000000
... LOG TEXT ...
[dragonflye] Final assembly contigs: /home/robert_petit/repos/dragonflye/temp/dragonflye/contigs.fa
[dragonflye] It contains 3 (min=4864) contigs totalling 4939840 bp.
[dragonflye] Dragonfly fossils have been found with wingspans up to two feet (61cm)!
[dragonflye] Done.

ls dragonflye/
contigs.fa  contigs.gfa  dragonflye.log  flye-info.txt  flye.fasta

head -n4 dragonflye/contigs.fa
>contig00001 len=2753792 origname=Utg1024_LN:i:2753792_RC:i:486_XO:i:0 polish=none sw=dragonflye-raven/1.2.0 date=20231031
TTCTATTTATCAGTATCATTACTTTTATATTATCGATAATTAATCCGAACATATCATTAA
TCAAGTTATTATTCGAAGTGGTTTTGCTGCATTTGGAACAGTCGGGTTAAGTATGAACCT
TACCACAGAAGATAATAATGGTATTACTAAAATAATTATTATATTCGTTATGCTTTGCGG

head -n4 dragonflye/contigs.reoriented.fa
>contig00001 len=2753792 origname=Utg1024_LN:i:2753792_RC:i:486_XO:i:0 polish=none sw=dragonflye-raven/1.2.0 date=20231031 rotated=True
ATGTCGGAAAAAGAAATTTGGGAAAAGTGCTTGAAATTGCTCAAGAAAAATTATCAGCTG
TAAGTTACTCAACTTTCCTAAAAGATGACGAGGCTTTACACGATTAAAGATGGTGAAGCT
ATCGTATTATCGAGTATTCCTTTTAATGCAAATTGGTTAAATCAACAATATGCTGAAATT
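
The contig headers shown above carry space-separated key=value annotations, so they are easy to parse with standard tools. As a minimal sketch (the helper name contig_lengths is made up for this example and is not part of Dragonflye), the following prints each contig name with its len= value:

```shell
# Hypothetical helper: print "name<TAB>length" for every FASTA header on stdin.
contig_lengths() {
  awk '/^>/ {
    name = substr($1, 2)              # drop the leading ">"
    len = ""
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^len=/) len = substr($i, 5)   # text after "len="
    }
    print name "\t" len
  }'
}

# Small inline example so the sketch runs without an assembly on disk.
printf '>contig00001 len=2753792 polish=none\nACGT\n' | contig_lengths
```

On a real run you would feed it the final assembly instead, for example `contig_lengths < dragonflye/contigs.fa`.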

Installation

Dragonflye is available from Bioconda. Dragonflye includes a lot of programs, so it can take conda a while to solve the environment. For this reason I personally install it with Mamba, which is much faster.

# With conda
conda create -n dragonflye -c conda-forge -c bioconda dragonflye

# With Mamba (much quicker)
mamba create -n dragonflye -c conda-forge -c bioconda dragonflye

Usage

Dragonflye - A very fast flye

SYNOPSIS
  De novo assembly pipeline for bacterial isolates with Nanopore reads
USAGE
  dragonflye [options] --outdir DIR --reads READS.fastq.gz
GENERAL
  --help          This help
  --version       Print version and exit
  --check         Check dependencies are installed
  --seed N        Random seed to use (default: 42)
INPUT
  --reads XXX     Input Nanopore FASTQ (default: '')
  --depth N       Sub-sample --reads to this depth. Disable with --depth 0 (default: 150)
  --minreadlen N  Minimum read length. Disable with --minreadlength 0 (default: 1000)
  --gsize XXX     Estimated genome size eg. 3.2M <blank=AUTODETECT> (default: '')
OUTPUT
  --outdir XXX    Output folder (default: '')
  --prefix XXX    Prefix to use for final assembly FASTA (default: 'contigs')
  --force         Force overwite of existing output folder (default: OFF)
  --minlen N      Minimum contig length <0=AUTO> (default: 500)
  --mincov n.nn   Minimum contig coverage <0=AUTO> (default: 2)
  --namefmt XXX   Format of contig FASTA IDs in 'printf' style (default: 'contig%05d')
  --keepfiles     Keep intermediate files (default: OFF)
RESOURCES
  --tmpdir XXX    Fast temporary directory (default: '')
  --cpus N        Number of CPUs to use (0=ALL) (default: 8)
  --ram n.nn      Try to keep RAM usage below this many GB (default: 16)
ASSEMBLER
  --assembler XXX Assembler: raven miniasm flye (default: 'flye')
  --opts XXX      Extra assembler options in quotes eg. flye: '--interations' (default: '')
  --nanohq        For Flye, use '--nano-hq' instead of --nano-raw (default: OFF)
POLISHER
  --racon N       Number of polishing rounds to conduct with Racon (default: 1)
  --medaka N      Number of polishing rounds to conduct with Medaka (requires --model) (default: 0)
  --model XXX     The model to be used by Medaka, (Assumes 1 polishing round, if --medaka not used) (default: '')
  --list_models   List the models available to Medaka (default: OFF)
SHORT-READ POLISHER
  --polypolish N  Number of polishing rounds to conduct with Polypolish (requires --R1 and --R2) (default: 1)
  --polypolish_careful Polypolish will ignore any reads with multiple alignments (default: OFF)
  --pilon N       Number of polishing rounds to conduct with Pilon (requires --R1 and --R2) (default: 0)
  --R1 XXX        Read 1 FASTQ to use for polishing (default: '')
  --R2 XXX        Read 2 FASTQ to use for polishing (default: '')
REORIENT
  --noreorient    Disable contig reorientation using dnaapler (default: OFF)
  --dnaapler_mode XXX The mode of reorientation to execute (default: 'all')
  --dnaapler_opts XXX Extra dnaapler options in quotes eg. '--evalue 1e-5' (default: '')
MODULES
  --trim          Enable adaptor trimming (default: OFF)
  --trimopts XXX  Extra porechop options in quotes eg. '--adapter_threshold 80' (default: '')
  --nofilter      Disable read length filtering (default: OFF)
  --nopolish      Disable assembly polishing (default: OFF)
HOMEPAGE
  https://github.com/rpetit3/dragonflye - Robert A Petit III

--depth

Giving an assembler too much data is a bad thing. There comes a point where you are no longer adding new information (as the genome is a fixed size) and are only adding more noise (sequencing errors). Because of this, Dragonflye will downsample your FASTQ files to a specific depth (defaults to 150x). It estimates depth by dividing read yield by genome size.
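
To make the depth estimate concrete, here is a small sketch of the arithmetic (read yield divided by genome size). The function name estimate_depth is hypothetical, and integer division is an assumption made for simplicity; the numbers are the total_bp and genome size from the example log further down this page.

```shell
# Hypothetical helper: depth = total bases in the reads / genome size.
# Uses integer division, which is an assumption for this sketch.
estimate_depth() {
  total_bp=$1
  gsize_bp=$2
  echo $(( total_bp / gsize_bp ))
}

# 432559422 bp of reads against a 2.8 Mb genome.
depth=$(estimate_depth 432559422 2800000)
echo "Estimated depth: ${depth}x"
```

This agrees with the "Estimated sequencing depth: 154x" line in that log.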

--gsize

The genome size is needed to estimate depth and for the assembly stage. If you don't provide --gsize, it will be estimated via k-mer frequencies using kmc. It doesn't need to be a perfect estimate, just in the right ballpark. If you know the genome size, it is usually better than the estimate and will save some time.
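
Genome sizes can be given with a suffix such as 3.2M. The sketch below shows one way such a value could be expanded to base pairs; it is only an illustration, the helper name gsize_to_bp and the K/M/G handling are assumptions, and it is not how Dragonflye itself parses the option.

```shell
# Hypothetical helper: expand a size like "2.8M" into base pairs.
gsize_to_bp() {
  awk -v s="$1" 'BEGIN {
    n = s + 0                                   # leading numeric part, e.g. 2.8
    u = toupper(substr(s, length(s), 1))        # last character, e.g. M
    m = 1
    if (u == "K") m = 1000
    else if (u == "M") m = 1000000
    else if (u == "G") m = 1000000000
    printf "%.0f\n", n * m                      # round to a whole number of bp
  }'
}

gsize_to_bp 2.8M
gsize_to_bp 5000000
```

In the example log on this page, --gsize 2.8M is reported as "Using genome size 2800000 bp", which is the value the first call prints.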

--keepfiles

This will keep all the intermediate files in --outdir so you can explore and debug.

--cpus

By default it will use 8 CPU cores. Use --cpus 0 to use all available CPU cores.

--ram

Dragonflye will do its best to keep memory usage below this value, but it is not guaranteed. If you are on a HPC cluster, you should make sure you tell your job submission engine a value higher than this.

--assembler

By default it will use Flye. Raven and Miniasm are also available.

--opts

If you want to provide some assembler-specific parameters, you can use the --opts parameter. Make sure you quote the parameters so they get passed as a single string. For example, with --assembler flye you might use --opts "--iterations 4 --plasmids".
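
If the quoting rule is unclear, this small sketch shows what the shell does with and without quotes. count_args is a throwaway function written for this example; it simply reports how many arguments it received.

```shell
# Throwaway helper: report how many arguments were received.
count_args() {
  echo "$#"
}

# Quoted: the assembler options travel as ONE string after --opts.
count_args --opts "--iterations 4 --plasmids"

# Unquoted: the shell splits them into separate arguments.
count_args --opts --iterations 4 --plasmids
```

The quoted form yields two arguments (--opts and its value), which is what you want; the unquoted form yields four.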

--racon & --medaka

These two parameters adjust how many polishing rounds are conducted per-polisher. For example, --racon 2 would conduct 2 rounds of polishing with Racon. If --medaka is provided, a model must also be provided with --model.

--model

A valid basecaller model must be provided with --model. If a valid model is provided, but --medaka was not provided it will assume --medaka 1.
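
The interaction between --medaka and --model can be summarised in a few lines of shell. This is only a sketch of the rules described above, with a hypothetical function name (resolve_medaka_rounds); it does not check whether the model name is valid and is not Dragonflye's actual code.

```shell
# Hypothetical helper mirroring the documented rules.
# $1 = requested Medaka rounds (empty if --medaka was not given)
# $2 = model name (empty if --model was not given)
resolve_medaka_rounds() {
  rounds=$1
  model=$2
  if [ -n "$model" ] && [ -z "$rounds" ]; then
    echo 1        # model given without --medaka: assume one round
  elif [ -z "$model" ] && [ -n "$rounds" ] && [ "$rounds" -gt 0 ]; then
    echo "error: --medaka requires --model" >&2
    return 1
  else
    echo "${rounds:-0}"   # explicit rounds, or 0 when neither was given
  fi
}

resolve_medaka_rounds "" r941_min_sup_g507    # model only
resolve_medaka_rounds 2 r941_min_sup_g507     # both given
resolve_medaka_rounds "" ""                   # neither given
```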

--list_models

This will list all basecaller models that are available in Medaka.

--polypolish & --pilon & --R1 & --R2

If Illumina short-reads are provided, polishing will be done with Polypolish and/or Pilon. The value of --polypolish (Default 1) is the number of polishing rounds that will be conducted. By default Pilon is turned off.

Choosing which stages to use

Stage                    Enable        Disable
Genome size estimation   default       --gsize INT
Read subsampling         --depth INT   --depth 0
Read length filtering    default       --nofilter
Adapter Trimming         --trim        default

Environment variables recognised

These env-vars will be used as defaults instead of the built-in defaults. You can still override them with the normal command line options.

Variable                Option        Default
$DRAGONFLYE_CPUS        --cpus        8
$DRAGONFLYE_RAM         --ram         16
$DRAGONFLYE_ASSEMBLER   --assembler   flye
$TMPDIR                 --tmpdir      /tmp
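
The precedence described here (command line option first, then the environment variable, then the built-in default) can be sketched with plain shell parameter expansion. The function name effective_cpus is hypothetical and only illustrates the lookup order for --cpus; it is not taken from Dragonflye.

```shell
# Hypothetical helper: pick the effective --cpus value.
# $1 = value from the command line (empty if --cpus was not given)
effective_cpus() {
  cli_value=$1
  # Command line wins; otherwise $DRAGONFLYE_CPUS; otherwise the default of 8.
  echo "${cli_value:-${DRAGONFLYE_CPUS:-8}}"
}

unset DRAGONFLYE_CPUS
effective_cpus ""     # nothing set: built-in default
DRAGONFLYE_CPUS=16
effective_cpus ""     # environment variable is used
effective_cpus 4      # command line overrides the environment
```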

Output Files

Filename                       Description
contigs.fa                     The final assembly you should use
contigs.reoriented.fa          If available, a reorientation of the final assembly
contigs.dnaapler.summary.tsv   If available, a summary description of reoriented contigs
contigs.gfa                    Assembly graph
dragonflye.log                 Full log file for bug reporting
flye.fasta                     Raw assembly (flye)
flye-info.txt                  Information about contigs output by Flye
miniasm.fasta                  Raw assembly (miniasm)
raven.fasta                    Raw assembly (raven)

FAQ

  • Perl?!?! Perl?!? Really, why Perl?

    Dragonflye is a fork of Shovill, and Shovill was written in Perl. Haha so yeah, instead of writing from scratch, I dusted off the old Perl skills. Upon which the Perl interpreter basically told me I sucked at Perl every time I tried to make a change (haha kept forgetting the semi-colons at the end of the line!).

  • Does dragonflye accept Illumina reads?

    It does, but only for short-read polishing. If you want to assemble just Illumina reads, use Shovill.

  • Doesn't Trycycler already do this?

    Dragonflye is not trying to replicate Trycycler; Trycycler is on a whole 'nother level. If you are looking to get super high quality assemblies with some manual inspection steps in between, use Trycycler. But if you are looking to just get a quick assembly that you can work with, that's what Dragonflye is for.

  • Can I assemble more than one genome at a time?

    If you would like to assemble more than one genome using Dragonflye, I would recommend you do this with Bactopia. Bactopia will allow you to process a single genome or thousands, and it also includes many other bacterial genome analyses. If you don't want to use Bactopia, I suggest you see the next question!

  • Are there other similar pipelines?

    hybracter is a similar alternative to Dragonflye. It is written in Snakemake and includes many of the same analyses, with many fun additions by @gbouras13. Another alternative is bacass which is a Nextflow pipeline maintained by nf-core.

Feedback

Please file questions, bugs or ideas to the Issue Tracker

Acknowledgements

I would like to personally extend my many thanks and gratitude to the authors of these software packages. Really, thank you very much!

Software Included (19)

Author

Funding

Support for this project came from the Wyoming Public Health Laboratory.

dragonflye's Issues

Missed plasmid in reoriented.fa

Hello,

thank you for updating dragonflye! I tried the last version and found difference in a number of plasmids in "contigs.fa" and "reoriented.fa". Maybe I didn't understand an output description, because I thought that all plasmids, both oriented and not oriented, will be in the "reoriented" file.
Also, two small plasmids were missed in all versions of assemblies (ColRNA and Col440I), but I think it is a "bug" of Flye assembly. Unicycler long-only assembly had these plasmids.
Upd
I found the plasmid in the "reoriented" file. It was missed because its coverage became 62% after reorientation, whereas it was 100% in the non-reoriented file.

Best regards,
Valery

Niche QoL inquiry: polypolish acceptance of nonstandard file affixes?

Hi Robert,

Is there any way for dragonflye to accept nonstandard file inputs for polypolish?

e.g. get some version of this (fq.gz for R1/R2) working:

dragonflye\
 --cpus 12\
 --ram 12\
 --reads $RUN/gpy646sup/${SAMPLE}_merged_barcode*.fastq.gz\
 --R1 $RUN/d40_JING_out/output/${SAMPLE}*/${SAMPLE}*_val_1.fq.gz\
 --R2 $RUN/d40_JING_out/output/${SAMPLE}*/${SAMPLE}*_val_2.fq.gz\
 --depth 0\
 --nanohq\
 --medaka 2\
 --model r1041_e82_400bps_sup_g615\
 --polypolish 1\
 --outdir $OUTDIR/${SAMPLE}_dflye_m180p_out\
 --force  || echo "dflye error in i=$i"

instead of this (standard fastq.gz):

dragonflye\
 --cpus 12\
 --ram 12\
 --reads $RUN/gpy646sup/${SAMPLE}_merged_barcode*.fastq.gz\
 --R1 $RUN/d40_JING_out/output/${SAMPLE}*/${SAMPLE}*_val_1.fastq.gz\
 --R2 $RUN/d40_JING_out/output/${SAMPLE}*/${SAMPLE}*_val_2.fastq.gz\
 --depth 0\
 --nanohq\
 --medaka 2\
 --model r1041_e82_400bps_sup_g615\
 --polypolish 1\
 --outdir $OUTDIR/${SAMPLE}_dflye_m180p_out\
 --force  || echo "dflye error in i=$i"

I copy the .fastq.gz as .fq.gz and use the version immediately above for now, but I imagine there must be some less-bad way to just use the erstwhile-usable poorly-named files from another pipeline. (I still haven't been able to get bactopia-dev to spin up singularity containers with our SLURM nodes.)

Continued thanks for your amazing work either way!

medaka fails to open model file for r1041_e82_400bps_sup_g615

Hi Robert,

I have been testing your beautiful new version using the biocontainers docker image for v1.1.1.

Unfortunately I ran into an issue with medaka again, actually the same that I was experiencing myself with my custom docker image (as mentioned in issue #19)

My run with model r1041_e82_400bps_sup_v4.2.0 went fine and completed successfully.

Another run with model r1041_e82_400bps_sup_g615 failed, see excerpt of dragonflye log below (full log attached):

[dragonflye] Polishing with Medaka (1 rounds)
[dragonflye] Running: medaka_consensus -i READS.fq.gz -d flye/polish/racon/1/consensus.fasta -o flye/polish/medaka/1 -m r1041_e82_400bps_sup_g615 -t 4  2>&1 | sed 's/^/[polishing - medaka (1 of 1)] /' | tee -a dragonflye.log
[polishing - medaka (1 of 1)] Checking program versions
[polishing - medaka (1 of 1)] This is medaka 1.8.0
[polishing - medaka (1 of 1)] Program    Version    Required   Pass
[polishing - medaka (1 of 1)] bcftools   1.17       1.11       True
[polishing - medaka (1 of 1)] bgzip      1.17       1.11       True
[polishing - medaka (1 of 1)] minimap2   2.26       2.11       True
[polishing - medaka (1 of 1)] samtools   1.17       1.11       True
[polishing - medaka (1 of 1)] tabix      1.17       1.11       True
[polishing - medaka (1 of 1)] Traceback (most recent call last):
[polishing - medaka (1 of 1)]   File "/usr/local/bin/medaka", line 11, in <module>
[polishing - medaka (1 of 1)]     sys.exit(main())
[polishing - medaka (1 of 1)]   File "/usr/local/lib/python3.10/site-packages/medaka/medaka.py", line 724, in main
[polishing - medaka (1 of 1)]     args.func(args)
[polishing - medaka (1 of 1)]   File "/usr/local/lib/python3.10/site-packages/medaka/medaka.py", line 267, in is_rle_model
[polishing - medaka (1 of 1)]     print(is_rle_encoder(args.model))
[polishing - medaka (1 of 1)]   File "/usr/local/lib/python3.10/site-packages/medaka/medaka.py", line 274, in is_rle_encoder
[polishing - medaka (1 of 1)]     encoder = modelstore.get_meta('feature_encoder')
[polishing - medaka (1 of 1)]   File "/usr/local/lib/python3.10/site-packages/medaka/datastore.py", line 193, in get_meta
[polishing - medaka (1 of 1)]     self.unpack()
[polishing - medaka (1 of 1)]   File "/usr/local/lib/python3.10/site-packages/medaka/datastore.py", line 118, in unpack
[polishing - medaka (1 of 1)]     with tarfile.open(self.filepath) as tar:
[polishing - medaka (1 of 1)]   File "/usr/local/lib/python3.10/tarfile.py", line 1639, in open
[polishing - medaka (1 of 1)]     raise ReadError(f"file could not be opened successfully:\n{error_msgs_summary}")
[polishing - medaka (1 of 1)] tarfile.ReadError: file could not be opened successfully:
[polishing - medaka (1 of 1)] - method gz: ReadError('empty file')
[polishing - medaka (1 of 1)] - method bz2: ReadError('not a bzip2 file')
[polishing - medaka (1 of 1)] - method xz: ReadError('not an lzma file')
[polishing - medaka (1 of 1)] - method tar: ReadError('empty file')

[dragonflye] Error running command: medaka_consensus -i READS.fq.gz -d flye/polish/racon/1/consensus.fasta -o flye/polish/medaka/1 -m r1041_e82_400bps_sup_g615 -t 4  2>&1 | sed 's/^/[polishing - medaka (1 of 1)] /' | tee -a dragonflye.log

I know this sounds like a medaka issue, but do you have a clue how to fix this before I escalate?
Unfortunately this model is the main model my users are looking to use...

dragonflye.log

Failed to run medaka consensus. - ModelStoreTF exception <class 'NotImplementedError'>

Hello @rpetit3 ,

Thank you for developing dragonflye.

We are trying to use dragonflye to perform assembly on the E. faecium isolates with the following command:
dragonflye --reads 02_fastq/220818_VRE1.fastq.gz --gsize 2.8M --outdir 04_dragonflye/220818_VRE1 --cpus 20 --nanohq --model r941_min_sup_g507

However, we encountered an error in the polishing step when using medaka. Please find the error logs below:

[dragonflye] Hello gilmansiu3
[dragonflye] You ran: /home/gilmansiu3/miniconda3/envs/dragonflye/bin/dragonflye --reads 02_fastq/220818_VRE1.fastq.gz --gsize 2.8M --outdir 04_dragonflye/220818_VRE1 --cpus 20 --nanohq --model r941_min_sup_g507
[dragonflye] This is dragonflye 1.0.13
[dragonflye] Written by Robert A Petit III
[dragonflye] Homepage is https://github.com/rpetit3/dragonflye
[dragonflye] Operating system is linux
[dragonflye] Perl version is v5.32.1
[dragonflye] Machine has 20 CPU cores and 125.72 GB RAM
[dragonflye] Verifying input model (--model): r941_min_sup_g507
[dragonflye] Model r941_min_sup_g507 verified!
[dragonflye] Valid model provided, but number of Medaka rounds (--medaka) not given, assuming 1 round
[dragonflye] Using any2fasta - /home/gilmansiu3/miniconda3/envs/dragonflye/bin/any2fasta | any2fasta 0.4.2
[dragonflye] Using assembly-scan - /home/gilmansiu3/miniconda3/envs/dragonflye/bin/assembly-scan | assembly-scan 0.4.1
[dragonflye] Using bwa - /home/gilmansiu3/miniconda3/envs/dragonflye/bin/bwa | Version: 0.7.17-r1188
[dragonflye] Using fastp - /home/gilmansiu3/miniconda3/envs/dragonflye/bin/fastp | fastp 0.23.2
[dragonflye] Using flye - /home/gilmansiu3/miniconda3/envs/dragonflye/bin/flye | 2.9-b1768
[dragonflye] Using kmc - /home/gilmansiu3/miniconda3/envs/dragonflye/bin/kmc | K-Mer Counter (KMC) ver. 3.2.1 (2022-01-04)
[dragonflye] Using medaka - /home/gilmansiu3/miniconda3/envs/dragonflye/bin/medaka | medaka 1.6.1
[dragonflye] Using miniasm - /home/gilmansiu3/miniconda3/envs/dragonflye/bin/miniasm | 0.3-r179
[dragonflye] Using minimap2 - /home/gilmansiu3/miniconda3/envs/dragonflye/bin/minimap2 | 2.24-r1122
[dragonflye] Using nanoq - /home/gilmansiu3/miniconda3/envs/dragonflye/bin/nanoq | nanoq 0.9.0
[dragonflye] Using pigz - /home/gilmansiu3/miniconda3/envs/dragonflye/bin/pigz | pigz 2.6
[dragonflye] Using pilon - /home/gilmansiu3/miniconda3/envs/dragonflye/bin/pilon | Pilon version 1.24 Thu Jan 28 13:00:45 2021 -0500
[dragonflye] Using polypolish - /home/gilmansiu3/miniconda3/envs/dragonflye/bin/polypolish | Polypolish v0.5.0
[dragonflye] Using porechop - /home/gilmansiu3/miniconda3/envs/dragonflye/bin/porechop | 0.2.4
[dragonflye] Using racon - /home/gilmansiu3/miniconda3/envs/dragonflye/bin/racon | 1.5.0
[dragonflye] Using rasusa - /home/gilmansiu3/miniconda3/envs/dragonflye/bin/rasusa | rasusa 0.7.0
[dragonflye] Using raven - /home/gilmansiu3/miniconda3/envs/dragonflye/bin/raven | 1.8.1
[dragonflye] Using samclip - /home/gilmansiu3/miniconda3/envs/dragonflye/bin/samclip | samclip 0.4.0
[dragonflye] Using samtools - /home/gilmansiu3/miniconda3/envs/dragonflye/bin/samtools | Version: 1.15.1 (using htslib 1.15.1)
[dragonflye] Using seqtk - /home/gilmansiu3/miniconda3/envs/dragonflye/bin/seqtk | Version: 1.3-r106
[dragonflye] Using tempdir: /tmp/tXf0FpvLDL
[dragonflye] Changing into folder: /mnt/data/Species-specific/CAUR/04_dragonflye/220818_VRE1
[dragonflye] Collecting raw read statistics with 'seqtk'
[dragonflye] Running: seqtk fqchk -q3 /mnt/data/Species-specific/CAUR/02_fastq/220818_VRE1.fastq.gz 2>&1 1>/tmp/deywkg47eV | sed 's/^/[seqtk] /' | tee -a dragonflye.log
[dragonflye] Read stats: avg_len = 4267
[dragonflye] Read stats: max_len = 45857
[dragonflye] Read stats: min_len = 1000
[dragonflye] Read stats: total_bp = 432559422
[dragonflye] Using genome size 2800000 bp
[dragonflye] Estimated sequencing depth: 154x
[dragonflye] Filter reads based on length and/or quality
[dragonflye] Running: nanoq --min-len 1000 --input /mnt/data/Species-specific/CAUR/02_fastq/220818_VRE1.fastq.gz --min-qual 0 2>&1 1> READS.filt.fq | sed 's/^/[nanoq] /' | tee -a dragonflye.log
[dragonflye] Running: pigz -f -p 20 --fast READS.filt.fq 2>&1 | sed 's/^/[pigz] /' | tee -a dragonflye.log
[dragonflye] No read depth reduction requested or necessary.
[dragonflye] No read adapter trimming requested.
[dragonflye] Running: ln -sf READS.filt.fq.gz READS.fq.gz 2>&1 | sed 's/^/[ln] /' | tee -a dragonflye.log
[dragonflye] Collecting qc'd read statistics with 'seqtk'
[dragonflye] Running: seqtk fqchk -q3 READS.fq.gz 2>&1 1>/tmp/l7Duy6iujE | sed 's/^/[seqtk] /' | tee -a dragonflye.log
[dragonflye] Final Read stats: min_len = 1000
[dragonflye] Final Read stats: max_len = 45857
[dragonflye] Final Read stats: avg_len = 4267
[dragonflye] Final Read stats: total_bp = 432559422
[dragonflye] Average read length looks like 4267 bp
[dragonflye] Assembling reads with 'flye'
[dragonflye] Running: flye --nano-hq READS.fq.gz -g 2800000 -i 0 --threads 20 -o flye 2>&1 | sed 's/^/[flye] /' | tee -a dragonflye.log
[flye] [2022-11-25 15:10:17] INFO: Starting Flye 2.9-b1768
[flye] [2022-11-25 15:10:17] INFO: >>>STAGE: configure
[flye] [2022-11-25 15:10:17] INFO: Configuring run
[flye] [2022-11-25 15:10:22] INFO: Total read length: 432559422
[flye] [2022-11-25 15:10:22] INFO: Input genome size: 2800000
[flye] [2022-11-25 15:10:22] INFO: Estimated coverage: 154
[flye] [2022-11-25 15:10:22] INFO: Reads N50/N90: 5835 / 2000
[flye] [2022-11-25 15:10:22] INFO: Minimum overlap set to 2000
[flye] [2022-11-25 15:10:22] INFO: >>>STAGE: assembly
[flye] [2022-11-25 15:10:22] INFO: Assembling disjointigs
[flye] [2022-11-25 15:10:22] INFO: Reading sequences
[flye] [2022-11-25 15:10:27] INFO: Building minimizer index
[flye] [2022-11-25 15:10:27] INFO: Pre-calculating index storage
[flye] 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[flye] [2022-11-25 15:10:30] INFO: Filling index
[flye] 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[flye] [2022-11-25 15:10:39] INFO: Extending reads
[flye] [2022-11-25 15:11:21] INFO: Overlap-based coverage: 111
[flye] [2022-11-25 15:11:21] INFO: Median overlap divergence: 0.0539394
[flye] 0% 10% 90% 100%
[flye] [2022-11-25 15:12:13] INFO: Assembled 6 disjointigs
[flye] [2022-11-25 15:12:13] INFO: Generating sequence
[flye] 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[flye] [2022-11-25 15:12:14] INFO: Filtering contained disjointigs
[flye] 0% 10% 30% 50% 60% 80% 100%
[flye] [2022-11-25 15:12:14] INFO: Contained seqs: 0
[flye] [2022-11-25 15:12:14] INFO: >>>STAGE: consensus
[flye] [2022-11-25 15:12:14] INFO: Running Minimap2
[flye] [2022-11-25 15:12:43] INFO: Computing consensus
[flye] [2022-11-25 15:14:22] INFO: Alignment error rate: 0.067847
[flye] [2022-11-25 15:14:22] INFO: >>>STAGE: repeat
[flye] [2022-11-25 15:14:22] INFO: Building and resolving repeat graph
[flye] [2022-11-25 15:14:22] INFO: Parsing disjointigs
[flye] [2022-11-25 15:14:22] INFO: Building repeat graph
[flye] 0% 10% 30% 50% 60% 80% 100%
[flye] [2022-11-25 15:14:23] INFO: Median overlap divergence: 0.00335946
[flye] [2022-11-25 15:14:23] INFO: Parsing reads
[flye] [2022-11-25 15:14:27] INFO: Aligning reads to the graph
[flye] 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[flye] [2022-11-25 15:14:36] INFO: Aligned read sequence: 383411070 / 389252928 (0.984992)
[flye] [2022-11-25 15:14:36] INFO: Median overlap divergence: 0.0258973
[flye] [2022-11-25 15:14:36] INFO: Mean edge coverage: 122
[flye] [2022-11-25 15:14:36] INFO: Simplifying the graph
[flye] [2022-11-25 15:14:36] INFO: >>>STAGE: contigger
[flye] [2022-11-25 15:14:36] INFO: Generating contigs
[flye] [2022-11-25 15:14:36] INFO: Reading sequences
[flye] [2022-11-25 15:14:41] INFO: Generated 7 contigs
[flye] [2022-11-25 15:14:41] INFO: Added 0 scaffold connections
[flye] [2022-11-25 15:14:41] INFO: >>>STAGE: finalize
[flye] [2022-11-25 15:14:41] INFO: Assembly statistics:
[flye]
[flye] Total length: 3161882
[flye] Fragments: 7
[flye] Fragments N50: 2791738
[flye] Largest frg: 2791738
[flye] Scaffolds: 0
[flye] Mean coverage: 121
[flye]
[flye] [2022-11-25 15:14:41] INFO: Final assembly: /mnt/data/Species-specific/CAUR/04_dragonflye/220818_VRE1/flye/assembly.fasta
[dragonflye] Polishing with Racon (1 rounds)
[dragonflye] Running: minimap2 -t 19 -x map-ont flye.fasta READS.fq.gz 2>&1 1> flye/polish/racon/1/aligments.paf | sed 's/^/[polishing - racon (1 of 1)] /' | tee -a dragonflye.log
[polishing - racon (1 of 1)] [M::mm_idx_gen::0.055*1.01] collected minimizers
[polishing - racon (1 of 1)] [M::mm_idx_gen::0.061*2.58] sorted minimizers
[polishing - racon (1 of 1)] [M::main::0.061*2.58] loaded/built the index for 7 target sequence(s)
[polishing - racon (1 of 1)] [M::mm_mapopt_update::0.067*2.45] mid_occ = 26
[polishing - racon (1 of 1)] [M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 7
[polishing - racon (1 of 1)] [M::mm_idx_stat::0.071*2.36] distinct minimizers: 534042 (95.78% are singletons); average occurrences: 1.109; average spacing: 5.339; total length: 3161882
[polishing - racon (1 of 1)] [M::worker_pipeline::11.029*11.97] mapped 101377 sequences
[polishing - racon (1 of 1)] [M::main] Version: 2.24-r1122
[polishing - racon (1 of 1)] [M::main] CMD: minimap2 -t 19 -x map-ont flye.fasta READS.fq.gz
[polishing - racon (1 of 1)] [M::main] Real time: 11.035 sec; CPU: 132.074 sec; Peak RSS: 0.491 GB
[dragonflye] Running: racon -t 20 READS.fq.gz flye/polish/racon/1/aligments.paf flye.fasta 2>&1 1> flye/polish/racon/1/consensus.fasta | sed 's/^/[polishing - racon (1 of 1)] /' | tee -a dragonflye.log
[polishing - racon (1 of 1)] [racon::Polisher::initialize] loaded target sequences 0.012280 s
[polishing - racon (1 of 1)] [racon::Polisher::initialize] loaded sequences 4.787064 s
[polishing - racon (1 of 1)] [racon::Polisher::initialize] loaded overlaps 0.085765 s
[racon::Polisher::initialize] aligning overlaps [====================] 8.314751 s ] 0.522070 s
[polishing - racon (1 of 1)] [racon::Polisher::initialize] transformed data into windows 0.466438 s
[racon::Polisher::polish] generating consensus [====================] 44.492387 s ] 3.252132 s
[polishing - racon (1 of 1)] [racon::Polisher::] total = 58.222555 s
[dragonflye] Polishing with Medaka (1 rounds)
[dragonflye] Running: medaka_consensus -i READS.fq.gz -d flye/polish/racon/1/consensus.fasta -o flye/polish/medaka/1 -m r941_min_sup_g507 -t 20 2>&1 | sed 's/^/[polishing - medaka (1 of 1)] /' | tee -a dragonflye.log
[polishing - medaka (1 of 1)] Checking program versions
[polishing - medaka (1 of 1)] This is medaka 1.6.1
[polishing - medaka (1 of 1)] Program Version Required Pass
[polishing - medaka (1 of 1)] bcftools 1.15.1 1.11 True
[polishing - medaka (1 of 1)] bgzip 1.15.1 1.11 True
[polishing - medaka (1 of 1)] minimap2 2.24 2.11 True
[polishing - medaka (1 of 1)] samtools 1.15.1 1.11 True
[polishing - medaka (1 of 1)] tabix 1.15.1 1.11 True
[polishing - medaka (1 of 1)] Aligning basecalls to draft
[polishing - medaka (1 of 1)] Creating fai index file /mnt/data/Species-specific/CAUR/04_dragonflye/220818_VRE1/flye/polish/racon/1/consensus.fasta.fai
[polishing - medaka (1 of 1)] Creating mmi index file /mnt/data/Species-specific/CAUR/04_dragonflye/220818_VRE1/flye/polish/racon/1/consensus.fasta.map-ont.mmi
[polishing - medaka (1 of 1)] [M::mm_idx_gen::0.103*1.02] collected minimizers
[polishing - medaka (1 of 1)] [M::mm_idx_gen::0.113*1.19] sorted minimizers
[polishing - medaka (1 of 1)] [M::main::0.129*1.17] loaded/built the index for 7 target sequence(s)
[polishing - medaka (1 of 1)] [M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 7
[polishing - medaka (1 of 1)] [M::mm_idx_stat::0.133*1.16] distinct minimizers: 529319 (95.23% are singletons); average occurrences: 1.119; average spacing: 5.339; total length: 3163172
[polishing - medaka (1 of 1)] [M::main] Version: 2.24-r1122
[polishing - medaka (1 of 1)] [M::main] CMD: minimap2 -I 16G -x map-ont -d /mnt/data/Species-specific/CAUR/04_dragonflye/220818_VRE1/flye/polish/racon/1/consensus.fasta.map-ont.mmi /mnt/data/Species-specific/CAUR/04_dragonflye/220818_VRE1/flye/polish/racon/1/consensus.fasta
[polishing - medaka (1 of 1)] [M::main] Real time: 0.135 sec; CPU: 0.157 sec; Peak RSS: 0.033 GB
[polishing - medaka (1 of 1)] [M::main::0.019*1.03] loaded/built the index for 7 target sequence(s)
[polishing - medaka (1 of 1)] [M::mm_mapopt_update::0.024*1.02] mid_occ = 27
[polishing - medaka (1 of 1)] [M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 7
[polishing - medaka (1 of 1)] [M::mm_idx_stat::0.028*1.02] distinct minimizers: 529319 (95.23% are singletons); average occurrences: 1.119; average spacing: 5.339; total length: 3163172
[polishing - medaka (1 of 1)] [M::worker_pipeline::21.050*13.97] mapped 101377 sequences
[polishing - medaka (1 of 1)] [M::main] Version: 2.24-r1122
[polishing - medaka (1 of 1)] [M::main] CMD: minimap2 -x map-ont --secondary=no -L --MD -A 2 -B 4 -O 4,24 -E 2,1 -t 20 -a /mnt/data/Species-specific/CAUR/04_dragonflye/220818_VRE1/flye/polish/racon/1/consensus.fasta.map-ont.mmi /mnt/data/Species-specific/CAUR/04_dragonflye/220818_VRE1/READS.filt.fq.gz
[polishing - medaka (1 of 1)] [M::main] Real time: 21.053 sec; CPU: 294.060 sec; Peak RSS: 1.828 GB
[polishing - medaka (1 of 1)] [bam_sort_core] merging from 0 files and 20 in-memory blocks...
[polishing - medaka (1 of 1)] Running medaka consensus
[polishing - medaka (1 of 1)] [15:16:26 - Predict] Reducing threads to 2, anymore is a waste.
[polishing - medaka (1 of 1)] [15:16:27 - Predict] Setting tensorflow inter/intra-op threads to 2/1.
[polishing - medaka (1 of 1)] [15:16:27 - Predict] Processing region(s): contig_1:0-235990 contig_2:0-2793735 contig_3:0-45936 contig_4:0-8983 contig_5:0-32435 contig_6:0-34261 contig_7:0-11832
[polishing - medaka (1 of 1)] [15:16:27 - Predict] Using model: /home/gilmansiu3/miniconda3/envs/dragonflye/lib/python3.8/site-packages/medaka/data/r941_min_sup_g507_model.tar.gz.
[polishing - medaka (1 of 1)] [15:16:27 - Predict] Found a GPU.
[polishing - medaka (1 of 1)] [15:16:27 - Predict] If cuDNN errors are observed, try setting the environment variable TF_FORCE_GPU_ALLOW_GROWTH=true. To explicitely disable use of cuDNN use the commandline option `--disable_cudnn. If OOM (out of memory) errors are found please reduce batch size.
[polishing - medaka (1 of 1)] [15:16:27 - Predict] Processing 9 long region(s) with batching.
[polishing - medaka (1 of 1)] [15:16:27 - ModelLoad] GPU available: building model with cudnn optimization
[polishing - medaka (1 of 1)] [15:16:27 - MdlStrTF] ModelStoreTF exception <class 'NotImplementedError'>
[polishing - medaka (1 of 1)] Traceback (most recent call last):
[polishing - medaka (1 of 1)] File "/home/gilmansiu3/miniconda3/envs/dragonflye/bin/medaka", line 11, in <module>
[polishing - medaka (1 of 1)] sys.exit(main())
[polishing - medaka (1 of 1)] File "/home/gilmansiu3/miniconda3/envs/dragonflye/lib/python3.8/site-packages/medaka/medaka.py", line 720, in main
[polishing - medaka (1 of 1)] args.func(args)
[polishing - medaka (1 of 1)] File "/home/gilmansiu3/miniconda3/envs/dragonflye/lib/python3.8/site-packages/medaka/prediction.py", line 160, in predict
[polishing - medaka (1 of 1)] model = model_store.load_model(time_steps=args.chunk_len)
[polishing - medaka (1 of 1)] File "/home/gilmansiu3/miniconda3/envs/dragonflye/lib/python3.8/site-packages/medaka/datastore.py", line 159, in load_model
[polishing - medaka (1 of 1)] self.model = model_partial_function(time_steps=time_steps)
[polishing - medaka (1 of 1)] File "/home/gilmansiu3/miniconda3/envs/dragonflye/lib/python3.8/site-packages/medaka/models.py", line 147, in build_model
[polishing - medaka (1 of 1)] model.add(Bidirectional(gru, input_shape=input_shape))
[polishing - medaka (1 of 1)] File "/home/gilmansiu3/miniconda3/envs/dragonflye/lib/python3.8/site-packages/tensorflow/python/training/tracking/base.py", line 456, in _method_wrapper
[polishing - medaka (1 of 1)] result = method(self, *args, **kwargs)
[polishing - medaka (1 of 1)] File "/home/gilmansiu3/miniconda3/envs/dragonflye/lib/python3.8/site-packages/tensorflow/python/keras/engine/sequential.py", line 198, in add
[polishing - medaka (1 of 1)] layer(x)
[polishing - medaka (1 of 1)] File "/home/gilmansiu3/miniconda3/envs/dragonflye/lib/python3.8/site-packages/tensorflow/python/keras/layers/wrappers.py", line 531, in call
[polishing - medaka (1 of 1)] return super(Bidirectional, self).call(inputs, **kwargs)
[polishing - medaka (1 of 1)] File "/home/gilmansiu3/miniconda3/envs/dragonflye/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py", line 922, in call
[polishing - medaka (1 of 1)] outputs = call_fn(cast_inputs, *args, **kwargs)
[polishing - medaka (1 of 1)] File "/home/gilmansiu3/miniconda3/envs/dragonflye/lib/python3.8/site-packages/tensorflow/python/keras/layers/wrappers.py", line 644, in call
[polishing - medaka (1 of 1)] y = self.forward_layer(forward_inputs,
[polishing - medaka (1 of 1)] File "/home/gilmansiu3/miniconda3/envs/dragonflye/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent.py", line 654, in call
[polishing - medaka (1 of 1)] return super(RNN, self).call(inputs, **kwargs)
[polishing - medaka (1 of 1)] File "/home/gilmansiu3/miniconda3/envs/dragonflye/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py", line 922, in call
[polishing - medaka (1 of 1)] outputs = call_fn(cast_inputs, *args, **kwargs)
[polishing - medaka (1 of 1)] File "/home/gilmansiu3/miniconda3/envs/dragonflye/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent_v2.py", line 408, in call
[polishing - medaka (1 of 1)] inputs, initial_state, _ = self._process_inputs(inputs, initial_state, None)
[polishing - medaka (1 of 1)] File "/home/gilmansiu3/miniconda3/envs/dragonflye/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent.py", line 848, in _process_inputs
[polishing - medaka (1 of 1)] initial_state = self.get_initial_state(inputs)
[polishing - medaka (1 of 1)] File "/home/gilmansiu3/miniconda3/envs/dragonflye/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent.py", line 636, in get_initial_state
[polishing - medaka (1 of 1)] init_state = get_initial_state_fn(
[polishing - medaka (1 of 1)] File "/home/gilmansiu3/miniconda3/envs/dragonflye/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent.py", line 1910, in get_initial_state
[polishing - medaka (1 of 1)] return _generate_zero_filled_state_for_cell(self, inputs, batch_size, dtype)
[polishing - medaka (1 of 1)] File "/home/gilmansiu3/miniconda3/envs/dragonflye/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent.py", line 2926, in _generate_zero_filled_state_for_cell
[polishing - medaka (1 of 1)] return _generate_zero_filled_state(batch_size, cell.state_size, dtype)
[polishing - medaka (1 of 1)] File "/home/gilmansiu3/miniconda3/envs/dragonflye/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent.py", line 2944, in _generate_zero_filled_state
[polishing - medaka (1 of 1)] return create_zeros(state_size)
[polishing - medaka (1 of 1)] File "/home/gilmansiu3/miniconda3/envs/dragonflye/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent.py", line 2939, in create_zeros
[polishing - medaka (1 of 1)] return array_ops.zeros(init_state_size, dtype=dtype)
[polishing - medaka (1 of 1)] File "/home/gilmansiu3/miniconda3/envs/dragonflye/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py", line 2677, in wrapped
[polishing - medaka (1 of 1)] tensor = fun(*args, **kwargs)
[polishing - medaka (1 of 1)] File "/home/gilmansiu3/miniconda3/envs/dragonflye/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py", line 2721, in zeros
[polishing - medaka (1 of 1)] output = _constant_if_small(zero, shape, dtype, name)
[polishing - medaka (1 of 1)] File "/home/gilmansiu3/miniconda3/envs/dragonflye/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py", line 2662, in _constant_if_small
[polishing - medaka (1 of 1)] if np.prod(shape) < 1000:
[polishing - medaka (1 of 1)] File "<array_function internals>", line 180, in prod
[polishing - medaka (1 of 1)] File "/home/gilmansiu3/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 3045, in prod
[polishing - medaka (1 of 1)] return _wrapreduction(a, np.multiply, 'prod', axis, dtype, out,
[polishing - medaka (1 of 1)] File "/home/gilmansiu3/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
[polishing - medaka (1 of 1)] return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
[polishing - medaka (1 of 1)] File "/home/gilmansiu3/miniconda3/envs/dragonflye/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 748, in array
[polishing - medaka (1 of 1)] raise NotImplementedError("Cannot convert a symbolic Tensor ({}) to a numpy"
[polishing - medaka (1 of 1)] NotImplementedError: Cannot convert a symbolic Tensor (bidirectional/forward_gru1/strided_slice:0) to a numpy array.
[polishing - medaka (1 of 1)] Failed to run medaka consensus.
[dragonflye] Error running command: medaka_consensus -i READS.fq.gz -d flye/polish/racon/1/consensus.fasta -o flye/polish/medaka/1 -m r941_min_sup_g507 -t 20 2>&1 | sed 's/^/[polishing - medaka (1 of 1)] /' | tee -a dragonflye.log

Best regards,
Eddie
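
One detail in the traceback above may be worth checking: numpy is imported from `/home/gilmansiu3/.local/lib/python3.8/site-packages`, while medaka and TensorFlow come from the conda environment. A numpy in the per-user site directory that is newer than the TensorFlow build expects is one possible cause of this `NotImplementedError`, though that is only a guess from the log. A minimal sketch for spotting this kind of shadowing (the helper name `shadowed_by_user_site` is hypothetical):

```python
import os
import site


def shadowed_by_user_site(module_file, user_site=None):
    """Return True if module_file lives under the per-user site-packages directory."""
    # Default to the directory Python itself would use for user installs.
    user_site = os.path.abspath(user_site or site.getusersitepackages())
    module_file = os.path.abspath(module_file)
    # The module is shadowing if the user site dir is a prefix of its path.
    return os.path.commonpath([module_file, user_site]) == user_site
```

Inside the activated environment, calling `shadowed_by_user_site(numpy.__file__)` after `import numpy` would show which copy is picked up. If it is the user-site copy, setting the environment variable `PYTHONNOUSERSITE=1` before running makes Python skip that directory.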

Problem with conda installing

Hello, Robert,

Thank you for a great tool!
I had dragonflye 1.0.10 and decided to create a new env with the newer version. I now have the same problem as I had with bactopia ( bactopia/bactopia#334 ):

conda install -c bioconda dragonflye
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: \ 

and the process ends.
Any ideas how I could fix it?

Many thanks,
Valery

Can't locate FindBin.pm in @INC

I don't know if this will be an issue to most of your users, but I installed dragonflye via micromamba in a gitpod environment and ran into the following error:

$ dragonflye --help
Can't locate FindBin.pm in @INC (you may need to install the FindBin module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.36.0 /usr/local/share/perl/5.36.0 /usr/lib/x86_64-linux-gnu/perl5/5.36 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl-base /usr/lib/x86_64-linux-gnu/perl/5.36 /usr/share/perl/5.36 /usr/local/lib/site_perl) at /opt/conda/bin/dragonflye line 58.
BEGIN failed--compilation aborted at /opt/conda/bin/dragonflye line 58.

Medaka v1.7.3

Dear @rpetit3, is dragonFlye still being maintained?

Selfish request for medaka to be updated if so!

(I can never get medaka working on its own to polish my Flye assemblies, so when I'm doing LR-only assemblies I like to use dragonFlye. A bit funny, since I originally used dragonFlye for quick & easy polypolish! Such a nice pipeline :)

Stuck in kmc part

Hi @rpetit3 ,

Thank you for developing this tool! It is amazing.

I have just tried to run this tool, but sometimes it gets stuck in the kmc step and doesn't continue running (I checked the CPU usage with htop). If I quit the terminal and rerun with the --force option, the run completes successfully. How can I solve this problem?

Thank you very much!
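
Until the cause of the hang is known, the manual "kill and rerun" workaround described above could be automated. A rough sketch, assuming the hang is intermittent and a rerun is safe; the function name `run_with_retry` and the timeout value are hypothetical, not part of dragonflye:

```python
import subprocess
import sys


def run_with_retry(cmd, timeout_s, retries=1):
    """Run cmd; if it exceeds timeout_s, kill it and try again up to `retries` times."""
    for attempt in range(retries + 1):
        try:
            # subprocess.run kills the child process when the timeout expires.
            return subprocess.run(cmd, timeout=timeout_s, check=True)
        except subprocess.TimeoutExpired:
            print(f"attempt {attempt + 1} timed out after {timeout_s}s", file=sys.stderr)
    raise RuntimeError(f"command still hanging after {retries + 1} attempts: {cmd}")


# Example call (paths are placeholders); --force lets the rerun overwrite the outdir:
# run_with_retry(["dragonflye", "--reads", "reads.fastq.gz",
#                 "--outdir", "out", "--force"], timeout_s=3600)
```

This only hides the symptom; it does not explain why kmc stalls.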

how to check software in dragonflye?

Hi,

I don't know how dragonflye checks software versions. Dragonflye still uses the software on the system bin path after the conda env is activated.
source /Bio/User/kxie/software/miniforge3/bin/activate dragonflye
[screenshot]

Why doesn't dragonflye use the versions from the conda environment? Some software on my system is very old.
For example, early versions of fastp have no --unpaired1 and --unpaired2 options, so the pipeline stops with an error like the following:
[screenshot]

Best,
Kun
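
A quick way to see which copy of each tool is found first on `PATH`, and whether it sits inside the active environment, is sketched below. It assumes the environment keeps its executables in `<prefix>/bin` (as conda does on Linux); the helper name `resolve_tool` is hypothetical.

```python
import os
import shutil


def resolve_tool(name, env_prefix, path=None):
    """Return (resolved_path, inside_env) for an executable looked up on PATH."""
    resolved = shutil.which(name, path=path)
    if resolved is None:
        return None, False
    # Trailing separator so that "/env/bin2/..." does not match "/env/bin".
    env_bin = os.path.join(os.path.abspath(env_prefix), "bin") + os.sep
    return resolved, os.path.abspath(resolved).startswith(env_bin)
```

With the environment activated, `resolve_tool("fastp", os.environ["CONDA_PREFIX"])` returning `inside_env == False` would mean a system copy comes earlier on `PATH` than the environment's copy.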

homologous polishing

Hi Robert,
First of all, thank you for this useful tool. I'd like to suggest adding homopolish as a further polishing step (step 6.5?) in the pipeline.

mismatch between model names valid for dragonflye 1.1.0 and medaka 1.8.0

Hi,

I have made a conda install of dragonflye (within a docker image), forcing the dependencies for flye and medaka to be the latest versions:

micromamba install -n base -y -c conda-forge -c bioconda \
    flye=2.9.2 \
    medaka=1.8.0 \
    dragonflye=1.1.0

This works, but if I want to specify the latest model, r1041_e82_400bps_sup_v420, I get an error at the medaka stage:

[...]
[dragonflye] Running: medaka_consensus -i READS.fq.gz -d flye/polish/racon/1/consensus.fasta -o flye/polish/medaka/1 -m r1041_e82_400bps_sup_v420 -t 4  2>&1 | sed 's/^/[polishing - medaka (1 of 1)] /' | tee -a dragonflye.log
[polishing - medaka (1 of 1)] Traceback (most recent call last):
[polishing - medaka (1 of 1)]   File "/opt/conda/lib/python3.10/site-packages/medaka/medaka.py", line 35, in __call__
[polishing - medaka (1 of 1)]     model_fp = medaka.models.resolve_model(val)
[polishing - medaka (1 of 1)]   File "/opt/conda/lib/python3.10/site-packages/medaka/models.py", line 31, in resolve_model
[polishing - medaka (1 of 1)]     raise ValueError(
[polishing - medaka (1 of 1)] ValueError: Model r1041_e82_400bps_sup_v420 is not a known model or existant file.
[dragonflye] Error running command: medaka_consensus -i READS.fq.gz -d flye/polish/racon/1/consensus.fasta -o flye/polish/medaka/1 -m r1041_e82_400bps_sup_v420 -t 4  2>&1 | sed 's/^/[polishing - medaka (1 of 1)] /' | tee -a dragonflye.log

Indeed medaka wants something like this: r1041_e82_400bps_sup_v4.2.0, with dots in the version name.

docker run -v $HOME:$HOME -w $HOME/test gitlab-registry.internal.sanger.ac.uk/sanger-pathogens/docker-images-test/dragonflye:1.1.0 medaka tools list_models
Available: r103_fast_g507, r103_fast_snp_g507, r103_fast_variant_g507, r103_hac_g507, r103_hac_snp_g507, r103_hac_variant_g507, r103_min_high_g345, r103_min_high_g360, r103_prom_high_g360, r103_prom_snp_g3210, r103_prom_variant_g3210, r103_sup_g507, r103_sup_snp_g507, r103_sup_variant_g507, r1041_e82_260bps_fast_g632, r1041_e82_260bps_fast_variant_g632, r1041_e82_260bps_hac_g632, r1041_e82_260bps_hac_v4.0.0, r1041_e82_260bps_hac_v4.1.0, r1041_e82_260bps_hac_variant_g632, r1041_e82_260bps_hac_variant_v4.1.0, r1041_e82_260bps_sup_g632, r1041_e82_260bps_sup_v4.0.0, r1041_e82_260bps_sup_v4.1.0, r1041_e82_260bps_sup_variant_g632, r1041_e82_260bps_sup_variant_v4.1.0, r1041_e82_400bps_fast_g615, r1041_e82_400bps_fast_g632, r1041_e82_400bps_fast_variant_g615, r1041_e82_400bps_fast_variant_g632, r1041_e82_400bps_hac_g615, r1041_e82_400bps_hac_g632, r1041_e82_400bps_hac_v4.0.0, r1041_e82_400bps_hac_v4.1.0, r1041_e82_400bps_hac_v4.2.0, r1041_e82_400bps_hac_variant_g615, r1041_e82_400bps_hac_variant_g632, r1041_e82_400bps_hac_variant_v4.1.0, r1041_e82_400bps_hac_variant_v4.2.0, r1041_e82_400bps_sup_g615, r1041_e82_400bps_sup_v4.0.0, r1041_e82_400bps_sup_v4.1.0, r1041_e82_400bps_sup_v4.2.0, r1041_e82_400bps_sup_variant_g615, r1041_e82_400bps_sup_variant_v4.1.0, r1041_e82_400bps_sup_variant_v4.2.0, r104_e81_fast_g5015, r104_e81_fast_variant_g5015, r104_e81_hac_g5015, r104_e81_hac_variant_g5015, r104_e81_sup_g5015, r104_e81_sup_g610, r104_e81_sup_variant_g610, r10_min_high_g303, r10_min_high_g340, r941_e81_fast_g514, r941_e81_fast_variant_g514, r941_e81_hac_g514, r941_e81_hac_variant_g514, r941_e81_sup_g514, r941_e81_sup_variant_g514, r941_min_fast_g303, r941_min_fast_g507, r941_min_fast_snp_g507, r941_min_fast_variant_g507, r941_min_hac_g507, r941_min_hac_snp_g507, r941_min_hac_variant_g507, r941_min_high_g303, r941_min_high_g330, r941_min_high_g340_rle, r941_min_high_g344, r941_min_high_g351, r941_min_high_g360, r941_min_sup_g507, r941_min_sup_snp_g507, 
r941_min_sup_variant_g507, r941_prom_fast_g303, r941_prom_fast_g507, r941_prom_fast_snp_g507, r941_prom_fast_variant_g507, r941_prom_hac_g507, r941_prom_hac_snp_g507, r941_prom_hac_variant_g507, r941_prom_high_g303, r941_prom_high_g330, r941_prom_high_g344, r941_prom_high_g360, r941_prom_high_g4011, r941_prom_snp_g303, r941_prom_snp_g322, r941_prom_snp_g360, r941_prom_sup_g507, r941_prom_sup_snp_g507, r941_prom_sup_variant_g507, r941_prom_variant_g303, r941_prom_variant_g322, r941_prom_variant_g360, r941_sup_plant_g610, r941_sup_plant_variant_g610
Default consensus:  r1041_e82_400bps_sup_v4.2.0
Default variant:  r1041_e82_400bps_sup_variant_v4.2.0

If trying to give that medaka-valid value to dragonflye:

dragonflye \
--reads dragonflye/barcode07.fastq.gz \
--R1 4075_2#2_1.fastq.gz \
--R2 4075_2#2_2.fastq.gz \
--gsize 4.5M --medaka 1 --model r1041_e82_400bps_sup_v4.2.0 \
--cpus 4 --ram 6 --outdir dragonflye/test_IP6794-89

then dragonflye fails at the argument validation step:

[dragonflye] You ran: /opt/conda/bin/dragonflye --reads dragonflye/barcode07.fastq.gz --R1 dragonflye/4075_2#2_1.fastq.gz --R2 dragonflye/4075_2#2_2.fastq.gz --gsize 4.5M --medaka 1 --model r1041_e82_400bps_sup_v4.2.0 --cpus 4 --ram 6 --outdir dragonflye/test_IP6794-89
[dragonflye] This is dragonflye 1.1.0
[dragonflye] Written by Robert A Petit III
[dragonflye] Homepage is https://github.com/rpetit3/dragonflye
[dragonflye] Operating system is linux
[dragonflye] Perl version is v5.32.1
[dragonflye] Machine has 256 CPU cores and 2015.34 GB RAM
[dragonflye] Verifying input model (--model): r1041_e82_400bps_sup_v4.2.0
[dragonflye] Unable to verify model 'r1041_e82_400bps_sup_v4.2.0', please check spelling and try again.
[dragonflye] Available Medaka models include:
[dragonflye]    r103_fast_g507
[dragonflye]    r103_hac_g507
[dragonflye]    r103_min_high_g345
[dragonflye]    r103_min_high_g360
[...]

Could you please change your validation scheme so that it matches that of medaka?

Best wishes,
Florent
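
One way to keep the two lists from drifting apart would be to validate `--model` against whatever the installed medaka reports, rather than against a fixed list. The sketch below parses text in the format shown above for `medaka tools list_models`; it is an illustration in Python (dragonflye itself is a Perl script), and the function names are hypothetical.

```python
def parse_available_models(list_models_output):
    """Extract model names from the 'Available:' section of `medaka tools list_models`."""
    models = set()
    collecting = False
    for line in list_models_output.splitlines():
        if line.startswith("Available:"):
            collecting = True
            line = line[len("Available:"):]
        elif line.startswith("Default"):
            # The 'Default consensus/variant' lines end the model list.
            collecting = False
        if collecting:
            # The list may wrap across several lines, so keep collecting.
            models.update(name.strip() for name in line.split(",") if name.strip())
    return models


def is_valid_model(name, list_models_output):
    """True if `name` is one of the models the installed medaka knows about."""
    return name in parse_available_models(list_models_output)
```

With this approach `r1041_e82_400bps_sup_v4.2.0` would be accepted and `r1041_e82_400bps_sup_v420` rejected when run against the medaka 1.8.0 output above.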

Unable to verify model 'r941_min_sup_g507'

Hi @rpetit3 ,

We tried to polish the contigs with the model r941_min_sup_g507, but we get this error message:

[dragonflye] Verifying input model (--model): r941_min_sup_g507
[dragonflye] Unable to verify model 'r941_min_sup_g507', please check spelling and try again.

Is this model included in dragonflye?

Processing multiple samples

Hi Robert,
Can dragonflye process multiple samples? How do I indicate that in the output directory? Is there a flag for the sample name?
Thanks,
TJ
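
If dragonflye handles one sample per invocation (an assumption; the question above is not answered in this thread), a small wrapper can generate one command per FASTQ file, each with its own output directory. The `--reads`, `--prefix` and `--outdir` flags are taken from commands quoted elsewhere on this page; the function name and directory layout are hypothetical.

```python
from pathlib import Path


def build_commands(fastq_paths, out_root):
    """Return one dragonflye argv list per FASTQ file, each with its own --outdir."""
    commands = []
    for fq in map(Path, fastq_paths):
        # Derive the sample name by stripping common FASTQ suffixes.
        sample = fq.name
        for suffix in (".gz", ".fastq", ".fq"):
            if sample.endswith(suffix):
                sample = sample[: -len(suffix)]
        commands.append([
            "dragonflye", "--reads", str(fq),
            "--prefix", sample,
            "--outdir", str(Path(out_root) / sample),
        ])
    return commands
```

Each returned list can be passed to `subprocess.run`, or written out as lines of a shell script.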

trimming and medaka

Hi @rpetit3,

Firstly, thank you for this useful tool, great work and great name:)

Secondly, would it be possible (if it is not already taken care of in the tool, and I didn't realize) to add an adapter trimming (and in some cases demultiplexing) step? (For example, shovill has a --trim option.) We use porechop (https://github.com/rrwick/Porechop), but any other option would be good too.

Thirdly, is it possible to use medaka in gpu mode?

thanks again!
Yair

flye log

Hello,
Thanks for this great work.
It would be nice not to remove the flye log information, as it is not included in the dragonflye log file.
Best regards
Mostafa

Execution error with Rasusa v1.0.0 in conda environment

Hi,

Thanks for developing Dragonflye; it's been highly useful! I've encountered an issue with Rasusa during installation on a new Ubuntu 22.04 machine using Conda. Dragonflye v1.2.0 seems to pull Rasusa v1.0.0 instead of v0.8.0, leading to a syntax error in the rasusa command.

[...]
[dragonflye] Using rasusa - /opt/conda/envs/dragonflye_1.2.0/bin/rasusa | rasusa 1.0.0
[...]
[dragonflye] Running: rasusa -i READS\.filt\.fq\.gz -c 100 -g 50000 -s 42  2>&1 1> READS.sub.fq | sed 's/^/[rasusa] /' | tee -a dragonflye.log
[rasusa] error: unexpected argument '-i' found
[...]

Correct Rasusa v1.0.0 syntax:

rasusa reads -c 100 -g 50000 -s 42 READS\.filt\.fq\.gz 2>&1 1> READS.sub.fq | sed 's/^/[rasusa] /' | tee -a dragonflye.log

Cheers,
Nouri
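
A version-aware command builder is one way both syntaxes could be supported. The sketch below only uses the two invocations quoted above; the assumption that the switch happens exactly at v1.0.0 (and not at some release between 0.8.0 and 1.0.0) is mine, and the function names are hypothetical.

```python
def parse_version(text):
    """Turn '1.0.0' into (1, 0, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split(".")[:3])


def rasusa_argv(version, reads, coverage, genome_size, seed):
    """Build the rasusa argument list for the old or new command-line syntax."""
    common = ["-c", str(coverage), "-g", str(genome_size), "-s", str(seed)]
    if parse_version(version) >= (1, 0, 0):
        # New syntax: 'reads' subcommand, input file as a positional argument.
        return ["rasusa", "reads", *common, reads]
    # Old syntax: input file passed with -i.
    return ["rasusa", "-i", reads, *common]
```

For example, `rasusa_argv("1.0.0", "READS.filt.fq.gz", 100, 50000, 42)` reproduces the corrected command above, without the shell redirection and logging.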

Typo

[dragonflye] Dragonfly larva eat just about anything: tadpoles, mosquitoes, fish, other insect larvae and even each other!
Either 'larvae eat' or 'larva eats'.
Everything should be just perfect :)
Thanks for this nice piece of software!

Filtering reads quality

Hi

Dragonflye uses nanoq to filter reads by length, and nanoq can also filter reads by quality. Could you add this feature to dragonflye?

Thanks
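
To illustrate what a combined length and quality filter does, here is a minimal pure-Python sketch. It assumes Phred+33 quality strings and averages per-base error probabilities before converting back to a Q score; that averaging convention is an assumption and nanoq may compute read quality differently. The default minimum length of 1000 mirrors dragonflye's `--minreadlength` default.

```python
import math


def mean_read_quality(quality_string, offset=33):
    """Mean Phred quality of a read, averaged over per-base error probabilities."""
    if not quality_string:
        return 0.0
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_reads(records, min_length=1000, min_quality=0.0):
    """Yield (name, seq, qual) tuples that pass both the length and quality thresholds."""
    for name, seq, qual in records:
        if len(seq) >= min_length and mean_read_quality(qual) >= min_quality:
            yield name, seq, qual
```

Averaging error probabilities rather than raw Q scores means a few very poor bases pull the read's score down more than a plain mean of Q values would.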

Feature request: dragonflye 1.1.N default to flye 2.9.2?

Hi @rpetit3 - flye 2.9.2 is on bioconda now :)

Do you think any major changes will be required for flye 2.9.2? Wondering if that would be fairly safe to pin &/or if you've done anything with it yet!

P.S. totally unrelated, but having the CPU/GPU version separated on bioconda was a great idea. I've only tried CPU medaka so far on 1.1.0. Is dragonflye-gpu on your dev channel?

Polypolish error

Hi!

I'm trying to run dragonflye (v1.1.2) with default parameters using --reads, --R1 and --R2 inputs. The issue arises after assembling with flye, when dragonflye executes polypolish (v0.6.0):
Running: polypolish flye/polish/racon/1/consensus.fasta flye/polish/short_reads/polypolish/1/polypolish_R1-1.sam flye/polish/short_reads/polypolish/1/polypolish-R2-1.sam > flye/polish/short_reads/polypolish/1/polypolish-1.fasta | sed 's/^/[short read polishing - polypolish (1 of 1)] /' | tee -a dragonflye.log

Error running command: polypolish flye/polish/racon/1/consensus.fasta flye/polish/short_reads/polypolish/1/polypolish_R1-1.sam flye/polish/short_reads/polypolish/1/polypolish-R2-1.sam > flye/polish/short_reads/polypolish/1/polypolish-1.fasta | sed 's/^/[short read polishing - polypolish (1 of 1)] /' | tee -a dragonflye.log

error: unrecognized subcommand 'flye/polish/racon/1/consensus.fasta'

I believe the issue is that the polish command is missing when calling polypolish.

Thanks for your hard work!

Andrea

mamba installation problem

Hi,

Thank you for wonderful assembly pipeline!

I successfully installed dragonflye via mamba, but unfortunately it installed v1.0.7.

So I decided to force mamba to install the newest version:

mamba create -n dragonflye -c conda-forge -c bioconda dragonflye=1.0.13

    mamba (0.22.1) supported by @QuantStack

    GitHub:  https://github.com/mamba-org/mamba
    Twitter: https://twitter.com/QuantStack


WARNING: A conda environment already exists at '/home/jang/anaconda3/envs/mamba/envs/dragonflye'
Remove existing environment (y/[n])? y

Looking for: ['dragonflye=1.0.13']

conda-forge/linux-64 Using cache
conda-forge/noarch Using cache
bioconda/linux-64 Using cache
bioconda/noarch Using cache
r/linux-64 Using cache
r/noarch Using cache
pkgs/main/noarch No change
pkgs/r/noarch No change
pkgs/r/linux-64 No change
pkgs/main/linux-64 No change
cruizperez/linux-64 No change
cruizperez/noarch No change
Encountered problems while solving:

  • nothing provides cudatoolkit 8.0.* needed by tensorflow-gpu-base-1.4.1-py27h01caf0a_0

Any hints?

I would like to run dragonflye with medaka using gpu or cpu and finally polish the assembly with polypolish.

Can you also add a --prefix option to dragonflye to set a custom file name for the final assembly?

Bests,
Jan

Coverage of "0"?

What does a coverage of 0 (for a contig) in the "flye-info.txt" mean? I don't think these contigs end up in the final assembly file anyway, but I am curious as to how and why it is being reported as such, since it obviously doesn't make much sense.

Feature request: replicon rotation

Would you be open to adding a replicon rotation feature similar to what unicycler does? There's an existing issue on the flye repo that states that it doesn't directly support rotation, and suggests using circlator for that purpose.

The unicycler paper describes its approach:

A circular sequence can be shifted to any starting position without changing the biological information. Unicycler therefore uses TBLASTN to search for dnaA or repA alleles in each completed replicon[20]. If one is found, the sequence is rotated and/or flipped so that it begins with that gene encoded on the forward strand. This provides consistently oriented assemblies and reduces the risk that a gene will be split across the start and end of the sequence.

...or do you see that as out-of-scope for dragonflye?

Thanks for making such a useful tool. It really simplifies the process of creating high-quality hybrid assemblies.
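
The rotation step itself is small once the position of the anchor gene is known. A minimal sketch, assuming 0-based half-open gene coordinates on the input contig and a plain ACGT alphabet; finding dnaA/repA (the TBLASTN search in the quoted passage) is not covered, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def rotate_to_gene(seq, gene_start, gene_end, strand="+"):
    """Rotate a circular sequence so it begins with the gene on the forward strand.

    gene_start/gene_end are 0-based, half-open coordinates on the input sequence.
    """
    if strand == "-":
        # Flip the replicon; the gene's last base becomes its first on the new strand.
        n = len(seq)
        seq = reverse_complement(seq)
        gene_start = n - gene_end
    # A circular sequence can start anywhere, so move the prefix to the end.
    return seq[gene_start:] + seq[:gene_start]
```

This must only be applied to contigs that the assembler reports as circular; rotating a linear contig would join its two ends incorrectly.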

Memory calculation

I'm wondering if there is a way to estimate the memory requirements up front so that the run doesn't fail partway through. Is that even possible? I'm thinking of pilon, but it's possible something else in the pipeline takes more memory. If memory is insufficient, I'd suggest dragonflye produce an error up front.

Polypolish error

Hi @rpetit3,

Thanks for your excellent tool. I ran dragonflye (v1.1.2) with the intent of assembling long reads and polishing with short reads. However, I encountered an error when the pipeline got to the polypolish step.

[dragonflye] Polishing with Polypolish
[dragonflye] Running: polypolish flye/polish/racon/1/consensus.fasta flye/polish/short_reads/polypolish/1/polypolish_R1-1.sam flye/polish/short_reads/polypolish/1/polypolish-R2-1.sam > flye/polish/short_reads/polypolish/1/polypolish-1.fasta | sed 's/^/[short read polishing - polypolish (1 of 1)] /' | tee -a dragonflye.log
error: unrecognized subcommand 'flye/polish/racon/1/consensus.fasta'
Usage: polypolish <COMMAND>

For more information, try '--help'.

The command I ran is

dragonflye --reads /MIGE/01_DATA/01_FASTQ/15059.fastq.gz --R1 /MIGE/01_DATA/01_FASTQ/15059_1.fastq.gz --R2 /MIGE/01_DATA/01_FASTQ/15059_2.fastq.gz --prefix 15059 --outdir dragonflye_direct --force

I thought this was due to the dragonflye version that I used. However, the current version (v1.2.0) isn't installable using conda (I tried, but I encountered an error).

The error message gives the impression that polypolish requires a sub-command (e.g., polypolish filter or polypolish polish), which is currently missing from the dragonflye pipeline.

I look forward to hearing back from you.
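
For illustration, the fix suggested above amounts to inserting the `polish` subcommand for newer polypolish releases. In the sketch below, the version cut-off of 0.6 is an assumption based only on the two reports on this page (both mention v0.6.0 failing), and the function name is hypothetical.

```python
def polypolish_argv(version, assembly, sam_files):
    """Build the polypolish argument list, adding the 'polish' subcommand when needed."""
    major_minor = tuple(int(part) for part in version.split(".")[:2])
    argv = ["polypolish"]
    if major_minor >= (0, 6):
        # Assumed: releases from 0.6 onward require an explicit subcommand.
        argv.append("polish")
    return argv + [assembly, *sam_files]
```

The polished FASTA is written to stdout in the commands shown above, so the caller still has to redirect the output to a file.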

Batch option for Medaka

Hi!

Running into issues with medaka polishing step. Runs out of GPU memory. Medaka manual states that passing a batch option (-b) to medaka_consensus helps limit the GPU memory usage.

Tried by editing the bin file and it works. Could there be a way to dynamically pass a batch size option to medaka when calling dragonflye?

Thanks!

MV
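
A sketch of how an optional batch size could be threaded through to the medaka call. The `-i`, `-d`, `-o`, `-m` and `-t` flags are copied from the `medaka_consensus` commands logged on this page and `-b` is the option mentioned above; the function name and the idea of a dragonflye-level batch setting are hypothetical.

```python
def medaka_consensus_argv(reads, draft, outdir, model, threads, batch_size=None):
    """Build the medaka_consensus argument list, adding -b only when a batch size is given."""
    argv = [
        "medaka_consensus",
        "-i", reads,
        "-d", draft,
        "-o", outdir,
        "-m", model,
        "-t", str(threads),
    ]
    if batch_size is not None:
        # A smaller batch lowers GPU memory use at the cost of speed.
        argv += ["-b", str(batch_size)]
    return argv
```

Leaving `batch_size` as `None` keeps medaka's own default, so existing runs would be unaffected.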

Expose nano-hq option

Might be useful for data generated with the latest ONT chemistry (Q20+) and basecallers to have the Flye --nano-hq mode available. I found that this can't be added with the --opts flag as --nano-raw is hardcoded. Great tool!

Citation

Hello, is there a specific citation format available for Dragonflye?

Memory default

I'm wondering if you want to increase the default memory up to something more solid since this is nanopore. 64G? The Java error that results is really confusing to people who don't know java. I have one more separate but related idea and will make a separate ticket for that.
