a-ludi / dentist

Close assembly gaps using long-reads at high accuracy.

Home Page: https://a-ludi.github.io/dentist/

License: MIT License

Languages: D 89.16%, Shell 5.70%, Python 4.04%, Makefile 0.64%, Awk 0.27%, Dockerfile 0.13%, jq 0.06%
Topics: genome-assembly, gap-filling, pacbio, long-reads, close-assembly-gaps, snakemake, damapper, cluster, daligner, dub

dentist's People

Contributors: a-ludi

dentist's Issues

example failing due to computeintrinsicqv

Hi! I am running Dentist v3.0.0 on our cluster and the md5check fails on the example. I think there is again an issue with computeintrinsicqv. Can you spot what's going on?

Here's the first exitStatus != 0 from process.1.log:

{"thread":47412916968384,
"logLevel":"diagnostic",
"state":"post",
"command":
["computeintrinsicqv","-d19","/tmp/slurm_schradel.11559699/dentist-processPileUps-F2nX6W/pileup-52b-53f.db","/tmp/slurm_schradel.11559699/dentist-processPileUps-F2nX6W/pileup-52b-53f.pileup-52b-53f-filtered-chained-filtered.las"],
"output":["[V] processing /tmp/slurm_schradel.11559699/dentist-processPileUps-F2nX6W/pileup-52b-53f.pileup-52b-53f-filtered-chained-filtered.las",
"DatabaseFile::getTrimmedBlockInterval(): invalid block id 11559699","","/home/s/schradel/software/dentist.v3.0.0.x86_64/bin/computeintrinsicqv(+0x637f9)[0x563c2f02c7f9]","/home/s/schradel/software/dentist.v3.0.0.x86_64/bin/computeintrinsicqv(+0x5b42c)[0x563c2f02442c]","/home/s/schradel/software/dentist.v3.0.0.x86_64/bin/computeintrinsicqv(+0x16f24)[0x563c2efdff24]","/home/s/schradel/software/dentist.v3.0.0.x86_64/bin/computeintrinsicqv(+0x11ffd)[0x563c2efdaffd]","/lib64/libc.so.6(__libc_start_main+0xf5)[0x2b465d0e6555]","/home/s/schradel/software/dentist.v3.0.0.x86_64/bin/computeintrinsicqv(+0x15479)[0x563c2efde479]",""],
"exitStatus":1,
"timestamp":637889081198764946,
"action":"execute",
"type":"command"}

Here's the output of the md5check:

gap-closed.closed-gaps.bed: FAILED
gap-closed.fasta: FAILED
workdir/.reference.bps: OK
workdir/.reference.dentist-reads-H.anno: FAILED
workdir/.reference.dentist-reads-H.data: FAILED
workdir/.reference.dentist-reads.anno: OK
workdir/.reference.dentist-reads.data: OK
workdir/.reference.dentist-self-H.anno: FAILED
workdir/.reference.dentist-self-H.data: FAILED
workdir/.reference.dentist-self.anno: OK
workdir/.reference.dentist-self.data: OK
workdir/.reference.dust.anno: OK
workdir/.reference.dust.data: OK
workdir/.reference.hdr: OK
workdir/.reference.idx: OK
workdir/.reference.tan-H.anno: FAILED
workdir/.reference.tan-H.data: FAILED
workdir/.reference.tan.anno: OK
workdir/.reference.tan.data: OK
workdir/.gap-closed-preliminary.bps: FAILED
workdir/.gap-closed-preliminary.closed-gaps.anno: FAILED
workdir/.gap-closed-preliminary.closed-gaps.data: FAILED
workdir/.gap-closed-preliminary.dentist-self.anno: FAILED
workdir/.gap-closed-preliminary.dentist-self.data: FAILED
workdir/.gap-closed-preliminary.dentist-weak-coverage.anno: FAILED
workdir/.gap-closed-preliminary.dentist-weak-coverage.data: OK
workdir/.gap-closed-preliminary.dust.anno: FAILED
workdir/.gap-closed-preliminary.dust.data: FAILED
workdir/.gap-closed-preliminary.hdr: OK
workdir/.gap-closed-preliminary.idx: FAILED
workdir/.gap-closed-preliminary.tan.anno: FAILED
workdir/.gap-closed-preliminary.tan.data: FAILED
workdir/.reads.bps: OK
workdir/.reads.dentist-reads-10B.anno: OK
workdir/.reads.dentist-reads-10B.data: OK
workdir/.reads.dentist-reads-11B.anno: OK
workdir/.reads.dentist-reads-11B.data: OK
workdir/.reads.dentist-reads-12B.anno: OK
workdir/.reads.dentist-reads-12B.data: OK
workdir/.reads.dentist-reads-1B.anno: OK
workdir/.reads.dentist-reads-1B.data: OK
workdir/.reads.dentist-reads-2B.anno: OK
workdir/.reads.dentist-reads-2B.data: OK
workdir/.reads.dentist-reads-3B.anno: OK
workdir/.reads.dentist-reads-3B.data: OK
workdir/.reads.dentist-reads-4B.anno: OK
workdir/.reads.dentist-reads-4B.data: OK
workdir/.reads.dentist-reads-5B.anno: OK
workdir/.reads.dentist-reads-5B.data: OK
workdir/.reads.dentist-reads-6B.anno: OK
workdir/.reads.dentist-reads-6B.data: OK
workdir/.reads.dentist-reads-7B.anno: OK
workdir/.reads.dentist-reads-7B.data: OK
workdir/.reads.dentist-reads-8B.anno: OK
workdir/.reads.dentist-reads-8B.data: OK
workdir/.reads.dentist-reads-9B.anno: OK
workdir/.reads.dentist-reads-9B.data: OK
workdir/.reads.dentist-self-10B.anno: OK
workdir/.reads.dentist-self-10B.data: OK
workdir/.reads.dentist-self-11B.anno: OK
workdir/.reads.dentist-self-11B.data: OK
workdir/.reads.dentist-self-12B.anno: OK
workdir/.reads.dentist-self-12B.data: OK
workdir/.reads.dentist-self-1B.anno: OK
workdir/.reads.dentist-self-1B.data: OK
workdir/.reads.dentist-self-2B.anno: OK
workdir/.reads.dentist-self-2B.data: OK
workdir/.reads.dentist-self-3B.anno: OK
workdir/.reads.dentist-self-3B.data: OK
workdir/.reads.dentist-self-4B.anno: OK
workdir/.reads.dentist-self-4B.data: OK
workdir/.reads.dentist-self-5B.anno: OK
workdir/.reads.dentist-self-5B.data: OK
workdir/.reads.dentist-self-6B.anno: OK
workdir/.reads.dentist-self-6B.data: OK
workdir/.reads.dentist-self-7B.anno: OK
workdir/.reads.dentist-self-7B.data: OK
workdir/.reads.dentist-self-8B.anno: OK
workdir/.reads.dentist-self-8B.data: OK
workdir/.reads.dentist-self-9B.anno: OK
workdir/.reads.dentist-self-9B.data: OK
workdir/.reads.idx: OK
workdir/.reads.tan-10B.anno: OK
workdir/.reads.tan-10B.data: OK
workdir/.reads.tan-11B.anno: OK
workdir/.reads.tan-11B.data: OK
workdir/.reads.tan-12B.anno: OK
workdir/.reads.tan-12B.data: OK
workdir/.reads.tan-1B.anno: OK
workdir/.reads.tan-1B.data: OK
workdir/.reads.tan-2B.anno: OK
workdir/.reads.tan-2B.data: OK
workdir/.reads.tan-3B.anno: OK
workdir/.reads.tan-3B.data: OK
workdir/.reads.tan-4B.anno: OK
workdir/.reads.tan-4B.data: OK
workdir/.reads.tan-5B.anno: OK
workdir/.reads.tan-5B.data: OK
workdir/.reads.tan-6B.anno: OK
workdir/.reads.tan-6B.data: OK
workdir/.reads.tan-7B.anno: OK
workdir/.reads.tan-7B.data: OK
workdir/.reads.tan-8B.anno: OK
workdir/.reads.tan-8B.data: OK
workdir/.reads.tan-9B.anno: OK
workdir/.reads.tan-9B.data: OK
workdir/reference.dam: OK
workdir/gap-closed-preliminary.dam: FAILED
workdir/gap-closed-preliminary.fasta: FAILED
workdir/reads.db: OK
workdir/validation-report.json: OK
reference.fasta: OK
reads.fasta: OK
md5sum: WARNING: 21 computed checksums did NOT match

I am running this on CentOS:

LSB Version:	:core-4.1-amd64:core-4.1-noarch
Distributor ID:	CentOS
Description:	CentOS Linux release 7.9.2009 (Core)
Release:	7.9.2009
Codename:	Core

rule mask_self fails on Docker container

I'm trying to use dentist in a Docker container, but the example always fails during the mask_self step. From my Mac, I start the container with docker run -it --rm=true --platform linux/x86_64 centos:7 /bin/bash and then run the following:

yum update -y -q
yum install -y wget

wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh
sh Mambaforge-Linux-x86_64.sh -b -p /opt/mamba3
export PATH=/opt/mamba3/bin:$PATH
rm Mambaforge-Linux-x86_64.sh

mamba install -c conda-forge -c bioconda -y snakemake

wget https://github.com/a-ludi/dentist/releases/download/v3.0.0/dentist-example.tar.gz
tar -xzf dentist-example.tar.gz
cd dentist-example

snakemake --configfile=snakemake.yml --use-conda --cores=1

This causes the following error:

Error in rule mask_self:
    jobid: 11
    output: workdir/.reference.dentist-self.anno, workdir/.reference.dentist-self.data
    log: logs/mask-self.reference.log (check log file(s) for error message)
    conda-env: /dentist-example/.snakemake/conda/850bc5c09e81d3d6b875839f8fe0ed70
    shell:
        dentist mask --config=dentist.yml  workdir/reference.dam workdir/reference.reference.las dentist-self 2> logs/mask-self.reference.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

The contents of logs/mask-self.reference.log:

{"executableVersion":"v3.0.0","refDb":"workdir/reference.dam","readsDb":"","dbAlignmentFile":"workdir/reference.reference.las","repeatMask":"dentist-self","configFile":"dentist.yml","debugRepeatMasks":false,"help":false,"maxCoverageReads":4294967295,"coverageBoundsReads":[0,0],"maxCoverageSelf":3,"coverageBoundsSelf":[0,3],"maxImproperCoverageReads":4294967295,"improperCoverageBoundsReads":[0,0],"properAlignmentAllowance":100,"quiet":false,"readCoverage":20,"revertOptionNames":[],"tracePointDistance":100,"verbosity":2}
{"thread":274939902976,"logLevel":"diagnostic","timestamp":637783965479794038,"function":"dentist.commands.maskRepetitiveRegions.RepeatMaskAssessor.run","state":"enter"}
{"thread":274939902976,"logLevel":"diagnostic","timestamp":637783965479801084,"function":"dentist.commands.maskRepetitiveRegions.RepeatMaskAssessor.readInputs","state":"enter"}
{"thread":274939902976,"logLevel":"diagnostic","state":"pre","command":["DBdump","workdir/reference.dam"],"timestamp":637783965479844193,"action":"execute","type":"pipe"}
{"thread":274939902976,"logLevel":"diagnostic","timestamp":637783965480006401,"function":"dentist.commands.maskRepetitiveRegions.RepeatMaskAssessor.readInputs","state":"exit","timeElapsed":203652}
{"thread":274939902976,"logLevel":"diagnostic","timestamp":637783965480009609,"function":"dentist.commands.maskRepetitiveRegions.RepeatMaskAssessor.run","state":"exit","timeElapsed":208842}
core.exception.AssertError@source/dentist/util/process.d(215): Attempting to fetch the front of an empty LinesPipe
----------------
??:? _d_assert_msg [0x40010b3e06]
??:? uint dentist.dazzler.numDbRecords(in immutable(char)[]) [0x4000dbef18]
??:? dentist.util.process.LinesPipe!(dentist.util.process.ProcessInfo, 0).LinesPipe dentist.dazzler.dbdump!(ulong[]).dbdump(in immutable(char)[], ulong[], in immutable(char)[][]) [0x4000dcbdad]
??:? uint[] dentist.dazzler.LocalAlignmentReader.contigLengths(immutable(char)[]) [0x4000dbfff3]
??:? dentist.dazzler.LocalAlignmentReader dentist.dazzler.LocalAlignmentReader.__ctor(const(immutable(char)[]), immutable(char)[], immutable(char)[], dentist.dazzler.BufferMode, dentist.common.alignments.base.TracePoint[]) [0x4000dbfc21]
??:? void dentist.commands.maskRepetitiveRegions.RepeatMaskAssessor.readInputs() [0x4000c73d52]
??:? void dentist.commands.maskRepetitiveRegions.RepeatMaskAssessor.run() [0x4000c73b09]
??:? dentist.commandline.ReturnCode dentist.commandline.runCommand!(2).runCommand(in immutable(char)[][]) [0x4000bbd656]
??:? dentist.commandline.ReturnCode dentist.commandline.run(in immutable(char)[][]) [0x4000b2850d]
??:? _Dmain [0x40009bee67]

Strangely, I can run the same series of commands on a CentOS 7 cluster without issue. The contents of logs/mask-self.reference.log for the successful run are:

{"executableVersion":"v3.0.0","refDb":"workdir/reference.dam","readsDb":"","dbAlignmentFile":"workdir/reference.reference.las","repeatMask":"dentist-self","configFile":"dentist.yml","debugRepeatMasks":false,"help":false,"maxCoverageReads":4294967295,"coverageBoundsReads":[0,0],"maxCoverageSelf":3,"coverageBoundsSelf":[0,3],"maxImproperCoverageReads":4294967295,"improperCoverageBoundsReads":[0,0],"properAlignmentAllowance":100,"quiet":false,"readCoverage":20,"revertOptionNames":[],"tracePointDistance":100,"verbosity":2}
{"thread":22377401060544,"logLevel":"diagnostic","timestamp":637783913924743264,"function":"dentist.commands.maskRepetitiveRegions.RepeatMaskAssessor.run","state":"enter"}
{"thread":22377401060544,"logLevel":"diagnostic","timestamp":637783913924743707,"function":"dentist.commands.maskRepetitiveRegions.RepeatMaskAssessor.readInputs","state":"enter"}
{"thread":22377401060544,"logLevel":"diagnostic","state":"pre","command":["DBdump","workdir/reference.dam"],"timestamp":637783913924766164,"action":"execute","type":"pipe"}
{"thread":22377401060544,"logLevel":"diagnostic","state":"pre","command":["DBdump","-h","workdir/reference.dam"],"timestamp":637783913924790405,"action":"execute","type":"pipe"}
{"thread":22377401060544,"logLevel":"diagnostic","state":"pre","command":["DBdump","workdir/reference.dam"],"timestamp":637783913924830211,"action":"execute","type":"pipe"}
{"thread":22377401060544,"logLevel":"diagnostic","state":"pre","command":["DBdump","-h","workdir/reference.dam"],"timestamp":637783913924851842,"action":"execute","type":"pipe"}
{"thread":22377401060544,"logLevel":"diagnostic","timestamp":637783913924889160,"function":"dentist.commands.maskRepetitiveRegions.RepeatMaskAssessor.readInputs","state":"exit","timeElapsed":145216}
{"thread":22377401060544,"logLevel":"diagnostic","timestamp":637783913924889516,"function":"dentist.commands.maskRepetitiveRegions.RepeatMaskAssessor.assessRepeatStructure","state":"enter"}
{"thread":22377401060544,"logLevel":"diagnostic","state":"pre","command":["DBdump","workdir/reference.dam"],"timestamp":637783913924890069,"action":"execute","type":"pipe"}
{"thread":22377401060544,"logLevel":"diagnostic","state":"pre","command":["DBdump","-r","-h","workdir/reference.dam"],"timestamp":637783913924911577,"action":"execute","type":"pipe"}
{"alignmentType":"self","thread":22377401060544,"logLevel":"diagnostic","timestamp":637783913925023014,"repetitiveRegions":null,"numRepetitiveRegions":185}
{"thread":22377401060544,"logLevel":"diagnostic","timestamp":637783913925023427,"function":"dentist.commands.maskRepetitiveRegions.RepeatMaskAssessor.assessRepeatStructure","state":"exit","timeElapsed":133711}
{"thread":22377401060544,"logLevel":"diagnostic","timestamp":637783913925023646,"function":"dentist.commands.maskRepetitiveRegions.RepeatMaskAssessor.writeRepeatMask","state":"enter"}
{"thread":22377401060544,"logLevel":"diagnostic","state":"pre","command":["DBdump","workdir/reference.dam"],"timestamp":637783913925025791,"action":"execute","type":"pipe"}
{"thread":22377401060544,"logLevel":"diagnostic","state":"pre","command":["DBdump","workdir/reference.dam"],"timestamp":637783913925047965,"action":"execute","type":"pipe"}
{"thread":22377401060544,"logLevel":"diagnostic","timestamp":637783913925069691,"function":"dentist.commands.maskRepetitiveRegions.RepeatMaskAssessor.writeRepeatMask","state":"exit","timeElapsed":45797}
{"thread":22377401060544,"logLevel":"diagnostic","timestamp":637783913925069958,"function":"dentist.commands.maskRepetitiveRegions.RepeatMaskAssessor.run","state":"exit","timeElapsed":326260}

I've also tried this in the condaforge/mambaforge:4.11.0-0 container, but I get the same error.

Any insights are greatly appreciated!

Math error in Snakefile

Hi,
I'm trying to use Dentist to gap-close some test data consisting of randomly generated FASTA sequence and low-error long reads.

The provided example dataset works well on our SLURM cluster, but I can't get it to work with my own data; it fails with a math error:

Updating job make_merge_config.
InputFunctionException in line 1279 of /work2/project/seqoccin/assemblies/gap_closing/tests/manual_tests/dentist/Snakefile:
Error:
ValueError: math domain error
Wildcards:

Traceback:
File "/work2/project/seqoccin/assemblies/gap_closing/tests/manual_tests/dentist/Snakefile", line 1282, in
File "/work2/project/seqoccin/assemblies/gap_closing/tests/manual_tests/dentist/Snakefile", line 614, in insertions_batches

I realised that line 614 in the Snakefile is:
num_digits = int(ceil(log10(num_batches)))

Here num_batches is 0, and log10(0) aborts the pipeline. Is there a way to solve this? (And why is num_batches 0 in the first place?)
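The crash can be reproduced and guarded against in a few lines. A minimal sketch, keeping the original ceil(log10(...)) expression but handling the num_batches == 0 case (the safe_num_digits helper is hypothetical, not part of the Snakefile):

```python
from math import ceil, log10

def safe_num_digits(num_batches: int) -> int:
    # log10(0) raises "ValueError: math domain error" -- exactly the
    # exception shown in the InputFunctionException traceback above.
    if num_batches <= 0:
        return 1  # nothing to number; fall back to one digit
    # original expression from Snakefile line 614, clamped to >= 1
    return max(1, int(ceil(log10(num_batches))))

print(safe_num_digits(0))   # guarded case, no ValueError
print(safe_num_digits(12))
```

This only masks the symptom, of course; if num_batches is 0 because no insertion batches were produced upstream, that root cause still needs investigating.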

Thanks !!

Failed to run with test data set: TypeError in line 45 of Snakefile

(base) ubuntu@bio-xanthomonas:~/dentist-example$ snakemake --configfile=snakemake.yml --use-singularity --cores=all
Pre-fetching singularity image...
TypeError in line 45 of /home/djs217/dentist-example/Snakefile:
__init__() got an unexpected keyword argument 'is_containerized'
  File "/home/djs217/dentist-example/Snakefile", line 745, in <module>
  File "/home/djs217/dentist-example/Snakefile", line 45, in prefetch_singularity_image

(base) ubuntu@bio-xanthomonas:~/dentist-example$ snakemake --version
5.32.1
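The TypeError suggests the installed snakemake (5.32.1) predates the is_containerized keyword that the example's Snakefile passes, so upgrading snakemake would likely resolve it. Alternatively, a call site can defensively drop keywords the installed API does not accept. A minimal sketch of that pattern (old_image_api is a hypothetical stand-in for the older constructor, not a real snakemake function):

```python
import inspect

def call_with_supported_kwargs(func, *args, **kwargs):
    """Call func, silently dropping keyword arguments it does not accept.

    Useful when a library gained a keyword in a newer release and the
    calling code must run against both old and new versions.
    """
    params = inspect.signature(func).parameters
    takes_var_kw = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if not takes_var_kw:
        kwargs = {k: v for k, v in kwargs.items() if k in params}
    return func(*args, **kwargs)

def old_image_api(url):  # hypothetical stand-in for the older __init__
    return url

# The unsupported keyword is dropped instead of raising TypeError:
print(call_with_supported_kwargs(old_image_api, "docker://a", is_containerized=True))
```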

md5checksum differences between nodes

Hi Arne,

I hope you are well. First of all congrats on the very exciting tool, I'm really looking forward to try it out on my ultra long Nanopore data!

However, I'm having a bit of trouble getting dentist set up. For starters, the example dataset only gets the md5 checksums right when I run it on one specific high-memory node, and not on any of our other compute nodes. In addition, this only works together with your Singularity image, not with the bundled binaries in the bin folder of the example dataset.

Also, when I run it through SLURM, there's yet another problem: it reproducibly stops right after the collect checkpoint and I need to run it again to finish the rest of the process. I'm not sure whether this is related to the first issue, since I haven't been able to queue it on that specific node, but I'm in the process of checking.

Any ideas about where the problem might lie?

Thanks!!

Successful node (high memory)

[diego.terrones@clip-m1-0 dentist-example]$ md5sum -c checksum.md5
gap-closed.closed-gaps.bed: OK
gap-closed.fasta: OK
workdir/.assembly-test.bps: OK
workdir/.assembly-test.dentist-reads-H.anno: OK
workdir/.assembly-test.dentist-reads-H.data: OK
workdir/.assembly-test.dentist-reads.anno: OK
workdir/.assembly-test.dentist-reads.data: OK
workdir/.assembly-test.dentist-self-H.anno: OK
workdir/.assembly-test.dentist-self-H.data: OK
workdir/.assembly-test.dentist-self.anno: OK
workdir/.assembly-test.dentist-self.data: OK
workdir/.assembly-test.dust.anno: OK
workdir/.assembly-test.dust.data: OK
workdir/.assembly-test.hdr: OK
workdir/.assembly-test.idx: OK
workdir/.assembly-test.tan-H.anno: OK
workdir/.assembly-test.tan-H.data: OK
workdir/.assembly-test.tan.anno: OK
workdir/.assembly-test.tan.data: OK
workdir/.gap-closed-preliminary.bps: OK
workdir/.gap-closed-preliminary.closed-gaps.anno: OK
workdir/.gap-closed-preliminary.closed-gaps.data: OK
workdir/.gap-closed-preliminary.dentist-self.anno: OK
workdir/.gap-closed-preliminary.dentist-self.data: OK
workdir/.gap-closed-preliminary.dentist-weak-coverage.anno: OK
workdir/.gap-closed-preliminary.dentist-weak-coverage.data: OK
workdir/.gap-closed-preliminary.dust.anno: OK
workdir/.gap-closed-preliminary.dust.data: OK
workdir/.gap-closed-preliminary.hdr: OK
workdir/.gap-closed-preliminary.idx: OK
workdir/.gap-closed-preliminary.tan.anno: OK
workdir/.gap-closed-preliminary.tan.data: OK
workdir/.reads.bps: OK
workdir/.reads.dentist-reads-10B.anno: OK
workdir/.reads.dentist-reads-10B.data: OK
workdir/.reads.dentist-reads-11B.anno: OK
workdir/.reads.dentist-reads-11B.data: OK
workdir/.reads.dentist-reads-12B.anno: OK
workdir/.reads.dentist-reads-12B.data: OK
workdir/.reads.dentist-reads-1B.anno: OK
workdir/.reads.dentist-reads-1B.data: OK
workdir/.reads.dentist-reads-2B.anno: OK
workdir/.reads.dentist-reads-2B.data: OK
workdir/.reads.dentist-reads-3B.anno: OK
workdir/.reads.dentist-reads-3B.data: OK
workdir/.reads.dentist-reads-4B.anno: OK
workdir/.reads.dentist-reads-4B.data: OK
workdir/.reads.dentist-reads-5B.anno: OK
workdir/.reads.dentist-reads-5B.data: OK
workdir/.reads.dentist-reads-6B.anno: OK
workdir/.reads.dentist-reads-6B.data: OK
workdir/.reads.dentist-reads-7B.anno: OK
workdir/.reads.dentist-reads-7B.data: OK
workdir/.reads.dentist-reads-8B.anno: OK
workdir/.reads.dentist-reads-8B.data: OK
workdir/.reads.dentist-reads-9B.anno: OK
workdir/.reads.dentist-reads-9B.data: OK
workdir/.reads.dentist-self-10B.anno: OK
workdir/.reads.dentist-self-10B.data: OK
workdir/.reads.dentist-self-11B.anno: OK
workdir/.reads.dentist-self-11B.data: OK
workdir/.reads.dentist-self-12B.anno: OK
workdir/.reads.dentist-self-12B.data: OK
workdir/.reads.dentist-self-1B.anno: OK
workdir/.reads.dentist-self-1B.data: OK
workdir/.reads.dentist-self-2B.anno: OK
workdir/.reads.dentist-self-2B.data: OK
workdir/.reads.dentist-self-3B.anno: OK
workdir/.reads.dentist-self-3B.data: OK
workdir/.reads.dentist-self-4B.anno: OK
workdir/.reads.dentist-self-4B.data: OK
workdir/.reads.dentist-self-5B.anno: OK
workdir/.reads.dentist-self-5B.data: OK
workdir/.reads.dentist-self-6B.anno: OK
workdir/.reads.dentist-self-6B.data: OK
workdir/.reads.dentist-self-7B.anno: OK
workdir/.reads.dentist-self-7B.data: OK
workdir/.reads.dentist-self-8B.anno: OK
workdir/.reads.dentist-self-8B.data: OK
workdir/.reads.dentist-self-9B.anno: OK
workdir/.reads.dentist-self-9B.data: OK
workdir/.reads.idx: OK
workdir/.reads.tan-10B.anno: OK
workdir/.reads.tan-10B.data: OK
workdir/.reads.tan-11B.anno: OK
workdir/.reads.tan-11B.data: OK
workdir/.reads.tan-12B.anno: OK
workdir/.reads.tan-12B.data: OK
workdir/.reads.tan-1B.anno: OK
workdir/.reads.tan-1B.data: OK
workdir/.reads.tan-2B.anno: OK
workdir/.reads.tan-2B.data: OK
workdir/.reads.tan-3B.anno: OK
workdir/.reads.tan-3B.data: OK
workdir/.reads.tan-4B.anno: OK
workdir/.reads.tan-4B.data: OK
workdir/.reads.tan-5B.anno: OK
workdir/.reads.tan-5B.data: OK
workdir/.reads.tan-6B.anno: OK
workdir/.reads.tan-6B.data: OK
workdir/.reads.tan-7B.anno: OK
workdir/.reads.tan-7B.data: OK
workdir/.reads.tan-8B.anno: OK
workdir/.reads.tan-8B.data: OK
workdir/.reads.tan-9B.anno: OK
workdir/.reads.tan-9B.data: OK
workdir/assembly-test.dam: OK
workdir/gap-closed-preliminary.dam: OK
workdir/gap-closed-preliminary.fasta: OK
workdir/reads.db: OK
workdir/validation-report.json: OK

Config

[diego.terrones@clip-m1-0 dentist-example]$ lsb_release -a
LSB Version:	:core-4.1-amd64:core-4.1-noarch
Distributor ID:	CentOS
Description:	CentOS Linux release 7.9.2009 (Core)
Release:	7.9.2009
Codename:	Core
[diego.terrones@clip-m1-0 dentist-example]$ free -h
              total        used        free      shared  buff/cache   available
Mem:           1.9T        943G        938G         89M         55G        991G
Swap:            0B          0B          0B

All other nodes

[diego.terrones@clip-c2-2 dentist-example]$ md5sum -c checksum.md5
gap-closed.closed-gaps.bed: FAILED
gap-closed.fasta: FAILED
workdir/.assembly-test.bps: OK
workdir/.assembly-test.dentist-reads-H.anno: OK
workdir/.assembly-test.dentist-reads-H.data: OK
workdir/.assembly-test.dentist-reads.anno: OK
workdir/.assembly-test.dentist-reads.data: OK
workdir/.assembly-test.dentist-self-H.anno: OK
workdir/.assembly-test.dentist-self-H.data: OK
workdir/.assembly-test.dentist-self.anno: OK
workdir/.assembly-test.dentist-self.data: OK
workdir/.assembly-test.dust.anno: OK
workdir/.assembly-test.dust.data: OK
workdir/.assembly-test.hdr: OK
workdir/.assembly-test.idx: OK
workdir/.assembly-test.tan-H.anno: OK
workdir/.assembly-test.tan-H.data: OK
workdir/.assembly-test.tan.anno: OK
workdir/.assembly-test.tan.data: OK
workdir/.gap-closed-preliminary.bps: FAILED
workdir/.gap-closed-preliminary.closed-gaps.anno: FAILED
workdir/.gap-closed-preliminary.closed-gaps.data: FAILED
workdir/.gap-closed-preliminary.dentist-self.anno: FAILED
workdir/.gap-closed-preliminary.dentist-self.data: FAILED
workdir/.gap-closed-preliminary.dentist-weak-coverage.anno: FAILED
workdir/.gap-closed-preliminary.dentist-weak-coverage.data: OK
workdir/.gap-closed-preliminary.dust.anno: FAILED
workdir/.gap-closed-preliminary.dust.data: FAILED
workdir/.gap-closed-preliminary.hdr: OK
workdir/.gap-closed-preliminary.idx: FAILED
workdir/.gap-closed-preliminary.tan.anno: FAILED
workdir/.gap-closed-preliminary.tan.data: FAILED
workdir/.reads.bps: OK
workdir/.reads.dentist-reads-10B.anno: OK
workdir/.reads.dentist-reads-10B.data: OK
workdir/.reads.dentist-reads-11B.anno: OK
workdir/.reads.dentist-reads-11B.data: OK
workdir/.reads.dentist-reads-12B.anno: OK
workdir/.reads.dentist-reads-12B.data: OK
workdir/.reads.dentist-reads-1B.anno: OK
workdir/.reads.dentist-reads-1B.data: OK
workdir/.reads.dentist-reads-2B.anno: OK
workdir/.reads.dentist-reads-2B.data: OK
workdir/.reads.dentist-reads-3B.anno: OK
workdir/.reads.dentist-reads-3B.data: OK
workdir/.reads.dentist-reads-4B.anno: OK
workdir/.reads.dentist-reads-4B.data: OK
workdir/.reads.dentist-reads-5B.anno: OK
workdir/.reads.dentist-reads-5B.data: OK
workdir/.reads.dentist-reads-6B.anno: OK
workdir/.reads.dentist-reads-6B.data: OK
workdir/.reads.dentist-reads-7B.anno: OK
workdir/.reads.dentist-reads-7B.data: OK
workdir/.reads.dentist-reads-8B.anno: OK
workdir/.reads.dentist-reads-8B.data: OK
workdir/.reads.dentist-reads-9B.anno: OK
workdir/.reads.dentist-reads-9B.data: OK
workdir/.reads.dentist-self-10B.anno: OK
workdir/.reads.dentist-self-10B.data: OK
workdir/.reads.dentist-self-11B.anno: OK
workdir/.reads.dentist-self-11B.data: OK
workdir/.reads.dentist-self-12B.anno: OK
workdir/.reads.dentist-self-12B.data: OK
workdir/.reads.dentist-self-1B.anno: OK
workdir/.reads.dentist-self-1B.data: OK
workdir/.reads.dentist-self-2B.anno: OK
workdir/.reads.dentist-self-2B.data: OK
workdir/.reads.dentist-self-3B.anno: OK
workdir/.reads.dentist-self-3B.data: OK
workdir/.reads.dentist-self-4B.anno: OK
workdir/.reads.dentist-self-4B.data: OK
workdir/.reads.dentist-self-5B.anno: OK
workdir/.reads.dentist-self-5B.data: OK
workdir/.reads.dentist-self-6B.anno: OK
workdir/.reads.dentist-self-6B.data: OK
workdir/.reads.dentist-self-7B.anno: OK
workdir/.reads.dentist-self-7B.data: OK
workdir/.reads.dentist-self-8B.anno: OK
workdir/.reads.dentist-self-8B.data: OK
workdir/.reads.dentist-self-9B.anno: OK
workdir/.reads.dentist-self-9B.data: OK
workdir/.reads.idx: OK
workdir/.reads.tan-10B.anno: OK
workdir/.reads.tan-10B.data: OK
workdir/.reads.tan-11B.anno: OK
workdir/.reads.tan-11B.data: OK
workdir/.reads.tan-12B.anno: OK
workdir/.reads.tan-12B.data: OK
workdir/.reads.tan-1B.anno: OK
workdir/.reads.tan-1B.data: OK
workdir/.reads.tan-2B.anno: OK
workdir/.reads.tan-2B.data: OK
workdir/.reads.tan-3B.anno: OK
workdir/.reads.tan-3B.data: OK
workdir/.reads.tan-4B.anno: OK
workdir/.reads.tan-4B.data: OK
workdir/.reads.tan-5B.anno: OK
workdir/.reads.tan-5B.data: OK
workdir/.reads.tan-6B.anno: OK
workdir/.reads.tan-6B.data: OK
workdir/.reads.tan-7B.anno: OK
workdir/.reads.tan-7B.data: OK
workdir/.reads.tan-8B.anno: OK
workdir/.reads.tan-8B.data: OK
workdir/.reads.tan-9B.anno: OK
workdir/.reads.tan-9B.data: OK
workdir/assembly-test.dam: OK
workdir/gap-closed-preliminary.dam: FAILED
workdir/gap-closed-preliminary.fasta: FAILED
workdir/reads.db: OK
workdir/validation-report.json: OK
md5sum: WARNING: 15 computed checksums did NOT match

Error in rule `collect`

Hi,

Following #16 I tried the pipeline with SKIP_LACHEK=1 and now I get this error:

Error in rule collect:
    jobid: 5
    output: workdir/pile-ups.db
    log: logs/collect.log (check log file(s) for error message)
    shell:
        dentist collect --config=dentist.json  --threads=4 --auxiliary-threads=2 --mask=dentist-self-H,tan-H,dentist-reads-H workdir/scaffolds_FINAL.dam workdir/non-hifi.1kb.db workdir/scaffolds_FINAL.non-hifi.1kb.las workdir/pile-ups.db 2> logs/collect.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: 370318 logs/cluster/collect/unique/jobid5_4e09f197-d4f2-4b34-83cf-bac0967aa03c.out

Error executing rule collect on cluster (jobid: 5, external: 370318 logs/cluster/collect/unique/jobid5_4e09f197-d4f2-4b34-83cf-bac0967aa03c.out, jobscript: /lustre/scratch116/tol/teams/team308/users/mm49/tmp/non-hifi-reads2/.snakemake/tmp.vs0le48c/snakejob.collect.5.sh). For error details see the cluster log and the log files of the involved rule(s).

logs/collect.log is empty. 370318 logs/cluster/collect/unique/jobid5_4e09f197-d4f2-4b34-83cf-bac0967aa03c.out seems to contain the output of a snakemake pipeline that has this error:

Error in rule propagate_mask_back_to_reference_block:
    jobid: 946
    output: workdir/.scaffolds_FINAL.dentist-self-H-257B.anno, workdir/.scaffolds_FINAL.dentist-self-H-257B.data
    log: logs/propagate-mask-back-to-reference-block.dentist-self.257.log (check log file(s) for error message)
    shell:
        dentist propagate-mask --config=dentist.json  -m dentist-self-257B workdir/non-hifi.1kb.db workdir/scaffolds_FINAL.dam workdir/non-hifi.1kb.257.scaffolds_FINAL.las dentist-self-H-257B 2> logs/propagate-mask-back-to-reference-block.dentist-self.257.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

And logs/propagate-mask-back-to-reference-block.dentist-self.257.log has this error:

core.exception.AssertError@/etc/../usr/include/dmd/phobos/std/range/primitives.d(2434): Attempting to fetch the front of an empty array of FlatLocalAlignment
----------------
??:? _d_assert_msg [0x55a59fc89657]
??:? dentist.common.alignments.base.FlatLocalAlignment[] std.algorithm.mutation.copy!(std.algorithm.iteration.ChunkByChunkImpl!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda1, dentist.dazzler.LocalAlignmentReader).ChunkByChunkImpl, dentist.common.alignments.base.FlatLocalAlignment[]).copy(std.algorithm.iteration.ChunkByChunkImpl!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda1, dentist.dazzler.LocalAlignmentReader).ChunkByChunkImpl, dentist.common.alignments.base.FlatLocalAlignment[]) [0x55a59f7ae2e8]
??:? dentist.common.alignments.base.FlatLocalAlignment[] dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().bufferChunks!(std.algorithm.iteration.ChunkByChunkImpl!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda1, dentist.dazzler.LocalAlignmentReader).ChunkByChunkImpl).bufferChunks(std.algorithm.iteration.ChunkByChunkImpl!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda1, dentist.dazzler.LocalAlignmentReader).ChunkByChunkImpl, ulong) [0x55a59f8e93a0]
??:? dentist.common.alignments.base.FlatLocalAlignment[] dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda2!(std.algorithm.iteration.ChunkByChunkImpl!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda1, dentist.dazzler.LocalAlignmentReader).ChunkByChunkImpl).__lambda2(std.algorithm.iteration.ChunkByChunkImpl!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda1, dentist.dazzler.LocalAlignmentReader).ChunkByChunkImpl) [0x55a59f8e932d]
??:? @property dentist.common.alignments.base.FlatLocalAlignment[] std.algorithm.iteration.MapResult!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda2, std.algorithm.iteration.ChunkByImpl!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda1, dentist.dazzler.LocalAlignmentReader).ChunkByImpl).MapResult.front() [0x55a59f8e94c8]
??:? @property dentist.util.region.Region!(ulong, ulong, "contigId", 0uL).Region.TaggedInterval[] std.algorithm.iteration.MapResult!(dentist.commands.propagateMask.MaskPropagator.run().__lambda1, std.algorithm.iteration.MapResult!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda2, std.algorithm.iteration.ChunkByImpl!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda1, dentist.dazzler.LocalAlignmentReader).ChunkByImpl).MapResult).MapResult.front() [0x55a59f8e97be]
??:? dentist.util.region.Region!(ulong, ulong, "contigId", 0uL).Region.TaggedInterval[][] std.algorithm.mutation.copy!(std.algorithm.iteration.MapResult!(dentist.commands.propagateMask.MaskPropagator.run().__lambda1, std.algorithm.iteration.MapResult!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda2, std.algorithm.iteration.ChunkByImpl!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda1, dentist.dazzler.LocalAlignmentReader).ChunkByImpl).MapResult).MapResult, dentist.util.region.Region!(ulong, ulong, "contigId", 0uL).Region.TaggedInterval[][]).copy(std.algorithm.iteration.MapResult!(dentist.commands.propagateMask.MaskPropagator.run().__lambda1, std.algorithm.iteration.MapResult!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda2, std.algorithm.iteration.ChunkByImpl!(dentist.commands.propagateMask.MaskPropagator.getLocalAlignmentsByContig().__lambda1, dentist.dazzler.LocalAlignmentReader).ChunkByImpl).MapResult).MapResult, dentist.util.region.Region!(ulong, ulong, "contigId", 0uL).Region.TaggedInterval[][]) [0x55a59f7ae725]
??:? void dentist.commands.propagateMask.MaskPropagator.run() [0x55a59f8e78ec]
??:? dentist.commandline.ReturnCode dentist.commandline.runCommand!(3).runCommand(in immutable(char)[][]) [0x55a59f816cbf]
??:? dentist.commandline.ReturnCode dentist.commandline.run(in immutable(char)[][]) [0x55a59f7e2e98]
??:? _Dmain [0x55a59f673704]

Full log: propagate-mask-back-to-reference-block.dentist-self.257.log

FATAL: Unable to handle docker://aludi/dentist:v1.0.1 uri

Hey Ludi,

I hope you are ok.
I work at the Sanger in the Darwin Tree of Life and Gene suggested I try your tool to close a few assembly gaps.
When I run the test on the command line it finishes OK. But as soon as I submit it to LSF I get an error concerning the version of LAsort. Could you please have a look:

[Tue Apr 13 11:06:22 2021]
Error in rule mask_dust:
    jobid: 15
    output: workdir/.assembly-test.dust.anno, workdir/.assembly-test.dust.data
    shell:
        DBdust workdir/assembly-test
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: 179164 logs/cluster/mask_dust/dam=assembly-test/jobid15_fba32544-ef79-47b9-a8e5-0b95ca02bd59.out
Error executing rule mask_dust on cluster (jobid: 15, external: 179164 logs/cluster/mask_dust/dam=assembly-test/jobid15_fba32544-ef79-47b9-a8e5-0b95ca02bd59.out, jobscript: /lustre/scratch116/vr/projects/vgp/user/mu2/dentist/dentist-example2/dentist-example/.snakemake/tmp.4lrb2f3l/snakejob.mask_dust.15.sh). For error details see the cluster log and the log files of the involved rule(s).
[Tue Apr 13 11:06:22 2021]
Error in rule tandem_alignment_block:
    jobid: 18
    output: workdir/TAN.assembly-test.1.las
    log: logs/tandem-alignment.assembly-test.1.log (check log file(s) for error message)
    shell:
            {
                cd workdir
                datander '-T8' -s126 -l500 -e0.7 assembly-test.1
                LAcheck -v assembly-test TAN.assembly-test.1.las || { echo 'Check failed. Possible solutions:
Duplicate LAs: can be fixed by LAsort from 2020-03-22 or later.
In order to ignore checks entirely you may use the environment variable SKIP_LACHECK=1. Use only if you are positive the files are in fact OK!'; (( ${SKIP_LACHECK:-0} != 0 )); }
            } &> logs/tandem-alignment.assembly-test.1.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: 179165 logs/cluster/tandem_alignment_block/dam=assembly-test.block=1/jobid18_8a60e75f-99ad-4021-bb14-0559b3bd4dc0.out
Error executing rule tandem_alignment_block on cluster (jobid: 18, external: 179165 logs/cluster/tandem_alignment_block/dam=assembly-test.block=1/jobid18_8a60e75f-99ad-4021-bb14-0559b3bd4dc0.out, jobscript: /lustre/scratch116/vr/projects/vgp/user/mu2/dentist/dentist-example2/dentist-example/.snakemake/tmp.4lrb2f3l/snakejob.tandem_alignment_block.18.sh). For error details see the cluster log and the log files of the involved rule(s).

I tried exporting the variable but got the same error.
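One thing worth checking: SKIP_LACHECK is read by the Snakefile's shell commands, so it must be set in the environment where those commands actually run, i.e. on the compute node, not only in the submitting shell. A minimal sketch of the guard (the LSF environment-forwarding hint is an assumption; check your site's bsub documentation):

```shell
# Sketch: the Snakefile's check reduces to this arithmetic test, which only
# fires if SKIP_LACHECK is set in the shell that runs the rule.
export SKIP_LACHECK=1
if (( ${SKIP_LACHECK:-0} != 0 )); then
    echo "LAcheck failures will be ignored"
fi
# On a cluster, export the variable inside the job script itself, or ask the
# scheduler to forward the submitting environment to the job.
```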
Could you help me?
Thank you.
Marcela.

Rule `ref_vs_reads_alignment_block` keeps failing

I'd like this ticket to be reopened, please. The error is still there with Dentist 1.0.1

Error in rule ref_vs_reads_alignment_block:
    jobid: 977
    output: workdir/scaffolds_FINAL.non-hifi.1kb.128.las, workdir/non-hifi.1kb.128.scaffolds_FINAL.las
    log: logs/ref-vs-reads-alignment.128.log (check log file(s) for error message)
    shell:
        
            {
                cd workdir
                damapper -C '-T8' -e0.7 -mdust -mdentist-self -mtan scaffolds_FINAL non-hifi.1kb.128
                LAcheck -v scaffolds_FINAL non-hifi.1kb scaffolds_FINAL.non-hifi.1kb.128.las || { echo 'Check failed. Possible solutions:

Duplicate LAs: can be fixed by LAsort from 2020-03-22 or later.

In order to ignore checks entirely you may use the environment variable SKIP_LACHECK=1. Use only if you are positive the files are in fact OK!'; (( ${SKIP_LACHECK:-0} != 0 )); }
                LAcheck -v non-hifi.1kb scaffolds_FINAL non-hifi.1kb.128.scaffolds_FINAL.las || { echo 'Check failed. Possible solutions:

Duplicate LAs: can be fixed by LAsort from 2020-03-22 or later.

In order to ignore checks entirely you may use the environment variable SKIP_LACHECK=1. Use only if you are positive the files are in fact OK!'; (( ${SKIP_LACHECK:-0} != 0 )); }
            } &> logs/ref-vs-reads-alignment.128.log
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: 208494 logs/cluster/ref_vs_reads_alignment_block/block_reads=128/jobid977_e17a6776-b2a4-4570-aba0-d97bd422ba29.out

Error executing rule ref_vs_reads_alignment_block on cluster (jobid: 977, external: 208494 logs/cluster/ref_vs_reads_alignment_block/block_reads=128/jobid977_e17a6776-b2a4-4570-aba0-d97bd422ba29.out, jobscript: /lustre/scratch116/tol/teams/team308/users/mm49/tmp/non-hifi-reads2/.snakemake/tmp.4pilq3ef/snakejob.ref_vs_reads_alignment_block.977.sh). For error details see the cluster log and the log files of the involved rule(s).

Snakemake retries the jobs a few times, but they keep on failing for the same reason, and at some point snakemake gives up and quits.

The image is v1.0.1:

$ singularity inspect dentist_v1.0.1.sif 
org.label-schema.build-arch: amd64
org.label-schema.build-date: Thursday_22_April_2021_11:19:9_UTC
org.label-schema.schema-version: 1.0
org.label-schema.usage.singularity.deffile.bootstrap: docker
org.label-schema.usage.singularity.deffile.from: aludi/dentist:v1.0.1
org.label-schema.usage.singularity.version: 3.7.2

Originally posted by @muffato in #15 (comment)

DALIGNER version

Hi,

I'm getting a lot of LAcheck errors when I try running dentist (via singularity).

Your Snakefile reports the following message in the logs

Duplicate LAs: can be fixed by LAsort from 2020-03-22 or later.

But your dependencies section in the README has DALIGNER (=2020-01-15) i.e. git commit c2b47da6b3c94ed248a6be395c5b96a4e63b3f63 (which is used in your docker recipe)

Is dentist tied to DALIGNER version 2020-01-15 or can the bug-fixed version from 2020-03-22 be used?
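As a quick sanity check, one can print the commit date of the DALIGNER checkout that was built; the duplicate-LA fix is in commits from 2020-03-22 onwards, so HEAD should be at or after that date. A sketch, assuming a git clone in `./DALIGNER`:

```shell
# Sketch: show the commit hash and commit date of the DALIGNER build source.
# The path is an example; point it at your actual clone.
cd DALIGNER
git log -1 --format='%H %ci'
```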

no gap-filling in "scaffold" mode?

Hi again!

I have now run dentist successfully a few times and tested the different join policies, using as input the raw contig assembly or a scaffold assembly, scaffolded with LRscaf.

I get the best N50 running dentist with join-policy: scaffolds on the already scaffolded assembly (13.5 Mb). However, the final gap-closed.fasta contains almost as many Ns (186183) as the input assembly reference.fasta (186185).

When running with join-policy: contigs, neither the reference.fasta nor the gap-closed.fasta contain any Ns.

Is it intended that gaps are not closed in the join-policy:scaffold-mode? Do I have to run dentist a second time with join-policy: scaffoldGaps to actually close the gaps in the 13.5 Mb assembly?
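For reference, the `scaffoldGaps` policy is the one that restricts DENTIST to closing the existing gaps inside scaffolds. A minimal `dentist.json` sketch (only the relevant key shown; the other values are examples and should be kept from your current config):

```json
{
    "__default__": {
        "read-coverage": 25.0,
        "ploidy": 2,
        "join-policy": "scaffoldGaps"
    }
}
```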

As far as I can tell, all dentist runs finished without any errors.

Here are some stats of the different assemblies:

join-policy: scaffolds

file              format  type  num_seqs      sum_len  min_len      avg_len     max_len     Q1      Q2         Q3  sum_gap         N50  Q20(%)  Q30(%)
gap-closed.fasta  FASTA   DNA        117  340,348,369    1,843  2,908,960.4  28,227,288  9,395  77,974  2,243,295        0  13,556,940       0       0
reference.fasta   FASTA   DNA        122  340,337,486      940  2,789,651.5  28,227,288  8,609  89,797  2,243,295        0  12,859,076       0       0

join-policy: contigs

file              format  type  num_seqs      sum_len  min_len      avg_len     max_len     Q1      Q2         Q3  sum_gap         N50  Q20(%)  Q30(%)
gap-closed.fasta  FASTA   DNA        161  303,401,106    1,069  1,884,478.9  28,227,288  7,516  31,549    988,561        0  10,239,100       0       0
reference.fasta   FASTA   DNA        171  303,403,113      520  1,774,287.2  25,232,339  7,314  31,549  971,039.5        0   9,931,962       0       0
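The N counts above can be double-checked without seqkit; a small dependency-free sketch, assuming the FASTA file names from this run:

```shell
# Sketch: count N/n bases in a FASTA to see whether gaps were actually filled.
count_ns() {
    grep -v '^>' "$1" | tr -cd 'Nn' | wc -c
}
count_ns reference.fasta
count_ns gap-closed.fasta
```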

Damapper command fails

Hi,
thank you for dentist, it is an intriguing tool.

However, it is a bit annoying for me. I keep getting errors like the one below, without any useful information.

damapper: Command Failed:
              LAmerge  -a reference.reads.439 /cluster/work/users/olekto/tmp/damapper.81025/[email protected]

Or

LAmerge: Did not write all records to reference.reads.426 (8035)

damapper: Command Failed:
              LAmerge  -a reference.reads.426 /tmp/damapper.69457/[email protected]

I wondered if the tmp dir was too small, so I set it to a shared folder with multiple terabytes of space; I therefore doubt that is the issue anymore.

Some partitions go through fine, however, it is only a handful before one throws an error.

Is there a way to get more useful debugging information? So that I can know how the commands fail and can address that issue?
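Before digging into damapper itself, it may be worth ruling out resource limits in the environment where the merge actually runs; LAmerge writing fewer records than expected is consistent with a truncated intermediate file. A sketch:

```shell
# Sketch: check space, inodes and the open-file limit for the directory
# that damapper uses for temporary .las parts.
df -h "${TMPDIR:-/tmp}"   # free space
df -i "${TMPDIR:-/tmp}"   # free inodes
ulimit -n                 # open-file limit (many .las parts are open at once)
```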

Thank you.

Sincerely,
Ole

Syntax issue in snakemake

Hi,

Apologies that I'm pasting screenshots here; I'm unable to log in through my workstation.

The following are the versions that I'm using:
Snakemake: 3.13.3
dentist v1.0.0-beta.1 (commit 9e049a9)

Issue:

[screenshot of the error message]

The second shows where this occurs in the Snakefile:
[screenshot of the Snakefile location]

I'd appreciate any help regarding the same

Pacbio header line format error

Hi,
I am getting a pacbio fasta header format error and I was wondering what format it is looking for? Here is a link to the terminal output.

The pacbio fasta headers look like this: >pacbio_SRR6282347.1.1 1 length=6524

There is a second error message I am not sure about either. The log file shows a segmentation fault (core dumped):

/bin/bash: line 5: 208846 Segmentation fault      (core dumped) datander '-T70' -s126 -l500 -e0.7 Ajap_genome.2
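For the header error: fasta2DB expects PacBio-style headers of the form `>movie/well/start_end`, which SRA-style headers like the one above do not match. A hypothetical workaround is to rewrite the headers before building the database; the "fake" movie name, the running well number, and the file names below are all invented for illustration:

```shell
# Sketch: rewrite arbitrary FASTA headers into the PacBio shape accepted by
# fasta2DB. The start_end range is derived from each record's own length.
awk '
    function flush() { n++; print ">fake/" n "/0_" length(seq); print seq }
    /^>/ { if (seq != "") flush(); seq = ""; next }
         { seq = seq $0 }
    END  { if (seq != "") flush() }
' reads.fasta > reads.pacbio-headers.fasta
```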

Nanopore reads

Heya,

I was just wondering if this could be utilised on nanopore assemblies, and if so, what parameter I should specify for READ_TYPE in the snakemake.yml file.

Many thanks,
Ann

LAsort fails in daligner2.0

Hi Arne,

The LAsort and LAmerge calls seem to have a bit of a problem if I use them from the daligner that I installed from git. Notice the “.N@” that gets appended. The LAsort and LAmerge from conda seem to work fine, but the LAcheck from conda fails the “ref_vs_reads_alignment_block” rule because it tries to append “.las” to the second argument. The LAcheck from the daligner that I installed from git does work fine, though.

daligner2.0: Command Failed:
LAsort /mnt/shared_tmp/376067.1.work.q/daligner.7314/scaffolds_FINAL.1.scaffolds_FINAL.7.N@

So, the furthest that we’ve progressed is with local damapper and LAcheck. The other prerequisites are from conda.

Originally posted by @BradleyRan in #3 (comment)

Snakefile calls to LAcheck in shell sections fail

In Snakefile, some of the shell command sections use "cd workdir" but then include "workdir" in the path arguments to LAcheck.

example output:
{
cd workdir
datander -l500 -e0.980100 '-T1' scaffolds_FINAL
if (( ${SKIP_LACHECK:-0} == 0 ))
then
LAcheck -v scaffolds_FINAL workdir/TAN.scaffolds_FINAL.las || echo Try setting the environment variable SKIP_LACHECK=1 if the error is Duplicate LAs; otherwise rerun this block
else
echo "Skipping LAcheck due to user request"
fi
} &> logs/tandem-alignment.log
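A corrected version of that shell section would drop the `workdir/` prefix once the shell has changed into `workdir`; a sketch (the LAcheck guard is omitted for brevity):

```shell
# Sketch of the fixed rule body: after `cd workdir`, LAcheck's arguments must
# be relative. The redirection on the brace group is set up before `cd` runs,
# so logs/ still resolves from the starting directory.
{
    cd workdir
    datander -l500 -e0.980100 '-T1' scaffolds_FINAL
    LAcheck -v scaffolds_FINAL TAN.scaffolds_FINAL.las
} &> logs/tandem-alignment.log
```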

Could not create pipe to check startup of child (Too many open files)

Hi,

I am using ONT data and I got the error in the title from the pileupCollector (in collect.log).
I tried to increase the number of tolerated open files with ulimit -n 100000 and I set batch_size: 100 in the snakemake.yml file, but this did not solve the problem. Any advice on how to solve this?
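Note that `ulimit` only affects the shell it is run in, so on a cluster it has to be raised inside the job script that launches the collect step, not on the login node. A sketch:

```shell
# Sketch: raise the soft open-file limit up to the hard limit for this shell
# session. This works without root, but it must happen in the same shell
# (or job script) that subsequently starts dentist.
soft=$(ulimit -Sn); hard=$(ulimit -Hn)
echo "soft=$soft hard=$hard"
if [ "$hard" != "unlimited" ]; then
    ulimit -Sn "$hard"
fi
ulimit -n
```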

Thanks a lot!

Dentist parameters

Hi,
I managed to run the pipeline but now I am having a closer look at the options.
My goal is to be conservative: I am happy to fill gaps when there is strong evidence for it, but I don't want to mess up the assembly because it is pretty good overall, I think.

From what I gathered from the GitHub page, --read-coverage combined with the ploidy is one very important option. Do you have any recommendation on how to obtain the coverage value? Are there recommended tools for this?
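One simple way to get the read-coverage value is total read bases divided by assembly size; a dependency-free sketch for an uncompressed FASTQ (pipe through `zcat` for `.gz`; the file names are examples):

```shell
# Sketch: estimate read coverage as total read bases / genome size.
read_bases=$(awk 'NR % 4 == 2 { b += length($0) } END { print b }' reads.fastq)
genome_size=$(grep -v '^>' assembly.fasta | tr -d '\n' | wc -c)
awk -v b="$read_bases" -v g="$genome_size" 'BEGIN { printf "%.1fx\n", b / g }'
```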

I am thinking I should not play much with the other parameters as I have CLR data.

Any advice on this ?

Best
Aurélien

Broken symlinks and/or missing files from data test set

It seems that there are files missing from the example dataset at https://github.com/a-ludi/dentist-example/releases/download/v1.0.2-2/dentist-example.tar.gz.

Although there are mentions of this in some of the other issue threads, I don't see an open issue thread that specifically for the missing files.

I downloaded your test data set today 9th June 2021:

wget https://github.com/a-ludi/dentist-example/releases/download/v1.0.2-2/dentist-example.tar.gz
tar -xzf dentist-example.tar.gz
cd dentist-example

However, several symbolic links point to a folder called ./dentist-example/ that is missing from the tarball:

~/dentist-example$ ll
total 111M
-rw-r--r-- 1 ubuntu djs217 2.3K Apr 28 04:40 checksum.md5
lrwxrwxrwx 1 ubuntu djs217   54 Apr 26 08:30 cluster.yml -> dentist-example/external/dentist/snakemake/cluster.yml
drwxrwxr-x 2 ubuntu djs217 4.0K Jun  9 13:39 data
-rw-r--r-- 1 ubuntu djs217 1.3K Feb 18 14:42 dentist.json
-rwxr-xr-x 1 ubuntu djs217 111M Apr 27 08:52 dentist_v1.0.2.sif
lrwxrwxrwx 1 ubuntu djs217   66 Apr 26 08:29 profile-slurm.drmaa.yml -> dentist-example/external/dentist/snakemake/profile-slurm.drmaa.yml
lrwxrwxrwx 1 ubuntu djs217   73 Apr 26 08:29 profile-slurm.submit-async.yml -> dentist-example/external/dentist/snakemake/profile-slurm.submit-async.yml
lrwxrwxrwx 1 ubuntu djs217   72 Apr 26 08:29 profile-slurm.submit-sync.yml -> dentist-example/external/dentist/snakemake/profile-slurm.submit-sync.yml
-rw-r--r-- 1 ubuntu djs217 4.5K Apr 27 08:54 README.md
lrwxrwxrwx 1 ubuntu djs217   52 Apr 26 08:29 Snakefile -> dentist-example/external/dentist/snakemake/Snakefile
-rw-r--r-- 1 ubuntu djs217 3.8K Apr 28 04:40 snakemake.yml
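Dangling symlinks like these can be listed mechanically; a sketch using GNU find:

```shell
# Sketch: list every symlink whose target does not exist (GNU find).
find . -xtype l
# The missing targets live in the dentist release itself, so re-pointing the
# links, or replacing them with copies of the corresponding files from an
# extracted dentist release, should unblock the example (exact paths depend
# on where that release was extracted).
```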

Many thanks,

David

`subprocess.run` got an unexpected keyword argument 'text'

I am trying to run DENTIST installed with conda on a Linux machine (mamba and snakemake also installed in the same env as DENTIST).
My working directory contains 5 files and 2 directories:

short_read_assembly_contig.fa
PacBio_reads.fastq.gz 
Snakefile
snakemake.yml 
dentist.yml
envs/
scripts/

I have not modified dentist.yml file and the only modifications to snakemake.yml I made are file names.

When I try to run it with the following command - snakemake --configfile=snakemake.yml --use-conda --cores=16, I get this error:

SyntaxError in line 930 of /ANALYSIS1/SRS_genome_asm/Snakefile:
Unexpected keyword container in rule definition (Snakefile, line 930)

What could be causing this?
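Both symptoms usually come down to versions: `subprocess.run`'s `text=` keyword needs Python >= 3.7, and the `container:` rule keyword needs a reasonably recent Snakemake release (a conda env may be pinning an older one). A quick check, as a sketch:

```shell
# Sketch: verify the interpreter and Snakemake versions in the active env.
python3 -c 'import sys; assert sys.version_info >= (3, 7), sys.version'
command -v snakemake >/dev/null && snakemake --version || echo "snakemake not on PATH"
```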

Questions about Input Reads

Hello, I have two questions regarding input reads:

1-Is it recommended to do error correction on the PB long reads before plugging them into dentist?

2-Is there a maximum coverage recommended for the input reads? Your example is 25x, so I was just wondering if there is an upper limit.

Thanks!

Getting started

Hi,

I am trying to use dentist for the first time but am having some trouble getting started. I am running dentist using singularity and have snakemake version 6.0.0 installed.

I downloaded the dentist.json and snakemake.yml files and edited them to include the relevant paths and also some options mentioned in the README (see below).

I first tried to validate the config files using the recommended command.

snakemake --configfile=snakemake.yml --use-singularity --cores=32 -f -- validate_dentist_config

INFO:    Convert SIF file to sandbox...
INFO:    Cleaning up image...
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 32
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1       validate_dentist_config
    1
Select jobs to execute...

[Mon Mar  8 15:52:21 2021]
localrule validate_dentist_config:
    input: dentist.json
    jobid: 0

INFO:    Using cached SIF image
INFO:    Convert SIF file to sandbox...
INFO:    Cleaning up image...
Job counts:
    count   jobs
    1       validate_dentist_config
    1
[Mon Mar  8 15:52:31 2021]
Finished job 0.
1 of 1 steps (100%) done
Complete log: /scratch/amackintosh/DENTIST_02/.snakemake/log/2021-03-08T155213.682050.snakemake.log

All seemed to work fine, so I then tried to run it.

snakemake --configfile=snakemake.yml --use-singularity --cores=32

INFO:    Convert SIF file to sandbox...
INFO:    Cleaning up image...
Building DAG of jobs...
MissingInputException in line 1091 of /scratch/amackintosh/DENTIST_02/Snakefile:
Missing input files for rule ref_vs_reads_alignment_block:
/scratch/amackintosh/DENTIST_02/.brenthis_ino.SP_BI_364.v1_1.contigs.dentist-self.data
/scratch/amackintosh/DENTIST_02/.brenthis_ino.SP_BI_364.v1_1.contigs.dust.anno
/scratch/amackintosh/DENTIST_02/.brenthis_ino.SP_BI_364.v1_1.contigs.dentist-self.anno
/scratch/amackintosh/DENTIST_02/.brenthis_ino.SP_BI_364.v1_1.contigs.tan.data
/scratch/amackintosh/DENTIST_02/.brenthis_ino.SP_BI_364.v1_1.contigs.tan.anno
/scratch/amackintosh/DENTIST_02/.brenthis_ino.SP_BI_364.v1_1.contigs.dust.data

I am not used to using snakemake but I assume the missing input files are because a preceding process could not be executed. Is it possible that the problem lies in how I filled out the json and yaml files? The part of the json I edited the most looks like this (below); could any of these options be causing problems?

	"// This is a comment and will be ignored": [
	"You must set at least either `ploidy` and `read-coverage`",
	"or `max-coverage-reads` and `min-coverage-reads`."
	],
	"__default__": {
		"read-coverage": 66.9,
		"min-reads-per-pile-up": 3,
		"min-spanning-reads": 3,
		"join-policy": "contigs",
		"ploidy": 2,
		"max-coverage-self": 3,
		"verbose": 2,

Any help would be really appreciated.

Best,

Alex

md5checksum shows example dataset analysis fails

Hi, I've been trying to use dentist on the provided example dataset, but a number of the md5 checksums fail after it finishes running, with no other errors that I can find.

I installed snakemake v6.0.0 and singularity v3.6.3 through conda and ran through the example dataset as follows:

wget https://bds.mpi-cbg.de/hillerlab/DENTIST/dentist-example.v1.0.1.tar.gz
tar -xzf ./dentist-example.v1.0.1.tar.gz
cd dentist-example

# run the workflow
SKIP_LACHECK=1 snakemake --configfile=snakemake.yaml --use-singularity --cores=4 

# validate the files
md5sum -c checksum.md5

but the checksum output was as follows:

gap-closed.fasta: FAILED
workdir/.assembly-test.bps: OK
workdir/.assembly-test.dentist-reads.anno: OK
workdir/.assembly-test.dentist-reads.data: OK
workdir/.assembly-test.dentist-self.anno: OK
workdir/.assembly-test.dentist-self.data: OK
workdir/.assembly-test.dust.anno: OK
workdir/.assembly-test.dust.data: OK
workdir/.assembly-test.hdr: OK
workdir/.assembly-test.idx: OK
workdir/.assembly-test.tan.anno: OK
workdir/.assembly-test.tan.data: OK
workdir/.gap-closed-preliminary.bps: FAILED
workdir/.gap-closed-preliminary.dentist-self.anno: FAILED
workdir/.gap-closed-preliminary.dentist-self.data: FAILED
workdir/.gap-closed-preliminary.dentist-weak-coverage.anno: FAILED
workdir/.gap-closed-preliminary.dentist-weak-coverage.data: FAILED
workdir/.gap-closed-preliminary.dust.anno: FAILED
workdir/.gap-closed-preliminary.dust.data: FAILED
workdir/.gap-closed-preliminary.hdr: OK
workdir/.gap-closed-preliminary.idx: FAILED
workdir/.gap-closed-preliminary.tan.anno: FAILED
workdir/.gap-closed-preliminary.tan.data: FAILED
workdir/.reads.bps: OK
workdir/.reads.idx: OK
workdir/assembly-test.assembly-test.las: OK
workdir/assembly-test.dam: OK
workdir/assembly-test.reads.las: OK
workdir/gap-closed-preliminary.dam: FAILED
workdir/gap-closed-preliminary.fasta: FAILED
workdir/gap-closed-preliminary.gap-closed-preliminary.las: FAILED
workdir/gap-closed-preliminary.reads.las: FAILED
workdir/reads.db: OK
md5sum: WARNING: 15 computed checksums did NOT match

any advice on how to get the example dataset running would be greatly appreciated,
Thanks,
Rishi

'Wildcards' object has no attribute 'memory'

Hello,

I have been attempting to get dentist to complete on a cluster managed by slurm. Dentist fails with this message:

WorkflowError in line 562 of /lustre/project/gbru/gbru_X/RawData/dentist/Snakefile:
'Wildcards' object has no attribute 'memory'
  File "/software/7/apps/snakemake/5.25.0/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 111, in run_jobs
  File "/software/7/apps/snakemake/5.25.0/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 920, in run

I am worried that dentist/snakemake is not "seeing" the cluster.yml file which lists memory for the workflow. How can I fix this so that I can run dentist to completion?

My stderr, script, config.yaml, cluster.yml and snakemake.yml files are attached.
stderr.txt
Dentist_Script.txt
config.yaml.txt
cluster.txt
snakemake.txt
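For context, the `{cluster.memory}` placeholder is only defined when the cluster config is actually handed to snakemake; with a bare `--cluster` invocation it never loads. A hypothetical profile fragment (key names follow snakemake's CLI options; the sbatch flags and values are examples for your site):

```yaml
# hypothetical profile config (e.g. profile/config.yaml)
cluster-config: "cluster.yml"
cluster: "sbatch --mem={cluster.memory} --cpus-per-task={threads}"
jobs: 100
```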

dentist collect v1.0.0-beta.3 requires DAScover from DASCRUBBER which failed to compile.

dentist collect v1.0.0-beta.3 requires DAScover from DASCRUBBER. DAScover.c failed to compile.

DAScover.c:119:27: error: ‘DB_CSS’ undeclared (first use in this function)
if ((Reads[j].flags & DB_CSS) == 0)

(snakemake) [randy.bradley@ceres dentist-test]$ cat logs/collect.log
Error: missing external tools:

Check your PATH and/or install the required software.

Adding haplotigs back for gap filling?

Dear Arne,

Thank you again for your help with setting up dentist on our cluster. I have one more conceptual question before giving it a try which may be better asked in a separate thread:

I am trying to use dentist to fill gaps in a PacBio assembly after Hi-C scaffolding. The sequenced plant was highly heterozygous, and the initial assembly contained both contigs and haplotigs, which represent alternate haplotypes at heterozygous loci. I purged all the haplotigs before scaffolding, so the current assembly is a haploid representation, while the raw PacBio reads to be mapped for gap filling are obviously from the diploid genome.

I am thus wondering whether this could lead to incorrect gap filling if reads from both haplotypes map to the same assembly gap. Would it be best to add the haplotigs again for gap filling?

I assume that the haplotigs could be removed again after gap filling as long as the option to merge contigs is not enabled?

Best,
Roman

Missing Snakefile from example test dataset

I believe this is a distinct issue from #21 as it pertains to the Snakefile rather than the .yml files that you helpfully signposted. I am using the https://github.com/a-ludi/dentist-example/releases/download/v1.0.2-2/dentist-example.tar.gz example dataset.

When I execute
(base) ubuntu@bio-xanthomonas:~/dentist-example$ snakemake --configfile=snakemake.yml --use-singularity --cores=all
I get this error message:
Error: no Snakefile found, tried Snakefile, snakefile, workflow/Snakefile, workflow/snakefile.

So, I guessed the obvious thing to do was to download this Snakefile: https://github.com/a-ludi/dentist/blob/develop/snakemake/Snakefile.

However, when I try again to execute the above command line, with this Snakefile in the current directory, I get:

IndentationError in line 113 of <tokenize>:
unindent does not match any outer indentation level (<tokenize>, line 113)
 File "/usr/lib/python3.8/tokenize.py", line 512, in _tokenize

I suspect that this might be because I am not using the correct Snakefile?

Can gaps be filled with just extension reads?

Hello,

Can Dentist be configured to fill gaps with just extension reads (the purple and orange ones in fig) ? (e.g. by setting "min-spanning-reads" to 0)
[Supplementary Figure 1 from the DENTIST paper, showing spanning and extension reads]

Thanks,
Tim

No joined contigs with greedy configuration

Hi,

First, thanks for developing Dentist!

I have an issue very similar to @oushujun in #22 : basically I am trying to scaffold contigs with another assembly, and I am using the greedy configuration file for that.

After running Dentist, no contigs are joined, and the gap-closed.closed-gaps.bed file is empty. I ran the lost-gaps.py script, which gives the following report:

"In this run of DENTIST 4837 potentially closable gaps were not closed. More details:

Hint: use DBshow -n workdir/[REFERENCE].dam | cat -n to translate contig numbers to FASTA
coordinates.

  • lost 4 in collect phase
    • lost 0 gap(s) because of insufficient number of spanning reads (--min-spanning-reads=1)
    • lost 4 gap(s) because a scaffolding conflict was detected
      • conflicting gap closings: 1890-5809 (1 reads), 1890-28348 (1 reads)
      • conflicting gap closings: 2063-6912 (1 reads), 2063-15318 (1 reads)
      • conflicting gap closings: 5927-21838 (1 reads), 6196-21838 (1 reads)
      • conflicting gap closings: 4971-27615 (1 reads), 23980-27615 (1 reads)
  • lost 4833 in process phase
    • skipped 1389 read pile ups because of errors
      • consensus failed (1274 times)
      • other (115 times)
    • skipped 3444 read pile ups because of --only=spanning
  • lost 0 in output phase
    • skipped 0 insertion(s) because of --max-insertion-error=0.1
    • skipped 0 insertion(s) because of --join-policy=contigs
    • skipped 0 extension(s) because of --min-extension-length=100"

Looking into the process phase logs, I found 1273 errors reading "consensus alignment is invalid" and 230 errors "process DASqv returned with non-zero exit code 1: DASqv: Average coverage is too low (< 4X), cannot infer qv's\n".

Should I change some parameters in the configuration file?

Best regards
Yann

Error in rule validate_regions_block

Hi,
I used the latest version of Dentist to fill gaps in a genome using ONT reads. The genome size is about 300 Mb, and the draft assembly has 546 contigs. The ONT coverage is about 10X.
I used the following command:
snakemake --configfile=snakemake.yml --use-conda --cores=all
And got the following error message which I do not understand. Can anyone help? Thanks a lot in advance!

Error in rule validate_regions_block:
    jobid: 33
    input: workdir/gap-closed-preliminary.dam, workdir/.gap-closed-preliminary.bps, workdir/.gap-closed-preliminary.hdr, workdir/.gap-closed-preliminary.idx, workdir/reads.dam, workdir/.reads.bps, workdir/.reads.hdr, workdir/.reads.idx, workdir/.gap-closed-preliminary.closed-gaps.anno, workdir/.gap-closed-preliminary.closed-gaps.data, workdir/gap-closed-preliminary.1.reads.las
    output: workdir/validation-report.1.json, workdir/.gap-closed-preliminary.1.dentist-weak-coverage.anno, workdir/.gap-closed-preliminary.1.dentist-weak-coverage.data
    log: logs/validate-regions-block.1.log (check log file(s) for error details)
    conda-env: /storage/zhenyingLab/houyan/software_big/dentist.v4.0.0.x86_64/dentist-example/.snakemake/conda/b8ca5a6181a223c6ff65c49b8c435efe_
    shell:
        dentist validate-regions --config=dentist.yml --threads=1 --weak-coverage-mask=1.dentist-weak-coverage workdir/gap-closed-preliminary.dam workdir/reads.dam workdir/gap-closed-preliminary.1.reads.las closed-gaps > workdir/validation-report.1.json 2> logs/validate-regions-block.1.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

And I checked the file logs/validate-regions-block.1.log

{"executableVersion":"v4.0.0","gitCommit":"4cc3373284c0cbb2718a8d88d4db615b5f3c49d2","refDb":"workdir/gap-closed-preliminary.dam","readsDb":"workdir/reads.dam","readsAlignmentFile":"workdir/gap-closed-preliminary.1.reads.las","regions":"closed-gaps","configFile":"dentist.yml","help":false,"minCoverageReads":2,"coverageBoundsReads":[2,4294967295],"minSpanningReads":3,"reportAll":false,"ploidy":2,"properAlignmentAllowance":100,"quiet":false,"readCoverage":10,"regionContext":1000,"revertOptionNames":[],"weakCoverageWindow":500,"tracePointDistance":100,"numThreads":1,"verbosity":2,"weakCoverageMask":"1.dentist-weak-coverage"}
{"thread":139971147566400,"logLevel":"diagnostic","timestamp":638084339232475094,"function":"dentist.commands.validateRegions.RegionsValidator.run","state":"enter"}
{"thread":139971147566400,"logLevel":"diagnostic","timestamp":638084339232475484,"function":"dentist.commands.validateRegions.RegionsValidator.readInputs","state":"enter"}
{"thread":139971147566400,"logLevel":"diagnostic","state":"pre","command":["DBshow","-n","workdir/gap-closed-preliminary.dam"],"timestamp":638084339232475708,"action":"execute","type":"command"}
{"thread":139971147566400,"logLevel":"diagnostic","state":"post","command":["DBshow","-n","workdir/gap-closed-preliminary.dam"],"output":[">NW_023496800.1\tscaffold-1 :: Contig 0[0,1966]",">NW_023496800.1\tscaffold-1 :: Contig 1[2020,12298]",">NW_023496800.1\tscaffold-1 :: Contig 2[12352,17346]",">NW_023496800.1\tscaffold-1 :: Contig 3[17400,20027]",">NW_023496800.1\tscaffold-1 :: Contig 4[33786,35797]",">NW_023496800.1\tscaffold-1 :: Contig 5[35819,37076]",">NW_023496800.1\tscaffold-1 :: Contig 6[43260,44464]",">NW_023496800.1\tscaffold-1 :: Contig 7[44848,47029]",">NW_023496800.1\tscaffold-1 :: Contig 8[58668,62672]",">NW_023496800.1\tscaffold-1 :: Contig 9[90095,91767]",">NW_023496800.1\tscaffold-1 :: Contig 10[92382,93419]",">NW_023496800.1\tscaffold-1 :: Contig 11[100951,102012]",">NW_023496800.1\tscaffold-1 :: Contig 12[116195,118594]",">NW_023496800.1\tscaffold-1 :: Contig 13[120683,122409]",">NW_023496800.1\tscaffold-1 :: Contig 14[126485,127839]",">NW_023496800.1\tscaffold-1 :: Contig 15[135782,136895]",">NW_023496800.1\tscaffold-1 :: Contig 16[137681,138707]",">NW_023496800.1\tscaffold-1 :: Contig 17[138751,140574]",">NW_023496800.1\tscaffold-1 :: Contig 18[144047,145291]",">NW_023496800.1\ts"],"exitStatus":0,"timestamp":638084339232987997,"action":"execute","type":"command"}
{"thread":139971147566400,"logLevel":"diagnostic","state":"pre","command":["DBdump","workdir/gap-closed-preliminary.dam"],"timestamp":638084339234412205,"action":"execute","type":"pipe"}
{"thread":139971147566400,"logLevel":"diagnostic","state":"pre","command":["DBdump","-h","workdir/gap-closed-preliminary.dam"],"timestamp":638084339234592626,"action":"execute","type":"pipe"}
{"thread":139971147566400,"logLevel":"diagnostic","state":"pre","command":["DBdump","workdir/reads.dam"],"timestamp":638084339238570890,"action":"execute","type":"pipe"}
{"thread":139971147566400,"logLevel":"diagnostic","state":"pre","command":["DBdump","-h","workdir/reads.dam"],"timestamp":638084339239372035,"action":"execute","type":"pipe"}
{"thread":139971147566400,"logLevel":"diagnostic","state":"pre","command":["DBdump","workdir/gap-closed-preliminary.dam"],"timestamp":638084339295504008,"action":"execute","type":"pipe"}
{"thread":139971147566400,"logLevel":"diagnostic","state":"pre","command":["DBdump","workdir/gap-closed-preliminary.dam"],"timestamp":638084339295554201,"action":"execute","type":"pipe"}
{"thread":139971147566400,"logLevel":"info","numRegions":0,"contigABounds":[10,29154],"numContigIds":0,"timestamp":638084339295636137,"numReadIds":0}
{"thread":139971147566400,"logLevel":"diagnostic","timestamp":638084339295636589,"function":"dentist.commands.validateRegions.RegionsValidator.readInputs","state":"exit","timeElapsed":63160979}
{"thread":139971147566400,"logLevel":"diagnostic","timestamp":638084339295636834,"function":"dentist.commands.validateRegions.RegionsValidator.validateRegions","state":"enter"}
{"thread":139971147566400,"logLevel":"diagnostic","timestamp":638084339295639196,"function":"dentist.commands.validateRegions.RegionsValidator.validateRegions","state":"exit","timeElapsed":2282}
{"thread":139971147566400,"logLevel":"diagnostic","timestamp":638084339295639563,"function":"dentist.commands.validateRegions.RegionsValidator.run","state":"exit","timeElapsed":63164067}
Error: object.Exception@/home/alu/.local/share/mambaforge/envs/dentist/conda-bld/dentist-core_1663094479349/work/.dlang/dmd-2.100.2/linux/bin64/../../src/phobos/std/parallelism.d(1636): workUnitSize must be > 0.
----------------
??:? pure @safe noreturn std.exception.bailOut!(Exception).bailOut(immutable(char)[], ulong, scope const(char)[]) [0x12c8a82]
??:? pure @safe bool std.exception.enforce!().enforce!(bool).enforce(bool, lazy const(char)[], immutable(char)[], ulong) [0x12c89fc]
??:? pure @safe std.parallelism.ParallelForeach!(std.range.Zip!(dentist.util.region.Region!(ulong, ulong, "contigId", 0uL).Region.TaggedInterval[], dentist.util.region.Region!(ulong, ulong, "contigId", 0uL).Region.TaggedInterval[], uint[2][], uint[][]).Zip).ParallelForeach std.parallelism.TaskPool.parallel!(std.range.Zip!(dentist.util.region.Region!(ulong, ulong, "contigId", 0uL).Region.TaggedInterval[], dentist.util.region.Region!(ulong, ulong, "contigId", 0uL).Region.TaggedInterval[], uint[2][], uint[][]).Zip).parallel(std.range.Zip!(dentist.util.region.Region!(ulong, ulong, "contigId", 0uL).Region.TaggedInterval[], dentist.util.region.Region!(ulong, ulong, "contigId", 0uL).Region.TaggedInterval[], uint[2][], uint[][]).Zip, ulong) [0x1065468]
??:? pure @safe std.parallelism.ParallelForeach!(std.range.Zip!(dentist.util.region.Region!(ulong, ulong, "contigId", 0uL).Region.TaggedInterval[], dentist.util.region.Region!(ulong, ulong, "contigId", 0uL).Region.TaggedInterval[], uint[2][], uint[][]).Zip).ParallelForeach std.parallelism.TaskPool.parallel!(std.range.Zip!(dentist.util.region.Region!(ulong, ulong, "contigId", 0uL).Region.TaggedInterval[], dentist.util.region.Region!(ulong, ulong, "contigId", 0uL).Region.TaggedInterval[], uint[2][], uint[][]).Zip).parallel(std.range.Zip!(dentist.util.region.Region!(ulong, ulong, "contigId", 0uL).Region.TaggedInterval[], dentist.util.region.Region!(ulong, ulong, "contigId", 0uL).Region.TaggedInterval[], uint[2][], uint[][]).Zip) [0x1065417]
??:? @safe std.parallelism.ParallelForeach!(std.range.Zip!(dentist.util.region.Region!(ulong, ulong, "contigId", 0uL).Region.TaggedInterval[], dentist.util.region.Region!(ulong, ulong, "contigId", 0uL).Region.TaggedInterval[], uint[2][], uint[][]).Zip).ParallelForeach std.parallelism.parallel!(std.range.Zip!(dentist.util.region.Region!(ulong, ulong, "contigId", 0uL).Region.TaggedInterval[], dentist.util.region.Region!(ulong, ulong, "contigId", 0uL).Region.TaggedInterval[], uint[2][], uint[][]).Zip).parallel(std.range.Zip!(dentist.util.region.Region!(ulong, ulong, "contigId", 0uL).Region.TaggedInterval[], dentist.util.region.Region!(ulong, ulong, "contigId", 0uL).Region.TaggedInterval[], uint[2][], uint[][]).Zip) [0x1064dd7]
??:? void dentist.commands.validateRegions.RegionsValidator.validateRegions() [0x10584c8]
??:? void dentist.commands.validateRegions.RegionsValidator.run() [0x1057712]
??:? void dentist.commands.validateRegions.execute(in dentist.commandline.OptionsFor!(16).OptionsFor) [0x105768e]
??:? dentist.commandline.ReturnCode dentist.commandline.runCommand!(16).runCommand(in immutable(char)[][]) [0x10132ed]
??:? dentist.commandline.ReturnCode dentist.commandline.run(in immutable(char)[][]) [0xf1a5b9]
??:? _Dmain [0xd94f07]

Edit 2023-01-05: fixed formatting for better readability.

Setting dentist parameters

When setting the parameters below, do they need to be included in the dentist.json config file? And if so, in which section?

--max-insertion-error
--min-anchor-length
--min-reads-per-pile-up
--min-spanning-reads
--allow-single-reads
--join-policy
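
For reference, a hypothetical sketch of where such options could go, assuming (as the dentist docs describe) that dentist.json maps command names to option objects, with a "__default__" section applying to every command; keys are the long option names without the leading dashes, and the values below are placeholders, not recommendations:

```json
{
    "__default__": {
        "max-insertion-error": 0.1,
        "min-anchor-length": 500,
        "min-reads-per-pile-up": 5,
        "min-spanning-reads": 3,
        "allow-single-reads": true,
        "join-policy": "scaffoldGaps"
    }
}
```

Boolean flags like --allow-single-reads are expressed as true/false; check `dentist <command> --help` for which commands actually accept each option.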

MissingOutputException error

Hi,
I am using HiFi data to fill the gaps in a scaffold assembly and get the error below. I already increased the latency wait from 5 s to 15 s.
I am running it from a non-singularity source.
MissingOutputException in line 984 of /scratchdata/shripathi/dentist.v2.0.0.x86_64/Snakefile:
Job Missing files after 15 seconds:
workdir/assembly.assembly.las
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Job id: 13 completed successfully, but some output files are missing. 13
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
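
As an aside, the wait time can be raised either on the command line (`snakemake ... --latency-wait=60`) or, for cluster runs, in the snakemake profile so it applies to every invocation. A sketch, assuming a profile at `~/.config/snakemake/<profile>/config.yaml`:

```yaml
# hypothetical profile entry: wait up to 60 s for output files to
# appear, which helps on slow or heavily cached network filesystems
latency-wait: 60
```

A value well above the filesystem's typical sync delay is cheap, since snakemake only waits when files are actually missing.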

Conda-install-error

When I run `snakemake --configfile=snakemake.yml --use-conda` or `snakemake --configfile=snakemake.yml --use-conda --profile=slurm`, it reports:

SyntaxError in line 920 of /public/home/GL_lixn/biosoft/dentist.v3.0.0.x86_64/Snakefile:
Unexpected keyword container in rule definition (Snakefile, line 920)

Missing half of output files from test run

Dear Arne,

I'm having some trouble getting dentist to run on our university cluster with the example dataset. I haven't used snakemake before, so please excuse if I missed something very basic. Here is what I've tried so far:

The symlinks to the .yml configuration files in the v.1.0.2-2 example folder appear broken on my system, and v.1.0.1 only has a drmaa config file. I thus took the ones from the pre-built binaries and adjusted them as follows:

  • Fixed some inconsistencies with the file extensions .yml / .yaml
  • Added -A PROJECT_NAME to the profile-slurm.submit-async.yml file (requirement on this cluster) and copied to $HOME/.config/snakemake/slurm/config.yaml
  • Changed partition from batch to the actual cluster partition name in the cluster.yml file
  • Changed file paths and increased max thread number to 32 in the snakemake.yml file
  • Changed read coverage and ploidy in the dentist.json file according to the one in v.1.0.1

Running the pipeline using snakemake v.6.4.0 and singularity v.3.7.1 finished in ~5.5 h without any obvious error in the log files. However, there is no output, and `md5sum -c checksum.md5` shows half of the files as missing:

gap-closed.fasta: FAILED open or read
workdir/.assembly-test.bps: OK
workdir/.assembly-test.dentist-reads.anno: OK
workdir/.assembly-test.dentist-reads.data: OK
workdir/.assembly-test.dentist-self.anno: OK
workdir/.assembly-test.dentist-self.data: OK
workdir/.assembly-test.dust.anno: OK
workdir/.assembly-test.dust.data: OK
workdir/.assembly-test.hdr: OK
workdir/.assembly-test.idx: OK
workdir/.assembly-test.tan.anno: OK
workdir/.assembly-test.tan.data: OK
md5sum: workdir/.gap-closed-preliminary.bps: No such file or directory
workdir/.gap-closed-preliminary.bps: FAILED open or read
md5sum: workdir/.gap-closed-preliminary.dentist-self.anno: No such file or directory
workdir/.gap-closed-preliminary.dentist-self.anno: FAILED open or read
md5sum: workdir/.gap-closed-preliminary.dentist-self.data: No such file or directory
workdir/.gap-closed-preliminary.dentist-self.data: FAILED open or read
md5sum: workdir/.gap-closed-preliminary.dentist-weak-coverage.anno: No such file or directory
workdir/.gap-closed-preliminary.dentist-weak-coverage.anno: FAILED open or read
md5sum: workdir/.gap-closed-preliminary.dentist-weak-coverage.data: No such file or directory
workdir/.gap-closed-preliminary.dentist-weak-coverage.data: FAILED open or read
md5sum: workdir/.gap-closed-preliminary.dust.anno: No such file or directory
workdir/.gap-closed-preliminary.dust.anno: FAILED open or read
md5sum: workdir/.gap-closed-preliminary.dust.data: No such file or directory
workdir/.gap-closed-preliminary.dust.data: FAILED open or read
md5sum: workdir/.gap-closed-preliminary.hdr: No such file or directory
workdir/.gap-closed-preliminary.hdr: FAILED open or read
md5sum: workdir/.gap-closed-preliminary.idx: No such file or directory
workdir/.gap-closed-preliminary.idx: FAILED open or read
md5sum: workdir/.gap-closed-preliminary.tan.anno: No such file or directory
workdir/.gap-closed-preliminary.tan.anno: FAILED open or read
md5sum: workdir/.gap-closed-preliminary.tan.data: No such file or directory
workdir/.gap-closed-preliminary.tan.data: FAILED open or read
workdir/.reads.bps: OK
workdir/.reads.idx: OK
workdir/assembly-test.assembly-test.las: OK
workdir/assembly-test.dam: OK
workdir/assembly-test.reads.las: OK
md5sum: workdir/gap-closed-preliminary.dam: No such file or directory
workdir/gap-closed-preliminary.dam: FAILED open or read
md5sum: workdir/gap-closed-preliminary.fasta: No such file or directory
workdir/gap-closed-preliminary.fasta: FAILED open or read
md5sum: workdir/gap-closed-preliminary.gap-closed-preliminary.las: No such file or directory
workdir/gap-closed-preliminary.gap-closed-preliminary.las: FAILED open or read
md5sum: workdir/gap-closed-preliminary.reads.las: No such file or directory
workdir/gap-closed-preliminary.reads.las: FAILED open or read
workdir/reads.db: OK
md5sum: WARNING: 16 listed files could not be read

Here is some system information:

Distributor ID:	Scientific
Description:	Scientific Linux release 7.9 (Nitrogen)
Release:	7.9
Codename:	Nitrogen

And here is the slurm submission script:


#SBATCH -J gorteria_dentist
#SBATCH -A PROJECT_NAME
#SBATCH --output=dentist_%A.out
#SBATCH --error=dentist_%A.err
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=6:00:00
#SBATCH --mail-type=ALL
#SBATCH -p PARTITION


. /etc/profile.d/modules.sh
module purge
module load rhel7/default-peta4
module load use.own
module load dentist
module load mambaforge
source activate /PATH/mambaforge/envs/snakemake

snakemake --configfile=snakemake.yaml --use-singularity --profile=slurm

Thank you very much for your help!
Roman

Singularity issue

Hi,

first time using singularity. Is this supposed to happen, or am I supposed to do something else?

$ singularity --debug shell docker://aludi/dentist:latest
DEBUG   [U=2001,P=20764]   persistentPreRun()            Singularity version: 3.6.3
DEBUG   [U=2001,P=20764]   persistentPreRun()            Parsing configuration file /ceph/users/dlaetsch/.conda/envs/singularity/etc/singularity/singularity.conf
DEBUG   [U=2001,P=20764]   handleConfDir()               /ceph/users/dlaetsch/.singularity already exists. Not creating.
DEBUG   [U=2001,P=20764]   getCacheParentDir()           environment variable SINGULARITY_CACHEDIR not set, using default image cache
DEBUG   [U=2001,P=20764]   parseURI()                    Parsing docker://aludi/dentist:latest into reference
FATAL   [U=2001,P=20764]   replaceURIWithImage()         Unable to handle docker://aludi/dentist:latest uri: failed to get checksum for docker://aludi/dentist:latest: Error reading manifest latest in docker.io/aludi/dentist: manifest unknown: manifest unknown

cheers,

dom

one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!

Hello! Can I ask for your help to troubleshoot? I keep getting this error at the self-alignment block; below is one instance of it. I have a 3 Gb genome that I'd like to gap-fill with 33x coverage of HiFi reads.

Error in rule self_alignment_block:
    jobid: 114
    input: dentist.yml, workdir/sp_buf_purged_scaffolded_chrlevel_draft1.dam, workdir/.sp_buf_purged_scaffolded_chrlevel_draft1.bps, workdir/.sp_buf_purged_scaffolded_chrlevel_draft1.hdr, workdir/.sp_buf_purged_scaffolded_chrlevel_draft1.idx, workdir/.assembly.sp_buf_purged_scaffolded_chrlevel_draft1, workdir/.sp_buf_purged_scaffolded_chrlevel_draft1.dust.anno, workdir/.sp_buf_purged_scaffolded_chrlevel_draft1.dust.data, workdir/.sp_buf_purged_scaffolded_chrlevel_draft1.tan.anno, workdir/.sp_buf_purged_scaffolded_chrlevel_draft1.tan.data
    output: workdir/sp_buf_purged_scaffolded_chrlevel_draft1.10.sp_buf_purged_scaffolded_chrlevel_draft1.12.las, workdir/sp_buf_purged_scaffolded_chrlevel_draft1.12.sp_buf_purged_scaffolded_chrlevel_draft1.10.las
    log: logs/self-alignment.sp_buf_purged_scaffolded_chrlevel_draft1.10.12.log (check log file(s) for error message)
    conda-env: path/to/folder/.snakemake/conda/e844e04141fb5a79087f06209dc3fe6c_
    shell:
        
            {
                cd workdir
                daligner -I '-T8' -l500 -e0.7 -mdust -mtan sp_buf_purged_scaffolded_chrlevel_draft1.10 sp_buf_purged_scaffolded_chrlevel_draft1.12
            } &> logs/self-alignment.sp_buf_purged_scaffolded_chrlevel_draft1.10.12.log
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: 15340668

Error executing rule self_alignment_block on cluster (jobid: 114, external: 15340668, jobscript: path/to/folder/.snakemake/tmp.2nk_qpv5/snakejob.self_alignment_block.114.sh). For error details see the cluster log and the log files of the involved rule(s).

Unfortunately, the log file has nothing at all besides this:

(base) [user@l01 logs]$ cat self-alignment.sp_buf_purged_scaffolded_chrlevel_draft1.log
  Merging 144 files totaling 212,461,282 records

Thank you very much!

Example fails with Colon expected after keyword global_conda. (Snakefile, line 103)

Trying to run the example on a fresh Ubuntu 22.04 install. I ran the following and get the following error:

# install Dentist
mamba create -n dentist -c a_ludi -c bioconda dentist-core
mamba activate dentist
mamba install -c conda-forge -c bioconda snakemake

#Get example data
wget https://github.com/a-ludi/dentist/releases/download/v4.0.0/dentist-example.tar.gz
tar -xzf dentist-example.tar.gz
cd dentist-example

# run the workflow
PATH="$PWD/bin:$PATH" snakemake --configfile=snakemake.yml --cores=all

I get the error:

Colon expected after keyword global_conda. (Snakefile, line 103)

Any suggestions?

Rerun after stop due to time limit

Hello,

I ran the snakemake pipeline on a slurm cluster in a single job using sbatch.
It stopped due to the time limit, and now when I try to rerun it, it just lists a bunch of jobs and stops again without a clear explanation.
Any idea what is going on?

Here is my script

snakemake --configfile=snakemake.yml --use-conda --cores=all --rerun-incomplete --unlock
snakemake --configfile=snakemake.yml --use-conda --cores=all --rerun-incomplete

File locking blocks indefinitely in `writePileUps`

Hi Arne,

[...] The job seems to be stuck in “dentist collect”. The file “workdir/pile-ups.db” was created but it is empty. The node that it is running on shows that 71G of memory is used and 52G available.

Here are the last entries in the “collect.log” file.

{"thread":140737354013504,"timestamp":637255135504311334,"numPileUps":260,"numAlignmentChains":3186}
{"thread":140737354013504,"timestamp":637255135504323016,"state":"exit","function":"dentist.commands.collectPileUps.PileUpCollector.buildPileUps","timeElapsed":25742682}
{"thread":140737354013504,"timestamp":637255135504332559,"state":"enter","function":"dentist.commands.collectPileUps.PileUpCollector.writePileUps"}

Do you have any ideas? Could there be a problem with the pipeline before it hit dentist collect? Watching it run is a thing of beauty!

Regards,

Randy

Originally posted by @BradleyRan in #3 (comment)
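
For context, a hang like this is consistent with an advisory file lock that is never granted — on some network filesystems an exclusive `flock` can block indefinitely. The following is a minimal, self-contained illustration of blocking vs. non-blocking lock acquisition; it is unrelated to dentist's own code and only demonstrates the mechanism (the file name is just a placeholder):

```python
import fcntl
import os
import tempfile

def try_lock(path):
    """Return an open, exclusively locked file object, or None if the lock is held."""
    f = open(path, "a")
    try:
        # LOCK_NB makes flock fail immediately instead of blocking forever
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return f
    except BlockingIOError:
        f.close()
        return None

path = os.path.join(tempfile.mkdtemp(), "pile-ups.db")
first = try_lock(path)   # acquires the exclusive lock
second = try_lock(path)  # without LOCK_NB this call would block; here it fails fast
```

A writer that uses a plain blocking `flock` (no `LOCK_NB`, no timeout) will simply sit in the lock call when the grant never arrives, which matches a process stuck right after entering `writePileUps`.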
