tnturnerlab / tortoise
Tortoise is the CPU workflow of the HAT tools (https://github.com/TNTurnerLab/HAT).
License: MIT License
In rule glnexus_dv, I receive the following error:
[GLnexus] [error] <path>/dv_out/<sample_id>.dv.cpu.gvcf.gz Exists: sample is currently being added; each input gVCF should have a unique sample name (header column #10) (UnnamedSample (<path>/dv_out/<sample_id>.dv.cpu.gvcf.gz))
[GLnexus] [error] Failed to bulk load into DB: Failure: One or more gVCF inputs failed validation or database loading; check log for details.
I believe this is because the files <path>/dv_out/<sample_id>.dv.cpu.gvcf.gz created by rule deepvariant have header lines of the form:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT UnnamedSample
rather than
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT <sample_id>
Manually re-headering the gVCFs and re-running from this rule seems to do the trick and allows the workflow to continue. Is there a parameter I am missing that inserts the appropriate sample names into the gVCFs, or could this be an error in how they are generated?
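For anyone hitting the same error, the manual re-headering described above can be sketched in Python roughly as follows. This is only an illustration: the function name `rename_single_sample` is made up for this example, it assumes a single-sample gVCF whose `#CHROM` line has exactly ten tab-separated columns, and it works on already-decompressed text lines.

```python
def rename_single_sample(lines, new_name):
    """Yield VCF lines, replacing the sample name in the #CHROM header line.

    Hypothetical helper for illustration; assumes a single-sample VCF/gVCF
    (ten tab-separated header columns, sample name in column #10).
    """
    for line in lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 10:
                raise ValueError(
                    "expected exactly 10 header columns, got %d" % len(fields)
                )
            fields[9] = new_name  # column #10 holds the sample name
            yield "\t".join(fields) + "\n"
        else:
            # Meta-information lines (##...) and records pass through unchanged.
            yield line


example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tUnnamedSample\n",
    "chr1\t1\t.\tA\t<*>\t0\t.\tEND=10\tGT\t0/0\n",
]
renamed = list(rename_single_sample(example, "sampleA"))
```

Writing the result back with Python's `gzip` module produces ordinary gzip rather than the BGZF blocks that tabix-style indexing expects, so the output would still need to go through `bgzip`. On real files, `bcftools reheader -s` is probably the more practical route.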
Hello!
I am running Tortoise on a PacBio long-read trio to call de novo variants. It ran for about 2.1 days, and I think it got very far. From what I can see, the haplotype calling finished, although I see a lot of rows like:
15:14:51.985 WARN DepthPerSampleHC - Annotation will not be calculated at position chr1:10616 and possibly subsequent; genotype for sample PBG_3484_0000009718 is not called
This is ultimately why I think it finished successfully:
09:45:47.868 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 63.23320811800001
09:45:47.868 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 27991.821773314
09:45:47.868 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 24901.88 sec
09:45:47.869 INFO HaplotypeCaller - Shutting down engine
[June 2, 2023 9:45:47 AM EDT] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 2,551.14 minutes.
Runtime.totalMemory()=10189012992
The biggest issue from my perspective, though, is this error, which appeared for each sample analyzed:
A USER ERROR has occurred: Contig chr1_KI270706v1_random not present in the sequence dictionary [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chrM]
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
Using GATK jar /hpc/packages/minerva-centos7/gatk/4.2.0.0/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx64g -jar /hpc/packages/minerva-centos7/gatk/4.2.0.0/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar HaplotypeCaller -R /sc/arion/projects/buxbaj01a/sloofl01/pac_bio/files/references/GRCh38.chrom.fasta -I /sc/arion/projects/buxbaj01a/sloofl01/pac_bio/results/Tortoise/052023/data_dir/PBG_3484_0000009726.GRCh38.bam -O /sc/arion/projects/buxbaj01a/sloofl01/pac_bio/results/Tortoise/052023/dnv_wf_cpu/gatk_out/PBG_3484_0000009726.gatk.cpu.g.vcf.gz --standard-min-confidence-threshold-for-calling 30 -ERC GVCF
[Fri Jun 2 09:45:47 2023]
Error in rule gatk:
jobid: 1
output: /sc/arion/projects/buxbaj01a/sloofl01/pac_bio/results/Tortoise/052023/dnv_wf_cpu/gatk_out/PBG_3484_0000009726.gatk.cpu.g.vcf.gz
shell:
export PATH=/opt/conda/bin:$PATH
mkdir -p /sc/arion/projects/buxbaj01a/sloofl01/pac_bio/results/Tortoise/052023/dnv_wf_cpu/gatk_out
gatk --java-options "-Xmx64g" HaplotypeCaller -R /sc/arion/projects/buxbaj01a/sloofl01/pac_bio/files/references/GRCh38.chrom.fasta -I /sc/arion/projects/buxbaj01a/sloofl01/pac_bio/results/Tortoise/052023/data_dir/PBG_3484_0000009726.GRCh38.bam -O /sc/arion/projects/buxbaj01a/sloofl01/pac_bio/results/Tortoise/052023/dnv_wf_cpu/gatk_out/PBG_3484_0000009726.gatk.cpu.g.vcf.gz --standard-min-confidence-threshold-for-calling 30 -ERC GVCF
touch /sc/arion/projects/buxbaj01a/sloofl01/pac_bio/results/Tortoise/052023/dnv_wf_cpu/gatk_out/PBG_3484_0000009726.gatk.cpu.g.vcf.gz.tbi
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Removing output files of failed job gatk since they might be corrupted:
/sc/arion/projects/buxbaj01a/sloofl01/pac_bio/results/Tortoise/052023/dnv_wf_cpu/gatk_out/PBG_3484_0000009726.gatk.cpu.g.vcf.gz
Is there a way I can get this to work without making new BAM files that exclude all of the contigs missing from the sequence dictionary? Thank you for checking and helping me debug!
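Before rebuilding any BAMs, it may help to confirm exactly which contigs appear in the BAM header but not in the reference. Below is a rough Python sketch, assuming the header text comes from something like `samtools view -H` and the reference contig names come from the `.fai` index next to the FASTA. The function names are hypothetical and not part of Tortoise or GATK.

```python
def sq_names(header_lines):
    """Return contig names (SN: tags) from the @SQ lines of a SAM/BAM header."""
    names = []
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def fai_names(fai_lines):
    """Return contig names from a FASTA index (.fai); the name is column 1."""
    return [line.split("\t", 1)[0].strip() for line in fai_lines if line.strip()]


def missing_from_reference(header_lines, fai_lines):
    """List BAM header contigs that the reference index does not contain."""
    reference = set(fai_names(fai_lines))
    return [name for name in sq_names(header_lines) if name not in reference]


# Tiny made-up inputs, just to show the shape of the data.
header = [
    "@HD\tVN:1.6\tSO:coordinate\n",
    "@SQ\tSN:chr1\tLN:248956422\n",
    "@SQ\tSN:chr1_KI270706v1_random\tLN:175055\n",
]
fai = ["chr1\t248956422\t6\t60\t61\n"]
missing = missing_from_reference(header, fai)
```

If everything reported as missing is an unplaced or alternate contig, restricting calling to the shared contigs with GATK's `-L` intervals argument might avoid the error without touching the BAMs, but that has not been verified for this workflow.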