
tnturnerlab / tortoise

Tortoise is the CPU workflow of the HAT tools (https://github.com/TNTurnerLab/HAT).

License: MIT License

Languages: Python 53.68%, WDL 29.77%, Dockerfile 16.55%
Topics: genetics, genomics, cpu, denovo

Contributors: jng2

tortoise's Issues

Tortoise stopped with error

Hello!

I am running Tortoise on a PacBio long-read trio to call de novo variants. It ran for about 2.1 days and, I think, made it very far. From what I can see, haplotype calling finished, although I see a lot of lines like:
15:14:51.985 WARN DepthPerSampleHC - Annotation will not be calculated at position chr1:10616 and possibly subsequent; genotype for sample PBG_3484_0000009718 is not called

This is why I think it ultimately finished successfully:

09:45:47.868 INFO  VectorLoglessPairHMM - Time spent in setup for JNI call : 63.23320811800001
09:45:47.868 INFO  PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 27991.821773314
09:45:47.868 INFO  SmithWatermanAligner - Total compute time in java Smith-Waterman : 24901.88 sec
09:45:47.869 INFO  HaplotypeCaller - Shutting down engine
[June 2, 2023 9:45:47 AM EDT] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 2,551.14 minutes.
Runtime.totalMemory()=10189012992

The biggest issue, however, is this error, which appeared for each sample analyzed:

A USER ERROR has occurred: Contig chr1_KI270706v1_random not present in the sequence dictionary [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chrM]

***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
Using GATK jar /hpc/packages/minerva-centos7/gatk/4.2.0.0/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx64g -jar /hpc/packages/minerva-centos7/gatk/4.2.0.0/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar HaplotypeCaller -R /sc/arion/projects/buxbaj01a/sloofl01/pac_bio/files/references/GRCh38.chrom.fasta -I /sc/arion/projects/buxbaj01a/sloofl01/pac_bio/results/Tortoise/052023/data_dir/PBG_3484_0000009726.GRCh38.bam -O /sc/arion/projects/buxbaj01a/sloofl01/pac_bio/results/Tortoise/052023/dnv_wf_cpu/gatk_out/PBG_3484_0000009726.gatk.cpu.g.vcf.gz --standard-min-confidence-threshold-for-calling 30 -ERC GVCF
[Fri Jun  2 09:45:47 2023]
Error in rule gatk:
    jobid: 1
    output: /sc/arion/projects/buxbaj01a/sloofl01/pac_bio/results/Tortoise/052023/dnv_wf_cpu/gatk_out/PBG_3484_0000009726.gatk.cpu.g.vcf.gz
    shell:

    export PATH=/opt/conda/bin:$PATH

    mkdir -p /sc/arion/projects/buxbaj01a/sloofl01/pac_bio/results/Tortoise/052023/dnv_wf_cpu/gatk_out

    gatk --java-options "-Xmx64g" HaplotypeCaller -R /sc/arion/projects/buxbaj01a/sloofl01/pac_bio/files/references/GRCh38.chrom.fasta -I /sc/arion/projects/buxbaj01a/sloofl01/pac_bio/results/Tortoise/052023/data_dir/PBG_3484_0000009726.GRCh38.bam -O /sc/arion/projects/buxbaj01a/sloofl01/pac_bio/results/Tortoise/052023/dnv_wf_cpu/gatk_out/PBG_3484_0000009726.gatk.cpu.g.vcf.gz --standard-min-confidence-threshold-for-calling 30 -ERC GVCF

    touch /sc/arion/projects/buxbaj01a/sloofl01/pac_bio/results/Tortoise/052023/dnv_wf_cpu/gatk_out/PBG_3484_0000009726.gatk.cpu.g.vcf.gz.tbi

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job gatk since they might be corrupted:
/sc/arion/projects/buxbaj01a/sloofl01/pac_bio/results/Tortoise/052023/dnv_wf_cpu/gatk_out/PBG_3484_0000009726.gatk.cpu.g.vcf.gz

Is there a way to get this to work without making new BAM files that drop all of the contigs not listed in the dictionary? Thank you for checking and for helping me debug!
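The error message indicates that the BAM headers declare contigs (such as chr1_KI270706v1_random) that the reference passed via `-R` lacks, since its dictionary lists only chr1 through chr22, chrX, chrY and chrM. Before choosing a fix, it can help to list every contig that differs. The sketch below is a hypothetical helper, not part of Tortoise or GATK. It assumes you have saved the BAM header as text (for example with `samtools view -H`) and have the Picard-style `.dict` file that GATK reads next to the FASTA (for this reference, presumably `GRCh38.chrom.dict`). Both files describe contigs with `@SQ` lines carrying an `SN:` field.

```python
import sys
from typing import Iterable, List


def sq_names(header_lines: Iterable[str]) -> List[str]:
    """Return contig names from the SN: field of @SQ lines, in header order.

    Works on a SAM header (e.g. `samtools view -H sample.bam`) and on a
    Picard-style sequence dictionary (.dict), since both use @SQ records.
    """
    names = []
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
                break
    return names


def missing_contigs(bam_header: Iterable[str], ref_dict: Iterable[str]) -> List[str]:
    """Contigs declared in the BAM header but absent from the reference dictionary."""
    ref = set(sq_names(ref_dict))
    return [name for name in sq_names(bam_header) if name not in ref]


# Runs only when given exactly two file arguments (hypothetical file names):
#   samtools view -H sample.bam > sample.header.sam
#   python check_contigs.py sample.header.sam GRCh38.chrom.dict
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as bam_fh, open(sys.argv[2]) as dict_fh:
        missing = missing_contigs(bam_fh, dict_fh)
    print("\n".join(missing))
    print(f"{len(missing)} BAM contig(s) absent from the reference dictionary", file=sys.stderr)
```

If the list turns out to contain only unplaced, unlocalized, or alt contigs, that would suggest the BAMs were aligned to a fuller GRCh38 build than the FASTA given to Tortoise. Whether pointing the workflow at the alignment FASTA instead is supported is not confirmed in this thread.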

Error in rule glnexus_dv when loading gvcf files into database

In rule glnexus_dv, I receive the following error:

[GLnexus] [error] <path>/dv_out/<sample_id>.dv.cpu.gvcf.gz Exists: sample is currently being added; each input gVCF should have a unique sample name (header column #10) (UnnamedSample (<path>/dv_out/<sample_id>.dv.cpu.gvcf.gz))
[GLnexus] [error] Failed to bulk load into DB: Failure: One or more gVCF inputs failed validation or database loading; check log for details.

I believe it is because the files <path>/dv_out/<sample_id>.dv.cpu.gvcf.gz created by rule deepvariant have header lines of the form:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT UnnamedSample

rather than

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT <sample_id>

Manually re-headering the gVCFs and re-running from this rule seems to fix it and lets the process continue. Is there a parameter I am missing that inserts the appropriate sample names into the gVCFs, or could this be an error in how they are generated?
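For the manual re-headering workaround, the change amounts to replacing column 10 of the tab-separated `#CHROM` line with the sample ID taken from the file name. A minimal Python sketch follows, assuming single-sample gVCFs named `<sample_id>.dv.cpu.gvcf.gz` as in the error above; the function names are hypothetical and not part of Tortoise. Note that it writes ordinary gzip rather than BGZF, so the output would need recompressing with `bgzip` before it can be indexed with `tabix`. `bcftools reheader --samples` is the usual tool for this job and is faster on large files.

```python
import gzip
import os

# File-name pattern of the rule deepvariant outputs, as reported in the issue.
SUFFIX = ".dv.cpu.gvcf.gz"


def sample_id_from_path(path: str) -> str:
    """Derive <sample_id> from a path ending in <sample_id>.dv.cpu.gvcf.gz."""
    name = os.path.basename(path)
    if not name.endswith(SUFFIX):
        raise ValueError(f"unexpected file name: {name}")
    return name[: -len(SUFFIX)]


def rename_sample(header_line: str, sample_id: str) -> str:
    """Replace the single sample column (column 10) of a #CHROM header line."""
    fields = header_line.rstrip("\n").split("\t")
    if fields[0] != "#CHROM" or len(fields) != 10:
        raise ValueError("expected a single-sample, tab-separated #CHROM line")
    fields[9] = sample_id
    return "\t".join(fields) + "\n"


def reheader(src: str, dst: str) -> None:
    """Copy src to dst (plain gzip), renaming the sample in the #CHROM line."""
    sample_id = sample_id_from_path(src)
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for line in fin:
            if line.startswith("#CHROM"):
                line = rename_sample(line, sample_id)
            fout.write(line)
```

For example, `reheader("<path>/dv_out/S1.dv.cpu.gvcf.gz", "<path>/dv_out/S1.renamed.gvcf.gz")` would write a copy whose sample column reads `S1` instead of `UnnamedSample`.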
