truwl / deepvariant-wdl

This project forked from dnanexus-rnd/deepvariant-glnexus-wdl

WDL workflow for population variant calling using htsget, DeepVariant, and GLnexus

License: Apache License 2.0

DeepVariant+GLnexus workflows

These portable WDL workflows use DeepVariant to call variants from WGS read alignments, followed by GLnexus to merge the resulting Genome VCF (gVCF) files for several samples into a Project VCF (pVCF). The wdl/ directory has three nested workflows:

DeepVariant.wdl

Based on the DeepVariant docs, this sequential workflow generates a gVCF from a given BAM file and genomic range.

             +----------------------------------------------------------------------------+
             |                                                                            |
             |  DeepVariant.wdl                                                           |
             |                                                                            |
             |  +-----------------+    +-----------------+    +------------------------+  |
sample.bam   |  |                 |    |                 |    |                        |  |
 genome.fa ----->  make_examples  |---->  call_variants  |---->  postprocess_variants  |-----> gVCF
     range   |  |                 |    |                 |    |                        |  |
             |  +-----------------+    +--------^--------+    +------------------------+  |
             |                                  |                                         |
             |                                  |                                         |
             +----------------------------------|-----------------------------------------+
                                                |
                                       DeepVariant Model

make_examples and call_variants internally parallelize across the CPUs of the machine they run on. The tasks use the Docker image published by the DeepVariant team.
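
As a rough illustration of how the three stages chain together, the Python sketch below assembles one command line per stage and passes each stage's output file to the next. The flag names follow the upstream DeepVariant documentation and may differ between releases. The function name, file names, and model prefix are hypothetical, and the WDL tasks in this repository may invoke the binaries differently.

```python
import shlex


def deepvariant_commands(ref, bam, region, model_prefix, out_prefix):
    """Return the three stage command lines, in order, for one BAM and one range.

    Illustrative only: intermediate file names are made up for this sketch.
    """
    examples = f"{out_prefix}.examples.tfrecord.gz"
    gvcf_records = f"{out_prefix}.gvcf.tfrecord.gz"
    calls = f"{out_prefix}.calls.tfrecord.gz"

    make_examples = [
        "make_examples", "--mode", "calling",
        "--ref", ref, "--reads", bam, "--regions", region,
        "--examples", examples, "--gvcf", gvcf_records,
    ]
    call_variants = [
        "call_variants", "--examples", examples,
        "--checkpoint", model_prefix, "--outfile", calls,
    ]
    postprocess_variants = [
        "postprocess_variants", "--ref", ref, "--infile", calls,
        "--nonvariant_site_tfrecord_path", gvcf_records,
        "--outfile", f"{out_prefix}.vcf.gz",
        "--gvcf_outfile", f"{out_prefix}.g.vcf.gz",
    ]
    return [make_examples, call_variants, postprocess_variants]


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the shape of the commands.
    for cmd in deepvariant_commands(
        "genome.fa", "sample.bam", "17:41196312-41277500", "model.ckpt", "sample"
    ):
        print(shlex.join(cmd))
```

Each stage consumes the file written by the previous one, which is why the workflow is sequential within a single range.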

htsget_DeepVariant.wdl

To further parallelize WGS calling across several machines, this workflow scatters DeepVariant.wdl across several genomic ranges (typically full-length chromosomes). For each range, it fetches a BAM slice using the GA4GH htsget client in samtools 1.7+, given an htsget server endpoint and sample ID. Finally, it concatenates the per-range gVCFs into the complete sample gVCF.

             +--------------------------------------------------------------------------------+
             |                                                                                |
             |  htsget_DeepVariant.wdl                                                        |
             |                                                                                |
             |       +-----------------+    +-------------------+                             |
             |       |                 |    |                   |  range gVCF                 |
             |   +--->  htsget client  |---->  DeepVariant.wdl  |---+                         |
             |   |   |  (samtools)     |    |                   |   |                         |
             |   |   |                 |    +-------------------+   |                         |
sample ID    |   |   +-----------------+                            |  +-------------------+  |
             |   |                                                  +-->                   |  |
   ranges -------+---> ...                  ...                 ... --->  bcftools concat  +-----> sample gVCF
    (e.g.    |   |                                                  +-->                   |  |
     chr1    |   |   +-----------------+                            |  +-------------------+  |
     chr2    |   |   |                 |    +-------------------+   |                         |
     ...)    |   +--->  htsget client  |    |                   |   |                         |
             |       |  (samtools)     |---->  DeepVariant.wdl  |---+                         |
             |       |                 |    |                   |  range gVCF                 |
             |       +------------^----+    +-------------------+                             |
             |            |       |                                                           |
             |            |       |                                                           |
             +------------|-------|-----------------------------------------------------------+
                          |       |
               sample ID  |       |
                   range  |       |  range BAM
                          |       |
                     +----v------------+
                     |                 |
                     |  htsget server  |
                     |                 |
                     +-----------------+

By using htsget, the workflow scatters across the ranges without first having to download and slice up a monolithic BAM file.
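
To make the per-range fetch more concrete, the sketch below shows how a range string could map onto an htsget request URL. In the workflow itself this is delegated to samtools, so the code is purely illustrative. It assumes ranges use the samtools-style 1-based inclusive form `chrom:start-end`, and it converts them to the 0-based half-open `start`/`end` query parameters defined by the GA4GH htsget specification. The function names are hypothetical.

```python
from urllib.parse import urlencode


def parse_range(region):
    """Split 'chrom' or 'chrom:start-end' (1-based, inclusive) into parts."""
    chrom, sep, span = region.partition(":")
    if not sep:
        # Whole reference sequence, e.g. a full-length chromosome.
        return chrom, None, None
    start, _, end = span.partition("-")
    return chrom, int(start), int(end)


def htsget_url(endpoint, sample_id, region):
    """Build the request URL for one sample and one range."""
    chrom, start, end = parse_range(region)
    params = {"format": "BAM", "referenceName": chrom}
    if start is not None:
        params["start"] = start - 1  # htsget start is 0-based, inclusive
        params["end"] = end          # htsget end is 0-based, exclusive
    return f"{endpoint}/{sample_id}?{urlencode(params)}"


if __name__ == "__main__":
    print(htsget_url(
        "https://htsnexus.rnd.dnanex.us/v1/reads/BroadHiSeqX_b37",
        "NA12878",
        "12:112204691-112247789",
    ))
```

Under the htsget protocol the server answers such a request with a JSON ticket listing the URLs of the data blocks, which the client downloads and concatenates into the range BAM.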

htsget_DeepVariant_GLnexus.wdl

This workflow scatters htsget_DeepVariant.wdl across several samples to generate an array of gVCF files, then feeds these to GLnexus to merge them into a pVCF.

              +-----------------------------------------------------------+
              |                                                           |
              |  htsget_DeepVariant_GLnexus.wdl                           |
              |                                                           |
              |       +--------------------------+                        |
              |       |                          |   sample gVCF          |
              |   +--->  htsget_DeepVariant.wdl  |----+                   |
              |   |   |                          |    |                   |
              |   |   +--------------------------+    |    +-----------+  |
              |   |                                   +---->           |  |
sample IDs -------+---> ...                      ...  ----->  GLnexus  +----> project VCF
              |   |                                   +---->           |  |
              |   |   +--------------------------+    |    +-----------+  |
              |   |   |                          |    |                   |
              |   +--->  htsget_DeepVariant.wdl  |----+                   |
              |       |                          |   sample gVCF          |
              |       +--------------------------+                        |
              |                                                           |
              +-----------------------------------------------------------+
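
The nesting of the three workflows amounts to a two-level scatter followed by two gathers. The small Python sketch below enumerates the resulting work units for a given set of samples and ranges. It is a model of the structure only, not how a WDL engine schedules tasks, and the function and key names are hypothetical.

```python
def plan(samples, ranges):
    """Enumerate the work units of the nested scatter/gather."""
    return {
        # One htsget fetch plus one DeepVariant.wdl run per (sample, range) pair.
        "deepvariant_runs": [(s, r) for s in samples for r in ranges],
        # One bcftools concat per sample, gathering that sample's range gVCFs.
        "concats": list(samples),
        # One GLnexus merge over all sample gVCFs, producing the project VCF.
        "glnexus_merges": 1,
    }


if __name__ == "__main__":
    work = plan(["NA12878", "NA12891", "NA12892"], ["12", "17"])
    print(len(work["deepvariant_runs"]), "DeepVariant runs")
```

The number of DeepVariant runs grows as samples times ranges, while the gather steps grow only with the number of samples.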

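Here's an example inputs JSON providing everything required to launch this top-level workflow with dxWDL or Cromwell. Replace the two parenthesized placeholders with references to your own reference genome and DeepVariant model files: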

{
    "htsget_DeepVariant_GLnexus.accessions": ["NA12878","NA12891","NA12892"],
    "htsget_DeepVariant_GLnexus.htsget_endpoint": "https://htsnexus.rnd.dnanex.us/v1/reads/BroadHiSeqX_b37",
    "htsget_DeepVariant_GLnexus.ranges": ["12:112204691-112247789","17:41196312-41277500"],
    "htsget_DeepVariant_GLnexus.ref_fasta_gz": (REFERENCE GENOME FILE),
    "htsget_DeepVariant_GLnexus.model_tar": (DEEPVARIANT MODEL FILES),
    "htsget_DeepVariant_GLnexus.output_name": "b37_CEUtrio_ALDH2_BRCA1"
}
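
If the inputs are generated programmatically, a short script can guarantee well-formed JSON. The sketch below builds the same keys as the example above and serializes them with the standard `json` module. The two file values are hypothetical placeholders, since the form a file reference takes depends on the runner, and `build_inputs` is an illustrative helper, not part of this repository.

```python
import json

WORKFLOW = "htsget_DeepVariant_GLnexus"


def build_inputs(accessions, htsget_endpoint, ranges,
                 ref_fasta_gz, model_tar, output_name):
    """Return the inputs dict with every key prefixed by the workflow name."""
    values = {
        "accessions": list(accessions),
        "htsget_endpoint": htsget_endpoint,
        "ranges": list(ranges),
        "ref_fasta_gz": ref_fasta_gz,
        "model_tar": model_tar,
        "output_name": output_name,
    }
    return {f"{WORKFLOW}.{key}": value for key, value in values.items()}


if __name__ == "__main__":
    inputs = build_inputs(
        accessions=["NA12878", "NA12891", "NA12892"],
        htsget_endpoint="https://htsnexus.rnd.dnanex.us/v1/reads/BroadHiSeqX_b37",
        ranges=["12:112204691-112247789", "17:41196312-41277500"],
        ref_fasta_gz="reference.fa.gz",      # hypothetical file reference
        model_tar="deepvariant_model.tar",   # hypothetical file reference
        output_name="b37_CEUtrio_ALDH2_BRCA1",
    )
    print(json.dumps(inputs, indent=4))
```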
