wgs_a2g_konigskobra

WGS workflow to generate results according to the ALL2Gether protocol

Snakefmt | License: MIT

💬 Introduction

  • built from modules
  • generates results for the A2G (ALL2Gether) protocol

โ— Dependencies

To run this workflow, the following tools need to be available:

python snakemake singularity

🎒 Preparations (not up to date)

Sample data

  1. Add all sample IDs to samples.tsv in the column sample.
  2. Add the information on all sequencing data to units.tsv. Each row represents one FASTQ file pair with the corresponding forward and reverse reads, together with the sample ID, run ID and lane number.
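A minimal sketch of how the two sheets could be written from the shell. The units.tsv header used here (sample, run, lane, fq1, fq2), the function name write_sheets, the sample IDs and the FASTQ paths are all hypothetical placeholders; check the workflow's own files for the columns it actually expects.

```shell
#!/usr/bin/env bash
# Sketch: write samples.tsv and units.tsv for two hypothetical samples.
# ASSUMPTION: the units.tsv header (sample, run, lane, fq1, fq2) is illustrative,
# not taken from the workflow; adapt it to the columns the workflow expects.
set -euo pipefail

write_sheets() {
  local outdir="$1"

  # samples.tsv: header "sample", then one sample ID per row
  printf 'sample\n' > "$outdir/samples.tsv"
  printf '%s\n' sampleA sampleB >> "$outdir/samples.tsv"

  # units.tsv: tab-separated, one row per forward/reverse FASTQ pair.
  # printf reuses the format string for each group of five arguments.
  printf 'sample\trun\tlane\tfq1\tfq2\n' > "$outdir/units.tsv"
  printf '%s\t%s\t%s\t%s\t%s\n' \
    sampleA run1 1 /data/sampleA_R1.fastq.gz /data/sampleA_R2.fastq.gz \
    sampleB run1 1 /data/sampleB_R1.fastq.gz /data/sampleB_R2.fastq.gz \
    >> "$outdir/units.tsv"
}
```

Running write_sheets . from the workflow directory would write both files there, with one units.tsv row per FASTQ pair.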

Reference data

  1. You need a reference .fasta file to map your reads to. For the different tools to work, you also need to prepare index files and a .dict file.
  • The required files for the human reference genome GRCh38 can be downloaded from Google Cloud, either manually in the browser or with gsutil on the command line:
gsutil cp gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta /path/to/download/dir/
  • If these resources are not available for your reference, you can generate them yourself:
bwa index /path/to/reference.fasta
samtools faidx /path/to/reference.fasta
gatk CreateSequenceDictionary -R /path/to/reference.fasta -O /path/to/reference.dict
  2. For the task BaseRecalibrator, we use a .vcf containing known indels. For GRCh38, this file is also available on Google Cloud.
  3. To generate a WGS metrics report, the task CollectWgsMetrics requires an intervals file, which is also available on Google Cloud.
  4. For parallel processing, the .bam file is split by chromosome according to a .txt file listing the chromosome IDs. Chromosomes that are not included in this file will be excluded downstream!
  5. Add the paths of the different files to config.yaml. The index files and the .dict file should be in the same directory as the reference .fasta.
  6. Make sure that the adapter sequences and Docker container versions are correct.
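The reference steps above could be wrapped in two small helpers: one that runs the indexing commands only when their outputs are missing, and one that checks that all files sit next to the FASTA as config.yaml expects. This is a sketch, not part of the workflow: the function names index_reference and check_reference are hypothetical, index_reference assumes bwa, samtools and gatk are on PATH, and the expected suffixes are the default outputs of the commands listed above.

```shell
#!/usr/bin/env bash
# Sketch: build missing reference index files and verify they sit next to the FASTA.
# ASSUMPTIONS: function names are hypothetical; bwa, samtools and gatk must be on
# PATH for index_reference. Suffixes are the default outputs of the commands above.
set -euo pipefail

# Run each indexing command only if its (marker) output file is missing.
# The .dict replaces the FASTA extension, matching the gatk command above.
index_reference() {
  local fasta="$1"
  [ -f "$fasta.bwt" ] || bwa index "$fasta"   # .bwt used as marker for the bwa files
  [ -f "$fasta.fai" ] || samtools faidx "$fasta"
  [ -f "${fasta%.*}.dict" ] || gatk CreateSequenceDictionary -R "$fasta" -O "${fasta%.*}.dict"
}

# Print every missing file to stderr; return non-zero if anything is missing.
check_reference() {
  local fasta="$1" missing=0 f
  for f in "$fasta".{amb,ann,bwt,pac,sa,fai} "${fasta%.*}.dict"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, index_reference /path/to/reference.fasta && check_reference /path/to/reference.fasta would index the reference and then report any file that is still missing.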
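The chromosome list could be derived from the .fai index created by samtools faidx. This sketch assumes the workflow reads one chromosome ID per line and that only the primary GRCh38 chromosomes (chr1 to chr22, chrX, chrY) should be processed; the function name write_chrom_list is hypothetical. Adjust the pattern if you also want chrM or other contigs, since anything left out is excluded downstream.

```shell
#!/usr/bin/env bash
# Sketch: derive the chromosome list file from the reference .fai index.
# ASSUMPTIONS: one chromosome ID per line is expected, and only the primary
# GRCh38 chromosomes are wanted. Contigs not written here (alts, decoys, chrM, ...)
# would be excluded downstream.
set -euo pipefail

write_chrom_list() {
  local fai="$1" out="$2"
  # Column 1 of a .fai file is the sequence name; keep the file's order.
  awk -F'\t' '$1 ~ /^chr([0-9]+|X|Y)$/ { print $1 }' "$fai" > "$out"
}
```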

✅ Testing

The repository contains a small test dataset in .tests/integration, which can be run as follows:

cd .tests/integration
snakemake -s ../../Snakefile -j1 --use-singularity

🚀 Usage

The workflow is designed for WGS data, i.e. very large datasets that require substantial compute resources. On HPC clusters, it is recommended to use a cluster profile and run something like:

snakemake -s /path/to/Snakefile --profile my-awesome-profile
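If you do not have a cluster profile yet, a minimal one is a directory containing a config.yaml whose keys mirror Snakemake's long command-line options. The sketch below assumes Snakemake 7 (which still provides --cluster and --use-singularity) and a SLURM cluster reached via sbatch; the function name make_profile, the job count and the sbatch arguments are placeholders to adapt to your site.

```shell
#!/usr/bin/env bash
# Sketch: create a minimal Snakemake cluster profile.
# ASSUMPTIONS: Snakemake 7.x (--cluster and --use-singularity exist) and a SLURM
# scheduler reachable via sbatch; function name, job count and flags are placeholders.
set -euo pipefail

make_profile() {
  local profile_dir="$1"
  mkdir -p "$profile_dir"
  # Each key corresponds to a long command-line option, e.g. jobs -> --jobs.
  # The quoted 'EOF' keeps {threads} literal so Snakemake can fill it in per job.
  cat > "$profile_dir/config.yaml" <<'EOF'
cluster: "sbatch --cpus-per-task={threads}"
jobs: 50
use-singularity: true
latency-wait: 60
rerun-incomplete: true
keep-going: true
EOF
}
```

Calling make_profile "$HOME/.config/snakemake/my-awesome-profile" places the profile where Snakemake looks up profiles by name, so the command above can then refer to it as --profile my-awesome-profile; alternatively, pass the profile directory path directly to --profile.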

๐Ÿง‘โ€โš–๏ธ Rule Graph

[Image: rule graph]

Contributors

  • marrip