
sherineawad / somaticmutations

This is a Snakemake pipeline to detect somatic mutations (GATK4 and Mutect2).

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%
somatic mutations snakemake mutect2 gatk4

Snakemake Workflow for somatic mutation calling

This is a GATK4/Mutect2 pipeline for somatic mutation calling.

Requirements

  • trim-galore=0.6.6
  • star=2.7.10a
  • picard=2.25.6
  • gatk4=4.2.0.0

You can run the pipeline in --use-conda mode to pull these tools automatically. See the Use Conda section below.

Edit config file

You will need to edit the config file to match your samples and parameters.

The pipeline expects paired-end samples to end with the suffixes ".r_1.fq.gz" and ".r_2.fq.gz", and single-end samples to end with ".fq.gz". The prefix before the suffix is the sample name, and that is what you write in "samples.tsv". For example, if your file is sample1.s_1.r_1.fq.gz, the sample name in the samples file should be sample1.s_1.

You also need to state in the config file whether your samples are paired-end or single-end: set the PAIRED entry to TRUE for paired-end reads and to FALSE otherwise. You can also change the name of the samples.tsv file in the config file.
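The naming rule above can be sketched in Python as follows. This is a minimal illustration, not code from the pipeline: the helpers `sample_name` and `sample_names` are hypothetical, and the sketch assumes the only recognised suffixes are the ones listed above.

```python
from pathlib import Path

# Suffixes described in this README (assumed to be the only ones recognised):
# paired-end files end in .r_1.fq.gz / .r_2.fq.gz, single-end files in .fq.gz.
PAIRED_SUFFIXES = (".r_1.fq.gz", ".r_2.fq.gz")
SINGLE_SUFFIX = ".fq.gz"


def sample_name(filename: str, paired: bool) -> str:
    """Strip the expected FASTQ suffix and return the name to put in samples.tsv."""
    name = Path(filename).name  # ignore any leading directories
    suffixes = PAIRED_SUFFIXES if paired else (SINGLE_SUFFIX,)
    for suffix in suffixes:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    raise ValueError(f"{name!r} does not end with any of {suffixes}")


def sample_names(filenames, paired: bool) -> list:
    """Collect unique sample names; the two mates of a paired-end sample collapse to one."""
    return sorted({sample_name(f, paired) for f in filenames})


if __name__ == "__main__":
    files = ["sample1.s_1.r_1.fq.gz", "sample1.s_1.r_2.fq.gz"]
    print(sample_names(files, paired=True))  # ['sample1.s_1']
```

Raising an error for unexpected file names, rather than silently skipping them, surfaces naming mistakes before the workflow starts.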

The samples.tsv file has the following format:

Tumors Normals
SLX-18967.UDP0126.HT3G5DMXX.s_1 SLX-18967.UDP0129.HT3G5DMXX.s_1
SLX-18967.UDP0146.HT3G5DMXX.s_1 SLX-18967.UDP0149.HT3G5DMXX.s_1
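A minimal sketch of reading this table in Python, assuming the columns are tab-separated (as the .tsv extension suggests) and the header row is exactly "Tumors" and "Normals". The function `read_pairs` is hypothetical; the pipeline's Snakefile may read the file differently.

```python
import csv
import io


def read_pairs(handle) -> list:
    """Return (tumor, normal) tuples from a samples.tsv-style table.

    Assumes a tab-separated file whose header row is "Tumors<TAB>Normals".
    """
    reader = csv.DictReader(handle, delimiter="\t")
    missing = {"Tumors", "Normals"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"samples table is missing column(s): {sorted(missing)}")
    return [(row["Tumors"].strip(), row["Normals"].strip()) for row in reader]


if __name__ == "__main__":
    example = (
        "Tumors\tNormals\n"
        "SLX-18967.UDP0126.HT3G5DMXX.s_1\tSLX-18967.UDP0129.HT3G5DMXX.s_1\n"
        "SLX-18967.UDP0146.HT3G5DMXX.s_1\tSLX-18967.UDP0149.HT3G5DMXX.s_1\n"
    )
    # io.StringIO stands in for open("samples.tsv", newline="")
    for tumor, normal in read_pairs(io.StringIO(example)):
        print(tumor, "vs", normal)
```

If your file separates columns with spaces instead of tabs, this sketch will report missing columns, which is a quick way to spot a mis-formatted samples file.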

In the GENOME, INDEX, GTF, ADAPTERS, and RG entries of the config file, set the names and locations of your genome, genome index, GTF annotation, adapters, and read groups, respectively. Likewise, set the DBSNP, INDELS, GOLD_STANDARD, and AFONLYGNOMAD entries to your dbSNP VCF, indels VCF, gold-standard VCF, and AF-only gnomAD VCF. If you are working with the human genome, these resources can be obtained from the Broad Institute Resource Bundle.
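Snakemake passes the parsed config file to the Snakefile as a `config` dictionary. A pre-flight check along these lines can catch unfilled entries before a long run. It is a hypothetical helper, not part of the pipeline, and it only checks that each entry named above (plus PAIRED) is present and non-empty, not that the referenced files exist.

```python
from typing import Any, Mapping

# Entries this README says must be filled in. The key that holds the
# samples.tsv name is not named in the README, so it is left out here.
REQUIRED_ENTRIES = (
    "GENOME", "INDEX", "GTF", "ADAPTERS", "RG",
    "DBSNP", "INDELS", "GOLD_STANDARD", "AFONLYGNOMAD",
    "PAIRED",
)


def missing_entries(config: Mapping[str, Any]) -> list:
    """Return required entries that are absent or empty.

    PAIRED may legitimately be False, so only None or "" counts as empty.
    """
    return [
        key for key in REQUIRED_ENTRIES
        if key not in config or config[key] is None or config[key] == ""
    ]


if __name__ == "__main__":
    # Hypothetical, partially filled config used only for illustration.
    config = {"GENOME": "genome.fa", "INDEX": "star_index/", "PAIRED": False}
    print(missing_entries(config))
```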

Update the interval list by editing the intervals.list file so that it lists only the chromosomes of interest. You can change the name of this file with the INTERVALS entry in the config file.
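GATK accepts a plain-text interval file with the .list extension containing one interval (for example, a chromosome name) per line. The hypothetical helper below writes such a file; the chromosome names are placeholders and must match the contig names in your reference genome.

```python
import tempfile
from pathlib import Path


def write_intervals(chromosomes, path="intervals.list") -> Path:
    """Write one chromosome name per line, skipping blank entries."""
    names = [c.strip() for c in chromosomes if c.strip()]
    if not names:
        raise ValueError("at least one chromosome is required")
    out = Path(path)
    out.write_text("\n".join(names) + "\n")
    return out


if __name__ == "__main__":
    # Hypothetical selection: autosomes plus X on a genome that uses "chr" prefixes.
    chroms = [f"chr{i}" for i in range(1, 23)] + ["chrX"]
    with tempfile.TemporaryDirectory() as tmp:
        out = write_intervals(chroms, Path(tmp) / "intervals.list")
        print(out.read_text().splitlines()[:3])  # ['chr1', 'chr2', 'chr3']
```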

The pipeline will automatically pull the biallelic gnomAD. You can change its name/location by editing the GNOMAD_BIALLELIC entry in the config file.

Run Snakemake pipeline

Once you have edited the config file to match your needs, run:

snakemake -jn 

where n is the number of cores. For example, for 10 cores use:

snakemake -j10 

For a dry run use:

snakemake -j1 -n 

To print the commands during a dry run, use:

snakemake -j1 -n -p 
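If you launch the pipeline from a Python script, the same invocations can be assembled as an argument list. This is a hypothetical convenience wrapper, not part of the repository; it only covers the -j, -n, and -p options shown above.

```python
import shlex


def snakemake_command(cores: int, dry_run: bool = False, print_shell: bool = False) -> list:
    """Build a snakemake argument list.

    -j<n> sets the number of cores, -n requests a dry run,
    and -p prints the shell commands.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    args = ["snakemake", f"-j{cores}"]
    if dry_run:
        args.append("-n")
    if print_shell:
        args.append("-p")
    return args


if __name__ == "__main__":
    print(shlex.join(snakemake_command(10)))            # snakemake -j10
    print(shlex.join(snakemake_command(1, True, True)))  # snakemake -j1 -n -p
```

To start the run, pass the list to `subprocess.run(args, check=True)` from the directory that holds the Snakefile. Keeping the arguments as a list avoids shell-quoting issues.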

Use Conda

For a smoother setup, use the following to pull the same dependency versions automatically:

snakemake -jn --use-conda

This pulls the same tool versions we used. Conda must be installed on your system.

For example, for 10 cores:

snakemake -j10 --use-conda

Dry run

For a dry run, use:

snakemake -j1 -n

To print the commands during a dry run, use:

snakemake -j1 -n -p

Keep going option

Use the following to keep going if a step fails, for example when one tool finds no variants:

snakemake -j1 --keep-going

Collect some stats

snakemake -j 10 --keep-going --stats run.stats

Citations

If you use this pipeline, please cite us as follows:

Sherine Awad. (2022). SherineAwad/SomaticMutations: v1.0.0 (v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.6202482

somaticmutations's Issues

Questions

Hi! Thank you for the workflow!

Do you have a flowchart diagram of the workflow steps?

I am processing massive files with limited disk space. Is it possible to delete intermediate files between the FASTQ and the final CRAM on the fly? I could go through the Snakefile and mark every output as temp, I guess, but this only works if there are no "untracked" output files, i.e. cases where a tool produces two output files but only one of them is included in the output directive ...
