ICGC ARGO DNA-Seq Processing Workflow

License: GNU Affero General Public License v3.0

Python 39.66% Nextflow 58.68% Shell 0.18% Dockerfile 1.48%
bioinformatics cancer-genomics nextflow dna-seq workflow

dna-seq-processing-wfs's Introduction

ICGC ARGO DNA Seq Processing Workflow

Introduction

This repository maintains the source code of the ICGC ARGO DNA Seq Processing Pipeline. The pipeline is written in the Nextflow workflow language using DSL2, with modules imported from other ICGC ARGO GitHub repositories that maintain the individual tools and modules.

Each Nextflow module (including its associated container image, which is registered in Quay.io) is strictly version controlled and released independently. To ensure reproducibility, the pipeline explicitly declares the specific version of each module it imports.

Major tasks performed in the pipeline

  • download input sequencing metadata/data from SONG/SCORE
  • preprocess input sequencing reads (in FASTQ or BAM) into lane-level (aka read-group-level) BAMs
  • collect CollectQualityYieldMetrics using Picard for each read group
  • perform BWA-MEM alignment against the GRCh38 reference genome in parallel for each lane BAM
  • merge the aligned lane BAMs and mark duplicates, producing coordinate-sorted CRAM/CRAI and duplicates_metrics
  • collect alignment QC metrics using samtools stats for the aligned sequences
  • collect CollectOxoGMetrics using GATK for the aligned sequences and calculate the OxoQ score
  • generate SONG metadata for the aligned sequences and upload them to SONG/SCORE
  • generate SONG metadata for all collected qc_metrics and upload them to SONG/SCORE
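The per-lane alignment followed by a single merge step is a scatter/gather pattern. The following is only a minimal Python sketch of that shape, not the pipeline's Nextflow code: `align_lane`, `merge_and_mark_duplicates`, `process_sample` and the file names are hypothetical placeholders that stand in for the real BWA-MEM and merge/markdup steps.

```python
from concurrent.futures import ThreadPoolExecutor


def align_lane(lane_bam):
    # Hypothetical stand-in for BWA-MEM alignment of one lane-level BAM.
    return f"{lane_bam}.aligned"


def merge_and_mark_duplicates(aligned_lanes):
    # Hypothetical stand-in for merging lanes and marking duplicates.
    return {"cram": "sample.cram", "inputs": sorted(aligned_lanes)}


def process_sample(lane_bams):
    # Scatter: align every lane independently, in parallel.
    with ThreadPoolExecutor() as pool:
        aligned = list(pool.map(align_lane, lane_bams))
    # Gather: a single merge step consumes all aligned lanes.
    return merge_and_mark_duplicates(aligned)


result = process_sample(["lane1.bam", "lane2.bam"])
```

In the actual pipeline this fan-out and fan-in is expressed through Nextflow processes and channels rather than a thread pool.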

Run the pipeline

To run the pipeline, first install Nextflow (version 20.01.0 or higher) by following its installation instructions.

Run version 1.9.1 of the pipeline:

nextflow run icgc-argo-workflows/dna-seq-processing-wfs -r 1.9.1 -params-file <your_params_file.json>

You may need to run nextflow pull icgc-argo-workflows/dna-seq-processing-wfs first if version 1.9.1 was released after the last time you ran the pipeline.

Please note that the SONG/SCORE services need to be available and you need an appropriate API token.
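When the run and pull commands above are launched from a script, it can help to assemble them as argument lists. The sketch below is one possible way to do that in Python; the helper names and the `example_param` key are hypothetical, and the real parameter names must come from the pipeline's own documentation.

```python
import json
import shlex
from pathlib import Path

REPO = "icgc-argo-workflows/dna-seq-processing-wfs"


def build_pull_command():
    # Refreshes the locally cached copy of the pipeline.
    return ["nextflow", "pull", REPO]


def build_run_command(version, params_file):
    # -r selects the pipeline revision, -params-file points to a JSON file.
    return ["nextflow", "run", REPO, "-r", version,
            "-params-file", str(params_file)]


def write_params_file(params, path):
    # Serialises a parameter dictionary to the JSON file passed to Nextflow.
    path = Path(path)
    path.write_text(json.dumps(params, indent=2))
    return path


# "example_param" is a placeholder, not a real pipeline parameter.
example_params = {"example_param": "value"}
run_cmd = build_run_command("1.9.1", "your_params_file.json")
run_cmd_text = shlex.join(run_cmd)
```

The lists can be passed to `subprocess.run(cmd, check=True)`; `run_cmd_text` is the equivalent shell line for logging.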

Testing

Automated Travis CI testing has been set up. However, tests that rely on SONG/SCORE are skipped when CI runs on a Travis server, where the SONG/SCORE services are not available. When running tests locally (where the SONG/SCORE services may be available), use the following commands from the root directory of this Git repository:

# perform all tests when SONG/SCORE is available
export api_token=<your_api_token>
pytest -v

# or perform tests that do not need SONG/SCORE
TRAVIS=true pytest -v
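The `TRAVIS=true` switch suggests that the SONG/SCORE-dependent tests check an environment variable before running. How this repository implements the check is not shown here, so the following is only a sketch of one common approach; `song_score_tests_enabled` is a hypothetical helper name.

```python
import os


def song_score_tests_enabled(environ=None):
    # Assumption: tests needing SONG/SCORE are disabled when TRAVIS is "true".
    env = os.environ if environ is None else environ
    return env.get("TRAVIS", "").lower() != "true"


# Example evaluations with explicit environments.
enabled_locally = song_score_tests_enabled({})
enabled_on_travis = song_score_tests_enabled({"TRAVIS": "true"})
```

With pytest, such a helper could guard a test through `pytest.mark.skipif(not song_score_tests_enabled(), reason="SONG/SCORE not available")`.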

dna-seq-processing-wfs's People

Contributors

andricdu, hknahal, junjun-zhang, lepsalex, lindaxiang

dna-seq-processing-wfs's Issues

parameterize container registry

Parameterizing the container registry will allow us to choose the registry at pipeline execution time instead of hardcoding it at development time.

This will need to be implemented at each process/module level.
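The idea in this issue can be illustrated outside Nextflow. The sketch below, written in Python purely for illustration, builds a container image reference from a registry supplied at run time and falls back to Quay.io; the function name, the `example-org/example-tool` repository and the tag are hypothetical and do not come from the pipeline.

```python
DEFAULT_REGISTRY = "quay.io"


def container_image(repository, tag, registry=DEFAULT_REGISTRY):
    # The registry is a parameter, so it can be overridden at execution time.
    registry = registry.rstrip("/")
    return f"{registry}/{repository}:{tag}"


# Default registry versus a registry chosen at run time.
default_ref = container_image("example-org/example-tool", "1.0.0")
custom_ref = container_image("example-org/example-tool", "1.0.0",
                             registry="registry.example.org/")
```

In the pipeline itself, each process or module would read the registry from a parameter in a comparable way, as the issue notes.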
