This is the workflow for the TCGA/ICGC PanCancer Analysis of Whole Genomes (PCAWG) project that aligns whole genome sequences with BWA-Mem.
For more information about the project overall, see the PanCancer wiki space.
More detailed documentation about production use of this workflow can be found in the PanCancer-Info project, where we maintain our production documentation and SOPs.
You can also build a Docker image that has the workflow ready to run in it.
docker build -t pcawg-bwa-mem-workflow .
Run the workflow with Cromwell, passing the WDL file and its JSON inputs file:
java -jar cromwell-19.3.jar run pcawg-bwa-mem-workflow.wdl pcawg-bwa-mem-workflow.json
Some synthetic sample data is available:
- https://s3.amazonaws.com/oicr.workflow.bundles/released-bundles/synthetic_bam_for_GNOS_upload/hg19.chr22.5x.normal2.bam
- https://s3.amazonaws.com/oicr.workflow.bundles/released-bundles/synthetic_bam_for_GNOS_upload/hg19.chr22.5x.normal.bam
We use a specific reference based on GRCh37.
- http://s3.amazonaws.com/pan-cancer-data/pan-cancer-reference/genome.fa.gz
- http://s3.amazonaws.com/pan-cancer-data/pan-cancer-reference/genome.fa.gz.fai
- http://s3.amazonaws.com/pan-cancer-data/pan-cancer-reference/genome.fa.gz.64.amb
- http://s3.amazonaws.com/pan-cancer-data/pan-cancer-reference/genome.fa.gz.64.ann
- http://s3.amazonaws.com/pan-cancer-data/pan-cancer-reference/genome.fa.gz.64.bwt
- http://s3.amazonaws.com/pan-cancer-data/pan-cancer-reference/genome.fa.gz.64.pac
- http://s3.amazonaws.com/pan-cancer-data/pan-cancer-reference/genome.fa.gz.64.sa
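The sample BAMs and reference files listed above could be fetched with a short script. What follows is a minimal sketch, assuming `bash` and `curl` are available. The function names, the `--download` flag, and the `data` and `reference` directory names are illustrative choices, not part of the workflow. Without `--download` the script only prints the URLs it would fetch.

```shell
#!/usr/bin/env bash
# Sketch: list or fetch the synthetic BAMs and the GRCh37-based reference.
set -euo pipefail

SAMPLE_BASE="https://s3.amazonaws.com/oicr.workflow.bundles/released-bundles/synthetic_bam_for_GNOS_upload"
REF_BASE="http://s3.amazonaws.com/pan-cancer-data/pan-cancer-reference"

# Print the URLs of the two synthetic sample BAMs, one per line.
sample_urls() {
  local name
  for name in hg19.chr22.5x.normal2.bam hg19.chr22.5x.normal.bam; do
    printf '%s/%s\n' "$SAMPLE_BASE" "$name"
  done
}

# Print the URLs of the reference FASTA, its .fai index, and the five BWA index files.
reference_urls() {
  local suffix
  for suffix in "" .fai .64.amb .64.ann .64.bwt .64.pac .64.sa; do
    printf '%s/genome.fa.gz%s\n' "$REF_BASE" "$suffix"
  done
}

# Download every URL read from stdin into the directory given as $1.
# Files that already exist are skipped; a download is written to a
# temporary .part file first so an interrupted run leaves no truncated file
# under the final name.
fetch_into() {
  local dest_dir="$1" url out
  mkdir -p "$dest_dir"
  while read -r url; do
    out="$dest_dir/${url##*/}"
    if [ -e "$out" ]; then
      continue
    fi
    curl --fail --location --output "$out.part" "$url"
    mv "$out.part" "$out"
  done
}

# With --download, fetch everything; otherwise just list what would be fetched.
if [ "${1:-}" = "--download" ]; then
  sample_urls | fetch_into data          # hypothetical directory name
  reference_urls | fetch_into reference  # hypothetical directory name
else
  sample_urls
  reference_urls
fi
```

Saved under a name of your choosing (for example `fetch_inputs.sh`, a hypothetical filename), running it with no arguments prints the nine URLs, and running it with `--download` fetches them. The paths in the workflow's JSON inputs file would then need to point at wherever the files end up.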
- Brian O'Connor
- Junjun Zhang
- Adam Wright
- Keiran Raine: PCAP-Core and BWA-Mem workflow design
- Roshaan Tahir: Original BWA-Align workflow design
- Adam Struck: WDL implementation