
Dockstore implementation of CGP core WGS analysis

License: GNU Affero General Public License v3.0

Languages: Shell 54.86%, Perl 25.73%, R 1.73%, Common Workflow Language 10.35%, Dockerfile 7.33%
Topics: dockstore, ngs, somatic, somatic-variants, snv, indels

dockstore-cgpwgs's Introduction

dockstore-cgpwgs

dockstore-cgpwgs provides a complete multi threaded WGS analysis for SNV, INDEL, SV and Copynumber variation with associated annotation of VCF files. This has been packaged specifically for use with the Dockstore.org framework.


Usage

This is intended to be run using the Dockstore.org framework under Docker, but it can also be executed as a normal Docker container (or imported into Singularity).

See the dockstore execution method here.

See the usage of the ds-cgpwgs.pl script for all parameters (or the CWL definition).

Required input files are:

  1. Tumour BAM file
  2. Normal BAM file
  3. Core reference archive (e.g. core_ref_GRCh37d5.tar.gz)
  4. WXS reference archive (e.g. SNV_INDEL_ref_GRCh37d5.tar.gz)
  5. WGS reference archive (e.g. CNV_SV_ref_GRCh37d5_brass6+.tar.gz)
  6. VAGrENT (annotation) reference archive (e.g. VAGrENT_ref_GRCh37d5_ensembl_75.tar.gz)
  7. Subclonal reference archive (e.g. SUBCL_ref_GRCh37d5.tar.gz)
  • Only needed if skipbb is false
  8. QC reference archive (e.g. qcGenotype_GRCh37d5.tar.gz)

Inputs 1 and 2 are expected to have been mapped using dockstore-cgpmap.
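As an illustration, the required inputs above map onto ds-cgpwgs.pl flags roughly as follows. This is a sketch only: the paths are placeholders, and the flag names are taken from the ds-cgpwgs.pl invocations shown in the issue reports further down this page; check the script's own usage output for the authoritative list. The snippet assembles and prints the command rather than executing it:

```shell
# Sketch: placeholder paths; flag names as seen in ds-cgpwgs.pl usage
# examples elsewhere on this page (-r/-a/-si/-cs/-sc/-qc, -t/-n plus indexes).
REF=/var/spool/ref
DATA=/var/spool/data
cmd="ds-cgpwgs.pl \
  -r $REF/core_ref_GRCh37d5.tar.gz \
  -a $REF/VAGrENT_ref_GRCh37d5_ensembl_75.tar.gz \
  -si $REF/SNV_INDEL_ref_GRCh37d5.tar.gz \
  -cs $REF/CNV_SV_ref_GRCh37d5_brass6+.tar.gz \
  -sc $REF/SUBCL_ref_GRCh37d5.tar.gz \
  -qc $REF/qcGenotype_GRCh37d5.tar.gz \
  -t $DATA/tumour.bam -tidx $DATA/tumour.bam.bai \
  -n $DATA/normal.bam -nidx $DATA/normal.bam.bai \
  -e 'NC_007605,hs37d5,GL%'"
echo "$cmd"  # print the assembled command instead of executing it (dry run)
```

The -sc archive can be dropped when running with skipbb, per the input list above.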

Please check the Wiki then raise an issue if you require additional information on how to generate your own reference files. Much of this information is available on the individual algorithm wiki pages (or the subsequently linked protocols papers).

Usable Cores

When running outside of a Docker container you can set the number of CPUs via:

  • export CPU=N
  • the -cores|-c option of ds-cgpwgs.pl

If neither is set, the available cores on the system are detected automatically.
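The fallback behaviour can be sketched as follows. This is an illustration only: nproc is a common Linux way to count available cores, and the wrapper's actual detection logic may differ.

```shell
# Respect an explicit CPU setting, otherwise fall back to counting cores.
# Mirrors the documented behaviour; the wrapper's exact logic may differ.
export CPU=8            # as with `export CPU=N` above
CPU="${CPU:-$(nproc)}"  # if CPU were unset, detect available cores instead
echo "$CPU"
```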

Other uses

Native docker

All of the tools installed as part of dockstore-cgpmap and dockstore-cgpwxs, as well as the packages above, are available for direct use.

See the docker guide in the wiki for more details.

Singularity

The resulting docker container has been tested with Singularity.

See the docker guide in the wiki for more details.

Verifying your deployment

The examples/ tree contains test json files populated with data that can be used to verify the tool.

Example data

The data linked in the 'examples' area is from the cell line COLO-829.

Diagram of internals

This diagram was generated based on v1.1.0; it does not describe any of the file provisioning handled by a Dockstore run.

Internal flow of docker image

Development environment

This project uses git pre-commit hooks. Please enable them to prevent inappropriately large files from being included. Any pull request found not to have adhered to this will be rejected, and the branch will need to be manually cleaned to keep the repo size down.

Activate the hooks with

git config core.hooksPath git-hooks

Release process

This project is maintained using HubFlow.

  1. Make appropriate changes
  2. Build image locally
  3. Run all example inputs and verify any changes are acceptable
  4. Bump version in Dockerfile and Dockstore.cwl
  5. Push changes
  6. Check state on Travis
  7. Generate the release (add notes to GitHub)
  8. Confirm that image has been built on quay.io
  9. Update the dockstore entry, see their docs.

LICENCE

Copyright (c) 2017-2019 Genome Research Ltd.

Author: Cancer Genome Project <[email protected]>

This file is part of dockstore-cgpwgs.

dockstore-cgpwgs is free software: you can redistribute it and/or modify it under
the terms of the GNU Affero General Public License as published by the Free
Software Foundation; either version 3 of the License, or (at your option) any
later version.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more
details.

You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.

1. The usage of a range of years within a copyright statement contained within
this distribution should be interpreted as being equivalent to a list of years
including the first and last year specified and all consecutive years between
them. For example, a copyright statement that reads ‘Copyright (c) 2005, 2007-
2009, 2011-2012’ should be interpreted as being identical to a statement that
reads ‘Copyright (c) 2005, 2007, 2008, 2009, 2011, 2012’ and a copyright
statement that reads ‘Copyright (c) 2005-2012’ should be interpreted as being
identical to a statement that reads ‘Copyright (c) 2005, 2006, 2007, 2008,
2009, 2010, 2011, 2012’.

dockstore-cgpwgs's People

Contributors: byb121, keiranmraine, mr-c, sdentro


dockstore-cgpwgs's Issues

ds-cgpwgs.pl -sp not carried over to cgpFlagCaVEMan.pl -s

dockstore-cgpwgs-2.1.0 falls over on mouse bam files mapped using dockstore-cgpmap-3.0.4 giving these errors:

No flagList found in flag.vcf.config.WGS.ini for section MUS MUSCULUS_WGS FLAGLIST. No flagging will be done. at /opt/wtsi-cgp/bin/cgpFlagCaVEMan.pl line 822.

No config found in flag.vcf.config.WGS.ini for section MUS MUSCULUS_WGS FLAGLIST at /opt/wtsi-cgp/bin/cgpFlagCaVEMan.pl line 829.

I am using mm10 reference bundle both for mapping and variant calling.

I am running like:

ds-cgpwgs.pl -sp mouse -as mm10

but as the BAM files have "SP:Mus musculus" in their @SQ header lines, this is carried over to:

cgpFlagCaVEMan.pl -s 'Mus musculus'

which fails, as the reference file flag.vcf.config.WGS.ini does not have the flagList or params for 'Mus musculus'.

The expected behaviour is that -sp from ds-cgpwgs.pl overrides SP: in the BAM headers, so the flagging would run like this:

cgpFlagCaVEMan.pl -s mouse

"Tar ... File changed as we read it error"

I am running a slightly modified version of the example script, which changes the exclude line to "NC_007605,hs37d5,GL%". However, it is producing various tar errors: "file changed as we read it".

My only thought so far is that this is related to the output being placed in the /tmp folder.

The command is launched from the /data/ardy/examplevariant2/ folder, using

sudo DOCKSTORE_ROOT=true /root/dockstore tool launch --local-entry /data/ardy/production/pipelines/dockstore-cgpwgs/cwls/cgpwgs.cwl --json /root/run.json

Source file

{
  "reference": {
    "path": "/data/ardy/testing/referencetesting/core_ref_GRCh37d5.tar.gz",
    "class": "File"
  },
  "annot": {
    "path": "/data/ardy/testing/referencetesting/VAGrENT_ref_GRCh37d5_ensembl_75.tar.gz",
    "class": "File"
  },
  "snv_indel": {
    "path": "/data/ardy/testing/referencetesting/SNV_INDEL_ref_GRCh37d5-fragment.tar.gz",
    "class": "File"
  },
  "cnv_sv": {
    "path": "/data/ardy/testing/referencetesting/CNV_SV_ref_GRCh37d5_brass6+.tar.gz",
    "class": "File"
  },
  "qcset": {
    "path": "/data/ardy/testing/referencetesting/qcGenotype_GRCh37d5.tar.gz",
    "class": "File"
  },
  "tumour": {
    "path": "/data/ardy/testing/referencetesting/COLO-829.bam",
    "class": "File"
  },
  "tumourIdx": {
    "path": "/data/ardy/testing/referencetesting/COLO-829.bam.bai",
    "class": "File"
  },
  "normal": {
    "path": "/data/ardy/testing/referencetesting/COLO-829-BL.bam",
    "class": "File"
  },
  "normalIdx": {
    "path": "/data/ardy/testing/referencetesting/COLO-829-BL.bam.bai",
    "class": "File"
  },
  "exclude": "NC_007605,hs37d5,GL%",
  "species": "human",
  "assembly": "GRCh37d5",
  "cavereads": 800000,
  "result_archive": {
    "path": "/tmp/result_WGS.tar.gz",
    "class": "File"
  },
  "timings": {
    "path": "/tmp/timings_WGS.tar.gz",
    "class": "File"
  },
  "global_time": {
    "path": "/tmp/global_WGS.time",
    "class": "File"
  },
  "run_params": {
    "path": "/tmp/params_WGS.params",
    "class": "File"
  }
}

log file

Setting up Parallel block 6
        [Parallel block 6] CaVEMan_annot added...
        [Parallel block 6] VerifyBam Tumour added...
Starting Parallel block 6: Tue Nov 26 06:52:34 UTC 2019
        Starting verify_MT
+ set +x
+ bash -c '/usr/bin/time -v verifyBamHomChk.pl -d 25  -o /CAlmQt/WGS_COLO-829/contamination  -b /CAlmQt/tmp/COLO-829.bam  -t 8  -a /CAlmQt/WGS_COLO-829_vs_COLO-829-BL/ascat/COLO-829.copynumber.caveman.csv  -j /CAlmQt/WGS_COLO-829/contamination/result.json  -s /CAlmQt/reference_files/verifyBamID_snps.vcf.gz >& /CAlmQt/timings/WGS_COLO-829_vs_COLO-829-BL.time.verify_MT ; echo '\''WRAPPER_EXIT: '\''$?'
+ set +x
+ bash -c '/usr/bin/time -v AnnotateVcf.pl -t -c /CAlmQt/reference_files/vagrent/vagrent.cache.gz  -i /CAlmQt/WGS_COLO-829_vs_COLO-829-BL/caveman/COLO-829_vs_COLO-829-BL.flagged.muts.vcf.gz  -o /CAlmQt/WGS_COLO-829_vs_COLO-829-BL/caveman/COLO-829_vs_COLO-829-BL.annot.muts.vcf >& /CAlmQt/timings/WGS_COLO-829_vs_COLO-829-BL.time.CaVEMan_annot ; echo '\''WRAPPER_EXIT: '\''$?'
        Starting CaVEMan_annot
Package results
tar: WGS_COLO-829_vs_COLO-829-BL/caveman/caveman.cfg.ini: file changed as we read it
tar: WGS_COLO-829_vs_COLO-829-BL/caveman/alg_bean: file changed as we read it
tar: WGS_COLO-829_vs_COLO-829-BL/caveman/cov_arr: file changed as we read it
tar: WGS_COLO-829_vs_COLO-829-BL/caveman/COLO-829_vs_COLO-829-BL.muts.ids.vcf.gz: file changed as we read it
tar: WGS_COLO-829_vs_COLO-829-BL/caveman/COLO-829_vs_COLO-829-BL.muts.ids.vcf.gz.tbi: file changed as we read it
tar: WGS_COLO-829_vs_COLO-829-BL/caveman/COLO-829_vs_COLO-829-BL.snps.ids.vcf.gz.tbi: file changed as we read it
tar: WGS_COLO-829_vs_COLO-829-BL/caveman/COLO-829_vs_COLO-829-BL.flagged.muts.vcf.gz.tbi: file changed as we read it
INFO [job cgpwgs] Max memory used: 3828853MiB
WARNING [job cgpwgs] completed permanentFail
WARNING Final process status is permanentFail
{
    "global_time": {
        "location": "file:///data/ardy/testing/examplevariant2/datastore/launcher-06add2a9-210a-46b7-89c4-f421258f62a2/outputs/WGS_COLO-829_vs_COLO-829-BL.time",
        "basename": "WGS_COLO-829_vs_COLO-829-BL.time",
        "class": "File",
        "checksum": "sha1$426535624c64e42a60444f8779a699162b946217",
        "size": 853,
        "path": "/data/ardy/testing/examplevariant2/datastore/launcher-06add2a9-210a-46b7-89c4-f421258f62a2/outputs/WGS_COLO-829_vs_COLO-829-BL.time"
    },
    "result_archive": {
        "location": "file:///data/ardy/testing/examplevariant2/datastore/launcher-06add2a9-210a-46b7-89c4-f421258f62a2/outputs/WGS_COLO-829_vs_COLO-829-BL.result.tar.gz",
        "basename": "WGS_COLO-829_vs_COLO-829-BL.result.tar.gz",
        "class": "File",
        "checksum": "sha1$8430a966e9873bb64f1a3b0f64f1d31987da9cf7",
        "size": 593483989,
        "path": "/data/ardy/testing/examplevariant2/datastore/launcher-06add2a9-210a-46b7-89c4-f421258f62a2/outputs/WGS_COLO-829_vs_COLO-829-BL.result.tar.gz"
    },
    "run_params": {
        "location": "file:///data/ardy/testing/examplevariant2/datastore/launcher-06add2a9-210a-46b7-89c4-f421258f62a2/outputs/run.params",
        "basename": "run.params",
        "class": "File",
        "checksum": "sha1$933f5c3ec98efee5229776044e0ab8f3620624ff",
        "size": 576,
        "path": "/data/ardy/testing/examplevariant2/datastore/launcher-06add2a9-210a-46b7-89c4-f421258f62a2/outputs/run.params"
    },
    "timings": {
        "location": "file:///data/ardy/testing/examplevariant2/datastore/launcher-06add2a9-210a-46b7-89c4-f421258f62a2/outputs/WGS_COLO-829_vs_COLO-829-BL.timings.tar.gz",
        "basename": "WGS_COLO-829_vs_COLO-829-BL.timings.tar.gz",
        "class": "File",
        "checksum": "sha1$a8ca5270ac2b3d901a3251e01f3d091fe8d622f7",
        "size": 3525,
        "path": "/data/ardy/testing/examplevariant2/datastore/launcher-06add2a9-210a-46b7-89c4-f421258f62a2/outputs/WGS_COLO-829_vs_COLO-829-BL.timings.tar.gz"
    }
}
org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)

Pre-release for fragment+readQC

Create feature/fragementQc to link to pre-release of PCAP-core

  • Link pre-release of dockstore-cgpwxs
  • link pre-releases of relevant algs (alleleCounter)
  • update any other deps of relevance
  • do #17
  • do #18
  • fix #19
  • fix #20
  • NC_007605 needs to be added to HiDepth.bed.gz files in ref pack (brass)
  • Create pre-release
  • Trigger Quay build
  • Link pre-release to dockstore
  • Test with standard dataset

Ascat error

Dear Colleagues

Please could you advise if you have encountered the following error that I paste below? If so, please advise.

Kind regards
Dominik

I am running the image through singularity as follows (WGS 40x/80x):
ds-cgpwgs.pl \
  -reference /var/spool/ref/core_ref_GRCh37d5.tar.gz \
  -annot /var/spool/ref/VAGrENT_ref_GRCh37d5_ensembl_75.tar.gz \
  -snv_indel /var/spool/ref/SNV_INDEL_ref_GRCh37d5.tar.gz \
  -cnv_sv /var/spool/ref/CNV_SV_ref_GRCh37d5_brass6+.tar.gz \
  -subcl /var/spool/ref/SUBCL_ref_GRCh37d5.tar.gz \
  -exclude NC_007605,hs37d5,GL% \
  -species Human \
  -assembly GRCH37D5 \
  -tumour /var/spool/data/tum/tum.bam \
  -normal /var/spool/data/norm/norm.bam \
  -tidx /var/spool/data/tum/tum.bam.bai \
  -nidx /var/spool/data/norm/norm.bam.bai \
  -cavereads 800000 \
  -cores 12 \
  -qcset /var/spool/ref/qcGenotype_GRCh37d5.tar.gz

Error in ASCAT log:
+ cut -f 1-3 /home/WGS_MNM00026_vs_MNM00025/ascat/tmpAscat/SnpGcCorrections.tsv
+ cd /home/WGS_MNM00026_vs_MNM00025/ascat/tmpAscat/ascat
+ /usr/bin/Rscript /opt/wtsi-cgp/lib/perl5/auto/share/module/Sanger-CGP-Ascat-Implement/ascat/runASCAT.R /opt/wtsi-cgp/lib/perl5/auto/share/module/Sanger-CGP-Ascat-Implement/ascat /home/WGS_MNM00026_vs_MNM00025/ascat/tmpAscat/SnpPositions.tsv /home/WGS_MNM00026_vs_MNM00025/ascat/tmpAscat/SnpGcCorrections.tsv MNM00026 MNM00026.count MNM00025 MNM00025.count XX 24 /home/WGS_MNM00026_vs_MNM00025/ascat/tmpAscat/ascat/MNM00026.Rdata
Error in apply(corr_tot, 1, function(x) sum(abs(x * length_tot))/sum(length_tot)) :
  dim(X) must have a positive length
Calls: ascat.GCcorrect -> apply
Execution halted
Command exited with non-zero status 1
445.93user 1.63system 7:28.04elapsed 99%CPU (0avgtext+0avgdata 2161004maxresident)k
15432inputs+440512outputs (33major+658054minor)pagefaults 0swaps

broken link in README

Link '(ftp://ftp.sanger.ac.uk/pub/cancer/dockstore/human/))' in README.md doesn't open due to an additional bracket.

CRAM input causes the workflow to consistently fail at the QC steps

Hi Keiran,
Thank you for sharing this sanger-variant-caller Docker image with the public; we very much appreciate your great work.
We tested caller version 2.1.0 (https://github.com/cancerit/dockstore-cgpwgs/releases/tag/2.1.0) with Docker using both BAM and CRAM inputs.

  • For BAM input, it works perfectly and generated all the results as expected. Thanks to your great work, it was running much faster and utilising the CPU very efficiently compared with previous versions.

  • For CRAM input, we found that the workflow consistently fails at the QC steps. Here is the part of the log information related to the errors:

Starting Parallel block 1: Fri Sep 20 14:32:51 UTC 2019
Starting geno
Starting CaVEMan_setup
+ set +x
+ bash -c '/usr/bin/time -v compareBamGenotypes.pl   -o /wQEnVt/WGS_HCC1147_vs_HCC1147_BL/genotyped   -nb /wQEnVt/tmp/HCC1147_BL.cram   -j /wQEnVt/WGS_HCC1147_vs_HCC1147_BL/genotyped/result.json   -tb /wQEnVt/tmp/HCC1147.cram   -s /wQEnVt/reference_files/general.tsv   -g /wQEnVt/reference_files/gender.tsv >& /wQEnVt/timings/WGS_HCC1147_vs_HCC1147_BL.time.geno ; echo '\''WRAPPER_EXIT: '\''$?'
+ set +x
        Starting cache_POP
+ bash -c '/usr/bin/time -v caveman.pl  -r /wQEnVt/reference_files/genome.fa.fai  -ig /wQEnVt/reference_files/caveman/HiDepth.tsv  -b /wQEnVt/reference_files/caveman/flagging  -ab /wQEnVt/reference_files/vagrent  -u /wQEnVt/reference_files/caveman  -s '\''human'\''  -sa GRCh38  -t 18  -st WGS  -tc /wQEnVt/tmp/tum.cn.bed  -nc /wQEnVt/tmp/norm.cn.bed  -td 5 -nd 2  -tb /wQEnVt/tmp/HCC1147.cram  -nb /wQEnVt/tmp/HCC1147_BL.cram  -c /wQEnVt/flag.vcf.config.WGS.ini  -f /wQEnVt/reference_files/caveman/flagging/flag.to.vcf.convert.ini  -e 800000  -o /wQEnVt/WGS_HCC1147_vs_HCC1147_BL/caveman  -x chrUn%,HLA%,%_alt,%_random,chrM,chrEBV,chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr22,chrX,chrY,chrMT,NC_007605,hs37d5,GL%  -p setup >& /wQEnVt/timings/WGS_HCC1147_vs_HCC1147_BL.time.CaVEMan_setup ; echo '\''WRAPPER_EXIT: '\''$?'
+ set +x
        Starting verify_WT
+ bash -c '/usr/bin/time -v seq_cache_populate.pl -root /wQEnVt/ref_cache /wQEnVt/reference_files/genome.fa >& /wQEnVt/timings/WGS_HCC1147_vs_HCC1147_BL.time.cache_POP ; echo '\''WRAPPER_EXIT: '\''$?'
+ set +x
+ bash -c '/usr/bin/time -v verifyBamHomChk.pl -d 25     -o /wQEnVt/WGS_HCC1147_BL/contamination     -b /wQEnVt/tmp/HCC1147_BL.cram     -t 18     -j /wQEnVt/WGS_HCC1147_BL/contamination/result.json     -s /wQEnVt/reference_files/verifyBamID_snps.vcf.gz >& /wQEnVt/timings/WGS_HCC1147_vs_HCC1147_BL.time.verify_WT ; echo '\''WRAPPER_EXIT: '\''$?'
ERRORS OCCURRED:
/wQEnVt/cache_POP.wrapper.log
/wQEnVt/geno.wrapper.log
/wQEnVt/verify_WT.wrapper.log

I looked into those three log files and found that geno.wrapper.log said WRAPPER_EXIT: 255.
So I disabled the QC steps by setting --skipqc; the workflow then got through and completed the other steps, and generated all results other than the QC-related ones.

We are not sure if this is a bug or if something is wrong with our settings. Any thoughts on what was going wrong here? Thank you!

Regards,
-Linda

Non-unique row names for ascat SNP positions

I get the following error during Sanger_CGP_Ascat_Implement_ascat.0

Error in row.names<-.data.frame(*tmp*, value = value) :
duplicate 'row.names' are not allowed
Calls: rownames<- -> row.names<- -> row.names<-.data.frame
In addition: Warning message:
non-unique values when setting 'row.names': '10000159076', '10001443180', '10001464892', '10001580662', '10002969047', '10003714170', '10004604986', '10005716858', '10006222744', '10007144174', '10007615195', '10008597988', '10008638235', '10008711757', '10008719519', '10008851426', '10009166345', '10010104492', '10010340044', '10010514599', '10010801161', '10013103202', '10013411415', '10013685551', '10013875720', '10016305792', '10016395866', '10017260290', '10017591228', '10018459282', '10018906507', '10018984666', '10019342086', '10020000428', '10021349368', '10023396508', '10023818927', '10023832872', '10024423030', '10024838291', '10026182328', '10026345258', '10026483183', '10027469678', '10028123364', '10028123716', '10028250952', '10032085493', '10033257247', '10033411435', '10033690581', '10034577697', '10035410291', '10036635494', '1004389199', '10044013493', '10044496971', '1004482610', '10044962660', '10049591718', '10049911701', '10049918427', '1005002367', '10050263201', [... truncated]
Execution halted
Command exited with non-zero status 1
43.08user 4.19system 0:47.89elapsed 98%CPU (0avgtext+0avgdata 942276maxresident)k
25768inputs+85432outputs (62major+791407minor)pagefaults 0swaps

CRAN keys

Likely a rebuild will fail due to changes in the signing keys:

See here

replicating brass results

We attempted to replicate BRASS results using the implementation in analysisWGS.sh on PCAWG samples, but we are unfortunately getting very different results for some of the samples. Take for example DO45191_tumor, where PCAWG's BRASS VCF reports only 270 rows, but ours has >45000 rows, and the PCAWG calls seem to be a subset of our calls. Interestingly, only 2 of 42 PCAWG samples had such a dramatic difference, although for most of the samples our calls were supersets of the PCAWG-generated calls. As far as I can tell, I used the implementation of analysisWGS.sh pretty much to the letter, even separating out the input and cover steps from the downstream.

We are now just wondering if there are any extra filtering steps to perform on BRASS to pare down that extremely high number, or if you would be able to replicate our results on your end using the same sample. I am currently using BRASS version 6.3.4.

2.0.0: Battenberg results folder empty ?!

Hi there,

after running the cgpwgs:2.0.0 pipeline on a T/N pair aligned with cgpmap, the /battenberg/ subfolder in the results directory only has a /tmpBattenberg/ subfolder, but does not include the actual result files that I would expect from the battenberg.pl pipeline script.

Is this intended, i.e. is the battenberg.pl run not part of the standard cgpwgs pipeline? And if so, why is a battenberg folder created?

Thank you!

No co-located bas file found for tumour

I'm trying to use dockstore-cgpwgs_2.1.1.sif with Singularity.
It seems to require a .bas file along with the BAM file:
No co-located bas file found for tumour, expected at /var/spool/data/tumor.bam.bas at /opt/wtsi-cgp/bin/ds-cgpwgs.pl line 81.
But I haven't found any documentation on how to create this type of index file.

samtools: error while loading shared libraries: libhts.so.2

Hi
I have built a singularity image
singularity build --remote cgpwgs_v2.0.0.simg docker://quay.io/wtsicgp/dockstore-cgpwgs:2.0.0

But upon inputting all of the required fields:

singularity exec images/cgpwgs_v2.0.0.simg ds-cgpwgs.pl \
  -r /cgpwgs_ref/core_ref_GRCh37d5.tar.gz \
  -a /cgpwgs_ref/VAGrENT_ref_GRCh37d5_ensembl_75.tar.gz \
  -si /cgpwgs_ref/SNV_INDEL_ref_GRCh37d5.tar.gz \
  -cs /cgpwgs_ref/CNV_SV_ref_GRCh37d5_brass6+.tar.gz \
  -sc /cgpwgs_ref/SUBCL_ref_GRCh37d5.tar.gz \
  -qc /cgpwgs_ref/qcGenotype_GRCh37d5.tar.gz \
  -t /test_data/HCC1143_ds/HCC1143.bam \
  -tidx /test_data/HCC1143_ds/HCC1143.bam.bai \
  -n /test_data/HCC1143_ds/data/HCC1143_BL.bam \
  -nidx /test_data/HCC1143_ds/HCC1143_BL.bam.bai \
  -e 'MT,GL%,hs37d5,NC_007605'

I get the following error and I can't figure it out!

samtools: error while loading shared libraries: libhts.so.2: cannot open shared object file: No such file or directory
Can't close(GLOB(0x28248e0)) filehandle: '' at /opt/wtsi-cgp/bin/ds-cgpwgs.pl line 232

Any help would be appreciated!
Many thanks
Ryan

Error in cgpFlagCaVEMan.pl: Can't use an undefined value as an ARRAY reference...

I have been using the docker image of cgpwgs v2.1.0 and it's been working great for more than 50 samples. However, I'm getting this issue for one sample when running the cgpFlagCaVEMan command:

Can't use an undefined value as an ARRAY reference at /opt/wtsi-cgp/lib/perl5/Sanger/CGP/CavemanPostProcessor/PostProcessor.pm line 560, <__ANONIO__> line 540.
 at /opt/wtsi-cgp/bin/cgpFlagCaVEMan.pl line 125.
Command exited with non-zero status 2
108.58user 0.85system 1:54.27elapsed 95%CPU (0avgtext+0avgdata 86040maxresident)k
0inputs+0outputs (0major+65360minor)pagefaults 0swaps

Any idea what's gone wrong here?
Let me know if I can provide any further details.

Best,
Lars

Suggested exclusion parameters for GRCh38

Line 23 of cgpwgs.cwl contains suggested exclude parameters for GRCh37, but does not include suggested default parameters for GRCh38.

WARNING: The usual setting for 'exclude' is 'NC_007605,hs37d5,GL%' (human GRCh37/NCBI37). Examples

This was my best guess, but I am not sure if this is correct:

"chr1_%,chr2_%,chr3_%,chr4_%,chr5_%,chr6_%,chr7_%,chr8_%,chr9_%,chr10_%,chr11_%,chr12_%,chr13_%,chr14_%,chr15_%,chr16_%,chr17_%,chr18_%,chr19_%,chr20_%,chr21_%,chr22_%,chrX_%,chrY_%,chrUn%,HLA%,chrEBV"

ERROR: Use of uninitialized value $item in sprintf at /opt/wtsi-cgp/bin/ds-cgpwgs.pl line 255.

Hi there,

I tried to run the dockstore-cgpwgs:2.0.0 container like this:

singularity exec ./dockstore-cgpwgs-2.0.0.simg ds-cgpwgs.pl \
-r /groups/ob/Software/docker/ftp.sanger.ac.uk/pub/cancer/dockstore/human/core_ref_GRCh37d5.tar.gz \
-a /groups/ob/Software/docker/ftp.sanger.ac.uk/pub/cancer/dockstore/human/VAGrENT_ref_GRCh37d5_ensembl_75.tar.gz \
-si /groups/ob/Software/docker/ftp.sanger.ac.uk/pub/cancer/dockstore/human/SNV_INDEL_ref_GRCh37d5.tar.gz \
-cs /groups/ob/Software/docker/ftp.sanger.ac.uk/pub/cancer/dockstore/human/CNV_SV_ref_GRCh37d5_brass6+.tar.gz \
-cr 350000 \
-sc /groups/ob/Software/docker/ftp.sanger.ac.uk/pub/cancer/dockstore/human/SUBCL_ref_GRCh37d5.tar.gz \
-t /groups/ob/Software/docker/MCC_1T/MCC_1T.bam \
-tidx /groups/ob/Software/docker/MCC_1T/MCC_1T.bam.bai \
-n /groups/ob/Software/docker/MCC_1N/MCC_1N.bam \
-nidx /groups/ob/Software/docker/MCC_1N/MCC_1N.bam.bai \
-o ./cgpwgs_MCC1_out \
-c 14 \
-e /groups/ob/Software/docker/ignore_contigs_GRCh37d5.txt

And got the following error output:

https://gist.github.com/leiendeckerlu/c72d1589dd2595bd5e0c2f2c23dbb247

Any idea what's going wrong here?

Thank you!

Dockerfile compile issue

The Dockerfile fails to compile unless the install lines include a "--allow-unauthenticated" flag.

Help to run this Docker image using Singularity

Hello, I am trying to run this Docker pipeline using Singularity. I pulled the image "dockstore-cgpwgs-1.1.3.simg" using Singularity, but I can't run any script. Is anything wrong? For example,

ds-cgpwgs.pl
bash: ds-cgpwgs.pl: command not found

Do I need to download ds-cgpwgs.pl from this GitHub?

Also, I tried this:

singularity exec \
  --workdir /.../workspace \
  --home /.../workspace:/home \
  --bind /.../ref/human:/var/spool/ref:ro \
  --bind /.../example_data/cgpwgs:/var/spool/data:ro \
  dockstore-cgpwgs-${CGPWGS_VER}.simg \
  ds-cgpwgs.pl

ERROR : Image path dockstore-cgpwgs-.simg doesn't exist
ABORT : Retval = 255

Can you show me how to run this? thanks.
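The "Image path dockstore-cgpwgs-.simg doesn't exist" error above suggests ${CGPWGS_VER} was never set in the shell, so the image filename expands with an empty version. A minimal sketch of the likely fix (1.1.3 here is just the image version mentioned earlier in this report):

```shell
# Define the version variable before building the image path; left unset,
# it expands to an empty string and yields "dockstore-cgpwgs-.simg".
CGPWGS_VER=1.1.3
img="dockstore-cgpwgs-${CGPWGS_VER}.simg"
echo "$img"
```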

--force option

Hi,

We noted that when the pipeline breaks, intermediate files are left in the output folder. To restart the analysis the output dir needs to be deleted. Have you considered a --force option to run the analysis and overwrite (remove) such existing files?

Best wishes,

dockstore registry page is broken

The registry page cannot display the descriptor file or the example file.

We might need to have a new registry for versions past v2.

AlleleCount error when attempting to run Battenberg

To Whom it May Concern,

I have been trying to use Battenberg on my local HPC cluster using the dockstore-cgpwgs image in singularity using the following command:

singularity exec /home/.../dockstore-cgpwgs-2.0.0.simg battenberg.pl \
  -o /.../exp/PD12808 \
  -r /home/regmvcr/Scratch/reference/hg19.fa.fai \
  -tb /home/.../SAMPLENAME_tumour.bam \
  -nb /home/.../SAMPLENAME_normal.bam \
  -ge XY \
  -e /home/...reference/impute/impute_info.txt \
  -u /home/.../reference/1000genomesloci \
  -ig /home/.../reference/ignore_contigs.txt \
  -gc /home/.../battenberg_wgs_gc_correction_1000g_v3.tar.gz \
  -g /home/... \
  -c /home/.../reference/probloci_270415.txt \
  -pr WGS \
  -t 8

It runs endlessly without producing an error message unless it is killed. However, the output consists of allele frequency files that consist entirely of 0s, like this:

#CHR POS Count_A Count_C Count_G Count_T Good_depth
13 19020013 0 0 0 0 0
13 19020047 0 0 0 0 0
13 19020095 0 0 0 0 0
13 19020145 0 0 0 0 0
13 19020341 0 0 0 0 0

and Sanger_CGP_Battenberg_Implement_battenberg_allelecount.x files in the progress folder which are entirely blank.

I've followed the suggestions as per this alleleCount question:
cancerit/alleleCount#41
But none of the suggestions there fixed the problem.

It looks to me like someone else is having a similar problem when trying to run ASCAT:
#36

Since I'm relatively new to all of this, I'm not entirely sure if I'm doing something wrong or if there is an issue with the alleleCount function in the singularity image.

Any help on solving/circumventing the problem would be greatly appreciated.

rename ds-wrapper.pl

Rename to ds-cgpwgs.pl to allow layered images to preserve functionality of underlying ones. Mainly so singularity export of dockstore-cgpwgs can be used for mapping, wxs and wgs analysis.

Analysis completed with empty caveman snps.ids vcf

I am running CGPWGS on a large number of samples (around 800 pairs). In 27 cases the pipeline finished without an error, but the caveman snps.ids.vcf.gz contains only a header.

Could you point me to where I could search for the cause?

I was running it on an HPC cluster using a singularity image, version 2.0.1, with these run params:

export PCAP_THREADED_NO_SCRIPT=1
export PCAP_THREADED_FORCE_SYNC=1
export PCAP_THREADED_LOADBACKOFF=1
export PCAP_THREADED_REM_LOGS=1
PROTOCOL=WGS
OUTPUT_DIR='/var/spool/results'
REF_BASE='/var/spool/results/reference_files'
BAM_MT='/var/spool/data/CPCT02010267TII/CPCT02010267TII.bam'
IDX_MT='/var/spool/data/CPCT02010267TII/CPCT02010267TII.bam.bai'
BAM_WT='/var/spool/data/CPCT02010267R/CPCT02010267R.bam'
IDX_WT='/var/spool/data/CPCT02010267R/CPCT02010267R.bam.bai'
CONTIG_EXCLUDE='MT,NC_007605,hs37d5,GL%'
PINDEL_MAXCPU=8
SPECIES='human'
ASSEMBLY='GRCh37d5'
CAVESPLIT='800000'
SNVFLAG='/var/spool/results/flag.vcf.config.WGS.ini'
CPU=28
CLEAN_REF=1
SKIPBB=1
SKIPQC=1

These are the last five lines from the timing files that contain caveman in the name:

==> WGS_CPCT02010267TII_vs_CPCT02010267R.time.CaVEMan <==
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

==> WGS_CPCT02010267TII_vs_CPCT02010267R.time.CaVEMan_annot <==
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

==> WGS_CPCT02010267TII_vs_CPCT02010267R.time.CaVEMan_setup <==
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

==> WGS_CPCT02010267TII_vs_CPCT02010267R.time.CaVEMan_split <==
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

==> WGS_CPCT02010267TII_vs_CPCT02010267R.time.cgpFlagCaVEMan <==
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

This is the content of the *time file in the work folder.

 Command being timed: "/opt/wtsi-cgp/bin/analysisWGS.sh /var/spool/results/run.params"
User time (seconds): 2200457.98
System time (seconds): 18570.81
Percent of CPU this job got: 2450%
Elapsed (wall clock) time (h:mm:ss or m:ss): 25:09:32
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 9836908
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 506
Minor (reclaiming a frame) page faults: 10819652688
Voluntary context switches: 105813361
Involuntary context switches: 80816834
Swaps: 0
File system inputs: 93244
File system outputs: 66220240
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

resource consumption increase since 2.0.1

First of all, thank you for the great work with the cgpwgs pipeline. We have been using version 2.0.1 and are considering moving to 2.1.0 now, mainly due to the ASCAT/CaVEMan bug, but also the performance improvements.

In the test runs we noticed, however, a considerable increase in the resources needed to complete the analysis:

  1. runtime jumped from ~24 to ~40 hrs on 32 threads
  2. memory consumption rose as well (we can report precise numbers if needed)

Is there a reason for such an increase in resource use, considering the changes introduced in 2.1.0 (since 2.0.1)?

Reference bundle for 38

Hi Keiran, thanks for your answer on this same issue a few days ago. You said I should raise it here.

I have a question regarding this workflow's ability to work with GRCh38 reference files. If it is possible to run it with them, where can I find reference bundles for 38? Alternatively, can you point me to the method or tool that was used to generate the existing 37 reference bundles? Thanks.

cannot run cgpwgs

Hi, I am trying to run cgpwgs within docker, and the first thing I wanted to do was test it on the example data by following the wiki:

  1. I downloaded the reference data to my local PC in /home/leo/Desktop/SequenceSimulator/sanger_pipeline/ref
  2. since the reference data download code unpacks the data, all the options just need to point to ref
  3. I downloaded the example files COLO-829.bam (tumor) and COLO-829-BL.bam (normal) plus the corresponding index files, renamed them to tumor.bam, tumor.bam.bai, normal.bam, and normal.bam.bai, respectively, and saved them to /home/leo/Desktop/SequenceSimulator/sanger_pipeline/data
  4. then I ran the pipeline using the run command from the wiki:
sudo docker run -d \
--read-only --tmpfs /tmp \
--env HOME=/var/spool/results \
-v /home/leo/Desktop/SequenceSimulator/sanger_pipeline/ref:/var/spool/ref:ro \
-v /home/leo/Desktop/SequenceSimulator/sanger_pipeline/data:/var/spool/data:ro \
-v /home/leo/Desktop/SequenceSimulator/sanger_pipeline/results:/var/spool/results:rw \
quay.io/wtsicgp/dockstore-cgpwgs:2.1.1 \
ds-cgpwgs.pl \
-r /var/spool/ref \
-a /var/spool/ref \
-si /var/spool/ref \
-cs /var/spool/ref \
-qc /var/spool/ref \
-pl 3.65 -pu 1.0 \
-e 'MT,GL%,hs37d5,NC_007605' \
-t /var/spool/data/tumor.bam \
-tidx /var/spool/data/tumor.bam.bai \
-n /var/spool/data/normal.bam \
-nidx /var/spool/data/normal.bam.bai \
-o /var/spool/results

But when I hit Enter, all I got was a long string of numbers and letters (ba0c2460434a722a9df47dd20f1b2b62d7f93d959dded18f24ae4c47eb69b164) and nothing else seemed to happen.

I also could not find any output in the results folder on my own PC, can someone help me with this? Thanks!
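A likely explanation, assuming the container actually started: `docker run -d` runs the container detached and prints only its container ID (the long hex string). A minimal sketch of checking on such a container (the ID prefix below is illustrative):

```shell
# The long hex string printed by `docker run -d` is the container ID.
CID=ba0c2460434a   # a unique prefix of the full ID is enough

docker ps -a --filter "id=$CID"   # is the container still running, or did it exit?
docker logs -f "$CID"             # follow the pipeline's console output
docker wait "$CID"                # block until it finishes; prints the exit code
```

If `docker ps -a` shows the container exited immediately, `docker logs` usually contains the reason.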

alleleCounter behavior on overlapping read pairs

Hello CancerIT team,

Could you explain how alleleCounter behaves when it encounters read pairs whose reads overlap? Are both reads of such a pair counted towards depth?

This becomes relevant when DNA is very fragmented.

All the best
Dominik

Frequent crashes at various steps of the workflow

Dear Keiran,

The containers available on Dockstore crash frequently in our environment. I tried versions 1.0.8, 1.1.2, 1.1.3, and 1.1.4. The crashes occur at random steps of the workflow, even for the same dataset, which led me to believe that it is a technical issue unrelated to the data. With few exceptions, I could not find any error messages in the log files: the *.wrapper.log files contained an exit code of 255, as did the files inside the timings folder, but beyond that there was no hint about the source of the error in any of the other log files.

After extensive debugging I managed to track down the crashes to two issues:

  • In order to launch a job, a command is written to a shell script file, for example WGS_tumor_vs_control/caveman/tmpCaveman/logs/Sanger_CGP_Caveman_Implement_caveman_estep.94.sh. This script is then made executable and called immediately afterwards. Some versions/storage drivers of Docker apparently have an issue with this: when there is no delay between making the script executable and running it, the permission change occasionally has not yet become effective by the time the script is run, resulting in a Text file busy error and the termination of the workflow. Others have reported this issue, too: moby/moby#9547. Supposedly, inserting a sync or sleep 1 between making the script executable and running it helps. I am not sure whether it does, because switching to Singularity fixed the issue for me, so I did not find out which scripts would need to be modified and try it out. Even though this is a bug in Docker rather than in the workflow itself, you might want to consider inserting a sync, because other users might run into the same error.

  • After solving the above issue, only about half of the runs crashed (rather than 9 out of 10). The remaining crashes were caused by the need_backoff function in /opt/wtsi-cgp/lib/perl5/PCAP/Threaded.pm. The following line occasionally threw an error Use of uninitialized value $one_min:
    $ret = 1 if($one_min > $self->{'system_cpus'});
    I was unable to find out why $one_min is sometimes undefined. I tried writing the value of $uptime to STDERR to check whether the regex fails to match, but for reasons I do not understand the values did not get written to the workflow's log files. I also tried replacing the uptime tool with something guaranteed to produce an output string matching the regex, but the error still occurred. At this point I suspect that the call to the external uptime command from within Perl occasionally fails. I eventually gave up, since it takes days to reproduce the issue, and I was able to avoid the crashes altogether by simply wrapping the offending line like this:

# skip the backoff check when uptime parsing failed and $one_min is undefined
if (defined $one_min) {
  $ret = 1 if($one_min > $self->{'system_cpus'});
}
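
The first workaround (flushing the permission change before executing) can be sketched as follows; the script name is illustrative, not one of the workflow's actual file names:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write the job command to a script, as the workflow does.
cat > job_example.sh <<'EOF'
#!/usr/bin/env bash
echo "job ran"
EOF

chmod +x job_example.sh
sync              # flush the permission change to avoid the occasional
                  # "Text file busy" error seen with some storage drivers
./job_example.sh  # prints "job ran"
```

`sleep 1` in place of `sync` would serve the same purpose at the cost of a fixed delay per job launch.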

I assume you do not bump into these issues as often as I do, because you certainly would have noticed an error that affects a major fraction of the runs. I have no explanation as to why these two errors happen so frequently in our environment. Still, I was able to reproduce the issues on various systems (openSuSE/CentOS) with various kernel/Docker versions and various storage drivers, so other users might be affected, too. I therefore figured that it is reasonable to take precautions to circumvent the errors and wanted to give you this feedback.

Regards,
Sebastian

Error when running under singularity

Hello, I tried to run cgpwgs under Singularity, but an error occurred at this step:

bash -c '/usr/bin/time -v brass.pl -j 4 -k 4 -c 48  -d /var/spool/results/reference_files/brass/HiDepth.bed.gz  -f /var/spool/results/reference_files/brass/brass_np.groups.gz  -g /var/spool/results/reference_files/genome.fa  -s '\''Human'\'' -as NCBI38 -pr WGS -pl ILLUMINA  -g_cache /var/spool/results/reference_files/vagrent/vagrent.cache.gz  -vi /var/spool/results/reference_files/brass/viral.genomic.fa.2bit  -mi /var/spool/results/reference_files/brass/all_ncbi_bacteria  -b /var/spool/results/reference_files/brass/500bp_windows.gc.bed.gz  -ct /var/spool/results/reference_files/brass/CentTelo.tsv  -cb /var/spool/results/reference_files/brass/cytoband.txt  -t /var/spool/results/tmp/TUM1.bam  -n /var/spool/results/tmp/WT1.bam  -o /var/spool/results/WGS_TUM1_vs_WT1/brass  -p input >& /var/spool/results/timings/WGS_TUM1_vs_WT1.time.BRASS_input ; echo '\''WRAPPER_EXIT: '\''$?'
ERRORS OCCURRED:
/var/spool/results/ascat.wrapper.log
/var/spool/results/BRASS_input.wrapper.log


>cat ascat.wrapper.log 
WRAPPER_EXIT: 1

> cat tmpAscat/logs/Sanger_CGP_Ascat_Implement_ascat.0.err
...
Error in apply(corr_tot, 1, function(x) sum(abs(x * length_tot))/sum(length_tot)) : 
  dim(X) must have a positive length
Calls: ascat.GCcorrect -> apply
Execution halted
Command exited with non-zero status 1

Wondering if anyone can help with it? Thanks.

(repo owner edit: corrected formatting)
