dkfz-odcf / aceseqworkflow


Allele-specific copy number estimation with whole genome sequencing

Home Page: http://aceseq.readthedocs.io

License: MIT License


ACEseq Workflow

Original Author: Kortine Kleinheinz

Current Author: Gregor Warsow

Description

ACEseq (Allele-specific copy number estimation with whole genome sequencing) is a tool to estimate allele-specific copy numbers from human WGS data. It comes with a variety of features:

  • GC/replication timing bias correction
  • quality check
  • SV breakpoint inclusion
  • automated estimation of ploidy and tumor cell content
  • HRD/TAI/LST score estimation
  • with/without matched control processing

Citation

Kortine Kleinheinz, Isabell Bludau, Daniel Huebschmann, Michael Heinold, Philip Kensche, Zuguang Gu, Cristina Lopez, Michael Hummel, Wolfram Klapper, Peter Moeller, Inga Vater, Rabea Wagener, ICGC MMML-Seq project, Benedikt Brors, Reiner Siebert, Roland Eils, Matthias Schlesner. ACEseq - allele specific copy number estimation from whole genome sequencing. bioRxiv.

Prepackaged files

The complete installation instructions can be found in the documentation.

Further required software and plugins

Additional required software, such as Roddy, can be found here.

Support us (indirectly)

Your opinion matters! The development of this workflow is supported by the German Network for Bioinformatic Infrastructure (de.NBI). By completing this very short (30-60 seconds) survey you support our efforts to improve this tool.

Changelog

  • Version update to 6.0.0

    • Changed the phasing routine. The program "impute2" was replaced by "Beagle". Files and tools were renamed accordingly.
    • Generally renamed all tools and files from "imputeGenotype" to "phaseGenotype" (and so on) as the subroutine does not actually perform imputation but rather phasing.
  • Version update to 5.0.1

    • fixed density(NA) bug and index bug for frequencies (as.character) in clustering step
  • Version update to 5.0.0

    • introduced CNA.type 'AMP' (TCN>=2*ploidy + 1)
    • do not use ChrX/Y for round/full ploidy determination
    • fixed faulty assignment of 'neutral' to gonosomal segments in male samples
    • fixed pruning bug (combineNeighbours, homozygousDel)
    • force gender to be set in noControl cases
    • enhanced cluster plots
    • added CovBaf plots
  • Version update to 4.0.3

    • use fake controls from shared folder
  • Version update to 4.0.2

    • fixed several gap merging bugs (HRD score)
  • Version update to 4.0.1

    • smoothing: merge first segment after long gap
  • Version update to 4.0.0

    • fixed bug in HRD score determination (tcnStatePerChrom nrow vs length)
    • write out file with segments contributing to HRD score
    • introduced HRD score with gapped centromeres (numberHRDSmoothReduced)
    • corrected LST and TAI calculation (changed centromere regions file)
  • Version update to 3.0.0

    • fixed 'artifact-1' artifact (allow for low purity solutions in this case)
    • work with old (version < 2.x) ACEseq result files on rerun
    • introduced ymaxcov_threshold for maximum TCN count represented in segment plots
    • fixed bug in creation of json file (doubled first solution)
    • fixed noControl filegroup bug
    • fixed noControl control-bam file access issue
    • run without SV in noControl cases
    • use true|false for SV cvalue and not yes|no
    • fixed bug contamination/sample swap detection (secondPeak index)
  • Version update to 2.0.0

    • add contours in 2D plots
    • add 1.0 (instead of 0.00001) to lengths when getting log2 for weights to consider segments with length=1
    • Add flags and checks for handling the sv file and remove null pointer exceptions
    • fix bug in tcc ploidy estimation
    • parametrized local minimum upper boundary (local_minium_upper_boundary_shift)
  • Version update to 1.2.10

    • add bioconda dependencies
    • replace all occurrences of qq.R and getopt2 by getopt
    • replace all occurrences of the names delly and crest, also in final output
    • change color for deletions (red ==> blue) and duplications (red ==> blue)
    • enable modularization of workflow
    • remove generateVCF job, add estimate HRD score
    • remove dependency of haploblock files in cluster_and_prune_segments
    • add HRD score estimation, smooth segments and filter for blacklist segments
    • add 0.00001 to lengths when getting log2 for weights to consider segments with length=1, which will be merged in a future release
    • adjust colors for clustering so they are consistent across all three cluster plots
  • Version update to 1.2.8-1

    • remove vcf creation in final job (obsolete)
  • Version update to 1.2.8

    • comb_pro_extra and most_important_info contain X and Y
    • removed GNL column
    • new annotation of CNA.type (DEL/DUP/LOH/TCNNeutral/NA)
    • new estimation of quality (length of subclonal over total mapped)
  • Version update to 1.2.7-1/2

    • removed dependencies on coConfigurations
    • change of svOutputdirectory and set to default SOPHIA
  • Version update to 1.2.7

    • addition of tumorSample and controlSample variable as read out from bam file
  • Version update to 1.2.6-*

    • bugfix allowing coordinates for chrom2 in SV file to be smaller than chrom1 coordinates
    • bugfix allowing chrom1 being decoy chromosome in case chrom2 is autosome|X|Y
    • bugfix plots using print and ggplot2:ggsave to generate plots
  • Version update to 1.2.6

    • runparallel for impute moved from COWorkflows to ACEseqMethods.groovy
    • sort most_important_info and comb_pro_extra file
  • Version update to 1.2.1

    • enable BAF plots as extra step, that is only run for paired workflow and only writes down checkpoint
    • enable json with quality parameters for closest to diploid solution, should be read out by otp, file noted down in config.xml
    • enable upgrade to R-3.3.1, R-2.15.0 is only working with exception (pscbs_all_R_2.15.R must be redefined as tool and path to pscbs lib should be given)
    • better format of cnv_parameter files
    • removed "set -x", pipefail etc.
    • PSCBSgabs_delly.py
      • improved code
      • added selective column
    • add noControl options
    • cluster and prune: take the mean if two equally high peaks appear; removed bug for the single-peak case
    • add chromosome labels to general coverage plots
    • remove chr prefixes throughout analysis
    • make email option optional for gcCorrection
    • be flexible on sv_type file, "id" column optional
  • Version update to 1.0.189

    • pscbs_all.R: fixed bug that corrupted chrlength
  • Version update to 1.0.187

    • stabilizing addition to pscbs_all.R
  • Version update to 1.0.185

    • bugfixes for pscbs_all.R
  • Version update to 1.0.183

    • pscbs_all.R and PSCBSall.sh:
      • scientific format of start and end here already, replace +Inf/-Inf with chromosome length/0
    • adjustsAlleleFreqs and functions.R and purity_ploidy.R:
      • don't consider dbSNP position with 0 reads mapped in tumor
    • manual_pruning.R:
      • don't consider dbSNP position with 0 reads mapped in tumor and remove bug in case no main cluster is found
  • Version update to 1.0.181

    • convertTabTovcf.sh: moved usage of id option (for pcawg output)
    • convertToVcf.py:
      • libraries removed
      • default for id argument set (NA)
      • changed version number (for pcawg output)
      • added missing "\n" for header
      • added SAMPLE_ID line for header (pancan)
      • changed sample_$pid column name to TUMOR
    • correctGCBias_functions.R:
      • corrected coordinates in coverage plots
    • datatablePSCBSgaps.sh:
      • tabix for position not window (col5 instead of 1)
    • haplotypes.sh:
      • tabix corrected (added -e)
    • manual_pruning.R:
      • adjusted for no cluster within cluster limits
    • merge_and_filter_cnv.py/merge_and_filter_snp.py:
      • removed coverage filter for tumor
    • pscbs_all.R:
      • round coordinates with .5 (ceiling for start, floor for end)
    • PSCBSgabs_plus_delly_points.py:
      • removed library
    • pscbs_plots_functions.R:
      • fullPloidy calculation by largest fraction of genome instead of most segments in genome
      • annotation of LOH (cn,gain,loss) corrected
      • LOH definition adjusted instead of dhMean </> 0.8 using round(c1) and round(c2) (==0 for LOHs, != 0 for everything else)
    • pscbs_plots.R:
      • added gc()
    • purity_ploidy.R:
      • average coverage on autosomes instead of all chromosomes (better for males)
      • allow missing autosomes
    • vcfAnno.sh:
      • rearranged order of script (first gender estimation)
    • addition of fake control option:
      • added scripts:
        • replaceControl.sh
        • replaceControlACEseq.R
  • Version update to 1.0.158

    • extracted into single plugin
  • Version update to 1.0.131

    • plots: tab-separated file with cnv parameters written
    • correctGCBias: extra file with qc parameters printed, conversion to json, parameter for scale adjustment added
    • Change workflow class to override another execute method. This makes the workflow a bit cleaner.
  • Version update to 1.0.114

    • PSCBSgabs_plus_CRESTpoints.py: add identifier column to sv_points file (used with DELLY calls)
    • PSCBSgabs_plus_delly_points.py: add identifier column to sv_points file, convert start coordinates to 1-based
    • homozygous_deletion.pl: added id column for sv file, adjust length calculation to inclusion of start AND End coordinate
    • correctGCBias.R: density estimation without restrictions for final corrected covRatio
    • manual_pruning.R: check for haploblock files to prevent script from failing after 2 hours due to missing files, plot names with PID, new feature: merge clusters again after outlier removal and choose random cluster if 2 or more "mainClusters" are found
    • purity_ploidy.R: estimate and report average coverage, add minCoverage as optional parameter
    • purity_ploidy_estimation_final.R: bug fixes for balanced segments using wrong matrix, new feature: disallow copy number states of 0, don't punish balanced segments with copy number down to -0.3
    • functions.R: soft limits for control SNPs used for peak calling (coverage-dependent), no limits for tumor SNPs, single thread for running
    • purityPloidity_EstimateFinal.sh: add PID as parameter
    • analysisCopyNumberEstimation.xml: adjust settings for peak calling to actually used resources
    • pscbs_plots_functions.R: convert 0.5 coordinates to integer values, don't allow scientific format for output files
  • Version update to 1.0.109

  • Version update to 1.0.105

  • Version update to 1.0.104

    • bug fix: read all lines of breakpoint.txt to avoid missing information on chr 1
  • Version update to 1.0.103
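
Several changelog entries above state small arithmetic rules: the 'AMP' threshold (5.0.0), the offset added to segment lengths before taking log2 (1.2.10 and 2.0.0), the rounding of .5 coordinates (1.0.181), and the length calculation that includes both start and end (1.0.114). The following Python sketch illustrates one reading of those rules. It is not code from the workflow (which implements these steps in R and Perl); all function names are hypothetical, and the threshold is read as TCN >= (2 * ploidy) + 1.

```python
import math
from typing import Tuple


def is_amp(tcn: float, ploidy: float) -> bool:
    """AMP rule as read from the 5.0.0 entry: TCN >= 2*ploidy + 1 (assumed precedence)."""
    return tcn >= 2 * ploidy + 1


def length_weight(length: int, offset: float = 1.0) -> float:
    """log2 weight of a segment length with an added offset.

    Without an offset, a segment of length 1 gets weight log2(1) = 0.
    Version 1.2.10 added 0.00001; version 2.0.0 adds 1.0 instead.
    """
    return math.log2(length + offset)


def round_half_coordinates(start: float, end: float) -> Tuple[int, int]:
    """Round .5 coordinates: ceiling for the start, floor for the end."""
    return math.ceil(start), math.floor(end)


def inclusive_length(start: int, end: int) -> int:
    """Segment length when both start and end coordinates are included."""
    return end - start + 1
```

With these definitions, a segment of length 1 receives weight 1.0 under the 2.0.0 offset, whereas the earlier offset of 0.00001 leaves its weight very close to zero.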


aceseqworkflow's Issues

Warning about malformed variable reference in plugin XML

We observe the following warning

* Variable 'CHR_NR' defined in '/path/to/plugins/ACEseqWorkflow/resources/configurationFiles/analysisCopyNumberEstimation.xml' may use malformatted variable references. For variables references like ${variable identifier} nesting like '${${innerVar}}' is forbidden and it must not be empty.
  • CHR_NR is set in the groovy code for each chromosome-specific job
  • Otherwise the variable is only used in the configuration (from where this error comes) in the mentioned XML. There, obviously it does not have a value, because the value is only set at runtime of Roddy.

Not sure why the message warns about emptiness "and it must not be empty" (and there is a dangling "it" in the sentence, which does not add to the clarity of the warning). Usually, empty variables at evaluation time of the configuration are no problem, and I think also references to such variables should not be a problem.
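
To make the two conditions named in the warning concrete, here is a minimal Python sketch of a check that flags nested references such as '${${innerVar}}' and empty references such as '${}'. This is only an illustration of one reading of the message; it is not Roddy's implementation, and the function name `find_malformed_references` is hypothetical.

```python
import re
from typing import List


def find_malformed_references(value: str) -> List[str]:
    """Return the problems found in the variable references of a config value.

    Hypothetical check, not Roddy's code: it flags an empty reference '${}'
    and a reference that opens another '${' before its closing brace.
    """
    problems = []
    if re.search(r"\$\{\s*\}", value):
        problems.append("empty reference")
    if re.search(r"\$\{[^}]*\$\{", value):
        problems.append("nested reference")
    return problems
```

Under this reading, a plain reference such as '${CHR_NR}' passes the check regardless of whether CHR_NR has a value at configuration time, which matches the expectation stated above that unset variables should not be a problem.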

COWorkflowBasePlugin Download?

The first part of the guide mentions the following: "Download the COWorkflowBasePlugin zip-archive from Github-Releases. The version to download can be found in the ACEseq buildinfo.txt."

However, this file does not exist in your repo. The .zip and the tarball are both the same.
Can you please upload it, as it does not seem to be available elsewhere.

Just to be clear, COWorkflowBasePlugin doesn't exist on this repo, the buildinfo.txt does.

Many thanks,
Toseph

Make output MultiQC-compatible

MultiQC simplifies the visual representation of QC data. For display in MultiQC, result files should either be in one of the already supported formats or be annotated. Check the output files and restructure their content to support MultiQC.
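
As a rough idea of what an annotated file could look like, the Python sketch below writes a tab-separated table with a commented header in the style of MultiQC's custom content feature (such files are usually named with an `_mqc.tsv` suffix). The section id, the column names, and the row values are placeholders chosen for illustration, not actual ACEseq output, and the function name `write_mqc_table` is hypothetical.

```python
import io


def write_mqc_table(handle, rows):
    """Write rows as a TSV table with a MultiQC custom-content style header.

    The header keys follow the MultiQC custom content convention; the id,
    section name, and column names are assumptions made for this sketch.
    """
    handle.write('# id: "aceseq_qc"\n')
    handle.write('# section_name: "ACEseq"\n')
    handle.write('# plot_type: "table"\n')
    handle.write("Sample\tploidy\ttcc\n")
    for sample, ploidy, tcc in rows:
        handle.write(f"{sample}\t{ploidy}\t{tcc}\n")


# Placeholder values only; real values would come from the ACEseq result files.
buffer = io.StringIO()
write_mqc_table(buffer, [("sample_placeholder", 2, 0.5)])
mqc_text = buffer.getvalue()
```

In practice the handle would be a file opened for writing, for example one whose name ends in `_mqc.tsv`, so that MultiQC can pick it up without further configuration.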

Installation failure caused by conda conflicts

Hi,
First, thanks for developing such an awesome tool. I want to apply it to my WGS data to call somatic CNAs. However, when I tried this step:
conda env create -n ACEseqWorkflow -f $PATH_TO_PLUGIN_DIRECTORY/resources/analysisTools/copyNumberEstimationWorkflow/environments/conda.yml
Many of the conda packages appear to conflict, especially in their versions, and some package versions can no longer be found by the current conda. Do you have any recommendations on how to solve this problem?

Many thanks!

Yang
