clinical-genomics / balsamic
Bioinformatic Analysis pipeLine for SomAtic Mutations In Cancer
Home Page: https://balsamic.readthedocs.io/
License: MIT License
The reports produced by Picard should be sorted row-wise.
TMB was defined as the number of somatic, coding, base substitution, and indel mutations per megabase of genome examined. https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-017-0424-2
1.1 Non-coding alterations were not counted.
1.2 Alterations listed as known somatic alterations in COSMIC and truncations in tumor suppressor genes were not counted.
1.3 Alterations predicted to be germline by the somatic-germline-zygosity algorithm were not counted.
1.4 Alterations that were recurrently predicted to be germline in our cohort of clinical specimens were not counted.
1.5 Known germline alterations in dbSNP were not counted.
1.6 To calculate the TMB per megabase, the total number of mutations counted is divided by the size of the coding region of the targeted territory.
1.7 Use Table 1 from the paper above as a reference for comparison.
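The counting rules in 1.1 to 1.6 can be sketched as a filter followed by a division. This is a minimal sketch, not the pipeline's implementation: the `Variant` class, its boolean fields, and the example numbers are all hypothetical and assume the annotations (COSMIC, dbSNP, germline prediction) have already been attached upstream.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # All fields are hypothetical flags assumed to come from upstream annotation.
    is_coding: bool                      # rule 1.1
    known_somatic_or_tsg_truncation: bool  # rule 1.2 (COSMIC / tumor suppressor truncation)
    predicted_germline: bool             # rules 1.3 and 1.4
    in_dbsnp: bool                       # rule 1.5


def is_counted(v: Variant) -> bool:
    """Return True if the variant contributes to TMB under rules 1.1 to 1.5."""
    return v.is_coding and not (
        v.known_somatic_or_tsg_truncation or v.predicted_germline or v.in_dbsnp
    )


def tumor_mutation_burden(variants, coding_region_bp: int) -> float:
    """Rule 1.6: counted mutations per megabase of targeted coding region."""
    if coding_region_bp <= 0:
        raise ValueError("coding_region_bp must be positive")
    counted = sum(1 for v in variants if is_counted(v))
    return counted * 1_000_000 / coding_region_bp


# Hypothetical example: 5 variants, 3 of which pass every rule.
example_variants = [
    Variant(True, False, False, False),
    Variant(True, False, False, False),
    Variant(True, False, False, False),
    Variant(False, False, False, False),  # non-coding, dropped by 1.1
    Variant(True, False, False, True),    # in dbSNP, dropped by 1.5
]
example_tmb = tumor_mutation_burden(example_variants, coding_region_bp=1_500_000)
```

With a hypothetical 1.5 Mb coding territory, the three counted variants give a TMB of 2 mutations per megabase.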
Tumor mutation burden (TMB), fraction of copy number–altered genome, and gene alterations were compared among patients with DCB and no durable benefit (NDB). http://ascopubs.org/doi/full/10.1200/JCO.2017.75.3384
2.1 In addition to the above, copy number alterations were also counted.
An additional rule for Picard CollectAlignmentSummaryMetrics
Set -D, --workdir=directory from sbatch for job submissions.
Manta in single-sample mode should only have the following files in its output:
"tumorSV.vcf.gz", "candidateSV.vcf.gz", "candidateSmallIndels.vcf.gz"
while right now it has:
"diploidSV.vcf.gz", "somaticSV.vcf.gz", "candidateSV.vcf.gz", "candidateSmallIndels.vcf.gz"
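A check like the one below could flag this mismatch. It is only a sketch: the function name is hypothetical, and it compares plain file names rather than inspecting a real Manta output directory.

```python
# Expected single-sample output files, as listed in the issue above.
EXPECTED_SINGLE_SAMPLE = {
    "tumorSV.vcf.gz",
    "candidateSV.vcf.gz",
    "candidateSmallIndels.vcf.gz",
}


def check_manta_outputs(found_files, expected=EXPECTED_SINGLE_SAMPLE):
    """Return (missing, unexpected) file names as sorted lists."""
    found = set(found_files)
    missing = sorted(expected - found)
    unexpected = sorted(found - expected)
    return missing, unexpected


# The file list currently reported in the issue.
current_files = [
    "diploidSV.vcf.gz",
    "somaticSV.vcf.gz",
    "candidateSV.vcf.gz",
    "candidateSmallIndels.vcf.gz",
]
missing, unexpected = check_manta_outputs(current_files)
```

For the current file list, `missing` holds `tumorSV.vcf.gz` and `unexpected` holds the diploid and somatic SV files.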
Include the CollectInsertSizeMetrics results from Picard in the results.
The result report filter config is a series of filters, configs, and parameters that summarize the analysis results, in an effort to present a list of actionable targets and newly discovered targets.
This aggregated list will generate 4 lists of variants for SNVs and INDELs:
VarDict reports a number of complex variants that need to be either decomposed or put into a separate batch of variants.
Extend the range up to 1000X instead of the current 200X limit.
Above 200X, values should increase in steps of 50X (e.g. 250X, 300X, etc.) to make it possible to plot the data.
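The extra levels above 200X can be generated rather than listed by hand. This is a small sketch that assumes the X values are coverage levels; the function name and defaults are hypothetical, and the levels at or below 200X are left as they are today.

```python
def extended_coverage_levels(start=250, stop=1000, step=50):
    """Levels above 200X: start, start + step, ... up to and including stop."""
    return list(range(start, stop + 1, step))


new_levels = extended_coverage_levels()
```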
A new tool for plotting structural variants: samplot.py (https://github.com/ryanlayer/samplot)
CollectMultipleMetrics can gather multiple metrics, but it might fail on some of them for various reasons (e.g. Java memory issues). Prepare a list of the metrics that we are interested in.
Describe what needs to be done, similar to figure 2 in this document: https://www.accessdata.fda.gov/cdrh_docs/reviews/den170058.pdf
VCF union of all three callers:
It seems that the AF reported by Mutect2 in GATK 3.8 does not match the actual AD values. This is because the AD shown is the "unfiltered" AD, and the read depth values are only estimates made "before" the internal filters are applied (broadinstitute/gatk#3808). As a result, DP is not reported by Mutect2 in GATK 3.8 (it is reported in GATK4, where the AD values also do not match those of GATK 3.8). This could be due to the internal filtering that happens behind the scenes.
A solution for getting the values used by Mutect2 and adding the actual AD to the VCF is to invoke the Coverage annotation through the --annotation option. Note that the DP value reported here will still be the unfiltered reads of tumor + normal. For the actual DP value, the AD values need to be summed.
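Summing the AD values can be sketched as below. This is an illustration only: it assumes the per-sample AD field is a comma-separated string of integer counts with the reference allele first (the usual VCF convention), and the function name is hypothetical. Missing values such as "." are not handled.

```python
def depth_and_af_from_ad(ad_field: str):
    """Return (depth, alt_allele_fractions) computed from an AD string.

    depth is the sum of all allele depths; each alternate allele fraction
    is that allele's depth divided by the summed depth.
    """
    counts = [int(x) for x in ad_field.split(",")]
    depth = sum(counts)
    if depth == 0:
        return 0, [0.0] * (len(counts) - 1)
    return depth, [alt / depth for alt in counts[1:]]


# Hypothetical AD value: 90 reference reads, 10 alternate reads.
example_depth, example_afs = depth_and_af_from_ad("90,10")
```

The AF derived this way is consistent with the AD values by construction, which makes it easy to compare against the AF that the caller reports.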
BALSAMIC/BALSAMIC/config/create_config.py, line 130 in 9c37ea8
BALSAMIC/BALSAMIC/config/MSK_impact.json, line 22 in 2d501c7
Report mapping quality in Mutect2 using the RMSMappingQuality annotation: https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_annotator_RMSMappingQuality.php
Divide very large BAM files into manageable sizes.
In accordance with https://github.com/Clinical-Genomics/Goals/issues/51
BED files need interval preparation and run-time estimation. A separate mini-analysis to prepare and split BED files can speed up the analysis and reduce runtime.
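One way such a split could work is to balance the chunks by total interval length. The sketch below uses a greedy largest-first assignment, which is an assumed strategy and not something the issue specifies; the function name and the example intervals are hypothetical.

```python
import heapq


def split_bed_intervals(intervals, n_chunks):
    """Split (chrom, start, end) intervals into n_chunks of similar total size.

    Greedy strategy: take intervals from largest to smallest and always add
    the next one to the chunk with the smallest total length so far.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks = [[] for _ in range(n_chunks)]
    # Heap entries are (total_length_so_far, chunk_index).
    heap = [(0, i) for i in range(n_chunks)]
    heapq.heapify(heap)
    by_size = sorted(intervals, key=lambda iv: iv[2] - iv[1], reverse=True)
    for chrom, start, end in by_size:
        total, idx = heapq.heappop(heap)
        chunks[idx].append((chrom, start, end))
        heapq.heappush(heap, (total + (end - start), idx))
    # Sort each chunk by chromosome name and position for readable output.
    return [sorted(chunk) for chunk in chunks]


# Hypothetical intervals: one 100 bp region and two 50 bp regions.
example_intervals = [("chr1", 0, 100), ("chr1", 200, 250), ("chr2", 0, 50)]
example_chunks = split_bed_intervals(example_intervals, n_chunks=2)
```

In this example both chunks end up covering 100 bp, so jobs run per chunk would have comparable workloads, assuming run time scales roughly with interval length.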