
All Uncertain Significance about charger (closed)

ding-lab commented on May 26, 2024
All Uncertain Significance


Comments (1)

fernanda-rodrigues commented on May 26, 2024

Hi @ekofman

Thank you for your question and for using our tool.

The reason all your variants are being classified as uncertain significance is that you are not passing any additional parameters to CharGer. Put simply, the more information about your variants you give CharGer, the better your variant classification will be. For more information on the ACMG guidelines implemented in CharGer, please read: https://www.nature.com/articles/gim201530.pdf
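To see why a run without extra inputs ends at uncertain significance, it helps to look at how ACMG evidence codes are combined into a final call. Below is a minimal Python sketch of the combining rules from the ACMG guidelines linked above (Richards et al. 2015), written from our reading of those rules. CharGer's own scoring may differ, so treat this as an illustration rather than CharGer's code; the names `strength` and `combine_acmg` are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Strength prefixes of ACMG evidence codes (e.g. "PVS1" -> "PVS", "BP4" -> "BP").
_PREFIXES = ("PVS", "PS", "PM", "PP", "BA", "BS", "BP")


def strength(code: str) -> str:
    """Return the strength prefix of an evidence code, e.g. 'PM2' -> 'PM'."""
    for prefix in _PREFIXES:
        if code.startswith(prefix):
            return prefix
    raise ValueError(f"unrecognized evidence code: {code!r}")


def combine_acmg(codes: Iterable[str]) -> str:
    """Combine triggered evidence codes with the ACMG combining rules (sketch)."""
    n = Counter(strength(code) for code in codes)
    pvs, ps, pm, pp = n["PVS"], n["PS"], n["PM"], n["PP"]
    ba, bs, bp = n["BA"], n["BS"], n["BP"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    # Conflicting pathogenic and benign evidence is reported as uncertain.
    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "Uncertain significance"
    if pathogenic:
        return "Pathogenic"
    if likely_pathogenic:
        return "Likely pathogenic"
    if benign:
        return "Benign"
    if likely_benign:
        return "Likely benign"
    # No rule met (including the case where no evidence codes fired at all).
    return "Uncertain significance"
```

When no codes fire, for example because no ClinVar, gene list, or de novo inputs were supplied, `combine_acmg([])` falls through to "Uncertain significance", which mirrors the result you are seeing.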

Is your input VCF file annotated with VEP? If not, you can VEP-annotate your file within CharGer (please refer to the README). This should improve your results somewhat.

Adding some of the different parameters described in our README file should also make your analysis more precise.

For example, you can have CharGer access the ClinVar database by using the -l flag together with the --mac-clinvar-tsv file, which you can download from the MacArthur lab GitHub page (https://github.com/macarthur-lab/clinvar/tree/5b04ade4fb4d2f13ffd39e4a8d9ade9af28fdaf9). This allows CharGer to gather ClinVar information for your variants and improves variant classification. CharGer will soon accept input files downloaded directly from ClinVar, but you can use the MacArthur lab file for now.

You can also provide some of the cross-reference data files described below, or an allele frequency threshold for rarity (please refer to the README). For an example of CharGer applied in one of our studies, please refer to our PanCan Atlas germline paper: https://www.sciencedirect.com/science/article/pii/S0092867418303635?via%3Dihub#sec4
The cross-reference data files used in this study (a pathogenic variants .vcf file, an inheritanceGeneList that includes 152 known cancer predisposition genes, and a HotSpot3D clusters file) are available here: https://github.com/ding-lab/CharGer/tree/master/PanCanAtlasData
These files should give you a good example of their expected formats.
For a more in-depth description of some of the cross-reference data files you can use as input, please read below:

-z pathogenic variants, .vcf : this is a .vcf file with known pathogenic variants that you may compile yourself. This list is taken into account by CharGer when implementing the PS1 and PM5 ACMG evidence levels.
Depending on your study, you may compile a list of known pathogenic variants (confirmed in the literature and/or ClinVar) that are specific and/or relevant to your disease.
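The PS1/PM5 distinction can be sketched as follows. Per the ACMG guidelines, PS1 applies when the same amino acid change is already known to be pathogenic, and PM5 when a different missense change at the same residue is. This hypothetical sketch assumes the known pathogenic variants have already been reduced to protein changes (for example from VEP annotations); it does not read the .vcf file or compare nucleotide changes, and `ProteinChange` and `ps1_or_pm5` are made-up names, not part of CharGer.

```python
from typing import Iterable, NamedTuple, Optional


class ProteinChange(NamedTuple):
    """Hypothetical representation of a missense change such as GENE_A p.R100W."""
    gene: str
    position: int  # amino acid residue number
    ref_aa: str
    alt_aa: str


def ps1_or_pm5(query: ProteinChange, known_pathogenic: Iterable[ProteinChange]) -> Optional[str]:
    """Return 'PS1', 'PM5', or None for a missense change, given known pathogenic changes."""
    same_residue = [
        known for known in known_pathogenic
        if known.gene == query.gene and known.position == query.position
    ]
    # PS1: the same amino acid change is already known to be pathogenic.
    if any(known.alt_aa == query.alt_aa for known in same_residue):
        return "PS1"
    # PM5: a different missense change at the same residue is known to be pathogenic.
    if same_residue:
        return "PM5"
    return None
```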

-e expression matrix file, .tsv : this is a .tsv file with a column for each sample and a row for each gene. If you have expression data for the genes you are targeting or interested in, you can generate a matrix like this using RSEM, for example.
If you do not provide an expression matrix, CharGer counts every eligible truncation in your data set toward the PVS1 evidence level, without any expression filter.
If you provide expression data, a threshold of 0.2 is used: a truncation is allowed in the PVS1 evidence level when expression is below that threshold.
Note that the PVS1 evidence level requires the mode of inheritance to be dominant (assuming heterozygosity) and, if expression data is provided, co-occurrence with reduced gene expression.
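A minimal sketch of the expression rule described above. It assumes the matrix has a header row whose first cell names the gene column and whose remaining cells are sample IDs, and that the 0.2 threshold is compared directly against the values in the matrix (whether CharGer normalizes expression first is not stated here). Treating a gene or sample missing from the matrix like "no expression data" is also our assumption, and the function names are hypothetical.

```python
import csv
import io
from typing import Dict, Optional, TextIO

# Threshold quoted above; applying it to the raw matrix values is an assumption.
EXPRESSION_THRESHOLD = 0.2

Matrix = Dict[str, Dict[str, float]]


def load_expression_matrix(handle: TextIO) -> Matrix:
    """Parse a gene-by-sample .tsv into {gene: {sample: expression}}."""
    reader = csv.reader(handle, delimiter="\t")
    samples = next(reader)[1:]  # skip the gene column header
    return {
        row[0]: {sample: float(value) for sample, value in zip(samples, row[1:])}
        for row in reader
        if row
    }


def pvs1_truncation_allowed(
    gene: str,
    sample: str,
    mode_of_inheritance: str,
    matrix: Optional[Matrix] = None,
) -> bool:
    """Decide whether a truncation in `gene` for `sample` may count toward PVS1."""
    # PVS1 requires a dominant mode of inheritance (heterozygosity assumed).
    if "dominant" not in mode_of_inheritance.lower():
        return False
    # No expression matrix: every eligible truncation is kept.
    if matrix is None:
        return True
    value = matrix.get(gene, {}).get(sample)
    # Assumption: a gene/sample absent from the matrix is treated as having no expression data.
    if value is None:
        return True
    # With expression data, only reduced expression (below the threshold) supports PVS1.
    return value < EXPRESSION_THRESHOLD


matrix = load_expression_matrix(io.StringIO("gene\tS1\tS2\nGENE_A\t0.05\t3.1\n"))
```

Here GENE_A is a placeholder; a truncation in sample S1 (expression 0.05) would be allowed, while one in S2 (expression 3.1) would not.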

--inheritanceGeneList: is a tab-delimited file that should contain three columns: gene, disease, and mode of inheritance (autosomal dominant, autosomal recessive). Make sure to use approved HUGO symbols.
This file should be used when you have a list of known predisposition genes you would like CharGer to take into account. Several evidence levels use this list (PVS1, PSC1, PM4, PP2, and PPC1).
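A sketch of reading the three-column --inheritanceGeneList format described above into a lookup table. It assumes there is no header row and skips blank or `#` lines; both are assumptions rather than documented CharGer behavior. The function name is hypothetical and the two example rows are illustrative only.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO, Tuple


def load_inheritance_gene_list(handle: TextIO) -> Dict[str, List[Tuple[str, str]]]:
    """Map each HUGO gene symbol to its (disease, mode of inheritance) pairs."""
    genes: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        # Assumption: blank lines and '#' comment lines are skipped.
        if not row or row[0].startswith("#"):
            continue
        gene, disease, mode = (field.strip() for field in row[:3])
        genes[gene].append((disease, mode))
    return dict(genes)


example = io.StringIO(
    "BRCA1\thereditary breast and ovarian cancer\tautosomal dominant\n"
    "MUTYH\tMUTYH-associated polyposis\tautosomal recessive\n"
)
gene_list = load_inheritance_gene_list(example)
```

Keeping a list of pairs per gene allows one gene to appear with more than one disease.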

--PP2 Gene list: this is just a file with one gene per line (be sure to use approved HUGO symbols). Following the ACMG guidelines, this list should include susceptibility genes that have a low rate of benign missense variation and in which missense variants are a common mechanism of disease. Missense variants in any of these genes will fall into the PP2 evidence level.

--BP1 Gene list: same format as the PP2 Gene list. Following the ACMG guidelines, this list should include genes for which primarily truncating variants are known to cause disease. Missense variants in any of these genes will fall into the BP1 evidence level.
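The PP2/BP1 rules above amount to set membership for missense variants. In this hypothetical sketch, `missense_variant` is the Sequence Ontology term VEP reports for missense changes; the function names are made up and the gene symbols are placeholders.

```python
import io
from typing import List, Set, TextIO


def load_gene_set(handle: TextIO) -> Set[str]:
    """Read one HUGO gene symbol per line, ignoring blank lines."""
    return {line.strip() for line in handle if line.strip()}


def gene_list_evidence(gene: str, consequence: str, pp2_genes: Set[str], bp1_genes: Set[str]) -> List[str]:
    """Return PP2 and/or BP1 for a missense variant in a listed gene."""
    # Assumption: the consequence is VEP's Sequence Ontology term.
    if consequence != "missense_variant":
        return []
    codes = []
    if gene in pp2_genes:
        codes.append("PP2")
    if gene in bp1_genes:
        codes.append("BP1")
    return codes


pp2 = load_gene_set(io.StringIO("GENE_A\nGENE_B\n"))
bp1 = load_gene_set(io.StringIO("GENE_C\n"))
```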

-n de novo file: this is a standard maf (mutation annotation format) file; it should contain de novo variants with confirmed maternity and paternity and no family history. If provided, this file is taken into account in the PS2 ACMG evidence level. If you have this information for your dataset, please provide it using this argument.

-a assumed de novo file: this is a standard maf file as above; it should contain assumed de novo variants from your dataset, i.e. variants for which you have evidence that they are de novo but no maternity or paternity confirmation. If provided, this file is taken into account in the PM6 ACMG evidence level.

-c co-segregation file: this is also a standard maf file; this file should include variants cosegregating with disease in multiple affected family members in a gene definitively known to cause the disease (according to ACMG guidelines).
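The three family-based files above can be used as lookups keyed by variant. This sketch assumes standard MAF column names (`Chromosome`, `Start_Position`, `Reference_Allele`, `Tumor_Seq_Allele2`) and that the alternate allele sits in `Tumor_Seq_Allele2`. The reply does not name an evidence level for -c, so mapping it to PP1 (the ACMG co-segregation criterion) is an assumption. All function names are hypothetical.

```python
import csv
import io
from typing import AbstractSet, List, Set, TextIO, Tuple

# (chromosome, start position, reference allele, alternate allele)
VariantKey = Tuple[str, int, str, str]


def load_maf_keys(handle: TextIO) -> Set[VariantKey]:
    """Collect variant keys from a MAF file, skipping '#' lines such as '#version 2.4'."""
    rows = csv.DictReader((line for line in handle if not line.startswith("#")), delimiter="\t")
    # Assumption: the alternate allele is stored in Tumor_Seq_Allele2.
    return {
        (r["Chromosome"], int(r["Start_Position"]), r["Reference_Allele"], r["Tumor_Seq_Allele2"])
        for r in rows
    }


def family_evidence(
    key: VariantKey,
    de_novo: AbstractSet[VariantKey],
    assumed_de_novo: AbstractSet[VariantKey],
    cosegregating: AbstractSet[VariantKey],
) -> List[str]:
    """Return the family-based evidence codes triggered for one variant."""
    codes: List[str] = []
    # A confirmed de novo call (PS2) takes precedence over an assumed one (PM6).
    if key in de_novo:
        codes.append("PS2")
    elif key in assumed_de_novo:
        codes.append("PM6")
    # Assumption: the -c file maps to ACMG PP1 (co-segregation).
    if key in cosegregating:
        codes.append("PP1")
    return codes


maf_text = (
    "#version 2.4\n"
    "Hugo_Symbol\tChromosome\tStart_Position\tReference_Allele\tTumor_Seq_Allele2\n"
    "GENE_A\t17\t1000\tC\tT\n"
)
de_novo = load_maf_keys(io.StringIO(maf_text))
```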

-H HotSpot3D clusters file: this file can be generated by our HotSpot3D tool (https://github.com/ding-lab/hotspot3d), which identifies mutation hotspots from linear protein sequence and correlates the hotspots with known or potentially interacting domains, mutations, or drugs. If provided, this file is taken into account in the PM1 evidence level: if a germline variant is located in a mutational hotspot and/or a critical and well-established functional domain (e.g. the active site of an enzyme) without benign variation, the variant is flagged with the PM1 pathogenic criterion.
An example of this file, which was used in our PanCan study, is present here: https://github.com/ding-lab/CharGer/tree/master/PanCanAtlasData
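A sketch of the PM1 check described above. It assumes the HotSpot3D clusters file has already been parsed into per-gene sets of residues; the file's column layout is not described here, so `Cluster` and `pm1_flag` are hypothetical. A cluster that contains any known benign variant is skipped, mirroring the "without benign variation" condition.

```python
from typing import AbstractSet, FrozenSet, Iterable, NamedTuple, Tuple


class Cluster(NamedTuple):
    """Hypothetical parsed form of one HotSpot3D cluster within a single gene."""
    cluster_id: str
    gene: str
    residues: FrozenSet[int]


def pm1_flag(
    gene: str,
    position: int,
    clusters: Iterable[Cluster],
    benign: AbstractSet[Tuple[str, int]] = frozenset(),
) -> bool:
    """Return True if the residue falls in a hotspot cluster with no known benign variation."""
    for cluster in clusters:
        if cluster.gene != gene or position not in cluster.residues:
            continue
        # Skip clusters containing a residue with known benign variation.
        if any((cluster.gene, residue) in benign for residue in cluster.residues):
            continue
        return True
    return False


clusters = [
    Cluster("1", "GENE_A", frozenset({10, 12, 15})),
    Cluster("2", "GENE_A", frozenset({40, 41})),
]
```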

Applying some of these parameters and files should improve your results.
Hope this helps. Please let us know if you have any additional questions.

- Fernanda
