ccwang002 commented on June 5, 2024

hey @yubau1112, you have created quite a lot of GitHub issues here, and to be honest I still don't know what you are trying to achieve with CharGer. Here is my understanding of what you want to do: you want to run CharGer on your cancer WGS germline VCF and annotate your variants with ClinVar and ExAC.

If that's the case, I would recommend following the steps below:

  1. Determine the human genome reference version you are using (hg19/GRCh37 or hg38/GRCh38)
  2. Annotate your VCF using VEP
  3. Run CharGer

Step 1: Determine the human genome reference version

You need to be sure which genome reference you used to generate your germline VCF. If you are using hg19, you should run VEP with the GRCh37 cache and use the hg19 version of ClinVar. You cannot mix genome references.

Because you are using Demo/demo.vcf here, the reference is GRCh37. Below, I will assume your actual data uses the same genome reference.
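If you are not sure which reference your VCF uses, the contig lengths in its header are a quick hint, because chromosome 1 has a different length in each build (249,250,621 bp in GRCh37, 248,956,422 bp in GRCh38). Below is a minimal bash sketch, not part of CharGer, assuming your VCF header has `##contig` lines with a `length=` field (not every variant caller writes them). The function names `cat_vcf` and `guess_build` are made up for this example.

```shell
# Print a VCF to stdout, decompressing it first if it ends in .gz (works for bgzip too).
cat_vcf() {
  case "$1" in
    *.gz) gzip -dc -- "$1" ;;
    *)    cat -- "$1" ;;
  esac
}

# Guess the genome build from the length the header declares for chromosome 1.
# Heuristic only: it relies on ##contig=<ID=...,length=...> header lines.
guess_build() {
  len=$(cat_vcf "$1" | awk -F'[<>,=]' '
    !/^#/ { exit }                  # header is over; stop reading
    /^##contig=/ {
      id = ""; l = ""
      for (i = 1; i < NF; i++) {
        if ($i == "ID") id = $(i + 1)
        if ($i == "length") l = $(i + 1)
      }
      if (id == "1" || id == "chr1") { print l; exit }
    }')
  case "$len" in
    249250621) echo "GRCh37" ;;     # chr1 length in hg19/GRCh37
    248956422) echo "GRCh38" ;;     # chr1 length in hg38/GRCh38
    *)         echo "unknown" ;;
  esac
}
```

Usage: `guess_build your_wgs_germline.vcf.gz`. An `unknown` result only means the header doesn't settle the question; in that case check how the VCF was generated.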

Step 2: Annotate your VCF using VEP

You should have VEP installed and, preferably, VEP's cache set up. Here I use VEP v95 as an example, but any later version should also work. You should annotate your VCF with the following command:

vep --format vcf --vcf \
    --assembly GRCh37 \
    --everything --af_exac \
    --offline --cache --dir_cache /path/to/vep_cache/ \
    --fasta /path/to/vep_cache/homo_sapiens/95_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa \
    --input_file your_wgs_germline.vcf.gz \
    --output_file your_wgs_germline.vep95.vcf

The VEP command above writes the annotated VCF to your_wgs_germline.vep95.vcf. Note that it:

  • Assumes your VCF uses the hg19/GRCh37 genome reference
  • Assumes you have the VEP v95 cache set up locally at /path/to/vep_cache/homo_sapiens/95_GRCh37
  • Annotates your variants with 1000 Genomes, ExAC, and gnomAD population allele frequencies. We recommend gnomAD over ExAC, so if you don't specifically need ExAC, remove the --af_exac flag
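To check that the VEP step worked before moving on, you can look at the CSQ header line VEP writes into its VCF output (CSQ is VEP's default INFO field name). The bash/awk sketch below, with a made-up function name `list_csq_fields`, prints the annotation sub-fields listed after `Format:` in that header, so you can confirm the allele frequency columns are present. It assumes an uncompressed VCF such as your_wgs_germline.vep95.vcf.

```shell
# List the sub-field names VEP packed into the CSQ INFO field of an annotated VCF.
# Assumes VEP's default INFO field name (CSQ) and an uncompressed VCF.
list_csq_fields() {
  awk '
    !/^#/ { exit }                  # stop at the first data line
    /^##INFO=<ID=CSQ,/ {
      found = 1
      sub(/.*Format: /, "")         # keep only the Allele|Consequence|... list
      sub(/">$/, "")                # drop the closing quote and bracket
      n = split($0, fields, "|")
      for (i = 1; i <= n; i++) print fields[i]
    }
    END {
      if (!found) { print "no CSQ header found" > "/dev/stderr"; exit 1 }
    }
  ' "$1"
}
```

Usage: `list_csq_fields your_wgs_germline.vep95.vcf | grep AF`. The exact column names vary between VEP versions, so look for the population you asked for rather than a fixed name.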

Step 3: Run CharGer

Please start with the following and try to add extra options only after it works.

charger \
    -f your_wgs_germline.vep95.vcf \
    -o your_wgs_germline.charger.tsv \
    -l --mac-clinvar-tsv ~/CharGer3/CharGer/Demo/clinvar_alleles.multi.b37.tsv.gz \
    -D 

You will see that some of the ACMG/CharGer modules get disabled because they require additional annotations, but CharGer should still run successfully.
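If you need to run steps 2 and 3 on several samples, a small wrapper keeps the two commands consistent. This is a hedged bash sketch that uses only the options shown above; `VEP_CACHE`, `CLINVAR_TSV`, `DRY_RUN`, and the function names are placeholders made up for this example, and each sample is assumed to be named `<sample>.vcf.gz`. With `DRY_RUN=1` it only prints the commands, which is a cheap way to review them before a long run.

```shell
# Placeholder locations: point these at your own VEP cache and ClinVar TSV.
VEP_CACHE=${VEP_CACHE:-/path/to/vep_cache}
CLINVAR_TSV=${CLINVAR_TSV:-$HOME/CharGer3/CharGer/Demo/clinvar_alleles.multi.b37.tsv.gz}

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 2 (VEP) then Step 3 (CharGer) for one GRCh37 sample; expects <sample>.vcf.gz.
annotate_and_classify() {
  sample="$1"
  run vep --format vcf --vcf \
    --assembly GRCh37 \
    --everything --af_exac \
    --offline --cache --dir_cache "$VEP_CACHE/" \
    --fasta "$VEP_CACHE/homo_sapiens/95_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa" \
    --input_file "$sample.vcf.gz" \
    --output_file "$sample.vep95.vcf" || return 1
  run charger \
    -f "$sample.vep95.vcf" \
    -o "$sample.charger.tsv" \
    -l --mac-clinvar-tsv "$CLINVAR_TSV" \
    -D
}
```

Usage: `DRY_RUN=1 annotate_and_classify your_wgs_germline` to review, then drop `DRY_RUN=1` to actually run it. CharGer only starts if VEP exits successfully.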

If any of the steps above doesn't work, I will need the following information:

  • Your genome reference version
  • Your VEP version, command, and one of your VEP-annotated VCFs
  • Your CharGer command

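For the VEP-annotated VCF, a small slice (the full header plus a few records) is usually easier to share than the whole file. Here is a hedged bash sketch; `cat_vcf`, `vcf_slice`, and the default of 5 records are arbitrary choices for illustration, not a CharGer feature.

```shell
# Print a VCF to stdout, decompressing it first if it ends in .gz (works for bgzip too).
cat_vcf() {
  case "$1" in
    *.gz) gzip -dc -- "$1" ;;
    *)    cat -- "$1" ;;
  esac
}

# Keep every header line plus the first N variant records (default 5).
vcf_slice() {
  cat_vcf "$1" | awk -v n="${2:-5}" '
    /^#/     { print; next }        # all header lines, including #CHROM
    kept < n { print; kept++; next }
    { exit }                        # enough records; stop reading
  '
}
```

Usage: `vcf_slice your_wgs_germline.vep95.vcf 5 > vep95_slice.vcf`, then attach vep95_slice.vcf together with your commands.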
Please stop posting your installation logs, stop trying other CharGer options, and stop creating new issues here until you get the steps above working. I will comment on your other questions in detail later. Thank you.

from charger.

ccwang002 commented on June 5, 2024

Adding extra CharGer options

Please only consider these options after you successfully run through the three steps above.

The CharGer command above doesn't run all the modules because it lacks additional annotations. Those annotations depend on your disease, so there is no single set of annotations that works for everything, and you will very likely need to create your own if you are studying a different disease. If you are studying cancer, we have some example pan-cancer annotations under PanCanAtlasData; the files listed below can be found in that folder.

  • -z for all known pathogenic variants in your disease (pan-cancer example: emptyRemoved_20160428_pathogenic_variants_HGVSg_VEP.vcf.gz)
  • --inheritanceGeneList for the known inheritance mode of genes in your disease (pan-cancer example: 20160301_Rahman_KJ_KH_gene_table_CharGer.txt.gz)
  • -H HotSpot3D cluster file (pan-cancer example: MC3.noHypers.mericUnspecified.d10.r20.v114.clusters.gz)

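Once the minimal command works, the three options above can be added to it. A hedged bash sketch follows: `PANCAN_DIR` is a placeholder for wherever your copy of the PanCanAtlasData folder is, `require_files` and `run_charger_full` are made-up helper names, and `-D` is simply kept from the minimal command. Checking that every local file exists first is cheaper than finding out partway through a long run.

```shell
# Placeholder locations: point these at your own copies of the annotation files.
PANCAN_DIR=${PANCAN_DIR:-/path/to/PanCanAtlasData}
CLINVAR_TSV=${CLINVAR_TSV:-$HOME/CharGer3/CharGer/Demo/clinvar_alleles.multi.b37.tsv.gz}
PATHOGENIC_VCF="$PANCAN_DIR/emptyRemoved_20160428_pathogenic_variants_HGVSg_VEP.vcf.gz"
INHERITANCE_LIST="$PANCAN_DIR/20160301_Rahman_KJ_KH_gene_table_CharGer.txt.gz"
HOTSPOT_CLUSTERS="$PANCAN_DIR/MC3.noHypers.mericUnspecified.d10.r20.v114.clusters.gz"

# Report every missing or empty file; return non-zero if there was any.
require_files() {
  status=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty file: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Step 3 with the extra pan-cancer annotations; expects <sample>.vep95.vcf from Step 2.
run_charger_full() {
  sample="$1"
  require_files "$sample.vep95.vcf" "$CLINVAR_TSV" "$PATHOGENIC_VCF" \
    "$INHERITANCE_LIST" "$HOTSPOT_CLUSTERS" || return 1
  charger \
    -f "$sample.vep95.vcf" \
    -o "$sample.charger.tsv" \
    -l --mac-clinvar-tsv "$CLINVAR_TSV" \
    -z "$PATHOGENIC_VCF" \
    --inheritanceGeneList "$INHERITANCE_LIST" \
    -H "$HOTSPOT_CLUSTERS" \
    -D
}
```

Usage: `PANCAN_DIR=/your/copy/of/PanCanAtlasData run_charger_full your_wgs_germline`. For a disease other than cancer, swap in your own files for the three pan-cancer examples.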
Please also check out the detailed description of these options (and more) by my colleague at #18 (comment).

Using hg38 genome reference

If your VCF uses the hg38 genome reference, you need to change all the annotations. This affects at least these parameters:

ccwang002 commented on June 5, 2024

Finally, to answer the rest of your questions.

Why do the extra flags -l -t -E -x fail to work?
charger -f demo.vcf -o demo.ltEx.tsv -l -t -E -x --exac-vcf ~/CharGer3/CharGer/Demo/ExAC.r1.sites.vep.vcf --mac-clinvar-tsv ~/CharGer3/CharGer/Demo/clinvar_alleles.multi.b37.tsv.gz

These flags enable CharGer to look for the corresponding annotations. CharGer used to fall back on online APIs to fetch those annotations when no local copy was given (for example, querying the ExAC online database when --exac-vcf is not given), but many of those online APIs have changed and are quite buggy to use. So we now recommend using only local copies of the annotations.

If you already annotated the input VCF with VEP, many of the flags here are not necessary. Please just follow the 3 steps described above.

somebody found 3 bugs (https://www.jianshu.com/p/544caf92b24c) ...

These bugs and the proposed solutions concern the online API calls. However, as I mentioned above, fixing them probably won't fix everything. You will likely run into a new set of issues talking to all the online APIs (e.g., rate limits, API changes, etc.), so we no longer recommend this approach. If you are running CharGer on thousands of VCFs, using the online APIs will also be much slower than having a local copy of the annotations.

The options for calling online APIs will be removed in the next CharGer release, so these bugs will naturally go away in CharGer v0.6 and later.

What is the format of -d diseases?

It's the same format as --inheritanceGeneList. I think what you need here is actually --inheritanceGeneList and not -d. You can find the example file in my comment above.

yubau1112 commented on June 5, 2024

OK, thank you so much, I will try your 3-step recommendation.

fernanda-rodrigues commented on June 5, 2024

@yubau1112
We haven't heard back from you, so I assume you fixed the issue.
Please feel free to reopen the issue if you need further help.
I am closing it for now.

Thanks!
