Hey @yubau1112, you have created quite a lot of GitHub issues here, and to be honest, I still don't know what you are trying to achieve with CharGer. Here is my understanding of what you want to do: you want to run CharGer on your cancer WGS germline VCF and annotate your variants with ClinVar and ExAC.
If that's the case, I would recommend following the steps below:
- Determine the human genome reference version you are using (hg19/GRCh37 or hg38/GRCh38)
- Annotate your VCF using VEP
- Run CharGer
Step 1: Determine the human genome reference version
You need to be sure which genome reference you used to generate your germline VCF. If you are using hg19, you should run VEP with the GRCh37 cache and use the hg19 version of ClinVar. You cannot mix genome references.
Because you are using `Demo/demo.vcf` here, it's GRCh37, and below I will assume your actual data uses the same genome reference.
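If you are not sure which reference a VCF was built on, one quick check is the length of chromosome 1 in its `##contig` header lines: 249,250,621 bp in hg19/GRCh37 versus 248,956,422 bp in hg38/GRCh38. The helper below is a minimal sketch; `guess_assembly` is a hypothetical name, and it assumes your VCF header actually contains `##contig` lines with a `length=` field, which not every pipeline writes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: guess the assembly of a VCF from the chr1 length in its
# ##contig header lines. Assumes those lines exist and carry a length= field.
guess_assembly() {
  local vcf="$1" len
  # gzip -cdf decompresses .vcf.gz and passes plain-text .vcf through unchanged
  len=$(gzip -cdf "$vcf" | awk '
    /^#CHROM/ { exit }                         # end of header: stop reading
    /^##contig=<ID=(chr)?1,/ {                 # chr1 named "1" or "chr1"
      if (match($0, /length=[0-9]+/))
        print substr($0, RSTART + 7, RLENGTH - 7)
      exit
    }')
  case "$len" in
    249250621) echo "GRCh37" ;;   # chr1 length in hg19/GRCh37
    248956422) echo "GRCh38" ;;   # chr1 length in hg38/GRCh38
    *)         echo "unknown" ;;
  esac
}
```

For example, `guess_assembly your_wgs_germline.vcf.gz` prints `GRCh37`, `GRCh38`, or `unknown`. An `unknown` result only means the header doesn't say; check with whoever produced the VCF in that case.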
Step 2: Annotate your VCF using VEP
You should have VEP installed and, preferably, its cache set up. Here I use VEP v95 as an example, but any later version should also work. Annotate your VCF with the following command:
```bash
vep --format vcf --vcf \
  --assembly GRCh37 \
  --everything --af_exac \
  --offline --cache --dir_cache /path/to/vep_cache/ --fasta /path/to/vep_cache/homo_sapiens/95_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa \
  --input_file your_wgs_germline.vcf.gz \
  --output_file your_wgs_germline.vep95.vcf
```
The VEP command above produces `your_wgs_germline.vep95.vcf`. It:
- assumes your VCF uses the hg19/GRCh37 genome reference
- assumes you have the VEP v95 cache set up locally at `/path/to/vep_cache/homo_sapiens/95_GRCh37`
- annotates your variants with 1000 Genomes, ExAC, and gnomAD population allele frequencies. We recommend gnomAD over ExAC, so if you don't specifically need ExAC, remove the `--af_exac` flag
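Before moving on to Step 3, it may be worth confirming that VEP actually wrote its annotations. With `--vcf`, VEP stores them in the `CSQ` INFO field by default, and that field's header line lists the annotation columns after `Format:`. The sketch below prints those column names one per line; `csq_fields` is a hypothetical helper, not part of VEP or CharGer, and it assumes the default `CSQ` field name was kept.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the CSQ annotation columns declared in a VEP VCF header.
# Prints nothing if the file has no ##INFO=<ID=CSQ,...> line.
csq_fields() {
  gzip -cdf "$1" | awk '
    /^#CHROM/ { exit }                      # header is over; stop reading
    /^##INFO=<ID=CSQ,/ {
      sub(/.*Format: /, "")                 # keep only the column list
      sub(/">$/, "")                        # drop the closing quote and bracket
      n = split($0, cols, "|")
      for (i = 1; i <= n; i++) print cols[i]
      exit
    }'
}
```

Running `csq_fields your_wgs_germline.vep95.vcf` should list columns such as `Allele` and `Consequence`; empty output means the file was not annotated by VEP (or used a different INFO field name).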
Step 3: Run CharGer
Please start with the following command, and add extra options only after it works.
```bash
charger \
  -f your_wgs_germline.vep95.vcf \
  -o your_wgs_germline.charger.tsv \
  -l --mac-clinvar-tsv ~/CharGer3/CharGer/Demo/clinvar_alleles.multi.b37.tsv.gz \
  -D
```
You will see that some of the ACMG/CharGer modules are disabled because they require additional annotations, but CharGer should still run successfully.
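If you process several samples, it may help to wrap Steps 2 and 3 in one shell function so that the VEP output name and the CharGer input name always match. This is a minimal sketch that reuses the GRCh37 commands above unchanged; `run_vep_then_charger`, `VEP_CACHE`, and `CLINVAR_TSV` are hypothetical names, and `vep` and `charger` are assumed to be on your `PATH`.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run Step 2 (VEP v95, GRCh37) and then Step 3 (minimal CharGer).
# Usage: run_vep_then_charger INPUT_VCF OUTPUT_PREFIX
run_vep_then_charger() {
  local input="$1" prefix="$2"
  # Placeholders: override via VEP_CACHE / CLINVAR_TSV to match your setup
  local vep_cache="${VEP_CACHE:-/path/to/vep_cache}"
  local clinvar_tsv="${CLINVAR_TSV:-$HOME/CharGer3/CharGer/Demo/clinvar_alleles.multi.b37.tsv.gz}"
  local vep_vcf="${prefix}.vep95.vcf"

  # Step 2: annotate with VEP using the offline GRCh37 cache
  vep --format vcf --vcf \
    --assembly GRCh37 \
    --everything --af_exac \
    --offline --cache --dir_cache "$vep_cache" \
    --fasta "$vep_cache/homo_sapiens/95_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa" \
    --input_file "$input" \
    --output_file "$vep_vcf" || return 1   # stop if VEP fails

  # Step 3: run CharGer with the minimal option set only
  charger \
    -f "$vep_vcf" \
    -o "${prefix}.charger.tsv" \
    -l --mac-clinvar-tsv "$clinvar_tsv" \
    -D
}
```

For example, `run_vep_then_charger your_wgs_germline.vcf.gz your_wgs_germline` writes `your_wgs_germline.vep95.vcf` and then `your_wgs_germline.charger.tsv`. CharGer is skipped if VEP returns an error, so a failed annotation never feeds into Step 3.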
If any of the steps above doesn't work, I will need the following information:
- Your genome reference version
- Your VEP version, command, and one of your VEP-annotated VCFs
- Your CharGer command
Please stop posting your installation logs, trying other CharGer options, and creating new issues here until you get the steps above working. I will comment on your other questions in detail later. Thank you.
Adding extra CharGer options
Please only consider these options after you successfully run through the three steps above.
The CharGer command above doesn't run all the modules because it lacks the additional annotations. These annotations depend on your disease, so no single annotation set covers everything, and you will very likely need to create your own if you are studying a different disease. If you are studying cancer, we provide some example pan-cancer annotations under `PanCanAtlasData`; the files listed below can be found in that folder.
- `-z` for all known pathogenic variants in your disease (pan-cancer example: `emptyRemoved_20160428_pathogenic_variants_HGVSg_VEP.vcf.gz`)
- `--inheritanceGeneList` for the known inheritance mode of genes in your disease (pan-cancer example: `20160301_Rahman_KJ_KH_gene_table_CharGer.txt.gz`)
- `-H` HotSpot3D cluster file (pan-cancer example: `MC3.noHypers.mericUnspecified.d10.r20.v114.clusters.gz`)
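Once the minimal command works, the options above can be added one at a time. The sketch below keeps the Step 3 arguments in a bash array and appends each extra option separately, so any single line can be commented out while debugging. It assumes, hypothetically, that the pan-cancer files sit unmodified in `$HOME/CharGer3/CharGer/PanCanAtlasData` (override with `PANCAN_DIR`) and that your CharGer build accepts them as listed; `run_charger_pancan` is an illustrative name, not part of CharGer.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: the minimal Step 3 command plus the three pan-cancer options.
# Usage: run_charger_pancan VEP_ANNOTATED_VCF OUTPUT_TSV
run_charger_pancan() {
  local vep_vcf="$1" out_tsv="$2"
  # Assumed locations; override via PANCAN_DIR / CLINVAR_TSV
  local pancan="${PANCAN_DIR:-$HOME/CharGer3/CharGer/PanCanAtlasData}"
  local clinvar_tsv="${CLINVAR_TSV:-$HOME/CharGer3/CharGer/Demo/clinvar_alleles.multi.b37.tsv.gz}"

  # Start from the minimal command that already works...
  local -a args=(-f "$vep_vcf" -o "$out_tsv" -l --mac-clinvar-tsv "$clinvar_tsv" -D)
  # ...then add the disease-specific annotations one at a time.
  args+=(-z "$pancan/emptyRemoved_20160428_pathogenic_variants_HGVSg_VEP.vcf.gz")
  args+=(--inheritanceGeneList "$pancan/20160301_Rahman_KJ_KH_gene_table_CharGer.txt.gz")
  args+=(-H "$pancan/MC3.noHypers.mericUnspecified.d10.r20.v114.clusters.gz")

  charger "${args[@]}"
}
```

Using an array rather than one long string keeps each path as a single argument even if a directory name contains spaces.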
Please also check out the detailed description of these options (and more) by my colleague at #18 (comment).
Using hg38 genome reference
If your VCF uses the hg38 genome reference, you need to change all the annotations. This affects at least these parameters:
- `--mac-clinvar-tsv` should point to `clinvar_alleles.single.b38.tsv.gz`
- `-z` pathogenic variant VCF should be `emptyRemoved_20160428_pathogenic_variants_HGVSg_VEP_grch38lifOver.vcf`
- `-H` HotSpot3D clusters file should be `MC3.noHypers.mericUnspecified.d10.r20.v114.grch38liftOver.clusters`
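To avoid mixing GRCh37 and GRCh38 files, the choice can be made in one place. The hypothetical `annotation_files` function below prints the flag/file pairs for a given assembly: the GRCh38 names are the ones listed above, and the GRCh37 names come from the Step 3 command and the pan-cancer examples. It covers only these three parameters, and the file names are bare, so prepend your own directory.

```bash
#!/usr/bin/env bash
# Hypothetical lookup: print "flag file" pairs for the given genome assembly.
# Returns 1 (with a message on stderr) for an unrecognised assembly name.
annotation_files() {
  case "$1" in
    GRCh37|hg19)
      printf '%s\n' "--mac-clinvar-tsv clinvar_alleles.multi.b37.tsv.gz"
      printf '%s\n' "-z emptyRemoved_20160428_pathogenic_variants_HGVSg_VEP.vcf.gz"
      printf '%s\n' "-H MC3.noHypers.mericUnspecified.d10.r20.v114.clusters.gz"
      ;;
    GRCh38|hg38)
      printf '%s\n' "--mac-clinvar-tsv clinvar_alleles.single.b38.tsv.gz"
      printf '%s\n' "-z emptyRemoved_20160428_pathogenic_variants_HGVSg_VEP_grch38lifOver.vcf"
      printf '%s\n' "-H MC3.noHypers.mericUnspecified.d10.r20.v114.grch38liftOver.clusters"
      ;;
    *)
      echo "unknown assembly: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `annotation_files GRCh38` prints the three GRCh38 lines, while an unrecognised name such as `hg18` prints an error and returns 1.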
Finally, to answer the rest of your questions.
Why do the extra flags `-l -t -E -x` fail to work?
```bash
charger -f demo.vcf -o demo.ltEx.tsv -l -t -E -x --exac-vcf ~/CharGer3/CharGer/Demo/ExAC.r1.sites.vep.vcf --mac-clinvar-tsv ~/CharGer3/CharGer/Demo/clinvar_alleles.multi.b37.tsv.gz
```
These flags tell CharGer to look up the corresponding annotations. CharGer used to query online APIs for those annotations when no local copy was given (for example, searching the ExAC online database when `--exac-vcf` is not given), but many of those online APIs have changed and are quite buggy to use. So we now recommend using only local copies of the annotations.
If you already annotated the input VCF with VEP, many of the flags here are not necessary. Please just follow the 3 steps described above.
Somebody found 3 bugs (https://www.jianshu.com/p/544caf92b24c) ...
These bugs and the proposed solutions concern the online API calls. However, as I mentioned above, the fixes probably don't cover everything. You will likely run into a new set of issues talking to the online APIs (e.g., rate limits and API changes), so we no longer recommend this approach. If you are running CharGer on thousands of VCFs, using the online APIs will also be much slower than having a local copy of the annotations.
The options for calling online APIs will be removed in the next CharGer release, so these bugs will naturally go away in CharGer v0.6 and later.
What is the format of `-d diseases`?
It's the same format as `--inheritanceGeneList`. I think what you actually need here is `--inheritanceGeneList`, not `-d`. You can find the example file in my comment above.
OK, thank you so much, I will try your recommended 3 steps.
@yubau1112 We haven't heard back from you, so I assume you fixed the issue.
Please feel free to reopen the issue if you need further help.
I am closing it for now.
Thanks!