###### VEP 109.3
Ensembl's VEP (Variant Effect Predictor) is popular for how it picks a single effect per gene as detailed here, its CLIA-compliant HGVS variant format, and Sequence Ontology nomenclature for variant effects.
Instead of the official instructions, we will use mamba (conda, but faster) to install VEP and its dependencies. If you don't already have mamba, use these steps to download and install it into $HOME/mambaforge, then run a script that adds it to your $PATH:
curl -L https://github.com/conda-forge/miniforge/releases/download/4.12.0-0/Mambaforge-Linux-x86_64.sh -o /tmp/mambaforge.sh
sh /tmp/mambaforge.sh -bfp $HOME/mambaforge && rm -f /tmp/mambaforge.sh
. $HOME/mambaforge/etc/profile.d/conda.sh
You can add the following to your ~/.bashrc file to add mamba and conda to your $PATH whenever you log in:
if [ -f "$HOME/mambaforge/etc/profile.d/conda.sh" ]; then
. $HOME/mambaforge/etc/profile.d/conda.sh
fi
Use mamba to create and activate a conda environment with VEP, its dependencies, and other related tools:
mamba create -n vep
conda activate vep
mamba install -y -c conda-forge -c bioconda -c defaults ensembl-vep==109.3 htslib==1.14 bcftools==1.14 samtools==1.14 ucsc-liftover==377
cd "$CONDA_PREFIX/share"  # $CONDA_PREFIX is the root of the active vep environment
git clone https://github.com/mskcc/vcf2maf.git
cd vcf2maf
chmod 755 *.pl
cp *.pl ../../bin
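Before going further, it may be worth confirming that everything ended up on your $PATH. This is just a sketch: `missing_tools` is a made-up helper name, not something shipped with VEP or vcf2maf.

```shell
# Hypothetical helper (not part of VEP or vcf2maf): print each command that is
# not found on $PATH and return non-zero if any is missing.
missing_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found: $tool"
            missing=1
        fi
    done
    return $missing
}
```

With the vep environment active, you could call it as `missing_tools vep bcftools samtools bgzip tabix liftOver vcf2maf.pl`. It prints nothing when all of them resolve.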
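Then make two changes in vcf2maf.pl (each original is shown first, its replacement after the `-->`): drop `--af_esp` from the VEP command, and rename the `gnomAD_` columns in `@ann_cols` to `gnomADe_`: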
$vep_cmd .= ( $vep_script =~ m/vep$/ ? " --af_1kg --af_esp --af_gnomad" : " --maf_1kg --maf_esp" ) unless( $online );
-->
$vep_cmd .= ( $vep_script =~ m/vep$/ ? " --af_1kg --af_gnomad" : " --maf_1kg --maf_esp" ) unless( $online );
my @ann_cols = qw( Allele Gene Feature Feature_type Consequence cDNA_position CDS_position
Protein_position Amino_acids Codons Existing_variation ALLELE_NUM DISTANCE STRAND_VEP SYMBOL
SYMBOL_SOURCE HGNC_ID BIOTYPE CANONICAL CCDS ENSP SWISSPROT TREMBL UNIPARC RefSeq SIFT PolyPhen
EXON INTRON DOMAINS AF AFR_AF AMR_AF ASN_AF EAS_AF EUR_AF SAS_AF AA_AF EA_AF CLIN_SIG SOMATIC
PUBMED MOTIF_NAME MOTIF_POS HIGH_INF_POS MOTIF_SCORE_CHANGE IMPACT PICK VARIANT_CLASS TSL
HGVS_OFFSET PHENO MINIMISED GENE_PHENO FILTER flanking_bps vcf_id vcf_qual gnomAD_AF gnomAD_AFR_AF
gnomAD_AMR_AF gnomAD_ASJ_AF gnomAD_EAS_AF gnomAD_FIN_AF gnomAD_NFE_AF gnomAD_OTH_AF gnomAD_SAS_AF );
-->
my @ann_cols = qw( Allele Gene Feature Feature_type Consequence cDNA_position CDS_position
Protein_position Amino_acids Codons Existing_variation ALLELE_NUM DISTANCE STRAND_VEP SYMBOL
SYMBOL_SOURCE HGNC_ID BIOTYPE CANONICAL CCDS ENSP SWISSPROT TREMBL UNIPARC RefSeq SIFT PolyPhen
EXON INTRON DOMAINS AF AFR_AF AMR_AF ASN_AF EAS_AF EUR_AF SAS_AF AA_AF EA_AF CLIN_SIG SOMATIC
PUBMED MOTIF_NAME MOTIF_POS HIGH_INF_POS MOTIF_SCORE_CHANGE IMPACT PICK VARIANT_CLASS TSL
HGVS_OFFSET PHENO MINIMISED GENE_PHENO FILTER flanking_bps vcf_id vcf_qual gnomADe_AF gnomADe_AFR_AF
gnomADe_AMR_AF gnomADe_ASJ_AF gnomADe_EAS_AF gnomADe_FIN_AF gnomADe_NFE_AF gnomADe_OTH_AF gnomADe_SAS_AF );
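If you would rather not edit by hand, the two changes could be scripted roughly like this. It is only a sketch: `patch_vcf2maf` is a made-up helper name, and it assumes the lines in your copy of vcf2maf.pl match the ones quoted above exactly. It writes to stdout so you can diff the result before replacing anything.

```shell
# Hypothetical helper: apply both edits to a vcf2maf.pl-style file and write
# the patched text to stdout, leaving the original file untouched.
patch_vcf2maf() {
    # 1st expression: drop --af_esp from the VEP command line.
    # 2nd expression: rename gnomAD_ to gnomADe_, but only between the
    # "my @ann_cols = qw(" line and the next line containing ");".
    sed -e 's/ --af_1kg --af_esp --af_gnomad/ --af_1kg --af_gnomad/' \
        -e '/my @ann_cols = qw(/,/);/ s/gnomAD_/gnomADe_/g' \
        "$1"
}
```

For example, `patch_vcf2maf vcf2maf.pl > vcf2maf.patched.pl` followed by `diff vcf2maf.pl vcf2maf.patched.pl` should show only the lines listed above. Other uses of `gnomAD_` elsewhere in the script are deliberately left alone, since I only changed `@ann_cols`.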
Download VEP's offline cache for GRCh38, and the reference FASTA:
mkdir -p $HOME/.vep/homo_sapiens/109_GRCh38/
rsync -avr --progress rsync://ftp.ensembl.org/ensembl/pub/release-109/variation/indexed_vep_cache/homo_sapiens_vep_109_GRCh38.tar.gz $HOME/.vep/
tar -zxf $HOME/.vep/homo_sapiens_vep_109_GRCh38.tar.gz -C $HOME/.vep/
rsync -avr --progress rsync://ftp.ensembl.org/ensembl/pub/release-109/fasta/homo_sapiens/dna_index/ $HOME/.vep/homo_sapiens/109_GRCh38/
(Optional) Download VEP's offline cache for GRCh37, and the reference FASTA which we must bgzip instead of gzip:
mkdir -p $HOME/.vep/homo_sapiens/109_GRCh37/
rsync -avr --progress rsync://ftp.ensembl.org/ensembl/pub/release-109/variation/indexed_vep_cache/homo_sapiens_vep_109_GRCh37.tar.gz $HOME/.vep/
tar -zxf $HOME/.vep/homo_sapiens_vep_109_GRCh37.tar.gz -C $HOME/.vep/
rsync -avr --progress rsync://ftp.ensembl.org/ensembl/pub/grch37/release-109/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.dna.toplevel.fa.gz $HOME/.vep/homo_sapiens/109_GRCh37/
gzip -d $HOME/.vep/homo_sapiens/109_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz
bgzip -i $HOME/.vep/homo_sapiens/109_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa
samtools faidx $HOME/.vep/homo_sapiens/109_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz
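Before testing, it may help to confirm the downloads landed where the commands above put them. A minimal sketch, assuming the directory layout used in this post; `check_vep_data` is a made-up helper name.

```shell
# Hypothetical helper: check that the cache directory and the bgzipped FASTA
# with its .fai and .gzi indexes exist for one release/assembly pair.
check_vep_data() {
    vep_dir=$1      # e.g. $HOME/.vep
    release=$2      # e.g. 109
    assembly=$3     # e.g. GRCh38 or GRCh37
    cache="$vep_dir/homo_sapiens/${release}_${assembly}"
    fasta="$cache/Homo_sapiens.${assembly}.dna.toplevel.fa.gz"
    status=0
    if [ ! -d "$cache" ]; then
        echo "missing cache directory: $cache"
        status=1
    fi
    # -s is true only for files that exist and are not empty
    for f in "$fasta" "$fasta.fai" "$fasta.gzi"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"
            status=1
        fi
    done
    return $status
}
```

Calling `check_vep_data "$HOME/.vep" 109 GRCh38` (or `GRCh37`) prints whatever is missing and returns non-zero in that case. It only looks at file presence, not at whether the contents are valid.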
Test running VEP in offline mode on a GRCh38 VCF:
curl -sLO https://raw.githubusercontent.com/Ensembl/ensembl-vep/release/109/examples/homo_sapiens_GRCh38.vcf
vep --species homo_sapiens --assembly GRCh38 --offline --no_progress --no_stats \
  --sift b --polyphen b --ccds --uniprot --hgvs --symbol --numbers --domains \
  --gene_phenotype --canonical --protein --biotype --tsl --pubmed --variant_class \
  --shift_hgvs 1 --check_existing --total_length --allele_number --no_escape \
  --xref_refseq --failed 1 --vcf --minimal --flag_pick_allele \
  --pick_order canonical,tsl,biotype,rank,ccds,length \
  --af --af_1kg --regulatory \
  --dir $HOME/.vep \
  --fasta $HOME/.vep/homo_sapiens/109_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz \
  --input_file homo_sapiens_GRCh38.vcf --output_file homo_sapiens_GRCh38.vep.vcf
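To get a quick idea of whether the test run annotated anything, you could count the records that carry a CSQ entry, which is the INFO key VEP writes with `--vcf`. This is a rough sketch for an uncompressed VCF; `count_csq` is a made-up helper name.

```shell
# Hypothetical helper: count variant records in an uncompressed VCF and how
# many of them have a CSQ entry in the INFO column (column 8).
count_csq() {
    awk -F '\t' '
        /^#/ { next }                       # skip meta and header lines
        { total++ }                         # every remaining line is a record
        $8 ~ /(^|;)CSQ=/ { annotated++ }    # INFO contains a CSQ key
        END { printf "%d of %d records annotated\n", annotated, total }
    ' "$1"
}
```

Run it as `count_csq homo_sapiens_GRCh38.vep.vcf`. A count of zero annotated records would suggest the cache or FASTA was not picked up.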
Hi @coonya
Thanks a lot for your suggestions to change vcf2maf.pl.
Would you know what other changes would be needed for, say, maf2maf.pl, maf2vcf.pl, and vcf2vcf.pl?
Thanks in advance.