Giter Site home page Giter Site logo

Comments (2)

coonya avatar coonya commented on September 27, 2024

####### VEP 109.3
Ensembl's VEP (Variant Effect Predictor) is popular for how it picks a single effect per gene as detailed here, its CLIA-compliant HGVS variant format, and Sequence Ontology nomenclature for variant effects.

Instead of the official instructions, we will use mamba (conda, but faster) to install VEP and its dependencies. If you don't already have mamba, use these steps to download and install it into $HOME/mambaforge, then run a script that adds it to your $PATH:

curl -L https://github.com/conda-forge/miniforge/releases/download/4.12.0-0/Mambaforge-Linux-x86_64.sh -o /tmp/mambaforge.sh
sh /tmp/mambaforge.sh -bfp $HOME/mambaforge && rm -f mambaforge.sh
. $HOME/mambaforge/etc/profile.d/conda.sh

You can add the following to your ~/.bashrc file to add mamba and conda to your $PATH whenever you login:

if [ -f "$HOME/mambaforge/etc/profile.d/conda.sh" ]; then
    . $HOME/mambaforge/etc/profile.d/conda.sh
fi

Use mamba to create and activate a conda environment with VEP, its dependencies, and other related tools:

mamba create -n vep
conda activate vep
mamba install -y -c conda-forge -c bioconda -c defaults ensembl-vep==109.3 htslib==1.14 bcftools==1.14 samtools==1.14 ucsc-liftover==377
cd {home of vep environment}/share
git clone https://github.com/mskcc/vcf2maf.git
cd vcf2maf
chown 777 *.pl
cp *.pl ../../bin

change the vcf2maf.pl code:

$vep_cmd .= ( $vep_script =~ m/vep$/ ? " --af_1kg --af_esp --af_gnomad" : " --maf_1kg --maf_esp" ) unless( $online );

-->

$vep_cmd .= ( $vep_script =~ m/vep$/ ? " --af_1kg --af_gnomad" : " --maf_1kg --maf_esp" ) unless( $online );


my @ann_cols = qw( Allele Gene Feature Feature_type Consequence cDNA_position CDS_position
    Protein_position Amino_acids Codons Existing_variation ALLELE_NUM DISTANCE STRAND_VEP SYMBOL
    SYMBOL_SOURCE HGNC_ID BIOTYPE CANONICAL CCDS ENSP SWISSPROT TREMBL UNIPARC RefSeq SIFT PolyPhen
    EXON INTRON DOMAINS AF AFR_AF AMR_AF ASN_AF EAS_AF EUR_AF SAS_AF AA_AF EA_AF CLIN_SIG SOMATIC
    PUBMED MOTIF_NAME MOTIF_POS HIGH_INF_POS MOTIF_SCORE_CHANGE IMPACT PICK VARIANT_CLASS TSL
    HGVS_OFFSET PHENO MINIMISED GENE_PHENO FILTER flanking_bps vcf_id vcf_qual gnomAD_AF gnomAD_AFR_AF
    gnomAD_AMR_AF gnomAD_ASJ_AF gnomAD_EAS_AF gnomAD_FIN_AF gnomAD_NFE_AF gnomAD_OTH_AF gnomAD_SAS_AF );

-->

my @ann_cols = qw( Allele Gene Feature Feature_type Consequence cDNA_position CDS_position
    Protein_position Amino_acids Codons Existing_variation ALLELE_NUM DISTANCE STRAND_VEP SYMBOL
    SYMBOL_SOURCE HGNC_ID BIOTYPE CANONICAL CCDS ENSP SWISSPROT TREMBL UNIPARC RefSeq SIFT PolyPhen
    EXON INTRON DOMAINS AF AFR_AF AMR_AF ASN_AF EAS_AF EUR_AF SAS_AF AA_AF EA_AF CLIN_SIG SOMATIC
    PUBMED MOTIF_NAME MOTIF_POS HIGH_INF_POS MOTIF_SCORE_CHANGE IMPACT PICK VARIANT_CLASS TSL
    HGVS_OFFSET PHENO MINIMISED GENE_PHENO FILTER flanking_bps vcf_id vcf_qual gnomADe_AF gnomADe_AFR_AF
    gnomADe_AMR_AF gnomADe_ASJ_AF gnomADe_EAS_AF gnomADe_FIN_AF gnomADe_NFE_AF gnomADe_OTH_AF gnomADe_SAS_AF );

Download VEP's offline cache for GRCh38, and the reference FASTA:

mkdir -p $HOME/.vep/homo_sapiens/109_GRCh38/
rsync -avr --progress rsync://ftp.ensembl.org/ensembl/pub/release-109/variation/indexed_vep_cache/homo_sapiens_vep_109_GRCh38.tar.gz $HOME/.vep/
tar -zxf $HOME/.vep/homo_sapiens_vep_109_GRCh38.tar.gz -C $HOME/.vep/
rsync -avr --progress rsync://ftp.ensembl.org/ensembl/pub/release-109/fasta/homo_sapiens/dna_index/ $HOME/.vep/homo_sapiens/109_GRCh38/

(Optional) Download VEP's offline cache for GRCh37, and the reference FASTA which we must bgzip instead of gzip:

mkdir -p $HOME/.vep/homo_sapiens/109_GRCh37/
rsync -avr --progress rsync://ftp.ensembl.org/ensembl/pub/release-109/variation/indexed_vep_cache/homo_sapiens_vep_109_GRCh37.tar.gz $HOME/.vep/
tar -zxf $HOME/.vep/homo_sapiens_vep_109_GRCh37.tar.gz -C $HOME/.vep/
rsync -avr --progress rsync://ftp.ensembl.org/ensembl/pub/grch37/release-109/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.dna.toplevel.fa.gz $HOME/.vep/homo_sapiens/109_GRCh37/
gzip -d $HOME/.vep/homo_sapiens/109_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz
bgzip -i $HOME/.vep/homo_sapiens/109_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa
samtools faidx $HOME/.vep/homo_sapiens/109_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz

Test running VEP in offline mode on a GRCh38 VCF:

curl -sLO https://raw.githubusercontent.com/Ensembl/ensembl-vep/release/109/examples/homo_sapiens_GRCh38.vcf
vep --species homo_sapiens --assembly GRCh38 --offline --no_progress --no_stats --sift b --ccds --uniprot --hgvs --symbol --numbers --domains --gene_phenotype --canonical --protein --biotype --tsl --pubmed --variant_class --shift_hgvs 1 --check_existing --total_length --allele_number --no_escape --xref_refseq --failed 1 --vcf --minimal --flag_pick_allele --pick_order canonical,tsl,biotype,rank,ccds,length --dir $HOME/.vep --fasta $HOME/.vep/homo_sapiens/109_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz --input_file homo_sapiens_GRCh38.vcf --output_file homo_sapiens_GRCh38.vep.vcf --polyphen b --af --af_1kg --af_esp --regulatory

from vcf2maf.

tamuanand avatar tamuanand commented on September 27, 2024

Hi @coonya

Thanks a lot for your suggestions to change vcf2maf.pl.

Would you know what other changes would be needed for say maf2maf.pl, maf2vcf.pl, vcf2vcf.pl.

Thanks in advance.

from vcf2maf.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.