ensembl / vep_plugins

Plugins for the Ensembl Variant Effect Predictor (VEP)

License: Apache License 2.0

Perl 100.00%

vep_plugins's People

genome-vendor njayaram14 pamag cartersgenes konradjk fcunningham willmclaren apastore mmesbahu brspurri pboutet guillermomarco david-a-parry nihar12 chia-ching shameer snashraf supernifty johnmcma tiratatp mkohram sambrightman abjonnes tgreen41474 xflicsu abhisheknrl sarahhunt helensch twelvesummer mimame logust79 derkelly jprnz liubo2012 ima23 sanger-cosmic ens-lgil jye-lee aparton stekaz hsiaoyi0504 inambioinfo zhouhufeng boutrys crazymaribell tacaca kkchau tingszhang dglemos emiag linhxxx chadohyeon ens-emily snehapandey miguelpmachado demian1 akotlar cccnrc daffay shu2010 marchoeppner tatianaliu zhk8111 booew limbus-medtec raziafrooz barrydigby ktbiotech kwsamarasinghe melnel000 mlebeur puva antonkulaga commandlinegirl niaz-lab-ux vlad-dembrovskyi leequn olaaustine thoughtsynapse git-jemiller navenm likhitha-surapaneni sukritipaul05 nakib103 yysu0815 nuno-agostinho mfasnacht jamie-m-a diegomscoelho ntm qistark bryanlaura736 sysbiocoder biomguler nbalanda23 migmarbor janapet alexbaras lenapfitzer lucasmiranda42

vep_plugins's Issues

PolyPhen and SIFT score discrepancies between VEP 88 and dbNSFP v3.0

I'm hitting an issue where PolyPhen and SIFT scores differ between VEP and dbNSFP.
Here are some examples:

SIFT mismatch:
CHROM  POS        REF  ALT  VEP-SIFT  VEP-PRED  DBNSFP-SIFT              DBNSFP-PRED
1      55474215   G    A    0.01      D         0.069                    T
9      117783486  G    C    0.05      T         .,0.0,0.0,0.0,0.041,0.0  .,D,D,D,D,D
18     44109166   G    A    0.05      D         0.093,.,.,0.063,.        T,.,.,T,.

PolyPhen mismatch:
CHROM  POS        REF  ALT  VEP_POLYPHEN  VEP_PRED  DBNSFP_POLY_HDIV   DBNSFP_PRED
1      6485211    C    A    0.044         B         0.999              D
3      127336823  G    A    0.177         B         0.982,0.596,0.596  D,P,P
10     73574953   G    A    0.243         B         1.0                D

Is there any reason behind these discrepancies?
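A note on reading the dbNSFP columns. dbNSFP stores one value per Ensembl protein/transcript in a single field. In the table above these are comma separated, though the raw files may use ';'. A '.' marks a transcript with no score. Each VEP value in the table, by contrast, is a single number. Below is a minimal sketch for unpacking such a field before comparing it with VEP's value. The function names are illustrative and are not part of VEP or dbNSFP.

```python
import re
from typing import List, Optional, Tuple


def parse_dbnsfp_scores(field: str) -> List[Optional[float]]:
    """Split a multi-valued dbNSFP score field into one entry per transcript.

    Accepts ',' or ';' as the separator; '.' (no score) becomes None.
    """
    return [None if v == "." else float(v) for v in re.split(r"[,;]", field)]


def score_range(field: str) -> Optional[Tuple[float, float]]:
    """Return (min, max) of the non-missing scores, or None if all are missing."""
    scores = [s for s in parse_dbnsfp_scores(field) if s is not None]
    return (min(scores), max(scores)) if scores else None


# SIFT fields taken from the mismatch table above
print(parse_dbnsfp_scores(".,0.0,0.0,0.0,0.041,0.0"))  # [None, 0.0, 0.0, 0.0, 0.041, 0.0]
print(score_range("0.093,.,.,0.063,."))                # (0.063, 0.093)
```

For 9:117783486, for example, the dbNSFP SIFT values span 0.0 to 0.041 across transcripts. Knowing which transcript each VEP value refers to therefore matters before calling two scores a mismatch.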

Issue Installing dbscSNV1.1 for GRCh38 as plugin for VEP

Installation for GRCh37 went without trouble, but I still have an issue with the GRCh38 version.
My code (taken from https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html#exac):
wget ftp://dbnsfp:[email protected]/dbscSNV1.1.zip
unzip dbscSNV1.1.zip
head -n1 dbscSNV1.1.chr1 > h
cat dbscSNV1.1.chr* | grep -v ^chr | sort -k5,5 -k6,6n | cat h - | bgzip -c > dbscSNV1.1_GRCh38.txt.gz
tabix -s 5 -b 6 -e 6 -c c dbscSNV1.1_GRCh38.txt.gz

But I'm getting this message:
[E::get_intv] failed to parse TBX_GENERIC, was wrong -p [type] used?
The offending line was: "10 11374604 A C . . y n UTR3 CELF2 . . UTR3 ENSG00000048740 . . 0.00378177909178219 ."
Segmentation fault (core dumped)

Has anyone solved this issue?
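The offending line has '.' in columns 5 and 6, which the tabix command above uses as the GRCh38 chromosome and position. That row therefore cannot be indexed on GRCh38. One way to build the file is to drop such rows before sorting, in the same spirit as the dbNSFP 4 recipe later on this page, which filters with awk '$8 != "."'. Below is a minimal Python sketch of the filtering and sorting step. It assumes tab-separated input where the first line is the header and any later line starting with "chr" is a repeated per-file header.

```python
# 0-based indices of columns 5 and 6, the GRCh38 chromosome and position
# used by `tabix -s 5 -b 6 -e 6` above
CHR_COL, POS_COL = 4, 5


def prepare_grch38(lines):
    """Return the header plus data rows that have GRCh38 coordinates, sorted.

    Rows whose column 5 or 6 is "." (like the offending line) are dropped,
    later lines starting with "chr" (repeated headers) are skipped, and the
    rest are ordered like `sort -k5,5 -k6,6n`.
    """
    lines = iter(lines)
    header = next(lines).rstrip("\n")
    kept = []
    for line in lines:
        if line.startswith("chr"):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[CHR_COL] == "." or fields[POS_COL] == ".":
            continue
        kept.append(fields)
    kept.sort(key=lambda f: (f[CHR_COL], int(f[POS_COL])))
    return [header] + ["\t".join(f) for f in kept]
```

Fed with the concatenated dbscSNV1.1.chr* files, the returned lines could then be written out, compressed with bgzip -c and indexed with the same tabix -s 5 -b 6 -e 6 -c c command as before.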

Downstream plugin - incorrect mutant protein sequence

Hello,

I'm using the Downstream plugin to get the mutant protein sequence for frameshift variants annotated by VEP (GRCh37), and I've found a case where the wrong sequence is returned:

Input to VEP : 1 115256528 115256528 T/- +
Expected Downstream sequence : KSTVP*
Returned Downstream sequence : EEYSAMRDQYMRTGEGFLCVFAINNSKSFADINLYREQIKRVKDSDDVPMVLVGNKCDLPTRTVDTKQAHELAKSYGIPFIETSAKTRQGVEDAFYTLVREIRQYRMKKLNSSDDGTQGCMGLPCVVM

The 'ProteinLengthChange' is not as expected either.

I've confirmed with Mutalyzer that the expected sequence is correct.

I'm using VEP cache in offline mode, for the GRCh37 assembly.

Could you help with this, please?

Error installing CADD plugin

I'm trying to install the CADD (offline) plugin for VEP:

I downloaded Miniconda:
wget -c https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh
bash Miniconda2-latest-Linux-x86_64.sh -p ~/miniconda2 -b
##SPECIFIC WINDOWS setup
export PATH="$HOME/miniconda2/bin:$PATH"

Then I cloned the repository:
git clone https://github.com/kircherlab/CADD-scripts.git

From here on I'm in the directory where I cloned CADD-scripts, and I checked that conda is in the PATH,
but when I launch ./install.sh I get this error (see attached file):
installSH_error.txt

bad interpreter: No such file or directory

I think it might be due to Windows/Unix line endings, but I have no idea how to fix it.
I tried the first two procedures described in https://stackoverflow.com/questions/14219092/bash-my-script-bin-bashm-bad-interpreter-no-such-file-or-directory
but it still doesn't work.

Does anyone have a solution or an idea? :)

Thanks in advance
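"bad interpreter: No such file or directory" usually means the shebang line ends in a carriage return (#!/bin/bash\r). In other words, the script has Windows (CRLF) line endings, so the system looks for an interpreter literally named "bash\r". Tools such as dos2unix, or sed -i 's/\r$//' install.sh, are the usual fix. Below is a Python sketch of the same in-place conversion. The function name is illustrative.

```python
from pathlib import Path


def crlf_to_lf(path):
    """Rewrite a file in place with Unix (LF) line endings.

    Returns True if the file contained CRLF endings and was changed.
    """
    p = Path(path)
    data = p.read_bytes()
    fixed = data.replace(b"\r\n", b"\n")
    if fixed != data:
        p.write_bytes(fixed)
        return True
    return False
```

Running crlf_to_lf("install.sh") (and on any other scripts install.sh calls) before ./install.sh should remove the stray \r. Configuring git with core.autocrlf=false before cloning avoids the conversion in the first place.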

Generalise the ExACpLI.pm

I would like to use the ExACpLI.pm plugin with the new pLI scores from gnomad: https://storage.googleapis.com/gnomad-public/release/2.1/ht/constraint/constraint.txt.bgz

However, since the plugin is called ExACpLI, when I supply the gnomAD pLI scores the CSQ field in the VCF header will say "ExACpLI", which is not accurate. Since the plugin accepts values files other than ExAC's, would it not be more appropriate to rename it to something more generic, such as "pLI.pm", or to allow the field name in the CSQ to be changed?

dbNSFP.pm went wrong

Hi there,
We hit a syntax error when using VEP with dbNSFP.pm to annotate variants.

perl /home/zoeching/Tools/src/ensembl-tools/scripts/variant_effect_predictor/variant_effect_predictor.pl \
    --offline \
    --fork 10 \
    --pick \
    --merged /home/zoeching/pku/Ref \
    --everything \
    --force_overwrite \
    --vcf \
    --vcf_info_field CSQ \
    --plugin dbNSFP,/home/zoeching/pku/Ref/dbNSFPv2.9.1/dbNSFP.gz,SIFT_score,SIFT_converted_rankscore,SIFT_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,MutationTaster_score,MutationTaster_converted_rankscore,MutationTaster_pred,CADD_raw,CADD_raw_rankscore,CADD_phred,GERP++_NR,GERP++_RS,GERP++_RS_rankscore \
    --plugin LoF,human_ancestor_fa:/home/zoeching/pku/Ref/human_ancestor.fa.gz \
    -i /work1/ASD/Data/other/ASD569_1.gatk-queue_raw_snps.vcf \
    -o ASD569_1.gatk-queue_raw_ann_snps.vcf

After running the command, I got this:

The output (if any) follows:

2016-09-04 16:02:10 - Read existing cache info
2016-09-04 16:02:10 - INFO: Disabling --hgvs; using --offline and no FASTA file found
2016-09-04 16:02:10 - Failed to compile plugin dbNSFP: syntax error at /home/zoeching/Tools/library/perl_module/lib/perl5/dbNSFP.pm line 304, near "s/[|]/&/gr"
Compilation failed in require at (eval 68) line 2.
BEGIN failed--compilation aborted at (eval 68) line 2.

2016-09-04 16:02:11 - Loaded plugin: LoF
2016-09-04 16:02:11 - Starting...
2016-09-04 16:02:11 - Detected format of input file as vcf
2016-09-04 16:02:11 - Read 5000 variants into buffer
2016-09-04 16:02:11 - Calculating consequences

And this:

Bareword found where operator expected at /home/zoeching/Tools/library/perl_module/lib/perl5/dbNSFP.pm line 304, near "s/[|]/&/gr"

I didn't understand the meaning of the "r" in "s/[|]/&/gr" on line 304, so I deleted the "r" and ran the command again. This time no errors came out, but the annotation failed: the dbNSFP annotation columns were empty. I can successfully use "chr pos" to grep information from the dbNSFP2.9.1_variant.chr* files, so the input file is okay.

Has anyone met this error before? Is something wrong with dbNSFP.pm (line 304)? What does the "r" mean here?
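For context, the r modifier makes s/// non-destructive: it returns the modified copy and leaves the variable untouched. It was added in Perl 5.14, and older interpreters reject it with exactly the syntax error above. Without r, s///g edits the variable in place, and the expression itself evaluates to the number of substitutions (an empty string if there were none). Because /r is only useful when the return value is used, deleting it means that code receives a count or an empty value instead of the new string. That would be consistent with the empty columns. Below is a Python sketch contrasting the two return values. Python's re.sub and re.subn stand in for the Perl forms; this is not the plugin's code.

```python
import re

value = "a|b|c"

# Perl: my $new = $value =~ s/[|]/&/gr;
# Returns the edited copy; $value itself is unchanged.
new_value = re.sub(r"[|]", "&", value)

# Perl: my $n = ($value =~ s/[|]/&/g);
# Edits $value in place; the expression yields the substitution count.
edited, count = re.subn(r"[|]", "&", value)

print(new_value)  # a&b&c
print(count)      # 2
```

Checking perl -v on the machine running VEP shows whether the interpreter predates 5.14.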

dbNSFP version from filename SOLVED

Hi, I have a problem with my dbNSFP plugin:
WARNING: Failed to instantiate plugin dbNSFP: ERROR: Could not retrieve dbNSFP version from filename /home/ertan/Desktop/vep/dbNSFP3/dbNSFP_hg19.gz.
How can I solve this?

The command I used:
./vep --fork 4 -cache -port 3337 --use_given_ref --symbol --tab --offline --refseq -i /home/ertan/Desktop/pipeline/playground/NECESSARY/DE-FMF1/DE-FMF1.vcf -o /home/ertan/Desktop/fmf.txt --registry ensembl.registry --polyphen p --sift b --coding_only --genomes --species homo_sapiens --custom /home/ertan/Desktop/vep/clinvar/clinvar_20190513.vcf.gz,ClinVar,vcf,exact,0,CLNSIG --force_overwrite --plugin GO --plugin LOVD --custom /home/ertan/Desktop/vep/gnomAD-exomes/gnomad.exomes.r2.0.1.sites.noVEP.vcf.gz,gnomADg,vcf,exact,0 --plugin dbscSNV,/home/ertan/Desktop/vep/dbscSNV/dbscSNV1.1/dbscSNV1.1_GRCh37.txt.gz --assembly GRCh37 --plugin SpliceRegion,Extended --offline --hgvs --fasta /home/ertan/Desktop/vep/fasta/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa --plugin dbNSFP,/home/ertan/Desktop/vep/dbNSFP3/dbNSFP_hg19.gz,SIFT_score,SIFT_converted_rankscore,SIFT_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,MutationTaster_score,MutationTaster_converted_rankscore,MutationTaster_pred,CADD_raw,CADD_raw_rankscore,CADD_phred,GERP++_NR,GERP++_RS,GERP++_RS_rankscore

[SOLVED] I have an error with the dbNSFP plugin

Hello,
I have a problem with the dbNSFP plugin in VEP.

The command I'm running is:

./vep --fork 4 --force_overwrite --coding_only --cache --port 3337 --symbol --tab --polyphen p --sift p --pubmed --hgvs --fasta /home/ertan/Desktop/programlar/ensembl-data/fasta/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa --plugin LOVD --plugin Condel,/home/.vep/Plugins/config/Condel/config/condel_SP.conf,b --plugin dbNSFP,/home/ertan/Desktop/programlar/ensembl-data/dbNSFP3.5/dbNSFP3.5.gz,ALL --plugin GO --plugin dbscSNV,/home/$USER/Desktop/programlar/ensembl-data/dbscSNV/dbscSNV1.1/dbscSNV1.1_GRCh37.txt.gz --plugin G2P,/home/$USER/Desktop/programlar/ensembl-data/G2P/DDG2P_17_5_2019.csv.gz -i /home/ertan/Desktop/pipeline/playground/NECESSARY/166527/166527.vcf -o /home/$USER/Desktop/deneme.csv

I think the relevant error output is:
Use of uninitialized value $readme_file in concatenation (.) or string at /home/ertan/.vep/Plugins/dbNSFP.pm line 218.

Something not right with Drosophila melanogaster in Ensembl v96 Phenotypes.pm plugin

I tried to generate the GFF file for Drosophila melanogaster BDGP6.22 for offline use of the Phenotypes.pm plugin with VEP v96, but got an error.

Here is the command I ran:

vep -i drosophila.vcf --cache --dir_cache /fdb/VEP/96/cache --fasta Drosophila_melanogaster.BDGP6.22.dna.toplevel.fa --species drosophila_melanogaster --plugin Phenotypes

Here is the resulting error:

### Phenotypes plugin: This will take some time but it will only run once per species, assembly and release
### Phenotypes plugin: Querying database
WARNING: Failed to instantiate plugin Phenotypes: Can't call method "dbc" on an undefined value at /fdb/VEP/96/cache/Plugins/Phenotypes.pm line 196.

Earlier versions (88 through 95) finished normally; only 96 failed.

Conservation plugin

Hello,

I'm getting an error while using the Conservation plugin with VEP 71 and Ensembl 71:

ERROR: Forked process failed
Use of uninitialized value in numeric ge (>=) at /home/likewise-open/SGNET/gmarco/.vep/Plugins/Conservation.pm line 105.

PolyPhen_SIFT plugin missing on release 88

I'm attempting to update to VEP 88, but notice that the install lacks the PolyPhen_SIFT plugin.
Is this an intentional omission or an oversight?

Installation REVEL plugin

Hi,

the REVEL plugin installation instructions seem to be outdated.
The command given is:

./vep -i variations.vcf --plugin REVEL,/path/to/revel/data.tsv.gz

But at the given URL (https://sites.google.com/site/revelgenomics/downloads), only CSV files are downloadable.

Is there another download URL available for the TSV, or are there instructions how to convert the data?

Best,
Marc
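Until the instructions are updated, here is a minimal sketch of one possible conversion. It assumes the plugin expects a tab-separated, bgzip-compressed, tabix-indexed file with chromosome and position in the first two columns. Check the column order of the downloaded CSV against the plugin's documentation before relying on it. The header is prefixed with '#' because tabix treats lines starting with '#' as comments by default. The function name is illustrative.

```python
import csv


def revel_csv_to_tsv(lines):
    """Yield tab-separated versions of REVEL CSV lines.

    The first (header) line gets a leading '#', so tabix, whose default
    comment character is '#', will not try to index it as a data row.
    """
    for i, row in enumerate(csv.reader(lines)):
        line = "\t".join(row)
        yield "#" + line if i == 0 else line
```

The converted lines would then be written to a file, compressed with bgzip and indexed with something like tabix -s 1 -b 2 -e 2. The resulting .tsv.gz path is what goes after --plugin REVEL.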

Should the Downstream plugin include the 3'UTR when generating the downstream sequence?

The Downstream plugin only considers the coding sequence when producing the shifted sequence. What if I have a frame shift that removes the stop codon?

For instance, this one:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT
chr1    100484696   .   A   AC  5000    .   .   .

which affects transcript ENSMUST00000086738. The last bit of the coding sequence plus the 3'UTR looks like this:

TTC ATC TGA ACT ATT GTG TGG TCA TCT GGT CCT CTT TTT TGC AGA GGT TTC CAT CTC TTT TTC TTT TCT TTC TTT TAA
        ^ stop codon

and the variant changes it to

TTC ACT CTG AAC TAT TGT GTG GTC ATC TGG TCC (...)

Should part of the 3'UTR be translated to protein? Likewise, should stop_lost annotations then also be supported by the plugin?
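Whether the plugin should do this is a design question, but the effect is easy to see by translating the shifted frame past the original stop. Below is a minimal sketch using the standard genetic code. The helper is hypothetical, not the Downstream plugin's code. It is applied to the sequence shown above, with the extra C inserted after TTCA as the changed sequence indicates.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_until_stop(seq):
    """Translate frame 0 of seq; return (protein, reached_stop).

    Stops at the first stop codon. If the sequence runs out first,
    reached_stop is False; a trailing partial codon is ignored.
    """
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            return "".join(protein), True
        protein.append(aa)
    return "".join(protein), False


# Last coding codons plus the 3'UTR shown above for ENSMUST00000086738
ref = ("TTC ATC TGA ACT ATT GTG TGG TCA TCT GGT CCT CTT TTT TGC AGA GGT "
       "TTC CAT CTC TTT TTC TTT TCT TTC TTT TAA").replace(" ", "")
# The A>AC insertion adds a C after "TTCA", shifting the frame
alt = ref[:4] + "C" + ref[4:]

print(translate_until_stop(ref))  # ('FI', True)
print(translate_until_stop(alt))  # no stop codon within the shown sequence
```

In this example the shifted frame reads through all of the UTR sequence shown without meeting a stop codon. Answering the question for this transcript would therefore need the full 3'UTR, plus a rule for what to report if translation runs off the end of the transcript.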

generate LD fields with population suffix

Currently, all results from the LD plugin share the same result field name (LinkedVariants); it would be good to have a population suffix, such as LinkedVariants_CEU.

LD plugin doesn't support generating result for multiple populations at the same time

It would be good if the LD plugin could generate results for more than one population at the same time, for example by running it as --plugin LD,1000GENOMES:phase_3:CDX,1000GENOMES:phase_3:FIN.
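The two requests above fit together. If the plugin accepted a comma-separated list of populations, each population could get its own output field named after its code. Below is a minimal sketch of that naming scheme. It is hypothetical; the current plugin does not do this.

```python
def ld_field_names(plugin_arg, base="LinkedVariants"):
    """Map each population in a comma-separated list to a suffixed field name.

    "1000GENOMES:phase_3:CDX" -> "LinkedVariants_CDX" (suffix = last ':' part).
    Stray spaces and empty entries are ignored.
    """
    names = {}
    for pop in plugin_arg.split(","):
        pop = pop.strip()
        if pop:
            names[pop] = f"{base}_{pop.rsplit(':', 1)[-1]}"
    return names
```

For example, ld_field_names("1000GENOMES:phase_3:CDX,1000GENOMES:phase_3:FIN") maps the two populations to LinkedVariants_CDX and LinkedVariants_FIN.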

G2P error output please look here

Hello
I'm getting a warning like this when I run the G2P plugin.

my command is:
./vep -i FMF-31.vcf -o /home/ertan/Desktop/deneme.csv -e --offline --cache --refseq --use_given_ref --fasta /home/ertan/Desktop/programlar/ensembl-data/fasta/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa --tab --show_ref_allele --af_1kg --af_esp --af_exac --plugin G2P,/home/ertan/Desktop/programlar/ensembl-data/G2P/DDG2P_17_5_2019.csv

my error output is:
Use of uninitialized value in concatenation (.) or string at /home/ertan/.vep/Plugins/G2P.pm line 972.

However, this warning does not interfere with the operation: the plugin works. Why does it show a warning message?

I'd like to ask one more question here rather than open a separate issue:

Is there any way to use the G2P plugin offline?

LD plugin doesn't support super population

Currently supported populations are:
1000GENOMES:phase_3:ACB, 1000GENOMES:phase_3:ASW, 1000GENOMES:phase_3:BEB, 1000GENOMES:phase_3:CDX, 1000GENOMES:phase_3:CEU, 1000GENOMES:phase_3:CHB, 1000GENOMES:phase_3:CHS, 1000GENOMES:phase_3:CLM, 1000GENOMES:phase_3:ESN, 1000GENOMES:phase_3:FIN, 1000GENOMES:phase_3:GBR, 1000GENOMES:phase_3:GIH, 1000GENOMES:phase_3:IBS, 1000GENOMES:phase_3:ITU, 1000GENOMES:phase_3:JPT, 1000GENOMES:phase_3:KHV, 1000GENOMES:phase_3:LWK, 1000GENOMES:phase_3:GWD, 1000GENOMES:phase_3:MSL, 1000GENOMES:phase_3:MXL, 1000GENOMES:phase_3:PEL, 1000GENOMES:phase_3:PJL, 1000GENOMES:phase_3:PUR, 1000GENOMES:phase_3:STU, 1000GENOMES:phase_3:TSI, 1000GENOMES:phase_3:YRI

However, it doesn't support super populations:

AFR, African
AMR, Ad Mixed American
EAS, East Asian
EUR, European
SAS, South Asian
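Until super populations are supported, one workaround is to expand each super population into its member populations using the standard 1000 Genomes phase 3 grouping, then query those. Running LD separately per member population is not the same as computing LD on the pooled super-population samples, so this only helps choose which populations to run. The helper below is hypothetical and not part of the LD plugin.

```python
# 1000 Genomes phase 3 super populations and their member populations
SUPER_POPULATIONS = {
    "AFR": ["ACB", "ASW", "ESN", "GWD", "LWK", "MSL", "YRI"],
    "AMR": ["CLM", "MXL", "PEL", "PUR"],
    "EAS": ["CDX", "CHB", "CHS", "JPT", "KHV"],
    "EUR": ["CEU", "FIN", "GBR", "IBS", "TSI"],
    "SAS": ["BEB", "GIH", "ITU", "PJL", "STU"],
}


def expand_population(code, prefix="1000GENOMES:phase_3:"):
    """Expand a super-population code into per-population identifiers.

    A code that is not a super population is treated as a single
    population and returned with the prefix added.
    """
    return [prefix + p for p in SUPER_POPULATIONS.get(code.upper(), [code])]
```

For example, expand_population("EAS") gives the five identifiers from 1000GENOMES:phase_3:CDX to 1000GENOMES:phase_3:KHV, all of which appear in the supported list above.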

Problem with SameCodon plugin

Hello. I'm trying to annotate some variants using the SameCodon plugin. When I launch VEP it doesn't give me any warning or error, but the plugin's column in the output VCF is always empty. Is there a dataset with which I can test the plugin? Thank you

dbNSFP Plugin

Currently the instructions for dbNSFP state that the dbNSFP zip file must be processed this way (lines 46-49 of dbNSFP.pm):

> wget ftp://dbnsfp:[email protected]/dbNSFPv3.0b2a.zip
> unzip dbNSFPv3.0b2a.zip
> cat dbNSFP*chr* | bgzip -c > dbNSFP.gz
> tabix -s 1 -b 2 -e 2 dbNSFP.gz

However, if one processes the database this way, the plugin stops working and complains that the header is missing. The correct method should be similar to the one described in lines 46-49 of dbscSNV.pm; I have used it on dbNSFP v3.2a:

> wget ftp://dbnsfp:[email protected]/dbNSFPv3.0b2a.zip
> unzip dbNSFPv3.2a.zip
> head -n1 dbNSFP3.2a_variant.chr1 > h
> cat dbNSFP*chr* | grep -v ^#chr | cat h - | bgzip -c > dbNSFP.gz
> tabix -s 1 -b 2 -e 2 dbNSFP.gz

That said, the dbNSFP files often need to be sorted by the user (sort -k1,1 -k2,2n) before they can be successfully indexed.
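The header matters because the plugin reads the column names selected on the command line (SIFT_score and so on) from it. The recipe above keeps the first file's '#chr' line and drops the rest. Since tabix treats lines starting with '#' as comments by default, the header survives indexing without being parsed as data. Below is a Python sketch of the same merge with the sort folded in. The function name is illustrative.

```python
def merge_dbnsfp(per_chrom_files):
    """Merge per-chromosome dbNSFP files, keeping a single header first.

    per_chrom_files: iterable of line lists, one per chromosome file.
    The first "#chr" header is kept and later ones dropped, as with
    `head -n1 ... > h; grep -v ^#chr | cat h -`. Data rows are sorted by
    chromosome (text) and position (numeric), like `sort -k1,1 -k2,2n`.
    """
    header = None
    rows = []
    for lines in per_chrom_files:
        for line in lines:
            line = line.rstrip("\n")
            if line.startswith("#chr"):
                if header is None:
                    header = line
                continue
            rows.append(line.split("\t"))
    if header is None:
        raise ValueError("no '#chr' header line found")
    rows.sort(key=lambda f: (f[0], int(f[1])))
    return [header] + ["\t".join(f) for f in rows]
```

The merged lines would then be compressed with bgzip -c and indexed with tabix -s 1 -b 2 -e 2, as in the recipe.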

Difference between dbNSFP annotation from the dbNSFP plugin and grepping directly from the dbNSFP database

Hi there,
What I need to do is filter out non-deleterious variants based on the SIFT/Polyphen2/CADD etc. scores from the dbNSFP database, so I used the VEP dbNSFP.pm plugin to annotate my VCF files. When I checked the result, I found that some variants my colleague predicted as deleterious using other tools were empty in every dbNSFP column. Then I used the "chr", "pos", "ref" and "alt" of these variants as keys to grep the SIFT/Polyphen2/CADD etc. scores from the dbNSFP database. This time the output wasn't empty, and it even supported these variants being deleterious.

To understand the difference between the VEP annotation from dbNSFP.pm and grepping directly from the dbNSFP database, I randomly selected 25514 variants from chr10 and annotated them with both methods. Details and results are below:

1. Using the VEP dbNSFP plugin
1.1 Script:
annotate & grep dbNSFP info from test.ANN.vcf

perl /home/perl5/Ensembl-APIs/ensembl-tools/scripts/variant_effect_predictor/variant_effect_predictor.pl --offline --fork 4 --pick --merged --everything --force_overwrite --vcf --vcf_info_field CSQ --plugin LoF,human_ancestor_fa:/home/Ref/human_ancestor.fa.gz --plugin dbNSFP,/home/Ref/dbNSFPv2.9.1/dbNSFP.gz,SIFT_score,SIFT_converted_rankscore,SIFT_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,MutationTaster_score,MutationTaster_converted_rankscore,MutationTaster_pred,CADD_raw,CADD_raw_rankscore,CADD_phred,phyloP100way_vertebrate,phyloP100way_vertebrate_rankscore -i test.vcf -o test.ANN.vcf

echo -e "chr\tpos\tref\talt\tSIFT_score\tPolyphen2_HDIV_score\tMutationTaster_pred\tCADD_pred\tphyloP100way_vertebrate" > test.ANN.vcf_h
le test.ANN.vcf|awk '!/#/'|cut -f 1-2,4-5,8|perl -ne 'chomp;my @a=split/\t/,$_;my @b=split/;/,$a[4];my @c=split/\|/,$b[-1];if($c[77] ne "" || $c[74] ne "" || $c[70] ne "" || $c[66] ne "" || $c[78] ne ""){print "$a[0]\t$a[1]\t$a[2]\t$a[3]\t$c[77]\t$c[74]\t$c[70]\t$c[66]\t$c[78]\n"}'|cat test.ANN.vcf_h - >test.ANN.vcf_info

1.2 Result
Only 564 variants were successfully annotated with non-null dbNSFP scores.

2. Using awk
2.1 Script:
I used the "chr", "pos", "ref" and "alt" of these variants as keys to grep the SIFT/Polyphen2/CADD etc. scores from the dbNSFP database, for example:

awk '$1~/^10$/ && $2~/^100017453$/ && $3~/^T$/ && $4~/^G$/' /home/Ref/dbNSFPv2.9.1/dbNSFP.chr10.txt >> test.dbNSFPv2.9.1.vcf
le test.dbNSFPv2.9.1.vcf|awk -F "\t" '{print"chr"$1"\t"$2"\t"$3"\t"$4"\t"$27"\t"$30"\t"$41"\t"$41"\t"$70}'|cat test.vcf_info_h ->test.dbNSFPv2.9.1.vcf_info

2.2 Result
704 variants were successfully annotated with non-null dbNSFP scores. Also, 138 variants had more than one annotation row.

Here is my confusion:
1) Which information does dbNSFP.pm use as the key to annotate input files, especially when there is more than one annotation for a given chr_pos_refallele_altallele?
2) Why can't I annotate dbNSFP scores for some variants when the corresponding information exists in the dbNSFP database?

Any comments and suggestions will be appreciated.
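The awk approach in step 2 is effectively an exact match on chr/pos/ref/alt. Rewriting that lookup as an index makes the multi-row case from question 1 explicit: one key can return several rows, and any tool has to decide which row to report or how to combine them. The sketch below describes the grep/awk side only. It is not a description of how dbNSFP.pm matches variants, and the helper names are illustrative.

```python
from collections import defaultdict


def index_by_variant(rows):
    """Group dbNSFP rows (lists of fields) by their (chr, pos, ref, alt) columns.

    Mirrors the awk lookup above, which matches on columns 1-4. One key can
    map to several rows, which is how a variant ends up with more than one
    annotation row.
    """
    index = defaultdict(list)
    for fields in rows:
        index[tuple(fields[:4])].append(fields)
    return index


def lookup(index, chrom, pos, ref, alt):
    """Return every row for the variant (an empty list if there is none)."""
    return index.get((chrom, str(pos), ref, alt), [])
```

Counting keys with len(lookup(...)) > 1 over the 25514 test variants would reproduce the "more than one row" figure and show exactly which variants are ambiguous.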

VEP with ProteinSeqs plugin and --fork does not work correctly

I found out that Variant Effect Predictor with ProteinSeqs plugin and --fork does not work correctly.

Running the following command:

/software/variant_effect_predictor/ensembl-tools-release-85/scripts/variant_effect_predictor/variant_effect_predictor.pl \
    --species homo_sapiens \
    --format vcf \
    --coding_only \
    --symbol \
    --protein \
    --uniprot \
    --plugin ProteinSeqs,${sample}.prot_seq.reference.fa,${sample}.prot_seq.mutated.fa \
    --custom ${sample}.nofirst1bp.dedup.filtered.depth.bed.gz,DP \
    --custom ${sample}.nofirst1bp.dedup.filtered.vaf.bed.gz,VAF \
    --input ${sample}.nofirst1bp.dedup.filtered.vcf.gz \
    --output ${sample}.nofirst1bp.dedup.filtered.annotated.coding_only.VEP \
    --cache \
    --offline \
    --fork 5

Results in fasta files like this:

>ENSP00000364569
MTQLMKAAKSGTKDGLEKTRMAVMRKVSFLHRKDVLGDSEEEDMGLLEVSVSDIKPPAPELGPMPEGLSPQQVVRRHILG
SIVQSEGSYVESLKRILQDYRNPLMEMEPKALSARKCQVVFFRVKEILHCHSMFQIALSSRVAEWDSTEKIGDLFVASFS
KSMVLDVYSDYVNNFTSAMSIIKKACLTKPAFLEFLKRRQVCSPDRVTLYGLMVKPIQRFPQFILLLQDMLKNTPRGHPD
RLSLQLALTELETLAEKLNEQKRLADQVAEIQQLTKSVSDRSSLNKLLTSGQRQLLLCETLTETVY>ENSP00000368015
MRRATVEREMELRHKNEMLRVETEARARAKAERENADIIREQIRLKASEHRQTVLESIRTAGTLFGEGFRAFVTDRDKVT
ATVNIFIKQGWQVAERQHHFRRRRWADHEVRRSRSSW
>ENSP00000234800
MSSSVKTPALEELVPGSEEKPKGRSPLSWGSLFGHRSEKIVFAKSDGGTDENVLTVTITETTVIESDLGVWSSRALLYLT
LWFFFSFCTLFLNKYILSLLGGEPSMLGAVQMLSTTVIGCVKTLVPCCLYQHKARLSYPPNFLMTMLFVGLMRFATVVLG
LVSLKNVAVSFAETVKSSAPIFTVIMSRMILGEYTGLLVNLSLIPVMGGLALCTATEISFNVLGFSAALSTNIMDCLQNV
FSKKLLSGDKYRFSAPELQFYTSAAAVAMLVPARVFFTDVPVIGRSGKSFSYNQDVVLLLLTDGVLFHLQSVTAYALMGK
ISPVTFSVASTVKHALSIWLSVIVFGNKITSLSAVGTALVTVGVLLYNKARQHQQEALQSLAAATGRAPDDTVEPLLPQD
PRQHP

It looks like, because of the forking, the file is partially overwritten, as the file pointer
is not synchronised between the forks.

The following command (without --fork) does not generate this problem.

/software/variant_effect_predictor/ensembl-tools-release-85/scripts/variant_effect_predictor/variant_effect_predictor.pl \
    --species homo_sapiens \
    --format vcf \
    --coding_only \
    --symbol \
    --protein \
    --uniprot \
    --plugin ProteinSeqs,${sample}.prot_seq.reference.fa,${sample}.prot_seq.mutated.fa \
    --custom ${sample}.nofirst1bp.dedup.filtered.depth.bed.gz,DP \
    --custom ${sample}.nofirst1bp.dedup.filtered.vaf.bed.gz,VAF \
    --input ${sample}.nofirst1bp.dedup.filtered.vcf.gz \
    --output ${sample}.nofirst1bp.dedup.filtered.annotated.coding_only.VEP \
    --cache \
    --offline

Now the FASTA file looks like this:

>ENSP00000368015
MRRATVEREMELRHKNEMLRVETEARARAKAERENADIIREQIRLKASEHRQTVLESIRTAGTLFGEGFRAFVTDRDKVT
ATVNIFIKQGWQVAERQHHFRRRRWADHEVRRSRSSW
>ENSP00000234800
MSSSVKTPALEELVPGSEEKPKGRSPLSWGSLFGHRSEKIVFAKSDGGTDENVLTVTITETTVIESDLGVWSSRALLYLT
LWFFFSFCTLFLNKYILSLLGGEPSMLGAVQMLSTTVIGCVKTLVPCCLYQHKARLSYPPNFLMTMLFVGLMRFATVVLG
LVSLKNVAVSFAETVKSSAPIFTVIMSRMILGEYTGLLVNLSLIPVMGGLALCTATEISFNVLGFSAALSTNIMDCLQNV
FSKKLLSGDKYRFSAPELQFYTSAAAVAMLVPARVFFTDVPVIGRSGKSFSYNQDVVLLLLTDGVLFHLQSVTAYALMGK
ISPVTFSVASTVKHALSIWLSVIVFGNKITSLSAVGTALVTVGVLLYNKARQHQQEALQSLAAATGRAPDDTVEPLLPQD
PRQHP
>ENSP00000367931
MSSSVKTPALEELVPGSEEKPKGRSPLSWGSLFGHRSEKIVFAKSDGGTDENVLTVTITETTVIESDLGVWSSRALLYLT
LWFFFSFCTLFLNKYILSLLGGEPSMLGAVQMLSTTVIGCVKTLVPCCLYQHKARLSYPPNFLMTMLFVGLMRFATVVLG
LVSLKNVAVSFAETVKSSAPIFTVIMSRMILGEYTGLLVNLSLIPVMGGLALCTATEISFNVLGFSAALSTNIMDCLQNV
FSKKLLSGDKYRFSAPELQFYTSAAAVAMLVPARVFFTDVPVIGRSGKSFSYNQDVVLLLLTDGVLFHLQSVTAYALMGK
ISPVTFSVASTVKHALSIWLSVIVFGNKITSLSAVGTALVTVGVLLYNKARQHQQEALQSLAAATGRAPDDTVEPLLPQD
PRQHP
...
>ENSP00000364564
MASSNPPPQPAIGDQLVPGVPGPSSEAEDDPGEAFEFDDSDDEEDTSAALGVPSLAPERDTDPPLIHLDSIPVTDPDPAA
APPGTGVPAWVSNGDAADAAFSGARHSSWKRKSSRRIDRFTFPALEEDVIYDDVPCESPDAHQPGAERNLLYEDAHRAGA
PRQAEDLGWSSSEFESYSEDSGEEAKPEVEVEPAKHRVSFQPKMTQLMKAAKSGTKDGLEKTRMAVMRKVSFLHRKDVLG
DSEEEDMGLLEVSVSDIKPPAPELGPMPEGLSPQQVVRRHILGSIVQSEGSYVESLKRILQDYRNPLMEMEPKALSARKC
QVVFFRVKEILHCHSMFQIALSSRVAEWDSTEKIGDLFVASFSKSMVLDVYSDYVNNFTSAMSIIKKACLTKPAFLEFLK
RRQVCSPDRVTLYGLMVKPIQRFPQFILLLQDMLKNTPRGHPDRLSLQLALTELETLAEKLNEQKRLADQVAEIQQLTKS
VSDRSSLNKLLTSGQRQLLLCETLTETVYGDRGQLIKSKERRVFLLNDMLVCANINFKPANHRGQLEISSLVPLGPKYVV
KWNTALPQVQVVEVGQDGGTYDKDNVLIQHSGAKKASASGQAQNKVYLGPPRLFQELQDLQKDLAVVEQITLLISTLHGT
YQNLNMTVAQDWCLALQRLMRVKEEEIHSANKCRLRLLLPGKPDKSGRPISFMVVFITPNPLSKISWVNRLHLAKIGLRE
ENQPGWLCPDEDKKSKAPFWCPILACCIPAFSSRALSLQLGALVHSPVNCPLLGFSAVSTSLPQGYLWVGGGQEGAGGQV
EIFSLNRPSPRTVKSFPLAAPVLCMEYIPELEEEAESRDESPTVADPSATVHPTICLGLQDGSILLYSSVDTGTQCLVSC
RSPGLQPVLCLRHSPFHLLAGLQDGTLAAYPRTSGGVLWDLESPPVCLTVGPGPVRTLLSLEDAVWASCGPWVTVLEATT
LQPQQSFEAHQDEAVSVTHMVKAGSGVWMAFSSGTSIRLFHTETLEHLQEINIATRTTFLLPGQKHLCVTSLLICQGLLW
VGTDQGVIVLLPVPRLEGIPKITGKGMVSLNGHCGPVAFLAVATSILAPDILRSDQEEAEGPRAEEDKPDGQAHEPMPDS
HVGRELTRKKGILLQYRLRSTAHLPGPLLSMREPAPADGAALEHSEEDGSIYEMADDPDIWVRSRPCARDAHRKEICSVA
IISGGQGYRNFGSALGSSGRQAPCGETDSTLLIWQVPLML
>ENSP00000364569
MTQLMKAAKSGTKDGLEKTRMAVMRKVSFLHRKDVLGDSEEEDMGLLEVSVSDIKPPAPELGPMPEGLSPQQVVRRHILG
SIVQSEGSYVESLKRILQDYRNPLMEMEPKALSARKCQVVFFRVKEILHCHSMFQIALSSRVAEWDSTEKIGDLFVASFS
KSMVLDVYSDYVNNFTSAMSIIKKACLTKPAFLEFLKRRQVCSPDRVTLYGLMVKPIQRFPQFILLLQDMLKNTPRGHPD
RLSLQLALTELETLAEKLNEQKRLADQVAEIQQLTKSVSDRSSLNKLLTSGQRQLLLCETLTETVYGDRGQLIKSKERRV
FLLNDMLVCANINFKPANHRGQLEISSLVPLGPKYVVKWNTALPQVQVVEVGQDGGTYDKDNVLIQHSGAKKASASGQAQ
NKVYLGPPRLFQELQDLQKDLAVVEQITLLISTLHGTYQNLNMTVAQDWCLALQRLMRVKEEEIHSANKCRLRLLLPGKP
DKGTWSDM
>ENSP00000394621
MASSNPPPQPAIGDQLVPGVPGPSSEAEDDPGEAFEFDDSDDEEDTSAALGVPSLAPERDTDPPLIHLDSIPVTDPDPAA
APPGTGVPAWVSNGDAADAAFSGARHSSWKRKSSRRIDRFTFPALEEDVIYDDVPCESPDAHQPGAERNLLYEDAHRAGA
PRQAEDLGWSSSEFESYSEDSGEEAKPEVEVEPAKHRVSFQPKLSPDLTRLKERYARTKRDILALRVGGRDMQELKHKYD
CKMTQLMKAAKSGTKDGLEKTRMAVMRKVSFLHRKDVLGDSEEEDMGLLEVSVSDIKPPAPELGPMPEGLSPQQVVRRHI
LGSIVQSEGSYVESLKRILQDYRNPLMEMEPKALSARKCQVVFFRVKEILHCHSMFQIALSSRVAEWDSTEKIGDLFVAS
FSKSMVLDVYSDYVNNFTSAMSIIKKACLTKPAFLEFLKRRQVCSPDRVTLYGLMVKPIQRFPQFILLLQDMLKNTPRGH
PDRLSLQLALTELETLAEKLNEQKRLADQVAEIQQLTKSVSDRSSLNKLLTSGQRQLLLCETLTETVYGDRGQLIKSKER
RVFLLNDMLVCANINFKGQLEISSLVPLGPKYVVKWNTALPQVQVVEVGQDGGTYDKDNVLIQHSGAKKASASGQAQNKV
YLGPPRLFQELQDLQKDLAVVEQITLLISTLHGTYQNLNMTVAQDWCLALQRLMRVKEEEIHSANKCRLRLLLPGKPDKS
GRPISFMVVFITPNPLSKISWVNRLHLAKIGLREENQPGWLCPDEDKKSKAPFWCPILACCIPAFSSRALSLQLGALVHS
PVNCPLLGFSAVSTSLPQGYLWVGGGQEGAGGQVEIFSLNRPSPRTVKSFPLAAPVLCMEYIPELEEEAESRDESPTVAD
PSATVHPTICLGLQDGSILLYSSVDTGTQCLVSCRSPGLQPVLCLRHSPFHLLAGLQDGTLAAYPRTSGGVLWDLESPPV
CLTVGPGPVRTLLSLEDAVWASCGPWVTVLEATTLQPQQSFEAHQDEAVSVTHMVKAGSGVWMAFSSGTSIRLFHTETLE
HLQEINIATRTTFLLPDRSLIKCSPRA
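The truncated record ending in ...TETVY>ENSP00000368015 looks like two processes writing into the same file independently, as suggested above. A common way to avoid this is for worker processes to hand back complete records and let a single process do all the writing. Below is a sketch of the writer side. The helper names are hypothetical, and this is not how VEP or ProteinSeqs is implemented. It wraps sequences at 80 residues, like the output above.

```python
def format_fasta_record(identifier, sequence, width=80):
    """Return one complete FASTA record as a single newline-terminated string."""
    lines = [">" + identifier]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


def write_fasta(path, worker_results):
    """Write records collected from several workers, from one process only.

    worker_results: iterable of lists of (identifier, sequence) tuples, one
    list per worker. Because only this function opens the file, records
    cannot interleave the way they do when each fork writes on its own.
    """
    with open(path, "w") as out:
        for records in worker_results:
            for identifier, sequence in records:
                out.write(format_fasta_record(identifier, sequence))
```

With this split, forked workers only compute sequences. Ordering and writing happen once, in the parent, after the workers finish.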

performance of GO plugin

I tried enabling the GO plugin in VEP.

The plugin queries the database over the internet, which slows processing down from 2500 variants/sec to 19 variants/sec.

Is it possible to query a locally installed database or cache? And which MySQL database should I download?

Plugin for annotation in ClinVar and OMIM

Hi,

I would just like to ask whether there is any plugin for VEP that can further annotate variants with information from the ClinVar and OMIM databases, or do I need to write one myself?

Thanks a lot

Joyce

Wildtype.pm

Why is there no Wildtype.pm plugin?

Can you help me with a problem with the dbNSFP plugin?

Hello:

When I use the dbNSFP plugin, it writes all the results on a single line in my output file. dbNSFP destroys the readability of the output and makes it unusable: not only its own output but all the other output is written on a single line, so I can't tell what it says.

Failed to compile plugin LoF

I'm struggling to get the LoF plugin running in VEP (88). I've specified my plugin directory and VEP is looking there for the plugins. Every time I run VEP, I get the following during initialisation:

Failed to compile plugin LoF:
2017-07-19 16:44:34 - Failed to compile plugin LoF: Can't open me2x3acc1!

I've grabbed the MaxEntScan files from GitHub and put them in the Plugins directory, but still no luck.

Any suggestions?

ExAC plugin fails at multiallelic sites

The ExAC plugin doesn't report frequencies at multiallelic sites, such as this one:
1 1222267 rs11260579 G T
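In a multiallelic VCF record the ALT column lists several alleles. Per-allele INFO fields such as AF (Number=A) carry one value per ALT, in the same order. A lookup therefore has to pick the value at the matching allele's index rather than compare the whole ALT string. Below is a minimal sketch. The values in the usage line are invented for illustration and are not ExAC data. The sketch also ignores allele normalisation, which multiallelic indel records may need.

```python
def allele_frequency(ref, alts, afs, query_ref, query_alt):
    """Return the AF for one ALT allele of a (possibly multiallelic) record.

    alts and afs are the comma-separated ALT column and INFO/AF value; the
    n-th AF belongs to the n-th ALT allele. Returns None if REF differs or
    the queried ALT is not present.
    """
    if ref != query_ref:
        return None
    alt_list = alts.split(",")
    af_list = afs.split(",")
    if len(alt_list) != len(af_list):
        raise ValueError("ALT and AF have different numbers of values")
    for alt, af in zip(alt_list, af_list):
        if alt == query_alt:
            return float(af)
    return None
```

With made-up values, allele_frequency("G", "T,C", "0.25,0.01", "G", "C") returns 0.01, the AF paired with the second ALT allele.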

SOLVED How can I solve such an error when I run the G2P plugin?

How can I solve such an error when I run the G2P plugin?

ERROR:
WARNING: Failed to instantiate plugin G2P:
-------------------- EXCEPTION --------------------
MSG: Could not get adaptor VCFCollection for homo_sapiens variation

STACK Bio::EnsEMBL::DBSQL::DBAdaptor::AUTOLOAD /home/ertan/Desktop/vep/ensembl-vep/Bio/EnsEMBL/DBSQL/DBAdaptor.pm:993
STACK G2P::new /home/ertan/.vep/Plugins/G2P.pm:330
STACK (eval) /home/ertan/Desktop/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:895
STACK Bio::EnsEMBL::VEP::Runner::get_all_Plugins /home/ertan/Desktop/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:894
STACK Bio::EnsEMBL::VEP::Runner::init /home/ertan/Desktop/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:120
STACK Bio::EnsEMBL::VEP::Runner::run /home/ertan/Desktop/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:194
STACK toplevel ./vep:218
Date (localtime) = Sat May 18 09:27:31 2019
Ensembl API version = 96

COMMAND:
./vep -i FMF-31.vcf -o /home/ertan/Desktop/deneme.txt --plugin G2P,file=/home/ertan/Desktop/DDG2P_17_5_2019.csv --cache --port 3337 --force_overwrite

plugin_config.txt in release/85 refers to release/84 URLs

Not sure if 85 is in some sort of pre-release state and this is currently intentional, but doesn't look right.

VEP misses some dbNSFP entries

Hi,
let's start from the beginning:

Formatting dbNSFP:

wget ftp://dbnsfp:[email protected]/dbNSFP4.0b1a.zip
unzip dbNSFP4.0b1a.zip
cd dbNSFP40b1a

zcat dbNSFP4.0b1a_variant.chr*.gz  | grep -v ^#chr | awk '$8 != "."' | sort -T /mnt/tmp -k8,8 -k9,9n - | cat h - | bgzip -c > dbNSFP4.0b1a_variant.chr.ALL.gz
tabix -s 8 -b 9 -e 9 dbNSFP4.0b1a_variant.chr.ALL.gz

Let's check the position directly in the dbNSFP4.0b1a_variant.chr.ALL.gz file:
tabix hg19_dbNSFP4.0b1a_variant.chr.ALL.gz 21:44483184-44483184
(...and here I get plenty of annotations.)
My VCF file line:

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	XXXXY
21	44483184	21_44483184_A_G	A	G	.	.	.	GT	0/1

Run VEP with (almost all) dbNSFP columns selected:
docker run --rm -t -i \
    -v /mnt/ssd_01/VEP/vepCache:/opt/vep/.vep \
    -v /mnt/ssd_01/DATA:/home/vep/DATA \
    -v /mnt/sata_03/DATABASES:/home/vep/DATABASES \
    -v /mnt/ssd_01/VEP/runVEP:/home/vep/runVEP \
    -v /mnt/ssd_01/refs/:/home/vep/refs ensemblorg/ensembl-vep:94 vep \
    --format "vcf" \
    -i $inVepFile \
    -o $outVepFile \
    --no_stats \
    --cache \
    --fasta /home/vep/refs/hs37d5_noHap.fa --port 3337 --ASSEMBLY GRCh37 \
    --refseq --tab \
    --use_transcript_ref \
    --variant_class --sift b --polyphen b --humdiv --gene_phenotype --regulatory --no_escape \
    --hgvs --hgvsg --shift_hgvs 1 --symbol --protein --ccds --uniprot --numbers --domains --canonical --biotype \
    --af --af_1kg --af_esp --af_gnomad --max_af --pubmed \
    --exclude_predicted \
    --buffer_size 50 \
    --plugin dbNSFP,/home/vep/DATABASES/dbNSFP40b1a/hg19_dbNSFP4.0b1a_variant.chr.ALL.gz,ref,alt,aaref,aaalt,rs_dbSNP151,hg19_chr,"hg19_pos(1-based)",hg18_chr,"hg18_pos(1-based)",aapos,genename,Ensembl_geneid,Ensembl_transcriptid,Ensembl_proteinid,Uniprot_acc,Uniprot_entry,APPRIS,GENCODE_basic,TSL,VEP_canonical,cds_strand,refcodon,codonpos,codon_degeneracy,Ancestral_allele,AltaiNeandertal,Denisova,VindijiaNeandertal,SIFT_score,SIFT_converted_rankscore,SIFT_pred,SIFT4G_score,SIFT4G_converted_rankscore,SIFT4G_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,LRT_score,LRT_converted_rankscore,LRT_pred,LRT_Omega,MutationTaster_score,MutationTaster_converted_rankscore,MutationTaster_pred,MutationTaster_model,MutationTaster_AAE,MutationAssessor_score,MutationAssessor_rankscore,MutationAssessor_pred,FATHMM_score,FATHMM_converted_rankscore,FATHMM_pred,PROVEAN_score,PROVEAN_converted_rankscore,PROVEAN_pred,VEST4_score,VEST4_rankscore,MetaSVM_score,MetaSVM_rankscore,MetaSVM_pred,MetaLR_score,MetaLR_rankscore,MetaLR_pred,Reliability_index,M-CAP_score,M-CAP_rankscore,M-CAP_pred,REVEL_score,REVEL_rankscore,MutPred_score,MutPred_rankscore,MutPred_protID,MutPred_AAchange,MutPred_Top5features,MVP_score,MVP_rankscore,MPC_score,MPC_rankscore,PrimateAI_score,PrimateAI_rankscore,PrimateAI_pred,DEOGEN2_score,DEOGEN2_rankscore,DEOGEN2_pred,Aloft_Fraction_transcripts_affected,Aloft_prob_Tolerant,Aloft_prob_Recessive,Aloft_prob_Dominant,Aloft_pred,Aloft_Confidence,CADD_raw,CADD_raw_rankscore,CADD_phred,DANN_score,DANN_rankscore,fathmm-MKL_coding_score,fathmm-MKL_coding_rankscore,fathmm-MKL_coding_pred,fathmm-MKL_coding_group,fathmm-XF_coding_score,fathmm-XF_coding_rankscore,fathmm-XF_coding_pred,Eigen-raw_coding,Eigen-raw_coding_rankscore,Eigen-pred_coding,Eigen-PC-raw_coding,Eigen-PC-raw_coding_rankscore,Eigen-PC-phred_coding,GenoCanyon_score,GenoCanyon_rankscore,integrated_fitCons_score,integrated_fitCons_rankscore,integrated_confidence_value,GM12878_fitCons_score,GM12878_fitCons_rankscore,GM12878_confidence_value,H1-hESC_fitCons_score,H1-hESC_fitCons_rankscore,H1-hESC_confidence_value,HUVEC_fitCons_score,HUVEC_fitCons_rankscore,HUVEC_confidence_value,LINSIGHT,LINSIGHT_rankscore,GERP++_NR,GERP++_RS,GERP++_RS_rankscore,phyloP100way_vertebrate,phyloP100way_vertebrate_rankscore,phyloP30way_mammalian,phyloP30way_mammalian_rankscore,phyloP17way_primate,phyloP17way_primate_rankscore,phastCons100way_vertebrate,phastCons100way_vertebrate_rankscore,phastCons30way_mammalian,phastCons30way_mammalian_rankscore,phastCons17way_primate,phastCons17way_primate_rankscore,29way_pi,29way_logOdds,29way_logOdds_rankscore,bStatistic,bStatistic_rankscore,1000Gp3_AC,1000Gp3_AF,1000Gp3_EUR_AC,1000Gp3_EUR_AF,TWINSUK_AC,TWINSUK_AF,ALSPAC_AC,ALSPAC_AF,UK10K_AC,UK10K_AF,ESP6500_AA_AC,ESP6500_AA_AF,ESP6500_EA_AC,ESP6500_EA_AF,ExAC_AC,ExAC_AF,ExAC_Adj_AC,ExAC_Adj_AF,ExAC_NFE_AC,ExAC_NFE_AF,gnomAD_exomes_flag,gnomAD_exomes_AC,gnomAD_exomes_AN,gnomAD_exomes_AF,gnomAD_exomes_nhomalt,gnomAD_exomes_ASJ_AC,gnomAD_exomes_ASJ_AN,gnomAD_exomes_ASJ_AF,gnomAD_exomes_ASJ_nhomalt,gnomAD_exomes_NFE_AN,gnomAD_exomes_NFE_AF,gnomAD_exomes_NFE_nhomalt,gnomAD_exomes_POPMAX_AC,gnomAD_exomes_POPMAX_AN,gnomAD_exomes_POPMAX_AF,gnomAD_exomes_POPMAX_nhomalt,gnomAD_genomes_flag,gnomAD_genomes_AC,gnomAD_genomes_AN,gnomAD_genomes_AF,gnomAD_genomes_nhomalt,gnomAD_genomes_ASJ_AC,gnomAD_genomes_ASJ_AN,gnomAD_genomes_ASJ_AF,gnomAD_genomes_ASJ_nhomalt,gnomAD_genomes_NFE_AC,gnomAD_genomes_NFE_AN,gnomAD_genomes_NFE_AF,gnomAD_genomes_NFE_nhomalt,gnomAD_genomes_POPMAX_AC,gnomAD_genomes_POPMAX_AN,gnomAD_genomes_POPMAX_AF,gnomAD_genomes_POPMAX_nhomalt,clinvar_rs,clinvar_clnsig,clinvar_trait,clinvar_review,clinvar_hgvs,clinvar_var_source,Interpro_domain,GTEx_V7_gene,GTEx_V7_tissue,Geuvadis_eQTL_target_gene \
    --plugin dbscSNV,/home/vep/DATA/dbscSNP/dbscSNV1.1_GRCh37.txt.gz \
    --plugin SpliceRegion \
    --plugin MaxEntScan,/home/vep/DATA/MaxEntScan \
    --plugin GeneSplicer,/home/vep/DATA/GeneSplicer/sources/genesplicer,/home/vep/DATA/GeneSplicer/human \
    --plugin ExACpLI,/home/vep/DATA/ExACpLI/ExACpLI_values.txt \
    --plugin Phenotypes \
    -custom /home/vep/DATA/gnomAD/gnomad.genomes.r2.0.1.sites.noVEP.vcf.gz,gnomADg,vcf,exact,0,AF_NFE,POPMAX,AF \
    --fork 40

After running VEP with dbNSFP, in this case (21_44483184_A_G) I don't get any annotations.
It happens sometimes...

Is there some error in the dbNSFP module?

EDIT:
Note that I use the bash line separator "\" to break the long bash command into several lines.

CADD plugin not annotating indels?

Hi -

We are just starting to use VEP and the CADD plugin to annotate our variants. I noticed that our indels are not being annotated with CADD scores (only the SNPs are). I double-checked, and the indels are present in the CADD InDels.tsv.gz file that we specify when we run the CADD plugin: --plugin CADD,whole_genome_SNVs.tsv.gz,InDels.tsv.gz

I am attaching a very small VCF file (it contains only 3 indels, each of which is present in the CADD v1.3 InDels.tsv.gz file we are running with). I was hoping you might have some recommendations on how we can get the CADD plugin to annotate indels when there is a match in the InDels.tsv.gz file.

Thanks,

Ann

GJB2.1kg.phase3.v5a.INDELS.vcf.zip
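One way to narrow down a problem like this is to test whether each indel from the attached VCF matches a row of InDels.tsv.gz exactly on chromosome, position, REF and ALT. The helpers below are a hypothetical sketch, not the CADD plugin's code, and they assume the CADD v1.3 TSV column order Chrom, Pos, Ref, Alt, RawScore, PHRED with '#' header lines. An exact-match lookup returns nothing when the same indel is written with a different position or different REF/ALT strings, which is one possible reason for a miss.

```perl
use strict;
use warnings;

# Hypothetical helper: index CADD-style TSV lines by an exact
# CHROM:POS:REF:ALT key. Assumes the column order
# Chrom, Pos, Ref, Alt, RawScore, PHRED and skips '#' header lines.
sub index_cadd_lines {
    my (@lines) = @_;
    my %index;
    for my $line (@lines) {
        next if $line =~ /^#/;
        chomp $line;
        my ($chrom, $pos, $ref, $alt, $raw, $phred) = split /\t/, $line;
        $index{"$chrom:$pos:$ref:$alt"} = { raw => $raw, phred => $phred };
    }
    return \%index;
}

# Look up one variant; returns undef when there is no exact match,
# e.g. when REF/ALT are written differently than in the CADD file.
sub lookup_cadd {
    my ($index, $chrom, $pos, $ref, $alt) = @_;
    return $index->{"$chrom:$pos:$ref:$alt"};
}
```

In practice the lines would come from a tabix query of InDels.tsv.gz around each indel's position.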

How to contribute to VEP with a new Plugin?

I have developed a VEP plugin and want to add it to VEP. However, there is no documentation that explains the contribution process.

Should I just open a pull request that adds my plugin as a single Perl file to the vep_plugin project? Are there other guidelines that the plugin or algorithm needs to meet? Can you give further information about the contribution and review processes?

GXA plugin support

Hello.
I cannot find any information on whether the GXA plugin is still supported by recent VEP releases.
The last time it appeared was in the release 91 plugin_config.txt#L842.
Is it still supported by VEP, or is it only kept in the repository for historical reasons?

Can't locate NearestExonJB.pm

Hi,
The error:

WARNING: Failed to compile plugin NearestExonJB: Can't locate NearestExonJB.pm in @INC (you may need to install the NearestExonJB module) (@INC contains: /opt/vep/.vep/Plugins /opt/vep/src/ensembl-vep/modules /opt/vep/src/ensembl-vep /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.26.1 /usr/local/share/perl/5.26.1 /usr/lib/x86_64-linux-gnu/perl5/5.26 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.26 /usr/share/perl/5.26 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base) at (eval 51) line 2.
BEGIN failed--compilation aborted at (eval 51) line 2.

2019-04-17 09:30:17 - INFO: BAM-edited cache detected, enabling --use_transcript_ref; use --use_given_ref to override this

The Docker command (VEP 96):

docker run --rm -t -i \
-v /mnt/ssd_01/VEP/vepCache:/opt/vep/.vep \
-v /mnt/ssd_01/DATA:/home/vep/DATA  \
-v /mnt/sata_03/DATABASES:/home/vep/DATABASES \
-v $PWD:$PWD \
-v /mnt/ssd_01/refs/:/home/vep/refs ensemblorg/ensembl-vep vep \
--format "vcf" \
-i $inFileFull \
-o $outVepFile \
--no_stats \
--cache \
--fasta /home/vep/refs/hs37d5_noHap.fa --port 3337 --ASSEMBLY GRCh37 \
--merged --tab \
--use_transcript_ref \
--variant_class --sift b --polyphen b --humdiv --gene_phenotype --regulatory \
--hgvs --hgvsg --shift_hgvs 1 --symbol --protein --ccds --uniprot  --numbers --domains --canonical --biotype \
--af --af_1kg --af_esp --af_gnomad --max_af --pubmed \
--exclude_predicted \
--buffer_size 50 \
--plugin NearestExonJB

Cheers
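The warning says Perl searched each directory listed in @INC, including /opt/vep/.vep/Plugins (inside the volume mounted from /mnt/ssd_01/VEP/vepCache), and did not find NearestExonJB.pm. A quick, hypothetical way to confirm which of those directories actually contain the file is a check like the one below, run inside the container; find_plugin_dirs is an illustrative name, not part of VEP.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical diagnostic: return the directories (for example the
# entries of @INC listed in the error) that contain "<Name>.pm".
sub find_plugin_dirs {
    my ($name, @dirs) = @_;
    return grep {
        my $path = File::Spec->catfile($_, "$name.pm");
        -f $path;
    } @dirs;
}

# Usage inside the container:
# my @hits = find_plugin_dirs('NearestExonJB', '/opt/vep/.vep/Plugins', @INC);
# print @hits ? "found in: @hits\n" : "NearestExonJB.pm not found\n";
```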

missing dependency when downloading LoF plugin

Hi,

The LOFTEE plugin has a missing dependency when it is downloaded with the automated installer.
The current config is:

    # LOFTEE
    # Requires LoFtool_scores.txt file as first param (available in VEP_plugins GitHub repo)
    {
      "key" => "LoF",
      "helptip" => "LOFTEE identifies LoF (loss-of-function) variation",
      "available" => 0,
      "enabled" => 0,
      "section" => "Pathogenicity predictions",
      "plugin_url" => "https://raw.githubusercontent.com/konradjk/loftee/master/LoF.pm",
      "requires_data" => 1,
      "requires_install" => 1,
      "params" => [
        "@*"
      ]
    },

But the plugin depends on splice_module.pl which can be found in the same repo. Can this file be downloaded too when installing the loftee plugin?

excerpt from LoF.pm:

package LoF;
require "splice_module.pl";

I'm not familiar with the inner workings of the download process, but could something like this be added?

"plugin_url" => "https://raw.githubusercontent.com/konradjk/loftee/master/LoF.pm",
"plugin_depend_url" => "https://raw.githubusercontent.com/konradjk/loftee/master/splice_module.pl",

Thanks
M
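The proposal above could be sketched as an installer step that collects every URL to fetch for one config entry. This is a hypothetical sketch: plugin_depend_url is the key suggested in this issue, not an existing installer option, and urls_to_fetch is an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical config entry mirroring the one above, with the proposed
# "plugin_depend_url" key (not an existing installer option).
my %lof_entry = (
    key               => 'LoF',
    plugin_url        => 'https://raw.githubusercontent.com/konradjk/loftee/master/LoF.pm',
    plugin_depend_url => 'https://raw.githubusercontent.com/konradjk/loftee/master/splice_module.pl',
);

# Return every URL an installer would need to fetch for one entry:
# the plugin itself plus any dependency URLs (a string or an array ref).
sub urls_to_fetch {
    my ($entry) = @_;
    my @urls = ($entry->{plugin_url});
    my $dep  = $entry->{plugin_depend_url};
    if (defined $dep) {
        push @urls, ref($dep) eq 'ARRAY' ? @$dep : $dep;
    }
    return @urls;
}
```

Accepting an array ref as well as a string would let a plugin list several helper files without another config key.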

dbNSFP plugin - whitespaces and commas in annotation fields

Hello,

I ran into several whitespace characters and commas when trying to parse a VEP-annotated VCF today. From what I can see, at least two dbNSFP columns (Interpro_domain and FATHMM_score), when used for annotation with the dbNSFP plugin, introduce whitespace and commas into the resulting VCF.

Example fields:
FATHMM_score -2.2,-2.2
Interpro_domain TNFR/CD27/30/40/95cysteine-richregion(1),
Interpro_domain GPCR, rhodopsin-likesuperfamily(1),
clinvar_trait Microcytic anemia

I'm assuming these are simply pulled in from the original database files, but is this intended behaviour?

Thank you for your time.

EDIT:
##VEP="v87" time="2017-03-10 13:08:10" cache="/opt/ensembl-vep/cache/homo_sapiens/87_GRCh37" ensembl=87.f547798 ensembl-io=87.48cb128 ensembl-funcgen=87.0577dd0 ensembl-variation=87.661e72c 1000genomes="phase3" COSMIC="78" ClinVar="201610" ESP="20141103" ExAC="0.3" HGMD-PUBLIC="20162" assembly="GRCh37.p13" dbSNP="147" gencode="GENCODE 19" genebuild="2011-04" polyphen="2.2.2" regbuild="1.0" sift="sift5.2.2"

The annotation was done with the ensembl-vep prerelease, so the comma issue may already be fixed? I'll test with a new build.
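The VCF specification does not allow whitespace in INFO values and reserves commas as separators between multiple values, so fields like the examples above can confuse parsers. A minimal, hypothetical post-processing sketch is shown below; the replacement characters ('_' and '&') are assumptions, not what the dbNSFP plugin does.

```perl
use strict;
use warnings;

# Hypothetical clean-up for values destined for a VCF INFO field:
# runs of whitespace become '_' and commas become '&'.
# Both replacement characters are assumed choices.
sub sanitize_info_value {
    my ($value) = @_;
    $value =~ s/\s+/_/g;   # "Microcytic anemia" -> "Microcytic_anemia"
    $value =~ s/,/&/g;     # "-2.2,-2.2" -> "-2.2&-2.2"
    return $value;
}
```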

installing dbNSFP takes forever

Hello, I followed the commands for downloading the dbNSFP data,

but I'm stuck at the last line. I launched it and let it run for 4 days, and it still hasn't finished. Is that normal?

PS: the computer has 32 GB RAM and 3 TB, so I don't understand why it is so slow.

ExAC plugin skips indels

The ExAC plugin appears to skip over indels and not annotate them. I believe the following change fixes the problem.

170c170
<       next unless $vcf_vf->{start} == $vf->{end} && $vcf_vf->{start} == $vf->{end};

---
>       next unless $vcf_vf->{start} == $vf->{start} && $vcf_vf->{end} == $vf->{end};
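The diff above can be checked with two small predicates. The original condition repeats the same comparison ($vcf_vf->{start} == $vf->{end}) twice, so it can only pass when a variant's start and end coincide, as for SNVs; insertions and multi-base deletions never match. The proposed condition compares start with start and end with end. The sub names are illustrative, not from ExAC.pm.

```perl
use strict;
use warnings;

# Original test: the same comparison twice, so start must equal end.
sub buggy_match {
    my ($vcf_vf, $vf) = @_;
    return $vcf_vf->{start} == $vf->{end} && $vcf_vf->{start} == $vf->{end};
}

# Proposed test: both coordinates must agree.
sub fixed_match {
    my ($vcf_vf, $vf) = @_;
    return $vcf_vf->{start} == $vf->{start} && $vcf_vf->{end} == $vf->{end};
}
```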

CADD plugin output

Hi,

I just started using VEP with the CADD plugin and found that its output (in VCF, in the 'Extra' column) is almost always in a different order. Why is that, and could it be fixed?

Thanks,
Petr
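If, as assumed here, the extra fields are collected in a Perl hash before being printed, a changing order would be expected: Perl randomizes hash iteration order (since 5.18), so keys() can return a different order on each run. A hypothetical sketch of a deterministic formatter sorts the keys (or uses a fixed list of field names); the field names in the usage line are only examples.

```perl
use strict;
use warnings;

# Hypothetical formatter: sorting the keys makes the KEY=VALUE order
# identical on every run, regardless of hash iteration order.
sub format_extra {
    my ($fields) = @_;
    return join ';', map { "$_=$fields->{$_}" } sort keys %$fields;
}

# Usage:
# print format_extra({ CADD_RAW => 1.5, CADD_PHRED => 12 }), "\n";
```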

argument isn't numeric in division

Running VEP version 84:

Argument "10404,66738" isn't numeric in division (/) at /VEP/.vep/Plugins/ExAC.pm line 249, <TABIX> line 2.

I am seeing multiple instances of this. I assume it's a multi-allelic variant? I am using the databases:

ExAC.r0.3.1.sites.vep.vcf.gz
and the associated tbi as downloaded from ExAC.
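The warning fits the reporter's guess: "10404,66738" looks like a comma-separated AC value (in VCF, AC usually carries one count per ALT allele), which Perl cannot treat as a single number. A minimal sketch, assuming the value being divided is such an AC field and the denominator is AN, splits it first; per_allele_af is a hypothetical name, not the plugin's code.

```perl
use strict;
use warnings;

# Hypothetical helper: one allele frequency per ALT allele from a
# comma-separated AC field and a single AN value.
sub per_allele_af {
    my ($ac_field, $an) = @_;
    return map { $_ / $an } split /,/, $ac_field;
}
```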

clinvar_* values use | char as separator, wreaking havoc on CSQ field parsing

For example, below is a record from dbNSFPv2.9:
You can see values such as 5|5|5 and Eichsfeld_type_congenital_muscular_dystrophy|Congenital_myopathy_with_fiber_type_disproportion|not_provided.

1   26136244    G   A   G   S   rs121908188 26008831    .   25809753    SEPN1   Q9NZV5-2;Q9NZV5 .;SELN_HUMAN    281;315 .   +   GGC -29.8321    1   0   G   ENSG00000162430 ENST00000361547;ENST00000354177;ENST00000374315 315;281;281 ENSP00000346109:G281S   ENSP00000355141:G315S;ENSP00000346109:G281S;ENSP00000363434:G281S   0.01;0.01;0.01  0.55262 D;D;D   1.0;1.0 0.89917 D;D 0.999;0.999 0.91635 D;D 0.000000    0.85682 D   1.000   0.70825 A   2.215   0.72894 M   -3.89;-3.91;-3.9    0.96104 D;D;D   1.0365  0.98010 D   0.9153  0.97411 D   9   0.981   0.98373 -3.39;-4.44;-4.44   0.77401 D;D;D   2.685513    0.65476 22.5    4.84    4.84    0.62591 0.462000    0.41574 2.527000    0.85204 9.623000    0.98386 0.840000    0.47671 1.000000    0.80357 1.000000    0.71417 0.0:0.0:1.0:0.0 18.1324 0.89605 0.000000    .   .   .   .   .   .   .   .   .   .   .   2.33E-4 5.88E-4 0   0.00000 4   0.00000 21  1.733e-04   21  1.740e-04   1.020e-04   0   0   0   0   2   3.024e-04   18  2.700e-04   0   0   rs121908188 5|5|5   Eichsfeld_type_congenital_muscular_dystrophy|Congenital_myopathy_with_fiber_type_disproportion|not_provided .   .

Each of these values currently ends up in the CSQ field as-is, and so is parsed as several CSQ values instead of one.
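To illustrate the problem: CSQ fields are delimited by '|', so a value such as "5|5|5" becomes three fields and shifts every field after it. The sketch below is hypothetical; replacing an inner '|' with '&' before joining is an assumed choice, not documented plugin behaviour.

```perl
use strict;
use warnings;

# Hypothetical CSQ builder: escape '|' inside each value so that
# splitting the result on '|' gives back one field per value.
sub build_csq {
    my (@values) = @_;
    return join '|', map { my $v = $_; $v =~ s/\|/&/g; $v } @values;
}

# Without escaping, join('|', 'A', '5|5|5', 'x') splits into 5 fields.
```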

LD plugin: invalid population EUR

When trying to find SNPs in LD with the requested SNP for the EUR population, an error occurs:

./vep -id rs1042779 -o /opt/vep/.vep/output_with_LD.txt --tab --cache --merge --force_overwrite --plugin LD,1000GENOMES:phase_3:EUR,0.4
WARNING: Failed to instantiate plugin LD: Invalid population '1000GENOMES:phase_3:EUR'; valid populations are:
<...>

I rechecked using the REST API:
https://rest.ensembl.org/ld/human/rs1042779/1000GENOMES:phase_3:EUR?r2=0.4;content-type=application/json
The REST API gives acceptable results.
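To compare the plugin's behaviour with the REST response directly, the request above can be rebuilt from its three inputs: variant ID, population name and r2 threshold. ld_rest_url is a hypothetical convenience function; with the inputs from this issue it produces the URL quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild the Ensembl REST LD request used above.
sub ld_rest_url {
    my ($variant_id, $population, $r2) = @_;
    return "https://rest.ensembl.org/ld/human/$variant_id/$population"
         . "?r2=$r2;content-type=application/json";
}

# Usage:
# print ld_rest_url('rs1042779', '1000GENOMES:phase_3:EUR', 0.4), "\n";
```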

Download site of FATHMM's data is down

Based on the description in the script VEP_plugins/FATHMM.pm (line 46 in d374e49):

> wget ftp://supfam2.cs.bris.ac.uk/FATHMM/database/fathmm.v2.1.SQL

I should be able to download it from an FTP site.

However, the FTP site isn't live now.

The only resource I can find now is http://fathmm.biocompute.org.uk/database/fathmm.v2.3.SQL.gz (in their GitHub repo: https://github.com/HAShihab/fathmm), but the version number is different.

MaxEntScan score interpretation

What is the interpretation of MaxEntScan scores?

For variant 11_117863938_C_G I get these numbers:

MaxEntScan_alt    3.8294583218888
MaxEntScan_diff   1.02692945321821
MaxEntScan_ref    4.85638777510701

I guess these numbers are for a splice acceptor, as we have NM_001558.3:c.368-18C>G.
Is there any minimum/maximum value? I can't find a clear answer...

Thanks
Damian
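As a worked check on the three values above, the reported diff equals the reference score minus the alternate score (inferred from the numbers, not from the plugin documentation):

```latex
\mathrm{MaxEntScan\_diff} = \mathrm{MaxEntScan\_ref} - \mathrm{MaxEntScan\_alt}
  = 4.85638777510701 - 3.8294583218888 = 1.02692945321821
```

So here the alternate allele scores about 1.03 lower than the reference; this arithmetic alone says nothing about a minimum or maximum for the scores.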

LOVD plugin seems to pass "chr" twice to search API

Hi!

For the past few weeks, I've suddenly been receiving faulty API calls in this form:
/search.php?build=hg19&position=chrchr4:21238957_21238957
Note the double chr in the position.
Since the LOVD module doesn't pass a user agent to our services, I'm not 100% sure these calls are from the VEP module, however:

The calls started suddenly, without any test calls first, whereas a developer writing their own client would normally test first.
The calls come from multiple IP addresses that don't seem to be related.
The user agent is libwww-perl, which is what LOVD.pm would send.
The search strings use a doubled position (start_end) even for single-position variants; this is undocumented but also works, and the LOVD plugin always behaves this way.
The code of LOVD.pm does seem to be susceptible to this bug, depending on the VCF file parser.

Did perhaps the VCF file parser change recently?

I would be very grateful if you could look into whether this bug was introduced through updates to the VCF parser. Also, if this is indeed something you're fixing on your side, could you please set the user agent string to something that makes VEP calls easy to recognize?

Thanks!
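A defensive way to build the position parameter is to strip any existing "chr" prefix before adding one, so that "4" and "chr4" both produce chr4:21238957_21238957. This is a hypothetical sketch, not the LOVD plugin's code; lovd_position is an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical helper: build LOVD's "position" value from a chromosome
# name with or without a "chr" prefix, never doubling the prefix.
sub lovd_position {
    my ($chrom, $start, $end) = @_;
    (my $bare = $chrom) =~ s/^chr//i;
    return "chr$bare:${start}_$end";
}
```

For the user-agent request, LWP::UserAgent (the library behind the libwww-perl agent string) accepts an agent option in its constructor, e.g. LWP::UserAgent->new(agent => ...); the string itself would be the maintainers' choice.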

FATHMM plugin

The latest release of fathmm.py no longer requires the -i and -o flags, so the VEP plugin needs to be edited.

The line to change is:

my $fathmm_err = `cd $command_dir; $command -i $tmp_in_file -o $tmp_out_file`;

to this:

my $fathmm_err = `cd $command_dir; $command $tmp_in_file $tmp_out_file`;

Apologies, I am relatively new to GitHub and have not yet learned how to "submit patches" (if that's the correct terminology?!). I will learn, but I am off on holiday for 2 weeks and have run out of time!

Cheers

Chris

ExAC plugin returning only ExAC_AF, not population-specific AF

I need to dig through the source, but I am seeing empty columns for everything except ExAC_AF. The source ExAC VCF is directly from the ExAC download page.

Plugin 'ExAC' went wrong: Illegal division by zero

Hi Sir/Madam,

I am currently testing the VEP program that I downloaded from https://github.com/Ensembl/ensembl-tools/archive/release/84.zip.

The VEP program ran to completion and generated reasonable output (including ExAC frequencies). However, I noticed that I am getting the following error repeatedly:

Plugin 'ExAC' went wrong: Illegal division by zero at /root/vep/Plugins/ExAC.pm line 253, line 2.

Plugin 'ExAC' went wrong: Illegal division by zero at /root/vep/Plugins/ExAC.pm line 253, line 4.

Plugin 'ExAC' went wrong: Illegal division by zero at /root/vep/Plugins/ExAC.pm line 253, line 6.

Plugin 'ExAC' went wrong: Illegal division by zero at /root/vep/Plugins/ExAC.pm line 253, line 8.

Plugin 'ExAC' went wrong: Illegal division by zero at /root/vep/Plugins/ExAC.pm line 253, line 10.

Plugin 'ExAC' went wrong: Illegal division by zero at /root/vep/Plugins/ExAC.pm line 253, line 1.

Plugin 'ExAC' went wrong: Illegal division by zero at /root/vep/Plugins/ExAC.pm line 253, line 2.

Plugin 'ExAC' went wrong: Illegal division by zero at /root/vep/Plugins/ExAC.pm line 253, line 3.

Plugin 'ExAC' went wrong: Illegal division by zero at /root/vep/Plugins/ExAC.pm line 253, line 4.

Plugin 'ExAC' went wrong: Illegal division by zero at /root/vep/Plugins/ExAC.pm line 253, line 5.

Plugin 'ExAC' went wrong: Illegal division by zero at /root/vep/Plugins/ExAC.pm line 253, line 6.

Plugin 'ExAC' went wrong: Illegal division by zero at /root/vep/Plugins/ExAC.pm line 253, line 7.

Is this something I should be worried about?

The input VCF coordinates are in GRCh37 reference. I obtained the ExAC resource files from https://googledrive.com/host/0B6o74flPT8FAYnBJTk9aTF9WVnM and then used tabix v0.2.6 to create the *.tbi file.

I can provide more information/data to reproduce this error if needed.
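The message means a division at line 253 of ExAC.pm had a zero denominator. Without the plugin source at hand, the sketch below only assumes that the denominator is an allele-number (AN) value; safe_af is a hypothetical helper that returns undef instead of dying.

```perl
use strict;
use warnings;

# Hypothetical guard: skip the division when AN is missing or zero.
sub safe_af {
    my ($ac, $an) = @_;
    return unless defined $an && $an > 0;
    return $ac / $an;
}
```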

Blank columns when using dbNSFP

I am currently attempting to use the dbNSFP database with Ensembl VEP to annotate a list of variants that I have. I have a number of questions about the installation and use of the plugin:
Firstly, I am using dbNSFPv4.0b2a.zip (February 20), which I downloaded from the Google site.
I have also run through the installation for dbNSFP using the VEP’s installation script which created the dbNSFP.pm file under ~/.vep/Plugins/.

I then followed the instructions below to unzip and create a useable, tabix indexed .gz file for use with hg38 data:

unzip dbNSFP4.0b2a.zip
head -n1 dbNSFP4.0b2a_variant.chr1 > h
cat dbNSFP4.0b2a_variant.chr* | grep -v ^#chr | sort -k1,1 -k2,2n - | cat h - | bgzip -c > dbNSFP4.0b2a.gz
tabix -s 1 -b 2 -e 2 dbNSFP4.0b2a.gz

The most up-to-date instructions I could find, in the dbNSFP.pm file of the VEP plugins git repository, said this was the way to prepare the file for use with VEP. However, when I run VEP, I receive no error messages about missing files or about being unable to find information in the database; instead, the columns I selected from dbNSFP are empty for all of the SNPs in my analysis. While I appreciate that not every column will contain information for every SNP, I believe the problem may lie with the configuration of the dbNSFP database within VEP, since I am working with over 10M SNPs from exome sequencing.

Would you be able to give me some clarity as to whether I have made a mistake during the installation of the plugin or perhaps why I may be getting empty annotations across all of my variants?

The portion of the VEP command that relates to dbNSFP is as follows. VEP has been tested without dbNSFP; it works and produces the same output:
--plugin dbNSFP,/rds/project/rjh234/rds-rjh234-mrc-epid/Studies/People/Nick/dbNSFP/dbNSFP4.0b2a.gz,SIFT_pred,Polyphen2_HDIV_pred,Polyphen2_HVAR_pred,gnomAD_exomes_NFE_AF,gnomAD_genomes_NFE_AF

ProteinSeqs plugin truncating protein sequences

Hi there,

I have a VCF file containing the following variant:

chr1 44130707 . CCAA C

...resulting in the inframe_deletion p.Asn378del.
Using the ProteinSeqs plugin, this is the sequence I got for the wildtype:

ENSP00000361373
MYGRPQAEMEQEAGELSRWQAAHQAAQDNENSAPILNMSSSSGSSGVHTSWNQGLPSIQHFPHSAEMLGSPLVSVEAPGQNVNEGGPQFSMPLPERGMSYCPQATLTPSRMIYCQRMSPPQQEMTIFSGPQLMPVGEPNIPRVARPFGGNLRMPPNGLPVSASTGIPIMSHTGNPPVPYPGLSTVPSDETLLGPTVPSTEAQAVLPSMAQMLPPQDAHDLGMPPAESQSLLVLGSQDSLVSQPDSQEGPFLPEQPGPAPQTVEKNSRPQEGTGRRGSSEARPYCCNYENCGKAYTKRSHLVSHQRKHTGERPYSCNWESCSWSFFRSDELRRHMRVHTRYRPYKCDQCSREFMRSDHLKQHQKTHRPGPSDPQANNNNGEQDSPPAAGP

...and this is the protein sequence for the mutant:

ENSP00000361373.3:p.Asn378del
MYGRPQAEMEQEAGELSRWQAAHQAAQDNENSAPILNMSSSSGSSGVHTSWNQGLPSIQHFPHSAEMLGSPLVSVEAPGQ

As it is not a stop-gain mutation, I expected the mutant protein sequence to lack just that one "N" amino acid, rather than being truncated. Am I wrong, or is this a bug?

Thank you for your help with this!
Silvia
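The expected behaviour described above (drop one residue, keep the rest) can be written as a one-step sequence edit. This is a hypothetical sketch using 1-based protein positions, not the ProteinSeqs plugin's code; apply_inframe_deletion is an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical helper: remove $len residues starting at 1-based
# position $pos and keep the remainder of the sequence.
sub apply_inframe_deletion {
    my ($seq, $pos, $len) = @_;
    die "position out of range\n"
        if $pos < 1 || $pos + $len - 1 > length($seq);
    substr($seq, $pos - 1, $len, '');   # 4-argument substr deletes in place
    return $seq;
}

# Usage for p.Asn378del: apply_inframe_deletion($wildtype, 378, 1)
```

With this approach the mutant sequence is exactly one residue shorter than the wild type, instead of being cut off after the first 80 residues as in the output above.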
