ensembl / vep_plugins
Plugins for the Ensembl Variant Effect Predictor (VEP)
License: Apache License 2.0
I'm hitting an issue where PolyPhen and SIFT scores differ between VEP and dbNSFP.
Here are some examples:
-PolyPhen mismatch:
CHROM  POS        REF  ALT  VEP_POLYPHEN  VEP_PRED  DBNSFP_POLY_HDIV   DBNSFP_PRED
1      6485211    C    A    0.044         B         0.999              D
3      127336823  G    A    0.177         B         0.982,0.596,0.596  D,P,P
10     73574953   G    A    0.243         B         1.0                D
Any reasoning behind these discrepancies?
Installation for GRCh37 went without trouble, but I still have an issue with the GRCh38 version:
My code (taken from https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html#exac):
wget ftp://dbnsfp:[email protected]/dbscSNV1.1.zip
unzip dbscSNV1.1.zip
head -n1 dbscSNV1.1.chr1 > h
cat dbscSNV1.1.chr* | grep -v ^chr | sort -k5,5 -k6,6n | cat h - | bgzip -c > dbscSNV1.1_GRCh38.txt.gz
tabix -s 5 -b 6 -e 6 -c c dbscSNV1.1_GRCh38.txt.gz
But I get this message:
[E::get_intv] failed to parse TBX_GENERIC, was wrong -p [type] used?
The offending line was: "10 11374604 A C . . y n UTR3 CELF2 . . UTR3 ENSG00000048740 . . 0.00378177909178219 ."
Segmentation fault (core dumped)
Has anyone solved this issue?
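A possible explanation, offered as a sketch rather than a confirmed diagnosis: in the offending line quoted above, fields 5 and 6 (the columns tabix is told to index with -s 5 -b 6) are both ".", so that row appears to have no GRCh38 coordinate to parse. Dropping such rows before sorting might avoid the failure. The example below uses made-up toy rows and a shortened, hypothetical header; only awk, sort and cat are exercised, and the bgzip/tabix steps from the original commands would follow unchanged.

```shell
# Toy stand-in for the concatenated dbscSNV rows (tab-separated; hypothetical values and header).
printf 'chr\tpos\tref\talt\thg38_chr\thg38_pos\tscore\n' > h
printf '10\t11374604\tA\tC\t.\t.\t0.0037\n2\t500\tG\tT\t2\t450\t0.9\n1\t200\tA\tG\t1\t150\t0.5\n' > body.txt

# Drop rows with no GRCh38 coordinate, sort by GRCh38 chromosome and position,
# then put the header line back on top.
awk -F'\t' '$5 != "." && $6 != "."' body.txt \
  | sort -t "$(printf '\t')" -k5,5 -k6,6n \
  | cat h - > dbscSNV_GRCh38_demo.txt

cat dbscSNV_GRCh38_demo.txt
```

On the real data the awk filter would sit between `grep -v ^chr` and `sort` in the pipeline shown in the question.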
Hello,
I'm using the Downstream plugin to get the mutant protein sequence for frameshift variants annotated by VEP (GRCh37), and I've found a case where the wrong sequence is returned.
Input to VEP : 1 115256528 115256528 T/- +
Expected Downstream sequence : KSTVP*
Returned Downstream sequence : EEYSAMRDQYMRTGEGFLCVFAINNSKSFADINLYREQIKRVKDSDDVPMVLVGNKCDLPTRTVDTKQAHELAKSYGIPFIETSAKTRQGVEDAFYTLVREIRQYRMKKLNSSDDGTQGCMGLPCVVM
The 'ProteinLengthChange' is not as expected either.
I've confirmed with Mutalyzer that the expected sequence is correct.
I'm using VEP cache in offline mode, for the GRCh37 assembly.
Help with this please?
I'm trying to install the CADD (offline) plugin for VEP :
I downloaded conda:
wget -c https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh
bash Miniconda2-latest-Linux-x86_64.sh -p ~/miniconda2 -b
##SPECIFIC WINDOWS setup
export PATH="$HOME/miniconda2/bin:$PATH"
Then I cloned the Git repository:
git clone https://github.com/kircherlab/CADD-scripts.git
From here on I'm in the directory containing the CADD-scripts clone, and I checked that conda is on the PATH,
but when I launch ./install.sh I get this error (see attached file):
installSH_error.txt
bad interpreter: No such file or directory
I think it might be due to Windows/Unix differences, but I have no idea how to fix it.
I tried the first two procedures described in https://stackoverflow.com/questions/14219092/bash-my-script-bin-bashm-bad-interpreter-no-such-file-or-directory
but it is still not working.
Does anyone have a solution or idea? :)
Thanks in advance
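One thing that may be worth checking, as a guess based on the "bad interpreter" message: if the script was checked out with Windows (CRLF) line endings, the interpreter named on the first line ends in a carriage return and cannot be found. The sketch below builds a toy script standing in for install.sh (the file names are hypothetical), counts the lines that contain a carriage return, and writes a cleaned copy.

```shell
# Toy script with Windows (CRLF) line endings, standing in for install.sh.
printf '#!/bin/bash\r\necho ok\r\n' > demo_install.sh

# Count the lines that contain a carriage return (2 for this toy file).
grep -c "$(printf '\r')" demo_install.sh

# Remove every carriage return character and write a Unix-style copy.
tr -d '\r' < demo_install.sh > demo_install_unix.sh

# The cleaned copy runs normally.
bash demo_install_unix.sh
```

If the count is zero on the real file, line endings are not the cause and the interpreter path on the first line would be the next thing to inspect.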
I would like to use the ExACpLI.pm plugin with the new pLI scores from gnomAD: https://storage.googleapis.com/gnomad-public/release/2.1/ht/constraint/constraint.txt.bgz
However, since the plugin is called ExACpLI, when I supply the gnomAD pLI scores the CSQ field in the VCF header will say "ExACpLI", which is not accurate. Since the plugin accepts values files other than ExAC's, would it not be more appropriate to rename it to a more generic "pLI.pm", or to allow the field name in the CSQ to be changed?
Hi there,
We hit a syntax error when using VEP with dbNSFP.pm to annotate variants.
perl /home/zoeching/Tools/src/ensembl-tools/scripts/variant_effect_predictor/variant_effect_predictor.pl \
--offline \
--fork 10 \
--pick \
--merged /home/zoeching/pku/Ref \
--everything \
--force_overwrite \
--vcf \
--vcf_info_field CSQ \
--plugin dbNSFP,/home/zoeching/pku/Ref/dbNSFPv2.9.1/dbNSFP.gz,SIFT_score,SIFT_converted_rankscore,SIFT_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,MutationTaster_score,MutationTaster_converted_rankscore,MutationTaster_pred,CADD_raw,CADD_raw_rankscore,CADD_phred,GERP++_NR,GERP++_RS,GERP++_RS_rankscore \
--plugin LoF,human_ancestor_fa:/home/zoeching/pku/Ref/human_ancestor.fa.gz \
-i /work1/ASD/Data/other/ASD569_1.gatk-queue_raw_snps.vcf \
-o ASD569_1.gatk-queue_raw_ann_snps.vcf
After running the command, I got this:
The output (if any) follows:
2016-09-04 16:02:10 - Read existing cache info
2016-09-04 16:02:10 - INFO: Disabling --hgvs; using --offline and no FASTA file found
2016-09-04 16:02:10 - Failed to compile plugin dbNSFP: syntax error at /home/zoeching/Tools/library/perl_module/lib/perl5/dbNSFP.pm line 304, near "s/[|]/&/gr"
Compilation failed in require at (eval 68) line 2.
BEGIN failed--compilation aborted at (eval 68) line 2.
2016-09-04 16:02:11 - Loaded plugin: LoF
2016-09-04 16:02:11 - Starting...
2016-09-04 16:02:11 - Detected format of input file as vcf
2016-09-04 16:02:11 - Read 5000 variants into buffer
2016-09-04 16:02:11 - Calculating consequences
and this
Bareword found where operator expected at /home/zoeching/Tools/library/perl_module/lib/perl5/dbNSFP.pm line 304, near "s/[|]/&/gr"
I didn't understand the meaning of "r" in "s/[|]/&/gr" on line 304, so I deleted the "r" and ran the command again. This time no errors appeared, but the annotation failed: the columns for annotations from dbNSFP.gz were empty. I can successfully grep information from the dbNSFP2.9.1_variant.chr* files using "chr pos", so the input file is okay.
Has anyone met this error before? Is anything wrong with dbNSFP.pm (line 304)? What does the "r" mean here?
Hi, I have a problem with the dbNSFP plugin.
WARNING: Failed to instantiate plugin dbNSFP: ERROR: Could not retrieve dbNSFP version from filename /home/ertan/Desktop/vep/dbNSFP3/dbNSFP_hg19.gz. How can I solve this?
The command I used:
./vep --fork 4 -cache -port 3337 --use_given_ref --symbol --tab --offline --refseq -i /home/ertan/Desktop/pipeline/playground/NECESSARY/DE-FMF1/DE-FMF1.vcf -o /home/ertan/Desktop/fmf.txt --registry ensembl.registry --polyphen p --sift b --coding_only --genomes --species homo_sapiens --custom /home/ertan/Desktop/vep/clinvar/clinvar_20190513.vcf.gz,ClinVar,vcf,exact,0,CLNSIG --force_overwrite --plugin GO --plugin LOVD --custom /home/ertan/Desktop/vep/gnomAD-exomes/gnomad.exomes.r2.0.1.sites.noVEP.vcf.gz,gnomADg,vcf,exact,0 --plugin dbscSNV,/home/ertan/Desktop/vep/dbscSNV/dbscSNV1.1/dbscSNV1.1_GRCh37.txt.gz --assembly GRCh37 --plugin SpliceRegion,Extended --offline --hgvs --fasta /home/ertan/Desktop/vep/fasta/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa --plugin dbNSFP,/home/ertan/Desktop/vep/dbNSFP3/dbNSFP_hg19.gz,SIFT_score,SIFT_converted_rankscore,SIFT_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,MutationTaster_score,MutationTaster_converted_rankscore,MutationTaster_pred,CADD_raw,CADD_raw_rankscore,CADD_phred,GERP++_NR,GERP++_RS,GERP++_RS_rankscore
Hello,
I have a problem with the dbNSFP plugin in VEP.
The command I am running is:
./vep --fork 4 --force_overwrite --coding_only --cache --port 3337 --symbol --tab --polyphen p --sift p --pubmed --hgvs --fasta /home/ertan/Desktop/programlar/ensembl-data/fasta/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa --plugin LOVD --plugin Condel,/home/.vep/Plugins/config/Condel/config/condel_SP.conf,b --plugin dbNSFP,/home/ertan/Desktop/programlar/ensembl-data/dbNSFP3.5/dbNSFP3.5.gz,ALL --plugin GO --plugin dbscSNV,/home/$USER/Desktop/programlar/ensembl-data/dbscSNV/dbscSNV1.1/dbscSNV1.1_GRCh37.txt.gz --plugin G2P,/home/$USER/Desktop/programlar/ensembl-data/G2P/DDG2P_17_5_2019.csv.gz -i /home/ertan/Desktop/pipeline/playground/NECESSARY/166527/166527.vcf -o /home/$USER/Desktop/deneme.csv
I think the relevant error output is:
Use of uninitialized value $readme_file in concatenation (.) or string at /home/ertan/.vep/Plugins/dbNSFP.pm line 218.
I tried to generate the GFF file for Drosophila melanogaster BDGP6.22 for offline use of the Phenotypes.pm plugin with VEP v96, but got an error.
Here is the command I ran:
vep -i drosophila.vcf --cache --dir_cache /fdb/VEP/96/cache --fasta Drosophila_melanogaster.BDGP6.22.dna.toplevel.fa --species drosophila_melanogaster --plugin Phenotypes
Here is the resulting error:
### Phenotypes plugin: This will take some time but it will only run once per species, assembly and release
### Phenotypes plugin: Querying database
WARNING: Failed to instantiate plugin Phenotypes: Can't call method "dbc" on an undefined value at /fdb/VEP/96/cache/Plugins/Phenotypes.pm line 196.
Earlier versions (88 through 95) finished normally, only 96 failed.
Hello,
I'm getting an error while using the Conservation plugin with VEP 71 and Ensembl 71.
ERROR: Forked process failed
Use of uninitialized value in numeric ge (>=) at /home/likewise-open/SGNET/gmarco/.vep/Plugins/Conservation.pm line 105.
Attempting to update to VEP 88, but notice install is lacking PolyPhen_SIFT plugin.
Is this an intentional omission, or an oversight?
Hi,
the REVEL plugin installation instructions seem to be outdated.
The command given is:
./vep -i variations.vcf --plugin REVEL,/path/to/revel/data.tsv.gz
But at the given URL (https://sites.google.com/site/revelgenomics/downloads), only CSV files are downloadable.
Is there another download URL available for the TSV, or are there instructions on how to convert the data?
Best,
Marc
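Until the instructions are clarified, a conversion along these lines might work. It is only a sketch: the toy file and its column names are assumptions, not taken from the REVEL download, and it relies on no field containing an embedded comma. The commas are turned into tabs and the header line is prefixed with "#" so that it reads as a comment.

```shell
# Toy CSV standing in for the REVEL download (hypothetical rows and column names).
printf 'chr,hg19_pos,ref,alt,aaref,aaalt,REVEL\n1,35142,G,A,T,M,0.027\n1,35142,G,C,T,R,0.035\n' > revel_demo.csv

# Turn commas into tabs and mark the header line as a comment.
tr ',' '\t' < revel_demo.csv | sed '1s/^/#/' > revel_demo.tsv

cat revel_demo.tsv
```

The resulting TSV would presumably still need to be compressed with bgzip and indexed with tabix on the chromosome and position columns before the plugin can read it.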
The Downstream plugin only considers the coding sequence when producing the shifted sequence. What if I have a frame shift that removes the stop codon?
For instance, this one
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
chr1 100484696 . A AC 5000 . . .
which affects transcript ENSMUST00000086738. The last bit of the coding sequence plus 3'UTR look like
TTC ATC TGA ACT ATT GTG TGG TCA TCT GGT CCT CTT TTT TGC AGA GGT TTC CAT CTC TTT TTC TTT TCT TTC TTT TAA
^ stop codon
and the variant changes it to
TTC ACT CTG AAC TAT TGT GTG GTC ATC TGG TCC (...)
Should a bit of the 3'UTR be translated to protein ? Likewise, should then stop_loss annotations also be supported by the plugin ?
Currently, all results from the LD plugin share the same result field name (LinkedVariants); it would be good to have a suffix, like LinkedVariants_CEU.
It would also be good if the LD plugin could generate results for more than one population at the same time, for example by allowing it to be run as --plugin LD,1000GENOMES:phase_3:CDX,1000GENOMES:phase_3:FIN
Hello
I'm getting a message like this when I run the G2P plugin.
My command is:
./vep -i FMF-31.vcf -o /home/ertan/Desktop/deneme.csv -e --offline --cache --refseq --use_given_ref --fasta /home/ertan/Desktop/programlar/ensembl-data/fasta/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa --tab --show_ref_allele --af_1kg --af_esp --af_exac --plugin G2P,/home/ertan/Desktop/programlar/ensembl-data/G2P/DDG2P_17_5_2019.csv
My error output is:
Use of uninitialized value in concatenation (.) or string at /home/ertan/.vep/Plugins/G2P.pm line 972.
However, this warning does not interfere with the run; the plugin works. Why does it show a warning message?
I would like to ask one more question here rather than open a separate topic:
is there any way to use the G2P plugin offline?
Currently supported populations are:
1000GENOMES:phase_3:ACB, 1000GENOMES:phase_3:ASW, 1000GENOMES:phase_3:BEB, 1000GENOMES:phase_3:CDX, 1000GENOMES:phase_3:CEU, 1000GENOMES:phase_3:CHB, 1000GENOMES:phase_3:CHS, 1000GENOMES:phase_3:CLM, 1000GENOMES:phase_3:ESN, 1000GENOMES:phase_3:FIN, 1000GENOMES:phase_3:GBR, 1000GENOMES:phase_3:GIH, 1000GENOMES:phase_3:IBS, 1000GENOMES:phase_3:ITU, 1000GENOMES:phase_3:JPT, 1000GENOMES:phase_3:KHV, 1000GENOMES:phase_3:LWK, 1000GENOMES:phase_3:GWD, 1000GENOMES:phase_3:MSL, 1000GENOMES:phase_3:MXL, 1000GENOMES:phase_3:PEL, 1000GENOMES:phase_3:PJL, 1000GENOMES:phase_3:PUR, 1000GENOMES:phase_3:STU, 1000GENOMES:phase_3:TSI, 1000GENOMES:phase_3:YRI
However, it doesn't support super populations:
AFR, African
AMR, Ad Mixed American
EAS, East Asian
EUR, European
SAS, South Asian
Hello. I'm trying to annotate some variants using the SameCodon plugin. When I launch VEP, it does not give any warning or error, but the corresponding VCF column is always empty. Is there a dataset with which I can test the plugin? Thank you
Currently the instructions for dbNSFP state that the dbNSFP zip file must be processed this way (lines 46-49 of dbNSFP.pm):
> wget ftp://dbnsfp:[email protected]/dbNSFPv3.0b2a.zip
> unzip dbNSFPv3.0b2a.zip
> cat dbNSFP*chr* | bgzip -c > dbNSFP.gz
> tabix -s 1 -b 2 -e 2 dbNSFP.gz
However, if one processes the database this way the plugin will stop working and complain the header is missing. The correct method should be similar to the one described in lines 46-49 of dbscSNV.pm; I have used it on dbNSFP v3.2a:
> wget ftp://dbnsfp:[email protected]/dbNSFPv3.2a.zip
> unzip dbNSFPv3.2a.zip
> head -n1 dbNSFP3.2a_variant.chr1 > h
> cat dbNSFP*chr* | grep -v ^#chr | cat h - | bgzip -c > dbNSFP.gz
> tabix -s 1 -b 2 -e 2 dbNSFP.gz
I should add that the dbNSFP files often need to be sorted by the user (sort -k1,1 -k2,2n) before they can be successfully indexed.
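Putting the two points together (keep a single header line, sort the body), the preparation might look like the sketch below. The per-chromosome files here are tiny made-up stand-ins with hypothetical names; the bgzip and tabix steps from the commands above would follow on the real data.

```shell
# Toy per-chromosome files standing in for the unzipped dbNSFP data (hypothetical values).
printf '#chr\tpos(1-based)\tref\talt\n2\t300\tA\tG\n' > demo_variant.chr2
printf '#chr\tpos(1-based)\tref\talt\n1\t900\tC\tT\n1\t50\tG\tA\n' > demo_variant.chr1

# Keep one copy of the header.
head -n1 demo_variant.chr1 > h

# Concatenate, drop the repeated headers, sort by chromosome then position,
# and put the single header back on top.
cat demo_variant.chr* | grep -v '^#chr' | sort -k1,1 -k2,2n | cat h - > dbNSFP_demo.txt

cat dbNSFP_demo.txt
```

Sorting only the body and re-attaching the header afterwards keeps the "#chr" line first regardless of how sort orders it.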
Hi there,
I need to filter out non-deleterious variants based on SIFT/Polyphen2/CADD and other scores from the dbNSFP database, so I used the VEP dbNSFP.pm plugin to annotate my VCF files. When I checked the result, I found that some variants my colleague had predicted as deleterious with other tools were empty in every dbNSFP column. I then used the "chr", "pos", "ref" and "alt" values of these variants as keywords to grep the SIFT/Polyphen2/CADD scores directly from the dbNSFP database. This time the output was not empty, and it even supported these variants being deleterious.
To work out the difference between the plugin annotation and the direct grep, I annotated 25514 randomly selected variants from chr10 with both methods. Details and results are below:
1. using the VEP dbNSFP plugin
1.1 script:
perl /home/perl5/Ensembl-APIs/ensembl-tools/scripts/variant_effect_predictor/variant_effect_predictor.pl --offline --fork 4 --pick --merged --everything --force_overwrite --vcf --vcf_info_field CSQ --plugin LoF,human_ancestor_fa:/home/Ref/human_ancestor.fa.gz --plugin dbNSFP,/home/Ref/dbNSFPv2.9.1/dbNSFP.gz,SIFT_score,SIFT_converted_rankscore,SIFT_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,MutationTaster_score,MutationTaster_converted_rankscore,MutationTaster_pred,CADD_raw,CADD_raw_rankscore,CADD_phred,phyloP100way_vertebrate,phyloP100way_vertebrate_rankscore -i test.vcf -o test.ANN.vcf
echo -e "chr\tpos\tref\talt\tSIFT_score\tPolyphen2_HDIV_score\tMutationTaster_pred\tCADD_pred\tphyloP100way_vertabrate" |less >test.ANN.vcf_info
le test.ANN.vcf|awk '!/#/'|cut -f 1-2,4-5,8|perl -ne 'chomp;my @a=split/\t/,$_;my @b=split/;/,$a[4];my @c=split/\|/,$b[-1];if($c[77] ne "" || $c[74] ne "" || $c[70] ne "" || $c[66] ne "" || $c[78] ne ""){print "$a[0]\t$a[1]\t$a[2]\t$a[3]\t$c[77]\t$c[74]\t$c[70]\t$c[66]\t$c[78]\n"}'|cat test.ANN.vcf_h - >test.ANN.vcf_info
1.2 result
Only 564 variants were successfully annotated with non-null dbNSFP scores.
2.using awk
2.1 script:
I just used the "chr", "pos", "ref" and "alt" values of these variants as keywords to grep the SIFT/Polyphen2/CADD scores etc. from the dbNSFP database. For example:
awk '$1~/^10$/ && $2~/^100017453$/ && $3~/^T$/ && $4~/^G$/' /home/Ref/dbNSFPv2.9.1/dbNSFP.chr10.txt >> test.dbNSFPv2.9.1.vcf
le test.dbNSFPv2.9.1.vcf|awk -F "\t" '{print"chr"$1"\t"$2"\t"$3"\t"$4"\t"$27"\t"$30"\t"$41"\t"$41"\t"$70}'|cat test.vcf_info_h ->test.dbNSFPv2.9.1.vcf_info
2.2 result
704 variants were successfully annotated with non-null dbNSFP scores. In addition, 138 variants have more than one row of annotation.
Here is my confusion:
1) Which information does dbNSFP.pm use as the key to annotate input files, especially when there is more than one annotation for the same chr_pos_refallele_altallele?
2) Why can't I annotate dbNSFP scores for some variants when the corresponding information exists in the dbNSFP database?
Any comments and suggestions will be appreciated.
I found out that Variant Effect Predictor with ProteinSeqs plugin and --fork does not work correctly.
Running the following command:
/software/variant_effect_predictor/ensembl-tools-release-85/scripts/variant_effect_predictor/variant_effect_predictor.pl \
--species homo_sapiens \
--format vcf \
--coding_only \
--symbol \
--protein \
--uniprot \
--plugin ProteinSeqs,${sample}.prot_seq.reference.fa,${sample}.prot_seq.mutated.fa \
--custom ${sample}.nofirst1bp.dedup.filtered.depth.bed.gz,DP \
--custom ${sample}.nofirst1bp.dedup.filtered.vaf.bed.gz,VAF \
--input ${sample}.nofirst1bp.dedup.filtered.vcf.gz \
--output ${sample}.nofirst1bp.dedup.filtered.annotated.coding_only.VEP \
--cache \
--offline \
--fork 5
Results in fasta files like this:
>ENSP00000364569
MTQLMKAAKSGTKDGLEKTRMAVMRKVSFLHRKDVLGDSEEEDMGLLEVSVSDIKPPAPELGPMPEGLSPQQVVRRHILG
SIVQSEGSYVESLKRILQDYRNPLMEMEPKALSARKCQVVFFRVKEILHCHSMFQIALSSRVAEWDSTEKIGDLFVASFS
KSMVLDVYSDYVNNFTSAMSIIKKACLTKPAFLEFLKRRQVCSPDRVTLYGLMVKPIQRFPQFILLLQDMLKNTPRGHPD
RLSLQLALTELETLAEKLNEQKRLADQVAEIQQLTKSVSDRSSLNKLLTSGQRQLLLCETLTETVY>ENSP00000368015
MRRATVEREMELRHKNEMLRVETEARARAKAERENADIIREQIRLKASEHRQTVLESIRTAGTLFGEGFRAFVTDRDKVT
ATVNIFIKQGWQVAERQHHFRRRRWADHEVRRSRSSW
>ENSP00000234800
MSSSVKTPALEELVPGSEEKPKGRSPLSWGSLFGHRSEKIVFAKSDGGTDENVLTVTITETTVIESDLGVWSSRALLYLT
LWFFFSFCTLFLNKYILSLLGGEPSMLGAVQMLSTTVIGCVKTLVPCCLYQHKARLSYPPNFLMTMLFVGLMRFATVVLG
LVSLKNVAVSFAETVKSSAPIFTVIMSRMILGEYTGLLVNLSLIPVMGGLALCTATEISFNVLGFSAALSTNIMDCLQNV
FSKKLLSGDKYRFSAPELQFYTSAAAVAMLVPARVFFTDVPVIGRSGKSFSYNQDVVLLLLTDGVLFHLQSVTAYALMGK
ISPVTFSVASTVKHALSIWLSVIVFGNKITSLSAVGTALVTVGVLLYNKARQHQQEALQSLAAATGRAPDDTVEPLLPQD
PRQHP
It looks like, because of the forking, the file is partially overwritten, as the file pointer
is not synchronised between the forks.
The following command (without --fork) does not generate this problem.
/software/variant_effect_predictor/ensembl-tools-release-85/scripts/variant_effect_predictor/variant_effect_predictor.pl \
--species homo_sapiens \
--format vcf \
--coding_only \
--symbol \
--protein \
--uniprot \
--plugin ProteinSeqs,${sample}.prot_seq.reference.fa,${sample}.prot_seq.mutated.fa \
--custom ${sample}.nofirst1bp.dedup.filtered.depth.bed.gz,DP \
--custom ${sample}.nofirst1bp.dedup.filtered.vaf.bed.gz,VAF \
--input ${sample}.nofirst1bp.dedup.filtered.vcf.gz \
--output ${sample}.nofirst1bp.dedup.filtered.annotated.coding_only.VEP \
--cache \
--offline
Now the FASTA file looks like this:
>ENSP00000368015
MRRATVEREMELRHKNEMLRVETEARARAKAERENADIIREQIRLKASEHRQTVLESIRTAGTLFGEGFRAFVTDRDKVT
ATVNIFIKQGWQVAERQHHFRRRRWADHEVRRSRSSW
>ENSP00000234800
MSSSVKTPALEELVPGSEEKPKGRSPLSWGSLFGHRSEKIVFAKSDGGTDENVLTVTITETTVIESDLGVWSSRALLYLT
LWFFFSFCTLFLNKYILSLLGGEPSMLGAVQMLSTTVIGCVKTLVPCCLYQHKARLSYPPNFLMTMLFVGLMRFATVVLG
LVSLKNVAVSFAETVKSSAPIFTVIMSRMILGEYTGLLVNLSLIPVMGGLALCTATEISFNVLGFSAALSTNIMDCLQNV
FSKKLLSGDKYRFSAPELQFYTSAAAVAMLVPARVFFTDVPVIGRSGKSFSYNQDVVLLLLTDGVLFHLQSVTAYALMGK
ISPVTFSVASTVKHALSIWLSVIVFGNKITSLSAVGTALVTVGVLLYNKARQHQQEALQSLAAATGRAPDDTVEPLLPQD
PRQHP
>ENSP00000367931
MSSSVKTPALEELVPGSEEKPKGRSPLSWGSLFGHRSEKIVFAKSDGGTDENVLTVTITETTVIESDLGVWSSRALLYLT
LWFFFSFCTLFLNKYILSLLGGEPSMLGAVQMLSTTVIGCVKTLVPCCLYQHKARLSYPPNFLMTMLFVGLMRFATVVLG
LVSLKNVAVSFAETVKSSAPIFTVIMSRMILGEYTGLLVNLSLIPVMGGLALCTATEISFNVLGFSAALSTNIMDCLQNV
FSKKLLSGDKYRFSAPELQFYTSAAAVAMLVPARVFFTDVPVIGRSGKSFSYNQDVVLLLLTDGVLFHLQSVTAYALMGK
ISPVTFSVASTVKHALSIWLSVIVFGNKITSLSAVGTALVTVGVLLYNKARQHQQEALQSLAAATGRAPDDTVEPLLPQD
PRQHP
...
>ENSP00000364564
MASSNPPPQPAIGDQLVPGVPGPSSEAEDDPGEAFEFDDSDDEEDTSAALGVPSLAPERDTDPPLIHLDSIPVTDPDPAA
APPGTGVPAWVSNGDAADAAFSGARHSSWKRKSSRRIDRFTFPALEEDVIYDDVPCESPDAHQPGAERNLLYEDAHRAGA
PRQAEDLGWSSSEFESYSEDSGEEAKPEVEVEPAKHRVSFQPKMTQLMKAAKSGTKDGLEKTRMAVMRKVSFLHRKDVLG
DSEEEDMGLLEVSVSDIKPPAPELGPMPEGLSPQQVVRRHILGSIVQSEGSYVESLKRILQDYRNPLMEMEPKALSARKC
QVVFFRVKEILHCHSMFQIALSSRVAEWDSTEKIGDLFVASFSKSMVLDVYSDYVNNFTSAMSIIKKACLTKPAFLEFLK
RRQVCSPDRVTLYGLMVKPIQRFPQFILLLQDMLKNTPRGHPDRLSLQLALTELETLAEKLNEQKRLADQVAEIQQLTKS
VSDRSSLNKLLTSGQRQLLLCETLTETVYGDRGQLIKSKERRVFLLNDMLVCANINFKPANHRGQLEISSLVPLGPKYVV
KWNTALPQVQVVEVGQDGGTYDKDNVLIQHSGAKKASASGQAQNKVYLGPPRLFQELQDLQKDLAVVEQITLLISTLHGT
YQNLNMTVAQDWCLALQRLMRVKEEEIHSANKCRLRLLLPGKPDKSGRPISFMVVFITPNPLSKISWVNRLHLAKIGLRE
ENQPGWLCPDEDKKSKAPFWCPILACCIPAFSSRALSLQLGALVHSPVNCPLLGFSAVSTSLPQGYLWVGGGQEGAGGQV
EIFSLNRPSPRTVKSFPLAAPVLCMEYIPELEEEAESRDESPTVADPSATVHPTICLGLQDGSILLYSSVDTGTQCLVSC
RSPGLQPVLCLRHSPFHLLAGLQDGTLAAYPRTSGGVLWDLESPPVCLTVGPGPVRTLLSLEDAVWASCGPWVTVLEATT
LQPQQSFEAHQDEAVSVTHMVKAGSGVWMAFSSGTSIRLFHTETLEHLQEINIATRTTFLLPGQKHLCVTSLLICQGLLW
VGTDQGVIVLLPVPRLEGIPKITGKGMVSLNGHCGPVAFLAVATSILAPDILRSDQEEAEGPRAEEDKPDGQAHEPMPDS
HVGRELTRKKGILLQYRLRSTAHLPGPLLSMREPAPADGAALEHSEEDGSIYEMADDPDIWVRSRPCARDAHRKEICSVA
IISGGQGYRNFGSALGSSGRQAPCGETDSTLLIWQVPLML
>ENSP00000364569
MTQLMKAAKSGTKDGLEKTRMAVMRKVSFLHRKDVLGDSEEEDMGLLEVSVSDIKPPAPELGPMPEGLSPQQVVRRHILG
SIVQSEGSYVESLKRILQDYRNPLMEMEPKALSARKCQVVFFRVKEILHCHSMFQIALSSRVAEWDSTEKIGDLFVASFS
KSMVLDVYSDYVNNFTSAMSIIKKACLTKPAFLEFLKRRQVCSPDRVTLYGLMVKPIQRFPQFILLLQDMLKNTPRGHPD
RLSLQLALTELETLAEKLNEQKRLADQVAEIQQLTKSVSDRSSLNKLLTSGQRQLLLCETLTETVYGDRGQLIKSKERRV
FLLNDMLVCANINFKPANHRGQLEISSLVPLGPKYVVKWNTALPQVQVVEVGQDGGTYDKDNVLIQHSGAKKASASGQAQ
NKVYLGPPRLFQELQDLQKDLAVVEQITLLISTLHGTYQNLNMTVAQDWCLALQRLMRVKEEEIHSANKCRLRLLLPGKP
DKGTWSDM
>ENSP00000394621
MASSNPPPQPAIGDQLVPGVPGPSSEAEDDPGEAFEFDDSDDEEDTSAALGVPSLAPERDTDPPLIHLDSIPVTDPDPAA
APPGTGVPAWVSNGDAADAAFSGARHSSWKRKSSRRIDRFTFPALEEDVIYDDVPCESPDAHQPGAERNLLYEDAHRAGA
PRQAEDLGWSSSEFESYSEDSGEEAKPEVEVEPAKHRVSFQPKLSPDLTRLKERYARTKRDILALRVGGRDMQELKHKYD
CKMTQLMKAAKSGTKDGLEKTRMAVMRKVSFLHRKDVLGDSEEEDMGLLEVSVSDIKPPAPELGPMPEGLSPQQVVRRHI
LGSIVQSEGSYVESLKRILQDYRNPLMEMEPKALSARKCQVVFFRVKEILHCHSMFQIALSSRVAEWDSTEKIGDLFVAS
FSKSMVLDVYSDYVNNFTSAMSIIKKACLTKPAFLEFLKRRQVCSPDRVTLYGLMVKPIQRFPQFILLLQDMLKNTPRGH
PDRLSLQLALTELETLAEKLNEQKRLADQVAEIQQLTKSVSDRSSLNKLLTSGQRQLLLCETLTETVYGDRGQLIKSKER
RVFLLNDMLVCANINFKGQLEISSLVPLGPKYVVKWNTALPQVQVVEVGQDGGTYDKDNVLIQHSGAKKASASGQAQNKV
YLGPPRLFQELQDLQKDLAVVEQITLLISTLHGTYQNLNMTVAQDWCLALQRLMRVKEEEIHSANKCRLRLLLPGKPDKS
GRPISFMVVFITPNPLSKISWVNRLHLAKIGLREENQPGWLCPDEDKKSKAPFWCPILACCIPAFSSRALSLQLGALVHS
PVNCPLLGFSAVSTSLPQGYLWVGGGQEGAGGQVEIFSLNRPSPRTVKSFPLAAPVLCMEYIPELEEEAESRDESPTVAD
PSATVHPTICLGLQDGSILLYSSVDTGTQCLVSCRSPGLQPVLCLRHSPFHLLAGLQDGTLAAYPRTSGGVLWDLESPPV
CLTVGPGPVRTLLSLEDAVWASCGPWVTVLEATTLQPQQSFEAHQDEAVSVTHMVKAGSGVWMAFSSGTSIRLFHTETLE
HLQEINIATRTTFLLPDRSLIKCSPRA
I tried to enable the GO plugin in VEP.
The plugin queries the database over the internet,
which slows variant processing from 2500 vars/sec to 19 vars/sec.
Is it possible to query a locally installed database or cache?
If so, which MySQL database should I download?
Hi,
I would just like to ask whether there is any VEP plugin that can further annotate variants with information from the ClinVar and OMIM databases, or whether I need to write one myself.
Thanks a lot
Joyce
Why is there no Wildtype.pm plugin?
Hello:
When I use the dbNSFP plugin, it writes all the results on a single line in my output file. In other words, dbNSFP breaks the readability of the output and makes it unusable: not only its own output, but all other output is written on a single line, so I can't tell what it says.
I'm struggling to get the LoF plugin running in VEP (88). I've specified my plugin directory and VEP is looking there for the plugins. Every time I run VEP, I get the following during initialisation:
Failed to compile plugin LoF:
2017-07-19 16:44:34 - Failed to compile plugin LoF: Can't open me2x3acc1!
I've grabbed the maxEntScan stuff from GitHub and put that in the Plugin directory, but still no luck.
Any suggestions?
The ExAC plugin doesn't give frequencies for multiallelic sites, such as this one:
1 1222267 rs11260579 G T
How can I solve such an error when I run the G2P plugin?
ERROR:
WARNING: Failed to instantiate plugin G2P:
-------------------- EXCEPTION --------------------
MSG: Could not get adaptor VCFCollection for homo_sapiens variation
STACK Bio::EnsEMBL::DBSQL::DBAdaptor::AUTOLOAD /home/ertan/Desktop/vep/ensembl-vep/Bio/EnsEMBL/DBSQL/DBAdaptor.pm:993
STACK G2P::new /home/ertan/.vep/Plugins/G2P.pm:330
STACK (eval) /home/ertan/Desktop/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:895
STACK Bio::EnsEMBL::VEP::Runner::get_all_Plugins /home/ertan/Desktop/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:894
STACK Bio::EnsEMBL::VEP::Runner::init /home/ertan/Desktop/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:120
STACK Bio::EnsEMBL::VEP::Runner::run /home/ertan/Desktop/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:194
STACK toplevel ./vep:218
Date (localtime) = Sat May 18 09:27:31 2019
Ensembl API version = 96
COMMAND:
./vep -i FMF-31.vcf -o /home/ertan/Desktop/deneme.txt --plugin G2P,file=/home/ertan/Desktop/DDG2P_17_5_2019.csv --cache --port 3337 --force_overwrite
Not sure if 85 is in some sort of pre-release state and this is currently intentional, but doesn't look right.
Hi,
let's start from beginning:
wget ftp://dbnsfp:[email protected]/dbNSFP4.0b1a.zip
unzip dbNSFP4.0b1a.zip
cd dbNSFP40b1a
zcat dbNSFP4.0b1a_variant.chr*.gz | grep -v ^#chr | awk '$8 != "."' | sort -T /mnt/tmp -k8,8 -k9,9n - | cat h - | bgzip -c > dbNSFP4.0b1a_variant.chr.ALL.gz
tabix -s 8 -b 9 -e 9 dbNSFP4.0b1a_variant.chr.ALL.gz
let's check the position directly from dbNSFP4.0b1a_variant.chr.ALL.gz file:
tabix hg19_dbNSFP4.0b1a_variant.chr.ALL.gz 21:44483184-44483184
(...and here get plenty of annotations).
my VCF file line:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT XXXXY
21 44483184 21_44483184_A_G A G . . . GT 0/1
docker run --rm -t -i \
-v /mnt/ssd_01/VEP/vepCache:/opt/vep/.vep \
-v /mnt/ssd_01/DATA:/home/vep/DATA \
-v /mnt/sata_03/DATABASES:/home/vep/DATABASES \
-v /mnt/ssd_01/VEP/runVEP:/home/vep/runVEP \
-v /mnt/ssd_01/refs/:/home/vep/refs ensemblorg/ensembl-vep:94 vep \
--format "vcf" \
-i $inVepFile \
-o $outVepFile \
--no_stats \
--cache \
--fasta /home/vep/refs/hs37d5_noHap.fa --port 3337 --ASSEMBLY GRCh37 \
--refseq --tab \
--use_transcript_ref \
--variant_class --sift b --polyphen b --humdiv --gene_phenotype --regulatory --no_escape \
--hgvs --hgvsg --shift_hgvs 1 --symbol --protein --ccds --uniprot --numbers --domains --canonical --biotype \
--af --af_1kg --af_esp --af_gnomad --max_af --pubmed \
--exclude_predicted \
--buffer_size 50 \
--plugin dbNSFP,/home/vep/DATABASES/dbNSFP40b1a/hg19_dbNSFP4.0b1a_variant.chr.ALL.gz,ref,alt,aaref,aaalt,rs_dbSNP151,hg19_chr,"hg19_pos(1-based)",hg18_chr,"hg18_pos(1-based)",aapos,genename,Ensembl_geneid,Ensembl_transcriptid,Ensembl_proteinid,Uniprot_acc,Uniprot_entry,APPRIS,GENCODE_basic,TSL,VEP_canonical,cds_strand,refcodon,codonpos,codon_degeneracy,Ancestral_allele,AltaiNeandertal,Denisova,VindijiaNeandertal,SIFT_score,SIFT_converted_rankscore,SIFT_pred,SIFT4G_score,SIFT4G_converted_rankscore,SIFT4G_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,LRT_score,LRT_converted_rankscore,LRT_pred,LRT_Omega,MutationTaster_score,MutationTaster_converted_rankscore,MutationTaster_pred,MutationTaster_model,MutationTaster_AAE,MutationAssessor_score,MutationAssessor_rankscore,MutationAssessor_pred,FATHMM_score,FATHMM_converted_rankscore,FATHMM_pred,PROVEAN_score,PROVEAN_converted_rankscore,PROVEAN_pred,VEST4_score,VEST4_rankscore,MetaSVM_score,MetaSVM_rankscore,MetaSVM_pred,MetaLR_score,MetaLR_rankscore,MetaLR_pred,Reliability_index,M-CAP_score,M-CAP_rankscore,M-CAP_pred,REVEL_score,REVEL_rankscore,MutPred_score,MutPred_rankscore,MutPred_protID,MutPred_AAchange,MutPred_Top5features,MVP_score,MVP_rankscore,MPC_score,MPC_rankscore,PrimateAI_score,PrimateAI_rankscore,PrimateAI_pred,DEOGEN2_score,DEOGEN2_rankscore,DEOGEN2_pred,Aloft_Fraction_transcripts_affected,Aloft_prob_Tolerant,Aloft_prob_Recessive,Aloft_prob_Dominant,Aloft_pred,Aloft_Confidence,CADD_raw,CADD_raw_rankscore,CADD_phred,DANN_score,DANN_rankscore,fathmm-MKL_coding_score,fathmm-MKL_coding_rankscore,fathmm-MKL_coding_pred,fathmm-MKL_coding_group,fathmm-XF_coding_score,fathmm-XF_coding_rankscore,fathmm-XF_coding_pred,Eigen-raw_coding,Eigen-raw_coding_rankscore,Eigen-pred_coding,Eigen-PC-raw_coding,Eigen-PC-raw_coding_rankscore,Eigen-PC-phred_coding,GenoCanyon_score,GenoCanyon_rankscore,integrated_fitCons_score,integrated_fitCons_rankscore,integrated_confidence_value,GM12878_fitCons_score,GM12878_fitCons_rankscore,GM12878_confidence_value,H1-hESC_fitCons_score,H1-hESC_fitCons_rankscore,H1-hESC_confidence_value,HUVEC_fitCons_score,HUVEC_fitCons_rankscore,HUVEC_confidence_value,LINSIGHT,LINSIGHT_rankscore,GERP++_NR,GERP++_RS,GERP++_RS_rankscore,phyloP100way_vertebrate,phyloP100way_vertebrate_rankscore,phyloP30way_mammalian,phyloP30way_mammalian_rankscore,phyloP17way_primate,phyloP17way_primate_rankscore,phastCons100way_vertebrate,phastCons100way_vertebrate_rankscore,phastCons30way_mammalian,phastCons30way_mammalian_rankscore,phastCons17way_primate,phastCons17way_primate_rankscore,29way_pi,29way_logOdds,29way_logOdds_rankscore,bStatistic,bStatistic_rankscore,1000Gp3_AC,1000Gp3_AF,1000Gp3_EUR_AC,1000Gp3_EUR_AF,TWINSUK_AC,TWINSUK_AF,ALSPAC_AC,ALSPAC_AF,UK10K_AC,UK10K_AF,ESP6500_AA_AC,ESP6500_AA_AF,ESP6500_EA_AC,ESP6500_EA_AF,ExAC_AC,ExAC_AF,ExAC_Adj_AC,ExAC_Adj_AF,ExAC_NFE_AC,ExAC_NFE_AF,gnomAD_exomes_flag,gnomAD_exomes_AC,gnomAD_exomes_AN,gnomAD_exomes_AF,gnomAD_exomes_nhomalt,gnomAD_exomes_ASJ_AC,gnomAD_exomes_ASJ_AN,gnomAD_exomes_ASJ_AF,gnomAD_exomes_ASJ_nhomalt,gnomAD_exomes_NFE_AN,gnomAD_exomes_NFE_AF,gnomAD_exomes_NFE_nhomalt,gnomAD_exomes_POPMAX_AC,gnomAD_exomes_POPMAX_AN,gnomAD_exomes_POPMAX_AF,gnomAD_exomes_POPMAX_nhomalt,gnomAD_genomes_flag,gnomAD_genomes_AC,gnomAD_genomes_AN,gnomAD_genomes_AF,gnomAD_genomes_nhomalt,gnomAD_genomes_ASJ_AC,gnomAD_genomes_ASJ_AN,gnomAD_genomes_ASJ_AF,gnomAD_genomes_ASJ_nhomalt,gnomAD_genomes_NFE_AC,gnomAD_genomes_NFE_AN,gnomAD_genomes_NFE_AF,gnomAD_genomes_NFE_nhomalt,gnomAD_genomes_POPMAX_AC,gnomAD_genomes_POPMAX_AN,gnomAD_genomes_POPMAX_AF,gnomAD_genomes_POPMAX_nhomalt,clinvar_rs,clinvar_clnsig,clinvar_trait,clinvar_review,clinvar_hgvs,clinvar_var_source,Interpro_domain,GTEx_V7_gene,GTEx_V7_tissue,Geuvadis_eQTL_target_gene \
--plugin dbscSNV,/home/vep/DATA/dbscSNP/dbscSNV1.1_GRCh37.txt.gz \
--plugin SpliceRegion \
--plugin MaxEntScan,/home/vep/DATA/MaxEntScan \
--plugin GeneSplicer,/home/vep/DATA/GeneSplicer/sources/genesplicer,/home/vep/DATA/GeneSplicer/human \
--plugin ExACpLI,/home/vep/DATA/ExACpLI/ExACpLI_values.txt \
--plugin Phenotypes \
-custom /home/vep/DATA/gnomAD/gnomad.genomes.r2.0.1.sites.noVEP.vcf.gz,gnomADg,vcf,exact,0,AF_NFE,POPMAX,AF \
--fork 40
After running VEP with dbNSFP, I don't get any annotations for this variant (21_44483184_A_G).
It happens sometimes...
Is there some error in the dbNSFP module?
EDIT:
Note that I use the bash line-continuation character "\" to break the long command into several lines.
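One diagnostic that might narrow this down: a tabix region query returns every row at the position, whatever the allele, so "plenty of annotations" does not by itself show that a row matches this exact ref/alt pair. The sketch below filters toy rows on the hg19 chromosome, hg19 position, ref and alt together. The column positions follow the -s 8 -b 9 indexing used above; the GRCh38 position, amino acids and the other row values are invented for illustration, and the file names are hypothetical.

```shell
# Toy rows using the first nine dbNSFP 4.x columns (hypothetical values).
printf '#chr\tpos(1-based)\tref\talt\taaref\taaalt\trs_dbSNP151\thg19_chr\thg19_pos(1-based)\n' > demo_dbnsfp.tsv
printf '21\t43063074\tA\tC\tK\tT\t.\t21\t44483184\n21\t43063074\tA\tG\tK\tR\t.\t21\t44483184\n' >> demo_dbnsfp.tsv

# Keep only rows whose hg19 chromosome, hg19 position, ref and alt all match the VCF record.
awk -F'\t' -v c=21 -v p=44483184 -v r=A -v a=G \
  '$8 == c && $9 == p && $3 == r && $4 == a' demo_dbnsfp.tsv > matched.tsv

cat matched.tsv
```

On the real file the same awk filter could be applied to the output of the tabix query; an empty result would point at an allele or strand mismatch rather than at the plugin.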
Hi -
We are just starting to use VEP and the CADD plugin to annotate our variants. I noticed that our InDels are not being annotated (only the snps) with CADD Scores. I double checked, and the InDELs are present in the CADD InDels.tsv.gz file that we are specifying when we run the CADD plugin: --plugin CADD,whole_genome_SNVs.tsv.gz,InDels.tsv.gz
I am attaching a very small VCF file (only contains 3 indels, each of which are present in the CADD v 1.3 InDels.tsv.gz file we are running with). I was hoping you might have some recommendations on how we can leverage the CADD plugin to annotate indels when there is a match with the InDels.tsv.gz file.
Thanks,
Ann
I have developed a VEP plugin and want to add it to VEP. However, there is no documentation that explains the contribution process.
Should I just open a pull request that adds my plugin as a single Perl file to the vep_plugins project? Are there other guidelines that the plugin or algorithm needs to meet? Can you give further information about the contribution and review processes?
Hello.
I cannot find any information on whether the GXA plugin is still supported by recent VEP releases.
It was last listed in the release 91 plugin_config.txt#L842.
Is it still supported by VEP, or does it remain in the repository for historical reasons?
Hi,
the error:
WARNING: Failed to compile plugin NearestExonJB: Can't locate NearestExonJB.pm in @INC (you may need to install the NearestExonJB module) (@INC contains: /opt/vep/.vep/Plugins /opt/vep/src/ensembl-vep/modules /opt/vep/src/ensembl-vep /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.26.1 /usr/local/share/perl/5.26.1 /usr/lib/x86_64-linux-gnu/perl5/5.26 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.26 /usr/share/perl/5.26 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base) at (eval 51) line 2.
BEGIN failed--compilation aborted at (eval 51) line 2.
2019-04-17 09:30:17 - INFO: BAM-edited cache detected, enabling --use_transcript_ref; use --use_given_ref to override this
the docker VEP96 command:
docker run --rm -t -i \
-v /mnt/ssd_01/VEP/vepCache:/opt/vep/.vep \
-v /mnt/ssd_01/DATA:/home/vep/DATA \
-v /mnt/sata_03/DATABASES:/home/vep/DATABASES \
-v $PWD:$PWD \
-v /mnt/ssd_01/refs/:/home/vep/refs ensemblorg/ensembl-vep vep \
--format "vcf" \
-i $inFileFull \
-o $outVepFile \
--no_stats \
--cache \
--fasta /home/vep/refs/hs37d5_noHap.fa --port 3337 --ASSEMBLY GRCh37 \
--merged --tab \
--use_transcript_ref \
--variant_class --sift b --polyphen b --humdiv --gene_phenotype --regulatory \
--hgvs --hgvsg --shift_hgvs 1 --symbol --protein --ccds --uniprot --numbers --domains --canonical --biotype \
--af --af_1kg --af_esp --af_gnomad --max_af --pubmed \
--exclude_predicted \
--buffer_size 50 \
--plugin NearestExonJB
the cheers:
cheers
Hi,
The LofTee plugin has a missing dependency when downloading it with the automated installer.
current config is
# LOFTEE
# Requires LoFtool_scores.txt file as first param (available in VEP_plugins GitHub repo)
{
"key" => "LoF",
"helptip" => "LOFTEE identifies LoF (loss-of-function) variation",
"available" => 0,
"enabled" => 0,
"section" => "Pathogenicity predictions",
"plugin_url" => "https://raw.githubusercontent.com/konradjk/loftee/master/LoF.pm",
"requires_data" => 1,
"requires_install" => 1,
"params" => [
"@*"
]
},
But the plugin depends on splice_module.pl, which can be found in the same repo. Can this file be downloaded too when installing the LOFTEE plugin?
excerpt from LoF.pm:
package LoF;
require "splice_module.pl";
I'm not familiar with the inner workings of the download process, but could something like this be added?
"plugin_url" => "https://raw.githubusercontent.com/konradjk/loftee/master/LoF.pm",
"plugin_depend_url" => "https://raw.githubusercontent.com/konradjk/loftee/master/splice_module.pl",
Thanks
M
Hello,
I ran into several whitespaces and commas when trying to parse a VEP-annotated VCF today. From what I can see, at least two columns from dbNSFP (Interpro_domain and FATHMM_score) when used for annotation with the dbNSFP plugin result in whitespaces and commas being introduced in the resulting VCF.
Example fields:
FATHMM_score -2.2,-2.2
Interpro_domain TNFR/CD27/30/40/95cysteine-richregion(1),
Interpro_domain GPCR, rhodopsin-likesuperfamily(1),
clinvar_trait Microcytic anemia
I'm assuming these are simply pulled in from the original database files, however is this intended behaviour?
Thank you for your time.
EDIT:
##VEP="v87" time="2017-03-10 13:08:10" cache="/opt/ensembl-vep/cache/homo_sapiens/87_GRCh37" ensembl=87.f547798 ensembl-io=87.48cb128 ensembl-funcgen=87.0577dd0 ensembl-variation=87.661e72c 1000genomes="phase3" COSMIC="78" ClinVar="201610" ESP="20141103" ExAC="0.3" HGMD-PUBLIC="20162" assembly="GRCh37.p13" dbSNP="147" gencode="GENCODE 19" genebuild="2011-04" polyphen="2.2.2" regbuild="1.0" sift="sift5.2.2"
The annotation was done with an ensembl-vep prerelease, so the comma issue has possibly been fixed. I'll test with a new build.
Hello, I followed these commands to download the dbNSFP data:
wget -c ftp://dbnsfp:[email protected]/dbNSFPv3.5a.zip
unzip dbNSFPv3.5a.zip
head -n1 dbNSFP3.5a_variant.chr1 > h
#for GRCh37/hg19 data
cat dbNSFP3.5a_variant.chr* | grep -v ^#chr | awk '$8 != "."' | sort -k8,8 -k9,9n - | cat h - | bgzip -c > dbNSFP_hg19.gz
but I'm stuck at this last line. I launched it and let it run for 4 days, and it still has not finished. Is that normal?
PS: the computer has 32 GB of RAM and 3 TB of disk, so I don't understand why it is so slow.
The ExAC plugin appears to skip over indels and not annotate them. I believe the following change fixes the problem.
170c170
< next unless $vcf_vf->{start} == $vf->{end} && $vcf_vf->{start} == $vf->{end};
---
> next unless $vcf_vf->{start} == $vf->{start} && $vcf_vf->{end} == $vf->{end};
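To make the effect of that one-line change concrete, here is a minimal Python sketch of the two predicates (the plugin itself is Perl). The dictionary keys mirror the `start` and `end` hash keys in the diff; the coordinates and the function names are made up for illustration.

```python
def old_match(vcf_vf, vf):
    # Original line 170: the ExAC record's start is compared with the
    # input variant's end, and the same comparison is repeated.
    return vcf_vf["start"] == vf["end"] and vcf_vf["start"] == vf["end"]


def new_match(vcf_vf, vf):
    # Proposed line 170: starts must agree and ends must agree.
    return vcf_vf["start"] == vf["start"] and vcf_vf["end"] == vf["end"]


# Hypothetical coordinates: an SNV has start == end, a deletion does not.
snv = {"start": 100, "end": 100}
deletion = {"start": 101, "end": 103}

print(old_match(snv, snv), new_match(snv, snv))
print(old_match(deletion, deletion), new_match(deletion, deletion))
```

With the old predicate, a variant whose start and end differ can never match an identical record, which would explain why only indels are skipped.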
Hi,
I just started to use VEP with the CADD plugin and found that the order of its output in the 'Extra' column of the VCF is almost always different. Why is that, and could it be fixed?
Thanks,
Petr
running VEP version 84,
Argument "10404,66738" isn't numeric in division (/) at /VEP/.vep/Plugins/ExAC.pm line 249, <TABIX> line 2.
I'm seeing multiple instances of this. I assume it's a multi-allelic variant? I'm using these databases:
ExAC.r0.3.1.sites.vep.vcf.gz
and the associated tbi as downloaded from exac.
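If the warning does come from a multi-allelic site, the count field holds one comma-separated value per ALT allele and has to be split before dividing. The following is a Python sketch of that idea, not the plugin's Perl code; the function name and the total of 100000 are hypothetical, and which ExAC INFO keys are involved is an assumption.

```python
def allele_frequencies(ac_field, an_field):
    """Return one frequency per ALT allele, or None where the total is zero."""
    an = int(an_field)
    # A multi-allelic record carries one count per ALT allele, e.g. "10404,66738".
    counts = [int(count) for count in ac_field.split(",")]
    # Guard against a zero denominator instead of dividing blindly.
    return [count / an if an else None for count in counts]


# "10404,66738" is the value from the warning; 100000 is an invented total.
print(allele_frequencies("10404,66738", "100000"))
```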
For example, below is a record from dbNSFPv2.9. You can see values such as 5|5|5 and Eichsfeld_type_congenital_muscular_dystrophy|Congenital_myopathy_with_fiber_type_disproportion|not_provided.
1 26136244 G A G S rs121908188 26008831 . 25809753 SEPN1 Q9NZV5-2;Q9NZV5 .;SELN_HUMAN 281;315 . + GGC -29.8321 1 0 G ENSG00000162430 ENST00000361547;ENST00000354177;ENST00000374315 315;281;281 ENSP00000346109:G281S ENSP00000355141:G315S;ENSP00000346109:G281S;ENSP00000363434:G281S 0.01;0.01;0.01 0.55262 D;D;D 1.0;1.0 0.89917 D;D 0.999;0.999 0.91635 D;D 0.000000 0.85682 D 1.000 0.70825 A 2.215 0.72894 M -3.89;-3.91;-3.9 0.96104 D;D;D 1.0365 0.98010 D 0.9153 0.97411 D 9 0.981 0.98373 -3.39;-4.44;-4.44 0.77401 D;D;D 2.685513 0.65476 22.5 4.84 4.84 0.62591 0.462000 0.41574 2.527000 0.85204 9.623000 0.98386 0.840000 0.47671 1.000000 0.80357 1.000000 0.71417 0.0:0.0:1.0:0.0 18.1324 0.89605 0.000000 . . . . . . . . . . . 2.33E-4 5.88E-4 0 0.00000 4 0.00000 21 1.733e-04 21 1.740e-04 1.020e-04 0 0 0 0 2 3.024e-04 18 2.700e-04 0 0 rs121908188 5|5|5 Eichsfeld_type_congenital_muscular_dystrophy|Congenital_myopathy_with_fiber_type_disproportion|not_provided . .
Each of these values currently ends up in the CSQ field as-is and so is parsed as several CSQ values instead of 1.
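One way to avoid this is to replace the characters that act as delimiters in a VCF INFO/CSQ string before the value is written. The Python sketch below illustrates the idea only; the function name is hypothetical, and the choice of "&" and "_" as replacement characters is an assumption rather than what the plugin currently does.

```python
import re


def sanitise_csq_value(value):
    """Make a dbNSFP value safe to embed as a single CSQ sub-field."""
    # "|" separates CSQ sub-fields, "," separates CSQ entries and ";"
    # separates INFO keys, so none of them can appear inside a value.
    value = re.sub(r"[|,;]", "&", value)
    # Whitespace is not allowed in the INFO column either.
    return re.sub(r"\s+", "_", value)


print(sanitise_csq_value("5|5|5"))
print(sanitise_csq_value("D;D;D"))
```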
When trying to find SNPs in LD with the requested SNP for EUR population, an error occurs:
./vep -id rs1042779 -o /opt/vep/.vep/output_with_LD.txt --tab --cache --merge --force_overwrite --plugin LD,1000GENOMES:phase_3:EUR,0.4
WARNING: Failed to instantiate plugin LD: Invalid population '1000GENOMES:phase_3:EUR'; valid populations are:
<...>
I rechecked using the REST API:
https://rest.ensembl.org/ld/human/rs1042779/1000GENOMES:phase_3:EUR?r2=0.4;content-type=application/json
REST API gives acceptable results.
Based on the description in the script
Line 46 in d374e49
However, the ftp isn't live now.
The only resource I can find now is http://fathmm.biocompute.org.uk/database/fathmm.v2.3.SQL.gz (linked from their GitHub repo: https://github.com/HAShihab/fathmm), but the version number is different.
What is the interpretation of MaxEntScan scores?
For variant 11_117863938_C_G I get these numbers:
MaxEntScan_alt 3.8294583218888
MaxEntScan_diff 1.02692945321821
MaxEntScan_ref 4.85638777510701
I guess these numbers are for splicing acceptor, as we have NM_001558.3:c.368-18C>G
Is there any min/max value? I can't find any clear answer...
Thanks
Damian
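The three scores quoted above are at least consistent with MaxEntScan_diff being the reference score minus the alternate score. A short Python check of that arithmetic follows; it says nothing about the possible minimum or maximum of the scores, and the function name is hypothetical.

```python
def maxentscan_diff(ref_score, alt_score):
    # Assumption inferred from the quoted numbers: diff = ref - alt,
    # so a positive diff means the ALT allele scores lower than REF.
    return ref_score - alt_score


diff = maxentscan_diff(4.85638777510701, 3.8294583218888)
print(round(diff, 6))
```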
Hi!
For the past few weeks, I've suddenly been receiving faulty API calls in this form:
/search.php?build=hg19&position=chrchr4:21238957_21238957
Note the double chr in the position.
Since the LOVD module doesn't pass a user agent to our services, I'm not 100% sure these calls are from the VEP module, however:
Did perhaps the VCF file parser change recently?
I would be very grateful if you could look into whether or not, through updates in the VCF parser, this bug got introduced? Also, if indeed this is something you're fixing on your side, could you please set a user agent string to something that would make recognizing VEP calls easy?
Thanks!
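Whichever side introduced the duplicate, a defensive fix is to normalise the chromosome name before building the position string. This Python sketch shows the idea; the function names are hypothetical and the `chrom:start_end` layout is copied from the faulty call above.

```python
def with_single_chr_prefix(chrom):
    """Return the chromosome name with exactly one leading "chr"."""
    bare = chrom
    # Strip any number of existing prefixes ("chr4", "chrchr4", ...).
    while bare.lower().startswith("chr"):
        bare = bare[3:]
    return "chr" + bare


def lovd_position(chrom, start, end):
    # Same layout as the position parameter in the call quoted above.
    return f"{with_single_chr_prefix(chrom)}:{start}_{end}"


print(lovd_position("chr4", 21238957, 21238957))
print(lovd_position("4", 21238957, 21238957))
```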
The latest release of fathmm.py no longer requires the -i and -o flags, so the VEP plugin needs to be edited.
The line to change is:
my $fathmm_err = `cd $command_dir; $command -i $tmp_in_file -o $tmp_out_file`;
to this:
my $fathmm_err = `cd $command_dir; $command $tmp_in_file $tmp_out_file`;
Apologies, I am relatively new to GitHub and have not yet learned how to "submit patches" (if that's the correct terminology?!). I will learn, but I am off on holiday for 2 weeks and have run out of time!
Cheers
Chris
I need to dig through the source, but I am seeing empty columns for everything except ExAC_AF. The source ExAC VCF is directly from the ExAC download page.
Hi Sir/Madam,
I am currently testing the VEP program that I downloaded from https://github.com/Ensembl/ensembl-tools/archive/release/84.zip.
VEP ran to completion and generated reasonable output (including ExAC frequencies). However, I noticed that I am getting the following repeated error lines:
Plugin 'ExAC' went wrong: Illegal division by zero at /root/vep/Plugins/ExAC.pm line 253, line 2.
Plugin 'ExAC' went wrong: Illegal division by zero at /root/vep/Plugins/ExAC.pm line 253, line 4.
Plugin 'ExAC' went wrong: Illegal division by zero at /root/vep/Plugins/ExAC.pm line 253, line 6.
Plugin 'ExAC' went wrong: Illegal division by zero at /root/vep/Plugins/ExAC.pm line 253, line 8.
Plugin 'ExAC' went wrong: Illegal division by zero at /root/vep/Plugins/ExAC.pm line 253, line 10.
Plugin 'ExAC' went wrong: Illegal division by zero at /root/vep/Plugins/ExAC.pm line 253, line 1.
Plugin 'ExAC' went wrong: Illegal division by zero at /root/vep/Plugins/ExAC.pm line 253, line 2.
Plugin 'ExAC' went wrong: Illegal division by zero at /root/vep/Plugins/ExAC.pm line 253, line 3.
Plugin 'ExAC' went wrong: Illegal division by zero at /root/vep/Plugins/ExAC.pm line 253, line 4.
Plugin 'ExAC' went wrong: Illegal division by zero at /root/vep/Plugins/ExAC.pm line 253, line 5.
Plugin 'ExAC' went wrong: Illegal division by zero at /root/vep/Plugins/ExAC.pm line 253, line 6.
Plugin 'ExAC' went wrong: Illegal division by zero at /root/vep/Plugins/ExAC.pm line 253, line 7.
Is this something I should be worried about?
The input VCF coordinates are in GRCh37 reference. I obtained the ExAC resource files from https://googledrive.com/host/0B6o74flPT8FAYnBJTk9aTF9WVnM and then used tabix v0.2.6 to create the *.tbi file.
I can provide more information/data to reproduce this error if needed.
I am currently attempting to use the dbNSFP database with Ensembl VEP to annotate a list of variants that I have. I have a number of questions about the installation and use of the plugin:
Firstly I am using dbNSFPv4.0b2a.zip (February 20) which I downloaded from the google site.
I have also run through the installation for dbNSFP using the VEP’s installation script which created the dbNSFP.pm file under ~/.vep/Plugins/.
I then followed the instructions below to unzip and create a useable, tabix indexed .gz file for use with hg38 data:
unzip dbNSFP4.0b2a.zip
head -n1 dbNSFP4.0b2a_variant.chr1 > h
cat dbNSFP4.0b2a_variant.chr* | grep -v ^#chr | sort -k1,1 -k2,2n - | cat h - | bgzip -c > dbNSFP4.0b2a.gz
tabix -s 1 -b 2 -e 2 dbNSFP4.0b2a.gz
The most up-to-date instructions I could find in the dbNSFP.pm VEP plugins git repository mentioned that this was the way to prepare the file for use with VEP. However, when I run VEP, I receive no error messages about being unable to find information from the database or of missing files but instead receive missing output for all of the SNPs in my analysis under the columns I selected from dbNSFP. While I appreciate that not all columns may contain information for every SNP, I believe that the problem may lie with the configuration of the dbNSFP database within VEP as I am currently working with over 10M SNPs from exome sequencing.
Would you be able to give me some clarity as to whether I have made a mistake during the installation of the plugin or perhaps why I may be getting empty annotations across all of my variants?
The portion of the VEP command that relates to dbNSFP is as follows and VEP has been tested and is working without dbNSFP and produces the same output:
--plugin dbNSFP,/rds/project/rjh234/rds-rjh234-mrc-epid/Studies/People/Nick/dbNSFP/dbNSFP4.0b2a.gz,SIFT_pred,Polyphen2_HDIV_pred,Polyphen2_HVAR_pred,gnomAD_exomes_NFE_AF,gnomAD_genomes_NFE_AF
Hi there,
I have a VCF file containing the following variant:
chr1 44130707 . CCAA C
...resulting in the inframe_deletion p.Asn378del.
Using the ProteinSeqs plugin, this is the sequence I got for the wildtype:
ENSP00000361373
MYGRPQAEMEQEAGELSRWQAAHQAAQDNENSAPILNMSSSSGSSGVHTSWNQGLPSIQHFPHSAEMLGSPLVSVEAPGQNVNEGGPQFSMPLPERGMSYCPQATLTPSRMIYCQRMSPPQQEMTIFSGPQLMPVGEPNIPRVARPFGGNLRMPPNGLPVSASTGIPIMSHTGNPPVPYPGLSTVPSDETLLGPTVPSTEAQAVLPSMAQMLPPQDAHDLGMPPAESQSLLVLGSQDSLVSQPDSQEGPFLPEQPGPAPQTVEKNSRPQEGTGRRGSSEARPYCCNYENCGKAYTKRSHLVSHQRKHTGERPYSCNWESCSWSFFRSDELRRHMRVHTRYRPYKCDQCSREFMRSDHLKQHQKTHRPGPSDPQANNNNGEQDSPPAAGP
...and this is the protein sequence for the mutant:
ENSP00000361373.3:p.Asn378del
MYGRPQAEMEQEAGELSRWQAAHQAAQDNENSAPILNMSSSSGSSGVHTSWNQGLPSIQHFPHSAEMLGSPLVSVEAPGQ
As it is not a stop_gain mutation, I expected the protein sequence to skip just the missing "N" amino acid, rather than being truncated. Am I wrong, or is this a bug?
Thank you for your help with this!
Silvia
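For reference, the outcome expected above for an in-frame single-residue deletion (the sequence loses exactly one amino acid and is otherwise unchanged) can be sketched in Python as follows. The six-residue sequence is invented for illustration and is not ENSP00000361373; the function name is hypothetical.

```python
def apply_single_residue_deletion(protein, position, expected_residue):
    """Delete the residue at 1-based `position`, e.g. for p.Asn4del."""
    index = position - 1
    # Check the HGVS notation against the sequence before editing it.
    if protein[index] != expected_residue:
        raise ValueError(
            f"expected {expected_residue} at position {position}, "
            f"found {protein[index]}"
        )
    return protein[:index] + protein[index + 1:]


toy_wildtype = "MKTNLQ"  # invented toy sequence
toy_mutant = apply_single_residue_deletion(toy_wildtype, 4, "N")
print(toy_mutant)
```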