An automatic classification tool for PVS1 interpretation of null variants.
This is a modified port of the original https://github.com/JiguangPeng/autopvs1, branched from commit 7fb1be97667e5ef576f81bf2fabbddcf9a4c7594.
Major modifications include:
- Gutting running of VEP, as it is assumed it was run ahead of time
- Gutting CNV functions as they are not currently used
- Parsing of an already VEP-annotated vcf file using pysam
- Config file should be in pwd (the current working directory); data file entries in the config are best given as full paths
- Dropped hg19 references and made imports absolute instead of relative

We have noted in some sections where things stayed the same and where others changed.
A web version for AutoPVS1 is also provided: http://autopvs1.genetics.bgi.com
VEP must be run ahead of time; it is no longer built in. We recommend the KFDRC Germline Annotation Workflow, whose CWL source code can be run on Cavatica or with any CWL runner.
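Because the tool consumes a VCF that VEP has already annotated, it may help to see what that annotation looks like. The sketch below uses only the standard library (not pysam, which the tool itself uses) to decode the CSQ INFO field that VEP writes. The function names, the shortened field list, and the sample record are made up for illustration; a real VEP header declares many more sub-fields.

```python
import re


def csq_fields(header_lines):
    """Return the CSQ sub-field names declared in the VCF header by VEP."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ,"):
            match = re.search(r'Format: ([^">]+)', line)
            if match:
                return match.group(1).strip().split("|")
    raise ValueError("no CSQ definition in header; was VEP run on this file?")


def parse_csq(info, fields):
    """Decode the CSQ entry of one INFO column into one dict per annotation."""
    for entry in info.split(";"):
        if entry.startswith("CSQ="):
            # Annotations are comma-separated; sub-fields are pipe-separated.
            return [dict(zip(fields, ann.split("|")))
                    for ann in entry[4:].split(",")]
    return []


# Illustrative header and INFO column (shortened, not real tool output).
header = [
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|SYMBOL|Feature">'
]
info = ("AC=1;CSQ=T|stop_gained|BRCA1|NM_007294.4,"
        "T|downstream_gene_variant|NBR1|NM_005899.5")

fields = csq_fields(header)
annotations = parse_csq(info, fields)
for ann in annotations:
    print(ann["SYMBOL"], ann["Feature"], ann["Consequence"])
```

A file whose header lacks the CSQ definition raises an error here, which is a quick way to spot input that was not annotated ahead of time.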
Samtools provides a function “faidx” (FAsta InDeX), which creates a small flat index file “.fai” allowing for fast random access to any subsequence in the indexed FASTA file, while loading a minimal amount of the file into memory.
The pyfaidx module implements pure Python classes for indexing, retrieval, and in-place modification of FASTA files using a samtools-compatible index.
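To illustrate why an index gives fast random access, here is a minimal standard-library sketch of the idea behind a .fai file: record each sequence's length, the byte offset of its first base, and its line layout, then seek straight to the requested bases. It is not how samtools or pyfaidx are implemented; the function names and the toy FASTA are assumptions for the example, and it assumes every line of a sequence except the last has the same length.

```python
import os
import tempfile


def build_index(fasta_path):
    """Scan a FASTA once and record, per sequence, the values a .fai line stores."""
    index = {}
    name = None
    with open(fasta_path, "rb") as handle:
        while True:
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                name = line[1:].split()[0].decode()
                # offset = byte position of the first base of this sequence
                index[name] = {"length": 0, "offset": handle.tell(),
                               "linebases": 0, "linewidth": 0}
            elif name is not None:
                rec = index[name]
                bases = len(line.rstrip(b"\r\n"))
                if rec["linebases"] == 0:
                    rec["linebases"] = bases      # bases per full line
                    rec["linewidth"] = len(line)  # bytes per full line, newline included
                rec["length"] += bases
    return index


def byte_position(rec, pos):
    """Translate a 0-based base position into a byte position in the file."""
    return (rec["offset"]
            + (pos // rec["linebases"]) * rec["linewidth"]
            + pos % rec["linebases"])


def fetch(fasta_path, index, name, start, end):
    """Return bases [start, end) (0-based, half-open) without reading the whole file."""
    rec = index[name]
    end = min(end, rec["length"])
    if start >= end:
        return ""
    first, last = byte_position(rec, start), byte_position(rec, end)
    with open(fasta_path, "rb") as handle:
        handle.seek(first)
        chunk = handle.read(last - first)
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()


# Toy FASTA with 8 bases per line, written to a temporary file.
fd, path = tempfile.mkstemp(suffix=".fa")
with os.fdopen(fd, "wb") as out:
    out.write(b">chr1 demo\nACGTACGT\nTTGGCCAA\nAC\n>chr2\nGGGG\n")

index = build_index(path)
window = fetch(path, index, "chr1", 6, 10)   # spans a line break
whole_chr2 = fetch(path, index, "chr2", 0, 4)
os.remove(path)
print(index["chr1"], window, whole_chr2)
```

Only the bytes between the two computed positions are read, which is why memory use stays small even for a whole-genome FASTA.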
maxentpy is a python wrapper for MaxEntScan to calculate splice site strength.
It contains two functions: score5 is adapted from MaxEntScan::score5ss to score 5' splice sites, and score3 is adapted from MaxEntScan::score3ss to score 3' splice sites.
maxentpy is already included in the autopvs1 Dockerfile.
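MaxEntScan's 5' model scores a 9-mer (3 exonic bases followed by 6 intronic bases) and its 3' model scores a 23-mer (20 intronic bases followed by 3 exonic bases). The sketch below only shows how such windows could be cut from a sequence before being handed to score5 or score3; it does not call maxentpy, and the helper names, coordinates, and toy sequence are hypothetical.

```python
def donor_window(seq, first_intron_base):
    """9-mer for a 5' splice site: last 3 exonic + first 6 intronic bases.

    first_intron_base is the 0-based index of the first base of the intron.
    """
    if first_intron_base < 3:
        raise ValueError("need 3 exonic bases before the donor site")
    window = seq[first_intron_base - 3:first_intron_base + 6]
    if len(window) != 9:
        raise ValueError("sequence too short for a 9-mer donor window")
    return window


def acceptor_window(seq, first_exon_base):
    """23-mer for a 3' splice site: last 20 intronic + first 3 exonic bases.

    first_exon_base is the 0-based index of the first base of the exon.
    """
    if first_exon_base < 20:
        raise ValueError("need 20 intronic bases before the acceptor site")
    window = seq[first_exon_base - 20:first_exon_base + 3]
    if len(window) != 23:
        raise ValueError("sequence too short for a 23-mer acceptor window")
    return window


# Toy gene fragment: exon, intron (starts GT, ends AG), exon.
exon1 = "ATGGCCCAG"
intron = "GTAAGT" + "TTCCAAACGAACTTTTGTAG"
exon2 = "GGATAA"
seq = exon1 + intron + exon2

donor = donor_window(seq, len(exon1))
acceptor = acceptor_window(seq, len(exon1) + len(intron))
print(donor)
print(acceptor)
```

The two printed strings have the lengths the scoring functions expect; comparing scores for the reference and variant versions of such a window is one common way splice site strength is assessed.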
pyhgvs provides a simple Python API for parsing, formatting, and normalizing HGVS names.
The upstream package only supports Python 2; it was modified to support Python 3, and some other features were added.
pyhgvs is also included in the autopvs1 Dockerfile.
pwd/config.ini
[DEFAULT]
pvs1levels = data/PVS1.level
gene_alias = data/hgnc.symbol.previous.tsv
gene_trans = data/clinvar_trans_stats.tsv
[HG38]
genome = data/hg38.fa
transcript = data/ncbiRefSeq_hg38.gpe
domain = data/functional_domains_hg38.bed
hotspot = data/mutational_hotspots_hg38.bed
curated_region = data/expert_curated_domains_hg38.bed
exon_lof_popmax = data/exon_lof_popmax_hg38.bed
pathogenic_site = data/clinvar_pathogenic_GRCh38.vcf
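As a quick sanity check of a config like the one above, the following sketch reads it with Python's standard configparser module and lists entries that are not absolute paths. Whether the tool reads its config exactly this way is an assumption; the snippet only shows the INI semantics, in particular that keys under [DEFAULT] are also visible from [HG38].

```python
import configparser
import os

# Same content as the example config, embedded so the snippet is self-contained.
CONFIG_TEXT = """
[DEFAULT]
pvs1levels = data/PVS1.level
gene_alias = data/hgnc.symbol.previous.tsv
gene_trans = data/clinvar_trans_stats.tsv

[HG38]
genome = data/hg38.fa
transcript = data/ncbiRefSeq_hg38.gpe
domain = data/functional_domains_hg38.bed
hotspot = data/mutational_hotspots_hg38.bed
curated_region = data/expert_curated_domains_hg38.bed
exon_lof_popmax = data/exon_lof_popmax_hg38.bed
pathogenic_site = data/clinvar_pathogenic_GRCh38.vcf
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)  # for a real file: config.read("config.ini")

hg38 = config["HG38"]
print(hg38["genome"])       # defined in [HG38]
print(hg38["pvs1levels"])   # inherited from [DEFAULT]

# Full paths are recommended, so flag every entry that is still relative.
relative = sorted(key for key, value in hg38.items() if not os.path.isabs(value))
print("relative paths:", ", ".join(relative))
```

With the example values every entry is reported as relative; replacing them with full paths empties the list.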
All reference files were obtained from the data dir of the original git repo, except for the hg38 fasta.
They now live here
The user should provide the hg38 fasta as part of the input.
Note: chromosome names in the fasta file should have the chr prefix.
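A small check for the chr prefix requirement could look like the sketch below. It scans FASTA header lines and reports sequence names that do not start with chr; the function name and the demo input are illustrative only.

```python
import io


def names_missing_chr(fasta_handle):
    """Return sequence names from FASTA headers that lack the 'chr' prefix."""
    missing = []
    for line in fasta_handle:
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            if not name.startswith("chr"):
                missing.append(name)
    return missing


# In-memory demo; for a real file use: with open("hg38.fa") as handle: ...
demo = io.StringIO(">chr1\nACGT\n>2 alt naming\nGGCC\n>chrX\nTTAA\n")
bad = names_missing_chr(demo)
print(bad)
```

An empty list means every sequence name carries the prefix.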
python3 pathogenicity-assessment/autopvs1/autoPVS1_from_VEP_vcf.py --genome_version hg38 --vep_vcf ~/volume/VEP_TEST/AUTOPVS1_TEST/input_VEP_annotated.vcf.gz > output.autopvs1.tsv
Please see https://autopvs1.genetics.bgi.com/faq/
Users may freely use AutoPVS1 for non-commercial purposes as long as they properly cite it.
This resource is intended for research purposes only. For clinical or medical use, please consult professionals.
📝citation: Jiale Xiang, Jiguang Peng, Samantha Baxter, Zhiyu Peng. (2020). AutoPVS1: An automatic classification tool for PVS1 interpretation of null variants. Hum Mutat 41, 1488-1498. (Editor's choice and cover article)