BPHunter

Genome-wide detection of human variants that disrupt intronic branchpoints

Introduction

  • The search for pathogenic candidate variants in massively parallel sequencing (MPS), or next-generation sequencing (NGS), data typically focuses on non-synonymous variants within coding sequences or variants in essential splice sites, and mostly ignores non-coding intronic variants.

  • RNA splicing is a necessary step in the expression of protein-coding genes in eukaryotic cells. The spliceosome acts mostly within introns to define the exon-intron boundaries and hence the coding sequences. Introns probably harbor substantially more pathogenic variants than has so far been appreciated.

  • The intronic branchpoint (BP) is recognized by the spliceosome at the beginning of the splicing process, and variants affecting it make it a point of vulnerability for splicing. BP variants may result in aberrant splicing (exon skipping, intron retention), which could be deleterious to the gene product.

  • BPHunter is a genome-wide computational approach for systematically, efficiently and informatively detecting intronic variants that may disrupt BP recognition. This standalone version can be integrated into an NGS analysis with a one-line command. A BPHunter webserver with a user-friendly interface is also available.

News

  • Feb 2023: BPHunter official version-2 was released, with an additional program for processing VCF files in batch, and an additional output parameter 'BPHunter_HIGHRISK' (YES/NO) for labeling more promising candidate variants.
  • Oct 2022: The paper "Genome-wide detection of human variants that disrupt intronic branchpoints", which introduces BPHunter, was published in PNAS.
  • Aug 2022: BPHunter official version-1 was released.
  • Jun 2021: BPHunter webserver & github were launched.
  • Dec 2020: BPHunter prototype was completed.

Usage

Current version: version-2

Dependency

The code is written in python3 and requires bedtools to be installed.
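Before running BPHunter, it can help to confirm that bedtools is reachable. The following is a minimal sketch and not part of BPHunter: the helper name `missing_tools` is hypothetical, and it assumes that `bedtools` is expected to be found on the `PATH`.

```python
import shutil

def missing_tools(tools=("bedtools",)):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]

# Report the result without aborting, so the check is safe to run anywhere.
missing = missing_tools()
print("missing tools:", ", ".join(missing) if missing else "none")
```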

Reference datasets

Because of GitHub's file size limit, the BPHunter reference datasets are distributed separately. Please download them and put them into your BPHunter folder.

To use the latest version-2, please download the reference datasets again and replace the existing ones.

File Format

Input: Variants in VCF format, with 5 mandatory and tab-delimited fields (CHROM, POS, ID, REF, ALT).

  • The 48 published pathogenic BP variants are provided as the example input. (Example_var_BP.vcf)
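The five-field requirement can be checked before BPHunter is run. The sketch below is a hypothetical pre-check and not part of BPHunter. It assumes that header lines start with `#` and that every data line needs at least the five leading tab-delimited fields. The example records are made up for illustration.

```python
REQUIRED_FIELDS = ("CHROM", "POS", "ID", "REF", "ALT")

def check_vcf_lines(lines):
    """Return a list of (line_number, problem) for malformed data lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.split("\t")
        if len(fields) < len(REQUIRED_FIELDS):
            problems.append((number, "fewer than 5 tab-delimited fields"))
        elif not fields[1].isdigit():
            problems.append((number, "POS is not an integer"))
    return problems

example = [
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "1\t12345\t.\tA\tG\n",  # made-up record, well formed
    "1 12345 . A G\n",      # space-delimited, so it is reported
]
print(check_vcf_lines(example))
```

Only the layout is checked here. Whether a variant actually falls on a branchpoint is left entirely to BPHunter.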

Output: BPHunter-detected variants will be output with the following annotations.

  • SAMPLE (only for BPHunter_VCF_batch.py)
  • CHROM, POS, ID, REF, ALT (exactly the same as input)
  • STRAND: +/-
  • VAR_TYPE: snv, x-nt del, x-nt ins
  • GENE: gene symbol
  • TRANSCRIPT_IVS: ENST123456789_IVS10
  • CANONICAL: canonical transcript_IVS, or '.'
  • BP_NAME: m/e/cBP_chrom_pos_strand_nucleotide
  • BP_ACC_DIST: distance from BP to the acceptor site
  • BP_RANK: rank of BP in this intron
  • BP_TOTAL: total number of BP in this intron
  • BP_HIT: BP position (-2, -1, 0) hit by the variant
  • BP_SOURCE: number of sources supporting this BP position
  • CONSENSUS: level of consensus (1:YTNAY, 2:YTNA, 3:TNA, 4:YNA, 0:none)
  • BP/BP2_GERP: conservation score GERP for BP and BP-2 positions
  • BP/BP2_PHYL: conservation score PHYLOP for BP and BP-2 positions
  • BPHunter_HIGHRISK: YES/NO, whether the BP variant is considered high-risk
  • BPHunter_SCORE: score of a BP variant (suggested cutoff>=3, max=10)
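The suggested score cutoff can be applied downstream of BPHunter. This is a minimal sketch, assuming the output is a comma-separated table whose header uses the annotation names listed above (as would be expected for the batch program's CSV output). The rows are made up, and `select_candidates` is a hypothetical helper.

```python
import csv
import io

def select_candidates(handle, min_score=3.0, highrisk_only=False):
    """Keep rows with BPHunter_SCORE >= min_score, optionally only HIGHRISK=YES."""
    kept = []
    for row in csv.DictReader(handle):
        try:
            score = float(row["BPHunter_SCORE"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without a usable score
        if score < min_score:
            continue
        if highrisk_only and row.get("BPHunter_HIGHRISK") != "YES":
            continue
        kept.append(row)
    return kept

# Made-up rows with only a subset of the output columns.
demo = (
    "CHROM,POS,ID,REF,ALT,BPHunter_HIGHRISK,BPHunter_SCORE\n"
    "1,100,v1,A,G,YES,7\n"
    "1,200,v2,T,C,NO,3\n"
    "1,300,v3,G,A,NO,1\n"
)
print([row["ID"] for row in select_candidates(io.StringIO(demo))])
print([row["ID"] for row in select_candidates(io.StringIO(demo), highrisk_only=True)])
```

For a real file, the `io.StringIO` object would be replaced by an open file handle. A tab-delimited output would need `delimiter="\t"` passed to `csv.DictReader`.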

Command & Parameters (BPHunter_VCF.py)

python BPHunter_VCF.py -i variants.vcf
python BPHunter_VCF.py -i variants.vcf -g GRCh37 -t all

| Parameter | Type | Description | Default |
|---|---|---|---|
| -i | file | variants in VCF format, with 5 fields (CHROM, POS, ID, REF, ALT) | N.A. |
| -g | str | human reference genome assembly (GRCh37 / GRCh38) | GRCh37 |
| -t | str | all / canonical transcripts? | all |
Command & Parameters (BPHunter_VCF_batch.py)

python BPHunter_VCF_batch.py -d /dir -s samplelist.txt -o output.csv
python BPHunter_VCF_batch.py -d /dir -s samplelist.txt -o output.csv -g GRCh37 -t all

| Parameter | Type | Description | Default |
|---|---|---|---|
| -d | str | directory of VCF files | N.A. |
| -s | file | sample list (without .vcf extension) to be screened in the above directory | N.A. |
| -o | str | output CSV filename | N.A. |
| -g | str | human reference genome assembly (GRCh37 / GRCh38) | GRCh37 |
| -t | str | all / canonical transcripts? | all |
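The sample list for the batch program can be derived from the VCF directory itself. This is a minimal sketch, assuming the list holds one sample name per line (the exact format is not spelled out above) and that each sample is stored as `<sample>.vcf`. `sample_names` and `write_sample_list` are hypothetical helpers.

```python
import tempfile
from pathlib import Path

def sample_names(vcf_dir):
    """Return sorted VCF file names in vcf_dir with the .vcf extension removed."""
    return sorted(p.stem for p in Path(vcf_dir).glob("*.vcf") if p.is_file())

def write_sample_list(vcf_dir, list_path):
    """Write one sample name per line to list_path and return the names."""
    names = sample_names(vcf_dir)
    Path(list_path).write_text("".join(name + "\n" for name in names))
    return names

# Demonstration on a throwaway directory with two empty, made-up VCF files.
with tempfile.TemporaryDirectory() as demo_dir:
    for filename in ("sampleB.vcf", "sampleA.vcf", "notes.txt"):
        Path(demo_dir, filename).write_text("")
    print(write_sample_list(demo_dir, Path(demo_dir, "samplelist.txt")))
```

The resulting file could then be passed to `-s`, with the same directory given to `-d`.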

BPHunter Scoring Scheme

Reference

  • Zhang P. et al. Genome-wide detection of human variants that disrupt intronic branchpoints. PNAS. 119(44):e2211194119. 2022.

Contact

Developer: Peng Zhang, Ph.D.

Email: [email protected]

Laboratory: St. Giles Laboratory of Human Genetics of Infectious Diseases

Institution: The Rockefeller University, New York, NY, USA

bphunter's Issues

canonical transcript?

How were the canonical transcripts used in the code below prepared?

if canonical == 'no':
    filename_bphunter_ref = 'Data_BPHunter_'+genome+'_detection_all.bed'
elif canonical == 'yes':
    filename_bphunter_ref = 'Data_BPHunter_'+genome+'_detection_canonical.bed'
