chenkenbio / predpsi-svr

Predicting alternative splicing change caused by genetic mutations

License: GNU General Public License v3.0

Shell 17.88% Gherkin 0.04% Python 57.61% MATLAB 0.39% Perl 24.08%

predpsi-svr's Introduction

PredPSI-SVR

PredPSI-SVR was designed to predict the change in percent spliced in (delta-PSI or ΔΨ) caused by genetic variants for the CAGI 5 Vex-seq challenge.
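
For readers new to the quantity being predicted: PSI is commonly computed as the share of reads that support exon inclusion, and delta-PSI is the variant PSI minus the reference PSI. The sketch below only illustrates that definition. The function names, the 0-100 percentage scale, and the read counts are assumptions made for illustration and are not part of PredPSI-SVR.

```python
def psi(inclusion_reads, exclusion_reads):
    """Percent spliced in, expressed as a percentage (assumed 0-100 scale)."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("no reads cover this exon")
    return 100.0 * inclusion_reads / total


def delta_psi(ref_psi, var_psi):
    """Change in PSI caused by a variant: variant PSI minus reference PSI."""
    return var_psi - ref_psi


if __name__ == "__main__":
    ref = psi(80, 20)  # hypothetical read counts for the reference allele
    var = psi(50, 50)  # hypothetical read counts for the variant allele
    print(delta_psi(ref, var))
```

A negative value means the variant reduces exon inclusion relative to the reference.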

Send questions and comments to [email protected]

Requirements

  • Operating system: Unix/Linux
  • Memory: at least 4 GB
  • Perl in your PATH
  • Python 2
  • Python 3 (with numpy package installed)
    If you have trouble installing Python 3 or numpy, you can try Miniconda:
    cd ~/Downloads
    wget -c https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    chmod +x Miniconda3-latest-Linux-x86_64.sh
    ./Miniconda3-latest-Linux-x86_64.sh             # note the installation path; this tutorial uses the default, "$HOME/miniconda3"
    source $HOME/miniconda3/bin/activate
    pip install numpy
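
Before moving on, it can help to confirm that the Python 3 interpreter on your PATH can actually see numpy. The following is a minimal sketch, not part of PredPSI-SVR; the function name `check_environment` and its parameters are hypothetical.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 0), modules=("numpy",)):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < min_version:
        problems.append("Python %d.%d or newer is required" % min_version)
    for name in modules:
        # find_spec returns None when a top-level module cannot be imported
        if importlib.util.find_spec(name) is None:
            problems.append("%s is not installed for this interpreter" % name)
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("** " + problem)
```

Run it with the same `python3` that the pipeline will use; no output means both requirements are met.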

Getting started

Preparation

Note: If the following packages are already installed on your system, you can skip installing them and just edit the paths in src/init.sh.
By default, we put PredPSI-SVR in the $HOME directory.

  1. Download PredPSI-SVR:

    cd ~
    git clone https://github.com/chenkenbio/PredPSI-SVR
  2. Download ANNOVAR (http://annovar.openbioinformatics.org/en/latest/user-guide/download/), libsvm (https://www.csie.ntu.edu.tw/~cjlin/libsvm), and samtools (http://www.htslib.org/download/), and move them to PredPSI-SVR/tools.

  3. Extract packages:

    cd ~/PredPSI-SVR/tools
    tar -xzvf annovar.latest.tar.gz
    tar -xzvf libsvm-3.23.tar.gz
    tar -xjvf samtools-1.9.tar.bz2
    cd libsvm-3.23
    make all
    cd ../samtools-1.9
    make all
    cd ..
  4. Download basic annotation databases for ANNOVAR

    cd ~/PredPSI-SVR/tools/annovar        # PredPSI-SVR/tools
    ./annotate_variation.pl -buildver hg19 -downdb -webfrom annovar ensGene ./humandb/
  5. Download third-party database SPIDEX from http://www.openbioinformatics.org/annovar/spidex_download_form.php. Move it to ~/PredPSI-SVR/tools/annovar/humandb/ and decompress with unzip:

    unzip hg19_spidex.zip        # working directory: PredPSI-SVR/tools/annovar/humandb
  6. Download the hg19 genome, extract it, and build the FASTA index:

    cd ~/PredPSI-SVR/genome
    wget -c http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz
    tar -xzvf chromFa.tar.gz        # unpack the per-chromosome .fa files before concatenating
    cat *.fa > hg19.fasta
    $HOME/PredPSI-SVR/tools/samtools-1.9/samtools faidx hg19.fasta
  7. Finally, check the variables in src/init.sh and edit them to fit your system.
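
When checking src/init.sh, each tool variable should point at a compiled, executable binary rather than a source file. The sketch below shows one way to test that from Python. It is an illustration under assumptions: `check_executable` is a hypothetical helper, and the example paths simply follow the default layout used in this tutorial (libsvm-3.23 and samtools-1.9 under ~/PredPSI-SVR/tools).

```python
import os


def check_executable(path):
    """Return None if path looks like a runnable binary, else an error string."""
    expanded = os.path.expanduser(os.path.expandvars(path))
    if not os.path.isfile(expanded):
        return "not found: " + expanded
    if expanded.endswith((".c", ".cpp", ".h")):
        # A source file can carry the executable bit but still cannot be run
        return "source file, not a compiled binary: " + expanded
    if not os.access(expanded, os.X_OK):
        return "not executable: " + expanded
    return None


if __name__ == "__main__":
    # Assumed default locations; adjust to match your own src/init.sh
    tools = [
        "~/PredPSI-SVR/tools/libsvm-3.23/svm-scale",
        "~/PredPSI-SVR/tools/libsvm-3.23/svm-predict",
        "~/PredPSI-SVR/tools/samtools-1.9/samtools",
    ]
    for tool in tools:
        error = check_executable(tool)
        print(tool, "OK" if error is None else "** " + error)
```

If a libsvm entry is reported as missing, running `make all` in the libsvm directory (step 3) is the usual way to produce the binaries.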

Usage

Example:

cd ~/PredPSI-SVR/
## PredPSI-SVR, with "-p" option
./main.sh example/sample.vcf -p example/sample.psi -o example/outdir
## PredPSI-SVR-noPSI, without "-p"
./main.sh example/sample.vcf -o example/outdir

Result file is example/outdir/OUTPUT.dpsi

Attention:

PredPSI-SVR first filters the VCF file to remove variants that lie in intergenic regions or more than 200 bp from a splice site. Therefore, OUTPUT.dpsi sometimes contains fewer variants than your input VCF file.
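
To make the distance rule concrete, here is a minimal sketch of a 200 bp splice-site filter. It is not the code PredPSI-SVR runs (the pipeline relies on ANNOVAR annotation); the function names, the use of exon boundaries as a stand-in for splice sites, and the coordinates are all assumptions for illustration.

```python
MAX_DISTANCE = 200  # bp, the cutoff stated above


def distance_to_exon_boundary(pos, exon_start, exon_end):
    """Distance in bp from a position to the nearer exon boundary."""
    return min(abs(pos - exon_start), abs(pos - exon_end))


def keep_variant(pos, exons, max_distance=MAX_DISTANCE):
    """Keep a variant if it is within max_distance of any exon boundary.

    exons is a list of (start, end) tuples on the same chromosome.
    An empty list (no annotated exon nearby) means the variant is dropped.
    """
    return any(
        distance_to_exon_boundary(pos, start, end) <= max_distance
        for start, end in exons
    )


if __name__ == "__main__":
    exons = [(1000, 1100)]  # hypothetical exon coordinates
    for pos in (1150, 1300, 1301):
        print(pos, keep_variant(pos, exons))
```

Only variants more than 200 bp away are dropped in this sketch, so a variant exactly 200 bp from a boundary is kept.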

References

PredPSI-SVR/tools/ese3/ese3_mod.py is modified from a script in the SilVA package (Paper: https://www.ncbi.nlm.nih.gov/pubmed/23736532, GitHub: https://github.com/buske/silva)

Citation

Chen, K., Lu, Y., Zhao, H., & Yang, Y. (2019). Predicting the change of exon splicing caused by genetic variant using support vector regression. Human Mutation, 40(9), 1235–1242. https://doi.org/10.1002/humu.23785

predpsi-svr's Issues

Error with libsvm and utils.py

I installed libsvm 3.32 and got this error with PredPSI-SVR:

./main.sh example/sample.vcf -o example/outdir 

 

>>== Check commands availablity ==<< 

/usr/bin/python 

/usr/bin/python3 

/usr/bin/perl 

/usr/bin/samtools 

** ERROR: check variable "svm_scale" in src/init.sh 

** ERROR: check variable "svm_predict" in src/init.sh 

** ANNOVAR annotating...... 

NOTICE: Finished reading 99 lines from VCF file 

NOTICE: A total of 99 locus in VCF file passed QC threshold, representing 99 SNPs (77 transitions and 22 transversions) and 0 indels/substitutions 

NOTICE: Finished writting 0 SNPs (0 transitions and 0 transversions) and 0 indels/substitutions for 1 sample 

NOTICE: The --geneanno operation is set to ON by default 

NOTICE: Reading gene annotation from /data/jess_tmp/fh/PredPSI-SVR/tools/annovar/humandb/hg19_ensGene.txt ... Done with 103433 transcripts (including 38799 without coding sequence annotation) for 47132 unique genes 

NOTICE: Finished gene-based annotation on 0 genetic variants in /data/jess_tmp/fh/PredPSI-SVR/example/outdir/input.avinput 

NOTICE: Output files were written to /data/jess_tmp/fh/PredPSI-SVR/example/outdir/annovar.variant_function, /data/jess_tmp/fh/PredPSI-SVR/example/outdir/annovar.exonic_variant_function 

    DONE 

** Preparing mutation info... 

  - only vcf, finding exon transcript... 

/data/jess_tmp/fh/PredPSI-SVR/src/utils.py:30: SyntaxWarning: "is" with a literal. Did you mean "=="? 

  assert errmsg is '', "Error in file 'utils.py':'get_genome': " + errmsg 

/data/jess_tmp/fh/PredPSI-SVR/src/utils.py:51: SyntaxWarning: "is" with a literal. Did you mean "=="? 

  if len(args) is 0: 

/data/jess_tmp/fh/PredPSI-SVR/src/utils.py:118: SyntaxWarning: "is" with a literal. Did you mean "=="? 

  if self.strand is '-': 

/data/jess_tmp/fh/PredPSI-SVR/src/utils.py:292: SyntaxWarning: "is" with a literal. Did you mean "=="? 

  if strand is '+': 

However, you can fix it by doing

chmod -x svm-predict.c 

chmod -x svm-scale.c 

And change every "is" comparison against a literal in src/utils.py to "=="

This gives

./main.sh example/sample.vcf -o example/outdir 

 

>>== Check commands availablity ==<< 

/usr/bin/python 

/usr/bin/python3 

/usr/bin/perl 

/usr/bin/samtools 

/data/jess_tmp/fh/PredPSI-SVR/tools/libsvm-3.32/svm-scale.c 

/data/jess_tmp/fh/PredPSI-SVR/tools/libsvm-3.32/svm-predict.c 

** ANNOVAR annotating...... 

NOTICE: Finished reading 99 lines from VCF file 

NOTICE: A total of 99 locus in VCF file passed QC threshold, representing 99 SNPs (77 transitions and 22 transversions) and 0 indels/substitutions 

NOTICE: Finished writting 0 SNPs (0 transitions and 0 transversions) and 0 indels/substitutions for 1 sample 

NOTICE: The --geneanno operation is set to ON by default 

NOTICE: Reading gene annotation from /data/jess_tmp/fh/PredPSI-SVR/tools/annovar/humandb/hg19_ensGene.txt ... Done with 103433 transcripts (including 38799 without coding sequence annotation) for 47132 unique genes 

NOTICE: Finished gene-based annotation on 0 genetic variants in /data/jess_tmp/fh/PredPSI-SVR/example/outdir/input.avinput 

NOTICE: Output files were written to /data/jess_tmp/fh/PredPSI-SVR/example/outdir/annovar.variant_function, /data/jess_tmp/fh/PredPSI-SVR/example/outdir/annovar.exonic_variant_function 

    DONE 

** Preparing mutation info... 

  - only vcf, finding exon transcript... 

I haven't got OUTPUT.dpsi but that might be because I haven't got the SPIDEX database installed yet (still waiting for a link)

/PredPSI-SVR/example/outdir$ ls

annovar.exonic_variant_function  annovar.log  annovar.variant_function  input.avinput  input.vcf  mut_info  mut_info.valid
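
For reference, the SyntaxWarning lines in the log above appear because Python 3.8 and later flag `is` comparisons against literals: `is` tests object identity, not equality. The warning by itself does not stop execution. The sketch below shows corrected forms of the four flagged comparisons; the wrapper function names are hypothetical and only the comparisons themselves come from the log.

```python
def check_errmsg(errmsg):
    # Was: assert errmsg is '', ...
    assert errmsg == "", "Error in file 'utils.py':'get_genome': " + errmsg


def has_no_args(args):
    # Was: if len(args) is 0:
    return len(args) == 0


def is_minus_strand(strand):
    # Was: if self.strand is '-':
    return strand == "-"


def is_plus_strand(strand):
    # Was: if strand is '+':
    return strand == "+"


if __name__ == "__main__":
    check_errmsg("")
    print(has_no_args([]), is_minus_strand("-"), is_plus_strand("-"))
```

Using `==` compares values, so the result no longer depends on whether the interpreter happens to reuse the same string or integer object.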
