#ERDS-pe

#Introduction

ERDS-pe is a tool designed to detect copy number variations (CNVs) from whole-exome sequencing (WES) data. ERDS-pe uses RPKM and principal component analysis to normalize WES data, then combines read depth (RD) and single-nucleotide variation (SNV) information as a hybrid signal in a paired hidden Markov model (HMM) to infer CNVs.

#Installation

ERDS-pe is easy to run. You just need:

  1. Python 2.7 (Python 3 is not yet supported)

  2. NumPy, PyVCF, and Pysam libraries installed.

  3. Target (probe) files in BED format.

#Running

usage: python erds_pe.py [OPTIONS]

  1. Getting RD and transforming it to RPKM format

    This command extracts the read depth (RD) signal from the BAM files and calculates RPKM values for all samples.

     usage: python erds_pe.py rpkm [OPTIONS] 
    
     --target  <FILE>  Target region file in bed format (required)
    
     --input   <FILE>  A file listing the BAM files, e.g. bam_list_example.txt (required)
    
     --output  <FILE>  Directory for RPKM files (required)
    
  2. Merging single-sample RPKM files into one data matrix

     usage: python erds_pe.py merge_rpkm [OPTIONS] 
    
     --rpkm_dir  <FILE>  Directory containing the RPKM files (required)
    
     --target   <FILE>  Target region file in bed format (required)
    
     --output  <FILE>  Output data matrix file (optional)
    
  3. Normalization

    This command normalizes the RPKM data matrix using principal component analysis (PCA).

     usage: python erds_pe.py svd [OPTIONS] 
    
     --rpkm_matrix  <FILE>  RPKM data matrix file to normalize with PCA (required)
    
     --output  <FILE>  Output file for the normalized data matrix (optional)
    
  4. Calling CNVs

    This command calls CNVs from pooled whole-exome sequencing samples.

     usage: python erds_pe.py discover [OPTIONS]  
    
     --params  <FILE>  Parameters file for HMM (required)
    
     --datafile  <FILE>  Normalized data matrix file (required)
     
     --output  <FILE>  Output file for the CNV calls (optional)
     
     --sample  <STRING>  Call CNVs only for the given sample (optional)
    
     --vcf  <FILE>  VCF file providing SNV information (optional)
     
         --hetsnp  <BOOL>  Whether to use heterozygous SNV information in the HMM (optional, default FALSE)
     
         --tagsnp  <BOOL>  Whether to use tagSNP-copy number polymorphism information in the HMM (optional, default FALSE)
    
     --tagsnp_file  <FILE>  File recording the linkage disequilibrium information between tagSNPs and copy number polymorphisms (optional)
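
The RPKM and normalization steps above can be illustrated with a small NumPy sketch. This is not the code in erds_pe.py: the function names `rpkm` and `svd_normalize` are hypothetical, the library size is assumed to be the sum of the reads counted on the targets, and the number of removed components (`n_components`) is an assumed parameter that this README does not specify.

```python
import numpy as np


def rpkm(read_counts, target_lengths):
    """Reads Per Kilobase per Million reads for one sample.

    Assumption: the library size is the sum of the per-target read counts.
    """
    read_counts = np.asarray(read_counts, dtype=float)
    target_lengths = np.asarray(target_lengths, dtype=float)
    total_reads = read_counts.sum()
    # 1e9 = 1e3 (per kilobase of target) * 1e6 (per million reads)
    return read_counts * 1e9 / (target_lengths * total_reads)


def svd_normalize(matrix, n_components=1):
    """Remove the strongest components from a samples x targets matrix.

    Assumption: n_components is a free parameter chosen by the user.
    """
    matrix = np.asarray(matrix, dtype=float)
    centered = matrix - matrix.mean(axis=0)  # center each target column
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    s = s.copy()
    s[:n_components] = 0.0  # drop the components with the largest variance
    # Rebuild the matrix from the remaining components
    return np.dot(u * s, vt)


if __name__ == "__main__":
    # Three targets of 1000, 2000 and 500 bp with 10, 20 and 70 reads
    print(rpkm([10, 20, 70], [1000, 2000, 500]))
```

Zeroing the leading singular values is one common way to remove systematic noise shared across samples, at the risk of also removing real signal if too many components are dropped.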
    

#File Instruction

  1. bam list file (three columns)

    Column 1: Sample name
    Column 2: Path to the BAM file
    Column 3: Population of the corresponding sample, e.g. CEU, YRI, CHB.

    Example:

     NA06984	/data/rjtan/1000GP/exome/NA06984.mapped.ILLUMINA.bwa.CEU.exome.20120522.bam	CEU
     NA06985	/data/rjtan/1000GP/exome/NA06985.mapped.ILLUMINA.bwa.CEU.exome.20130415.bam	CEU
     NA06994	/data/rjtan/1000GP/exome/NA06994.mapped.ILLUMINA.bwa.CEU.exome.20120522.bam	CEU
     NA07000	/data/rjtan/1000GP/exome/NA07000.mapped.ILLUMINA.bwa.CEU.exome.20130415.bam	CEU
     ...
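
Before running the rpkm step, it can help to check that the list file has the expected shape. The sketch below assumes the columns are separated by tabs, as in the example above; `read_bam_list` is a hypothetical helper and not part of ERDS-pe.

```python
import io


def read_bam_list(handle):
    """Parse a three-column bam list into (sample, bam_path, population) tuples.

    Assumption: columns are tab-separated; blank lines are skipped.
    """
    entries = []
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                "line %d: expected 3 tab-separated columns, got %d"
                % (line_number, len(fields))
            )
        entries.append(tuple(field.strip() for field in fields))
    return entries


if __name__ == "__main__":
    # Hypothetical paths, used only to show the expected layout
    example = io.StringIO(
        u"sample_A\t/path/to/sample_A.bam\tCEU\n"
        u"sample_B\t/path/to/sample_B.bam\tYRI\n"
    )
    for sample, bam_path, population in read_bam_list(example):
        print(sample, bam_path, population)
```

For a file on disk, pass an open file object instead, for example `read_bam_list(open("bam_list_example.txt"))`.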
    

#Quick Start & an example

  1. Step 1:

     python erds_pe.py rpkm \
     --input $bamlist_file \
     --target $target_file \
     --output $rpkm_files
    
  2. Step 2:

     python erds_pe.py merge_rpkm \
     --rpkm_dir $rpkm_files \
     --target $target_file
    
  3. Step 3:

     python erds_pe.py svd \
     --rpkm_matrix $RPKM_matrix.raw.filtered
    
  4. Step 4:

     python erds_pe.py discover \
     --params params.txt \
     --datafile $RPKM_matrix.raw.filtered.SVD \
     --sample NA12878 \
     --vcf=$snv_vcf_file \
     --hetsnp True \
     --tagsnp True \
     --tagsnp_file $tagsnp_file \
     --output NA12878.pooled.Het.Tag.cnv
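
The four steps can also be chained from a small Python driver. This is only a sketch: `build_commands` and `run_pipeline` are hypothetical helpers, and the driver assumes that the `--output` options of merge_rpkm and svd write exactly to the given paths, so that each step can find the file produced by the previous one. The SNV-related options are left out for brevity.

```python
import subprocess


def build_commands(bam_list, target, rpkm_dir, rpkm_matrix,
                   normalized_matrix, params, cnv_output,
                   sample=None, script="erds_pe.py"):
    """Return the four ERDS-pe command lines as argument lists."""
    base = ["python", script]
    discover = base + [
        "discover",
        "--params", params,
        "--datafile", normalized_matrix,
        "--output", cnv_output,
    ]
    if sample is not None:
        discover += ["--sample", sample]
    return [
        base + ["rpkm", "--input", bam_list,
                "--target", target, "--output", rpkm_dir],
        base + ["merge_rpkm", "--rpkm_dir", rpkm_dir,
                "--target", target, "--output", rpkm_matrix],
        base + ["svd", "--rpkm_matrix", rpkm_matrix,
                "--output", normalized_matrix],
        discover,
    ]


def run_pipeline(commands):
    """Run each step in order and stop at the first failure."""
    for command in commands:
        subprocess.check_call(command)


if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed
    steps = build_commands(
        "bam_list.txt", "targets.bed", "rpkm_files",
        "matrix.txt", "matrix.SVD.txt", "params.txt",
        "NA12878.cnv", sample="NA12878",
    )
    for step in steps:
        print(" ".join(step))
```

Passing argument lists to `subprocess.check_call` avoids shell quoting problems with paths that contain spaces.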
    

#Contact [email protected]
