Peak-pair Statistics

Introduction

The scripts in this repository compute basic statistics on peak-pairs. The peak-pair file is the output obtained by running the cw-peak-pair Python script on the peak call file.

The basic operations are: a) calculating the mean and median of peak-pair occupancy, b) calculating the peak-pair mode, c) counting the number of orphans, d) calculating the fraction of all mapped reads that reside in peak-pairs, and e) calculating the signal-to-noise ratio in the dataset.
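As a rough illustration of operations (a) and (d), the sketch below computes the mean, the median, and the fraction of reads in peak-pairs from a list of per-pair occupancies. It is written in Python for readability and is not part of the repository's Perl scripts. The function name `summarize_occupancy`, the treatment of occupancy as a per-pair read count, and the definition of the fraction as (reads in peak-pairs) / (all mapped reads) are assumptions made for this example.

```python
from statistics import mean, median


def summarize_occupancy(pair_occupancies, total_mapped_reads):
    """Return mean, median and fraction of reads in peak-pairs.

    pair_occupancies: one read count per peak-pair (assumed meaning of
    "occupancy"). total_mapped_reads: all mapped reads in the dataset.
    """
    if not pair_occupancies:
        raise ValueError("no peak-pairs given")
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    reads_in_pairs = sum(pair_occupancies)
    return {
        "mean": mean(pair_occupancies),
        "median": median(pair_occupancies),
        # Assumed definition: reads inside peak-pairs over all mapped reads.
        "frip": reads_in_pairs / total_mapped_reads,
    }


stats = summarize_occupancy([10, 20, 30, 40], total_mapped_reads=1000)
print(stats)
```

The numbers in the call are made up for the example; the real scripts read occupancies from the S_* files and the total read count from the index file.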

Requirements

  • The scripts require only Perl (5 or higher) to run.
  • The input tag (index) file should have the idx or tab extension and contain the columns chr, index, forward, reverse, value (the value column is optional).
  • The peak-pair files should be in the standard GFF format.
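To make the expected index file layout concrete, here is a minimal Python sketch that parses one line with the columns listed above. It is an illustration only, not code from the repository. It assumes the columns are tab-separated, that lines starting with "#" are comments, and that a line whose numeric columns do not parse is a header; the names `TagRow` and `parse_tag_line` are hypothetical.

```python
from typing import NamedTuple, Optional


class TagRow(NamedTuple):
    chrom: str
    index: int
    forward: float
    reverse: float
    value: Optional[float]  # None when the optional column is absent


def parse_tag_line(line):
    """Parse one index-file line; return None for blank, comment or header lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    fields = stripped.split("\t")  # assumption: tab-separated columns
    if len(fields) not in (4, 5):
        raise ValueError(f"expected 4 or 5 columns, got {len(fields)}")
    try:
        index = int(fields[1])
        forward = float(fields[2])
        reverse = float(fields[3])
        value = float(fields[4]) if len(fields) == 5 else None
    except ValueError:
        # Assumption: a non-numeric line is a header such as
        # "chr index forward reverse value", so it is skipped.
        return None
    return TagRow(fields[0], index, forward, reverse, value)


print(parse_tag_line("chr1\t100\t3\t0\t3"))
```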

THE SCRIPT WILL BREAK IF:

  • The files contain the Excel ^M (carriage return) character. As a sanity check, open your file in a terminal to see whether it shows any ^M characters. If it does, use the following command to remove them:

    $ perl -p -e 's/\r\n?/\n/g;' <file_with_excel_char> > <new_file>

  • The peak-pair file name does not start with "S_" and end with "_sXeXFX.gff", where each X can be any number, e.g. "_s5e20F10.gff".

  • The orphan file name does not start with "O_".

  • The names of the S_* and O_* files do not contain the index file name (without its extension). For example, if the index file name is "Reb1-rep2.idx", then all the S_* and O_* files should be named like S_XXX_Reb1-rep2_XXX_sXe20F1.gff, where X is any character.
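The naming rules above can be checked before running the scripts. The following Python sketch is one possible pre-flight check, not something shipped with the repository. The function name `check_names` and the regular expression are assumptions derived from the examples in the list.

```python
import re
from pathlib import Path

# Peak-pair files: start with "S_", end with "_s<num>e<num>F<num>.gff".
PEAK_PAIR_RE = re.compile(r"^S_.*_s\d+e\d+F\d+\.gff$")


def check_names(index_file, peak_pair_file, orphan_file):
    """Return a list of naming problems; an empty list means the names look fine."""
    stem = Path(index_file).stem  # "Reb1-rep2.idx" -> "Reb1-rep2"
    peak_pair = Path(peak_pair_file).name
    orphan = Path(orphan_file).name
    problems = []
    if not PEAK_PAIR_RE.match(peak_pair):
        problems.append(f"{peak_pair}: expected S_*_sXeXFX.gff")
    if not orphan.startswith("O_"):
        problems.append(f"{orphan}: expected an O_ prefix")
    for name in (peak_pair, orphan):
        if stem not in name:
            problems.append(f"{name}: does not contain index name {stem}")
    return problems


print(check_names(
    "Reb1-rep2.idx",
    "S_x_Reb1-rep2_y_s5e20F1.gff",
    "O_x_Reb1-rep2_y_s5e20F1.gff",
))
```

The file names in the call are made up to match the pattern in the list; substitute your own paths.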

Installing and Running the scripts

Unpack the source code archive. The folder contains the following:

-  robust_peak_pair_stats.pl: Script for basic statistics and an increasing-window quantile scan for signal:noise.
-  pp_stats_5pt_scan.pl: Script for basic statistics and a fixed-window quantile scan for signal:noise.
-  README.rst: Readme file
-  Sample data: two index files (Reb1-rep2.idx and Reb1-rep3.idx) and a folder (genetrack_s5e10F1) containing the peak calls, with a subfolder (cwpair_output_mode_f0u0d100b3) containing all the S_*, D_*, O_*, and P_* peak-pair files

If you wish to get the signal:noise ratio information using an increasing-window quantile scan (e.g. top 1%, top 5%, top 10%, etc.), then use the following script:

$ perl robust_peak_pair_stats.pl -h
Options: -i <path1>     path to the folder with index files [accepted index file extensions, idx, tab].
         -d <path2>     path to the folder with S_*.gff and O_* files.
         -g             organism, sg07=>yeast, mm09=>MouseV9, mm08=>MouseV8, hg18=>human18, hg19=>human19, dm03=>Drosophila
         -s             size of genome[optional] In case of other genomes, set -g as NA and -s as the size of genome (see ex. below)

Do a test run of the script by typing:

$ perl robust_peak_pair_stats.pl -i  ./ -d genetrack_s5e10F1/cwpair_output_mode_f0u0d100b3/ -g sg07

The folder with the S_* and O_* files should now contain a "peak_pair_stats.txt" file. This means that the script runs fine on your system.

If you wish to get the signal:noise information using a fixed-width quantile scan (e.g. 0-5%, 5-10%, 10-15%), then use the following script:

$ perl pp_stats_5pt_scan.pl -h
Options: -i <path1>     path to the folder with index files[accepted index file extensions, idx, tab].
         -d <path2>     path to the folder with S_*.gff and O_* files.
         -g             organism, sg07=>yeast, mm09=>MouseV9, mm08=>MouseV8, hg18=>human18, hg19=>human19, dm03=>Drosophila
         -s             size of genome[optional] In case of other genomes, set -g as NA and -s as the size of genome (see ex. below)
         -p <number>    the percent quantile you need to use to scan. For ex. scan window of 5 is default.

Do a test run of the script by typing:

$  perl pp_stats_5pt_scan.pl -i  ./ -d genetrack_s5e10F1/cwpair_output_mode_f0u0d100b3/ -g NA -s 160000000 -p 10

The folder with the S_* and O_* files should now contain a "peak_pair_stats.txt" file and a "signal2noise_qt_scan.txt" file. This means that the script runs fine on your system.
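The difference between the two scan styles can be shown by listing the quantile windows each one covers. The Python sketch below only generates the window boundaries; it does not compute signal:noise, and it is not taken from the Perl scripts. The function names are hypothetical, the increasing-window cut-offs follow the top_*pt columns reported by "robust_peak_pair_stats.pl", and truncating the last fixed window at 100% when the step does not divide 100 evenly is an assumption.

```python
def increasing_windows(tops=(1, 5, 10, 25, 50, 75, 100)):
    """Cumulative windows that all start at 0%: top 1%, top 5%, top 10%, ..."""
    return [(0, top) for top in tops]


def fixed_windows(step=5):
    """Consecutive windows of equal width: 0-5%, 5-10%, 10-15%, ...

    step is an integer percent, mirroring the -p option (default 5).
    """
    if not 0 < step <= 100:
        raise ValueError("step must be between 1 and 100")
    # Assumption: the last window is cut off at 100%.
    return [(lo, min(lo + step, 100)) for lo in range(0, 100, step)]


print(increasing_windows())
print(fixed_windows(10))
```

With a step of 10, as in the test run above, the fixed scan yields ten non-overlapping windows, whereas the increasing scan always yields nested windows anchored at the top of the ranking.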

Output

All output files are written to the folder that contains the S_* and O_* files. The following output files are generated:

  • The script "pp_stats_5pt_scan.pl" produces an extra file named "signal2noise_qt_scan.txt", which contains the quantile range and the signal-to-noise ratio in tab-delimited format.

  • "peak_pair_stats.txt", which contains a summary for each input file. The summary includes the following information:

    - Filename
    - Peak-pair mode
    - Peaks in peak pairs
    - Orphan peaks
    - Median peak-pair occupancy
    - Mean peak-pair occupancy
    - FRIP (Fraction of all mapped reads in peak-pairs)
    - top_1pt_signal:noise [only in the output of "robust_peak_pair_stats.pl"]
    - top_5pt_signal:noise [only in the output of "robust_peak_pair_stats.pl"]
    - top_10pt_signal:noise [only in the output of "robust_peak_pair_stats.pl"]
    - top_25pt_signal:noise [only in the output of "robust_peak_pair_stats.pl"]
    - top_50pt_signal:noise [only in the output of "robust_peak_pair_stats.pl"]
    - top_75pt_signal:noise [only in the output of "robust_peak_pair_stats.pl"]
    - top_100pt_signal:noise [only in the output of "robust_peak_pair_stats.pl"]
    

Contributors

rreja
