Giter Site home page Giter Site logo

readcount-generator's Introduction

ReadCount_Gen

This script file is intended to easily generate readcount files from bam files using coverageBed. A target file for differential analysis using edgeR will also be prepared based on the generated readcounts. You can stop the work at anytime and restart the script. Progress of already finished files will be saved as long as the input and output folders remain unchanged. Multithreading and temporary usage of local storage are supported if faster processing is needed and hardware conditions allow it.

For a full utilization of the code, several inputs are needed: copy_to_local - Set True if using local storage temporarily. Highly recommended if a SSD is available and multithreading is enabled, since SSD's read speed dramatically outweight external hard drive read speed. SSD's read speed will not be a bottle neck when multithreading is enabled, while external hard drive's will.

allocated_space - Amount of temporary storage space which will not be used by ReadCount_Gen. If this parameter is unspecified, all available space in the disc where temporary folder is located will be used. The units of the number entered are specified with --allocated_space_unit

allocated_space_unit - Unit of allocated_space. Only accepts "MB" or "GB".

local_poolsize - Amount of threads used to process file from local temporary folder. Highly recommended if you have a powerful computer and SSD as temporary local storage.

external_poolsize - Amount of threads used to process file from external hard drive. Do not recommend integer higher than 4 since the relatively low read speed will influence each thread's performance.

coverageBed_path - Full path of coverageBed executable file. If not specified, script will assume coverageBed is in PATH.

bam_folder - Full path of folder containing original source of bam files. Can accept multiple arguments separated by space, specify category in category parameter. Bam files should be directly under the folder (not in subfolders).

category - Category of bam folders in the exact same order provided in bam_folder parameter. Accept multiple arguments, separated by space.

bed_file - Full path of the bed file that is going to be used.

temp_folder - Full path of temporary folder, where bam files will be copied, if temporary local storage is used. Folder will be created if not previously exists.

output_folder - Full path of folder for readcount outputs, a progress file and a target file that records task progress. Folder will be created if not previously exists.

RNA_type - sno,lnc,piwi,etc..This string will be incoporated into output file names.

A sample of comprehensive arguments are provided as below, for detailed explaination of each parameters please use --help argument.

python2.7 path/to/script/ReadCount_Gen_args.py --copy_to_local True --allocated_space 50 --allocated_space_unit GB --local_poolsize 13 --external_poolsize 2 --coverageBed_path /usr/local/bin/coverageBed --bam_folder /media/huntertom/LaCie/Esophageal/miRNA_normal_BAM /media/huntertom/LaCie/Esophageal/miRNA_BAM --category normal Esophageal_Carcinoma --bed_file /home/huntertom/Downloads/NONCODEv4u1_human_ncRNAgrch37compat-3.bed --temp_folder /home/huntertom/Documents/temp_bam --output_folder /home/huntertom/Documents/Esophageal/piwi/test --RNA_type piwi

When using this script, please be aware that:

  1. Python 2.7 is needed to run the script. Python 2.7 comes with Ubuntu.
  2. This script does not support matched pair analysis so far. (tumor and normal data from the same patients)
  3. If normal hard drive is used as local storage, improvement of performance over external hard drive has not be tested yet.
  4. The sum of local_poolsize and external_poolsize is not advised to exceed the total amount of logical cores available. In order to check number of available logical core,

readcount-generator's People

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.