tangwei1129 / pipeliner

This project forked from dlwheeler/pipeliner

Pipeliner Program

Home Page: http://ccbr.github.io/Pipeliner

Test

PLEASE CLICK HERE TO VIEW/DOWNLOAD THE FULL PIPELINER MANUAL

Pipeliner

Introduction

Pipeliner provides access to some of the NGS data analysis pipelines used by CCBR on the NIH Biowulf Linux Cluster. Pipeliner allows the user to select a set of data files, such as fastq sequence reads, and process them via a sequence of programs that constitute an analysis 'pipeline' to reach an endpoint, such as a list of mutations with accompanying annotations. The program provides a graphical interface in which pipeline options are configured using pulldowns and text entry fields. Once configured, the pipeline is executed on the Biowulf cluster.

Quick Start

Prerequisites

  • An account on the NIH Biowulf Linux Cluster

Pipeliner runs on the NIH Biowulf cluster and uses a graphical interface. To use the program, log into Biowulf using ssh with X11 forwarding or via a NoMachine NX terminal (https://www.nomachine.com/), the latter specifically for RNAseq runs.
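
As a rough illustration of the ssh route, the sketch below assembles a login command with X11 forwarding enabled. The host name `biowulf.nih.gov`, the placeholder `username`, and the helper `build_ssh_command` are assumptions made for this example, not part of Pipeliner. `-X` and `-Y` are the standard OpenSSH flags for X11 forwarding and trusted X11 forwarding.

```python
import shlex


def build_ssh_command(user, host="biowulf.nih.gov", trusted=True):
    """Return an ssh command line that requests X11 forwarding.

    The default host name is an assumption; use the login node you were given.
    """
    # -X requests X11 forwarding; -Y requests trusted X11 forwarding.
    flag = "-Y" if trusted else "-X"
    return shlex.join(["ssh", flag, f"{user}@{host}"])


# "username" is a placeholder for your own cluster account name.
print(build_ssh_command("username"))
```

The printed command can be pasted into a local terminal that has an X server available.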

Running Pipeliner

Once logged in, launch the Pipeliner program:

/path/to/pipeliner/install/runpipe.sh

If you are running your own installation of Pipeliner, you will need to edit the paths to the programs and files given in these configuration files located within the Pipeliner installation directory:

hg19.json
mm10.json
standard-bin.json
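
After editing these files, it can help to check that the paths they contain actually exist. The following is a minimal sketch, assuming the configuration values are plain absolute paths stored as JSON strings; the helper `missing_paths` is hypothetical and not part of Pipeliner. Values that hold a command plus arguments would need extra handling.

```python
import json
import os


def missing_paths(config):
    """Recursively collect absolute-path string values that do not exist on disk."""
    missing = []

    def walk(node):
        if isinstance(node, dict):
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for value in node:
                walk(value)
        elif isinstance(node, str) and node.startswith("/"):
            # Only strings that look like absolute paths are checked.
            if not os.path.exists(node):
                missing.append(node)

    walk(config)
    return missing


def check_config_file(json_path):
    """Load one JSON configuration file and report missing paths."""
    with open(json_path) as handle:
        return missing_paths(json.load(handle))
```

For example, `check_config_file("standard-bin.json")` would return the list of configured paths that are not present on the current system.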

Within the Pipeliner program, the following steps must be executed in order:

  • Set the working directory--this is the directory in which all output files will appear and must already exist
  • Set the data source--this is the directory containing the starting data, generally a set of files containing paired-end fastq reads
  • Initialize the working directory--this step first clears the working directory, then makes a few subdirectories within it and populates these with a number of files needed to run the pipelines
  • Link the data files to the working directory--this step creates symbolic links from the data files into the working directory to prevent the need to copy or move large files
  • Perform a dry run to verify that the pipeline is configured properly--if the dry run produces errors, these must be corrected prior to launching
  • Launch the pipeline--this submits the pipeline job to the Biowulf cluster
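
The linking step above can be pictured with a short sketch. This is not Pipeliner's own code. It assumes the data files can be recognized by a `.fastq` or `.fastq.gz` suffix, and the function name `link_fastq_files` is hypothetical.

```python
import os
from pathlib import Path


def link_fastq_files(data_dir, work_dir, patterns=("*.fastq", "*.fastq.gz")):
    """Symlink each matching data file into work_dir and return the link names."""
    data_dir = Path(data_dir).resolve()
    work_dir = Path(work_dir).resolve()
    # The working directory must already exist, as noted in the steps above.
    if not work_dir.is_dir():
        raise FileNotFoundError(f"working directory must already exist: {work_dir}")
    linked = []
    for pattern in patterns:
        for src in sorted(data_dir.glob(pattern)):
            dest = work_dir / src.name
            # Create the link only if nothing with that name is present yet.
            if not dest.is_symlink() and not dest.exists():
                os.symlink(src, dest)
            linked.append(dest.name)
    return linked
```

Because only links are created, the original reads stay where they are and no large files are copied.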

Configuration and Details

Pipeliner uses a program called Snakemake to manage pipeline workflows.

https://bitbucket.org/snakemake/snakemake/wiki/Home

Snakemake, in turn, accepts a configuration file formatted in JSON (JavaScript Object Notation).

The principal configuration files used by Pipeliner

  • hg19.json :references for human genome build hg19/GRCh37
  • standard-bin.json :paths to programs used in the pipeline
  • rules.json :lists of Snakemake rules assigned to named pipelines
  • cluster.json :SLURM parameters for each rule (memory requested, time, # cpus)
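
To illustrate how per-rule SLURM parameters might be looked up, here is a minimal sketch. The rule name `bwa_mem` and the keys `mem`, `time` and `threads` are invented for the example and may not match the real cluster.json. The `__default__` entry follows a convention from Snakemake's cluster configuration; whether Pipeliner's file uses it is an assumption.

```python
import json

# Hypothetical cluster.json content; rule names and keys are illustrative only.
EXAMPLE_CLUSTER_JSON = """
{
  "__default__": {"mem": "8g", "time": "04:00:00", "threads": 2},
  "bwa_mem": {"mem": "32g", "threads": 16}
}
"""


def resources_for(rule, cluster):
    """Return the settings for a rule, falling back to the defaults."""
    merged = dict(cluster.get("__default__", {}))
    merged.update(cluster.get(rule, {}))
    return merged


cluster = json.loads(EXAMPLE_CLUSTER_JSON)
print(resources_for("bwa_mem", cluster))
```

A rule without its own entry receives the default values unchanged.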

Backend Python programs

  • pipeliner.py :main program, including GUI components
  • makeasnake.py :called by Pipeliner to build Snakefiles required by Snakemake
  • stats2html.py :builds reports at end of a pipeline run

Subdirectories used within the working directory

  • Reports :contains scripts for aggregate report generation and aggregate reports created by Pipeliner
  • QC :contains QC reports generated during various pipeline steps
  • Scripts :contains custom scripts used by some pipelines

Other subdirectories

  • Data :resides within the Pipeliner installation directory and contains fasta-format files specifying the adapter sequences to be trimmed from reads. These files should be referenced in the appropriate reference json file, e.g. hg19.json.

Pipelines Implemented

  • ExomeSeqPairs
  • ExomeSeqGermline
  • RNASeq (counts-based)

Pipelines Planned for Inclusion

  • ChipSeq
  • mirSeq

Organisms supported

  • Human (hg19)
  • Mouse (mm10)

Adding Custom Pipelines

You can configure new pipelines by adding Snakemake rules to the Rules directory and referencing them in 'rules.json'. In that file, each key is the name of a rule and its value is the list of named pipelines that require the rule. To register a rule for a pipeline, add the rule's name as a key if it is not already present, then add the pipeline's name to that key's list. A new entry for the pipeline must also be added to the pulldown in the GUI--this requires editing pipeliner.py.
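
The rule-to-pipeline mapping described above can be sketched as follows. The rule names `fastqc`, `bwa_mem` and `my_new_rule`, the pipeline name `MyPipeline`, and both helper functions are hypothetical. Only the shape (rule name as key, list of pipeline names as value) comes from the description.

```python
import json
from collections import defaultdict

# Hypothetical rules.json content: rule name -> pipelines that require it.
rules = {
    "fastqc": ["ExomeSeqPairs", "ExomeSeqGermline"],
    "bwa_mem": ["ExomeSeqPairs"],
}


def add_rule(rules, rule, pipeline):
    """Register `rule` as a requirement of `pipeline`."""
    pipelines = rules.setdefault(rule, [])
    if pipeline not in pipelines:
        pipelines.append(pipeline)


def rules_by_pipeline(rules):
    """Invert the mapping to list the rules each pipeline requires."""
    by_pipeline = defaultdict(list)
    for rule, pipelines in rules.items():
        for pipeline in pipelines:
            by_pipeline[pipeline].append(rule)
    return dict(by_pipeline)


add_rule(rules, "my_new_rule", "MyPipeline")
add_rule(rules, "fastqc", "MyPipeline")
print(json.dumps(rules, indent=2))
```

Inverting the mapping with `rules_by_pipeline` gives a quick check that a new pipeline picks up every rule it is meant to run.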

Contributors

felloumi, jlac, dlwheeler, kopardev, pajailwala, joshuabhk, nikhilbg
