maxgreil / rnaseq

License: BSD 3-Clause "New" or "Revised" License

rnaseq

Proof of concept of an RNA-Seq pipeline, built with Nextflow, that goes from reads to a count matrix (including quality control), plus an example RNA-Seq analysis in R.

Prerequisites

  • Unix-like OS (Linux, macOS, etc.)
  • Java version 8
  • Docker engine 1.10.x (or later)
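Before the first run, a quick bash check can confirm that these tools are on the PATH. This is a minimal sketch: require_cmd and check_prereqs are hypothetical helpers, not part of this repository, and they only test that each command exists, not that Java is version 8 or that Docker is 1.10 or later.

```bash
# Hypothetical helper: return 0 if the command is on PATH,
# otherwise report it on stderr and return 1.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing prerequisite: $1" >&2
  return 1
}

# Hypothetical helper: check every given command and report all missing
# ones instead of stopping at the first; returns 1 if any is missing.
check_prereqs() {
  local status=0
  local cmd
  for cmd in "$@"; do
    require_cmd "$cmd" || status=1
  done
  return "$status"
}
```

For example, check_prereqs java docker lists any missing tool; java -version and docker --version then show the installed versions.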

Necessary files

  • Reads to be mapped must be stored as compressed .fastq.gz files in the folder data

Additional necessary files

If the reads to be analyzed originate from a human RNA-Seq experiment, the following three additional files must also be stored in the folder data:

  • Prebuilt HISAT2 index for H. sapiens, release GRCh38
https://genome-idx.s3.amazonaws.com/hisat/grch38_snptran.tar.gz
  • Gencode GTF file, release 38 (GRCh38.p13)
https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_38/gencode.v38.chr_patch_hapl_scaff.annotation.gtf.gz
  • UCSC BED file, assembly GRCh38/hg38, track GENCODE V38
http://genome.ucsc.edu/cgi-bin/hgTables

The BED file must be gzip-compressed, and its file name must match the pattern *.annotation.bed.gz.
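The UCSC Table Browser can return the BED file either as plain text or gzip-compressed, under a name of your choosing. A small bash sketch for putting it in place follows; stage_bed and its name-prefix argument are hypothetical, and the only requirement taken from this README is the *.annotation.bed.gz suffix inside the folder data.

```bash
# Hypothetical helper: copy a BED export into the data folder as
# <prefix>.annotation.bed.gz, compressing it first if it is plain text.
# $1 = source BED file, $2 = name prefix, $3 = target folder (default: data)
stage_bed() {
  local src="$1" prefix="$2" dest_dir="${3:-data}"
  local dest="$dest_dir/$prefix.annotation.bed.gz"
  mkdir -p "$dest_dir"
  case "$src" in
    *.gz) cp "$src" "$dest" ;;          # already gzip-compressed
    *)    gzip -c "$src" > "$dest" ;;   # compress a plain-text BED file
  esac
  echo "$dest"
}
```

For example, stage_bed path/to/export.bed gencode.v38 writes data/gencode.v38.annotation.bed.gz and prints that path.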

To analyze another species, download the corresponding files for that organism instead.
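A quick check of the data folder before launching Nextflow can catch missing inputs early. The bash sketch below assumes the GTF file keeps the .gtf.gz suffix of the GENCODE download; check_data_dir and has_match are hypothetical helpers, and the HISAT2 index is deliberately not checked because this README does not say whether the archive must be extracted.

```bash
# Hypothetical helper: return 0 if at least one file in folder $1
# matches the glob pattern $2 (passed quoted, expanded here).
has_match() {
  local f
  for f in "$1"/$2; do
    [ -e "$f" ] && return 0
  done
  return 1
}

# Hypothetical helper: report every missing input type in the data folder;
# returns 1 if anything is missing.
check_data_dir() {
  local dir="${1:-data}"
  local status=0
  has_match "$dir" '*.fastq.gz' || { echo "no *.fastq.gz reads in $dir" >&2; status=1; }
  has_match "$dir" '*.gtf.gz' || { echo "no *.gtf.gz annotation in $dir" >&2; status=1; }
  has_match "$dir" '*.annotation.bed.gz' || { echo "no *.annotation.bed.gz file in $dir" >&2; status=1; }
  return "$status"
}
```

Running check_data_dir without arguments inspects the folder data relative to the current directory.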

Quick start

Because this pipeline uses HISAT2 to align reads, it is suitable for short reads only.

Example run:

nextflow run main.nf

The above example relies on the default value of params.reads, which targets single-end reads; it is equivalent to:

nextflow run main.nf --reads "data/*.fastq.gz"

For paired-end reads, the parameter params.singleEnd in nextflow.config must additionally be set to false. The command then becomes:

nextflow run main.nf --reads "data/*_{1,2}*.fastq.gz"
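With the paired-end pattern above, each read 1 file needs a matching read 2 file. The following bash sketch lists read 1 files whose mate is missing; find_unpaired is a hypothetical helper, and it assumes the first "_1" in each file name marks read 1 (a name such as run_10_1.fastq.gz would break that assumption).

```bash
# Hypothetical helper: print every *_1*.fastq.gz file in folder $1
# (default: data) that has no *_2* mate; returns 1 if any mate is missing.
find_unpaired() {
  local dir="${1:-data}"
  local status=0
  local r1 name r2
  for r1 in "$dir"/*_1*.fastq.gz; do
    [ -e "$r1" ] || continue      # glob matched nothing
    name="${r1##*/}"
    r2="$dir/${name/_1/_2}"       # replace the first "_1" with "_2"
    if [ ! -e "$r2" ]; then
      echo "$r1"
      status=1
    fi
  done
  return "$status"
}
```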

Optionally, you can specify the output directory with the flag --outdir <folder>. By default, all resulting files are saved in the folder output, and the folder info contains information about the most recent Nextflow run.

Installation

Clone this repository with the following command:

git clone https://github.com/maxgreil/rnaseq && cd rnaseq

Then, install Nextflow by using the following command:

curl https://get.nextflow.io | bash

The above snippet creates the nextflow launcher in the current directory.

Finally, pull the following Docker image:

docker pull maxgreil/rnaseq

Alternatively, you can build the Docker image yourself with the following command:

cd docker && docker image build . -t maxgreil/rnaseq

Arguments

Optional Arguments

Argument   Usage      Description
--reads    <files>    Directory and glob pattern of input files
--outdir   <folder>   Directory to save output files
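The two optional arguments can be combined in a small launcher. In this bash sketch run_rnaseq and its se/pe mode switch are hypothetical; for paired-end runs params.singleEnd must still be set to false in nextflow.config, and setting DRY_RUN prints the command instead of running it.

```bash
# Hypothetical launcher: $1 = se (single-end) or pe (paired-end),
# $2 = output folder (default: output, the pipeline's documented default).
run_rnaseq() {
  local mode="$1" outdir="${2:-output}" reads
  case "$mode" in
    se) reads='data/*.fastq.gz' ;;
    pe) reads='data/*_{1,2}*.fastq.gz' ;;
    *)  echo "usage: run_rnaseq se|pe [outdir]" >&2; return 1 ;;
  esac
  # The glob stays a single quoted argument so Nextflow, not the shell, expands it.
  local -a cmd
  cmd=(nextflow run main.nf --reads "$reads" --outdir "$outdir")
  if [ -n "${DRY_RUN:-}" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=1 run_rnaseq pe results prints nextflow run main.nf --reads data/*_{1,2}*.fastq.gz --outdir results without starting the pipeline.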

Documentation

This pipeline is designed to:

  • map the given reads to a reference genome
  • create a count matrix of mapped reads for subsequent RNA-Seq analysis in R
  • perform quality control on the created files

Pipeline overview

The pipeline is built using Nextflow and processes data using the following steps:

  1. HISAT2 - map the given reads to the genome
  2. samtools - create sorted BAM files from the HISAT2 SAM files
  3. Picard - mark duplicates in the sorted BAM files
  4. featureCounts - count mapped reads per genomic feature (exons)
  5. deepTools - create bigWig files from BAM files for viewing in IGV
  6. Preseq - predict and estimate the complexity of the sequencing library
  7. RSeQC - comprehensive quality evaluation of the RNA-Seq data
  8. FastQC - quality control of the BAM files
  9. MultiQC - aggregate report summarizing the results of the whole pipeline
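For orientation, steps 1 to 5 roughly correspond to the following per-sample command sequence for single-end reads. This is a hedged bash sketch, not the pipeline's actual Nextflow processes: process_sample, the file names, the thread count and counting on the duplicate-marked BAM are assumptions, and flags should be checked against the tool versions in the Docker image. Setting DRY_RUN prints the commands instead of running them.

```bash
# Print the command when DRY_RUN is set, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical per-sample sketch.
# $1 = HISAT2 index basename, $2 = reads (.fastq.gz), $3 = GTF, $4 = sample name
process_sample() {
  local index="$1" reads="$2" gtf="$3" s="$4"
  # 1. align single-end reads with HISAT2, writing SAM
  run hisat2 -p 4 -x "$index" -U "$reads" -S "$s.sam"
  # 2. convert SAM to a coordinate-sorted BAM
  run samtools sort -o "$s.sorted.bam" "$s.sam"
  # 3. mark (not remove) duplicates with Picard
  run picard MarkDuplicates I="$s.sorted.bam" O="$s.markdup.bam" M="$s.markdup_metrics.txt"
  # bamCoverage needs an indexed BAM
  run samtools index "$s.markdup.bam"
  # 4. count reads overlapping exons, summarized per gene_id
  run featureCounts -T 4 -t exon -g gene_id -a "$gtf" -o "$s.counts.txt" "$s.markdup.bam"
  # 5. bigWig coverage track for IGV
  run bamCoverage -b "$s.markdup.bam" -o "$s.bw"
}
```

For example, DRY_RUN=1 process_sample idx/genome sample.fastq.gz genes.gtf.gz sample prints the six commands in order.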
