Giter Site home page Giter Site logo

rnaseq_pipeline_ngs2013's Introduction

RNAseq_pipeline_NGS2013

README file / Tutorial

RNAseq - From QC passed paired-end reads to read counts per transcript

Allison, Diego Lauren, Rachel

MSU NGS2013 Summer Course

This pipeline aligns paired end reads to a reference transcriptome using bwa-0.7.5a, creates SAM and BAM files, counts the mapped reads and creates as a final output the file "bwa_transcriptome_counts.txt" ready to use as input file in edgeR. A .fasta file for the unmapped reads is also created for each sample in case you want to blast then against any other reference database.

Please run each command replacing the < > parts with your files or paths to files

Feel free to look inside each shell script to figure out what they are doing!

You may want to run this pipeline with some sample data (RNAseq paired end reads, 4 samples, 10k reads for each pair) from Drosophila. In this case, download the Drosophila reference transcriptome (Step 4 of this tutorial - look below)

curl -O -L ftp://ftp.flybase.net/releases/current/dmel_r5.51/fasta/dmel-all-transcript-r5.51.fasta.gz
gunzip dmel-all-transcript-r5.51.fasta.gz

###Tutorial:

Start a new EC2 instance and follow the steps:

1- Create a working directory

mkdir -p /mnt/pipeline/
cd /mnt/pipeline/

2- Clone the scripts into this directory

git clone https://github.com/mazzottidr/RNAseq_pipeline_NGS2013.git
cd /mnt/pipeline/RNAseq_pipeline_NGS2013/

3- run INSTALL_RNAseq_group.sh (This will install all required software)

bash INSTALL_RNAseq_group.sh
cd /mnt/pipeline/RNAseq_pipeline_NGS2013/

4- Download your reference transcriptome (skip this step if you are running the sample data. You've probably downloaded the Drosophila reference transcriptome)

curl -O <url_to_your_reference_transcriptome>
gunzip <reference.fasta.gz>

5- Run PREPAREFILES_RNAseq_group.sh (This will prepare the reference transcriptome file)

bash PREPAREFILES_RNAseq_group.sh <reference.fasta>

6- Create folders for each sample to be proccessed (example for 4 samples is shown below).

mkdir sample1
mkdir sample2
mkdir sample3
mkdir sample4

7- Run PIPELINE_RNAseq_group.sh for each sample (run in different screens if you have a large dataset - not necessary for the sample data - it will run quickly)

cd /mnt/pipeline/RNAseq_pipeline_NGS2013/sample1
bash ../PIPELINE_RNAseq_group.sh ../<reference.fasta> <path_to_sample1_paired_end1.fq> <path_to_sample1_paired_end2.fq> <sample1_output_file_prefix>

cd /mnt/pipeline/RNAseq_pipeline_NGS2013/sample2
bash ../PIPELINE_RNAseq_group.sh ../<reference.fasta> <path_to_sample2_paired_end1.fq> <path_to_sample2_paired_end2.fq> <sample2_output_file_prefix>

cd /mnt/pipeline/RNAseq_pipeline_NGS2013/sample3
bash ../PIPELINE_RNAseq_group.sh ../<reference.fasta> <path_to_sample3_paired_end1.fq> <path_to_sample3_paired_end2.fq> <sample3_output_file_prefix>

cd /mnt/pipeline/RNAseq_pipeline_NGS2013/sample4
bash ../PIPELINE_RNAseq_group.sh ../<reference.fasta> <path_to_sample4_paired_end1.fq> <path_to_sample4_paired_end2.fq> <sample4_output_file_prefix>

#################################################################################

Automation of steps 6 and 7 for N samples will be developed (someday)

#################################################################################

8- When step 7 is done for all the samples, run bedtools to create a file that contains read count for each transcript for all bam file

cd ../
bedtools multicov -q 30 -p -bams sample1/<sample1_output_file_prefix.bam> sample2/<sample2_output_file_prefix.bam> sample3/<sample3_output_file_prefix.bam> sample4/<sample4_output_file_prefix.bam> -bed <reference.fasta.bed> > bwa_transcriptome_counts.txt

9- DONE! (You may want to run edgeR with the results of this pipeline)

rnaseq_pipeline_ngs2013's People

Contributors

ctb avatar mazzottidr avatar

Stargazers

 avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.