DiseaseXpress RNA-seq Pipeline

Authors: Komal Rathi
Contact: [email protected]
Organization: DBHi, CHOP
Status: Work in progress
Date: 2024-09-28

Login

ssh -i "rnaseq.pem" [email protected]

Installation

Install all software using conda. Depending on your system, you may need to install additional prerequisites.

conda create --name rnaseq-env
source activate rnaseq-env
conda install -c biobuilds sra-tools=2.5.6
conda install -c bioconda rsem=1.2.28
conda install -c bioconda star=2.5.2b

# R packages to install
GEOquery
SRAdb
DBI

# Other tools required for faster downloads
EDirect: https://www.ncbi.nlm.nih.gov/books/NBK179288/
aspera ascp client: http://downloads.asperasoft.com/en/downloads/50
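
Before running the pipeline it can help to confirm that the tools listed above are actually on the PATH of the active environment. The following is a minimal sketch, not part of the pipeline. The function names are hypothetical, and the executable names (fastq-dump, rsem-calculate-expression, STAR, esearch, ascp) are the commands these packages normally install, assumed here rather than taken from the pipeline's own scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of every command in the
# argument list that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Assumed executable names for the tools listed above:
# fastq-dump (sra-tools), rsem-calculate-expression (RSEM),
# STAR, esearch (EDirect), ascp (Aspera client).
REQUIRED_TOOLS="fastq-dump rsem-calculate-expression STAR esearch ascp"

# Report whether every required tool was found; returns 1 otherwise.
report_tools() {
    # REQUIRED_TOOLS is intentionally unquoted so it splits into words.
    missing=$(missing_tools $REQUIRED_TOOLS)
    if [ -n "$missing" ]; then
        echo "Missing tools:"
        echo "$missing"
        return 1
    fi
    echo "All required tools found."
}
```

After `source activate rnaseq-env`, calling `report_tools` should list anything that still needs to be installed.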

Pipeline

Create Genome Index for STAR and RSEM:

# Run this once per genome build (hg19, hg38, mm10, etc.).
# It requires an existing FASTA and GTF for that build (check config.yaml).
snakemake -p -s Snakefile_genome --config freeze=hg19
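
If several genome builds are needed, the same command can be repeated per build. The sketch below wraps it in a small loop with a dry-run mode that only prints the commands. The function name build_indexes and its "dry"/"run" mode argument are hypothetical, and it assumes config.yaml already has FASTA and GTF entries for every build passed in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (mode "dry") or execute (mode "run") the
# index-building command shown above for each genome build given.
build_indexes() {
    mode="$1"
    shift
    for freeze in "$@"; do
        cmd="snakemake -p -s Snakefile_genome --config freeze=$freeze"
        if [ "$mode" = "run" ]; then
            # Unquoted on purpose so the string splits into command + args.
            $cmd || return 1
        else
            echo "$cmd"
        fi
    done
}
```

For example, `build_indexes dry hg19 hg38 mm10` prints the three commands without running them, and `build_indexes run hg19 hg38 mm10` executes them in order, stopping at the first failure.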

Get raw fastq files and create a directory structure:

# provide either GEO accession or SRA study ID
Rscript getSRA.R SRP033200
Rscript getSRA.R GSE52564

This creates the following directory structure under the source directory (default: /mnt/rnaseq/data/raw):

.
├── GSE52564
│   └── sra
├── SRP033200
│   └── sra
└── log.txt
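
To check that a download completed, one option is to count the .sra files that landed in a study's sra directory. This is a minimal sketch under the assumption that getSRA.R places files ending in .sra somewhere below <source dir>/<study>/sra. The function name count_sra is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the .sra files under <source_dir>/<study>/sra.
# Usage: count_sra /mnt/rnaseq/data/raw GSE52564
count_sra() {
    sra_dir="$1/$2/sra"
    if [ ! -d "$sra_dir" ]; then
        echo "no sra directory for $2 under $1" >&2
        return 1
    fi
    # tr strips the padding some wc implementations add.
    find "$sra_dir" -type f -name '*.sra' | wc -l | tr -d ' '
}
```

Comparing the printed count with the number of runs expected for the study gives a quick sanity check before starting the alignment step.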

Run snakemake

Then run the pipeline through run_snakemake.sh with three parameters, each passed as option=value:

  1. -f or --freeze. The genome build (e.g. mm10, hg19 or hg38).
  2. -s or --sourcedir. The source directory, i.e. the path to the project directory.
  3. -p or --paired. TRUE for paired-end reads, FALSE for single-end reads.

# E.g. to process data in /mnt/rnaseq/data/raw/GSE57945
# for single-end reads
source activate rnaseq-env
bash run_snakemake.sh -f=hg38 -s=/mnt/rnaseq/data/raw/GSE57945 -p=FALSE

# for paired-end data
bash run_snakemake.sh -f=hg38 -s=/mnt/rnaseq/data/raw/GSE52564 -p=TRUE
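
The contents of run_snakemake.sh are not reproduced in this README. As a rough illustration of how option=value arguments like the ones above could be parsed and validated in bash, here is a minimal sketch. The function name parse_args and the variables FREEZE, SOURCEDIR and PAIRED are hypothetical, and accepting the long forms as --freeze=value is an assumption. The real script may work differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the actual run_snakemake.sh.
# Parses -f=/-s=/-p= (or assumed long forms) into shell variables.
parse_args() {
    FREEZE=""
    SOURCEDIR=""
    PAIRED=""
    for arg in "$@"; do
        case "$arg" in
            -f=*|--freeze=*)    FREEZE="${arg#*=}" ;;
            -s=*|--sourcedir=*) SOURCEDIR="${arg#*=}" ;;
            -p=*|--paired=*)    PAIRED="${arg#*=}" ;;
            *)
                echo "unknown option: $arg" >&2
                return 1
                ;;
        esac
    done
    # All three parameters are required.
    if [ -z "$FREEZE" ] || [ -z "$SOURCEDIR" ] || [ -z "$PAIRED" ]; then
        echo "usage: -f=<build> -s=<project dir> -p=<TRUE|FALSE>" >&2
        return 1
    fi
    # The paired flag must be exactly TRUE or FALSE.
    case "$PAIRED" in
        TRUE|FALSE) ;;
        *)
            echo "-p must be TRUE or FALSE, got: $PAIRED" >&2
            return 1
            ;;
    esac
}
```

Calling `parse_args -f=hg38 -s=/mnt/rnaseq/data/raw/GSE52564 -p=TRUE` sets the three variables, while a missing or misspelled option makes the function return a non-zero status, so a typo is caught before any alignment starts.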

Running the pipeline creates an output directory structure like this:

# output directory structure for GSE52564:

tree -L 3 /mnt/rnaseq/data/raw/GSE52564/

├── bam
│   ├── SRR1033783_Aligned.toTranscriptome.out.bam
│   ├── SRR1033783_Log.final.out
│   ├── SRR1033783_Log.out
│   ├── SRR1033783_Log.progress.out
│   └── SRR1033783_SJ.out.tab
├── fastq
│   ├── SRR1033783_1.fastq.gz
│   └── SRR1033783_2.fastq.gz
├── quant
│   ├── SRR1033783.genes.results
│   ├── SRR1033783.isoforms.results
│   └── SRR1033783.stat
│       ├── SRR1033783.cnt
│       ├── SRR1033783.model
│       └── SRR1033783.theta
└── sra
    └── SRR1033783.sra
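
A quick way to confirm that a sample finished is to check that its key output files exist and are non-empty. The sketch below checks only the transcriptome BAM and the two RSEM result files from the listing above. The function name check_outputs is hypothetical, and treating these three files as the completion criterion is an assumption, not something the pipeline defines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report expected output files that are missing or
# empty for one sample. Returns 0 only if all of them are present.
# Usage: check_outputs /mnt/rnaseq/data/raw/GSE52564 SRR1033783
check_outputs() {
    project_dir="$1"
    sample="$2"
    status=0
    for rel in \
        "bam/${sample}_Aligned.toTranscriptome.out.bam" \
        "quant/${sample}.genes.results" \
        "quant/${sample}.isoforms.results"; do
        # -s is true only for files that exist and have a size above zero.
        if [ ! -s "$project_dir/$rel" ]; then
            echo "missing or empty: $rel"
            status=1
        fi
    done
    return $status
}
```

Looping this over every SRR accession in a project gives a short list of samples that need to be rerun.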
