Authors: | Komal Rathi |
---|---|
Contact: | [email protected] |
Organization: | DBHi, CHOP |
Status: | This is "work in progress" |
Date: | 2024-09-28 |
ssh -i "rnaseq.pem" [email protected]
Install all software using conda. Depending on your system, you may need to install other prerequisites first.
conda create --name rnaseq-env
source activate rnaseq-env
conda install -c biobuilds sra-tools=2.5.6
conda install -c bioconda rsem=1.2.28
conda install -c bioconda star=2.5.2b
# R packages to install
GEOquery
SRAdb
DBI
# Other tools required for faster downloads
EDirect: https://www.ncbi.nlm.nih.gov/books/NBK179288/
aspera ascp client: http://downloads.asperasoft.com/en/downloads/50
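Once the conda environment is active, it can help to confirm that the command-line tools are actually on the PATH before starting a long run. The following is a minimal sketch, not part of this repository: `missing_tools` is a hypothetical helper name, and the executable names passed to it (`fastq-dump`, `rsem-calculate-expression`, `STAR`) are assumed to be the ones provided by the three conda packages above.

```shell
# Report which of the given executables are not on PATH.
# Prints one missing name per line and returns 1 if any are missing.
missing_tools() {
    local rc=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}
```

A typical call would be `missing_tools fastq-dump rsem-calculate-expression STAR`, which prints nothing and returns 0 when everything is installed.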
# This is to be done just once per genome build (hg19, hg38, mm10 etc).
# you need to have an existing fasta and gtf (check config.yaml)
snakemake -p -s Snakefile_genome --config freeze=hg19
# provide either GEO accession or SRA study ID
Rscript getSRA.R SRP033200
Rscript getSRA.R GSE52564
This will create a directory structure under the source dir (by default: /mnt/rnaseq/data/raw) like this:
.
├── GSE52564
│   └── sra
├── SRP033200
│   └── sra
└── log.txt
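Because getSRA.R accepts either kind of identifier, a wrapper script may want to check the accession before starting a download. The sketch below is an illustration only: `accession_type` is a hypothetical helper, and it recognizes just the two prefixes used in this README (`GSE` for a GEO series, `SRP` for an SRA study). Whether getSRA.R accepts other prefixes is not stated here.

```shell
# Classify an accession by prefix: GSE<digits> -> geo, SRP<digits> -> sra.
# Prints "unknown" and returns 1 for anything else.
accession_type() {
    case "$1" in
        # Reject identifiers with a non-digit after the prefix.
        GSE*[!0-9]*|SRP*[!0-9]*) echo "unknown"; return 1 ;;
        GSE[0-9]*) echo "geo" ;;
        SRP[0-9]*) echo "sra" ;;
        *) echo "unknown"; return 1 ;;
    esac
}
```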
Then run snakemake with three parameters:
- -f or --freeze: the genome build (e.g. mm10, hg19 or hg38).
- -s or --sourcedir: the path to the project directory.
- -p or --paired: TRUE for paired-end reads, FALSE for single-end reads.
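The three options are passed in `name=value` form. As an illustration of how such arguments could be handled in bash, here is a minimal sketch. It is not the contents of run_snakemake.sh: the function name `parse_args` and the variable names `FREEZE`, `SOURCEDIR` and `PAIRED` are assumptions made for this example.

```shell
# Parse -f=/--freeze=, -s=/--sourcedir= and -p=/--paired= arguments.
# Sets FREEZE, SOURCEDIR and PAIRED; returns 1 on bad input.
parse_args() {
    FREEZE="" SOURCEDIR="" PAIRED=""
    local arg
    for arg in "$@"; do
        case "$arg" in
            -f=*|--freeze=*)    FREEZE="${arg#*=}" ;;
            -s=*|--sourcedir=*) SOURCEDIR="${arg#*=}" ;;
            -p=*|--paired=*)    PAIRED="${arg#*=}" ;;
            *) echo "Unknown argument: $arg" >&2; return 1 ;;
        esac
    done
    # All three values are required.
    if [ -z "$FREEZE" ] || [ -z "$SOURCEDIR" ] || [ -z "$PAIRED" ]; then
        echo "Usage: -f=<build> -s=<project dir> -p=<TRUE|FALSE>" >&2
        return 1
    fi
    # -p accepts only the two literal values described above.
    case "$PAIRED" in
        TRUE|FALSE) ;;
        *) echo "--paired must be TRUE or FALSE" >&2; return 1 ;;
    esac
}
```

Rejecting anything other than TRUE or FALSE up front avoids starting a long alignment run with a mistyped flag.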
# E.g. to process data in /mnt/rnaseq/data/raw/GSE57945
# for single-end reads
source activate rnaseq-env
bash run_snakemake.sh -f=hg38 -s=/mnt/rnaseq/data/raw/GSE57945 -p=FALSE
# for paired-end data
bash run_snakemake.sh -f=hg38 -s=/mnt/rnaseq/data/raw/GSE52564 -p=TRUE
This will create an output directory structure like this:
# output directory structure for GSE52564:
tree /mnt/rnaseq/data/raw/GSE52564/
├── bam
│   ├── SRR1033783_Aligned.toTranscriptome.out.bam
│   ├── SRR1033783_Log.final.out
│   ├── SRR1033783_Log.out
│   ├── SRR1033783_Log.progress.out
│   └── SRR1033783_SJ.out.tab
├── fastq
│   ├── SRR1033783_1.fastq.gz
│   └── SRR1033783_2.fastq.gz
├── quant
│   ├── SRR1033783.genes.results
│   ├── SRR1033783.isoforms.results
│   └── SRR1033783.stat
│       ├── SRR1033783.cnt
│       ├── SRR1033783.model
│       └── SRR1033783.theta
└── sra
    └── SRR1033783.sra
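After a run finishes, the layout above can be used to check that a sample produced its key outputs. The sketch below is a hypothetical helper, not part of the pipeline: `missing_outputs` checks only the transcriptome BAM and the two RSEM result files, and it assumes the file naming shown in the listing for GSE52564.

```shell
# List expected per-sample outputs that are missing or empty.
# Usage: missing_outputs <project_dir> <sample_id>
# Prints one relative path per missing file; returns 1 if any are missing.
missing_outputs() {
    local project="$1" sample="$2" rc=0 path
    for path in \
        "bam/${sample}_Aligned.toTranscriptome.out.bam" \
        "quant/${sample}.genes.results" \
        "quant/${sample}.isoforms.results"; do
        # -s is true only if the file exists and is non-empty.
        if [ ! -s "$project/$path" ]; then
            echo "$path"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `missing_outputs /mnt/rnaseq/data/raw/GSE52564 SRR1033783` prints nothing when all three files are present and non-empty.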