SRAlign

A flexible pipeline for short read alignment to a reference with extensive QC reporting.

Introduction

SRAlign is a Nextflow pipeline for aligning short reads to a reference.

SRAlign is designed to be flexible: new tools can be added to the pipeline easily, and the pipeline can serve as a starting point for genomic analyses that rely on aligning short reads to a reference.

Pipeline overview

1. Trim reads
2. QC of reads
   1. Raw reads FastQC
   2. Trimmed reads FastQC
   3. Summary MultiQC
3. Align reads
   1. Align to reference genome/transcriptome
   2. Check contamination
4. Preprocess alignments
   1. Mark duplicates
   2. Compress SAM to BAM
   3. Index BAM
5. QC of alignments
   1. Samtools stats
   2. Samtools index stats
   3. Percent duplicates
   4. Percent aligned to contamination reference
   5. Summary MultiQC
6. Library complexity and reproducibility
   1. Preseq library complexity
   2. DeepTools correlation
   3. DeepTools PCA
7. Full pipeline MultiQC

Quick start

Prerequisites

- Any POSIX-compatible system (e.g. Linux or OS X) with internet access
  - Run on Windows with Windows Subsystem for Linux (WSL). WSL2 is highly recommended.
- Nextflow version >= 21.04
  - See Nextflow Get started for prerequisites and instructions on installing and updating Nextflow.
- Docker
  - I recommend Docker Desktop for OS X or Windows users.

Get or update `SRAlign`

Download or update SRAlign:
- Downloads the project into `$HOME/.nextflow/assets`.
- Useful for quickly downloading and easily running a project.
  - Allows SRAlign to be run with Nextflow commands by referring to `trev-f/SRAlign`, without specifying where SRAlign is located on the system.
  - To customize or expand SRAlign, see the documentation on customizing or expanding SRAlign.
```
nextflow pull trev-f/SRAlign
```
Show project info:
```
nextflow info trev-f/SRAlign
```

Test `SRAlign`

Check that SRAlign works on your system:
- `-profile test` uses preconfigured test parameters to run SRAlign in full on a small test dataset stored in a remote GitHub repository.
  - Because these test files are stored in a remote repository, internet access is required to run the test.
  - For more information, see the profiles section of the Nextflow config file and trev-f/SRAlign-test.
```
nextflow run trev-f/SRAlign -profile test 
```

Run `SRAlign`

Prepare the input design csv file.
- Input design file must be in csv format with no whitespace.
- Either reads (fastq or fastq.gz) or alignments (bam) are accepted.
  - If reads are supplied, they can be paired or unpaired.
- Required columns:
  - reads: lib_ID, sample_name, replicate, reads1, reads2 (optional)
  - alignments: lib_ID, sample_name, replicate, bam, tool_IDs
- See sample inputs in the SRAlign-test repository.
- A template project repository can be downloaded from the SRAlign-template repository.
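
The requirements above can be illustrated with a small sketch. The snippet below writes a hypothetical design file for paired-end reads and then checks it for whitespace. The file name `design.csv`, the library IDs, sample names, and read paths are all placeholders, and the presence of a header row is an assumption; the sample inputs in the SRAlign-test repository remain the authoritative reference for the layout.

```shell
# Write a hypothetical design file for paired-end reads.
# Assumptions: a header row naming the required columns; all values are placeholders.
cat > design.csv <<'EOF'
lib_ID,sample_name,replicate,reads1,reads2
lib001,wt,1,data/wt_rep1_R1.fastq.gz,data/wt_rep1_R2.fastq.gz
lib002,wt,2,data/wt_rep2_R1.fastq.gz,data/wt_rep2_R2.fastq.gz
EOF

# Report any line that contains a space, tab, or carriage return.
if grep -n '[[:space:]]' design.csv; then
    echo "design.csv contains whitespace" >&2
else
    echo "design.csv: no whitespace found"
fi
```

If `grep` prints any lines, those are the rows to clean up before passing the file to `--input`.
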
Show all configurable options for SRAlign with the help message:
- The most important information here is probably the list of available reference genomes.
```
nextflow run trev-f/SRAlign --help
```

Analyze your data with SRAlign:
```
nextflow run trev-f/SRAlign -profile docker --input <input.csv> --genome <valid genome key>
```

Tips for running Nextflow and `SRAlign`

SRAlign is designed to be highly configurable: its default behavior can be changed by supplying configurable parameters. Parameters can be supplied in several ways, which follow a specific hierarchy of precedence.

- Show configurable parameters with the command line help documentation: `nextflow run trev-f/SRAlign --help`
- Nextflow arguments always begin with a single dash, e.g. `-profile`.
- Pipeline parameters specified at the command line always begin with a double dash, e.g. `--input`.
  - Parameters specified at the command line always have the highest precedence. They override parameters specified in any config or params files.
  - I recommend specifying required parameters (i.e. `--input` and `--genome`) and up to a few others at the command line. Specifying more than this gets unwieldy.
- A custom config or parameters file is a good option when you want to supply more parameters than is comfortable at the command line, or when you want to reuse the same custom parameters across multiple runs.
  - For a config file, use the `params` scope.
  - For a JSON/YAML parameters file, see the Nextflow CLI docs.
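
As a minimal sketch of the config-file route, the snippet below writes a hypothetical `custom.config` that sets pipeline parameters in the `params` scope. The file name and all values are placeholders: `input` and `genome` mirror the `--input` and `--genome` options, and `alignmentTool` is assumed to be settable this way because the pipeline code validates `params.alignmentTool` against `bowtie2` and `hisat2`.

```shell
# Write a hypothetical custom config; every value here is a placeholder.
cat > custom.config <<'EOF'
params {
    input         = 'design.csv'   // path to the input design file
    genome        = 'WBCel235'     // a valid genome key (see --help)
    alignmentTool = 'hisat2'       // assumed option name, taken from the pipeline code
}
EOF

# Show the file that would be handed to Nextflow.
cat custom.config
```

The file can then be supplied with Nextflow's `-c` option, e.g. `nextflow run trev-f/SRAlign -profile docker -c custom.config`. Any parameter also given at the command line with a double dash still takes precedence over the value in the file.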

Additional documentation

Additional documentation can be found in docs.

Quick links:

	def tools = [
	    trim      : ['fastp'],
	    alignment : ['bowtie2', 'hisat2']
	]

	// check valid read-trimming tool
	assert params.trimTool in tools.trim,
	    "'${params.trimTool}' is not a valid read trimming tool.\n\tValid options: ${tools.trim.join(', ')}\n\t"

	// check valid alignment tool
	assert params.alignmentTool in tools.alignment,
	    "'${params.alignmentTool}' is not a valid alignment tool.\n\tValid options: ${tools.alignment.join(', ')}\n\t"

	ch_multiqcConfig = file(params.multiqcConfig, checkIfExists: true)

	/*
	---------------------------------------------------------------------
	Design and Inputs
	---------------------------------------------------------------------
	*/

	// check design file
	if (params.input) {
	    ch_input = file(params.input)
	} else {
	    exit 1, 'Input design file not specified!'
	}


	// set input design name
	inName = params.input.take(params.input.lastIndexOf('.')).split('/')[-1]

	// set a timestamp
	timeStamp = new java.util.Date().format('yyyy-MM-dd_HH-mm')

	// set workflow prefix name to be used for output files that combine all files (i.e. only one output file such as the full MultiQC)
	wfPrefix = "${inName}_-_${workflow.runName}_-_${timeStamp}"

	1. QC of raw reads - [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) & [MultiQC](https://multiqc.info/)
	2. Trim raw reads - [cutadapt](https://github.com/marcelm/cutadapt)
	3. Align reads - [BWA](http://bio-bwa.sourceforge.net/) -OR- [Bowtie 2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml)
	4. Mark duplicates - [samblaster](https://github.com/GregoryFaust/samblaster)
	5. QC of alignments - [Samtools](http://www.htslib.org/) & [MultiQC](https://multiqc.info/)

	3. Download sralign:
	```
	git clone https://github.com/trev-f/sralign.git
	```
	4. Run sralign in test mode:
	```
	nextflow run sralign -profile test
	```
	5. Run your analysis:
	```
	nextflow run sralign -profile <> --input YYYYMMDD_input.csv --genome WBCel235
	```

	- fastqc
	- samtools
