SRAlign

A flexible pipeline for short read alignment to a reference with extensive QC reporting.

Introduction

SRAlign is a Nextflow pipeline for aligning short reads to a reference.

SRAlign is designed to be flexible: tools can be added to the pipeline easily, and the pipeline can serve as a starting point for genomic analyses that rely on aligning short reads to a reference.

Pipeline overview

  1. Trim reads
  2. QC of reads
    1. Raw reads FastQC
    2. Trimmed reads FastQC
    3. Summary MultiQC
  3. Align reads
    1. Align to reference genome/transcriptome
    2. Check contamination
  4. Preprocess alignments
    1. Mark duplicates
    2. Compress SAM to BAM
    3. Index BAM
  5. QC of alignments
    1. Samtools stats
    2. Samtools index stats
    3. Percent duplicates
    4. Percent aligned to contamination reference
    5. Summary MultiQC
  6. Library complexity and reproducibility
    1. Preseq library complexity
    2. DeepTools correlation
    3. DeepTools PCA
  7. Full pipeline MultiQC

Quick start

Prerequisites

  1. Any POSIX-compatible system (e.g. Linux or OS X) with internet access

  2. Nextflow version >= 21.04

  3. Docker

    • I recommend Docker Desktop for OS X or Windows users
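
A quick way to confirm the prerequisites above is to check that both tools are on the PATH. The following is a minimal sketch, not part of SRAlign; the helper name check_tool is hypothetical.

```shell
#!/usr/bin/env sh
# check_tool is a hypothetical helper: it reports whether a command is on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Tools listed in the prerequisites above.
for tool in nextflow docker; do
    check_tool "$tool" || true
done

# If Nextflow is installed, print its version so it can be compared against 21.04.
if command -v nextflow >/dev/null 2>&1; then
    nextflow -version
fi
```

The script only reports what it finds; it does not install anything or compare version numbers automatically.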

Get or update SRAlign

  1. Download or update SRAlign:

    • Downloads the project into $HOME/.nextflow/assets
    • Useful for quickly downloading and easily running a project.
      • Allows SRAlign to be run with Nextflow commands by referring to trev-f/SRAlign, without specifying where SRAlign is located on the system.
      • To customize or expand SRAlign, see the documentation on customizing or expanding SRAlign.
    nextflow pull trev-f/SRAlign
  2. Show project info:

    nextflow info trev-f/SRAlign

Test SRAlign

  1. Check that SRAlign works on your system:

    • -profile test uses preconfigured test parameters to run SRAlign in full on a small test dataset stored in a remote GitHub repository.
      • Because these test files are stored in a remote repository, internet access is required to run the test.
      • For more information, see the profiles section of the nextflow config file and trev-f/SRAlign-test.
    nextflow run trev-f/SRAlign -profile test 

Run SRAlign

  1. Prepare the input design csv file.

    • Input design file must be in csv format with no whitespace.
    • Either reads (fastq or fastq.gz) or alignments (bam) are accepted.
      • If reads are supplied, they can be paired or unpaired.
    • Required columns:
      • reads: lib_ID, sample_name, replicate, reads1, reads2 (optional)
      • alignments: lib_ID, sample_name, replicate, bam, tool_IDs
    • See sample inputs in the SRAlign-test repository.
    • A template project repository can be downloaded from the SRAlign-template repository.
  2. Show all configurable options for SRAlign by displaying the help message:

    • The most important information here is probably the list of available reference genomes.
    nextflow run trev-f/SRAlign --help
  3. Analyze your data with SRAlign:

    nextflow run trev-f/SRAlign -profile docker --input <input.csv> --genome <valid genome key>
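
As an illustration of step 1, the sketch below writes a small design file for paired-end reads and checks it for whitespace. The library IDs, sample names, and file paths are hypothetical placeholders, and the header row and column order are assumptions based on the required columns listed above; the sample inputs in the SRAlign-test repository remain the authoritative reference.

```shell
# Write a hypothetical paired-end design file; every value below is a placeholder.
cat > design.csv <<'EOF'
lib_ID,sample_name,replicate,reads1,reads2
lib01,wt,1,data/wt_rep1_R1.fastq.gz,data/wt_rep1_R2.fastq.gz
lib02,wt,2,data/wt_rep2_R1.fastq.gz,data/wt_rep2_R2.fastq.gz
EOF

# The design file must contain no whitespace, so flag any line with a space or tab.
if grep -n '[[:blank:]]' design.csv; then
    echo "design.csv contains whitespace" >&2
else
    echo "design.csv: no whitespace found"
fi

# Confirm the header starts with the required columns for read input.
head -n 1 design.csv | grep -q '^lib_ID,sample_name,replicate,reads1' \
    && echo "design.csv: header ok"
```

For unpaired reads, the reads2 column is optional and could be left out of this sketch.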

Tips for running Nextflow and SRAlign

SRAlign is designed to be highly configurable: its default behavior can be changed by supplying any of its configurable parameters. Parameters can be supplied in several ways, which follow a specific order of precedence.

  • Display the configurable parameters with the command line help: nextflow run trev-f/SRAlign --help
  • Nextflow arguments always begin with a single dash, e.g. -profile.
  • Pipeline parameters specified at the command line always begin with a double dash, e.g. --input.
    • Parameters specified at the command line always have the highest precedence. They override parameters specified in any config or params files.
    • I recommend specifying required parameters (i.e. --input and --genome) and up to a few others at the command line in this manner. Specifying more than this at the command line gets unwieldy.
  • A custom config or parameters file is a good option when you want to supply more parameters than is comfortable at the command line, or when you want to reuse the same custom parameters across multiple runs.
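
One way to set up such a parameters file is sketched below, using Nextflow's -params-file option, which accepts YAML or JSON. The file name params.yaml and the value design.csv are hypothetical, and the genome value is left as the same placeholder used in the run command above. The keys are assumed to mirror the --input and --genome parameters.

```shell
# Write a hypothetical params file; the keys mirror the --input and --genome options.
cat > params.yaml <<'EOF'
input: "design.csv"
genome: "<valid genome key>"
EOF

# Show the file that was written.
cat params.yaml

# Then run SRAlign with the file (requires Nextflow and Docker):
# nextflow run trev-f/SRAlign -profile docker -params-file params.yaml
```

Any parameter also given at the command line with a double dash still takes precedence over the value in the file.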

Additional documentation

Additional documentation can be found in docs.
