
nanopore-nf's Introduction

Nanopore pipeline for DPIRD - Nextflow edition

The pipeline requires Nextflow to run and has been tested with Nextflow version 19.04.1. The standard profile assumes that the pipeline runs on Zeus at Pawsey Supercomputing Centre, and it uses containerised software.

Pipeline

Basecalling* -> Chopping -> De-novo assembling -> Blasting+ -> Aligning#

* Optional
+ Either with Blast or Diamond
# Requires additional input in a subsequent run

Basic usage

nextflow run marcodelapierre/nanopore-nf \
  -profile zeus --slurm_account='pawsey0001' \
  --read_dir='reads'

The flag --read_dir specifies the directory that contains the read files from a single experiment. Name patterns can be used to run multiple experiments at once. Output files are stored in one subdirectory per input, named results_$read_dir. The flag --slurm_account sets the Pawsey account used to run on Zeus; alternatively, edit the value of the variable params.slurm_account in the file nextflow.config. Finally, the flag -profile selects the appropriate profile for the machine in use, Zeus in this case.

After blasting and identifying reference sequences of interest, alignment can be performed against them, by using the flag --seqid to provide the sequence IDs, and the flag -resume to restart from the previous run:

nextflow run marcodelapierre/nanopore-nf \
  -profile zeus --slurm_account='pawsey0001' \
  --read_dir='reads' \
  -resume --seqid='comma,separated,list,of,ids,from,blast'
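
The comma-separated list for --seqid can be assembled with standard shell tools. The following is only a sketch: it assumes the Blast hits are available as a tab-separated table with the subject sequence ID in the second column (as in Blast tabular output), which this README does not specify. The file contents and IDs below are made up for illustration.

```shell
# Hypothetical input: tab-separated Blast hits, subject ID in column 2.
hits=$(mktemp)
printf 'contig1\tNC_0001.1\t98.5\ncontig2\tNC_0002.1\t97.0\ncontig3\tNC_0001.1\t95.2\n' > "$hits"

# Keep unique subject IDs and join them with commas.
seqids=$(cut -f2 "$hits" | sort -u | paste -sd, -)

# Print the flag as it would be passed to the second run.
echo "--seqid='$seqids'"

rm -f "$hits"
```

In practice, the printed value would be reviewed by hand first, since only the reference sequences of interest should be passed on for alignment.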

Pipeline variants

The default expected input is one or more directories containing raw read files from one or more experiments. By default, Blast is used for blasting.

  1. To use one or more already basecalled FASTQ files as input instead (multiple files via name patterns), use the flag --basecalled='basecalled.fastq'; raw reads are then ignored.
  2. To use Diamond for blasting, add the flag --diamond.

Optional parameters

  • Change the e-value threshold for blasting: --evalue='0.1'.
  • Change the minimum length that assembled contigs must reach to be considered for blasting: --min_len_contig='1000'.
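
As an illustration, the variant flags and optional parameters above might be combined in a single invocation. This is a sketch that assumes the flags can be used together, which is not stated explicitly here; the account and file name are the placeholders used elsewhere in this README. The command is only printed, so it can be checked before launching.

```shell
# Collect the options as positional parameters, one group per line.
set -- -profile zeus --slurm_account='pawsey0001' \
  --basecalled='basecalled.fastq' --diamond \
  --evalue='0.1' --min_len_contig='1000'

# Dry run: print the full command. Remove "echo" to actually start the pipeline.
echo nextflow run marcodelapierre/nanopore-nf "$@"
```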

Multiple inputs at once

Name patterns can be used to let the pipeline process multiple datasets at once.

  1. Imagine you have read directories all within the same location, with names sample*. Then use the flag --read_dir='sample*'. One output directory per input dataset will be created in the same location, with names results_sample*.

  2. If you have read directories organised as sample*/reads, then use the flag --read_dir='sample*/reads'. Output directories will be created according to sample*/results_reads.

A similar syntax holds when using basecalled FASTQ inputs through the flag --basecalled.
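
The naming rule described above can be sketched in shell as follows. This is an illustration of the rule only, not the pipeline's own code, and results_dir is a hypothetical helper name. Note that the pattern passed to --read_dir is quoted in the examples, so that the shell leaves it unexpanded and the pipeline receives it as written.

```shell
# Map an input directory to its output directory: the prefix "results_"
# is added to the last path component, next to the input directory.
results_dir() {
  parent=$(dirname "$1")
  name=$(basename "$1")
  if [ "$parent" = "." ]; then
    echo "results_$name"
  else
    echo "$parent/results_$name"
  fi
}

results_dir sample1          # case 1: directories named sample*
results_dir sample2/reads    # case 2: directories named sample*/reads
```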

Requirements

Software:

  • Guppy
  • Pomoxis
  • Blast or Diamond

Reference data:

  • Database for Blast or Diamond

Additional resources

The extra directory contains example Slurm scripts, job1.sh and job2.sh, to run on Zeus. There is also a sample script, log.sh, that takes a run name as input and displays formatted runtime information.

