
iarcbioinfo / abra-nf


Nextflow pipeline for ABRA (Assembly Based ReAligner)

License: GNU General Public License v3.0

Nextflow 90.77% Dockerfile 6.02% Shell 3.22%
nextflow assembly pipeline ngs alignment


abra-nf

Nextflow pipeline for ABRA2 (Assembly Based ReAligner)


Workflow representation

Description

Apply ABRA2 to realign next generation sequencing data using localized assembly in a set of BAM files.

This script takes as input a set of BAM files (named *.bam) grouped in folders. There are two modes:

  • When using matched tumor/normal pairs, the two samples of each pair are realigned together (see https://github.com/mozack/abra#somatic--mode). In this case the user must provide the folder containing the tumor BAM files (--tumor_bam_folder) and the folder containing the normal BAM files (--normal_bam_folder); both can be the same folder. Tumor BAM files must be named <sample><suffix_tumor>.bam, where suffix_tumor is _T by default and can be changed with --suffix_tumor (e.g. sample1_T.bam). Normal BAM files must be named <sample><suffix_normal>.bam, where suffix_normal is _N by default and can be changed with --suffix_normal (e.g. sample1_N.bam).
  • When using only normal (or only tumor) samples, each BAM file is processed independently. In this case the user must provide a single folder containing all BAM files (--bam_folder).
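The tumor/normal naming rule can be checked before launching the pipeline with a small shell function that lists the pairs a pair of folders would produce. This is a minimal sketch, assuming a sample ID is the tumor file name with the tumor suffix and `.bam` removed; the function name `pair_samples` is hypothetical, and it mirrors the documented convention rather than the pipeline's own matching code.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "sample<TAB>tumor_bam<TAB>normal_bam" for each pair
# implied by the naming rule <sample><suffix>.bam. Tumors without a matching
# normal are reported on stderr.
pair_samples() {
  local tumor_dir=$1 normal_dir=$2
  local suffix_tumor=${3:-_T} suffix_normal=${4:-_N}   # README defaults
  local tumor sample normal
  for tumor in "$tumor_dir"/*"$suffix_tumor".bam; do
    [ -e "$tumor" ] || continue              # no match: the glob stays literal
    sample=$(basename "$tumor" "$suffix_tumor.bam")   # strip suffix + .bam
    normal="$normal_dir/$sample$suffix_normal.bam"
    if [ -e "$normal" ]; then
      printf '%s\t%s\t%s\n' "$sample" "$tumor" "$normal"
    else
      printf 'unpaired tumor: %s\n' "$tumor" >&2
    fi
  done
}
```

For example, `pair_samples BAM BAM` checks a single folder holding both tumors and normals, and `pair_samples BAM BAM _tum _nor` does the same with custom suffixes.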

In all cases, BAI indexes must be present in the same folder as their BAM files and be named *.bam.bai.
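A quick pre-flight check for the index requirement can be sketched in shell. The function name `check_bai` is hypothetical; it only tests that each `*.bam` has a sibling `*.bam.bai`, not that the index is valid or newer than the BAM.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list BAM files in a folder that have no <name>.bam.bai
# next to them. Returns 0 only if every BAM is indexed.
check_bai() {
  local dir=$1 bam missing=0
  for bam in "$dir"/*.bam; do
    [ -e "$bam" ] || continue          # folder without BAM files
    if [ ! -e "$bam.bai" ]; then
      printf 'missing index: %s.bai\n' "$bam"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

Run it on each folder passed to --bam_folder, --tumor_bam_folder or --normal_bam_folder, e.g. `check_bai BAM || echo "index the files listed above first"`.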

Note that ABRA v1 is no longer supported (see the last version supporting it here: https://github.com/IARCbioinfo/abra-nf/releases/tag/v1.0)

Dependencies

  1. This pipeline is based on nextflow. As we have several nextflow pipelines, we have centralized the common information in the IARC-nf repository. Please read it carefully as it contains essential information for the installation, basic usage and configuration of nextflow and our pipelines.

  2. External software:

A conda recipe, as well as Docker and Singularity containers, are available with all the tools needed to run the pipeline (see "Usage" and the IARC-nf repository for more information).

Input

  • In tumor-normal mode

| Name | Description |
|---|---|
| --tumor_bam_folder | Folder containing tumor BAM files |
| --normal_bam_folder | Folder containing matched normal BAM files |
| --suffix_tumor | Suffix identifying tumor BAM files (default: _T) |
| --suffix_normal | Suffix identifying normal BAM files (default: _N) |

  • Otherwise

| Name | Description |
|---|---|
| --bam_folder | Folder containing BAM files |

Parameters

  • Mandatory

| Name | Example value | Description |
|---|---|---|
| --ref | /path/to/ref.fasta | Indexed reference FASTA file |
| --abra_path | /path/to/abra2.jar | Explicit path to the ABRA2 jar file (not needed when using the Docker or Singularity container) |

  • Optional

| Name | Default (or example) value | Description |
|---|---|---|
| --bed | /path/to/intervals.bed | BED file containing target intervals (without header) |
| --gtf | /path/to/annotations.gtf | GTF file containing junction annotations |
| --mem | 16 | Maximum RAM used (GB) |
| --cpu | 4 | Number of threads used |
| --output_folder | abra_BAM/ | Output folder for the realigned BAM files |

  • Flags

Flags are special parameters without value.

| Name | Description |
|---|---|
| --help | Display help |
| --single | Switch to single-end sequencing mode |
| --rna | Add the RNA-specific parameters recommended for ABRA2 |
| --junctions | Use junctions identified by STAR |

Usage

Simple use case example:

nextflow run iarcbioinfo/abra-nf --bam_folder BAM/ --bed target.bed --ref ref.fasta --abra_path /path/to/abra.jar

With singularity:

nextflow run iarcbioinfo/abra-nf -profile singularity --bam_folder BAM/ --bed target.bed --ref ref.fasta

Alternatively, the pipeline can be run with a Docker container (-profile docker) or with the conda recipe containing all required dependencies (-profile conda).

Output

| Type | Description |
|---|---|
| ABRA BAM | Realigned BAM files with their indexes |

Contributions

| Name | Email | Description |
|---|---|---|
| Matthieu Foll* | [email protected] | Developer to contact for support |
| Nicolas Alcala | [email protected] | Developer |

FAQ

A few samples always crash with exit status 130, causing Nextflow to stop all processes. What can I do about it?

ABRA memory usage varies widely: a few BAM files can unpredictably require much more memory than the others and cause a memory error (exit code 130). Because the abra-nf pipeline runs a single process in parallel across all BAM files, the results for each sample (or tumor-normal pair) are independent, so we recommend setting the following Nextflow option (e.g. in the nextflow.config file):

process.errorStrategy = 'ignore'

so that files that cause an error do not stop the other processes, which would otherwise complete normally. abra-nf can then be relaunched with more memory (option --mem) on the files that failed.
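Relaunching only the failed files can be sketched as staging them into a fresh input folder. This is a minimal sketch for --bam_folder mode, and it rests on an assumption: realigned outputs are taken to be named `<sample>_abra.bam` in the output folder, a hypothetical naming passed as the fourth argument that you should adjust to what your output_folder actually contains. The function name `stage_failed` is also hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: symlink every input BAM (and its .bai) that has no
# realigned output into rerun_dir, then print how many samples were staged.
# ASSUMPTION: outputs are named <sample><out_suffix>; "_abra.bam" is a guess.
stage_failed() {
  local in_dir=$1 out_dir=$2 rerun_dir=$3 out_suffix=${4:-_abra.bam}
  local abs_in bam sample n=0
  abs_in=$(cd "$in_dir" && pwd) || return 1   # absolute paths for symlinks
  mkdir -p "$rerun_dir"
  for bam in "$abs_in"/*.bam; do
    [ -e "$bam" ] || continue
    sample=$(basename "$bam" .bam)
    if [ ! -e "$out_dir/$sample$out_suffix" ]; then
      ln -sf "$bam" "$rerun_dir/$sample.bam"
      ln -sf "$bam.bai" "$rerun_dir/$sample.bam.bai"
      n=$((n + 1))
    fi
  done
  echo "$n"
}
```

After `stage_failed BAM abra_BAM BAM_rerun`, the pipeline can be relaunched on the staged files only, e.g. with `--bam_folder BAM_rerun/ --mem 32` and a separate --output_folder so earlier results are not mixed in.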

Another possibility is to automatically relaunch each crashed process with more memory, using something like this in the config file:

process {
    // select the abra process by name (replaces the deprecated $abra selector)
    withName: abra {
        memory = { task.exitStatus == 130 ? 8.GB * task.attempt : 8.GB }
        errorStrategy = { task.exitStatus == 130 ? 'retry' : 'ignore' }
        maxRetries = 4
    }
}

Here Nextflow first tries with 8 GB of memory. If the task crashes due to memory (exit code 130 in this example, but note that this code depends on the scheduler used), it is retried with 16 GB, then 24 GB, and so on, up to a maximum of 4 retries. If ABRA crashes for any other reason, the error is ignored.
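The memory schedule implied by this configuration can be worked out explicitly: memory on attempt k is 8 GB × k, and maxRetries = 4 allows the first attempt plus 4 retries, i.e. at most 5 attempts. The small function below (its name `retry_schedule` is hypothetical) just prints that arithmetic so other base values or retry limits can be compared before editing the config.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the memory requested on each attempt when
# memory = base_gb * task.attempt and maxRetries = max_retries.
retry_schedule() {
  local base_gb=${1:-8} max_retries=${2:-4} attempt=1
  # first attempt + max_retries retries
  while [ "$attempt" -le $((max_retries + 1)) ]; do
    echo "attempt $attempt: $((base_gb * attempt)) GB"
    attempt=$((attempt + 1))
  done
}
```

With the values above, `retry_schedule 8 4` prints 8, 16, 24, 32 and 40 GB for attempts 1 to 5.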

abra-nf's People

Contributors

gaborieauvalerie, mfoll, nalcala, tdelhomme, v-catherine


abra-nf's Issues

Input tuple error

The pipeline does not work anymore when we don't have junctions files (which are specific to RNAseq data):
"Input tuple does not match input set cardinality declared by process abra"
The issue is at the input of the abra process : "set bam_tag, file(bam), file(bai), file(junctions) from bam_bai". There should be an alternative like "set bam_tag, file(bam), file(bai) from bam_bai" for DNA seq bams

Add --sa option

We have very long runtimes on WGS data; apparently the --sa option could improve runtime without sacrificing too much performance. See the discussion here: mozack/abra2#18

Add support for single end data

We need to add --single and lower the mapping quality threshold, with something like --mapq 20 (default is 40), in the ABRA command line (single-end reads tend to have lower mapping quality, and 40 seems too stringent on a few examples I checked).

Currently ABRA 2 versions 2.07 and above crash with single end data (see mozack/abra2#10).
