BioExcel_Align

Python package to run a genome alignment pipeline, based on workflows defined by IGMM.

Requirements:

  • BWA
  • Samblaster
  • Samtools
  • BioBamBam2 (optional - for beta version of BWA Mem alignment stage)
  • GATK4
  • Python 3.x

We recommend using the conda package manager together with virtual environments. The tool is also available from the bioconda channel, which has the benefit of automatically installing all prerequisites along with it.

Installation

There are two main ways to install the package.

Conda package installation

Set up a new conda environment (optional):

$ conda create -n my_env -c bioconda python=3 

This creates a clean Python 3 environment in which to install and run the tool. If you would rather use an existing conda environment, make sure the bioconda channel is added to that environment, or to your conda installation as a whole.

Install BioExcel_Align:

$ conda install bioexcel_align

This one line will install BioExcel_Align and all of its dependencies.

Manual installation

If you wish to install manually, follow the steps below. We still recommend using some kind of virtual environment. Before running the workflow, install the prerequisite tools and ensure they are on your $PATH.

$ git clone https://github.com/bioexcel/BioExcel_Align.git
$ cd BioExcel_Align
$ python setup.py install
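
After a manual installation it can be useful to confirm that the prerequisite tools are discoverable before running the workflow. The following is a minimal sketch using Python's `shutil.which`. The executable names (`bwa`, `samblaster`, `samtools`, `gatk`) are assumptions based on the requirements list and may differ on your system, and the helper name `find_missing_tools` is hypothetical.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED_TOOLS = ["bwa", "samblaster", "samtools", "gatk"]


def find_missing_tools(tools):
    """Return the entries of `tools` that cannot be found on $PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools(REQUIRED_TOOLS)
if missing:
    print("Missing from $PATH: " + ", ".join(missing))
else:
    print("All required tools were found on $PATH.")
```

This only checks that each executable can be located; it does not verify tool versions.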

Usage

Once installed, there are several ways to use the tool. The easiest is to call the executable script, which runs the whole workflow and accepts options and arguments that the user can modify. List them with:

$ bxcl_align -h

An example of basic usage of the pipeline is:

$ bxcl_align --files in1.fq.gz in2.fq.gz --threads 8 \
--outdir ./output --sample 'TestAlign' \
--bwa_ind_ref genomes/Hsapiens/GRCh37/bwa/GRCh37.fa \
-r genomes/Hsapiens/GRCh37/seq/GRCh37.fa \
-k genomes/Hsapiens/GRCh37/variation/dbsnp-147.vcf.gz \
-j '-Djava.io.tmpdir=/home/tmp -Xmx64G'
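
If you prefer to launch the workflow from a Python script rather than a shell, the same invocation can be assembled as an argument list. The sketch below reuses only the flags shown in the example above. The helper name `build_align_command` is hypothetical, and the commented-out `subprocess.run` call assumes that `bxcl_align` is on your $PATH.

```python
import shlex


def build_align_command(files, threads, outdir, sample,
                        bwa_ind_ref, ref, knownsites, jvm_opts):
    """Build the bxcl_align argument list from the example's options."""
    return [
        "bxcl_align",
        "--files", *files,
        "--threads", str(threads),
        "--outdir", outdir,
        "--sample", sample,
        "--bwa_ind_ref", bwa_ind_ref,
        "-r", ref,
        "-k", knownsites,
        "-j", jvm_opts,  # passed as one argument, as in the quoted shell form
    ]


cmd = build_align_command(
    files=["in1.fq.gz", "in2.fq.gz"],
    threads=8,
    outdir="./output",
    sample="TestAlign",
    bwa_ind_ref="genomes/Hsapiens/GRCh37/bwa/GRCh37.fa",
    ref="genomes/Hsapiens/GRCh37/seq/GRCh37.fa",
    knownsites="genomes/Hsapiens/GRCh37/variation/dbsnp-147.vcf.gz",
    jvm_opts="-Djava.io.tmpdir=/home/tmp -Xmx64G",
)

# Print the shell-quoted command for inspection (shlex.join needs Python 3.8+).
print(shlex.join(cmd))

# To actually run it (not executed here):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing a list instead of a single string avoids shell quoting problems, which matters here because the JVM options contain a space.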

Python Module

In addition to the executable version, the tool is installed as a Python package, so each stage can be imported as a module into other scripts if the user wishes to build more specialised, complicated or extended workflows. Each function creates and returns a Python subprocess.

import bioexcel_align
import bioexcel_align.runbwa as rb
import bioexcel_align.rungatk as rg

# Do things before running BWA Mem/samtools/samblaster alignment command

pbwa = rb.bwamem_stable(bwa_ind_ref, threads, date, sampleID, files, bwadir)
pbwa.wait()

# Do things after BWA Mem, and before GATK4 BQSR stages

pbr = rg.baserecal(jvm_opts, threads, ref, infile, knownsites, gatkdir, sampleID)
pbr.wait()

pab = rg.applybqsr(jvm_opts, threads, infile, gatkdir, sampleID)
pab.wait()

# Do further analysis
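
Because each stage function returns a subprocess, a custom workflow may want to stop as soon as one stage fails rather than carry on with incomplete output. The sketch below shows one way to do that, assuming the returned objects behave like `subprocess.Popen` (the `wait()` calls above suggest this). The helper name `wait_and_check` is hypothetical, and a stand-in process is used here in place of a real stage.

```python
import subprocess
import sys


def wait_and_check(proc, stage):
    """Wait for `proc` to finish and raise if it exited with a non-zero code."""
    returncode = proc.wait()
    if returncode != 0:
        raise RuntimeError(f"{stage} failed with exit code {returncode}")
    return returncode


# Stand-in process for illustration only. In a real workflow this would be
# the object returned by a stage function such as rb.bwamem_stable(...).
demo = subprocess.Popen([sys.executable, "-c", "raise SystemExit(0)"])
wait_and_check(demo, "demo stage")
```

In a script like the one above, `pbwa.wait()` could then be replaced by `wait_and_check(pbwa, "bwamem_stable")`, so that the GATK4 stages run only when the alignment has succeeded.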

Stages

Our pipeline consists of two main stages: runbwa and rungatk. Each stage exists as a Python module, as shown above, and each module contains functions that execute the tools listed. The diagram below shows these stages and functions, with colour coding to indicate which tools are used in each module and function, as well as the useful output files.

[Diagram: pipeline stages and functions, colour coded by tool, with useful output files]

Each module can also be executed independently of the main executable workflow. For example, if a situation occurs that causes GATK to fail, the rungatk stage can be executed from the command line as

$ python -m bioexcel_align.rungatk <arguments>

bwamem_beta

There is also a more recent but less tested version of the first stage of the pipeline, which replaces samblaster/samtools with the tool bamsormadup (available as part of biobambam2). We recommend using it with caution: IGMM partners suggested this change, but we encountered some errors when testing with the Cirrus machine as a testbed for our workflows. More work is needed to develop this version and to understand the cause of the errors.

Contributors

  • djwhiteastro