Giter Site home page Giter Site logo

eipm / bowtie2 Goto Github PK

View Code? Open in Web Editor NEW

This project forked from benlangmead/bowtie2

0.0 0.0 0.0 150.98 MB

A fast and sensitive gapped read aligner

License: GNU General Public License v3.0

Makefile 0.57% C++ 80.78% C 1.03% Perl 14.78% Python 1.51% Shell 1.07% CMake 0.26%

bowtie2's Introduction

Github Actions Generic badge Build Status License: GPL v3

Overview

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.

Obtaining Bowtie2

Bowtie 2 is available from various package managers, notably Bioconda. With Bioconda installed, you should be able to install Bowtie 2 with conda install bowtie2.

Containerized versions of Bowtie 2 are also available via the Biocontainers project (e.g. via Docker Hub).

You can also download Bowtie 2 sources and binaries from the "releases" tab on this page. Binaries are available for the Linux, Mac OS X, and Windows. By utilizing the SIMDE project Bowtie 2 now supports the following architectures: ARM64, PPC64, and s390x. If you plan to compile Bowtie 2 yourself, make sure you at least have the zlib library and header files installed. See the Building from source section of the manual for details.

Getting started

Looking to try out Bowtie 2? Check out the Bowtie 2 UI (currently in beta).

Alignment

bowtie2 takes a Bowtie 2 index and a set of sequencing read files and outputs a set of alignments in SAM format.

"Alignment" is the process by which we discover how and where the read sequences are similar to the reference sequence. An "alignment" is a result from this process, specifically: an alignment is a way of "lining up" some or all of the characters in the read with some characters from the reference in a way that reveals how they're similar. For example:

  Read:      GACTGGGCGATCTCGACTTCG
             |||||  |||||||||| |||
  Reference: GACTG--CGATCTCGACATCG

Where dash symbols represent gaps and vertical bars show where aligned characters match.

We use alignment to make an educated guess as to where a read originated with respect to the reference genome. It's not always possible to determine this with certainty. For instance, if the reference genome contains several long stretches of As (AAAAAAAAA etc.) and the read sequence is a short stretch of As (AAAAAAA), we cannot know for certain exactly where in the sea of As the read originated.

Examples

# Aligning unpaired reads
bowtie2 -x example/index/lambda_virus -U example/reads/longreads.fq

# Aligning paired reads
bowtie2 -x example/index/lambda_virus -1 example/reads/reads_1.fq -2 example/reads/reads_2.fq

Building an index

bowtie2-build builds a Bowtie index from a set of DNA sequences. bowtie2-build outputs a set of 6 files with suffixes .1.bt2, .2.bt2, .3.bt2, .4.bt2, .rev.1.bt2, and .rev.2.bt2. In the case of a large index these suffixes will have a bt2l termination. These files together constitute the index: they are all that is needed to align reads to that reference. The original sequence FASTA files are no longer used by Bowtie 2 once the index is built.

Bowtie 2's .bt2 index format is different from Bowtie 1's .ebwt format, and they are not compatible with each other.

Examples

# Building a small index
bowtie2-build example/reference/lambda_virus.fa example/index/lambda_virus

# Building a large index
bowtie2-build --large-index example/reference/lambda_virus.fa example/index/lambda_virus

Index inpection

bowtie2-inspect extracts information from a Bowtie 2 index about what kind of index it is and what reference sequences were used to build it. When run without any options, the tool will output a FASTA file containing the sequences of the original references (with all non-A/C/G/T characters converted to Ns). It can also be used to extract just the reference sequence names using the -n/--names option or a more verbose summary using the -s/--summary option.

Examples

# Inspecting a lambda_virus index (small index) and outputting the summary
bowtie2-inspect --summary example/index/lambda_virus

# Inspecting the entire lambda virus index (large index)
bowtie2-inspect --large-index example/index/lambda_virus

Publications

Bowtie 2 Papers

Related Publications

Related Work

Check out the Bowtie 2 UI, a shiny, frontend to the Bowtie 2 command line.

bowtie2's People

Contributors

benlangmead avatar bmwiedemann avatar bwlang avatar cbrueffer avatar ch4rr0 avatar christopherwilks avatar extemporaneousb avatar hamilcare avatar infphilo avatar jeffhussmann avatar jmarshall avatar junaruga avatar mr-c avatar mtojek avatar nathanweeks avatar nsoranzo avatar petehaitch avatar rpetit3 avatar sjackman avatar val-antonescu avatar vejnar avatar wasade avatar wookietreiber avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.