Assembltrie

Assembltrie is a software tool for compressing collections of fixed-length Illumina reads. It is written in C++14 and is available under an open-source license. Currently, Assembltrie is the only FASTQ compressor that approaches the information-theoretic limit for a short-read collection uniformly sampled from an underlying reference genome. It is also the first FASTQ compressor to achieve both combinatorial optimality and information-theoretic optimality under fair assumptions.

Installation

For reliable compression performance, we suggest compiling Assembltrie with one of the following:

  • GCC version 5.0 or higher (or an equivalent version that supports at least C++14)
  • Intel compiler version 17.0 or higher

Unlike most existing tools, Assembltrie does not depend on any downstream compressor such as gzip or bzip2, so there is no need to install them. The (tentative) build process is as simple as

cd assembltrie
make

which produces an executable named astrie.

Usage

To run Assembltrie from the command line, type

astrie -c -i <input.fastq> -o <result> [options]

for compression; and

astrie -d -i <input.out> -o <result> [options]

for decompression.

Compression. The -c mode compresses the input FASTQ file and generates two separate binary (compressed) output files: result.out, which contains the encoding of the assembled reads, and part.out, which contains the encoding of the singletons together with the meta information needed for decompression. In addition, options specify the following mandatory and optional parameters:

  • -L <integer> Mandatory. The (fixed) read length for one compression run. The maximum value is L = 250.
  • -K <integer> Mandatory. The minimum overlap length (hash length). The suggested value is floor(L / 5) for L = 100.
  • -h 0 | 1 | 2 Optional. h = 0 disables strand correction; h = 2 applies our greedy strand correction heuristic.
  • -s <integer> Optional. Accelerates the search for potential children by ignoring already processed reads whose suffix length is less than or equal to <integer>. The suggested value is floor(L / cov), where cov denotes the coverage of the input read collection (FASTQ file).
  • -e <integer> Optional. The maximum number of mismatches allowed in a read overlap. The default value is 3, but 4 is strongly recommended for L = 100 and 6 for L = 150.
  • -n <integer> Optional. The number of worker threads. The default value is n = 8.
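The rules of thumb above can be turned into a small helper. The following Python sketch is not part of Assembltrie; the function names fixed_read_length, suggest_params and compression_command are hypothetical. It checks that a FASTQ file really has one fixed read length, derives the suggested -K, -s and -e values, and assembles a command line in the same form as the sample usage. Applying floor(L / 5) to read lengths other than 100 is an assumption, since the suggestion above is stated only for L = 100.

```python
import math

MAX_READ_LENGTH = 250  # upper bound on -L stated in this README


def fixed_read_length(fastq_lines):
    """Return the single read length of a FASTQ stream, or raise if reads differ.

    The sequence is taken to be the 2nd line of every 4-line record.
    """
    lengths = {len(line.strip())
               for i, line in enumerate(fastq_lines) if i % 4 == 1}
    if len(lengths) != 1:
        raise ValueError(f"expected one fixed read length, found {sorted(lengths)}")
    return lengths.pop()


def suggest_params(read_length, coverage):
    """Suggested option values, following the README's rules of thumb."""
    if not 0 < read_length <= MAX_READ_LENGTH:
        raise ValueError(f"read length must be in 1..{MAX_READ_LENGTH}")
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return {
        "L": read_length,
        # floor(L / 5); documented for L = 100, assumed for other lengths
        "K": read_length // 5,
        # floor(L / cov)
        "s": math.floor(read_length / coverage),
        # 4 for L = 100, 6 for L = 150, otherwise the documented default of 3
        "e": {100: 4, 150: 6}.get(read_length, 3),
    }


def compression_command(fastq_path, out_path, params, strand=0, threads=8):
    """Build an argument list for `astrie -c` in the style of the sample usage."""
    return [
        "astrie", "-c",
        f"-L{params['L']}", f"-K{params['K']}", f"-h{strand}",
        f"-s{params['s']}", f"-i{fastq_path}", f"-o{out_path}",
        f"-e{params['e']}", f"-n{threads}",
    ]
```

For a file on disk, fixed_read_length(open(path)) scans the whole file once. The resulting list can be passed to subprocess.run, provided astrie is on the PATH.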

Decompression. The -d mode decompresses the compressed file input.out, together with the accompanying part.out, into result.fasta. The output contains only the sequence content of the original read collection, permuted according to the reads' locations in the constructed read forest. To decompress input.out properly, Assembltrie expects the following parameters:

  • -L <integer> Mandatory. The (fixed) read length; it must be the same value that was used for compression.
  • -h 0 | 1 | 2 Mandatory, although it is optional for compression. Again, it must be the same value that was used for compression.
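Because the decompressed output is a permutation of the original sequences, a round trip cannot be verified with a plain file comparison. One possible check, sketched below in Python, compares the two files as multisets of reads. This is not part of Assembltrie. It assumes that result.fasta holds one read per line, with any header lines starting with ">", and that reads keep their original orientation (as with -h 0); neither detail is stated in this README. The helper names are hypothetical.

```python
from collections import Counter


def fastq_sequences(lines):
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def fasta_sequences(lines):
    """Yield sequence lines, skipping blank lines and '>' headers.

    Assumes one read per line (an assumption about the output layout).
    """
    for line in lines:
        line = line.strip()
        if line and not line.startswith(">"):
            yield line


def same_read_multiset(fastq_lines, fasta_lines):
    """True if both inputs contain the same reads with the same multiplicities."""
    return Counter(fastq_sequences(fastq_lines)) == Counter(fasta_sequences(fasta_lines))
```

With files on disk, this can be called as same_read_multiset(open("SRR554369_1.fastq"), open("SRR554369_1.fasta")). The counters hold every distinct read in memory, so for very large collections an external sort of both sequence lists may be more practical.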

Sample Usage

(export PATH=.:$PATH)
astrie -c -L100 -K20 -h0 -s4 -iSRR554369_1.fastq -oSRR554369_1.out -e4 -n8
astrie -d -L100 -h0 -iSRR554369_1.out -oSRR554369_1.fasta
