IMOS

Improved Meta-aligner and Minimap2 On Spark.

IMOS is an aligner for mapping noisy long reads to a reference genome. It runs on a single node or across distributed nodes. In single-node mode, IMOS is an improved version of Meta-aligner (IM) with better accuracy and speed; IM is up to 6x faster than the original Meta-aligner. IMOS also runs IM and Minimap2 on Apache Spark for deployment on a cluster of nodes. Multi-node IMOS is faster than SparkBWA when executing both IM (1.5x) and Minimap2 (25x).

Citation:

Hadadian Nejad Yousefi, Mostafa, et al. "IMOS: improved Meta-aligner and Minimap2 On Spark". BMC Bioinformatics. (2019): link.

Contact : [email protected]

IMOS can be downloaded from here.

Pre-generated human genome index files can be downloaded from here. (On the command line, pass hg.fa as the index after -REF.)

Index Builder

To build index files from a FASTA file, place SureMap-IndexBuilder and the reference file in FASTA format in the same directory as IMOS.jar. Currently, it has been tested on 64-bit Linux.

Usage: java -cp IMOS.jar IndexBuilder [FA File]
        FA File :         FastA Reference File
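
The layout requirement above can be checked before the index builder is started. The following is a minimal sketch, not part of IMOS: the function name check_index_layout is hypothetical, and chr19.fa is only an example reference name.

```shell
# Hypothetical helper (not shipped with IMOS): verify that IMOS.jar,
# SureMap-IndexBuilder and the reference FASTA file sit in the same directory.
check_index_layout() {
  dir=$1     # directory that should hold all three files
  fasta=$2   # reference file name, e.g. chr19.fa
  for f in IMOS.jar SureMap-IndexBuilder "$fasta"; do
    if [ ! -e "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      return 1
    fi
  done
  return 0
}

# Possible use from the IMOS directory:
#   check_index_layout . chr19.fa && java -cp IMOS.jar IndexBuilder chr19.fa
```

The helper only reports the first missing file; it does not validate the FASTA content.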

Load Balancing

Before putting the file into HDFS, run the load balancer for better performance. The program builds a .fastm file that is balanced based on the HDFS operations. If you use it, add -FM to the command when submitting the job to Spark.

Usage: java -cp IMOS.jar LoadBalancer [aligner] [filename] [node] [isIllumina]
        aligner:              [mini,meta]
        filename [string]:    path to the input FastQ file.
        node [int]:           indicates number of nodes in the cluster
        isIllumina:           yes if the reads are Illumina; No or blank for PacBio
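
As an illustration of the argument order above, here is a minimal sketch that validates the arguments and prints the resulting load balancer command. The function name load_balancer_cmd is hypothetical and Read.fq is only an example input; the function prints the command rather than running it.

```shell
# Hypothetical helper (not shipped with IMOS): build the LoadBalancer command
# line after checking the aligner name and the node count.
load_balancer_cmd() {
  aligner=$1        # mini or meta
  fastq=$2          # path to the input FastQ file
  nodes=$3          # number of nodes in the cluster
  illumina=${4:-}   # "yes" for Illumina; leave empty for PacBio

  case $aligner in
    mini|meta) ;;
    *) echo "aligner must be mini or meta" >&2; return 1 ;;
  esac
  case $nodes in
    ''|*[!0-9]*) echo "node must be a positive integer" >&2; return 1 ;;
  esac
  if [ "$nodes" -le 0 ]; then
    echo "node must be a positive integer" >&2
    return 1
  fi

  printf 'java -cp IMOS.jar LoadBalancer %s %s %s' "$aligner" "$fastq" "$nodes"
  if [ -n "$illumina" ]; then
    printf ' %s' "$illumina"
  fi
  printf '\n'
}

# Possible use: print the command for a 4-node cluster and PacBio reads.
#   load_balancer_cmd meta Read.fq 4
```

Once the balancer has produced the .fastm file, it can be uploaded with the standard HDFS shell, for example hdfs dfs -put followed by the local file and the target HDFS directory.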

IMOS Single Node Mode - IM

This mode is designed for single-node use. Use it when you do not want to use Apache Spark.

Usage: java -cp IMOS.jar IM [OPTIONS] -I [inputFQ] -REF [index]
        inputFQ:              Input reads in FastQ format
        index:                Index files name built with index builder
    OPTIONS:
        -C [int]:             Number of cores
        -ER [float]:          Tolerable error rate, 0<=rate<=1
        -O [String]:          Output file path
        -RF [int]:            Refine Factor 1<=factor<=10 [default=4]
        -X [String]:          Sequencer Machine : {"Pacbio","Illumina"}
        
    EXAMPLE: java -cp IMOS.jar IMOSClient -c 4 -x Pacbio -O out.sam -I Read.fq -REF chr19.fa

IMOS SPARK Mode (Distributed Mode)

First, set up an Apache Spark cluster. IMOS can operate on any Spark cluster; it only requires an IMOSWorker running on every Spark worker node. If you want to run Spark locally, we recommend using IMOSClient instead for better performance. Once the cluster is set up, submit IMOS to the Spark cluster. Currently, it has been tested on Linux.

IMOSWorker

Usage: java -cp IMOS.jar IMOSWorker [ALIGNER] [OPTIONS] -REF [INDEX]
Warning: ports 7777 and 7778 must be open
Warning: for the human genome, raise the JVM heap with -Xmx18G
    INDEX:
        Index files name built with index builder
    ALIGNER:
        IM : Improved Meta-aligner
        Mini : Minimap2
        Third : 3rd party aligner
    OPTIONS:
        Minimap2:
           The arguments are passed directly to Minimap2. See its help for more details.
        Third:
           The arguments are passed directly to the third-party aligner.
        IM:
           -C [int]:       Number of cores
           -ER [float]:    Tolerable error rate, 0<=rate<=1
           -RF [int]:      Refine Factor, 1<=factor<=10 [default=4]
           -X [String]:    Sequencer Machine : {"Pacbio","Illumina"}
    
    EXAMPLE: java -cp IMOS.jar IMOSWorker im -c 4 -x Pacbio -REF chr19.fa

Minimap2

To compile Minimap2 so that it works with IMOSWorker, download main.c from here and the minimap2 package from GitHub. Copy our modified main.c into the main folder of the minimap2 source downloaded from GitHub, then compile minimap2 as usual. Finally, put minimap2 and IMOSWorker in the same directory.
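
The copy step above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name install_patched_main is hypothetical, the upstream source is assumed to be the lh3/minimap2 repository on GitHub, and keeping a main.c.orig backup is an addition for convenience rather than something IMOS requires.

```shell
# Hypothetical helper (not shipped with IMOS): replace main.c in a minimap2
# source tree with the modified main.c, keeping a backup of the original.
install_patched_main() {
  patched=$1   # path to the modified main.c downloaded for IMOS
  src=$2       # path to the minimap2 source directory

  [ -f "$patched" ] || { echo "patched main.c not found: $patched" >&2; return 1; }
  [ -f "$src/main.c" ] || { echo "not a minimap2 source tree: $src" >&2; return 1; }

  cp "$src/main.c" "$src/main.c.orig"   # backup of the upstream file
  cp "$patched" "$src/main.c"
}

# Possible sequence (assumes git and make are installed):
#   git clone https://github.com/lh3/minimap2
#   install_patched_main ./main.c minimap2
#   make -C minimap2
#   cp minimap2/minimap2 .    # place the binary in the directory IMOSWorker runs from
```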

Submit IMOS

Usage: spark-submit --class IMOS --master [MASTER] --executor-memory 10G --driver-memory 2G IMOS.jar [ALIGNER] [OPTIONS] -I [inputFQ]
     MASTER: Spark master: local, yarn, or the IP of a Spark standalone master
     inputFQ: Input reads in FastQ format
     ALIGNER: IM for Improved Meta-aligner and ThirdParty, Mini for Minimap2
     OPTIONS:
            -FM : use if the load balancer was used and the file in HDFS is in fastm format
        Mini:
            No option is required. The options must be set at the worker nodes.
        IM:
           -ER [float]: Tolerable error rate, 0<=rate<=1
           -O [String]: Output file path
           -X [String]: Sequencer Machine : {"Pacbio","Illumina"}
    
    EXAMPLE: spark-submit --class IMOS --master local --executor-memory 10G --driver-memory 2G IMOS.jar IM -X Pacbio -I Read.fq -O out.sam
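
To show how -FM fits into the submit command, here is a minimal sketch that assembles the spark-submit line and adds -FM when the input name ends in .fastm. The function name imos_submit_cmd is hypothetical, and detecting the load-balanced input by its file extension is an assumption of this sketch, not documented IMOS behavior. The function prints the command rather than running it.

```shell
# Hypothetical helper (not shipped with IMOS): build the spark-submit command.
# Arguments: master, aligner, input file, then any aligner options.
imos_submit_cmd() {
  master=$1    # local, yarn, or the IP of a Spark standalone master
  aligner=$2   # IM or Mini
  input=$3     # FastQ input, or the .fastm file built by the load balancer
  shift 3
  opts=$*      # remaining options, e.g. -X Pacbio -O out.sam

  # Assumption: a .fastm extension means the load balancer was used.
  case $input in
    *.fastm) opts="-FM${opts:+ $opts}" ;;
  esac

  printf 'spark-submit --class IMOS --master %s --executor-memory 10G --driver-memory 2G IMOS.jar %s%s -I %s\n' \
    "$master" "$aligner" "${opts:+ $opts}" "$input"
}

# Possible use:
#   imos_submit_cmd local IM Read.fq -X Pacbio -O out.sam
#   imos_submit_cmd yarn Mini Read.fastm
```

The memory settings are the ones from the usage line above and may need adjusting for a given cluster.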


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
