michael-golfi / flint

This project forked from camilo-v/flint

Main repository of the Flint project for Spark and Amazon EMR.

Home Page: https://camilo-v.github.io/flint/

License: MIT License

Shell 26.55% Python 71.25% R 2.20%

Flint

This is the main repository of the Flint project for Amazon Web Services. Flint is a metagenomics profiling pipeline built on top of the Apache Spark framework and designed for fast, real-time profiling of metagenomic samples against a large collection of reference genomes. Flint takes advantage of Spark's built-in parallelism and streaming engine architecture to quickly map reads against a large reference collection of bacterial genomes.

Our computational framework is primarily implemented using the MapReduce model and is deployed in a cluster launched with the Elastic MapReduce (EMR) service offered by Amazon Web Services (AWS). The cluster consists of multiple commodity worker machines (computational nodes). In the cluster configuration we currently use, each worker machine has 15 GB of RAM, 8 vCPUs (each a hyperthread of a single Intel Xeon core), and 100 GB of EBS disk storage. The worker nodes work in parallel, each aligning the input DNA sequencing reads against a partitioned shard of the reference database; once the alignment step is complete, each worker node acts as a regular Spark executor node.
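The shard-then-merge idea described above can be illustrated with a small in-memory sketch. This is not Flint's code: the function names (`align_to_shard`, `merge_hits`, `profile`), the toy genomes, and the reads are all hypothetical, and exact substring matching stands in for a real aligner such as Bowtie2. On a cluster the per-shard step would run in parallel on the worker nodes; here it runs in a plain loop.

```python
from collections import Counter
from functools import reduce


def align_to_shard(reads, shard):
    """Map step: count the reads that match each genome in one shard.

    Assumption: exact substring matching is a stand-in for a real aligner.
    """
    hits = Counter()
    for read in reads:
        for genome_id, sequence in shard.items():
            if read in sequence:
                hits[genome_id] += 1
    return hits


def merge_hits(left, right):
    """Reduce step: combine the per-shard hit counts."""
    return left + right


def profile(reads, shards):
    """Align every read against every shard, then merge the counts."""
    # On a cluster, each shard would be handled by a different worker node.
    per_shard = [align_to_shard(reads, shard) for shard in shards]
    return reduce(merge_hits, per_shard, Counter())


# Hypothetical reference database split into two shards.
shards = [
    {"genome_A": "ACGTACGTGG", "genome_B": "TTGACCAGTA"},
    {"genome_C": "GGCATCGATC"},
]
reads = ["ACGT", "CCAG", "ATCG", "TTTT"]

print(dict(profile(reads, shards)))
```

Because each shard holds a disjoint set of genomes, the per-shard counts can be merged with simple addition, which is what makes the reference database easy to partition across workers.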

The current database for running Flint is version 41 from Ensembl Bacteria, but we are currently working on the latest version of RefSeq, which should be available this summer.

Publications

Valdes, Stebliankin, Narasimhan (2019), Large Scale Microbiome Profiling in the Cloud, ISMB 2019, in review.

How To Get Started

Communication

  • If you found a bug, open an issue and please provide detailed steps to reliably reproduce it.
  • If you have a feature request, open an issue.
  • If you would like to contribute, please submit a pull request.

Requirements

Flint is designed to run on Apache Spark, but the current implementation is tuned for Amazon's Elastic MapReduce (EMR). The basic requirements for an EMR cluster are:

The basic requirements for the worker nodes are:

Bowtie2

Bowtie2 is required for the alignment step and needs to be installed on all worker nodes of the Spark cluster. See the Bowtie2 manual for more information.
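One way to confirm the installation on a given node is to look the executable up on the `PATH`. The snippet below is a minimal sketch, not part of Flint: the helper name `missing_tools` is hypothetical, and it assumes the aligner is invoked as `bowtie2`, the command name used in the Bowtie2 manual.

```python
import shutil


def missing_tools(tools=("bowtie2",)):
    """Return the names in `tools` that are not found on this node's PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All required tools were found on PATH.")
```

The check only covers the machine it runs on, so it would need to be run on every worker node, for example as part of whatever provisioning step installs Bowtie2.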

Python Packages

The remaining requirements are Python packages that Flint needs for a successful run. Please refer to each package's documentation for installation instructions.

Contact

Contact Camilo Valdes for pull requests, bug reports, good jokes and coffee recipes.

Maintainers

Collaborators

License

The software in this repository is available under the MIT License. See the LICENSE file for more information.

Contributors

camilo-v, stebliankin
