Giter Site home page Giter Site logo

awesome-pipeline's Introduction

Awesome Pipeline

A curated list of awesome pipeline toolkits inspired by Awesome Sysadmin

Pipeline frameworks & libraries

  • ActionChain - A workflow system for simple linear success/failure workflows.
  • Airflow - Python-based workflow system created by AirBnb.
  • Anduril - Component-based workflow framework for scientific data analysis.
  • Antha - High-level language for biology.
  • Bds - Scripting language for data pipelines.
  • Bpipe - Tool for running and managing bioinformatics pipelines.
  • Briefly - Python Meta-programming Library for Job Flow Control.
  • Cluster Flow - Command-line tool which uses common cluster managers to run bioinformatics pipelines.
  • Clusterjob - Automated reproducibility, and hassle-free submission of computational jobs to clusters.
  • Compss - Programming model for distributed infrastructures.
  • Conan2 - Light-weight workflow management application.
  • Cosmos - Python library for massively parallel workflows.
  • Cromwell - Workflow Management System geared towards scientific workflows from the Broad Institute.
  • Cuneiform - Advanced functional workflow language and framework, implemented in Erlang.
  • Dockerflow - Workflow runner that uses Dataflow to run a series of tasks in Docker.
  • Doit - Task management & automation tool.
  • Dagobah - Simple DAG-based job scheduler in Python.
  • Drake - Robust DSL akin to Make, implemented in Clojure.
  • Flex - Language agnostic framework for building flexible data science pipelines (Python/Shell/Gnuplot).
  • Flowr - Robust and efficient workflows using a simple language agnostic approach (R package).
  • Gwf - Make-like utility for submitting workflows via qsub.
  • Hive - System for creating and running pipelines on a distributed compute resource.
  • Joblib - Set of tools to provide lightweight pipelining in Python.
  • Ketrew - Embedded DSL in the OCAML language alongside a client-server management application.
  • Kronos - Workflow assembler for cancer genome analytics and informatics.
  • Loom - Tool for running bioinformatics workflows locally or in the cloud.
  • Longbow - Job proxying tool for biomolecular simulations.
  • Luigi - Python module that helps you build complex pipelines of batch jobs.
  • Makeflow - Workflow engine for executing large complex workflows on clusters.
  • Mario - Scala library for defining data pipelines.
  • Mistral - Python based workflow engine by the Open Stack project.
  • Moa - Lightweight workflows in bioinformatics.
  • Nextflow - Flow-based computational toolkit for reproducibile and scalable bioinformatics pipelines.
  • NiPype - Workflows and interfaces for neuroimaging packages.
  • OpenGE - Accelerated framework for manipulating and interpreting high-throughput sequencing data.
  • PipEngine Ruby based launcher for complex biological pipelines.
  • Pinball - Python based workflow engine by Pinterest.
  • PyFlow - Lightweight parallel task engine.
  • PypeFlow - Lightweight workflow engine for data analysis scripting.
  • Pwrake - Parallel workflow extension for Rake.
  • Qdo - Lightweight high-throughput queuing system for workflows with many small tasks to perform.
  • Qsubsec - Simple tokenised template system for SGE.
  • Rabix - Python-based workflow toolkit based on the Common Workflow Language and Docker.
  • Remake - Make-like declarative workflows in R.
  • Rmake - Wrapper for the creation of Makefiles, enabling massive parallelization.
  • Rubra - Pipeline system for bioinformatics workflows.
  • Ruffus - Computation Pipeline library for Python.
  • Ruigi - Pipeline tool for R, inspired by Luigi.
  • Sake - Self-documenting build automation tool.
  • SciLuigi - Helper library for writing flexible scientific workflows in Luigi.
  • Scoop - Scalable Concurrent Operations in Python.
  • Snakemake - Tool for running and managing bioinformatics pipelines.
  • Spiff - Based on the Workflow Patterns initiative and implemented in Python.
  • Stpipe - File processing pipelines as a Python library.
  • Suro - Java-based distributed pipeline from Netflix.
  • Swift - Fast easy parallel scripting - on multicores, clusters, clouds and supercomputers.
  • Toil - Distributed pipeline workflow manager (mostly for genomics).
  • Yap - Extensible parallel framework, written in Python using OpenMPI libraries.
  • WorldMake - Easy Collaborative Reproducible Computing.

Workflow platforms

  • ActivePapers - Computational science made reproducible and publishable.
  • Apache Iravata - Framework for executing and managing computational workflows on distributed computing resources.
  • Arvados - A container based workflow platform.
  • Biokepler - Bioinformatics Scientific Workflow for Distributed Analysis of Large-Scale Biological Data.
  • Chipster - Open source platform for data analysis.
  • Fireworks - Centralized workflow server for dynamic workflows of high-throughput computations.
  • Galaxy - Web-based platform for biomedical research.
  • Kepler - Kepler scientific workflow application from University of California.
  • NextflowWorkbench - Integrated development environment for Nextflow, Docker and Reusable Workflows.
  • OpenMOLE - Workflow Management System for exploration of models and parameter optimization.
  • Ophidia - Data-analytics platform with declarative workflows of distributed operations.
  • Pegasus - Workflow Management System.
  • Sushi - Supporting User for SHell script Integration.
  • Yabi - Online research environment for grid, HPC and cloud computing.
  • Taverna - Domain independent workflow system.
  • VisTrails - Scientific workflow and provenance management system.
  • Wings - Semantic workflow system utilizing Pegasus as execution system.

Workflow languages

Workflow standardization initiatives

Literate programming (aka interactive notebooks)

  • Beaker Notebook-style development environment.
  • Binder - Turn a GitHub repo into a collection of interactive notebooks powered by Jupyter and Kubernetes
  • IPython A rich architecture for interactive computing.
  • Jupyter Language-agnostic notebook literate programming environment.
  • Pathomx - Interactive data workflows built on Python.
  • R Notebooks - R Markdown notebook literate programming environment.
  • Wakari - Web-based Python Data Analysis.
  • Zeppelin - Web-based notebook that enables interactive data analytics.

Build automation tools

  • Bazel - Build software just as engineers do at Google.
  • DoIt - Highly generalized task-management and automation in Python.
  • Gradle - Unified cross platforms builds.
  • Scons - Python library focused on C/C++ builds.
  • Shake - Define robust build systems akin to GNU Make using Haskell.
  • Make - The GNU Make build system.

Other projects

  • HPC Grid Runner
  • noWorkflow - Supporting infrastructure to run scientific experiments without a scientific workflow management system, and still get things like provenance.

awesome-pipeline's People

Contributors

averagehat avatar dwyatte avatar ewels avatar gangchen avatar jopasserat avatar krlmlr avatar ludovicc avatar maximilianh avatar michaelcho avatar pditommaso avatar pvanheus avatar rcthomas avatar sahilseth avatar samuell avatar sjackman avatar thejmazz avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.