Giter Site home page Giter Site logo

awesome-pipeline's Introduction

Awesome Pipeline

A curated list of awesome pipeline toolkits inspired by Awesome Sysadmin

Pipeline frameworks & libraries

  • ActionChain - A workflow system for simple linear success/failure workflows.
  • Adage - Small package to describe workflows that are not completely known at definition time.
  • Airflow - Python-based workflow system created by AirBnb.
  • Anduril - Component-based workflow framework for scientific data analysis.
  • Antha - High-level language for biology.
  • Bds - Scripting language for data pipelines.
  • BioMake - GNU-Make-like utility for managing builds and complex workflows.
  • BioQueue - Explicit framework with web monitoring and resource estimation.
  • Bistro - Library to build and execute typed scientific workflows.
  • Bpipe - Tool for running and managing bioinformatics pipelines.
  • Briefly - Python Meta-programming Library for Job Flow Control.
  • Cluster Flow - Command-line tool which uses common cluster managers to run bioinformatics pipelines.
  • Clusterjob - Automated reproducibility, and hassle-free submission of computational jobs to clusters.
  • Compss - Programming model for distributed infrastructures.
  • Conan2 - Light-weight workflow management application.
  • Consecution - A Python pipeline abstraction inspired by Apache Storm topologies.
  • Cosmos - Python library for massively parallel workflows.
  • Cromwell - Workflow Management System geared towards scientific workflows from the Broad Institute.
  • Cuneiform - Advanced functional workflow language and framework, implemented in Erlang.
  • Dagobah - Simple DAG-based job scheduler in Python.
  • Dagr - A scala based DSL and framework for writing and executing bioinformatics pipelines as Directed Acyclic GRaphs.
  • Dask - Dask is a flexible parallel computing library for analytics.
  • Dockerflow - Workflow runner that uses Dataflow to run a series of tasks in Docker.
  • Doit - Task management & automation tool.
  • Drake - Robust DSL akin to Make, implemented in Clojure.
  • Drake (R package) - R-focused reproducibility and scalable high-performance computing with an easy interface.
  • Dray - An engine for managing the execution of container-based workflows.
  • Fission Workflows - A fast, lightweight workflow engine for serverless/FaaS functions.
  • Flex - Language agnostic framework for building flexible data science pipelines (Python/Shell/Gnuplot).
  • Flowr - Robust and efficient workflows using a simple language agnostic approach (R package).
  • Gwf - Make-like utility for submitting workflows via qsub.
  • Hive - System for creating and running pipelines on a distributed compute resource.
  • Joblib - Set of tools to provide lightweight pipelining in Python.
  • Jug - A task Based parallelization framework for Python.
  • Ketrew - Embedded DSL in the OCAML language alongside a client-server management application.
  • Kronos - Workflow assembler for cancer genome analytics and informatics.
  • Loom - Tool for running bioinformatics workflows locally or in the cloud.
  • Longbow - Job proxying tool for biomolecular simulations.
  • Luigi - Python module that helps you build complex pipelines of batch jobs.
  • Makeflow - Workflow engine for executing large complex workflows on clusters.
  • Mario - Scala library for defining data pipelines.
  • MD Studio - Microservice based workflow engine.
  • Mistral - Python based workflow engine by the Open Stack project.
  • Moa - Lightweight workflows in bioinformatics.
  • Nextflow - Flow-based computational toolkit for reproducibile and scalable bioinformatics pipelines.
  • NiPype - Workflows and interfaces for neuroimaging packages.
  • OpenGE - Accelerated framework for manipulating and interpreting high-throughput sequencing data.
  • Pachyderm - Distributed and reproducible data pipelining and data management, built on the container ecosystem.
  • PipEngine Ruby based launcher for complex biological pipelines.
  • Pinball - Python based workflow engine by Pinterest.
  • PyFlow - Lightweight parallel task engine.
  • PypeFlow - Lightweight workflow engine for data analysis scripting.
  • pyperator - Simple push-based python workflow framework using asyncio, supporting recursive networks.
  • pyppl - A python lightweight pipeline framework.
  • pypyr - Simple task runner for sequential steps defined in a pipeline yaml, with AWS and Slack plug-ins.
  • Pwrake - Parallel workflow extension for Rake.
  • Qdo - Lightweight high-throughput queuing system for workflows with many small tasks to perform.
  • Qsubsec - Simple tokenised template system for SGE.
  • Rabix - Python-based workflow toolkit based on the Common Workflow Language and Docker.
  • Reflow - Language and runtime for distributed, incremental data processing in the cloud.
  • Remake - Make-like declarative workflows in R.
  • Rmake - Wrapper for the creation of Makefiles, enabling massive parallelization.
  • Rubra - Pipeline system for bioinformatics workflows.
  • Ruffus - Computation Pipeline library for Python.
  • Ruigi - Pipeline tool for R, inspired by Luigi.
  • Sake - Self-documenting build automation tool.
  • SciLuigi - Helper library for writing flexible scientific workflows in Luigi.
  • SciPipe - Library for writing Scientific Workflows in Go.
  • Scoop - Scalable Concurrent Operations in Python.
  • Snakemake - Tool for running and managing bioinformatics pipelines.
  • Spiff - Based on the Workflow Patterns initiative and implemented in Python.
  • Stpipe - File processing pipelines as a Python library.
  • Sundial - Jobsystem on AWS ECS or AWS Batch managing dependencies and scheduling.
  • Suro - Java-based distributed pipeline from Netflix.
  • Swift - Fast easy parallel scripting - on multicores, clusters, clouds and supercomputers.
  • Toil - Distributed pipeline workflow manager (mostly for genomics).
  • Yap - Extensible parallel framework, written in Python using OpenMPI libraries.
  • WorldMake - Easy Collaborative Reproducible Computing.

Workflow platforms

  • ActivePapers - Computational science made reproducible and publishable.
  • Apache Iravata - Framework for executing and managing computational workflows on distributed computing resources.
  • Arvados - A container based workflow platform.
  • Biokepler - Bioinformatics Scientific Workflow for Distributed Analysis of Large-Scale Biological Data.
  • Butler - Framework for running scientific workflows on public and academic clouds.
  • Chipster - Open source platform for data analysis.
  • Clubber - Cluster Load Balancer for Bioinformatics e-Resources.
  • Digdag - Workflow manager designed for simplicity, extensibility and collaboration.
  • Fireworks - Centralized workflow server for dynamic workflows of high-throughput computations.
  • Galaxy - Web-based platform for biomedical research.
  • Kepler - Kepler scientific workflow application from University of California.
  • KNIME Analytics Platform - General-purpose platform with many specialized domain extensions.
  • NextflowWorkbench - Integrated development environment for Nextflow, Docker and Reusable Workflows.
  • OpenMOLE - Workflow Management System for exploration of models and parameter optimization.
  • Ophidia - Data-analytics platform with declarative workflows of distributed operations.
  • Pegasus - Workflow Management System.
  • Pentaho Kettle - Workflow platform with a graphical design environment.
  • Sushi - Supporting User for SHell script Integration.
  • Yabi - Online research environment for grid, HPC and cloud computing.
  • Taverna - Domain independent workflow system.
  • VisTrails - Scientific workflow and provenance management system.
  • Wings - Semantic workflow system utilizing Pegasus as execution system.

Workflow languages

Workflow standardization initiatives

Literate programming (aka interactive notebooks)

  • Beaker Notebook-style development environment.
  • Binder - Turn a GitHub repo into a collection of interactive notebooks powered by Jupyter and Kubernetes
  • IPython A rich architecture for interactive computing.
  • Jupyter Language-agnostic notebook literate programming environment.
  • Pathomx - Interactive data workflows built on Python.
  • R Notebooks - R Markdown notebook literate programming environment.
  • SoS - Readable, interactive, cross-platform and cross-language data science workflow system.
  • Zeppelin - Web-based notebook that enables interactive data analytics.

Build automation tools

  • Bazel - Build software just as engineers do at Google.
  • DoIt - Highly generalized task-management and automation in Python.
  • Gradle - Unified cross platforms builds.
  • Scons - Python library focused on C/C++ builds.
  • Shake - Define robust build systems akin to GNU Make using Haskell.
  • Make - The GNU Make build system.

Other projects

  • HPC Grid Runner
  • noWorkflow - Supporting infrastructure to run scientific experiments without a scientific workflow management system, and still get things like provenance.
  • Reprozip - Simplifies the process of creating reproducible experiments from command-line executions.

Related lists

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.