Spate workflow composition toolkit

Spate is a lightweight Python-based file processing workflow composition and visualization toolkit. It offers an API to create and export workflows to various popular execution engines, as well as shells and job schedulers.

The purpose of Spate is to offer a simple way to design a dataflow instance (i.e., file-based data processing steps and how they depend on each other) and export it to one or more execution resources (i.e., hardware and software environments) where it will run. Add jobs, remove jobs, merge workflows; Spate will take care of the details.

Spate can export the whole workflow, or only the steps that need to be executed. Spate checks the modification times of input and output files and iteratively identifies all steps that need to be run or updated. No more worries about re-running a whole time-consuming workflow when you update a few files; Spate will ensure that only the affected steps are executed.
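
The modification-time check described above can be illustrated with a minimal sketch. This is not Spate's actual implementation; the function names (`file_mtime`, `job_is_outdated`, `find_outdated_jobs`) and the dictionary-based job description are hypothetical and chosen only for illustration. The sketch assumes a job is outdated when one of its outputs is missing, when any input is newer than its oldest output, or when one of its inputs will be regenerated by another outdated job.

```python
import os


def file_mtime(path):
    """Return the modification time of a file, or None if it does not exist."""
    return os.path.getmtime(path) if os.path.exists(path) else None


def job_is_outdated(inputs, outputs, dirty, mtime):
    """Decide whether a single job must be (re-)run."""
    # An input that an outdated upstream job will regenerate makes this job outdated too.
    if any(path in dirty for path in inputs):
        return True
    output_times = [mtime(path) for path in outputs]
    # A missing output always requires running the job.
    if any(t is None for t in output_times):
        return True
    input_times = [t for t in (mtime(path) for path in inputs) if t is not None]
    # Outdated if the newest input is more recent than the oldest output.
    return bool(input_times) and max(input_times) > min(output_times)


def find_outdated_jobs(jobs, mtime=file_mtime):
    """Iterate until no new outdated job is found (hypothetical helper).

    jobs maps a job name to an (inputs, outputs) pair of file path lists.
    """
    outdated, dirty = set(), set()
    changed = True
    while changed:
        changed = False
        for name, (inputs, outputs) in jobs.items():
            if name not in outdated and job_is_outdated(inputs, outputs, dirty, mtime):
                outdated.add(name)
                dirty.update(outputs)  # downstream jobs now depend on stale files
                changed = True
    return outdated


# Demonstration with made-up timestamps instead of real files
jobs = {
    "count": (["file_C"], ["file_D"]),
    "merge": (["file_A", "file_B"], ["file_C"]),
    "other": (["file_E"], ["file_F"]),
}
times = {"file_A": 10, "file_B": 30, "file_C": 20,
         "file_D": 25, "file_E": 1, "file_F": 2}
print(sorted(find_outdated_jobs(jobs, mtime=times.get)))  # → ['count', 'merge']
```

In this example file_B is newer than file_C, so the "merge" job is outdated; "count" is then flagged on the next pass because its input file_C will be regenerated, while "other" is left alone.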

Example

This workflow will take two input files, file_A and file_B, and process them to create an output file file_C by running the Unix cat command:

import spate

# create a new empty workflow
workflow = spate.new_workflow("my_workflow")

# add a simple job; here a Unix shell
# command to merge two files into one
workflow.add_job(
	inputs = ("file_A", "file_B"),  # two input files
	outputs = "file_C",  # one output file
	content = """
	   cat {{INPUT0}} {{INPUT1}} > {{OUTPUT}}
	   """)

# export this workflow as a shell script; to run
# it, type `./my_workflow.sh` in the shell
spate.to_shell_script(workflow, "my_workflow.sh")

# export this workflow as a SLURM sbatch script; to run it,
# type `sbatch my_workflow.slurm` on a SLURM-enabled cluster
spate.to_slurm(workflow, "my_workflow.slurm")

See doc/ for additional examples.

Installation

Spate is hosted both on PyPI and GitHub, thus offering two options to install it:

  • If using PyPI (recommended to get the latest release) with the pip package manager, type pip install spate on the command line.
  • If using GitHub (in case you want one of the development versions), download the archive of one of the releases or commits and run pip install <archive>.

Supported execution environments

| Environment | Status | Comment |
| --- | --- | --- |
| Unix shell scripts | Stable | No parallel job execution; jobs are processed sequentially. |
| Make files | Stable | |
| Makeflow scripts | Experimental | Not fully tested. |
| Drake scripts | Experimental | Not fully tested. |
| SLURM job scheduler | Stable | Produces sbatch scripts with directed acyclic graph of jobs. Tested with SLURM 14.11.11. |
| TORQUE/PBS job scheduler | Experimental | Produces job arrays; does not handle job dependencies. Currently untested. |
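
To give an idea of what expressing a directed acyclic graph of jobs on SLURM can look like, here is a minimal sketch. It is not how Spate generates its output, which is not described here. The function name `sbatch_submission_lines` and the job description format are hypothetical, each job is assumed to have its own batch script, and job names are assumed to be valid shell variable names. The sketch relies on two standard sbatch options, `--parsable` (print the job identifier) and `--dependency=afterok:<id>` (start only after the listed jobs succeed), and on `graphlib` from the Python 3.9+ standard library.

```python
from graphlib import TopologicalSorter


def sbatch_submission_lines(jobs):
    """Build shell lines submitting jobs in dependency order (hypothetical helper).

    jobs maps a job name to an (inputs, outputs, script) triple, where
    script is the path of the batch script assumed to exist for that job.
    """
    # Map each file to the job that produces it.
    producer = {path: name
                for name, (_, outputs, _) in jobs.items()
                for path in outputs}
    # A job depends on every job producing one of its inputs.
    dependencies = {name: {producer[path] for path in inputs if path in producer}
                    for name, (inputs, _, _) in jobs.items()}

    lines = []
    # static_order() yields each job after all of its dependencies.
    for name in TopologicalSorter(dependencies).static_order():
        flag = ""
        if dependencies[name]:
            job_ids = ":".join("${%s}" % dep for dep in sorted(dependencies[name]))
            flag = " --dependency=afterok:" + job_ids
        script = jobs[name][2]
        # Store the job identifier in a shell variable named after the job.
        lines.append("%s=$(sbatch --parsable%s %s)" % (name, flag, script))
    return lines


jobs = {
    "count": (["file_C"], ["file_D"], "count.sh"),
    "merge": (["file_A", "file_B"], ["file_C"], "merge.sh"),
}
print("\n".join(sbatch_submission_lines(jobs)))
```

Here "count" reads file_C, which "merge" produces, so "merge" is submitted first and "count" is submitted with a dependency on its job identifier.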
