Datamake

This is an experiment, feedback welcome.

A simple tool for managing parameterized job flows with data dependencies. This is not a scheduler.

Each task specifies an artifact with a URI and a command to be executed on the shell.

Tasks can specify tasks they depend on. A task can depend on and be depended on by multiple tasks.

Flows are run by specifying the end task that you wish to run: all upstream tasks are run first. Tasks that are in the flow file but not upstream of the target task will not be run.
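The upstream-first ordering described above can be sketched as a depth-first walk from the target task. This is only an illustration and not datamake's actual code: the function name `upstream_order` and the plain-dict task representation are assumptions, with `dependencies` mirroring the key used in the flow file format.

```python
def upstream_order(tasks, target):
    """Return task ids in run order: every dependency before its dependents.

    `tasks` maps a task id to a dict with an optional "dependencies" list.
    Tasks that are not upstream of `target` never appear in the result.
    """
    order = []
    visiting = set()  # ids on the current path, used to detect cycles

    def visit(task_id):
        if task_id in order:
            return  # already scheduled via another dependent
        if task_id in visiting:
            raise ValueError("dependency cycle at " + task_id)
        visiting.add(task_id)
        for dep in tasks[task_id].get("dependencies", []):
            visit(dep)
        visiting.discard(task_id)
        order.append(task_id)

    visit(target)
    return order


# Hypothetical flow: a diamond plus one task that is not upstream of the target.
tasks = {
    "download": {},
    "grep-email": {"dependencies": ["download"]},
    "grep-name": {"dependencies": ["download"]},
    "user-details": {"dependencies": ["grep-email", "grep-name"]},
    "unrelated": {},
}
print(upstream_order(tasks, "user-details"))
```

In this sketch `download` is scheduled once even though two tasks depend on it, and `unrelated` is never scheduled.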

When a flow runs, each task's command is run only if its artifact does not exist. If its artifact does exist, its command is not run, and any upstream tasks that are needed only by this task are also not run.
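A minimal sketch of that skip rule, under stated assumptions: artifacts are treated as local file paths (real artifact URIs may be something else), a task with no artifact is assumed to always need running, and `commands_to_run` and the task names below are hypothetical rather than part of datamake.

```python
import os


def commands_to_run(tasks, target, exists=os.path.exists):
    """Collect the commands a run would execute, upstream first.

    A task whose artifact already exists is skipped, and its dependencies
    are not visited through it. ASSUMPTION: a task with no artifact is
    always treated as needing to run.
    """
    commands = []
    seen = set()

    def visit(task_id):
        if task_id in seen:
            return
        seen.add(task_id)
        task = tasks[task_id]
        artifact = task.get("artifact")
        if artifact is not None and exists(artifact):
            return  # artifact present: prune this task and its branch
        for dep in task.get("dependencies", []):
            visit(dep)
        if "command" in task:
            commands.append(task["command"])

    visit(target)
    return commands


# Hypothetical three-step flow used only for illustration.
tasks = {
    "fetch": {"artifact": "/data/raw.csv", "command": "fetch.sh"},
    "aggregate": {
        "artifact": "/data/agg.csv",
        "command": "aggregate.sh",
        "dependencies": ["fetch"],
    },
    "report": {"command": "report.sh", "dependencies": ["aggregate"]},
}

# Pretend only the aggregate artifact exists.
present = {"/data/agg.csv"}
print(commands_to_run(tasks, "report", exists=lambda path: path in present))
```

Because the `aggregate` artifact is present in this example, `fetch` is never visited, even though its own artifact is missing.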

Downstream jobs pass parameters to upstream jobs.

Command line parameters can be eval'd to multiple values, in which case the whole flow is run once for each value. For example, an aggregation script can be passed dates for the last 7 days and only the missing days will have anything to do.
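To make the idea concrete, here is a rough sketch of expanding one parameter into several values and keeping only those whose artifact is missing. The behaviour of the real `days_range` helper is not documented here, so `days_range_sketch` is a hypothetical stand-in that assumes day offsets relative to today, inclusive at both ends, rendered as ISO dates. `pending_values` is likewise an illustrative name.

```python
from datetime import date, timedelta


def days_range_sketch(start, end, today=None):
    """Hypothetical stand-in for a days_range-style helper.

    ASSUMPTION: offsets are days relative to today, both ends inclusive,
    and each value is an ISO date string. The real helper may differ.
    """
    today = today or date.today()
    return [(today + timedelta(days=n)).isoformat() for n in range(start, end + 1)]


def pending_values(values, artifact_for, exists):
    """Keep only the values whose artifact does not exist yet."""
    return [v for v in values if not exists(artifact_for(v))]


values = days_range_sketch(-2, 0, today=date(2020, 1, 3))

# Pretend the artifact for the middle day was already produced.
done = {"/tmp/datamake-date-example-2020-01-02.json"}
todo = pending_values(
    values,
    artifact_for=lambda d: "/tmp/datamake-date-example-%s.json" % d,
    exists=lambda path: path in done,
)
print(values)
print(todo)
```

The flow would then be run once per entry in `todo`, which is how a fixed window of dates ends up doing work only for the days that are missing.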

The downstream jobs are the ones scheduled by cron / citrine / something else; they pull on the tasks upstream and run them as necessary. So they pull rather than push.

Install

python setup.py install

Run an example

Run with

datamake main examples/date.json --eval-param date='days_range(-2,0)'

Flow file format

Example 1:

{
  "version": "1.0",
  "description": "This is a contrived example showing a diamond of dependencies.",
  "namespace": "examples",
  "tasks":
  [
    {
      "id": "download",
      "command": "curl -i https://api.github.com/users/${username} > /tmp/datamake-diamond-example-${username}.json",
      "cleanup": true,
      "artifact": "/tmp/datamake-diamond-example-${username}.json"
    },
    {
      "id": "grep-email",
      "command": "grep email /tmp/datamake-diamond-example-${username}.json",
      "dependencies": ["download"]
    },
    {
      "id": "grep-name",
      "command": "grep name /tmp/datamake-diamond-example-${username}.json",
      "dependencies": ["download"]
    },
    {
      "id": "user-details",
      "dependencies": ["grep-email", "grep-name"]
    }
  ]
}

run with:

datamake examples.user-details examples/diamond.json --param username=tims
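As an illustration of how a `--param` value could end up in the `${username}` placeholders, the sketch below uses Python's standard `string.Template`, which happens to accept the same `${name}` syntax. Whether datamake itself uses `string.Template` is an assumption, and `render` is a hypothetical helper name.

```python
from string import Template


def render(text, params):
    """Fill ${name} placeholders; raises KeyError if a parameter is missing."""
    return Template(text).substitute(params)


params = {"username": "tims"}
command = render(
    "curl -i https://api.github.com/users/${username} "
    "> /tmp/datamake-diamond-example-${username}.json",
    params,
)
artifact = render("/tmp/datamake-diamond-example-${username}.json", params)
print(command)
print(artifact)
```

Rendering the command and the artifact from the same parameters keeps them in step, so each distinct `username` gets its own artifact.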

Example 2:

{
  "version": "1.0",
  "description": "This is a contrived example showing eval params and a helpful date util function.",
  "namespace": "examples",
  "tasks":
  [
    {
      "id": "download",
      "command": "touch /tmp/datamake-date-example-${date}.json"
    }
  ]
}

run with:

datamake main examples/date.json --eval-param date='days_range(-2,0)'

Help

datamake --help

There are some irritating dependencies, such as oursql, which is used for hacky MySQL artifacts. One day artifact types should be pluggable, because this sucks.
