Drake

Drake is a simple-to-use, extensible, text-based data workflow tool that organizes command execution around data and its dependencies. Data processing steps are defined along with their inputs and outputs and Drake automatically resolves their dependencies and calculates:

  • which commands to execute (based on file timestamps)
  • in what order to execute the commands (based on dependencies)
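The two decisions above can be illustrated with ordinary shell tools. This is a minimal sketch of the general idea, not Drake's actual implementation: the `needs_rebuild` function and the file names are hypothetical, and `tsort` stands in for Drake's own dependency resolution.

```shell
#!/bin/bash
# needs_rebuild OUTPUT INPUT...
# Succeeds (exit 0) when OUTPUT is missing or older than any INPUT.
needs_rebuild() {
  local output=$1
  shift
  # A missing output always has to be built.
  [ -e "$output" ] || return 0
  local input
  for input in "$@"; do
    # -nt is true when $input was modified more recently than $output.
    [ "$input" -nt "$output" ] && return 0
  done
  return 1
}

workdir=$(mktemp -d)
touch -t 202001010000 "$workdir/raw.csv"     # hypothetical input, older
touch -t 202001020000 "$workdir/clean.csv"   # hypothetical output, newer

if needs_rebuild "$workdir/clean.csv" "$workdir/raw.csv"; then
  echo "clean.csv is out of date"
else
  echo "clean.csv is up to date"
fi

# Execution order: each "A B" pair means "A must exist before B is built".
# tsort prints the names so that every input comes before the step using it.
printf '%s\n' "raw.csv clean.csv" "clean.csv report.txt" | tsort
```

Touching `raw.csv` with a later timestamp makes `needs_rebuild` succeed for `clean.csv`, which is the situation in which a timestamp-based tool would rerun the step.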

Drake is similar to GNU Make, but designed especially for data workflow management. It has HDFS support, allows multiple inputs and outputs, and includes a host of features designed to help you bring sanity to your otherwise chaotic data processing workflows.

Installation

Drake is a Clojure project, so you will need Leiningen to build it.

Note that Drake has been tested under Linux and Mac OS X. We've not tested it on Windows.

Clone the project:

$ git clone git@github.com:Factual/drake.git
$ cd drake

Build the uberjar:

$ lein uberjar

Run Drake from the jar

Once you've built the uberjar, you can run Drake like this:

$ java -jar drake.jar

You can pass in arguments and options to Drake by putting them at the end of the above command, e.g.:

$ java -jar drake.jar --version

A nicer way to run Drake

We recommend you "install" Drake in your environment so that you can run it by just typing "drake". For example, you could put an executable script called drake on your PATH, next to drake.jar, like this:

#!/bin/bash
# Run Drake from the jar that sits next to this script, passing all arguments through.
java -cp "$(dirname "$0")/drake.jar" drake.core "$@"

Drake documentation refers to running Drake as "drake". If you are instead running the uberjar, just replace "drake" with "java -jar drake.jar" in the examples.

Basic Usage

The wiki is the home for Drake's documentation, but here are simple notes on usage:

To build a specific target (and any out-of-date dependencies, if necessary):

$ drake mytarget

To build a target and everything that depends on it (a.k.a. "down-tree" mode):

$ drake ^mytarget

To build a specific target only, without any dependencies, up or down the tree:

$ drake =mytarget

To force build a target:

$ drake +mytarget

To force build a target and everything that depends on it (down-tree):

$ drake +^mytarget

To force build the entire workflow:

$ drake +...

To exclude targets:

$ drake ... -sometarget -anothertarget

By default, Drake looks for ./workflow.d. The simplest way to run your workflow is to name the file workflow.d and run Drake from the directory that contains it:

$ drake

To specify the workflow file explicitly, use -w or --workflow. E.g.:

$ drake -w /myworkflow/my-fav-workflow.d
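For orientation, the sketch below writes a small workflow file that the commands above could be pointed at. It assumes the step syntax described in the Drake wiki: an "output <- input" line followed by indented shell commands, with $INPUT and $OUTPUT set by Drake and ";" starting a comment. The file names and the scratch directory are hypothetical. The script only creates the files and does not invoke Drake itself.

```shell
#!/bin/bash
# Create a scratch directory with one hypothetical input file.
demo_dir=$(mktemp -d)
printf 'apple\nbanana\navocado\n' > "$demo_dir/fruits.txt"

# The quoted heredoc delimiter keeps $INPUT and $OUTPUT literal in the file,
# because Drake (not this script) is expected to expand them.
cat > "$demo_dir/workflow.d" <<'EOF'
; Keep only the lines that start with "a" (syntax assumed from the Drake wiki).
a_fruits.txt <- fruits.txt
  grep '^a' $INPUT > $OUTPUT

; Count the surviving lines.
count.txt <- a_fruits.txt
  wc -l < $INPUT > $OUTPUT
EOF

echo "Workflow written to $demo_dir/workflow.d"
echo "To run it: cd $demo_dir && drake"
```

Because the file is named workflow.d, running drake from inside that directory should pick it up without the -w option.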

Use drake --help for the full list of options.

Documentation, etc.

The wiki is the home for Drake's documentation.

A lot of work went into designing and specifying Drake. To prove it, here's the 60-page specification document. It can be downloaded as a PDF and treated like a user manual.

There are annotated workflow examples in the demos directory.

There's a Google Group for Drake.

If you like screencasts, check out this Drake walk-through video recorded by Artem, Drake's primary designer.

HDFS Compatibility

Drake provides HDFS support by allowing you to specify inputs and outputs like hdfs://my/big_file.txt.

If you plan to use Drake with HDFS, please see the wiki doc on HDFS Compatibility.

License

Source Copyright © 2012-2013 Factual, Inc.

Distributed under the Eclipse Public License, the same as Clojure uses. See the file COPYING.