wattsinabox / incubator-datafu

This project forked from apache/datafu

Mirror of Apache DataFu

License: Apache License 2.0

Apache DataFu

Apache DataFu is a collection of libraries for working with large-scale data in Hadoop. The project was inspired by the need for stable, well-tested libraries for data mining and statistics.

It consists of two libraries:

  • Apache DataFu Pig: a collection of user-defined functions for Apache Pig
  • Apache DataFu Hourglass: an incremental processing framework for Apache Hadoop in MapReduce

For more information please visit the website:

If you'd like to jump in and get started, check out the corresponding guides for each library:

Blog Posts

Presentations

Papers

Getting Help

Bugs and feature requests can be filed here. For other help please see the website.

Developers

Building the Code

To build DataFu from a git checkout or binary release, run:

./gradlew clean assemble

To build DataFu from a source release, you must first download the Gradle wrapper script used above. This bootstrapping step requires Gradle to be installed on the machine where you build. Gradle is available through most package managers or directly from its website. To bootstrap the wrapper, run:

gradle -b bootstrap.gradle

After the bootstrap script has completed, the regular gradlew instructions are available.
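
The two build paths above can be folded into one small script. The following is a minimal sketch, not part of DataFu: the function names needs_bootstrap and build_datafu are hypothetical, and it assumes that a missing gradlew file in the source directory means the bootstrap step is still needed and that gradle is on the PATH in that case.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with DataFu.

# Succeeds (exit status 0) when the given directory has no Gradle wrapper
# script yet, i.e. when the bootstrap step is still required.
needs_bootstrap() {
    dir="${1:-.}"
    [ ! -f "$dir/gradlew" ]
}

# Bootstraps the wrapper if it is missing, then runs the regular build.
# Runs in a subshell so the caller's working directory is left unchanged.
build_datafu() {
    (
        cd "${1:-.}" || exit 1
        if needs_bootstrap .; then
            gradle -b bootstrap.gradle || exit 1
        fi
        ./gradlew clean assemble
    )
}
```

With these definitions loaded, running build_datafu from the top of the source tree performs the bootstrap only when the wrapper script is absent.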

The datafu-pig JAR can be found under datafu-pig/build/libs by the name datafu-pig-x.y.z.jar, where x.y.z is the version. Similarly, the datafu-hourglass JAR can be found in the datafu-hourglass/build/libs directory.
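
Because the version is part of the file name, a script that needs the JAR usually has to discover it. The following is a minimal sketch under that assumption; list_module_jars is a hypothetical helper, not something DataFu provides, and it relies only on the directory layout described above.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with DataFu.
# Prints every file named <module>-*.jar under <root>/<module>/build/libs,
# one path per line. <root> defaults to the current directory.
list_module_jars() {
    module="$1"
    root="${2:-.}"
    for jar in "$root/$module/build/libs/$module"-*.jar; do
        # When nothing matches, the glob pattern is left as-is; skip it.
        if [ -f "$jar" ]; then
            printf '%s\n' "$jar"
        fi
    done
    return 0
}
```

For example, list_module_jars datafu-pig, run from the top of the checkout after a build, prints the path of the datafu-pig JAR. The pattern also matches any other JARs with the same prefix if the build produces them, so callers may need to filter the result.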

Generating Eclipse Files

This command generates the Eclipse project and classpath files:

./gradlew eclipse

To clean up the Eclipse files:

./gradlew cleanEclipse

Running the Tests

To run all the tests:

./gradlew test

To run only the DataFu Pig tests:

./gradlew :datafu-pig:test

To run only the DataFu Hourglass tests:

./gradlew :datafu-hourglass:test

To run tests for a single class, use the test.single property. For example, to run only the QuantileTests:

./gradlew :datafu-pig:test -Dtest.single=QuantileTests
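
The single-class command combines a module's test task with the test.single property. The sketch below only shows how the two pieces slot together; single_test_cmd is a hypothetical helper that prints the command rather than running it, and the module and class names are whatever the caller supplies.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with DataFu.
# Prints the Gradle command that runs one test class in one module.
#   $1: module name, e.g. datafu-pig
#   $2: test class name, e.g. QuantileTests
single_test_cmd() {
    printf './gradlew :%s:test -Dtest.single=%s\n' "$1" "$2"
}
```

Calling single_test_cmd datafu-pig QuantileTests prints the same command shown above.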

The tests can also be run from within Eclipse. Note that you may run out of heap space when executing tests there. To fix this, adjust the heap settings for the TestNG plugin: go to Eclipse->Preferences, select TestNG->Run/Debug, and add "-Xmx1G" to the JVM args.
