Giter Site home page Giter Site logo

sanha / nemo Goto Github PK

View Code? Open in Web Editor NEW

This project forked from snuspl/nemo

0.0 2.0 0.0 21.46 MB

Nemo: A flexible data processing system

Home Page: https://snuspl.github.io/nemo/

License: Apache License 2.0

Shell 0.36% Python 1.68% Java 97.96%

nemo's Introduction

Nemo

Build Status

A Data Processing System for Flexible Employment With Different Deployment Characteristics.

Online Documentation

Details about Nemo and its development can be found in:

Nemo prerequisites and setup

Prerequisites

  • Java 8
  • Maven
  • YARN settings
  • Protobuf 2.5.0
    • On Ubuntu 14.04 LTS and its point releases:

      sudo apt-get install protobuf-compiler
    • On Ubuntu 16.04 LTS and its point releases:

      sudo add-apt-repository ppa:snuspl/protobuf-250
      sudo apt update
      sudo apt install protobuf-compiler=2.5.0-9xenial1
    • On macOS:

      brew tap homebrew/versions
      brew install [email protected]
    • Or build from source:

    • To check for a successful installation of version 2.5.0, run protoc --version

Installing Nemo

  • Run all tests and install: mvn clean install -T 2C
  • Run only unit tests and install: mvn clean install -DskipITs -T 2C

Running Beam applications

Running an external Beam application

  • Use run_external_app.sh instead of run.sh
  • Set the first argument the path to the external Beam application jar
./bin/run_external_app.sh \
    `pwd`/nemo_app/target/bd17f-1.0-SNAPSHOT.jar \
    -job_id mapreduce \
    -executor_json `pwd`/examples/resources/sample_executor_resources.json \
    -user_main MapReduce \
    -user_args "`pwd`/mr_input_data `pwd`/nemo_output/output_data"

Configurable options

  • -job_id: ID of the Beam job
  • -user_main: Canonical name of the Beam application
  • -user_args: Arguments that the Beam application accepts
  • -optimization_policy: Canonical name of the optimization policy to apply to a job DAG in Nemo Compiler
  • -deploy_mode: yarn is supported(default value is local)

Examples

## MapReduce example
./bin/run.sh \
	-job_id mr_default \
	-executor_json `pwd`/examples/resources/sample_executor_resources.json \
	-optimization_policy edu.snu.nemo.compiler.optimizer.policy.DefaultPolicy \
	-user_main edu.snu.nemo.examples.beam.MapReduce \
	-user_args "`pwd`/examples/resources/sample_input_mr `pwd`/examples/resources/sample_output_mr"

## YARN cluster example
./bin/run.sh \
	-deploy_mode yarn \
  	-job_id mr_pado \
	-executor_json `pwd`/examples/resources/sample_executor_resources.json \
  	-user_main edu.snu.nemo.examples.beam.MapReduce \
  	-optimization_policy edu.snu.nemo.compiler.optimizer.policy.PadoPolicy \
  	-user_args "hdfs://v-m:9000/sample_input_mr hdfs://v-m:9000/sample_output_mr"

Resource Configuration

-executor_json command line option can be used to provide a path to the JSON file that describes resource configuration for executors. Its default value is config/default.json, which initializes one of each Transient, Reserved, and Compute executor, each of which has one core and 1024MB memory.

Configurable options

  • num (optional): Number of containers. Default value is 1
  • type: Three container types are supported:
    • Transient : Containers that store eviction-prone resources. When batch jobs use idle resources in Transient containers, they can be arbitrarily evicted when latency-critical jobs attempt to use the resources.
    • Reserved : Containers that store eviction-free resources. Reserved containers are used to reliably store intermediate data which have high eviction cost.
    • Compute : Containers that are mainly used for computation.
  • memory_mb: Memory size in MB
  • capacity: Number of TaskGroups that can be run in an executor. Set this value to be the same as the number of CPU cores of the container.

Examples

[
  {
    "num": 12,
    "type": "Transient",
    "memory_mb": 1024,
    "capacity": 4
  },
  {
    "type": "Reserved",
    "memory_mb": 1024,
    "capacity": 2
  }
]

This example configuration specifies

  • 12 transient containers with 4 cores and 1024MB memory each
  • 1 reserved container with 2 cores and 1024MB memory

Monitoring your job using web UI

Nemo Compiler and Engine can store JSON representation of intermediate DAGs.

  • -dag_dir command line option is used to specify the directory where the JSON files are stored. The default directory is ./dag. Using our online visualizer, you can easily visualize a DAG. Just drop the JSON file of the DAG as an input to it.

Examples

./bin/run.sh \
	-job_id als \
	-executor_json `pwd`/examples/resources/sample_executor_resources.json \
  	-user_main edu.snu.nemo.examples.beam.AlternatingLeastSquare \
  	-optimization_policy edu.snu.nemo.compiler.optimizer.policy.PadoPolicy \
  	-dag_dir "./dag/als" \
  	-user_args "`pwd`/examples/resources/sample_input_als 10 3"

nemo's People

Contributors

differentsc avatar ejjeong avatar gwsshs22 avatar jeongyooneo avatar johnyangk avatar jooykim avatar mhkwon924 avatar sanha avatar seojangho avatar skystar-p avatar taegeonum avatar wonook avatar wynot12 avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.