epos's Introduction

Join the chat at https://gitter.im/lensacom/dask.mesos

Apache Mesos backend for the Dask scheduling library. Run distributed Python dask workflows on your Mesos cluster.

Notable Features

  • run tasks distributed across the cluster in Docker containers
  • specify resource requirements per task
  • bin packing for optimized resource utilization
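
As a rough illustration of the bin-packing idea, the sketch below places tasks onto resource offers with a first-fit-decreasing heuristic. It is not the packing code used by daskos or satyr; the Task and Offer tuples and the first_fit_decreasing helper are hypothetical names made up for this example.

```python
from collections import namedtuple

# Hypothetical stand-ins for Mesos tasks and resource offers.
Task = namedtuple('Task', ['name', 'cpus', 'mem'])
Offer = namedtuple('Offer', ['name', 'cpus', 'mem'])


def first_fit_decreasing(tasks, offers):
    """Place each task on the first offer with enough free cpus and mem."""
    free = {o.name: [o.cpus, o.mem] for o in offers}
    placement = {o.name: [] for o in offers}
    unplaced = []
    # Largest tasks first, so small ones fill the remaining gaps.
    for task in sorted(tasks, key=lambda t: (t.cpus, t.mem), reverse=True):
        for offer in offers:
            cpus, mem = free[offer.name]
            if task.cpus <= cpus and task.mem <= mem:
                free[offer.name] = [cpus - task.cpus, mem - task.mem]
                placement[offer.name].append(task.name)
                break
        else:
            # No offer had room for this task.
            unplaced.append(task.name)
    return placement, unplaced


tasks = [Task('add', 0.1, 64), Task('mul', 0.3, 128), Task('big', 2.0, 4096)]
offers = [Offer('slave-1', 0.5, 256), Offer('slave-2', 1.0, 512)]
placement, unplaced = first_fit_decreasing(tasks, offers)
print(placement, unplaced)
```

Here both small tasks end up on the first offer and the oversized task stays unplaced until a larger offer arrives.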

Installation

Prerequisites: satyr, dask, toolz. All of them are installed by the following command:

pip install dask.mesos

Alternatively, use the lensa/dask.mesos Docker image.

Configuration:

  • MESOS_MASTER=zk://127.0.0.1:2181/mesos
  • ZOOKEEPER_HOST=127.0.0.1:2181
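
If these two settings are read from the process environment (an assumption; the list above only gives the values), they can also be set from Python before the executor is created. A minimal sketch using only the standard library:

```python
import os

# Assumption: daskos picks these settings up from the environment.
# setdefault keeps any value that is already exported in the shell.
os.environ.setdefault('MESOS_MASTER', 'zk://127.0.0.1:2181/mesos')
os.environ.setdefault('ZOOKEEPER_HOST', '127.0.0.1:2181')

print(os.environ['MESOS_MASTER'])
print(os.environ['ZOOKEEPER_HOST'])
```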

Example

from __future__ import absolute_import, division, print_function

from daskos import mesos, MesosExecutor


@mesos(cpus=0.1, mem=64)
def add(x, y):
    """Run add on mesos with specified resources"""
    return x + y


@mesos(cpus=0.3, mem=128, image='lensa/daskos')
def mul(x, y):
    """Run mul on mesos in specified docker image"""
    return x * y


with MesosExecutor(name='dask') as executor:
    """This context handles Mesos scheduler's lifecycle"""
    a, b = 23, 89
    alot = add(a, b)
    gigalot = mul(alot, add(10, 2))

    gigalot.compute(get=executor.get)  # (a + b) * (10 + 2)
    executor.compute([alot, gigalot])  # list of futures

Configuring daskos Tasks

You can configure your Mesos tasks in the decorator. Currently the following options are available:

  • cpus: The number of CPUs to use for the task.
  • mem: Memory in MB to reserve for the task.
  • disk: The amount of disk to use for the task.
  • image: A Docker image name. If not set, the Mesos containerizer will be used.

Both mem and cpus default to low values set in satyr, so you're encouraged to override them here. More options, such as constraints and other resources, are on the way.
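
To make the "defaults plus overrides" behaviour concrete, here is a small sketch of how per-task options could be merged with defaults. It does not reproduce the satyr or daskos code: the DEFAULTS values and the task_resources helper are hypothetical, and the real defaults in satyr may differ.

```python
# Hypothetical defaults; the real values are defined in satyr.
DEFAULTS = {'cpus': 0.1, 'mem': 64, 'disk': 0, 'image': None}


def task_resources(**overrides):
    """Merge per-task options over the defaults, rejecting unknown keys."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise TypeError('unsupported options: %s' % sorted(unknown))
    merged = dict(DEFAULTS)
    merged.update(overrides)
    return merged


print(task_resources(cpus=0.3, mem=128, image='lensa/daskos'))
```

Rejecting unknown keys early means a misspelled option such as memory=128 fails loudly instead of silently falling back to the default.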

epos's Issues

Chronos tasks w/o docker

We use the $MESOS_SANDBOX variable to locate the directory where cloudpickle should reside, but when we run our tasks cloudpickle cannot be found. The variable and its contents should be checked.
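
One way to do that check from inside a task is sketched below. The find_in_sandbox helper is a hypothetical name, and matching entries by a "cloudpickle" prefix is an assumption about how the module is shipped into the sandbox.

```python
import os


def find_in_sandbox(name='cloudpickle', env=os.environ):
    """Return the sandbox entries whose file name starts with `name`."""
    sandbox = env.get('MESOS_SANDBOX')
    if not sandbox:
        raise RuntimeError('MESOS_SANDBOX is not set')
    if not os.path.isdir(sandbox):
        raise RuntimeError('MESOS_SANDBOX is not a directory: %r' % sandbox)
    # An empty list means the variable is fine but the module is missing.
    return sorted(e for e in os.listdir(sandbox) if e.startswith(name))
```

Calling find_in_sandbox() at the start of a task separates the two failure modes: an unset or wrong variable raises, while a correct sandbox without cloudpickle returns an empty list.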

Add extras_require to setup.py

extras_require = {
    # TODO revisit
    'backends': ['odo', 'pywebhdfs', 'pymongo', 'sqlalchemy', 'paramiko',
                 'cassandra-driver'],
    # dask.mesos should be enough
    'mesos': ['dask.mesos', 'satyr', 'toolz', 'mesos.native'],
    'spark': ['pyspark', 'py4j']
}
extras_require['complete'] = sorted(set(sum(extras_require.values(), [])))
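
For reference, the snippet below evaluates the same expression on its own to show what the 'complete' extra would contain. It only repeats the dictionary above; nothing here is taken from an actual setup.py.

```python
extras_require = {
    'backends': ['odo', 'pywebhdfs', 'pymongo', 'sqlalchemy', 'paramiko',
                 'cassandra-driver'],
    'mesos': ['dask.mesos', 'satyr', 'toolz', 'mesos.native'],
    'spark': ['pyspark', 'py4j'],
}

# sum(..., []) concatenates the lists; set() drops duplicates.
complete = sorted(set(sum(extras_require.values(), [])))
print(complete)
```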

Setting log level

In functions decorated with @mesos, @marathon, etc., the logging level should be settable from the outside, perhaps inherited from the calling environment.
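
A possible shape for this, sketched with the standard library only: a wrapper that reads the level from an environment variable when the function runs. The with_log_level decorator and the LOG_LEVEL variable name are hypothetical and not part of any existing API.

```python
import functools
import logging
import os


def with_log_level(fn):
    """Run fn with the root logger at the level named by LOG_LEVEL, if set."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        name = os.environ.get('LOG_LEVEL', '').upper()
        # getLevelName maps a known level name to its numeric value.
        level = logging.getLevelName(name) if name else None
        root = logging.getLogger()
        previous = root.level
        if isinstance(level, int):
            root.setLevel(level)
        try:
            return fn(*args, **kwargs)
        finally:
            # Restore the level so the caller's logging is unaffected.
            root.setLevel(previous)
    return wrapper
```

Reading the variable at call time rather than at decoration time matters when the function is pickled and executed in another process, since the remote environment is the one that should decide.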

Odo Cassandra Backend

To support the following conversions:

odo(spark_dataframe, 'cql://cassandrahost::cassandra_table')
odo(pandas_dataframe, 'cql://cassandrahost::cassandra_table')
odo('cql://cassandrahost::cassandra_table', sqlContext)   # spark dataframe
odo('cql://cassandrahost::cassandra_table', pd.DataFrame) # pandas dataframe

Odo Parquet Format

To support the following operations:

  • odo(spark_dataframe, 'hdfs://path/to/output.parquet')
  • odo(pandas_dataframe, 'hdfs://path/to/output.parquet')
  • odo('hdfs://path/to/input.parquet', sqlContext) # spark dataframe
  • odo('hdfs://path/to/input.parquet', pd.DataFrame) # pandas dataframe

Related odo issue: blaze/odo#31
