Zeppelin on DC/OS

This project provides docker images and marathon app definitions to run Apache Zeppelin on DC/OS. Images are published to Docker Hub.

This is a custom-built image based on the Mesosphere Spark Docker image, because the official Zeppelin Docker image does not contain the necessary libraries for Mesos and cannot be configured for the extra features that are possible with DC/OS (see below).

At the moment this project cannot be installed via the DC/OS Universe.

The Spark interpreter in Zeppelin currently cannot be used on a DC/OS EE cluster with strict security mode enabled. Please use a cluster configured with disabled or permissive security mode.

Quickstart

  1. dcos marathon app add deploy/zeppelin-minimal.json
  2. Open the DC/OS UI
  3. Wait until the zeppelin service is green / healthy
  4. Use the "Open Service" link to open Zeppelin in a new tab

How to install

  1. Use the marathon app definition in deploy/zeppelin-minimal.json as a basis
  2. Choose the extra features you want from the list below and modify the JSON accordingly, or use the extended zeppelin-volume-shiro-hdfs.json file
  3. Choose an image variant based on the Spark version
  4. If you use a Spark version other than the default, don't forget to change the environment variable SPARK_MESOS_EXECUTOR_DOCKER_IMAGE
  5. Change SPARK_CORES_MAX and SPARK_EXECUTOR_MEMORY depending on your cluster size and available resources
  6. Deploy to marathon
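
Steps 4 and 5 come down to three entries in the env section of the app definition. A minimal sketch, assuming placeholder values: the image name and the resource figures below are illustrative and not taken from this repo, so replace them with the Spark image matching your variant and with limits that fit your cluster.

```json
{
  "env": {
    "SPARK_MESOS_EXECUTOR_DOCKER_IMAGE": "<spark-image-matching-your-zeppelin-variant>",
    "SPARK_CORES_MAX": "4",
    "SPARK_EXECUTOR_MEMORY": "2g"
  }
}
```

Marathon expects env values to be strings, so the number is quoted. The 2g notation follows Spark's usual memory-size format; that the image passes it through unchanged is an assumption.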

Requirements

  • DC/OS 1.10 (OpenSource or Enterprise)
  • Optional: HDFS
  • Optional: Marathon-LB
  • Optional: HTTP Fileserver

Features

Persistent Notebooks

The Docker image is built to store notebook data on a persistent volume. To use it, add a volume definition to the app:

{
  "container": {
    "volumes": [
      {
        "containerPath": "/zeppelin-data",
        "external": {
          "name": "volume-zeppelin-data",
          "provider": "dvdi",
          "options": {
            "dvdi/driver": "rexray"
          }
        },
        "mode": "RW"
      }
    ]
  }
}

and set the following environment variables:

  • Set ZEPPELIN_DATA_VOLUME to the mount path of the volume (e.g. /zeppelin-data)
  • Set ZEPPELIN_NOTEBOOK_DIR to a subpath of the volume (e.g. /zeppelin-data/notebook)
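
With the example paths from the list above, the two variables might look like this in the app definition. This is only a sketch; adjust the paths if you mount the volume elsewhere.

```json
{
  "env": {
    "ZEPPELIN_DATA_VOLUME": "/zeppelin-data",
    "ZEPPELIN_NOTEBOOK_DIR": "/zeppelin-data/notebook"
  }
}
```

ZEPPELIN_DATA_VOLUME has to match the containerPath of the volume definition.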

It is recommended to use an external persistent volume so that data is not lost even when a node breaks down.

User Management

For authentication and authorization Zeppelin uses Shiro. It is configured using a file shiro.ini. The docker image searches for this file in the sandbox directory on startup. You can provide it either via the fetch file mechanism or as a secret (recommended, only available on DC/OS EE).

To use a secret execute the following steps:

  1. Create your custom shiro.ini file (there is an example file in the deploy folder of this repo).
  2. Create a secret from this file using the dcos cli: dcos security secrets create -f shiro.ini zeppelin/shiro-conf
  3. Add a secrets definition to the app:
{
  "secrets": {
    "shiroconf": {
      "source": "zeppelin/shiro-conf"
    }
  }
}
  4. Provide the secret as an environment variable:
{
  "env": {
    "ZEPPELIN_SHIRO_CONF": {
      "secret": "shiroconf"
    }
  }
}

To use the fetch file mechanism:

  1. Create your custom shiro.ini file (there is an example file in the deploy folder of this repo).
  2. Upload your shiro.ini file to a location accessible via http from your cluster.
  3. Add a fetch definition to your app:
{
  "fetch": [
    {"uri": "http://my.fileserver/zeppelin/shiro.ini", "extract": false, "executable": false, "cache": false }
  ]
}

HDFS

To access HDFS from Zeppelin you need to provide the files hdfs-site.xml and core-site.xml. If you installed the HDFS framework from the Universe, you just need to add the following fetch definition to the app:

{
  "fetch": [
    { "uri": "http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints/hdfs-site.xml", "extract": false, "executable": false, "cache": false },
    { "uri": "http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints/core-site.xml", "extract": false, "executable": false, "cache": false }
  ]
}

Extra configuration

You can provide your own custom zeppelin-site.xml:

  1. Create your custom zeppelin-site.xml
  2. Make sure not to change the default bind port for zeppelin (8080) as on startup this will be replaced with the host port of the container
  3. Upload your zeppelin-site.xml file to a location accessible via http from your cluster.
  4. Add a fetch definition to your app:
{
  "fetch": [
    {"uri": "http://my.fileserver/zeppelin/zeppelin-site.xml", "extract": false, "executable": false, "cache": false }
  ]
}
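
An app definition has only one fetch array, so if you combine several of the features above, their entries need to be merged into that single array rather than added as separate fetch keys. A sketch combining the shiro.ini, HDFS and zeppelin-site.xml examples from this document (my.fileserver is the same placeholder host as in the examples):

```json
{
  "fetch": [
    { "uri": "http://my.fileserver/zeppelin/shiro.ini", "extract": false, "executable": false, "cache": false },
    { "uri": "http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints/hdfs-site.xml", "extract": false, "executable": false, "cache": false },
    { "uri": "http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints/core-site.xml", "extract": false, "executable": false, "cache": false },
    { "uri": "http://my.fileserver/zeppelin/zeppelin-site.xml", "extract": false, "executable": false, "cache": false }
  ]
}
```

Leave out the entries for features you do not use.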

External access

By default, the provided Marathon app definition allows access to Zeppelin via the admin router proxy ("Open Service" in the DC/OS UI). If you have marathon-lb installed, you can also use it. Just add the following labels to the app:

{
  "labels": {
    "HAPROXY_GROUP": "external",
    "HAPROXY_0_VHOST": "zeppelin.my.domain"
  }
}

Interpreters

There are two variants of the Docker image, based on the download variants on the Zeppelin homepage:

  • all: With all interpreters
  • netinst: With only the spark interpreter

By default the app definitions use the all variant. If you want the netinst variant, just change the -all in the docker image tag to -netinst.

Building

./build.sh

The build script builds Docker images for different Spark versions, each with either all Zeppelin interpreters (all) or just the Spark interpreter (netinst). The Dockerfile uses ARG before FROM, so you need at least Docker 17.05 to build.

Acknowledgments

This project is based on the official mesosphere spark docker image.

Contributing

If you find a bug or have a feature request, just open an issue on GitHub. Or, if you want to contribute something, feel free to open a pull request.
