This project provides Docker images and Marathon app definitions to run Apache Zeppelin on DC/OS. Images are published to Docker Hub.
This is a custom-built image based on the Mesosphere Spark Docker image, because the official Zeppelin Docker image does not contain the necessary libraries for Mesos and cannot be configured for the extra features that DC/OS makes possible (see below).
At the moment this project cannot be installed via the DC/OS Universe.
The Spark interpreter in Zeppelin currently cannot be used on a DC/OS EE cluster with strict security mode enabled. Please use a cluster configured with disabled or permissive security mode.
To get started, deploy the minimal app definition with the DC/OS CLI:
dcos marathon app add deploy/zeppelin-minimal.json
- Open the DC/OS UI
- Wait until the Zeppelin service is green (healthy)
- Use the "Open Service" link to open Zeppelin in a new tab
- Use the Marathon app definition in deploy/zeppelin-minimal.json as a starting point
- Choose the extra features you want from the list below and modify the JSON accordingly, or start from the extended zeppelin-volume-shiro-hdfs.json file instead
- Choose an image variant based on the Spark version
- If you use a Spark version other than the default, don't forget to change the environment variable SPARK_MESOS_EXECUTOR_DOCKER_IMAGE to match
- Change SPARK_CORES_MAX and SPARK_EXECUTOR_MEMORY depending on your cluster size and available resources
- Deploy to Marathon
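The customization steps above mostly come down to editing the app's env object. Here is a minimal sketch. It assumes you edit the app definition as a Python dict, for example one loaded with json.load from deploy/zeppelin-minimal.json. The helper name configure_spark is hypothetical, and the image name and resource values are placeholders, not recommendations:

```python
import copy


def configure_spark(app: dict, executor_image: str, cores_max: int,
                    executor_memory: str) -> dict:
    """Return a copy of a Marathon app definition with the Spark settings applied.

    Hypothetical helper: the variable names come from this README,
    the values are whatever fits your cluster.
    """
    patched = copy.deepcopy(app)  # leave the caller's definition untouched
    env = patched.setdefault("env", {})
    # Must match the Spark version of the chosen Zeppelin image variant.
    env["SPARK_MESOS_EXECUTOR_DOCKER_IMAGE"] = executor_image
    # Marathon expects plain environment values to be strings.
    env["SPARK_CORES_MAX"] = str(cores_max)
    env["SPARK_EXECUTOR_MEMORY"] = executor_memory  # Spark memory string, e.g. "2g"
    return patched


# Example with placeholder values:
app = {"id": "/zeppelin", "env": {}}
app = configure_spark(app, "<spark-image>:<tag>", 4, "2g")
```

Write the result back with json.dump and deploy it as before.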
Requirements:
- DC/OS 1.10 (Open Source or Enterprise)
- Optional: HDFS
- Optional: Marathon-LB
- Optional: HTTP file server
The Docker image is built to store notebook data on a persistent volume. To use one, add a volume definition to the app:
{
"container": {
"volumes": [
{
"containerPath": "/zeppelin-data",
"external": {
"name": "volume-zeppelin-data",
"provider": "dvdi",
"options": {
"dvdi/driver": "rexray"
}
},
"mode": "RW"
}
]
}
}
Then set the following environment variables:
- Set ZEPPELIN_DATA_VOLUME to the mount path of the volume (e.g. /zeppelin-data)
- Set ZEPPELIN_NOTEBOOK_DIR to a subpath of the volume (e.g. /zeppelin-data/notebook)
We recommend an external persistent volume, so that the data survives even if a node fails.
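The volume definition and the two environment variables have to agree on the mount path. The following sketch adds them together through a hypothetical helper, add_data_volume, and rejects a notebook directory that lies outside the volume. It reuses the volume name, the dvdi provider and the REX-Ray driver from the JSON above. Those values are assumptions about your cluster, so adjust them if you use a different volume driver:

```python
import copy
import posixpath


def add_data_volume(app: dict, volume_name: str = "volume-zeppelin-data",
                    mount_path: str = "/zeppelin-data",
                    notebook_subdir: str = "notebook") -> dict:
    """Return a copy of the app with an external volume and matching env vars."""
    root = posixpath.normpath(mount_path)
    notebook_dir = posixpath.normpath(posixpath.join(root, notebook_subdir))
    # ZEPPELIN_NOTEBOOK_DIR must live on the volume, otherwise notebooks are not persisted.
    if not notebook_dir.startswith(root + "/"):
        raise ValueError(f"{notebook_dir} is not below the volume mount {root}")

    patched = copy.deepcopy(app)
    volumes = patched.setdefault("container", {}).setdefault("volumes", [])
    volumes.append({
        "containerPath": root,
        "external": {
            "name": volume_name,
            "provider": "dvdi",
            "options": {"dvdi/driver": "rexray"},
        },
        "mode": "RW",
    })
    env = patched.setdefault("env", {})
    env["ZEPPELIN_DATA_VOLUME"] = root
    env["ZEPPELIN_NOTEBOOK_DIR"] = notebook_dir
    return patched
```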
Zeppelin uses Shiro for authentication and authorization. Shiro is configured through a file named shiro.ini. On startup, the Docker image searches for this file in the sandbox directory. You can provide it either via the fetch mechanism or as a secret (recommended, only available on DC/OS EE).
To use a secret, follow these steps:
- Create your custom shiro.ini file (there is an example file in the deploy folder of this repo).
- Create a secret from this file using the DC/OS CLI:
dcos security secrets create -f shiro.ini zeppelin/shiro-conf
- Add a secrets definition to the app:
{
"secrets": {
"shiroconf": {
"source": "zeppelin/shiro-conf"
}
}
}
- Expose the secret as an environment variable:
{
"env": {
"ZEPPELIN_SHIRO_CONF": {
"secret": "shiroconf"
}
}
}
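The key under "secrets" ("shiroconf") is only a local name that the env entry refers to. The "source" value is the secret's path in the DC/OS secret store. A mismatch between the key and the env reference is easy to miss. This hypothetical sketch adds both entries and lists any environment variables that point to an undefined secret key:

```python
import copy


def add_shiro_secret(app: dict, secret_path: str = "zeppelin/shiro-conf",
                     secret_key: str = "shiroconf") -> dict:
    """Return a copy of the app with the shiro.ini secret wired into ZEPPELIN_SHIRO_CONF."""
    patched = copy.deepcopy(app)
    # Local key -> path of the secret created with "dcos security secrets create".
    patched.setdefault("secrets", {})[secret_key] = {"source": secret_path}
    # The env entry references the local key, not the secret store path.
    patched.setdefault("env", {})["ZEPPELIN_SHIRO_CONF"] = {"secret": secret_key}
    return patched


def undefined_secret_refs(app: dict) -> list:
    """Names of env vars whose {"secret": ...} reference has no entry in "secrets"."""
    secrets = app.get("secrets", {})
    return sorted(
        name for name, value in app.get("env", {}).items()
        if isinstance(value, dict) and value.get("secret") not in secrets
    )
```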
To use the fetch mechanism instead:
- Create your custom shiro.ini file (there is an example file in the deploy folder of this repo).
- Upload your shiro.ini file to a location accessible via HTTP from your cluster.
- Add a fetch definition to your app:
{
"fetch": [
{"uri": "http://my.fileserver/zeppelin/shiro.ini", "extract": false, "executable": false, "cache": false }
]
}
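You can also add fetch entries programmatically. This minimal sketch uses a hypothetical helper, add_fetch_uri, that appends a URI with the same flags as above and skips duplicates. The URL is the placeholder from the example. Setting "cache" to false presumably ensures that an updated shiro.ini is downloaded again on restart rather than served from the Mesos fetcher cache:

```python
import copy


def add_fetch_uri(app: dict, uri: str) -> dict:
    """Return a copy of the app with a fetch entry for uri (no duplicates)."""
    patched = copy.deepcopy(app)
    fetch = patched.setdefault("fetch", [])
    if not any(entry.get("uri") == uri for entry in fetch):
        # Same flags as the JSON above: plain file, not executable, not cached.
        fetch.append({"uri": uri, "extract": False, "executable": False, "cache": False})
    return patched


app = add_fetch_uri({"id": "/zeppelin"}, "http://my.fileserver/zeppelin/shiro.ini")
```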
To access HDFS from Zeppelin, you need to provide the files hdfs-site.xml and core-site.xml. If you installed the HDFS framework from the Universe, just add the following fetch definition to the app:
{
"fetch": [
{ "uri": "http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints/hdfs-site.xml", "extract": false, "executable": false, "cache": false },
{ "uri": "http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints/core-site.xml", "extract": false, "executable": false, "cache": false }
]
}
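Both HDFS URIs follow the same pattern. This hypothetical sketch builds the fetch entries from the service name. It assumes the HDFS service was installed under the default name hdfs. Other names without folders are assumed to map into the host name the same way, which is an assumption rather than something this README confirms:

```python
HDFS_CONFIG_FILES = ("hdfs-site.xml", "core-site.xml")


def hdfs_fetch_entries(service_name: str = "hdfs") -> list:
    """Fetch entries for the HDFS client config served by the HDFS framework's API."""
    base = f"http://api.{service_name}.marathon.l4lb.thisdcos.directory/v1/endpoints"
    return [
        {"uri": f"{base}/{name}", "extract": False, "executable": False, "cache": False}
        for name in HDFS_CONFIG_FILES
    ]


entries = hdfs_fetch_entries()
```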
You can provide your own custom zeppelin-site.xml:
- Create your custom zeppelin-site.xml
- Do not change the default Zeppelin bind port (8080), because on startup it is replaced with the host port of the container
- Upload your zeppelin-site.xml file to a location accessible via HTTP from your cluster.
- Add a fetch definition to your app:
{
"fetch": [
{"uri": "http://my.fileserver/zeppelin/zeppelin-site.xml", "extract": false, "executable": false, "cache": false }
]
}
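Because the bind port is substituted on startup, it may be worth checking a custom zeppelin-site.xml before you upload it. This sketch uses Python's standard XML parser. zeppelin.server.port is Zeppelin's bind-port property, and the check treats a missing property as the default 8080. The function names are hypothetical:

```python
import xml.etree.ElementTree as ET


def zeppelin_server_port(site_xml: str):
    """Return the value of zeppelin.server.port, or None if the property is absent."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "zeppelin.server.port":
            return prop.findtext("value", "").strip()
    return None


def check_bind_port(site_xml: str) -> None:
    """Raise if the file sets a bind port other than 8080."""
    port = zeppelin_server_port(site_xml)
    if port is not None and port != "8080":
        raise ValueError(f"zeppelin.server.port is {port}; keep it at 8080")
```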
By default, the provided Marathon app definition allows access to Zeppelin via the Admin Router proxy ("Open Service" in the DC/OS UI). If you have Marathon-LB installed, you can use it as well. Just add the following labels to the app:
{
"labels": {
"HAPROXY_GROUP": "external",
"HAPROXY_0_VHOST": "zeppelin.my.domain"
}
}
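Generating the labels with json.dumps instead of editing them by hand avoids invalid JSON, such as trailing commas. This sketch uses a hypothetical helper, and zeppelin.my.domain is the placeholder host name from above:

```python
import copy
import json


def expose_via_marathon_lb(app: dict, vhost: str, group: str = "external") -> dict:
    """Return a copy of the app definition with Marathon-LB labels added."""
    patched = copy.deepcopy(app)
    labels = patched.setdefault("labels", {})
    labels["HAPROXY_GROUP"] = group    # Marathon-LB group that should expose the app
    labels["HAPROXY_0_VHOST"] = vhost  # virtual host for the app's first service port
    return patched


print(json.dumps(expose_via_marathon_lb({"id": "/zeppelin"}, "zeppelin.my.domain"), indent=2))
```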
There are two variants of the Docker image, based on the download variants on the Zeppelin homepage:
- all: with all interpreters
- netinst: with only the Spark interpreter
By default the app definitions use the all variant. If you want the netinst variant, just change -all in the Docker image tag to -netinst.
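In a Marathon app definition, the image is set in container.docker.image. This hypothetical sketch swaps the variant suffix of the tag. It assumes the tag ends in -all or -netinst as described above, and the image name in the example is a placeholder:

```python
VARIANTS = ("all", "netinst")


def switch_variant(image: str, variant: str) -> str:
    """Return the image reference with its -all/-netinst tag suffix replaced."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    repo, sep, tag = image.rpartition(":")  # split off the tag after the last colon
    if not sep:
        raise ValueError(f"image has no tag: {image!r}")
    for old in VARIANTS:
        suffix = "-" + old
        if tag.endswith(suffix):
            return f"{repo}:{tag[:-len(suffix)]}-{variant}"
    raise ValueError(f"tag does not end in -all or -netinst: {tag!r}")


app = {"container": {"docker": {"image": "example/zeppelin:1.0-all"}}}  # placeholder image
docker = app["container"]["docker"]
docker["image"] = switch_variant(docker["image"], "netinst")
```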
To build the images yourself, run:
./build.sh
The build script builds Docker images for different Spark versions, each with Zeppelin including either all interpreters (all) or only the Spark interpreter (netinst). The Dockerfile uses ARG before FROM, so you need at least Docker 17.05 to build.
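You can check the Docker 17.05 requirement before running the build. This minimal sketch compares a version string against 17.05. The string could be, for example, the output of docker version --format '{{.Server.Version}}'. The function name is hypothetical, and only the major and minor numbers are compared:

```python
import re

MIN_DOCKER = (17, 5)  # ARG before FROM requires Docker 17.05 or later


def supports_arg_before_from(version: str) -> bool:
    """True if a Docker version string such as "17.05.0-ce" is at least 17.05."""
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if not match:
        raise ValueError(f"unrecognized Docker version: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= MIN_DOCKER
```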
This project is based on the official Mesosphere Spark Docker image.
If you find a bug or have a feature request, just open an issue on GitHub. Or, if you want to contribute something, feel free to open a pull request.