This is a Docker image for running Spark on Kubernetes. It produces three main images:
spark-master
- Runs a Spark master in Standalone mode and exposes one port for Spark and one port for the web UI.
spark-worker
- Runs a Spark worker in Standalone mode and connects to the Spark master via the DNS name spark-master.
zeppelin
- Runs a Zeppelin web notebook, connects to the Spark master via the DNS name spark-master, and exposes a port for the web UI.
Two further images are also pushed:
spark-base
- The base image for spark-master and spark-worker; on its own it starts nothing.
spark-driver
- Like the zeppelin image, this image lets you run tools such as pyspark against spark-master, but it is lighter weight than the zeppelin image.
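As a rough sketch of what connecting from the spark-driver image might look like, the shell functions below build a Spark standalone master URL of the form spark://host:port and pass it to the --master option of pyspark. The DNS name spark-master comes from this README; the port 7077 is only Spark's default standalone master port and is an assumption here, as are the hypothetical function names spark_master_url and start_pyspark.

```shell
# Hypothetical helpers for the spark-driver image.
# Assumptions (not stated in this README): the shell runs in a spark-driver
# container that can resolve the DNS name spark-master, and the master listens
# on Spark's default standalone port, 7077.

# Print a standalone master URL; host and port can be overridden as arguments.
spark_master_url() {
  local host="${1:-spark-master}"
  local port="${2:-7077}"
  printf 'spark://%s:%s\n' "$host" "$port"
}

# Start an interactive PySpark session against the master.
start_pyspark() {
  pyspark --master "$(spark_master_url "$@")"
}
```

From a shell inside the spark-driver container, running start_pyspark would open a PySpark session against the master; if your deployment uses a different name or port, pass them explicitly, e.g. start_pyspark my-master 7078.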
To build the Docker image:
First, clone the GitHub repository for the Spark Docker images.
git clone https://github.com/wesleydias/spark-docker-images
Next, change into the spark-docker folder and set the DOCKERID environment variable to your Docker ID.
cd spark-docker
export DOCKERID=<docker-id>
Then build the Docker image.
docker image build --tag $DOCKERID/spark:2.0 .
Finally, push the built image to Docker Hub (you must be logged in with docker login first).
docker image push $DOCKERID/spark:2.0
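The build and push steps above can also be wrapped in a small shell helper so that a missing DOCKERID is caught before Docker runs. This is only a sketch: the function names image_tag and build_and_push are hypothetical, and it assumes you run it from the folder containing the Dockerfile and have already authenticated with docker login.

```shell
# Hypothetical helpers wrapping the build and push steps.
# Assumes the current folder contains the Dockerfile and that
# `docker login` has already been run for the target account.

# Print the image reference, or fail with a hint if DOCKERID is unset or empty.
image_tag() {
  if [ -z "${DOCKERID:-}" ]; then
    echo "DOCKERID is not set; run: export DOCKERID=<docker-id>" >&2
    return 1
  fi
  printf '%s/spark:2.0\n' "$DOCKERID"
}

# Build the image from the Dockerfile in the current folder, then push it.
build_and_push() {
  local tag
  tag="$(image_tag)" || return 1
  docker image build --tag "$tag" . || return 1
  docker image push "$tag"
}
```

For example, after export DOCKERID=<docker-id>, running build_and_push performs both steps; if DOCKERID is empty, image_tag prints a hint and neither Docker command runs.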