This repository contains a script for running Spark in CoreOS. It was adapted from https://github.com/amplab/docker-scripts.
First, you need pull a few Docker Images:
./spark.sh pull_images
Before you start a Spark Cluster, you need setup a DNS service. Here we use docker-dns instead of dnsmasq to provide DNS inside Docker containers. Run the following command to fetch pre-compiled docker-dns binary from github and install it (and system unit file) to CoreOS:
./spark.sh install_docker_dns
Use the same script, you can start Spark master, workers and shell.
To start the master, run:
./spark.sh start_master
To start the workers, run:
./spark.sh start_worker 3
where 3
is the number of workers.
./spark.sh start_shell
Attach to the shell container via this command,
sudo docker attach shell
If the screen appears to stay blank just hit return to get to the prompt.
scala> val textFile = sc.textFile("hdfs://master:9000/user/hdfs/test.txt")
scala> textFile.count()
scala> textFile.map({line => line}).collect()