✨ Sparkler is a tool to easily deploy a Master/Worker computing grid at home. It interfaces with the Apache Spark framework.
- Simple, non-technical app interface
- Automated, fault-tolerant, distributed computing at home
- High-performance computing framework for multiple languages
So far, it has only been tested with:
- Windows 8.1, 10, Ubuntu 16.04
- Java JRE 1.8.0
- Apache Spark 2.11.8 (pre-built)
- winutils (if using Windows OS)
You can either get an installer from the latest release or clone this repository:

```bash
git clone https://github.com/espetro/sparkler.git
```
Download the Apache Spark distribution and unzip it. Then copy sparkler.jar (together with its .exe or .sh launcher, if needed) into the Spark root folder.
**How to deploy a master**

Start the app and press the `Start` button on the Master tab. To stop it, press `Finish`. You can see the executors (jobs) and connected workers at http://localhost:8080/. The master's Spark URL will be shown in the upper field. A quick way to check on the master programmatically is sketched below.
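As a sanity check, you can also query the master's web UI from Python. This is a minimal sketch, assuming Spark's default UI port (8080) and its standalone `/json` status endpoint:

```python
import json
import urllib.request

# Spark's standalone master UI serves a JSON summary of the cluster state.
# The port and endpoint are assumptions based on Spark's defaults.
with urllib.request.urlopen("http://localhost:8080/json") as response:
    state = json.load(response)

print("Master URL:", state["url"])        # e.g. spark://192.168.1.10:7077
print("Status:", state["status"])         # "ALIVE" when the master is healthy
print("Workers:", len(state["workers"]))  # currently connected workers
```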
**How to deploy a worker**

Start the app on the Slave tab and enter the Master's URL (`spark://address:port`). Optionally, you can select the amount of CPU and memory resources the worker will use; a per-application counterpart is sketched after this paragraph. You can see the executors (jobs) and worker information at http://localhost:8081/.

Take into account that both the Slave and the Master must be stopped by pressing `Finish`, as exiting the program won't kill the background processes.
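Besides capping the worker's resources in the app, you can also limit what a single application claims from the cluster. A minimal PySpark sketch using standard Spark properties (the master URL is a placeholder):

```python
from pyspark.sql import SparkSession

# Standard Spark properties cap what one application may claim
# from the standalone cluster; replace the placeholder master URL
# with the one shown in Sparkler's Master tab.
spark = (SparkSession.builder
         .master("spark://address:port")
         .appName("resource-capped-app")
         .config("spark.cores.max", "2")         # total cores for this app
         .config("spark.executor.memory", "1g")  # memory per executor
         .getOrCreate())

print(spark.sparkContext.defaultParallelism)
spark.stop()
```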
Once at least one master and one worker have been deployed, you can submit Spark jobs with any of the following; a minimal Python example follows the list.
- Java using its Java API; see this example.
- Scala using its Scala API; see this example.
- Python 3+ with built-in pyspark; see this example.
- R using built-in SparkR or sparklyr; see this example.
- C#, F# and .NET using Mobius; see this word-count example.
- Julia using Spark.jl; see this example.
- MATLAB using Tall Arrays; see this example.
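For instance, a minimal PySpark word count run against the cluster might look like this sketch (the master URL and input path are placeholders):

```python
from operator import add
from pyspark.sql import SparkSession

# Connect to the standalone cluster; replace the placeholder with the
# master URL shown in Sparkler's Master tab.
spark = (SparkSession.builder
         .master("spark://address:port")
         .appName("word-count")
         .getOrCreate())

# Classic word count over a text file (the path is a placeholder).
counts = (spark.sparkContext.textFile("input.txt")
          .flatMap(lambda line: line.split())
          .map(lambda word: (word, 1))
          .reduceByKey(add))

for word, count in counts.take(10):
    print(word, count)

spark.stop()
```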
Additionally, as Spark clusters usually run on remote machines, it is common to use Jupyter Notebooks to submit applications against the cluster. You can see the configuration process here.
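As a sketch of one common setup, assuming pyspark and the findspark helper package are both installed in the notebook's Python environment:

```python
# Run inside a Jupyter notebook cell. findspark (a separate pip package,
# assumed installed here) locates the Spark installation via SPARK_HOME
# and makes pyspark importable.
import findspark
findspark.init()  # or findspark.init("/path/to/spark")

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("spark://address:port")  # the Sparkler master's URL
         .appName("notebook-session")
         .getOrCreate())

spark.range(1000).count()  # quick check that the cluster responds
```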
Please see CONTRIBUTING.md.
- Apache Spark: Hardware Provisioning
- Parallelize R code using Apache Spark
- Distributed TensorFlow: Scaling Google’s Deep Learning Library on Spark
- GPU Acceleration: Speeding Up Deep Learning on Apache Spark
- Developing Apache Spark Applications in .NET using Mobius
- Livy: A REST Interface for Apache Spark