Until jupyter-incubator/sparkmagic#285 is fixed, set
livy.server.csrf_protection.enabled => false
in Ambari under Spark Config -> Advanced livy-conf.
For details see https://github.com/jupyter-incubator/sparkmagic
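As a sketch of what Ambari writes for this setting, assuming the usual livy.conf property syntax:

# Advanced livy-conf -> livy.conf (key from above, property-file syntax assumed)
livy.server.csrf_protection.enabled = false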
Install Jupyter, if you don't already have it:
$ sudo -H pip install jupyter notebook ipython
Install Sparkmagic:
$ sudo -H pip install sparkmagic
$ pip show sparkmagic # check the install path, e.g. /usr/local/lib/python2.7/site-packages
$ cd /usr/local/lib/python2.7/site-packages
$ jupyter-kernelspec install --user sparkmagic/kernels/sparkkernel
$ jupyter-kernelspec install --user sparkmagic/kernels/pysparkkernel
$ sudo -H jupyter nbextension enable --py --sys-prefix widgetsnbextension
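To verify that the two kernels were registered (a quick check, not part of the original steps):

$ jupyter kernelspec list   # should list sparkkernel and pysparkkernel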
-
To avoid timeouts when connecting to HDP 2.5 it is important to add
"livy_server_heartbeat_timeout_seconds": 0,
to the sparkmagic config.json.
-
To ensure the Spark job will run on the cluster (the Livy default is local), spark.master needs to be set to yarn-cluster; for this, a conf object needs to be provided in session_configs. Here you can also add extra jars for the session:

"session_configs": {
    "driverMemory": "2G",
    "executorCores": 4,
    "proxyUser": "bernhard",
    "conf": {
        "spark.master": "yarn-cluster",
        "spark.jars.packages": "com.databricks:spark-csv_2.10:1.5.0"
    }
}
Here is an example config.json; adapt it and copy it to ~/.sparkmagic:
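A sketch of a complete file, combining the settings above with the endpoint keys from sparkmagic's example config; the Livy URL and the empty credentials are placeholders to adapt:

{
    "kernel_python_credentials": {
        "username": "",
        "password": "",
        "url": "http://<livy-server>:8998"
    },
    "kernel_scala_credentials": {
        "username": "",
        "password": "",
        "url": "http://<livy-server>:8998"
    },
    "livy_server_heartbeat_timeout_seconds": 0,
    "session_configs": {
        "driverMemory": "2G",
        "executorCores": 4,
        "proxyUser": "bernhard",
        "conf": {
            "spark.master": "yarn-cluster",
            "spark.jars.packages": "com.databricks:spark-csv_2.10:1.5.0"
        }
    }
}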
Start Jupyter
$ cd <project-dir>
$ jupyter notebook
In Notebook Home select New -> Spark, New -> PySpark, or New -> Python 2
After the kernel has started, add to your notebook:
%load_ext sparkmagic.magics
%manage_spark
Result: the %manage_spark widget for managing endpoints and sessions is displayed.
Select Create Session. If you have copied config.json to ~/.sparkmagic before starting Jupyter, the Properties section will be prefilled with the session_configs JSON object.
Result: a new Livy session is started on the cluster.
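Once the session is running, code can be sent to it from the same notebook with sparkmagic's %%spark cell magic, e.g. (a sketch, assuming a Scala session):

%%spark
val rdd = sc.parallelize(1 to 100)
rdd.sum()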
-
Livy on HDP 2.5 currently does not return the YARN Application ID
-
The Jupyter session name provided under Create Session is internal to the notebook and not used by the Livy Server on the cluster. Livy Server will create sessions on YARN named livy-session-###, e.g. livy-session-10. The session in Jupyter will have session id ###, e.g. 10.
-
For multiline Scala code in the notebook you have to put the dot at the end of each continued line (so that the statement is parsed as incomplete), as in
val df = sqlContext.read.
    format("com.databricks.spark.csv").
    option("header", "true").
    option("inferSchema", "true").
    load("/tmp/iris.csv")
See Knox.md