Until jupyter-incubator/sparkmagic#285 is fixed, set
livy.server.csrf_protection.enabled => false
in Ambari under Spark Config -> Advanced livy-conf.
For details see https://github.com/jupyter-incubator/sparkmagic
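As a sketch of what Ambari writes for this setting, assuming the usual livy.conf property syntax:

# Advanced livy-conf -> livy.conf (key from above, property-file syntax assumed)
livy.server.csrf_protection.enabled = false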
Install Jupyter, if you don't already have it:
$ sudo -H pip install jupyter notebook ipython
Install Sparkmagic:
$ sudo -H pip install sparkmagic
$ pip show sparkmagic # check the install path, e.g. /usr/local/lib/python2.7/site-packages
$ cd /usr/local/lib/python2.7/site-packages
$ jupyter-kernelspec install --user sparkmagic/kernels/sparkkernel
$ jupyter-kernelspec install --user sparkmagic/kernels/pysparkkernel
$ sudo -H jupyter nbextension enable --py --sys-prefix widgetsnbextension
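To verify that the two kernels were registered (a quick check, not part of the original steps):

$ jupyter kernelspec list   # should list sparkkernel and pysparkkernel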
-
To avoid timeouts when connecting to HDP 2.5 it is important to add
"livy_server_heartbeat_timeout_seconds": 0,
to the sparkmagic config.json.
-
To ensure the Spark job will run on the cluster (the Livy default is local), spark.master needs to be set to yarn-cluster; for this, a conf object needs to be provided in session_configs. Here you can also add extra jars for the session:

"session_configs": {
    "driverMemory": "2G",
    "executorCores": 4,
    "proxyUser": "bernhard",
    "conf": {
        "spark.master": "yarn-cluster",
        "spark.jars.packages": "com.databricks:spark-csv_2.10:1.5.0"
    }
}
Here is an example config.json; adapt it and copy it to ~/.sparkmagic:
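A sketch of a complete file, combining the settings above with the endpoint keys from sparkmagic's example config; the Livy URL and the empty credentials are placeholders to adapt:

{
    "kernel_python_credentials": {
        "username": "",
        "password": "",
        "url": "http://<livy-server>:8998"
    },
    "kernel_scala_credentials": {
        "username": "",
        "password": "",
        "url": "http://<livy-server>:8998"
    },
    "livy_server_heartbeat_timeout_seconds": 0,
    "session_configs": {
        "driverMemory": "2G",
        "executorCores": 4,
        "proxyUser": "bernhard",
        "conf": {
            "spark.master": "yarn-cluster",
            "spark.jars.packages": "com.databricks:spark-csv_2.10:1.5.0"
        }
    }
}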
Start Jupyter
$ cd <project-dir>
$ jupyter notebook
In Notebook Home select New -> Spark, New -> PySpark, or New -> Python 2
After the kernel has started, add to your notebook:
%load_ext sparkmagic.magics
%manage_spark
Result: the %manage_spark widget for managing endpoints and sessions is displayed.
Select Create Session. If you have copied config.json to ~/.sparkmagic before starting Jupyter, the Properties section will be prefilled with the session_configs JSON object.
Result: a new Livy session is started on the cluster.
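Once the session is running, code can be sent to it from the same notebook with sparkmagic's %%spark cell magic, e.g. (a sketch, assuming a Scala session):

%%spark
val rdd = sc.parallelize(1 to 100)
rdd.sum()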
-
Livy on HDP 2.5 currently does not return the YARN Application ID
-
The Jupyter session name provided under Create Session is internal to the notebook and not used by the Livy Server on the cluster. Livy Server will create sessions on YARN named livy-session-###, e.g. livy-session-10. The session in Jupyter will have session id ###, e.g. 10.
-
For multiline Scala code in the notebook you have to put the dot at the end of each continued line (so that the statement is parsed as incomplete), as in
val df = sqlContext.read.
    format("com.databricks.spark.csv").
    option("header", "true").
    option("inferSchema", "true").
    load("/tmp/iris.csv")
See Knox.md