sparkmagic-on-hdp's Introduction

1 Configure Livy in Ambari

Until jupyter-incubator/sparkmagic#285 is fixed, set

livy.server.csrf_protection.enabled ==> false

in Ambari under Spark Config - Advanced livy-conf
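
For context on what this switch affects: when CSRF protection is enabled, Livy expects an X-Requested-By header on modifying requests (POST, DELETE and so on). The following is a minimal sketch, not part of sparkmagic, that builds a session-creation request with and without that header so the difference is visible. The endpoint URL and the header value are hypothetical placeholders, and nothing is sent over the network.

```python
import json
import urllib.request


def build_session_request(livy_url, kind="spark", csrf_header=True):
    """Build (but do not send) a POST /sessions request for a Livy server."""
    body = json.dumps({"kind": kind}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if csrf_header:
        # Needed only while livy.server.csrf_protection.enabled is true.
        # The value is arbitrary; "sparkmagic-on-hdp" is a placeholder.
        headers["X-Requested-By"] = "sparkmagic-on-hdp"
    return urllib.request.Request(
        livy_url.rstrip("/") + "/sessions",
        data=body,
        headers=headers,
        method="POST",
    )


# Hypothetical endpoint; replace host and port with your Livy server.
req = build_session_request("http://livy.example.com:8998")
print(req.get_method(), req.full_url)
print(sorted(name for name, _ in req.header_items()))
```

With the setting above switched to false, the request built with csrf_header=False should be accepted as well.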

2 Install and Configure Sparkmagic

For details, see https://github.com/jupyter-incubator/sparkmagic

2.1 Install sparkmagic

Install Jupyter, if you don't already have it:

$ sudo -H pip install jupyter notebook ipython 

Install Sparkmagic:

$ sudo -H pip install sparkmagic

2.2 Install Kernels

$ pip show sparkmagic # check the Location path, e.g. /usr/local/lib/python2.7/site-packages

$ cd /usr/local/lib/python2.7/site-packages

$ jupyter-kernelspec install --user sparkmagic/kernels/sparkkernel
$ jupyter-kernelspec install --user sparkmagic/kernels/pysparkkernel
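
The site-packages path differs between systems, so the kernel directories can also be looked up programmatically instead of being typed by hand. The sketch below assumes the kernels live under sparkmagic/kernels/ as in the commands above; the helper name kernel_dirs is hypothetical. It only prints the install commands and does not run them.

```python
import importlib.util
import os

# Kernel directory names taken from the install commands above.
KERNELS = ("sparkkernel", "pysparkkernel")


def kernel_dirs(package="sparkmagic"):
    """Return the expected kernel directories of an installed package.

    Returns an empty list if the package cannot be found.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return []
    pkg_dir = list(spec.submodule_search_locations)[0]
    return [os.path.join(pkg_dir, "kernels", name) for name in KERNELS]


# Print one install command per kernel (nothing is executed here).
for path in kernel_dirs():
    print("jupyter-kernelspec install --user " + path)
```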

2.3 Install widgets

$ sudo -H jupyter nbextension enable --py --sys-prefix widgetsnbextension

2.4 Install config

  • To avoid timeouts when connecting to HDP 2.5, it is important to add

     "livy_server_heartbeat_timeout_seconds": 0,

  • To ensure the Spark job runs on the cluster (the Livy default is local), spark.master must be set to yarn-cluster. To do this, provide a conf object in session_configs. Here you can also add extra jars for the session:

     "session_configs": {
     	"driverMemory": "2G",
     	"executorCores": 4,
     	"proxyUser": "bernhard",
     	"conf": {
     		"spark.master": "yarn-cluster",
     		"spark.jars.packages": "com.databricks:spark-csv_2.10:1.5.0"
     	}
     }

See the example config.json. Adapt it and copy it to ~/.sparkmagic
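
As an illustration, the two settings described above can be assembled and written out with a few lines of Python. This is a sketch only: the helper names build_config and write_config are hypothetical, and a real config.json typically carries further settings, so merge these keys into your existing file rather than replacing it.

```python
import json
import os


def build_config(proxy_user, packages=None):
    """Assemble the heartbeat and session_configs settings shown above."""
    conf = {"spark.master": "yarn-cluster"}
    if packages:
        # Multiple Maven coordinates are passed as one comma-separated string.
        conf["spark.jars.packages"] = ",".join(packages)
    return {
        "livy_server_heartbeat_timeout_seconds": 0,
        "session_configs": {
            "driverMemory": "2G",
            "executorCores": 4,
            "proxyUser": proxy_user,
            "conf": conf,
        },
    }


def write_config(config, directory):
    """Write config.json into the given directory and return its path."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, "config.json")
    with open(path, "w") as fh:
        json.dump(config, fh, indent=2)
    return path


config = build_config("bernhard", ["com.databricks:spark-csv_2.10:1.5.0"])
print(json.dumps(config, indent=2))
# To install it (overwrites an existing file):
# write_config(config, os.path.expanduser("~/.sparkmagic"))
```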

3 Run Notebooks with Sparkmagic

3.1 Start Jupyter Notebooks

Start Jupyter

$ cd <project-dir>
$ jupyter notebook

In Notebook Home, select New -> Spark, New -> PySpark, or New -> Python

3.2 Load Sparkmagic

Add the following to your notebook after the kernel has started:

%load_ext sparkmagic.magics
%manage_spark

3.3 Configure Spark Access

3.3.1 Select Add Endpoint

[Screenshot: Add Endpoint]

Result:

[Screenshot: Add Endpoint - Success]

3.3.2 Create Session

Select Create Session. If you copied config.json to ~/.sparkmagic before starting Jupyter, the properties section will be prefilled with the session_configs JSON object.

[Screenshot: Create Session]

Result:

[Screenshot: Create Session - Success]

4 Notes

  • Livy on HDP 2.5 currently does not return the YARN Application ID.

  • The Jupyter session name provided under Create Session is internal to the notebook and is not used by the Livy server on the cluster. The Livy server creates sessions on YARN called livy-session-###, e.g. livy-session-10. The session in Jupyter will have session id ###, e.g. 10.

  • For multiline Scala code in the notebook, put the dot at the end of each line (not at the start of the next line), as in

     val df = sqlContext.read.
                     format("com.databricks.spark.csv").
                     option("header", "true").
                     option("inferSchema", "true").
                     load("/tmp/iris.csv")
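
To make the session naming note above concrete, here is a small sketch that maps Livy session ids to the YARN names described there. It assumes a response shaped like the one Livy returns for GET /sessions (a "sessions" list whose entries carry an "id"); the sample data and the helper name yarn_session_names are made up for illustration.

```python
def yarn_session_names(sessions_response):
    """Map each Livy session id to the YARN name livy-session-<id>."""
    return {
        session["id"]: "livy-session-{}".format(session["id"])
        for session in sessions_response.get("sessions", [])
    }


# Made-up sample in the shape of a Livy GET /sessions response.
sample = {
    "from": 0,
    "total": 1,
    "sessions": [{"id": 10, "state": "idle", "kind": "spark"}],
}
print(yarn_session_names(sample))  # {10: 'livy-session-10'}
```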

5 Examples

Sparkmagic for Scala

Spark in Python Notebook

6 Calling Livy via Knox

See Knox.md
