
Installing and Configuring PySpark with Docker

Introduction

In addition to running on a cluster, Spark provides a simple standalone deploy mode. We can launch a standalone cluster either manually, by starting a master and workers by hand, or by using the provided launch scripts. It is also possible to run these daemons on a single machine for testing. In this lesson, we'll look at installing a standalone version of Spark on Windows and Mac machines. All the required tools are open source and directly downloadable from the official sites referenced in this lesson.

Objectives

You will be able to:

  • Install Docker and Kitematic on Windows/Mac environments
  • Install a standalone version of Spark on a local server
  • Test the Spark installation by running a simple test script

Docker

For this section, we shall run PySpark on a single machine in a virtualized environment using Docker. Docker is a container technology that packages and distributes software together with its environment, taking away the headache of setting up environments, configuring logging, tuning options, and so on. Docker essentially removes the excuse "it works on my machine".

Visit this link to learn more about Docker and containers.

Install Docker and Docker Toolbox

In addition to Docker, we will also need to download and install the Docker Toolbox.

Visit the following guides for step-by-step installation instructions.

Guide for installing Docker Toolbox on Mac

Guide for installing Docker Toolbox on Windows

Kitematic

Docker Toolbox is mainly required for a Docker plugin called Kitematic. Kitematic allows "one-click install" of containers in Docker running on your Mac or Windows machine and lets you control your app containers from a graphical user interface (GUI). This takes away much of the cognitive load required to set up and configure virtual environments.

Once Docker and Docker Toolbox are successfully installed, we need to perform the following tasks in the given sequence.

Click on the Docker toolbar icon on Mac and select Kitematic

Sign up on Docker Hub

Upon running Kitematic, you will be asked to sign up on Docker Hub. This is optional, but recommended, as it allows you to share your Docker containers and run them on different machines.

This option can be accessed via the "My Repos" section in the Kitematic GUI.

Search for the pyspark-notebook repository and click on the image provided by jupyter

There are lots of other offerings available, so it is imperative to use the image from jupyter for our labs to run as expected.

Run the image once it has downloaded; this will start an IPython kernel. To run Jupyter notebooks, click on the right half of the Kitematic window where it says "web preview".

This will open a browser window asking you for a token ID. Go back to Kitematic and check the bottom left of the terminal-like screen for a string that says token?=. Copy the text after it and paste it into the Jupyter notebook page.

This will open a new Jupyter notebook, like the ones we've seen before. We are now ready to program in Spark.
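
Before running the full test in the next section, you can do a quick sanity check that the PySpark library is importable from the new notebook. This is a minimal sketch, assuming the jupyter/pyspark-notebook image ships with pyspark preinstalled:

import pyspark                 # assumption: pyspark comes preinstalled in this image
print(pyspark.__version__)     # prints the Spark version bundled with the image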

Testing the installation

To make sure everything went smoothly, let's run a simple script in a new Jupyter notebook.

import pyspark
sc = pyspark.SparkContext('local[*]')   # use all available local cores
rdd = sc.parallelize(range(1000))       # distribute the numbers 0-999 as an RDD
rdd.takeSample(False, 5)                # draw 5 random elements without replacement

If everything went fine, you should see an output like this (your exact numbers will differ, since the sample is random):

[941, 60, 987, 542, 718]

Do not worry if you don't fully comprehend what the code above means. Next, we will look into some basic programming principles and methods in Spark that will explain this.
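
If you would like to explore the installation a little further, the same RDD supports a few other basic actions. The following optional sketch uses standard PySpark calls (count, map/take, and reduce), and stops the SparkContext at the end to free local resources:

rdd.count()                          # 1000 - the number of elements in the RDD
rdd.map(lambda x: x * 2).take(5)     # [0, 2, 4, 6, 8] - transform, then take the first 5 elements
rdd.reduce(lambda a, b: a + b)       # 499500 - the sum of 0 through 999
sc.stop()                            # shut down the SparkContext when you are finished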

Summary

In this lesson, we looked at installing Spark using a Docker container. The process works the same for both Mac- and Windows-based systems. Make sure to follow all the steps in the given sequence.
