runpyspark's Introduction

Easy to use environment with jupyter notebook and apache spark.

Instead of opening Jupyter Notebook directly, copy the prepareSparkEnvironment.sh script into your preferred directory and, from inside that directory, run:

    $ source prepareSparkEnvironment.sh

Alternatively, run it directly from GitHub:

    $ source <(curl -s https://raw.githubusercontent.com/boyander/runpyspark/master/prepareSparkEnvironment.sh)

The script prepares your shell so that pyspark is configured to use Jupyter Notebook. Once the environment is ready, run pyspark and it will open a Jupyter notebook.
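
The script itself is not reproduced here. As a rough sketch of what "pyspark configured to use Jupyter Notebook" typically means, the lines below use Spark's documented PYSPARK_PYTHON, PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS environment variables. That prepareSparkEnvironment.sh sets exactly these values is an assumption, and the real script may do more.

```shell
# Assumption: illustrative values only; the actual script may differ.
export PYSPARK_PYTHON=python3               # Python interpreter PySpark should use
export PYSPARK_DRIVER_PYTHON=jupyter        # start the driver through Jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook  # argument passed to jupyter: open a notebook server
```

Because these are exported variables, they only take effect in the current shell when the file is sourced rather than executed, which is why the commands above use source.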

IMPORTANT: Before running the prepareSparkEnvironment.sh script, ensure you have followed the checklist for your OS.

Checklist for Ubuntu

  • Install Spark following this Medium post
  • Create a symlink named "spark" in your home directory, or rename the installation directory to just "spark"
    $ ln -s ~/spark-2.4.0-bin-hadoop2.7 ~/spark
  • Ensure spark-shell is in your $PATH variable (Note: this assumes you are running zsh or oh-my-zsh; if that's not the case, or you are not sure, change .zshrc to .bashrc in the following commands).
    $ echo "export PATH=\"\$PATH:$HOME/spark/bin\"" >> ~/.zshrc
    $ source ~/.zshrc

To check that it works, confirm that you can run spark-shell from your terminal.
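
A quick way to test this without starting Spark is to ask the shell whether it can find the command. The helper name check_on_path below is hypothetical and not part of this repository; it is only a sketch of the check.

```shell
# Hypothetical helper: report whether a command is reachable through $PATH.
check_on_path() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at $(command -v "$1")"
  else
    echo "$1 not found in PATH" >&2
    return 1
  fi
}

# If the lookup fails, print a hint instead of aborting the shell.
check_on_path spark-shell || echo "re-check the PATH line in your shell rc file"
```

If spark-shell is reported as missing, open a new terminal or source the rc file again before retrying.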

Checklist for MacOSX

  • You need brew installed
  • brew install jq
  • brew install apache-spark (the Homebrew formula that provides spark-shell)

Important Notes

  • This script uses python3; ensure python3 is installed and runs from your terminal.
  • When creating a Jupyter notebook, ensure you have chosen the Python 3 kernel, otherwise it will not work.
  • There is also a notebook, PysparkDemo.ipynb, for testing that Apache Spark works.
  • If you have created multiple Spark contexts, run $ killall java to stop all Apache Spark instances. Note that this also stops any other Java processes you are running.
