
spark-playground's Introduction

Basic Spark boilerplate for running some examples.

Each of NumberCount, WordCount, and CSVReader contains some code and tasks to complete.

To run

Click Run in the editor/IDE.

With spark-submit

Package with

mvn clean package

Submit with

spark-submit \
  --name spark-demo \
  --class com.scottlogic.pod.spark.playground.SparkPlayground \
  ./spark-playground/target/spark-playground-1.0-SNAPSHOT.jar

To spin up with a local cluster

Install Spark locally with SDKMan: sdk install spark 3.5.0 (the version needs to match the Spark version in the pom). Make sure this is the default or active version in every shell used for the following commands.

Run each of the following in a separate shell instance.

Start a master instance with:

spark-class org.apache.spark.deploy.master.Master --host localhost

Then spin up a couple of worker nodes by running the following once per node, each in its own shell.

spark-class org.apache.spark.deploy.worker.Worker spark://localhost:7077 --host localhost

Modify SparkPlayground to use .master("spark://localhost:7077") instead of .master("local[*]")
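
As an alternative to editing the source each time, the master URL could be read from the environment. The following is only a minimal sketch, not code from this repository: the MasterUrl class and the SPARK_PLAYGROUND_MASTER variable name are hypothetical, and the builder call in the comment assumes SparkPlayground creates its session through SparkSession.builder().

```java
import java.util.Map;

// Hypothetical helper, not part of the repository: chooses the Spark master URL.
final class MasterUrl {
    // Assumed environment variable name; rename to suit your setup.
    static final String ENV_VAR = "SPARK_PLAYGROUND_MASTER";
    // Same default the playground uses when running inside the IDE.
    static final String DEFAULT_MASTER = "local[*]";

    // Returns the master URL from the given environment,
    // falling back to local[*] when the variable is unset or blank.
    static String resolve(Map<String, String> env) {
        String value = env.get(ENV_VAR);
        return (value == null || value.isBlank()) ? DEFAULT_MASTER : value.trim();
    }

    public static void main(String[] args) {
        String master = resolve(System.getenv());
        System.out.println("Using master: " + master);
        // In SparkPlayground the value would then be passed to the builder, e.g.
        // SparkSession.builder().appName("spark-demo").master(master).getOrCreate();
    }
}
```

With this in place, exporting SPARK_PLAYGROUND_MASTER=spark://localhost:7077 in the shell would target the local cluster, while leaving it unset keeps the local[*] behaviour.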

Resources

Spark 101 - https://www.youtube.com/watch?v=4pSSv1GlkU0

Some example data: https://www.datablist.com/learn/csv/download-sample-csv-files


Docs

https://spark.apache.org/docs/latest/quick-start.html

https://github.com/apache/spark/tree/master/examples/src/main/java/org/apache/spark/examples

https://spark.apache.org/examples.html

Warning: out of date (Spark 0.9.0), but it has some useful info on the key differences when using Java: https://spark.apache.org/docs/0.9.0/java-programming-guide.html
