Spark on Kubernetes PoCs

Home Page: https://github.com/hypnosapos/sparknetes

License: MIT License

Sparknetes

Spark on Kubernetes, based on the official Spark 2.4 documentation.

Requirements:

  • Make (gcc)
  • Docker (17+)
  • Kubernetes 1.8+
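
Before running any target it can help to confirm that the command-line tools are installed. The following is a minimal sketch, not part of the repository: `missing_tools` is a hypothetical helper, and the inclusion of `kubectl` is an assumption (the list above only names the Kubernetes cluster version, not a client).

```shell
#!/bin/sh
# Print the name of every given tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tool list is an assumption derived from the requirements above.
missing=$(missing_tools make docker kubectl)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All required tools found"
fi
```

The sketch only checks presence on PATH; it does not verify the minimum versions listed above.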

Spark docker images

To build a base Docker image for launching Spark on Kubernetes, type:

make sparknetes-build spark-image

NOTE: This process may take several minutes (~20 mins; under the hood a Maven packaging task is running). Take a look at the Makefile to view default values and other variables.

This docker image is available at dockerhub/hypnosapos.

Kubernetes cluster

The examples are tested on GKE (see the GKE instructions for creating a Kubernetes cluster).

Once the Kubernetes cluster is ready (for instance, with the variable GKE_CLUSTER_NAME=spark exported), run a minimal bootstrapping step:

export GKE_CLUSTER_NAME=spark
make gke-spark-bootstrap

Launch basic examples

Spark on kubernetes

As the picture above shows, spark-submit commands are launched from a pod of a Kubernetes job.

The first example is the well-known SparkPi:

make spark-basic-example
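
For orientation, the make target wraps a spark-submit call against a Kubernetes master. The sketch below only prints a command modelled on the SparkPi example in the official Spark documentation; it is not the command from the Makefile. `sparkpi_submit_cmd` is a hypothetical helper, and the API server URL, the executor count, and the jar path placeholder are assumptions. The service account name and the image tag are taken from the sample log output in this README.

```shell
#!/bin/sh
# Print (do not run) a SparkPi spark-submit command for a Kubernetes master.
# $1: Kubernetes API server URL, $2: Spark container image.
# The executor count and the jar path placeholder are assumptions.
sparkpi_submit_cmd() {
  cat <<EOF
bin/spark-submit \\
  --master k8s://$1 \\
  --deploy-mode cluster \\
  --name spark-pi \\
  --class org.apache.spark.examples.SparkPi \\
  --conf spark.executor.instances=2 \\
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \\
  --conf spark.kubernetes.container.image=$2 \\
  local:///path/to/examples.jar
EOF
}

# In-cluster API server address, assumed here because the submit runs in a pod.
sparkpi_submit_cmd https://kubernetes.default.svc hypnosapos/spark:2.4
```

Printing the command instead of executing it keeps the sketch safe to run anywhere; check the Makefile for the options actually used.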

Job logs can be followed this way:

JOB_NAME=<job_name> make gke-job-logs

NOTE: <job_name> is the name of the example with the suffix '-job' instead of '-example' (i.e. "spark-basic-job" instead of "spark-basic-example").
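
The naming rule can be expressed with plain shell parameter expansion. This is a small sketch; `job_name_for` is a hypothetical helper, not something the Makefile provides.

```shell
#!/bin/sh
# Derive the job name from an example target name:
# strip the "-example" suffix and append "-job".
job_name_for() {
  printf '%s-job\n' "${1%-example}"
}

job_name_for spark-basic-example   # prints spark-basic-job
```

It could then be combined with the log target, for example `JOB_NAME=$(job_name_for spark-basic-example) make gke-job-logs`.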

If it runs successfully, the spark-submit command should print something like this:

2018-05-27 14:00:16 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
	 pod name: spark-pi-63ba1a53bc663d728936c24c91fb339b-driver
	 namespace: default
	 labels: spark-app-selector -> spark-2a6817ac76a248ba8a9cef7f3b988d82, spark-role -> driver
	 pod uid: 4698a7b8-61b6-11e8-b653-42010a840124
	 creation time: 2018-05-27T14:00:13Z
	 service account name: spark
	 volumes: spark-token-92jw7
	 node name: gke-spark-default-pool-ba0e670d-w989
	 start time: 2018-05-27T14:00:13Z
	 container images: hypnosapos/spark:2.4
	 phase: Succeeded
2018-05-27 14:00:16 INFO  LoggingPodStatusWatcherImpl:54 - Container final statuses:
Container name: spark-kubernetes-driver
	 Container image: hypnosapos/spark:2.4
	 Container state: Terminated
	 Exit code: 0
2018-05-27 14:00:16 INFO  Client:54 - Application spark-pi finished.
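
To check a captured log for success without reading it by eye, one option is to look for the two markers visible in the output above. This is a sketch only: `driver_succeeded` is a hypothetical helper, and it assumes the log text keeps the "phase: Succeeded" and "Exit code: 0" wording shown here.

```shell
#!/bin/sh
# Succeed only if the log text on stdin reports a successful driver run.
# Both patterns are taken from the sample output above.
driver_succeeded() {
  log=$(cat)
  printf '%s\n' "$log" | grep -q 'phase: Succeeded' &&
    printf '%s\n' "$log" | grep -q 'Exit code: 0'
}

# Demo with a two-line excerpt of the sample output.
driver_succeeded <<'EOF' && echo "driver finished successfully"
	 phase: Succeeded
	 Exit code: 0
EOF
```

The function reads all of stdin before matching, so it suits a saved log file rather than a stream that never ends.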

The second example is a linear regression; this time the log watcher is launched in the same command:

JOB_NAME=spark-ml-job make spark-ml-example gke-job-logs

GCS example

GCS and Spark on kubernetes

This example uses a remote dependency for the GCS connector and the GCP credentials to authenticate against the internal metadata server. We used a private jar and class (provide your own values directly in the Makefile, in the places marked with < >), but essentially you only need to update your code to use the gs:// scheme instead of the typical hdfs:// scheme for data input/output.

JOB_NAME=spark-gcs-job make spark-gcs-example gke-job-logs

To view the driver UI through a public load balancer service:

export SPARK_APP_NAME=spark-gcs
make gke-spark-expose-ui
make gke-spark-open-ui

Driver UI - Stages

Driver UI - Executors

Using spark-k8s operator

A few months ago the Google community published the k8s-spark-operator, so it's time to check it out:

make gke-spark-operator-install
make gke-spark-operator-example

Cleaning

Remove all Spark resources from the Kubernetes cluster:

make gke-spark-clean
