reynoldsm88 / spark-operator

This project is forked from radanalyticsio/spark-operator.

Operator for managing the Spark clusters on Kubernetes and OpenShift.

License: Apache License 2.0

Shell 25.67% Makefile 4.19% Java 70.13%

spark-operator

{CRD|ConfigMap}-based approach for managing the Spark clusters in Kubernetes and OpenShift.

This operator uses the abstract-operator library.

Watch the full asciicast

How does it work

UML diagram

Quick Start

Run the spark-operator deployment:

kubectl apply -f manifest/operator.yaml

Create a new cluster from the prepared example:

kubectl apply -f examples/cluster.yaml

After issuing the commands above, you should be able to see a new Spark cluster running in the current namespace.

kubectl get pods
NAME                               READY     STATUS    RESTARTS   AGE
my-spark-cluster-m-5kjtj           1/1       Running   0          10s
my-spark-cluster-w-m8knz           1/1       Running   0          10s
my-spark-cluster-w-vg9k2           1/1       Running   0          10s
spark-operator-510388731-852b2     1/1       Running   0          27s
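
To check the result from a script rather than by eye, the pod listing can be filtered by name prefix and status. The following is a minimal sketch: count_running is a hypothetical helper (not part of the operator), and the my-spark-cluster- prefix is taken from the sample output above.

```shell
# Hypothetical helper: reads `kubectl get pods` output on stdin and prints how many
# pods have a name starting with the given prefix and "Running" in the STATUS column.
count_running() {
  awk -v prefix="$1" '
    index($1, prefix) == 1 && $3 == "Running" { n++ }
    END { print n + 0 }
  '
}
```

For example, kubectl get pods | count_running my-spark-cluster- would print 3 for the listing above, because the operator pod does not match the prefix.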

Once you no longer need the cluster, you can remove it by deleting the custom resource:

kubectl delete sparkcluster my-spark-cluster

Very Quick Start

# create operator
kubectl apply -f http://bit.ly/sparkop

# create cluster
cat <<EOF | kubectl apply -f -
apiVersion: radanalytics.io/v1
kind: SparkCluster
metadata:
  name: my-cluster
spec:
  worker:
    instances: "2"
EOF

Spark Applications

Apart from managing Apache Spark clusters, this operator can also manage Spark applications, similarly to GoogleCloudPlatform/spark-on-k8s-operator. Each application spawns its own Spark cluster for its needs and uses Kubernetes as the native scheduling mechanism for Spark. For more details, consult the Spark docs.

# create spark application
cat <<EOF | kubectl apply -f -
apiVersion: radanalytics.io/v1
kind: SparkApplication
metadata:
  name: my-cluster
spec:
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
  mainClass: org.apache.spark.examples.SparkPi
EOF

OpenShift

For deployment on OpenShift, use the same commands as above (with oc instead of kubectl if kubectl is not installed) and make sure the logged-in user can create CRDs:

oc login -u system:admin && oc project default
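
Whether the current user may create CRDs can be checked before deploying. This is a minimal sketch, assuming the standard auth can-i subcommand that both kubectl and oc provide; pick_cli and can_create_crds are hypothetical helper names, not part of the operator.

```shell
# Hypothetical helper: prefer kubectl and fall back to oc if kubectl is not installed.
pick_cli() {
  if command -v kubectl >/dev/null 2>&1; then
    echo kubectl
  else
    echo oc
  fi
}

# Asks the API server whether the logged-in user may create CRDs.
# Prints "yes" or "no"; the exit status is non-zero when the answer is "no".
can_create_crds() {
  "$(pick_cli)" auth can-i create customresourcedefinitions
}
```

If can_create_crds reports "no", the Config Map approach described in this README may be the better fit.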

Config Map approach

This operator can also work with Config Maps instead of CRDs. This is useful when the user is not allowed to create CRDs or ClusterRoleBinding resources. The schema for config maps is almost identical to that of the custom resources; see the examples.

kubectl apply -f manifest/operator-cm.yaml

The manifest above is almost the same as operator.yaml. If the environment variable CRD is set to false, the operator watches config maps with certain labels instead of custom resources.

You can then create the Spark clusters as usual by creating the config map (CM).

kubectl apply -f examples/cluster-cm.yaml
kubectl get cm -l radanalytics.io/kind=SparkCluster

or create Spark applications, which are scheduled natively on Kubernetes, by:

kubectl apply -f examples/test/cm/app.yaml
kubectl get cm -l radanalytics.io/kind=SparkApplication
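
Both kinds of config map can also be listed in one call. This is a minimal sketch that relies on Kubernetes set-based label selectors and on the two label values shown above; list_operator_config_maps is a hypothetical helper name.

```shell
# Hypothetical helper: lists config maps labelled as either SparkCluster or
# SparkApplication. Extra arguments (for example --show-labels) are passed through.
list_operator_config_maps() {
  kubectl get cm -l 'radanalytics.io/kind in (SparkCluster,SparkApplication)' "$@"
}
```

Calling list_operator_config_maps --show-labels makes it easy to tell the two kinds apart in the output.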

Images

Image name          Description
:latest-released    represents the latest released version
:latest             represents the master branch
:x.y.z              one particular released version

For each variant, an image with the -alpine suffix, based on Alpine, is also available.

Configuring the operator

The spark-operator contains several defaults that are applied implicitly when Spark clusters and applications are created. Here is a list of environment variables that can be set to adjust the default behavior of the operator.

  • CRD set to true if the operator should respond to Custom Resources, and set to false if it should respond to ConfigMaps.
  • DEFAULT_SPARK_CLUSTER_IMAGE a container image reference that will be used as a default for all pods in a SparkCluster deployment when the image is not specified in the cluster manifest.
  • DEFAULT_SPARK_APP_IMAGE a container image reference that will be used as a default for all executor pods in a SparkApplication deployment when the image is not specified in the application manifest.

Please note that these environment variables must be set in the operator's container; see operator.yaml and operator-cm.yaml for operator deployment information.
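
Besides editing the manifests, the variables can be changed on a running deployment with kubectl set env. This is a minimal sketch: the deployment name spark-operator is an assumption (inferred from the pod name in the Quick Start output, so check your manifest), and set_operator_env and list_operator_env are hypothetical helper names.

```shell
# Assumption: the operator runs as a Deployment named "spark-operator".
# Override with OPERATOR_DEPLOYMENT if your manifest uses a different name.

# Sets one or more NAME=value environment variables on the operator's deployment.
set_operator_env() {
  kubectl set env "deployment/${OPERATOR_DEPLOYMENT:-spark-operator}" "$@"
}

# Prints the environment variables currently defined on the operator's deployment.
list_operator_env() {
  kubectl set env "deployment/${OPERATOR_DEPLOYMENT:-spark-operator}" --list
}
```

For example, set_operator_env DEFAULT_SPARK_CLUSTER_IMAGE=registry.example.com/spark:custom would change the default cluster image (the image reference here is a placeholder). Changing the deployment's environment causes Kubernetes to roll out a new operator pod.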

Related projects

The radanalyticsio/spark-operator is not the only Kubernetes operator service that targets Apache Spark.

  • GoogleCloudPlatform/spark-on-k8s-operator is an operator that shares a similar schema for the Spark cluster and application resources. There are two major differences: radanalyticsio/spark-operator has been designed to work well in environments where a user has limited role-based access to Kubernetes, such as on OpenShift, and it can also deploy standalone Spark clusters.

If you are looking for tooling to make interacting with the spark-operator more convenient, please see the following.

  • oshinko-temaki is a shell application for generating SparkCluster manifest definitions. It can produce full schema manifests from a few simple command line flags.

Troubleshooting

Show the log:

# last 25 log entries
kubectl logs --tail 25 -l app.kubernetes.io/name=spark-operator
# follow logs
kubectl logs -f "$(kubectl get pod -l app.kubernetes.io/name=spark-operator -o jsonpath='{.items[0].metadata.name}')"

Run the operator from your host (also possible with the debugger):

java -jar target/spark-operator-*.jar

Contributors

jkremser, sharebear, dependabot-support, elmiko, stieler-it, dependabot[bot]
