jahstreetorg / spark-on-kubernetes-helm

Spark on Kubernetes infrastructure Helm charts repo

License: Apache License 2.0

Jupyter Notebook 22.35% Shell 16.55% HTML 0.08% Dockerfile 5.47% Mustache 55.56%
spark kubernetes livy jupyter history-server helm

spark-on-kubernetes-helm's Introduction


Spark on Kubernetes Cluster Helm Chart

This repo contains Helm charts for a fully functional, production-ready Spark on Kubernetes cluster setup, integrated with the Spark History Server, JupyterHub and the Prometheus stack.

Refer to the design concept below for the implementation details.

Getting Started

Initialize Helm (for Helm 2.x)

In order to use the Helm charts for the Spark on Kubernetes cluster deployment, first initialize the Helm client.

kubectl create serviceaccount tiller --namespace kube-system
kubectl create clusterrolebinding tiller --clusterrole cluster-admin --serviceaccount=kube-system:tiller
helm init --upgrade --service-account tiller --tiller-namespace kube-system
kubectl get pods --namespace kube-system -w
# Wait until Pod `tiller-deploy-*` moves to Running state
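If you are on Helm 3.x, Tiller no longer exists and the initialization above can be skipped entirely; a quick check of the client version is enough before moving on:

helm version --short
# Helm 3 prints something like v3.x.x; you can then go straight to adding the chart repository below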
Install Livy

The basic Spark on Kubernetes setup consists only of the Apache Livy server deployment, which can be installed with the Livy Helm chart.

helm repo add jahstreet https://jahstreet.github.io/helm-charts
helm repo update
kubectl create namespace livy
helm upgrade --install livy --namespace livy jahstreet/livy \
    --set rbac.create=true # If you are running an RBAC-enabled Kubernetes cluster
kubectl get pods --namespace livy -w
# Wait until Pod `livy-0` moves to Running state
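The chart values can be overridden at install time in the usual Helm way. Below is a hedged sketch of passing extra Livy configuration through the chart's env map; the LIVY_* variable name and the env layout are assumptions based on the chart's conventions, so verify them against the chart defaults first:

# Dump the chart defaults to see what can be overridden
helm inspect values jahstreet/livy > livy-defaults.yaml
# Hypothetical override file: set a Livy config option via an environment variable
cat > livy-overrides.yaml <<'EOF'
env:
  LIVY_SPARK_KUBERNETES_NAMESPACE: {value: "livy"}
EOF
helm upgrade --install livy --namespace livy jahstreet/livy \
    --set rbac.create=true -f livy-overrides.yaml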

For more advanced Spark cluster setups, refer to the Documentation page.

Run Spark Job

Now that Livy is up and running, we can submit a Spark job via the Livy REST API.

kubectl exec --namespace livy livy-0 -- \
    curl -s -k -H 'Content-Type: application/json' -X POST \
      -d '{
            "name": "SparkPi-01",
            "className": "org.apache.spark.examples.SparkPi",
            "numExecutors": 2,
            "file": "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar",
            "args": ["10000"],
            "conf": {
                "spark.kubernetes.namespace": "livy"
            }
          }' "http://localhost:8998/batches" | jq
# Record BATCH_ID from the response
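If you prefer to capture the batch ID programmatically, the Livy REST API also lets you list the submitted batches; GET /batches returns a sessions array with the id, name and state of each batch (jq runs on your local machine here, outside the container):

# Capture the ID of the most recently submitted batch
BATCH_ID=$(kubectl exec --namespace livy livy-0 -- \
    curl -s "http://localhost:8998/batches" | jq '.sessions[-1].id')
echo $BATCH_ID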
Track running job

To track the running Spark job, we can use all the available Kubernetes tools as well as the Livy REST API.

# Watch running Spark Pods
kubectl get pods --namespace livy -w --show-labels
# Check Livy batch status
kubectl exec --namespace livy livy-0 -- curl -s http://localhost:8998/batches/$BATCH_ID | jq
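The Livy REST API also exposes the driver log lines it has collected and lets you clean the batch up when you are done; both are standard Livy endpoints:

# Fetch up to 100 collected log lines for the batch
kubectl exec --namespace livy livy-0 -- \
    curl -s "http://localhost:8998/batches/$BATCH_ID/log?size=100" | jq '.log[]'
# Delete the batch (this kills the running Spark application)
kubectl exec --namespace livy livy-0 -- \
    curl -s -X DELETE "http://localhost:8998/batches/$BATCH_ID"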

To configure Ingress for direct access to the Livy UI and Spark UI, refer to the Documentation page.

Spark on Kubernetes Cluster Design Concept

Motivation

Running Spark on Kubernetes has been available since the Spark v2.3.0 release on February 28, 2018. As of v2.4.5 it still lacks much compared to the well-known Yarn setups on Hadoop-like clusters.

According to the official documentation, the user is able to run Spark on Kubernetes via the spark-submit CLI script, and this is in fact the only Kubernetes-related capability built into Apache Spark, along with some config options. The debugging proposal from the Apache docs is too limited to use easily and is available only for console-based tools. Scheduler integration is not available either, which makes it tricky to set up convenient pipelines with Spark on Kubernetes out of the box. Yarn-based Hadoop clusters, in turn, have all the UIs, proxies, schedulers and APIs to make your life easier.

On the other hand, using Kubernetes clusters instead of Yarn ones has definite benefits (July 2019 comparison):

  • Pricing. Comparing similar cluster setups on Azure Cloud shows that AKS is about 35% cheaper than HDInsight Spark.
  • Scaling. Kubernetes clusters in the cloud support elastic autoscaling with many related features alongside, e.g. node pools. Scaling of Hadoop clusters is far slower and can be done either manually or automatically (as of July 2019, automatic scaling was in preview).
  • Integrations. You can run any workload in a Kubernetes cluster wrapped into a Docker container. But do you know anyone who has written a Yarn app in the modern world?
  • Support. You don't have full control over the cluster setup provided by the cloud vendor, and the latest software versions are often unavailable for months after a release. With Kubernetes you can build the images on your own.
  • Other Kubernetes pros. CI/CD with Helm, monitoring stacks ready to use in one click, huge popularity and community support, good tooling and of course hype.

All of this makes it worthwhile to improve the usability of Spark on Kubernetes and take full advantage of modern Kubernetes setups.

Design concept

At the heart of the solution is Apache Livy. Apache Livy is a service that enables easy interaction with a Spark cluster over a REST interface. It is supported by the Apache Incubator community and the Azure HDInsight team, which uses it as a first-class citizen in their Yarn cluster setup and provides many integrations with it. Watch Spark Summit 2016, Cloudera and Microsoft, Livy concepts and motivation for the details.

The downside is that Livy is written for Yarn. But Yarn is just Yet Another resource manager with a container abstraction that maps onto Kubernetes concepts. Livy is fully open source as well, and its codebase is RM-aware enough to implement its interfaces once more and add Kubernetes support. So why not!? Check the WIP PR with the Kubernetes support proposal for Livy.

The high-level architecture of Livy on Kubernetes is the same as for Yarn.

Livy schema

The Livy server just wraps all the logic concerning interaction with the Spark cluster and provides a simple REST interface.

For example, to submit a Spark job to the cluster you just need to send a `POST /batches` request with a JSON body containing the Spark config options, mapped to the analogous `spark-submit` script arguments.

$SPARK_HOME/bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    --deploy-mode cluster \
    --name SparkPi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.container.image=<spark-image> \
    local:///path/to/examples.jar
 
# Has a similar effect to calling Livy via the REST API
 
curl -H 'Content-Type: application/json' -X POST \
  -d '{
        "name": "SparkPi",
        "className": "org.apache.spark.examples.SparkPi",
        "numExecutors": 5,
        "conf": {
          "spark.kubernetes.container.image": "<spark-image>"
        },
        "file": "local:///path/to/examples.jar"
      }' "http://livy.endpoint.com/batches"

Under the hood, Livy parses the POSTed configs and runs spark-submit for you, along with the other defaults configured for the Livy server.

After the job submission, Livy discovers the Spark driver Pod scheduled to the Kubernetes cluster via the Kubernetes API and starts tracking its state, caching Spark Pod logs and detail descriptions and making that information available through the Livy REST API. It also builds routes to the Spark UI, Spark History Server and monitoring systems using Kubernetes Ingress resources (the Nginx Ingress Controller in particular) and displays the links in the Livy Web UI.

By providing a REST interface for Spark job orchestration, Livy allows any number of integrations with web/mobile apps and services, and an easy way of setting up flows via job-scheduling frameworks.
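For example, a scheduler only needs to poll the batch state until it reaches a terminal value. A minimal polling sketch against the endpoints shown above (livy.endpoint.com is the same placeholder host, and $BATCH_ID is assumed to hold the id returned by the POST; success/dead/killed are the usual Livy batch states):

LIVY_URL="http://livy.endpoint.com"
while true; do
  STATE=$(curl -s "$LIVY_URL/batches/$BATCH_ID" | jq -r '.state')
  echo "Batch $BATCH_ID is in state: $STATE"
  case "$STATE" in
    success) break ;;        # job finished successfully
    dead|killed) exit 1 ;;   # job failed or was killed
  esac
  sleep 10
done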

Livy has a built-in lightweight Web UI, which makes it really competitive with Yarn in terms of navigation, debugging and cluster discovery.

Screenshots: Livy home, Livy sessions, Livy logs, Livy diagnostics

Livy supports interactive sessions with Spark clusters, allowing communication between Spark and application servers and thus enabling the use of Spark for interactive web/mobile applications. Using that feature, Livy integrates with Jupyter Notebook through the Sparkmagic kernel out of the box, giving the user an elastic Spark exploratory environment in Scala and Python. Just deploy it to Kubernetes and use it!
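Interactive sessions are driven through the same REST interface: create a session, post code statements to it and poll for their output. A minimal sketch against the standard Livy session endpoints (livy.endpoint.com is a placeholder; session and statement IDs of 0 are assumed for brevity):

# Create a PySpark interactive session
curl -s -H 'Content-Type: application/json' -X POST \
  -d '{"kind": "pyspark", "conf": {"spark.kubernetes.namespace": "livy"}}' \
  "http://livy.endpoint.com/sessions"
# Once the session reaches the idle state, submit a code statement to it
curl -s -H 'Content-Type: application/json' -X POST \
  -d '{"code": "print(sc.parallelize(range(100)).count())"}' \
  "http://livy.endpoint.com/sessions/0/statements"
# Poll the statement for its output
curl -s "http://livy.endpoint.com/sessions/0/statements/0" | jq '.output'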

Livy schema

On top of Jupyter it is possible to set up JupyterHub, a multi-user hub that spawns, manages, and proxies multiple instances of the single-user Jupyter notebook server. Follow the video PyData 2018, London, JupyterHub from the Ground Up with Kubernetes - Camilla Montonen to learn the details of the implementation. JupyterHub provides a way to set up auth through Azure AD with the AzureAdOAuthenticator plugin, as well as many other OAuthenticator plugins.

Jupyterhub architecture

Monitoring of the Kubernetes cluster itself can be set up with the Prometheus Operator stack together with Prometheus Pushgateway and Grafana Loki, using a combined Helm chart that does the work in one click. Learn more about the stack from the linked videos.

The overall monitoring architecture covers both the pull and push models of metrics collection from the Kubernetes cluster and the services deployed to it. Prometheus Alertmanager provides an interface for setting up an alerting system.
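If you prefer not to use the combined chart, a comparable stack can be assembled from the upstream community charts; a hedged sketch (public community chart names, all values left at their defaults):

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
# Prometheus Operator, Alertmanager and Grafana
helm upgrade --install monitoring prometheus-community/kube-prometheus-stack \
    --namespace monitoring --create-namespace
# Prometheus Pushgateway for push-model metrics
helm upgrade --install pushgateway prometheus-community/prometheus-pushgateway --namespace monitoring
# Grafana Loki + Promtail for log aggregation
helm upgrade --install loki grafana/loki-stack --namespace monitoring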

Diagrams: Prometheus architecture, Prometheus Operator schema

With the help of the JMX Exporter or a Pushgateway sink we can get Spark metrics into the monitoring system. Grafana Loki provides out-of-the-box log aggregation for all Pods in the cluster and natively integrates with Grafana. Using the Grafana Azure Monitor datasource and the Prometheus federation feature, you can set up a complex global monitoring architecture for your infrastructure.
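As a hedged illustration of the JMX Exporter route (the agent jar path, port and exporter config file below are placeholders and must actually exist inside your Spark images), the agent can be attached through the extra Java options passed in the batch conf:

curl -s -H 'Content-Type: application/json' -X POST \
  -d '{
        "name": "SparkPi-metrics",
        "className": "org.apache.spark.examples.SparkPi",
        "file": "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar",
        "conf": {
          "spark.kubernetes.namespace": "livy",
          "spark.driver.extraJavaOptions": "-javaagent:/opt/jmx_prometheus_javaagent.jar=8090:/opt/jmx-exporter-config.yaml",
          "spark.executor.extraJavaOptions": "-javaagent:/opt/jmx_prometheus_javaagent.jar=8090:/opt/jmx-exporter-config.yaml"
        }
      }' "http://livy.endpoint.com/batches"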

Global monitoring


spark-on-kubernetes-helm's People

Contributors

jahstreet avatar jerryldh avatar kyprifog avatar lgov avatar liubarnabas avatar lucasces avatar rakeshramakrishnan avatar


spark-on-kubernetes-helm's Issues

Liveness & readiness checks time out on Livy

Currently liveness is being checked on /batches endpoint:
https://github.com/jahstreet/spark-on-kubernetes-helm/blob/a1fd2ac19580feb0d9469c1d7cadd8630710ac13/charts/livy/templates/statefulset.yaml#L33

When there is a larger number of batches, these checks occasionally time out:

Events:
  Type     Reason     Age                 From                                                    Message
  ----     ------     ----                ----                                                    -------
  Warning  Unhealthy  54m (x56 over 10d)  kubelet, ip-XX  Readiness probe failed: Get http://XX:8998/batches: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  54m (x59 over 10d)  kubelet, ip-XX  Liveness probe failed: Get http://XX:8998/batches: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Would it be OK to add ?size=1 to limit the response size, or at least to have an option to disable these checks in the Livy chart?

K8s API error

Unable to install on AWS EKS. It works locally with minikube.

Error: Chart requires kubernetesVersion: 1.11.0 - 1.18.0 which is incompatible with Kubernetes v1.17.12-eks-7684af

Question: Does this install Spark?

Does the Helm chart for Livy deploy Spark? If not, how do we configure the Spark Helm chart and the Livy Helm chart so they can "talk" to each other / submit jobs via REST services?

Running Livy with a customized Docker image gives a "Read-only file system" error

Hi,
I tried to customize the Spark image for Livy, so I cloned the https://github.com/JahstreetOrg/spark-on-kubernetes-docker repo and built the images in order:

Spark
Livy-Spark
Livy
(I'm using the Livy builder from sasnouskikh/livy-builder:0.3)

I am able to run the helm install with the customized Livy image, but when I create a batch I get a Read-only file system error:
/opt/entrypoint.sh: line 45: /opt/spark/conf/spark-defaults.conf: Read-only file system

I tried to add a chmod command, but it's not working:
RUN ["chmod", "777", "-R", "/opt/spark/conf/spark-defaults.conf"]

Question: Livy Scala API

What version of Scala/Spark do client applications need to use?
The image says Spark 3.0.1, but the Livy API requires Scala 2.11, which only works with earlier versions of Spark.

I am getting a ClassNotFound error when trying to submit a job:
00:20 WARN: [kryo] Unable to load class org.apache.livy.scalaapi.LivyScalaClient$$anon$1 with kryo's ClassLoader. Retrying with current..
21/08/23 19:14:12 INFO JobWrapper: Failed to run job b3353a3c-2140-4d4c-94c2-520473152ec2
org.apache.livy.shaded.kryo.kryo.KryoException: Unable to find class: org.apache.livy.scalaapi.LivyScalaClient$$anon$1

Getting 503 error when accessing Livy

Hello,

I'm working through the full instructions and am running into an issue accessing Livy. My k8s control plane is managed by AWS EKS and I've configured a Route 53 CNAME record for k8s.mydomain.io to point to the Classic Load Balancer that gets spun up by AWS when I install the cluster-base config. Because I'm not deploying locally, I've replaced all instances of my-cluster.example.com with k8s.mydomain.io. I do get security errors when accessing this page via the browser but I'm assuming this is just because k8s is using a self-signed certificate.

I then installed spark-cluster and confirmed all the pods (including livy) were successfully running:

(venv) ➜  spark-cluster git:(master) ✗ kubectl get pods --watch --namespace spark-cluster
NAME                              READY   STATUS    RESTARTS   AGE
continuous-image-puller-24wqp     1/1     Running   0          2m10s
hub-5ffb5cb6cd-6g7qd              1/1     Running   0          2m10s
proxy-84549d5bd5-8g4mr            1/1     Running   0          2m10s
spark-cluster-livy-0              1/1     Running   0          2m10s
user-scheduler-5dd7cbc579-6m6ml   1/1     Running   0          2m10s
user-scheduler-5dd7cbc579-ff727   1/1     Running   0          2m10s

I am able to go to k8s.mydomain.io/jupyterhub, sign in, and launch the Python example notebook. But the Spark application never seems to start, and after 5 minutes I get an error (screenshots omitted).

When I try to go to k8s.mydomain.io/livy, I get an Nginx error page that says 503 Service Temporarily Unavailable. All the metric dashboards in spark-monitoring don't work either - I can access the pages but no metrics are available. I suspect all these problems likely have a single root cause; there is something I'm missing here.

Additional information that might be useful:

  • I am using AWS EKS with k8s version 1.18
  • I used the default values.yaml file in cluster-base and the custom-values-local.yaml file in spark-cluster, both with the domain replacement mentioned above
  • The worker instance running all of these pods is a t3.large, which should probably be sufficient to run a PySpark notebook
  • My livy log is showing a context timeout - the full log can be found here

Any help would be greatly appreciated - thanks in advance!

request for Spark Job-Server

Is it possible to integrate Spark Job Server as an alternative? I believe most existing clients use Spark Job Server rather than Livy. I look forward to your reply, thank you.

Question: How to add livy chart as dependent chart

First off, amazing work getting this Helm chart together!

I tried specifying the dependency:

dependencies:
- name: livy
  version: "v2.0.2"
  repository: https://jahstreet.github.io/helm-charts

It does not work.
I might be doing something wrong here.

What is the correct way to add the Livy chart as a dependency?

Maven packages from spark.jars.packages aren't loaded into the executors' classpath

Hi

I'm having an issue loading Maven package dependencies while using Sparkmagic and this Helm chart for Livy and Spark on k8s.

In the Spark config I set:
spark.jars.packages=org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1

The dependencies are downloaded into /root/.ivy2/jars but aren't included in the Spark classpath, and when trying to execute an action I'm getting the following error:

21/01/05 11:22:15 INFO DAGScheduler: ShuffleMapStage 1 (take at :30) failed in 0.197 s due to Job aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in stage 1.0 (TID 17, 10.4.187.11, executor 2): java.lang.ClassNotFoundException: org.apache.spark.sql.kafka010.KafkaSourceRDDPartition

Do you have any suggestions?

Thanks

Upgrade kubeVersion requirement on Livy helm charts

Tried installing the latest helm charts (commit hash)

Got the following error:

โฏ helm upgrade --install livy --namespace livy jahstreet/livy                                                                                                                   
Release "livy" does not exist. Installing it now.
Error: chart requires kubeVersion: 1.11.0 - 1.18.0 which is incompatible with Kubernetes v1.18.8

I can see that the kubeVersion requirements have been upgraded for spark-cluster. Can we upgrade the same for Livy too?

ERROR SparkKubernetesApp: Couldn't refresh Kubernetes state

I'm trying to run a spark app through Livy API

curl -s -k -H 'Content-Type: application/json' -X POST \
  -d '{
        "name": "SparkPi-01",
        "className": "org.apache.spark.examples.SparkPi",
        "numExecutors": 2,
        "file": "local:///opt/spark/examples/jars/spark-examples_2.12-3.0.1.jar",
        "args": ["10000"],
        "conf": {
            "spark.kubernetes.namespace": "livy"
        }
      }' "http://localhost:8998/batches"

I can see the spark-driver Pod run and complete correctly, but it seems the Livy server can't get the state of the application; the log says:

22/12/27 03:38:29 ERROR SparkKubernetesApp: Couldn't refresh Kubernetes state
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default.svc.cluster.local/apis/extensions/v1beta1/namespaces/livy/ingresses?labelSelector=spark-app-tag%3Dlivy-batch-0-dcfPwsoJ. Message: 404 page not found

I tried other Spark applications, all with the same result: the Spark app runs and completes, but Livy can't query its state.

Am I missing some configuration?

[QUESTION/BUG?] Problem with getting started guide

I tried to follow the getting started guide in the README.md, but if I submit a batch job to Livy it ends up in the dead state. However, I can see in my cluster that the Spark driver and executor pods are deployed, and according to their logs they seem to perform all calculations successfully.

If I look at the logs of the livy pod, I get the following error:

22/03/22 15:01:16 ERROR SparkKubernetesApp: Couldn't refresh Kubernetes state
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default.svc.cluster.local/apis/extensions/v1beta1/namespaces/livy/ingresses?labelSelector=spark-app-tag%3Dlivy-batch-4-HtbACzka. Message: 404 page not found
.
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:589)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:528)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:492)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:451)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:433)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.listRequestHelper(BaseOperation.java:166)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.list(BaseOperation.java:640)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.list(BaseOperation.java:79)
        at org.apache.livy.utils.LivyKubernetesClient.getIngress(SparkKubernetesApp.scala:503)
        at org.apache.livy.utils.LivyKubernetesClient.getApplicationReport(SparkKubernetesApp.scala:487)
        at org.apache.livy.utils.SparkKubernetesApp$$anonfun$1$$anonfun$4.apply(SparkKubernetesApp.scala:189)
        at org.apache.livy.utils.SparkKubernetesApp$$anonfun$1$$anonfun$4.apply(SparkKubernetesApp.scala:189)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.livy.utils.SparkKubernetesApp$.org$apache$livy$utils$SparkKubernetesApp$$withRetry(SparkKubernetesApp.scala:104)
        at org.apache.livy.utils.SparkKubernetesApp$$anonfun$1.apply$mcV$sp(SparkKubernetesApp.scala:189)
        at org.apache.livy.Utils$$anon$1.run(Utils.scala:97)

It seems like Livy can't refresh the state of the Spark pods because it is missing an Ingress. Any idea how to fix this, or what the actual problem is?

Add Delta Lake Support

One of the things that would be really nice to have is Delta Lake support (https://delta.io/) for Spark. It is widely used right now (pushed by Databricks).
Is there any good way to start looking at it?

Set tolerations for spark drivers and executors

First of all, thank you @jahstreet for your amazing work here! Hope your diffs will be merged into livy soon.

In our usage, we recently have a need to use a dedicated node pool for Spark driver and executor pods. We have this node pool set up with labels and taints; however, we don't know how to add tolerations to the driver and executor pods from Livy. It looks like this is possible with Spark 3.0.0, but not in 2.4.5?

Not sure if the image you built has this feature?

Upgrade to Spark 3.2.0

Hi @jahstreet, how are you doing so far? First of all, thank you for your initiative, this is awesome. What is the current status of the project? We are considering leveraging your work for our analytics platforms. We are thinking of using Spark distribution 3.2.0 and Hadoop 3.2. What would be your suggestion in such a case?

Missing spark history server docker image for Spark 3.0.1

The YAML file spark-on-kubernetes-helm/charts/spark-cluster/values.yaml uses the image with tag 3.0.1_2.12-hadoop_3.2.0_cloud, but that image is not present in the Docker Hub repo. The latest image there has the tag 3.0.0_2.12-hadoop_3.2.0_cloud.

I think you probably forgot to push the Docker image after building it.

Add licence

Hey, can you please add a licence to this project and to JahstreetOrg/spark-on-kubernetes-docker?
I can't use this project because of policies at our company. Furthermore, it's not clear if I can use it at all, as it's not clear whether it's free to use. According to the default GitHub licensing rules, it seems it isn't:

You're under no obligation to choose a license. However, without a license, the default copyright laws apply, meaning that you retain all rights to your source code and no one may reproduce, distribute, or create derivative works from your work.

Thank you.

Error when running with a modified docker image

It might be a naive question, but I have modified the Livy Docker image to create a new one using the following:

from sasnouskikh/livy:0.8.0-incubating-spark_3.0.1_2.12-hadoop_3.2.0_cloud
RUN python3 -m pip install avro 

and I replaced the image repository in the values.yaml of the livy chart as well as the value for LIVY_SPARK_KUBERNETES_CONTAINER_IMAGE.

But now if I run the example SparkPi job I get the following error in the job logs:

/opt/entrypoint.sh: line 45: /opt/spark/conf/spark-defaults.conf: Read-only file system

Any idea why? Everything ran just fine with the original image.

Unable to create spark session

I was able to setup Livy using the helm chart, but when I create a session it fails. I am using the default configuration with minikube

Create session payload

{
    "kind": "pyspark",
    "name": "test-session1234",
    "conf": {
      "spark.kubernetes.namespace": "livy"
    }
}
20/09/25 04:00:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create]  for kind: [Pod]  with name: [null]  in namespace: [livy]  failed.
	at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
	at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330)
	at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141)
	at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140)
	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
	at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241)
	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.SocketException: Broken pipe (Write failed)
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
	at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
	at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
	at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894)
	at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865)
	at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
	at okio.Okio$1.write(Okio.java:79)
	at okio.AsyncTimeout$1.write(AsyncTimeout.java:180)
	at okio.RealBufferedSink.flush(RealBufferedSink.java:224)
	at okhttp3.internal.http2.Http2Writer.settings(Http2Writer.java:203)
	at okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:515)
	at okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:505)
	at okhttp3.internal.connection.RealConnection.startHttp2(RealConnection.java:298)
	at okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:287)
	at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:168)
	at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:257)
	at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
	at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:110)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:254)
	at okhttp3.RealCall.execute(RealCall.java:92)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:411)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:372)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:241)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:819)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:334)
	... 17 more
20/09/25 04:01:00 INFO ShutdownHookManager: Shutdown hook called
20/09/25 04:01:00 INFO ShutdownHookManager: Deleting directory /tmp/spark-343d41df-d58c-4ed4-8a03-2eabbc21da1d

Kubernetes Diagnostics: 
Operation: [list]  for kind: [Pod]  with name: [null]  in namespace: [null]  failed.

MountVolume.SetUp failed for volume for Livy driver pod

Hello

When I install the Livy Helm chart and create a Livy driver pod, it fails during volume mount setup.
I am getting this error even though the ConfigMap is already there:
"MountVolume.SetUp failed for volume "spark-conf-volume" : configmap "livy-session-0-ee5a9e78150bb674-driver-conf-map" not found"

Thanks

403 Forbidden issue

I tried to install in our GKE cluster and got this error:

2020-03-06 10:04:58,536 WARN  [OkHttp https://kubernetes.default.svc/...] internal.WatchConnectionManager (WatchConnectionManager.java:onFailure(197)) - Exec Failure: HTTP 403, Status: 403 - 
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
	at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:228)
	at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:195)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:153)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
2020-03-06 10:04:58,543 WARN  [pool-3-thread-1] k8s.ExecutorPodsWatchSnapshotSource (Logging.scala:logWarning(87)) - Kubernetes client has been closed (this is expected if the application is shutting down.)
2020-03-06 10:04:58,544 ERROR [pool-3-thread-1] spark.SparkContext (Logging.scala:logError(91)) - Error initializing SparkContext.
io.fabric8.kubernetes.client.KubernetesClientException: 
	at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:201)
	at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570)
	at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:197)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:153)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
2020-03-06 10:04:58,554 INFO  [pool-3-thread-1] server.AbstractConnector (AbstractConnector.java:doStop(318)) - Stopped Spark@6fc37701{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2020-03-06 10:04:58,557 INFO  [pool-3-thread-1] ui.SparkUI (Logging.scala:logInfo(54)) - Stopped Spark web UI at http://livy-session-2-1583489087099-driver-svc.livy.svc:4040
2020-03-06 10:04:58,561 INFO  [pool-3-thread-1] k8s.KubernetesClusterSchedulerBackend (Logging.scala:logInfo(54)) - Shutting down all executors
2020-03-06 10:04:58,565 INFO  [dispatcher-event-loop-2] k8s.KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint (Logging.scala:logInfo(54)) - Asking each executor to shut down
2020-03-06 10:04:58,785 ERROR [kubernetes-executor-snapshots-subscribers-1] util.Utils (Logging.scala:logError(91)) - Uncaught exception in thread kubernetes-executor-snapshots-subscribers-1
io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create]  for kind: [Pod]  with name: [null]  in namespace: [livy]  failed.
	at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
	at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:364)
	at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$org$apache$spark$scheduler$cluster$k8s$ExecutorPodsAllocator$$onNewSnapshots$1.apply$mcVI$sp(ExecutorPodsAllocator.scala:139)
	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
	at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.org$apache$spark$scheduler$cluster$k8s$ExecutorPodsAllocator$$onNewSnapshots(ExecutorPodsAllocator.scala:126)
	at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$start$1.apply(ExecutorPodsAllocator.scala:68)
	at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$start$1.apply(ExecutorPodsAllocator.scala:68)
	at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl$$anonfun$org$apache$spark$scheduler$cluster$k8s$ExecutorPodsSnapshotsStoreImpl$$callSubscriber$1.apply$mcV$sp(ExecutorPodsSnapshotsStoreImpl.scala:102)
	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
	at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl.org$apache$spark$scheduler$cluster$k8s$ExecutorPodsSnapshotsStoreImpl$$callSubscriber(ExecutorPodsSnapshotsStoreImpl.scala:99)
	at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl$$anonfun$addSubscriber$1.apply$mcV$sp(ExecutorPodsSnapshotsStoreImpl.scala:71)
	at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl$$anon$1.run(ExecutorPodsSnapshotsStoreImpl.scala:107)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.InterruptedIOException: interrupted
	at okio.Timeout.throwIfReached(Timeout.java:146)
	at okio.Okio$1.write(Okio.java:76)
	at okio.AsyncTimeout$1.write(AsyncTimeout.java:180)
	at okio.RealBufferedSink.flush(RealBufferedSink.java:224)
	at okhttp3.internal.http1.Http1Codec.finishRequest(Http1Codec.java:166)
	at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:84)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:107)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:200)
	at okhttp3.RealCall.execute(RealCall.java:77)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:379)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:344)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:227)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:787)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:357)
	... 17 more
2020-03-06 10:04:58,854 INFO  [dispatcher-event-loop-6] spark.MapOutputTrackerMasterEndpoint (Logging.scala:logInfo(54)) - MapOutputTrackerMasterEndpoint stopped!
2020-03-06 10:04:58,861 INFO  [pool-3-thread-1] memory.MemoryStore (Logging.scala:logInfo(54)) - MemoryStore cleared
2020-03-06 10:04:58,862 INFO  [pool-3-thread-1] storage.BlockManager (Logging.scala:logInfo(54)) - BlockManager stopped
2020-03-06 10:04:58,868 INFO  [pool-3-thread-1] storage.BlockManagerMaster (Logging.scala:logInfo(54)) - BlockManagerMaster stopped
2020-03-06 10:04:58,868 WARN  [pool-3-thread-1] metrics.MetricsSystem (Logging.scala:logWarning(66)) - Stopping a MetricsSystem that is not running
2020-03-06 10:04:58,870 INFO  [dispatcher-event-loop-4] scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint (Logging.scala:logInfo(54)) - OutputCommitCoordinator stopped!
2020-03-06 10:04:58,883 INFO  [pool-3-thread-1] spark.SparkContext (Logging.scala:logInfo(54)) - Successfully stopped SparkContext
Exception in thread "main" java.lang.NullPointerException
	at org.apache.livy.rsc.driver.JobWrapper.cancel(JobWrapper.java:90)
	at org.apache.livy.rsc.driver.RSCDriver.shutdown(RSCDriver.java:127)
	at org.apache.livy.rsc.driver.RSCDriver.run(RSCDriver.java:364)
	at org.apache.livy.rsc.driver.RSCDriverBootstrapper.main(RSCDriverBootstrapper.java:93)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I did some research and apparently the kubernetes client jar is too old: https://stackoverflow.com/questions/57643079/kubernetes-watchconnectionmanager-exec-failure-http-403

I followed the suggestions there and replaced the jars; however, after that I got this:

/opt/entrypoint.sh: line 45: /opt/spark/conf/spark-defaults.conf: Read-only file system

Upgrade to 2.0.1 issues with S3

On upgrading to 2.0.1 I can no longer leverage hadoop-aws and get this somewhat cryptic error:

spark.sparkContext._jsc.hadoopConfiguration().set("fs.s3a.impl","org.apache.hadoop.fs.s3a.S3AFileSystem")
spark.sparkContext._jsc.hadoopConfiguration().set("fs.s3a.awsAccessKeyId", access_key)
spark.sparkContext._jsc.hadoopConfiguration().set("fs.s3a.awsSecretAccessKey",secret_key)

querying this public bucket:

spark.read.csv("s3a://nyc-tlc/misc/uber_nyc_data.csv")
py4j.protocol.Py4JJavaError: An error occurred while calling o74.csv.
: java.nio.file.AccessDeniedException: s3a://nyc-tlc/misc/uber_nyc_data.csv: getFileStatus on s3a://nyc-tlc/misc/uber_nyc_data.csv: com.amazonaws.services.s3.model.AmazonS3Exception: 
	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:230)
	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:151)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2198)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2163)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2102)
	at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1700)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.isDirectory(S3AFileSystem.java:2995)
	at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:47)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:366)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:297)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:286)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:286)
	at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:723)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1640)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1304)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1058)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4368)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4315)
	at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1271)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getObjectMetadata$4(S3AFileSystem.java:1249)
	at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:322)
	at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:285)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:1246)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2183)
	... 22 more

How do I configure hadoop-aws like in version 1.0.0? That seemed to be working then.

Notebook connectivity issue

I can't tell if this is a Sparkmagic issue. I started Livy using the attached Helm chart and ran the kubectl exec example, and it ran fine. Then I started the jupyter-sparkmagic Helm chart after changing the values.yaml to have:

livyEndpoint: "livy.livy:80"

which is the endpoint I think needs to be referenced here (I also tried livyheadless.livy:8998, livy.livy:8998, and the corresponding cluster IPs for both of those services exposed by Livy, as well as the same ports on the pods directly without the services)

However in a notebook if I run:

%load_ext sparkmagic.magics
%manage_spark

to try to establish a session using that livy endpoint, I am only able to get this error:

HttpClientException: Error sending http request and maximum retry encountered.

This is even though I am able to access those endpoints from the jupyter-sparkmagic pod using curl and telnet. One strange thing: if I exec into the livy-0 pod spun up by Livy and run:

livy-server status

I get that the server is not running, even though the kubectl exec example worked and Livy appears to be running as PID 1.

How does the Livy Spark driver start Spark executor pods?

Hi,
I have installed the chart following your guide, but the Spark application started by Sparkmagic seems to only contain a livy-spark driver pod. How can I get the executor pods to run in k8s through the livy-spark driver pod?

File file:/tmp/history-server does not exist when configuring history server

Hi,

We are struggling with configuring the history server in Livy using these env vars:

LIVY_SPARK_EVENT1LOG_ENABLED: {value: "true"}
LIVY_SPARK_EVENT1LOG_DIR: {value: "file:///tmp/history-server"}
LIVY_LIVY_UI_HISTORY0SERVER0URL: {value: "https://historyserver.mycluster.lan"}

After we trigger a job, we see this error message in the driver container:
Exception in thread "main" java.io.FileNotFoundException: File file:/tmp/history-server does not exist

Does anybody have a clue what we are doing wrong?

BR
Andreas

Issues with load testing livy

I am load testing (using locust) my livy server which is deployed on k8 pods using this helm. I have tried testing with session recovery enabled on both zookeeper and filesystem. We also have basic auth enabled on our server using nginx ingress.

My config for session recovery looks like this:

LIVY_LIVY_SERVER_RECOVERY_MODE: {value: "recovery"}
LIVY_LIVY_SERVER_RECOVERY_STATE0STORE: {value: "filesystem"}
LIVY_LIVY_SERVER_RECOVERY_STATE0STORE_URL: {value: "file:///tmp/livy/store/state"}

These are some of the logs/issues that I am getting when testing with session recovery enabled on the filesystem:

2022-05-02 14:12:45,487 : livy_test : CRITICAL : ERROR IN SUBMIT BATCHES: 500 "java.io.FileNotFoundException: File /tmp/livy/store/state/v1/batch/state.tmp does not exist"

2022-05-02 14:12:44,467 : livy_test : CRITICAL : ERROR IN SUBMIT BATCHES: 500 "org.apache.hadoop.fs.FileAlreadyExistsException: rename destination /tmp/livy/store/state/v1/batch/state already exists."

2022-05-02 14:12:46,950 : livy_test : CRITICAL : ERROR IN SUBMIT BATCHES: 500 "ExitCodeException exitCode=1: chmod: cannot access '/tmp/livy/store/state/v1/batch/.state.tmp.crc': No such file or directory\n"

Load testing configuration (locust conf):

users = 100
spawn-rate = 100
run-time = 1m

Can someone help with why I might be getting the above errors?

How to add extra files like keytabs to the livy pod

Hi,
I am trying to add keytab files to Livy with the help of volumes, but as far as I can see the mount and volume definitions get overridden.
Is there any way to pass extra files to the Livy pod if required?
In my case, I need to provide keytab files to access secured services.
Any input will be helpful.

Thanks & Regards,
Swathi Desai

Modify the parameters of the SparkMagic config to connect jupyter notebook with the k8s cluster

Hi. I'm confused about what should be filled in for the url:

"kernel_python_credentials" : {
    "username": "",
    "password": "",
    "url": "http://localhost:8998",
    "auth": "None"
  },

I have deployed the Livy chart on a Kubernetes cluster (based on ACK, which is provided by AliCloud), and I have deployed Jupyter Notebook and Sparkmagic on my laptop. Now I'm trying to connect the Jupyter notebook with the Kubernetes cluster, so I have to figure out how to modify the Sparkmagic config JSON.

Sessions dead

I deploy using the default values, but after creating a session I get its status and see this error message:

     "\t container status: ",
        "\t\t container name: spark-kubernetes-driver",
        "\t\t container image: sasnouskikh/livy-spark:0.8.0-incubating-spark_3.0.1_2.12-hadoop_3.2.0_cloud",
        "\t\t container state: waiting",
        "\t\t pending reason: ContainerCreating",
        "23/07/14 08:52:06 INFO LoggingPodStatusWatcherImpl: Deployed Spark application livy-session-0 with submission ID spark-jobs:livy-session-0-1faa39895399a3d2-driver into Kubernetes",
        "23/07/14 08:52:06 INFO ShutdownHookManager: Shutdown hook called",
        "23/07/14 08:52:06 INFO ShutdownHookManager: Deleting directory /tmp/spark-9cdcf0d9-ebca-4548-b591-2e1708c7d050",
        "\nKubernetes Diagnostics: ",
        "Failure executing: GET at: https://kubernetes.default.svc.cluster.local/apis/extensions/v1beta1/namespaces/spark-jobs/ingresses?labelSelector=spark-app-tag%3Dlivy-session-0-Xzo72sxU. Message: 404 page not found\n."
    ]

I see the config for using Ingress is set to false, but it still seems to try to find an Ingress. Can you help me?

2023: this example doesn't work (various errors)

On kind k8s:

  • the curl command from readme gives "curl": executable file not found in $PATH: unknown
  • if I use port-forward and curl against local port, I get curl: (3) URL using bad/illegal format or missing URL "No content to map due to end-of-input\n at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 0]"%
  • nothing in logs - logs aren't available in kubectl logs...

Deploying new profile to existing k8s Hub cluster with SparkMagic

Hi,

First, thanks. for your amazing job !
I have deployed your charts in cluster mode with livy/jupyter on a single k8s cluster and everything works great.
But here's my setup :
I have an already existing JupyterHub instance (deployed with https://github.com/jupyterhub/zero-to-jupyterhub-k8s) that already have some profile.
For example here the sample of the actual datascientist profile that we have :

      description: "Environment for data scientist"
      kubespawner_override:
        image: mycompanyregistry/singleuser-datascientist:stable

But now I would like to add a new one.

I have forked your Dockerfile here (https://github.com/JahstreetOrg/spark-on-kubernetes-docker/blob/master/jupyter/4.6.3-sparkmagic_0.15.0/Dockerfile) with the single-user entrypoint by default and added just these env vars for testing purposes:

ENV JUPYTER_ALLOW_INSECURE_WRITES=true
ENV JUPYTER_RUNTIME_DIR=/tmp

I'm able to run these commands and the session manager is shown:

%load_ext sparkmagic.magics
%manage_spark

But when I try to run a %%configure command, I encounter the following error:

UsageError: Cell magic %%configure not found.

And when I try to create a new notebook, I don't have the possibility to choose the PySpark or Spark kernel either; only the Python 3 option is available.

Last thing: at Docker build time I have manually overridden the Sparkmagic config with this:

  "kernel_python_credentials" : {
    "username": "",
    "password": "",
    "url": "https://my-cluster.example.com/livy",
    "auth": "None"
  },

Of course the spawned container has an /etc/hosts entry matching my POC Livy cluster's internal IP :)

Do you have any idea what I might have missed?

Thanks a lot !
KnutFr

Authorization on REST API Header

We would like to enable basic authorization that is configurable at run time. When deploying this in a multi-tenant model (i.e. each tenant has their own Livy orchestration), we need to secure these requests, otherwise the Livy API is open to the entire cluster.

One way of doing this is LDAP, but we wonder if there is a supportable way to incorporate either a JWT token or Basic Auth.

Example:

import json, pprint, requests, textwrap
host = 'http://localhost:8998'
data = {'kind': 'spark'}
headers = {'Content-Type': 'application/json'}
r = requests.post(host + '/sessions', data=json.dumps(data), headers=headers)
r.json()

{u'state': u'starting', u'id': 0, u'kind': u'spark'}

We could add Authorization to the headers and have basic auth embedded there.

k8s v1.19.0 support

Hey, thank you for a great job!

It would be great to add support for the latest k8s.

PR: #50

Thank you!
