Comments (8)
Hello @Gouthamkreddy1234, how are you mounting the config file?
By default, the built container uses the config located at /opt/palantir/services/spark-scheduler/var/conf/install.yml.
Check out the example pod spec here:
https://github.com/palantir/k8s-spark-scheduler/blob/master/examples/extender.yml#L127
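For reference, the relevant part of such a pod spec mounts a ConfigMap entry at the default config path. This is a rough sketch with illustrative names (pod name, image tag, and ConfigMap name are assumptions); the linked extender.yml is authoritative:

```yaml
# Sketch only: mount a ConfigMap entry as install.yml at the path the
# scheduler extender reads by default. Names below are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: spark-scheduler-extender      # hypothetical pod name
spec:
  containers:
    - name: extender
      image: palantirtechnologies/k8s-spark-scheduler:latest  # illustrative tag
      volumeMounts:
        - name: config
          mountPath: /opt/palantir/services/spark-scheduler/var/conf
  volumes:
    - name: config
      configMap:
        name: spark-scheduler-config  # hypothetical ConfigMap name
        items:
          - key: install.yml
            path: install.yml
```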
from k8s-spark-scheduler.
Hi @onursatici, I am modifying k8s-spark-scheduler/docker/var/conf/install.yml in my local repo as well as the configMap at https://github.com/palantir/k8s-spark-scheduler/blob/master/examples/extender.yml#L66, and both seem to point to the same mount path, /opt/palantir/services/spark-scheduler/var/conf/install.yml.
I think the configMap data is what is ultimately used in my case, since I build with the Dockerfile in the repo and then apply this manifest: https://github.com/palantir/k8s-spark-scheduler/blob/master/examples/extender.yml. However, neither the FIFO nor the binpacking feature seems to work for me.
Am I missing anything else here?
Thanks for the info @Gouthamkreddy1234. Strange, altering the configmap should work. Can you verify that /opt/palantir/services/spark-scheduler/var/conf/install.yml has your changes after you run kubectl apply -f examples/extender.yml, by ssh'ing into the created pod?
Also, how are you validating that the config changes are not taking effect? For FIFO, one way to test is to submit a large application that cannot fit in your cluster, followed by a smaller one; if FIFO is enabled, the smaller application should stay stuck in Pending until you remove the larger application.
One thing to note when selecting the binpack algorithm: anything other than the accepted values silently defaults to distribute-evenly.
Ideally we would warn in these scenarios: #131
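As a rough sketch of what that means in practice (the key names below are assumptions, not the verified schema; the repo's docker/var/conf/install.yml is authoritative), the config might look like:

```yaml
# Hypothetical install.yml fragment; key names are illustrative.
# A typo in the algorithm name produces no error: the scheduler silently
# falls back to the default distribute-evenly behaviour described above.
fifo: true
binpack: tightly-pack   # illustrative value; anything unrecognized defaults to distribute-evenly
```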
Yes, I ssh'd into the container and could see the configMap mounted there with FIFO enabled. I also re-created the FIFO scenario you mentioned, and I can see the smaller pod being scheduled while the bigger one is still in the Pending state (which is not the expected behaviour). I am not sure what I am missing; I am running the latest version of the scheduler-extender images.
Am I missing anything here?
P.S.: I am working with the master branch.
Got it. Finally, can you check whether the pods you are creating have spark-scheduler as the schedulerName in their spec?
You can also use this script to simulate a spark application launch: https://github.com/palantir/k8s-spark-scheduler/blob/master/examples/submit-test-spark-app.sh#L31
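For reference, the schedulerName check means the driver and executor pod specs should carry something like the following (a minimal sketch; pod name and image are placeholders). Without it, the default scheduler handles the pod and FIFO/binpacking never apply:

```yaml
# Pod spec fragment; schedulerName must match the extender's scheduler
# name (spark-scheduler here), otherwise kube-scheduler places the pod.
apiVersion: v1
kind: Pod
metadata:
  name: spark-driver-example     # hypothetical name
spec:
  schedulerName: spark-scheduler
  containers:
    - name: driver
      image: example/spark:latest  # illustrative image
```

A quick way to check an existing pod is `kubectl get pod <pod-name> -o jsonpath='{.spec.schedulerName}'`.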
Yes, the pods are being scheduled by spark-scheduler.
I am not sure what else I might be missing here. Any thoughts?
Hey @Gouthamkreddy1234, I had a look at this. Within the extender, nodes are assumed to carry a configurable label, and the value of that label dictates which group a node belongs to. FIFO order is preserved for applications waiting to be scheduled within the same group. I have updated the examples to include that label here: #134.
You would also need to add this label (instance-group by default in the example) to the nodes you plan to schedule Spark applications on, and set the nodeSelector for the spark pods to match it.
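Concretely, that pairing might look like the sketch below. The label value spark-nodes is illustrative; only the key (instance-group, per the example config) comes from the repo:

```yaml
# Spark pod spec fragment; assumes the target nodes were labeled
# beforehand, e.g. with:
#   kubectl label node <node-name> instance-group=spark-nodes
# The nodeSelector must match that label for grouping and FIFO to apply.
spec:
  schedulerName: spark-scheduler
  nodeSelector:
    instance-group: spark-nodes   # illustrative value
```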
I can confirm that with that change, if I submit a large application with:
# 10^10 cpu requests for the driver, this will be stuck Pending
./submit-test-spark-app.sh 1 2 100G 100 100m 200
and then submit a smaller application:
# smaller app with 2 executors that can fit
./submit-test-spark-app.sh 2 2 100m 100 100m 200
the smaller application will be blocked in Pending until I kill the larger application.