from helm-chart.
I can't overstate how cool the new scheduler is. In our first 24 hours with it, it successfully scaled us down from 5 to 2 nodes. We shipped 0.7 last week, and I'm already even more excited about getting our 0.8 release ready!
Until now, the only feasible way to scale down has been manual: cordon the nodes you no longer need, then wait for them to drain and be reclaimed by the cluster autoscaler. They may still need manual draining to remove non-user pods that block scale-down (e.g. a stray kube-dns pod can end up on them).
We have work to do on autoscaling, especially on documenting all the strategies, caveats, and helpers.
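Cordoning itself is a `kubectl cordon <node>` call, but a plain `kubectl drain` would also evict user servers. The sketch below is a hypothetical helper that works on simple Python objects rather than the real Kubernetes client, and shows one way to sort the pods left on a cordoned node: user pods to wait for, DaemonSet pods to ignore, and everything else (like that stray kube-dns pod) to drain by hand. It assumes user pods carry the z2jh label `component: singleuser-server`; the `Pod` class and pod names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Pod:
    """Hypothetical stand-in for a pod; not the kubernetes client's V1Pod."""

    name: str
    namespace: str
    labels: Dict[str, str] = field(default_factory=dict)
    owner_kind: Optional[str] = None  # e.g. "DaemonSet", "ReplicaSet", or None for a bare pod


# Assumption: user pods carry the label component=singleuser-server, as in z2jh.
USER_LABEL_KEY, USER_LABEL_VALUE = "component", "singleuser-server"


def classify_pods_on_cordoned_node(pods: List[Pod]) -> Tuple[List[str], List[str], List[str]]:
    """Sort pods on a cordoned node into (wait for, ignore, drain manually)."""
    wait_for, ignore, drain = [], [], []
    for pod in pods:
        if pod.labels.get(USER_LABEL_KEY) == USER_LABEL_VALUE:
            # User servers: let users finish (or the idle culler stop them) instead of evicting.
            wait_for.append(pod.name)
        elif pod.owner_kind == "DaemonSet":
            # DaemonSet pods run on every node and do not block its removal.
            ignore.append(pod.name)
        else:
            # Anything else, e.g. a stray kube-dns pod, may need manual eviction.
            drain.append(pod.name)
    return wait_for, ignore, drain


pods = [
    Pod("jupyter-alice", "pangeo", {"component": "singleuser-server"}),
    Pod("logging-agent-abc", "kube-system", owner_kind="DaemonSet"),
    Pod("kube-dns-xyz", "kube-system", owner_kind="ReplicaSet"),
]
wait_for, ignore, drain = classify_pods_on_cordoned_node(pods)
print(drain)  # ['kube-dns-xyz']
```

Only the pods in the last bucket are candidates for a manual `kubectl delete pod`; the node then goes away once the user pods finish.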
@jhamman we will soon release version 0.8 of the z2jh Helm chart, and I'm currently writing documentation for it.
I propose closing this issue, since the scale-down problem seems to have been identified within the z2jh and GKE communities, and opening an "Update to z2jh 0.8 helm chart" issue instead. As @consideRatio informed us, the release seems imminent (jupyterhub/zero-to-jupyterhub-k8s#1054).
For those who are interested, here is our daily cloud bill, split into compute and storage:
I didn't do much other than click a few buttons in the Google Cloud console. For some reason, I think only I am authorized to see the full details of our billing.
@yuvipanda recently mentioned some ongoing work within JupyterHub to rework how Kubernetes scaling is handled, to make scale-down more efficient. Perhaps he can point us to that work so we can follow along?
@consideRatio is the person doing most of that work, @jhamman. I think it was merged very recently; he should be able to help.
@jhamman I just wrote a bit about the deployment @minrk did on mybinder.org today. He enabled a freshly merged feature that is the foundational building block in a series of improvements to scheduling and autoscaling in the zero-to-jupyterhub-k8s chart.
See pangeo-data/pangeo#322 (comment)
Thanks @yuvipanda / @consideRatio / @minrk. This all seems quite promising. If we wanted to try this out in the near term, what would be the best way for us to do that? @minrk, can you point us to the config you are using for mybinder.org?
mybinder.org's deployment config is at https://github.com/jupyterhub/mybinder.org-deploy; the relevant bit is:
```yaml
scheduling:
  userScheduler:
    enabled: true
    replicas: 2
```
At the moment this feature is only in the 0.8 dev versions of the chart. If you are upgrading from 0.6, make sure to test the deploy/upgrade first. As things stand, 0.6->0.7 chart upgrades seem to require relaunching user servers, so performing the upgrade would be a significant disruption.
We now depend on version 0.8 of the z2jh chart. Do we want to add the scheduling optimization bit to the pangeo chart so it's on by default?
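Because the pangeo chart pulls in z2jh as a dependency, a default for the subchart would presumably live under the `jupyterhub:` key of the pangeo chart's values, and individual deployments could still override it. The sketch below models this with plain dicts; `deep_merge` is a hypothetical helper that only approximates Helm's behaviour (nested maps merged key by key, other values replaced) and ignores Helm's special handling of `null`. The `PANGEO_DEFAULTS` contents are an assumption for illustration, not the chart's actual values.

```python
import copy


def deep_merge(defaults, overrides):
    """Recursively merge overrides into defaults, roughly as Helm merges maps.

    Nested dicts are merged key by key; any other value in overrides
    (scalars, lists) replaces the default outright.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical pangeo chart defaults: subchart values nest under "jupyterhub".
PANGEO_DEFAULTS = {
    "jupyterhub": {
        "scheduling": {
            "userScheduler": {"enabled": True, "replicas": 2},
        },
    },
}

# A deployment on a fixed-size cluster could still opt out.
user_values = {"jupyterhub": {"scheduling": {"userScheduler": {"enabled": False}}}}
final = deep_merge(PANGEO_DEFAULTS, user_values)
print(final["jupyterhub"]["scheduling"]["userScheduler"])  # {'enabled': False, 'replicas': 2}
```

The point of the recursive merge is that a user who only flips `enabled` keeps the chart's `replicas` default instead of wiping out the whole `userScheduler` block.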
@jhamman I guess so. @minrk, why is it not enabled by default in the jupyterhub chart?
We made it opt-in as a new, somewhat experimental feature that increases the resource requirements a bit. That said, we've been using it on mybinder.org for several months to great effect. I think we will probably switch it to on-by-default in the next major chart release.
@jhamman I think it is quite hard to decide on good default values. They depend on whether you are using an autoscaling cluster, and on whether you are using GKE or another cluster with particular cluster-autoscaler settings.
Some of the complexities:
About not having the user scheduler on by default:
It packs user pods tightly onto nodes. That makes sense if you use autoscaling, but not if you have a fixed number of nodes. So the choice of default is not critical, but it is also not obvious.
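A toy model of why packing only pays off with autoscaling: the sketch below places pods one at a time, either on the fullest node that still fits ("pack") or on the emptiest one ("spread"), and counts how many nodes stay empty and could therefore be removed. This is a simplification under stated assumptions (a single resource, equal-sized nodes and pods), not the real kube-scheduler scoring; `pick_node` and `empty_nodes_after` are hypothetical names.

```python
from typing import Dict, Optional


def pick_node(requested: Dict[str, float], capacity: Dict[str, float],
              pod_request: float, pack: bool) -> Optional[str]:
    """Choose a node for one pod from the fraction of its capacity already requested."""
    fits = [n for n in requested if requested[n] + pod_request <= capacity[n]]
    if not fits:
        return None  # nothing fits; an autoscaling cluster would add a node here

    def utilization(node: str) -> float:
        return requested[node] / capacity[node]

    # pack=True: fullest node that still fits; pack=False: emptiest node.
    return max(fits, key=utilization) if pack else min(fits, key=utilization)


def empty_nodes_after(n_pods: int, n_nodes: int, node_capacity: float,
                      pod_request: float, pack: bool) -> int:
    """Place n_pods one by one and report how many nodes end up with no pods."""
    requested = {f"node-{i}": 0.0 for i in range(n_nodes)}
    capacity = {node: node_capacity for node in requested}
    for _ in range(n_pods):
        node = pick_node(requested, capacity, pod_request, pack)
        if node is None:
            break
        requested[node] += pod_request
    return sum(1 for used in requested.values() if used == 0.0)


print(empty_nodes_after(4, 3, 4.0, 1.0, pack=True))   # 2
print(empty_nodes_after(4, 3, 4.0, 1.0, pack=False))  # 0
```

On a fixed-size cluster the two empty nodes are paid for either way, so packing buys nothing; with an autoscaler they can be removed.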
About podPriority / userPlaceholder pods:
These changes also only make sense for autoscaling clusters, and whether a cluster autoscales is outside the Helm chart's control. The cluster autoscaler's "pod priority cutoff" setting, which sets the minimum priority a pending pod must have to trigger a scale-up, is not the same on every cluster. Some clusters may therefore need to adjust their podPriority settings to match it.
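To my knowledge the cluster autoscaler exposes this setting as the `--expendable-pods-priority-cutoff` flag (default -10): pending pods with a priority below it count as expendable and never trigger a scale-up. Placeholder pods only work if their priority is at or above that cutoff, so they can grow the cluster, yet below user pods, so a real user can preempt them. The sketch below models just those two comparisons; the `SimplePod` class and all priority numbers are illustrative assumptions, not the chart's actual defaults.

```python
from dataclasses import dataclass


@dataclass
class SimplePod:
    """Hypothetical pod reduced to a name and a priority value."""

    name: str
    priority: int


def triggers_scale_up(pending: SimplePod, cutoff: int) -> bool:
    """Pending pods below the cutoff count as expendable and do not trigger a scale-up."""
    return pending.priority >= cutoff


def can_preempt(incoming: SimplePod, running: SimplePod) -> bool:
    """A pending pod can only evict a running pod of strictly lower priority."""
    return incoming.priority > running.priority


# Illustrative numbers only.
user = SimplePod("jupyter-alice", priority=0)
placeholder = SimplePod("user-placeholder-0", priority=-10)

print(triggers_scale_up(placeholder, cutoff=-10))  # True: placeholders can grow the cluster
print(triggers_scale_up(placeholder, cutoff=0))    # False: this cluster never scales up for them
print(can_preempt(user, placeholder))              # True: a real user displaces a placeholder
```

The second call is the case described above: on a cluster whose cutoff is higher than the placeholder priority, the placeholders silently stop doing anything until podPriority is adjusted.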
I've opened #87 with what I see as sensible defaults for typical pangeo applications. I think we can assume basically all pangeo applications will require autoscaling clusters.