Comments (17)

mrocklin avatar mrocklin commented on August 12, 2024 2

from helm-chart.

minrk avatar minrk commented on August 12, 2024 2

I can't overstate how cool the new scheduler is. In our first 24 hours with it, it successfully scaled down from 5 to 2 nodes. We shipped 0.7 last week, and I'm already more excited about getting our 0.8 release ready!

Until now, the only feasible way to scale down has been manual: cordon the nodes you don't need anymore and wait for them to drain and be reclaimed by the cluster autoscaler. They may still need manual draining to remove non-user pods that prevent scale-down (e.g. a miscellaneous kube-dns pod can show up).
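
As a rough illustration of the cordon step described above: cordoning marks a node as unschedulable, which appears on the Node object as the field shown below. This is only a sketch of what that state looks like; the node name is hypothetical, and pods already running on the node are not evicted by cordoning alone.

```yaml
# What a cordoned node looks like in the Kubernetes API.
# The node name is a hypothetical placeholder.
# `kubectl cordon <node>` sets this field; existing pods keep running
# until they exit or are drained.
apiVersion: v1
kind: Node
metadata:
  name: example-node-1
spec:
  unschedulable: true
```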

We have work to do on autoscaling, especially documenting all the strategies and caveats and helpers.

consideRatio avatar consideRatio commented on August 12, 2024 1

@jhamman we are soon releasing version 0.8 of the z2jh helm chart, and I'm currently writing documentation about it.

guillaumeeb avatar guillaumeeb commented on August 12, 2024 1

I propose closing this issue, as the scale-down problem seems to have been identified within the z2jh or GKE community, and opening an "Update to z2jh 0.8 helm chart" issue instead. It seems, as @consideRatio informed us, that the release is imminent (jupyterhub/zero-to-jupyterhub-k8s#1054).

rabernat avatar rabernat commented on August 12, 2024

For those who are interested, here is our daily cloud bill, split into compute and storage:

[Image: daily cloud bill, split into compute and storage]

mrocklin avatar mrocklin commented on August 12, 2024

rabernat avatar rabernat commented on August 12, 2024

I didn't do much other than click a few buttons in the Google Cloud console. For some reason, I think I am the only one authorized to see the full details of our billing.

jhamman avatar jhamman commented on August 12, 2024

@yuvipanda recently mentioned some ongoing work within JupyterHub to rework the Kubernetes scaling protocols to make scale-down more efficient. Perhaps he can point us to that work so we can follow along?

yuvipanda avatar yuvipanda commented on August 12, 2024

@consideRatio is the person doing most of that work, @jhamman. I think it got merged very recently, and he should be able to help.

consideRatio avatar consideRatio commented on August 12, 2024

@jhamman I just wrote a bit about the deployment @minrk did today on mybinder.org. He enabled the freshly merged foundational building block in a series of improvements to the scheduling and autoscaling of the zero-to-jupyterhub-k8s chart.

See pangeo-data/pangeo#322 (comment)

jhamman avatar jhamman commented on August 12, 2024

Thanks @yuvipanda / @consideRatio / @minrk. This all seems quite promising. If we wanted to try this out in the near term, what would be the best way for us to do that? @minrk - can you point us to the config you are using for mybinder.org?

minrk avatar minrk commented on August 12, 2024

mybinder.org's deployment config is at https://github.com/jupyterhub/mybinder.org-deploy, the relevant bit here:

scheduling:
  userScheduler:
    enabled: true
    replicas: 2

This feature is only in the 0.8 dev versions of the chart at the moment. If you are upgrading from 0.6, make sure to test out a deploy/upgrade. As it stands right now, 0.6->0.7 chart upgrades seem to require relaunching users, so performing the upgrade would be a significant disruption.

jhamman avatar jhamman commented on August 12, 2024

We now depend on version 0.8 of the z2jh chart. Do we want to add the scheduling optimization bit to the pangeo chart so it's on by default?

guillaumeeb avatar guillaumeeb commented on August 12, 2024

@jhamman I guess so. @minrk, why is it not included in the jupyterhub chart by default?

minrk avatar minrk commented on August 12, 2024

We made it opt-in as a new, somewhat experimental feature that increases the resource requirements a bit. That said, we've been using it on mybinder.org for several months to great effect. I think we will probably switch it to on-by-default in the next major chart release.

consideRatio avatar consideRatio commented on August 12, 2024

@jhamman I think it is quite complex to decide on good default values. It depends on whether you are using an autoscaling cluster, and on whether you are using GKE or another cluster with certain cluster autoscaler settings.

Some of the complexities

About not having the user scheduler on by default:
It packs the user pods tightly onto nodes. That makes sense if you use autoscaling, but not if you have a fixed number of nodes. So what the default should be is not critical, but also not obvious.

About podPriority / userPlaceholder pods:
These changes also only make sense for autoscaling clusters, and whether autoscaling is configured is out of scope for the helm chart. The cluster autoscaler's "pod priority cutoff" setting, which sets the minimum priority a pending pod needs in order to trigger a scale-up, is not fixed. Some clusters may need to adjust their podPriority settings based on this.
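
To make the settings discussed above concrete, here is a minimal sketch of how they might be switched on together for an autoscaling cluster. The key names are meant to follow the `scheduling` section of the zero-to-jupyterhub-k8s chart; the replica count is an illustrative assumption, not a recommendation, and a chart that pulls in z2jh as a dependency would presumably need these nested under that dependency's key.

```yaml
# Hypothetical values.yaml overrides for an autoscaling cluster.
# The number below is an assumption chosen for illustration only.
scheduling:
  userScheduler:
    enabled: true      # pack user pods tightly so emptied nodes can be reclaimed
  podPriority:
    enabled: true      # lets real user pods take priority over placeholder pods
  userPlaceholder:
    enabled: true
    replicas: 3        # assumed headroom: roughly three users' worth of spare capacity
```

On a cluster with a fixed number of nodes, none of these would be expected to help, which is the reason given above for leaving them off by default.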

jhamman avatar jhamman commented on August 12, 2024

I've opened #87 with what I see as sensible defaults for typical pangeo applications. I think we can assume basically all pangeo applications will require autoscaling clusters.
