
Comments (7)

betatim commented on June 12, 2024

This would also help with the "out of space" error we have seen a few times, which appears to be fixed by removing layers/Docker images. It is unclear, though, why exactly that works, or what else it changes that then fixes the problem.

To get an idea of how this would work: to remove a node, would we first kubectl cordon <node> (kubectl drain deletes pods straight away, right?), add a new node (*), wait for the old node to be empty of user pods, and then remove it?

(*) done by increasing the minimum size of the cluster?

from mybinder.org-deploy.

yuvipanda commented on June 12, 2024

I would say the way to do this is:

  1. Pick oldest node
  2. Cordon it
  3. Wait for it to be empty of user and build pods (this can take several hours!)
  4. Delete it (via gcloud API)

Deleting the node should bring up a new one.
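
The first three steps above can be sketched in Python. This is only a minimal sketch under stated assumptions, not the script used for this deployment: it assumes the node and pod lists come from the "items" arrays of kubectl get nodes -o json and kubectl get pods --all-namespaces -o json, and that user and build pods carry a component label with the values shown below (those label values are an assumption, not something stated in this thread). The function names are hypothetical. Step 4 is left out because the exact gcloud call is not given here.

```python
from datetime import datetime

# ASSUMPTION: label values that identify user and build pods.
# Adjust to whatever labels the deployment actually uses.
BUSY_COMPONENTS = {"singleuser-server", "binderhub-build"}


def oldest_node(nodes):
    """Step 1: return the name of the node with the earliest creationTimestamp.

    `nodes` is the "items" list from `kubectl get nodes -o json`.
    """
    def created(node):
        stamp = node["metadata"]["creationTimestamp"]
        return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")

    return min(nodes, key=created)["metadata"]["name"]


def cordon_command(node_name):
    """Step 2: build the cordon command (pass it to subprocess.run to execute)."""
    return ["kubectl", "cordon", node_name]


def busy_pods(pods, node_name):
    """Step 3: list user/build pods that are still pending or running on the node.

    `pods` is the "items" list from `kubectl get pods --all-namespaces -o json`.
    The node is ready to delete once this returns an empty list.
    """
    names = []
    for pod in pods:
        labels = pod["metadata"].get("labels") or {}
        on_node = pod["spec"].get("nodeName") == node_name
        is_busy_kind = labels.get("component") in BUSY_COMPONENTS
        is_active = pod.get("status", {}).get("phase") in ("Pending", "Running")
        if on_node and is_busy_kind and is_active:
            names.append(pod["metadata"]["name"])
    return names
```

In practice one would poll busy_pods every few minutes after cordoning, since the wait can be several hours, and only delete the node once the list is empty.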

And yup, kubectl drain deletes pods straight away!


yuvipanda commented on June 12, 2024

Note that making changes to the Kubernetes cluster (such as changing its minimum size) currently causes a multi-minute GKE outage, since GKE does not offer highly available masters yet. But deleting a node is fine...


betatim commented on June 12, 2024

I was thinking cordon off + increase capacity by one, so that the overall number of available nodes stays the same. Currently we are pretty comfortable in terms of capacity, so skipping the extra node might not be a problem.

What does a GKE outage mean? That we can't issue GKE-related commands, or that our existing nodes stop doing stuff?


betatim commented on June 12, 2024

Related to this, I tried to add a new node to the staging cluster to practice/mess around. This fails with the following error:
"Instance 'gke-staging-default-pool-8014xxxx-zxxx' creation failed: Quota 'DISKS_TOTAL_GB' exceeded. Limit: 4096.0 in region us-central1"

The command I used was gcloud compute instance-groups managed resize gke-staging-default-pool-.... --zone us-central1-a --project binder-staging --size 3

What is weird about this is that prod has the same limit (4096) and more nodes, yet less of its quota is used. Not sure I understand that.


minrk commented on June 12, 2024

@betatim I checked the disks on staging, and several disks from previous test clusters were hanging around, using up the quota. It should be available now.

I also think the staging cluster could be using smaller disks. 1TB disks are a bit overkill for the staging load.
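
To make the quota arithmetic concrete, here is a small sketch. The 4096 GB limit comes from the error message above; the disk size of 1000 GB for a "1TB" disk and the number of leftover disks are assumptions chosen purely for illustration, and disks_that_fit is a hypothetical helper name.

```python
QUOTA_GB = 4096  # DISKS_TOTAL_GB limit quoted in the error message


def disks_that_fit(quota_gb, disk_gb, used_gb=0):
    """Return how many more disks of size disk_gb fit into the remaining quota."""
    remaining = quota_gb - used_gb
    return max(0, remaining // disk_gb)


# ASSUMPTION: a "1TB" node disk is provisioned as 1000 GB.
NODE_DISK_GB = 1000

# With nothing else in the region, only four such disks fit under the limit.
print(disks_that_fit(QUOTA_GB, NODE_DISK_GB))

# HYPOTHETICAL: two current nodes plus two leftover disks from old test
# clusters already use 4000 GB, so a third node cannot be created.
print(disks_that_fit(QUOTA_GB, NODE_DISK_GB, used_gb=4 * NODE_DISK_GB))

# The same leftover usage with a smaller, 100 GB disk would not be a problem.
print(disks_that_fit(QUOTA_GB, 100, used_gb=2 * NODE_DISK_GB + 2 * 100))
```

This also shows why smaller disks would leave far more headroom on staging.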


yuvipanda commented on June 12, 2024

This is no longer necessary since we switched to using DIND daemons. I'll close this for now!

