Comments (7)
This would also help with the "out of space" error we have seen a few times, which appears to be fixed by removing layers/Docker images. It is unclear why exactly that works, or what else it changes that then fixes the problem.
To get an idea of how this would work: to remove a node, would we first `kubectl cordon <node>` (`kubectl drain` deletes pods straight away, right?), add a new node (*), wait for the node to be empty of user pods, then remove the node?
(*) done by increasing the minimum size of the cluster?
from mybinder.org-deploy.
I would say the way to do this is:
- Pick oldest node
- Cordon it
- Wait for it to be empty of user and build pods (can be several hours!)
- Delete it (via gcloud API)
Deleting the node should bring up a new one.
And yup, kubectl drain deletes pods straight away!
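The steps above could be sketched roughly as follows. This is a minimal sketch, not the project's actual tooling: it assumes the node and pod lists come from the `items` of `kubectl get nodes -o json` and `kubectl get pods -o json`, that user and build pod names start with `jupyter-` and `build-` (an assumption, adjust for your deployment), and that the node is deleted with `gcloud compute instances delete`. The helper names are hypothetical, and the functions only build commands rather than run them.

```python
# Assumed pod-name prefixes for user and build pods; not confirmed by the thread.
BLOCKING_PREFIXES = ("jupyter-", "build-")


def oldest_node(nodes):
    """Return the name of the node with the earliest creationTimestamp.

    Kubernetes reports timestamps in a uniform UTC format
    (e.g. 2017-10-01T12:00:00Z), so comparing the strings is enough.
    """
    oldest = min(nodes, key=lambda n: n["metadata"]["creationTimestamp"])
    return oldest["metadata"]["name"]


def blocking_pods(pods, node_name, prefixes=BLOCKING_PREFIXES):
    """List user/build pods still scheduled on the given node."""
    return [
        p["metadata"]["name"]
        for p in pods
        if p["spec"].get("nodeName") == node_name
        and p["metadata"]["name"].startswith(prefixes)
    ]


def cordon_command(node_name):
    """Command that stops new pods from being scheduled on the node."""
    return ["kubectl", "cordon", node_name]


def delete_command(node_name, zone):
    """Command that deletes the underlying VM once the node is empty."""
    return ["gcloud", "compute", "instances", "delete", node_name, "--zone", zone]
```

In practice you would run `cordon_command(...)` once, poll `blocking_pods(...)` until it returns an empty list (which can take several hours), and only then run `delete_command(...)`.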
Note that currently making changes to the Kubernetes Cluster (such as changing minimum size of cluster) causes a multi-minute GKE outage, since GKE does not do Highly Available masters yet. But deleting a node is fine...
I was thinking cordon off + increase capacity by one so that the overall number of available nodes stays the same. Currently we are pretty comfortable in terms of capacity so might not be a problem.
What does a GKE outage mean here? Is it that we can't issue GKE-related commands, or that our existing nodes stop doing work?
Related to this, I tried to add a new node to the staging cluster to practice/mess around. This fails with the following error:
"Instance 'gke-staging-default-pool-8014xxxx-zxxx' creation failed: Quota 'DISKS_TOTAL_GB' exceeded. Limit: 4096.0 in region us-central1"
The command I used was `gcloud compute instance-groups managed resize gke-staging-default-pool-.... --zone us-central1-a --project binder-staging --size 3`
What is weird about this is that prod has the same limit (4096), more nodes and less of the quota is used. Not sure I understand that.
@betatim I checked the disks on staging, and several disks from previous test clusters were hanging around, using up the quota. It should be available now.
I also think the staging cluster could be using smaller disks. 1TB disks are a bit overkill for the staging load.
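To spot leftover disks like these before hitting the quota, something along these lines might help. It is only a sketch under assumptions: the input is taken to be the parsed output of `gcloud compute disks list --format=json`, where each disk has a `name`, a `sizeGb` string, and a `users` list that is missing or empty when the disk is not attached to any instance. The function names and the sample disk names are hypothetical; the 4096 GB limit is the `DISKS_TOTAL_GB` value from the error message above.

```python
DISKS_TOTAL_GB_LIMIT = 4096  # quota limit reported in the error message


def total_disk_gb(disks):
    """Sum the size of all disks, in GB (sizeGb arrives as a string)."""
    return sum(int(d["sizeGb"]) for d in disks)


def unattached_disks(disks):
    """Names of disks with no users, i.e. candidates for cleanup."""
    return [d["name"] for d in disks if not d.get("users")]


def quota_headroom_gb(disks, limit=DISKS_TOTAL_GB_LIMIT):
    """How many GB remain before the quota is exceeded."""
    return limit - total_disk_gb(disks)


# Hypothetical sample: one attached node disk, one orphan from an old test cluster.
sample = [
    {"name": "node-disk", "sizeGb": "1000", "users": ["instances/node-1"]},
    {"name": "old-test-disk", "sizeGb": "1000"},
]
print(unattached_disks(sample), quota_headroom_gb(sample))
```

Unattached disks are not necessarily safe to delete, so the list is a starting point for a manual check rather than input to an automatic cleanup.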
This is no longer necessary since we switched to using DIND daemons. I'll close this for now!