
Comments (15)

BenTheElder commented on August 22, 2024

We should probably increase the scaling limits for this; it's expected that job migrations will drive more usage ...

cc @ameukam @upodroid @dims

FYI @rjsadow 😅

BenTheElder commented on August 22, 2024

We still have recent failures to schedule, but we're only at 130 nodes currently, and AFAICT we have set the limit to 1-80 per zone ...

BenTheElder commented on August 22, 2024

We had recently peaked in cluster scale, though:

[Monitoring charts: "Cluster CPU capacity, allocatable, sum(limit)"; "Node - Total, Request, Allocatable CPU cores"; "Node - Total ephemeral storage"]

BenTheElder commented on August 22, 2024

https://prow.k8s.io/?state=error has dropped off for the moment, possibly following #32157 🤞

Last error pod was scheduled at 3:11 Pacific, config updated at 3:24

BenTheElder commented on August 22, 2024

technically unrelated but similar issue: kubernetes/k8s.io#6519 (boskos pool exhausting quota)

BenTheElder commented on August 22, 2024

Still happening.

https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/123568/pull-kubernetes-e2e-kind/1764802917736386560

There are no nodes that your pod can schedule to - check your requests, tolerations, and node selectors (0/141 nodes are available: 141 Insufficient cpu, 9 Insufficient memory. preemption: 0/141 nodes are available: 141 No preemption victims found for incoming pod..)

Schrödinger's scale-up??

Type	Reason	Age	Source	Message
Warning	FailedScheduling	15m	default-scheduler	0/135 nodes are available: 11 Insufficient memory, 135 Insufficient cpu. preemption: 0/135 nodes are available: 135 No preemption victims found for incoming pod..
Normal	TriggeredScaleUp	16m	cluster-autoscaler	pod triggered scale-up: [{https://www.googleapis.com/compute/v1/projects/k8s-infra-prow-build/zones/us-central1-b/instanceGroups/gke-prow-build-pool5-2021092812495606-3a8095df-grp 41->42 (max: 80)}]
Normal	NotTriggerScaleUp	16m	cluster-autoscaler	pod didn't trigger scale-up (it wouldn't fit if a new node is added): 3 Insufficient cpu
Warning	FailedScheduling	15m	default-scheduler	0/136 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }, 10 Insufficient memory, 135 Insufficient cpu. preemption: 0/136 nodes are available: 1 Preemption is not helpful for scheduling, 135 No preemption victims found for incoming pod..
Warning	FailedScheduling	15m	default-scheduler	0/136 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/network-unavailable: }, 10 Insufficient memory, 135 Insufficient cpu. preemption: 0/136 nodes are available: 1 Preemption is not helpful for scheduling, 135 No preemption victims found for incoming pod..
Warning	FailedScheduling	14m	default-scheduler	0/136 nodes are available: 10 Insufficient memory, 136 Insufficient cpu. preemption: 0/136 nodes are available: 136 No preemption victims found for incoming pod..
Warning	FailedScheduling	14m	default-scheduler	0/136 nodes are available: 136 Insufficient cpu, 9 Insufficient memory. preemption: 0/136 nodes are available: 136 No preemption victims found for incoming pod..
Warning	FailedScheduling	14m	default-scheduler	0/137 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }, 10 Insufficient memory, 136 Insufficient cpu. preemption: 0/137 nodes are available: 1 Preemption is not helpful for scheduling, 136 No preemption victims found for incoming pod..
Warning	FailedScheduling	14m	default-scheduler	0/137 nodes are available: 10 Insufficient memory, 137 Insufficient cpu. preemption: 0/137 nodes are available: 137 No preemption victims found for incoming pod..
Warning	FailedScheduling	13m	default-scheduler	0/139 nodes are available: 10 Insufficient memory, 137 Insufficient cpu, 2 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/139 nodes are available: 137 No preemption victims found for incoming pod, 2 Preemption is not helpful for scheduling..
Warning	FailedScheduling	13m	default-scheduler	0/139 nodes are available: 10 Insufficient memory, 139 Insufficient cpu. preemption: 0/139 nodes are available: 139 No preemption victims found for incoming pod..
Warning	FailedScheduling	12m	default-scheduler	(combined from similar events): 0/141 nodes are available: 141 Insufficient cpu, 9 Insufficient memory. preemption: 0/141 nodes are available: 141 No preemption victims found for incoming pod..
Warning	FailedScheduling	12m	default-scheduler	0/141 nodes are available: 141 Insufficient cpu, 9 Insufficient memory. preemption: 0/141 nodes are available: 141 No preemption victims found for incoming pod..

Maybe we need to increase pod_unscheduled_timeout in prow? We only allow 5m but we have 15m for pod_pending_timeout.

It seems like we're scaling up but not before Prow gives up.
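
For reference, a minimal sketch of where those knobs live, assuming the usual plank section of Prow's config.yaml; the values below are illustrative, not a concrete proposal:

```yaml
# Sketch of Prow's config.yaml (plank section). Plank gives up on prowjob
# pods that stay unscheduled/pending past these timeouts, which is how slow
# autoscaling shows up as "error" state jobs. Values are illustrative.
plank:
  pod_unscheduled_timeout: 15m   # currently 5m per the discussion above
  pod_pending_timeout: 15m       # unchanged
```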

pod didn't trigger scale-up (it wouldn't fit if a new node is added): 3 Insufficient cpu

Doesn't make sense. We're requesting 7 cores, the nodes are 8 core, and I don't see that we're anywhere near exhausting GCE CPU quota. And then it did also scale up ...

BenTheElder commented on August 22, 2024

Possibly due to system pods ...? (Is the autoscaler confused about whether adding a node would help, since we run right up against the limit? Also, maybe one of them is using more CPU now?)

On a node successfully running pull-kubernetes-e2e-kind:

CPU: 8 total | 7.91 allocatable | 7.87 requested

7.1 of that is the test pod (0.1 of which is the sidecar); the rest is system pods.

BenTheElder commented on August 22, 2024

This impacts jobs that:

  1. run on cluster: k8s-infra-prow-builds
  2. request 7 CPU for the test container (which leads to 7.1 total)

It appears to have gotten bad sometime just before 10am pacific.
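
For context, the affected jobs look roughly like this in their container resource spec (a sketch only; the memory value is hypothetical and the real job configs live in test-infra):

```yaml
# Rough shape of the affected jobs' resource requests (illustrative, not an
# actual job definition): the test container asks for 7 CPU with
# limits == requests, and prow's decoration sidecar adds ~0.1 CPU,
# for ~7.1 CPU total per pod.
spec:
  containers:
    - name: test
      resources:
        requests:
          cpu: "7"
          memory: 36Gi   # hypothetical; memory varies widely per job
        limits:
          cpu: "7"
          memory: 36Gi
```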

BenTheElder commented on August 22, 2024

We could do one of:

  1. Move all of these to the EKS cluster.
  2. Reduce the CPU requests (and suffer slower jobs and possibly more flakes).
  3. Reduce the CPU requests elsewhere (sidecar, any system agents we control) to mitigate.

None of these are ideal long term. We partially did 1), which we would have done for some of these anyhow.

Ideally we'd root-cause and resolve the apparent scaling-decision flapping; in any case, I'm logging off for the night.

All of this infra is managed in this repo or github.com/kubernetes/k8s.io at least, so someone else could pick this up in the interim. I'm going to be in meetings all morning unfortunately.

BenTheElder commented on August 22, 2024

https://kubernetes.slack.com/archives/CCK68P2Q2/p1709622064921789?thread_ts=1709203049.908399&cid=CCK68P2Q2

This is probably due to kubernetes/k8s.io#6468, which adds a new daemonset requesting a 0.2 CPU limit. Meanwhile, these jobs were requesting almost 100% of schedulable CPU by design (to avoid noisy neighbors; they're very I/O- and CPU-heavy).
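
For illustration, the per-node footprint at issue looks roughly like this (a hypothetical daemonset, not the actual manifest from kubernetes/k8s.io#6468; only the ~0.2 CPU figure comes from this thread):

```yaml
# Illustrative daemonset (names/images are hypothetical): one such pod must
# fit on every node, including nodes already running a 7.1-core job pod.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: example-metrics-agent
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: example-metrics-agent
  template:
    metadata:
      labels:
        app: example-metrics-agent
    spec:
      containers:
        - name: agent
          image: example.invalid/metrics-agent:latest
          resources:
            requests:
              cpu: 200m
            limits:
              cpu: 200m
```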

BenTheElder commented on August 22, 2024

Given code freeze is pending in like one day, we should probably revert for now and then evaluate follow-up options?

This is having significant impact on merging to kubernetes as required presubmits are failing to schedule.

https://www.kubernetes.dev/resources/release/

upodroid commented on August 22, 2024

This should be mitigated now as I deleted the daemonset.

BenTheElder commented on August 22, 2024

Yes, it appears to be: https://prow.k8s.io/?state=error

BenTheElder commented on August 22, 2024

So to conclude:

We schedule many jobs that use ~100% of available CPU because:
a) they'll happily use it
b) they're doing builds or running kind/etcd/local-up-cluster or other I/O-heavy workloads, and I/O is not a schedulable resource, so by not leaving room for other CI jobs' CPU requests on the same node we keep them off and prevent I/O contention

For a long time that has meant requesting 7 cores (+0.1 for prow's sidecar), since we've run on 8-core nodes, some system-reserved capacity covers part of the remaining core, and no job requests <1 core.

Looking at k8s-infra-prow-builds right now, we have:

Resource type Capacity Allocatable Total requested  
CPU 8 CPU 7.91 CPU 7.88 CPU

So we can't fit the 200m CPU daemonset (kubernetes/k8s.io#6521) and that breaks auto-scaling.
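
In other words: 7.91 CPU allocatable - 7.88 CPU requested ≈ 0.03 CPU free on this node (and at most 7.91 - 7.85 ≈ 0.06 on the least-loaded node shown below), which is well short of the daemonset's ~0.2 CPU request.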

Pods for sample node running 7.1 core prowjob:

Name Status CPU requested Memory requested Storage requested Namespace Restarts Created on
ip-masq-agent-h9tlv Running 10 mCPU 16.78 MB 0 B kube-system 0 Mar 4, 2024, 8:54:44 AM
tune-sysctls-9bwvr Running 0 CPU 0 B 0 B kube-system 0 Mar 4, 2024, 8:54:44 AM
pdcsi-node-gmm8z Running 10 mCPU 20.97 MB 0 B kube-system 0 Mar 4, 2024, 8:54:44 AM
kube-proxy-gke-prow-build-pool5-2021092812495606-e8f905a4-sjcm Running 100 mCPU 0 B 0 B kube-system 0 Mar 4, 2024, 8:54:44 AM
create-loop-devs-4rvgp Running 0 CPU 0 B 0 B kube-system 0 Mar 4, 2024, 8:54:44 AM
gke-metadata-server-m6nvx Running 100 mCPU 104.86 MB 0 B kube-system 0 Mar 4, 2024, 8:54:44 AM
netd-q2x2q Running 2 mCPU 31.46 MB 0 B kube-system 0 Mar 4, 2024, 8:54:44 AM
network-metering-agent-ttgwq Running 0 CPU 0 B 0 B kube-system 0 Mar 4, 2024, 8:54:44 AM
gke-metrics-agent-tr7hf Running 6 mCPU 104.86 MB 0 B kube-system 0 Mar 4, 2024, 8:54:44 AM
node-local-dns-l4q7j Running 25 mCPU 20.97 MB 0 B kube-system 0 Mar 4, 2024, 8:54:44 AM
fluentbit-gke-xgpxf Running 100 mCPU 209.72 MB 0 B kube-system 0 Mar 4, 2024, 8:54:44 AM
konnectivity-agent-77c57877b6-4n4jx Running 10 mCPU 31.46 MB 0 B kube-system 0 Mar 5, 2024, 6:10:59 AM
calico-node-zdsn2 Running 420 mCPU 0 B 0 B kube-system 0 Mar 5, 2024, 7:48:41 AM
0ab611de-742f-4f23-b405-fc049c25febf Running 7.1 CPU 37.58 GB 0 B test-pods 0 Mar 5, 2024, 8:25:27 AM

Least loaded node with 7.1 core prowjob:

Resource type Capacity Allocatable Total requested  
CPU 8 CPU 7.91 CPU 7.85 CPU

We only have 0.06 CPU of headroom at most currently on nodes running these jobs.

Pods on that node:

Name Status CPU requested Memory requested Storage requested Namespace Restarts Created on
f1b12fca-5529-4b74-9bcf-ca265ed18085 Running 7.1 CPU 10.74 GB 0 B test-pods 0 Mar 5, 2024, 8:56:27 AM
calico-node-4xwzb Running 400 mCPU 0 B 0 B kube-system 0 Mar 5, 2024, 9:22:04 AM
fluentbit-gke-4dv2h Running 100 mCPU 209.72 MB 0 B kube-system 0 Mar 4, 2024, 10:13:25 PM
kube-proxy-gke-prow-build-pool5-2021092812495606-e8f905a4-z7zp Running 100 mCPU 0 B 0 B kube-system 0 Mar 4, 2024, 10:13:25 PM
gke-metadata-server-q5hld Running 100 mCPU 104.86 MB 0 B kube-system 0 Mar 4, 2024, 10:13:26 PM
node-local-dns-qhvl5 Running 25 mCPU 20.97 MB 0 B kube-system 0 Mar 4, 2024, 10:13:26 PM
pdcsi-node-5k9fs Running 10 mCPU 20.97 MB 0 B kube-system 0 Mar 4, 2024, 10:13:25 PM
ip-masq-agent-fjz9r Running 10 mCPU 16.78 MB 0 B kube-system 0 Mar 4, 2024, 10:13:26 PM
gke-metrics-agent-ktb99 Running 6 mCPU 104.86 MB 0 B kube-system 0 Mar 4, 2024, 10:13:25 PM
netd-nxr9b Running 2 mCPU 31.46 MB 0 B kube-system 0 Mar 4, 2024, 10:13:26 PM
network-metering-agent-9wcb5 Running 0 CPU 0 B 0 B kube-system 0 Mar 4, 2024, 10:13:25 PM
create-loop-devs-97mqf Running 0 CPU 0 B 0 B kube-system 0 Mar 4, 2024, 10:13:25 PM
tune-sysctls-27pwv Running 0 CPU 0 B 0 B kube-system 0 Mar 4, 2024, 10:13:25 PM

We either have to keep daemonset additions extremely negligible, or we need to reduce the CPU available to these heavy jobs (and that means identifying and updating ALL of them to prevent leaving jobs failing to schedule).

Presumably, we have slightly different resources available on the EKS nodes, enough to fit this daemonset alongside the 7.1-core jobs, but we fundamentally have the same risk there.

Additionally: we ensure all of our jobs have guaranteed QoS via presubmit tests for the jobs; we should probably be doing this, at least manually, for anything else we install. The create-loop-devs and tune-sysctls daemonsets are an exception because they do almost nothing and don't really need guaranteed resources.
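
For anyone picking this up later, guaranteed QoS just means limits equal to requests for every container; a minimal sketch (names and values are illustrative):

```yaml
# Minimal sketch of a Guaranteed-QoS pod spec: Kubernetes assigns the
# Guaranteed QoS class only when every container sets limits equal to
# requests for both cpu and memory.
apiVersion: v1
kind: Pod
metadata:
  name: example-guaranteed
spec:
  containers:
    - name: agent
      image: example.invalid/agent:latest
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
        limits:
          cpu: 100m
          memory: 128Mi
```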

BenTheElder commented on August 22, 2024

@upodroid points out in kubernetes/k8s.io#6525 (comment) that we should probably just disable Calico network policy and get back 0.4 CPU/node for custom metrics daemonsets.

We are not running it on the old build cluster and I don't think we need it.
