
Comments (15)

BenTheElder commented on August 22, 2024

We should probably increase the scaling limits for this; it's expected that job migrations will drive more usage ...

cc @ameukam @upodroid @dims

FYI @rjsadow 😅

BenTheElder commented on August 22, 2024

We still have recent failures to schedule, but we're only at 130 nodes currently, and AFAICT we have set the limit to 1-80 per zone ...

BenTheElder commented on August 22, 2024

We had recently peaked in cluster scale, though:

[Monitoring charts: "Cluster CPU capacity, allocatable, sum(limit)"; "Node - Total, Request, Allocatable CPU cores"; "Node - Total ephemeral storage"]

BenTheElder commented on August 22, 2024

https://prow.k8s.io/?state=error has dropped off for the moment, possibly following #32157 🤞

Last error pod was scheduled at 3:11 Pacific, config updated at 3:24

BenTheElder commented on August 22, 2024

technically unrelated but similar issue: kubernetes/k8s.io#6519 (boskos pool exhausting quota)

BenTheElder commented on August 22, 2024

Still happening.

https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/123568/pull-kubernetes-e2e-kind/1764802917736386560

There are no nodes that your pod can schedule to - check your requests, tolerations, and node selectors (0/141 nodes are available: 141 Insufficient cpu, 9 Insufficient memory. preemption: 0/141 nodes are available: 141 No preemption victims found for incoming pod..)

Schrödinger's scale-up??

Type	Reason	Age	Source	Message
Warning	FailedScheduling	15m	default-scheduler	0/135 nodes are available: 11 Insufficient memory, 135 Insufficient cpu. preemption: 0/135 nodes are available: 135 No preemption victims found for incoming pod..
Normal	TriggeredScaleUp	16m	cluster-autoscaler	pod triggered scale-up: [{https://www.googleapis.com/compute/v1/projects/k8s-infra-prow-build/zones/us-central1-b/instanceGroups/gke-prow-build-pool5-2021092812495606-3a8095df-grp 41->42 (max: 80)}]
Normal	NotTriggerScaleUp	16m	cluster-autoscaler	pod didn't trigger scale-up (it wouldn't fit if a new node is added): 3 Insufficient cpu
Warning	FailedScheduling	15m	default-scheduler	0/136 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }, 10 Insufficient memory, 135 Insufficient cpu. preemption: 0/136 nodes are available: 1 Preemption is not helpful for scheduling, 135 No preemption victims found for incoming pod..
Warning	FailedScheduling	15m	default-scheduler	0/136 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/network-unavailable: }, 10 Insufficient memory, 135 Insufficient cpu. preemption: 0/136 nodes are available: 1 Preemption is not helpful for scheduling, 135 No preemption victims found for incoming pod..
Warning	FailedScheduling	14m	default-scheduler	0/136 nodes are available: 10 Insufficient memory, 136 Insufficient cpu. preemption: 0/136 nodes are available: 136 No preemption victims found for incoming pod..
Warning	FailedScheduling	14m	default-scheduler	0/136 nodes are available: 136 Insufficient cpu, 9 Insufficient memory. preemption: 0/136 nodes are available: 136 No preemption victims found for incoming pod..
Warning	FailedScheduling	14m	default-scheduler	0/137 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }, 10 Insufficient memory, 136 Insufficient cpu. preemption: 0/137 nodes are available: 1 Preemption is not helpful for scheduling, 136 No preemption victims found for incoming pod..
Warning	FailedScheduling	14m	default-scheduler	0/137 nodes are available: 10 Insufficient memory, 137 Insufficient cpu. preemption: 0/137 nodes are available: 137 No preemption victims found for incoming pod..
Warning	FailedScheduling	13m	default-scheduler	0/139 nodes are available: 10 Insufficient memory, 137 Insufficient cpu, 2 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/139 nodes are available: 137 No preemption victims found for incoming pod, 2 Preemption is not helpful for scheduling..
Warning	FailedScheduling	13m	default-scheduler	0/139 nodes are available: 10 Insufficient memory, 139 Insufficient cpu. preemption: 0/139 nodes are available: 139 No preemption victims found for incoming pod..
Warning	FailedScheduling	12m	default-scheduler	(combined from similar events): 0/141 nodes are available: 141 Insufficient cpu, 9 Insufficient memory. preemption: 0/141 nodes are available: 141 No preemption victims found for incoming pod..
Warning	FailedScheduling	12m	default-scheduler	0/141 nodes are available: 141 Insufficient cpu, 9 Insufficient memory. preemption: 0/141 nodes are available: 141 No preemption victims found for incoming pod..

Maybe we need to increase pod_unscheduled_timeout in prow? We only allow 5m but we have 15m for pod_pending_timeout.

It seems like we're scaling up but not before Prow gives up.
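
For reference, a minimal sketch of where those knobs live, assuming the usual plank section of Prow's config.yaml; the values below are illustrative, not a concrete proposal:

```yaml
# Sketch of Prow's config.yaml (plank section). Plank gives up on prowjob
# pods that stay unscheduled/pending past these timeouts, which is how slow
# autoscaling shows up as "error" state jobs. Values are illustrative.
plank:
  pod_unscheduled_timeout: 15m   # currently 5m per the discussion above
  pod_pending_timeout: 15m       # unchanged
```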

pod didn't trigger scale-up (it wouldn't fit if a new node is added): 3 Insufficient cpu

Doesn't make sense. We're requesting 7 cores, the nodes are 8 core, and I don't see that we're anywhere near exhausting GCE CPU quota. And then it did also scale up ...

BenTheElder commented on August 22, 2024

Possibly due to system pods ...? (Is the autoscaler confused about whether adding a node would help, since we run right up against the limit? Also, maybe one of them is using more CPU now?)

On a node successfully running pull-kubernetes-e2e-kind:

CPU: 8 total | 7.91 allocatable | 7.87 requested

7.1 of that is the test pod (0.1 of which is the sidecar); the rest is system pods.

BenTheElder commented on August 22, 2024

This impacts jobs that:

  1. run on cluster: k8s-infra-prow-builds
  2. request 7 CPU for the test container (which leads to 7.1 total)

It appears to have gotten bad sometime just before 10am pacific.
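
For context, the affected jobs look roughly like this in their container resource spec (a sketch only; the memory value is hypothetical and the real job configs live in test-infra):

```yaml
# Rough shape of the affected jobs' resource requests (illustrative, not an
# actual job definition): the test container asks for 7 CPU with
# limits == requests, and prow's decoration sidecar adds ~0.1 CPU,
# for ~7.1 CPU total per pod.
spec:
  containers:
    - name: test
      resources:
        requests:
          cpu: "7"
          memory: 36Gi   # hypothetical; memory varies widely per job
        limits:
          cpu: "7"
          memory: 36Gi
```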

BenTheElder commented on August 22, 2024

We could do one of:

  1. Move all of these to the EKS cluster.
  2. Reduce the CPU requests (and suffer slower jobs and possibly more flakes).
  3. Reduce the CPU requests elsewhere (sidecar, any system agents we control) to mitigate.

None of these are ideal long term. We partially did 1), which we would have done for some of these anyhow.

Ideally we'd root-cause and resolve the apparent scaling-decision flapping; in any case, I'm logging off for the night.

All of this infra is managed in this repo or github.com/kubernetes/k8s.io at least, so someone else could pick this up in the interim. I'm going to be in meetings all morning unfortunately.

BenTheElder commented on August 22, 2024

https://kubernetes.slack.com/archives/CCK68P2Q2/p1709622064921789?thread_ts=1709203049.908399&cid=CCK68P2Q2

This is probably due to kubernetes/k8s.io#6468, which adds a new daemonset requesting a 0.2 CPU limit. Meanwhile, these jobs were requesting almost 100% of schedulable CPU by design (to avoid noisy neighbors; they're very I/O- and CPU-heavy).
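
For illustration, the per-node footprint at issue looks roughly like this (a hypothetical daemonset, not the actual manifest from kubernetes/k8s.io#6468; only the ~0.2 CPU figure comes from this thread):

```yaml
# Illustrative daemonset (names/images are hypothetical): one such pod must
# fit on every node, including nodes already running a 7.1-core job pod.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: example-metrics-agent
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: example-metrics-agent
  template:
    metadata:
      labels:
        app: example-metrics-agent
    spec:
      containers:
        - name: agent
          image: example.invalid/metrics-agent:latest
          resources:
            requests:
              cpu: 200m
            limits:
              cpu: 200m
```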

BenTheElder commented on August 22, 2024

Given code freeze is pending in like one day, we should probably revert for now and then evaluate follow-up options?

This is having significant impact on merging to kubernetes as required presubmits are failing to schedule.

https://www.kubernetes.dev/resources/release/

upodroid commented on August 22, 2024

This should be mitigated now as I deleted the daemonset.

BenTheElder commented on August 22, 2024

Yes, it appears to be: https://prow.k8s.io/?state=error

BenTheElder commented on August 22, 2024

So to conclude:

We schedule many jobs that use ~100% of available CPU because:
a) they'll happily use it
b) they're doing builds or running kind/etcd/local-up-cluster or other I/O-heavy workloads, and I/O is not a schedulable resource, so by not leaving room for other CI jobs' CPU requests on the same node we keep them off and prevent I/O contention

For a long time that has meant requesting 7 cores (+0.1 for prow's sidecar), since we've run on 8-core nodes, some system-reserved capacity covers part of the remaining core, and no job requests <1 core.

Looking at k8s-infra-prow-builds right now, we have:

Resource type Capacity Allocatable Total requested  
CPU 8 CPU 7.91 CPU 7.88 CPU

So we can't fit the 200m CPU daemonset (kubernetes/k8s.io#6521) and that breaks auto-scaling.
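
In other words: 7.91 CPU allocatable - 7.88 CPU requested ≈ 0.03 CPU free on this node (and at most 7.91 - 7.85 ≈ 0.06 on the least-loaded node shown below), which is well short of the daemonset's ~0.2 CPU request.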

Pods for sample node running 7.1 core prowjob:

Name Status CPU requested Memory requested Storage requested Namespace Restarts Created on
ip-masq-agent-h9tlv Running 10 mCPU 16.78 MB 0 B kube-system 0 Mar 4, 2024, 8:54:44 AM
tune-sysctls-9bwvr Running 0 CPU 0 B 0 B kube-system 0 Mar 4, 2024, 8:54:44 AM
pdcsi-node-gmm8z Running 10 mCPU 20.97 MB 0 B kube-system 0 Mar 4, 2024, 8:54:44 AM
kube-proxy-gke-prow-build-pool5-2021092812495606-e8f905a4-sjcm Running 100 mCPU 0 B 0 B kube-system 0 Mar 4, 2024, 8:54:44 AM
create-loop-devs-4rvgp Running 0 CPU 0 B 0 B kube-system 0 Mar 4, 2024, 8:54:44 AM
gke-metadata-server-m6nvx Running 100 mCPU 104.86 MB 0 B kube-system 0 Mar 4, 2024, 8:54:44 AM
netd-q2x2q Running 2 mCPU 31.46 MB 0 B kube-system 0 Mar 4, 2024, 8:54:44 AM
network-metering-agent-ttgwq Running 0 CPU 0 B 0 B kube-system 0 Mar 4, 2024, 8:54:44 AM
gke-metrics-agent-tr7hf Running 6 mCPU 104.86 MB 0 B kube-system 0 Mar 4, 2024, 8:54:44 AM
node-local-dns-l4q7j Running 25 mCPU 20.97 MB 0 B kube-system 0 Mar 4, 2024, 8:54:44 AM
fluentbit-gke-xgpxf Running 100 mCPU 209.72 MB 0 B kube-system 0 Mar 4, 2024, 8:54:44 AM
konnectivity-agent-77c57877b6-4n4jx Running 10 mCPU 31.46 MB 0 B kube-system 0 Mar 5, 2024, 6:10:59 AM
calico-node-zdsn2 Running 420 mCPU 0 B 0 B kube-system 0 Mar 5, 2024, 7:48:41 AM
0ab611de-742f-4f23-b405-fc049c25febf Running 7.1 CPU 37.58 GB 0 B test-pods 0 Mar 5, 2024, 8:25:27 AM

Least loaded node with 7.1 core prowjob:

Resource type Capacity Allocatable Total requested  
CPU 8 CPU 7.91 CPU 7.85 CPU

We only have 0.06 CPU of headroom at most currently on nodes running these jobs.

Pods on that node:

Name Status CPU requested Memory requested Storage requested Namespace Restarts Created on
f1b12fca-5529-4b74-9bcf-ca265ed18085 Running 7.1 CPU 10.74 GB 0 B test-pods 0 Mar 5, 2024, 8:56:27 AM
calico-node-4xwzb Running 400 mCPU 0 B 0 B kube-system 0 Mar 5, 2024, 9:22:04 AM
fluentbit-gke-4dv2h Running 100 mCPU 209.72 MB 0 B kube-system 0 Mar 4, 2024, 10:13:25 PM
kube-proxy-gke-prow-build-pool5-2021092812495606-e8f905a4-z7zp Running 100 mCPU 0 B 0 B kube-system 0 Mar 4, 2024, 10:13:25 PM
gke-metadata-server-q5hld Running 100 mCPU 104.86 MB 0 B kube-system 0 Mar 4, 2024, 10:13:26 PM
node-local-dns-qhvl5 Running 25 mCPU 20.97 MB 0 B kube-system 0 Mar 4, 2024, 10:13:26 PM
pdcsi-node-5k9fs Running 10 mCPU 20.97 MB 0 B kube-system 0 Mar 4, 2024, 10:13:25 PM
ip-masq-agent-fjz9r Running 10 mCPU 16.78 MB 0 B kube-system 0 Mar 4, 2024, 10:13:26 PM
gke-metrics-agent-ktb99 Running 6 mCPU 104.86 MB 0 B kube-system 0 Mar 4, 2024, 10:13:25 PM
netd-nxr9b Running 2 mCPU 31.46 MB 0 B kube-system 0 Mar 4, 2024, 10:13:26 PM
network-metering-agent-9wcb5 Running 0 CPU 0 B 0 B kube-system 0 Mar 4, 2024, 10:13:25 PM
create-loop-devs-97mqf Running 0 CPU 0 B 0 B kube-system 0 Mar 4, 2024, 10:13:25 PM
tune-sysctls-27pwv Running 0 CPU 0 B 0 B kube-system 0 Mar 4, 2024, 10:13:25 PM

We either have to keep daemonset additions extremely negligible, or we need to reduce the CPU available to these heavy jobs (and that means identifying and updating ALL of them to prevent leaving jobs failing to schedule).

Presumably, we have slightly different resources available on the EKS nodes, enough to fit this daemonset alongside the 7.1-core jobs, but we fundamentally have the same risk there.

Additionally: we ensure all of our jobs have guaranteed QoS via presubmit tests for the jobs; we should probably be doing this, at least manually, for anything else we install. The create-loop-devs and tune-sysctls daemonsets are an exception because they do almost nothing and don't really need guaranteed resources.
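
For anyone picking this up later, guaranteed QoS just means limits equal to requests for every container; a minimal sketch (names and values are illustrative):

```yaml
# Minimal sketch of a Guaranteed-QoS pod spec: Kubernetes assigns the
# Guaranteed QoS class only when every container sets limits equal to
# requests for both cpu and memory.
apiVersion: v1
kind: Pod
metadata:
  name: example-guaranteed
spec:
  containers:
    - name: agent
      image: example.invalid/agent:latest
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
        limits:
          cpu: 100m
          memory: 128Mi
```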

BenTheElder commented on August 22, 2024

@upodroid points out in kubernetes/k8s.io#6525 (comment) that we should probably just disable Calico network policy and get back 0.4 CPU/node for custom metrics daemonsets.

We are not running it on the old build cluster and I don't think we need it.
