Comments (9)
Here is a more severe case, where all pods are in a not-ready state:
❯ kubectl get po -n cert-manager
NAME                                     READY   STATUS    RESTARTS   AGE
cert-manager-istio-csr-79ffc5bfd-q4qw8   0/1     Running   0          20d
cert-manager-istio-csr-79ffc5bfd-vrjdd   0/1     Running   0          20d
cert-manager-istio-csr-79ffc5bfd-xs9mj   0/1     Running   0          20d
❯ kubectl describe po cert-manager-istio-csr-79ffc5bfd-xs9mj -n cert-manager
Name:         cert-manager-istio-csr-79ffc5bfd-xs9mj
Namespace:    cert-manager
Priority:     0
Node:         ip-10-136-208-186.ec2.internal/10.136.208.186
Start Time:   Wed, 17 Feb 2021 16:19:11 -0500
Labels:       app=cert-manager-istio-csr
              pod-template-hash=79ffc5bfd
Annotations:  kubernetes.io/psp: eks.privileged
Status:       Running
IP:           10.136.212.19
IPs:
  IP:           10.136.212.19
Controlled By:  ReplicaSet/cert-manager-istio-csr-79ffc5bfd
Containers:
  cert-manager-istio-csr:
    Container ID:  docker://844832e7090dd643e7e296def2cbe8c3c9519d6f38537480a2510bf63a3ace7d
    Image:         quay.io/jetstack/cert-manager-istio-csr:v0.1.0
    Image ID:      docker-pullable://quay.io/jetstack/cert-manager-istio-csr@sha256:f9d473fa10520d0a255a4b60350a9f9057834da762129f9e5ecb9681955b1fd0
    Port:          6443/TCP
    Host Port:     0/TCP
    Command:
      cert-manager-istio-csr
    Args:
      --log-level=1
      --readiness-probe-port=6060
      --readiness-probe-path=/readyz
      --serving-address=0.0.0.0:6443
      --serving-certificate-duration=24h
      --root-ca-configmap-name=istio-ca-root-cert
      --certificate-namespace=istio-system
      --issuer-group=cert-manager.io
      --issuer-kind=ClusterIssuer
      --issuer-name=vault-issuer
      --max-client-certificate-duration=24h
      --preserve-certificate-requests=false
    State:          Running
      Started:      Wed, 17 Feb 2021 16:19:37 -0500
    Ready:          False
    Restart Count:  0
    Readiness:      http-get http://:6060/readyz delay=3s timeout=1s period=7s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from cert-manager-istio-csr-token-h42zw (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  cert-manager-istio-csr-token-h42zw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cert-manager-istio-csr-token-h42zw
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                       From     Message
  ----     ------     ----                      ----     -------
  Warning  Unhealthy  2m43s (x139117 over 20d)  kubelet  Readiness probe failed: Get http://10.136.212.19:6060/readyz: dial tcp 10.136.212.19:6060: connect: connection refused
The only log of interest here is around RBAC:
E0226 17:37:13.286265 1 event.go:264] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:".16675cce7821d48e", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"ConfigMap", Namespace:"", Name:"", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}, Reason:"LeaderElection", Message:"cert-manager-istio-csr-79ffc5bfd-xs9mj_7e03f6c1-8793-4729-aa4d-4ca47180a174 stopped leading", Source:v1.EventSource{Component:"cert-manager-istio-csr-79ffc5bfd-xs9mj_7e03f6c1-8793-4729-aa4d-4ca47180a174", Host:""}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xc0066a5250ef3a8e, ext:764255996264212, loc:(*time.Location)(0x27b9ac0)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc0066a5250ef3a8e, ext:764255996264212, loc:(*time.Location)(0x27b9ac0)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events is forbidden: User "system:serviceaccount:cert-manager:cert-manager-istio-csr" cannot create resource "events" in API group "" in the namespace "default"' (will not retry!)
from istio-csr.
Thanks for opening this @adnankobir. This is quite a concerning bug; I will spend some time seeing if I can replicate the issue.
Roughly how long does it take for the probe to start failing?
Thanks @JoshVanL
I don't have a rough estimate; it appears to be completely random. Fortunately, I have this deployed in 7 clusters: some clusters exhibit this behaviour within a matter of hours, others within a couple of days.
As a workaround for now, what are the implications of removing the health checks or, better yet, simply doing a TCP check on the serving address?
Very strange. This readiness endpoint is managed through controller-runtime, so that is where I'll look first. The only time this check is set to false after successful initialization is during termination.
The implication of removing the check is that the pod will receive request traffic before it has initialized (fetched a serving certificate and begun serving). Changing to TCP may work, but it would be interesting to see TCP succeed where HTTP fails.
Another option is to add a liveness probe with the same check, as long as it has an initialDelaySeconds large enough for the initial setup to complete (something very large, like 10m, is probably fine here). This would at least get the container restarted so that it comes back up ready.
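The liveness workaround relies on kubelet probe semantics. Probing starts after `initialDelaySeconds` and repeats every `periodSeconds`. The container is restarted after `failureThreshold` consecutive failures, and any success resets the count. The Go sketch below simulates that decision over a hypothetical health timeline, up to the first restart only. The function `firstRestart` is hypothetical. The values are illustrative assumptions, not recommended settings: a 10m delay as suggested above, plus the 7s period and failure threshold of 3 taken from the readiness probe in the describe output.

```go
package main

import "fmt"

// firstRestart returns the elapsed second at which a liveness probe with the
// given settings would trigger a container restart, or -1 if it would not
// within horizon. healthy(t) is the probe result at elapsed second t.
func firstRestart(healthy func(t int) bool, initialDelay, period, failureThreshold, horizon int) int {
	failures := 0
	for t := initialDelay; t <= horizon; t += period {
		if healthy(t) {
			failures = 0 // a success resets the consecutive-failure count
			continue
		}
		failures++
		if failures >= failureThreshold {
			return t
		}
	}
	return -1
}

func main() {
	// Hypothetical pod: initializing for the first 60s (probe would fail),
	// healthy afterwards, then permanently broken from t=3600s.
	healthy := func(t int) bool { return t >= 60 && t < 3600 }
	fmt.Println(firstRestart(healthy, 600, 7, 3, 7200)) // 3617
}
```

The startup failures before 60s are never observed because of the 600s delay. The permanent failure at 3600s leads to a restart after three consecutive failed probes (at 3603s, 3610s and 3617s).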
In case it helps, I had the same issue on 2 clusters at the exact same time, 30 days after the pods were created. Restarting the pods seems to have fixed it for now.
I've briefly tested this (on GKE, k8s v1.19), and it seems that istio-csr pods become not ready if the cert-manager webhook goes down whilst istio-csr is processing some certificate requests. They seem to remain not ready even after the webhook is healthy again.
Adding a liveness probe, as @JoshVanL mentions above, seems to fix that issue. I've not done any extensive testing on this, though.
Thanks all. I have managed to track down this issue:
If there is a transient network connectivity error or similar, istio-csr will lose leader election or fail to renew the lease. When this happens, the pod becomes unready. To resolve this, istio-csr now correctly exits, which allows either another istio-csr instance to become the leader or the pod to come back up as the leader.
This has been fixed in this PR: #62
This fix has been merged as part of the v0.2.0 release.
Closing this for now, but please feel free to re-open if you continue to have issues.
/close
@JoshVanL: Closing this issue.