Comments (9)
@Wenliang-CHEN Keeping fingers crossed for you -- enjoy the holiday! 🙂
from linkerd2.
@Wenliang-CHEN Happy new year!! Just wanted to make sure this was still on your radar. 🙂
This sounds a bit similar to an issue we had where the destination controller could become locked and stop processing service discovery updates. However, that bug was fixed in stable-2.14.2 and should not affect you in stable-2.14.3. To rule out that possibility, you could look at the `endpoints_updates` counter metric exposed by the destination controller:
linkerd diagnostics controller-metrics | grep endpoints_updates
You should see this counter incremented when the endpoints of a service change. If, instead, this counter remains at the same value, it means that the destination controller is not processing updates for some reason.
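To make the check above repeatable, here is a minimal Python sketch (not part of Linkerd) that sums the `endpoints_updates` samples in two saved scrapes, one taken before and one after the endpoints of a service change, and reports whether the counter moved. It assumes the output is in the Prometheus text exposition format; the helper names `counter_total` and `counter_advanced` and the `cluster="local"` label in the comment are hypothetical.

```python
import re

# One sample line in Prometheus text format, for example (label is hypothetical):
#   endpoints_updates{cluster="local"} 17
# Labels are optional; a trailing timestamp, if present, is ignored.
_SAMPLE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{[^}]*\})?\s+(?P<value>\S+)"
)


def counter_total(metrics_text: str, name: str) -> float:
    """Sum every sample of `name` (e.g. across several pods or label sets)."""
    total = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE/comment lines
        m = _SAMPLE.match(line)
        if m and m.group("name") == name:
            total += float(m.group("value"))
    return total


def counter_advanced(before: str, after: str, name: str = "endpoints_updates") -> bool:
    """True if the counter increased between the two scrapes."""
    return counter_total(after, name) > counter_total(before, name)
```

For example, save `linkerd diagnostics controller-metrics > before.txt`, change the endpoints of the service (for instance by scaling its deployment), save `after.txt` the same way, and pass both file contents to `counter_advanced`. Summing all matching samples keeps the check simple; if you need per-pod detail, filter the lines first.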
In stable-2.14.4 we added `*_informer_lag_secs` histogram metrics to the destination controller for even more visibility. If you upgrade to stable-2.14.4 or later, you can use these histograms to see whether there is a substantial lag between when endpoints are updated in Kubernetes and when the destination controller processes those updates.
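Prometheus histogram buckets are cumulative (each `le` bucket counts every observation less than or equal to its bound), so raw bucket values are easier to read once turned into a quantile. The Python sketch below approximates a quantile by linear interpolation, in the spirit of PromQL's `histogram_quantile()`, and returns `None` when the histogram holds no observations, which is different from "all lag was tiny". The metric name `endpointslices_informer_lag_seconds` and the bucket bounds in the sample are assumptions; check them against your actual output. The helpers `parse_buckets` and `estimate_quantile` are hypothetical.

```python
from __future__ import annotations

import math
import re

# A bucket sample such as (name and bound are assumptions):
#   endpointslices_informer_lag_seconds_bucket{le="0.5"} 8
# Only the `le` label is read, so the text is assumed to hold ONE series
# per histogram; filter by pod/labels first if there are several.
_BUCKET = re.compile(
    r'^(?P<name>\w+)_bucket\{(?:[^}]*,)?le="(?P<le>[^"]+)"[^}]*\}\s+(?P<value>\S+)'
)


def parse_buckets(metrics_text: str, histogram: str) -> list[tuple[float, float]]:
    """Return sorted (upper_bound, cumulative_count) pairs; float() accepts "+Inf"."""
    buckets = []
    for line in metrics_text.splitlines():
        m = _BUCKET.match(line.strip())
        if m and m.group("name") == histogram:
            buckets.append((float(m.group("le")), float(m.group("value"))))
    return sorted(buckets)


def estimate_quantile(q: float, buckets: list[tuple[float, float]]) -> float | None:
    """Interpolate inside the bucket that holds rank q*total. None = nothing to estimate."""
    if not buckets or buckets[-1][0] != math.inf:
        return None  # a complete histogram ends with an le="+Inf" bucket
    total = buckets[-1][1]  # cumulative, so the +Inf bucket holds the total count
    if total == 0:
        return None  # no observations recorded, which is not the same as zero lag
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf:
                return prev_bound  # cannot interpolate into the open-ended bucket
            if count == prev_count:
                return bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return None
```

For example, `estimate_quantile(0.99, parse_buckets(text, "endpointslices_informer_lag_seconds"))` gives a rough p99 lag in seconds for one scrape. Like PromQL, this is only as precise as the bucket boundaries allow.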
Hey @adleong, thanks for the reply.
And yes, I do see the `endpoints_updates` counter increment after deploying the target service (service A), so I guess the destination controller was processing updates.
A couple of things worth mentioning:
- The issue happens about 20 minutes after the target service is deployed.
- If we restart the deployment that owns the outbound pod, the issue is resolved.
Does that change anything?
As an action item, I think we will try upgrading to stable-2.14.4 and take a look at the `*_informer_lag_secs` metrics as well.
Meanwhile, if we find anything new, we will report back in this thread.
Thanks!
@Wenliang-CHEN Any joy trying with stable-2.14.4? 🙂
Hey @kflynn, not yet... it's around the Christmas holiday. I will let you know 😄
There has not been another occurrence since I reported the issue, but to be safe, we are still observing...
Hey @kflynn, happy new year!
And yes, we have not forgotten this. We just upgraded to v2.14.9, and so far we have not received any reports of the same issue.
Hopefully the upgrade fixes it. We will monitor it throughout February. If there are no further reports, I think we can close this for now. Thanks!
Okay, the issue has happened again.
We were able to collect the `linkerd.endpoints_updates`, `linkerd.endpointslices_informer_lag_seconds.bucket`, and `linkerd.endpoints_informer_lag_seconds.bucket` metrics.
They seem to follow a pattern: `linkerd.endpointslices_informer_lag_seconds.bucket` moves together with `linkerd.endpoints_updates`.
And `linkerd.endpoints_informer_lag_seconds.bucket` is always 0.
We are not sure how to interpret this. Do these mean anything in particular, or are they completely normal?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
Related Issues (20)
- After node restart linkerd-cni pod hast to be restarted sometimes
- Default Server policy on linkerd-jaeger prohibits jaeger-ui access
- Headless endpoint mirrors are incorrectly cleaned up as part of GC
- timestamp is in weird format
- BadSignature error when using ec with key_bits 512 (works with 256)
- CPU Spikes when upgrading to 2.4.10 from 2.4.0
- Linkerd CNI pods not aware about the OIDC signing key auto-rotation by AKS
- PodMonitor linkerd-proxy - Creates duplicate timestamp metric labels
- `linkerd-destination` OOMKilled due to discovery spike in linkerd P2P multicluster, renders cluster inoperable
- HTTPRoute intermittently fails to distribute traffic
- Intermittent routing failures with HTTPRoute
- Linkerd-proxy logging full header contents of incoming http requests for log level debug and trace.
- Allow port ranges in dynamic authorization policy resources
- Prometheus metrics scrapes of `linkerd-proxy` are not TLS protected (occassionally)
- Change default `cr.l5d.io` to `ghcr.io`?
- Linkerd Multi-Cluster service-mirroring to give option to mirror EndpointSlices as well
- Helm upgrade always changing due to trust root?
- Connection refused randomly for pairs of pods
- Destination container in the linkerd-destination pod panics when using deployments with headless services
- Connection refused (os error 111) error.sources=[Connection refused (os error 111)]