Comments (7)
Is there anything preventing agent.Reload from being called more than once within the 256ms race-condition derisk delay? Looking at this code, I see that we have this:
https://github.com/istio/manager/blob/master/proxy/envoy/watcher.go#L104
but does anything prevent watcher.reload from being called more than once within the 256ms?
In Amalgam8, the 256ms delay sits inside our Envoy reload method, under a lock. This prevents an Envoy reload from being triggered more than once within 256ms:
https://github.com/amalgam8/amalgam8/blob/master/sidecar/proxy/envoy/service.go#L99
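For readers who don't want to follow the link, the idea can be sketched roughly as below. This is a minimal illustration and not the Amalgam8 code: the `reloader` type, its fields, and the `demoReload` helper are hypothetical names made up for this example, and the actual Envoy restart is replaced by a comment.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// reloader is a hypothetical stand-in for the proxy agent; it is not the
// Amalgam8 or Istio type.
type reloader struct {
	mu      sync.Mutex
	delay   time.Duration
	started []time.Time // start time of each reload, in execution order
}

// Reload holds the lock across both the reload and the delay, so a second
// caller cannot begin its reload until the delay has fully elapsed.
func (r *reloader) Reload() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.started = append(r.started, time.Now())
	// A real implementation would trigger the Envoy hot restart here.
	time.Sleep(r.delay)
}

// demoReload fires n concurrent reloads (n must be at least 2) and returns
// the smallest gap, in milliseconds, between two consecutive reload starts.
func demoReload(n, delayMs int) int64 {
	r := &reloader{delay: time.Duration(delayMs) * time.Millisecond}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			r.Reload()
		}()
	}
	wg.Wait()
	minGap := r.started[1].Sub(r.started[0])
	for i := 2; i < len(r.started); i++ {
		if gap := r.started[i].Sub(r.started[i-1]); gap < minGap {
			minGap = gap
		}
	}
	return minGap.Milliseconds()
}

func main() {
	fmt.Println("smallest gap between reloads (ms):", demoReload(3, 256))
}
```

Because the sleep happens while the mutex is held, concurrent callers are serialized and spaced at least one delay apart, regardless of how many goroutines call Reload at once.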
from pilot.
The controller has a single event queue that collects all notifications from k8s and then dispatches handlers one by one. So if you block a single handler, you should be blocking the entire queue. This is why we don't do any locking in the handlers; it's all single-threaded (see https://github.com/istio/manager/blob/master/platform/kube/queue.go). My best guess is that under some conditions Envoy crashes due to a problem with our config, and that screws up the graceful restart protocol. I'm watching for the next time this appears so I can collect all the logs. We're very noisy with the logs, so last time the important bits got rotated out.
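To make the single-threaded dispatch argument concrete, here is a rough sketch of such a queue. It is a simplified illustration, not the code in platform/kube/queue.go: the `queue` type, `newQueue`, `Push`, `Close`, and `demoQueue` are all hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"time"
)

// queue is a hypothetical, simplified event queue for illustration only.
type queue struct {
	events  chan string
	handler func(string)
	done    chan struct{}
}

func newQueue(handler func(string)) *queue {
	q := &queue{
		events:  make(chan string, 16),
		handler: handler,
		done:    make(chan struct{}),
	}
	go q.run()
	return q
}

// run dispatches events one at a time on a single goroutine, so a handler
// that blocks delays every event queued behind it.
func (q *queue) run() {
	defer close(q.done)
	for e := range q.events {
		q.handler(e)
	}
}

// Push enqueues an event; it blocks only when the buffer is full.
func (q *queue) Push(e string) { q.events <- e }

// Close stops accepting events and waits until all queued ones are handled.
func (q *queue) Close() {
	close(q.events)
	<-q.done
}

// demoQueue sends events through a handler that blocks for blockMs and
// returns the handling order plus the total elapsed milliseconds.
func demoQueue(events []string, blockMs int) ([]string, int64) {
	var handled []string // no lock: only the dispatch goroutine writes it
	start := time.Now()
	q := newQueue(func(e string) {
		time.Sleep(time.Duration(blockMs) * time.Millisecond)
		handled = append(handled, e)
	})
	for _, e := range events {
		q.Push(e)
	}
	q.Close()
	return handled, time.Since(start).Milliseconds()
}

func main() {
	handled, ms := demoQueue([]string{"add", "update", "delete"}, 256)
	fmt.Println(handled, ms)
}
```

With one dispatch goroutine, events are handled strictly in arrival order and the total time is at least the number of events times the handler's blocking time, which is the property the handlers rely on instead of locks.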
I found another issue with Envoy hot restart that may (or may not) be related. It looks like the epoch goes out of sync if an Envoy hot restart fails (see the domain socket error below and the attached gist).
unable to initialize hot restart: unable to bind domain socket with id=3 (see --base-id option)
$ grep -rn "Envoy starting" ~/envoy-crash-log | cut -c -130
35:}I0303 18:46:15.088478 1 agent.go:133] Envoy starting: [-c /etc/envoy/envoy-rev0.json --restart-epoch 0 --drain-time-s 30
730:}I0303 18:46:17.576076 1 agent.go:133] Envoy starting: [-c /etc/envoy/envoy-rev1.json --restart-epoch 1 --drain-time-s 3
1395:}I0303 18:46:18.427845 1 agent.go:133] Envoy starting: [-c /etc/envoy/envoy-rev2.json --restart-epoch 2 --drain-time-s
2076:I0303 18:46:19.483945 1 agent.go:133] Envoy starting: [-c /etc/envoy/envoy-rev3.json --restart-epoch 3 --drain-time-s 3
2757:}I0303 18:46:20.884821 1 agent.go:133] Envoy starting: [-c /etc/envoy/envoy-rev4.json --restart-epoch 4 --drain-time-s
3446:}I0303 18:46:22.327936 1 agent.go:133] Envoy starting: [-c /etc/envoy/envoy-rev5.json --restart-epoch 5 --drain-time-s
4114:}I0303 18:46:23.263955 1 agent.go:133] Envoy starting: [-c /etc/envoy/envoy-rev6.json --restart-epoch 6 --drain-time-s
4790:}I0303 18:46:24.095951 1 agent.go:133] Envoy starting: [-c /etc/envoy/envoy-rev7.json --restart-epoch 7 --drain-time-s
5471:I0303 18:46:25.570984 1 agent.go:133] Envoy starting: [-c /etc/envoy/envoy-rev8.json --restart-epoch 8 --drain-time-s 3
6136:}I0303 18:46:26.364040 1 agent.go:133] Envoy starting: [-c /etc/envoy/envoy-rev9.json --restart-epoch 9 --drain-time-s
6796:}I0303 18:46:27.399934 1 agent.go:133] Envoy starting: [-c /etc/envoy/envoy-rev10.json --restart-epoch 10 --drain-time-
7455:}I0303 18:46:28.862936 1 agent.go:133] Envoy starting: [-c /etc/envoy/envoy-rev11.json --restart-epoch 11 --drain-time-
8116:}I0303 18:46:30.013958 1 agent.go:133] Envoy starting: [-c /etc/envoy/envoy-rev12.json --restart-epoch 12 --drain-time-
8822:}I0303 18:46:30.420838 1 agent.go:133] Envoy starting: [-c /etc/envoy/envoy-rev3.json --restart-epoch 3 --drain-time-s
9481:}I0303 18:46:30.781921 1 agent.go:133] Envoy starting: [-c /etc/envoy/envoy-rev3.json --restart-epoch 3 --drain-time-s
10140:}I0303 18:46:31.051243 1 agent.go:133] Envoy starting: [-c /etc/envoy/envoy-rev3.json --restart-epoch 3 --drain-time-s
10799:}I0303 18:46:31.313649 1 agent.go:133] Envoy starting: [-c /etc/envoy/envoy-rev3.json --restart-epoch 3 --drain-time-s
11458:}I0303 18:46:31.579342 1 agent.go:133] Envoy starting: [-c /etc/envoy/envoy-rev3.json --restart-epoch 3 --drain-time-s
https://gist.github.com/ayj/e04edc7c75a86ff78febd07e73042498
In the logs, I noticed that epoch 1 took so long to start that epoch 2 was started before epoch 1 had initialized. Maybe the epoch 2 Envoy kills epoch 0, and then epoch 1 just sits there disrupting the order? We played with the delays to make sure epoch 2 starts after epoch 1 is initialized, but it really depends on the delay between the Envoy exec and Envoy sending a message to the previous epoch.
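One way to picture the ordering constraint, as opposed to tuning a fixed delay, is to gate each new epoch on the previous one reporting that it has initialized. The sketch below is only an illustration under a big assumption: that the agent can obtain some readiness signal per epoch. The `startFunc` hook, `epochGate`, and `demoEpochs` are hypothetical names, nothing here is Envoy's or the agent's actual API, and cleanup after a timeout is left to the caller.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// startFunc is a hypothetical hook: it launches the proxy for an epoch and
// returns a channel that is closed once that epoch has initialized.
type startFunc func(epoch int) <-chan struct{}

var errNotReady = errors.New("epoch did not initialize in time")

// epochGate serializes restarts so that epoch N+1 is never launched while
// epoch N is still initializing.
type epochGate struct {
	mu    sync.Mutex
	next  int
	start startFunc
	wait  time.Duration
}

// Restart launches the next epoch and blocks until it is ready or the wait
// expires. The epoch counter only advances on success.
func (g *epochGate) Restart() (int, error) {
	g.mu.Lock()
	defer g.mu.Unlock()
	epoch := g.next
	select {
	case <-g.start(epoch):
		g.next++
		return epoch, nil
	case <-time.After(g.wait):
		return epoch, errNotReady
	}
}

// demoEpochs issues n concurrent restarts against a fake proxy that takes
// initMs to initialize. It returns the launch order and whether any epoch
// was launched while another was still initializing.
func demoEpochs(n, initMs int) (launched []int, overlapped bool) {
	var initializing int32
	g := &epochGate{wait: time.Second}
	g.start = func(epoch int) <-chan struct{} {
		if atomic.AddInt32(&initializing, 1) > 1 {
			overlapped = true
		}
		launched = append(launched, epoch)
		ready := make(chan struct{})
		go func() {
			time.Sleep(time.Duration(initMs) * time.Millisecond)
			atomic.AddInt32(&initializing, -1)
			close(ready)
		}()
		return ready
	}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.Restart()
		}()
	}
	wg.Wait()
	return launched, overlapped
}

func main() {
	fmt.Println(demoEpochs(3, 20))
}
```

The trade-off compared with a fixed delay is that a slow epoch stalls later restarts instead of being overtaken by them, so the timeout has to be chosen with the slowest expected initialization in mind.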
@rshriram I think Amalgam8 used to hit a similar issue. What delay between restarts works?
duplicate of #268