Comments (6)
Okay, I've found the problem: in some cases, the CiliumEnvoyConfig reconciliation happens before the Service has been added into Cilium's stores, so the Nodeports can't be determined. I verified this with a terrible delay patch, talked to @joamaki, and we have figured out a way to fix this properly.
Folks affected by this error: the change has been reverted on main, so a rebase should clear this up until I can get a proper fix in.
from cilium.
So, the change I added alters CiliumEnvoyConfig processing so that, if we are adding L7 proxy port forwarding for a Service that has Nodeports, we forward both the frontend port and the Nodeport to the proxy. (This is because the Services in question are the special LB Services for Ingress that don't do anything other than forward traffic to Envoy.)
In all of the failures I've checked, at least one node does not have the ProxyPort rules visible in cilium bpf lb list. So, when the connectivity test checks that this node responds to the Ingress on that Nodeport, it fails.
In most cases, the ProxyPort rules never show up for that node for the lifetime of the test (they're not visible in the final sysdump, nor in any of the ones along the way). In one case, the rules were not present on one node in the first sysdump, then they showed up later.
So, it seems like this is caused by a node failing to apply the ProxyPort forwarding correctly for a single Service, which smells like an ordering-of-received-Kubernetes-updates issue to me. Going to ask for help from sig-datapath on this one.
I wrote a script to search through sysdumps, find the nodePort values for the cilium-ingress-same-node and cilium-ingress-other-node Services, and then grep the cilium bpf lb list output for those values:
#!/bin/bash
# For every sysdump directory below the current one, look up the NodePorts of
# the two Ingress test Services and grep each agent's bpf lb list for them.
SYSDUMP_DIRS=$(find . -type d -name "cilium-sysdump-*" | sort)
for d in $SYSDUMP_DIRS; do
    pushd "$d" || exit
    echo "Moving into $d"
    # The Service dump inside the sysdump holds the allocated nodePort values.
    SVC_FILE=$(find . -name "k8s-service*.yaml")
    SAME_NODEPORT=$(cat $SVC_FILE | yq '.items | map(select(.metadata.name == "cilium-ingress-same-node"))[0] | .spec.ports[0].nodePort')
    OTHER_NODEPORT=$(cat $SVC_FILE | yq '.items | map(select(.metadata.name == "cilium-ingress-other-node"))[0] | .spec.ports[0].nodePort')
    echo "================= Same Nodeport: $SAME_NODEPORT, Other Nodeport $OTHER_NODEPORT"
    # One bugtool directory per Cilium agent pod.
    for ciliumpod in $(find . -type d -name 'cilium*'); do
        echo "$ciliumpod"
        grep -E "$SAME_NODEPORT|$OTHER_NODEPORT" "${ciliumpod}/cmd/cilium-dbg-bpf-lb-list.md"
    done
    popd || exit
done
Representative output:
/Users/ynick/src/isovalent/issues/nodeport-ci/cilium-sysdumps-2/cilium-sysdump-12-20240618-052548 /Users/ynick/src/isovalent/issues/nodeport-ci
Moving into ./cilium-sysdumps-2/cilium-sysdump-12-20240618-052548
================= Same Nodeport: 32749, Other Nodeport 31088
./cilium-bugtool-cilium-s5xzv-20240618-052548
0.0.0.0:32749 (0) 0.0.0.0:0 (42) (0) [NodePort, non-routable]
192.168.0.2:32749 (0) 0.0.0.0:0 (41) (0) [NodePort]
172.18.0.2:32749 (0) 0.0.0.0:0 (40) (0) [NodePort]
192.168.0.2:31088 (0) 0.0.0.0:0 (37) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 11507)
0.0.0.0:31088 (0) 0.0.0.0:0 (35) (0) [NodePort, non-routable, l7-load-balancer] (L7LB Proxy Port: 11507)
172.18.0.2:31088 (0) 0.0.0.0:0 (36) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 11507)
./cilium-bugtool-cilium-qlq76-20240618-052548
0.0.0.0:31088 (0) 0.0.0.0:0 (37) (0) [NodePort, non-routable, l7-load-balancer] (L7LB Proxy Port: 12362)
192.168.0.4:32749 (0) 0.0.0.0:0 (41) (0) [NodePort]
0.0.0.0:32749 (0) 0.0.0.0:0 (42) (0) [NodePort, non-routable]
172.18.0.3:32749 (0) 0.0.0.0:0 (40) (0) [NodePort]
172.18.0.3:31088 (0) 0.0.0.0:0 (35) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 12362)
192.168.0.4:31088 (0) 0.0.0.0:0 (36) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 12362)
./cilium-bugtool-cilium-nhwl9-20240618-052548
192.168.0.3:32749 (0) 0.0.0.0:0 (41) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 19370)
192.168.0.3:31088 (0) 0.0.0.0:0 (36) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 19317)
172.18.0.4:31088 (0) 0.0.0.0:0 (35) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 19317)
172.18.0.4:32749 (0) 0.0.0.0:0 (40) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 19370)
0.0.0.0:31088 (0) 0.0.0.0:0 (37) (0) [NodePort, non-routable, l7-load-balancer] (L7LB Proxy Port: 19317)
0.0.0.0:32749 (0) 0.0.0.0:0 (42) (0) [NodePort, non-routable, l7-load-balancer] (L7LB Proxy Port: 19370)
/Users/ynick/src/isovalent/issues/nodeport-ci
/Users/ynick/src/isovalent/issues/nodeport-ci/cilium-sysdumps-2/cilium-sysdump-12-final /Users/ynick/src/isovalent/issues/nodeport-ci
Moving into ./cilium-sysdumps-2/cilium-sysdump-12-final
================= Same Nodeport: 32749, Other Nodeport 31088
./cilium-bugtool-cilium-s5xzv-20240618-053904
0.0.0.0:32749 (0) 0.0.0.0:0 (42) (0) [NodePort, non-routable]
0.0.0.0:31088 (0) 0.0.0.0:0 (35) (0) [NodePort, non-routable, l7-load-balancer] (L7LB Proxy Port: 11507)
192.168.0.2:31088 (0) 0.0.0.0:0 (37) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 11507)
192.168.0.2:32749 (0) 0.0.0.0:0 (41) (0) [NodePort]
172.18.0.2:31088 (0) 0.0.0.0:0 (36) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 11507)
172.18.0.2:32749 (0) 0.0.0.0:0 (40) (0) [NodePort]
./cilium-bugtool-cilium-qlq76-20240618-053904
0.0.0.0:31088 (0) 0.0.0.0:0 (37) (0) [NodePort, non-routable, l7-load-balancer] (L7LB Proxy Port: 12362)
192.168.0.4:31088 (0) 0.0.0.0:0 (36) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 12362)
192.168.0.4:32749 (0) 0.0.0.0:0 (41) (0) [NodePort]
172.18.0.3:32749 (0) 0.0.0.0:0 (40) (0) [NodePort]
172.18.0.3:31088 (0) 0.0.0.0:0 (35) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 12362)
0.0.0.0:32749 (0) 0.0.0.0:0 (42) (0) [NodePort, non-routable]
./cilium-bugtool-cilium-nhwl9-20240618-053904
172.18.0.4:32749 (0) 0.0.0.0:0 (40) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 19370)
192.168.0.3:32749 (0) 0.0.0.0:0 (41) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 19370)
172.18.0.4:31088 (0) 0.0.0.0:0 (35) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 19317)
0.0.0.0:32749 (0) 0.0.0.0:0 (42) (0) [NodePort, non-routable, l7-load-balancer] (L7LB Proxy Port: 19370)
0.0.0.0:31088 (0) 0.0.0.0:0 (37) (0) [NodePort, non-routable, l7-load-balancer] (L7LB Proxy Port: 19317)
192.168.0.3:31088 (0) 0.0.0.0:0 (36) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 19317)
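You can see in these examples that the 32749 port (the cilium-ingress-same-node Nodeport) does not have Proxy Port forwards on two of the three nodes (cilium-s5xzv and cilium-qlq76), in both the earlier and the final sysdump, while the 31088 port has them on every node.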
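The same check can be done mechanically instead of by eye. What follows is only a minimal sketch, assuming the line format shown in the output above; the function name missing_l7lb is made up for illustration and is not part of Cilium or its CLI. It reads bpf lb list output on stdin and prints the NodePort frontends for a given port that lack the l7-load-balancer flag.

```shell
#!/bin/bash
# Hypothetical helper (not part of Cilium): print every NodePort frontend for
# the given port whose entry does NOT carry the l7-load-balancer flag.
missing_l7lb() {
    local port="$1"
    # Field 1 is the frontend address, e.g. 192.168.0.2:32749.
    awk -v port="$port" '
        $1 ~ (":" port "$") && /NodePort/ && !/l7-load-balancer/ { print $1 }
    '
}

# Three lines copied from the cilium-s5xzv output above.
sample='0.0.0.0:32749 (0) 0.0.0.0:0 (42) (0) [NodePort, non-routable]
192.168.0.2:32749 (0) 0.0.0.0:0 (41) (0) [NodePort]
192.168.0.2:31088 (0) 0.0.0.0:0 (37) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 11507)'

# Prints 0.0.0.0:32749 and 192.168.0.2:32749, one per line.
printf '%s\n' "$sample" | missing_l7lb 32749
```

Against a real sysdump, the input would be the per-agent cmd/cilium-dbg-bpf-lb-list.md file that the script above already greps; an empty result for a port means every NodePort entry for it is forwarded to the proxy.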
Hit on https://github.com/cilium/cilium/actions/runs/9598146987/job/26468789478
PR: #33240
sysdump: cilium-sysdump-15-final.zip
@youngnick We handle the same problem with L7 service redirection by registering the redirection without assuming the service is already there. Maybe we could do something similar for the ports, allowing the service package to deal with the actual port number when the service appears?
I looked into that, but it's not really practical as the service package doesn't hold the details we'd need to look up. I should be able to work up a PR on Monday my time, so I'll have something to show you early next week.