
Comments (6)

youngnick commented on July 20, 2024

Okay, I've found the problem: in some cases, the CiliumEnvoyConfig reconciliation happens before the Service has been added into Cilium's stores, so the NodePorts can't be determined. I verified this with a terrible delay patch, talked to @joamaki, and we have figured out a way to fix this properly.

Folks affected by this error: the change has been reverted on main, so a rebase should clear this up until I can get a proper fix in.

from cilium.

youngnick commented on July 20, 2024

So, in the change I added, CiliumEnvoyConfig processing is changed so that, if we are adding L7 proxy port forwarding for a Service that has NodePorts, we forward both the frontend port and the NodePort to the proxy. (This is because the Services in question are the special LB Services for Ingress that don't do anything other than forward traffic to Envoy.)

In all of the failures I've checked, at least one node does not have the ProxyPort rules visible in cilium bpf lb list. So, when the connectivity test checks that the node responds to the Ingress on that NodePort, it fails.

In most cases, the ProxyPort rules never show up for that node for the lifetime of the test (they're not visible in the final sysdump, nor in any of the ones along the way). In one case, the rules were not present on one node in the first sysdump, then they showed up later.

So, it seems like a node is failing to apply the ProxyPort forwarding correctly for a single Service, which smells like an ordering-of-received-Kubernetes-updates issue to me. Going to ask for help from sig-datapath on this one.


youngnick commented on July 20, 2024

I wrote a script to search through sysdumps, find the nodePort values for the cilium-ingress-same-node and cilium-ingress-other-node Services, and then grep the cilium bpf lb list output for those values:

#!/bin/bash

# Find every sysdump directory below the current directory.
SYSDUMP_DIRS=$(find . -type d -name "cilium-sysdump-*" | sort)

for d in $SYSDUMP_DIRS; do
    pushd "$d" || exit

    echo "Moving into $d"
    # Look up the NodePorts of the two Ingress Services in the Service dump
    # (this assumes exactly one k8s-service*.yaml file per sysdump).
    SVC_FILE=$(find . -name "k8s-service*.yaml")
    SAME_NODEPORT=$(yq '.items | map(select(.metadata.name == "cilium-ingress-same-node"))[0] | .spec.ports[0].nodePort' "$SVC_FILE")
    OTHER_NODEPORT=$(yq '.items | map(select(.metadata.name == "cilium-ingress-other-node"))[0] | .spec.ports[0].nodePort' "$SVC_FILE")
    echo "================= Same Nodeport: $SAME_NODEPORT, Other Nodeport $OTHER_NODEPORT"
    # Print the load-balancer entries for those NodePorts from each Cilium pod's bugtool output.
    for ciliumpod in $(find . -type d -name 'cilium*'); do
        echo "$ciliumpod"
        grep -E "$SAME_NODEPORT|$OTHER_NODEPORT" "${ciliumpod}/cmd/cilium-dbg-bpf-lb-list.md"
    done

    popd || exit
done

Representative output:

/Users/ynick/src/isovalent/issues/nodeport-ci/cilium-sysdumps-2/cilium-sysdump-12-20240618-052548 /Users/ynick/src/isovalent/issues/nodeport-ci
Moving into ./cilium-sysdumps-2/cilium-sysdump-12-20240618-052548
================= Same Nodeport: 32749, Other Nodeport 31088
./cilium-bugtool-cilium-s5xzv-20240618-052548
0.0.0.0:32749 (0)             0.0.0.0:0 (42) (0) [NodePort, non-routable]
192.168.0.2:32749 (0)         0.0.0.0:0 (41) (0) [NodePort]
172.18.0.2:32749 (0)          0.0.0.0:0 (40) (0) [NodePort]
192.168.0.2:31088 (0)         0.0.0.0:0 (37) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 11507)
0.0.0.0:31088 (0)             0.0.0.0:0 (35) (0) [NodePort, non-routable, l7-load-balancer] (L7LB Proxy Port: 11507)
172.18.0.2:31088 (0)          0.0.0.0:0 (36) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 11507)
./cilium-bugtool-cilium-qlq76-20240618-052548
0.0.0.0:31088 (0)             0.0.0.0:0 (37) (0) [NodePort, non-routable, l7-load-balancer] (L7LB Proxy Port: 12362)
192.168.0.4:32749 (0)         0.0.0.0:0 (41) (0) [NodePort]
0.0.0.0:32749 (0)             0.0.0.0:0 (42) (0) [NodePort, non-routable]
172.18.0.3:32749 (0)          0.0.0.0:0 (40) (0) [NodePort]
172.18.0.3:31088 (0)          0.0.0.0:0 (35) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 12362)
192.168.0.4:31088 (0)         0.0.0.0:0 (36) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 12362)
./cilium-bugtool-cilium-nhwl9-20240618-052548
192.168.0.3:32749 (0)         0.0.0.0:0 (41) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 19370)
192.168.0.3:31088 (0)         0.0.0.0:0 (36) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 19317)
172.18.0.4:31088 (0)          0.0.0.0:0 (35) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 19317)
172.18.0.4:32749 (0)          0.0.0.0:0 (40) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 19370)
0.0.0.0:31088 (0)             0.0.0.0:0 (37) (0) [NodePort, non-routable, l7-load-balancer] (L7LB Proxy Port: 19317)
0.0.0.0:32749 (0)             0.0.0.0:0 (42) (0) [NodePort, non-routable, l7-load-balancer] (L7LB Proxy Port: 19370)
/Users/ynick/src/isovalent/issues/nodeport-ci
/Users/ynick/src/isovalent/issues/nodeport-ci/cilium-sysdumps-2/cilium-sysdump-12-final /Users/ynick/src/isovalent/issues/nodeport-ci
Moving into ./cilium-sysdumps-2/cilium-sysdump-12-final
================= Same Nodeport: 32749, Other Nodeport 31088
./cilium-bugtool-cilium-s5xzv-20240618-053904
0.0.0.0:32749 (0)             0.0.0.0:0 (42) (0) [NodePort, non-routable]
0.0.0.0:31088 (0)             0.0.0.0:0 (35) (0) [NodePort, non-routable, l7-load-balancer] (L7LB Proxy Port: 11507)
192.168.0.2:31088 (0)         0.0.0.0:0 (37) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 11507)
192.168.0.2:32749 (0)         0.0.0.0:0 (41) (0) [NodePort]
172.18.0.2:31088 (0)          0.0.0.0:0 (36) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 11507)
172.18.0.2:32749 (0)          0.0.0.0:0 (40) (0) [NodePort]
./cilium-bugtool-cilium-qlq76-20240618-053904
0.0.0.0:31088 (0)             0.0.0.0:0 (37) (0) [NodePort, non-routable, l7-load-balancer] (L7LB Proxy Port: 12362)
192.168.0.4:31088 (0)         0.0.0.0:0 (36) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 12362)
192.168.0.4:32749 (0)         0.0.0.0:0 (41) (0) [NodePort]
172.18.0.3:32749 (0)          0.0.0.0:0 (40) (0) [NodePort]
172.18.0.3:31088 (0)          0.0.0.0:0 (35) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 12362)
0.0.0.0:32749 (0)             0.0.0.0:0 (42) (0) [NodePort, non-routable]
./cilium-bugtool-cilium-nhwl9-20240618-053904
172.18.0.4:32749 (0)          0.0.0.0:0 (40) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 19370)
192.168.0.3:32749 (0)         0.0.0.0:0 (41) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 19370)
172.18.0.4:31088 (0)          0.0.0.0:0 (35) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 19317)
0.0.0.0:32749 (0)             0.0.0.0:0 (42) (0) [NodePort, non-routable, l7-load-balancer] (L7LB Proxy Port: 19370)
0.0.0.0:31088 (0)             0.0.0.0:0 (37) (0) [NodePort, non-routable, l7-load-balancer] (L7LB Proxy Port: 19317)
192.168.0.3:31088 (0)         0.0.0.0:0 (36) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 19317)

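You can see in these examples that port 32749 does not have Proxy Port forwards on two of the three nodes (the s5xzv and qlq76 agents), in both the first sysdump and the final one; only nhwl9 has them.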
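
To make the missing entries easier to spot than by eyeballing the grep output, a small filter along the following lines could be used. This is only a sketch: the function name missing_proxy_port is hypothetical, and it assumes the cilium bpf lb list line format shown in the output above (frontend address in the first column, flags in square brackets).

```shell
# Hypothetical helper, not part of Cilium or the script above.
# Reads `cilium bpf lb list`-style lines on stdin and prints the frontend of
# every NodePort entry for the given port that has no l7-load-balancer flag.
missing_proxy_port() {
    local port="$1"
    awk -v port="$port" '
        # First field is the frontend (for example 0.0.0.0:32749); match on the
        # trailing ":<port>", require the NodePort flag, and skip entries that
        # already carry the l7-load-balancer flag.
        $1 ~ (":" port "$") && /NodePort/ && !/l7-load-balancer/ { print $1 }
    '
}
```

For example, running missing_proxy_port 32749 < cmd/cilium-dbg-bpf-lb-list.md inside one of the cilium-bugtool directories prints the frontends that are not being forwarded to the proxy, and prints nothing when every entry for that port has a proxy port.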


thorn3r commented on July 20, 2024

Hit on https://github.com/cilium/cilium/actions/runs/9598146987/job/26468789478
PR: #33240
sysdump: cilium-sysdump-15-final.zip


jrajahalme commented on July 20, 2024

@youngnick We handle the same problem with L7 service redirection by registering the redirection without assuming the service is already there. Maybe we could do something similar for the ports, allowing the service package to deal with the actual port number when the service appears?


youngnick commented on July 20, 2024

I looked into that, but it's not really practical as the service package doesn't hold the details we'd need to look up. I should be able to work up a PR on Monday my time, so I'll have something to show you early next week.

