Reproduction steps on the development environment:
- Spin up the kind cluster:

```shell
make kind-down
make kind
make kind-image
make kind-install-cilium
```
- Check statedb for eth0 addresses. Example:

```shell
for pod in $(kubectl -n kube-system get po -o NAME -l app.kubernetes.io/name=cilium-agent); do
  echo -e "\n# $pod"
  kubectl -n kube-system exec -c cilium-agent $pod -- cilium-dbg statedb node-addresses | grep eth0
done
```

```
# pod/cilium-dkrc6
172.25.0.3    true  true  eth0
fc00:c111::3  true  true  eth0

# pod/cilium-pdjc7
fc00:c111::2  true  true  eth0
172.25.0.2    true  true  eth0
```
- Add another address to the worker, choosing a different subnet because we want it to be ordered before the node address, and 172.25.0.1 is not available:

```shell
❯ docker exec kind-worker ip -br a s dev eth0
eth0@if162  UP  172.25.0.2/16 fc00:c111::2/64 fe80::42:acff:fe19:2/64
❯ docker exec kind-worker ip a a 172.16.0.123/16 dev eth0
❯ docker exec kind-worker ip -br a s dev eth0
eth0@if162  UP  172.25.0.2/16 172.16.0.123/16 fc00:c111::2/64 fe80::42:acff:fe19:2/64
```
- Check the statedb again:

```
# pod/cilium-dkrc6
172.25.0.3    true  true  eth0
fc00:c111::3  true  true  eth0

# pod/cilium-pdjc7
fc00:c111::2  true  true  eth0
172.16.0.123  true  true  eth0
```

172.25.0.2 is missing!

- Delete 172.16.0.123 and replace it with 172.25.0.123, same subnet as the original node IPs, but ordered after the main node address:

```shell
❯ docker exec kind-worker ip a d 172.16.0.123/16 dev eth0
❯ docker exec kind-worker ip a a 172.25.0.123/16 dev eth0
❯ docker exec kind-worker ip -br a s dev eth0
eth0@if162  UP  172.25.0.2/16 172.25.0.123/16 fc00:c111::2/64 fe80::42:acff:fe19:2/64
```
- In this situation the statedb is correct:

```
# pod/cilium-dkrc6
172.25.0.3    true   true   eth0
fc00:c111::3  true   true   eth0

# pod/cilium-pdjc7
fc00:c111::2  true   true   eth0
172.25.0.2    true   true   eth0
172.25.0.123  false  false  eth0
```
Cilium 1.16.0-rc.0 is affected. However, so far I could not reproduce the issue on 1.14.12, so it may have been introduced in the 1.15 branch.
The issue here was that the set operations were performed on NodeAddress, which includes not only the IP address but also the "NodePort" flag. When a new address was added, the flag changed. The simplest fix for this is to swap the loops: first delete the existing addresses that are not part of the new set, then add the new ones in. I'll prepare a PR + backport to fix this.
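To illustrate the failure mode and the fix, here is a minimal self-contained Go sketch. The types and both reconcile functions are illustrative, not Cilium's actual code: the table is keyed by IP, but set membership is compared on the whole NodeAddress struct including the NodePort flag, so add-then-delete loses an address whose flag just flipped, while delete-then-add keeps it.

```go
package main

import "fmt"

// NodeAddress is keyed in the table by IP, but set membership below
// compares the whole struct, including the NodePort flag.
// (Illustrative types, not Cilium's actual code.)
type NodeAddress struct {
	IP       string
	NodePort bool
}

// reconcileBuggy adds the new addresses first, then deletes old
// entries absent from the new set. If an address merely flipped its
// NodePort flag, the old (IP, true) entry is "not in" the new set,
// so deleting it by IP also removes the freshly inserted (IP, false).
func reconcileBuggy(table map[string]NodeAddress, oldAddrs, newAddrs []NodeAddress) {
	inNew := map[NodeAddress]bool{}
	for _, a := range newAddrs {
		inNew[a] = true
		table[a.IP] = a
	}
	for _, a := range oldAddrs {
		if !inNew[a] {
			delete(table, a.IP) // oops: drops the just-updated entry too
		}
	}
}

// reconcileFixed swaps the loops: delete stale entries first, then
// insert the new set, so the updated entry survives.
func reconcileFixed(table map[string]NodeAddress, oldAddrs, newAddrs []NodeAddress) {
	inNew := map[NodeAddress]bool{}
	for _, a := range newAddrs {
		inNew[a] = true
	}
	for _, a := range oldAddrs {
		if !inNew[a] {
			delete(table, a.IP)
		}
	}
	for _, a := range newAddrs {
		table[a.IP] = a
	}
}

func main() {
	oldAddrs := []NodeAddress{{"172.25.0.2", true}}
	// 172.16.0.123 sorts first, so it takes over the NodePort flag.
	newAddrs := []NodeAddress{{"172.16.0.123", true}, {"172.25.0.2", false}}

	buggy := map[string]NodeAddress{"172.25.0.2": {"172.25.0.2", true}}
	reconcileBuggy(buggy, oldAddrs, newAddrs)
	_, ok := buggy["172.25.0.2"]
	fmt.Println("buggy keeps 172.25.0.2:", ok) // false: the address is lost

	fixed := map[string]NodeAddress{"172.25.0.2": {"172.25.0.2", true}}
	reconcileFixed(fixed, oldAddrs, newAddrs)
	_, ok = fixed["172.25.0.2"]
	fmt.Println("fixed keeps 172.25.0.2:", ok) // true
}
```

This mirrors the reproduction above: 172.25.0.2 vanishes under the buggy ordering exactly when the newly added address steals the NodePort flag.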
I'm still a bit puzzled by what impact this bug had. In v1.15 Cilium only supports one NodePort IP per device, and "enable-runtime-device-detection" is still disabled by default in v1.15 (nothing happens when Table[NodeAddress] updates; only the very initial version matters). For v1.16, where Cilium now reacts to device/address changes by default and supports multiple NodePort IPs per device (with --nodeport-addresses), this is definitely a release-blocking bug.
A LoadBalancer service is essentially a NodePort + some static IPs. In your case was the VIP address specified as a LoadBalancer IP? If that was the case then my expectation would've been that the node addresses should not have mattered. (EDIT: hmm unless they clashed and you had --enable-runtime-device-detection enabled.. then the NodePort service might've overwritten the LoadBalancer service and it could've gotten removed afterwards...)
Could you dump the LoadBalancer service here (`kubectl get svc/name-of-service -o yaml`)? The output of `cilium-dbg service list` would also be useful (the relevant LoadBalancer and NodePort services). Were you accessing the service via the LoadBalancer "port" or via the "nodePort"?
> The simplest fix for this is to swap the loops: first delete the existing addresses that are not part of the new set and then add the new ones in. I'll prepare a PR + backport to fix this.
Great! Thanks a lot.
> I'm still a bit puzzled by what impact this bug had.
Sorry, the report may have been confusing. It doesn't involve a LoadBalancer service in k8s; I'll explain.
Here's an example of Helm values for an install where the issue was first discovered. With this same set of values, 1.14 is fine, but 1.15 and 1.16 exhibit the issue. `enableRuntimeDeviceDetection` was never set and left at its default, so yes, that was false with 1.15. But 1.16 with runtime detection had the exact same problem and behaviour regarding system VIPs.
```yaml
autoDirectNodeRoutes: true
bgpControlPlane:
  enabled: true
bpf:
  masquerade: true
cgroup:
  autoMount:
    enabled: false
  hostRoot: /sys/fs/cgroup
endpointStatus:
  enabled: true
  status: policy health controllers log state
externalIPs:
  enabled: true
hubble:
  relay:
    enabled: true
  ui:
    enabled: true
ipam:
  mode: kubernetes
ipv4NativeRoutingCIDR: 10.244.0.0/16
k8sServiceHost: localhost
k8sServicePort: 7445
kubeProxyReplacement: true
loadBalancer:
  acceleration: native
  algorithm: maglev
  mode: hybrid
monitor:
  enabled: true
prometheus:
  enabled: true
  serviceMonitor:
    enabled: true
routingMode: native
securityContext:
  capabilities:
    ciliumAgent:
      - CHOWN
      - KILL
      - NET_ADMIN
      - NET_RAW
      - IPC_LOCK
      - SYS_ADMIN
      - SYS_RESOURCE
      - DAC_OVERRIDE
      - FOWNER
      - SETGID
      - SETUID
    cleanCiliumState:
      - NET_ADMIN
      - SYS_ADMIN
      - SYS_RESOURCE
```
> A LoadBalancer service is essentially a NodePort + some static IPs. In your case was the VIP address specified as a LoadBalancer IP?
The VIP feature mentioned isn't managed by a k8s or LoadBalancer controller. An address is added to/removed from a node's main NIC by a system controller in Talos, just like `ip addr add` would. It should be the same with keepalived, CARP, etc.
It's reproduced this way on the development stack in Kind, with the default Cilium config and `ip` commands. The bug can already be observed in statedb and various BPF maps, so there seems to be no need to dig into the Service bits.
An affected Service can be of the ClusterIP type. LoadBalancer is not required.
The problem actually happens when the selected Pods use hostNetwork, i.e. where the Endpoints are node addresses. Perhaps this is analogous to NodePorts in the datapath, I don't know.
The best example for this would be default Kubernetes API service:
```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    component: apiserver
    provider: kubernetes
  name: kubernetes
  namespace: default
spec:
  clusterIP: 10.16.0.1
  clusterIPs:
    - 10.16.0.1
  internalTrafficPolicy: Cluster
  ipFamilies:
    - IPv4
  ipFamilyPolicy: SingleStack
  ports:
    - name: https
      port: 443
      protocol: TCP
      targetPort: 6443
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
```
```yaml
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    endpointslice.kubernetes.io/skip-mirror: "true"
  name: kubernetes
  namespace: default
subsets:
  - addresses:
      - ip: 10.0.50.11
      - ip: 10.0.50.12
      - ip: 10.0.50.13
    ports:
      - name: https
        port: 6443
        protocol: TCP
```
```yaml
addressType: IPv4
apiVersion: discovery.k8s.io/v1
endpoints:
  - addresses:
      - 10.0.50.11
    conditions:
      ready: true
  - addresses:
      - 10.0.50.12
    conditions:
      ready: true
  - addresses:
      - 10.0.50.13
    conditions:
      ready: true
kind: EndpointSlice
metadata:
  labels:
    kubernetes.io/service-name: kubernetes
  name: kubernetes
  namespace: default
ports:
  - name: https
    port: 6443
    protocol: TCP
```
In these dumps, endpoints in 10.0.50.0/24 are node addresses, not from the PodCIDR. The service is accessed via the Service port 443.
Hmm, Table[NodeAddress] is only used for 1) picking the NodePort/HostPort frontend IPs, and 2) picking the BPF masquerade IP address. It should have no impact on ClusterIP services. It might be that there's another unrelated bug involved here. I'll finish up the NodeAddress fix and will then try to reproduce this issue on my local Talos cluster.
EDIT: Forgot host IP syncing. It's periodically marking the node addresses with the host identity (daemon/cmd/hostips-sync.go). This could explain it.
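Grounded only in the sentence above, here is a hypothetical Go sketch of how such a periodic sync could propagate a stale table (names and structure are illustrative, not the actual daemon/cmd/hostips-sync.go): each tick, addresses currently in the node-address table are marked with the host identity, and previously marked addresses that vanished from the table are unmarked.

```go
package main

import "fmt"

// syncHostIdentity is an illustrative-only sketch of a periodic
// host-IP sync: addresses present in the node-address table are
// (re)marked with the host identity, and addresses that disappeared
// from the table lose it.
func syncHostIdentity(marked map[string]bool, nodeAddrs []string) {
	current := map[string]bool{}
	for _, ip := range nodeAddrs {
		current[ip] = true
		marked[ip] = true // (re)mark with the host identity
	}
	for ip := range marked {
		if !current[ip] {
			delete(marked, ip) // gone from the table: unmark
		}
	}
}

func main() {
	// Both node addresses start out marked.
	marked := map[string]bool{"172.25.0.2": true, "fc00:c111::2": true}

	// The reconciliation bug drops 172.25.0.2 from the table; the
	// next sync tick then removes its host identity as well.
	syncHostIdentity(marked, []string{"fc00:c111::2"})
	fmt.Println("172.25.0.2 keeps host identity:", marked["172.25.0.2"]) // false
}
```

If this is roughly what happens, it would explain why Services whose Endpoints are node addresses break: once the address loses its host identity, traffic to those hostNetwork endpoints is no longer classified correctly.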
Automation closed the PR, but I'll reopen this until backports for v1.16 and v1.15 are done.
@npdgm could you try and reproduce this with latest build with the fix?
Digests at https://github.com/cilium/cilium/actions/runs/9938947454/job/27453160771
Instructions on how to install from a CI build: https://isovalent.com/blog/post/tutorial-tips-and-tricks-to-install-cilium/#h-the-ci-build-image-way