
Comments (5)

npdgm commented on July 20, 2024

Reproduction steps on the development environment:

  1. Spin up the kind cluster
    [ make kind-down ]
    make kind
    make kind-image
    make kind-install-cilium
  2. Check statedb for eth0 addresses
    for pod in $(kubectl -n kube-system get po -o NAME -l app.kubernetes.io/name=cilium-agent)
    do
       echo -e "\n# $pod"
       kubectl -n kube-system exec -c cilium-agent $pod -- cilium-dbg statedb node-addresses | grep eth0
    done
    Example:
    # pod/cilium-dkrc6
    172.25.0.3                  true       true      eth0
    fc00:c111::3                true       true      eth0
    
    # pod/cilium-pdjc7
    fc00:c111::2                true       true      eth0
    172.25.0.2                  true       true      eth0
    
  3. Add another address to the worker, choosing a different subnet because we want it to be ordered before the node address (172.25.0.1 is not available).
    ❯ docker exec kind-worker ip -br a s dev eth0
    eth0@if162       UP             172.25.0.2/16 fc00:c111::2/64 fe80::42:acff:fe19:2/64 
    
    ❯ docker exec kind-worker ip a a 172.16.0.123/16 dev eth0
    
    ❯ docker exec kind-worker ip -br a s dev eth0
    eth0@if162       UP             172.25.0.2/16 172.16.0.123/16 fc00:c111::2/64 fe80::42:acff:fe19:2/64 
    
  4. Check the statedb again
    Example:
    # pod/cilium-dkrc6
    172.25.0.3                  true       true      eth0
    fc00:c111::3                true       true      eth0
    
    # pod/cilium-pdjc7
    fc00:c111::2                true       true      eth0
    172.16.0.123                true       true      eth0
    
    172.25.0.2 is missing!
  5. Delete 172.16.0.123 and replace it with 172.25.0.123, in the same subnet as the original node IPs but ordered after the main node address.
    ❯ docker exec kind-worker ip a d 172.16.0.123/16 dev eth0
    ❯ docker exec kind-worker ip a a 172.25.0.123/16 dev eth0
    ❯ docker exec kind-worker ip -br a s dev eth0
    eth0@if162       UP             172.25.0.2/16 172.25.0.123/16 fc00:c111::2/64 fe80::42:acff:fe19:2/64 
    
  6. In this situation the statedb is correct
    # pod/cilium-dkrc6
    172.25.0.3                  true       true      eth0
    fc00:c111::3                true       true      eth0
    
    # pod/cilium-pdjc7
    fc00:c111::2                true       true      eth0
    172.25.0.2                  true       true      eth0
    172.25.0.123                false      false     eth0
    


npdgm commented on July 20, 2024

1.16.0-rc.0 is affected.
However, so far I could not reproduce the issue on 1.14.12, so it may have been introduced in the 1.15 branch.


joamaki commented on July 20, 2024

The issue here was that the set operations were performed on NodeAddress, which includes not only the IP address but also the "NodePort" flag. When a new address was added, the flag on an existing address changed, so that address was no longer considered part of the new set. The simplest fix for this is to swap the loops: first delete the existing addresses that are not part of the new set and then add the new ones in. I'll prepare a PR + backport to fix this.
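
To make the ordering concrete, here is a minimal Go sketch of the delete-then-add reconciliation described above. The NodeAddress struct and the map-backed table are simplified, hypothetical stand-ins, not Cilium's actual statedb types.

package nodeaddrs

import "net/netip"

// NodeAddress is a simplified stand-in for the statedb row: the address is
// the primary key, but the row also carries flags such as NodePort.
type NodeAddress struct {
	Addr     netip.Addr
	NodePort bool
	Primary  bool
	Device   string
}

// reconcile replaces the stored rows with the newly computed set. Deleting
// stale rows before inserting the new ones is the important part: with the
// old add-first ordering, a row whose NodePort flag changed was first
// rewritten and then deleted again, because the old (addr, old-flag) value
// is "not part of the new set" yet shares the same primary key (the addr).
func reconcile(table map[netip.Addr]NodeAddress, next []NodeAddress) {
	inNext := map[NodeAddress]bool{}
	for _, a := range next {
		inNext[a] = true
	}
	// 1. Delete existing rows that are no longer part of the new set.
	for _, old := range table {
		if !inNext[old] {
			delete(table, old.Addr)
		}
	}
	// 2. Insert the new rows (this also updates rows whose flags changed).
	for _, a := range next {
		table[a.Addr] = a
	}
}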

I'm still a bit puzzled by what impact this bug had. In v1.15 Cilium only supports one NodePort IP per device, and "enable-runtime-device-detection" is still disabled by default there (nothing happens when Table[NodeAddress] updates; only the very initial version matters). In v1.16, Cilium reacts to device/address changes by default and supports multiple NodePort IPs per device (with --nodeport-addresses), so this is definitely a release-blocking bug.
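
Continuing the hypothetical sketch above, the per-device difference could be pictured roughly like this; it is illustrative only, not Cilium's actual frontend selection logic.

// nodePortFrontends picks NodePort frontend IPs from the node-address rows.
// v1.15-like behaviour keeps only the first flagged address per device,
// while v1.16-like behaviour keeps every address flagged for NodePort.
func nodePortFrontends(addrs []NodeAddress, multiplePerDevice bool) []netip.Addr {
	var frontends []netip.Addr
	firstOnDevice := map[string]bool{}
	for _, a := range addrs {
		if !a.NodePort {
			continue // only rows flagged for NodePort are candidates
		}
		if !multiplePerDevice && firstOnDevice[a.Device] {
			continue // one NodePort IP per device, as in v1.15
		}
		firstOnDevice[a.Device] = true
		frontends = append(frontends, a.Addr)
	}
	return frontends
}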

A LoadBalancer service is essentially a NodePort + some static IPs. In your case was the VIP address specified as a LoadBalancer IP? If that was the case then my expectation would've been that the node addresses should not have mattered. (EDIT: hmm unless they clashed and you had --enable-runtime-device-detection enabled.. then the NodePort service might've overwritten the LoadBalancer service and it could've gotten removed afterwards...)

Could you dump the LoadBalancer service here? kubectl get svc/name-of-service -o yaml?
Also cilium-dbg service list would be useful (the relevant LoadBalancer and NodePort services). Were you accessing the service via the LoadBalancer "port" or via the "nodePort"?


npdgm commented on July 20, 2024

The simplest fix for this is to swap the loops: first delete the existing addresses that are not part of the new set and then add the new ones in. I'll prepare a PR + backport to fix this.

Great! Thanks a lot.

I'm still a bit puzzled by what impact this bug had.

Sorry, the report may have been confusing: it doesn't involve a LoadBalancer service in k8s. I'll explain.

Here's an example of the Helm values for an install where the issue was first discovered. With this same set of values, 1.14 is fine, but 1.15 and 1.16 exhibit the issue.
enableRuntimeDeviceDetection was never set and was left at its default, so yes, it was false with 1.15. But 1.16 with runtime detection had the exact same problem and behaviour regarding system VIPs.

autoDirectNodeRoutes: true
bgpControlPlane:
  enabled: true
bpf:
  masquerade: true
cgroup:
  autoMount:
    enabled: false
  hostRoot: /sys/fs/cgroup
endpointStatus:
  enabled: true
  status: policy health controllers log state
externalIPs:
  enabled: true
hubble:
  relay:
    enabled: true
  ui:
    enabled: true
ipam:
  mode: kubernetes
ipv4NativeRoutingCIDR: 10.244.0.0/16
k8sServiceHost: localhost
k8sServicePort: 7445
kubeProxyReplacement: true
loadBalancer:
  acceleration: native
  algorithm: maglev
  mode: hybrid
monitor:
  enabled: true
prometheus:
  enabled: true
  serviceMonitor:
    enabled: true
routingMode: native
securityContext:
  capabilities:
    ciliumAgent:
    - CHOWN
    - KILL
    - NET_ADMIN
    - NET_RAW
    - IPC_LOCK
    - SYS_ADMIN
    - SYS_RESOURCE
    - DAC_OVERRIDE
    - FOWNER
    - SETGID
    - SETUID
    cleanCiliumState:
    - NET_ADMIN
    - SYS_ADMIN
    - SYS_RESOURCE

A LoadBalancer service is essentially a NodePort + some static IPs. In your case was the VIP address specified as a LoadBalancer IP?

The VIP feature mentioned isn't managed by a k8s or LoadBalancer controller. An address is added to / removed from a node's main NIC by a system controller in Talos, just as ip addr add would do. It should be the same with keepalived, CARP, etc.
I reproduced it this way on the development stack in kind, with the default Cilium config and ip commands. The bug can already be observed in statedb and in various BPF maps, so there seems to be no need to dig into the Service bits.

An affected Service can be of the ClusterIP type; LoadBalancer is not required.
The problem actually happens when the selected Pods use hostNetwork, i.e. where the Endpoints are node addresses. Perhaps this is analogous to NodePorts in the datapath, I don't know.

The best example for this would be the default Kubernetes API service:

apiVersion: v1
kind: Service
metadata:
  labels:
    component: apiserver
    provider: kubernetes
  name: kubernetes
  namespace: default
spec:
  clusterIP: 10.16.0.1
  clusterIPs:
  - 10.16.0.1
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: 6443
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    endpointslice.kubernetes.io/skip-mirror: "true"
  name: kubernetes
  namespace: default
subsets:
- addresses:
  - ip: 10.0.50.11
  - ip: 10.0.50.12
  - ip: 10.0.50.13
  ports:
  - name: https
    port: 6443
    protocol: TCP
---
addressType: IPv4
apiVersion: discovery.k8s.io/v1
endpoints:
- addresses:
  - 10.0.50.11
  conditions:
    ready: true
- addresses:
  - 10.0.50.12
  conditions:
    ready: true
- addresses:
  - 10.0.50.13
  conditions:
    ready: true
kind: EndpointSlice
metadata:
  labels:
    kubernetes.io/service-name: kubernetes
  name: kubernetes
  namespace: default
ports:
- name: https
  port: 6443
  protocol: TCP

In these dumps, endpoints in 10.0.50.0/24 are node addresses, not from the PodCIDR.

The service is accessed with the Service port 443.


joamaki commented on July 20, 2024

Hmm, Table[NodeAddress] is only used for 1) picking the NodePort/HostPort frontend IPs and 2) picking the BPF masquerade IP address. It should have no impact on ClusterIP services. It might be that there's another, unrelated bug involved here. I'll finish up the NodeAddress fix and will then try to reproduce this issue on my local Talos cluster.

EDIT: I forgot host IP syncing. It periodically marks the node addresses with the host identity (daemon/cmd/hostips-sync.go). This could explain it.
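
For intuition, here is a rough Go sketch of what such a periodic sync could look like; the package, interface and function names are hypothetical and not the actual daemon/cmd/hostips-sync.go code.

package hostsync

import (
	"net/netip"
	"time"
)

// IPCache is a stand-in for the component that assigns identities to IPs.
type IPCache interface {
	UpsertHostIP(addr netip.Addr) // mark addr with the "host" identity
	DeleteHostIP(addr netip.Addr) // drop the mapping again
}

// syncHostIPs periodically mirrors the node addresses into the IP cache.
// If an address briefly disappears from the node-address table (as in the
// bug above), its host-identity mapping is removed here, which would break
// ClusterIP services whose endpoints are node IPs (e.g. kube-apiserver).
func syncHostIPs(cache IPCache, nodeAddrs func() []netip.Addr, interval time.Duration, stop <-chan struct{}) {
	known := map[netip.Addr]bool{}
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		current := map[netip.Addr]bool{}
		for _, addr := range nodeAddrs() {
			current[addr] = true
			cache.UpsertHostIP(addr)
		}
		// Unmark addresses that are no longer reported as node addresses.
		for addr := range known {
			if !current[addr] {
				cache.DeleteHostIP(addr)
			}
		}
		known = current
		select {
		case <-stop:
			return
		case <-ticker.C:
		}
	}
}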

Automation closed the PR, but I'll reopen this until backports for v1.16 and v1.15 are done.

@npdgm could you try to reproduce this with the latest build that includes the fix?

Digests at https://github.com/cilium/cilium/actions/runs/9938947454/job/27453160771
Instructions on how to install from a CI build: https://isovalent.com/blog/post/tutorial-tips-and-tricks-to-install-cilium/#h-the-ci-build-image-way

