I have a control plane with 3 nodes, and I want to replace 1 node (say delete machine 1


What is the correct approach for replacing a controlplane node about kube-vip HOT 8 CLOSED

kube-vip commented on May 25, 2024

What is the correct approach for replacing a controlplane node

from kube-vip.

Comments (8)

thebsdbox commented on May 25, 2024 1

I've recently rewritten kube-vip for precisely the reasons above. To provide more resiliency and less confusion, I first opted to have kube-vip use client-go to monitor what was happening within the cluster (watching for nodes with the control plane label, and then checking whether each was running the kube-vip pod). This unfortunately introduced more instability into the cluster, as changing the peers would usually result in the raft algorithm failing or leaving the cluster with no leader.

After this I decided to adopt a different method for managing HA within the cluster, this time using client-go and its leaderElection feature. I've kept the same UX, and so far this seems "rock solid" from a stability perspective.

Install

# First Node
sudo docker run --network host --rm plndr/kube-vip:0.1.6-election kubeadm init --interface ens192 --vip 192.168.0.81 --leaderElection | sudo tee /etc/kubernetes/manifests/vip.yaml

kubeadm init [....]

# Remaining Nodes

kubeadm join [...]
sudo docker run --network host --rm plndr/kube-vip:0.1.6-election kubeadm init --interface ens192 --vip 192.168.0.81 --leaderElection | sudo tee /etc/kubernetes/manifests/vip.yaml

Upgrades

From the above we have a 3-node cluster, and controlPlane01 is the leader:

$ kubectl logs -n kube-system kube-vip-controlplane01 -f
time="2020-07-04T15:12:52Z" level=info msg="Beginning cluster membership, namespace [kube-system], lock name [plunder-lock], id [controlPlane01]"
I0704 15:12:52.290420       1 leaderelection.go:242] attempting to acquire leader lease  kube-system/plunder-lock...
I0704 15:12:56.373113       1 leaderelection.go:252] successfully acquired lease kube-system/plunder-lock
time="2020-07-04T15:12:56Z" level=info msg="This node is assuming leadership of the cluster"
time="2020-07-04T15:12:56Z" level=error msg="This node is leader and is adopting the virtual IP"
time="2020-07-04T15:12:56Z" level=info msg="Starting TCP Load Balancer for service [192.168.0.81:0]"
time="2020-07-04T15:12:56Z" level=info msg="Load Balancer [Kubeadm Load Balancer] started"
time="2020-07-04T15:12:56Z" level=info msg="Broadcasting ARP update for 192.168.0.81 (00:50:56:a5:69:a1) via ens192"
time="2020-07-04T15:12:56Z" level=info msg="Starting TCP Load Balancer for service [192.168.0.81:0]"
time="2020-07-04T15:12:56Z" level=info msg="Load Balancer [Kubeadm Load Balancer] started"
time="2020-07-04T15:12:56Z" level=info msg="Broadcasting ARP update for 192.168.0.81 (00:50:56:a5:69:a1) via ens192"
time="2020-07-04T15:12:56Z" level=info msg="new leader elected: controlPlane01"

We will kill this node and watch kube-vip logs from another node:

Pinging VIP

64 bytes from 192.168.0.81: icmp_seq=667 ttl=64 time=0.387 ms
Request timeout for icmp_seq 668
Request timeout for icmp_seq 669
Request timeout for icmp_seq 670
Request timeout for icmp_seq 671
Request timeout for icmp_seq 672
64 bytes from 192.168.0.81: icmp_seq=673 ttl=64 time=0.453 ms
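The gap in the ping output above gives a rough measure of how long the VIP was unreachable during failover. A minimal sketch, assuming the ping output was saved to a file and that ping ran at its default one-second interval (the function name `count_lost_pings` and the file name `ping.log` are hypothetical, not part of kube-vip):

```shell
#!/usr/bin/env bash
# Count the "Request timeout" lines in ping output read from stdin.
# At ping's default 1-second interval, the count approximates the number
# of seconds the VIP did not answer.
count_lost_pings() {
  # grep -c prints 0 and exits non-zero when nothing matches; keep exit 0.
  grep -c 'Request timeout' || true
}

# Hypothetical usage:
#   ping 192.168.0.81 | tee ping.log     # stop with Ctrl-C after failover
#   count_lost_pings < ping.log
```

For the output shown above (icmp_seq 668 through 672 timed out), this prints 5, so the VIP moved in roughly five seconds.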

Logs

$ kubectl logs -n kube-system kube-vip-controlplane03 -f
time="2020-07-04T15:17:53Z" level=info msg="Beginning cluster membership, namespace [kube-system], lock name [plunder-lock], id [controlPlane03]"
I0704 15:17:53.484698       1 leaderelection.go:242] attempting to acquire leader lease  kube-system/plunder-lock...
time="2020-07-04T15:17:53Z" level=info msg="new leader elected: controlPlane01"
E0704 15:20:18.864141       1 leaderelection.go:331] error retrieving resource lock kube-system/plunder-lock: etcdserver: request timed out
time="2020-07-04T15:20:20Z" level=info msg="new leader elected: controlPlane02"
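To follow only the leadership changes in a log like the one above, the "new leader elected" messages can be filtered out. This is a small sketch that relies only on the message format visible in these logs; the function name `leader_changes` is hypothetical:

```shell
#!/usr/bin/env bash
# Print the node named in each 'new leader elected: <node>' log message,
# one per line, in the order the messages appear on stdin.
leader_changes() {
  sed -n 's/.*msg="new leader elected: \([^"]*\)".*/\1/p'
}

# Hypothetical usage:
#   kubectl logs -n kube-system kube-vip-controlplane03 | leader_changes
```

Run against the controlplane03 log above, this prints controlPlane01 followed by controlPlane02, which is the handover from the killed node to the new leader.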

Adding `controlPlane04`

A kubeadm join will fail because controlPlane01 still exists as an endpoint, so we have two options: manually remove every mention of this node (including a configmap edit), or bring the node back up and run kubeadm reset on it (which we will do).

$ kubectl get nodes
NAME             STATUS     ROLES    AGE   VERSION
controlplane01   NotReady   master   14m   v1.17.0
controlplane02   Ready      master   13m   v1.17.2
controlplane03   Ready      master   13m   v1.17.0
controlplane04   NotReady   master   9s    v1.17.0

After this we can add this node into kube-vip with the same manifest created by docker run.

cc/ @mauilion @yastij @fabriziopandini


thebsdbox commented on May 25, 2024 1

@schmitch A control plane kubeadm join is successful if an existing control plane is online AND healthchecks for etcd are fine. My point above is that when we remove a node, we also need to ensure it's removed as an etcd endpoint.


thebsdbox commented on May 25, 2024 1

I'll close this as this approach is now being used in CAPV 0.70


thebsdbox commented on May 25, 2024

I've just tested with the following, using cp2 (control plane node 2) as the one we remove.

k8s version 1.17.0 (I'll build a new cluster later)
vip 192.168.0.81
cp1 192.168.0.70
cp2 192.168.0.71
cp3 192.168.0.72

cp4 192.168.0.73

Simply doing a poweroff of cp2 leaves this node as part of the cluster, including in the etcd endpoint list. This causes kubeadm join on cp4 to fail, as it tests the endpoints.

If I start again (back to a working cluster) and ensure that cp2 is removed from both the cluster and the etcd endpoints (kubeadm reset / etcd remove phase etc.), then when I add cp4 to kube-vip it will only see the two remaining endpoints and will join the kube-vip cluster. It may take a little while for the raft algorithm to stabilise (the remaining leader of the vip will stay in charge, however).
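If the removed node cannot be brought back up to run kubeadm reset, its etcd membership has to be removed by hand. The following is a rough sketch of that manual route, not the exact procedure used in the test above. It assumes kubeadm's default stacked etcd layout (static pod named `etcd-<node>`, certificates under `/etc/kubernetes/pki/etcd`) and an etcdctl that speaks the v3 API; the helper name `etcd_member_id` is hypothetical.

```shell
#!/usr/bin/env bash
# Print the etcd member ID for a given member name.
# Reads `etcdctl member list` output in its default format on stdin, where
# each line is comma-separated: ID, status, name, peer URLs, client URLs, ...
etcd_member_id() {
  local name="$1"
  awk -F', ' -v n="$name" '$3 == n { print $1 }'
}

# Hypothetical usage from a workstation with kubectl access. The pod name,
# certificate paths and node names are assumptions based on kubeadm defaults;
# adjust them for your cluster before running anything.
#
#   ETCD="kubectl -n kube-system exec etcd-controlplane01 -- etcdctl \
#     --endpoints=https://127.0.0.1:2379 \
#     --cacert=/etc/kubernetes/pki/etcd/ca.crt \
#     --cert=/etc/kubernetes/pki/etcd/server.crt \
#     --key=/etc/kubernetes/pki/etcd/server.key"
#   id=$($ETCD member list | etcd_member_id controlplane02)
#   $ETCD member remove "$id"
#   kubectl delete node controlplane02
```

Looking the member up by name rather than copying the ID by hand reduces the risk of removing a healthy member, which would cost the remaining two-node etcd cluster its quorum.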


fabriziopandini commented on May 25, 2024

@yastij ^^
If I got this right, it seems that if node deletion is done properly, kube-vip removes deleted nodes from the VIP


thebsdbox commented on May 25, 2024

@fabriziopandini Is this OK to close?


fabriziopandini commented on May 25, 2024

@thebsdbox sorry for the late answer
If I got this right, this works for me. I leave the final word to @yastij, who is actively testing this in CAPV


schmitch commented on May 25, 2024

@thebsdbox shouldn't kubeadm join always be successful if a quorum of the control plane is online? It seems weird that this does not work.
