Comments (8)

thebsdbox commented on May 25, 2024

I've recently re-written kube-vip for precisely the reasons above. In order to provide more resiliency and less confusion, I first opted for the route of having kube-vip use client-go to monitor what was happening within the cluster (watching both for nodes with the control-plane label and for whether they were running the kube-vip pod). Unfortunately this introduced more instability into the cluster, as changing the peers would usually result in the raft algorithm failing or leaving the cluster with no leader.

After this I decided to try a different method for managing HA within the cluster, this time using client-go and its leaderElection feature. I've kept the same UX, and so far this seems "rock solid" from a stability perspective.
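
As a rough sketch (not kube-vip's actual source), the client-go leaderElection pattern described above looks roughly like the following. The lock name and namespace match the logs further down; the callbacks, file path and timings are illustrative placeholders, and a recent client-go with a Lease-based lock is assumed (the real implementation may use a different lock type).

package main

import (
	"context"
	"log"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/etc/kubernetes/admin.conf")
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	id, _ := os.Hostname()

	// The shared lock that every kube-vip instance competes for.
	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "plunder-lock", Namespace: "kube-system"},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: id},
	}

	leaderelection.RunOrDie(context.TODO(), leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// Only the elected leader configures the VIP and broadcasts ARP.
				log.Println("assuming leadership: bring up the VIP here")
			},
			OnStoppedLeading: func() {
				// Losing the lease means releasing the VIP so another node can claim it.
				log.Println("lost leadership: release the VIP here")
			},
			OnNewLeader: func(identity string) {
				log.Printf("new leader elected: %s", identity)
			},
		},
	})
}

The nice property of this pattern is that the election rides on the Kubernetes API (and therefore on etcd's own consensus) rather than on a separate raft ring inside kube-vip, which is what was causing the instability when peers changed.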

Install

# First Node
sudo docker run --network host --rm plndr/kube-vip:0.1.6-election kubeadm init --interface ens192 --vip 192.168.0.81 --leaderElection | sudo tee /etc/kubernetes/manifests/vip.yaml

kubeadm init [....]

# Remaining Nodes

kubeadm join [...]
sudo docker run --network host --rm plndr/kube-vip:0.1.6-election kubeadm init --interface ens192 --vip 192.168.0.81 --leaderElection | sudo tee /etc/kubernetes/manifests/vip.yaml

Upgrades

From the above we have a three-node cluster, and controlPlane01 is the leader:

$ kubectl logs -n kube-system kube-vip-controlplane01 -f
time="2020-07-04T15:12:52Z" level=info msg="Beginning cluster membership, namespace [kube-system], lock name [plunder-lock], id [controlPlane01]"
I0704 15:12:52.290420       1 leaderelection.go:242] attempting to acquire leader lease  kube-system/plunder-lock...
I0704 15:12:56.373113       1 leaderelection.go:252] successfully acquired lease kube-system/plunder-lock
time="2020-07-04T15:12:56Z" level=info msg="This node is assuming leadership of the cluster"
time="2020-07-04T15:12:56Z" level=error msg="This node is leader and is adopting the virtual IP"
time="2020-07-04T15:12:56Z" level=info msg="Starting TCP Load Balancer for service [192.168.0.81:0]"
time="2020-07-04T15:12:56Z" level=info msg="Load Balancer [Kubeadm Load Balancer] started"
time="2020-07-04T15:12:56Z" level=info msg="Broadcasting ARP update for 192.168.0.81 (00:50:56:a5:69:a1) via ens192"
time="2020-07-04T15:12:56Z" level=info msg="Starting TCP Load Balancer for service [192.168.0.81:0]"
time="2020-07-04T15:12:56Z" level=info msg="Load Balancer [Kubeadm Load Balancer] started"
time="2020-07-04T15:12:56Z" level=info msg="Broadcasting ARP update for 192.168.0.81 (00:50:56:a5:69:a1) via ens192"
time="2020-07-04T15:12:56Z" level=info msg="new leader elected: controlPlane01"

We will kill this node and watch kube-vip logs from another node:

Pinging VIP

64 bytes from 192.168.0.81: icmp_seq=667 ttl=64 time=0.387 ms
Request timeout for icmp_seq 668
Request timeout for icmp_seq 669
Request timeout for icmp_seq 670
Request timeout for icmp_seq 671
Request timeout for icmp_seq 672
64 bytes from 192.168.0.81: icmp_seq=673 ttl=64 time=0.453 ms

Logs

$ kubectl logs -n kube-system kube-vip-controlplane03 -f
time="2020-07-04T15:17:53Z" level=info msg="Beginning cluster membership, namespace [kube-system], lock name [plunder-lock], id [controlPlane03]"
I0704 15:17:53.484698       1 leaderelection.go:242] attempting to acquire leader lease  kube-system/plunder-lock...
time="2020-07-04T15:17:53Z" level=info msg="new leader elected: controlPlane01"
E0704 15:20:18.864141       1 leaderelection.go:331] error retrieving resource lock kube-system/plunder-lock: etcdserver: request timed out
time="2020-07-04T15:20:20Z" level=info msg="new leader elected: controlPlane02"

Adding controlPlane04

A kubeadm join will fail because controlPlane01 still exists as an endpoint, so we have two options: either perform the manual steps (editing the configmap to remove all mention of this node), or bring the node back up and kubeadm reset it (which is what we will do).

$ kubectl get nodes
NAME             STATUS     ROLES    AGE   VERSION
controlplane01   NotReady   master   14m   v1.17.0
controlplane02   Ready      master   13m   v1.17.2
controlplane03   Ready      master   13m   v1.17.0
controlplane04   NotReady   master   9s    v1.17.0

After this we can add the node into kube-vip with the same manifest created by the docker run above.

cc/ @mauilion @yastij @fabriziopandini

thebsdbox commented on May 25, 2024

@schmitch A control-plane kubeadm join succeeds if an existing control plane is online AND the etcd health checks pass. My point above is that when we remove a node, we also need to ensure it is removed as an etcd endpoint.

thebsdbox commented on May 25, 2024

I'll close this, as this approach is now being used in CAPV 0.7.0.

thebsdbox commented on May 25, 2024

I've just tested with the following, using cp2 (control plane node 2) as the one we remove.

k8s version 1.17.0 (I'll build a new cluster later)
vip 192.168.0.81
cp1 192.168.0.70
cp2 192.168.0.71
cp3 192.168.0.72

cp4 192.168.0.73

Simply powering off cp2 leaves this node as part of the cluster, including as part of the etcd endpoint list. This causes kubeadm join on cp4 to fail, because it tests the endpoints.

If I start again (back to a working cluster) and ensure that cp2 is removed from both the cluster and the etcd endpoints (kubeadm reset, the etcd removal phase, etc.), then when I add cp4 into kube-vip it will only see the two remaining endpoints and will join the kube-vip cluster. It may take a little while for the raft algorithm to stabilise (the remaining VIP leader will stay in charge, however).
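
For completeness, a hedged sketch of doing that cleanup programmatically rather than via kubeadm: delete the dead cp2 Node object and remove it from the etcd member list so the endpoint checks on the joining node pass. The node name, endpoints and certificate handling are assumptions for a stacked-etcd kubeadm cluster with recent client libraries, not the exact steps used above.

package main

import (
	"context"
	"log"
	"time"

	"go.etcd.io/etcd/clientv3"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Delete the Kubernetes Node object for the powered-off cp2.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/etc/kubernetes/admin.conf")
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	if err := client.CoreV1().Nodes().Delete(context.TODO(), "cp2", metav1.DeleteOptions{}); err != nil {
		log.Fatal(err)
	}

	// Remove the matching etcd member via a surviving endpoint (TLS config omitted
	// for brevity; a kubeadm cluster needs the certs from /etc/kubernetes/pki/etcd).
	etcd, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"https://192.168.0.70:2379"}, // cp1
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer etcd.Close()

	members, err := etcd.MemberList(context.TODO())
	if err != nil {
		log.Fatal(err)
	}
	for _, m := range members.Members {
		if m.Name == "cp2" {
			if _, err := etcd.MemberRemove(context.TODO(), m.ID); err != nil {
				log.Fatal(err)
			}
		}
	}
}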

fabriziopandini commented on May 25, 2024

@yastij ^^
If I got this right, it seems that if node deletion is done properly, kube-vip removes deleted nodes from the VIP.

thebsdbox commented on May 25, 2024

@fabriziopandini Is this OK to close?

fabriziopandini commented on May 25, 2024

@thebsdbox sorry for the late answer.
If I got this right, this works for me. I leave the final word to @yastij, who is actively testing this in CAPV.

schmitch commented on May 25, 2024

@thebsdbox shouldn't kubeadm join always be successful if a quorum of the control plane is online? It seems weird that this does not work.
