kube-vip / kube-vip
Kubernetes Control Plane Virtual IP and Load-Balancer
Home Page: https://kube-vip.io
License: Apache License 2.0
In https://github.com/plunder-app/kube-vip/blob/master/kubernetes-control-plane.md the kube-vip and kube-apiserver port assignments are inverted between the kube-vip config and the kubeadm init parameters, i.e. 6443 and 6444 are swapped.
Instead of populating the configmap with the vip/protocol/port etc., pull the information directly from the Service referenced in the configmap.
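A rough sketch of what this could look like with a recent client-go; the function, namespace and service name are illustrative, not the actual kube-vip code:

// Sketch only: derive the load-balancer ports from the Service referenced in
// the configmap instead of duplicating vip/protocol/port in the configmap itself.
import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

func portsFromService(ctx context.Context, c kubernetes.Interface, namespace, name string) ([]int32, error) {
    svc, err := c.CoreV1().Services(namespace).Get(ctx, name, metav1.GetOptions{})
    if err != nil {
        return nil, err
    }
    var ports []int32
    for _, p := range svc.Spec.Ports {
        ports = append(ports, p.Port)
    }
    return ports, nil
}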
I found in kube-vip.go a default interface of ens192, as opposed to eth0 in other configs. Not sure if this is left over from different environment testing.
Currently, if the plndr-cloud-provider pod is deleted (evicted, or removed accidentally) it doesn't get re-created.
The simplest solution is to move to a StatefulSet with replicas: 1.
Justification for a StatefulSet is from tektoncd/pipeline#2630:
Singleton workloads can be implemented in multiple ways, and they differ in behavior when the Node becomes unreachable:
- as a Pod - the Pod is not managed, so it will not be recreated.
- as a Deployment - the Pod will be recreated and puts Availability before the singleton property
- as a StatefulSet - the Pod will be recreated but puts the singleton property before Availability
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: plndr-cloud-provider
spec:
  podManagementPolicy: OrderedReady
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: kube-vip
      component: plndr-cloud-provider
  template:
    metadata:
      labels:
        app: kube-vip
        component: plndr-cloud-provider
    spec:
      containers:
      - command:
        - /plndr-cloud-provider
        image: plndr/plndr-cloud-provider:0.1.3
        name: plndr-cloud-provider
        imagePullPolicy: Always
        resources: {}
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      serviceAccountName: plunder-cloud-controller
The binary's help output defaults to true for the --arp flag:
$ sudo ctr run --rm --net-host docker.io/plndr/kube-vip:0.1.5 init /kube-vip kubeadm init -h
The "init" subcommand will generate the Kubernetes manifest that will be started by kubeadm through the kubeadm init process
Usage:
kube-vip kubeadm init [flags]
Flags:
-h, --help help for init
Global Flags:
--addPeersToLB The Virtual IP addres (default true)
--arp Enable Arp for Vip changes (default true)
--interface string Name of the interface to bind to
--lbBackEndPort int A port that all backends may be using (optional) (default 6444)
--lbBindToVip Bind example load balancer to VIP (default true)
--lbName string The name of a load balancer instance (default "Kubeadm Load Balancer")
--lbPort int Port that load balander will expose on (default 6443)
--lbType string Type of load balancer instance (tcp/http) (default "tcp")
--log uint32 Set the level of logging (default 4)
--startAsLeader Start this instance as the cluster leader
--vip string The Virtual IP addres
However, the config implementation in the code defaults to false.
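For illustration only (not the actual kube-vip source), a hedged sketch of keeping the cobra flag default and the config default in agreement; rootCmd and the variable name are assumptions:

// The default shown in the help output is whatever is passed to BoolVar,
// so it must match the default used by the configuration code.
var gratuitousARP bool

func init() {
    // rootCmd is a hypothetical root cobra command.
    rootCmd.PersistentFlags().BoolVar(&gratuitousARP, "arp", true, "Enable Arp for Vip changes")
}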
We use outsidecluster as a flag to indicate we are outside the cluster.
Is there any reason we cannot do it the way kubectl does (or used to, last time I looked at its code), or cloud-provider? I think it basically does the following:
check the KUBECONFIG env var or the --kubeconfig command-line option; if found, use that.
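A minimal sketch of that resolution order using client-go; the precedence and the fallback to in-cluster config are assumptions about how it could work, not how kube-vip currently behaves:

import (
    "os"

    "k8s.io/client-go/rest"
    "k8s.io/client-go/tools/clientcmd"
)

// resolveConfig prefers an explicit --kubeconfig flag value, then the
// KUBECONFIG environment variable, and finally in-cluster configuration.
func resolveConfig(kubeconfigFlag string) (*rest.Config, error) {
    path := kubeconfigFlag
    if path == "" {
        path = os.Getenv("KUBECONFIG")
    }
    if path != "" {
        return clientcmd.BuildConfigFromFlags("", path)
    }
    return rest.InClusterConfig()
}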
Kube-vip listens on port 6443 and there is no way to configure it.
Currently being investigated... shouldn't be too much work (probably)
Add a prometheus exporter! Example outputs:
Split the readme.md content into a /docs/ folder. cc @nickperry
I am able to set up my cluster using the commands:
docker run --network host --rm plndr/kube-vip:0.1.9 kubeadm init \
  --interface ens192 \
  --vip 192.168.254.80 \
  --arp \
  --leaderElection | sudo tee /etc/kubernetes/manifests/vip.yaml
kubeadm init --control-plane-endpoint 192.168.254.80 --upload-certs
And I am able to ping 192.168.254.80 from other nodes for 30-45 seconds, but then I start seeing:
From 192.168.254.81 icmp_seq=565 Redirect Host(New nexthop: 192.168.254.80)
At which point the VIP address seems no longer accessible and kubectl stops working.
It is technically possible to expose the port mapping to a commercial router, so we could:
kubectl expose deployment nginx-deployment --port=18080 --type=LoadBalancer --name=nginx-loadbalancer
The --port=xxx value will need to be considered for the uPNP endpoint (router).
Errors will be hard to handle, as this isn't something Kubernetes load-balancers are designed to handle; errors would surface in the kube-vip pod in the namespace the load-balancer is created in. This is where an end-user would find that the exposed port hasn't made it to the router (such as an already existing port-mapping).
cc / @displague
Using a TUN/TAP device will make managing floating IP addresses completely safe, and will ensure that no duplicate addresses remain on hosts.
https://github.com/songgao/water - should provide the required functionality.
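A small sketch of creating a TUN device with that library; attaching the floating address to the device afterwards is assumed and not shown:

import (
    "log"

    "github.com/songgao/water"
)

func main() {
    // Create a TUN interface; the floating IP would be attached to this
    // device instead of directly to the host's physical NIC.
    ifce, err := water.New(water.Config{DeviceType: water.TUN})
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("created interface %s", ifce.Name())
}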
Creating an HA k8s setup using Vagrant (RHEL7) and kube-vip produces an initialisation error.
Setup Architecture
VIP: 192.100.100.10
master1: 192.100.100.11
master2: 192.100.100.12
master3: 192.100.100.13
Configuration steps:
AddPeersAsBackends: false
BGPConfig:
  AS: 0
  IPv6: false
  NextHop: ""
  Peers: null
  RouterID: ""
  SourceIF: ""
  SourceIP: ""
BGPPeerConfig:
  AS: 0
  Address: ""
EnableBGP: false
EnableLeaderElection: true
EnableLoadBalancer: true
EnablePacket: false
GratuitousARP: true
Interface: eth1
LeaseDuration: 0
LoadBalancers:
- BackendPort: 6443
  Backends:
  - Address: 192.100.100.11
    ParsedURL: null
    Port: 6443
    RawURL: ""
  - Address: 192.100.100.12
    ParsedURL: null
    Port: 6443
    RawURL: ""
  - Address: 192.100.100.13
    ParsedURL: null
    Port: 6443
    RawURL: ""
  BindToVip: true
  Name: Kubernetes COntrol Plane
  Port: 7443
  Type: tcp
LocalPeer:
  Address: 192.100.100.11
  ID: server1
  Port: 10000
PacketAPIKey: ""
PacketProject: ""
RemotePeers:
- Address: 192.100.100.12
  ID: server2
  Port: 10000
- Address: 192.100.100.13
  ID: server3
  Port: 10000
RenewDeadline: 0
RetryPeriod: 0
SingleNode: false
StartAsLeader: true
VIP: 192.100.100.10
VIPCIDR: ""
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  name: kube-vip
  namespace: kube-system
spec:
  containers:
  - args:
    - start
    env:
    - name: vip_arp
      value: "true"
    - name: vip_interface
      value: eth1
    - name: vip_address
      value: 192.100.100.10
    - name: vip_leaderelection
      value: "true"
    - name: vip_leaseduration
      value: "5"
    - name: vip_renewdeadline
      value: "3"
    - name: vip_retryperiod
      value: "1"
    - name: lb_enable
      value: "true"
    - name: lb_port
      value: "7443"
    - name: lb_backendport
      value: "6443"
    - name: lb_name
      value: Kubeadm Load Balancer7443
    - name: lb_type
      value: tcp
    - name: lb_bindtovip
      value: "true"
    - name: log
      value: "5"
    image: plndr/kube-vip:0.1.9
    imagePullPolicy: IfNotPresent
    name: kube-vip
    resources: {}
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
        - SYS_TIME
    volumeMounts:
    - mountPath: /etc/kubernetes/admin.conf
      name: kubeconfig
    - mountPath: /etc/ssl/certs
      name: ca-certs
      readOnly: true
  hostNetwork: true
  volumes:
  - hostPath:
      path: /etc/kubernetes/admin.conf
    name: kubeconfig
  - hostPath:
      path: /etc/ssl/certs
    name: ca-certs
status: {}
sudo kubeadm init --control-plane-endpoint=192.100.100.10:7443 --apiserver-advertise-address=192.100.100.11 --upload-certs --kubernetes-version=1.17.5 -v=5
3.1 Error after initialisation is described below.
I0925 09:02:34.533830 14257 initconfiguration.go:103] detected and using CRI socket: /var/run/dockershim.sock
W0925 09:02:34.534061 14257 validation.go:28] Cannot validate kube-proxy config - no validator is available
W0925 09:02:34.534073 14257 validation.go:28] Cannot validate kubelet config - no validator is available
[init] Using Kubernetes version: v1.17.5
[preflight] Running pre-flight checks
I0925 09:02:34.534335 14257 checks.go:577] validating Kubernetes and kubeadm version
I0925 09:02:34.534361 14257 checks.go:166] validating if the firewall is enabled and active
I0925 09:02:34.548787 14257 checks.go:201] validating availability of port 6443
I0925 09:02:34.549322 14257 checks.go:201] validating availability of port 10259
I0925 09:02:34.549373 14257 checks.go:201] validating availability of port 10257
I0925 09:02:34.549425 14257 checks.go:286] validating the existence of file /etc/kubernetes/manifests/kube-apiserver.yaml
I0925 09:02:34.549446 14257 checks.go:286] validating the existence of file /etc/kubernetes/manifests/kube-controller-manager.yaml
I0925 09:02:34.549458 14257 checks.go:286] validating the existence of file /etc/kubernetes/manifests/kube-scheduler.yaml
I0925 09:02:34.549469 14257 checks.go:286] validating the existence of file /etc/kubernetes/manifests/etcd.yaml
I0925 09:02:34.549489 14257 checks.go:432] validating if the connectivity type is via proxy or direct
I0925 09:02:34.549543 14257 checks.go:471] validating http connectivity to first IP address in the CIDR
I0925 09:02:34.549562 14257 checks.go:471] validating http connectivity to first IP address in the CIDR
I0925 09:02:34.549570 14257 checks.go:102] validating the container runtime
I0925 09:02:34.712920 14257 checks.go:128] validating if the service is enabled and active
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
I0925 09:02:34.912195 14257 checks.go:335] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I0925 09:02:34.912264 14257 checks.go:335] validating the contents of file /proc/sys/net/ipv4/ip_forward
I0925 09:02:34.912303 14257 checks.go:649] validating whether swap is enabled or not
I0925 09:02:34.912342 14257 checks.go:376] validating the presence of executable ip
I0925 09:02:34.912372 14257 checks.go:376] validating the presence of executable iptables
I0925 09:02:34.912389 14257 checks.go:376] validating the presence of executable mount
I0925 09:02:34.912493 14257 checks.go:376] validating the presence of executable nsenter
I0925 09:02:34.912516 14257 checks.go:376] validating the presence of executable ebtables
I0925 09:02:34.912530 14257 checks.go:376] validating the presence of executable ethtool
I0925 09:02:34.912543 14257 checks.go:376] validating the presence of executable socat
I0925 09:02:34.912561 14257 checks.go:376] validating the presence of executable tc
I0925 09:02:34.912588 14257 checks.go:376] validating the presence of executable touch
I0925 09:02:34.912608 14257 checks.go:520] running all checks
I0925 09:02:35.097367 14257 checks.go:406] checking whether the given node name is reachable using net.LookupHost
I0925 09:02:35.097829 14257 checks.go:618] validating kubelet version
I0925 09:02:35.205698 14257 checks.go:128] validating if the service is enabled and active
I0925 09:02:35.221721 14257 checks.go:201] validating availability of port 10250
I0925 09:02:35.221846 14257 checks.go:201] validating availability of port 2379
I0925 09:02:35.221922 14257 checks.go:201] validating availability of port 2380
I0925 09:02:35.222000 14257 checks.go:249] validating the existence and emptiness of directory /var/lib/etcd
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
I0925 09:02:35.314982 14257 checks.go:838] image exists: k8s.gcr.io/kube-apiserver:v1.17.5
I0925 09:02:35.398945 14257 checks.go:838] image exists: k8s.gcr.io/kube-controller-manager:v1.17.5
I0925 09:02:35.476542 14257 checks.go:838] image exists: k8s.gcr.io/kube-scheduler:v1.17.5
I0925 09:02:35.552533 14257 checks.go:838] image exists: k8s.gcr.io/kube-proxy:v1.17.5
I0925 09:02:35.631176 14257 checks.go:838] image exists: k8s.gcr.io/pause:3.1
I0925 09:02:35.705236 14257 checks.go:838] image exists: k8s.gcr.io/etcd:3.4.3-0
I0925 09:02:35.785750 14257 checks.go:838] image exists: k8s.gcr.io/coredns:1.6.5
I0925 09:02:35.785794 14257 kubelet.go:63] Stopping the kubelet
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[certs] Using certificateDir folder "/etc/kubernetes/pki"
I0925 09:02:36.101868 14257 certs.go:104] creating a new certificate authority for ca
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [master1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.100.100.11 192.100.100.10]
[certs] Generating "apiserver-kubelet-client" certificate and key
I0925 09:02:37.337996 14257 certs.go:104] creating a new certificate authority for front-proxy-ca
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
I0925 09:02:37.818897 14257 certs.go:104] creating a new certificate authority for etcd-ca
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [master1 localhost] and IPs [192.100.100.11 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [master1 localhost] and IPs [192.100.100.11 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
I0925 09:02:39.408826 14257 certs.go:70] creating a new public/private key files for signing service account users
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
I0925 09:02:39.607080 14257 kubeconfig.go:79] creating kubeconfig file for admin.conf
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
I0925 09:02:40.056720 14257 kubeconfig.go:79] creating kubeconfig file for kubelet.conf
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "kubelet.conf" kubeconfig file
I0925 09:02:40.280354 14257 kubeconfig.go:79] creating kubeconfig file for controller-manager.conf
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
I0925 09:02:40.557449 14257 kubeconfig.go:79] creating kubeconfig file for scheduler.conf
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
I0925 09:02:40.876818 14257 manifests.go:90] [control-plane] getting StaticPodSpecs
I0925 09:02:40.884142 14257 manifests.go:115] [control-plane] wrote static Pod manifest for component "kube-apiserver" to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
I0925 09:02:40.884218 14257 manifests.go:90] [control-plane] getting StaticPodSpecs
W0925 09:02:40.884290 14257 manifests.go:214] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
I0925 09:02:40.885287 14257 manifests.go:115] [control-plane] wrote static Pod manifest for component "kube-controller-manager" to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[control-plane] Creating static Pod manifest for "kube-scheduler"
I0925 09:02:40.885307 14257 manifests.go:90] [control-plane] getting StaticPodSpecs
W0925 09:02:40.885358 14257 manifests.go:214] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
I0925 09:02:40.886859 14257 manifests.go:115] [control-plane] wrote static Pod manifest for component "kube-scheduler" to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
I0925 09:02:40.888152 14257 local.go:69] [etcd] wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/manifests/etcd.yaml"
I0925 09:02:40.888188 14257 waitcontrolplane.go:80] [wait-control-plane] Waiting for the API server to be healthy
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
couldn't initialize a Kubernetes cluster
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/init.runWaitControlPlanePhase
/workspace/anago-v1.17.5-beta.0.65+a04e7d7c202142/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/init/waitcontrolplane.go:100
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
/workspace/anago-v1.17.5-beta.0.65+a04e7d7c202142/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:234
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
/workspace/anago-v1.17.5-beta.0.65+a04e7d7c202142/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:422
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
/workspace/anago-v1.17.5-beta.0.65+a04e7d7c202142/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.NewCmdInit.func1
/workspace/anago-v1.17.5-beta.0.65+a04e7d7c202142/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:147
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
/workspace/anago-v1.17.5-beta.0.65+a04e7d7c202142/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:826
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
/workspace/anago-v1.17.5-beta.0.65+a04e7d7c202142/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:914
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
/workspace/anago-v1.17.5-beta.0.65+a04e7d7c202142/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:864
k8s.io/kubernetes/cmd/kubeadm/app.Run
/workspace/anago-v1.17.5-beta.0.65+a04e7d7c202142/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
/usr/local/go/src/runtime/proc.go:203
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1357
error execution phase wait-control-plane
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
/workspace/anago-v1.17.5-beta.0.65+a04e7d7c202142/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:235
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
/workspace/anago-v1.17.5-beta.0.65+a04e7d7c202142/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:422
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
/workspace/anago-v1.17.5-beta.0.65+a04e7d7c202142/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.NewCmdInit.func1
/workspace/anago-v1.17.5-beta.0.65+a04e7d7c202142/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:147
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
/workspace/anago-v1.17.5-beta.0.65+a04e7d7c202142/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:826
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
/workspace/anago-v1.17.5-beta.0.65+a04e7d7c202142/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:914
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
/workspace/anago-v1.17.5-beta.0.65+a04e7d7c202142/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:864
k8s.io/kubernetes/cmd/kubeadm/app.Run
/workspace/anago-v1.17.5-beta.0.65+a04e7d7c202142/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
/usr/local/go/src/runtime/proc.go:203
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1357
3.2 From the host, when I ping the VIP while initializing the cluster:
From 192.100.100.1 icmp_seq=39 Destination Host Unreachable
From 192.100.100.1 icmp_seq=40 Destination Host Unreachable
From 192.100.100.1 icmp_seq=41 Destination Host Unreachable
From 192.100.100.1 icmp_seq=42 Destination Host Unreachable
From 192.100.100.1 icmp_seq=43 Destination Host Unreachable
From 192.100.100.1 icmp_seq=44 Destination Host Unreachable
From 192.100.100.1 icmp_seq=45 Destination Host Unreachable
From 192.100.100.1 icmp_seq=46 Destination Host Unreachable
From 192.100.100.1 icmp_seq=47 Destination Host Unreachable
64 bytes from 192.100.100.10: icmp_seq=48 ttl=64 time=2554 ms
64 bytes from 192.100.100.10: icmp_seq=49 ttl=64 time=1526 ms
64 bytes from 192.100.100.10: icmp_seq=50 ttl=64 time=506 ms
64 bytes from 192.100.100.10: icmp_seq=51 ttl=64 time=0.744 ms
64 bytes from 192.100.100.10: icmp_seq=52 ttl=64 time=0.565 ms
64 bytes from 192.100.100.10: icmp_seq=53 ttl=64 time=2.05 ms
64 bytes from 192.100.100.10: icmp_seq=54 ttl=64 time=0.653 ms
64 bytes from 192.100.100.10: icmp_seq=55 ttl=64 time=0.949 ms
64 bytes from 192.100.100.10: icmp_seq=56 ttl=64 time=0.600 ms
64 bytes from 192.100.100.10: icmp_seq=57 ttl=64 time=0.590 ms
64 bytes from 192.100.100.10: icmp_seq=58 ttl=64 time=0.600 ms
64 bytes from 192.100.100.10: icmp_seq=59 ttl=64 time=1.53 ms
64 bytes from 192.100.100.10: icmp_seq=60 ttl=64 time=0.736 ms
64 bytes from 192.100.100.10: icmp_seq=61 ttl=64 time=0.665 ms
64 bytes from 192.100.100.10: icmp_seq=62 ttl=64 time=0.933 ms
3.3 Logs of the kube-vip docker container
time="2020-09-25T09:03:11Z" level=info msg="Beginning cluster membership, namespace [kube-system], lock name [plunder-lock], id [master1]"
I0925 09:03:11.136194 1 leaderelection.go:242] attempting to acquire leader lease kube-system/plunder-lock...
I0925 09:03:19.347336 1 leaderelection.go:252] successfully acquired lease kube-system/plunder-lock
time="2020-09-25T09:03:19Z" level=info msg="This node is starting with leadership of the cluster"
time="2020-09-25T09:03:19Z" level=info msg="Starting TCP Load Balancer for service [192.100.100.10:3875]"
time="2020-09-25T09:03:19Z" level=info msg="Load Balancer [Kubeadm Load Balancer7443] started"
time="2020-09-25T09:03:19Z" level=info msg="Broadcasting ARP update for 192.100.100.10 (08:00:27:45:06:38) via eth1"
time="2020-09-25T09:03:19Z" level=info msg="Node [master1] is assuming leadership of the cluster"
time="2020-09-25T09:03:22Z" level=info msg="Broadcasting ARP update for 192.100.100.10 (08:00:27:45:06:38) via eth1"
time="2020-09-25T09:03:25Z" level=info msg="Broadcasting ARP update for 192.100.100.10 (08:00:27:45:06:38) via eth1"
time="2020-09-25T09:03:28Z" level=info msg="Broadcasting ARP update for 192.100.100.10 (08:00:27:45:06:38) via eth1"
time="2020-09-25T09:03:31Z" level=info msg="Broadcasting ARP update for 192.100.100.10 (08:00:27:45:06:38) via eth1"
3.4 Logs of the kube-controller-manager docker container
I0925 09:03:14.733520 1 serving.go:312] Generated self-signed cert in-memory
I0925 09:03:16.187931 1 controllermanager.go:161] Version: v1.17.5
I0925 09:03:16.189522 1 secure_serving.go:178] Serving securely on 127.0.0.1:10257
I0925 09:03:16.190090 1 deprecated_insecure_serving.go:53] Serving insecurely on [::]:10252
I0925 09:03:16.190205 1 leaderelection.go:242] attempting to acquire leader lease kube-system/kube-controller-manager...
I0925 09:03:16.190887 1 dynamic_cafile_content.go:166] Starting request-header::/etc/kubernetes/pki/front-proxy-ca.crt
I0925 09:03:16.190931 1 tlsconfig.go:219] Starting DynamicServingCertificateController
I0925 09:03:16.191094 1 dynamic_cafile_content.go:166] Starting client-ca-bundle::/etc/kubernetes/pki/ca.crt
E0925 09:03:17.231840 1 leaderelection.go:331] error retrieving resource lock kube-system/kube-controller-manager: Get https://192.100.100.10:7443/api/v1/namespaces/kube-system/endpoints/kube-controller-manager?timeout=10s: dial tcp 192.100.100.10:7443: connect: no route to host
E0925 09:03:21.241244 1 leaderelection.go:331] error retrieving resource lock kube-system/kube-controller-manager: Get https://192.100.100.10:7443/api/v1/namespaces/kube-system/endpoints/kube-controller-manager?timeout=10s: dial tcp 192.100.100.10:7443: connect: connection refused
E0925 09:03:24.056955 1 leaderelection.go:331] error retrieving resource lock kube-system/kube-controller-manager: Get https://192.100.100.10:7443/api/v1/namespaces/kube-system/endpoints/kube-controller-manager?timeout=10s: dial tcp 192.100.100.10:7443: connect: connection refused
E0925 09:03:28.029424 1 leaderelection.go:331] error retrieving resource lock kube-system/kube-controller-manager: Get https://192.100.100.10:7443/api/v1/namespaces/kube-system/endpoints/kube-controller-manager?timeout=10s: dial tcp 192.100.100.10:7443: connect: connection refused
Please let me know if there is any other information I can provide on this.
For example, if we use MetalLB to create and advertise the LB IP for api server, kubelet cannot talk to the control plane until MetalLB has started and configured the LB IP. But MetalLB cannot start until kubelet can talk to the control plane and discover that it should be running the pod.
This issue is not specifically about MetalLB, I'm just wondering if kube-vip has the same issue. Could somebody please enlighten me as to whether I could use kube-vip on bare-metal k8s to advertise the kube api server, so that I could use https://$LB_IP:6443 instead of https://$A_NODE_IP:6443 in my kubeconfig file?
Related open issues:
I rebooted my server and when everything came back up I noticed I have two LB services using the same IP 10.123.204.1
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kafka-schema-registry-internal-loadbalancer LoadBalancer 10.233.35.155 10.123.204.1 8081:30313/TCP 45h
webapp-internal-loadbalancer LoadBalancer 10.233.38.170 10.123.204.2 8081:30785/TCP 45h
xtradb-internal-loadbalancer LoadBalancer 10.233.38.114 10.123.204.1 3306:32431/TCP 45h
Also, is it much work to re-use the existing IP address if one is already assigned? xtradb-internal-loadbalancer used to have 10.123.204.3 or .4 before the reboot.
There is a new sub-command to kube-vip which allows tighter integration with kubeadm:
kube-vip kubeadm init --<flags>
kube-vip kubeadm join
The init will generate a manifest (no configuration file) with everything required for the first node. Currently some of the configuration has been reverted due to breaking changes in #39
Manifests require:
At the moment, with the change to leaderElection, we're mainly focussed on just providing VIP services. It would be good to investigate if we can regularly grab all controlPlane members and use them as the backend array for load-balancing, as sketched below.
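One hedged way to do this with client-go would be to read the endpoints of the default kubernetes Service, which track the active api-servers; the function and its usage are illustrative, not the current implementation:

// controlPlaneBackends returns host:port pairs for every active api-server,
// which could be refreshed periodically and used as the LB backend array.
import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

func controlPlaneBackends(ctx context.Context, c kubernetes.Interface) ([]string, error) {
    ep, err := c.CoreV1().Endpoints("default").Get(ctx, "kubernetes", metav1.GetOptions{})
    if err != nil {
        return nil, err
    }
    var backends []string
    for _, subset := range ep.Subsets {
        for _, addr := range subset.Addresses {
            for _, port := range subset.Ports {
                backends = append(backends, fmt.Sprintf("%s:%d", addr.IP, port.Port))
            }
        }
    }
    return backends, nil
}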
address a few config typos
Create local directory for DB storage
mkdir mysql
Start Docker MySQL container
sudo docker run --cap-add SYS_NICE -p 3306:3306 --name k3s-mysql -v /home/dan/mysql:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=k3s-password -d mysql:8
Set VIP
export VIP=192.168.0.40
Create manifest directory (before starting k3s)
sudo mkdir -p /var/lib/rancher/k3s/server/manifests/
Create kube-vip manifest (needs turning into daemonset)
sudo docker run --network host --rm plndr/kube-vip:0.1.8 kubeadm init --interface ens160 --vip $VIP --arp --leaderElection | sed 's/name: kube-vip/name: kube-vip-'`hostname`'/g' | sed 's/path: \/etc\/kubernetes\/admin.conf/path: \/etc\/rancher\/k3s\/k3s.yaml/g' | sudo tee /var/lib/rancher/k3s/server/manifests/vip.yaml
Start k3s
sudo ./k3s server --tls-san $VIP --datastore-endpoint="mysql://root:k3s-password@tcp(192.168.0.43:3306)/kubernetes"
Grab configuration and point it to the VIP address
mkdir -p $HOME/.kube
sudo cat /etc/rancher/k3s/k3s.yaml | sed 's/127.0.0.1/'$VIP'/g' > $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Get node token
sudo cat /var/lib/rancher/k3s/server/node-token
Alternative install method and notes (TODO - investigate later on)
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--write-kubeconfig-mode 644 \
--datastore-endpoint mysql://root:k3s-password@tcp(192.168.0.43:3306)/kubernetes \
-t agent-secret --tls-san $VIP \
--node-taint k3s-controlplane=true:NoExecute" sh -
or
sudo K3S_TOKEN= ./k3s server --server https://192.168.0.40:6443
cc /@oskapt
Bare metal servers are BIG, sometimes TOO BIG.. which makes a control plane sat on a box with 46Tb of ram and 1.7 million CPUs a waste of space..
The plan is to pull out the ARP and BGP loadbalancer stuff into separate managers, then spin out the load balancer code for either use case into separate Go routines.
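A rough sketch of that split, with each manager running in its own goroutine under a shared context; the Manager interface and names are hypothetical, not the planned API:

import (
    "context"
    "log"
    "sync"
)

// Manager is a hypothetical interface shared by the ARP and BGP implementations.
type Manager interface {
    Start(ctx context.Context) error
}

// runManagers starts every manager in its own goroutine and waits for all of
// them to exit (for example when the shared context is cancelled).
func runManagers(ctx context.Context, managers ...Manager) {
    var wg sync.WaitGroup
    for _, m := range managers {
        wg.Add(1)
        go func(m Manager) {
            defer wg.Done()
            if err := m.Start(ctx); err != nil {
                log.Printf("manager exited: %v", err)
            }
        }(m)
    }
    wg.Wait()
}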
I have a control plane with 3 nodes, and I want to replace 1 node (say, delete machine 1 and create machine 4).
What is the correct procedure for doing this with a kube-vip control plane-load balancer?
Thanks in advance for help
Leaving this to track the issue with leader election
We just had a combination of kyverno and kube-vip crashlooping.
This began with kyverno configuring a validating webhook with a failure policy of Ignore.
Then kyverno crashed and didn't come back up within a very short amount of time, causing the k8s API to wait for the webhook timeout on each request.
It seems that this caused kube-vip to crashloop.
We are still using kube-vip 0.1.7
Logs:
Node1:
time="2020-12-03T15:14:13Z" level=info msg="Beginning cluster membership, namespace [kube-system], lock name [plunder-lock], id [ddos-1-ppf6s]"
I1203 15:14:13.575443 1 leaderelection.go:242] attempting to acquire leader lease kube-system/plunder-lock...
time="2020-12-03T15:14:13Z" level=info msg="Node [ddos-1-5nlmb] is assuming leadership of the cluster"
time="2020-12-03T15:14:15Z" level=info msg="Node [ddos-1-pwkjb] is assuming leadership of the cluster"
[a92615428@jos-2011 ~]$ kubectl --kubeconfig debug.kubeconfig -n kube-system logs kube-vip-ddos-1-ppf6s --previous
time="2020-12-03T15:05:22Z" level=info msg="Beginning cluster membership, namespace [kube-system], lock name [plunder-lock], id [ddos-1-ppf6s]"
I1203 15:05:22.885676 1 leaderelection.go:242] attempting to acquire leader lease kube-system/plunder-lock...
time="2020-12-03T15:05:22Z" level=info msg="Node [ddos-1-pwkjb] is assuming leadership of the cluster"
I1203 15:09:00.584042 1 leaderelection.go:252] successfully acquired lease kube-system/plunder-lock
time="2020-12-03T15:09:00Z" level=info msg="Node [ddos-1-ppf6s] is assuming leadership of the cluster"
time="2020-12-03T15:09:00Z" level=info msg="This node is starting with leadership of the cluster"
time="2020-12-03T15:09:00Z" level=info msg="Broadcasting ARP update for 10.78.160.254 (00:50:56:8a:8a:98) via eth0"
I1203 15:09:02.584342 1 leaderelection.go:288] failed to renew lease kube-system/plunder-lock: failed to tryAcquireOrRenew context deadline exceeded
time="2020-12-03T15:09:03Z" level=info msg="Broadcasting ARP update for 10.78.160.254 (00:50:56:8a:8a:98) via eth0"
E1203 15:09:04.642873 1 leaderelection.go:307] Failed to release lock: Operation cannot be fulfilled on leases.coordination.k8s.io "plunder-lock": the object has been modified; please apply your changes to the latest version and try again
time="2020-12-03T15:09:04Z" level=info msg="This node is becoming a follower within the cluster"
time="2020-12-03T15:09:04Z" level=info msg="Shutting down Kube-Vip Leader Election cluster"
Node2:
time="2020-12-03T15:04:31Z" level=info msg="Beginning cluster membership, namespace [kube-system], lock name [plunder-lock], id [ddos-1-pwkjb]"
I1203 15:04:31.730902 1 leaderelection.go:242] attempting to acquire leader lease kube-system/plunder-lock...
time="2020-12-03T15:04:31Z" level=info msg="Node [ddos-1-5nlmb] is assuming leadership of the cluster"
I1203 15:04:35.941138 1 leaderelection.go:252] successfully acquired lease kube-system/plunder-lock
time="2020-12-03T15:04:35Z" level=info msg="Node [ddos-1-pwkjb] is assuming leadership of the cluster"
time="2020-12-03T15:04:35Z" level=info msg="This node is starting with leadership of the cluster"
time="2020-12-03T15:04:35Z" level=info msg="Broadcasting ARP update for 10.78.160.254 (00:50:56:8a:0b:65) via eth0"
...
time="2020-12-03T15:08:51Z" level=info msg="Broadcasting ARP update for 10.78.160.254 (00:50:56:8a:0b:65) via eth0"
I1203 15:08:54.384629 1 leaderelection.go:288] failed to renew lease kube-system/plunder-lock: failed to tryAcquireOrRenew context deadline exceeded
time="2020-12-03T15:08:54Z" level=info msg="Broadcasting ARP update for 10.78.160.254 (00:50:56:8a:0b:65) via eth0"
E1203 15:08:56.439606 1 leaderelection.go:307] Failed to release lock: Operation cannot be fulfilled on leases.coordination.k8s.io "plunder-lock": the object has been modified; please apply your changes to the latest version and try again
time="2020-12-03T15:08:56Z" level=info msg="This node is becoming a follower within the cluster"
time="2020-12-03T15:08:56Z" level=info msg="Shutting down Kube-Vip Leader Election cluster"
Node3:
time="2020-12-03T15:05:54Z" level=info msg="Beginning cluster membership, namespace [kube-system], lock name [plunder-lock], id [ddos-1-5nlmb]"
I1203 15:05:54.460396 1 leaderelection.go:242] attempting to acquire leader lease kube-system/plunder-lock...
time="2020-12-03T15:05:54Z" level=info msg="Node [ddos-1-pwkjb] is assuming leadership of the cluster"
E1203 15:09:01.716885 1 leaderelection.go:367] Failed to update lock: Operation cannot be fulfilled on leases.coordination.k8s.io "plunder-lock": the object has been modified; please apply your changes to the latest version and try again
time="2020-12-03T15:09:03Z" level=info msg="Node [ddos-1-ppf6s] is assuming leadership of the cluster"
I1203 15:09:09.752718 1 leaderelection.go:252] successfully acquired lease kube-system/plunder-lock
time="2020-12-03T15:09:09Z" level=info msg="Node [ddos-1-5nlmb] is assuming leadership of the cluster"
time="2020-12-03T15:09:09Z" level=info msg="This node is starting with leadership of the cluster"
time="2020-12-03T15:09:09Z" level=info msg="Broadcasting ARP update for 10.78.160.254 (00:50:56:8a:50:7a) via eth0"
I1203 15:09:11.753050 1 leaderelection.go:288] failed to renew lease kube-system/plunder-lock: failed to tryAcquireOrRenew context deadline exceeded
time="2020-12-03T15:09:12Z" level=info msg="Broadcasting ARP update for 10.78.160.254 (00:50:56:8a:50:7a) via eth0"
E1203 15:09:13.811535 1 leaderelection.go:307] Failed to release lock: Operation cannot be fulfilled on leases.coordination.k8s.io "plunder-lock": the object has been modified; please apply your changes to the latest version and try again
time="2020-12-03T15:09:13Z" level=info msg="This node is becoming a follower within the cluster"
time="2020-12-03T15:09:13Z" level=info msg="Shutting down Kube-Vip Leader Election cluster"
Version: 0.1.5
An inaccessible config file causes a crash:
time="2020-04-22T07:34:35Z" level=info msg="Reading configuration from [/etc/kube-vip/config.yaml]"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1277a9e]
goroutine 1 [running]:
github.com/plunder-app/kube-vip/cmd.glob..func6(0x21107c0, 0xc0001a5c00, 0x0, 0x2)
/src/cmd/kube-vip-start.go:53 +0x44e
github.com/spf13/cobra.(*Command).execute(0x21107c0, 0xc0001a5be0, 0x2, 0x2, 0x21107c0, 0xc0001a5be0)
/go/pkg/mod/github.com/spf13/[email protected]/command.go:844 +0x2aa
github.com/spf13/cobra.(*Command).ExecuteC(0x2110a60, 0x43b80a, 0x20b7720, 0xc000058750)
/go/pkg/mod/github.com/spf13/[email protected]/command.go:945 +0x317
github.com/spf13/cobra.(*Command).Execute(...)
/go/pkg/mod/github.com/spf13/[email protected]/command.go:885
github.com/plunder-app/kube-vip/cmd.Execute()
/src/cmd/kube-vip.go:64 +0x31
main.main()
/src/main.go:14 +0x6e
kube-vip should have proper error handling in that case and present a warning.
Hello,
I would like to raise a PR for the docs at https://kube-vip.io; is there a public repo for this?
To enable DNS support, we should be able to resolve a DNS entry and use it as a VIP for ARP broadcasting.
https://github.com/brutella/dnssd <- should be relatively straight forward
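Resolving the entry itself only needs the standard library; a minimal sketch (periodic re-resolution and IPv6 handling are left out, and this does not use the linked library):

import (
    "fmt"
    "net"
)

// resolveVIP looks up the DNS name and returns the first IPv4 address,
// which could then be used as the VIP for ARP broadcasts.
func resolveVIP(name string) (net.IP, error) {
    ips, err := net.LookupIP(name)
    if err != nil {
        return nil, err
    }
    for _, ip := range ips {
        if v4 := ip.To4(); v4 != nil {
            return v4, nil
        }
    }
    return nil, fmt.Errorf("no IPv4 address found for %s", name)
}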
Something like below should work:
// Load certs
certPair, err := tls.X509KeyPair(cert, key)
if err != nil {
    return fmt.Errorf("unable to load certificate pair [%s]", err.Error())
}
cfg := &tls.Config{Certificates: []tls.Certificate{certPair}}
// Start the TCP listener
l, err := net.ListenTCP("tcp", laddr)
if err != nil {
    return fmt.Errorf("unable to bind [%s]", err.Error())
}
// Wrap the listener in TLS
listener = tls.NewListener(l, cfg)
The code already exists within the kube-vip control plane; this issue tracks adding it to type: LoadBalancer.
I am trying to run kube-vip start inside my cluster to have my CP become reachable on a VIP. I do not need any LB for now, just to have one of the nodes bind to the VIP and fight over it through leader election.
I start kube-vip (from a DaemonSet) in this cluster like this:
image: plndr/kube-vip:0.2.1
imagePullPolicy: Always
name: kube-vip
command:
- /kube-vip
- start
env:
- name: vip_address
  value: "10.8.95.3"
- name: vip_leaderelection
  value: "true"
- name: vip_interface
  value: "eth0"
- name: vip_loglevel
  value: "5"
- name: vip_leaseduration
  value: "15"
- name: vip_renewdeadline
  value: "10"
- name: vip_retryperiod
  value: "2"
- name: vip_incluster
  value: "true"
- name: vip_inCluster
  value: "true"
- name: inCluster
  value: "true"
I know the first of the three is the only correct one, but I set all of them to make really sure that character casing and prefix stripping were not the culprit.
The error I get is:
panic: stat /etc/kubernetes/admin.conf: no such file or directory
goroutine 1 [running]:
github.com/plunder-app/kube-vip/pkg/cluster.NewManager(0x1d62e64, 0x1a, 0xc00000d000, 0x192b, 0x0, 0xc0001406e0, 0xc000143b80)
/src/pkg/cluster/clusterLeader.go:59 +0x535
github.com/plunder-app/kube-vip/cmd.glob..func9(0x2ba0aa0, 0x3116750, 0x0, 0x0)
/src/cmd/kube-vip-start.go:90 +0x1bf
github.com/spf13/cobra.(*Command).execute(0x2ba0aa0, 0x3116750, 0x0, 0x0, 0x2ba0aa0, 0x3116750)
/go/pkg/mod/github.com/spf13/[email protected]/command.go:846 +0x2c2
github.com/spf13/cobra.(*Command).ExecuteC(0x2ba0d40, 0x4458ca, 0x2b3b740, 0xc000066778)
/go/pkg/mod/github.com/spf13/[email protected]/command.go:950 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
/go/pkg/mod/github.com/spf13/[email protected]/command.go:887
github.com/plunder-app/kube-vip/cmd.Execute()
/src/cmd/kube-vip.go:113 +0x31
main.main()
/src/main.go:14 +0x6e
From what I read at cmd/kube-vip-start.go and pkg/cluster/clusterLeader.go, I really cannot get my head around why this is happening.
env file:
install -d /opt/software/env
cat > /opt/software/env/kubernetes.env <<EOF
OS_PRETTY_NAME="$(grep "PRETTY_NAME" /etc/os-release | awk -F"\"" '{print $2}')"
KUBE_VERSION="1.18.8"
KUBE_CLUSTE_NAME="mycluster" # kubeadm default is "kubernetes"
KUBE_VIP_VERSION="0.1.7" # Get from https://github.com/plunder-app/kube-vip/releases
KUBE_VIP_INTERFACE="ens33" # name of the default-route NIC
KUBE_API_LB_HOST="192.168.128.250" # virtual IP address
KUBE_API_LB_PORT=16443
KUBE_POD_SUBNET="10.95.0.0/16"
KUBE_SVC_SUBNET="10.96.0.0/16" # kubeadm default is "10.96.0.0/12"
EOF
create static pod yaml command:
source /opt/software/env/kubernetes.env
sudo docker run --network host --rm \
plndr/kube-vip:$KUBE_VIP_VERSION \
kubeadm init \
--interface $KUBE_VIP_INTERFACE \
--vip $KUBE_API_LB_HOST \
--arp \
--leaderElection \
| sudo tee /etc/kubernetes/manifests/vip.yaml
error:
time="2020-08-28T02:12:48Z" level=info msg="This node is starting with leadership of the cluster"
time="2020-08-28T02:12:48Z" level=info msg="Node [k8s-node-1] is assuming leadership of the cluster"
time="2020-08-28T02:12:48Z" level=info msg="Broadcasting ARP update for 192.168.128.250 (00:0c:29:d5:81:22) via ens33"
time="2020-08-28T02:12:51Z" level=info msg="Broadcasting ARP update for 192.168.128.250 (00:0c:29:d5:81:22) via ens33"
I0828 02:12:51.937656 1 leaderelection.go:288] failed to renew lease kube-system/plunder-lock: failed to tryAcquireOrRenew context deadline exceeded
E0828 02:12:52.618251 1 leaderelection.go:307] Failed to release lock: Lease.coordination.k8s.io "plunder-lock" is invalid: spec.leaseDurationSeconds: Invalid value: 0: must be greater than 0
time="2020-08-28T02:12:52Z" level=info msg="This node is becoming a follower within the cluster"
time="2020-08-28T02:12:52Z" level=info msg="Shutting down Kube-Vip Leader Election cluster"
We should refactor the kube-vip code base to introduce unit tests; in particular we should introduce a fake implementation of the network interface.
We should also explore envtest for integration tests, since we're using leaderelection.
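As a starting point, a hedged sketch of the kind of seam that makes this testable; the interface and the fake are hypothetical, not the existing kube-vip API:

// Network is a hypothetical abstraction over the real netlink-based code.
type Network interface {
    AddIP(ip string) error
    DeleteIP(ip string) error
}

// fakeNetwork records the addresses it was asked to manage so unit tests
// can assert on them without touching real interfaces.
type fakeNetwork struct {
    addresses map[string]bool
}

func (f *fakeNetwork) AddIP(ip string) error {
    if f.addresses == nil {
        f.addresses = map[string]bool{}
    }
    f.addresses[ip] = true
    return nil
}

func (f *fakeNetwork) DeleteIP(ip string) error {
    delete(f.addresses, ip)
    return nil
}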
Followed instructions to build load-balanced control plane for Kubernetes cluster at https://kube-vip.io/control-plane/.
After deploying kube-vip to the second node with this config:
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  name: kube-vip
  namespace: kube-system
spec:
  containers:
  - args:
    - start
    env:
    - name: vip_arp
      value: "true"
    - name: vip_interface
      value: ens192
    - name: vip_cidr
      value: "32"
    - name: vip_leaderelection
      value: "true"
    - name: vip_leaseduration
      value: "5"
    - name: vip_renewdeadline
      value: "3"
    - name: vip_retryperiod
      value: "1"
    - name: vip_address
      value: ###.###.###.###
    image: plndr/kube-vip:0.2.2
    imagePullPolicy: Always
    name: kube-vip
    resources: {}
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
        - SYS_TIME
    volumeMounts:
    - mountPath: /etc/kubernetes/admin.conf
      name: kubeconfig
    - mountPath: /etc/ssl/certs
      name: ca-certs
      readOnly: true
  hostNetwork: true
  volumes:
  - hostPath:
      path: /etc/kubernetes/admin.conf
    name: kubeconfig
  - hostPath:
      path: /etc/ssl/certs
    name: ca-certs
status: {}
I have in the log of this container:
2020-11-26T00:35:16.273397819Z time="2020-11-26T00:35:16Z" level=info msg="Beginning cluster membership, namespace [], lock name [plndr-cp-lock], id [vm-tst-kube-m2]"
2020-11-26T00:35:16.273668809Z I1126 00:35:16.273611 1 leaderelection.go:242] attempting to acquire leader lease /plndr-cp-lock...
2020-11-26T00:35:16.273724134Z E1126 00:35:16.273688 1 leaderelection.go:321] error retrieving resource lock /plndr-cp-lock: an empty namespace may not be set when a resource name is provided
2020-11-26T00:35:18.101416084Z E1126 00:35:18.101292 1 leaderelection.go:321] error retrieving resource lock /plndr-cp-lock: an empty namespace may not be set when a resource name is provided
2020-11-26T00:35:20.058238179Z E1126 00:35:20.058099 1 leaderelection.go:321] error retrieving resource lock /plndr-cp-lock: an empty namespace may not be set when a resource name is provided
2020-11-26T00:35:21.383662919Z E1126 00:35:21.383536 1 leaderelection.go:321] error retrieving resource lock /plndr-cp-lock: an empty namespace may not be set when a resource name is provided
2020-11-26T00:35:23.153960632Z E1126 00:35:23.153842 1 leaderelection.go:321] error retrieving resource lock /plndr-cp-lock: an empty namespace may not be set when a resource name is provided
2020-11-26T00:35:25.314805828Z E1126 00:35:25.314693 1 leaderelection.go:321] error retrieving resource lock /plndr-cp-lock: an empty namespace may not be set when a resource name is provided
https://github.com/kubernetes/client-go/tree/master/examples/leader-election
This would give the following architecture:
OutOfCluster: raft
InCluster: client-go leader election
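For reference, the client-go pattern from the link above boils down to roughly the following; the lock name, namespace, identity and timings are illustrative (uses k8s.io/client-go/tools/leaderelection and .../leaderelection/resourcelock):

// Acquire a Lease-based lock and run callbacks as leadership changes.
lock := &resourcelock.LeaseLock{
    LeaseMeta:  metav1.ObjectMeta{Name: "plunder-lock", Namespace: "kube-system"},
    Client:     clientset.CoordinationV1(),
    LockConfig: resourcelock.ResourceLockConfig{Identity: id},
}

leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
    Lock:            lock,
    ReleaseOnCancel: true,
    LeaseDuration:   15 * time.Second,
    RenewDeadline:   10 * time.Second,
    RetryPeriod:     2 * time.Second,
    Callbacks: leaderelection.LeaderCallbacks{
        OnStartedLeading: func(ctx context.Context) { /* take the VIP */ },
        OnStoppedLeading: func() { /* release the VIP */ },
        OnNewLeader:      func(identity string) { /* log the new leader */ },
    },
})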
In one of the CAPV setup with kube-vip, we saw that the gARP message was not received by other nodes. During upgrade when VIP migrated to the new control plane node, 'arp -n' shows other nodes still have the old MAC address for the VIP.
After digging more it turns out that, both ARPRequest and ARPReply can be used to send gratuitous ARP, and network devices may only support either one of them.
Also, for the ARPReply that kube-vip is sending, the receiver MAC/IP appears to be misconfigured and does not match what the protocol defines.
Ref:
https://tools.ietf.org/html/rfc5944#section-4.6
https://en.wikipedia.org/wiki/Address_Resolution_Protocol.
Will raise a PR to send Request/Reply alternately, and also change the Reply message.
There are a number of deployment options for using a vip with kubeadm, all of them with numerous pros/cons:

1. Standalone VIP pre-kubeadm
This is a "one-shot" approach where a vip is created and applied prior to running kubeadm init; this will attach the IP to the initial host, allowing kubeadm to communicate with the newly deployed api-server through the address. Once completed, this vip would need removing and restarting within the cluster itself (perhaps as a daemonset).
This could be automated in the following fashion:
- kube-vip is started and is passed the command line to run kubeadm
- kube-vip then runs the kubeadm init and waits for its completion (exec.Command())
- kube-vip, upon completion of the init, stops the vip and starts it within the kubernetes cluster

2. Static pods (how it is done today)
In this approach the vip is run outside of the cluster, through the use of static pods that are managed by the kubelet and not the kubernetes cluster itself.
This is automated in the following fashion:
- kube-vip generates a manifest in /etc/kubelet/manifests
- kubeadm init starts all pods from manifests (including kube-vip)
- kubeadm join followed by kube-vip join will add nodes through the vip and will add them as members of the vip cluster

cc @yastij / @ncdc / @randomvariable (any thoughts)
Cool project!
My immediate thought when reading about kube-vip was "How does this compare to MetalLB?".
I think it would be good to add to the readme how it differs, and if there are situations where one is preferred over the other. Unfortunately I don't know the answers myself or I would already have opened a PR!
Specifically, I think MetalLB's layer 2 mode seems to be exactly the same as what kube-vip is doing.
We're defaulting to leaderElection, and from what we've seen it provides a much smoother and more stable cluster and ensures lifecycle behaviours. With that in mind, should we remove the Raft code?
cc / @jzhoucliqr @yastij
Hello Everyone,
The last 3 releases (https://hub.docker.com/r/plndr/kube-vip/tags) are missing the arm packages. Were they perhaps forgotten?
KR,
Vasileios
@megian :-)
One thing I like about MetalLB is that you can install it and configure it once, and it will work for all namespaces, without having to deploy a namespace-specific config as well as a deployment (kube-vip) in the namespace.
Would an alternative way of deploying work where
One downside with using a daemonset for kube-vip is that if the pod crashes it affects all VIPs on that node.
Since you have 3 replicas with affinity for hostname, you should document that the example needs at least 3 unique worker nodes, or else it'll fail.
Currently new images are created manually through the makefile (needs repo access); this will need automating if there is enough demand.
Is it possible to configure kube-vip so that it just assigns the IP address to the service, and have kube-proxy do the load balancing?
Investigate using DHCP to add a VIP address
cc / @mylesagray
This started out as a different issue, but may well just be a documentation change...I've kept the full process in here as it's already been written up, and it may help to validate
TL;DR - the docs specify using the --api-server-bind-port flag and it causes kubeadm init to fail. Removing the flag let me initialise the cluster.
I'm trying to load balance a Kubernetes 1.19.2 Control Plane following the instructions here https://kube-vip.io/control-plane/. I previously had kube-vip 0.1.3 load balancing kubernetes 1.18.x.
Hosts
# Add to /etc/hosts
192.168.20.9 rpi-vip rpi-vip.lab.definit.co.uk
192.168.20.12 rpi4-01 rpi4-01.lab.definit.co.uk
192.168.20.13 rpi4-02 rpi4-02.lab.definit.co.uk
192.168.20.14 rpi4-03 rpi4-03.lab.definit.co.uk
Generate config yaml
sudo docker run --network host --rm plndr/kube-vip:0.1.8 kubeadm init \
--interface eth0 \
--vip 192.168.20.9 \
--arp \
--leaderElection | sudo tee /etc/kubernetes/manifests/vip.yaml
Which generates:
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  name: kube-vip
  namespace: kube-system
spec:
  containers:
  - args:
    - start
    env:
    - name: vip_arp
      value: "true"
    - name: vip_interface
      value: eth0
    - name: vip_address
      value: 192.168.20.9
    - name: vip_leaderelection
      value: "true"
    - name: vip_leaseduration
      value: "5"
    - name: vip_renewdeadline
      value: "3"
    - name: vip_retryperiod
      value: "1"
    image: plndr/kube-vip:0.1.8
    imagePullPolicy: Always
    name: kube-vip
    resources: {}
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
        - SYS_TIME
    volumeMounts:
    - mountPath: /etc/kubernetes/admin.conf
      name: kubeconfig
    - mountPath: /etc/ssl/certs
      name: ca-certs
      readOnly: true
  hostNetwork: true
  volumes:
  - hostPath:
      path: /etc/kubernetes/admin.conf
    name: kubeconfig
  - hostPath:
      path: /etc/ssl/certs
    name: ca-certs
status: {}
Init Cluster
sudo kubeadm init --control-plane-endpoint "192.168.20.9:6443" --apiserver-bind-port 6444 --upload-certs --kubernetes-version "v1.19.2"
This fails - basically because it can't communicate with the control plane API server.
[init] Using Kubernetes version: v1.19.2
[preflight] Running pre-flight checks
[WARNING SystemVerification]: missing optional cgroups: hugetlb
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local rpi4-01] and IPs [10.96.0.1 192.168.20.12 192.168.20.9]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost rpi4-01] and IPs [192.168.20.12 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost rpi4-01] and IPs [192.168.20.12 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
The node is listening on 6444, but not 6443 - in fact I don't see the VIP address bound to the interface
tcp 0 0 192.168.20.12:47636 192.168.20.12:6444 ESTABLISHED keepalive (19.31/0/0)
tcp 0 0 192.168.20.12:47716 192.168.20.12:6444 ESTABLISHED keepalive (25.57/0/0)
tcp 0 0 192.168.20.12:47746 192.168.20.12:6444 ESTABLISHED keepalive (7.85/0/0)
tcp 0 0 192.168.20.12:47614 192.168.20.12:6444 ESTABLISHED keepalive (10.80/0/0)
tcp6 0 0 :::6444 :::* LISTEN off (0.00/0/0)
tcp6 0 0 ::1:6444 ::1:41970 ESTABLISHED keepalive (66.05/0/0)
tcp6 0 0 192.168.20.12:6444 192.168.20.12:47636 ESTABLISHED keepalive (59.75/0/0)
tcp6 0 0 192.168.20.12:6444 192.168.20.12:47746 ESTABLISHED keepalive (67.68/0/0)
tcp6 0 0 ::1:41970 ::1:6444 ESTABLISHED keepalive (27.75/0/0)
tcp6 0 0 192.168.20.12:6444 192.168.20.12:47614 ESTABLISHED keepalive (69.92/0/0)
tcp6 0 0 192.168.20.12:6444 192.168.20.12:47716 ESTABLISHED keepalive (65.90/0/0)
pi@rpi4-01:~ $ netstat -ano | grep 6443
tcp 0 1 192.168.20.12:33980 192.168.20.9:6443 SYN_SENT on (0.00/0/0)
tcp 0 1 192.168.20.12:33986 192.168.20.9:6443 SYN_SENT on (0.37/0/0)
tcp 0 1 192.168.20.12:33984 192.168.20.9:6443 SYN_SENT on (0.17/0/0)
pi@rpi4-01:~ $ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether dc:a6:32:66:31:b9 brd ff:ff:ff:ff:ff:ff
inet 192.168.20.12/24 brd 192.168.20.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet6 fe80::8a84:80d4:d7c2:c323/64 scope link
valid_lft forever preferred_lft forever
3: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether dc:a6:32:66:31:ba brd ff:ff:ff:ff:ff:ff
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:13:7f:85:2a brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
docker ps shows that kube-vip and the kube-apiserver are running
pi@rpi4-01:~ $ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
76c790f85c6d plndr/kube-vip "/kube-vip start" 6 minutes ago Up 6 minutes k8s_kube-vip_kube-vip-rpi4-01_kube-system_09c43c4aab4ab14ec3bf2f1c26ec19c0_0
e6ecf7216f9d d4674d6c9d85 "kube-controller-man…" 7 minutes ago Up 7 minutes k8s_kube-controller-manager_kube-controller-manager-rpi4-01_kube-system_1bb834f9d8f4da4824190f635fdb6eff_0
abc58645ca41 d0cd31ffbe54 "kube-apiserver --ad…" 7 minutes ago Up 7 minutes k8s_kube-apiserver_kube-apiserver-rpi4-01_kube-system_da8d1b6274e5e7613c534f0108d3db2e_0
9d7d110e980b 2e91dde7e952 "etcd --advertise-cl…" 7 minutes ago Up 7 minutes k8s_etcd_etcd-rpi4-01_kube-system_886a564097da8246af6636713fab1b64_0
629077b18d63 k8s.gcr.io/pause:3.2 "/pause" 7 minutes ago Up 7 minutes k8s_POD_kube-controller-manager-rpi4-01_kube-system_1bb834f9d8f4da4824190f635fdb6eff_0
c14aff18fb1e k8s.gcr.io/pause:3.2 "/pause" 7 minutes ago Up 7 minutes k8s_POD_kube-apiserver-rpi4-01_kube-system_da8d1b6274e5e7613c534f0108d3db2e_0
d7989aaf06b0 75372dded3f2 "kube-scheduler --au…" 7 minutes ago Up 7 minutes k8s_kube-scheduler_kube-scheduler-rpi4-01_kube-system_f543c94683059cb32a4441e29fbdb238_0
a37101185c86 k8s.gcr.io/pause:3.2 "/pause" 7 minutes ago Up 7 minutes k8s_POD_etcd-rpi4-01_kube-system_886a564097da8246af6636713fab1b64_0
e12b75f88946 k8s.gcr.io/pause:3.2 "/pause" 7 minutes ago Up 7 minutes k8s_POD_kube-vip-rpi4-01_kube-system_09c43c4aab4ab14ec3bf2f1c26ec19c0_0
7cb3e9e6f2b9 k8s.gcr.io/pause:3.2 "/pause" 7 minutes ago Up 7 minutes k8s_POD_kube-scheduler-rpi4-01_kube-system_f543c94683059cb32a4441e29fbdb238_0
The API server is running on 6444 (as expected with --api-server-bind-port)
pi@rpi4-01:~ $ docker ps --no-trunc | grep 6444
abc58645ca41052486dc8141246a564296accb20de4758eaf51c0184022fdb5d sha256:d0cd31ffbe54ecfe10d03f8442e23393bd16cdc94412b3b0a996fe80e3e11e44 "kube-apiserver --advertise-address=192.168.20.12 --allow-privileged=true --authorization-mode=Node,RBAC --client-ca-file=/etc/kubernetes/pki/ca.crt --enable-admission-plugins=NodeRestriction --enable-bootstrap-token-auth=true --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key --etcd-servers=https://127.0.0.1:2379 --insecure-port=0 --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key --requestheader-allowed-names=front-proxy-client --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6444 --service-account-key-file=/etc/kubernetes/pki/sa.pub --service-cluster-ip-range=10.96.0.0/12 --tls-cert-file=/etc/kubernetes/pki/apiserver.crt --tls-private-key-file=/etc/kubernetes/pki/apiserver.key" 8 minutes ago Up 8 minutes k8s_kube-apiserver_kube-apiserver-rpi4-01_kube-system_da8d1b6274e5e7613c534f0108d3db2e_0
docker logs on the kube-vip pod shows
E0921 09:16:17.036814 1 leaderelection.go:321] error retrieving resource lock kube-system/plunder-lock: Get https://rpi4-01:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/plunder-lock: dial tcp 192.168.20.12:6443: connect: connection refused
So kube-vip is trying to talk to kube-api (for leader election?) using 6443 for the API - that should be 6444?
So I've just tried the process again without the --api-server-bind-port flag and it's initialised correctly - so this might just turn out to be a documentation change to remove the flag?
This should help where kubernetes reports an endpoint as active but perhaps the docker engine isn't playing ball.
// Dial the endpoint with a timeout; if the connection fails, treat the
// backend as unhealthy even though Kubernetes reports the endpoint as active.
d := net.Dialer{Timeout: timeout}
conn, err := d.Dial("tcp", addr)
if err != nil {
    // handle error: the endpoint is not reachable within the timeout
    return err
}
defer conn.Close()
On large clusters, leases can take too long to acquire, resulting in a bouncing VIP. Adding customisation should make it easier to modify the kube-vip configuration on an existing cluster.
cc /@yastij