cookeem / kubeadm-ha

Install a highly available Kubernetes cluster with kubeadm, using the docker/containerd container runtime; applies to v1.24.x and above

License: MIT License

kubernetes kubeadm ha high-availability cluster nginx keepalived istio prometheus traefik

kubeadm-ha's Introduction

Install a highly available Kubernetes cluster with kubeadm (supports both docker and containerd as the Kubernetes container runtime)

Deployment node information

hostname ip address comment
k8s-master01 192.168.0.101 Kubernetes control-plane host master01
k8s-master02 192.168.0.102 Kubernetes control-plane host master02
k8s-master03 192.168.0.103 Kubernetes control-plane host master03
k8s-vip 192.168.0.100 Kubernetes floating IP, created by keepalived; if you are using a public cloud, request this floating IP in advance
# Add hostname resolution entries on every node
cat << EOF >> /etc/hosts
192.168.0.100    k8s-vip
192.168.0.101    k8s-master01
192.168.0.102    k8s-master02
192.168.0.103    k8s-master03
EOF
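
A quick optional check (not from the original guide) that the mappings above took effect:

# getent reads /etc/hosts directly, so every hostname should print its address
getent hosts k8s-vip k8s-master01 k8s-master02 k8s-master03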

Architecture

  • For demonstration purposes, only 3 highly available master nodes are deployed
  • keepalived and nginx act as the highly available load balancer; its configuration is generated with the dorycli command-line tool and deployed with docker-compose
  • docker is used as the container runtime, and cri-dockerd serves as the cri-socket connecting docker and Kubernetes

Version information

# Operating system version: Debian 11
$ lsb_release -a
No LSB modules are available.
Distributor ID:     Debian
Description:        Debian GNU/Linux 11 (bullseye)
Release:            11
Codename:           bullseye

# docker version: 24.0.5
$ docker version
Client: Docker Engine - Community
 Version:           24.0.5
 API version:       1.43
 Go version:        go1.20.6
 Git commit:        ced0996
 Built:             Fri Jul 21 20:35:45 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.5
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.6
  Git commit:       a61e2b4
  Built:            Fri Jul 21 20:35:45 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.22
  GitCommit:        8165feabfdfe38c65b599c4993d227328c231fca
 runc:
  Version:          1.1.8
  GitCommit:        v1.1.8-0-g82f18fe
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

# cri-dockerd version: 0.3.4
$ cri-dockerd --version
cri-dockerd 0.3.4 (e88b1605)

# dorycli version: v1.6.1
$ dorycli version
dorycli version: v1.6.1
install dory-engine version: v2.6.1
install dory-console version: v2.6.1


# kubeadm version: v1.28.0
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"28", GitVersion:"v1.28.0", GitCommit:"855e7c48de7388eb330da0f8d9d2394ee818fb8d", GitTreeState:"clean", BuildDate:"2023-08-15T10:20:15Z", GoVersion:"go1.20.7", Compiler:"gc", Platform:"linux/amd64"}

# kubernetes version: v1.28.0
$ kubectl get nodes
NAME           STATUS   ROLES           AGE   VERSION
k8s-master01   Ready    control-plane   35m   v1.28.0
k8s-master02   Ready    control-plane   31m   v1.28.0
k8s-master03   Ready    control-plane   30m   v1.28.0

Install docker

  • Install the docker service on all nodes
# Install base packages
apt-get update
apt-get install -y sudo wget ca-certificates curl gnupg htop git jq tree

# Install docker
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
chmod a+r /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null
apt-get update
apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin docker-compose

# Check the docker version
docker version

# Configure docker daemon options
cat << EOF > /etc/docker/daemon.json
{
    "exec-opts": ["native.cgroupdriver=systemd"],
    "log-driver": "json-file",
    "log-opts": {
        "max-size": "100m"
    },
    "storage-driver": "overlay2"
}
EOF

# Restart the docker service
systemctl restart docker
systemctl status docker

# Verify that the docker service works
docker images
docker pull busybox
docker run --rm busybox uname -m
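
If you want to double-check that docker picked up the daemon.json settings above, the cgroup driver can be queried directly (an optional check, not from the original guide):

# should print "systemd" after the restart
docker info --format '{{ .CgroupDriver }}'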

Install kubernetes

  • Install the Kubernetes-related packages on all nodes
# Install the kubernetes components
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add - 
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
apt-get update
apt-get install -y kubelet kubeadm kubectl
kubeadm version

# List the container images required by kubernetes
kubeadm config images list --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers
export PAUSE_IMAGE=$(kubeadm config images list --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers | grep pause)

# Note: the pause image is used for the cri-dockerd startup parameters
# The output should be registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.9
echo $PAUSE_IMAGE

# Install cri-dockerd, which connects kubernetes and docker
wget https://github.com/Mirantis/cri-dockerd/releases/download/v0.3.4/cri-dockerd-0.3.4.amd64.tgz
tar zxvf cri-dockerd-0.3.4.amd64.tgz 
cd cri-dockerd/
mkdir -p /usr/local/bin
install -o root -g root -m 0755 cri-dockerd /usr/local/bin/cri-dockerd

# Create the cri-docker.socket unit file
cat << EOF > /etc/systemd/system/cri-docker.socket
[Unit]
Description=CRI Docker Socket for the API
PartOf=cri-docker.service

[Socket]
ListenStream=%t/cri-dockerd.sock
SocketMode=0660
SocketUser=root
SocketGroup=docker

[Install]
WantedBy=sockets.target
EOF

# Create the cri-docker.service unit file
# Note: set the pause container image with --pod-infra-container-image=$PAUSE_IMAGE
cat << EOF > /etc/systemd/system/cri-docker.service
[Unit]
Description=CRI Interface for Docker Application Container Engine
Documentation=https://docs.mirantis.com
After=network-online.target firewalld.service docker.service
Wants=network-online.target
Requires=cri-docker.socket

[Service]
Type=notify
ExecStart=/usr/local/bin/cri-dockerd --container-runtime-endpoint fd:// --pod-infra-container-image=$PAUSE_IMAGE
ExecReload=/bin/kill -s HUP \$MAINPID
TimeoutSec=0
RestartSec=2
Restart=always

# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3

# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity

# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity
Delegate=yes
KillMode=process

[Install]
WantedBy=multi-user.target
EOF

# Start cri-dockerd
systemctl daemon-reload
systemctl enable --now cri-docker.socket
systemctl restart cri-docker
systemctl status cri-docker
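
At this point it is worth confirming that the CRI socket file exists (an optional check, not from the original guide):

# this is the socket path passed later to kubeadm via --cri-socket
ls -l /var/run/cri-dockerd.sock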

# Pre-pull the required container images with kubeadm
kubeadm config images pull --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers --cri-socket unix:///var/run/cri-dockerd.sock
docker images
# Install dorycli
cd /root
wget https://github.com/dory-engine/dorycli/releases/download/v1.6.1/dorycli-v1.6.1-linux-amd64.tgz
tar zxvf dorycli-v1.6.1-linux-amd64.tgz
chmod a+x dorycli
mv dorycli /usr/bin/

# Enable dorycli shell completion; subcommands and flags can then be completed with the TAB key
dorycli completion bash -h
source <(dorycli completion bash)
dorycli completion bash > /etc/bash_completion.d/dorycli

# Use dorycli to print the high-availability load balancer configuration and save it to kubeadm-ha.yaml
dorycli install ha print --language zh > kubeadm-ha.yaml

# Adjust the settings in kubeadm-ha.yaml to match your environment
# The network interface name of each host can be obtained with the following command
ip address

# The configuration used in this example is shown below; adjust it to your environment
cat kubeadm-ha.yaml
# kubernetes version to install
version: "v1.28.0"
# kubernetes image repository; if unset, the official default image repository is used
imageRepository: "registry.cn-hangzhou.aliyuncs.com/google_containers"
# floating ip address of the highly available kubernetes cluster, created by keepalived
virtualIp: 192.168.0.100
# apiserver port of the highly available kubernetes cluster, exposed through nginx
virtualPort: 16443
# hostname mapped to the floating ip address; add the mapping to /etc/hosts
virtualHostname: k8s-vip
# kubernetes container runtime socket
# for docker: unix:///var/run/cri-dockerd.sock
# for containerd: unix:///var/run/containerd/containerd.sock
# for cri-o: unix:///var/run/crio/crio.sock
criSocket: unix:///var/run/cri-dockerd.sock
# pod subnet of the kubernetes cluster; if unset, the default pod subnet is used
podSubnet: "10.244.0.0/24"
# service subnet of the kubernetes cluster; if unset, the default service subnet is used
serviceSubnet: "10.96.0.0/16"
# keepalived authentication password; if unset, a randomly generated password is used
keepAlivedAuthPass: ""
# control-plane host configuration; the number of highly available master nodes must be odd and at least 3
masterHosts:
    # hostname of the master node; add the mapping to /etc/hosts
  - hostname: k8s-master01
    # IP address of the master node
    ipAddress: 192.168.0.101
    # network interface used for traffic between master nodes; keepalived binds to this interface
    networkInterface: eth0
    # keepalived election priority; higher values win, and each master node must use a different value
    keepalivedPriority: 120
    # hostname of the master node; add the mapping to /etc/hosts
  - hostname: k8s-master02
    # IP address of the master node
    ipAddress: 192.168.0.102
    # network interface used for traffic between master nodes; keepalived binds to this interface
    networkInterface: eth0
    # keepalived election priority; higher values win, and each master node must use a different value
    keepalivedPriority: 110
    # hostname of the master node; add the mapping to /etc/hosts
  - hostname: k8s-master03
    # IP address of the master node
    ipAddress: 192.168.0.103
    # network interface used for traffic between master nodes; keepalived binds to this interface
    networkInterface: eth0
    # keepalived election priority; higher values win, and each master node must use a different value
    keepalivedPriority: 100

# Use dorycli to create the high-availability load balancer configuration and write the generated files to the current directory
# After running the command, it prints a description of the generated files and of how to start them
dorycli install ha script -o . -f kubeadm-ha.yaml --language zh

# Inspect the kubeadm-config.yaml file generated by dorycli; it is used by kubeadm init to initialize the kubernetes cluster
# The configuration generated in this example is as follows:
cat kubeadm-config.yaml
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.28.0
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
apiServer:
  certSANs:
    - "k8s-vip"
    - "192.168.0.100"
    - "k8s-master01"
    - "192.168.0.101"
    - "k8s-master02"
    - "192.168.0.102"
    - "k8s-master03"
    - "192.168.0.103"
controlPlaneEndpoint: "192.168.0.100:16443"
networking:
  podSubnet: "10.244.0.0/24"
  serviceSubnet: "10.96.0.0/16"
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  criSocket: unix:///var/run/cri-dockerd.sock
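
If you want to sanity-check this configuration before the real initialization, kubeadm can simulate the run without changing the node. This is an optional step that is not part of the original guide, and since the configuration points at the 192.168.0.100:16443 endpoint it is best run on k8s-master01 once the load balancer below is up:

# --dry-run prints what kubeadm init would do without applying any changes
kubeadm init --config=kubeadm-config.yaml --dry-run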

# Set the file path for the kubernetes high-availability load balancer on the master nodes
export LB_DIR=/data/k8s-lb

# Copy the high-availability load balancer files to k8s-master01
ssh k8s-master01 mkdir -p ${LB_DIR}
scp -r k8s-master01/nginx-lb k8s-master01/keepalived root@k8s-master01:${LB_DIR}

# Start the high-availability load balancer on the k8s-master01 node
ssh k8s-master01 "cd ${LB_DIR}/keepalived/ && docker-compose stop && docker-compose rm -f && docker-compose up -d"
ssh k8s-master01 "cd ${LB_DIR}/nginx-lb/ && docker-compose stop && docker-compose rm -f && docker-compose up -d"

# Copy the high-availability load balancer files to k8s-master02
ssh k8s-master02 mkdir -p ${LB_DIR}
scp -r k8s-master02/nginx-lb k8s-master02/keepalived root@k8s-master02:${LB_DIR}

# Start the high-availability load balancer on the k8s-master02 node
ssh k8s-master02 "cd ${LB_DIR}/keepalived/ && docker-compose stop && docker-compose rm -f && docker-compose up -d"
ssh k8s-master02 "cd ${LB_DIR}/nginx-lb/ && docker-compose stop && docker-compose rm -f && docker-compose up -d"

# Copy the high-availability load balancer files to k8s-master03
ssh k8s-master03 mkdir -p ${LB_DIR}
scp -r k8s-master03/nginx-lb k8s-master03/keepalived root@k8s-master03:${LB_DIR}

# Start the high-availability load balancer on the k8s-master03 node
ssh k8s-master03 "cd ${LB_DIR}/keepalived/ && docker-compose stop && docker-compose rm -f && docker-compose up -d"
ssh k8s-master03 "cd ${LB_DIR}/nginx-lb/ && docker-compose stop && docker-compose rm -f && docker-compose up -d"

# Verify on each master node that the floating IP has been created; normally the floating IP is bound to k8s-master01
ip address
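
A couple of extra checks that can help here (optional, not from the original guide; the container names depend on the files dorycli generated, so adjust the filters if necessary):

# the floating IP 192.168.0.100 should appear on eth0 of the node that currently holds it (normally k8s-master01)
ip address show eth0 | grep 192.168.0.100

# the keepalived and nginx-lb containers should be up on every master node
docker ps --filter "name=keepalived" --filter "name=nginx-lb"
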
  • Initialize the highly available kubernetes cluster
# Initialize the highly available cluster on k8s-master01 using the kubeadm-config.yaml configuration file
kubeadm init --config=kubeadm-config.yaml --upload-certs
# The kubeadm init command prints the following hint; use it to run the join command on the other master nodes
You can now join any number of the control-plane node running the following command on each as root:

  kubeadm join 192.168.0.100:16443 --token tgszyf.c9dicrflqy85juaf \
    --discovery-token-ca-cert-hash sha256:xxx \
    --control-plane --certificate-key xxx

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.0.100:16443 --token tgszyf.c9dicrflqy85juaf \
    --discovery-token-ca-cert-hash sha256:xxx

# Run the following command on k8s-master02 and k8s-master03 to join them to the highly available kubernetes cluster
# Remember that the kubeadm join command must include --cri-socket unix:///var/run/cri-dockerd.sock
kubeadm join 192.168.0.100:16443 --token tgszyf.c9dicrflqy85juaf \
        --discovery-token-ca-cert-hash sha256:xxx \
        --control-plane --certificate-key xxx --cri-socket unix:///var/run/cri-dockerd.sock

# Configure kubectl access to the kubernetes cluster on all master nodes
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
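
To confirm that kubectl really goes through the load balancer endpoint, a quick optional check (not from the original guide):

# the control plane URL printed here should be https://192.168.0.100:16443
kubectl cluster-info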

# Enable kubectl shell completion on all master nodes; subcommands and flags can then be completed with the TAB key
kubectl completion -h
kubectl completion bash > ~/.kube/completion.bash.inc
printf "
# Kubectl shell completion
source '$HOME/.kube/completion.bash.inc'
" >> $HOME/.bash_profile
source $HOME/.bash_profile

# Install the cilium network add-on on the k8s-master01 node
wget https://github.com/cilium/cilium-cli/releases/download/v0.15.6/cilium-linux-amd64.tar.gz
tar zxvf cilium-linux-amd64.tar.gz 
mv cilium /usr/local/bin/
cilium install --version 1.14.0 --set cni.chainingMode=portmap

# Allow pods to be scheduled on all master nodes
kubectl taint nodes --all node-role.kubernetes.io/control-plane-

# Check that all pods are in a healthy state
kubectl get pods -A -o wide
NAMESPACE              NAME                                         READY   STATUS    RESTARTS      AGE     IP              NODE           NOMINATED NODE   READINESS GATES
kube-system            cilium-mwvsr                                 1/1     Running   0             21m     192.168.0.102   k8s-master02   <none>           <none>
kube-system            cilium-operator-b4dfbf784-zgr7v              1/1     Running   0             21m     192.168.0.102   k8s-master02   <none>           <none>
kube-system            cilium-v27l2                                 1/1     Running   0             21m     192.168.0.103   k8s-master03   <none>           <none>
kube-system            cilium-zbcdj                                 1/1     Running   0             21m     192.168.0.101   k8s-master01   <none>           <none>
kube-system            coredns-6554b8b87f-kp7tn                     1/1     Running   0             30m     10.0.2.231      k8s-master03   <none>           <none>
kube-system            coredns-6554b8b87f-zlhgx                     1/1     Running   0             30m     10.0.2.197      k8s-master03   <none>           <none>
kube-system            etcd-k8s-master01                            1/1     Running   0             30m     192.168.0.101   k8s-master01   <none>           <none>
kube-system            etcd-k8s-master02                            1/1     Running   0             26m     192.168.0.102   k8s-master02   <none>           <none>
kube-system            etcd-k8s-master03                            1/1     Running   0             25m     192.168.0.103   k8s-master03   <none>           <none>
kube-system            kube-apiserver-k8s-master01                  1/1     Running   0             30m     192.168.0.101   k8s-master01   <none>           <none>
kube-system            kube-apiserver-k8s-master02                  1/1     Running   0             26m     192.168.0.102   k8s-master02   <none>           <none>
kube-system            kube-apiserver-k8s-master03                  1/1     Running   1 (25m ago)   25m     192.168.0.103   k8s-master03   <none>           <none>
kube-system            kube-controller-manager-k8s-master01         1/1     Running   1 (26m ago)   30m     192.168.0.101   k8s-master01   <none>           <none>
kube-system            kube-controller-manager-k8s-master02         1/1     Running   0             26m     192.168.0.102   k8s-master02   <none>           <none>
kube-system            kube-controller-manager-k8s-master03         1/1     Running   0             24m     192.168.0.103   k8s-master03   <none>           <none>
kube-system            kube-proxy-gr2pt                             1/1     Running   0             26m     192.168.0.102   k8s-master02   <none>           <none>
kube-system            kube-proxy-rkb9b                             1/1     Running   0             30m     192.168.0.101   k8s-master01   <none>           <none>
kube-system            kube-proxy-rvmv4                             1/1     Running   0             25m     192.168.0.103   k8s-master03   <none>           <none>
kube-system            kube-scheduler-k8s-master01                  1/1     Running   1 (26m ago)   30m     192.168.0.101   k8s-master01   <none>           <none>
kube-system            kube-scheduler-k8s-master02                  1/1     Running   0             26m     192.168.0.102   k8s-master02   <none>           <none>
kube-system            kube-scheduler-k8s-master03                  1/1     Running   0             23m     192.168.0.103   k8s-master03   <none>           <none>

# Check that all nodes are in a healthy state
kubectl get nodes
NAME           STATUS   ROLES           AGE   VERSION
k8s-master01   Ready    control-plane   31m   v1.28.0
k8s-master02   Ready    control-plane   27m   v1.28.0
k8s-master03   Ready    control-plane   26m   v1.28.0

# Test deploying an application to the kubernetes cluster
# Deploy an nginx application and expose it on nodePort 31000
kubectl run nginx --image=nginx:1.23.1-alpine --image-pull-policy=IfNotPresent --port=80 -l=app=nginx
kubectl create service nodeport nginx --tcp=80:80 --node-port=31000
curl k8s-vip:31000
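
Once the curl test above returns the nginx welcome page, the test resources can be removed again (optional cleanup, not part of the original guide):

# delete the test pod and service created above
kubectl delete service nginx
kubectl delete pod nginx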

[Optional] Install the management UI kubernetes-dashboard

  • kubernetes-dashboard is recommended for managing the applications deployed in kubernetes

  • To learn more, read the README.md in the official code repository: kubernetes-dashboard

  • Installation:

# Install kubernetes-dashboard
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.5.1/aio/deploy/recommended.yaml

# Change the kubernetes-dashboard service to expose its port via a nodePort
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
spec:
  ports:
  - port: 443
    protocol: TCP
    targetPort: 8443
    nodePort: 30000
  selector:
    k8s-app: kubernetes-dashboard
  type: NodePort
EOF

# Create the admin serviceaccount
kubectl create serviceaccount -n kube-system admin-user --dry-run=client -o yaml | kubectl apply -f -

# Create the admin clusterrolebinding
kubectl create clusterrolebinding admin-user --clusterrole=cluster-admin --serviceaccount=kube-system:admin-user --dry-run=client -o yaml | kubectl apply -f -

# Manually create the secret for the serviceaccount
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: admin-user-secret
  namespace: kube-system
  annotations:
    kubernetes.io/service-account.name: admin-user
type: kubernetes.io/service-account-token
EOF

# Get the kubernetes admin token
kubectl -n kube-system get secret admin-user-secret -o jsonpath='{ .data.token }' | base64 -d

# Open kubernetes-dashboard in a browser: https://k8s-vip:30000
# Log in to kubernetes-dashboard with the kubernetes admin token

[Optional] Install the ingress controller traefik

  • To use the kubernetes ingress feature, an ingress controller must be installed; traefik is recommended

  • To learn more, read the official website documentation: traefik

  • Deploy traefik on all kubernetes master nodes:

# Add the traefik helm repo
helm repo add traefik https://traefik.github.io/charts
helm fetch traefik/traefik --untar

# Deploy traefik as a daemonset
cat << EOF > traefik.yaml
deployment:
  kind: DaemonSet
image:
  name: traefik
  tag: v2.6.3
ports:
  web:
    hostPort: 80
  websecure:
    hostPort: 443
service:
  type: ClusterIP
EOF

# Install traefik
kubectl create namespace traefik --dry-run=client -o yaml | kubectl apply -f -
helm install -n traefik traefik traefik/ -f traefik.yaml

# Check the installation
helm -n traefik list
kubectl -n traefik get pods -o wide
kubectl -n traefik get services -o wide

# Verify that traefik was installed successfully; an output of 404 page not found means success
curl k8s-vip
curl -k https://k8s-vip
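
To see traefik actually routing traffic, the sketch below defines an Ingress for a Service named nginx on port 80 (for example the test service created earlier, if it is still present). The host name test.example.com and the ingress class annotation are assumptions, so adapt them to your environment:

# create a test ingress routed through traefik
cat << EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx-test
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
  - host: test.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: nginx
            port:
              number: 80
EOF

# traefik listens on hostPort 80, so a request with the matching Host header should reach the nginx service
curl -H "Host: test.example.com" http://k8s-vip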

[Optional] Install the performance metrics collector metrics-server

# Pull the image
docker pull registry.aliyuncs.com/google_containers/metrics-server:v0.6.1
docker tag registry.aliyuncs.com/google_containers/metrics-server:v0.6.1 k8s.gcr.io/metrics-server/metrics-server:v0.6.1

# Download the metrics-server installation yaml
curl -O -L https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.6.1/components.yaml
# Add the --kubelet-insecure-tls argument
sed -i 's/- args:/- args:\n        - --kubelet-insecure-tls/g' components.yaml
# Install metrics-server
kubectl apply -f components.yaml

# Wait for metrics-server to become ready
kubectl -n kube-system get pods -l=k8s-app=metrics-server

# View node metrics
kubectl top nodes
NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
k8s-master01   146m         7%     2284Mi          59%       
k8s-master02   123m         6%     2283Mi          59%       
k8s-master03   114m         5%     2180Mi          57%       
  • After metrics-server is installed, kubernetes-dashboard can also display performance data
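
For example, pod-level metrics become available as well:

# per-pod cpu and memory usage across all namespaces
kubectl top pods -A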

[Optional] Install the service mesh istio

  • To use the service mesh's hybrid canary release capability, the istio service mesh must be deployed
  • To learn more, read the official istio documentation: istio.io
# Install istioctl; the client can be downloaded from https://github.com/istio/istio/releases/tag/1.18.2

# Download and install istioctl
wget https://github.com/istio/istio/releases/download/1.18.2/istioctl-1.18.2-linux-amd64.tar.gz
tar zxvf istioctl-1.18.2-linux-amd64.tar.gz
mv istioctl /usr/bin/

# Confirm the istioctl version
istioctl version

# Deploy istio to kubernetes with istioctl
istioctl install --set profile=demo \
--set values.gateways.istio-ingressgateway.type=ClusterIP \
--set values.global.imagePullPolicy=IfNotPresent \
--set values.global.proxy_init.resources.limits.cpu=100m \
--set values.global.proxy_init.resources.limits.memory=100Mi \
--set values.global.proxy.resources.limits.cpu=100m \
--set values.global.proxy.resources.limits.memory=100Mi

# Check the istio deployment
kubectl -n istio-system get pods,svc
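
After the istio control plane is running, sidecar injection is enabled per namespace; a minimal example (using the default namespace here is just an illustration):

# label a namespace so that new pods started in it get the istio sidecar injected
kubectl label namespace default istio-injection=enabled --overwrite

# confirm the label
kubectl get namespace default --show-labels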

[Optional] Application cloud-onboarding engine Dory-Engine

🚀🚀🚀 Dory-Engine platform engineering best practices (https://www.bilibili.com/video/BV1oM4y117Pj/)

  • Dory-Engine is a very simple application cloud-onboarding engine: without learning, writing or configuring anything, developers can take the programs they write from source code through build, packaging and deployment into all kinds of k8s environments or host environments on their own.
  1. Nothing to learn: no need to learn how to write complex deployment scripts or how to deploy applications to k8s; every setting is what-you-see-is-what-you-get and understandable at a glance
  2. Nothing to write: no need to write complex build, package and deploy scripts, nor complex k8s application deployment files; a few simple settings are enough to set up your own delivery pipeline
  3. Nothing to configure: no need to configure how the DevOps toolchains and k8s environments cooperate to bring an application to the cloud; as soon as a project is created, all toolchains and environments are configured automatically

🚀🚀🚀 Install and deploy Dory-Engine with dorycli (https://www.bilibili.com/video/BV1aG411D7Sj/)

kubeadm-ha's People

Contributors

cookeem


kubeadm-ha's Issues

What is istio used for?

Hello, what is istio used for? I have not used it before; can I skip installing it?
I got the following errors while installing it:

horizontalpodautoscaler.autoscaling/istio-pilot created
service/jaeger-query created
service/jaeger-collector created
service/jaeger-agent created
service/zipkin created
service/tracing created
mutatingwebhookconfiguration.admissionregistration.k8s.io/istio-sidecar-injector created
unable to recognize "istio/istio-demo.yaml": no matches for kind "Gateway" in version "networking.istio.io/v1alpha3"
unable to recognize "istio/istio-demo.yaml": no matches for kind "attributemanifest" in version "config.istio.io/v1alpha2"
unable to recognize "istio/istio-demo.yaml": no matches for kind "attributemanifest" in version "config.istio.io/v1alpha2"
unable to recognize "istio/istio-demo.yaml": no matches for kind "stdio" in version "config.istio.io/v1alpha2"
unable to recognize "istio/istio-demo.yaml": no matches for kind "logentry" in version "config.istio.io/v1alpha2"
unable to recognize "istio/istio-demo.yaml": no matches for kind "logentry" in version "config.istio.io/v1alpha2"
unable to recognize "istio/istio-demo.yaml": no matches for kind "rule" in version "config.istio.io/v1alpha2"
unable to recognize "istio/istio-demo.yaml": no matches for kind "rule" in version "config.istio.io/v1alpha2"
unable to recognize "istio/istio-demo.yaml": no matches for kind "metric" in version "config.istio.io/v1alpha2"
unable to recognize "istio/istio-demo.yaml": no matches for kind "metric" in version "config.istio.io/v1alpha2"
unable to recognize "istio/istio-demo.yaml": no matches for kind "metric" in version "config.istio.io/v1alpha2"
unable to recognize "istio/istio-demo.yaml": no matches for kind "metric" in version "config.istio.io/v1alpha2"
unable to recognize "istio/istio-demo.yaml": no matches for kind "metric" in version "config.istio.io/v1alpha2"
unable to recognize "istio/istio-demo.yaml": no matches for kind "metric" in version "config.istio.io/v1alpha2"
unable to recognize "istio/istio-demo.yaml": no matches for kind "prometheus" in version "config.istio.io/v1alpha2"
unable to recognize "istio/istio-demo.yaml": no matches for kind "rule" in version "config.istio.io/v1alpha2"
unable to recognize "istio/istio-demo.yaml": no matches for kind "rule" in version "config.istio.io/v1alpha2"
unable to recognize "istio/istio-demo.yaml": no matches for kind "kubernetesenv" in version "config.istio.io/v1alpha2"
unable to recognize "istio/istio-demo.yaml": no matches for kind "rule" in version "config.istio.io/v1alpha2"
unable to recognize "istio/istio-demo.yaml": no matches for kind "rule" in version "config.istio.io/v1alpha2"
unable to recognize "istio/istio-demo.yaml": no matches for kind "kubernetes" in version "config.istio.io/v1alpha2"
unable to recognize "istio/istio-demo.yaml": no matches for kind "DestinationRule" in version "networking.istio.io/v1alpha3"
unable to recognize "istio/istio-demo.yaml": no matches for kind "DestinationRule" in version "networking.istio.io/v1alpha3"

Grafana import does not work with a dashboard ID from grafana.com

Everything is OK, but the Grafana import does not work. Error: {"data":"","status":502,"config":{"method":"GET","transformRequest":[null],"transformResponse":[null],"jsonpCallbackParam":"callback","url":"api/gnet/dashboards/2","retry":0,"headers":{"X-Grafana-Org-Id":1,"Accept":"application/json, text/plain, /"}},"statusText":"Bad Gateway","xhrStatus":"complete","isHandled":true}

Cannot connect to the dashboard

Hello, I have finished installing all the components, as shown below:
[root@k8s-master01 ~]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-node-kwz9t 2/2 Running 0 3h
calico-node-rj8p8 2/2 Running 0 3h
calico-node-xfsg5 2/2 Running 0 3h
coredns-777d78ff6f-4rcsb 1/1 Running 0 4h
coredns-777d78ff6f-7xqzx 1/1 Running 0 4h
etcd-k8s-master01 1/1 Running 0 4h
etcd-k8s-master02 1/1 Running 0 4h
etcd-k8s-master03 1/1 Running 9 3h
heapster-5874d498f5-q2gzx 1/1 Running 0 13m
kube-apiserver-k8s-master01 1/1 Running 0 3h
kube-apiserver-k8s-master02 1/1 Running 0 3h
kube-apiserver-k8s-master03 1/1 Running 1 3h
kube-controller-manager-k8s-master01 1/1 Running 0 3h
kube-controller-manager-k8s-master02 1/1 Running 1 3h
kube-controller-manager-k8s-master03 1/1 Running 0 3h
kube-proxy-4cjhm 1/1 Running 0 4h
kube-proxy-lkvjk 1/1 Running 2 4h
kube-proxy-m7htq 1/1 Running 0 4h
kube-scheduler-k8s-master01 1/1 Running 2 4h
kube-scheduler-k8s-master02 1/1 Running 0 4h
kube-scheduler-k8s-master03 1/1 Running 2 3h
kubernetes-dashboard-7954d796d8-2k4hx 1/1 Running 0 7m
metrics-server-55fcc5b88-r8v5j 1/1 Running 0 13m
monitoring-grafana-9b6b75b49-4zm6d 1/1 Running 0 45m
monitoring-influxdb-655cd78874-lmz5l 1/1 Running 0 45m

The node information is as follows:
[root@k8s-master01 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready master 4h v1.11.1
k8s-master02 Ready master 4h v1.11.1
k8s-master03 Ready master 4h v1.11.1
Accessing the dashboard on port 30000 via curl gives the following:
[root@k8s-master01 ~]# curl -k https://k8s-master-lb:30000
<!doctype html> <title ng-controller="kdTitle as $ctrl" ng-bind="$ctrl.title()"></title> <script src="static/vendor.bd425c26.js"></script> <script src="api/appConfig.json"></script> <script src="static/app.b5ad51ac.js"></script>

But the browser cannot open this page. Do you know what the reason might be?

The dashboard pod logs are as follows:
[root@k8s-master01 ~]# kubectl logs kubernetes-dashboard-7954d796d8-2k4hx -n kube-system
2018/10/31 10:19:21 Starting overwatch
2018/10/31 10:19:21 Using in-cluster config to connect to apiserver
2018/10/31 10:19:21 Using service account token for csrf signing
2018/10/31 10:19:21 No request provided. Skipping authorization
2018/10/31 10:19:21 Successful initial request to the apiserver, version: v1.11.1
2018/10/31 10:19:21 Generating JWE encryption key
2018/10/31 10:19:21 New synchronizer has been registered: kubernetes-dashboard-key-holder-kube-system. Starting
2018/10/31 10:19:21 Starting secret synchronizer for kubernetes-dashboard-key-holder in namespace kube-system
2018/10/31 10:19:22 Initializing JWE encryption key from synchronized object
2018/10/31 10:19:22 Creating in-cluster Heapster client
2018/10/31 10:19:22 Auto-generating certificates
2018/10/31 10:19:22 Successfully created certificates
2018/10/31 10:19:22 Serving securely on HTTPS port: 8443
2018/10/31 10:19:22 Successful request to heapster
2018/10/31 10:23:50 http: TLS handshake error from 172.168.0.1:55803: tls: first record does not look like a TLS handshake
2018/10/31 10:25:04 http: TLS handshake error from 172.168.0.1:55814: tls: first record does not look like a TLS handshake
2018/10/31 10:28:34 http: TLS handshake error from 172.168.0.1:55888: tls: first record does not look like a TLS handshake

Check keepalived work

Hello, how can I check that keepalived works?
I ping the virtual IP from one master node and get:
ping 10.10.61.154
PING 10.10.61.154 (10.10.61.154) 56(84) bytes of data.
From 10.10.61.211 icmp_seq=1 Destination Host Unreachable
From 10.10.61.211 icmp_seq=2 Destination Host Unreachable
So it doesn't work.
How should the router be configured for keepalived to work correctly?
I also use a proxy server to access internet addresses.
On another setup it works fine; there is no proxy there and the router is simpler.

A small request

Could you please share how to use Jenkins on a k8s cluster for continuous integration and builds, and then automatically release the code to production?

Heapster doesn't work

Hello, first of all thanks for your HOWTO, it made it possible to easily create an HA k8s cluster in about 1 hour! I've used the 1.7 version, just replacing the 1.7.0 components with the latest minor version (1.7.8); everything works fine apart from Heapster. There's no way to get the Dashboard to display anything, and I can't see any useful message in the logs. What could it be?

k8s 1.12.x

Hi, do you plan to upgrade this wonderful code to k8s version 1.12.x?

I really appreciate your sharing this code; it helped me set up k8s with HA.

Thank you very much

exited due to signal 15 when check keepalived status

When I checked the keepalived status, I found it does not work. I use an external VIP 10.159.222.x k8s-master-lb, and the external VIP is able to point to the 3 masters. I configured the 3 masters successfully, but it seems keepalived is not working correctly. Could you please help confirm the questions below:

  1. Can the kubeadm-ha solution use an external VIP?
  2. For the pre-assigned VIP in your case, 192.168.20.10 / k8s-master-lb, who creates it and how is it combined with the IPs and hosts of the 3 masters?

$ systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2019-01-31 15:16:05 WET; 2h 9min ago
Process: 14343 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 14344 (keepalived)
Tasks: 6
Memory: 5.7M
CGroup: /system.slice/keepalived.service
├─14344 /usr/sbin/keepalived -D
├─14345 /usr/sbin/keepalived -D
├─14346 /usr/sbin/keepalived -D
├─20865 /usr/sbin/keepalived -D
├─20866 /bin/bash /etc/keepalived/check_apiserver.sh
└─20871 sleep 5

Jan 31 17:25:16 k8s-master02 Keepalived_vrrp[14346]: /etc/keepalived/check_apiserver.sh exited due to signal 15
Jan 31 17:25:21 k8s-master02 Keepalived_vrrp[14346]: /etc/keepalived/check_apiserver.sh exited due to signal 15
Jan 31 17:25:26 k8s-master02 Keepalived_vrrp[14346]: /etc/keepalived/check_apiserver.sh exited due to signal 15
Jan 31 17:25:31 k8s-master02 Keepalived_vrrp[14346]: /etc/keepalived/check_apiserver.sh exited due to signal 15
Jan 31 17:25:36 k8s-master02 Keepalived_vrrp[14346]: /etc/keepalived/check_apiserver.sh exited due to signal 15
Jan 31 17:25:41 k8s-master02 Keepalived_vrrp[14346]: /etc/keepalived/check_apiserver.sh exited due to signal 15
Jan 31 17:25:46 k8s-master02 Keepalived_vrrp[14346]: /etc/keepalived/check_apiserver.sh exited due to signal 15
Jan 31 17:25:51 k8s-master02 Keepalived_vrrp[14346]: /etc/keepalived/check_apiserver.sh exited due to signal 15
Jan 31 17:25:56 k8s-master02 Keepalived_vrrp[14346]: /etc/keepalived/check_apiserver.sh exited due to signal 15
Jan 31 17:26:01 k8s-master02 Keepalived_vrrp[14346]: /etc/keepalived/check_apiserver.sh exited due to signal 15

Worker nodes not visible via kubectl get node on VMs with several network interfaces

Good day,

First of all, thank you for thorough guide. It helped me a lot!

Problem description
OS: CentOS Linux release 7.4.1708 (Core)

I'm using VMs created by vagrant and by default have several network interfaces

eth0:  flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.2.15  netmask 255.255.255.0  broadcast 10.0.2.255
        inet6 fe80::5054:ff:fead:3b43  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:ad:3b:43  txqueuelen 1000  (Ethernet)
        RX packets 141997  bytes 158528777 (151.1 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 40171  bytes 2511712 (2.3 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.120.10  netmask 255.255.255.0  broadcast 192.168.120.255
        inet6 fe80::a00:27ff:fe13:acfa  prefixlen 64  scopeid 0x20<link>
        ether 08:00:27:13:ac:fa  txqueuelen 1000  (Ethernet)
        RX packets 127504  bytes 39089921 (37.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 109964  bytes 14854427 (14.1 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

master1 has ip 192.168.120.10
master2 - 192.168.120.82
master3 - 192.168.120.83

Following step

on devops-master01: use kubeadm to init a kubernetes cluster, notice: you must save the following message: kubeadm join --token XXX --discovery-token-ca-cert-hash YYY , this command will use lately.

I receive the join command on the eth0 network interface:
kubeadm join --token 7f276c.0741d82a5337f526 10.0.2.15:6443 --discovery-token-ca-cert-hash sha256:c1c15936be9b5c4429cf14074706927a410a150ccb334d6823257cd450f2fe42

Adding worker nodes with this command corrupts the result of kubectl get node: only the master nodes are shown.

My solution
If I edit kubeadm-init.yaml and add advertiseAddress, everything works fine and kubectl get node also returns the worker nodes.

Kubeadm-init.yaml example:

apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
api:
  advertiseAddress: 192.168.120.83
kubernetesVersion: v1.9.1
networking:
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
apiServerCertSANs:
- kuber.master
- kuber.master2
- kuber.master3
- 192.168.120.10
- 192.168.120.82
- 192.168.120.83
- 192.168.120.2
- 127.0.0.1
etcd:
  endpoints:
  - http://192.168.120.10:2379
  - http://192.168.120.82:2379
  - http://192.168.120.83:2379
token: 7f276c.0741d82a5337f526
tokenTTL: "0"

Questions

  1. Am I doing something wrong?
  2. Maybe we should add advertiseAddress to the guide / your project workflow?

i have a problem "add etcd member to the cluster"

Hello~
I have a question about "add etcd member to the cluster".

My kubernetes version is 1.12.1
and the os is ubuntu 16.0.4.

I ran
kubectl exec -n kube-system etcd-${CP0_HOSTNAME} -- etcdctl --ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/tcd/peer.crt --key-file /etc/kubernetes/pki/etcd/peer.key --endpoints=https://${CP0_IP}:2379 member add ${CP1_HOSTNAME} https://${CP1_IP}:2380

but I got
Unable to connect to the server: dial tcp 178.128.174.200:6443: i/o timeout

this error. firewalld was completely down and I had already copied the cert files.

Please help me, thank you.

Hello, about flannel and calico

In the official documentation, only one network plugin is normally needed for kubeadm init.
Here both flannel and calico are used. Is it correct to understand that flannel provides the network between the master nodes, while calico provides the network between cluster nodes?
Thanks!

Language Preference

How can I select english in K8s deployment while I follow instructions using your repo.

Pods stuck in ContainerCreating

A single-master environment works without any problems, but when deploying multiple masters the containers cannot be created.

coredns-65dcdb4cf-4vxs4                  0/1       ContainerCreating   0          1h
dacc-d66dfdcc5-m5hcl                     0/1       ContainerCreating   0          42m
eacc-6d9ccfd9b7-kdjfs                    0/1       ContainerCreating   0          42m
Events:
  Type     Reason                  Age                 From               Message
  ----     ------                  ----                ----               -------
  Warning  FailedScheduling        34m (x37 over 44m)  default-scheduler  0/3 nodes are available: 3 PodToleratesNodeTaints.
  Normal   SuccessfulMountVolume   32m                 kubelet, node1     MountVolume.SetUp succeeded for volume "default-token-vh9f5"
  Normal   SandboxChanged          25m (x12 over 31m)  kubelet, node1     Pod sandbox changed, it will be killed and re-created.
  Warning  FailedCreatePodSandBox  1m (x58 over 31m)   kubelet, node1     Failed create pod sandbox.

I roughly followed your approach, with some differences: HAProxy is used as the LB.

The three masters use the same root certificate ca.crt and ca.key, while the other certificates were generated on each node. kubeadm init was executed on all three masters.

[root@node1 ~]# kubectl get node
NAME      STATUS    ROLES     AGE       VERSION
master1   Ready     master    1h        v1.9.1
master2   Ready     master    1h        v1.9.1
master3   Ready     master    1h        v1.9.1
node1     Ready     <none>    34m       v1.9.1

kubeadm configuration file:

apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
apiServerCertSANs:
- 172.31.244.231
- 172.31.244.232
- 172.31.244.233
- 172.31.244.234
- master1
- master2
- master3
- node1
- 47.75.1.72

etcd:
  endpoints:
  - http://172.31.244.232:2379

apiServerExtraArgs:
  endpoint-reconciler-type: lease

networking:
  podSubnet: 192.168.0.0/16
kubernetesVersion: v1.9.1
featureGates:
  CoreDNS: true

I don't know where the problem is.

Add configuring network interface for flannel

According to official flannel documentation there is a known issue with flannel and vagrant:

Vagrant typically assigns two interfaces to all VMs. The first, for which all hosts are assigned the IP address 10.0.2.15, is for external traffic that gets NATed.

This may lead to problems with flannel. By default, flannel selects the first interface on a host. This leads to all hosts thinking they have the same public IP address. To prevent this issue, pass the --iface eth1 flag to flannel so that the second interface is chosen.

Can you please add network interface options for flannel in your cannal.yaml?

Hello, a question about K8SHA_CALICO_REACHABLE_IP

calico reachable ip address

export K8SHA_CALICO_REACHABLE_IP=192.168.60.1
For this step, is the IP 192.168.60.1 supposed to be our server's gateway address, or is it an IP used by calico itself that is unrelated to the intranet environment?

calico authentication error to access apiserver

Hi sir,
I just tried your newest updates based on canal. However, I got stuck at the deployment of canal. I find that calico-node tries to access 10.96.0.1:443 (I think that is the apiserver). Then I see the apiserver print lots of authentication errors like "Unable to authenticate the request due to an error". I tried to delete all the secrets to make them regenerate, but it doesn't work. Have you suffered from the same trouble, or do you have any experience handling this?

Besides, I also want to ask you a question about the clean removal of kubernetes. Actually, I tried "kubeadm reset" for everything (I drained and deleted the node first) and then followed your commands to remove the files. However, I still find the calico pods initialized automatically after kubeadm init. Do you have any ideas about this?

Thanks,
Augustin

How is istio used?

Thank you very much for this project, it is very powerful. By combining the 1.11 and 1.9 tutorials in the project I managed to set up a single-master 1.11 cluster on my own.
I would like to ask how the project owner uses istio in real work. @cookeem

apiserver.ext does not exist

I am trying out v1.7. In the section to create certificates, I'm asked to edit this file: apiserver.ext

It does not exist either in current directory, or on the filesystem at all. Something is amiss.

KUBECONFIG settings cannot be used in general user account

If the user runs under a regular (non-root) account, I suggest running the following commands instead of "export KUBECONFIG=/etc/kubernetes/admin.conf".

sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
export KUBECONFIG=$HOME/.kube/config

The cluster becomes unavailable after master-2 joins the etcd cluster

As the title says.
I am using k8s version 1.11.2 and following the 1.11.1 steps one by one. Every time master2 joins the etcd cluster, the whole service becomes unavailable: kubectl stops responding. On master1, docker ps shows the etcd container failing and restarting over and over. After I removed etcd.yaml from the /etc/kubernetes/manifests/ directory on master2, etcd on master1 returned to normal after restarting, but then kubectl reports errors for requests to :6443. Has all the etcd data been lost? What should I do in this situation?

Question about the calico network plugin

I remember that an earlier version of your deployment documentation deployed two network stacks. I would like to ask what cleanup work is needed when changing the CNI network plugin. I currently want to switch from flannel to calico: master01 starts normally, but the calico nodes on master02 and master03 do not start.
The logs show that the br-**** bridges calico creates on the master nodes all use the 172.18.0.0/16 address range, which prevents master02 and master03 from starting. Is there any way to solve this?

i have a problem about nginx proxy

Hello~
Your advice was very helpful.

My kubernetes version is 1.12.1
and the os is ubuntu 16.0.4.

I finished all the kubernetes HA setup and
installed the dashboard,
so my url was http://VIP:30000

but it could not be reached.
On my server curl -k http://VIP:16443 works,
but from my local machine it does not, even though I registered it in my hosts file.

Please help me, thank you.
Thank you~!

Question on Virtual IP

In your post you mentioned the below steps for workers.
on all kubernetes worker nodes: set the /etc/kubernetes/bootstrap-kubelet.conf server settings, make sure this settings use the keepalived virtual IP and nginx load balancer port (here is: https://192.168.20.10:16443)

Does keepalived need to be installed on the workers too? Without that, how would the workers reach the virtual IP?

can I use 127.0.0.1 or a domain when keepalived is not an option?

In my case, we cannot use keepalived. Therefore we have to use other ways.
I can only think of two ways to get it done.

  1. Every node starts an nginx that load-balances the three api servers, and the node just joins via 127.0.0.1:8443 (8443 is the nginx port that forwards to the real api servers).
  2. Use a domain such as k8s.mycompany.com.

I only tested the first one, but it didn't work well.
I don't know if I am missing something in the configuration.

Can you give some advices on this ?
Any tips are appreciated.
Thank you very much.

v1.11.1 installation problem: pod creation fails

@cookeem Hi, I have been following your k8s installation tutorials. I tried the v1.11.1-ha setup, and the whole series of steps up to keepalived and nginx-lb succeeded, which is great! I solved some problems along the way myself, but when deploying metrics-server and the dashboard the creation fails and no podIP is assigned. I don't know the reason. I have attached screenshots of the relevant logs. I suspected the firewall, so I stopped the firewall on all 3 masters, but the problem remains.

(log screenshots 1, 2 and 3)

Calico and flanneld

Hello,

I really liked the post; I'm implementing it and have some doubts.

Why did you use Calico and flanneld?

What is the function of each of them in the cluster?

I imagined they had the same function.

Thanks

Q&A

Hello,

I have a question. Though this is not related to Kube-HA, since you are the expert, I would like to ask:
Do we not need to set up an Ingress Controller?
What kind of features would an Ingress Controller bring to this HA setup?

Thanks,
Rock

nodes are not joined

Using v1.7, nodes are not joined. I scp /etc/kubernetes to the other masters, then run systemctl daemon-reload && systemctl restart kubelet followed by systemctl status kubelet. It is running; however, only the initial node shows up. Should we not be using the kubeadm join command around this point?

Any ubuntu version ????

Hello,
very good job!
Is there any ubuntu (or debian based) version of this script ??

Best

k8s HA cluster setup

@cookeem
I have a few questions about creating a k8s HA cluster using kubeadm.

  1. In your instruction, you mentioned that starting from version v1.7.0,
    kubernetes uses NodeRestriction admission control that prevents other master from
    joining the cluster.
    As a work around, you reset the kube-apiserver's admission-control settings to
    the v1.6.x recommended config.

So, did you figure out how to make it work with NodeRestriction admission control?

  2. It appears to me that your solution works. I also noticed there has been
    some work to make kubeadm HA available in 1.9: kubernetes/kubeadm#261
    Do you know exactly how your HA setup is different from the one being working on there?

  3. I also notice there is another approach for creating a k8s HA cluster:
    https://github.com/kubernetes-incubator/kubespray/blob/master/docs/ha-mode.md
    Just curious how you would compare this approach with yours. Any thoughts?
    Thank you for your time.

kubectl exec always timeout

I'm using 1.7.x to set up HA k8s. Everything is fine except that I always get a timeout when running kubectl exec -it <pod-name> -- /bin/sh. The errors look like:

kubectl -n kube-system exec -it kube-scheduler-app-web39v33 -- /bin/sh
Error from server: error dialing backend: dial tcp 10.83.1.1:10250: getsockopt: connection timed out

A newly created cluster reports that the certificate has expired

Hello, my newly created cluster was still fine in the morning, but in the afternoon it started reporting:

[root@k8s-master01 ~]# kubectl get po
No resources found.
Unable to connect to the server: x509: certificate has expired or is not yet valid

Then I tried to regenerate the certificates with kubeadm alpha phase certs all --config /root/kubeadm-config.yaml, but that also reports an error:

[root@k8s-master01 ~]# kubeadm alpha phase certs all --config /root/kubeadm-config.yaml
[endpoint] WARNING: port specified in api.controlPlaneEndpoint overrides api.bindPort in the controlplane address
failure loading ca certificate: the certificate is not valid yet

Online sources say this is a clock synchronization problem, but my clocks are synchronized. Have you encountered this before? Or how can I replace the certificates?

Testing k8s ha configuration by shutting down the first k8s master node

@cookeem, I followed your instructions and was able to deploy an HA Kubernetes cluster (with 3 k8s master nodes and 2 k8s nodes) using Kubernetes version 1.8.1.
Everything seems working just like you described in instruction.

Next, I focused on testing the high availablity configuration. To do so, I attempted to shutdown the first k8s master. Once the first k8s master is brought down, the keepalived service on this node stopped and the virtual IP address transferred to the second k8s master. However, things start falling apart :(

Specifically, on the second (or third) master, when running the command: 'kubectl get nodes', the output shows something like the following:

NAME STATUS ROLES ...
k8s-master1 NotReady master ...
k8s-master2 Ready ...
k8s-master3 Ready ...
k8s-node1 Ready ...
k8s-node2 Ready ...

Also, on k8s-master2 or k8s-master3, when I ran 'kubectl logs' to check controller-manager and
scheduler, it appeared they did NOT reelect a new leader. As a result, all of the kubernetes services that were exposed before were no longer accessible.

Do you have any idea why the reelection process did NOT occur for the controller-manager and
scheduler on the remaining k8s master nodes?

Kubelet service is down

When I was trying to bring up the server, it shows as down.

error: unable to load client CA file /etc/kubernetes/pki/ca.crt: open /etc/kubernetes/pki/ca.crt: no such file or directory

Are there any solutions to start the kubelet service?

verify installation

Hi cookeem !

In the last step, deploying an nginx application to verify the installation, I had to use a yml file to create the Pod and Service.

I needed to configure hostNetwork in the Deployment to be able to query the example (see: https://github.com/projectcalico/felix/issues/1361)

Without "hostNetwork: true", the curl command always times out.

This is my sample:

apiVersion: v1
kind: Service
metadata:
  name: my-nginx
  labels:
    app: my-nginx
spec:
  type: NodePort
  ports:
  - port: 80
  selector:
    app: my-nginx
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: my-nginx
  labels:
    app: my-nginx
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: my-nginx
    spec:
      hostNetwork: true
      containers:
      - name: nginx
        image: nginx
        ports:
          - containerPort: 80

nginx-lb.conf

Hello,

I am really excited to find this post.
One thing I found that I think is an error.

/nginx-lb/docker-compose.yaml

volumes:
- /root/kube-yaml/nginx-lb/nginx-lb.conf:/etc/nginx/nginx.conf

I think "kube-yaml" directory name is hard coded. I have this file at

/root/kubeadm-ha/nginx-lb/nginx-lb.conf

Initialization anomaly

The init configuration file sets the podSubnet:
networking:
podSubnet: 10.244.0.0/16

But the initialization logs show the following:
Feb 7 08:23:46 master1 kubelet: I0207 08:23:46.268481 6924 kuberuntime_manager.go:918] updating runtime config through cri with podcidr 10.244.0.0/24
Feb 7 08:23:46 master1 kubelet: I0207 08:23:46.268679 6924 docker_service.go:343] docker cri received runtime config &RuntimeConfig{NetworkConfig:&NetworkConfig{PodCidr:10.244.0.0/24,},}
Feb 7 08:23:46 master1 kubelet: I0207 08:23:46.268838 6924 kubelet_network.go:196] Setting Pod CIDR: -> 10.244.0.0/24

What is going on here? When I deployed the cluster earlier, I ran into flannel failing to start, and unexpectedly found the log messages shown above. This makes flannel's Network setting inconsistent with the pod CIDR, which I suspect is why flannel failed to start; after changing flannel's Network to match, it started successfully.

couldn't parse external etcd version

I did a kubeadm init --config=kubeadm-init.yaml and got the below error while initialising.

[ERROR ExternalEtcdVersion]: couldn't parse external etcd version "": Version string empty

ip address

In the hosts list, the masters have IP addresses 192.168.20.20 ~ 22 and the VIP has the IP 192.168.20.10, but in the create-config.sh script you write this:
export K8SHA_VIP=192.168.60.79

master01 ip address

export K8SHA_IP1=192.168.60.72

master02 ip address

export K8SHA_IP2=192.168.60.77

master03 ip address

export K8SHA_IP3=192.168.60.78
I think there's something wrong.

After a node joins the cluster, the api address in /etc/kubernetes/kubelet.conf needs to be changed to the HA IP

Only kube-proxy points to the HA IP:
root@ubuntu:/etc/kubernetes# netstat -alnp |grep 6443
tcp 0 0 0.0.0.0:6443 0.0.0.0:* LISTEN 1337/haproxy
tcp 0 0 192.168.1.148:33958 192.168.4.130:6443 ESTABLISHED 22089/kube-proxy
tcp 0 0 192.168.1.148:43178 192.168.1.146:6443 ESTABLISHED 21876/kubelet

192.168.1.146 is the local IP, and 192.168.4.130 is my HA IP.
After changing the api address in /etc/kubernetes/kubelet.conf:

root@ubuntu:/etc/kubernetes# vim /etc/kubernetes/kubelet.conf
root@ubuntu:/etc/kubernetes# systemctl restart docker && systemctl restart kubelet
root@ubuntu:/etc/kubernetes# netstat -alnp |grep 6443
tcp 0 0 0.0.0.0:6443 0.0.0.0:* LISTEN 1337/haproxy
tcp 0 0 192.168.1.148:34212 192.168.4.130:6443 ESTABLISHED 22795/kubelet
tcp 0 0 192.168.1.148:34232 192.168.4.130:6443 ESTABLISHED 23008/kube-proxy
