mbert / kubeadm2ha
A set of scripts and documentation for adding redundancy (etcd cluster, multiple masters) to a cluster set up with kubeadm 1.8 and above
License: Apache License 2.0
"/root/join-worker-node.sh" - FATAL ERROR
(Ignorable, as this should perhaps be done as a prerequisite.)
[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables contents are not set to 1
echo "1" > /proc/sys/net/bridge/bridge-nf-call-iptables
This setting should also be made persistent across reboots.
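One possible way to persist the setting is a drop-in file under /etc/sysctl.d. The sketch below is an illustration, not part of the playbook: the function name and the file name 99-kubernetes-bridge.conf are hypothetical, and it assumes a distribution that reads /etc/sysctl.d at boot.

```shell
#!/usr/bin/env bash
# Sketch: persist net.bridge.bridge-nf-call-iptables=1 across reboots.
# The file name "99-kubernetes-bridge.conf" is a hypothetical choice.

write_bridge_sysctl() {
    # $1: target directory (defaults to /etc/sysctl.d)
    local dir="${1:-/etc/sysctl.d}"
    printf 'net.bridge.bridge-nf-call-iptables = 1\n' \
        > "${dir}/99-kubernetes-bridge.conf"
}

# Typical use as root on each node (left commented out on purpose):
#   modprobe br_netfilter   # on kernels where bridge netfilter is a separate module
#   write_bridge_sysctl
#   sysctl --system         # re-read all sysctl configuration files
```

On kernels that ship bridge netfilter as a separate module, the module would also have to be loaded at boot, otherwise the key does not exist when the file is applied.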
Hi (:
The default load-balancing strategy of nginx is round-robin (rr), so when a pod (something like kube-proxy) performs a Watch
action, it prints a lot of warning log messages like:
W0301 02:10:52.929987 1 reflector.go:341] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: watch of *core.Service ended with: very short watch: k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Unexpected watch close - watch lasted less than a second and no items received
How should this be dealt with? Or can it just be ignored?
Hi,
Why are you using different kubeadm-init.yaml.j2 template files for master / secondary? I just ran your playbook, and "endpoint-reconciler-type" should always be set for the apiserver (not only on the secondaries but also on the master):
apiServerExtraArgs:
  {% if KUBERNETES_VERSION | match('^1\.8') %}apiserver-count: "{{ groups['masters'] | length }}"{% else %}endpoint-reconciler-type: "lease"{% endif %}
You should always use the "global" templates/kubeadm-init.yaml.j2 file:
template/kubeadm-init.yaml.j2
"/root/join-worker-node.sh" - WARNING
(Ignorable, as this should perhaps be done as a prerequisite.)
[WARNING RequiredIPVSKernelModulesAvailable]: the IPVS proxier will not be used, because the following required kernel modules are not loaded: [ip_vs_wrr ip_vs_sh ip_vs ip_vs_rr]
Add:
modprobe -a ip_vs_wrr ip_vs_sh ip_vs ip_vs_rr
But also make them persistent.
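# Load the IPVS modules now.
- name: load ip_vs kernel modules
  modprobe: name={{ item }} state=present
  with_items:
    - ip_vs_wrr
    - ip_vs_rr
    - ip_vs_sh
    - ip_vs
# Persist them so they are loaded again at boot.
# Note: the copy module takes "dest", not "path".
- name: persist ip_vs kernel modules
  copy:
    dest: /etc/modules-load.d/ip_vs.conf
    content: |
      ip_vs_wrr
      ip_vs_rr
      ip_vs_sh
      ip_vs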
Inventory: there is no variable NGINX_TAG, only NGINX_VERSION.
You are doing some steps twice, which is not necessary (it looks like a copy-paste error to me):
You should join all worker (minion) nodes via the MASTER_VIP and not via the primary master IP:
join-token/templates/join-worker-node.sh.j2
kubeadm join --token {{ TOKEN.stdout }} {{ MASTER_VIP }}:6443 --discovery-token-ca-cert-hash sha256:{{ HASH.stdout }}
I am testing your scripts out and getting the following error:
TASK [nginx : Install nginx via package manager] ***********************************************************************************************************************************************************
fatal: [my-cluster-master-1]: FAILED! => {"changed": false, "failed": true, "msg": "No package matching 'nginx-1.12.2' found available, installed or updated", "rc": 126, "results": ["No package matching 'nginx-1.12.2' found available, installed or updated"]}
fatal: [my-cluster-master-2]: FAILED! => {"changed": false, "failed": true, "msg": "No package matching 'nginx-1.12.2' found available, installed or updated", "rc": 126, "results": ["No package matching 'nginx-1.12.2' found available, installed or updated"]}
fatal: [my-cluster-master-3]: FAILED! => {"changed": false, "failed": true, "msg": "No package matching 'nginx-1.12.2' found available, installed or updated", "rc": 126, "results": ["No package matching 'nginx-1.12.2' found available, installed or updated"]}
I noticed there is a nice option, "--print-join-command", which prints the complete join command:
kubeadm token create --print-join-command
I0914 14:11:14.396695 3948 feature_gate.go:230] feature gates: &{map[]}
kubeadm join 10.1.3.2:6443 --token aaaaaa.bbbbbbbbbbbbbbbb --discovery-token-ca-cert-hash sha256:bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
You may want to look at it for the join-token role, which is quite complicated now with the openssl option.
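As the sample output above shows, the command may print a log line before the join command itself. A minimal sketch of how a script could keep only the join line follows; the function name is hypothetical and this is not how the join-token role currently works.

```shell
#!/usr/bin/env bash
# Sketch: keep only the "kubeadm join ..." line from the output of
# `kubeadm token create --print-join-command`, dropping log lines
# such as the feature_gate.go message shown in the issue.

extract_join_command() {
    # Reads the command output on stdin and prints the first line
    # that starts with "kubeadm join ".
    grep -m1 '^kubeadm join '
}

# Typical use on a master (left commented out on purpose):
#   kubeadm token create --print-join-command | extract_join_command \
#       > /root/join-worker-node.sh
```

The printed command contains the API server address that kubeadm itself reports, so joining through a virtual IP would still require replacing that endpoint.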
If the playbook is run without root, the etcd role fails due to a lack of permissions when it tries to copy the certs from localhost to all etcd machines:
The reason: on unarchive, all files are owned by root (as they were on the primary etcd host), so a non-root user on the control machine (local_action) cannot read them.
● etcd.service - Etcd Server
Loaded: loaded (/usr/lib/systemd/system/etcd.service; enabled; vendor preset: disabled)
Active: failed (Result: start-limit) since Thu 2018-03-01 18:56:36 UTC; 26min ago
Process: 13849 ExecStart=/bin/bash -c GOMAXPROCS=$(nproc) /usr/bin/etcd --name="${ETCD_NAME}" --data-dir="${ETCD_DATA_DIR}" --listen-client-urls="${ETCD_LISTEN_CLIENT_URLS}" (code=exited, status=1/FAILURE)
Main PID: 13849 (code=exited, status=1/FAILURE)
Mar 01 18:56:35 localhost.localdomain systemd[1]: etcd.service: main process exited, code=exited, status...LURE
Mar 01 18:56:35 localhost.localdomain systemd[1]: Failed to start Etcd Server.
Mar 01 18:56:35 localhost.localdomain systemd[1]: Unit etcd.service entered failed state.
Mar 01 18:56:35 localhost.localdomain systemd[1]: etcd.service failed.
Mar 01 18:56:36 localhost.localdomain systemd[1]: etcd.service holdoff time over, scheduling restart.
Mar 01 18:56:36 localhost.localdomain systemd[1]: start request repeated too quickly for etcd.service
Mar 01 18:56:36 localhost.localdomain systemd[1]: Failed to start Etcd Server.
Mar 01 18:56:36 localhost.localdomain systemd[1]: Unit etcd.service entered failed state.
Mar 01 18:56:36 localhost.localdomain systemd[1]: etcd.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
But if I log in directly to the host and su to the etcd user:
-bash-4.2$ etcd --config-file=/etc/etcd/etcd.conf
2018-03-01 19:22:21.671961 I | etcdmain: Loading server configuration from "/etc/etcd/etcd.conf"
2018-03-01 19:22:21.672261 E | etcdmain: error verifying flags, error converting YAML to JSON: yaml: line 7: did not find expected <document start>. See 'etcd --help'.
etcd is still at 3.2.7:
etcd --version
etcd Version: 3.2.7
Git SHA: bb66589
Go Version: go1.8.3
Go OS/Arch: linux/amd64
I see this as an example:
https://github.com/coreos/etcd/blob/master/etcd.conf.yml.sample
Am I missing something? Should this be an environment file instead?
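The unit file quoted above passes settings as ${ETCD_...} variables, which suggests (an assumption, not confirmed in this thread) that /etc/etcd/etcd.conf is read as a systemd environment file rather than as the YAML that --config-file expects. A rough way to tell the two formats apart is sketched below; the function name and the labels it prints are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: guess whether a config file is in KEY=value (environment file)
# format or something else, such as YAML.

guess_etcd_conf_format() {
    # $1: path to the config file
    local file="$1"
    # Drop blank lines and comments, then look for any remaining line
    # that does NOT have the KEY=value shape.
    if grep -Ev '^[[:space:]]*(#|$)' "$file" \
        | grep -Evq '^[A-Za-z_][A-Za-z0-9_]*='; then
        echo "yaml-or-other"
    else
        # Every remaining line looks like KEY=value
        # (an empty file also ends up here).
        echo "env"
    fi
}

# Typical use (left commented out on purpose):
#   guess_etcd_conf_format /etc/etcd/etcd.conf
```

If the file turns out to be an environment file, passing it to etcd --config-file would be expected to fail with a YAML parsing error like the one shown above.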
prepare-nodes doesn't run swapoff or remove the swap entry from /etc/fstab. (Ignorable, as this should perhaps be done as a prerequisite.)
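A minimal sketch of what such a step could look like, assuming a conventional /etc/fstab in which swap entries carry the word "swap" as a whitespace-separated field. The function name is hypothetical and this is not taken from the prepare-nodes role.

```shell
#!/usr/bin/env bash
# Sketch: comment out active swap entries in an fstab-style file.

disable_swap_in_fstab() {
    # $1: path to the fstab file (defaults to /etc/fstab)
    local fstab="${1:-/etc/fstab}"
    # Skip lines that are already comments; prefix every other line that
    # contains "swap" as a whitespace-delimited field with "#".
    # A backup copy is kept next to the file with a .bak suffix.
    sed -E -i.bak \
        '/^[[:space:]]*#/!s/^(.*[[:space:]]swap[[:space:]].*)$/#\1/' \
        "$fstab"
}

# Typical use as root on each node (left commented out on purpose):
#   swapoff -a              # turn off swap for the running system
#   disable_swap_in_fstab   # keep it off after the next reboot
```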
Since the SSL certificates are copied to the etcd hosts, which can be, but need not be, the same as the master hosts, the client certificates will be unavailable to the K8s cluster if the etcd hosts are separate.
20-etcd-service-manager.conf is missing the "--cgroup-driver=systemd" flag.
BTW, FYI, starting with 1.11 this setting is handled automatically by kubeadm, but this step runs before kubeadm is called.
"kubeadm now detects the Docker cgroup driver and starts the kubelet with the matching driver. This eliminates a common error experienced by new users in when the Docker cgroup driver is not the same as the one set for the kubelet due to different Linux distributions setting different cgroup drivers for Docker, making it hard to start the kubelet properly. (#64347, @neolit123)"
The prepare-nodes cgroup driver part does not apply to this because:
a) 20-etcd-service-manager.conf overrides the 10-kubeadm.conf
b) the prepare-nodes code won't do anything any longer, as 10-kubeadm.conf no longer holds the "cgroup-driver" string.
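One way the kubelet flag could be derived before kubeadm runs is to ask Docker for its cgroup driver and build the flag from that. The sketch below assumes Docker is the container runtime; the function name is hypothetical and this is not part of the playbook.

```shell
#!/usr/bin/env bash
# Sketch: build the kubelet --cgroup-driver flag from a driver name,
# rejecting anything other than the two known drivers.

kubelet_cgroup_flag() {
    # $1: cgroup driver name as reported by the container runtime
    local driver="$1"
    case "$driver" in
        systemd|cgroupfs)
            printf -- '--cgroup-driver=%s\n' "$driver"
            ;;
        *)
            echo "unexpected cgroup driver: ${driver}" >&2
            return 1
            ;;
    esac
}

# Typical use on a node running Docker (left commented out on purpose):
#   kubelet_cgroup_flag "$(docker info --format '{{.CgroupDriver}}')"
```

The resulting flag would then have to be written into the kubelet drop-in that is actually in effect, which per point a) above is 20-etcd-service-manager.conf.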