k3s-io / k3s-ansible (License: Apache License 2.0)
Encountering an error when trying to create a cluster on CentOS 7.8.2003 on a Raspberry Pi.
Saturday 18 July 2020 18:37:32 -0700 (0:01:05.670) 0:01:48.058 *********
fatal: [192.168.x.y]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'description'\n\nThe error appears to be in '/home/someone/Codes/k3s-ansible/roles/raspbian/tasks/main.yml': line 2, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n---\n- name: Test for Raspbian\n ^ here\n"}
Ansible 2.9.10 on Fedora 32, with Python 3.8.3.
As pointed out in the official docs, Raspbian Buster needs to change to iptables-legacy for k3s:
https://rancher.com/docs/k3s/latest/en/advanced/#enabling-legacy-iptables-on-raspbian-buster
Should we add tasks for this?
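If memory serves, the linked doc boils down to flushing the tables and switching the alternatives, roughly:

sudo iptables -F
sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
sudo reboot

Turning those commands into idempotent tasks (iptables flush, alternatives, reboot handler) would seem to fit the existing raspbian role.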
The latest version is v1.17.5+k3s1.
The 'when' conditionals (shown below) in the raspbian role's tasks/main.yml generate a syntax error (see below). Updating them to 'raspbian == true' resolves it.
- name: Activating cgroup support
  lineinfile:
    path: /boot/cmdline.txt
    regexp: '^((?!.*\bcgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory\b).*)$'
    line: '\1 cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory'
    backrefs: true
  notify: reboot
  when: raspbian is true

- name: Flush iptables before changing to iptables-legacy
  iptables:
    flush: true
  when: raspbian
  changed_when: false  # iptables flush always returns changed

- name: Changing to iptables-legacy
  alternatives:
    path: /usr/sbin/iptables-legacy
    name: iptables
  register: ip4_legacy
  when: raspbian

- name: Changing to ip6tables-legacy
  alternatives:
    path: /usr/sbin/ip6tables-legacy
    name: ip6tables
  register: ip6_legacy
  when: raspbian
Syntax error:
"The conditional check 'raspbian is true' failed. The error was: template error while templating string: no test named 'true'. String: {% if raspbian is true %} True {% else %} False {% endif %}
fatal: [192.168.x.y]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'items' is undefined\n\nThe error appears to be in '/home/someone/Codes/k3s-ansible/roles/prereq/tasks/main.yml': line 33, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Set bridge-nf-call-iptables (just to be sure)\n ^ here\n"}
I am trying to run this on 3x Raspberry Pi 3, running CentOS 7.8.2003.
james@dragon:~/Downloads/k3s-ansible-master$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system helm-install-traefik-xbgdc 0/1 ContainerCreating 0 18m
kube-system local-path-provisioner-58fb86bdfd-hdddq 0/1 ContainerCreating 0 18m
kube-system metrics-server-6d684c7b5-xhcq9 0/1 ContainerCreating 0 18m
kube-system coredns-6c6bb68b64-f99km 0/1 ContainerCreating 0 18m
Warning FailedCreatePodSandBox 47s (x35 over 8m42s) kubelet, pine1 (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to mount rootfs component &{overlay overlay [workdir=/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/175/work upperdir=/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/175/fs lowerdir=/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1/fs]}: no such device: unknown
I was unable to find a /snapshots/175 and only found a /snapshots/1/.
I did a reset and re-deploy and had the same issue.
Version:
N/A
K3s arguments:
N/A
Describe the bug
When I run the ansible playbook with Ansible 2.9.6, I get a warning saying one of the group names in the inventory is invalid.
This seems to be related to this issue in Ansible's repository ansible/ansible-documentation#89
Regardless of the outcome of that issue, it might be best to convert the group name to use underscores, to prevent this warning and ensure the playbook runs properly.
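For illustration, a renamed inventory group might look like this in hosts.ini (a sketch, assuming the offending name is the cluster children group):

[k3s_cluster:children]
master
node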
To Reproduce
ansible-playbook site.yml -i inventory/hosts.ini
Expected behavior
No warnings at the beginning of the play.
Actual behavior
$ ansible-playbook site.yml -i inventory/hosts.ini
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details
Additional context / logs
Moved from k3s repo issue k3s-io/k3s#1727
It would be nice to get off 1.17 (which is the current version this playbook installs), but there is one caveat—at least in some testing from @alexellis—it seems that K3s 1.19 may have issues running on older Pis like the Pi 3 B+ due to network timeouts or disk IO speed (see k3s-io/k3s#2353).
I noticed when evaluating the Ansible playbook that two tasks in the prereq role did the exact same thing (not following the documentation in the 'name' of the task).
Version:
N/A
K3s arguments:
N/A
Describe the bug
After applying the fix in k3s-io/k3s#1730, to make the 'reboot on raspbian' task actually work (without a fatal error), I realized that this causes another problem: when the ARM servers reboot mid-playbook, the playbook fails. Even if only the master node fails, everything else will fail at the "Copy the K3s service file" task with the message:
AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'
To Reproduce
Run the Ansible playbook on ARM servers.
Expected behavior
The playbook completes successfully, and reboots the ARM servers as required in the "Rebooting on Raspbian" task.
Actual behavior
TASK [raspbian : Rebooting on Raspbian] ********************************************************************************
Saturday 02 May 2020 11:36:06 -0500 (0:00:02.881) 0:00:38.813 **********
skipping: [worker-01]
skipping: [worker-02]
skipping: [worker-03]
skipping: [worker-04]
skipping: [worker-05]
skipping: [worker-06]
fatal: [turing-master]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Shared connection to turing-master closed.", "unreachable": true}
Which, in turn, causes all the other hosts to fail:
TASK [k3s/node : Copy K3s service file] ********************************************************************************
Saturday 02 May 2020 11:36:14 -0500 (0:00:06.435) 0:00:46.844 **********
fatal: [worker-01]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'"}
fatal: [worker-02]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'"}
fatal: [worker-03]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'"}
fatal: [worker-05]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'"}
fatal: [worker-04]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'"}
fatal: [worker-06]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'"}
Additional context / logs
This was moved from the k3s repo issue k3s-io/k3s#1724
Wouldn't it be better/clearer to move the separate Raspbian role into prereq and just load the task file specifically on Raspbian? Additionally, the Raspbian Buster-only tasks could be loaded similarly. WDYT?
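A rough sketch of what that could look like (not the current layout; the raspbian_buster variable below is hypothetical):

# roles/prereq/tasks/main.yml (hypothetical excerpt)
- name: Include Raspbian-specific tasks
  include_tasks: raspbian.yml
  when: raspbian | default(false) | bool

- name: Include Raspbian Buster-specific tasks
  include_tasks: raspbian-buster.yml
  when: raspbian_buster | default(false) | bool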
I am going to have a 4-node Raspbian 64-bit cluster.
Do I need to pre-install Docker before I execute this playbook?
This is more of a question than an issue.
Add role metadata to enable ansible-galaxy install -r requirements.yaml
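For context, a minimal galaxy_info block is what ansible-galaxy expects per role; the values below are illustrative, not a committed proposal:

# roles/k3s/meta/main.yml (illustrative)
galaxy_info:
  author: k3s-ansible contributors
  description: Build a k3s cluster
  license: Apache-2.0
  min_ansible_version: 2.9
  platforms:
    - name: Debian
      versions:
        - buster
dependencies: []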
Hello, wouldn't it be nice to do basic hardening of the Ubuntu nodes (and others too)? Like installing fail2ban and making sure unattended-upgrades is active for security patches? @geerlingguy I know you have tons of playbooks for this, what do you think?
As the title says; that's what is used for the k3s project from which this project was branched: https://github.com/rancher/k3s/blob/master/LICENSE
Without this setting, if you have cowsay installed, you see a bunch of:
$ ansible-playbook site.yml -i inventory/hosts.ini
____________________
< PLAY [k3s_cluster] >
--------------------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||
I'm all for cowsay in moderation, but it can be a bit jarring for a first time user :)
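The setting in question is presumably nocows in the project's ansible.cfg (setting ANSIBLE_NOCOWS=1 in the environment has the same effect):

# ansible.cfg
[defaults]
nocows = 1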
It'd be great to have optional support for SELinux: https://rancher.com/docs/k3s/latest/en/advanced/#experimental-selinux-support
From k3s-io/k3s#2473 (comment) :
k3s-ansible (an excellent reference for the necessary setup steps) could be updated here as well: https://github.com/rancher/k3s-ansible/blob/721c3487027e42d30c60eb206e0fb5abfddd094f/roles/prereq/tasks/main.yml#L2-L5
OTOH something like this:
- name: Set SELinux to disabled state
selinux:
state: disabled
when:
- not (k3s_disable_selinux is defined and not k3s_disable_selinux) or k3s_disable_selinux == False
- ansible_distribution in ['CentOS', 'Red Hat Enterprise Linux']
With this added to https://github.com/rancher/k3s-ansible/blob/master/inventory/sample/group_vars/all.yml :
k3s_disable_selinux: False
RPi x 4
Buster
I have done sudo git clone and made the my-cluster dir in the inventory. I have changed the hosts.ini and the all.yml.
I keep getting this. I'm super new to this stuff and do not know what I have done wrong. Thank you for your help.
-bash: ansible-playbook: command not found
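That error just means Ansible itself isn't installed on the machine you're running the command from; something along these lines (pick one, depending on your distro) installs it:

sudo apt update && sudo apt install -y ansible   # Debian/Ubuntu/Raspberry Pi OS
python3 -m pip install --user ansible            # distro-independent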
When trying to enable cgroups, the playbook tries to do it through the /boot/firmware/cmdline.txt file, which doesn't exist since that's not where the kernel cmdline arguments are defined on the latest Ubuntu Server version on x86_64.
The resulting output is:
fatal: [k3s-main]: FAILED! => {"changed": false, "msg": "Destination /boot/firmware/cmdline.txt does not exist !", "rc": 257}
I think this step should be skipped on non-RPi machines and refactored for the rest of the cases, either through a templated /etc/default/grub file that is distro- and architecture-specific, or by checking the kernel config in /boot/, like below:
grep CONFIG_CGROUPS= /boot/config-`uname -r`
which if successful returns
CONFIG_CGROUPS=y
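A rough sketch of that guard (not current playbook code): skip the Raspberry Pi cmdline edit when the file is absent and fall back to checking the running kernel's config instead:

- name: Check for the Raspberry Pi boot cmdline file
  stat:
    path: /boot/firmware/cmdline.txt
  register: rpi_cmdline

- name: Check whether cgroups are compiled into the kernel
  command: grep -q CONFIG_CGROUPS=y /boot/config-{{ ansible_kernel }}
  register: kernel_cgroups
  changed_when: false
  failed_when: false
  when: not rpi_cmdline.stat.exists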
"{{ items }}" in this line should be "{{ item }}".
Ansible version: 2.9.10
OS: CentOS Linux release 7.8.2003 (Core)
After changing it to {{ item }}, the task finished successfully.
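For reference, the corrected task presumably ends up looking roughly like this (a sketch, not the exact file contents):

- name: Set bridge-nf-call-iptables (just to be sure)
  sysctl:
    name: "{{ item }}"
    value: "1"
    state: present
    reload: yes
  with_items:
    - net.bridge.bridge-nf-call-iptables
    - net.bridge.bridge-nf-call-ip6tables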
After following the README and changing the user, I get the following error when running the playbook.
TASK [raspbian : Activating cgroup support] ****************************************************************************************************************************************************************************************************************************************************************************************************************************************************
fatal: [192.168.50.51]: FAILED! => {"msg": "The conditional check 'raspbian is true' failed. The error was: template error while templating string: no test named 'true'. String: {% if raspbian is true %} True {% else %} False {% endif %}\n\nThe error appears to be in '/home/brett/k3s-ansible/roles/raspbian/tasks/main.yml': line 10, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Activating cgroup support\n ^ here\n"}
fatal: [192.168.50.215]: FAILED! => {"msg": "The conditional check 'raspbian is true' failed. The error was: template error while templating string: no test named 'true'. String: {% if raspbian is true %} True {% else %} False {% endif %}\n\nThe error appears to be in '/home/brett/k3s-ansible/roles/raspbian/tasks/main.yml': line 10, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Activating cgroup support\n ^ here\n"}
fatal: [192.168.50.196]: FAILED! => {"msg": "The conditional check 'raspbian is true' failed. The error was: template error while templating string: no test named 'true'. String: {% if raspbian is true %} True {% else %} False {% endif %}\n\nThe error appears to be in '/home/brett/k3s-ansible/roles/raspbian/tasks/main.yml': line 10, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Activating cgroup support\n ^ here\n"}
fatal: [192.168.50.204]: FAILED! => {"msg": "The conditional check 'raspbian is true' failed. The error was: template error while templating string: no test named 'true'. String: {% if raspbian is true %} True {% else %} False {% endif %}\n\nThe error appears to be in '/home/brett/k3s-ansible/roles/raspbian/tasks/main.yml': line 10, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Activating cgroup support\n ^ here\n"}
After that failure the playbook ends. I'm trying to deploy it to 4 Pi 4s running HypriotOS. I am new to all this, so I'm trying to learn.
Since using the ansible-galaxy method is not supported (even directly from git), what is the recommended way to use these roles? Just copy and modify?
On Ubuntu Server 20.04, the file /boot/firmware/cmdline.txt doesn't exist on a fresh install.
This causes the task "Enable cgroup via boot commandline if not already enabled" to fail.
Failing to get k3s nodes talking to each other directly, I thought I'd take a crack at using the Ansible playbook to make sure I wasn't missing anything.
The problem could be that I'm running on Armbian, whereas this seems to be well-tested on Ubuntu and Raspbian. I'm trying to build a PR to fix the discrepancies, but I haven't had a successful playbook run yet. It's hanging now at "Enable and check K3s service".
I see this error on a node:
./syslog:Aug 2 00:58:14 localhost k3s[4895]: time="2020-08-02T00:58:14.866195152Z" level=info msg="Running load balancer 127.0.0.1:43201 -> [t4.local:6443]"
./syslog:Aug 2 00:58:24 localhost k3s[4895]: time="2020-08-02T00:58:24.881040536Z" level=error msg="failed to get CA certs at https://127.0.0.1:43201/cacerts: Get https://127.0.0.1:43201/cacerts: read tcp 127.0.0.1:36796->127.0.0.1:43201: read: connection reset by peer"
Service is up on the master, and accessible from the node.
But.
It's really, really slow:
time curl --insecure https://t4.local:6443/cacerts
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
real 0m10.304s
user 0m0.149s
sys 0m0.042s
Resets plus slow service seem a bit suspect, and out of half a dozen queries, they all return at just over 10 seconds. There's free memory and the load average is 0.6 on the master. They're on the same dumb switch. There don't seem to be any error logs on the master during the request, but I could be looking in the wrong spots.
What am I missing, or what should I be looking for?
Unexpected templating type error on CentOS 8 RPI4
Error message:
TASK [raspbian : Test for Raspbian] **********************************************************************************************************************************************
Monday 03 August 2020 19:14:42 +0100 (0:00:00.216) 0:00:28.536 *********
fatal: [polux]: FAILED! => {"msg": "Unexpected templating type error occurred on ({% if ( ansible_facts.architecture is search(\"arm\") and ansible_facts.lsb.description is match(\"[Rr]aspbian.*[Bb]uster\") ) or ( ansible_facts.architecture is search(\"aarch64\") and ansible_facts.lsb.description is match(\"Debian.*buster\") ) %}True{% else %}False{% endif %}): expected string or bytes-like object"}
fatal: [pangea]: FAILED! => {"msg": "Unexpected templating type error occurred on ({% if ( ansible_facts.architecture is search(\"arm\") and ansible_facts.lsb.description is match(\"[Rr]aspbian.*[Bb]uster\") ) or ( ansible_facts.architecture is search(\"aarch64\") and ansible_facts.lsb.description is match(\"Debian.*buster\") ) %}True{% else %}False{% endif %}): expected string or bytes-like object"}
to retry, use: --limit @/home/ansible/work/k3s-ansible/site.retry
Machine: Raspberry Pi 4, 4GB model.
OS:
$ uname -a
Linux polux 5.4.53-v8.1.el8 #1 SMP PREEMPT Sun Jul 26 12:06:25 -03 2020 aarch64 aarch64 aarch64 GNU/Linux
$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="8 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="8"
Has anyone seen this? On a Raspberry Pi 3 with Ubuntu 20.04, the architecture isn't named "arm" but rather "aarch64".
Consequently, none of the architecture checks match and nothing works without accounting for the new name.
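A hedged sketch of a broader condition that also matches aarch64 (the task fields here are illustrative, not copied from the download role):

- name: Download k3s binary arm64
  get_url:
    url: https://github.com/rancher/k3s/releases/download/{{ k3s_version }}/k3s-arm64
    dest: /usr/local/bin/k3s
    owner: root
    group: root
    mode: '0755'
  when: >-
    ansible_facts.architecture is search("aarch64")
    or ansible_facts.architecture is search("arm64")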
In the included ansible.cfg, the proper inventory file is configured:
inventory = ./hosts.ini
This assumes the user has cloned this repository, runs ansible* commands from the root directory, and has created a hosts.ini file in the base directory (alongside the site.yml playbook).
The README currently states:
Add the system information gathered above into a file called hosts.ini. For example:
I think this would be clearer if it were something like: "Add the system information gathered above into a file called hosts.ini in the same directory as this README file. There is a template in the inventory directory."
Additionally, because the path to the file is defined in ansible.cfg, it need not be specified when you run the playbook, so the command could simply be:
ansible-playbook site.yml
(Unless I'm reading the configuration wrong.)
Finally, if a .gitignore file is added to the repository with the hosts.ini file excluded, a user like me could clone the repository, create my custom hosts.ini file, and pull changes without fear of any conflicts or accidentally adding my local customized hosts.ini file to the repository.
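Something along these lines would do it (assuming, as described above, that the user's hosts.ini lives in the repository root):

# .gitignore (proposed)
hosts.ini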
I'm facing this error when using the Ansible Galaxy version (on the ansible-galaxy branch):
TASK [k3s-ansible/roles/prereq : Set bridge-nf-call-iptables (just to be sure)] ***
fatal: [10.0.10.11]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'items' is undefined\n\nThe error appears to be in '/roles/k3s-ansible/roles/prereq/tasks/main.yml': line 33, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Set bridge-nf-call-iptables (just to be sure)\n ^ here\n"}
That variable should be "item":
https://github.com/rancher/k3s-ansible/blob/f91dfcfc8e2e94d6ff687c3d0ecc7805d38e8517/roles/prereq/tasks/main.yml#L35
For quite a long time I have faced this problem, but decided to write only now, when I finally got fed up. Every few deployments (about every fifth; I didn't keep exact statistics) a pod named helm-install-traefik falls into a CrashLoopBackOff and then an Error state. Of course, the pod retries later after a restart, and sometimes it even reaches the Complete state. But almost always, when this happens, helm-install-traefik doesn't come up and the cluster doesn't deploy. The fact that it can happen on any given deploy is very unpleasant.
This problem was encountered on Ubuntu Server 18.04.4 LTS and CentOS 7 on x86-64.
Attach describe-pod and logs command output here:
helm-install-traefik-describe.txt
helm-install-traefik-logs.txt
In Ansible 2.9.x, the check for "Rebooting on Raspbian" fails with the following error message:
fatal: [18.206.98.159]: FAILED! => {"msg": "The conditional check 'boot_cmdline | changed' failed. The error was: template error while templating string: no filter named 'changed'. String: {% if boot_cmdline | changed %} True {% else %} False {% endif %}\n\nThe error appears to be in '/Users/jgeerling/Downloads/youtube-10k-pods/attempt-two-k3s/k3s-ansible/roles/raspbian/tasks/main.yml': line 19, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Rebooting on Raspbian\n ^ here\n"}
The fix is to set the conditional to boot_cmdline is changed.
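A sketch of the corrected task, assuming the reboot itself is done with the reboot module and the existing raspbian guard:

- name: Rebooting on Raspbian
  reboot:
  when:
    - raspbian | default(false) | bool
    - boot_cmdline is changed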
Trying to bring up k3s on 4 Pi 3s, it failed with this error:
TASK [k3s/node : Enable and check K3s service] ********************************************************************************
Sunday 13 September 2020 14:58:07 -0700 (0:00:07.255) 0:05:08.667 ******
fatal: [192.168.xxx.yyy]: FAILED! => {"changed": false, "msg": "Unable to start service k3s-node: Job for k3s-node.service failed because the control process exited with error code. See \"systemctl status k3s-node.service\" and \"journalctl -xe\" for details.\n"}
[root@pi3-01 ~]# systemctl status k3s-node.service
● k3s-node.service - Lightweight Kubernetes
Loaded: loaded (/etc/systemd/system/k3s-node.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Sun 2020-09-13 14:58:34 PDT; 1s ago
Docs: https://k3s.io
Process: 4109 ExecStart=/usr/local/bin/k3s agent --server https://192.168.xxx.yyy:6443 --token zzzzzzzzzzzzzzzz (code=exited, status=1/FAILURE)
Process: 4106 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
Process: 4103 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
Main PID: 4109 (code=exited, status=1/FAILURE)
Sep 13 14:58:34 pi3-01.local systemd[1]: Failed to start Lightweight Kubernetes.
Sep 13 14:58:34 pi3-01.local systemd[1]: Unit k3s-node.service entered failed state.
Sep 13 14:58:34 pi3-01.local systemd[1]: k3s-node.service failed.
node service seems to be running
This is something I think we might be able to get configured in the Ansible playbook, but I didn't see (at a glance at least) if it was something supported by this playbook yet; namely, a multi-master configuration with an external database: High Availability with an External DB.
In this playbook's case, maybe it would delegate the task of configuring an external database cluster to the user (e.g. use a separate Ansible playbook that builds an RDS cluster in Amazon, or a separate two or three node DB cluster on some other bare metal servers alongside the K3s cluster), but then how could we make it so this playbook supports the multi-master configuration described in the docs page linked above.
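For reference, the external-datastore mode is driven by k3s's --datastore-endpoint flag, so one hypothetical approach would be to pass it through the existing extra_server_args variable (placeholder credentials and hostname below), leaving the database itself to a separate playbook; the playbook would still need changes so every host in [master] runs the server:

# inventory/my-cluster/group_vars/all.yml (hypothetical)
extra_server_args: >-
  --datastore-endpoint="mysql://k3s:changeme@tcp(db.example.internal:3306)/k3s"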
A CI to test that the playbook is running successfully would be nice.
Getting a 'raspbian is true' failed error when the master and nodes are x64 Intel Debian Buster, not ARM, not Raspberry Pis.
$ ssh 192.168.86.110 uname -a
Linux alfred 4.19.0-10-amd64 #1 SMP Debian 4.19.132-1 (2020-07-24) x86_64 GNU/Linux
....
TASK [download : Download k3s binary x64] ******************************************************************************
Sunday 06 September 2020 00:21:32 -0600 (0:00:00.412) 0:00:05.811 ******
changed: [192.168.86.110]
changed: [192.168.86.111]
changed: [192.168.86.112]
TASK [download : Download k3s binary arm64] ****************************************************************************
Sunday 06 September 2020 00:22:13 -0600 (0:00:41.544) 0:00:47.356 ******
skipping: [192.168.86.111]
skipping: [192.168.86.112]
skipping: [192.168.86.110]
TASK [download : Download k3s binary armhf] ****************************************************************************
Sunday 06 September 2020 00:22:13 -0600 (0:00:00.137) 0:00:47.493 ******
skipping: [192.168.86.111]
skipping: [192.168.86.112]
skipping: [192.168.86.110]
TASK [raspbian : Test for Raspbian] ************************************************************************************
Sunday 06 September 2020 00:22:13 -0600 (0:00:00.177) 0:00:47.671 ******
ok: [192.168.86.111]
ok: [192.168.86.112]
ok: [192.168.86.110]
TASK [raspbian : Activating cgroup support] ****************************************************************************
Sunday 06 September 2020 00:22:14 -0600 (0:00:00.299) 0:00:47.970 ******
fatal: [192.168.86.110]: FAILED! => {"msg": "The conditional check 'raspbian is true' failed. The error was: template error while templating string: no test named 'true'. String: {% if raspbian is true %} True {% else %} False {% endif %}\n\nThe error appears to have been in '/home/sean/k3s-ansible/roles/raspbian/tasks/main.yml': line 11, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Activating cgroup support\n ^ here\n"}
fatal: [192.168.86.111]: FAILED! => {"msg": "The conditional check 'raspbian is true' failed.
...
fatal: [192.168.1.150]: FAILED! => {"msg": "The conditional check 'raspbian is true' failed. The error was: template error while templating string: no test named 'true'. String: {% if raspbian is true %} True {% else %} False {% endif %}\n\nThe error appears to be in '/home/james/Downloads/k3s-ansible-master/roles/raspbian/tasks/main.yml': line 10, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Activating cgroup support\n ^ here\n"}
The Ansible playbook fails to detect my OS version.
cat /etc/os-release:
NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"
This is running on an ARM64 system.
I'm running the playbook with ansible-playbook site.yml -i inventory/sample/hosts.ini -k -K -vv.
It runs successfully up to [Enable and check K3s service] in /node/tasks/main.yml, then it hangs indefinitely. I've run this with varying levels of verbosity and debugging on.
Running this on Raspberry Pi 4Bs, all of which have Raspberry Pi OS Lite.
hosts.ini
When I run the following command with Ansible 2.5.1 on Ubuntu 18 LTS:
ansible-playbook site.yml -i inventory/my-cluster/hosts.ini
The following error is produced:
ERROR! no action detected in task. This often indicates a misspelled module name, or incorrect module path.
The error appears to have been in '/home/thepoetwarrior/Downloads/k3s-ansible-master/roles/raspbian/tasks/main.yml': line 41, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
My current config for this playbook is:
file: inventory/my-cluster/hosts.ini
[master]
192.168.0.50
[node]
192.168.0.51
192.168.0.52
192.168.0.53
[k3s_cluster:children]
master
node
file:inventory/my-cluster/group_vars/all.yml
k3s_version: v1.17.5+k3s1
ansible_user: pi
systemd_dir: /etc/systemd/system
master_ip: "{{ hostvars[groups['master'][0]]['ansible_host'] | default(groups['master'][0]) }}"
extra_server_args: ""
Raspberry Pi-specific settings must only run on a Raspberry Pi, not on all arm/arm64 systems. There are other arm systems and arm virtual machines.
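A rough sketch of one way to detect actual Raspberry Pi hardware instead of keying off arm/arm64 (the raspberry_pi fact name is made up for illustration):

- name: Read the device-tree model string (absent on most non-SBC machines)
  slurp:
    src: /sys/firmware/devicetree/base/model
  register: dt_model
  failed_when: false

- name: Set a raspberry_pi fact
  set_fact:
    raspberry_pi: "{{ (dt_model.content | default('') | b64decode) is search('Raspberry Pi') }}"

# Pi-specific tasks would then be gated with: when: raspberry_pi | bool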
I'm trying to get this running on Alpine.
After a fresh install the playbook ends with the following error:
FAILED. Unable to start service k3s: Job for k3s.service failed because the control process exited with error code.
Here is the status of the k3s.service:
Jun 19 15:14:46 master k3s[1087]: E0619 15:14:46.710048 1087 cluster_authentication_trust_controller.go:493] kube-system/extension-apiserver-authentication failed with : context deadline exceeded
Jun 19 15:14:46 master k3s[1087]: E0619 15:14:46.723364 1087 status.go:71] apiserver received an error that is not an metav1.Status: context.deadlineExceededError{}
Jun 19 15:14:46 master k3s[1087]: I0619 15:14:46.724123 1087 trace.go:116] Trace[927478669]: "Create" url:/apis/apiregistration.k8s.io/v1/apiservices,user-agent:k3s/v1.17.5+k3s1 (linux/arm64) kubernetes/
Jun 19 15:14:46 master k3s[1087]: Trace[927478669]: [34.001125181s] [34.000375656s] END
Jun 19 15:14:46 master k3s[1087]: E0619 15:14:46.725191 1087 autoregister_controller.go:194] v2beta1.autoscaling failed with : context deadline exceeded
Jun 19 15:14:46 master k3s[1087]: E0619 15:14:46.780093 1087 status.go:71] apiserver received an error that is not an metav1.Status: context.deadlineExceededError{}
Jun 19 15:14:46 master k3s[1087]: I0619 15:14:46.781135 1087 trace.go:116] Trace[1836866062]: "Create" url:/apis/apiregistration.k8s.io/v1/apiservices,user-agent:k3s/v1.17.5+k3s1 (linux/arm64) kubernetes
Jun 19 15:14:46 master k3s[1087]: Trace[1836866062]: [34.001351143s] [34.000847744s] END
Jun 19 15:14:46 master k3s[1087]: E0619 15:14:46.782577 1087 autoregister_controller.go:194] v2beta2.autoscaling failed with : context deadline exceeded
Jun 19 15:14:55 master k3s[1087]: time="2020-06-19T15:14:55.117854357+01:00" level=error msg="error in txn: context deadline exceeded"
Hi all,
Here is my proposal:
I will probably create a new branch to test the ansible-galaxy directory structure and reorganize things.
Are you OK with doing it that way?
I was testing the reset.yml playbook and the reset role today, and got the following error:
TASK [reset : Disable services] ****************************************************************************************
Sunday 17 May 2020 22:37:05 -0500 (0:00:07.969) 0:00:07.991 ************
failed: [10.0.100.37] (item=k3s) => {"ansible_loop_var": "item", "changed": false, "item": "k3s", "msg": "Could not find the requested service k3s: host"}
failed: [10.0.100.91] (item=k3s) => {"ansible_loop_var": "item", "changed": false, "item": "k3s", "msg": "Could not find the requested service k3s: host"}
failed: [10.0.100.74] (item=k3s) => {"ansible_loop_var": "item", "changed": false, "item": "k3s", "msg": "Could not find the requested service k3s: host"}
failed: [10.0.100.70] (item=k3s) => {"ansible_loop_var": "item", "changed": false, "item": "k3s", "msg": "Could not find the requested service k3s: host"}
changed: [10.0.100.163] => (item=k3s)
changed: [10.0.100.37] => (item=k3s-node)
changed: [10.0.100.91] => (item=k3s-node)
changed: [10.0.100.74] => (item=k3s-node)
changed: [10.0.100.70] => (item=k3s-node)
failed: [10.0.100.163] (item=k3s-node) => {"ansible_loop_var": "item", "changed": false, "item": "k3s-node", "msg": "Could not find the requested service k3s-node: host"}
failed: [10.0.100.99] (item=k3s) => {"ansible_loop_var": "item", "changed": false, "item": "k3s", "msg": "Could not find the requested service k3s: host"}
failed: [10.0.100.197] (item=k3s) => {"ansible_loop_var": "item", "changed": false, "item": "k3s", "msg": "Could not find the requested service k3s: host"}
changed: [10.0.100.197] => (item=k3s-node)
changed: [10.0.100.99] => (item=k3s-node)
Running it again results in all those changed messages becoming ok, but the failed messages still kill the playbook run and K3s is not totally uninstalled.
Adding a failed_when: false to the task allows those expected failures to be ignored, so the rest of the playbook can run.
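A sketch of the adjusted task (module arguments are from memory, not copied from the role):

- name: Disable services
  systemd:
    name: "{{ item }}"
    state: stopped
    enabled: no
  failed_when: false
  with_items:
    - k3s
    - k3s-node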
Issue: k3s fails to deploy on vanilla Ubuntu 20.04 on a Raspberry Pi 4.
Ansible does not show any error (see output below), but the k3s service restarts in a loop (logs attached).
OS:
Linux polux 5.4.0-1015-raspi #15-Ubuntu SMP Fri Jul 10 05:34:24 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux
/boot/firmware/cmdline.txt:
net.ifnames=0 dwc_otg.lpm_enable=0 console=serial0,115200 console=tty1 root=/dev/sda2 rootfstype=ext4 elevator=deadline rootwait fixrtc cgroup_enable=cpuset cgroup_enable=memory cgroup_memory=1
Current version of k3s-ansible:
* ad3dc65 (HEAD -> master, origin/master, origin/HEAD) Merge pull request #66 from stafwag/master
k3s-ansible configuration:
ansible@gaia:~/work/k3s-ansible$ cat inventory/rpi-galaxy/group_vars/all.yml
---
#k3s_version: v1.17.5+k3s1
# according to the following link, this version has been validated for Ubuntu 20.04 / ARM64
# https://github.com/rancher/k3s/issues/1860
k3s_version: v1.18.3+k3s1
ansible_user: ansible
systemd_dir: /etc/systemd/system
master_ip: "{{ hostvars[groups['master'][0]]['ansible_host'] | default(groups['master'][0]) }}"
extra_server_args: ""
extra_agent_args: ""
ansible@gaia:~/work/k3s-ansible$ cat inventory/rpi-galaxy/hosts.ini
[master]
polux
[node]
polux
#kore
#cygnus
[k3s_cluster:children]
master
node
sudo journalctl -u k3s (for 1 restart iteration):
Jul 28 20:32:40 polux systemd[1]: k3s.service: Scheduled restart job, restart counter is at 3.
Jul 28 20:32:40 polux systemd[1]: Stopped Lightweight Kubernetes.
Jul 28 20:32:40 polux systemd[1]: Starting Lightweight Kubernetes...
Jul 28 20:32:40 polux modprobe[3365]: modprobe: FATAL: Module br_netfilter not found in directory /lib/modules/5.4.0-1015-raspi
Jul 28 20:32:41 polux k3s[3367]: time="2020-07-28T20:32:41.493565790Z" level=info msg="Starting k3s v1.18.3+k3s1 (96653e8d)"
Jul 28 20:32:41 polux k3s[3367]: time="2020-07-28T20:32:41.494117219Z" level=info msg="Cluster bootstrap already complete"
Jul 28 20:32:41 polux k3s[3367]: time="2020-07-28T20:32:41.549335621Z" level=info msg="Kine listening on unix://kine.sock"
Jul 28 20:32:41 polux k3s[3367]: time="2020-07-28T20:32:41.551698793Z" level=info msg="Running kube-apiserver --advertise-port=6443 --allow-privileged=true --anonymous-auth=false --api-audiences=unknown --authorization-mode=Node,RBAC --basic-auth-file=/var/lib/rancher/k3s/server/cred/passwd --bind-address=127.0.0.1 --cert-dir=/var/lib/rancher/k3s/server/tls/temporary-certs --client-ca-file=/var/lib/rancher/k3s/server/tls/client-ca.crt --enable-admission-plugins=NodeRestriction --etcd-servers=unix://kine.sock --insecure-port=0 --kubelet-certificate-authority=/var/lib/rancher/k3s/server/tls/server-ca.crt --kubelet-client-certificate=/var/lib/rancher/k3s/server/tls/client-kube-apiserver.crt --kubelet-client-key=/var/lib/rancher/k3s/server/tls/client-kube-apiserver.key --proxy-client-cert-file=/var/lib/rancher/k3s/server/tls/client-auth-proxy.crt --proxy-client-key-file=/var/lib/rancher/k3s/server/tls/client-auth-proxy.key --requestheader-allowed-names=system:auth-proxy --requestheader-client-ca-file=/var/lib/rancher/k3s/server/tls/request-header-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6444 --service-account-issuer=k3s --service-account-key-file=/var/lib/rancher/k3s/server/tls/service.key --service-account-signing-key-file=/var/lib/rancher/k3s/server/tls/service.key --service-cluster-ip-range=10.43.0.0/16 --storage-backend=etcd3 --tls-cert-file=/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.crt --tls-private-key-file=/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.key"
Jul 28 20:32:41 polux k3s[3367]: Flag --basic-auth-file has been deprecated, Basic authentication mode is deprecated and will be removed in a future release. It is not recommended for production environments.
Jul 28 20:32:41 polux k3s[3367]: I0728 20:32:41.553311 3367 server.go:682] external host was not specified, using 192.168.0.95
Jul 28 20:32:41 polux k3s[3367]: I0728 20:32:41.554216 3367 server.go:166] Version: v1.18.3+k3s1
Jul 28 20:32:41 polux k3s[3367]: I0728 20:32:41.570077 3367 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
Jul 28 20:32:41 polux k3s[3367]: I0728 20:32:41.571518 3367 plugins.go:161] Loaded 10 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
Jul 28 20:32:41 polux k3s[3367]: I0728 20:32:41.576411 3367 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
Jul 28 20:32:41 polux k3s[3367]: I0728 20:32:41.576977 3367 plugins.go:161] Loaded 10 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
Jul 28 20:32:41 polux k3s[3367]: I0728 20:32:41.638976 3367 master.go:270] Using reconciler: lease
Jul 28 20:32:41 polux k3s[3367]: I0728 20:32:41.719580 3367 rest.go:113] the default service ipfamily for this cluster is: IPv4
Jul 28 20:32:42 polux k3s[3367]: W0728 20:32:42.642664 3367 genericapiserver.go:409] Skipping API batch/v2alpha1 because it has no resources.
Jul 28 20:32:42 polux k3s[3367]: W0728 20:32:42.679737 3367 genericapiserver.go:409] Skipping API discovery.k8s.io/v1alpha1 because it has no resources.
Jul 28 20:32:42 polux k3s[3367]: W0728 20:32:42.723548 3367 genericapiserver.go:409] Skipping API node.k8s.io/v1alpha1 because it has no resources.
Jul 28 20:32:42 polux k3s[3367]: W0728 20:32:42.791859 3367 genericapiserver.go:409] Skipping API rbac.authorization.k8s.io/v1alpha1 because it has no resources.
Jul 28 20:32:42 polux k3s[3367]: W0728 20:32:42.804764 3367 genericapiserver.go:409] Skipping API scheduling.k8s.io/v1alpha1 because it has no resources.
Jul 28 20:32:42 polux k3s[3367]: W0728 20:32:42.859465 3367 genericapiserver.go:409] Skipping API storage.k8s.io/v1alpha1 because it has no resources.
Jul 28 20:32:42 polux k3s[3367]: W0728 20:32:42.931689 3367 genericapiserver.go:409] Skipping API apps/v1beta2 because it has no resources.
Jul 28 20:32:42 polux k3s[3367]: W0728 20:32:42.931776 3367 genericapiserver.go:409] Skipping API apps/v1beta1 because it has no resources.
Jul 28 20:32:42 polux k3s[3367]: I0728 20:32:42.966951 3367 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
Jul 28 20:32:42 polux k3s[3367]: I0728 20:32:42.967022 3367 plugins.go:161] Loaded 10 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.655993 3367 dynamic_cafile_content.go:167] Starting request-header::/var/lib/rancher/k3s/server/tls/request-header-ca.crt
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.656063 3367 dynamic_cafile_content.go:167] Starting client-ca-bundle::/var/lib/rancher/k3s/server/tls/client-ca.crt
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.656646 3367 dynamic_serving_content.go:130] Starting serving-cert::/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.crt::/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.key
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.658203 3367 secure_serving.go:178] Serving securely on 127.0.0.1:6444
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.658285 3367 tlsconfig.go:240] Starting DynamicServingCertificateController
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.658332 3367 autoregister_controller.go:141] Starting autoregister controller
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.658358 3367 cache.go:32] Waiting for caches to sync for autoregister controller
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.659964 3367 cluster_authentication_trust_controller.go:440] Starting cluster_authentication_trust_controller controller
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.660030 3367 shared_informer.go:223] Waiting for caches to sync for cluster_authentication_trust_controller
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.660157 3367 crd_finalizer.go:266] Starting CRDFinalizer
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.660238 3367 dynamic_cafile_content.go:167] Starting client-ca-bundle::/var/lib/rancher/k3s/server/tls/client-ca.crt
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.660332 3367 dynamic_cafile_content.go:167] Starting request-header::/var/lib/rancher/k3s/server/tls/request-header-ca.crt
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.662079 3367 available_controller.go:387] Starting AvailableConditionController
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.662148 3367 cache.go:32] Waiting for caches to sync for AvailableConditionController controller
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.662214 3367 crdregistration_controller.go:111] Starting crd-autoregister controller
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.662234 3367 shared_informer.go:223] Waiting for caches to sync for crd-autoregister
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.662082 3367 apiservice_controller.go:94] Starting APIServiceRegistrationController
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.662328 3367 cache.go:32] Waiting for caches to sync for APIServiceRegistrationController controller
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.662113 3367 controller.go:81] Starting OpenAPI AggregationController
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.664077 3367 controller.go:86] Starting OpenAPI controller
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.664180 3367 customresource_discovery_controller.go:209] Starting DiscoveryController
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.664232 3367 naming_controller.go:291] Starting NamingConditionController
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.664284 3367 establishing_controller.go:76] Starting EstablishingController
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.664355 3367 nonstructuralschema_controller.go:186] Starting NonStructuralSchemaConditionController
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.664406 3367 apiapproval_controller.go:186] Starting KubernetesAPIApprovalPolicyConformantConditionController
Jul 28 20:32:48 polux k3s[3367]: E0728 20:32:48.810415 3367 controller.go:156] Unable to remove old endpoints from kubernetes service: no master IPs were listed in storage, refusing to erase all endpoints for the kubernetes service
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.858559 3367 cache.go:39] Caches are synced for autoregister controller
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.860242 3367 shared_informer.go:230] Caches are synced for cluster_authentication_trust_controller
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.863682 3367 cache.go:39] Caches are synced for APIServiceRegistrationController controller
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.864186 3367 shared_informer.go:230] Caches are synced for crd-autoregister
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.864247 3367 cache.go:39] Caches are synced for AvailableConditionController controller
Jul 28 20:32:49 polux k3s[3367]: I0728 20:32:49.669086 3367 storage_scheduling.go:143] all system priority classes are created successfully or already exist.
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.575394 3367 controller.go:130] OpenAPI AggregationController: action for item : Nothing (removed from the queue).
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.575483 3367 controller.go:130] OpenAPI AggregationController: action for item k8s_internal_local_delegation_chain_0000000000: Nothing (removed from the queue).
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.775039 3367 registry.go:150] Registering EvenPodsSpread predicate and priority function
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.775119 3367 registry.go:150] Registering EvenPodsSpread predicate and priority function
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.775834001Z" level=info msg="Running kube-scheduler --bind-address=127.0.0.1 --kubeconfig=/var/lib/rancher/k3s/server/cred/scheduler.kubeconfig --leader-elect=false --port=10251 --secure-port=0"
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.777075941Z" level=info msg="Running kube-controller-manager --allocate-node-cidrs=true --bind-address=127.0.0.1 --cluster-cidr=10.42.0.0/16 --cluster-signing-cert-file=/var/lib/rancher/k3s/server/tls/server-ca.crt --cluster-signing-key-file=/var/lib/rancher/k3s/server/tls/server-ca.key --kubeconfig=/var/lib/rancher/k3s/server/cred/controller.kubeconfig --leader-elect=false --port=10252 --root-ca-file=/var/lib/rancher/k3s/server/tls/server-ca.crt --secure-port=0 --service-account-private-key-file=/var/lib/rancher/k3s/server/tls/service.key --use-service-account-credentials=true"
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.794019 3367 controllermanager.go:161] Version: v1.18.3+k3s1
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.795822 3367 deprecated_insecure_serving.go:53] Serving insecurely on [::]:10252
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.799656121Z" level=info msg="Running cloud-controller-manager --allocate-node-cidrs=true --allow-untagged-cloud=true --bind-address=127.0.0.1 --cloud-provider=k3s --cluster-cidr=10.42.0.0/16 --kubeconfig=/var/lib/rancher/k3s/server/cred/cloud-controller.kubeconfig --leader-elect=false --node-status-update-frequency=1m --secure-port=0"
Jul 28 20:32:50 polux k3s[3367]: Flag --allow-untagged-cloud has been deprecated, This flag is deprecated and will be removed in a future release. A cluster-id will be required on cloud instances.
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.823862 3367 controllermanager.go:120] Version: v1.18.3+k3s1
Jul 28 20:32:50 polux k3s[3367]: W0728 20:32:50.823946 3367 controllermanager.go:132] detected a cluster without a ClusterID. A ClusterID will be required in the future. Please tag your cluster to avoid any future issues
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.836913 3367 registry.go:150] Registering EvenPodsSpread predicate and priority function
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.836976 3367 registry.go:150] Registering EvenPodsSpread predicate and priority function
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.839051 3367 node_controller.go:110] Sending events to api server.
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.839189 3367 controllermanager.go:247] Started "cloud-node"
Jul 28 20:32:50 polux k3s[3367]: W0728 20:32:50.843050 3367 authorization.go:47] Authorization is disabled
Jul 28 20:32:50 polux k3s[3367]: W0728 20:32:50.843107 3367 authentication.go:40] Authentication is disabled
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.843138 3367 deprecated_insecure_serving.go:51] Serving healthz insecurely on [::]:10251
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.845619 3367 node_lifecycle_controller.go:78] Sending events to api server
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.845791 3367 controllermanager.go:247] Started "cloud-node-lifecycle"
Jul 28 20:32:50 polux k3s[3367]: E0728 20:32:50.852243 3367 core.go:90] Failed to start service controller: the cloud provider does not support external load balancers
Jul 28 20:32:50 polux k3s[3367]: W0728 20:32:50.852314 3367 controllermanager.go:244] Skipping "service"
Jul 28 20:32:50 polux k3s[3367]: W0728 20:32:50.852342 3367 core.go:108] configure-cloud-routes is set, but cloud provider does not support routes. Will not configure cloud provider routes.
Jul 28 20:32:50 polux k3s[3367]: W0728 20:32:50.852360 3367 controllermanager.go:244] Skipping "route"
Jul 28 20:32:50 polux k3s[3367]: E0728 20:32:50.880778 3367 node_controller.go:245] Error getting node addresses for node "polux": error fetching node by provider ID: unimplemented, and error by node name: Failed to find node polux: node "polux" not found
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.894978323Z" level=info msg="Writing static file: /var/lib/rancher/k3s/server/static/charts/traefik-1.81.0.tgz"
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.896046509Z" level=info msg="Writing manifest: /var/lib/rancher/k3s/server/manifests/metrics-server/metrics-apiservice.yaml"
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.896886203Z" level=info msg="Writing manifest: /var/lib/rancher/k3s/server/manifests/metrics-server/metrics-server-deployment.yaml"
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.897629363Z" level=info msg="Writing manifest: /var/lib/rancher/k3s/server/manifests/metrics-server/metrics-server-service.yaml"
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.898372134Z" level=info msg="Writing manifest: /var/lib/rancher/k3s/server/manifests/rolebindings.yaml"
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.899056278Z" level=info msg="Writing manifest: /var/lib/rancher/k3s/server/manifests/traefik.yaml"
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.899827807Z" level=info msg="Writing manifest: /var/lib/rancher/k3s/server/manifests/coredns.yaml"
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.900547468Z" level=info msg="Writing manifest: /var/lib/rancher/k3s/server/manifests/metrics-server/auth-delegator.yaml"
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.901229611Z" level=info msg="Writing manifest: /var/lib/rancher/k3s/server/manifests/metrics-server/aggregated-metrics-reader.yaml"
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.901969457Z" level=info msg="Writing manifest: /var/lib/rancher/k3s/server/manifests/metrics-server/auth-reader.yaml"
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.902669007Z" level=info msg="Writing manifest: /var/lib/rancher/k3s/server/manifests/metrics-server/resource-reader.yaml"
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.903395983Z" level=info msg="Writing manifest: /var/lib/rancher/k3s/server/manifests/ccm.yaml"
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.904183271Z" level=info msg="Writing manifest: /var/lib/rancher/k3s/server/manifests/local-storage.yaml"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.106800668Z" level=info msg="Node token is available at /var/lib/rancher/k3s/server/token"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.106918849Z" level=info msg="To join node to cluster: k3s agent -s https://192.168.0.95:6443 -t ${NODE_TOKEN}"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.107661252Z" level=info msg="Starting k3s.cattle.io/v1, Kind=Addon controller"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.109963234Z" level=info msg="Waiting for master node startup: resource name may not be empty"
Jul 28 20:32:51 polux k3s[3367]: I0728 20:32:51.280482 3367 controller.go:606] quota admission added evaluator for: addons.k3s.cattle.io
Jul 28 20:32:51 polux k3s[3367]: http: TLS handshake error from 127.0.0.1:52452: remote error: tls: bad certificate
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.311790482Z" level=info msg="Starting /v1, Kind=Service controller"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.311792945Z" level=info msg="Starting /v1, Kind=Node controller"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.311831999Z" level=info msg="Starting helm.cattle.io/v1, Kind=HelmChart controller"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.311850369Z" level=info msg="Starting batch/v1, Kind=Job controller"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.311868961Z" level=info msg="Starting /v1, Kind=Pod controller"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.311889090Z" level=info msg="Starting /v1, Kind=Endpoints controller"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.329592677Z" level=info msg="Wrote kubeconfig /etc/rancher/k3s/k3s.yaml"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.329687285Z" level=info msg="Run: k3s kubectl"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.329721247Z" level=info msg="k3s is up and running"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.330231508Z" level=info msg="module overlay was already loaded"
Jul 28 20:32:51 polux systemd[1]: Started Lightweight Kubernetes.
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.387743740Z" level=warning msg="failed to start nf_conntrack module"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.441464037Z" level=warning msg="failed to start br_netfilter module"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.442062425Z" level=warning msg="failed to write value 1 at /proc/sys/net/bridge/bridge-nf-call-iptables: open /proc/sys/net/bridge/bridge-nf-call-iptables: no such file or directory"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.442193421Z" level=warning msg="failed to write value 1 at /proc/sys/net/bridge/bridge-nf-call-ip6tables: open /proc/sys/net/bridge/bridge-nf-call-ip6tables: no such file or directory"
Jul 28 20:32:51 polux k3s[3367]: http: TLS handshake error from 127.0.0.1:52460: remote error: tls: bad certificate
Jul 28 20:32:51 polux k3s[3367]: I0728 20:32:51.477651 3367 controller.go:606] quota admission added evaluator for: helmcharts.helm.cattle.io
Jul 28 20:32:51 polux k3s[3367]: http: TLS handshake error from 127.0.0.1:52466: remote error: tls: bad certificate
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.666280703Z" level=info msg="Starting /v1, Kind=Secret controller"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.670020696Z" level=info msg="Logging containerd to /var/lib/rancher/k3s/agent/containerd/containerd.log"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.670918131Z" level=info msg="Running containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/k3s/agent/containerd"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.695371114Z" level=info msg="Active TLS secret k3s-serving (ver=179) (count 7): map[listener.cattle.io/cn-10.43.0.1:10.43.0.1 listener.cattle.io/cn-127.0.0.1:127.0.0.1 listener.cattle.io/cn-192.168.0.95:192.168.0.95 listener.cattle.io/cn-kubernetes:kubernetes listener.cattle.io/cn-kubernetes.default:kubernetes.default listener.cattle.io/cn-kubernetes.default.svc.cluster.local:kubernetes.default.svc.cluster.local listener.cattle.io/cn-localhost:localhost listener.cattle.io/hash:0122b4bba1a4ad7409c6fb2faf50ed6430fa7426dffb036658d0a84d1ecef5c7]"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.758789640Z" level=info msg="Connecting to proxy" url="wss://192.168.0.95:6443/v1-k3s/connect"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.772151272Z" level=info msg="Handling backend connection request [polux]"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.789101698Z" level=info msg="Running kubelet --address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=cgroupfs --client-ca-file=/var/lib/rancher/k3s/agent/client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --cni-bin-dir=/var/lib/rancher/k3s/data/8963b85492ae8de2b3bbd12a0773ef069eb37c584017ea159104e3016b778bd9/bin --cni-conf-dir=/var/lib/rancher/k3s/agent/etc/cni/net.d --container-runtime-endpoint=/run/k3s/containerd/containerd.sock --container-runtime=remote --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --healthz-bind-address=127.0.0.1 --hostname-override=polux --kubeconfig=/var/lib/rancher/k3s/agent/kubelet.kubeconfig --kubelet-cgroups=/systemd/system.slice --node-labels= --read-only-port=0 --resolv-conf=/run/systemd/resolve/resolv.conf --runtime-cgroups=/systemd/system.slice --serialize-image-pulls=false --tls-cert-file=/var/lib/rancher/k3s/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/k3s/agent/serving-kubelet.key"
Jul 28 20:32:51 polux k3s[3367]: Flag --containerd has been deprecated, This is a cadvisor flag that was mistakenly registered with the Kubelet. Due to legacy concerns, it will follow the standard CLI deprecation timeline before being removed.
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.816696063Z" level=info msg="Running kube-proxy --cluster-cidr=10.42.0.0/16 --healthz-bind-address=127.0.0.1 --hostname-override=polux --kubeconfig=/var/lib/rancher/k3s/agent/kubeproxy.kubeconfig --proxy-mode=iptables"
Jul 28 20:32:51 polux k3s[3367]: W0728 20:32:51.817334 3367 server.go:225] WARNING: all flags other than --config, --write-config-to, and --cleanup are deprecated. Please begin using a config file ASAP.
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.909115669Z" level=info msg="waiting for node polux CIDR not assigned yet"
Jul 28 20:32:51 polux k3s[3367]: I0728 20:32:51.943631 3367 server.go:413] Version: v1.18.3+k3s1
Jul 28 20:32:52 polux k3s[3367]: E0728 20:32:52.041055 3367 machine.go:331] failed to get cache information for node 0: open /sys/devices/system/cpu/cpu0/cache: no such file or directory
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.043275 3367 server.go:644] --cgroups-per-qos enabled, but --cgroup-root was not specified. defaulting to /
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.045018 3367 container_manager_linux.go:277] container manager verified user specified cgroup-root exists: []
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.045819 3367 container_manager_linux.go:282] Creating Container Manager object based on Node Config: {RuntimeCgroupsName:/systemd/system.slice SystemCgroupsName: KubeletCgroupsName:/systemd/system.slice ContainerRuntime:remote CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:cgroupfs KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: ReservedSystemCPUs: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>}]} QOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalCPUManagerReconcilePeriod:10s ExperimentalPodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms ExperimentalTopologyManagerPolicy:none}
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.047071 3367 topology_manager.go:126] [topologymanager] Creating topology manager with none policy
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.047570 3367 container_manager_linux.go:312] [topologymanager] Initializing Topology Manager with none policy
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.048027 3367 container_manager_linux.go:317] Creating device plugin manager: true
Jul 28 20:32:52 polux k3s[3367]: W0728 20:32:52.048729 3367 util_unix.go:103] Using "/run/k3s/containerd/containerd.sock" as endpoint is deprecated, please consider using full url format "unix:///run/k3s/containerd/containerd.sock".
Jul 28 20:32:52 polux k3s[3367]: W0728 20:32:52.049435 3367 util_unix.go:103] Using "/run/k3s/containerd/containerd.sock" as endpoint is deprecated, please consider using full url format "unix:///run/k3s/containerd/containerd.sock".
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.050286 3367 kubelet.go:317] Watching apiserver
Jul 28 20:32:52 polux k3s[3367]: W0728 20:32:52.068214 3367 proxier.go:635] Failed to load kernel module ip_vs with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.153491 3367 kuberuntime_manager.go:211] Container runtime containerd initialized, version: v1.3.3-k3s2, apiVersion: v1alpha2
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.157022 3367 server.go:1123] Started kubelet
Jul 28 20:32:52 polux k3s[3367]: W0728 20:32:52.160588 3367 proxier.go:635] Failed to load kernel module ip_vs_rr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.186321 3367 fs_resource_analyzer.go:64] Starting FS ResourceAnalyzer
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.221318 3367 server.go:145] Starting to listen on 0.0.0.0:10250
Jul 28 20:32:52 polux k3s[3367]: E0728 20:32:52.234482 3367 server.go:792] Starting healthz server failed: listen tcp 127.0.0.1:10248: bind: address already in use
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.241643 3367 volume_manager.go:265] Starting Kubelet Volume Manager
Jul 28 20:32:52 polux k3s[3367]: W0728 20:32:52.251343 3367 proxier.go:635] Failed to load kernel module ip_vs_wrr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.271470 3367 desired_state_of_world_populator.go:139] Desired state populator starts to run
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.288827 3367 server.go:393] Adding debug handlers to kubelet server.
Jul 28 20:32:52 polux k3s[3367]: F0728 20:32:52.294996 3367 server.go:159] listen tcp 0.0.0.0:10250: bind: address already in use
Jul 28 20:32:52 polux systemd[1]: k3s.service: Main process exited, code=exited, status=255/EXCEPTION
Jul 28 20:32:52 polux systemd[1]: k3s.service: Failed with result 'exit-code'.
ansible-playbook output:
$ ansible-playbook site.yml -i inventory/rpi-galaxy/hosts.ini
PLAY [k3s_cluster] **********************************************************************************************************************
TASK [Gathering Facts] ******************************************************************************************************************
Tuesday 28 July 2020 21:29:51 +0100 (0:00:00.242) 0:00:00.242 **********
ok: [polux]
TASK [prereq : Set SELinux to disabled state] *******************************************************************************************
Tuesday 28 July 2020 21:30:08 +0100 (0:00:17.840) 0:00:18.082 **********
skipping: [polux]
TASK [prereq : Enable IPv4 forwarding] **************************************************************************************************
Tuesday 28 July 2020 21:30:09 +0100 (0:00:00.230) 0:00:18.313 **********
ok: [polux]
TASK [prereq : Enable IPv6 forwarding] **************************************************************************************************
Tuesday 28 July 2020 21:30:10 +0100 (0:00:01.196) 0:00:19.510 **********
ok: [polux]
TASK [prereq : Add br_netfilter to /etc/modules-load.d/] ********************************************************************************
Tuesday 28 July 2020 21:30:11 +0100 (0:00:00.860) 0:00:20.370 **********
skipping: [polux]
TASK [prereq : Load br_netfilter] *******************************************************************************************************
Tuesday 28 July 2020 21:30:11 +0100 (0:00:00.221) 0:00:20.591 **********
skipping: [polux]
TASK [prereq : Set bridge-nf-call-iptables (just to be sure)] ***************************************************************************
Tuesday 28 July 2020 21:30:11 +0100 (0:00:00.220) 0:00:20.813 **********
skipping: [polux] => (item=net.bridge.bridge-nf-call-iptables)
skipping: [polux] => (item=net.bridge.bridge-nf-call-ip6tables)
TASK [prereq : Add /usr/local/bin to sudo secure_path] **********************************************************************************
Tuesday 28 July 2020 21:30:11 +0100 (0:00:00.240) 0:00:21.053 **********
skipping: [polux]
TASK [download : Delete k3s if already present] *****************************************************************************************
Tuesday 28 July 2020 21:30:12 +0100 (0:00:00.220) 0:00:21.274 **********
ok: [polux]
TASK [download : Download k3s binary x64] ***********************************************************************************************
Tuesday 28 July 2020 21:30:13 +0100 (0:00:01.057) 0:00:22.331 **********
skipping: [polux]
TASK [download : Download k3s binary arm64] *********************************************************************************************
Tuesday 28 July 2020 21:30:13 +0100 (0:00:00.279) 0:00:22.610 **********
changed: [polux]
TASK [download : Download k3s binary armhf] *********************************************************************************************
Tuesday 28 July 2020 21:30:20 +0100 (0:00:06.728) 0:00:29.339 **********
skipping: [polux]
TASK [raspbian : Test for Raspbian] *****************************************************************************************************
Tuesday 28 July 2020 21:30:20 +0100 (0:00:00.248) 0:00:29.588 **********
ok: [polux]
TASK [raspbian : Activating cgroup support] *********************************************************************************************
Tuesday 28 July 2020 21:30:20 +0100 (0:00:00.311) 0:00:29.899 **********
skipping: [polux]
TASK [raspbian : Flush iptables before changing to iptables-legacy] *********************************************************************
Tuesday 28 July 2020 21:30:20 +0100 (0:00:00.244) 0:00:30.144 **********
skipping: [polux]
TASK [raspbian : Changing to iptables-legacy] *******************************************************************************************
Tuesday 28 July 2020 21:30:21 +0100 (0:00:00.214) 0:00:30.358 **********
skipping: [polux]
TASK [raspbian : Changing to ip6tables-legacy] ******************************************************************************************
Tuesday 28 July 2020 21:30:21 +0100 (0:00:00.211) 0:00:30.569 **********
skipping: [polux]
TASK [raspbian : Rebooting] *************************************************************************************************************
Tuesday 28 July 2020 21:30:21 +0100 (0:00:00.296) 0:00:30.865 **********
skipping: [polux]
TASK [ubuntu : Enable cgroup via boot commandline if not already enabled] ***************************************************************
Tuesday 28 July 2020 21:30:21 +0100 (0:00:00.210) 0:00:31.076 **********
changed: [polux]
TASK [ubuntu : Reboot to enable cgroups] ************************************************************************************************
Tuesday 28 July 2020 21:30:22 +0100 (0:00:01.025) 0:00:32.101 **********
changed: [polux]
PLAY [master] ***************************************************************************************************************************
TASK [Gathering Facts] ******************************************************************************************************************
Tuesday 28 July 2020 21:31:22 +0100 (0:00:59.281) 0:01:31.383 **********
ok: [polux]
TASK [k3s/master : Copy K3s service file] ***********************************************************************************************
Tuesday 28 July 2020 21:31:28 +0100 (0:00:06.532) 0:01:37.915 **********
changed: [polux]
TASK [k3s/master : Enable and check K3s service] ****************************************************************************************
Tuesday 28 July 2020 21:31:30 +0100 (0:00:01.795) 0:01:39.711 **********
changed: [polux]
TASK [k3s/master : Wait for node-token] *************************************************************************************************
Tuesday 28 July 2020 21:31:54 +0100 (0:00:24.478) 0:02:04.189 **********
ok: [polux]
TASK [k3s/master : Register node-token file access mode] ********************************************************************************
Tuesday 28 July 2020 21:31:56 +0100 (0:00:01.209) 0:02:05.398 **********
ok: [polux]
TASK [k3s/master : Change file access node-token] ***************************************************************************************
Tuesday 28 July 2020 21:31:57 +0100 (0:00:01.008) 0:02:06.407 **********
changed: [polux]
TASK [k3s/master : Read node-token from master] *****************************************************************************************
Tuesday 28 July 2020 21:31:58 +0100 (0:00:00.884) 0:02:07.291 **********
ok: [polux]
TASK [k3s/master : Store Master node-token] *********************************************************************************************
Tuesday 28 July 2020 21:31:59 +0100 (0:00:00.979) 0:02:08.270 **********
ok: [polux]
TASK [k3s/master : Restore node-token file access] **************************************************************************************
Tuesday 28 July 2020 21:31:59 +0100 (0:00:00.277) 0:02:08.548 **********
changed: [polux]
TASK [k3s/master : Create directory .kube] **********************************************************************************************
Tuesday 28 July 2020 21:32:00 +0100 (0:00:00.835) 0:02:09.383 **********
ok: [polux]
TASK [k3s/master : Copy config file to user home directory] *****************************************************************************
Tuesday 28 July 2020 21:32:00 +0100 (0:00:00.790) 0:02:10.174 **********
changed: [polux]
TASK [k3s/master : Replace https://localhost:6443 by https://master-ip:6443] ************************************************************
Tuesday 28 July 2020 21:32:01 +0100 (0:00:00.788) 0:02:10.963 **********
changed: [polux]
TASK [k3s/master : Create kubectl symlink] **********************************************************************************************
Tuesday 28 July 2020 21:32:03 +0100 (0:00:02.014) 0:02:12.978 **********
changed: [polux]
TASK [k3s/master : Create crictl symlink] ***********************************************************************************************
Tuesday 28 July 2020 21:32:04 +0100 (0:00:00.810) 0:02:13.788 **********
changed: [polux]
PLAY [node] *****************************************************************************************************************************
TASK [Gathering Facts] ******************************************************************************************************************
Tuesday 28 July 2020 21:32:05 +0100 (0:00:00.834) 0:02:14.622 **********
ok: [polux]
TASK [k3s/node : Copy K3s service file] *************************************************************************************************
Tuesday 28 July 2020 21:32:11 +0100 (0:00:05.772) 0:02:20.394 **********
changed: [polux]
TASK [k3s/node : Enable and check K3s service] ******************************************************************************************
Tuesday 28 July 2020 21:32:12 +0100 (0:00:01.680) 0:02:22.075 **********
changed: [polux]
PLAY RECAP ******************************************************************************************************************************
polux : ok=25 changed=13 unreachable=0 failed=0
Tuesday 28 July 2020 21:32:34 +0100 (0:00:21.315) 0:02:43.391 **********
===============================================================================
ubuntu : Reboot to enable cgroups ----------------------------------------------------------------------------------------------- 59.28s
k3s/master : Enable and check K3s service --------------------------------------------------------------------------------------- 24.48s
k3s/node : Enable and check K3s service ----------------------------------------------------------------------------------------- 21.32s
Gathering Facts ----------------------------------------------------------------------------------------------------------------- 17.84s
download : Download k3s binary arm64 --------------------------------------------------------------------------------------------- 6.73s
Gathering Facts ------------------------------------------------------------------------------------------------------------------ 6.53s
Gathering Facts ------------------------------------------------------------------------------------------------------------------ 5.77s
k3s/master : Replace https://localhost:6443 by https://master-ip:6443 ------------------------------------------------------------ 2.01s
k3s/master : Copy K3s service file ----------------------------------------------------------------------------------------------- 1.80s
k3s/node : Copy K3s service file ------------------------------------------------------------------------------------------------- 1.68s
k3s/master : Wait for node-token ------------------------------------------------------------------------------------------------- 1.21s
prereq : Enable IPv4 forwarding -------------------------------------------------------------------------------------------------- 1.20s
download : Delete k3s if already present ----------------------------------------------------------------------------------------- 1.06s
ubuntu : Enable cgroup via boot commandline if not already enabled --------------------------------------------------------------- 1.03s
k3s/master : Register node-token file access mode -------------------------------------------------------------------------------- 1.01s
k3s/master : Read node-token from master ----------------------------------------------------------------------------------------- 0.98s
k3s/master : Change file access node-token --------------------------------------------------------------------------------------- 0.88s
prereq : Enable IPv6 forwarding -------------------------------------------------------------------------------------------------- 0.86s
k3s/master : Restore node-token file access -------------------------------------------------------------------------------------- 0.84s
k3s/master : Create crictl symlink ----------------------------------------------------------------------------------------------- 0.83s
Version:
N/A
K3s arguments:
N/A
Describe the bug
After applying the fix in k3s-io/k3s#1730 to make the 'reboot on raspbian' task actually work (without a fatal error), I realized that it causes another problem: when the ARM servers reboot mid-playbook, the playbook fails. Even if only the master node becomes unreachable, every other host then fails at the 'Copy K3s service file' task with the message:
AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'
To Reproduce
Run the Ansible playbook on ARM servers.
Expected behavior
The playbook completes successfully and reboots the ARM servers as required by the 'Rebooting on Raspbian' task.
Actual behavior
TASK [raspbian : Rebooting on Raspbian] ********************************************************************************
Saturday 02 May 2020 11:36:06 -0500 (0:00:02.881) 0:00:38.813 **********
skipping: [worker-01]
skipping: [worker-02]
skipping: [worker-03]
skipping: [worker-04]
skipping: [worker-05]
skipping: [worker-06]
fatal: [turing-master]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Shared connection to turing-master closed.", "unreachable": true}
Which, in turn, causes all the other hosts to fail:
TASK [k3s/node : Copy K3s service file] ********************************************************************************
Saturday 02 May 2020 11:36:14 -0500 (0:00:06.435) 0:00:46.844 **********
fatal: [worker-01]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'"}
fatal: [worker-02]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'"}
fatal: [worker-03]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'"}
fatal: [worker-05]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'"}
fatal: [worker-04]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'"}
fatal: [worker-06]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'"}
Additional context / logs
Moved from k3s repo issue k3s-io/k3s#1732
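A possible mitigation, sketched below under assumptions (the task name, timeout value, and the raspbian variable usage mirror the patterns shown above, not the role's actual code): using Ansible's built-in reboot module makes the play block until the rebooted host is reachable again, so later plays can still read its hostvars (including the stored token).

- name: Rebooting on Raspbian
  reboot:
    reboot_timeout: 300   # wait up to 5 minutes for the node to come back before continuing
  when: raspbian

With this shape, the master finishes rebooting inside the same play, and the k3s/node tasks should no longer hit the undefined 'token' hostvar.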
When running the site playbook on a cluster of Pis, it throws a syntax error.
7x Pi 4 (8GB RAM)
ansible-playbook site.yml -i inventory/my-cluster/hosts.ini
It reaches the TASK [raspbian : Activating cgroup support] step and throws a syntax error.
// inventory/my-cluster/group_vars/all.yml
---
k3s_version: v1.17.5+k3s1
ansible_user: pi
systemd_dir: /etc/systemd/system
master_ip: "{{ hostvars[groups['master'][0]]['ansible_host'] | default(groups['master'][0]) }}"
extra_server_args: "--docker --no-deploy traefik"
// inventory/my-cluster/hosts.ini
[master]
192.168.1.xx
[node]
192.168.1.xx
192.168.1.xx
192.168.1.xx
192.168.1.xx
192.168.1.xx
192.168.1.xx
[k3s_cluster:children]
master
node
When running the reset playbook on a cluster of Pis, it refuses to complete.
7x Pi 4 (8GB RAM)
ansible-playbook site.yml -i inventory/my-cluster/hosts.ini
ansible-playbook reset.yml -i inventory/my-cluster/hosts.ini
It reaches the TASK [reset : daemon_reload] step and gets stuck forever (I tried leaving it for at least half an hour).
// inventory/my-cluster/group_vars/all.yml
---
k3s_version: v1.17.5+k3s1
ansible_user: pi
systemd_dir: /etc/systemd/system
master_ip: "{{ hostvars[groups['master'][0]]['ansible_host'] | default(groups['master'][0]) }}"
extra_server_args: "--docker --no-deploy traefik"
// inventory/my-cluster/hosts.ini
[master]
192.168.1.xx
[node]
192.168.1.xx
192.168.1.xx
192.168.1.xx
192.168.1.xx
192.168.1.xx
192.168.1.xx
[k3s_cluster:children]
master
node
ok: [192.168.1.xx] => {
"changed": false,
"invocation": {
"module_args": {
"daemon_reexec": false,
"daemon_reload": true,
"enabled": null,
"force": null,
"masked": null,
"name": null,
"no_block": false,
"scope": null,
"state": null,
"user": null
}
},
"name": null,
"status": {}
}
Since it's the last task of the reset role, it does in fact delete and clean up the installation; however, it still gets stuck without exiting.
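One workaround sketch, assuming the hang really is in the reset role's final daemon_reload task (the systemd module and its daemon_reload parameter are standard Ansible; the async bound is my own addition, not the role's code): bounding the task with async/poll lets the play fail fast instead of blocking indefinitely.

- name: daemon_reload
  systemd:
    daemon_reload: true
  async: 60   # fail the task if systemd does not return within 60 seconds
  poll: 5     # poll the async job every 5 seconds instead of blocking forever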
name: "{{ items }}"
should be
name: "{{ item }}"
From a commenter on one of my YouTube videos:
something I noticed is that using k3s-ansible on the beta 64-bit OS didn't work, it was missing the k3s binary and couldn't find it .. did you face the same? and if so how did you fix that
(see comment).
I've been slowly working through testing some of my own automation on the new 64-bit version of the Pi OS, and I've found that some images and binaries have to be downloaded differently based on the arch (which, in the past, I always assumed was armv7 or arm32 on Raspbian, which is not necessarily true as of yesterday).
So this issue is mostly a reminder to me to do some work testing k3s-ansible on the 64-bit OS. I'm also tracking this internally for my Turing Pi cluster work, which uses a mix of different Pi versions (some of which can't run Pi OS 64-bit), so it would be helpful to be able to make it work with all flavors for the foreseeable future.
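As a rough sketch of where that could go (the facts used are standard Ansible facts; the URL variables are hypothetical placeholders, not the role's real values), the download tasks could key off the reported architecture so 64-bit Pi OS picks up the arm64 binary:

- name: Download k3s binary arm64
  get_url:
    url: "{{ k3s_arm64_url }}"   # hypothetical variable for the arm64 release asset
    dest: /usr/local/bin/k3s
    mode: "0755"
  when: ansible_facts.architecture == "aarch64"

- name: Download k3s binary armhf
  get_url:
    url: "{{ k3s_armhf_url }}"   # hypothetical variable for the armhf release asset
    dest: /usr/local/bin/k3s
    mode: "0755"
  when: ansible_facts.architecture is match("armv[67]l")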
Validate the k3s checksum in the download role, and don't delete an already-downloaded k3s binary when its checksum matches.
For the file-related tasks in the download role, the mode should be an octal (or needs to be quoted); otherwise the permissions will not be set properly.
https://github.com/rancher/k3s-ansible/blob/master/roles/download/tasks/main.yml#L8-L15
These should be like:
mode: 0755
From Ansible's docs on get_url:
You must either add a leading zero so that Ansible's YAML parser knows it is an octal number (like 0644 or 01777) or quote it (like '644' or '1777') so Ansible receives a string and can do its own conversion from string into number.
It seems this is already correct in the k3s role's file-related tasks, so let's get these download tasks into the same format.
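A sketch of what one download task could look like with a properly quoted mode and, tying in the checksum request above, a checksum so an already-correct binary is not re-downloaded. The release URL pattern and the sha256sum file name are assumptions based on how k3s publishes its release assets:

- name: Download k3s binary arm64
  get_url:
    url: https://github.com/rancher/k3s/releases/download/{{ k3s_version }}/k3s-arm64
    checksum: sha256:https://github.com/rancher/k3s/releases/download/{{ k3s_version }}/sha256sum-arm64.txt
    dest: /usr/local/bin/k3s
    owner: root
    group: root
    mode: "0755"   # quoted (or written with a leading zero, 0755) so YAML treats it as octal

get_url accepts a checksum in the form <algorithm>:<url> in Ansible 2.8 and later, which could also help with the "don't delete if the checksum matches" part of this request.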