k3s-ansible's People

Contributors

arpankapoor, b-m-f, bubylou, clambin, curx, dereknola, dmitriysafronov, dreamingdeer, erikwilson, frankkkkk, galal-hussein, geerlingguy, itwars, jeffspahr, jiayihu, jlpedrosa, johnthenerd, jon-stumpf, laszlojau, lentzi90, nickto, orlovmyk, pieterv-icloud-com, rockaut, roivanov, simagick, st0rmingbr4in, stafwag, tamsky, zaherg

k3s-ansible's Issues

"raspbian : Test for Raspbian" runs on centos7 nodes and failed

I'm encountering an error when trying to create a cluster on Raspberry Pis running CentOS 7.8.2003:

Saturday 18 July 2020 18:37:32 -0700 (0:01:05.670) 0:01:48.058 *********
fatal: [192.168.x.y]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'description'\n\nThe error appears to be in '/home/someone/Codes/k3s-ansible/roles/raspbian/tasks/main.yml': line 2, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n---\n- name: Test for Raspbian\n ^ here\n"}

Ansible 2.9.10 on Fedora 32, with Python 3.8.3.

Raspbian Task main.yml generates a syntax error

The 'when' conditionals (shown below) in the Raspbian task main.yml generate a syntax error (see below). Update them to 'raspbian == true' to resolve it.

- name: Activating cgroup support
  lineinfile:
    path: /boot/cmdline.txt
    regexp: '^((?!.*\bcgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory\b).)*$'
    line: '\1 cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory'
    backrefs: true
  notify: reboot
  when:
    - raspbian is true

- name: Flush iptables before changing to iptables-legacy
  iptables:
    flush: true
  when: raspbian
  changed_when: false # iptables flush always returns changed

- name: Changing to iptables-legacy
  alternatives:
    path: /usr/sbin/iptables-legacy
    name: iptables
  register: ip4_legacy
  when: raspbian

- name: Changing to ip6tables-legacy
  alternatives:
    path: /usr/sbin/ip6tables-legacy
    name: ip6tables
  register: ip6_legacy
  when: raspbian

syntax error:

"The conditional check 'raspbian is true' failed. The error was: template error while templating string: no test named 'true'. String: {% if raspbian is true %} True {% else %} False {% endif %}

"Set bridge-nf-call-iptables (just to be sure)" failed on centos7 nodes

fatal: [192.168.x.y]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'items' is undefined\n\nThe error appears to be in '/home/someone/Codes/k3s-ansible/roles/prereq/tasks/main.yml': line 33, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Set bridge-nf-call-iptables (just to be sure)\n ^ here\n"}

I am trying to run this on three Raspberry Pi 3 boards running CentOS 7.8.2003.

System containers are not being created.

james@dragon:~/Downloads/k3s-ansible-master$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system helm-install-traefik-xbgdc 0/1 ContainerCreating 0 18m
kube-system local-path-provisioner-58fb86bdfd-hdddq 0/1 ContainerCreating 0 18m
kube-system metrics-server-6d684c7b5-xhcq9 0/1 ContainerCreating 0 18m
kube-system coredns-6c6bb68b64-f99km 0/1 ContainerCreating 0 18m

Warning FailedCreatePodSandBox 47s (x35 over 8m42s) kubelet, pine1 (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to mount rootfs component &{overlay overlay [workdir=/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/175/work upperdir=/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/175/fs lowerdir=/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1/fs]}: no such device: unknown

I was unable to find a /snapshots/175 directory and only found /snapshots/1/.

I did a reset and re-deploy and hit the same issue.

Ansible inventory name 'k3s-cluster' fails validation in the latest Ansible version

Version:

N/A

K3s arguments:

N/A

Describe the bug

When I run the ansible playbook with Ansible 2.9.6, I get a warning saying one of the group names in the inventory is invalid.

This seems to be related to this issue in Ansible's repository: ansible/ansible-documentation#89

Regardless of the outcome of that issue, it might be best to convert the group name to use underscores, to prevent this warning and ensure the playbook runs properly.
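For illustration, the rename would turn the hyphenated group into an underscore-separated one in hosts.ini (the child groups shown are the ones from the sample inventory):

# before
[k3s-cluster:children]
master
node

# after
[k3s_cluster:children]
master
node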

To Reproduce

  1. Install the latest version of Ansible.
  2. Run the playbook: ansible-playbook site.yml -i inventory/hosts.ini

Expected behavior

No warnings at the beginning of the play.

Actual behavior

$ ansible-playbook site.yml -i inventory/hosts.ini
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details

Additional context / logs

Moved from k3s repo issue k3s-io/k3s#1727

Lint Ansible playbook and tweak code style slightly (for readability)

Version:

N/A

K3s arguments:

N/A

Describe the bug

After applying the fix in k3s-io/k3s#1730 to make the 'reboot on raspbian' task actually work (without a fatal error), I realized that this causes another problem: when the ARM servers reboot mid-playbook, the playbook fails. Even if only the master node fails, everything else will fail at the Copy the K3s service file task with the message:

AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'

To Reproduce

Run the Ansible playbook on ARM servers.

Expected behavior

The playbook completes successfully, and reboots the ARM servers as required in the Rebooting on Raspbian task.

Actual behavior

TASK [raspbian : Rebooting on Raspbian] ********************************************************************************
Saturday 02 May 2020  11:36:06 -0500 (0:00:02.881)       0:00:38.813 ********** 
skipping: [worker-01]
skipping: [worker-02]
skipping: [worker-03]
skipping: [worker-04]
skipping: [worker-05]
skipping: [worker-06]
fatal: [turing-master]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Shared connection to turing-master closed.", "unreachable": true}

Which, in turn, causes all the other hosts to fail:

TASK [k3s/node : Copy K3s service file] ********************************************************************************
Saturday 02 May 2020  11:36:14 -0500 (0:00:06.435)       0:00:46.844 ********** 
fatal: [worker-01]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'"}
fatal: [worker-02]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'"}
fatal: [worker-03]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'"}
fatal: [worker-05]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'"}
fatal: [worker-04]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'"}
fatal: [worker-06]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'"}

Additional context / logs

This was moved from the k3s repo issue k3s-io/k3s#1724

Move Raspbian role to prereq

Wouldn't it be better/clearer to move the separate Raspbian role into prereq and load the task file only on Raspbian? Additionally, the Raspbian-Buster-only tasks could be loaded similarly. WDYT?

Question

I am going to have a 4 node Raspbian 64-bit cluster.

Do I need to pre-install Docker before I execute this playbook?

This is more of a question rather than an issue.

Add fail2ban and unattended-upgrade for Ubuntu

Hello, wouldn't it be nice to do basic hardening on the Ubuntu nodes (and others too)? Like installing fail2ban and making sure unattended-upgrades is active for security patches? @geerlingguy I know you have tons of playbooks for this; what do you think?
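A minimal sketch of what that hardening could look like as an Ansible task (the package names are the standard Debian/Ubuntu ones; this is only an illustration, not part of the playbook):

- name: Install basic hardening packages
  apt:
    name:
      - fail2ban
      - unattended-upgrades
    state: present
    update_cache: yes
  when: ansible_distribution == 'Ubuntu'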

Add `nocows = True` in `ansible.cfg`

Without this setting, if you have cowsay installed, you see a bunch of:

$ ansible-playbook site.yml -i inventory/hosts.ini
 ____________________ 
< PLAY [k3s_cluster] >
 -------------------- 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

I'm all for cowsay in moderation, but it can be a bit jarring for a first-time user :)
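For reference, the proposed setting goes in the [defaults] section of ansible.cfg:

[defaults]
nocows = True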

SEC: Support SELinux

It'd be great to have optional support for SELinux: https://rancher.com/docs/k3s/latest/en/advanced/#experimental-selinux-support

From k3s-io/k3s#2473 (comment) :

k3s-ansible (an excellent reference for the necessary setup steps) could be updated here as well: https://github.com/rancher/k3s-ansible/blob/721c3487027e42d30c60eb206e0fb5abfddd094f/roles/prereq/tasks/main.yml#L2-L5

OTOH something like this:

- name: Set SELinux to disabled state
  selinux:
    state: disabled
  when:
    - not (k3s_disable_selinux is defined and not k3s_disable_selinux) or k3s_disable_selinux == False
    - ansible_distribution in ['CentOS', 'Red Hat Enterprise Linux']

With this added to https://github.com/rancher/k3s-ansible/blob/master/inventory/sample/group_vars/all.yml :

k3s_disable_selinux: False

-bash error

RPi x 4

Buster

I have done a sudo git clone and made the my-cluster dir in the inventory. I have changed the hosts.ini and the all.yml.

I keep getting this. I'm super new to this stuff and do not know what I have done wrong. Thank you for your help.

-bash: ansible-playbook: command not found
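This usually means Ansible is not installed on the machine you are running the command from. A minimal sketch of installing it on Raspberry Pi OS / Debian Buster (assuming apt, or pip for a newer release):

sudo apt update && sudo apt install -y ansible
# or:
pip3 install --user ansible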

Enabling cgroups through /boot/firmware/cmdline.txt fails on x86_64 ubuntu server 20.04

When trying to enable cgroups, the playbook does it through the /boot/firmware/cmdline.txt file, which doesn't exist, since that's not where the kernel cmdline arguments are defined on the latest Ubuntu Server release for x86_64.
The resulting output is:

fatal: [k3s-main]: FAILED! => {"changed": false, "msg": "Destination /boot/firmware/cmdline.txt does not exist !", "rc": 257}

I think this step should be skipped on non-RPi machines and refactored for the remaining cases, either through a templated /etc/default/grub file that is distro- and architecture-specific, or by checking the kernel config in /boot/ as below:

grep CONFIG_CGROUPS= /boot/config-`uname -r`

which if successful returns

CONFIG_CGROUPS=y
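A minimal sketch of the skip idea, guarding the task on whether the firmware cmdline file actually exists (the stat-based condition here is illustrative, not the playbook's current logic):

- name: Check whether a firmware cmdline.txt exists
  stat:
    path: /boot/firmware/cmdline.txt
  register: firmware_cmdline

- name: Activating cgroup support
  lineinfile:
    path: /boot/firmware/cmdline.txt
    regexp: '^((?!.*\bcgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory\b).)*$'
    line: '\1 cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory'
    backrefs: true
  when: firmware_cmdline.stat.exists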

Activate cgroup support failed

After following the README and changing the user, I get the following error when running the playbook.

TASK [raspbian : Activating cgroup support] ****************************************************************************************************************************************************************************************************************************************************************************************************************************************************
fatal: [192.168.50.51]: FAILED! => {"msg": "The conditional check 'raspbian is true' failed. The error was: template error while templating string: no test named 'true'. String: {% if raspbian is true %} True {% else %} False {% endif %}\n\nThe error appears to be in '/home/brett/k3s-ansible/roles/raspbian/tasks/main.yml': line 10, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Activating cgroup support\n ^ here\n"}
fatal: [192.168.50.215]: FAILED! => {"msg": "The conditional check 'raspbian is true' failed. The error was: template error while templating string: no test named 'true'. String: {% if raspbian is true %} True {% else %} False {% endif %}\n\nThe error appears to be in '/home/brett/k3s-ansible/roles/raspbian/tasks/main.yml': line 10, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Activating cgroup support\n ^ here\n"}
fatal: [192.168.50.196]: FAILED! => {"msg": "The conditional check 'raspbian is true' failed. The error was: template error while templating string: no test named 'true'. String: {% if raspbian is true %} True {% else %} False {% endif %}\n\nThe error appears to be in '/home/brett/k3s-ansible/roles/raspbian/tasks/main.yml': line 10, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Activating cgroup support\n ^ here\n"}
fatal: [192.168.50.204]: FAILED! => {"msg": "The conditional check 'raspbian is true' failed. The error was: template error while templating string: no test named 'true'. String: {% if raspbian is true %} True {% else %} False {% endif %}\n\nThe error appears to be in '/home/brett/k3s-ansible/roles/raspbian/tasks/main.yml': line 10, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Activating cgroup support\n ^ here\n"}

After that fails, the playbook ends. I'm trying to deploy it to four Pi 4s running HypriotOS. I am new to all this, so I'm trying to learn.

Connection resets on Armbian

Failing to get k3s nodes talking to each other directly, I thought I'd take a crack at using the Ansible playbook to make sure I wasn't missing anything.

The problem could be that I'm running on Armbian, whereas this seems to be well-tested on Ubuntu and Raspbian. I'm trying to build a PR to fix discrepancies, but I haven't had a successful playbook run yet. It's currently hanging at "Enable and check K3s service".

I see this error on a node:

./syslog:Aug 2 00:58:14 localhost k3s[4895]: time="2020-08-02T00:58:14.866195152Z" level=info msg="Running load balancer 127.0.0.1:43201 -> [t4.local:6443]"
./syslog:Aug 2 00:58:24 localhost k3s[4895]: time="2020-08-02T00:58:24.881040536Z" level=error msg="failed to get CA certs at https://127.0.0.1:43201/cacerts: Get https://127.0.0.1:43201/cacerts: read tcp 127.0.0.1:36796->127.0.0.1:43201: read: connection reset by peer"

Service is up on the master, and accessible from the node.

But.

It's really, really slow:

time curl --insecure https://t4.local:6443/cacerts
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----

real 0m10.304s
user 0m0.149s
sys 0m0.042s

Resets plus a slow service seem a bit suspect, and across half a dozen queries, they all return in just over 10 seconds. There's free memory, and the load average is 0.6 on the master. They're on the same dumb switch. There don't seem to be any error logs on the master during the request, but I could be looking in the wrong spots.

What am I missing, or should I be looking for?

Unexpected templating type error on CentOS 8 RPI4

Error message:

TASK [raspbian : Test for Raspbian] **********************************************************************************************************************************************
Monday 03 August 2020  19:14:42 +0100 (0:00:00.216)       0:00:28.536 ********* 
fatal: [polux]: FAILED! => {"msg": "Unexpected templating type error occurred on ({% if ( ansible_facts.architecture is search(\"arm\") and ansible_facts.lsb.description is match(\"[Rr]aspbian.*[Bb]uster\") ) or ( ansible_facts.architecture is search(\"aarch64\") and ansible_facts.lsb.description is match(\"Debian.*buster\") ) %}True{% else %}False{% endif %}): expected string or bytes-like object"}
fatal: [pangea]: FAILED! => {"msg": "Unexpected templating type error occurred on ({% if ( ansible_facts.architecture is search(\"arm\") and ansible_facts.lsb.description is match(\"[Rr]aspbian.*[Bb]uster\") ) or ( ansible_facts.architecture is search(\"aarch64\") and ansible_facts.lsb.description is match(\"Debian.*buster\") ) %}True{% else %}False{% endif %}): expected string or bytes-like object"}
	to retry, use: --limit @/home/ansible/work/k3s-ansible/site.retry

Machine: Raspberry Pi 4, 4GB model.

OS:

$ uname -a
Linux polux 5.4.53-v8.1.el8 #1 SMP PREEMPT Sun Jul 26 12:06:25 -03 2020 aarch64 aarch64 aarch64 GNU/Linux
$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="8 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="8"
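A hedged guess at the cause: CentOS 8 typically does not ship lsb_release, so ansible_facts.lsb.description is empty/undefined, and passing it to the match test raises the "expected string or bytes-like object" error. A minimal illustrative guard (abbreviated to one branch of the original condition, and not the playbook's actual code) would default the value to an empty string first:

- name: Test for Raspbian
  set_fact:
    raspbian: "{{ ansible_facts.architecture is search('arm')
                  and (ansible_facts.lsb.description | default('')) is match('[Rr]aspbian.*[Bb]uster') }}"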

Make inventory creation documentation slightly better

In the included ansible.cfg, the proper inventory file is configured:

inventory  = ./hosts.ini

This assumes the user has cloned this repository, runs ansible* commands from its root directory, and has created a hosts.ini file in that base directory (alongside the site.yml playbook).

The README currently states:

Add the system information gathered above into a file called hosts.ini. For example:

I think this would be more clear if it were something like "Add the system information gathered above into a file called hosts.ini in the same directory as this README file. There is a template in the inventory directory."

Additionally, because the path to the file is defined in ansible.cfg, it need not be specified when you run the playbook, so the playbook command could simply be:

ansible-playbook site.yml

(Unless I'm reading the configuration wrong.)

Finally, if a .gitignore file is added to the repository with the hosts.ini file excluded, a user like me could clone the repository, create my custom hosts.ini file, and pull changes without fear of any conflicts or accidentally adding my local customized hosts.ini file to the repository.
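For illustration, the proposed .gitignore entry would just be:

# .gitignore (proposed, at the repository root)
hosts.ini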

[ansible-galaxy]Typo error causing error in Centos execution

I'm facing this error when using the Ansible Galaxy version (the ansible-galaxy branch):

TASK [k3s-ansible/roles/prereq : Set bridge-nf-call-iptables (just to be sure)] ***
fatal: [10.0.10.11]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'items' is undefined\n\nThe error appears to be in '/roles/k3s-ansible/roles/prereq/tasks/main.yml': line 33, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Set bridge-nf-call-iptables (just to be sure)\n ^ here\n"}

That variable should be "item":
https://github.com/rancher/k3s-ansible/blob/f91dfcfc8e2e94d6ff687c3d0ecc7805d38e8517/roles/prereq/tasks/main.yml#L35
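A minimal illustration of the fix, looping with the singular item variable (the sysctl values shown are assumptions for illustration, not necessarily the role's exact parameters):

- name: Set bridge-nf-call-iptables (just to be sure)
  sysctl:
    name: "{{ item }}"
    value: "1"
    state: present
    reload: yes
  loop:
    - net.bridge.bridge-nf-call-iptables
    - net.bridge.bridge-nf-call-ip6tables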

helm-install-traefik pod falls into CrashLoopBackOff state

For quite a long time I have faced this problem, but decided to write only now, when I finally got mad. Every few deployments (about every fifth; I didn't keep exact statistics), a pod named helm-install-traefik falls into a CrashLoopBackOff and then Error state. This pod retries later after a pod restart, and sometimes it even reaches the Completed state. But almost always, when this happens, helm-install-traefik doesn't come up and the cluster doesn't deploy. The fact that this can happen on any deployment is very unpleasant.

(Screenshot attached: helm-traefik-clbo)

This problem was encountered on Ubuntu Server 18.04.4 LTS and CentOS 7 on x86-64. The describe-pod and logs output is attached:

helm-install-traefik-describe.txt
helm-install-traefik-logs.txt

The conditional check 'boot_cmdline | changed' failed. No filter named 'changed'

In Ansible 2.9.x, the check for Rebooting on Raspbian fails with the following error message:

fatal: [18.206.98.159]: FAILED! => {"msg": "The conditional check 'boot_cmdline | changed' failed. The error was: template error while templating string: no filter named 'changed'. String: {% if boot_cmdline | changed %} True {% else %} False {% endif %}\n\nThe error appears to be in '/Users/jgeerling/Downloads/youtube-10k-pods/attempt-two-k3s/k3s-ansible/roles/raspbian/tasks/main.yml': line 19, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Rebooting on Raspbian\n  ^ here\n"}

The fix is to set the conditional to boot_cmdline is changed.
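A before/after sketch of that change (the standalone 'changed' filter was removed in Ansible 2.9; the test form keeps working):

# fails on Ansible 2.9
when: boot_cmdline | changed

# works
when: boot_cmdline is changed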

pi3 Centos7: Enable and check K3s service failed

Trying to bring up k3s on four Pi 3s, it failed with this error:

TASK [k3s/node : Enable and check K3s service] ********************************************************************************
Sunday 13 September 2020  14:58:07 -0700 (0:00:07.255)       0:05:08.667 ****** 
fatal: [192.168.xxx.yyy]: FAILED! => {"changed": false, "msg": "Unable to start service k3s-node: Job for k3s-node.service failed because the control process exited with error code. See \"systemctl status k3s-node.service\" and \"journalctl -xe\" for details.\n"}
[root@pi3-01 ~]# systemctl status k3s-node.service
● k3s-node.service - Lightweight Kubernetes
   Loaded: loaded (/etc/systemd/system/k3s-node.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Sun 2020-09-13 14:58:34 PDT; 1s ago
     Docs: https://k3s.io
  Process: 4109 ExecStart=/usr/local/bin/k3s agent --server https://192.168.xxx.yyy:6443 --token zzzzzzzzzzzzzzzz (code=exited, status=1/FAILURE)
  Process: 4106 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
  Process: 4103 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
 Main PID: 4109 (code=exited, status=1/FAILURE)

Sep 13 14:58:34 pi3-01.local systemd[1]: Failed to start Lightweight Kubernetes.
Sep 13 14:58:34 pi3-01.local systemd[1]: Unit k3s-node.service entered failed state.
Sep 13 14:58:34 pi3-01.local systemd[1]: k3s-node.service failed.

The node service seems to be running.

Multi-master support for K3s Ansible playbook?

This is something I think we might be able to get configured in the Ansible playbook, but I didn't see (at a glance at least) if it was something supported by this playbook yet; namely, a multi-master configuration with an external database: High Availability with an External DB.

In this playbook's case, maybe it would delegate the task of configuring an external database cluster to the user (e.g. a separate Ansible playbook that builds an RDS cluster in Amazon, or a two- or three-node DB cluster on other bare-metal servers alongside the K3s cluster), but then how could we make this playbook support the multi-master configuration described in the docs page linked above?
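A rough sketch of what that could look like with the existing variables (--datastore-endpoint is k3s's documented flag for pointing servers at an external datastore; the inventory layout and connection string below are assumptions, not an implemented feature):

# inventory/my-cluster/hosts.ini (hypothetical)
[master]
master-01
master-02
master-03

[node]
worker-01

# inventory/my-cluster/group_vars/all.yml (hypothetical)
extra_server_args: "--datastore-endpoint=mysql://k3s:password@tcp(db.example.com:3306)/k3s"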

Add CI to test the playbook

A CI setup to test that the playbook runs successfully would be nice.

  • Should we use GitHub Actions, Travis, CircleCI, or something else?
  • We need to test that the systemd services work, so running the playbook in Docker might not be suitable. (A minimal syntax-check sketch follows below.)
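A minimal starting point, assuming GitHub Actions and limiting the check to playbook syntax (a full converge test would need real VMs for systemd; this workflow is only an illustrative sketch):

# .github/workflows/ci.yml (hypothetical)
name: CI
on: [push, pull_request]
jobs:
  syntax-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Install Ansible
        run: pip3 install ansible
      - name: Syntax check
        run: ansible-playbook site.yml -i inventory/sample/hosts.ini --syntax-check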

raspbian :Activating cgroup support - failure on x64 Debian Buster platform

Getting a 'raspbian is true' conditional failure when the master and nodes are x64 Intel Debian Buster, not ARM, not Raspberry Pis.

$ ssh 192.168.86.110 uname -a
Linux alfred 4.19.0-10-amd64 #1 SMP Debian 4.19.132-1 (2020-07-24) x86_64 GNU/Linux

....
TASK [download : Download k3s binary x64] ******************************************************************************
Sunday 06 September 2020 00:21:32 -0600 (0:00:00.412) 0:00:05.811 ******
changed: [192.168.86.110]
changed: [192.168.86.111]
changed: [192.168.86.112]

TASK [download : Download k3s binary arm64] ****************************************************************************
Sunday 06 September 2020 00:22:13 -0600 (0:00:41.544) 0:00:47.356 ******
skipping: [192.168.86.111]
skipping: [192.168.86.112]
skipping: [192.168.86.110]

TASK [download : Download k3s binary armhf] ****************************************************************************
Sunday 06 September 2020 00:22:13 -0600 (0:00:00.137) 0:00:47.493 ******
skipping: [192.168.86.111]
skipping: [192.168.86.112]
skipping: [192.168.86.110]

TASK [raspbian : Test for Raspbian] ************************************************************************************
Sunday 06 September 2020 00:22:13 -0600 (0:00:00.177) 0:00:47.671 ******
ok: [192.168.86.111]
ok: [192.168.86.112]
ok: [192.168.86.110]
TASK [raspbian : Activating cgroup support] ****************************************************************************
Sunday 06 September 2020 00:22:14 -0600 (0:00:00.299) 0:00:47.970 ******
fatal: [192.168.86.110]: FAILED! => {"msg": "The conditional check 'raspbian is true' failed. The error was: template error while templating string: no test named 'true'. String: {% if raspbian is true %} True {% else %} False {% endif %}\n\nThe error appears to have been in '/home/sean/k3s-ansible/roles/raspbian/tasks/main.yml': line 11, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Activating cgroup support\n ^ here\n"}
fatal: [192.168.86.111]: FAILED! => {"msg": "The conditional check 'raspbian is true' failed.
...

Activating cgroup support error

fatal: [192.168.1.150]: FAILED! => {"msg": "The conditional check 'raspbian is true' failed. The error was: template error while templating string: no test named 'true'. String: {% if raspbian is true %} True {% else %} False {% endif %}\n\nThe error appears to be in '/home/james/Downloads/k3s-ansible-master/roles/raspbian/tasks/main.yml': line 10, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Activating cgroup support\n ^ here\n"}

The Ansible playbook fails to detect my OS version.
cat /etc/os-release:
NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"

This is running on an ARM64 system.

Playbook hangs on [Enable and check K3s service]

I'm running the playbook with ansible-playbook site.yml -i inventory/sample/hosts.ini -k -K -vv.

It runs successfully up to [Enable and check K3s service] in /node/tasks/main.yml, then it hangs indefinitely. I've run this with varying levels of verbosity and debugging on.

Running this on Raspberry Pi 4Bs, all of which have Raspberry Pi OS Lite.

(Screenshot attached: ansible-checks-k3s-hangs)

(hosts.ini attached)

no action detected in task

When I run the following command with Ansible 2.5.1 on Ubuntu 18 LTS:

ansible-playbook site.yml -i inventory/my-cluster/hosts.ini

The following error is produced:

ERROR! no action detected in task. This often indicates a misspelled module name, or incorrect module path.

The error appears to have been in '/home/thepoetwarrior/Downloads/k3s-ansible-master/roles/raspbian/tasks/main.yml': line 41, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- name: Rebooting
  ^ here

My current config for this playbook is:

  file: inventory/my-cluster/hosts.ini 

[master]
192.168.0.50

[node]
192.168.0.51
192.168.0.52
192.168.0.53

[k3s_cluster:children]
master
node

file: inventory/my-cluster/group_vars/all.yml

k3s_version: v1.17.5+k3s1
ansible_user: pi
systemd_dir: /etc/systemd/system
master_ip: "{{ hostvars[groups['master'][0]]['ansible_host'] | default(groups['master'][0]) }}"
extra_server_args: ""

k3s.service - Does not Start

After a fresh install the playbook ends with the following error:
FAILED. Unable to start service k3s: Job for k3s.service failed because the control process exited with error code.

Here is the status of the k3s.service:
Jun 19 15:14:46 master k3s[1087]: E0619 15:14:46.710048 1087 cluster_authentication_trust_controller.go:493] kube-system/extension-apiserver-authentication failed with : context deadline exceeded
Jun 19 15:14:46 master k3s[1087]: E0619 15:14:46.723364 1087 status.go:71] apiserver received an error that is not an metav1.Status: context.deadlineExceededError{}
Jun 19 15:14:46 master k3s[1087]: I0619 15:14:46.724123 1087 trace.go:116] Trace[927478669]: "Create" url:/apis/apiregistration.k8s.io/v1/apiservices,user-agent:k3s/v1.17.5+k3s1 (linux/arm64) kubernetes/
Jun 19 15:14:46 master k3s[1087]: Trace[927478669]: [34.001125181s] [34.000375656s] END
Jun 19 15:14:46 master k3s[1087]: E0619 15:14:46.725191 1087 autoregister_controller.go:194] v2beta1.autoscaling failed with : context deadline exceeded
Jun 19 15:14:46 master k3s[1087]: E0619 15:14:46.780093 1087 status.go:71] apiserver received an error that is not an metav1.Status: context.deadlineExceededError{}
Jun 19 15:14:46 master k3s[1087]: I0619 15:14:46.781135 1087 trace.go:116] Trace[1836866062]: "Create" url:/apis/apiregistration.k8s.io/v1/apiservices,user-agent:k3s/v1.17.5+k3s1 (linux/arm64) kubernetes
Jun 19 15:14:46 master k3s[1087]: Trace[1836866062]: [34.001351143s] [34.000847744s] END
Jun 19 15:14:46 master k3s[1087]: E0619 15:14:46.782577 1087 autoregister_controller.go:194] v2beta2.autoscaling failed with : context deadline exceeded
Jun 19 15:14:55 master k3s[1087]: time="2020-06-19T15:14:55.117854357+01:00" level=error msg="error in txn: context deadline exceeded"

ansible-galaxy

Hi all,
Here is my proposal:

I will probably create a new branch to test the ansible-galaxy directory structure and reorganize the structure!

Are you ok with that way to do it?

Reset Playbook fails because not all nodes have both `k3s` and `k3s-node` services

I was testing the reset.yml playbook and reset role today, and got the following error:

TASK [reset : Disable services] ****************************************************************************************
Sunday 17 May 2020  22:37:05 -0500 (0:00:07.969)       0:00:07.991 ************ 
failed: [10.0.100.37] (item=k3s) => {"ansible_loop_var": "item", "changed": false, "item": "k3s", "msg": "Could not find the requested service k3s: host"}
failed: [10.0.100.91] (item=k3s) => {"ansible_loop_var": "item", "changed": false, "item": "k3s", "msg": "Could not find the requested service k3s: host"}
failed: [10.0.100.74] (item=k3s) => {"ansible_loop_var": "item", "changed": false, "item": "k3s", "msg": "Could not find the requested service k3s: host"}
failed: [10.0.100.70] (item=k3s) => {"ansible_loop_var": "item", "changed": false, "item": "k3s", "msg": "Could not find the requested service k3s: host"}
changed: [10.0.100.163] => (item=k3s)
changed: [10.0.100.37] => (item=k3s-node)
changed: [10.0.100.91] => (item=k3s-node)
changed: [10.0.100.74] => (item=k3s-node)
changed: [10.0.100.70] => (item=k3s-node)
failed: [10.0.100.163] (item=k3s-node) => {"ansible_loop_var": "item", "changed": false, "item": "k3s-node", "msg": "Could not find the requested service k3s-node: host"}
failed: [10.0.100.99] (item=k3s) => {"ansible_loop_var": "item", "changed": false, "item": "k3s", "msg": "Could not find the requested service k3s: host"}
failed: [10.0.100.197] (item=k3s) => {"ansible_loop_var": "item", "changed": false, "item": "k3s", "msg": "Could not find the requested service k3s: host"}
changed: [10.0.100.197] => (item=k3s-node)
changed: [10.0.100.99] => (item=k3s-node)

Running it again results in all those changed messages becoming ok, but the failed messages still kill the playbook run and K3s is not totally uninstalled.

Adding a failed_when: false to the task allows those expected failures to be ignored, so the rest of the playbook can run.
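A minimal sketch of that workaround on the reset role's service task (the module parameters are assumptions for illustration; the point is the failed_when: false line):

- name: Disable services
  systemd:
    name: "{{ item }}"
    state: stopped
    enabled: no
  failed_when: false
  with_items:
    - k3s
    - k3s-node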

ansible deployment on ubuntu 20.04 / 1x rpi 4 shows no error but k3s restarts in loop

Issue: k3s fails to deploy on a vanilla Ubuntu 20.04 on a Raspberry Pi 4.
Ansible does not show any error (see output below), but the k3s service restarts in a loop (logs attached).

OS:

Linux polux 5.4.0-1015-raspi #15-Ubuntu SMP Fri Jul 10 05:34:24 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux

/boot/firmware/cmdline.txt:

net.ifnames=0 dwc_otg.lpm_enable=0 console=serial0,115200 console=tty1 root=/dev/sda2 rootfstype=ext4 elevator=deadline rootwait fixrtc cgroup_enable=cpuset cgroup_enable=memory cgroup_memory=1

Current version of k3s-ansible:

*   ad3dc65 (HEAD -> master, origin/master, origin/HEAD) Merge pull request #66 from stafwag/master

k3s-ansible configuration:

ansible@gaia:~/work/k3s-ansible$ cat inventory/rpi-galaxy/group_vars/all.yml
---
#k3s_version: v1.17.5+k3s1
# according to the following link, this version has been validated for Ubuntu 20.04 / ARM64
# https://github.com/rancher/k3s/issues/1860
k3s_version: v1.18.3+k3s1
ansible_user: ansible
systemd_dir: /etc/systemd/system
master_ip: "{{ hostvars[groups['master'][0]]['ansible_host'] | default(groups['master'][0]) }}"
extra_server_args: ""
extra_agent_args: ""
ansible@gaia:~/work/k3s-ansible$ cat inventory/rpi-galaxy/hosts.ini
[master]
polux

[node]
polux
#kore
#cygnus

[k3s_cluster:children]
master
node

sudo journalctl -u k3s (for 1 restart iteration):

Jul 28 20:32:40 polux systemd[1]: k3s.service: Scheduled restart job, restart counter is at 3.
Jul 28 20:32:40 polux systemd[1]: Stopped Lightweight Kubernetes.
Jul 28 20:32:40 polux systemd[1]: Starting Lightweight Kubernetes...
Jul 28 20:32:40 polux modprobe[3365]: modprobe: FATAL: Module br_netfilter not found in directory /lib/modules/5.4.0-1015-raspi
Jul 28 20:32:41 polux k3s[3367]: time="2020-07-28T20:32:41.493565790Z" level=info msg="Starting k3s v1.18.3+k3s1 (96653e8d)"
Jul 28 20:32:41 polux k3s[3367]: time="2020-07-28T20:32:41.494117219Z" level=info msg="Cluster bootstrap already complete"
Jul 28 20:32:41 polux k3s[3367]: time="2020-07-28T20:32:41.549335621Z" level=info msg="Kine listening on unix://kine.sock"
Jul 28 20:32:41 polux k3s[3367]: time="2020-07-28T20:32:41.551698793Z" level=info msg="Running kube-apiserver --advertise-port=6443 --allow-privileged=true --anonymous-auth=false --api-audiences=unknown --authorization-mode=Node,RBAC --basic-auth-file=/var/lib/rancher/k3s/server/cred/passwd --bind-address=127.0.0.1 --cert-dir=/var/lib/rancher/k3s/server/tls/temporary-certs --client-ca-file=/var/lib/rancher/k3s/server/tls/client-ca.crt --enable-admission-plugins=NodeRestriction --etcd-servers=unix://kine.sock --insecure-port=0 --kubelet-certificate-authority=/var/lib/rancher/k3s/server/tls/server-ca.crt --kubelet-client-certificate=/var/lib/rancher/k3s/server/tls/client-kube-apiserver.crt --kubelet-client-key=/var/lib/rancher/k3s/server/tls/client-kube-apiserver.key --proxy-client-cert-file=/var/lib/rancher/k3s/server/tls/client-auth-proxy.crt --proxy-client-key-file=/var/lib/rancher/k3s/server/tls/client-auth-proxy.key --requestheader-allowed-names=system:auth-proxy --requestheader-client-ca-file=/var/lib/rancher/k3s/server/tls/request-header-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6444 --service-account-issuer=k3s --service-account-key-file=/var/lib/rancher/k3s/server/tls/service.key --service-account-signing-key-file=/var/lib/rancher/k3s/server/tls/service.key --service-cluster-ip-range=10.43.0.0/16 --storage-backend=etcd3 --tls-cert-file=/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.crt --tls-private-key-file=/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.key"
Jul 28 20:32:41 polux k3s[3367]: Flag --basic-auth-file has been deprecated, Basic authentication mode is deprecated and will be removed in a future release. It is not recommended for production environments.
Jul 28 20:32:41 polux k3s[3367]: I0728 20:32:41.553311    3367 server.go:682] external host was not specified, using 192.168.0.95
Jul 28 20:32:41 polux k3s[3367]: I0728 20:32:41.554216    3367 server.go:166] Version: v1.18.3+k3s1
Jul 28 20:32:41 polux k3s[3367]: I0728 20:32:41.570077    3367 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
Jul 28 20:32:41 polux k3s[3367]: I0728 20:32:41.571518    3367 plugins.go:161] Loaded 10 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
Jul 28 20:32:41 polux k3s[3367]: I0728 20:32:41.576411    3367 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
Jul 28 20:32:41 polux k3s[3367]: I0728 20:32:41.576977    3367 plugins.go:161] Loaded 10 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
Jul 28 20:32:41 polux k3s[3367]: I0728 20:32:41.638976    3367 master.go:270] Using reconciler: lease
Jul 28 20:32:41 polux k3s[3367]: I0728 20:32:41.719580    3367 rest.go:113] the default service ipfamily for this cluster is: IPv4
Jul 28 20:32:42 polux k3s[3367]: W0728 20:32:42.642664    3367 genericapiserver.go:409] Skipping API batch/v2alpha1 because it has no resources.
Jul 28 20:32:42 polux k3s[3367]: W0728 20:32:42.679737    3367 genericapiserver.go:409] Skipping API discovery.k8s.io/v1alpha1 because it has no resources.
Jul 28 20:32:42 polux k3s[3367]: W0728 20:32:42.723548    3367 genericapiserver.go:409] Skipping API node.k8s.io/v1alpha1 because it has no resources.
Jul 28 20:32:42 polux k3s[3367]: W0728 20:32:42.791859    3367 genericapiserver.go:409] Skipping API rbac.authorization.k8s.io/v1alpha1 because it has no resources.
Jul 28 20:32:42 polux k3s[3367]: W0728 20:32:42.804764    3367 genericapiserver.go:409] Skipping API scheduling.k8s.io/v1alpha1 because it has no resources.
Jul 28 20:32:42 polux k3s[3367]: W0728 20:32:42.859465    3367 genericapiserver.go:409] Skipping API storage.k8s.io/v1alpha1 because it has no resources.
Jul 28 20:32:42 polux k3s[3367]: W0728 20:32:42.931689    3367 genericapiserver.go:409] Skipping API apps/v1beta2 because it has no resources.
Jul 28 20:32:42 polux k3s[3367]: W0728 20:32:42.931776    3367 genericapiserver.go:409] Skipping API apps/v1beta1 because it has no resources.
Jul 28 20:32:42 polux k3s[3367]: I0728 20:32:42.966951    3367 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
Jul 28 20:32:42 polux k3s[3367]: I0728 20:32:42.967022    3367 plugins.go:161] Loaded 10 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.655993    3367 dynamic_cafile_content.go:167] Starting request-header::/var/lib/rancher/k3s/server/tls/request-header-ca.crt
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.656063    3367 dynamic_cafile_content.go:167] Starting client-ca-bundle::/var/lib/rancher/k3s/server/tls/client-ca.crt
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.656646    3367 dynamic_serving_content.go:130] Starting serving-cert::/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.crt::/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.key
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.658203    3367 secure_serving.go:178] Serving securely on 127.0.0.1:6444
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.658285    3367 tlsconfig.go:240] Starting DynamicServingCertificateController
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.658332    3367 autoregister_controller.go:141] Starting autoregister controller
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.658358    3367 cache.go:32] Waiting for caches to sync for autoregister controller
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.659964    3367 cluster_authentication_trust_controller.go:440] Starting cluster_authentication_trust_controller controller
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.660030    3367 shared_informer.go:223] Waiting for caches to sync for cluster_authentication_trust_controller
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.660157    3367 crd_finalizer.go:266] Starting CRDFinalizer
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.660238    3367 dynamic_cafile_content.go:167] Starting client-ca-bundle::/var/lib/rancher/k3s/server/tls/client-ca.crt
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.660332    3367 dynamic_cafile_content.go:167] Starting request-header::/var/lib/rancher/k3s/server/tls/request-header-ca.crt
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.662079    3367 available_controller.go:387] Starting AvailableConditionController
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.662148    3367 cache.go:32] Waiting for caches to sync for AvailableConditionController controller
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.662214    3367 crdregistration_controller.go:111] Starting crd-autoregister controller
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.662234    3367 shared_informer.go:223] Waiting for caches to sync for crd-autoregister
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.662082    3367 apiservice_controller.go:94] Starting APIServiceRegistrationController
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.662328    3367 cache.go:32] Waiting for caches to sync for APIServiceRegistrationController controller
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.662113    3367 controller.go:81] Starting OpenAPI AggregationController
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.664077    3367 controller.go:86] Starting OpenAPI controller
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.664180    3367 customresource_discovery_controller.go:209] Starting DiscoveryController
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.664232    3367 naming_controller.go:291] Starting NamingConditionController
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.664284    3367 establishing_controller.go:76] Starting EstablishingController
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.664355    3367 nonstructuralschema_controller.go:186] Starting NonStructuralSchemaConditionController
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.664406    3367 apiapproval_controller.go:186] Starting KubernetesAPIApprovalPolicyConformantConditionController
Jul 28 20:32:48 polux k3s[3367]: E0728 20:32:48.810415    3367 controller.go:156] Unable to remove old endpoints from kubernetes service: no master IPs were listed in storage, refusing to erase all endpoints for the kubernetes service
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.858559    3367 cache.go:39] Caches are synced for autoregister controller
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.860242    3367 shared_informer.go:230] Caches are synced for cluster_authentication_trust_controller
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.863682    3367 cache.go:39] Caches are synced for APIServiceRegistrationController controller
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.864186    3367 shared_informer.go:230] Caches are synced for crd-autoregister
Jul 28 20:32:48 polux k3s[3367]: I0728 20:32:48.864247    3367 cache.go:39] Caches are synced for AvailableConditionController controller
Jul 28 20:32:49 polux k3s[3367]: I0728 20:32:49.669086    3367 storage_scheduling.go:143] all system priority classes are created successfully or already exist.
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.575394    3367 controller.go:130] OpenAPI AggregationController: action for item : Nothing (removed from the queue).
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.575483    3367 controller.go:130] OpenAPI AggregationController: action for item k8s_internal_local_delegation_chain_0000000000: Nothing (removed from the queue).
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.775039    3367 registry.go:150] Registering EvenPodsSpread predicate and priority function
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.775119    3367 registry.go:150] Registering EvenPodsSpread predicate and priority function
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.775834001Z" level=info msg="Running kube-scheduler --bind-address=127.0.0.1 --kubeconfig=/var/lib/rancher/k3s/server/cred/scheduler.kubeconfig --leader-elect=false --port=10251 --secure-port=0"
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.777075941Z" level=info msg="Running kube-controller-manager --allocate-node-cidrs=true --bind-address=127.0.0.1 --cluster-cidr=10.42.0.0/16 --cluster-signing-cert-file=/var/lib/rancher/k3s/server/tls/server-ca.crt --cluster-signing-key-file=/var/lib/rancher/k3s/server/tls/server-ca.key --kubeconfig=/var/lib/rancher/k3s/server/cred/controller.kubeconfig --leader-elect=false --port=10252 --root-ca-file=/var/lib/rancher/k3s/server/tls/server-ca.crt --secure-port=0 --service-account-private-key-file=/var/lib/rancher/k3s/server/tls/service.key --use-service-account-credentials=true"
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.794019    3367 controllermanager.go:161] Version: v1.18.3+k3s1
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.795822    3367 deprecated_insecure_serving.go:53] Serving insecurely on [::]:10252
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.799656121Z" level=info msg="Running cloud-controller-manager --allocate-node-cidrs=true --allow-untagged-cloud=true --bind-address=127.0.0.1 --cloud-provider=k3s --cluster-cidr=10.42.0.0/16 --kubeconfig=/var/lib/rancher/k3s/server/cred/cloud-controller.kubeconfig --leader-elect=false --node-status-update-frequency=1m --secure-port=0"
Jul 28 20:32:50 polux k3s[3367]: Flag --allow-untagged-cloud has been deprecated, This flag is deprecated and will be removed in a future release. A cluster-id will be required on cloud instances.
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.823862    3367 controllermanager.go:120] Version: v1.18.3+k3s1
Jul 28 20:32:50 polux k3s[3367]: W0728 20:32:50.823946    3367 controllermanager.go:132] detected a cluster without a ClusterID.  A ClusterID will be required in the future.  Please tag your cluster to avoid any future issues
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.836913    3367 registry.go:150] Registering EvenPodsSpread predicate and priority function
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.836976    3367 registry.go:150] Registering EvenPodsSpread predicate and priority function
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.839051    3367 node_controller.go:110] Sending events to api server.
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.839189    3367 controllermanager.go:247] Started "cloud-node"
Jul 28 20:32:50 polux k3s[3367]: W0728 20:32:50.843050    3367 authorization.go:47] Authorization is disabled
Jul 28 20:32:50 polux k3s[3367]: W0728 20:32:50.843107    3367 authentication.go:40] Authentication is disabled
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.843138    3367 deprecated_insecure_serving.go:51] Serving healthz insecurely on [::]:10251
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.845619    3367 node_lifecycle_controller.go:78] Sending events to api server
Jul 28 20:32:50 polux k3s[3367]: I0728 20:32:50.845791    3367 controllermanager.go:247] Started "cloud-node-lifecycle"
Jul 28 20:32:50 polux k3s[3367]: E0728 20:32:50.852243    3367 core.go:90] Failed to start service controller: the cloud provider does not support external load balancers
Jul 28 20:32:50 polux k3s[3367]: W0728 20:32:50.852314    3367 controllermanager.go:244] Skipping "service"
Jul 28 20:32:50 polux k3s[3367]: W0728 20:32:50.852342    3367 core.go:108] configure-cloud-routes is set, but cloud provider does not support routes. Will not configure cloud provider routes.
Jul 28 20:32:50 polux k3s[3367]: W0728 20:32:50.852360    3367 controllermanager.go:244] Skipping "route"
Jul 28 20:32:50 polux k3s[3367]: E0728 20:32:50.880778    3367 node_controller.go:245] Error getting node addresses for node "polux": error fetching node by provider ID: unimplemented, and error by node name: Failed to find node polux: node "polux" not found
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.894978323Z" level=info msg="Writing static file: /var/lib/rancher/k3s/server/static/charts/traefik-1.81.0.tgz"
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.896046509Z" level=info msg="Writing manifest: /var/lib/rancher/k3s/server/manifests/metrics-server/metrics-apiservice.yaml"
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.896886203Z" level=info msg="Writing manifest: /var/lib/rancher/k3s/server/manifests/metrics-server/metrics-server-deployment.yaml"
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.897629363Z" level=info msg="Writing manifest: /var/lib/rancher/k3s/server/manifests/metrics-server/metrics-server-service.yaml"
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.898372134Z" level=info msg="Writing manifest: /var/lib/rancher/k3s/server/manifests/rolebindings.yaml"
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.899056278Z" level=info msg="Writing manifest: /var/lib/rancher/k3s/server/manifests/traefik.yaml"
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.899827807Z" level=info msg="Writing manifest: /var/lib/rancher/k3s/server/manifests/coredns.yaml"
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.900547468Z" level=info msg="Writing manifest: /var/lib/rancher/k3s/server/manifests/metrics-server/auth-delegator.yaml"
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.901229611Z" level=info msg="Writing manifest: /var/lib/rancher/k3s/server/manifests/metrics-server/aggregated-metrics-reader.yaml"
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.901969457Z" level=info msg="Writing manifest: /var/lib/rancher/k3s/server/manifests/metrics-server/auth-reader.yaml"
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.902669007Z" level=info msg="Writing manifest: /var/lib/rancher/k3s/server/manifests/metrics-server/resource-reader.yaml"
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.903395983Z" level=info msg="Writing manifest: /var/lib/rancher/k3s/server/manifests/ccm.yaml"
Jul 28 20:32:50 polux k3s[3367]: time="2020-07-28T20:32:50.904183271Z" level=info msg="Writing manifest: /var/lib/rancher/k3s/server/manifests/local-storage.yaml"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.106800668Z" level=info msg="Node token is available at /var/lib/rancher/k3s/server/token"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.106918849Z" level=info msg="To join node to cluster: k3s agent -s https://192.168.0.95:6443 -t ${NODE_TOKEN}"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.107661252Z" level=info msg="Starting k3s.cattle.io/v1, Kind=Addon controller"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.109963234Z" level=info msg="Waiting for master node  startup: resource name may not be empty"
Jul 28 20:32:51 polux k3s[3367]: I0728 20:32:51.280482    3367 controller.go:606] quota admission added evaluator for: addons.k3s.cattle.io
Jul 28 20:32:51 polux k3s[3367]: http: TLS handshake error from 127.0.0.1:52452: remote error: tls: bad certificate
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.311790482Z" level=info msg="Starting /v1, Kind=Service controller"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.311792945Z" level=info msg="Starting /v1, Kind=Node controller"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.311831999Z" level=info msg="Starting helm.cattle.io/v1, Kind=HelmChart controller"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.311850369Z" level=info msg="Starting batch/v1, Kind=Job controller"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.311868961Z" level=info msg="Starting /v1, Kind=Pod controller"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.311889090Z" level=info msg="Starting /v1, Kind=Endpoints controller"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.329592677Z" level=info msg="Wrote kubeconfig /etc/rancher/k3s/k3s.yaml"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.329687285Z" level=info msg="Run: k3s kubectl"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.329721247Z" level=info msg="k3s is up and running"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.330231508Z" level=info msg="module overlay was already loaded"
Jul 28 20:32:51 polux systemd[1]: Started Lightweight Kubernetes.
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.387743740Z" level=warning msg="failed to start nf_conntrack module"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.441464037Z" level=warning msg="failed to start br_netfilter module"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.442062425Z" level=warning msg="failed to write value 1 at /proc/sys/net/bridge/bridge-nf-call-iptables: open /proc/sys/net/bridge/bridge-nf-call-iptables: no such file or directory"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.442193421Z" level=warning msg="failed to write value 1 at /proc/sys/net/bridge/bridge-nf-call-ip6tables: open /proc/sys/net/bridge/bridge-nf-call-ip6tables: no such file or directory"
Jul 28 20:32:51 polux k3s[3367]: http: TLS handshake error from 127.0.0.1:52460: remote error: tls: bad certificate
Jul 28 20:32:51 polux k3s[3367]: I0728 20:32:51.477651    3367 controller.go:606] quota admission added evaluator for: helmcharts.helm.cattle.io
Jul 28 20:32:51 polux k3s[3367]: http: TLS handshake error from 127.0.0.1:52466: remote error: tls: bad certificate
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.666280703Z" level=info msg="Starting /v1, Kind=Secret controller"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.670020696Z" level=info msg="Logging containerd to /var/lib/rancher/k3s/agent/containerd/containerd.log"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.670918131Z" level=info msg="Running containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/k3s/agent/containerd"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.695371114Z" level=info msg="Active TLS secret k3s-serving (ver=179) (count 7): map[listener.cattle.io/cn-10.43.0.1:10.43.0.1 listener.cattle.io/cn-127.0.0.1:127.0.0.1 listener.cattle.io/cn-192.168.0.95:192.168.0.95 listener.cattle.io/cn-kubernetes:kubernetes listener.cattle.io/cn-kubernetes.default:kubernetes.default listener.cattle.io/cn-kubernetes.default.svc.cluster.local:kubernetes.default.svc.cluster.local listener.cattle.io/cn-localhost:localhost listener.cattle.io/hash:0122b4bba1a4ad7409c6fb2faf50ed6430fa7426dffb036658d0a84d1ecef5c7]"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.758789640Z" level=info msg="Connecting to proxy" url="wss://192.168.0.95:6443/v1-k3s/connect"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.772151272Z" level=info msg="Handling backend connection request [polux]"
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.789101698Z" level=info msg="Running kubelet --address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=cgroupfs --client-ca-file=/var/lib/rancher/k3s/agent/client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --cni-bin-dir=/var/lib/rancher/k3s/data/8963b85492ae8de2b3bbd12a0773ef069eb37c584017ea159104e3016b778bd9/bin --cni-conf-dir=/var/lib/rancher/k3s/agent/etc/cni/net.d --container-runtime-endpoint=/run/k3s/containerd/containerd.sock --container-runtime=remote --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --healthz-bind-address=127.0.0.1 --hostname-override=polux --kubeconfig=/var/lib/rancher/k3s/agent/kubelet.kubeconfig --kubelet-cgroups=/systemd/system.slice --node-labels= --read-only-port=0 --resolv-conf=/run/systemd/resolve/resolv.conf --runtime-cgroups=/systemd/system.slice --serialize-image-pulls=false --tls-cert-file=/var/lib/rancher/k3s/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/k3s/agent/serving-kubelet.key"
Jul 28 20:32:51 polux k3s[3367]: Flag --containerd has been deprecated, This is a cadvisor flag that was mistakenly registered with the Kubelet. Due to legacy concerns, it will follow the standard CLI deprecation timeline before being removed.
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.816696063Z" level=info msg="Running kube-proxy --cluster-cidr=10.42.0.0/16 --healthz-bind-address=127.0.0.1 --hostname-override=polux --kubeconfig=/var/lib/rancher/k3s/agent/kubeproxy.kubeconfig --proxy-mode=iptables"
Jul 28 20:32:51 polux k3s[3367]: W0728 20:32:51.817334    3367 server.go:225] WARNING: all flags other than --config, --write-config-to, and --cleanup are deprecated. Please begin using a config file ASAP.
Jul 28 20:32:51 polux k3s[3367]: time="2020-07-28T20:32:51.909115669Z" level=info msg="waiting for node polux CIDR not assigned yet"
Jul 28 20:32:51 polux k3s[3367]: I0728 20:32:51.943631    3367 server.go:413] Version: v1.18.3+k3s1
Jul 28 20:32:52 polux k3s[3367]: E0728 20:32:52.041055    3367 machine.go:331] failed to get cache information for node 0: open /sys/devices/system/cpu/cpu0/cache: no such file or directory
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.043275    3367 server.go:644] --cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.045018    3367 container_manager_linux.go:277] container manager verified user specified cgroup-root exists: []
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.045819    3367 container_manager_linux.go:282] Creating Container Manager object based on Node Config: {RuntimeCgroupsName:/systemd/system.slice SystemCgroupsName: KubeletCgroupsName:/systemd/system.slice ContainerRuntime:remote CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:cgroupfs KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: ReservedSystemCPUs: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>}]} QOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalCPUManagerReconcilePeriod:10s ExperimentalPodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms ExperimentalTopologyManagerPolicy:none}
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.047071    3367 topology_manager.go:126] [topologymanager] Creating topology manager with none policy
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.047570    3367 container_manager_linux.go:312] [topologymanager] Initializing Topology Manager with none policy
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.048027    3367 container_manager_linux.go:317] Creating device plugin manager: true
Jul 28 20:32:52 polux k3s[3367]: W0728 20:32:52.048729    3367 util_unix.go:103] Using "/run/k3s/containerd/containerd.sock" as endpoint is deprecated, please consider using full url format "unix:///run/k3s/containerd/containerd.sock".
Jul 28 20:32:52 polux k3s[3367]: W0728 20:32:52.049435    3367 util_unix.go:103] Using "/run/k3s/containerd/containerd.sock" as endpoint is deprecated, please consider using full url format "unix:///run/k3s/containerd/containerd.sock".
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.050286    3367 kubelet.go:317] Watching apiserver
Jul 28 20:32:52 polux k3s[3367]: W0728 20:32:52.068214    3367 proxier.go:635] Failed to load kernel module ip_vs with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.153491    3367 kuberuntime_manager.go:211] Container runtime containerd initialized, version: v1.3.3-k3s2, apiVersion: v1alpha2
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.157022    3367 server.go:1123] Started kubelet
Jul 28 20:32:52 polux k3s[3367]: W0728 20:32:52.160588    3367 proxier.go:635] Failed to load kernel module ip_vs_rr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.186321    3367 fs_resource_analyzer.go:64] Starting FS ResourceAnalyzer
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.221318    3367 server.go:145] Starting to listen on 0.0.0.0:10250
Jul 28 20:32:52 polux k3s[3367]: E0728 20:32:52.234482    3367 server.go:792] Starting healthz server failed: listen tcp 127.0.0.1:10248: bind: address already in use
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.241643    3367 volume_manager.go:265] Starting Kubelet Volume Manager
Jul 28 20:32:52 polux k3s[3367]: W0728 20:32:52.251343    3367 proxier.go:635] Failed to load kernel module ip_vs_wrr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.271470    3367 desired_state_of_world_populator.go:139] Desired state populator starts to run
Jul 28 20:32:52 polux k3s[3367]: I0728 20:32:52.288827    3367 server.go:393] Adding debug handlers to kubelet server.
Jul 28 20:32:52 polux k3s[3367]: F0728 20:32:52.294996    3367 server.go:159] listen tcp 0.0.0.0:10250: bind: address already in use
Jul 28 20:32:52 polux systemd[1]: k3s.service: Main process exited, code=exited, status=255/EXCEPTION
Jul 28 20:32:52 polux systemd[1]: k3s.service: Failed with result 'exit-code'.

ansible-playbook output:

$ ansible-playbook site.yml -i inventory/rpi-galaxy/hosts.ini

PLAY [k3s_cluster] **********************************************************************************************************************

TASK [Gathering Facts] ******************************************************************************************************************
Tuesday 28 July 2020  21:29:51 +0100 (0:00:00.242)       0:00:00.242 **********
ok: [polux]

TASK [prereq : Set SELinux to disabled state] *******************************************************************************************
Tuesday 28 July 2020  21:30:08 +0100 (0:00:17.840)       0:00:18.082 **********
skipping: [polux]

TASK [prereq : Enable IPv4 forwarding] **************************************************************************************************
Tuesday 28 July 2020  21:30:09 +0100 (0:00:00.230)       0:00:18.313 **********
ok: [polux]

TASK [prereq : Enable IPv6 forwarding] **************************************************************************************************
Tuesday 28 July 2020  21:30:10 +0100 (0:00:01.196)       0:00:19.510 **********
ok: [polux]

TASK [prereq : Add br_netfilter to /etc/modules-load.d/] ********************************************************************************
Tuesday 28 July 2020  21:30:11 +0100 (0:00:00.860)       0:00:20.370 **********
skipping: [polux]

TASK [prereq : Load br_netfilter] *******************************************************************************************************
Tuesday 28 July 2020  21:30:11 +0100 (0:00:00.221)       0:00:20.591 **********
skipping: [polux]

TASK [prereq : Set bridge-nf-call-iptables (just to be sure)] ***************************************************************************
Tuesday 28 July 2020  21:30:11 +0100 (0:00:00.220)       0:00:20.813 **********
skipping: [polux] => (item=net.bridge.bridge-nf-call-iptables)
skipping: [polux] => (item=net.bridge.bridge-nf-call-ip6tables)

TASK [prereq : Add /usr/local/bin to sudo secure_path] **********************************************************************************
Tuesday 28 July 2020  21:30:11 +0100 (0:00:00.240)       0:00:21.053 **********
skipping: [polux]

TASK [download : Delete k3s if already present] *****************************************************************************************
Tuesday 28 July 2020  21:30:12 +0100 (0:00:00.220)       0:00:21.274 **********
ok: [polux]

TASK [download : Download k3s binary x64] ***********************************************************************************************
Tuesday 28 July 2020  21:30:13 +0100 (0:00:01.057)       0:00:22.331 **********
skipping: [polux]

TASK [download : Download k3s binary arm64] *********************************************************************************************
Tuesday 28 July 2020  21:30:13 +0100 (0:00:00.279)       0:00:22.610 **********
changed: [polux]

TASK [download : Download k3s binary armhf] *********************************************************************************************
Tuesday 28 July 2020  21:30:20 +0100 (0:00:06.728)       0:00:29.339 **********
skipping: [polux]

TASK [raspbian : Test for Raspbian] *****************************************************************************************************
Tuesday 28 July 2020  21:30:20 +0100 (0:00:00.248)       0:00:29.588 **********
ok: [polux]

TASK [raspbian : Activating cgroup support] *********************************************************************************************
Tuesday 28 July 2020  21:30:20 +0100 (0:00:00.311)       0:00:29.899 **********
skipping: [polux]

TASK [raspbian : Flush iptables before changing to iptables-legacy] *********************************************************************
Tuesday 28 July 2020  21:30:20 +0100 (0:00:00.244)       0:00:30.144 **********
skipping: [polux]

TASK [raspbian : Changing to iptables-legacy] *******************************************************************************************
Tuesday 28 July 2020  21:30:21 +0100 (0:00:00.214)       0:00:30.358 **********
skipping: [polux]

TASK [raspbian : Changing to ip6tables-legacy] ******************************************************************************************
Tuesday 28 July 2020  21:30:21 +0100 (0:00:00.211)       0:00:30.569 **********
skipping: [polux]

TASK [raspbian : Rebooting] *************************************************************************************************************
Tuesday 28 July 2020  21:30:21 +0100 (0:00:00.296)       0:00:30.865 **********
skipping: [polux]

TASK [ubuntu : Enable cgroup via boot commandline if not already enabled] ***************************************************************
Tuesday 28 July 2020  21:30:21 +0100 (0:00:00.210)       0:00:31.076 **********
changed: [polux]

TASK [ubuntu : Reboot to enable cgroups] ************************************************************************************************
Tuesday 28 July 2020  21:30:22 +0100 (0:00:01.025)       0:00:32.101 **********
changed: [polux]

PLAY [master] ***************************************************************************************************************************

TASK [Gathering Facts] ******************************************************************************************************************
Tuesday 28 July 2020  21:31:22 +0100 (0:00:59.281)       0:01:31.383 **********
ok: [polux]

TASK [k3s/master : Copy K3s service file] ***********************************************************************************************
Tuesday 28 July 2020  21:31:28 +0100 (0:00:06.532)       0:01:37.915 **********
changed: [polux]

TASK [k3s/master : Enable and check K3s service] ****************************************************************************************
Tuesday 28 July 2020  21:31:30 +0100 (0:00:01.795)       0:01:39.711 **********
changed: [polux]

TASK [k3s/master : Wait for node-token] *************************************************************************************************
Tuesday 28 July 2020  21:31:54 +0100 (0:00:24.478)       0:02:04.189 **********
ok: [polux]

TASK [k3s/master : Register node-token file access mode] ********************************************************************************
Tuesday 28 July 2020  21:31:56 +0100 (0:00:01.209)       0:02:05.398 **********
ok: [polux]

TASK [k3s/master : Change file access node-token] ***************************************************************************************
Tuesday 28 July 2020  21:31:57 +0100 (0:00:01.008)       0:02:06.407 **********
changed: [polux]

TASK [k3s/master : Read node-token from master] *****************************************************************************************
Tuesday 28 July 2020  21:31:58 +0100 (0:00:00.884)       0:02:07.291 **********
ok: [polux]

TASK [k3s/master : Store Master node-token] *********************************************************************************************
Tuesday 28 July 2020  21:31:59 +0100 (0:00:00.979)       0:02:08.270 **********
ok: [polux]

TASK [k3s/master : Restore node-token file access] **************************************************************************************
Tuesday 28 July 2020  21:31:59 +0100 (0:00:00.277)       0:02:08.548 **********
changed: [polux]

TASK [k3s/master : Create directory .kube] **********************************************************************************************
Tuesday 28 July 2020  21:32:00 +0100 (0:00:00.835)       0:02:09.383 **********
ok: [polux]

TASK [k3s/master : Copy config file to user home directory] *****************************************************************************
Tuesday 28 July 2020  21:32:00 +0100 (0:00:00.790)       0:02:10.174 **********
changed: [polux]

TASK [k3s/master : Replace https://localhost:6443 by https://master-ip:6443] ************************************************************
Tuesday 28 July 2020  21:32:01 +0100 (0:00:00.788)       0:02:10.963 **********
changed: [polux]

TASK [k3s/master : Create kubectl symlink] **********************************************************************************************
Tuesday 28 July 2020  21:32:03 +0100 (0:00:02.014)       0:02:12.978 **********
changed: [polux]

TASK [k3s/master : Create crictl symlink] ***********************************************************************************************
Tuesday 28 July 2020  21:32:04 +0100 (0:00:00.810)       0:02:13.788 **********
changed: [polux]

PLAY [node] *****************************************************************************************************************************

TASK [Gathering Facts] ******************************************************************************************************************
Tuesday 28 July 2020  21:32:05 +0100 (0:00:00.834)       0:02:14.622 **********
ok: [polux]

TASK [k3s/node : Copy K3s service file] *************************************************************************************************
Tuesday 28 July 2020  21:32:11 +0100 (0:00:05.772)       0:02:20.394 **********
changed: [polux]

TASK [k3s/node : Enable and check K3s service] ******************************************************************************************
Tuesday 28 July 2020  21:32:12 +0100 (0:00:01.680)       0:02:22.075 **********
changed: [polux]

PLAY RECAP ******************************************************************************************************************************
polux                      : ok=25   changed=13   unreachable=0    failed=0

Tuesday 28 July 2020  21:32:34 +0100 (0:00:21.315)       0:02:43.391 **********
===============================================================================
ubuntu : Reboot to enable cgroups ----------------------------------------------------------------------------------------------- 59.28s
k3s/master : Enable and check K3s service --------------------------------------------------------------------------------------- 24.48s
k3s/node : Enable and check K3s service ----------------------------------------------------------------------------------------- 21.32s
Gathering Facts ----------------------------------------------------------------------------------------------------------------- 17.84s
download : Download k3s binary arm64 --------------------------------------------------------------------------------------------- 6.73s
Gathering Facts ------------------------------------------------------------------------------------------------------------------ 6.53s
Gathering Facts ------------------------------------------------------------------------------------------------------------------ 5.77s
k3s/master : Replace https://localhost:6443 by https://master-ip:6443 ------------------------------------------------------------ 2.01s
k3s/master : Copy K3s service file ----------------------------------------------------------------------------------------------- 1.80s
k3s/node : Copy K3s service file ------------------------------------------------------------------------------------------------- 1.68s
k3s/master : Wait for node-token ------------------------------------------------------------------------------------------------- 1.21s
prereq : Enable IPv4 forwarding -------------------------------------------------------------------------------------------------- 1.20s
download : Delete k3s if already present ----------------------------------------------------------------------------------------- 1.06s
ubuntu : Enable cgroup via boot commandline if not already enabled --------------------------------------------------------------- 1.03s
k3s/master : Register node-token file access mode -------------------------------------------------------------------------------- 1.01s
k3s/master : Read node-token from master ----------------------------------------------------------------------------------------- 0.98s
k3s/master : Change file access node-token --------------------------------------------------------------------------------------- 0.88s
prereq : Enable IPv6 forwarding -------------------------------------------------------------------------------------------------- 0.86s
k3s/master : Restore node-token file access -------------------------------------------------------------------------------------- 0.84s
k3s/master : Create crictl symlink ----------------------------------------------------------------------------------------------- 0.83s
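
Worth noting from the output above: the same host (polux) is targeted by both the [master] and [node] plays, so the k3s agent service ends up installed on a machine where the k3s server, which already runs its own kubelet on port 10250, is active. That matches the "listen tcp 0.0.0.0:10250: bind: address already in use" failure in the journal. A minimal inventory sketch that keeps the two groups on separate hosts (the worker address is only a placeholder, not taken from the report):

// hosts.ini (illustrative)
[master]
192.168.0.95

[node]
192.168.0.96

[k3s_cluster:children]
master
node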

Ansible playbook 'reboot' task should use Ansible's reboot module

Version:

N/A

K3s arguments:

N/A

Describe the bug

After applying the fix in k3s-io/k3s#1730 to make the 'reboot on raspbian' task actually work (without a fatal error), I realized that it causes another problem: when the ARM servers reboot mid-playbook, the playbook fails. Even if only the master node drops its connection, every other host then fails at the Copy K3s service file task with the message:

AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'

To Reproduce

Run the Ansible playbook on ARM servers.

Expected behavior

The playbook completes successfully, and reboots the ARM servers as required in the Rebooting on Raspbian task.

Actual behavior

TASK [raspbian : Rebooting on Raspbian] ********************************************************************************
Saturday 02 May 2020  11:36:06 -0500 (0:00:02.881)       0:00:38.813 ********** 
skipping: [worker-01]
skipping: [worker-02]
skipping: [worker-03]
skipping: [worker-04]
skipping: [worker-05]
skipping: [worker-06]
fatal: [turing-master]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Shared connection to turing-master closed.", "unreachable": true}

Which, in turn, causes all the other hosts to fail:

TASK [k3s/node : Copy K3s service file] ********************************************************************************
Saturday 02 May 2020  11:36:14 -0500 (0:00:06.435)       0:00:46.844 ********** 
fatal: [worker-01]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'"}
fatal: [worker-02]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'"}
fatal: [worker-03]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'"}
fatal: [worker-05]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'"}
fatal: [worker-04]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'"}
fatal: [worker-06]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'token'"}

Additional context / logs

Moved from k3s repo issue k3s-io/k3s#1732
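
A minimal sketch of what the title suggests (not the repository's current task): Ansible's reboot module blocks until the host is reachable again, so the play does not lose the SSH connection mid-run. The raspbian variable in the condition mirrors the existing check and is assumed here; the timeout value is likewise only an example.

- name: Rebooting on Raspbian
  reboot:
    reboot_timeout: 300   # wait up to five minutes for the Pi to come back
  when: raspbian | default(false)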

Site playbook throws a syntax error on architecture check

When running the site playbook on a cluster of Pis, it throws a syntax error.

Hardware

7x Pi 4 (8GB RAM)

Software

  • OS - Buster Lite (32bit)
  • Ansible 2.9.9
  • Python 2.7.17

Steps to reproduce

  1. Run ansible-playbook site.yml -i inventory/my-cluster/hosts.ini
  2. Watch the tasks completing until Ansible hits TASK [raspbian : Activating cgroup support] and throws a syntax error.

Screenshot

(screenshot: k3s-ansible-site)

Configurations

// inventory/my-cluster/group_vars/all.yml

---
k3s_version: v1.17.5+k3s1
ansible_user: pi
systemd_dir: /etc/systemd/system
master_ip: "{{ hostvars[groups['master'][0]]['ansible_host'] | default(groups['master'][0]) }}"
extra_server_args: "--docker --no-deploy traefik"

// inventory/my-cluster/hosts.ini

[master]
192.168.1.xx

[node]
192.168.1.xx
192.168.1.xx
192.168.1.xx
192.168.1.xx
192.168.1.xx
192.168.1.xx

[k3s_cluster:children]
master
node

Reset playbook stuck on daemon-reload task

When running the reset playbook on a cluster of Pis, it never completes.

Hardware

7x Pi 4 (8GB RAM)

Software

  • OS - Buster Lite (32bit)
  • Ansible 2.9.9
  • Python 2.7.17

Steps to reproduce

  1. Run ansible-playbook site.yml -i inventory/my-cluster/hosts.ini
  2. Run ansible-playbook reset.yml -i inventory/my-cluster/hosts.ini
  3. Watch the tasks completing until it hits TASK [reset : daemon_reload] and gets stuck forever (I tried leaving it for at least half an hour)

Screenshot

(screenshot: k3s-ansible)

Configurations

// inventory/my-cluster/group_vars/all.yml

---
k3s_version: v1.17.5+k3s1
ansible_user: pi
systemd_dir: /etc/systemd/system
master_ip: "{{ hostvars[groups['master'][0]]['ansible_host'] | default(groups['master'][0]) }}"
extra_server_args: "--docker --no-deploy traefik"

// inventory/my-cluster/hosts.ini

[master]
192.168.1.xx

[node]
192.168.1.xx
192.168.1.xx
192.168.1.xx
192.168.1.xx
192.168.1.xx
192.168.1.xx

[k3s_cluster:children]
master
node

Debug Information

ok: [192.168.1.xx] => {
    "changed": false, 
    "invocation": {
        "module_args": {
            "daemon_reexec": false, 
            "daemon_reload": true, 
            "enabled": null, 
            "force": null, 
            "masked": null, 
            "name": null, 
            "no_block": false, 
            "scope": null, 
            "state": null, 
            "user": null
        }
    }, 
    "name": null, 
    "status": {}
}

Since it is the last task of the reset role, the installation is in fact deleted and cleaned up; however, the task still gets stuck without exiting.
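
One workaround sketch (an assumption on my part, not the project's fix) is to issue the reload through a plain systemctl call, which may sidestep whatever keeps the systemd module from returning on these hosts:

- name: daemon_reload (workaround)
  command: systemctl daemon-reload
  changed_when: false   # a reload by itself is not a configuration change worth reporting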

Test and make work with new 64-bit Raspberry Pi OS

From a commenter on one of my YouTube videos:

something I noticed is that using k3s-ansible on the beta 64-bit OS didn't work; it was missing the k3s binary and couldn't find it. Did you face the same, and if so, how did you fix it?

(see comment).

I've been slowly working through testing some of my own automation on the new 64-bit version of the Pi OS, and I've found that some images and binaries have to be downloaded differently based on the arch (which, in the past, I always assumed was armv7 or arm32 on Raspbian, which is not necessarily true as of yesterday).

So this issue is mostly a reminder to me to do some work testing k3s-ansible on the 64-bit OS. I'm also tracking this internally for my Turing Pi cluster work, which uses a mix of different Pi versions (some which can't run Pi OS 64-bit), so it would be helpful to be able to make it work with all flavors for the foreseeable future.
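
A minimal sketch of arch-aware selection (the URL layout and variable names are assumptions, not the role's actual tasks): pick the release artifact from the architecture Ansible reports, so aarch64 on 64-bit Pi OS gets k3s-arm64 instead of the armhf build.

- name: Download the k3s binary matching the reported architecture
  get_url:
    url: "https://github.com/rancher/k3s/releases/download/{{ k3s_version }}/k3s{{ k3s_suffix }}"
    dest: /usr/local/bin/k3s
    owner: root
    group: root
    mode: 0755
  vars:
    # x86_64 -> k3s, aarch64 -> k3s-arm64, armv6l/armv7l -> k3s-armhf
    k3s_suffix: >-
      {{ '-arm64' if ansible_architecture == 'aarch64'
         else '-armhf' if ansible_architecture.startswith('arm')
         else '' }}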

File mode for download tasks needs to be in octal

For the file-related tasks in the download role, the mode should be an octal value (or quoted as a string); otherwise the permissions will not be set properly.

https://github.com/rancher/k3s-ansible/blob/master/roles/download/tasks/main.yml#L8-L15

These should look like:

mode: 0755

From Ansible's docs on get_url:

You must either add a leading zero so that Ansible's YAML parser knows it is an octal number (like 0644 or 01777) or quote it (like '644' or '1777') so Ansible receives a string and can do its own conversion from string into number.

It seems this is already correct in the k3s role file-related tasks, so let's get these download tasks in the same format.
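
For example, a download task in that form would look like the sketch below (the task name and URL are illustrative; the mode line is the point):

- name: Download k3s binary x64
  get_url:
    url: "https://github.com/rancher/k3s/releases/download/{{ k3s_version }}/k3s"
    dest: /usr/local/bin/k3s
    owner: root
    group: root
    mode: 0755   # leading zero so YAML parses this as an octal value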
