redhat-emea-ssa-team / hetzner-ocp4
Installing OCP 4 on a single bare metal server.
License: Apache License 2.0
Document those parts of group_vars/all that can be changed.
You can install CoreOS using the CoreOS ISO by extracting its contents (kernel and initramfs) and creating a .treeinfo file:
cat /tmp/coreos/.treeinfo
[general]
arch = x86_64
family = Fedora
platforms = x86_64
version = 29
[images-x86_64]
initrd = initramfs.img
kernel = vmlinuz
Then run virt-install with the --location argument.
#!/bin/bash
args='coreos.inst=yes '
args+='coreos.inst.install_dev=vda '
args+='coreos.inst.image_url=http://172.24.24.3:8080/pub/rhcos-42.80.20190828.2-metal-bios.raw.gz '
args+='coreos.inst.ignition_url=http://172.24.24.3:8080/pub/bootstrap.ign '
args+='ip=dhcp '
args+='rd.neednet=1'
virt-install --location /tmp/coreos \
  --extra-args="${args}" \
  --network network=ocp4 \
  --name ocp4-compute-1 \
  --memory 8192 \
  --disk /var/lib/libvirt/images/ocp4-compute-1.qcow2
exit $?
Hi
The installer fails while waiting for the Kubernetes API:
"stderr_lines": [
"level=debug msg=\"OpenShift Installer v4.2.0-201908282219-dirty\"",
"level=debug msg=\"Built from commit 4f3e73a0143ba36229f42e8b65b6e65342bb826b\"",
"level=info msg=\"Waiting up to 30m0s for the Kubernetes API at https://api.ocp4.sanc.ch:6443...\"",
"level=debug msg=\"Still waiting for the Kubernetes API: Get https://api.ocp4.xxx:6443/version?timeout=32s: EOF\"",
On the bootstrap node I see
Sep 23 05:06:35 bootstrap podman[2318]: 2019-09-23 05:06:35.924008516 +0000 UTC m=+0.687375687 container attach 545491a0fa8e2c6e9275e2228547512d77015022f1912dd2d8025a729cb7e0ec (image=quay.io/openshift-release-dev/ocp-release-nightly@sha256:d48a15ea564293934eb188e6eb8737e56903453d50bc70830cdac2641fb63acc, name=elegant_knuth)
Sep 23 05:06:37 bootstrap bootkube.sh[722]: Starting etcd certificate signer...
Sep 23 05:06:37 bootstrap bootkube.sh[722]: Error: name etcd-signer is in use: container already exists
Sep 23 05:06:37 bootstrap systemd[1]: bootkube.service: Main process exited, code=exited, status=125/n/a
Sep 23 05:06:37 bootstrap systemd[1]: bootkube.service: Failed with result 'exit-code'.
Sep 23 05:06:42 bootstrap systemd[1]: bootkube.service: Service RestartSec=5s expired, scheduling restart.
Sep 23 05:06:42 bootstrap systemd[1]: bootkube.service: Scheduled restart job, restart counter is at 93.
Sep 23 05:06:42 bootstrap systemd[1]: Stopped Bootstrap a Kubernetes cluster.
Sep 23 05:06:42 bootstrap systemd[1]: Started Bootstrap a Kubernetes cluster.
Sep 23 05:06:43 bootstrap podman[2587]: 2019-09-23 05:06:43.572904432 +0000 UTC m=+0.372897314 container create 001f86c9ba3065693b1abda46f4594aec1909cfe01e80d6adc5528057a0af7e2 (image=quay.io/openshift-release-dev/ocp-release-nightly@sha256:d48a15ea564293934eb188e6eb8737e56903453d50bc70830cdac2641fb63acc, name=quizzical_rhodes)
and
Sep 22 19:11:01 bootstrap bootkube.sh[19735]: Waiting for etcd cluster...
Sep 22 19:11:09 bootstrap podman[22919]: 2019-09-22 19:11:09.259737221 +0000 UTC m=+7.442582895 image pull
Sep 22 19:11:09 bootstrap podman[22919]: 2019-09-22 19:11:09.580244772 +0000 UTC m=+7.763090429 container create b00e7ef09f5b9a778c2ed1b0fcc58bc5403f1876cde241c0ca39a>
Sep 22 19:11:09 bootstrap podman[22919]: 2019-09-22 19:11:09.907017175 +0000 UTC m=+8.089862853 container init b00e7ef09f5b9a778c2ed1b0fcc58bc5403f1876cde241c0ca39a8b>
Sep 22 19:11:10 bootstrap podman[22919]: 2019-09-22 19:11:10.038039098 +0000 UTC m=+8.220884731 container start b00e7ef09f5b9a778c2ed1b0fcc58bc5403f1876cde241c0ca39a8>
Sep 22 19:11:10 bootstrap podman[22919]: 2019-09-22 19:11:10.038132193 +0000 UTC m=+8.220977884 container attach b00e7ef09f5b9a778c2ed1b0fcc58bc5403f1876cde241c0ca39a>
Sep 22 19:21:09 bootstrap bootkube.sh[19735]: https://etcd-2.ocp4.xxx:2379 is unhealthy: failed to connect: dial tcp 192.168.50.12:2379: connect: connection refus>
Sep 22 19:21:09 bootstrap bootkube.sh[19735]: Error: unhealthy cluster
Sep 22 19:21:10 bootstrap bootkube.sh[19735]: etcdctl failed. Retrying in 5 seconds...
Any hints on where to start debugging would be highly appreciated.
cluster.yml should have a list of users that are added to the htpasswd secret for htpasswd-based auth. One of those users should also be bound to cluster-admin.
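A sketch of what this could look like in cluster.yml (same field names as the config example further down in this document; the hashes are placeholders):
auth_htpasswd:
  - admin:<htpasswd-hash>
  - local:<htpasswd-hash>
cluster_role_bindings:
  - cluster_role: cluster-admin
    name: admin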
openshift-install --dir=/root/{{ cluster_name }}-install wait-for bootstrap-complete --log-level debug
virsh shutdown bootstrap
sleep 120
oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"storage":{"emptyDir":{}}}}'
# apiserver certs are not yet working.
#oc create secret tls letsencrypt-api-certs --cert={{ playbook_dir }}/../certificate/{{ cluster_name }}.{{ public_domain }}/fullchain.crt --key={{ playbook_dir }}/../certificate/{{ cluster_name }}.{{ public_domain }}/cert.key -n openshift-config
#oc patch apiserver cluster --type=merge -p '{"spec":{"servingCerts": {"namedCertificates":[{"names": ["api.{{ cluster_name }}.{{ public_domain }}"], "servingCertificate": {"name": "letsencrypt-api-certs"}}]}}}'
# Install certificate
oc create secret tls letsencrypt-router-certs --cert={{ playbook_dir }}/../certificate/{{ cluster_name }}.{{ public_domain }}/fullchain.crt --key={{ playbook_dir }}/../certificate/{{ cluster_name }}.{{ public_domain }}/cert.key -n openshift-ingress
oc patch ingresscontroller default -n openshift-ingress-operator --type=merge --patch='{"spec": { "defaultCertificate": { "name": "letsencrypt-router-certs" }}}'
openshift-install --dir=/root/{{ cluster_name }}-install wait-for install-complete --log-level debug
The ports are 6443 and 22623.
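If the host firewall needs to allow them, a minimal sketch (assuming firewalld is in use on the host) would be:
firewall-cmd --permanent --add-port=6443/tcp
firewall-cmd --permanent --add-port=22623/tcp
firewall-cmd --reload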
I've experienced some issues installing the current version (4.2).
I discovered that the remote location on openshift.com changed (ansible/roles/openshift-4-cluster/defaults/main.yml):
-coreos_download_url: "https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/latest/rhcos-{{ coreos_version }}-qemu.qcow2"
+coreos_download_url: "https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/4.2.0-rc.5/rhcos-{{ coreos_version }}-qemu.qcow2"
The /latest remote subdirectory has now been bumped to 4.3 and the qemu image has disappeared.
We should always point to the latest stable release to avoid these issues.
I'm going to create a PR.
The command ansible-playbook ./ansible/setup.yml reports the following error:
[root@ocp4 hetzner-ocp4]# ansible-playbook ./ansible/setup.yml
[WARNING]: Could not match supplied host pattern, ignoring: all
[WARNING]: provided hosts list is empty, only localhost is available
ERROR! no action detected in task. This often indicates a misspelled module name, or incorrect module path.
The error appears to have been in '/root/temp/hetzner-ocp4/ansible/roles/openshift-4-loadbalancer/tasks/create.yml': line 25, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: Collect services facts
^ here
The error appears to have been in '/root/temp/hetzner-ocp4/ansible/roles/openshift-4-loadbalancer/tasks/create.yml': line 25, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: Collect services facts
^ here
exception type: <class 'ansible.errors.AnsibleParserError'>
exception: no action detected in task. This often indicates a misspelled module name, or incorrect module path.
The error appears to have been in '/root/temp/hetzner-ocp4/ansible/roles/openshift-4-loadbalancer/tasks/create.yml': line 25, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: Collect services facts
^ here
OS: CentOS 7
Ansible version: 2.4.2.0
cluster.yml
---
cluster_name: ocp4
public_domain: example.com
dns_provider: [route53|cloudflare|gcp]
letsencrypt_account_email: [email protected]
# Depending on the dns provider:
# CloudFlare
cloudflare_account_email: [email protected]
cloudflare_account_api_token: 9348234sdsd894.....
cloudflare_zone: example.com
# Route53
aws_access_key: key
aws_secret_key: secret
aws_zone: example.com
# GCP
gcp_project: project-name
gcp_managed_zone_name: 'zone-name'
gcp_managed_zone_domain: 'example.com.'
gcp_serviceaccount_file: ../gcp_service_account.json
auth_htpasswd:
- admin:$ttttttttt//
- local:$ttttttttt//
storage_nfs: false # Default is false
auth_redhatsso:
client_id: "xxxxx.apps.googleusercontent.com"
client_secret: "xxxxxxx"
cluster_role_bindings:
- cluster_role: sudoers
name: [email protected]
- cluster_role: cluster-admin
name: admin
image_pull_secret: |-
ttttttttt
The default disk size is a little bit too small:
[core@master-0 ~]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 252:0 0 16G 0 disk
├─vda1 252:1 0 1M 0 part
├─vda2 252:2 0 1G 0 part /boot
└─vda3 252:3 0 15G 0 part /sysroot
Please add a variable in ./ansible/roles/openshift-4-loadbalancer/defaults/main.yml and use it.
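A minimal sketch of such a defaults entry (the variable name vm_disk_size is hypothetical, not the repo's actual name):
# defaults/main.yml (sketch) - hypothetical variable name
vm_disk_size: 32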
I added two labels to the worker nodes to mark them as infra:
node-role.kubernetes.io/infra: ""
infra: infra
by editing
$ oc edit nodes worker-0.ocp4.dwojciec.com
and added the two new labels inside:
labels:
  beta.kubernetes.io/arch: amd64
  beta.kubernetes.io/os: linux
  infra: infra
  kubernetes.io/arch: amd64
  kubernetes.io/hostname: worker-0.ocp4.dwojciec.com
  kubernetes.io/os: linux
  node-role.kubernetes.io/infra: ""
  node-role.kubernetes.io/worker: ""
  node.openshift.io/os_id: rhcos
name: worker-0.ocp4.dwojciec.com
resourceVersion: "26912"
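Instead of editing the node object by hand, the same labels can be applied with oc label (a sketch using the node name from this example):
oc label node worker-0.ocp4.dwojciec.com node-role.kubernetes.io/infra="" infra=infra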
See the result:
[root@CentOS-76-64-minimal haproxy]# oc get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
master-0.ocp4.dwojciec.com Ready master,worker 69m v1.14.0+44b46b52b beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=master-0.ocp4.dwojciec.com,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos
master-1.ocp4.dwojciec.com Ready master,worker 69m v1.14.0+44b46b52b beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=master-1.ocp4.dwojciec.com,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos
master-2.ocp4.dwojciec.com Ready master,worker 70m v1.14.0+44b46b52b beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=master-2.ocp4.dwojciec.com,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos
worker-0.ocp4.dwojciec.com Ready infra,worker 70m v1.14.0+44b46b52b beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,infra=infra,kubernetes.io/arch=amd64,kubernetes.io/hostname=worker-0.ocp4.dwojciec.com,kubernetes.io/os=linux,node-role.kubernetes.io/infra=,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos
worker-1.ocp4.dwojciec.com Ready infra,worker 70m v1.14.0+44b46b52b beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,infra=infra,kubernetes.io/arch=amd64,kubernetes.io/hostname=worker-1.ocp4.dwojciec.com,kubernetes.io/os=linux,node-role.kubernetes.io/infra=,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos
worker-2.ocp4.dwojciec.com Ready infra,worker 70m v1.14.0+44b46b52b beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,infra=infra,kubernetes.io/arch=amd64,kubernetes.io/hostname=worker-2.ocp4.dwojciec.com,kubernetes.io/os=linux,node-role.kubernetes.io/infra=,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos
I assigned the infra label to the ingresscontroller:
$ oc edit ingresscontroller default -n openshift-ingress-operator
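The relevant part of the edit is a node placement selector along these lines (a sketch; the exact selector depends on the labels applied above):
spec:
  nodePlacement:
    nodeSelector:
      matchLabels:
        node-role.kubernetes.io/infra: ""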
When done, I deleted the default router deployment:
$ oc delete deployment router-default -n openshift-ingress
$ oc get pod -o wide
to check whether the default router is now running on the worker nodes:
oc get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
router-default-56d656f6b7-8fgzq 1/1 Running 0 24m 192.168.222.34 worker-0.ocp4.dwojciec.com <none> <none>
router-default-56d656f6b7-h49l4 1/1 Running 0 24m 192.168.222.35 worker-1.ocp4.dwojciec.com <none> <none>
Add annotation storageclass.kubernetes.io/is-default-class: "true"
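For an existing storage class this can be done with a patch like the following (a sketch; "nfs" is a placeholder class name):
oc patch storageclass nfs -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'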
Add support for more than one DNS forwarder:
grep forward ansible/group_vars/all
# forwarder to access the internet for your prviate DNS server
forward_dns: 8.8.8.8
https://wiki.hetzner.de/index.php/Hetzner_Standard_Name_Server/en
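A sketch of the variable as a list (the IPs are only examples); the bind template would need to iterate over the list as well:
forward_dns:
  - 8.8.8.8
  - 1.1.1.1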
There are differences in which packages are installed, and where, depending on the OS. There should be host-OS-specific roles for package management.
/var/lib/libvirt/images
The requested URL /pub/openshift-v4/dependencies/rhcos/4.3/4.3.0/rhcos-4.3.0-x86_64-qemu.qcow2.gz was not found on this server.
What is this file? Why is it needed?
name: ensure IPForward is set in /etc/systemd/network/10-mainif.network
Follow the documentation: https://docs.openshift.com/container-platform/4.2/authentication/certificates/api-server.html
Using root server as NFS server.
At the moment the cluster remains down if the host reboots. Let's fix that by doing either of the following (pseudo code):
for all machines
virsh autostart machine-name
done
as an Ansible command over the list of servers, or alternatively:
for all machines
ln -s /etc/libvirt/qemu/ocp4-compute-0.xml /etc/libvirt/qemu/autostart/ocp4-compute-0.xml
done
both with ansible, naturally.
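A shell sketch of the first variant (assuming the VM names share a common prefix such as ocp4-):
# autostart every libvirt domain whose name starts with ocp4-
for vm in $(virsh list --all --name | grep '^ocp4-'); do
  virsh autostart "$vm"
done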
When running on RHEL 7.7 with SELinux in Enforcing mode, the bootstrap VM does not start.
TASK [openshift-4-cluster : Start VirtualMachine iot-bootstrap] ***************************************************************************************************************************************************
Tuesday 01 October 2019 14:34:13 +0300 (0:00:00.404) 0:17:06.550 *******
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: libvirtError: internal error: qemu unexpectedly closed the monitor: 2019-10-01T11:34:15.354797Z qemu-kvm: -fw_cfg name=opt/com.coreos/config,file=/var/lib/libvirt/images/iot-bootstrap.ign: can't load /var/lib/libvirt/images/iot-bootstrap.ign
fatal: [localhost]: FAILED! => {"changed": false, "msg": "internal error: qemu unexpectedly closed the monitor: 2019-10-01T11:34:15.354797Z qemu-kvm: -fw_cfg name=opt/com.coreos/config,file=/var/lib/libvirt/images/iot-bootstrap.ign: can't load /var/lib/libvirt/images/iot-bootstrap.ign"}
TASK [ign : Create small ign for bootstrap] *********************************************************************************************************************************************
task path: /root/hetzner-ocp4/ansible/roles/ign/tasks/main.yml:25
fatal: [localhost -> localhost]: FAILED! => {"changed": false, "checksum": "0b3017f31dea301f097f544cd0a9b47ca00bea51", "msg": "Destination directory /root/terraform does not exist"}
to retry, use: --limit @/root/hetzner-ocp4/ansible/03-prepare-install.retry
We have to add OpenShift Container Storage 4 (Rook & Ceph) because we need storage for the image registry and for applications too.
Working branch: ocs_issue#31
You can install OCS upstream on OCP 4, quick'n'dirty:
git clone git@github.com:RedHat-EMEA-SSA-Team/hetzner-ocp4.git
cd hetzner-ocp4
git branch ocs_issue#31 origin/ocs_issue#31
git checkout ocs_issue#31
# Create cluster.yml
vi cluster.yml
./ansible/02-create-cluster.yml
export KUBECONFIG=....
./deploy-ocs.sh
Keep only installation-related docs in the main README and move all infra-related content to its own documents, e.g. separate docs in the future if RHV is used.
Replace the ansible post-installation with an operator
Be careful with the SSL cert and lookup('file'), because it strips the last, important \n:
- name: Check certificates exist
  stat:
    path: "{{ ign_certificates_path }}/fullchain.crt"
  register: crt

- name: Check ssl key exist
  stat:
    path: "{{ ign_certificates_path }}/cert.key"
  register: key

- name: Create openshift-ingress config
  block:
    - name: Create openshift router certs secret
      copy:
        content: |
          apiVersion: v1
          kind: Secret
          data:
            tls.crt: {{ lookup('file', ign_certificates_path + '/fullchain.crt', rstrip=false) | b64encode }}
            tls.key: {{ lookup('file', ign_certificates_path + '/cert.key', rstrip=false) | b64encode }}
          metadata:
            name: letsencrypt-router-certs
            namespace: openshift-ingress
          type: kubernetes.io/tls
        dest: "{{ ign_openshift_install_dir }}/openshift/99_openshift-ingress-letsencrypt-router-certs-secret.yaml"
  when: crt.stat.exists == True and key.stat.exists == True
$ oc get configs.imageregistry.operator.openshift.io/cluster -o yaml | grep managementState
managementState: Removed
This means new cluster installations don't have a running registry.
MCO not accessible via https://api-int.....:22623/config/worker
oc debug node/compute-0
Starting pod/compute-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 192.168.51.13
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# curl -i -k https://api-int.demo.openshift.pub:22623/config/worker
curl: (7) Failed to connect to api-int.demo.openshift.pub port 22623: Connection refused
sh-4.4# nslookup api-int.demo.openshift.pub
Server: 192.168.51.1
Address: 192.168.51.1#53
Name: api-int.demo.openshift.pub
Address: 192.168.51.1
sh-4.4# ping 192.168.51.1
PING 192.168.51.1 (192.168.51.1) 56(84) bytes of data.
64 bytes from 192.168.51.1: icmp_seq=1 ttl=64 time=0.113 ms
64 bytes from 192.168.51.1: icmp_seq=2 ttl=64 time=0.118 ms
^C
--- 192.168.51.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 55ms
rtt min/avg/max/mdev = 0.113/0.115/0.118/0.011 ms
sh-4.4# curl -vvv -i -k https://192.168.51.1:22623/config/worker
* Trying 192.168.51.1...
* TCP_NODELAY set
* connect to 192.168.51.1 port 22623 failed: Connection refused
* Failed to connect to 192.168.51.1 port 22623: Connection refused
* Closing connection 0
curl: (7) Failed to connect to 192.168.51.1 port 22623: Connection refused
From host it worked:
root@homer:~ $ curl -vvv -I -k https://192.168.51.1:22623/config/worker
* About to connect() to 192.168.51.1 port 22623 (#0)
* Trying 192.168.51.1...
* Connected to 192.168.51.1 (192.168.51.1) port 22623 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
* Server certificate:
* subject: CN=api-int.demo.openshift.pub
* start date: Oct 17 11:05:22 2019 GMT
* expire date: Oct 14 11:05:25 2029 GMT
* common name: api-int.demo.openshift.pub
* issuer: CN=root-ca,OU=openshift
> HEAD /config/worker HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 192.168.51.1:22623
> Accept: */*
>
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Content-Length: 91607
Content-Length: 91607
< Content-Type: application/json
Content-Type: application/json
< Date: Tue, 29 Oct 2019 15:47:52 GMT
Date: Tue, 29 Oct 2019 15:47:52 GMT
<
* Connection #0 to host 192.168.51.1 left intact
We should move the openshift-install download [1] from the prepare-host part to the create-cluster part.
And, importantly, add the version to the binary name, because the RHEL CoreOS version and the openshift-install version have to match.
[1]
We have to add a note that it runs with ephemeral storage.
Install fails at TASK [openshift-4-cluster : Waiting bootstrap to complete]
journalctl on bootstrap node has the following error:
Sep 17 09:25:21 bootstrap release-image-download.sh[1016]: Error: error pulling image "quay.io/openshift-release-dev/ocp-release-nightly@sha256:d48a15ea564293934eb188e6eb8737e56903453d50bc70830cdac2641fb63acc": unable to pull quay.io/openshift-release-dev/ocp-release-nightly@sha256:d48a15ea564293934eb188e6eb8737e56903453d50bc70830cdac2641fb63acc: unable to pull image: Error initializing source docker://quay.io/openshift-release-dev/ocp-release-nightly@sha256:d48a15ea564293934eb188e6eb8737e56903453d50bc70830cdac2641fb63acc: pinging docker registry returned: Get https://quay.io/v2/: dial tcp 23.23.73.73:443: i/o timeout
Curling the Quay API fails as well on the bootstrap VM:
[root@bootstrap ~]# curl -v https://quay.io/v2/
Curling the Quay API works fine from the Hetzner root server or a local laptop.
The variable cloudflare_account_api_token should be renamed, since the user should provide a global key and not an API token. The current naming leads to possible misunderstandings.
Obtain an API token in Cloudflare and assign it to cloudflare_account_api_token. Run the playbook. The following message will appear in the Ansible logs:
API bad request; Status: 400; Method: GET: Call: /zones?name=rhocplab.com; Error details: code: 6003, error: Invalid request headers; code: 6103, error: Invalid format for X-Auth-Key header;
When passing an API token, the Cloudflare API consumes it as a bearer token, while global keys are consumed via the X-Auth-Key header.
The Ansible module cloudflare_dns, despite calling the related parameter account_api_token, sends the request with an X-Auth-Key header. To match that header, the user has to provide a global key.
To help users avoid this issue, I suggest renaming our variable cloudflare_account_api_token to cloudflare_account_global_key.
I'm collecting here the items that need to be fixed for CentOS 8, which is now available from Hetzner. Once done, I'll hopefully close this with a PR.
fatal: [localhost]: FAILED! => {"changed": false, "failures": ["No package python-lxml available.", "No package python-boto available.", "No package python2-openshift available."], "msg": ["Failed to install some of the specified packages"], "rc": 1, "results": []}
-> Changed to pip, works
fatal: [localhost]: FAILED! => {"changed": false, "failures": ["No package centos-release-openstack-stein available."], "msg": "Failed to install some of the specified packages", "rc": 1, "results": []}
-> removed, probably useless in CentOS 8
fatal: [localhost]: FAILED! => {"changed": false, "cmd": "yum-config-manager -q --disable centos-ceph-nautilus centos-nfs-ganesha28 centos-openstack-stein", "msg": "[Errno 2] No such file or directory: 'yum-config-manager': 'yum-config-manager'", "rc": 2}
-> removed, probably useless in CentOS 8
Installation proceeds, let's see.
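For the first failure, the pip-based replacement could look roughly like this (a sketch; the exact package set is an assumption):
pip3 install lxml boto openshift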
I find the accompanying roles to be very useful for many other use cases and think they should be broken out into their own repos.
The project can include an ansible.cfg that points to the ansible/roles directory for the installation of roles (see the sketch after the install command below), and there can be a requirements.yml:
- src: https://github.com/flyemsafe/swygue-redhat-subscription.git
version: master
Then the roles can be installed with:
ansible-galaxy install --force -r requirements.yml
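A minimal ansible.cfg for that (a sketch):
[defaults]
roles_path = ./ansible/roles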
What can I do to install the OCP 4 cluster on Hetzner if I don't have a registered domain and an account with one of the currently supported DNS providers: AWS Route53, Cloudflare, or GCP DNS?
TASK [bind : Install bind] **************************************************************************************************************************************************************
task path: /root/hetzner-ocp4/ansible/roles/bind/tasks/main.yml:17
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Unsupported parameters for (yum) module: use_backend Supported parameters include: allow_downgrade, bugfix, conf_file, disable_gpg_check, disable_plugin, disablerepo, enable_plugin, enablerepo, exclude, install_repoquery, installroot, list, name, security, skip_broken, state, update_cache, update_only, validate_certs"}
to retry, use: --limit @/root/hetzner-ocp4/ansible/03-prepare-install.retry
root@a:~ $ ansible --version
ansible 2.6.18
config file = /etc/ansible/ansible.cfg
configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/site-packages/ansible
executable location = /usr/bin/ansible
python version = 2.7.5 (default, Jun 11 2019, 14:33:56) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]
root@a:~ $ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.7 (Maipo)
root@a:~ $
Is use_backend: yum necessary on CentOS or with newer Ansible versions?
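A portable alternative (a sketch, not the repo's current task) is the generic package module, which needs no backend parameter:
- name: Install bind
  package:
    name: bind
    state: present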
When installing 4.3 as a bare metal installation, the image registry is taken offline (managementState: Removed) because no object storage is available. The fix is described in the OCP docs; it would be nice to automate it as well.
Manual Fix:
"oc edit configs.imageregistry.operator.openshift.io " and change "managementState: Removed" to "managementState: Managed"
When doing a repeated install, the old files in /usr/local/bin remain in place and lead to the wrong version being selected.
Create an SSH key automatically and include it in the install config.
The name is now misleading and doesn't tell that the email is also used with Route53.
Fixed by changing it to le_letsencrypt_account_email.
Set bind to listen on 192.168.222.1 and 127.0.0.1 only.
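In named.conf terms this would look roughly like the following (a sketch):
options {
    listen-on port 53 { 127.0.0.1; 192.168.222.1; };
};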
[root@hack02]# ssh bootstrap
ssh: connect to host bootstrap port 22: No route to host
[root@hack02]# ssh bootstrap.ocp42.ocp.ninja
ssh: connect to host bootstrap.ocp42.ocp.ninja port 22: No route to host
[root@hack02]# nslookup bootstrap
Server: 127.0.0.1
Address: 127.0.0.1#53
Name: bootstrap.ocp42.ocp.ninja
Address: 192.168.222.30
[root@hack02]#
I found an issue in the Letsencrypt playbook: the Cloudflare account email is missing and the playbook fails with an error.
The Letsencrypt role searches for the variable:
le_cloudflare_account_email
So the best way is to define it in ansible/roles/openshift-4-cluster/tasks/create.yml:
le_cloudflare_account_email: "{{ cloudflare_account_email }}"
I'll make a PR for that
Please use /opt/openshift-install-{{ openshift_version }}/openshift-install instead of openshift-install to make sure the right version is in use!
ansible/roles/openshift-4-cluster/tasks/create-ignition.yml: command: "openshift-install --dir={{ openshift_install_dir }} create ignition-configs"
ansible/roles/openshift-4-cluster/tasks/download-openshift-artifacts.yml: src: "https://mirror.openshift.com/pub/openshift-v4/clients/ocp-dev-preview/{{ openshift_version }}/openshift-install-linux-{{ openshift_version }}.tar.gz"
ansible/roles/openshift-4-cluster/tasks/download-openshift-artifacts.yml: dest: "/opt/openshift-install-{{ openshift_version }}/"
ansible/roles/openshift-4-cluster/tasks/download-openshift-artifacts.yml: creates: "/opt/openshift-install-{{ openshift_version }}/openshift-install"
ansible/roles/openshift-4-cluster/tasks/download-openshift-artifacts.yml: "/usr/local/bin/openshift-install": "/opt/openshift-install-{{ openshift_version }}/openshift-install"
ansible/roles/openshift-4-cluster/tasks/post-install.yml: command: "openshift-install wait-for bootstrap-complete --dir {{ openshift_install_dir }} --log-level debug"
ansible/roles/openshift-4-cluster/tasks/post-install.yml: command: "openshift-install wait-for install-complete --dir {{ openshift_install_dir }}"
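The tasks would then call the versioned binary, roughly like this (a sketch based on the paths above):
command: "/opt/openshift-install-{{ openshift_version }}/openshift-install --dir={{ openshift_install_dir }} create ignition-configs"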
In "vanilla" RHEL 7.7 where firewalld is disabled and iptables is in use, default rules prevents nodes to access bootstrap thru haproxy lb
The disconnected installation uses the docker registry image from docker.io. That may be blocked and cannot be used in some cases. Quay.io is usually whitelisted, so change the registry image to point to the quay.io/redhat-emea-ssa-team/registry image repo.
Hi
I got an additional Hetzner root server, but this time the installer does not work for whatever reason. The only difference between the servers is the interface names.
TASK [openshift-4-cluster : Add emptyDir storage to registry] ********************************************************************************************************************************************************************************
Tuesday 24 September 2019 22:28:09 +0200 (0:00:00.644) 2:00:47.563 *****
: ["oc", "patch", "configs.imageregistry.operator.openshift.io", "cluster", "--type", "merge", "--patch", "{\"spec\":{\"storage\":{\"emptyDir\":{}}}}", "--config", "/root/hetzner-ocp4/ansible/../ocp4/auth/kubeconfig"], "delta": "0:00:48.827473", "end": "2019-09-24 23:26:00.512679", "msg": "non-zero return code", "rc": 1, "start": "2019-09-24 23:25:11.685206", "stderr": "Error from server (NotFound): configs.imageregistry.operator.openshift.io \"cluster\" not found", "stderr_lines": ["Error from server (NotFound): configs.imageregistry.operator.openshift.io \"cluster\" not found"], "stdout": "", "stdout_lines": []}
The API at least seems to work and is reachable, but returns an error for the bootstrap-roles:
https://api.ocp4.sanc.ch:6443/healthz
[+]ping ok
[+]log ok
[+]etcd ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/kube-apiserver-requestheader-reload ok
[+]poststarthook/kube-apiserver-clientCA-reload ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-discovery-available ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[-]poststarthook/rbac/bootstrap-roles failed: reason withheld
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/ca-registration ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/openshift.io-clientCA-reload ok
[+]poststarthook/openshift.io-requestheader-reload ok
[+]poststarthook/quota.openshift.io-clusterquotamapping ok
[+]poststarthook/openshift.io-kubernetes-informers-synched ok
[+]poststarthook/openshift.io-startkubeinformers ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/apiservice-wait-for-first-sync ok
[+]poststarthook/apiservice-openapi-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
healthz check failed
It might look like a network error.
From: sdn-controller-k26q4_openshift-sdn_sdn-controller-839c5cf81e1ef067eb60b6b4e2d3d79466e70bd79b9b98bf7e2b57d7820a9855.log
/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: EOF
2019-09-25T04:22:52.964727739+00:00 stderr F E0925 04:22:52.964676 1 leaderelection.go:306] error retrieving resource lock openshift-sdn/openshift-network-controller: Get https://api-int.ocp4.sanc.ch:6443/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: EOF
2019-09-25T04:23:11.572064660+00:00 stderr F E0925 04:23:11.572023 1 leaderelection.go:306] error retrieving resource lock openshift-sdn/openshift-network-controller: Get https://api-int.ocp4.sanc.ch:6443/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: EOF
2019-09-25T04:29:44.826459167+00:00 stderr F E0925 04:29:44.826373 1 leaderelection.go:306] error retrieving resource lock openshift-sdn/openshift-network-controller: Get https://api-int.ocp4.sanc.ch:6443/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: http2: server sent GOAWAY and closed the connection; LastStreamID=5, ErrCode=NO_ERROR, debug=""