
kubealex / libvirt-ocp4-provisioner


Automate your OCP4 installation

License: MIT License


libvirt-ocp4-provisioner's Introduction


libvirt-ocp4-provisioner - Automate your cluster provisioning from 0 to OCP!

Welcome to the home of the project! It was inspired by @ValentinoUberti, who did a GREAT job creating the playbooks that provision existing infrastructure nodes on oVirt and prepare them for cluster installation.

I wanted to play around with Terraform and port his great work to libvirt, so here we are! I adapted his playbooks to libvirt's needs, making heavy use of in-memory inventory creation for the provisioned VMs to minimize how much has to be customized in variables.

Project Overview

To give a quick overview, this project will allow you to provision a fully working and stable OCP environment, consisting of:

  • Bastion machine provisioned with:
    • dnsmasq (with SELinux module, compiled and activated)
    • dhcp based on dnsmasq
    • nginx (for ignition files and rhcos pxe-boot)
    • pxeboot
  • Loadbalancer machine provisioned with:
    • haproxy
  • OCP Bootstrap VM
  • OCP Master VM(s)
  • OCP Worker VM(s)

It also takes care of preparing the host machine with needed packages, configuring:

PXE boot is automatic, based on MAC binding to the different OCP node roles, so there is no need to choose an entry from the menus. This means you can just run the playbook, grab a beer, and come back to a fully running OCP cluster.

The version can be selected freely by specifying the desired one (e.g. 4.10.x, 4.13.2) or the latest stable release with "stable". Versions before 4.6 are no longer supported!

Support for Single Node OpenShift (SNO) has now been added!
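The version rule above can be checked before starting a run. The following is a minimal sketch, not something taken from the project: `version_supported` is a hypothetical helper, and it assumes version strings look like `stable` or `4.<minor>[.<patch>]`.

```shell
# Hypothetical helper: succeeds for "stable" or any 4.x version with minor >= 6.
version_supported() {
  local version="$1"
  if [ "$version" = "stable" ]; then
    return 0
  fi
  # Only strings of the form 4.<something> are considered
  case "$version" in
    4.*) ;;
    *) return 1 ;;
  esac
  local minor="${version#4.}"   # drop the leading "4."
  minor="${minor%%.*}"          # keep only the minor component
  case "$minor" in
    ''|*[!0-9]*) return 1 ;;    # the minor component must be numeric
  esac
  [ "$minor" -ge 6 ]
}

for candidate in stable 4.10.x 4.13.2 4.5.0; do
  if version_supported "$candidate"; then
    echo "$candidate: supported"
  else
    echo "$candidate: not supported"
  fi
done
```

The helper only looks at the minor component, so a placeholder patch level such as `4.10.x` is accepted.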

bastion and loadbalancer VMs spec:

The user can also log in via SSH.

Quickstart

First of all, you need to install the required collections to get started:

ansible-galaxy collection install -r requirements.yml

The playbook is meant to run against one or more local hosts, defined under the vm_host group in your inventory, depending on how many clusters you want to configure at once.

HA Clusters

ansible-playbook main.yml

Single Node Openshift (SNO)

ansible-playbook main-sno.yml

You can tailor the deployment by configuring the needed vars, or go straight with the defaults!

Quickstart with Execution Environment

The playbooks are compatible with Ansible execution environments (EE). To use them with an execution environment, you need ansible-builder and ansible-navigator installed.

Build EE image

To build the EE image, run the build against the definition in the execution-environment folder:

ansible-builder build -f execution-environment/execution-environment.yml -t ocp-ee

Run playbooks

To run the playbooks, use ansible-navigator:

ansible-navigator run main.yml -m stdout

Or, in case of Single Node Openshift:

ansible-navigator run main-sno.yml -m stdout

Common vars

The network created is a simple NAT configuration without DHCP, since DHCP is provided by the bastion VM. The defaults are fine as long as they don't overlap with an existing network.
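Whether a candidate network_cidr overlaps a network that already exists on the host can be checked with plain shell arithmetic. This is a rough sketch under stated assumptions: `ip_to_int` and `cidrs_overlap` are hypothetical helpers, only IPv4 is handled, and 192.168.122.0/24 is used as the comparison because it is the usual libvirt default network, which may differ on your host.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  local IFS=.
  # Word splitting on "." turns the address into four positional parameters
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeeds when the two CIDR blocks share at least one address.
cidrs_overlap() {
  local net_a="${1%/*}" bits_a="${1#*/}"
  local net_b="${2%/*}" bits_b="${2#*/}"
  # Two blocks overlap exactly when their network addresses match
  # under the shorter of the two prefixes.
  local bits="$bits_a"
  if [ "$bits_b" -lt "$bits" ]; then bits="$bits_b"; fi
  local mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
  [ $(( $(ip_to_int "$net_a") & mask )) -eq $(( $(ip_to_int "$net_b") & mask )) ]
}

if cidrs_overlap "192.168.100.0/24" "192.168.122.0/24"; then
  echo "networks overlap: pick a different network_cidr"
else
  echo "no overlap"
fi
```

Comparing under the shorter prefix also catches the case where one network fully contains the other, for example a /24 inside a /16.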

HA Configuration vars

vars/infra_vars.yml

infra_nodes:
  host_list:
    bastion:
      - ip: 192.168.100.4
    loadbalancer:
      - ip: 192.168.100.5
dhcp:
  timezone: "Europe/Rome"
  ntp: 204.11.201.10

vars/cluster_vars.yml

three_node: false
network_cidr: 192.168.100.0/24
domain: hetzner.lab
additional_block_device:
  enabled: false
  size: 100
additional_nic:
  enabled: false
  network:
cluster:
  version: stable
  name: ocp4
  ocp_user: admin
  ocp_pass: openshift
  pullSecret: ""
cluster_nodes:
  host_list:
    bootstrap:
      - ip: 192.168.100.6
    masters:
      - ip: 192.168.100.7
      - ip: 192.168.100.8
      - ip: 192.168.100.9
    workers:
      - ip: 192.168.100.10
        role: infra
      - ip: 192.168.100.11
      - ip: 192.168.100.12
  specs:
    bootstrap:
      vcpu: 4
      mem: 16
      disk: 40
    masters:
      vcpu: 4
      mem: 16
      disk: 40
    workers:
      vcpu: 2
      mem: 8
      disk: 40

Here domain is the DNS domain assigned to the nodes, and cluster.name is the name chosen for the OCP cluster installation.

mem and disk are expressed in GB.

cluster.version lets you choose a particular version to install (e.g. 4.13.2, stable).

additional_block_device controls whether an additional disk of the given size is added to the worker nodes or, in a compact (3-node) setup, to the control plane nodes.

additional_nic allows the creation of an additional network interface on all nodes. The libvirt network it attaches to can be customized.

The role field for workers is used for node labelling. Omitting it leaves the node with the default label, worker.

The number of VMs is taken from the number of elements in each list. In this example, we get:

  • 3 master nodes with 4vcpu and 16G memory
  • 3 worker nodes with 2vcpu and 8G memory

Recommended values are:

Role       vCPU  RAM  Storage
bootstrap  4     16G  120G
master     4     16G  120G
worker     2     8G   120G

For testing purposes, minimum storage value is set at 60GB.

The playbook also supports a three-node setup (3 masters acting as both master and worker nodes), intended purely for testing. Enable it with the three_node boolean var (OCP 4.6+ ONLY).

Single Node Openshift vars

vars/cluster_vars.yml

domain: hetzner.lab
network_cidr: 192.168.100.0/24
cluster:
  version: stable
  name: ocp4
  ocp_user: admin
  ocp_pass: openshift
  pullSecret: ""
cluster_nodes:
  host_list:
    sno:
      ip: 192.168.100.7
  specs:
    sno:
      vcpu: 8
      mem: 32
      disk: 120
local_storage:
  enabled: true
  volume_size: 50
additional_nic:
  enabled: false
  network:

The local_storage field can be used to provision an additional disk for the VM, so that volumes can be provisioned using, for instance, rook-ceph or the Local Storage Operator.

additional_nic allows the creation of an additional network interface on the node. The libvirt network it attaches to can be customized.

In both cases, the pull secret can be easily retrieved at https://cloud.redhat.com/openshift/install/pull-secret

An HTPasswd provider is created after the installation; you can use ocp_user and ocp_pass to log in!

Cleanup

To clean all resources, you can simply run the cleanup playbooks.

Full deployment cleanup

ansible-playbook -i inventory 99_cleanup.yml

SNO deployment cleanup

ansible-playbook -i inventory 99_cleanup_sno.yml

DISCLAIMER This project is for testing/lab use only; it is neither supported nor endorsed by Red Hat in any way.

Feel free to suggest modifications/improvements.

Alex

libvirt-ocp4-provisioner's People

Contributors

carljmosca, kubealex, mfagotto


libvirt-ocp4-provisioner's Issues

RHEL

Hello, how can I hard-code the prerequisite check to allow RHEL?

I have:

cat /etc/*-release

NAME="Red Hat Enterprise Linux"
VERSION="8.3 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.3"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.3 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8.3:GA"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

Thank you!

Missing collection : community.crypto

Working on pull from 2021/11/08 13:00 Eastern.

Attempting to run on RHEL8 server.

After installing ansible from the ansible-automation-platform-2.0-early-access-for-rhel-8-x86_64-rpms available via a Red Hat Developer subscription, I ran the 'ansible-galaxy collection install -r requirements.yml' command.

As I'm running this on a single server, I chose the 'ansible-playbook main-sno.yml' command next. It failed with message "ERROR! couldn't resolve module/action 'openssh_keypair'. This often indicates a misspelling, missing collection, or incorrect module path."

I found the collection "community.crypto" on galaxy.ansible.com and added it to the requirements.yml file, re-ran the install command, then re-ran main-sno.yml and got past the error.

Newbie to GitHub so didn't want to even try to figure out if it was possible to submit the change, thought this was safest.
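A quick way to see whether a requirements file already names a collection before running the playbook is sketched below. `requirements_mention` is a hypothetical helper, and the demo file contents are made up for illustration; they are not the project's actual requirements.yml.

```shell
# Hypothetical helper: succeeds when the requirements file mentions the collection.
requirements_mention() {
  local file="$1" collection="$2"
  grep -Fq -- "$collection" "$file"
}

# Self-contained demonstration against a throwaway file
demo_file="$(mktemp)"
printf 'collections:\n  - name: community.general\n' > "$demo_file"
if requirements_mention "$demo_file" "community.crypto"; then
  echo "community.crypto already listed"
else
  echo "community.crypto missing: add it, then re-run ansible-galaxy collection install -r requirements.yml"
fi
rm -f "$demo_file"
```

Against the real repository the first argument would be requirements.yml. A single collection can also be installed directly with `ansible-galaxy collection install community.crypto`.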

Install fails if there isn't enough disk space with no clear message

I had this happen when /root and /var/lib/libvirt/images weren't big enough. I resolved by increasing the space available, but then worked with someone to put together my first-ever piece of yaml code that I offer up for inclusion. It might help to check at least two locations, because I ran out of space on both.

I'll include the entire yaml file contents, obviously you won't need all of it.

- hosts: localhost
  become: true
  vars:
    # Space needed, in bytes (40 GiB in this test value)
    need_space: 42949672960
    # Mount point that owns the directory where images are to be built
    mntpt: "{{ (ansible_mounts | selectattr('mount', 'in', '/var/lib/libvirt/images') | list | sort(attribute='mount'))[-1]['mount'] }}"
  tasks:
    - name: Assert that there is enough space
      loop: "{{ ansible_mounts }}"
      when: item.mount == mntpt
      assert:
        that: item.size_available > need_space

The value for need_space was just something I was using to test; it should be set to however much room is actually needed for the path in question. mntpt is set in such a way that if the path specified (/var/lib/libvirt/images in this example) is a subdirectory of a filesystem, mntpt will be set to the "owning" mount point of that filesystem, so the check against the ansible_mounts fact will work properly.
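The same idea can be tried outside Ansible with df, which reports on the filesystem that contains a given path and so resolves the "owning" mount point on its own. This is only a sketch: `available_kib` and `has_space` are hypothetical helpers, and the two directories are the ones mentioned in this issue.

```shell
# Available space, in KiB, on the filesystem that contains the given path.
available_kib() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeeds when the path's filesystem has at least the requested KiB free.
has_space() {
  local path="$1" needed_kib="$2"
  [ "$(available_kib "$path")" -ge "$needed_kib" ]
}

# 42949672960 bytes (the need_space test value above) is 41943040 KiB
needed_kib=41943040
for dir in /var/lib/libvirt/images /root; do
  if [ ! -d "$dir" ]; then
    echo "$dir does not exist on this machine, skipping"
  elif has_space "$dir" "$needed_kib"; then
    echo "$dir: enough space"
  else
    echo "$dir: less than ${needed_kib} KiB available"
  fi
done
```

The `-P` flag keeps each filesystem on one output line, which is what makes the fixed `NR == 2` and column 4 lookup workable.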

libvirt-sock missing on Fedora 36

I kept getting this error:

TASK [Use TF project to ensure pool and network are defined] **************************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Terraform plan could not be created\r\nSTDOUT: \r\n\r\nSTDERR: \nError: failed to dial libvirt: dial unix /var/run/libvirt/libvirt-sock: connect: no such file or directory\n\n with provider["registry.terraform.io/dmacvicar/libvirt"],\n on libvirt-resources.tf line 10, in provider "libvirt":\n 10: provider "libvirt" {\n\n"}

PLAY RECAP ****************************************************************************************************************************************************************************
localhost : ok=15 changed=0 unreachable=0 failed=1 skipped=12 rescued=0 ignored=0

This is fixed by editing /etc/libvirt/libvirtd.conf and adding:

unix_sock_group = "libvirt"
unix_sock_ro_perms = "0777"
unix_sock_rw_perms = "0770"

and then restarting libvirtd:

sudo systemctl restart libvirtd
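After the restart, it is easy to confirm that the socket named in the error message now exists. A small sketch, where `socket_present` is a hypothetical helper and the path is the one from the Terraform error above:

```shell
# Succeeds when the given path exists and is a UNIX domain socket.
socket_present() {
  [ -S "$1" ]
}

sock=/var/run/libvirt/libvirt-sock
if socket_present "$sock"; then
  echo "$sock is present"
else
  echo "$sock is missing: check /etc/libvirt/libvirtd.conf and restart libvirtd"
fi
```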

playbook is not really remote host compatible - tested with sno part

Hello!

I wanted to use the SNO deployment from a Fedora 34 machine against a RHEL 8.5 host, but it always failed in the Terraform part:

TASK [Use TF project to ensure pool and network are defined] **************************************************************************
fatal: [remote.host]: FAILED! => {"changed": false, "msg": "Path for Terraform project 'terraform/libvirt-resources' doesn't exist on this host - check the path and try again please."}

Local invocation works fine.

Used the sources from today.

I also noticed that I had to copy sno_vars.yml to cluster_vars.yml; otherwise the domain from my sno_vars.yml was not used, and the default hetzner.lab was used instead.

Here is the whole install log:
[user@machine libvirt-ocp4-provisioner]$ ansible-playbook -i inventory main-sno.yml

PLAY [This play ensures prerequisites are satisfied before installing] ****************************************************************

TASK [Gathering Facts] ****************************************************************************************************************
ok: [remote.host]

TASK [Check if distribution is supported] *********************************************************************************************
skipping: [remote.host]

TASK [fail] ***************************************************************************************************************************
skipping: [remote.host]

TASK [fail] ***************************************************************************************************************************
skipping: [remote.host]

TASK [Fail fast if bootstrap node doesn't meet minimum requirements] ******************************************************************
skipping: [remote.host]

TASK [Check for pullSecret variable and fail fast] ************************************************************************************
skipping: [remote.host]

PLAY [This play installs needed tools to provision infrastructure VMs] ****************************************************************

TASK [Gathering Facts] ****************************************************************************************************************
ok: [remote.host]

TASK [Install needed packages] ********************************************************************************************************
ok: [remote.host]

TASK [Install needed packages] ********************************************************************************************************
skipping: [remote.host]

TASK [Install needed packages] ********************************************************************************************************
skipping: [remote.host]

TASK [Install needed packages] ********************************************************************************************************
skipping: [remote.host]

TASK [Download and provision Terraform] ***********************************************************************************************
ok: [remote.host]

TASK [Virtualization services are enabled] ********************************************************************************************
ok: [remote.host]

TASK [Use TF project to ensure pool and network are defined] **************************************************************************
fatal: [remote.host]: FAILED! => {"changed": false, "msg": "Path for Terraform project 'terraform/libvirt-resources' doesn't exist on this host - check the path and try again please."}

PLAY RECAP ****************************************************************************************************************************
remote.host : ok=5 changed=0 unreachable=0 failed=1 skipped=8 rescued=0 ignored=0

Error executing script

[root@localhost libvirt-ocp4-provisioner]# ansible-playbook main.yml
ERROR! couldn't resolve module/action 'community.general.terraform'. This often indicates a misspelling, missing collection, or incorrect module path.

The error appears to be in '/root/libvirt-ocp4-provisioner/01_install_virtualization_tools.yml': line 46, column 7, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- name: Use TF project to ensure pool and network are defined
  ^ here

Initramfs and kernel images download is locked to an older version

Description of the problem

During the installation of the default OCP4 cluster, the playbook execution stops at the task "Download initramfs and kernel". The full error message is:

failed: [bastion] (item=rhcos-4.4.3-x86_64-installer-initramfs.x86_64.img) => {"ansible_loop_var": "item", "changed": false, "dest": "/var/lib/tftpboot/rhcos/rhcos-4.4.3-x86_64-installer-initramfs.x86_64.img", "elapsed": 0, "item": "rhcos-4.4.3-x86_64-installer-initramfs.x86_64.img", "msg": "Request failed", "response": "HTTP Error 404: Not Found", "status_code": 404, "url": "https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.4/latest/rhcos-4.4.3-x86_64-installer-initramfs.x86_64.img"}
failed: [bastion] (item=rhcos-4.4.3-x86_64-installer-kernel-x86_64) => {"ansible_loop_var": "item", "changed": false, "dest": "/var/lib/tftpboot/rhcos/rhcos-4.4.3-x86_64-installer-kernel-x86_64", "elapsed": 0, "item": "rhcos-4.4.3-x86_64-installer-kernel-x86_64", "msg": "Request failed", "response": "HTTP Error 404: Not Found", "status_code": 404, "url": "https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.4/latest/rhcos-4.4.3-x86_64-installer-kernel-x86_64"}

Cause

The latest folder on the mirror used for the download has been updated to version 4.4.9, so the 4.4.3 file names requested by the playbook return a 404 there.

Resolution

Point to the https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.4/4.4.3/ URL so that the images resolve correctly.

Add support to RHEL

To extend the range of usable hosts, add RHEL to the OSes supported on the host machine.

path not set for /usr/local/bin ?

It appears that Ansible isn't picking up the PATH environment variable set on the installed nodes. If I modify the playbooks where the oc and openshift-install commands are run and put an absolute path in front of the binaries, they execute successfully.

System info:
[yates@cypress0 libvirt-ocp4-provisioner]$ cat /etc/os-release
NAME="Fedora Linux"
VERSION="36 (Workstation Edition)"
ID=fedora
VERSION_ID=36
VERSION_CODENAME=""
PLATFORM_ID="platform:f36"
PRETTY_NAME="Fedora Linux 36 (Workstation Edition)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:36"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f36/system-administrators-guide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=36
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=36
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="Workstation Edition"
VARIANT_ID=workstation

Ansible info:
[yates@cypress0 libvirt-ocp4-provisioner]$ ansible --version
ansible [core 2.12.6]
config file = /home/yates/git/libvirt-ocp4-provisioner/ansible.cfg
configured module search path = ['/home/yates/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python3.10/site-packages/ansible
ansible collection location = /home/yates/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/bin/ansible
python version = 3.10.4 (main, Mar 25 2022, 00:00:00) [GCC 12.0.1 20220308 (Red Hat 12.0.1-0)]
jinja version = 3.0.3
libyaml = True

Terraform info:
[yates@cypress0 libvirt-ocp4-provisioner]$ terraform --version
Terraform v1.1.4
on linux_amd64

Your version of Terraform is out of date! The latest version
is 1.2.3. You can update by downloading from https://www.terraform.io/downloads.html

Fails on checking for openshift binaries on bastion

TASK [Checking for openshift-install tool] ********************************************************************************************************************************************
fatal: [bastion]: FAILED! => {"changed": true, "cmd": "openshift-install version", "delta": "0:00:00.003313", "end": "2022-06-18 21:42:05.895890", "failed_when_result": true, "msg": "non-zero return code", "rc": 127, "start": "2022-06-18 21:42:05.892577", "stderr": "/bin/sh: openshift-install: command not found", "stderr_lines": ["/bin/sh: openshift-install: command not found"], "stdout": "", "stdout_lines": []}

PLAY RECAP ****************************************************************************************************************************************************************************
bastion : ok=18 changed=3 unreachable=0 failed=1 skipped=4 rescued=0 ignored=0
loadbalancer : ok=7 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
localhost : ok=30 changed=1 unreachable=0 failed=0 skipped=14 rescued=0 ignored=0
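The rc 127 and "command not found" in the task output are consistent with /usr/local/bin not being on the PATH of the /bin/sh that runs the command. The sketch below checks that from a shell on the node; `path_contains` is a hypothetical helper, and the diagnosis itself is an assumption based on the report above.

```shell
# Succeeds when dir is one of the colon-separated entries of path_value.
path_contains() {
  local dir="$1" path_value="$2"
  case ":$path_value:" in
    *":$dir:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if path_contains /usr/local/bin "$PATH"; then
  echo "/usr/local/bin is on PATH for this shell"
else
  echo "/usr/local/bin is not on PATH for this shell"
fi

# command -v reports where (or whether) the shell can find a binary
command -v openshift-install || echo "openshift-install not found on PATH"
```

Wrapping both sides in colons avoids false matches on longer entries such as /usr/local/bin2.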

Error when target is Fedora

When the target is Fedora 34 (but the problem should be present on older/newer versions too), I get this error:

TASK [Install needed packages] ***************************************************************************************************************************************************************************************************************
fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'fedora'\n\nThe error appears to be in '/home/elroncio/Documents/Stuff/Mine/libvirt-ocp4-provisioner/01_install_virtualization_tools.yml': line 26, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n    - name: Install needed packages\n      ^ here\n"}

next_nth_usable filter requires python's netaddr be installed on the ansible controller

I'm getting this error during the "Configure NetworkManager for libvirt network" task. I've installed python3_netaddr on the base server, but this still comes up. Full -vvv output from the relevant section of the playbook:

TASK [Configure NetworkManager for libvirt network] *****************************************************************************************************************
task path: /images/libvirt-ocp4-provisioner-master/70_setup_sno_cluster.yml:80
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: root
<localhost> EXEC /bin/sh -c 'echo ~root && sleep 0'
<localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p " echo /root/.ansible/tmp "&& mkdir " echo /root/.ansible/tmp/ansible-tmp-1636499160.5299973-58778-194397503289046 " && echo ansible-tmp-1636499160.5299973-58778-194397503289046=" echo /root/.ansible/tmp/ansible-tmp-1636499160.5299973-58778-194397503289046 " ) && sleep 0'
redirecting filter ansible.builtin.next_nth_usable to ansible.netcommon.next_nth_usable
redirecting filter ansible.builtin.next_nth_usable to ansible.netcommon.next_nth_usable
redirecting filter ansible.builtin.next_nth_usable to ansible.netcommon.next_nth_usable
redirecting filter ansible.builtin.next_nth_usable to ansible.netcommon.next_nth_usable
redirecting filter ansible.builtin.next_nth_usable to ansible.netcommon.next_nth_usable
redirecting filter ansible.builtin.next_nth_usable to ansible.netcommon.next_nth_usable
redirecting filter ansible.builtin.next_nth_usable to ansible.netcommon.next_nth_usable
<localhost> EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1636499160.5299973-58778-194397503289046/ > /dev/null 2>&1 && sleep 0'
The full traceback is:
Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/ansible/plugins/action/template.py", line 146, in run
    resultant = templar.do_template(template_data, preserve_trailing_newlines=True, escape_backslashes=False)
  File "/usr/lib/python3.8/site-packages/ansible/template/__init__.py", line 1100, in do_template
    res = j2_concat(rf)
  File "<template>", line 16, in root
  File "/usr/lib/python3.8/site-packages/ansible/template/__init__.py", line 265, in wrapper
    ret = func(*args, **kwargs)
  File "/root/.ansible/collections/ansible_collections/ansible/netcommon/plugins/filter/ipaddr.py", line 1125, in _need_netaddr
    raise errors.AnsibleFilterError(
ansible.errors.AnsibleFilterError: The next_nth_usable filter requires python's netaddr be installed on the ansible controller
fatal: [localhost]: FAILED! => {
    "changed": false,
    "msg": "AnsibleFilterError: The next_nth_usable filter requires python's netaddr be installed on the ansible controller"
}
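The traceback shows Ansible running under /usr/lib/python3.8, so netaddr has to be importable by that interpreter; a package installed for a different Python would not help. That reading is an assumption, and the sketch below only tests importability. `python_has_module` is a hypothetical helper that defaults to whatever `python3` resolves to.

```shell
# Succeeds when the given Python interpreter can import the module.
python_has_module() {
  local module="$1" python="${2:-python3}"
  "$python" -c "import $module" >/dev/null 2>&1
}

if python_has_module netaddr; then
  echo "netaddr is importable by python3"
else
  echo "netaddr is not importable by python3"
fi
```

To test the interpreter Ansible actually uses, pass its path as the second argument; `ansible --version` prints the Python version and module location it runs with.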
