
ansible-role-proxmox's Introduction


lae.proxmox

Installs and configures Proxmox Virtual Environment 6.x/7.x/8.x on Debian servers.

This role allows you to deploy and manage single-node PVE installations and PVE clusters (3+ nodes) on Debian Buster (10), Bullseye (11), and Bookworm (12). You are able to configure the following with the assistance of this role:

  • PVE RBAC definitions (roles, groups, users, and access control lists)
  • PVE Storage definitions
  • datacenter.cfg
  • HTTPS certificates for the Proxmox Web GUI (BYO)
  • PVE repository selection (e.g. pve-no-subscription or pve-enterprise)
  • Watchdog modules (IPMI and NMI) with applicable pve-ha-manager config
  • ZFS module setup and ZED notification email

With clustering enabled, this role does (or allows you to do) the following:

  • Ensure all hosts can connect to one another as root over SSH
  • Initialize a new PVE cluster (or possibly adopt an existing one)
  • Create or add new nodes to a PVE cluster
  • Setup Ceph on a PVE cluster
  • Create and manage high availability groups

Support/Contributing

For support or if you'd like to contribute to this role but want guidance, feel free to join this Discord server: https://discord.gg/cjqr6Fg. Please note, this is a temporary invite, so you'll need to wait for @lae to assign you a role, otherwise Discord will remove you from the server when you log out.

Quickstart

The primary goal for this role is to configure and manage a Proxmox VE cluster (see example playbook), however this role can be used to quickly install single node Proxmox servers.

I'm assuming you already have Ansible installed. You will need to run Ansible from a machine other than the one you're installing Proxmox on (primarily because of the reboot in the middle of the installation, though I may handle this somewhat differently for this use case later).

Copy the following playbook to a file like install_proxmox.yml:

- hosts: all
  become: True
  roles:
    - role: geerlingguy.ntp
      vars:
        ntp_manage_config: true
        ntp_servers:
          - clock.sjc.he.net
          - clock.fmt.he.net
          - clock.nyc.he.net
    - role: lae.proxmox
      vars:
        pve_group: all
        pve_reboot_on_kernel_update: true

Install this role and a role for configuring NTP:

ansible-galaxy install lae.proxmox geerlingguy.ntp

Now you can perform the installation:

ansible-playbook install_proxmox.yml -i $SSH_HOST_FQDN, -u $SSH_USER

If your SSH_USER has a sudo password, pass the -K flag to the above command. If you also authenticate to the host via password instead of pubkey auth, pass the -k flag (make sure you have sshpass installed as well). You can set those variables prior to running the command or just replace them. Do note the comma is important, as a list is expected (otherwise it'll attempt to look up a file containing a list of hosts).

Once complete, you should be able to access your Proxmox VE instance at https://$SSH_HOST_FQDN:8006.

Deploying a fully-featured PVE 8.x cluster

Create a new playbook directory. We call ours lab-cluster. Our playbook directory will eventually look like this, though yours does not have to follow all of these steps:

lab-cluster/
├── files
│   └── pve01
│       ├── lab-node01.local.key
│       ├── lab-node01.local.pem
│       ├── lab-node02.local.key
│       ├── lab-node02.local.pem
│       ├── lab-node03.local.key
│       └── lab-node03.local.pem
├── group_vars
│   ├── all
│   └── pve01
├── inventory
├── roles
│   └── requirements.yml
├── site.yml
└── templates
    └── interfaces-pve01.j2

6 directories, 12 files

The first thing you may note is that we have a bunch of .key and .pem files. These are private keys and SSL certificates that this role will use to configure the web interface for Proxmox across all the nodes. These aren't necessary, however, if you're fine with keeping the certificates signed by the CA that Proxmox sets up internally. You would typically use Ansible Vault to encrypt the private keys, e.g.:

ansible-vault encrypt files/pve01/*.key

This would then require you to pass the Vault password when running the playbook.

Let's first specify our cluster hosts. Our inventory file may look like this:

[pve01]
lab-node01.local
lab-node02.local
lab-node03.local

You could have multiple clusters, so it's a good idea to have one group for each cluster. Now, let's specify our role requirements in roles/requirements.yml:

---
- src: geerlingguy.ntp
- src: lae.proxmox

We need an NTP role to configure NTP, so we're using Jeff Geerling's role to do so. You wouldn't need it if you already have NTP configured or have a different method for configuring NTP.

Now, let's specify some group variables. First off, let's create group_vars/all for setting NTP-related variables:

---
ntp_manage_config: true
ntp_servers:
  - lab-ntp01.local iburst
  - lab-ntp02.local iburst

Of course, replace those NTP servers with ones you prefer.

Now for the meat of your playbook, pve01's group variables. Create a file group_vars/pve01, add the following, and modify accordingly for your environment.

---
pve_group: pve01
pve_watchdog: ipmi
pve_ssl_private_key: "{{ lookup('file', pve_group + '/' + inventory_hostname + '.key') }}"
pve_ssl_certificate: "{{ lookup('file', pve_group + '/' + inventory_hostname + '.pem') }}"
pve_cluster_enabled: yes
pve_groups:
  - name: ops
    comment: Operations Team
pve_users:
  - name: admin1@pam
    email: [email protected]
    firstname: Admin
    lastname: User 1
    groups: [ "ops" ]
  - name: admin2@pam
    email: [email protected]
    firstname: Admin
    lastname: User 2
    groups: [ "ops" ]
pve_acls:
  - path: /
    roles: [ "Administrator" ]
    groups: [ "ops" ]
pve_storages:
  - name: localdir
    type: dir
    content: [ "images", "iso", "backup" ]
    path: /plop
    maxfiles: 4
pve_ssh_port: 22

interfaces_template: "interfaces-{{ pve_group }}.j2"

pve_group is set to the group name of our cluster, pve01 - it is used to ensure all hosts within that group can connect to each other and are clustered together. Note that the PVE cluster name will also be set to this group name, unless otherwise specified by pve_cluster_clustername. If pve_group is left undefined, it defaults to proxmox.
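For example, if you'd rather not have the cluster named after the inventory group, you could override it explicitly. A minimal sketch (the cluster name here is just an illustration):

pve_group: pve01
pve_cluster_clustername: lab-cluster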

pve_watchdog here enables IPMI watchdog support and configures PVE's HA manager to use it. Leave this undefined if you don't want to configure it.
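If you do enable it, the IPMI watchdog behaviour can be tuned with the related variables documented under Role Variables; the values below are the role defaults:

pve_watchdog: ipmi
pve_watchdog_ipmi_action: power_cycle  # "reset", "power_cycle", or "power_off"
pve_watchdog_ipmi_timeout: 10          # seconds the watchdog should wait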

pve_ssl_private_key and pve_ssl_certificate point to the SSL key and certificate for the Proxmox web GUI. Here, a file lookup is used to read the contents of a file in the playbook, e.g. files/pve01/lab-node01.local.key. You could use host variables instead of files, if you prefer.
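If you go the host variable route instead, a sketch might look like the following (a hypothetical host_vars/lab-node01.local file; key material truncated, and you would typically encrypt it with Ansible Vault):

pve_ssl_private_key: |
  -----BEGIN PRIVATE KEY-----
  ...
  -----END PRIVATE KEY-----
pve_ssl_certificate: |
  -----BEGIN CERTIFICATE-----
  ...
  -----END CERTIFICATE-----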

pve_cluster_enabled enables the role to perform all cluster management tasks. This includes creating a cluster if it doesn't exist, or adding nodes to the existing cluster. There are checks to make sure you're not mixing nodes that are already in existing clusters with different names.

pve_groups, pve_users, and pve_acls authorize some local UNIX users (they must already exist) to access PVE and give them the Administrator role as part of the ops group. Read the User and ACL Management section for more info.

pve_storages lets you create different types of storage and configure them. The backend needs to be supported by Proxmox. Read the Storage Management section for more info.

pve_ssh_port allows you to change the SSH port. If your SSH daemon listens on a port other than the default 22, please set this variable. When a new node joins the cluster, the PVE cluster needs to communicate with it once via SSH.

pve_manage_ssh (default true) allows you to disable any changes this role would make to your SSH server config. This is useful if you use another role to manage your SSH server. Note that setting this to false is not officially supported; you're on your own to replicate the changes normally made in ssh_cluster_config.yml and pve_add_node.yml.
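For instance, if another role already manages sshd on a non-standard port, you might set something like this (the port number is illustrative):

pve_ssh_port: 2222
pve_manage_ssh: false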

interfaces_template is set to the path of a template we'll use for configuring the network on these Debian machines. This is only necessary if you want to manage networking from Ansible rather than manually or via each host in PVE. You should probably be familiar with Ansible prior to doing this, as your method may involve setting host variables for the IP addresses for each host, etc.
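For instance, a rough sketch of per-host variables you might define for that purpose (these variable names are hypothetical and would need a matching template; the template shown below uses DNS lookups instead):

# host_vars/lab-node01.local (hypothetical)
pve_bridge_address: 10.4.0.11
pve_clusternet_address: 10.4.1.11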

Let's get that interface template out of the way. Feel free to skip this file (and leave it undefined in group_vars/pve01) otherwise. Here's one that I use:

# {{ ansible_managed }}
auto lo
iface lo inet loopback

allow-hotplug enp2s0f0
iface enp2s0f0 inet manual

auto vmbr0
iface vmbr0 inet static
    address {{ lookup('dig', ansible_fqdn) }}
    gateway 10.4.0.1
    netmask 255.255.255.0
    bridge_ports enp2s0f0
    bridge_stp off
    bridge_fd 0

allow-hotplug enp2s0f1
auto enp2s0f1
iface enp2s0f1 inet static
    address {{ lookup('dig', ansible_hostname + "-clusternet.local") }}
    netmask 255.255.255.0

You might not be familiar with the dig lookup, but basically here we're doing an A record lookup for each machine (e.g. lab-node01.local) for the first interface (and configuring it as a bridge we'll use for VM interfaces), and then another slightly modified lookup for the "clustering" network we might use for Ceph ("lab-node01-clusternet.local"). Of course, yours may look completely different, especially if you're using bonding, three different networks for management/corosync, storage and VM traffic, etc.

Finally, let's write our playbook. site.yml will look something like this:

---
- hosts: all
  become: True
  roles:
    - geerlingguy.ntp

# Leave this out if you're not modifying networking through Ansible
- hosts: pve01
  become: True
  serial: 1
  tasks:
    - name: Install bridge-utils
      apt:
        name: bridge-utils

    - name: Configure /etc/network/interfaces
      template:
        src: "{{ interfaces_template }}"
        dest: /etc/network/interfaces
      register: _configure_interfaces

    - block:
      - name: Reboot for networking changes
        shell: "sleep 5 && shutdown -r now 'Networking changes found, rebooting'"
        async: 1
        poll: 0

      - name: Wait for server to come back online
        wait_for_connection:
          delay: 15
      when: _configure_interfaces is changed

- hosts: pve01
  become: True
  roles:
    - lae.proxmox

Basically, we run the NTP role across all hosts (you might want to add some non-Proxmox machines), configure networking on pve01 with our separate cluster network and bridge layout, reboot to make those changes take effect, and then run this Proxmox role against the hosts to setup a cluster.

At this point, our playbook is ready to run.

Ensure that roles and dependencies are installed:

ansible-galaxy install -r roles/requirements.yml --force
pip install jmespath dnspython

jmespath is required for some of the tasks involving clustering. dnspython is only required if you're using a dig lookup, which you probably won't be if you skipped configuring networking. We pass --force to ansible-galaxy here so that roles are updated to their latest versions if already installed.

Now run the playbook:

ansible-playbook -i inventory site.yml -e '{"pve_reboot_on_kernel_update": true}'

The -e '{"pve_reboot_on_kernel_update": true}' should mainly be run the first time you do the Proxmox cluster setup, as it'll reboot the server to boot into a PVE kernel. Subsequent runs should leave this out, as you want to sequentially reboot servers after the cluster is running.
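If you prefer not to pass JSON on the command line, you could instead set the flag in group_vars/pve01 for the initial bootstrap run and remove it once the cluster is running; a sketch:

# group_vars/pve01 - only while bootstrapping; remove after the cluster is up
pve_reboot_on_kernel_update: true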

To specify a particular user, use -u root (replacing root), and if you need to provide passwords, use -k for SSH password and/or -K for sudo password. For example:

ansible-playbook -i inventory site.yml -K -u admin1

This will ask for a sudo password, then log in as the admin1 user (using public key auth; add -k for password auth) and run the playbook.

That's it! You should now have a fully deployed Proxmox cluster. You may want to create Ceph storage on it afterwards (see Ceph for more info) and other tasks possibly, but the hard part is mostly complete.

Example Playbook

This will configure hosts in the group pve01 as one cluster, as well as reboot the machines should the kernel have been updated. (Only recommended to set this flag during installation - reboots during operation should occur serially during a maintenance period.) It will also enable the IPMI watchdog.

- hosts: pve01
  become: True
  roles:
    - role: geerlingguy.ntp
      vars:
        ntp_manage_config: true
        ntp_servers:
          - clock.sjc.he.net
          - clock.fmt.he.net
          - clock.nyc.he.net
    - role: lae.proxmox
      vars:
        pve_group: pve01
        pve_cluster_enabled: yes
        pve_reboot_on_kernel_update: true
        pve_watchdog: ipmi

Role Variables

[variable]: [default] #[description/purpose]
pve_group: proxmox # host group that contains the Proxmox hosts to be clustered together
pve_repository_line: "deb http://download.proxmox.com/debian/pve bullseye pve-no-subscription" # apt-repository configuration - change to enterprise if needed (although TODO further configuration may be needed)
pve_remove_subscription_warning: true # patches the subscription warning messages in proxmox if you are using the community edition
pve_extra_packages: [] # Any extra packages you may want to install, e.g. ngrep
pve_run_system_upgrades: false # Let role perform system upgrades
pve_run_proxmox_upgrades: true # Let role perform Proxmox VE upgrades
pve_check_for_kernel_update: true # Runs a script on the host to check kernel versions
pve_reboot_on_kernel_update: false # If set to true, will automatically reboot the machine on kernel updates
pve_reboot_on_kernel_update_delay: 60 # Number of seconds to wait before and after a reboot process to proceed with next task in cluster mode
pve_remove_old_kernels: true # Currently removes kernels from the main Debian repository
pve_pcie_passthrough_enabled: false # Set this to true to enable PCIe passthrough.
pve_iommu_passthrough_mode: false # Set this to true to allow VMs to bypass the DMA translation. This might increase performance for IOMMU passthrough.
pve_iommu_unsafe_interrupts: false # Set this to true if your system doesn't support interrupt remapping.
pve_mediated_devices_enabled: false # Set this to true if your device supports GVT-g and you wish to enable split functionality.
pve_pcie_ovmf_enabled: false # Set this to true to enable GPU OVMF PCI passthrough.
pve_pci_device_ids: [] # List of PCI device IDs (see https://pve.proxmox.com/wiki/Pci_passthrough#GPU_Passthrough).
pve_vfio_blacklist_drivers: [] # List of device drivers to blacklist from the Proxmox host (see https://pve.proxmox.com/wiki/PCI(e)_Passthrough).
pve_pcie_ignore_msrs: false # Set this to true if passing through to a Windows machine to prevent VM crashes.
pve_pcie_report_msrs: true # Set this to false to prevent the dmesg system log from reporting MSR crash warnings.
pve_watchdog: none # Set this to "ipmi" if you want to configure a hardware watchdog. Proxmox uses a software watchdog (nmi_watchdog) by default.
pve_watchdog_ipmi_action: power_cycle # Can be one of "reset", "power_cycle", or "power_off".
pve_watchdog_ipmi_timeout: 10 # Number of seconds the watchdog should wait
pve_zfs_enabled: no # Specifies whether or not to install and configure ZFS packages
# pve_zfs_options: "" # modprobe parameters to pass to zfs module on boot/modprobe
# pve_zfs_zed_email: "" # Should be set to an email to receive ZFS notifications
pve_zfs_create_volumes: [] # List of ZFS Volumes to create (to use as PVE Storages). See section on Storage Management.
pve_ceph_enabled: false # Specifies whether or not to install and configure Ceph packages. See below for an example configuration.
pve_ceph_repository_line: "deb http://download.proxmox.com/debian/ceph-pacific bullseye main" # apt-repository configuration. Will be automatically set for 6.x and 7.x (Further information: https://pve.proxmox.com/wiki/Package_Repositories)
pve_ceph_network: "{{ (ansible_default_ipv4.network +'/'+ ansible_default_ipv4.netmask) | ansible.utils.ipaddr('net') }}" # Ceph public network
# pve_ceph_cluster_network: "" # Optional, if the ceph cluster network is different from the public network (see https://pve.proxmox.com/pve-docs/chapter-pveceph.html#pve_ceph_install_wizard)
pve_ceph_nodes: "{{ pve_group }}" # Host group containing all Ceph nodes
pve_ceph_mon_group: "{{ pve_group }}" # Host group containing all Ceph monitor hosts
pve_ceph_mgr_group: "{{ pve_ceph_mon_group }}" # Host group containing all Ceph manager hosts
pve_ceph_mds_group: "{{ pve_group }}" # Host group containing all Ceph metadata server hosts
pve_ceph_osds: [] # List of OSD disks
pve_ceph_pools: [] # List of pools to create
pve_ceph_fs: [] # List of CephFS filesystems to create
pve_ceph_crush_rules: [] # List of CRUSH rules to create
# pve_ssl_private_key: "" # Should be set to the contents of the private key to use for HTTPS
# pve_ssl_certificate: "" # Should be set to the contents of the certificate to use for HTTPS
pve_roles: [] # List of additional roles with specific privileges to manage in PVE. See section on User Management.
pve_groups: [] # List of group definitions to manage in PVE. See section on User Management.
pve_users: [] # List of user definitions to manage in PVE. See section on User Management.
pve_storages: [] # List of storages to manage in PVE. See section on Storage Management.
pve_datacenter_cfg: {} # Dictionary to configure the PVE datacenter.cfg config file.
pve_domains_cfg: [] # List of realms to use as authentication sources in the PVE domains.cfg config file.
pve_no_log: false # Set this to true in production to prevent leaking of storage credentials in run logs. (may be used in other tasks in the future)

To enable clustering with this role, configure the following variables appropriately:

pve_cluster_enabled: no # Set this to yes to configure hosts to be clustered together
pve_cluster_clustername: "{{ pve_group }}" # Should be set to the name of the PVE cluster
pve_manage_hosts_enabled: yes # Set this to no to NOT configure the hosts file (e.g. if you're using a VPN and the hosts file is already configured)

The following variables are used to provide networking information to corosync. These are known as ring0_addr/ring1_addr or link0_addr/link1_addr, depending on PVE version. They should be IPv4 or IPv6 addresses. You can also configure the priority of these interfaces to hint to corosync which interface should handle cluster traffic (lower numbers indicate higher priority). For more information, refer to the Cluster Manager chapter in the PVE Documentation.

# pve_cluster_addr0: "{{ defaults to the default interface ipv4 or ipv6 if detected }}"
# pve_cluster_addr1: "another interface's IP address or hostname"
# pve_cluster_addr0_priority: 255
# pve_cluster_addr1_priority: 0
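As a sketch, on hosts with a dedicated cluster interface you might point corosync at it like this (the interface names here are assumptions about your hardware):

pve_cluster_addr0: "{{ ansible_eth1.ipv4.address }}"
pve_cluster_addr0_priority: 255
pve_cluster_addr1: "{{ ansible_eth0.ipv4.address }}"
pve_cluster_addr1_priority: 0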

You can set options in the datacenter.cfg configuration file:

pve_datacenter_cfg:
  keyboard: en-us

You can also configure HA manager groups:

pve_cluster_ha_groups: [] # List of HA groups to create in PVE.

This example creates a group "lab_node01" for resources assigned to the lab-node01 host:

pve_cluster_ha_groups:
  - name: lab_node01
    comment: "My HA group"
    nodes: "lab-node01"
    nofailback: 0
    restricted: 0

All configuration options supported in the datacenter.cfg file are documented in the Proxmox manual datacenter.cfg section.
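For instance, a slightly fuller sketch (the option names follow the datacenter.cfg documentation; the values are only illustrative):

pve_datacenter_cfg:
  keyboard: en-us
  console: html5
  email_from: proxmox@example.com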

In order for live reloading of network interfaces to work via the PVE web UI, you need to install the ifupdown2 package. Note that this will remove ifupdown. You can specify this using the pve_extra_packages role variable.
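For example, using the variable mentioned above:

pve_extra_packages:
  - ifupdown2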

You can set realms / domains as authentication sources in the domains.cfg configuration file. If this file is not present, only the Linux PAM and Proxmox VE authentication server realms are available. Supported types are pam, pve, ad and ldap. It’s possible to automatically sync users and groups for LDAP-based realms (LDAP & Microsoft Active Directory) with sync: true. One realm should have the default: 1 property to mark it as the default:

pve_domains_cfg:
  - name: pam
    type: pam
    attributes:
      comment: Linux PAM standard authentication
  - name: pve
    type: pve
    attributes:
      comment: Proxmox VE authentication server
  - name: ad
    type: ad
    attributes:
      comment: Active Directory authentication
      domain: yourdomain.com
      server1: dc01.yourdomain.com
      default: 1
      secure: 1
      server2: dc02.yourdomain.com
  - name: ldap
    type: ldap
    sync: true
    attributes:
      comment: LDAP authentication
      base_dn: CN=Users,dc=yourdomain,dc=com
      bind_dn: "uid=svc-reader,CN=Users,dc=yourdomain,dc=com"
      bind_password: "{{ secret_ldap_svc_reader_password }}"
      server1: ldap1.yourdomain.com
      user_attr: uid
      secure: 1
      server2: ldap2.yourdomain.com

Dependencies

This role does not install NTP, so you should configure NTP yourself, e.g. with the geerlingguy.ntp role as shown in the example playbook.

When clustering is enabled, this role makes use of the json_query filter, which requires that the jmespath library be installed on your control host. You can either pip install jmespath or install it via your distribution's package manager, e.g. apt-get install python-jmespath.

User and ACL Management

You can use this role to manage users and groups within Proxmox VE (both in single server deployments and cluster deployments). Here are some examples.

pve_groups:
  - name: Admins
    comment: Administrators of this PVE cluster
  - name: api_users
  - name: test_users
pve_users:
  - name: root@pam
    email: [email protected]
  - name: lae@pam
    email: [email protected]
    firstname: Musee
    lastname: Ullah
    groups: [ "Admins" ]
  - name: pveapi@pve
    password: "Proxmox789"
    groups:
      - api_users
  - name: testapi@pve
    password: "Test456"
    enable: no
    groups:
      - api_users
      - test_users
  - name: tempuser@pam
    expire: 1514793600
    groups: [ "test_users" ]
    comment: "Temporary user set to expire on 2018年  1月  1日 月曜日 00:00:00 PST"
    email: [email protected]
    firstname: Test
    lastname: User

Refer to library/proxmox_user.py and library/proxmox_group.py for module documentation.

For managing roles and ACLs, a similar module is employed, but the main difference is that most of the parameters only accept lists (subject to change):

pve_roles:
  - name: Monitoring
    privileges:
      - "Sys.Modify"
      - "Sys.Audit"
      - "Datastore.Audit"
      - "VM.Monitor"
      - "VM.Audit"
pve_acls:
  - path: /
    roles: [ "Administrator" ]
    groups: [ "Admins" ]
  - path: /pools/testpool
    roles: [ "PVEAdmin" ]
    users:
      - pveapi@pve
    groups:
      - test_users

Refer to library/proxmox_role.py and library/proxmox_acl.py for module documentation.

Storage Management

You can use this role to manage storage within Proxmox VE (both in single server deployments and cluster deployments). For now, the only supported types are dir, rbd, nfs, cephfs, lvm, lvmthin, zfspool, btrfs, cifs and pbs. Here are some examples.

pve_storages:
  - name: dir1
    type: dir
    content: [ "images", "iso", "backup" ]
    path: /ploup
    disable: no
    maxfiles: 4
  - name: ceph1
    type: rbd
    content: [ "images", "rootdir" ]
    nodes: [ "lab-node01.local", "lab-node02.local" ]
    username: admin
    pool: rbd
    krbd: yes
    monhost:
      - 10.0.0.1
      - 10.0.0.2
      - 10.0.0.3
  - name: nfs1
    type: nfs
    content: [ "images", "iso" ]
    server: 192.168.122.2
    export: /data
  - name: lvm1
    type: lvm
    content: [ "images", "rootdir" ]
    vgname: vg1
  - name: lvmthin1
    type: lvmthin
    content: [ "images", "rootdir" ]
    vgname: vg2
    thinpool: data
  - name: cephfs1
    type: cephfs
    content: [ "snippets", "vztmpl", "iso" ]
    nodes: [ "lab-node01.local", "lab-node02.local" ]
    monhost:
      - 10.0.0.1
      - 10.0.0.2
      - 10.0.0.3
  - name: pbs1
    type: pbs
    content: [ "backup" ]
    server: 192.168.122.2
    username: user@pbs
    password: PBSPassword1
    datastore: main
    namespace: Top/something # Optional
  - name: zfs1
    type: zfspool
    content: [ "images", "rootdir" ]
    pool: rpool/data
    sparse: true
  - name: btrfs1
    type: btrfs
    content: [ "images", "rootdir" ]
    nodes: [ "lab-node01.local", "lab-node02.local" ]
    path: /mnt/proxmox_storage
    is_mountpoint: true
  - name: cifs1
    server: cifs-host.domain.tld
    type: cifs
    content: [ "snippets", "vztmpl", "iso" ]
    share: sharename
    subdir: /subdir
    username: user
    password: supersecurepass
    domain: addomain.tld

Refer to https://pve.proxmox.com/pve-docs/api-viewer/index.html for more information.

Currently the zfspool type can be used only for images and rootdir contents. If you want to store other content types on a ZFS volume, you need to specify them with type dir, path /<POOL>/<VOLUME> and add an entry in pve_zfs_create_volumes. This example adds an iso storage on a ZFS pool:

pve_zfs_create_volumes:
  - rpool/iso
pve_storages:
  - name: iso
    type: dir
    path: /rpool/iso
    content: [ "iso" ]

Refer to library/proxmox_storage.py for module documentation.

Ceph configuration

This section could use a little more love. If you are actively using this role to manage your PVE Ceph cluster, please feel free to flesh out this section more thoroughly and open a pull request! See issue #68.

PVE Ceph management with this role is experimental. While users have successfully used this role to deploy PVE Ceph, it is not fully tested in CI (due to a lack of usable block devices to use as OSDs in Travis CI). Please test your configuration in a non-production environment first, and report any issues you run into.

This role can configure the Ceph storage system on your Proxmox hosts. The following definitions show some of the configurations that are possible.

pve_ceph_enabled: true
pve_ceph_network: '172.10.0.0/24'
pve_ceph_cluster_network: '172.10.1.0/24'
pve_ceph_nodes: "ceph_nodes"
pve_ceph_osds:
  # OSD with everything on the same device
  - device: /dev/sdc
  # OSD with block.db/WAL on another device
  - device: /dev/sdd
    block.db: /dev/sdb1
  # encrypted OSD with everything on the same device
  - device: /dev/sdc
    encrypted: true
  # encrypted OSD with block.db/WAL on another device
  - device: /dev/sdd
    block.db: /dev/sdb1
    encrypted: true
# Crush rules for different storage classes
# By default 'type' is set to host, you can find valid types at
# (https://docs.ceph.com/en/latest/rados/operations/crush-map/)
# listed under 'TYPES AND BUCKETS'
pve_ceph_crush_rules:
  - name: replicated_rule
    type: osd # This is an example of how you can override a pre-existing rule
  - name: ssd
    class: ssd
    type: osd
    min-size: 2
    max-size: 8
  - name: hdd
    class: hdd
    type: host
# 2 Ceph pools for VM disks which will also be defined as Proxmox storages
# Using different CRUSH rules
pve_ceph_pools:
  - name: ssd
    pgs: 128
    rule: ssd
    application: rbd
    storage: true
# This Ceph pool uses custom size/replication values
  - name: hdd
    pgs: 32
    rule: hdd
    application: rbd
    storage: true
    size: 2
    min-size: 1
# This Ceph pool uses a custom autoscale mode: "off" | "on" | "warn" (default = "warn")
  - name: vm-storage
    pgs: 128
    rule: replicated_rule
    application: rbd
    autoscale_mode: "on"
    storage: true
pve_ceph_fs:
# A CephFS filesystem not defined as a Proxmox storage
  - name: backup
    pgs: 64
    rule: hdd
    storage: false
    mountpoint: /srv/proxmox/backup

pve_ceph_network by default uses the ansible.utils.ipaddr filter, which requires the netaddr library to be installed and usable by your Ansible controller.

pve_ceph_nodes by default uses pve_group; this parameter allows you to specify which nodes to install Ceph on (e.g. if you don't want to install Ceph on all your nodes).

pve_ceph_osds by default creates unencrypted Ceph volumes. To use encrypted volumes, set the encrypted parameter to true per drive.

PCIe Passthrough

This role can be configured to allow PCI device passthrough from the Proxmox host to VMs. This feature is not enabled by default since not all motherboards and CPUs support it. To enable passthrough, the host's CPU must support hardware virtualization (VT-d for Intel-based systems and AMD-Vi for AMD-based systems). Refer to the manuals of all components to determine whether this feature is supported. Naming conventions vary, but it is usually referred to as IOMMU, VT-d, or AMD-Vi.

By enabling this feature, dedicated devices (such as a GPU or USB devices) can be passed through to VMs. Along with dedicated devices, various integrated devices, such as Intel's or AMD's integrated GPUs, can also be passed through to VMs.

Some devices are able to take advantage of mediated usage. Mediated devices can be passed through to multiple VMs to share resources, while still remaining usable by the host system. Splitting of devices is not always supported and should be validated before being enabled to prevent errors. Refer to the manual of the device you want to pass through to determine whether it is capable of mediated usage (currently this role only supports GVT-g; SR-IOV is not supported and must be enabled manually after the role completes).

The following is an example configuration which enables PCIe passthrough:

pve_pcie_passthrough_enabled: true
pve_iommu_passthrough_mode: true
pve_iommu_unsafe_interrupts: false
pve_mediated_devices_enabled: false
pve_pcie_ovmf_enabled: false
pve_pci_device_ids:
  - id: "10de:1381"
  - id: "10de:0fbc"
pve_vfio_blacklist_drivers:
  - name: "radeon"
  - name: "nouveau"
  - name: "nvidia"
pve_pcie_ignore_msrs: false
pve_pcie_report_msrs: true

pve_pcie_passthrough_enabled is required to use any PCIe passthrough functionality. Without this enabled, all other PCIe related fields will be unused.

pve_iommu_passthrough_mode enables IOMMU passthrough mode, which might increase device performance. It allows VMs to bypass the default DMA translation that would normally be performed by the hypervisor and instead pass DMA requests directly to the hardware IOMMU.

pve_iommu_unsafe_interrupts is required to be enabled to allow PCI passthrough if your system doesn't support interrupt remapping. You can check whether your system supports interrupt remapping by using dmesg | grep 'remapping'. If you see one of the following lines:

  • "AMD-Vi: Interrupt remapping enabled"
  • "DMAR-IR: Enabled IRQ remapping in x2apic mode" ('x2apic' can be different on old CPUs, but should still work)

Then system interrupt remapping is supported and you do not need to enable unsafe interrupts. Be aware that by enabling this value your system can become unstable.

pve_mediated_devices_enabled enables GVT-g support for integrated devices such as Intel iGPUs. Not all devices support GVT-g, so it is recommended to check your specific device beforehand to ensure it is supported.

pve_pcie_ovmf_enabled enables GPU OVMF PCI passthrough. When using OVMF you should select 'OVMF' as the BIOS option for the VM instead of 'SeaBIOS' within Proxmox. This setting will try to opt-out devices from VGA arbitration if possible.

pve_pci_device_ids is a list of device and vendor IDs that you wish to pass through to VMs from the host. See the 'GPU Passthrough' section on the Proxmox wiki to find your specific device and vendor IDs. When setting this value, an 'id' is required for each new element in the list.

pve_vfio_blacklist_drivers is a list of drivers to be excluded/blacklisted from the host. This is required when passing through a PCI device to prevent the host from using the device before it can be assigned to a VM. When setting this value, it is required to specify a 'name' for each new element in the array.

pve_pcie_ignore_msrs prevents some Windows applications like GeForce Experience, Passmark Performance Test and SiSoftware Sandra from crashing the VM. This value is only required when passing PCI devices to Windows based systems.

pve_pcie_report_msrs can be used to enable or disable logging of MSR warning messages. If you see a lot of warning messages in your 'dmesg' system log, this value can be used to silence MSR warnings.

Developer Notes

When developing new features or fixing something in this role, you can test out your changes by using Vagrant (only libvirt is supported currently). The playbook can be found in tests/vagrant (so be sure to modify group variables as needed). Be sure to test any changes on both Debian 10 and 11 (update the Vagrantfile locally to use debian/buster64) before submitting a PR.

You can also specify an apt caching proxy (e.g. apt-cacher-ng, and it must run on port 3142) with the APT_CACHE_HOST environment variable to speed up package downloads if you have one running locally in your environment. The vagrant playbook will detect whether or not the caching proxy is available and only use it if it is accessible from your network, so you could just permanently set this variable in your development environment if you prefer.

For example, you could run the following to show verbose/easier to read output, use a caching proxy, and keep the VMs running if you run into an error (so that you can troubleshoot it and/or run vagrant provision after fixing):

APT_CACHE_HOST=10.71.71.10 ANSIBLE_STDOUT_CALLBACK=debug vagrant up --no-destroy-on-error

Contributors

Musee Ullah (@lae, [email protected]) - Main developer
Fabien Brachere (@Fbrachere) - Storage config support
Gaudenz Steinlin (@gaudenz) - Ceph support, etc
Richard Scott (@zenntrix) - Ceph support, PVE 7.x support, etc
Thoralf Rickert-Wendt (@trickert76) - PVE 6.x support, etc
Engin Dumlu (@roadrunner)
Jonas Meurer (@mejo-)
Ondrej Flidr (@SniperCZE)
niko2 (@niko2)
Christian Aublet (@caublet)
Gille Pietri (@gilou)
Michael Holasek (@mholasek)
Alexander Petermann (@lexxxel) - PVE 8.x support, etc
Bruno Travouillon (@btravouillon) - UX improvements
Tobias Negd (@wu3rstle) - Ceph support
PendaGTP (@PendaGTP) - Ceph support
John Marion (@jmariondev)
foerkede (@foerkede) - ZFS storage support
Guiffo Joel (@futuriste) - Pool configuration support
Adam Delo (@ol3d) - PCIe Passthrough Support

Full list of contributors


ansible-role-proxmox's Issues

Let's Encrypt toggle failing

I'm setting up a public Proxmox host now so I'm trying out the LE SSL feature, but I'm running into an error:

TASK [lae.proxmox : Copy PVE SSL certificate chain and key to /etc/ssl] redacted.host*************************
skipping: [redacted.host]

TASK [lae.proxmox : Install PVE SSL certificate chain and key] redacted.host**********************************
skipping: [redacted.host] => (item={u'dest': u'/etc/pve/local/pveproxy-ssl.key', u'src': u'/etc/ssl/pveproxy-ssl.key'}) 
skipping: [redacted.host] => (item={u'dest': u'/etc/pve/local/pveproxy-ssl.pem', u'src': u'/etc/ssl/pveproxy-ssl.pem'}) 

TASK [lae.proxmox : Install Proxmox Let's Encrypt post-hook script] redacted.host*****************************
changed: [redacted.host]

TASK [lae.proxmox : Request Let's Encrypt certificate for redacted.host] redacted.host************

TASK [jnv.debian-backports : add distribution-specific variables] redacted.host*******************************
skipping: [redacted.host]

TASK [jnv.debian-backports : add backports repository] redacted.host******************************************
skipping: [redacted.host]

TASK [systemli.letsencrypt : install Let's Encrypt Certbot client] redacted.host******************************
skipping: [redacted.host]

TASK [systemli.letsencrypt : install certbot plugin 'apache' on webservers] redacted.host*********************
skipping: [redacted.host]

TASK [systemli.letsencrypt : install certbot DNS challenge helper script] redacted.host***********************
skipping: [redacted.host]

TASK [systemli.letsencrypt : create directory /etc/letsencrypt/keys] redacted.host****************************
skipping: [redacted.host]

TASK [systemli.letsencrypt : install certbot DNS challenge nsupdate key] redacted.host************************
skipping: [redacted.host]

TASK [systemli.letsencrypt : install certbot DNS challenge nsupdate private key] redacted.host****************
skipping: [redacted.host]

TASK [systemli.letsencrypt : add system group 'letsencrypt'] redacted.host************************************
skipping: [redacted.host]

TASK [systemli.letsencrypt : adjust permissions for certificate directories] redacted.host********************
skipping: [redacted.host] => (item=/etc/letsencrypt/archive) 
skipping: [redacted.host] => (item=/etc/letsencrypt/live) 

TASK [systemli.letsencrypt : check if letsencrypt_account_email is set] redacted.host*************************
skipping: [redacted.host]

TASK [systemli.letsencrypt : check if a Let's Encrypt account exists] redacted.host***************************
skipping: [redacted.host]

TASK [systemli.letsencrypt : prepare optional test cert option] redacted.host*********************************
skipping: [redacted.host]

TASK [systemli.letsencrypt : create Let's Encrypt account] redacted.host**************************************
skipping: [redacted.host]

TASK [systemli.letsencrypt : prepare authenticator options for apache] redacted.host**************************
skipping: [redacted.host]

TASK [systemli.letsencrypt : prepare authenticator options for standalone] redacted.host**********************
ok: [redacted.host]

TASK [systemli.letsencrypt : prepare authenticator options for webroot] redacted.host*************************
skipping: [redacted.host]

TASK [systemli.letsencrypt : prepare cert name option] redacted.host******************************************
ok: [redacted.host]

TASK [systemli.letsencrypt : prepare optional test cert option] redacted.host*********************************
ok: [redacted.host]

TASK [systemli.letsencrypt : prepare post-hook options] redacted.host*****************************************
ok: [redacted.host]

TASK [systemli.letsencrypt : prepare post-hook options] redacted.host*****************************************
skipping: [redacted.host]

TASK [systemli.letsencrypt : register Let's Encrypt certificate with HTTP challenge] redacted.host************
fatal: [redacted.host]: FAILED! => {"msg": "The conditional check 'not \"no action taken\" in letsencrypt_reg_certbot_http.stdout' failed. The error was: error while evaluating conditional (not \"no action taken\" in letsencrypt_reg_certbot_http.stdout): Unable to look up a name or access an attribute in template string ({% if not \"no action taken\" in letsencrypt_reg_certbot_http.stdout %} True {% else %} False {% endif %}).\nMake sure your variable name does not contain invalid characters like '-': argument of type 'StrictUndefined' is not iterable"}

RUNNING HANDLER [lae.proxmox : restart watchdog-mux] redacted.host********************************************

PLAY RECAP redacted.host**************************************************************************************
redacted.host  : ok=55   changed=24   unreachable=0    failed=1   

@mejo- have you run into this? I'm gonna look into it a bit further and see if I need to update documentation or something.

Subscription patch task failing on latest update

Looks like something changed with the PVE UI to cause this task to fail:

TASK [ansible-role-proxmox : Remove subscription check wrapper function in web UI] ***
task path: /home/travis/build/lae/ansible-role-proxmox/tasks/main.yml:90

Friday 23 February 2018  23:03:57 +0000 (0:00:06.231)       0:00:34.309 ******* 
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: PatchError: 5 out of 5 hunks FAILED
fatal: [test03.lxc]: FAILED! => {
    "changed": false
}
MSG:
5 out of 5 hunks FAILED

https://travis-ci.org/lae/ansible-role-proxmox/jobs/345467613

Add pve_exporter deployment steps for Prometheus monitoring

Source: https://github.com/znerol/prometheus-pve-exporter

We should be able to:

  • create a new local unix user (pve-exp?) to deploy/run under, with random password (needs to be stored on the server somewhere so we can refer to it later, in order to maintain idempotence)
  • create a python3 virtualenv
  • deploy pve-exporter into that virtualenv
  • add PVEAuditor role for pve-exp@pam in PVE auth - probably merge with the associated role variable(s)
  • configure pve-exporter to use a pve-exp@pam (?) API user, referencing the password we stored earlier
  • create systemd unit file for running pve-exporter, reload on config change
  • add note about opening up port 9221 in firewall if using firewall

refer to https://github.com/znerol/prometheus-pve-exporter/wiki/PVE-Exporter-on-Proxmox-VE-Node-using-in-a-virtualenv
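For the PVE-side authorization piece, a rough sketch using this role's existing variables might look like the following (the user name and comment are placeholders taken from the list above; the pve-exp UNIX user would still need to exist for the pam realm):

pve_users:
  - name: pve-exp@pam
    comment: Prometheus PVE exporter service account
pve_acls:
  - path: /
    roles: [ "PVEAuditor" ]
    users: [ "pve-exp@pam" ]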

[Proxmox 5.2] Error when configuring cluster

Hi,

I'm testing your role for provisioning Proxmox nodes in a cluster and I believe I ran into a bug. I am testing this by installing a Debian 9.5 on Hyper-V and then running the role.

The task [lae.proxmox : Lookup cluster information] fails with the following error:

fatal: [node-test.home]: FAILED! => {
    "changed": false,
    "module_stderr": "Shared connection to 172.26.245.157 closed.\r\n",
    "module_stdout": "Traceback (most recent call last):\r\n  File \"/tmp/ansible_PYjZHK/ansible_module_proxmox_query.py\", line 70, in <module>\r\n    main()\r\n  File \"/tmp/ansible_PYjZHK/ansible_module_proxmox_query.py\", line 60, in main\r\n
result['response'] = pvesh.get(module.params['query'])\r\n  File \"/tmp/ansible_PYjZHK/ansible_modlib.zip/ansible/module_utils/pvesh.py\", line 69, in get\r\n  File \"/tmp/ansible_PYjZHK/ansible_modlib.zip/ansible/module_utils/pvesh.py\", line 32, in run_command\r\nIndexError: list index out of range\r\n",
    "msg": "MODULE FAILURE",
    "rc": 1
}

The error is in pvesh.py line 32:

    if stderr[0] == "200 OK":

Running the pvesh query manually on the node:

pvesh get cluster/status --output-format json

gives:

[{"id":"node/node-test","ip":"172.26.245.157","level":"","local":1,"name":"node-test","nodeid":0,"online":1,"type":"node"}]

and no stderr output, while pvesh.py code expects a 200 OK.

Support for LXC containers and VMs?

Hi @lae!
First off, congrats on ansible-role-proxmox. I have been educating myself recently on Ansible Proxmox options, and your repository is the most up-to-date and relevant.

Looking at your implementation, your use of pvesh is the most natural and powerful. The three official Ansible modules (proxmox, proxmox_kvm and proxmox_template) all rely on proxmoxer, which is a simple wrapper over the Proxmox VE API with support for different access modes. Since pvesh is itself just an interface over the API, that seems like the right technical choice for an Ansible role.

pvesh/Proxmox API can be used for both LXC and VM creation. I feel that your repository is ripe for tackling support for them. What's your opinion on the topic?

If this is something that you are open to, I could potentially contribute to that effort. Let me know!

PVE firewall

As far as I know, there is no real Ansible role to manage the PVE firewall. Is it a good idea to integrate it here? I would start with it today.

Ceph OSD provisioning failure

@lae, the problem I am seeing when installing on fresh bare metal is:

235: TASK [ansible-role-proxmox : Create Ceph OSDs] *********************************
236: failed: [proxmox-test.corp.####.com] (item={u'device': u'/dev/sdb'}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["pveceph", "osd", "create", "/dev/sdb"], "delta": "0:00:01.461373", "end": "2019-11-18 20:04:43.436298", "item": {"device": "/dev/sdb"}, "msg": "non-zero return code", "rc": 25, "start": "2019-11-18 20:04:41.974925", "stderr": "device '/dev/sdb' is already in use", "stderr_lines": ["device '/dev/sdb' is already in use"], "stdout": "", "stdout_lines": []}
237: failed: [proxmox-test.corp.####.com] (item={u'device': u'/dev/sdc'}) => {"ansible_loop_var": "item", "changed": true, "cmd": ["pveceph", "osd", "create", "/dev/sdc"], "delta": "0:00:00.968795", "end": "2019-11-18 20:04:44.735755", "item": {"device": "/dev/sdc"}, "msg": "non-zero return code", "rc": 25, "start": "2019-11-18 20:04:43.766960", "stderr": "device '/dev/sdc' is already in use", "stderr_lines": ["device '/dev/sdc' is already in use"], "stdout": "", "stdout_lines": []}

I have the following ansible parameters set:

pve_ceph_crush_rules:
  - name: hdd
pve_ceph_enabled: true
pve_ceph_mds_group: all
pve_ceph_pools:
  - name: vm-storage
    pgs: 128
    application: rbd
    storage: true
  - name: k8-storage
    pgs: 64
    application: rbd
pve_storages:
  - name: vm-storage
    type: rbd
    content:
      - images
      - rootdir
    pool: vm-storage
    username: admin
    monhost:
      - proxmox-test.corp.####.com
pve_ceph_osds:
  - device: "/dev/sdb"
  - device: "/dev/sdc"

Any ideas what i am missing?

Originally posted by @zenntrix in #73 (comment)

Support in-PVE Let's Encrypt setup

PVE 5.2 introduced ACME support within PVE.

https://pve.proxmox.com/wiki/Certificate_Management

An ACME management module can be created to manage ACME registration and certificate.

Process I used to generate cert:

~# pvesh create /cluster/acme/account -contact $email -tos_url https://letsencrypt.org/documents/LE-SA-v1.2-November-15-2017.pdf 
Generating ACME account key..
Registering ACME account..
Registration successful, account URL: 'https://acme-v02.api.letsencrypt.org/acme/acct/X'
200 OK
~# pvesh set /nodes/maika/config -acme domains=$fqdn
200 OK
~# pvesh create /nodes/maika/certificates/acme/certificate -force
Loading ACME account details
Placing ACME order
Order URL: https://acme-v02.api.letsencrypt.org/acme/order/X/X

Getting authorization details from 'https://acme-v02.api.letsencrypt.org/acme/authz/XXXXXXXXXXXXX'
... pending!
Setting up webserver
Triggering validation
Sleeping for 5 seconds
Status is 'valid'!

All domains validated!

Creating CSR
Finalizing order
Checking order status
valid!

Downloading certificate
Setting pveproxy certificate and key
Restarting pveproxy
200 OK
UPID:XXX

Note that failure returns 200 OK:

Triggering validation
Sleeping for 5 seconds
Status is still 'pending', trying again in 30 seconds
validating challenge 'https://acme-v02.api.letsencrypt.org/acme/authz/XXXXXXXX' failed
200 OK

Documentation should note that port 80 must be accessible from the Internet (above happened to me because firewall dropped connections) and nothing else should be running on port 80.

Remove Subscription Check isn't working

I'm using the develop branch:

- src: https://github.com/lae/ansible-role-proxmox.git
  version: develop
  name: lae.proxmox

and run ansible-galaxy install --force -r roles/requirements.yml, and I'm still getting the message at TASK [lae.proxmox : Remove subscription check wrapper function in web UI]

The full traceback is:
Traceback (most recent call last):
  File "/tmp/ansible_patch_payload_e9ypt3al/ansible_patch_payload.zip/ansible/modules/files/patch.py", line 208, in main
  File "/tmp/ansible_patch_payload_e9ypt3al/ansible_patch_payload.zip/ansible/modules/files/patch.py", line 158, in apply_patch
PatchError: 1 out of 1 hunk FAILED
Reversed (or previously applied) patch detected!  Skipping patch.
3 out of 3 hunks ignored

fatal: [saturn]: FAILED! => {
    "changed": false,
    "invocation": {
        "module_args": {
            "backup": true,
            "basedir": "/",
            "binary": false,
            "dest": null,
            "remote_src": false,
            "src": "/root/.ansible/tmp/ansible-tmp-1596549500.348261-10773-167626737788042/00_remove_checked_command_buster.patch",
            "state": "present",
            "strip": 1
        }
    },
    "msg": "1 out of 1 hunk FAILED\nReversed (or previously applied) patch detected!  Skipping patch.\n3 out of 3 hunks ignored\n"
}

It is a Proxmox 6.2-10

Lack of validation/properly escaping args to pvesh with proxmoxer

The dependency on proxmoxer appears to have introduced the necessity of escaping spaces (in #18) so that pvesh can parse flags correctly. Currently, library/proxmox_user.py and library/proxmox_group.py only escape spaces, but there are possibly other characters that may need to be escaped that otherwise would not be parsed correctly by Bash. This should either be fixed upstream eventually or within this role.

Currently, this role installs a modified version of proxmoxer: https://github.com/lae/proxmoxer/tree/ansible-friendly

Another option may possibly be to write our own Python library for communicating with Proxmox on the local host with pvesh.

TypeError: a bytes-like object is required, not 'str'

I don't want to install Python 2 on my new hosts; its support ends this year. So I tried your role on a Python 3-only host and got the following error:

TASK [lae.proxmox : Check for kernel update] *****************************************************************************************************************************
fatal: [ac-sh0001]: FAILED! => 
{
  "changed": false, 
...
    File \"/tmp/ansible_collect_kernel_info_payload_96ul2c8o/__main__.py\", line 71, in <module>\r\n
    File \"/tmp/ansible_collect_kernel_info_payload_96ul2c8o/__main__.py\", line 67, in main\r\n
        TypeError: a bytes-like object is required, not 'str'\r\n",
  "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
  "rc": 1
}

The error is in library/collect_kernel_info.py line 67, where two strings are split and partly compared. I'm not a Python developer (yet), but the strings should probably be converted. I'm not sure if this is a good idea, though. If you give me an idea of what to do, I'll create a fix.

Integrate Wireguard Support?

In my production setup, I'm using WireGuard to set up a private network across all Proxmox hosts (my hosts are all public root servers at several different provider sites). Even though WireGuard is not yet officially supported (it's coming with the next major release), it is already possible to use it via the unstable tree. WireGuard is nice because it is really fast (compared to other VPN solutions) and the setup is really easy.

The setup is easy. There must be a WireGuard kernel module, a configuration file with all other known peers, and a service listening on a given UDP port. All or some traffic can then be routed via that port to other peers/hosts.

The issue is that I have a bootstrapping problem together with the lae.proxmox role, and I'm not sure how to resolve it without changing this role or making manual changes on the hosts.

First, to join an existing cluster the new host needs access to the WireGuard network. Currently this works by enabling the module in the Debian kernel. When I then run this role on that new empty host, the Proxmox kernel is installed and the system reboots. I can also add the pve-headers package to pve_extra_packages so the WireGuard module can be built. But before I can enable the module again with modprobe on the newly booted kernel, I need to run a "dkms autoinstall" on Proxmox to generate the module in the correct place under /lib/modules/... Only then will the kernel find the module and start the tunnel, and the role can use it to do whatever is necessary (join the cluster).

So my problem is the different behaviour after switching the kernel.

Is there a good way to integrate that? Currently I stop the play before the reboot, reboot manually, run the "dkms autoinstall" and start the role again.

I'd like to integrate the WireGuard support behind a flag. If the flag is set, the role would compare the kernels before and after the reboot, check whether it is a linux->pve switch, and run the tasks to reconfigure WireGuard. I could also integrate the complete setup of WireGuard (a single template and some key generation).

Getting error when creating a new cluster

Hello!

Thanks for all your hard work on this. I've been setting up my cluster and it runs fine all the way until it has to add a new node to a cluster. I see that the cluster is created and the playbook passes all the tests and reports that things were changed. However, the cluster never gets any new nodes and it actually throws an error.

I get this error:
TASK ERROR: format error at /usr/share/perl5/PVE/JSONSchema.pm line 1866 type: Invalid schema definition. optional: Invalid schema definition. description: Invalid schema definition. format: property is missing and it is not optional

Let me know what logs or whatever you need to debug.

Rejoin after Reboot

Hi,

We have a problem where a node does not rejoin the cluster after a reboot.

We fixed it with a simple hack in the /lib/systemd/system/corosync.service file. We suspect the network is not up fast enough, so we inject a simple sleep via the ExecStartPre= command, see below.

Full contents of the corosync.service systemd unit file:

root@node1:~# cat /lib/systemd/system/corosync.service
[Unit]
Description=Corosync Cluster Engine
Documentation=man:corosync man:corosync.conf man:corosync_overview
ConditionKernelCommandLine=!nocluster
ConditionPathExists=/etc/corosync/corosync.conf
Requires=network-online.target
After=network-online.target

[Service]
EnvironmentFile=-/etc/default/corosync
ExecStart=/usr/sbin/corosync -f $COROSYNC_OPTIONS
Type=notify
ExecStartPre=/bin/sleep 5

# The following config is for corosync with enabled watchdog service.
#
#  When corosync watchdog service is being enabled and using with
#  pacemaker.service, and if you want to exert the watchdog when a
#  corosync process is terminated abnormally,
#  uncomment the line of the following Restart= and RestartSec=.
#Restart=on-failure
#  Specify a period longer than soft_margin as RestartSec.
#RestartSec=70
#  rewrite according to environment.
#ExecStartPre=/sbin/modprobe softdog

[Install]
WantedBy=multi-user.target

Our install_pve.yml looks like this:

- name: 00_install_proxmox
  hosts: all
  roles:
    - {
        role: geerlingguy.ntp,
        ntp_manage_config: true,
        ntp_servers: [
          ourntp.local
        ],
        ntp_timezone: Europe/Berlin
      }
    - {
        role: lae.proxmox,
        pve_group: clustername,
        pve_reboot_on_kernel_update: true,
      }

  tasks:
    - name: fix reboot rejoin bug
      lineinfile:
        path: /lib/systemd/system/corosync.service
        line: "ExecStartPre=/bin/sleep 5"
        insertafter: "Type=notify"

Is there another solution for that issue?

kind regards
matthias

Define all hosts and pvelocalhost in /etc/hosts

From pve-user:

Each node should have all other nodes listed in the /etc/hosts
files. Currently, it only adds the local hostname to it.
Additionally, the local node should add "pvelocalhost" to the
/etc/hosts entry. An example for a host of "10.0.0.2"

10.0.0.1 mox1.example.com mox1
10.0.0.2 mox2.example.com mox2 pvelocalhost
10.0.0.3 mox3.example.com mox3
...

Ceph on Buster

I've two completely fresh PVE nodes joined to a cluster on Debian Buster. PVE is running 6.0-6 and everything is fine.

When I set pve_ceph_enabled: true and run a playbook with the role and --diff, I see this:

TASK [lae.proxmox : Configure Ceph package source] ***************************************************************************************************************************************************************************************************************************
--- before: /dev/null
+++ after: /etc/apt/sources.list.d/ceph.list.list
@@ -0,0 +1 @@
+deb http://download.proxmox.com/debian/ceph-luminous stretch main

changed: [server1]
--- before: /dev/null
+++ after: /etc/apt/sources.list.d/ceph.list.list
@@ -0,0 +1 @@
+deb http://download.proxmox.com/debian/ceph-luminous stretch main

changed: [server2]

TASK [lae.proxmox : Install Ceph packages] ***********************************************************************************************************************************************************************************************************************************
fatal: [server1]: FAILED! => {"cache_update_time": 1566886284, "cache_updated": false, "changed": false, "msg": "'/usr/bin/apt-get -y -o \"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\"      install 'ceph' 'ceph-mds' -o APT::Install-Recommends=no' failed: E: Unable to correct problems, you have held broken packages.\n", "rc": 100, "stderr": "E: Unable to correct problems, you have held broken packages.\n", "stderr_lines": ["E: Unable to correct problems, you have held broken packages."], "stdout": "Reading package lists...\nBuilding dependency tree...\nReading state information...\nSome packages could not be installed. This may mean that you have\nrequested an impossible situation or if you are using the unstable\ndistribution that some required packages have not yet been created\nor been moved out of Incoming.\nThe following information may help to resolve the situation:\n\nThe following packages have unmet dependencies:\n ceph : Depends: ceph-mgr (= 12.2.12-pve1) but it is not going to be installed\n        Depends: ceph-mon (= 12.2.12-pve1) but it is not going to be installed\n        Depends: ceph-osd (= 12.2.12-pve1) but it is not going to be installed\n ceph-mds : Depends: ceph-base (= 12.2.12-pve1) but it is not going to be installed\n", "stdout_lines": ["Reading package lists...", "Building dependency tree...", "Reading state information...", "Some packages could not be installed. This may mean that you have", "requested an impossible situation or if you are using the unstable", "distribution that some required packages have not yet been created", "or been moved out of Incoming.", "The following information may help to resolve the situation:", "", "The following packages have unmet dependencies:", " ceph : Depends: ceph-mgr (= 12.2.12-pve1) but it is not going to be installed", "        Depends: ceph-mon (= 12.2.12-pve1) but it is not going to be installed", "        Depends: ceph-osd (= 12.2.12-pve1) but it is not going to be installed", " ceph-mds : Depends: ceph-base (= 12.2.12-pve1) but it is not going to be installed"]}
fatal: [server2]: FAILED! => {"cache_update_time": 1566886285, "cache_updated": false, "changed": false, "msg": "'/usr/bin/apt-get -y -o \"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\"      install 'ceph' 'ceph-mds' -o APT::Install-Recommends=no' failed: E: Unable to correct problems, you have held broken packages.\n", "rc": 100, "stderr": "E: Unable to correct problems, you have held broken packages.\n", "stderr_lines": ["E: Unable to correct problems, you have held broken packages."], "stdout": "Reading package lists...\nBuilding dependency tree...\nReading state information...\nSome packages could not be installed. This may mean that you have\nrequested an impossible situation or if you are using the unstable\ndistribution that some required packages have not yet been created\nor been moved out of Incoming.\nThe following information may help to resolve the situation:\n\nThe following packages have unmet dependencies:\n ceph : Depends: ceph-mgr (= 12.2.12-pve1) but it is not going to be installed\n        Depends: ceph-mon (= 12.2.12-pve1) but it is not going to be installed\n        Depends: ceph-osd (= 12.2.12-pve1) but it is not going to be installed\n ceph-mds : Depends: ceph-base (= 12.2.12-pve1) but it is not going to be installed\n", "stdout_lines": ["Reading package lists...", "Building dependency tree...", "Reading state information...", "Some packages could not be installed. This may mean that you have", "requested an impossible situation or if you are using the unstable", "distribution that some required packages have not yet been created", "or been moved out of Incoming.", "The following information may help to resolve the situation:", "", "The following packages have unmet dependencies:", " ceph : Depends: ceph-mgr (= 12.2.12-pve1) but it is not going to be installed", "        Depends: ceph-mon (= 12.2.12-pve1) but it is not going to be installed", "        Depends: ceph-osd (= 12.2.12-pve1) but it is not going to be installed", " ceph-mds : Depends: ceph-base (= 12.2.12-pve1) but it is not going to be installed"]}

I don't understand why it chooses luminous instead of nautilus. The vars based on debian-buster should select nautilus, and running ansible -m setup server1.. returns buster.

Then I added pve_ceph_repository_line: "deb http://download.proxmox.com/debian/ceph-nautilus buster main" to my playbook group vars, but that did not help.

Then I added a task before including the role that creates the ceph.list file in /etc/apt/sources.list.d with nautilus; lae.proxmox then still adds luminous. OK, maybe not important. The correct packages are used and Ceph installs, but then I get:


TASK [lae.proxmox : Install custom Ceph systemd service] *********************************************************************************************************************************************************************************************************************
fatal: [server1]: FAILED! => {"changed": false, "msg": "Source /usr/share/doc/pve-manager/examples/ceph.service not found"}
fatal: [server2]: FAILED! => {"changed": false, "msg": "Source /usr/share/doc/pve-manager/examples/ceph.service not found"}

The file is really not available. I think the install process with pveceph install is different in Buster compared to Stretch.

Additionally, the task "Configure Ceph package source" should use filename: ceph instead of filename: ceph.list, because the .list extension is appended automatically by apt_repository.
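
For illustration, the task would presumably end up along these lines (repository line shown here for Buster/Nautilus as discussed above):

- name: Configure Ceph package source
  apt_repository:
    repo: deb http://download.proxmox.com/debian/ceph-nautilus buster main
    filename: ceph   # apt_repository appends the .list extension itself
    state: present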

Storage configuration

@lae, are there any plans for adding Proxmox storage configuration support?

I have some experience with Ansible, I wouldn't mind helping out.

"list index out of range" error in proxmox_group module

Hello,

I tried to install a stand-alone Proxmox node using your role (v1.4.2 via ansible-galaxy) and ran into an error in group management.

TASK [lae.proxmox : Configure Proxmox groups] ******************************************************************************************
failed: [host.local] (item={u'comment': u'Operations Team', u'name': u'ops'}) => 
{"ansible_loop_var": "item", "changed": false,
 "item": {"comment": "Operations Team", "name": "ops"},
 "module_stderr": "Shared connection to host.local closed.\r\n", 
 "module_stdout": "Traceback (most recent call last):\r\n
  File \"/root/.ansible/tmp/ansible-tmp-1557841838.09-250907062164291/AnsiballZ_proxmox_group.py\", line 114, in <module>\r\n
    _ansiballz_main()\r\n
  File \"/root/.ansible/tmp/ansible-tmp-1557841838.09-250907062164291/AnsiballZ_proxmox_group.py\", line 106, in _ansiballz_main\r\n
    invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\r\n
  File \"/root/.ansible/tmp/ansible-tmp-1557841838.09-250907062164291/AnsiballZ_proxmox_group.py\", line 49, in invoke_module\r\n
    imp.load_module('__main__', mod, module, MOD_DESC)\r\n
  File \"/tmp/ansible_proxmox_group_payload_dXdbIL/__main__.py\", line 175, in <module>\r\n
  File \"/tmp/ansible_proxmox_group_payload_dXdbIL/__main__.py\", line 154, in main\r\n
  File \"/tmp/ansible_proxmox_group_payload_dXdbIL/__main__.py\", line 87, in create_group\r\n
  File \"/tmp/ansible_proxmox_group_payload_dXdbIL/ansible_proxmox_group_payload.zip/ansible/module_utils/pvesh.py\", line 86, in create\r\n
  File \"/tmp/ansible_proxmox_group_payload_dXdbIL/ansible_proxmox_group_payload.zip/ansible/module_utils/pvesh.py\", line 32, in run_command\r\nIndexError: list index out of range\r\n",
 "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
 "rc": 1}

The Ansible playbook role configuration was:

    - role: lae.proxmox
      pve_group: proxmox
      pve_cluster_enabled: false
      pve_reboot_on_kernel_update: true
      pve_ssl_letsencrypt: true       # needs role systemli.letsencrypt
      pve_groups:
        - name: ops
          comment: Operations Team

I tried to install the Debian packages for Proxmox 5.4-1 (pve-no-subscription) on Debian 9.9 with the latest updates.

Another thing I noticed was that the role was hanging at the Proxmox deb package installation step, at "/usr/bin/perl /usr/sbin/needrestart" - the installation did not return and seemed to wait for user interaction.

[Proxmox 5.4] storage type "Snippets" missing

Using the proxmox_storage module and the pve_storages variable, I am able to create new storages.
The default storage added by Proxmox 5.4 (named 'local') has one additional content type, "Snippets", available besides "ISO image", "Container" and so on.

Content types listed in the WebUI for the auto-created storage "local":

Content: Disk image, ISO image, Container, Snippets, Container template

For my test I added all types listed inside the proxmox_storage module as content, but "Snippets" is still missing then.

content:
  required: true
  aliases: [ "storagecontent" ]
  type: list
  choices: [ "images", "rootdir", "vztmpl", "backup", "iso" ]

The ansible playbook has the following variable:

      pve_storages:
        - name: localdir
          type: dir
          content: [ "images", "rootdir", "vztmpl", "backup", "iso" ]
          path: /home/proxmox
          maxfiles: 4

This creates the following list, as seen in the WebUI for storage "localdir":

Content: VZDump backup file, Disk image, ISO image, Container, Container template
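
Presumably the fix is simply to extend the module's choices for content, along these lines:

content:
  required: true
  aliases: [ "storagecontent" ]
  type: list
  choices: [ "images", "rootdir", "vztmpl", "backup", "iso", "snippets" ]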

Terraform module interest?

I've started using Terraform at my part time job and have been thinking, maybe providing a Terraform module for bringing up a PVE cluster with this role might be of interest? Would anyone use it?

I think Packet.net might be the only practical provider, maybe Vultr (iirc they have dedicated machines, but their provider is not in the registry yet (see vultr/terraform-provider-vultr#102)).

Also, in order for this to work on non-multicast providers, configuring unicast in corosync will probably become a necessity, and last time I tried to do that in Ansible it wasn't pretty (so I probably won't do this if there's no interest).

/etc/hosts conflict causes pve-cluster fail

On some systems newly provisioned with preseed, it turns out an entry is added to /etc/hosts pointing the current machine's hostname to 127.0.1.1, and it looks like this causes pmxcfs to fail during package installation:

Mar 24 18:28:59 labs-pve03-f92f7621 systemd[1]: Starting The Proxmox VE cluster filesystem...
Mar 24 18:28:59 labs-pve03-f92f7621 pmxcfs[16589]: [main] crit: Unable to get local IP address
Mar 24 18:28:59 labs-pve03-f92f7621 pmxcfs[16589]: [main] crit: Unable to get local IP address
Mar 24 18:28:59 labs-pve03-f92f7621 systemd[1]: pve-cluster.service: control process exited, code=exited status=255
Mar 24 18:28:59 labs-pve03-f92f7621 systemd[1]: Failed to start The Proxmox VE cluster filesystem.
Mar 24 18:28:59 labs-pve03-f92f7621 systemd[1]: Unit pve-cluster.service entered failed state.

The role should detect stray /etc/hosts lines that reference the current host, but it also needs to be aware of the block it manages so that it can stay idempotent.
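
As an interim workaround, a pre-task along these lines (a sketch, not part of the role) removes the preseed-added entry before the role runs:

- name: Remove the 127.0.1.1 hostname entry added by preseed
  lineinfile:
    path: /etc/hosts
    regexp: '^127\.0\.1\.1\s+{{ ansible_hostname }}'
    state: absent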

Downgrade "Remove Subscription Check Wrapper" from Fatal to Warning

When running Ansible on 2 new nodes in a cluster based on PVE 6.1-3, I am faced with the following:

TASK [lae.proxmox : Remove subscription check wrapper function in web UI] ************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: 1 out of 1 hunk ignored
fatal: [proxmox-test.corp.zenntrix.com]: FAILED! => {"changed": false, "msg": "Reversed (or previously applied) patch detected!  Skipping patch.\n1 out of 1 hunk ignored\n"}
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: 1 out of 1 hunk ignored
fatal: [test-box.corp.zenntrix.com]: FAILED! => {"changed": false, "msg": "Reversed (or previously applied) patch detected!  Skipping patch.\n1 out of 1 hunk ignored\n"}

Feels like this should be a warning and not a fatal error.
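
One possible approach, sketched with an assumed patch file name rather than the role's actual one, would be to tolerate an already-applied patch:

- name: Remove subscription check wrapper function in web UI
  patch:
    src: remove_subscription_check.patch   # assumed name, for illustration only
    basedir: /
    strip: 1
  register: _subscription_patch
  failed_when:
    - _subscription_patch is failed
    - "'previously applied' not in (_subscription_patch.msg | default(''))"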

"list index out of range" in proxmox_user module

Same problem as in #53 but for creating users...

TASK [lae.proxmox : Configure Proxmox user accounts] ***********************************************************************************
failed: [host.local] (item={u'lastname': u'Name', u'name': u'user@pam', u'firstname': u'User', u'email': u'[email protected]'}) =>
{"ansible_loop_var": "item",
 "changed": false,
 "item": {"email": "[email protected]", "firstname": "User", "lastname": "Name", "name": "user@pam"},
 "module_stderr": "Shared connection to host.local closed.\r\n",
 "module_stdout": "Traceback (most recent call last):\r\n
  File \"/root/.ansible/tmp/ansible-tmp-1557844139.76-246187583778071/AnsiballZ_proxmox_user.py\", line 114, in <module>\r\n
    _ansiballz_main()\r\n
  File \"/root/.ansible/tmp/ansible-tmp-1557844139.76-246187583778071/AnsiballZ_proxmox_user.py\", line 106, in _ansiballz_main\r\n
    invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\r\n
  File \"/root/.ansible/tmp/ansible-tmp-1557844139.76-246187583778071/AnsiballZ_proxmox_user.py\", line 49, in invoke_module\r\n
    imp.load_module('__main__', mod, module, MOD_DESC)\r\n
  File \"/tmp/ansible_proxmox_user_payload_duPsu5/__main__.py\", line 300, in <module>\r\n
  File \"/tmp/ansible_proxmox_user_payload_duPsu5/__main__.py\", line 279, in main\r\n
  File \"/tmp/ansible_proxmox_user_payload_duPsu5/__main__.py\", line 192, in create_user\r\n
  File \"/tmp/ansible_proxmox_user_payload_duPsu5/__main__.py\", line 149, in check_groups_exist\r\n
  File \"/tmp/ansible_proxmox_user_payload_duPsu5/ansible_proxmox_user_payload.zip/ansible/module_utils/pvesh.py\", line 69, in get\r\n
  File \"/tmp/ansible_proxmox_user_payload_duPsu5/ansible_proxmox_user_payload.zip/ansible/module_utils/pvesh.py\", line 32, in run_command\r\n
IndexError: list index out of range\r\n",
 "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
 "rc": 1}

Deprecated warnings in ansible 2.8 with v1.5

Hi - I love this project - it makes my life much easier. However, there are some warnings that are minor for now but will become real problems later. Ansible 2.8 changes how bare variables are evaluated in conditionals, so this should be easy to update to avoid problems with newer Ansible releases.

TASK [lae.proxmox : Ensure this host is in the group specified] **********************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_cluster_enabled as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future.
Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting
deprecation_warnings=False in ansible.cfg.
skipping: [ac-sh002]

...

TASK [lae.proxmox : Perform system upgrades] *****************************************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_run_system_upgrades as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the
future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting
deprecation_warnings=False in ansible.cfg.
skipping: [ac-sh002]

...

TASK [lae.proxmox : Stage ZFS packages if ZFS is enabled] ****************************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_zfs_enabled as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future. Also
see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False
in ansible.cfg.
skipping: [ac-sh002]

...

TASK [lae.proxmox : Remove subscription check wrapper function in web UI] ************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_remove_subscription_warning as a bare variable, this behaviour will go away and you might need to add |bool to the expression in
the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting
deprecation_warnings=False in ansible.cfg.
ok: [ac-sh002]

TASK [lae.proxmox : Check for kernel update] *****************************************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_reboot_on_kernel_update as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the
future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting
deprecation_warnings=False in ansible.cfg.
ok: [ac-sh002]

TASK [lae.proxmox : Reboot for kernel update] ****************************************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_reboot_on_kernel_update as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the
future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting
deprecation_warnings=False in ansible.cfg.
skipping: [ac-sh002]

TASK [lae.proxmox : Wait for server to come back online] *****************************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_reboot_on_kernel_update as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the
future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting
deprecation_warnings=False in ansible.cfg.
skipping: [ac-sh002]

TASK [lae.proxmox : Remove old Debian/PVE kernels] ***********************************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_remove_old_kernels as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future.
Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting
deprecation_warnings=False in ansible.cfg.
ok: [ac-sh002]

TASK [lae.proxmox : Load ZFS module live] ********************************************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_zfs_enabled as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future. Also
see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False
in ansible.cfg.
skipping: [ac-sh002]

TASK [lae.proxmox : Load ZFS module on init] *****************************************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_zfs_enabled as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future. Also
see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False
in ansible.cfg.
skipping: [ac-sh002]

TASK [lae.proxmox : Copy ZFS modprobe configuration] *********************************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_zfs_enabled as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future. Also
see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False
in ansible.cfg.
skipping: [ac-sh002]

TASK [lae.proxmox : Configure email address for ZFS event daemon notifications] ******************************************************************************************
[DEPRECATION WARNING]: evaluating pve_zfs_enabled as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future. Also
see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False
in ansible.cfg.
skipping: [ac-sh002]

TASK [lae.proxmox : Lookup cluster information] **************************************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_cluster_enabled as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future.
Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting
deprecation_warnings=False in ansible.cfg.
skipping: [ac-sh002]

TASK [lae.proxmox : Identify if the host is already part of a cluster] ***************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_cluster_enabled as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future.
Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting
deprecation_warnings=False in ansible.cfg.
skipping: [ac-sh002]

TASK [lae.proxmox : Identify all clusters that the hosts in the specified group may be in] *******************************************************************************
[DEPRECATION WARNING]: evaluating pve_cluster_enabled as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future.
Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting
deprecation_warnings=False in ansible.cfg.
skipping: [ac-sh002] => (item=ac-sh002)

TASK [lae.proxmox : Ensure that hosts found are not in multiple existing clusters] ***************************************************************************************
[DEPRECATION WARNING]: evaluating pve_cluster_enabled as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future.
Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting
deprecation_warnings=False in ansible.cfg.
skipping: [ac-sh002]

TASK [lae.proxmox : Ensure that, if we find an existing cluster, that it matches the specified cluster name] *************************************************************
[DEPRECATION WARNING]: evaluating pve_cluster_enabled as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future.
Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting
deprecation_warnings=False in ansible.cfg.
skipping: [ac-sh002]

TASK [lae.proxmox : Initialize a Proxmox cluster] ************************************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_cluster_enabled as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future.
Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting
deprecation_warnings=False in ansible.cfg.
skipping: [ac-sh002]

TASK [lae.proxmox : Wait for quorum on initialization node] **************************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_cluster_enabled as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future.
Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting
deprecation_warnings=False in ansible.cfg.
skipping: [ac-sh002]

TASK [lae.proxmox : Install Proxmox Let's Encrypt post-hook script] ******************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_ssl_letsencrypt as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future.
Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting
deprecation_warnings=False in ansible.cfg.
skipping: [ac-sh002]

TASK [Request Let's Encrypt certificate for ...] ************************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_ssl_letsencrypt as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future.
Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting
deprecation_warnings=False in ansible.cfg.
skipping: [ac-sh002]

Sorry - I didn't check the dev branch for this.
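
The fix is presumably just an explicit cast in the affected when: conditions, for example:

- name: Load ZFS module live
  modprobe:
    name: zfs
  when: pve_zfs_enabled | bool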

PVE cluster witness

To my knowledge there is currently no way to tag one host in the cluster as a witness only. This means the tagged host would not run full-blown PVE, only corosync-qdevice, acting as an arbitrator for the cluster. This would add reliability to a 2-node cluster.

https://www.mankier.com/8/corosync-qdevice

I would be able to put together a pull request for this feature if you think it would be a good idea to support it here.
What do you think?
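
A rough sketch of what that could look like, assuming a new variable such as pve_cluster_qdevice_host and the pvecm qdevice setup subcommand documented for PVE 6:

- name: Add a corosync QDevice arbitrator to the cluster
  command: "pvecm qdevice setup {{ pve_cluster_qdevice_host }}"
  run_once: true
  when: pve_cluster_qdevice_host is defined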

[Question] Cleanup of the Debian platform

Hello,

What is the difference between installing Proxmox with your Ansible role and installing it from the Proxmox ISO?
I am a beginner with Proxmox.
I know the ISO is built to certain specifications and that Proxmox can interact directly with the CPU, RAM, and so on.
I would like to know whether your role can clean up the OS and change the configuration of the relevant files so that a Proxmox installation on plain Debian can interact with the CPU just as directly.

Thank you in advance.

PVE Ceph repository should be configurable

Hi, the Ceph repository should be configurable, like pve_repository_line in defaults/main.yml.
For example, in my setup the nodes don't have internet access, so I use a local mirror.

Originally posted by @Fbrachere in #52
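
Something like the following in defaults/main.yml would cover this; the variable name follows the pve_repository_line convention and the mirror URL is just an example:

pve_ceph_repository_line: "deb http://apt.mirror.internal/proxmox/debian/ceph-nautilus buster main"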

Let's encrypt request error

Hey there,

I'm trying to configure proxmox with let's encrypt support enabled. During installation I run into the following issue:

TASK [systemli.letsencrypt : register Let's Encrypt certificate with HTTP challenge] ***************************************************************************************************************************************************************************************************************************************
fatal: [vm1.my.domain]: FAILED! => {"msg": "The conditional check 'not \"no action taken\" in letsencrypt_reg_certbot_http.stdout' failed. The error was: error while evaluating conditional (not \"no action taken\" in letsencrypt_reg_certbot_http.stdout): Unable to look up a name or access an attribute in template string ({% if not \"no action taken\" in letsencrypt_reg_certbot_http.stdout %} True {% else %} False {% endif %}).\nMake sure your variable name does not contain invalid characters like '-': argument of type 'AnsibleUndefined' is not iterable"}

ansible 2.8.5 on macOS 10.15.1
Debian 10.1
Proxmox 6.0-9

I don't know if it's related, but I found a similar issue here: #31. If this problem is not related to this role, please feel free to close this issue.

Thanx

interfaces template and resolvconf

In the section of README.md which discusses the interfaces-{{ pve_group }}.j2 template, the sample code demonstrates setting the DNS servers inside the iface section:

    dns-nameservers 10.2.2.4 10.3.2.4
    dns-search local

But based on Defining the DNS Servers, it appears that this is only the case if you have installed the resolvconf package: on a fresh install of Proxmox this is not present, and the original installer will have created a static /etc/resolv.conf file.

So perhaps this should be noted in the documentation, and/or explicit code to configure /etc/resolv.conf could be added, similarly to the support for configuring NTP?
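
For instance, a small task like this (a sketch, not part of the role; nameserver addresses reused from the sample above) would cover hosts without resolvconf:

- name: Configure a static /etc/resolv.conf
  copy:
    dest: /etc/resolv.conf
    content: |
      search local
      nameserver 10.2.2.4
      nameserver 10.3.2.4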

Creation of users in pve realm with password

Hey there,

I'm trying to create an api user in the PVE realm with a password like this:

pve_users:
  - name: api@pve
    password: "MyPassword123"
    groups:
      - api_users

While the user itself gets created correctly, the corresponding password is not stored on the server, so the user is not able to log in. According to the documentation, passwords in the pve realm should be encrypted and written to /etc/pve/priv/shadow.cfg. This file does not exist after the user is created through this role.

I'm able to set the password for the user manually from within the web UI, upon which the file is created and I'm able to log in as the user. Maybe I'm missing something here?

ansible 2.8.5 on macOS 10.15.1
Debian 10.1
Proxmox 6.0-9

Thanx!

OSD creation isn't fully idempotent

Splitting off issue #72.

I finished with this issue and testing the other one, #69 (it took some time to prepare the bare-metal cluster that I am going to install). I have some problems regarding idempotent creation of OSDs. The role stops if an OSD is already in use.

@lae Should we check, if a defined OSD is already in use and move on with the other tasks?

Example:
I had a timeout during OSD creation and it was necessary to replay the playbook. The OSDs were created properly, but the execution stops after OSD creation. The role does not create pools and storages at this stage.

Originally posted by @mholasek in #72 (comment)

Yes - I'm unfortunately not able to test idempotency of that part myself (no longer working at fireeye/don't have access to a lot of physical hardware anymore) and CI can't either. All tasks should be made idempotent.

I'm not sure if you can just use a creates argument on the OSD creation step (because iirc OSD creation picks a random number for the folder name) but maybe you could add another task before the creation step to check if there is an existing OSD associated with the selected drive, and then skip OSD creation/configuration tasks based on the result?

Originally posted by @lae in #72 (comment)

@lae I am not sure why the current OSD creation is done via the "creates:" argument by checking for a /dev/sd?1. I'm doing my tests with Proxmox v6.0 and Nautilus, and it does not create any partition. Maybe this is correct under Luminous, but I have no way to test that.

That's why I tried another approach: checking whether there is already a Ceph LVM volume via the ceph-volume lvm list command. I am not sure if this can be used for Luminous as well (but I would guess so).

What do you think, should we go for this? If so, I'll do some additional testing, enhance the README within the next days and create a pull request after that. Maybe we should mention that the Ceph tasks are still beta and only tested with Proxmox v6.0? You can take a look at the fork (feature branch): https://github.com/mholasek/ansible-role-proxmox/tree/feature/ceph-replication-network.

Originally posted by @mholasek in #72 (comment)

@mholasek What's some example JSON output from that command once you get a successful OSD provisioned with pveceph? (docs show that you can use --format=json) Just to confirm, the lvm list command doesn't have anything to do with Linux LVM, right? (In other words, this command should be appropriate for checking provisioned OSDs in all deployment scenarios with pveceph?) If that's the case, we could possibly create a small Ansible module for OSD creation without needing to write any parsing code.

So I had forgotten earlier, but I believe {device}1 was selected on the basis that the pveceph osd create command creates a partition on the device, and that was picked instead of something like /var/lib/ceph/osd-5 (which is what I was referring to in my previous comment) to keep the OSD creation step idempotent. (For mutual reference, you're referring to this line, right? So it's not necessarily /dev/sd?1, unless you're finding that somewhere else?) The pveceph tool expects the devices passed to be one of the patterns listed in this comment, at least in PVE 5. Are you trying with something different? (Maybe you could check whether that code is any different in PVE 6?)

Also, I did add a note to the README in the Ceph PR I mentioned earlier stating that PVE Ceph management with this role is experimental, so I think we're fine there.

Originally posted by @lae in #72 (comment)
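
For reference, a minimal sketch of the pre-check discussed above (device layout and variable names are illustrative, not the role's actual ones):

- name: List OSDs already provisioned by ceph-volume
  command: ceph-volume lvm list --format json
  register: _ceph_volume_osds
  changed_when: false

- name: Create OSD only if the device is not already in use
  command: "pveceph osd create {{ item.device }}"
  loop: "{{ pve_ceph_osds }}"
  when: item.device not in _ceph_volume_osds.stdout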

pvecm usability breakage?

Clustering is failing on the PVE 5.x test. I'm guessing pvecm got an update that, for some reason, requires inputting the root password even though we're already SSHing in as root in the first place?

TASK [lae.proxmox : Add node to Proxmox cluster] *******************************
skipping: [proxmox-5-03.lxc] => {"changed": false, "skip_reason": "Conditional result was False"}
changed: [proxmox-5-02.lxc] => {"changed": true, "cmd": ["pvecm", "add", "10.0.3.49", "-ring0_addr", "10.0.3.149"], "delta": "0:00:00.471548", "end": "2018-04-22 09:00:24.708668", "rc": 0, "start": "2018-04-22 09:00:24.237120", "stderr": "EOF while reading password", "stderr_lines": ["EOF while reading password"], "stdout": "Please enter superuser (root) password for '10.0.3.49':", "stdout_lines": ["Please enter superuser (root) password for '10.0.3.49':"]}
TASK [lae.proxmox : Remove stale corosync lock file due to lack of quorum during initialization] ***
skipping: [proxmox-5-03.lxc] => {"changed": false, "skip_reason": "Conditional result was False"}
ok: [proxmox-5-02.lxc] => {"changed": false, "path": "/etc/pve/priv/lock/file-corosync_conf", "state": "absent"}
TASK [lae.proxmox : set_fact] **************************************************
ok: [proxmox-5-02.lxc] => {"ansible_facts": {"__pve_current_node": "proxmox-5-03.lxc"}, "changed": false}
ok: [proxmox-5-03.lxc] => {"ansible_facts": {"__pve_current_node": "proxmox-5-03.lxc"}, "changed": false}
TASK [lae.proxmox : Add node to Proxmox cluster] *******************************
skipping: [proxmox-5-02.lxc] => {"changed": false, "skip_reason": "Conditional result was False"}
changed: [proxmox-5-03.lxc] => {"changed": true, "cmd": ["pvecm", "add", "10.0.3.49", "-ring0_addr", "10.0.3.209"], "delta": "0:00:00.465110", "end": "2018-04-22 09:00:25.767593", "rc": 0, "start": "2018-04-22 09:00:25.302483", "stderr": "EOF while reading password", "stderr_lines": ["EOF while reading password"], "stdout": "Please enter superuser (root) password for '10.0.3.49':", "stdout_lines": ["Please enter superuser (root) password for '10.0.3.49':"]}

Add support for cephx authentication for Ceph storage backends

Whilst it is possible to create a pve_storages entry that points to a Ceph RBD pool:

pve_storages:
  - name: vm-storage
    type: rbd
    content:
    - images
    - rootdir
    pool: vm-storage
    username: admin
    monhost:
    - proxmox-test.corp.zenntrix.com:6789
    - test-box.corp.zenntrix.com:6789
pve_ceph_pools:
  - name: vm-storage
    pgs: 64
    application: rbd
    storage: true
    rule: hdd

It is inaccessible due to the keyring not being created. I have only tried this with cephx enabled; I believe it wouldn't be a problem without it.
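
What appears to be missing is a step along these lines (a sketch; PVE looks for external RBD keyrings at /etc/pve/priv/ceph/<storage-id>.keyring, and the source path here is an assumption):

- name: Install the Ceph keyring for the vm-storage RBD storage
  copy:
    src: files/ceph/vm-storage.keyring   # assumed location within the playbook
    dest: /etc/pve/priv/ceph/vm-storage.keyring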

Ceph deployment documentation

The Ceph configuration section in the README is a bit lackluster and requires users to parse through the provided example themselves, parts of which they might not even need. Ideally, this section should provide steps, with explanations along the way, for configuring PVE Ceph with the help of this role.

I haven't exactly used this feature myself, so it might be best for someone who has experience with this use case to write said documentation.

Manage authentication realms in Proxmox

By default, the pam and pve realms are created within Proxmox for authenticating users, which covers most use cases. However, Proxmox also supports LDAP/AD realms, which some users may want:

https://pve.proxmox.com/wiki/User_Management#pveum_authentication_realms

This role should introduce a proxmox_realm (or something else) module to help manage and create these realms.

For new contributors:

There are existing modules in library/ that can be used for reference for this issue. The PVE API documentation is also available at https://pve.proxmox.com/pve-docs/api-viewer/.
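
A possible interface for such a module, entirely hypothetical, with parameter names loosely following the PVE realm options:

- name: Configure an LDAP authentication realm
  proxmox_realm:               # proposed module, does not exist yet
    name: corp
    type: ldap
    server1: ldap.example.com
    base_dn: dc=example,dc=com
    user_attr: uid
    state: present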

Remove old PVE kernels when not in use

Depending on the environment, /boot may become full after several PVE kernel updates. This role should have a toggle that allows purging of older, unused kernels.

For the implementation, ensure that this role a) checks which kernel is running and skips its removal, and b) keeps the latest installed version.

Ceph init should include possibility to add replication network

We need the ability to set up Ceph with separate public and replication (cluster) networks.
I'm missing an option to define --cluster-network <string>.

I would create a variable defining the cluster network; if it is defined, this option should be added.

Something like this:

  command: 'pveceph init --network {{ pve_ceph_network }} --cluster-network {{ pve_ceph_cluster_network }}' 
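
Or, with the conditional described above (variable name assumed):

  command: >-
    pveceph init --network {{ pve_ceph_network }}
    {{ '--cluster-network ' + pve_ceph_cluster_network if pve_ceph_cluster_network is defined else '' }}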

US based Proxmox hosts for this role?

Hey,

Have you (or any users of this role) used it to deploy clusters on US-based dedicated server hosts?

I've had servers at OVH in Quebec, but I want to stay within US jurisdiction.

Any suggestions for Proxmox-friendly hosting providers where this role has been tested or used in production?

Some more deprecated warnings

This should be fixed to support later Ansible versions.

TASK [lae.proxmox : Configure Ceph package source] ***********************************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_ceph_enabled as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future. Also 
see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False
 in ansible.cfg.

TASK [lae.proxmox : Install Ceph packages] *******************************************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_ceph_enabled as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future. Also 
see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False
 in ansible.cfg.

TASK [lae.proxmox : Install custom Ceph systemd service] *****************************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_ceph_enabled as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future. Also 
see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False
 in ansible.cfg.

TASK [lae.proxmox : Enable Ceph] *****************************************************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_ceph_enabled as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future. Also 
see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False
 in ansible.cfg.

TASK [lae.proxmox : Create initial Ceph config] **************************************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_ceph_enabled as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future. Also 
see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False
 in ansible.cfg.

TASK [lae.proxmox : Create initial Ceph monitor] *************************************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_ceph_enabled as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future. Also 
see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False
 in ansible.cfg.

TASK [lae.proxmox : Fail if initial monitor creation failed] *************************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_ceph_enabled as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future. Also 
see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False
 in ansible.cfg.

TASK [lae.proxmox : Create additional Ceph monitors] *********************************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_ceph_enabled as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future. Also 
see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False
 in ansible.cfg.

TASK [lae.proxmox : List Ceph CRUSH rules] *******************************************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_ceph_enabled as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future. Also 
see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False
 in ansible.cfg.

TASK [lae.proxmox : List Ceph Pools] *************************************************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_ceph_enabled as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future. Also 
see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False
 in ansible.cfg.
TASK [lae.proxmox : Create Ceph MDS servers] *****************************************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_ceph_enabled as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future. Also 
see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False
 in ansible.cfg.

TASK [lae.proxmox : Wait for standby MDS] ********************************************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_ceph_enabled as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future. Also 
see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False
 in ansible.cfg.

TASK [lae.proxmox : List Ceph Filesystems] *******************************************************************************************************************************
[DEPRECATION WARNING]: evaluating pve_ceph_enabled as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future. Also 
see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False
 in ansible.cfg.
