stackhpc / kayobe-original Goto Github PK

View Code? Open in Web Editor NEW

26.0 17.0 16.0 4.56 MB

Deployment of containerised OpenStack to bare metal using OpenStack kolla and bifrost

openstack kolla-ansible kolla ansible ironic baremetal

kayobe-original's Introduction

Kayobe

Kayobe has moved to openstack/kayobe. Please submit patches via gerrit. Please file bugs and RFEs in StoryBoard.

kayobe-original's People

Contributors

Stargazers

Watchers

Forkers

xiaoruiguo johngarbutt ktibi markgoddard yankcrime autonomouse th3architect pforai diranetafen ms-building-blocks miradam jovial bbezak antonkorsov-az chazzrobbz aethylred

kayobe-original's Issues

Add hosts to SSH known hosts

Currently, kayobe adds SSH known host entries for the remote hosts during these commands:

kayobe seed host configure
kayobe seed hypervisor host configure
kayobe overcloud host configure

These commands are typically run once after provisioning these hosts, and typically should not be run again unless changes to configuration are required. As the SSH known hosts apply to the local ansible control host, when subsequently managing the same system from a different ansible control host, SSH known hosts entries may not exist, and the user will be prompted.

Kayobe should add a command to add known hosts, or ensure known hosts entries exist for the target hosts prior running any command. The existing ssh-known-host.yml playbook could easily be reused here.

Admin openrc files not present in a fresh control host environment

Kolla-ansible generates an openrc file for the admin user at ${KOLLA_CONFIG_PATH}/admin-openrc.sh. Kayobe also generates one which uses the public OpenStack endpoints (${KOLLA_CONFIG_PATH}/public-openrc.sh). Typically this will not be committed to a kayobe-config repo, as it contains the admin password in plain text.

These files are typically required when interacting with the OpenStack APIs as the admin user, e.g. when registering IPA deployment images with Glance. Kayobe should provide a command to regenerate these files.

In the mean time, this may be done via these commands:

kayobe kolla ansible run post-deploy -ke node_config_directory=${KOLLA_CONFIG_PATH}
kayobe playbook run ansible/public-openrc.yml

Add support for policy-based routing

Add support for configuration of policy-based routing to kayobe. Ideally this would be added to the MichaelRigart.interfaces ansible role, with appropriate filters added to/updated in ansible/filter_plugins/network.py in kayobe. This should follow the existing network configuration patterns, as in the kayobe docs.

Improve container image build workflow

The existing container image build workflow has a number of areas where it could be improved.

Cleaner separation of the build process from the running system. Seed images are currently built on the seed, and control plane images on all of the controllers. Ideally, these would be decoupled, allowing for images to be built and tested elsewhere. The addition of e.g. [seed-container-builder] and [overcloud-container-builder] groups could help here. It should still be possible to build locally on the seed or controller hosts, as this is convenient for development and testing.
Documentation of the workflow, use of registries, and a build, push, pull model.

Apply upper-constraints when installing OpenStack python packages

Overview

When installing OpenStack python packages it is recommended to use an upper constraints file to place an upper limit on dependent package versions. In particular this applies to kolla and kolla-ansible.

See https://github.com/openstack/bifrost/blob/0809195f01736f5729f8b991e1e2330d0de81a09/playbooks/roles/bifrost-ironic-install/tasks/install.yml#L54 for an example of how this is done in bifrost.

Nested list of storage groups causes templating of overcloud inventory to fail

Overview

The storage group mapping from kolla-ansible to kayobe added in PR #84 uses a nested list. This causes the generation of the overcloud inventory to fail.

Steps to reproduce

For example, this was seen whilst running:

kayobe overcloud service reconfigure -kt haproxy

Expected results

Overcloud inventory file is generated

Actual results

TASK [kolla-ansible : Ensure the Kolla overcloud inventory file exists] **********************************************************************************************************************************************************************
fatal: [localhost]: FAILED! => {"failed": true, "msg": "{{ kolla_overcloud_inventory_custom or kolla_overcloud_inventory_default }}: # This file is managed by Ansible. Do not edit.\n\n# Overcloud inventory file for kolla-ansible.\n\n{{ kolla_overcloud_inventory_top_level }}\n\n{{ kolla_overcloud_inventory_components }}\n\n{{ kolla_overcloud_inventory_services }}\n: {{ kolla_overcloud_inventory_custom_top_level or kolla_overcloud_inventory_default_top_level }}: {{ lookup('template', 'overcloud-top-level.j2') }}: An unhandled exception occurred while running the lookup plugin 'template'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Unexpected templating type error occurred on (# This inventory section provides a mapping of the top level groups to hosts.\n#\n# Top level groups define the roles of hosts, e.g. controller or compute.\n# Components define groups of services, e.g. nova or ironic.\n# Services define single containers, e.g. nova-compute or ironic-api.\n\n{% set top_level_groups = kolla_overcloud_inventory_top_level_group_map.values() |\n                          selectattr('groups', 'defined') |\n                          map(attribute='groups') |\n                          sum(start=[]) |\n                          unique |\n                          list %}\n\n#{% for group in top_level_groups %}\n# Top level {{ group }} group.\n#[{{ group }}]\n# These hostnames must be resolvable from your deployment host\n#{% for host in groups.get(group, []) %}\n#{% set host_hv=hostvars[host] %}\n#{{ host }}{% for hv_name in kolla_overcloud_inventory_pass_through_host_vars %}{% if hv_name in host_hv %} {{ hv_name | regex_replace('^kolla_(.*)$', '\\\\1') }}={{ host_hv[hv_name] }}{% endif %}{% endfor %}\n\n#{% endfor %}\n\n#{% endfor %}\n#[overcloud:children]\n#{% for group in top_level_groups %}\n#{{ group }}\n#{% endfor %}\n\n#[overcloud:vars]\n#ansible_user=kolla\n#ansible_become=true\n#{% if kolla_ansible_target_venv is not none %}\n# Execute ansible modules on the remote target hosts using a virtualenv.\n#ansible_python_interpreter={{ kolla_ansible_target_venv }}/bin/python\n#{% endif %}\n\n\n#{% for kolla_group, kolla_group_config in kolla_overcloud_inventory_top_level_group_map.items() %}\n#{% if 'groups' in kolla_group_config %}\n#{% set renamed_groups = kolla_group_config.groups | difference([kolla_group]) | list %}\n#{% if renamed_groups | length > 0 %}\n# Mapping from kolla-ansible group {{ kolla_group }} to top level kayobe\n# groups.\n#[{{ kolla_group }}:children]\n#{% for group in kolla_group_config.groups %}\n#{{ group }}\n#{% endfor %}\n\n#{% endif %}\n#{% endif %}\n#{% if 'vars' in kolla_group_config %}\n# Mapping from kolla-ansible group {{ kolla_group }} to top level kayobe\n# variables.\n#[{{ kolla_group }}:vars]\n#{% for var_name, var_value in kayobe_group_config.vars.items() %}\n#{{ var_name }}={{ var_value }}\n#{% endfor %}\n\n#{% endif %}\n#{% endfor %}\n#{% for group in kolla_overcloud_inventory_kolla_top_level_groups %}\n#{% if group not in kolla_overcloud_inventory_top_level_group_map %}\n# Empty group definition for {{ group }}.\n#{{ group }}\n\n#{% endif %}\n#{% endfor %}): unhashable type: 'list'"}

[Ceph] Support storage group for dedicated node

We need to add a storage group for specific nodes dedicated to storage role.

We can always use the group compute or controller for ceph.

PR in progress : #84

kolla is attempting to read in in an ini file as YAML during `kayobe overcloud host configure`

kolla is attempting to read in in an ini file as YAML during kayobe overcloud host configure

        Bootstraping servers : ansible-playbook -i /vagrant/kayobe-config_staging/etc/kolla/inventory/overcloud -e @/vagrant/kayobe-config_staging/etc/kolla/globals.yml -e @/vagrant/kayobe-config_staging/etc/kolla/passwords.yml -e CONFIG_DIR=/vagrant/kayobe-config_staging/etc/kolla  --vault-password-file=/root/kayobe-venv/bin/kayobe-vault-password-helper -e ansible_user=stack -e action=bootstrap-servers /vagrant/kayobe/venvs/kolla-ansible/share/kolla-ansible/ansible/kolla-host.yml 
        ERROR! Attempted to read "/vagrant/kayobe-config_staging/etc/kolla/inventory/overcloud" as YAML: Syntax Error while loading YAML.
        
        
        The error appears to have been in '/vagrant/kayobe-config_staging/etc/kolla/inventory/overcloud': line 15, column 1, but may
        be elsewhere in the file depending on the exact syntax problem.
        
        The offending line appears to be:
        
        # These hostnames must be resolvable from your deployment host
        stg-vircon0001 ansible_host=10.114.143.2
        ^ here
        
        Attempted to read "/vagrant/kayobe-config_staging/etc/kolla/inventory/overcloud" as ini file: /vagrant/kayobe-config_staging/etc/kolla/inventory/overcloud:259: Section [nova-compute-ironic:children] includes undefined group: stg-vircon0002 
        Command failed ansible-playbook -i /vagrant/kayobe-config_staging/etc/kolla/inventory/overcloud -e @/vagrant/kayobe-config_staging/etc/kolla/globals.yml -e @/vagrant/kayobe-config_staging/etc/kolla/passwords.yml -e CONFIG_DIR=/vagrant/kayobe-config_staging/etc/kolla  --vault-password-file=/root/kayobe-venv/bin/kayobe-vault-password-helper -e ansible_user=stack -e action=bootstrap-servers /vagrant/kayobe/venvs/kolla-ansible/share/kolla-ansible/ansible/kolla-host.yml 
        kolla-ansible bootstrap-servers exited 1

Yet if I run that command manually, it works!

Affects kayobe version 1b96a20895ad128e2ff296f3e88533a77a34efdc

Filter physical network configuration by interface

Currently, it is not possible to target specific interfaces when configuring physical network devices via kayobe physical network configure. In particular, this would be useful when enabling discovery mode via kayobe physical network configure --enable-discovery, to discover a subset of compute hosts without affecting the network configuration of all compute hosts.

Use case: adding compute nodes to an existing cloud.

Generation of Kolla-ansible passwords.yml fails with --vault-password-file or --ask-vault-pass

This could occur during a number of commands in which kolla-ansible is configured. Example output:

TASK [kolla-ansible : Ensure the Kolla passwords file exists] ******************
fatal: [localhost]: FAILED! => {"changed": false, "cmd": "kolla-mergepwd --old /tmp/tmpuCnRK2 --new /tmp/tmpFFiuSg --final /tmp/tmpFFiuSg", "failed": true, "msg": "Traceback (most recent call last):\n  File \"/home/darryl/venvs/kolla-ansible/bin/kolla-mergepwd\", line 10, in <module>\n    sys.exit(main())\n  File \"/home/darryl/venvs/kolla-ansible/lib/python2.7/site-packages/kolla_ansible/cmd/mergepwd.py\", line 32, in main\n    new_passwords.update(old_passwords)\nValueError: dictionary update sequence element #0 has length 1; 2 is required", "rc": 1, "stderr": "Traceback (most recent call last):\n  File \"/home/darryl/venvs/kolla-ansible/bin/kolla-mergepwd\", line 10, in <module>\n    sys.exit(main())\n  File \"/home/darryl/venvs/kolla-ansible/lib/python2.7/site-packages/kolla_ansible/cmd/mergepwd.py\", line 32, in main\n    new_passwords.update(old_passwords)\nValueError: dictionary update sequence element #0 has length 1; 2 is required\n", "stdout": "", "stdout_lines": []}

Workaround: use $KAYOBE_VAULT_PASSWORD to define the vault password.

Seed VM network bootstrap issues

When provisioning a seed VM via kayobe seed vm provision, typically the VM cannot be SSH'd into for at least 5-10 minutes due to DHCP timeouts. Sometimes the VM cannot be accessed even after this timeout and must be manually configured via the console before proceeding with the kayobe seed host configure command.

Enable multidomain support for horizon

Enable kayobe to set the option: OPENSTACK_KEYSTONE_MULTIDOMAIN_SUPPORT in the horizon config file to enable the domain login for Horizon web interface.

Tags don't work with kayobe overcloud post configure

Steps to reproduce

kayobe overcloud post configure --tags provision-net

Expected results

The provision-net playbook tasks are executed.

Actual results

The provision-net playbook tasks are skipped.

Kayobe uses personal github repositories

Kayobe uses personal github repositories (e.g https://github.com/jriguera/ansible-role-configdrive/ and
https://github.com/ahuffman/ansible-resolv) which should be backed up on stackHPC’s own github account and/or moved to ansible galaxy (to ensure they don’t disappear overnight if their owners close their github accounts or something)

Bootstrap user SSH access required after bootstrapping is complete

Overview

In order to run the command kayobe <seed|overcloud> host configure, the user running kayobe must have an SSH key that is authorised for the bootstrap user (the one that sets up the stack account). This might not be ideal, particularly in scenarios when the bootstrap user is root.

Steps to reproduce

Against a system that has already been deployed, run kayobe <seed|overcloud> host configure, using an SSH key not authorised for the bootstrap user account.

Expected results

The command succeeds.

Actual results

The command fails with the following output:

PLAY [Ensure the Kayobe Ansible user account exists] *********************************************************************************************************************************************************

TASK [Gathering Facts] ***************************************************************************************************************************************************************************************
fatal: [controller]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).\r\n", "unreachable": true}

Environment

Support configuration of package repositories for containers

Kolla supports the use of custom package repositories when building container images. This should be possible in kayobe without changes, but this needs to be verified.

Kolla docs: https://docs.openstack.org/kolla/latest/image-building.html#custom-repos.

RabbitMQ upgrade fails due to stale /etc/hosts entries

Seen on Ocata and Pike.

When running kayobe overcloud service upgrade, the RabbitMQ upgrade fails. This is due to stale entries in /etc/hosts for the controller's hostname with the overcloud provisioning network IP address. RabbitMQ requires the hostname to resolve to the IP on which it is listening, namely the internal network IP address. The issue can be resolved by removing the stale entries in the rabbitmq container. They should also be removed from the host to prevent propagation to new containers.

Here's an example of a broken /etc/hosts:

cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

10.41.253.103 sv-b16-u23
10.41.253.103 sv-b16-u23
10.41.253.103 sv-b16-u23
10.41.253.103 sv-b16-u23
10.41.253.103 sv-b16-u23
10.41.253.103 sv-b16-u23
127.0.0.1 localhost
127.0.0.1 localhost
127.0.0.1 localhost
127.0.0.1 localhost
127.0.0.1 localhost
127.0.0.1 localhost
# BEGIN ANSIBLE GENERATED HOSTS
192.168.7.11 sv-b16-u23
# END ANSIBLE GENERATED HOSTS

The 10.41.253.103 entries are incorrect.

Here's an example output on failure:

PLAY [Apply role rabbitmq] *****************************************************

TASK [setup] *******************************************************************
ok: [sv-b16-u23]

TASK [common : include] ********************************************************
skipping: [sv-b16-u23]

TASK [common : Registering common role has run] ********************************
skipping: [sv-b16-u23]

TASK [rabbitmq : include] ******************************************************
included: /opt/alaska/alt-1/venvs/kolla/share/kolla-ansible/ansible/roles/rabbitmq/tasks/upgrade.yml for sv-b16-u23

TASK [rabbitmq : Checking if rabbitmq container needs upgrading] ***************
ok: [sv-b16-u23]

TASK [rabbitmq : include] ******************************************************
included: /opt/alaska/alt-1/venvs/kolla/share/kolla-ansible/ansible/roles/rabbitmq/tasks/config.yml for sv-b16-u23

TASK [rabbitmq : Ensuring config directories exist] ****************************
ok: [sv-b16-u23] => (item=rabbitmq)

TASK [rabbitmq : Copying over config.json files for services] ******************
ok: [sv-b16-u23] => (item=rabbitmq)

TASK [rabbitmq : Copying over rabbitmq configs] ********************************
ok: [sv-b16-u23] => (item=rabbitmq-env.conf)
ok: [sv-b16-u23] => (item=rabbitmq.config)
ok: [sv-b16-u23] => (item=rabbitmq-clusterer.config)
ok: [sv-b16-u23] => (item=definitions.json)

TASK [rabbitmq : Find gospel node] *********************************************
fatal: [sv-b16-u23]: FAILED! => {"changed": true, "cmd": ["docker", "exec", "-t", "rabbitmq", "/usr/local/bin/rabbitmq_get_gospel_node"], "delta": "0:00:01.263525", "end": "2017-09-15 15:28:36.476105", "failed": true, "failed_when_result": true, "rc": 0, "start": "2017-09-15 15:28:35.212580", "stderr": "", "stdout": "{\"failed\": true, \"changed\": true, \"error\": \"Traceback (most recent call last):\\n  File \\\"/usr/local/bin/rabbitmq_get_gospel_node\\\", line 29, in main\\n    shell=True, stderr=subprocess.STDOUT  # nosec: this command appears\\n  File \\\"/usr/lib64/python2.7/subprocess.py\\\", line 575, in check_output\\n    raise CalledProcessError(retcode, cmd, output=output)\\nCalledProcessError: Command '/usr/sbin/rabbitmqctl eval 'rabbit_clusterer:status().'' returned non-zero exit status 2\\n\"}", "stdout_lines": ["{\"failed\": true, \"changed\": true, \"error\": \"Traceback (most recent call last):\\n  File \\\"/usr/local/bin/rabbitmq_get_gospel_node\\\", line 29, in main\\n    shell=True, stderr=subprocess.STDOUT  # nosec: this command appears\\n  File \\\"/usr/lib64/python2.7/subprocess.py\\\", line 575, in check_output\\n    raise CalledProcessError(retcode, cmd, output=output)\\nCalledProcessError: Command '/usr/sbin/rabbitmqctl eval 'rabbit_clusterer:status().'' returned non-zero exit status 2\\n\"}"], "warnings": []}

NO MORE HOSTS LEFT *************************************************************
        to retry, use: --limit @/opt/alaska/alt-1/venvs/kolla/share/kolla-ansible/ansible/site.retry

PLAY RECAP *********************************************************************
sv-b16-u23                 : ok=75   changed=5    unreachable=0    failed=1

IPA image for used by inspector does not get updated when specified via URL

The IPA ramdisk and kernel images may be built or downloaded via a URL. If the latter option is used, any images previously downloaded to $KOLLA_CONFIG_PATH/config/ironic/ironic-agent.* will not be updated if the image contents change.

There are a few options.

Specify force=true in get_url to always download the images.
Allow specification of checksum URLs to detect changes in the file contents.
Check the content length or Etag via uri module and compare with file size, as described in ansible/ansible#30003.

Support different interface names on all interfaces

Overview

On multi node deployments interfaces may have different names. For example, the api_interface may be called breno1 on one host, and eno1 on another. Support has been added for this in this PR, however other interfaces are assumed to be in-variate across hosts.

Enhancement

Support all interfaces and write the variables to an inventory/host_vars/ file. Hard code the interface variable names rather than adding them to the pass through vars list.

Ansible vault error when checking the passwords file

On a test system I see the following error:

TASK [kolla-ansible : Ensure the Kolla passwords file exists] **********************************************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "cmd": "ansible-vault decrypt --vault-password-file /tmp/tmpgExNTu /tmp/tmpFwyBsD", "msg": "ERROR! input is not vault encrypted data/tmp/tmpFwyBsD is not a vault encrypted file for /tmp/tmpFwyBsD", "rc": 1, "stderr": "ERROR! input is not vault encrypted data/tmp/tmpFwyBsD is not a vault encrypted file for /tmp/tmpFwyBsD\n", "stderr_lines": ["ERROR! input is not vault encrypted data/tmp/tmpFwyBsD is not a vault encrypted file for /tmp/tmpFwyBsD"], "stdout": "", "stdout_lines": []}

The command-line is:
kayobe overcloud host configure --vault-password-file /home/kayobe/information

The file etc/kolla/passwords.yml does exist.

Enable Kayobe to deploy keystone policy.json file

We would like to be able to deploy a custom keystone policy.json file.
This is supported in kolla, but needs to be enabled in Kayobe.

Network not found: no network with matching name 'management'

TASK [stackhpc.libvirt-vm : Ensure the VM is running and started at boot] *************************************************************************************************
fatal: [control]: FAILED! => {"changed": false, "failed": true, "msg": "Network not found: no network with matching name 'management'"}

I config references http://kayobe.readthedocs.io/en/latest/configuration/network.html.

run: kayobe seed vm provision

Python packages installed via pip may conflict with system site-packages

Overcloud nodes may have the python-requests and python-urllib3 library installed both via pip and via yum. This can cause dependency issues when running kayobe such as:

fatal: [MY_NODE]: FAILED! => {"changed": false, "failed": true, "module_stderr": "Shared connection to MY_IP closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n  File \"/tmp/ansible_12vatz/ansible_module_kolla_container_facts.py\", line 52, in <module>\r\n    import docker\r\n  File \"/usr/lib/python2.7/site-packages/docker/__init__.py\", line 2, in <module>\r\n    from .api import APIClient\r\n  File \"/usr/lib/python2.7/site-packages/docker/api/__init__.py\", line 2, in <module>\r\n    from .client import APIClient\r\n  File \"/usr/lib/python2.7/site-packages/docker/api/client.py\", line 6, in <module>\r\n    import requests\r\n  File \"/usr/lib/python2.7/site-packages/requests/__init__.py\", line 43, in <module>\r\n    import urllib3\r\n  File \"/usr/lib/python2.7/site-packages/urllib3/__init__.py\", line 10, in <module>\r\n    from .connectionpool import (\r\n  File \"/usr/lib/python2.7/site-packages/urllib3/connectionpool.py\", line 31, in <module>\r\n    from .connection import (\r\n  File \"/usr/lib/python2.7/site-packages/urllib3/connection.py\", line 45, in <module>\r\n    from .util.ssl_ import (\r\n  File \"/usr/lib/python2.7/site-packages/urllib3/util/__init__.py\", line 4, in <module>\r\n    from .request import make_headers\r\n  File \"/usr/lib/python2.7/site-packages/urllib3/util/request.py\", line 5, in <module>\r\n    from ..exceptions import UnrewindableBodyError\r\nImportError: cannot import name UnrewindableBodyError\r\n", "msg": "MODULE FAILURE"}

A workaround is to manually make sure that on each node only one copy of each library is installed. For example:

pip uninstall requests
pip uninstall urllib3
yum remove python-requests python-urllib3
yum install python-requests python-urllib3

We should investigate if we are installing both packages during deployment. It's possible that they may have been installed out-of-band and that this is not a valid bug.

Overcloud host bifrost network bootstrap issues

When provisioning bare metal overcloud hosts via kayobe overcloud provision, typically the hosts cannot be SSH'd into for at least 5-10 minutes due to DHCP timeouts. Sometimes the hosts cannot be accessed even after this timeout and must be manually configured via the console before proceeding with the kayobe overcloud host configure command.

Deployment fails with 3 Ceph mon

Overview

Hi
I'm trying to deploy OpenStack with three controllers nodes. But I'm facing an issue went kolla bootstrap Ceph container. The deployment fails and OSD are not correctly created

Steps to reproduce

Deployment with three controllers nodes (ceph enabled)

Expected results

Actual results

I have found those logs :

-> on compute

  *ceph-client.admin.log :

monclient(hunting): authenticate timed out after 300
librados: client.admin authentication error (110) Connection timed out

  * docker ps -a :

aa48da67a3c7 private-registry/centos-binary-ceph-osd:5.0.0 "kolla_start" 45 minutes ago Exited (1) 40 minutes ago bootstrap_osd_5
6d12c93c668b private-registry/centos-binary-ceph-osd:5.0.0 "kolla_start" About an hour ago Exited (1) 45 minutes ago bootstrap_osd_4
100e9f6c436b private-registry/centos-binary-ceph-osd:5.0.0 "kolla_start" About an hour ago Exited (1) About an hour ago bootstrap_osd_3
6f82eb8ae898 private-registry/centos-binary-ceph-osd:5.0.0 "kolla_start" About an hour ago Exited (1) About an hour ago bootstrap_osd_2
6b909d87ced2 private-registry/centos-binary-ceph-osd:5.0.0 "kolla_start" About an hour ago Exited (1) About an hour ago bootstrap_osd_1
819914f446fb private-registry/centos-binary-ceph-osd:5.0.0 "kolla_start" About an hour ago Exited (1) About an hour ago bootstrap_osd_0

 * docker logs bootstrap_osd_3

INFO:main:Loading config file at /var/lib/kolla/config_files/config.json
INFO:main:Validating config file
INFO:main:Kolla config strategy set to: COPY_ALWAYS
INFO:main:Copying service configuration files
INFO:main:Copying /var/lib/kolla/config_files/ceph.conf to /etc/ceph/ceph.conf
INFO:main:Setting permission for /etc/ceph/ceph.conf
INFO:main:Copying /var/lib/kolla/config_files/ceph.client.admin.keyring to /etc/ceph/ceph.client.admin.keyring
INFO:main:Setting permission for /etc/ceph/ceph.client.admin.keyring
INFO:main:Writing out command to execute
Error connecting to cluster: TimedOut

-> On controller ceph-mon.X.X.X.X.log
2017-12-28 11:09:21.206387 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:09:21.206487 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(439) init, last seen epoch 439
2017-12-28 11:09:36.237530 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:09:36.237600 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(441) init, last seen epoch 441
2017-12-28 11:09:51.276675 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:09:51.276774 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(443) init, last seen epoch 443
2017-12-28 11:10:06.313344 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:10:06.313438 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(445) init, last seen epoch 445
2017-12-28 11:10:18.817119 7f116b6bb700 0 mon.X.X.X.X@2(electing).data_health(0) update_stats avail 94% total 329 GB, used 407 MB, avail 312 GB
2017-12-28 11:10:21.350803 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:10:21.350913 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(447) init, last seen epoch 447
2017-12-28 11:10:36.385019 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:10:36.385101 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(449) init, last seen epoch 449
2017-12-28 11:10:51.424834 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:10:51.424952 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(451) init, last seen epoch 451
2017-12-28 11:11:06.459091 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:11:06.459160 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(453) init, last seen epoch 453
2017-12-28 11:11:18.817242 7f116b6bb700 0 mon.X.X.X.X@2(electing).data_health(0) update_stats avail 94% total 329 GB, used 408 MB, avail 312 GB
2017-12-28 11:11:21.499690 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:11:21.499799 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(455) init, last seen epoch 455
2017-12-28 11:11:36.534253 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:11:36.534364 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(457) init, last seen epoch 457
2017-12-28 11:11:51.577507 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:11:51.577589 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(459) init, last seen epoch 459
2017-12-28 11:12:06.613963 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:12:06.614036 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(461) init, last seen epoch 461
2017-12-28 11:12:18.817438 7f116b6bb700 0 mon.X.X.X.X@2(electing).data_health(0) update_stats avail 94% total 329 GB, used 410 MB, avail 312 GB
2017-12-28 11:12:21.654153 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:12:21.654244 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(463) init, last seen epoch 463
2017-12-28 11:12:36.696281 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:12:36.696370 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(465) init, last seen epoch 465
2017-12-28 11:12:51.736531 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:12:51.736625 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(467) init, last seen epoch 467
2017-12-28 11:13:06.771501 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:13:06.771587 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(469) init, last seen epoch 469
2017-12-28 11:13:18.817649 7f116b6bb700 0 mon.X.X.X.X@2(electing).data_health(0) update_stats avail 94% total 329 GB, used 411 MB, avail 312 GB
2017-12-28 11:13:21.810869 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:13:21.810984 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(471) init, last seen epoch 471
2017-12-28 11:13:36.851396 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:13:36.851495 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(473) init, last seen epoch 473
2017-12-28 11:13:51.890575 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:13:51.890661 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(475) init, last seen epoch 475
2017-12-28 11:14:06.930553 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:14:06.930642 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(477) init, last seen epoch 477
2017-12-28 11:14:18.817921 7f116b6bb700 0 mon.X.X.X.X@2(electing).data_health(0) update_stats avail 94% total 329 GB, used 412 MB, avail 312 GB
2017-12-28 11:14:21.971071 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:14:21.971178 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(479) init, last seen epoch 479
2017-12-28 11:14:37.012441 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:14:37.012516 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(481) init, last seen epoch 481
2017-12-28 11:14:52.057103 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:14:52.057181 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(483) init, last seen epoch 483
2017-12-28 11:15:07.092539 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:15:07.092627 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(485) init, last seen epoch 485
2017-12-28 11:15:18.818098 7f116b6bb700 0 mon.X.X.X.X@2(electing).data_health(0) update_stats avail 94% total 329 GB, used 414 MB, avail 312 GB
2017-12-28 11:15:22.124296 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:15:22.124368 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(487) init, last seen epoch 487
2017-12-28 11:15:37.162572 7f116aeba700 0 log_channel(cluster) log [INF] : mon.X.X.X.X calling new monitor election
2017-12-28 11:15:37.162664 7f116aeba700 1 mon.X.X.X.X@2(electing).elector(489) init, last seen epoch 489

-> on Kayobe node (during kayobe overcloud service deploy )
fatal: [ctrl03 -> X.X.X.X]: FAILED! => {"changed": false, "cmd": ["docker", "exec", "ceph_mon", "ceph", "auth", "get-or-create", "client.glance", "mon", "allow r", "osd", "allow class-read object_prefix rbd_children, allow rwx pool=images, allow rwx pool=images-cache"], "delta": "0:05:00.186101", "end": "2017-12-28 11:30:29.577681", "failed": true, "rc": 1, "start": "2017-12-28 11:25:29.391580", "stderr": "Error connecting to cluster: TimedOut", "stderr_lines": ["Error connecting to cluster: TimedOut"], "stdout": "", "stdout_lines": []}

Environment

kayobe --version
kayobe 0.1

cat /etc/centos-release
CentOS Linux release 7.4.1708 (Core)

PS : I've hound similar issue here https://bugs.launchpad.net/kolla/+bug/1629237

Thank you in advance for your help

Support configuration of package repositories for hosts

Add functionality to the kayobe <seed|seed-hypervisor|overcloud> host configure commands to configure package repositories for the seed, seed-hypervisor, and overcloud hosts.

iscsid container restarting because ironic conductor started iscsid systemd service on host

The host's iscsid service on the controllers should be disabled by the kolla-host.yml playbook, to avoid conflicting with the iscsid container. It was observed that the iscsid systemd service was running, and appeared to have been started following ironic conductor performing an iscsi-based image deployment.

Sep 05 10:23:16 kef1p-phycon0003 sudo[103863]:   ironic : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/ironic-rootwrap /etc/ironic/rootwrap.conf iscsiadm -m discovery -t st -p 10.104.128.4:3260
Sep 05 10:23:16 kef1p-phycon0003 systemd[1]: Listening on Open-iSCSI iscsid Socket.
Sep 05 10:23:16 kef1p-phycon0003 systemd[1]: Starting Open-iSCSI iscsid Socket.
Sep 05 10:23:16 kef1p-phycon0003 systemd[1]: Listening on Open-iSCSI iscsiuio Socket.
Sep 05 10:23:16 kef1p-phycon0003 systemd[1]: Starting Open-iSCSI iscsiuio Socket.
Sep 05 10:23:17 kef1p-phycon0003 systemd[1]: Starting Open-iSCSI...

Deployment image build idempotency issues

When building IPA deployment images via one of these commands, in some cases the image may not be rebuilt, even if the image contents have changed.

kayobe seed deployment image build
kayobe overcloud deployment image build

It is often hard to tell whether anything would have changed in the image without actually building it (external repos, etc.). However, rebuilding an image every time seems unnecessary.

Some options:

Image naming to include versions
Image DIB manifest comparison
A --force option to force rebuilding

More verification for common configuration errors

Add common errors here:

Invalid paths for $KAYOBE_CONFIG_PATH, $KOLLA_CONFIG_PATH, and other environment variables. In particular, $KOLLA_CONFIG_PATH should not be $KAYOBE_CONFIG_PATH/kolla.
Minimum ansible version.

Support extension points for custom behaviour

Overview

Kayobe should provide extension mechanisms that allow an operator to apply arbitrary configuration to their cloud, without requiring explicit support in kayobe.

User Stories

As a cloud admin, I want to provide extension points as custom ansible playbooks and roles along with my configuration.
As a cloud admin, I want to run specific extensions on demand, by playbook name.
As a cloud admin, I want to use hooks that run extensions at specific points in the existing deployment flow.
As a cloud admin, I want to add custom kayobe commands to perform custom tasks using extensions.

Proposal

User Story 1

Provide a well-known location within kayobe-config (or other?) for playbooks and roles:

${KAYOBE_CONFIG_PATH}/ansible/*.yml
${KAYOBE_CONFIG_PATH}/ansible/roles/

Provide a mechanism to ensure that playbooks on this path can use kayobe's playbook group variables. Ensure

User Story 2

Provide a kayobe command to run extensions by playbook name.

User Story 3

Define a set of hook points (pre-overcloud host configure, post-overcloud service deploy, etc.). Provide a well-known location within kayobe-config (or other?) for hooks:

${KAYOBE_CONFIG_PATH}/hooks/*.yml

User story 4

Define the supported API of the kayobe python module. Allow extension through Cliff commands as python entry points under kayobe.cli.

Ansible galaxy roles can get out of sync

It's easy for ansible galaxy roles to get out of sync, especially as they tend not to be versioned. Kayobe may depend on features added in galaxy roles, but has no way of checking whether the roles in a given environment meet minimum requirements.

Enable deployment of Nova Compute hosts to provide virtual machines

Allow deployment of a Nova Compute node to act as a virtual machine host to allow deployment of standard virtual machines in the same region (and connected to the same network) as ironic bare metal hosts.

stackhpc / kayobe-original Goto Github PK

kayobe-original's Introduction

Kayobe

kayobe-original's People

Contributors

Stargazers

Watchers

Forkers

kayobe-original's Issues

Overview

Overview

Steps to reproduce

Expected results

Actual results

Steps to reproduce

Expected results

Actual results

Overview

Steps to reproduce

Expected results

Actual results

Environment

Overview

Enhancement

Overview

Steps to reproduce

Expected results

Actual results

Environment

Overview

User Stories

Proposal

User Story 1

User Story 2

User Story 3

User story 4

Recommend Projects

Recommend Topics

Recommend Org