tripleo-quickstart's People

Contributors

apevec, cubeek, dhoppe, dmsimard, dtantsur, frac, harryrybacki, judge-red, kambiz-aghaiepour, larsks, mandre, paramite, ryansb, sshnaidm, tosky, weshayutin

tripleo-quickstart's Issues

Default undercloud control plane network violates rfc5737

The default control plane network is 192.0.2.0/24, which is not supposed to be used; it is reserved for documentation purposes according to RFC 5737:

https://tools.ietf.org/html/rfc5737

"Addresses within the TEST-NET-1, TEST-NET-2, and TEST-NET-3 blocks
SHOULD NOT appear on the public Internet and are used without any
coordination with IANA or an Internet registry [RFC2050]. Network
operators SHOULD add these address blocks to the list of non-
routeable address spaces, and if packet filters are deployed, then
this address block SHOULD be added to packet filters.

These blocks are not for local use, and the filters may be used in
both local and public contexts."

It breaks tools that enforce the filters.
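A possible workaround, sketched under the assumption that the playbooks expose the CIDR as an overridable variable (the name undercloud_network_cidr here is hypothetical, not confirmed against the roles):

    ansible-playbook playbooks/quickstart.yml \
        -e undercloud_network_cidr=172.16.23.0/24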

Initial run fails because of missing gcc dependency

On a clean setup of CentOS 7, the first run fails when gcc is not present on the system:
Running setup.py install for pycrypto [528/1718]
checking for gcc... no
checking for cc... no
checking for cl.exe... no
configure: error: in `/root/.quickstart/build/pycrypto': configure: error: no acceptable C compiler found in $PATH. See `config.log' for more details
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/root/.quickstart/build/pycrypto/setup.py", line 456, in <module>
    core.setup(**kw)
  File "/usr/lib64/python2.7/distutils/core.py", line 152, in setup
    dist.run_commands()
  File "/usr/lib64/python2.7/distutils/dist.py", line 953, in run_commands
    self.run_command(cmd)
  File "/usr/lib64/python2.7/distutils/dist.py", line 972, in run_command
    cmd_obj.run()
  File "/root/.quickstart/lib/python2.7/site-packages/setuptools/command/install.py", line 53, in run
    return _install.run(self)
  File "/usr/lib64/python2.7/distutils/command/install.py", line 563, in run
    self.run_command('build')
  File "/usr/lib64/python2.7/distutils/cmd.py", line 326, in run_command
    self.distribution.run_command(command)
  File "/usr/lib64/python2.7/distutils/dist.py", line 972, in run_command
    cmd_obj.run()
  File "/usr/lib64/python2.7/distutils/command/build.py", line 127, in run
    self.run_command(cmd_name)
  File "/usr/lib64/python2.7/distutils/cmd.py", line 326, in run_command
    self.distribution.run_command(command)
  File "/usr/lib64/python2.7/distutils/dist.py", line 972, in run_command
    cmd_obj.run()
  File "/root/.quickstart/build/pycrypto/setup.py", line 251, in run
    self.run_command(cmd_name)
  File "/usr/lib64/python2.7/distutils/cmd.py", line 326, in run_command
    self.distribution.run_command(command)
  File "/usr/lib64/python2.7/distutils/dist.py", line 972, in run_command
    cmd_obj.run()
  File "/root/.quickstart/build/pycrypto/setup.py", line 278, in run
    raise RuntimeError("autoconf error")
RuntimeError: autoconf error
Complete output from command /root/.quickstart/bin/python -c "import setuptools;__file__='/root/.quickstart/build/pycrypto/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-kFPiwm-record/install-record.txt --single-version-externally-managed --install-headers /root/.quickstart/include/site/python2.7:
running install
running build
running build_py
creating build
...
running build_ext
running build_configure
checking for gcc... no
checking for cc... no
checking for cl.exe... no
configure: error: in `/root/.quickstart/build/pycrypto': configure: error: no acceptable C compiler found in $PATH. See `config.log' for more details

RuntimeError: autoconf error

Cleaning up...
...

$ yum install gcc # Problem fixed, initial run OK
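A pre-flight check along these lines would fail fast instead of dying deep inside pip (a sketch, not the project's actual code):

    # install gcc up front if it is missing (CentOS 7)
    command -v gcc > /dev/null 2>&1 || sudo yum -y install gcc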

OC deployment fails as controller has only 4GB RAM by default

Baremetal machine with 64GB RAM, CentOS 7, tripleo-quickstart ([minimal] virthost) with defaults, following the guide at https://github.com/redhat-openstack/tripleo-quickstart:
...
$ undercloud-install.sh
$ undercloud-post-install.sh
$ overcloud-deploy.sh # FAILED
...
Mar 10 15:45:04 overcloud-controller-0 os-collect-config[1951]: No hosts in Heat, nothing written.
Mar 10 15:45:04 overcloud-controller-0 os-collect-config[1951]: dib-run-parts Thu Mar 10 15:45:04 UTC 2016 51-hosts completed
Mar 10 15:45:04 overcloud-controller-0 os-collect-config[1951]: dib-run-parts Thu Mar 10 15:45:04 UTC 2016 Running /usr/libexec/os-refresh-config/configure.d/55-heat-config
Mar 10 15:45:04 overcloud-controller-0 os-collect-config[1951]: dib-run-parts Thu Mar 10 15:45:04 UTC 2016 55-heat-config completed
Mar 10 15:45:04 overcloud-controller-0 os-collect-config[1951]: ----------------------- PROFILING -----------------------
Mar 10 15:45:04 overcloud-controller-0 os-collect-config[1951]: Target: configure.d
Mar 10 15:45:04 overcloud-controller-0 os-collect-config[1951]: Script Seconds
Mar 10 15:45:04 overcloud-controller-0 os-collect-config[1951]: --------------------------------------- ----------
Mar 10 15:45:04 overcloud-controller-0 os-collect-config[1951]: 10-sysctl-apply-config 0.209
Mar 10 15:45:04 overcloud-controller-0 os-collect-config[1951]: 20-os-apply-config 0.183
...skipping...
2016-03-10 15:55:32 [0]: CREATE_IN_PROGRESS state changed [0/9923]
2016-03-10 15:55:32 [0]: SIGNAL_COMPLETE Unknown
2016-03-10 15:55:33 [ControllerDeployment]: SIGNAL_COMPLETE Unknown
2016-03-10 15:55:33 [0]: SIGNAL_COMPLETE Unknown
2016-03-10 15:55:33 [ControllerDeployment]: SIGNAL_COMPLETE Unknown
2016-03-10 15:55:34 [0]: SIGNAL_COMPLETE Unknown
2016-03-10 15:55:34 [NetworkDeployment]: SIGNAL_COMPLETE Unknown
2016-03-10 15:55:35 [0]: SIGNAL_COMPLETE Unknown
2016-03-10 15:56:49 [0]: SIGNAL_IN_PROGRESS Signal: deployment failed (6)
2016-03-10 15:56:49 [0]: CREATE_FAILED Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
2016-03-10 15:56:50 [overcloud-ControllerNodesPostDeployment-3mu4j4fs7ybn-ControllerOvercloudServicesDeployment_Step6-jruwoxkxdm32]: CREATE_FAILED Resource CREATE failed: Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
2016-03-10 15:56:51 [0]: SIGNAL_COMPLETE Unknown
2016-03-10 15:56:51 [ControllerOvercloudServicesDeployment_Step6]: CREATE_FAILED Error: resources.ControllerOvercloudServicesDeployment_Step6.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
2016-03-10 15:56:51 [0]: SIGNAL_COMPLETE Unknown
2016-03-10 15:56:52 [ControllerNodesPostDeployment]: CREATE_FAILED Error: resources.ControllerNodesPostDeployment.resources.ControllerOvercloudServicesDeployment_Step6.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
2016-03-10 15:56:52 [overcloud-ControllerNodesPostDeployment-3mu4j4fs7ybn]: CREATE_FAILED Resource CREATE failed: Error: resources.ControllerOvercloudServicesDeployment_Step6.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
2016-03-10 15:56:52 [0]: SIGNAL_COMPLETE Unknown
2016-03-10 15:56:52 [0]: SIGNAL_COMPLETE Unknown
2016-03-10 15:56:53 [overcloud]: CREATE_FAILED Resource CREATE failed: Error: resources.ControllerNodesPostDeployment.resources.ControllerOvercloudServicesDeployment_Step6.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
2016-03-10 15:56:53 [0]: SIGNAL_COMPLETE Unknown
2016-03-10 15:56:53 [0]: SIGNAL_COMPLETE Unknown
2016-03-10 15:56:53 [ControllerDeployment]: SIGNAL_COMPLETE Unknown
2016-03-10 15:56:54 [NetworkDeployment]: SIGNAL_COMPLETE Unknown
2016-03-10 15:56:54 [0]: SIGNAL_COMPLETE Unknown
Stack overcloud CREATE_FAILED
Deployment failed: Heat Stack create failed.

$ heat resource-list overcloud | grep -i failed
| ControllerNodesPostDeployment | a08b472c-ffe0-40c7-a029-073a4b5df79e | OS::TripleO::ControllerPostDeployment | CREATE_FAILED | 2016-03-10T15:41:24 |
heat resource-show overcloud ControllerNodesPostDeployment
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Property | Value |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| attributes | {} |
| creation_time | 2016-03-10T15:41:24 |
| description | |
| links | http://192.0.2.1:8004/v1/b592cbcada2946c8881feecd8b9b30a5/stacks/overcloud/51bb82b1-07d7-4222-8739-fec5effaafce/resources/ControllerNodesPostDeployment (self) |
| | http://192.0.2.1:8004/v1/b592cbcada2946c8881feecd8b9b30a5/stacks/overcloud/51bb82b1-07d7-4222-8739-fec5effaafce (stack) |
| | http://192.0.2.1:8004/v1/b592cbcada2946c8881feecd8b9b30a5/stacks/overcloud-ControllerNodesPostDeployment-3mu4j4fs7ybn/a08b472c-ffe0-40c7-a029-073a4b5df79e (nested) |
| logical_resource_id | ControllerNodesPostDeployment |
| physical_resource_id | a08b472c-ffe0-40c7-a029-073a4b5df79e |
| required_by | BlockStorageNodesPostDeployment |
| | CephStorageNodesPostDeployment |
| resource_name | ControllerNodesPostDeployment |
| resource_status | CREATE_FAILED |
| resource_status_reason | Error: resources.ControllerNodesPostDeployment.resources.ControllerOvercloudServicesDeployment_Step6.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6 |
| resource_type | OS::TripleO::ControllerPostDeployment |
| updated_time | 2016-03-10T15:41:24 |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

$ ssh heat-admin@
$ sudo journalctl -u os-collect-config # and near the end:
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Warning: Scope(Class[Ceilometer::Api]): The keystone_identity_uri parameter is deprecated. Please use identity_uri instead.
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Warning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Error: /Stage[main]/Apache::Service/Service[httpd]: Could not evaluate: Cannot allocate memory - fork(2)
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Warning: /Stage[main]/Keystone/Anchor[keystone_started]: Skipping because of failed dependencies
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Error: Could not prefetch keystone_tenant provider 'openstack': Cannot allocate memory - fork(2)
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Warning: /Stage[main]/Keystone::Roles::Admin/Keystone_tenant[service]: Skipping because of failed dependencies
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Warning: /Stage[main]/Keystone::Roles::Admin/Keystone_tenant[admin]: Skipping because of failed dependencies
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Error: Could not prefetch keystone_role provider 'openstack': Cannot allocate memory - fork(2)
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Warning: /Stage[main]/Keystone::Roles::Admin/Keystone_role[admin]: Skipping because of failed dependencies
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Error: Could not prefetch keystone_user provider 'openstack': Cannot allocate memory - fork(2)
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Warning: /Stage[main]/Keystone::Roles::Admin/Keystone_user[admin]: Skipping because of failed dependencies
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Warning: /Stage[main]/Keystone::Roles::Admin/Keystone_user_role[admin@admin]: Skipping because of failed dependencies
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Error: Could not prefetch keystone_service provider 'openstack': Cannot allocate memory - fork(2)
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Warning: /Stage[main]/Keystone::Endpoint/Keystone::Resource::Service_identity[keystone]/Keystone_service[keystone::identity]: Skipping
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Error: Could not prefetch keystone_endpoint provider 'openstack': Cannot allocate memory - fork(2)
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Warning: /Stage[main]/Keystone::Endpoint/Keystone::Resource::Service_identity[keystone]/Keystone_endpoint[regionOne/keystone::identity]
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Error: /Stage[main]/Heat::Api_cfn/Service[heat-api-cfn]: Could not evaluate: Cannot allocate memory - fork(2)
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Error: /Stage[main]/Heat::Engine/Service[heat-engine]: Could not evaluate: Cannot allocate memory - fork(2)
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Error: /Stage[main]/Heat::Api/Service[heat-api]: Could not evaluate: Cannot allocate memory - fork(2)
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Error: /Stage[main]/Heat::Api_cloudwatch/Service[heat-api-cloudwatch]: Could not evaluate: Cannot allocate memory - fork(2)

It looks like there is not enough memory for the minimal virthost OC controller setup.
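If so, bumping the controller's RAM in a local config override should avoid it (a sketch: control_memory is the style of variable the quickstart's config files use, and quickstart.sh's --config option is assumed; verify both against your checkout):

    # myconfig.yml -- give the controller more RAM than the 4GB default
    control_memory: 8192

    bash quickstart.sh --config myconfig.yml $VIRTHOST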

Introspection is not being run in CI

In the refactor of the CI playbooks, introspection was inadvertently dropped from the nonHA job. That allowed the IPA ramdisk we build in CI to stop working for introspection without anyone noticing. (I am pretty sure we are just missing the python-hardware dependency.)

We need to make sure to include introspection on at least one job.

quickstart undercloud installation issue


  • export ANSIBLE_CONFIG=/home/stack/.quickstart/usr/local/share/tripleo-quickstart/ansible.cfg
  • ANSIBLE_CONFIG=/home/stack/.quickstart/usr/local/share/tripleo-quickstart/ansible.cfg
  • export ANSIBLE_INVENTORY=/home/stack/.quickstart/hosts
  • ANSIBLE_INVENTORY=/home/stack/.quickstart/hosts
  • echo 'ssh_args = -F /home/stack/.quickstart/ssh.config.ansible'
  • RELEASE=mitaka
  • UNDERCLOUD_QCOW2_LOCATION=file:///cloud/vms/images/undercloud.qcow2
  • ansible-playbook -vv /home/stack/.quickstart/usr/local/share/tripleo-quickstart/playbooks/quickstart.yml --extra-vars url=file:///cloud/vms/images/undercloud.qcow2
    Using /home/stack/.quickstart/usr/local/share/tripleo-quickstart/ansible.cfg as config file
    [WARNING]: provided hosts list is empty, only localhost is available

3 plays in /home/stack/.quickstart/usr/local/share/tripleo-quickstart/playbooks/quickstart.yml

PLAY [Add virthost to inventory] ***********************************************

TASK [provision/manual : Create working_dir] ***********************************
ok: [localhost] => {"changed": false, "gid": 1000, "group": "stack", "mode": "0775", "owner": "stack", "path": "/home/stack/.quickstart", "secontext": "unconfined_u:object_r:user_home_t:s0", "size": 87, "state": "directory", "uid": 1000}

TASK [provision/manual : Create empty ssh config file] *************************
changed: [localhost] => {"changed": true, "dest": "/home/stack/.quickstart/ssh.config.ansible", "gid": 1000, "group": "stack", "mode": "0664", "owner": "stack", "secontext": "unconfined_u:object_r:user_home_t:s0", "size": 0, "state": "file", "uid": 1000}

TASK [provision/manual : Add the virthost to the inventory] ********************
creating host via 'add_host': hostname=host0
changed: [localhost] => {"add_host": {"groups": ["virthost"], "host_name": "host0", "host_vars": {"ansible_fqdn": "*****.ualberta.ca", "ansible_ssh_host": "cirrus.nic.ualberta.ca", "ansible_ssh_private_key_file": "/home/stack/.ssh/id_rsa", "ansible_ssh_user": "stack", "local_working_dir": "/home/stack/.quickstart"}}, "changed": true}

TASK [rebuild-inventory : rebuild-inventory] ***********************************
changed: [localhost] => {"changed": true, "checksum": "7f041b7ad7074f3700330a48a4c52b8282218255", "dest": "/home/stack/.quickstart/hosts", "gid": 1000, "group": "stack", "md5sum": "e64ff5b25e4eba5bc4955f6682684318", "mode": "0664", "owner": "stack", "secontext": "unconfined_u:object_r:user_home_t:s0", "size": 176, "src": "/home/stack/.ansible/tmp/ansible-tmp-1454365142.58-185300564260252/source", "state": "file", "uid": 1000}

PLAY [Setup undercloud and baremetal vms and networks in libvirt] **************

TASK [teardown/check : check if libvirt is running] ****************************
ok: [host0] => {"changed": false, "cmd": "rpm -qa libvirt && systemctl status libvirtd", "delta": "0:00:01.206875", "end": "2016-02-01 15:19:05.154115", "failed": false, "failed_when_result": false, "rc": 0, "start": "2016-02-01 15:19:03.947240", "stderr": "", "stdout": "libvirt-1.2.17-13.el7_2.2.x86_64\n● libvirtd.service - Virtualization daemon\n Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)\n Active: active (running) since Mon 2016-01-25 14:16:44 MST; 1 weeks 0 days ago\n Docs: man:libvirtd(8)\n http://libvirt.org\n Main PID: 3339 (libvirtd)\n CGroup: /system.slice/libvirtd.service\n ├─3143 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper\n ├─3144 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper\n └─3339 /usr/sbin/libvirtd\n\nJan 26 15:27:47 cirrus.nic.ualberta.ca libvirtd[3339]: Received unexpected event 3\nJan 26 15:27:47 cirrus.nic.ualberta.ca libvirtd[3339]: Received unexpected event 3\nJan 26 15:29:49 cirrus.nic.ualberta.ca libvirtd[3339]: Received unexpected event 3\nJan 26 15:29:49 ****.ualberta.ca libvirtd[3339]: Received unexpected event 3\nJan 26 15:37:22 ***.ualberta.ca libvirtd[3339]: Received unexpected event 3\nJan 26 15:37:22 ***.ualberta.ca libvirtd[3339]: Received unexpected event 3\nFeb 01 13:45:58 cirrus.nic.ualberta.ca libvirtd[3339]: Received unexpected event 3\nFeb 01 13:45:58 ***__.ualberta.ca libvirtd[3339]: Received unexpected event 3\nFeb 01 13:46:59 *.ualberta.ca libvirtd[3339]: Received unexpected event 3\nFeb 01 13:46:59 ***__.ualberta.ca libvirtd[3339]: Received unexpected event 3", "stdout_lines": ["libvirt-1.2.17-13.el7_2.2.x86_64", "● libvirtd.service - Virtualization daemon", " Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)", " Active: active (running) since Mon 2016-01-25 14:16:44 MST; 1 weeks 0 days ago", " Docs: man:libvirtd(8)", " http://libvirt.org", " Main PID: 3339 (libvirtd)", " CGroup: /system.slice/libvirtd.service", " ├─3143 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper", " ├─3144 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper", " └─3339 /usr/sbin/libvirtd", "", "Jan 26 15:27:47 *.ualberta.ca libvirtd[3339]: Received unexpected event 3", "Jan 26 15:27:47 ***__.ualberta.ca libvirtd[3339]: Received unexpected event 3", "Jan 26 15:29:49 cirrus.nic.ualberta.ca libvirtd[3339]: Received unexpected event 3", "Jan 26 15:29:49 ***.ualberta.ca libvirtd[3339]: Received unexpected event 3", "Jan 26 15:37:22 ***.ualberta.ca libvirtd[3339]: Received unexpected event 3"], "warnings": ["Consider using yum module rather than running rpm"]}
[WARNING]: Consider using yum module rather than running rpm

TASK [teardown/networks : Delete external network] *****************************
ok: [host0] => {"changed": false}

TASK [teardown/networks : Delete the OVS bridges] ******************************
failed: [host0] => (item={u'name': u'bridget'}) => {"cmd": "ovs-vsctl -t 5 br-exists bridget", "failed": true, "item": {"name": "bridget"}, "msg": "[Errno 2] No such file or directory", "rc": 2}

PLAY RECAP *********************************************************************
host0 : ok=2 changed=0 unreachable=0 failed=1
localhost : ok=4 changed=3 unreachable=0 failed=0
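The final failure suggests ovs-vsctl simply is not installed on the virthost ("No such file or directory" when executing it). If so, installing and starting Open vSwitch before re-running should get past it (a likely fix, not a confirmed one):

    sudo yum -y install openvswitch
    sudo systemctl start openvswitch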

Add a ci test for teardown operations

We're currently running teardown tasks against a "clean" system, so there's nothing to tear down. We should have a job that:

  • Gets as far as starting the undercloud
  • Cleans everything up (libvirt/teardown and environment/teardown)
  • Re-provisions and boots the undercloud

image building: decouple I.B. playbook setup/config from quickstart proper

Presently the image building playbooks reuse too much of the libvirt/networking setup and config from the quickstart roles. For example, there's no need to set up overcloud networks to build the images. It also makes the image build workflows prone to breaking on setup/config for reasons unrelated to image building, and it increases the time needed to build an image. As we explore ways to leverage tripleo-quickstart's appliance-based undercloud in (potentially) more projects for CI, being able to quickly, simply, and cleanly build images will become increasingly important. I've got some deltas locally to address this, testing now. Planning to post a review shortly.

Use more tags

Need to make better use of task tagging throughout the oooq roles.
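For example, tasks tagged in the roles can then be selected from the command line (task and tag names here are illustrative, not a list of what oooq currently defines):

    # roles/.../tasks/main.yml
    - name: create working directory
      file: path={{ working_dir }} state=directory
      tags:
        - provision

    # run only provisioning tasks, or skip validation entirely:
    ansible-playbook playbooks/quickstart.yml --tags provision
    ansible-playbook playbooks/quickstart.yml --skip-tags validate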

Interrupting quickstart.sh results in erroneous output

If one interrupts quickstart.sh via ^C, it erroneously displays the help output about how to log in to the undercloud and continue the deploy. We need to figure out why the ^C results in a failure code from ansible-playbook (which then causes the script to exit because of our set -e).
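One possible direction (a sketch, not the script's current behavior): trap SIGINT explicitly so an interrupt exits before the summary is printed:

    trap 'echo "Interrupted." >&2; exit 130' INT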

Thinking about the tripleo-quickstart workflow

I was thinking about workflow today, and about this diagram in particular:

The stuff in green in that diagram should be runnable as a completely unprivileged user. Ideally, if you have a host that is already virt capable and a couple of configured bridges you could just run the playbooks against localhost with a local connection and end up with a fully deployed overcloud.
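For reference, that localhost mode would look something like this (playbook path illustrative):

    ansible-playbook -i localhost, -c local playbooks/quickstart.yml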

I am interested in opinions on the above, and in particular whether or not the provision role as described makes sense, and whether it needs an unprovision counterpart.

Running with virthost=localhost may lead to sadness and doom

If someone tries running quickstart against an inventory file like this:

[virthost]
localhost ansible_user=admin

If they are not logged in as the admin user on the machine where they are running the quickstart, confusion will result: the playbooks assume that delegate_to: localhost will run as the current user, but in this situation that isn't true, and they will get permission errors when tasks try to write to /home/currentuser/.quickstart.

The solution right now is "don't target localhost as a different user".

Requires jinja2 2.8 which is not packaged in CentOS

There is (at least) one dependency on Jinja2 2.8 functionality, which leads to the following error when the system jinja2 is used (python-jinja2-2.7.2-2.el7.noarch):

TASK [setup/overcloud_nodes : define baremetal vms] ****************************
Thursday 10 March 2016 13:42:25 +0200 (0:00:03.792) 0:00:43.142 ********
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: TemplateRuntimeError: no test named 'equalto'
fatal: [host0]: FAILED! => {"failed": true, "stdout": ""}

It would be nice if the playbooks could work with the system-provided jinja2.
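For illustration (variable names hypothetical): the equalto test only exists in Jinja2 >= 2.8, but the same filtering can often be rewritten with a loop condition that Jinja2 2.7 supports:

    Jinja2 >= 2.8 only:
        {{ nodes | selectattr('flavor', 'equalto', 'control') | list }}
    Jinja2 2.7-compatible:
        {% for node in nodes if node.flavor == 'control' %}{{ node.name }} {% endfor %}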

Baremetal support

Let's track here the work needed to get initial baremetal support going:

  • Focus initially on a virtual undercloud deployed in a VM which will deploy baremetal nodes
  • Need to make the instackenv.json file overridable, so it can be passed to quickstart.sh
  • Networking will need to be customizable

undercloud.qcow2.md5 should not contain absolute path

As documented at [1], it's a good idea to verify the md5sum of the downloaded undercloud image. However, when executing md5sum -c undercloud.qcow2.md5, the check will fail:

[root@rdo mitaka]# md5sum -c undercloud.qcow2.md5
md5sum: /tmp/oooq-images/undercloud.qcow2: No such file or directory
/tmp/oooq-images/undercloud.qcow2: FAILED open or read

This is because the undercloud.qcow2.md5 file [2] includes an absolute path after the checksum:

6c7dc0e0b14b7b3deff0da284594d778 /tmp/oooq-images/undercloud.qcow2

Instead of just the file name as expected, like this:

6c7dc0e0b14b7b3deff0da284594d778 undercloud.qcow2
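Two possible fixes, sketched: generate the checksum with a relative name on the build side, or strip the path on the consumer side:

    # build side: run md5sum from inside the image directory
    cd /tmp/oooq-images && md5sum undercloud.qcow2 > undercloud.qcow2.md5

    # consumer side: drop the directory prefix before checking
    sed 's|/tmp/oooq-images/||' undercloud.qcow2.md5 | md5sum -c -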


[1] https://www.rdoproject.org/rdo-manager/
[2] https://ci.centos.org/artifacts/rdo/images/mitaka/delorean/stable/undercloud.qcow2.md5

Cross-distribution support

I have a dream...that one day, tripleo-quickstart will be able to run somewhere besides EL7 distributions.

Anything impacting this topic should refer to this issue.

We should use the `package` module consistently

In order to make #15 not completely crazy we need to use the package module exclusively for package installation. We're still using yum in a few places (we're using both in the same role in at least one place).
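A minimal before/after sketch (task shape illustrative):

    # before: tied to yum-based distributions
    - name: install libvirt
      yum: name=libvirt state=present

    # after: delegates to whatever package manager the platform uses
    - name: install libvirt
      package: name=libvirt state=present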

Add a default .ssh/config on the undercloud

It would be quite nice to have a .ssh/config for the stack user on the undercloud so that hopping on the nodes manually is a bit quicker for the senile amongst us:

    Host 192.0.2.*
        User heat-admin
        StrictHostKeyChecking no
        UserKnownHostsFile=/dev/null

virsh list --all doesn't see any VM

Hello,
with a recent clean deployment of tripleo-quickstart (73d543d) on CentOS 7, virsh shows no VMs on $VIRTHOST; something is really wrong here:

$ virsh list --all                                         
 Id    Name                           State
----------------------------------------------------

I can access undercloud, though:

$ ssh -F /root/.quickstart/ssh.config.ansible undercloud
# OK
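The empty virsh output may just be a matter of which libvirt instance is being queried (an educated guess, not a confirmed diagnosis): the quickstart defines its VMs under the non-root user's session libvirt, so a root/system connection shows nothing. Worth trying as the provisioned user (often stack):

    virsh --connect qemu:///session list --all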

Then I get errors in OC deployment stage:

exec openstack overcloud deploy -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml --control-scale 3 --compute-scale 2 --ntp-server pool.ntp.org --templates --libvirt-type qemu --control-flavor oooq_control --compute-flavor oooq_compute --ceph-storage-flavor oooq_ceph --timeout 60 -e /tmp/deploy_env.yaml
Error: only 1 of 3 requested ironic nodes are tagged to profile oooq_control (for flavor oooq_control)
Recommendation: tag more nodes using ironic node-update <NODE ID> replace properties/capabilities=profile:oooq_control,boot_option:local
Error: only 1 of 2 requested ironic nodes are tagged to profile oooq_compute (for flavor oooq_compute)
Recommendation: tag more nodes using ironic node-update <NODE ID> replace properties/capabilities=profile:oooq_compute,boot_option:local
Configuration has 2 errors, fix them before proceeding. Ignoring these errors is likely to lead to a failed deploy.
Deployment failed:  Not enough nodes - available: 2, requested: 5

ImportError: No module named markupsafe

Clean CentOS 7 and tripleo-quickstart; it seems like a missing Python virtualenv dependency:

$ bash quickstart.sh $VIRTHOST
...

Traceback (most recent call last):
  File "/root/.quickstart/bin/ansible-playbook", line 72, in <module>
    mycli = getattr(__import__("ansible.cli.%s" % sub, fromlist=[myclass]), myclass)
  File "/root/.quickstart/lib/python2.7/site-packages/ansible/cli/playbook.py", line 30, in <module>
    from ansible.executor.playbook_executor import PlaybookExecutor
  File "/root/.quickstart/lib/python2.7/site-packages/ansible/executor/playbook_executor.py", line 30, in <module>
    from ansible.executor.task_queue_manager import TaskQueueManager
  File "/root/.quickstart/lib/python2.7/site-packages/ansible/executor/task_queue_manager.py", line 29, in <module>
    from ansible.executor.play_iterator import PlayIterator
  File "/root/.quickstart/lib/python2.7/site-packages/ansible/executor/play_iterator.py", line 29, in <module>
    from ansible.playbook.block import Block
  File "/root/.quickstart/lib/python2.7/site-packages/ansible/playbook/__init__.py", line 25, in <module>
    from ansible.playbook.play import Play
  File "/root/.quickstart/lib/python2.7/site-packages/ansible/playbook/play.py", line 27, in <module>
    from ansible.playbook.base import Base
  File "/root/.quickstart/lib/python2.7/site-packages/ansible/playbook/base.py", line 32, in <module>
    from jinja2.exceptions import UndefinedError
  File "/root/.quickstart/lib/python2.7/site-packages/jinja2/__init__.py", line 33, in <module>
    from jinja2.environment import Environment, Template
  File "/root/.quickstart/lib/python2.7/site-packages/jinja2/environment.py", line 13, in <module>
    from jinja2 import nodes
  File "/root/.quickstart/lib/python2.7/site-packages/jinja2/nodes.py", line 19, in <module>
    from jinja2.utils import Markup
  File "/root/.quickstart/lib/python2.7/site-packages/jinja2/utils.py", line 531, in <module>
    from markupsafe import Markup, escape, soft_unicode
ImportError: No module named markupsafe

$ source .quickstart/bin/activate
$ pip install markupsafe
$ deactivate
$ bash quickstart.sh $VIRTHOST # even with --system-site-packages

OK

improve documentation for tags

The addition of tags is super useful and totally the right thing to do (thanks!)

That said, to an oooq newcomer (we want these folks!): from ./quickstart.sh --help it's clear that we have tags, but we don't tell folks what they are. If they dig into the .sh or walk through the playbooks (and know what to look for), the tags are "discoverable", but it would make sense to have --help list the available tags. I realize that static text for --help causes a potential drift issue later on. IMHO what would be ideal is if the tags found in the playbooks could be dynamically discovered and listed... but that's perhaps getting too elaborate and not pragmatically worth the effort. This issue is just to capture the idea, not to infer a solution.

Somewhere between docs, --help, and [uber solution] we should be able to find something workable.
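In the meantime, ansible-playbook can already enumerate tags by itself, which --help could simply shell out to:

    ansible-playbook playbooks/quickstart.yml --list-tags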

quickstart should be runnable without privileged access

Currently, the quickstart requires root privileges on the target host and makes a number of system configuration changes. This makes some folks (understandably) nervous and may inhibit the adoption of the tool.

Most of the privileged access is required to configure networking.

We ought to be able to complete just about everything through the use of qemu user mode networking. There are a few options:

  • An unprivileged user can attach to pre-configured host bridges through the use of the qemu-bridge-helper tool. With the packages shipped in RHEL/CentOS/Fedora/etc, this will Just Work with the virbr0 bridge created by the libvirt default network. We can take advantage of this to create inbound access into the virtual environment.
  • danpb suggested mcast networks. These are simple to set up, but getting them to work does require administrative access on the host to (a) enable multicasting on the loopback interface (ip link set dev lo multicast on) and (b) add the necessary routes so that the multicast traffic uses the lo interface.
  • An unprivileged user can create tunneled networks between guests. There are various options for this, but the best may end up being tcp tunnels. These are point-to-point networks between individual guests, so you need one interface per guest. In this model, we would probably treat the undercloud node as a "switch"; we would create a bridge in the undercloud for each virtual network, and then plug all the other guests in as necessary.
  • qemu also supports "user" networks. With libvirt there can be only one. User networks provide outbound access from the guest to the host or to the internet, but do not provide for any inbound access.

I kind of like the tcp model.
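For reference, the qemu-bridge-helper option from the first bullet looks roughly like this (a sketch; the image name is illustrative, and /etc/qemu/bridge.conf must contain an "allow virbr0" line):

    qemu-system-x86_64 -m 2048 -hda undercloud.qcow2 \
        -netdev bridge,id=net0,br=virbr0 \
        -device virtio-net-pci,netdev=net0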

virsh pool-undefine oooq_pool seems to fail from time to time

fatal: [host0]: FAILED! => {"changed": true, "cmd": ["virsh", "pool-undefine", "oooq_pool"], "delta": "0:00:00.029126", "end": "2016-03-15 18:23:19.744160", "failed": true, "invocation": {"module_args": {"_raw_params": "virsh pool-undefine \"oooq_pool\"", "_uses_shell": false, "chdir": null, "creates": null, "executable": null, "removes": null, "warn": true}, "module_name": "command"}, "rc": 1, "start": "2016-03-15 18:23:19.715034", "stderr": "error: Failed to undefine pool oooq_pool\nerror: internal error: no config file for oooq_pool", "stdout": "", "stdout_lines": [], "warnings": []}

A "rm -rf /run/user/1003/libvirt/" on the host, fixes it

Wrong undercloud post install script name in setup summary

In the setup summary at the end of the script, there are hints on how to proceed with the deployment. One of the script names mentioned is
"undercloud-install.post.sh will perform all pre-deploy steps"
while the actual name of the script is undercloud-post-install.sh.

Install & run dstat

When running tripleo-quickstart in a CI environment, it can be useful to get live system statistics.
dstat is an excellent tool for that; OpenStack CI already uses it.

sudo dnf -y install dstat
sudo dstat -tcmndrylpg --top-cpu-adv --top-io-adv --nocolor | sudo tee --append /var/log/dstat.log > /dev/null &

Create an optional bootstrap playbook

The instructions have become a bit long with the need to pre-download the image. It would be good to codify these extra steps into a "bootstrap" playbook that is run when a --bootstrap option is passed to quickstart.sh.

Ideally we would want things to work with just:

export VIRTHOST='foo.example.com'
quickstart.sh --bootstrap

Do gate jobs need to perform an overcloud deploy?

The gate jobs take a looooong time to complete. Do they actually need to perform a full overcloud deploy? Can we have separate short-running jobs that only get as far as booting the undercloud node so that we can get faster feedback on changes (while still testing out the full overcloud deploy)?

image building: make logs / output more accessible and usable

Currently the output from the playbook that builds images is a bit of a land mine. In the case of the output from DIB when it creates the overcloud image, it ends up being > 400,000 characters on a single line. This literally makes Atom, Emacs, PyCharm, Gedit, Kate, vim, etc. go into a death loop trying to parse the line. If one waits far longer than I have patience for (minutes), eventually you get control back. Interim solutions are to not use the ansible output directly, but rather pass it through sed first to turn the literal \n escapes into real newlines, open the logfile that's captured, etc. It's a low priority issue, but the first time it happened to me I thought "Ouch... that hurt..." As this project catches on, I think anything that increases friction for developers/newcomers should be fixed where possible. In addition, it would be pretty cool to have a nice summary of what was built, the source of the binaries, etc. as part of the basic output. I'm happy to propose deltas to address this (next week at the earliest).
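The sed workaround mentioned above, spelled out (GNU sed; the log path is illustrative):

    sed 's/\\n/\n/g' ansible.log > ansible-readable.log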

Undercloud post install fails if flavors already exist

In undercloud-install-post.sh, we attempt to avoid failures by deleting flavors before (re-)creating them, like this:

openstack flavor delete oooq_{{ name }} > /dev/null 2>&1 || true
openstack flavor create --id auto ... oooq_{{ name }}

But what actually happens is that the post-install script fails with:

+ openstack flavor create --id auto --ram 4096 --disk 49 --vcpus 1 oooq_control
Flavor with name oooq_control already exists. (HTTP 409) (Request-ID: req-55b34a97-f21d-41fd-b69d-e46829811e6c)

Because:

$ openstack flavor delete oooq_control
public endpoint for messaging service not found

This looks like it's due to a bug in python-zaqarclient.
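Until the zaqarclient bug is resolved, a hedged workaround is to skip the delete entirely and only create missing flavors (values taken from the log above):

    # only create the flavor if 'show' says it doesn't exist yet
    openstack flavor show oooq_control > /dev/null 2>&1 || \
        openstack flavor create --id auto --ram 4096 --disk 49 --vcpus 1 oooq_control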

Stuck task if no space on device

There is an issue when running the virt-resize task on the downloaded image with little space left on the root partition (as is the default in CentOS/Fedora installations).
When no space is left on the partition, the task hangs and never finishes. I see 2 points here:

  1. Make it possible to define vm.volume_path in the playbooks. Maybe it's possible to pass it with --extra-var and "hash_behavior=merge" in ansible.cfg, which allows merging vars instead of overwriting them.
  2. Add a timeout to the task so that it fails after a specific time instead of hanging forever (see the sketch below). Maybe it's worth giving every task a timeout by default.
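A minimal sketch of point 2 using Ansible's async support (command and values illustrative):

    - name: resize undercloud image
      command: virt-resize --expand /dev/sda1 undercloud.qcow2 undercloud-resized.qcow2
      async: 1800   # give up after 30 minutes
      poll: 30      # check progress every 30 seconds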

When the undercloud vm is already running, quickstart fails

Just noticed that if the undercloud vm is already running (because of a previously failed job or for whatever reason), the deployment bails out with:
TASK [setup/undercloud : start undercloud vm] **********************************
Thursday 03 March 2016 10:31:40 -0500 (0:00:00.516) 0:03:31.308 ********
failed: [host0] => (item={u'flavor': u'undercloud', u'name': u'undercloud'}) => {"failed": true, "item": {"flavor": "undercloud", "name": "undercloud"}, "msg": "Requested operation is not valid: domain is already running"}

Generate names in instackenv.json

TripleO understands a "name" field for nodes in instackenv.json. It's much easier to refer to nodes by e.g. node-%d (the same as devstack, btw) than by their UUID.
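Something like this per node entry (fields abbreviated, values illustrative):

    { "name": "node-0", "pm_type": "pxe_ssh", "mac": ["52:54:00:aa:bb:cc"], ... }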

Need to set python_interpreter for roles other than rebuild_inventory

When running...

quickstart.sh localhost

The deploy fails with:

TASK [setup/undercloud : Generate ssh configuration] ***************************
Sunday 13 March 2016  10:52:20 -0400 (0:00:00.038)       0:21:41.223 ********** 
fatal: [host0 -> localhost]: FAILED! => {"changed": true, "failed": true, "msg": "Aborting, target uses selinux but python bindings (libselinux-python) aren't installed!"}

We may want to set ansible_python_interpreter at a higher level...either by restoring the group_vars/all.yml file, or by just passing it on the ansible-playbook command line via -e.
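The latter option, spelled out (a sketch):

    ansible-playbook playbooks/quickstart.yml \
        -e ansible_python_interpreter=/usr/bin/python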

flavor disk size too small to deploy overcloud

Following the instructions at:
https://www.rdoproject.org/testday/mitaka/milestone3/
https://www.rdoproject.org/rdo-manager/
http://docs.openstack.org/developer/tripleo-docs/basic_deployment/basic_deployment_cli.html#upload-images
(on CentOS 7, without any special configuration), the overcloud deployment will fail when executing openstack overcloud deploy --templates.

The error given was:

2016-03-02 13:51:58 [Controller]: CREATE_IN_PROGRESS state changed
2016-03-02 13:52:03 [NovaCompute]: CREATE_FAILED ResourceInError: resources.NovaCompute: Went to status ERROR due to "Message: Unknown, Code: Unknown"
2016-03-02 13:52:04 [NovaCompute]: DELETE_IN_PROGRESS state changed
2016-03-02 13:52:07 [Controller]: CREATE_FAILED ResourceInError: resources.Controller: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-03-02 13:52:07 [Controller]: DELETE_IN_PROGRESS state changed
2016-03-02 13:52:09 [NovaCompute]: DELETE_COMPLETE state changed
2016-03-02 13:52:11 [Controller]: DELETE_COMPLETE state changed

And in /var/log/nova/nova-scheduler.log:

2016-03-02 13:52:02.080 15034 DEBUG nova.scheduler.filters.disk_filter [req-032ace83-c834-4174-bac2-7c7cafde82e2 814e8c92f5af4aafbdf4b31c8fc23c2d 3b74005453a34fa5a06b02555b45a5a8 - - -] (undercloud, 305b7ed6-3c99-47a9-a201-837b86848f3c) ram:4096 disk:39936 io_ops:0 instances:0 does not have 40960 MB usable disk, it only has 39936.0 MB usable disk. host_passes /usr/lib/python2.7/site-packages/nova/scheduler/filters/disk_filter.py:55
2016-03-02 13:52:02.080 15034 INFO nova.filters [req-032ace83-c834-4174-bac2-7c7cafde82e2 814e8c92f5af4aafbdf4b31c8fc23c2d 3b74005453a34fa5a06b02555b45a5a8 - - -] Filter DiskFilter returned 0 hosts

So the node disks should be (slightly) bigger, or the flavor (slightly) smaller.
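A hedged interim workaround, derived from the scheduler log above (the node exposes 39936 MB but the flavor asks for 40960 MB): recreate the offending flavor with a 39 GB disk so it fits (flavor name and values illustrative):

    openstack flavor delete baremetal
    openstack flavor create --id auto --ram 4096 --disk 39 --vcpus 1 baremetal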

Better handling of persistent network configuration

Someone should be able to reboot their virtualization host without completely hosing their undercloud. This means that we need to make network configuration, such as host bridges, persistent.

If we're setting everything up as libvirt networks we get this for free (yay!). If we're mucking about directly with things like brctl or ovs-vsctl we may be in for sadness and disappointment.

This intersects with both #18 and #10.
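The libvirt-network route gives persistence with the stock tooling, e.g. (network name and XML file illustrative):

    virsh net-define overcloud-net.xml
    virsh net-autostart overcloud
    virsh net-start overcloud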

Where oh where have my ansible tags gone...

Because of rebasing and working in parallel on the same sets of files, I think we may have lost some tags related to @trown's refactoring of quickstart.sh. With https://review.gerrithub.io/#/c/265712/24, running quickstart.yml finishes like this:

PLAY [Install undercloud and deploy overcloud] *********************************

PLAY RECAP *********************************************************************
host0                      : ok=77   changed=38   unreachable=0    failed=0
localhost                  : ok=6    changed=2    unreachable=0    failed=0

That is, after booting the undercloud, but before running any tasks on
the undercloud host. Then quickstart.sh says:

##################################
Virtual Environment Setup Complete
##################################

Access the undercloud by:

    ssh -F /home/lars/.quickstart/ssh.config.ansible undercloud

There are scripts in the home directory to continue the deploy:

    undercloud-install.sh will run the undercloud install
    undercloud-post-install.sh will perform all pre-deploy steps
    overcloud-deploy.sh will deploy the overcloud
    overcloud-deploy-post.sh will do any post-deploy configuration
    overcloud-validate.sh will run post-deploy validation

But in fact none of those scripts exist:

ssh -F /home/lars/.quickstart/ssh.config.ansible undercloud
Last login: Sat Mar 12 00:28:53 2016 from 192.168.23.1
[stack@undercloud ~]$ ls
instackenv.json                ironic-python-agent.kernel  overcloud-full.qcow2    undercloud.conf
ironic-python-agent.initramfs  overcloud-full.initrd       overcloud-full.vmlinuz
[stack@undercloud ~]$ 

We probably just need to put appropriate tags back into the tripleo/* roles. This is just here so I don't forget.

Script is not able to deal with switched parameters [notabug]

This is probably not a bug, but maybe suggestion for improvement in future:

According to the documentation, one has to call the script the following way:
bash quickstart.sh -u $UNDERCLOUD_QCOW2_LOCATION $VIRTHOST

I know even help says:
quickstart.sh: usage: quickstart.sh [options] virthost [release]
...

But I was accidentally able to issue:
bash quickstart.sh $VIRTHOST -u $UNDERCLOUD_QCOW2_LOCATION

which failed with this error:
TASK [setup/undercloud : get undercloud image expected checksum] ***************
task path: /root/.quickstart/tripleo-quickstart/playbooks/roles/libvirt/setup/undercloud/tasks/fetch_image.yml:14
Friday 11 March 2016 11:20:55 +0100 (0:00:00.176) 0:00:20.002 **********
fatal: [host0]: FAILED! => {"changed": true, "cmd": ["curl", "-sf", "https://ci.centos.org/artifacts/rdo/images/-u/delorean/stable/undercloud.qcow2.md5"], "delta": "0:00:00.518934", "end": "2016-03-11 11:20:55.796695", "failed": true, "invocation": {"module_args": {"_raw_params": "curl -sf https://ci.centos.org/artifacts/rdo/images/-u/delorean/stable/undercloud.qcow2.md5", "_uses_shell": false, "chdir": null, "creates": null, "executable": null, "removes": null, "warn": true}, "module_name": "command"}, "rc": 22, "start": "2016-03-11 11:20:55.277761", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": ["Consider using get_url module rather than running curl"]}

I assume the url parameter (note the incorrect "-u" inside the URL above) was not propagated correctly, since the script did not expect the different parameter order.
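A sketch of stricter parsing that would catch this (all names illustrative): parse options first, then refuse anything option-like among the positionals:

    while [ $# -gt 0 ]; do
        case "$1" in
            -u|--url) url=$2; shift 2 ;;
            -*)       echo "unknown option: $1" >&2; exit 1 ;;
            *)        break ;;
        esac
    done
    for arg in "$@"; do
        case "$arg" in
            -*) echo "option '$arg' must come before the virthost argument" >&2; exit 1 ;;
        esac
    done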

Collect more logs in CI

We need to collect logs from the virthost itself for image building. We should probably stop depending on khaleesi for log collection and just fork what is there now into oooq.

image building: setup/config for DIB needs to be extendable/tweakable

This is an instance of: #18

The prep needed to run diskimage-builder on the virthost needs to be refactored to allow for portability. The list of prerequisite packages needs to be changeable. In addition, it presently allows for installing only one RPM; this needs to be extended via a script, or a list of packages plus a script, etc. For example, images might be built for rpm/rhos/rhel or some other packaging format. As this could be used in the context of CI for a particular big tent (or otherwise) OpenStack project, some simple scripting might be necessary. Scenarios include:

  • pull changes from gerrit
  • run a release tool that handles repo/package configuration
  • install/start diagnostic tooling.

There is a proposed change, currently integrating feedback:

https://review.gerrithub.io/#/c/265857/

quickstart.sh shouldn't require privileges

With 6bd4368, the quickstart.sh script was transformed from something that can be run as a non-root user into something that must be run as root (or via sudo), because it now tries to install packages. This introduces a variety of complications:

  • If you're running ansible from a virtualenv, this just ruined your plans.
  • Is this going to result in populating your home directory with root-owned files?
  • Or is this going to put all the quickstart generated files in root's home directory? Neither of these two options is really a good idea.

I think putting these yum install commands in quickstart.sh was a move in the wrong direction. A better solution would be to error out with an appropriate error message if the commands aren't available (see the sketch below).

We should be moving away from requiring additional privileges.
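Something along these lines (the command list is illustrative):

    for cmd in virsh qemu-img ansible-playbook; do
        command -v "$cmd" > /dev/null 2>&1 || {
            echo "ERROR: '$cmd' is required but not installed." >&2
            exit 1
        }
    done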
