redhat-openstack / tripleo-quickstart
Ansible roles for setting up TripleO virtual environments and building images
The default control plane network is 192.0.2.0/24, which is not supposed to be used; it is reserved for documentation purposes according to RFC 5737:
https://tools.ietf.org/html/rfc5737
"Addresses within the TEST-NET-1, TEST-NET-2, and TEST-NET-3 blocks
SHOULD NOT appear on the public Internet and are used without any
coordination with IANA or an Internet registry [RFC2050]. Network
operators SHOULD add these address blocks to the list of non-
routeable address spaces, and if packet filters are deployed, then
this address block SHOULD be added to packet filters.
These blocks are not for local use, and the filters may be used in
both local and public contexts."
Using it breaks setups where tools enforce those filters.
Clean setup of CentOS 7; the first-time run fails without gcc present on the system:
Running setup.py install for pycrypto [528/1718]
checking for gcc... no
checking for cc... no
checking for cl.exe... no
configure: error: in `/root/.quickstart/build/pycrypto':
configure: error: no acceptable C compiler found in $PATH
See `config.log' for more details
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/root/.quickstart/build/pycrypto/setup.py", line 456, in <module>
core.setup(**kw)
File "/usr/lib64/python2.7/distutils/core.py", line 152, in setup
dist.run_commands()
File "/usr/lib64/python2.7/distutils/dist.py", line 953, in run_commands
self.run_command(cmd)
File "/usr/lib64/python2.7/distutils/dist.py", line 972, in run_command
cmd_obj.run()
File "/root/.quickstart/lib/python2.7/site-packages/setuptools/command/install.py", line 53, in run
return _install.run(self)
File "/usr/lib64/python2.7/distutils/command/install.py", line 563, in run
self.run_command('build')
File "/usr/lib64/python2.7/distutils/cmd.py", line 326, in run_command
self.distribution.run_command(command)
File "/usr/lib64/python2.7/distutils/dist.py", line 972, in run_command
cmd_obj.run()
File "/usr/lib64/python2.7/distutils/command/build.py", line 127, in run
self.run_command(cmd_name)
File "/usr/lib64/python2.7/distutils/cmd.py", line 326, in run_command
self.distribution.run_command(command)
File "/usr/lib64/python2.7/distutils/dist.py", line 972, in run_command
cmd_obj.run()
File "/root/.quickstart/build/pycrypto/setup.py", line 251, in run
self.run_command(cmd_name)
File "/usr/lib64/python2.7/distutils/cmd.py", line 326, in run_command
self.distribution.run_command(command)
File "/usr/lib64/python2.7/distutils/dist.py", line 972, in run_command
cmd_obj.run()
File "/root/.quickstart/build/pycrypto/setup.py", line 278, in run
raise RuntimeError("autoconf error")
RuntimeError: autoconf error
Complete output from command /root/.quickstart/bin/python -c "import setuptools;__file__='/root/.quickstart/build/pycrypto/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-kFPiwm-record/install-record.txt --single-version-externally-managed --install-headers /root/.quickstart/include/site/python2.7:
running install
running build
running build_py
creating build
...
running build_ext
running build_configure
checking for gcc... no
checking for cc... no
checking for cl.exe... no
configure: error: in `/root/.quickstart/build/pycrypto':
configure: error: no acceptable C compiler found in $PATH
See `config.log' for more details
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/root/.quickstart/build/pycrypto/setup.py", line 456, in <module>
core.setup(**kw)
File "/usr/lib64/python2.7/distutils/core.py", line 152, in setup
dist.run_commands()
File "/usr/lib64/python2.7/distutils/dist.py", line 953, in run_commands
self.run_command(cmd)
File "/usr/lib64/python2.7/distutils/dist.py", line 972, in run_command
cmd_obj.run()
File "/root/.quickstart/lib/python2.7/site-packages/setuptools/command/install.py", line 53, in run
return _install.run(self)
File "/usr/lib64/python2.7/distutils/command/install.py", line 563, in run
self.run_command('build')
File "/usr/lib64/python2.7/distutils/cmd.py", line 326, in run_command
self.distribution.run_command(command)
File "/usr/lib64/python2.7/distutils/dist.py", line 972, in run_command
cmd_obj.run()
File "/usr/lib64/python2.7/distutils/command/build.py", line 127, in run
self.run_command(cmd_name)
File "/usr/lib64/python2.7/distutils/cmd.py", line 326, in run_command
self.distribution.run_command(command)
File "/usr/lib64/python2.7/distutils/dist.py", line 972, in run_command
cmd_obj.run()
File "/root/.quickstart/build/pycrypto/setup.py", line 251, in run
self.run_command(cmd_name)
File "/usr/lib64/python2.7/distutils/cmd.py", line 326, in run_command
self.distribution.run_command(command)
File "/usr/lib64/python2.7/distutils/dist.py", line 972, in run_command
cmd_obj.run()
File "/root/.quickstart/build/pycrypto/setup.py", line 278, in run
raise RuntimeError("autoconf error")
Cleaning up...
...
$ yum install gcc # Problem fixed, initial run OK
Baremetal machine with 64 GB RAM, CentOS 7, tripleo-quickstart ([minimal] virthost) with defaults, following the guide at https://github.com/redhat-openstack/tripleo-quickstart:
...
$ undercloud-install.sh
$ undercloud-post-install.sh
$ overcloud-deploy.sh # FAILED
...
Mar 10 15:45:04 overcloud-controller-0 os-collect-config[1951]: No hosts in Heat, nothing written.
Mar 10 15:45:04 overcloud-controller-0 os-collect-config[1951]: dib-run-parts Thu Mar 10 15:45:04 UTC 2016 51-hosts completed
Mar 10 15:45:04 overcloud-controller-0 os-collect-config[1951]: dib-run-parts Thu Mar 10 15:45:04 UTC 2016 Running /usr/libexec/os-refresh-config/configure.d/55-heat-config
Mar 10 15:45:04 overcloud-controller-0 os-collect-config[1951]: dib-run-parts Thu Mar 10 15:45:04 UTC 2016 55-heat-config completed
Mar 10 15:45:04 overcloud-controller-0 os-collect-config[1951]: ----------------------- PROFILING -----------------------
Mar 10 15:45:04 overcloud-controller-0 os-collect-config[1951]: Target: configure.d
Mar 10 15:45:04 overcloud-controller-0 os-collect-config[1951]: Script Seconds
Mar 10 15:45:04 overcloud-controller-0 os-collect-config[1951]: --------------------------------------- ----------
Mar 10 15:45:04 overcloud-controller-0 os-collect-config[1951]: 10-sysctl-apply-config 0.209
Mar 10 15:45:04 overcloud-controller-0 os-collect-config[1951]: 20-os-apply-config 0.183
...skipping...
2016-03-10 15:55:32 [0]: CREATE_IN_PROGRESS state changed [0/9923]
2016-03-10 15:55:32 [0]: SIGNAL_COMPLETE Unknown
2016-03-10 15:55:33 [ControllerDeployment]: SIGNAL_COMPLETE Unknown
2016-03-10 15:55:33 [0]: SIGNAL_COMPLETE Unknown
2016-03-10 15:55:33 [ControllerDeployment]: SIGNAL_COMPLETE Unknown
2016-03-10 15:55:34 [0]: SIGNAL_COMPLETE Unknown
2016-03-10 15:55:34 [NetworkDeployment]: SIGNAL_COMPLETE Unknown
2016-03-10 15:55:35 [0]: SIGNAL_COMPLETE Unknown
2016-03-10 15:56:49 [0]: SIGNAL_IN_PROGRESS Signal: deployment failed (6)
2016-03-10 15:56:49 [0]: CREATE_FAILED Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
2016-03-10 15:56:50 [overcloud-ControllerNodesPostDeployment-3mu4j4fs7ybn-ControllerOvercloudServicesDeployment_Step6-jruwoxkxdm32]: CREATE_FAILED Resource CREATE failed: Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
2016-03-10 15:56:51 [0]: SIGNAL_COMPLETE Unknown
2016-03-10 15:56:51 [ControllerOvercloudServicesDeployment_Step6]: CREATE_FAILED Error: resources.ControllerOvercloudServicesDeployment_Step6.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
2016-03-10 15:56:51 [0]: SIGNAL_COMPLETE Unknown
2016-03-10 15:56:52 [ControllerNodesPostDeployment]: CREATE_FAILED Error: resources.ControllerNodesPostDeployment.resources.ControllerOvercloudServicesDeployment_Step6.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
2016-03-10 15:56:52 [overcloud-ControllerNodesPostDeployment-3mu4j4fs7ybn]: CREATE_FAILED Resource CREATE failed: Error: resources.ControllerOvercloudServicesDeployment_Step6.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
2016-03-10 15:56:52 [0]: SIGNAL_COMPLETE Unknown
2016-03-10 15:56:52 [0]: SIGNAL_COMPLETE Unknown
2016-03-10 15:56:53 [overcloud]: CREATE_FAILED Resource CREATE failed: Error: resources.ControllerNodesPostDeployment.resources.ControllerOvercloudServicesDeployment_Step6.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
2016-03-10 15:56:53 [0]: SIGNAL_COMPLETE Unknown
2016-03-10 15:56:53 [ControllerDeployment]: SIGNAL_COMPLETE Unknown
2016-03-10 15:56:54 [NetworkDeployment]: SIGNAL_COMPLETE Unknown
2016-03-10 15:56:54 [0]: SIGNAL_COMPLETE Unknown
Stack overcloud CREATE_FAILED
Deployment failed: Heat Stack create failed.
$ heat resource-list overcloud | grep -i failed
| ControllerNodesPostDeployment | a08b472c-ffe0-40c7-a029-073a4b5df79e | OS::TripleO::ControllerPostDeployment | CREATE_FAILED | 2016-03-10T15:41:24 |
heat resource-show overcloud ControllerNodesPostDeployment
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Property | Value |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| attributes | {} |
| creation_time | 2016-03-10T15:41:24 |
| description | |
| links | http://192.0.2.1:8004/v1/b592cbcada2946c8881feecd8b9b30a5/stacks/overcloud/51bb82b1-07d7-4222-8739-fec5effaafce/resources/ControllerNodesPostDeployment (self) |
| | http://192.0.2.1:8004/v1/b592cbcada2946c8881feecd8b9b30a5/stacks/overcloud/51bb82b1-07d7-4222-8739-fec5effaafce (stack) |
| | http://192.0.2.1:8004/v1/b592cbcada2946c8881feecd8b9b30a5/stacks/overcloud-ControllerNodesPostDeployment-3mu4j4fs7ybn/a08b472c-ffe0-40c7-a029-073a4b5df79e (nested) |
| logical_resource_id | ControllerNodesPostDeployment |
| physical_resource_id | a08b472c-ffe0-40c7-a029-073a4b5df79e |
| required_by | BlockStorageNodesPostDeployment |
| | CephStorageNodesPostDeployment |
| resource_name | ControllerNodesPostDeployment |
| resource_status | CREATE_FAILED |
| resource_status_reason | Error: resources.ControllerNodesPostDeployment.resources.ControllerOvercloudServicesDeployment_Step6.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6 |
| resource_type | OS::TripleO::ControllerPostDeployment |
| updated_time | 2016-03-10T15:41:24 |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
$ ssh heat-admin@
$ sudo journalctl -u os-collect-config # and near the end:
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Warning: Scope(Class[Ceilometer::Api]): The keystone_identity_uri parameter is deprecated. Please use identity_uri instead.
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Warning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Error: /Stage[main]/Apache::Service/Service[httpd]: Could not evaluate: Cannot allocate memory - fork(2)
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Warning: /Stage[main]/Keystone/Anchor[keystone_started]: Skipping because of failed dependencies
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Error: Could not prefetch keystone_tenant provider 'openstack': Cannot allocate memory - fork(2)
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Warning: /Stage[main]/Keystone::Roles::Admin/Keystone_tenant[service]: Skipping because of failed dependencies
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Warning: /Stage[main]/Keystone::Roles::Admin/Keystone_tenant[admin]: Skipping because of failed dependencies
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Error: Could not prefetch keystone_role provider 'openstack': Cannot allocate memory - fork(2)
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Warning: /Stage[main]/Keystone::Roles::Admin/Keystone_role[admin]: Skipping because of failed dependencies
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Error: Could not prefetch keystone_user provider 'openstack': Cannot allocate memory - fork(2)
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Warning: /Stage[main]/Keystone::Roles::Admin/Keystone_user[admin]: Skipping because of failed dependencies
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Warning: /Stage[main]/Keystone::Roles::Admin/Keystone_user_role[admin@admin]: Skipping because of failed dependencies
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Error: Could not prefetch keystone_service provider 'openstack': Cannot allocate memory - fork(2)
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Warning: /Stage[main]/Keystone::Endpoint/Keystone::Resource::Service_identity[keystone]/Keystone_service[keystone::identity]: Skipping
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Error: Could not prefetch keystone_endpoint provider 'openstack': Cannot allocate memory - fork(2)
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Warning: /Stage[main]/Keystone::Endpoint/Keystone::Resource::Service_identity[keystone]/Keystone_endpoint[regionOne/keystone::identity]
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Error: /Stage[main]/Heat::Api_cfn/Service[heat-api-cfn]: Could not evaluate: Cannot allocate memory - fork(2)
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Error: /Stage[main]/Heat::Engine/Service[heat-engine]: Could not evaluate: Cannot allocate memory - fork(2)
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Error: /Stage[main]/Heat::Api/Service[heat-api]: Could not evaluate: Cannot allocate memory - fork(2)
Mar 10 15:56:48 overcloud-controller-0 os-collect-config[1951]: Error: /Stage[main]/Heat::Api_cloudwatch/Service[heat-api-cloudwatch]: Could not evaluate: Cannot allocate memory - fork(2)
It looks like there is not enough memory for the minimal virthost overcloud controller setup.
We should write down some style guidelines: things like line length, but they could also include some Ansible style guidelines.
In the refactor of the CI playbooks, introspection was inadvertently dropped from the nonHA job. This allowed the IPA ramdisk we build in CI to stop working for introspection without being noticed. (I am pretty sure we are just missing the python-hardware dependency.)
We need to make sure to include introspection in at least one job.
3 plays in /home/stack/.quickstart/usr/local/share/tripleo-quickstart/playbooks/quickstart.yml
PLAY [Add virthost to inventory] ***********************************************
TASK [provision/manual : Create working_dir] ***********************************
ok: [localhost] => {"changed": false, "gid": 1000, "group": "stack", "mode": "0775", "owner": "stack", "path": "/home/stack/.quickstart", "secontext": "unconfined_u:object_r:user_home_t:s0", "size": 87, "state": "directory", "uid": 1000}
TASK [provision/manual : Create empty ssh config file] *************************
changed: [localhost] => {"changed": true, "dest": "/home/stack/.quickstart/ssh.config.ansible", "gid": 1000, "group": "stack", "mode": "0664", "owner": "stack", "secontext": "unconfined_u:object_r:user_home_t:s0", "size": 0, "state": "file", "uid": 1000}
TASK [provision/manual : Add the virthost to the inventory] ********************
creating host via 'add_host': hostname=host0
changed: [localhost] => {"add_host": {"groups": ["virthost"], "host_name": "host0", "host_vars": {"ansible_fqdn": "*****.ualberta.ca", "ansible_ssh_host": "cirrus.nic.ualberta.ca", "ansible_ssh_private_key_file": "/home/stack/.ssh/id_rsa", "ansible_ssh_user": "stack", "local_working_dir": "/home/stack/.quickstart"}}, "changed": true}
TASK [rebuild-inventory : rebuild-inventory] ***********************************
changed: [localhost] => {"changed": true, "checksum": "7f041b7ad7074f3700330a48a4c52b8282218255", "dest": "/home/stack/.quickstart/hosts", "gid": 1000, "group": "stack", "md5sum": "e64ff5b25e4eba5bc4955f6682684318", "mode": "0664", "owner": "stack", "secontext": "unconfined_u:object_r:user_home_t:s0", "size": 176, "src": "/home/stack/.ansible/tmp/ansible-tmp-1454365142.58-185300564260252/source", "state": "file", "uid": 1000}
PLAY [Setup undercloud and baremetal vms and networks in libvirt] **************
TASK [teardown/check : check if libvirt is running] ****************************
ok: [host0] => {"changed": false, "cmd": "rpm -qa libvirt && systemctl status libvirtd", "delta": "0:00:01.206875", "end": "2016-02-01 15:19:05.154115", "failed": false, "failed_when_result": false, "rc": 0, "start": "2016-02-01 15:19:03.947240", "stderr": "", "stdout": "libvirt-1.2.17-13.el7_2.2.x86_64\n● libvirtd.service - Virtualization daemon\n Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)\n Active: active (running) since Mon 2016-01-25 14:16:44 MST; 1 weeks 0 days ago\n Docs: man:libvirtd(8)\n http://libvirt.org\n Main PID: 3339 (libvirtd)\n CGroup: /system.slice/libvirtd.service\n ├─3143 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper\n ├─3144 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper\n └─3339 /usr/sbin/libvirtd\n\nJan 26 15:27:47 cirrus.nic.ualberta.ca libvirtd[3339]: Received unexpected event 3\nJan 26 15:27:47 cirrus.nic.ualberta.ca libvirtd[3339]: Received unexpected event 3\nJan 26 15:29:49 cirrus.nic.ualberta.ca libvirtd[3339]: Received unexpected event 3\nJan 26 15:29:49 ****.ualberta.ca libvirtd[3339]: Received unexpected event 3\nJan 26 15:37:22 ***.ualberta.ca libvirtd[3339]: Received unexpected event 3\nJan 26 15:37:22 ***.ualberta.ca libvirtd[3339]: Received unexpected event 3\nFeb 01 13:45:58 cirrus.nic.ualberta.ca libvirtd[3339]: Received unexpected event 3\nFeb 01 13:45:58 ***__.ualberta.ca libvirtd[3339]: Received unexpected event 3\nFeb 01 13:46:59 *.ualberta.ca libvirtd[3339]: Received unexpected event 3\nFeb 01 13:46:59 ***__.ualberta.ca libvirtd[3339]: Received unexpected event 3", "stdout_lines": ["libvirt-1.2.17-13.el7_2.2.x86_64", "● libvirtd.service - Virtualization daemon", " Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)", " Active: active (running) since Mon 2016-01-25 14:16:44 MST; 1 
weeks 0 days ago", " Docs: man:libvirtd(8)", " http://libvirt.org", " Main PID: 3339 (libvirtd)", " CGroup: /system.slice/libvirtd.service", " ├─3143 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper", " ├─3144 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper", " └─3339 /usr/sbin/libvirtd", "", "Jan 26 15:27:47 *.ualberta.ca libvirtd[3339]: Received unexpected event 3", "Jan 26 15:27:47 ***__.ualberta.ca libvirtd[3339]: Received unexpected event 3", "Jan 26 15:29:49 cirrus.nic.ualberta.ca libvirtd[3339]: Received unexpected event 3", "Jan 26 15:29:49 ***.ualberta.ca libvirtd[3339]: Received unexpected event 3", "Jan 26 15:37:22 ***.ualberta.ca libvirtd[3339]: Received unexpected event 3"], "warnings": ["Consider using yum module rather than running rpm"]}
[WARNING]: Consider using yum module rather than running rpm
TASK [teardown/networks : Delete external network] *****************************
ok: [host0] => {"changed": false}
TASK [teardown/networks : Delete the OVS bridges] ******************************
failed: [host0] => (item={u'name': u'bridget'}) => {"cmd": "ovs-vsctl -t 5 br-exists bridget", "failed": true, "item": {"name": "bridget"}, "msg": "[Errno 2] No such file or directory", "rc": 2}
PLAY RECAP *********************************************************************
host0 : ok=2 changed=0 unreachable=0 failed=1
localhost : ok=4 changed=3 unreachable=0 failed=0
We need documentation telling folks how to contribute: i.e., the gerrit workflow, and github issues like this one for bugs/RFEs.
We're currently running teardown tasks against a "clean" system, so there's nothing to tear down. We should have a job that:
Because we tear down everything in quickstart, we actually delete any cached image that happens to exist. This goes back to #30.
Presently the image building playbooks are reusing too much of the libvirt / networking setup and config from the quickstart roles. For example, there's no need to set up overcloud networks to build the images. In addition, it makes the image build workflows prone to breaking on setup/config for reasons unrelated to image building. Finally, it increases the amount of time needed to build an image. As we explore ways to leverage tripleo-quickstart's appliance-based undercloud in (potentially) more projects for CI, being able to quickly, simply, and cleanly build images will become increasingly important. I've got some deltas locally to address this, testing now. Planning to post a review shortly.
Presently the publish playbook has a hard-coded destination. To make this useful in other contexts (downstream ospd-based image generation, in my specific case) we should make it more reusable.
WIP to address: https://review.gerrithub.io/#/c/266245/
Need to make better use of task tagging throughout the oooq roles.
If one interrupts quickstart.sh via ^C, it will erroneously display the help output about how to log in to the undercloud and continue the deploy. Need to figure out why the ^C results in a failure code from ansible-playbook (which then causes the script to exit because of our set -e).
I was thinking about workflow today and about this, and in particular:
The stuff in green in that diagram should be runnable as a completely unprivileged user. Ideally, if you have a host that is already virt-capable and a couple of configured bridges, you could just run the playbooks against localhost with a local connection and end up with a fully deployed overcloud.
I am interested in opinions on the above, and in particular whether or not the provision role as described makes sense, and whether it needs an unprovision counterpart.
If someone tries running quickstart against an inventory file like this:
[virthost]
localhost ansible_user=admin
And they are not logged in as the admin user where they are running the quickstart, confusion will result: because the playbooks assume that delegate_to: localhost will run as the current user, but in this situation that isn't true, and they will get permission errors when tasks try to write to /home/currentuser/.quickstart.
The solution right now is "don't target localhost as a different user".
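In other words, an inventory like the following is the safe shape (hostname is a placeholder): put the remote user on the actual remote virthost entry, and leave localhost alone:

```ini
[virthost]
virthost.example.com ansible_user=admin
```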
This is a requirement of cross-distro support since packages can be named differently in different distros.
Initial idea for the 'breakpoints':
CI would then use different settings YAMLs to define the different scenarios, all of which would be run against the full deploy and test scenario.
There is (at least) one dependency on new Jinja2 2.8 functionality, which leads to the following error when the system jinja2 is used (python-jinja2-2.7.2-2.el7.noarch):
TASK [setup/overcloud_nodes : define baremetal vms] ****************************
Thursday 10 March 2016 13:42:25 +0200 (0:00:03.792) 0:00:43.142 ********
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: TemplateRuntimeError: no test named 'equalto'
fatal: [host0]: FAILED! => {"failed": true, "stdout": ""}
It would be nice if the playbooks could work with the system-provided jinja2.
Let's track here the work needed to get initial baremetal support going:
As documented at [1], it's a good idea to verify the md5sum of the downloaded undercloud image. However, when executing md5sum -c undercloud.qcow2.md5, the check will fail:
[root@rdo mitaka]# md5sum -c undercloud.qcow2.md5
md5sum: /tmp/oooq-images/undercloud.qcow2: No such file or directory
/tmp/oooq-images/undercloud.qcow2: FAILED open or read
This is because the undercloud.qcow2.md5 file [2] includes an absolute path after the checksum:
6c7dc0e0b14b7b3deff0da284594d778 /tmp/oooq-images/undercloud.qcow2
Instead of just the file name as expected, like this:
6c7dc0e0b14b7b3deff0da284594d778 undercloud.qcow2
[1] https://www.rdoproject.org/rdo-manager/
[2] https://ci.centos.org/artifacts/rdo/images/mitaka/delorean/stable/undercloud.qcow2.md5
We probably need to namespace the configuration options somehow as well. Top-level "url" option, I'm looking at you.
I have a dream...that one day, tripleo-quickstart will be able to run somewhere besides EL7 distributions.
Anything impacting this topic should refer to this issue.
In order to make #15 not completely crazy we need to use the package module exclusively for package installation. We're still using yum in a few places (we're using both in the same role in at least one place).
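For reference, the cross-distro form would look roughly like this (the package names are illustrative; the generic package module delegates to yum/dnf/apt underneath):

```yaml
- name: Install libvirt packages
  package:
    name: "{{ item }}"
    state: present
  with_items:
    - libvirt
    - libvirt-client
```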
Need to update http://blog.oddbit.com/2016/02/19/deploy-an-ha-openstack-development-envir/ since centosci/minimal.yml changed from a playbook to a list of vars, so the instructions all need updating.
It would be quite nice to have a .ssh/config for the stack user on the undercloud so that hopping on the
nodes manually is a bit quicker for the senile amongst us:
Host 192.0.2.*
User heat-admin
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
Hello,
with a recent clean deployment of tripleo-quickstart (73d543d) on CentOS 7, I cannot see any VMs on $VIRTHOST; something is really wrong here:
$ virsh list --all
Id Name State
----------------------------------------------------
I can access undercloud, though:
$ ssh -F /root/.quickstart/ssh.config.ansible undercloud
# OK
Then I get errors in OC deployment stage:
exec openstack overcloud deploy -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml --control-scale 3 --compute-scale 2 --ntp-server pool.ntp.org --templates --libvirt-type qemu --control-flavor oooq_control --compute-flavor oooq_compute --ceph-storage-flavor oooq_ceph --timeout 60 -e /tmp/deploy_env.yaml
Error: only 1 of 3 requested ironic nodes are tagged to profile oooq_control (for flavor oooq_control)
Recommendation: tag more nodes using ironic node-update <NODE ID> replace properties/capabilities=profile:oooq_control,boot_option:local
Error: only 1 of 2 requested ironic nodes are tagged to profile oooq_compute (for flavor oooq_compute)
Recommendation: tag more nodes using ironic node-update <NODE ID> replace properties/capabilities=profile:oooq_compute,boot_option:local
Configuration has 2 errors, fix them before proceeding. Ignoring these errors is likely to lead to a failed deploy.
Deployment failed: Not enough nodes - available: 2, requested: 5
Clean CentOS 7 and tripleo-quickstart; it seems like a missing Python virtualenv dependency:
$ bash quickstart.sh $VIRTHOST
...
Traceback (most recent call last):
File "/root/.quickstart/bin/ansible-playbook", line 72, in <module>
mycli = getattr(__import__("ansible.cli.%s" % sub, fromlist=[myclass]), myclass)
File "/root/.quickstart/lib/python2.7/site-packages/ansible/cli/playbook.py", line 30, in <module>
from ansible.executor.playbook_executor import PlaybookExecutor
File "/root/.quickstart/lib/python2.7/site-packages/ansible/executor/playbook_executor.py", line 30, in <module>
from ansible.executor.task_queue_manager import TaskQueueManager
File "/root/.quickstart/lib/python2.7/site-packages/ansible/executor/task_queue_manager.py", line 29, in <module>
from ansible.executor.play_iterator import PlayIterator
File "/root/.quickstart/lib/python2.7/site-packages/ansible/executor/play_iterator.py", line 29, in <module>
from ansible.playbook.block import Block
File "/root/.quickstart/lib/python2.7/site-packages/ansible/playbook/__init__.py", line 25, in <module>
from ansible.playbook.play import Play
File "/root/.quickstart/lib/python2.7/site-packages/ansible/playbook/play.py", line 27, in <module>
from ansible.playbook.base import Base
File "/root/.quickstart/lib/python2.7/site-packages/ansible/playbook/base.py", line 32, in <module>
from jinja2.exceptions import UndefinedError
File "/root/.quickstart/lib/python2.7/site-packages/jinja2/__init__.py", line 33, in <module>
from jinja2.environment import Environment, Template
File "/root/.quickstart/lib/python2.7/site-packages/jinja2/environment.py", line 13, in <module>
from jinja2 import nodes
File "/root/.quickstart/lib/python2.7/site-packages/jinja2/nodes.py", line 19, in <module>
from jinja2.utils import Markup
File "/root/.quickstart/lib/python2.7/site-packages/jinja2/utils.py", line 531, in <module>
from markupsafe import Markup, escape, soft_unicode
ImportError: No module named markupsafe
$ source .quickstart/bin/activate
$ pip install markupsafe
$ deactivate
$ bash quickstart.sh $VIRTHOST # even with --system-site-packages
The addition of tags is super useful and totally the right thing to do (thanks!)
That said, to an oooq newcomer (we want these folks!): from ./quickstart.sh --help, it's clear that we have tags, but we don't tell folks what they are. If they dig into the .sh, or walk through the playbooks (and know what to look for), the tags are "discoverable" - but it would make sense to have --help list the available tags. I realize that static text for --help causes a potential drift issue later on. IMHO what would be ideal is if the tags found in playbooks could be dynamically discovered and listed... but that's perhaps getting too elaborate and not pragmatically worth the effort. This issue is just to capture the idea, not infer a solution.
Somewhere between docs, --help, and [uber solution] we should be able to find something workable.
Currently, the quickstart requires root privileges on the target host and makes a number of system configuration changes. This makes some folks (understandably) nervous and may inhibit the adoption of the tool.
Most of the privileged access is required to configure networking.
We ought to be able to complete just about everything through the use of qemu user mode networking. There are a few options:
- The qemu-bridge-helper tool. With the packages shipped in RHEL/CentOS/Fedora/etc, this will Just Work with the virbr0 bridge created by the libvirt default network. We can take advantage of this to create inbound access into the virtual environment.
- Multicast socket networking, which requires (a) enabling multicast on the loopback interface (ip set multicast on dev lo) and (b) adding the necessary routes so that the multicast traffic uses the lo interface.
I kind of like the tcp model.
fatal: [host0]: FAILED! => {"changed": true, "cmd": ["virsh", "pool-undefine", "oooq_pool"], "delta": "0:00:00.029126", "end": "2016-03-15 18:23:19.744160", "failed": true, "invocation": {"module_args": {"_raw_params": "virsh pool-undefine \"oooq_pool\"", "_uses_shell": false, "chdir": null, "creates": null, "executable": null, "removes": null, "warn": true}, "module_name": "command"}, "rc": 1, "start": "2016-03-15 18:23:19.715034", "stderr": "error: Failed to undefine pool oooq_pool\nerror: internal error: no config file for oooq_pool", "stdout": "", "stdout_lines": [], "warnings": []}
A "rm -rf /run/user/1003/libvirt/" on the host fixes it.
In the setup summary at the end of the script, there are hints on how to proceed further with the deployment. One of the script names mentioned is
"undercloud-install.post.sh will perform all pre-deploy steps"
while the actual name of the script is undercloud-post-install.sh.
Our gate jobs have a regexp trigger that looks like:
playbooks/roles/(libvirt|common|overcloud|provision|rebuild-inventory|tripleo)/.*
That will miss -- for example -- environment and parts. We can add those two, or consider just matching on playbooks/roles/.* instead.
When running tripleo-quickstart in a CI environment, it can be useful to get live system information. dstat is an excellent tool for that; OpenStack CI already uses it.
sudo dnf -y install dstat
sudo dstat -tcmndrylpg --top-cpu-adv --top-io-adv --nocolor | sudo tee --append /var/log/dstat.log > /dev/null &
The instructions have become a bit long with the need to pre-download the image. It would be good to codify these extra steps into a "bootstrap" playbook that is run when a --bootstrap option is passed to quickstart.sh.
Ideally we would want things to work with just:
export VIRTHOST='foo.example.com'
quickstart.sh --bootstrap
The gate jobs take a looooong time to complete. Do they actually need to perform a full overcloud deploy? Can we have separate short-running jobs that only get as far as booting the undercloud node so that we can get faster feedback on changes (while still testing out the full overcloud deploy)?
Currently the output from the playbook that builds images is a bit of a land-mine. The output from DIB when it creates the overcloud image ends up being > 400,000 characters on a single line. This literally makes Atom, Emacs, PyCharm, Gedit, Kate, VIM, etc. go into a death loop trying to parse the line; if one waits far longer than I have patience for (minutes), it eventually comes back. Interim workarounds are to not use the ansible output directly but to pass it through sed first to turn the literal \n sequences into real newlines, to open the logfile that's captured, etc. It's a low-priority issue, but the first time it happened to me I thought "Ouch... that hurt..." As this project catches on, I think anything that increases friction for developers/newcomers should be fixed where possible. In addition, it would be pretty cool to have a nice summary of what was built, the source of the binaries, etc. as part of the basic output. I'm happy to propose deltas to address this (next week at the earliest).
In undercloud-install-post.sh, we attempt to avoid failures by deleting flavors before (re-)creating them, like this:
openstack flavor delete oooq_{{ name }} > /dev/null 2>&1 || true
openstack flavor create --id auto ... oooq_{{ name }}
But what actually happens is that the post-install script fails with:
+ openstack flavor create --id auto --ram 4096 --disk 49 --vcpus 1 oooq_control
Flavor with name oooq_control already exists. (HTTP 409) (Request-ID: req-55b34a97-f21d-41fd-b69d-e46829811e6c)
Because:
$ openstack flavor delete oooq_control
public endpoint for messaging service not found
This looks like it's due to a bug in python-zaqarclient.
There is an issue when running the virt-resize task on the downloaded image with little space left on the root partition (which is the default in CentOS/Fedora installations). When no space is left on the partition, the task hangs and never finishes. I see 2 points here:
Just noticed that if the undercloud vm is already running (because of a previously failed job or for some whatever reason), the deployment bails out with:
TASK [setup/undercloud : start undercloud vm] **********************************
Thursday 03 March 2016 10:31:40 -0500 (0:00:00.516) 0:03:31.308 ********
failed: [host0] => (item={u'flavor': u'undercloud', u'name': u'undercloud'}) => {"failed": true, "item": {"flavor": "undercloud", "name": "undercloud"}, "msg": "Requested operation is not valid: domain is already running"}
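One hedged sketch of a fix, assuming the task can use Ansible's virt module (the task and variable names below are illustrative, not the project's actual playbook): the module's state: running is idempotent, so an already-running domain is left alone instead of raising the "domain is already running" error.

```yaml
# Illustrative task, not the real playbook: "state: running" only starts
# the domain when it is not already running, so a leftover undercloud vm
# from a previous failed job no longer aborts the deploy.
- name: start undercloud vm
  virt:
    name: "{{ item.name }}"
    state: running
  with_items: "{{ undercloud_vms }}"
```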
I can ssh root@$VIRTHOST and log in by entering my password. However, bash quickstart.sh $VIRTHOST fails.
If I set up key authentication for SSH, then quickstart.sh works correctly.
TripleO understands the "name" field for nodes in instackenv.json. It's much easier to refer to nodes by e.g. node-%d (the same as devstack, btw) than by their UUID.
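For illustration, a minimal instackenv.json entry carrying a "name" field might look like the following; every value below is a placeholder, not output from a real environment.

```json
{
  "nodes": [
    {
      "name": "node-0",
      "pm_type": "pxe_ssh",
      "pm_addr": "192.168.122.1",
      "pm_user": "stack",
      "mac": ["00:11:22:33:44:55"],
      "cpu": "1",
      "memory": "4096",
      "disk": "49",
      "arch": "x86_64"
    }
  ]
}
```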
When running...
quickstart.sh localhost
The deploy fails with:
TASK [setup/undercloud : Generate ssh configuration] ***************************
Sunday 13 March 2016 10:52:20 -0400 (0:00:00.038) 0:21:41.223 **********
fatal: [host0 -> localhost]: FAILED! => {"changed": true, "failed": true, "msg": "Aborting, target uses selinux but python bindings (libselinux-python) aren't installed!"}
We may want to set ansible_python_interpreter at a higher level...either by restoring the group_vars/all.yml file, or by just passing it on the ansible-playbook command line via -e.
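A sketch of the group_vars approach — the file path and interpreter path below are assumptions; the same setting can instead be passed on the command line as -e ansible_python_interpreter=/usr/bin/python:

```yaml
# playbooks/group_vars/all.yml (hypothetical location)
# Force Ansible to use the system python, which has the selinux
# bindings (libselinux-python) installed.
ansible_python_interpreter: /usr/bin/python
```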
We should test quickstart.sh -- and quickstart.yml -- in CI, since this is what we expose to people.
Following the instructions at:
https://www.rdoproject.org/testday/mitaka/milestone3/
https://www.rdoproject.org/rdo-manager/
http://docs.openstack.org/developer/tripleo-docs/basic_deployment/basic_deployment_cli.html#upload-images
(on CentOS 7, without any special configuration), the overcloud deployment will fail when executing openstack overcloud deploy --templates.
The error given was:
2016-03-02 13:51:58 [Controller]: CREATE_IN_PROGRESS state changed
2016-03-02 13:52:03 [NovaCompute]: CREATE_FAILED ResourceInError: resources.NovaCompute: Went to status ERROR due to "Message: Unknown, Code: Unknown"
2016-03-02 13:52:04 [NovaCompute]: DELETE_IN_PROGRESS state changed
2016-03-02 13:52:07 [Controller]: CREATE_FAILED ResourceInError: resources.Controller: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-03-02 13:52:07 [Controller]: DELETE_IN_PROGRESS state changed
2016-03-02 13:52:09 [NovaCompute]: DELETE_COMPLETE state changed
2016-03-02 13:52:11 [Controller]: DELETE_COMPLETE state changed
And in /var/log/nova/nova-scheduler.log:
2016-03-02 13:52:02.080 15034 DEBUG nova.scheduler.filters.disk_filter [req-032ace83-c834-4174-bac2-7c7cafde82e2 814e8c92f5af4aafbdf4b31c8fc23c2d 3b74005453a34fa5a06b02555b45a5a8 - - -] (undercloud, 305b7ed6-3c99-47a9-a201-837b86848f3c) ram:4096 disk:39936 io_ops:0 instances:0 does not have 40960 MB usable disk, it only has 39936.0 MB usable disk. host_passes /usr/lib/python2.7/site-packages/nova/scheduler/filters/disk_filter.py:55
2016-03-02 13:52:02.080 15034 INFO nova.filters [req-032ace83-c834-4174-bac2-7c7cafde82e2 814e8c92f5af4aafbdf4b31c8fc23c2d 3b74005453a34fa5a06b02555b45a5a8 - - -] Filter DiskFilter returned 0 hosts
So the disks should be (slightly) bigger: the DiskFilter wants 40960 MB (40 GiB), but the node only exposes 39936.0 MB (39 GiB).
Someone should be able to reboot their virtualization host without completely hosing their undercloud. This means that we need to make network configuration, such as host bridges, persistent.
If we're setting everything up as libvirt networks we get this for free (yay!). If we're mucking about directly with things like brctl or ovs-vsctl we may be in for sadness and disappointment.
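As a sketch of the "libvirt networks for free" route: a network defined with virsh net-define (rather than created transiently with virsh net-create) persists across host reboots, and virsh net-autostart brings it back up at boot. The network and bridge names below are made up:

```xml
<!-- Hypothetical network definition; make it persistent with:
     virsh net-define oooq-net.xml && virsh net-autostart oooq -->
<network>
  <name>oooq</name>
  <bridge name='broooq'/>
</network>
```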
Because of rebasing and working in parallel on the same sets of files I think we may have lost some tags related to @trown's refactoring of quickstart.sh. With https://review.gerrithub.io/#/c/265712/24, running quickstart.yml finishes like this:
PLAY [Install undercloud and deploy overcloud] *********************************
PLAY RECAP *********************************************************************
host0 : ok=77 changed=38 unreachable=0 failed=0
localhost : ok=6 changed=2 unreachable=0 failed=0
That is, it stops after booting the undercloud, but before running any tasks on the undercloud host. Then quickstart.sh says:
##################################
Virtual Environment Setup Complete
##################################
Access the undercloud by:
ssh -F /home/lars/.quickstart/ssh.config.ansible undercloud
There are scripts in the home directory to continue the deploy:
undercloud-install.sh will run the undercloud install
undercloud-post-install.sh will perform all pre-deploy steps
overcloud-deploy.sh will deploy the overcloud
overcloud-deploy-post.sh will do any post-deploy configuration
overcloud-validate.sh will run post-deploy validation
But in fact none of those scripts exist:
ssh -F /home/lars/.quickstart/ssh.config.ansible undercloud
Last login: Sat Mar 12 00:28:53 2016 from 192.168.23.1
[stack@undercloud ~]$ ls
instackenv.json ironic-python-agent.kernel overcloud-full.qcow2 undercloud.conf
ironic-python-agent.initramfs overcloud-full.initrd overcloud-full.vmlinuz
[stack@undercloud ~]$
We probably just need to put appropriate tags back into the tripleo/* roles. This is just here so I don't forget.
This is probably not a bug, but maybe a suggestion for a future improvement:
According to the documentation, one has to call the script the following way:
bash quickstart.sh -u $UNDERCLOUD_QCOW2_LOCATION $VIRTHOST
I know even help says:
quickstart.sh: usage: quickstart.sh [options] virthost [release]
...
But I was accidentally able to issue:
bash quickstart.sh $VIRTHOST -u $UNDERCLOUD_QCOW2_LOCATION
which failed with the error:
TASK [setup/undercloud : get undercloud image expected checksum] ***************
task path: /root/.quickstart/tripleo-quickstart/playbooks/roles/libvirt/setup/undercloud/tasks/fetch_image.yml:14
Friday 11 March 2016 11:20:55 +0100 (0:00:00.176) 0:00:20.002 **********
fatal: [host0]: FAILED! => {"changed": true, "cmd": ["curl", "-sf", "https://ci.centos.org/artifacts/rdo/images/-u/delorean/stable/undercloud.qcow2.md5"], "delta": "0:00:00.518934", "end": "2016-03-11 11:20:55.796695", "failed": true, "invocation": {"module_args": {"_raw_params": "curl -sf https://ci.centos.org/artifacts/rdo/images/-u/delorean/stable/undercloud.qcow2.md5", "_uses_shell": false, "chdir": null, "creates": null, "executable": null, "removes": null, "warn": true}, "module_name": "command"}, "rc": 22, "start": "2016-03-11 11:20:55.277761", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": ["Consider using get_url module rather than running curl"]}
I assume the url parameter (with the incorrect "-u" inside) was not assembled correctly, since the script didn't expect a different parameter order.
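A hedged sketch of stricter argument handling: getopts stops at the first non-option word, so anything left over after the virthost positional argument can be rejected up front instead of being silently folded into a download URL. The -u option mirrors the real script, but the function below is an illustration, not the actual quickstart.sh code.

```shell
#!/bin/bash
# Hypothetical strict parser: options must come before the virthost
# positional argument; trailing options are rejected with a clear error.
parse_args() {
    local OPTIND=1 opt undercloud_image=""
    while getopts "u:" opt; do
        case "$opt" in
            u) undercloud_image="$OPTARG" ;;
            *) echo "usage: quickstart.sh [-u image] virthost"; return 2 ;;
        esac
    done
    shift $((OPTIND - 1))
    local virthost="$1"; shift
    if [ "$#" -ne 0 ]; then
        echo "unexpected arguments after virthost: $*"
        return 2
    fi
    echo "virthost=$virthost image=${undercloud_image:-default}"
}

# Documented order: options first, then the virthost.
parse_args -u http://example.com/undercloud.qcow2 myhost
# Swapped order: detected and rejected instead of producing a bogus URL.
parse_args myhost -u http://example.com/undercloud.qcow2 || echo "rejected (rc=$?)"
```

With this shape, the swapped invocation fails fast with a usage error rather than curl-ing a URL containing "-u".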
We need to collect logs from the virthost itself for image building. We should probably stop depending on khaleesi for log collection and just fork what is there now into oooq.
This is an instance of: #18
The prep needed to run diskimage-builder on the virthost needs to be refactored to allow for portability. The list of prerequisite packages needs to be configurable. In addition, it presently allows for installing only one RPM; this needs to be extensible via a script, or a list of packages plus a script, etc. For example, images might be built for rpm/rhos/rhel, or some other packaging format. As this could be used in the context of CI for a particular big tent (or otherwise) OpenStack project, there might be some simple scripting necessary. Scenarios include:
There is a proposed change, currently integrating feedback:
With 6bd4368, the quickstart.sh script was transformed from something that can be run as a non-root user into something that must be run as root (or via sudo), because it now tries to install packages. This introduces a variety of complications:
I think putting these yum install commands in quickstart.sh was a move in the wrong direction. A better solution would be to error out with an appropriate error message if the commands aren't available.
We should be moving away from requiring additional privileges.
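The error-out-instead-of-install idea can be sketched like this — check for the tools quickstart.sh needs and fail with a readable message, never escalating privileges. "no-such-tool-xyz" below stands in for a genuinely missing command such as virsh on a minimal host; the helper name is made up.

```shell
#!/bin/sh
# Hypothetical preflight check: report every missing command at once and
# exit non-zero, instead of running yum install as root.
require_commands() {
    missing=""
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || missing="$missing $cmd"
    done
    if [ -n "$missing" ]; then
        echo "error: missing required commands:$missing"
        echo "hint: install them yourself; this script will not escalate privileges"
        return 1
    fi
    echo "all required commands present"
}

require_commands sh sed                               # both exist everywhere
require_commands sh no-such-tool-xyz || echo "exited with $?"
```

Collecting all missing tools before failing means the user fixes their environment in one pass rather than replaying the script once per missing package.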