os-nova-install stalls for hours and re-run requires deleting cell

Bug #1663616 reported by Melvin Hillsman
Affects: OpenStack-Ansible
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

tl;dr - the install was going fine until it got stuck trying to create the nova service/endpoints; the task never failed, and the nova service/endpoints were never created. I killed the playbook run, but when re-running os-nova-install.yml a cell had already been created, and the cell had to be deleted before the nova install would complete.

####
While trying to install nova, the playbook was stuck here for hours
####

TASK [os_nova : Ensure nova service] *******************************************
Friday 10 February 2017 08:03:14 +0000 (0:00:03.728) 0:16:14.138 *******
skipping: [infra01_nova_console_container-68402994]
skipping: [infra03_nova_console_container-7c43d58f]
skipping: [infra02_nova_console_container-c860332e]
skipping: [infra01_nova_api_placement_container-6ea6d942]
skipping: [infra03_nova_api_placement_container-4f7ab4aa]
skipping: [infra02_nova_api_placement_container-2809d4f0]
skipping: [infra03_nova_api_os_compute_container-58c89079]
skipping: [infra01_nova_conductor_container-3394fb7e]
skipping: [infra03_nova_conductor_container-a73f977f]
skipping: [infra02_nova_conductor_container-4452dc00]
skipping: [compute02]
skipping: [compute03]
skipping: [compute01]
skipping: [infra01_nova_scheduler_container-6396ad2f]
skipping: [infra03_nova_scheduler_container-43f9ac0c]
skipping: [infra02_nova_scheduler_container-c6740737]
skipping: [infra01_nova_api_metadata_container-3a6ae25f]
skipping: [infra03_nova_api_metadata_container-26fc5f74]
skipping: [infra02_nova_api_os_compute_container-5e331484]
skipping: [infra02_nova_api_metadata_container-241f74d8]

####
nova endpoint/service never got created
####

(rally) root@melv7301-osic-lab:~# openstack endpoint list
+----------------------------------+-----------+--------------+--------------+---------+-----------+---------------------------------------------+
| ID | Region | Service Name | Service Type | Enabled | Interface | URL |
+----------------------------------+-----------+--------------+--------------+---------+-----------+---------------------------------------------+
| 185eaaad25d54c769d89d489a5032f49 | RegionOne | glance | image | True | admin | http://172.29.176.100:9292 |
| 4f5a30a3ff0d452d8392fbe6cd892e79 | RegionOne | glance | image | True | public | https://172.24.96.249:9292 |
| 560ca37e821c45b3a9097b4569dab254 | RegionOne | cinderv2 | volumev2 | True | public | https://172.24.96.249:8776/v2/%(tenant_id)s |
| 7b2d985d7ef74a01a0144e246e027080 | RegionOne | cinderv2 | volumev2 | True | internal | http://172.29.176.100:8776/v2/%(tenant_id)s |
| 84741cadc2df45b092278b08bb0b09cb | RegionOne | cinderv2 | volumev2 | True | admin | http://172.29.176.100:8776/v2/%(tenant_id)s |
| 9e873b34b9a344c3b79d7033e8d61cf8 | RegionOne | cinder | volume | True | internal | http://172.29.176.100:8776/v1/%(tenant_id)s |
| bc24011f9732428e97e78068f7989391 | RegionOne | cinder | volume | True | admin | http://172.29.176.100:8776/v1/%(tenant_id)s |
| be50be66aa1e406aa8b0fe0fe7b58fb8 | RegionOne | keystone | identity | True | internal | http://172.29.176.100:5000/v3 |
| dbe077dfab504c3ca9b981d40df01aa0 | RegionOne | glance | image | True | internal | http://172.29.176.100:9292 |
| dc85a1c1b85f4ccd96a8e7d79f563aa2 | RegionOne | cinder | volume | True | public | https://172.24.96.249:8776/v1/%(tenant_id)s |
| e4890d4332e94c49b885824122decf42 | RegionOne | keystone | identity | True | admin | http://172.29.176.100:35357/v3 |
| ed86117fcca848c6a3d271cec63ceba9 | RegionOne | keystone | identity | True | public | https://172.24.96.249:5000/v3 |
+----------------------------------+-----------+--------------+--------------+---------+-----------+---------------------------------------------+
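
For reference, the stuck "Ensure nova service" task is what would have registered the missing compute service and endpoints. A rough manual equivalent with the openstack CLI would look like the following; the 8774 port and /v2.1 path are nova defaults and the VIPs are reused from the other endpoints above, not values confirmed for this deployment:

# Illustrative only - the playbook normally handles this; URLs are assumed defaults
openstack service create --name nova --description "Nova Compute Service" compute
openstack endpoint create --region RegionOne compute internal http://172.29.176.100:8774/v2.1
openstack endpoint create --region RegionOne compute admin http://172.29.176.100:8774/v2.1
openstack endpoint create --region RegionOne compute public https://172.24.96.249:8774/v2.1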

####
`ps faux` on the deployment host
####

root 25409 0.0 0.0 55564 32996 ? Ss 06:36 0:13 tmux
root 25410 0.0 0.0 28396 9484 pts/18 Ss 06:36 0:00 \_ -bash
root 32868 0.0 0.0 12688 1596 pts/18 S+ 07:46 0:00 \_ bash /usr/local/bin/openstack-ansible -f 15 -i inventory os-keystone-install.yml os-glance-install.yml os-cinder-install.yml os-nova-install.yml os-neutron-install.yml os-heat-install.yml os-horizon-install.yml os-swift-install.yml
root 32877 10.1 0.4 795292 566236 pts/18 Sl+ 07:46 28:40 \_ /opt/ansible-runtime/bin/python2 /opt/ansible-runtime/bin/ansible-playbook -f 15 -i inventory os-keystone-install.yml os-glance-install.yml os-cinder-install.yml os-nova-install.yml os-neutron-install.yml os-heat-install.yml os-horizon-install.yml os-swift-install.yml -e @/etc/openstack_deploy/user_secrets.yml -e @/etc/openstack_deploy/user_variables.yml
root 10544 0.0 0.4 769692 537808 pts/18 S+ 08:03 0:00 \_ /opt/ansible-runtime/bin/python2 /opt/ansible-runtime/bin/ansible-playbook -f 15 -i inventory os-keystone-install.yml os-glance-install.yml os-cinder-install.yml os-nova-install.yml os-neutron-install.yml os-heat-install.yml os-horizon-install.yml os-swift-install.yml -e @/etc/openstack_deploy/user_secrets.yml -e @/etc/openstack_deploy/user_variables.yml
root 10549 0.0 0.0 44024 2524 pts/18 S+ 08:03 0:00 \_ ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=5 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o ServerAliveInterval=64 -o ServerAliveCountMax=1024 -o Compression=no -o TCPKeepAlive=yes -o VerifyHostKeyDNS=no -o ForwardX11=no -o ForwardAgent=yes -T -o ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r 172.29.176.101 lxc-attach --name infra01_nova_api_os_compute_container-50570c15 -- /bin/sh -c '/usr/bin/python && sleep 0'
root 32905 0.0 0.0 44760 2008 ? Ss 07:47 0:04 ssh: /root/.ansible/cp/ansible-ssh-172.29.176.101-22-root [mux]

####
`ps faux` on the container the stuck task is attached to
####

root@infra01:~# lxc-attach -n infra01_nova_api_os_compute_container-50570c15
root@infra01-nova-api-os-compute-container-50570c15:~# ps faux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 3055 0.0 0.0 21192 3556 ? Ss 06:30 0:00 /bin/bash
root 3065 0.0 0.0 37364 3368 ? R+ 06:30 0:00 \_ ps faux
root 2817 0.0 0.0 4508 708 ? S 02:03 0:00 /bin/sh -c /usr/bin/python && sleep 0
root 2818 0.0 0.1 32412 10188 ? S 02:03 0:00 \_ /usr/bin/python
root 2819 0.0 0.6 103196 53048 ? S 02:03 0:00 \_ /usr/bin/python /tmp/ansible_AYnDx5/ansible_module_keystone.py
root 1 0.0 0.0 37088 4592 ? Ss 01:56 0:00 /sbin/init
root 41 0.0 0.0 43844 6948 ? Ss 01:56 0:00 /lib/systemd/systemd-journald
root 76 0.0 0.0 28980 2796 ? Ss 01:56 0:00 /usr/sbin/cron -f
syslog 78 0.0 0.0 256400 2672 ? Ssl 01:56 0:00 /usr/sbin/rsyslogd -n
root 145 0.0 0.0 16128 2628 ? Ss 01:56 0:00 /sbin/dhclient -1 -v -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases -I -df /var/lib/dhcp/dhclient6.eth0.leases eth0
root 194 0.0 0.0 65520 3916 ? Ss 01:56 0:00 /usr/sbin/sshd -D
root 196 0.0 0.0 15756 2128 pts/2 Ss+ 01:56 0:00 /sbin/agetty --noclear --keep-baud pts/2 115200 38400 9600 vt220
root 197 0.0 0.0 15756 2124 pts/3 Ss+ 01:56 0:00 /sbin/agetty --noclear --keep-baud pts/3 115200 38400 9600 vt220
root 198 0.0 0.0 15756 2172 pts/1 Ss+ 01:56 0:00 /sbin/agetty --noclear --keep-baud pts/1 115200 38400 9600 vt220
root 199 0.0 0.0 15756 2132 pts/0 Ss+ 01:56 0:00 /sbin/agetty --noclear --keep-baud pts/0 115200 38400 9600 vt220
root 200 0.0 0.0 15756 2132 lxc/console Ss+ 01:56 0:00 /sbin/agetty --noclear --keep-baud console 115200 38400 9600 vt220
nova 2808 1.3 1.4 207312 122420 ? Ss 02:03 2:24 /openstack/venvs/nova-master/bin/python /openstack/venvs/nova-master/bin/nova-api-os-compute --log-file=/var/log/nova/nova-api-os-compute.log
nova 2824 0.0 1.5 213796 124860 ? S 02:03 0:08 \_ /openstack/venvs/nova-master/bin/python /openstack/venvs/nova-master/bin/nova-api-os-compute --log-file=/var/log/nova/nova-api-os-compute.log
nova 2825 0.0 1.5 214052 125004 ? S 02:03 0:08 \_ /openstack/venvs/nova-master/bin/python /openstack/venvs/nova-master/bin/nova-api-os-compute --log-file=/var/log/nova/nova-api-os-compute.log
nova 2826 0.0 1.5 214052 125068 ? S 02:03 0:08 \_ /openstack/venvs/nova-master/bin/python /openstack/venvs/nova-master/bin/nova-api-os-compute --log-file=/var/log/nova/nova-api-os-compute.log
nova 2827 0.0 1.5 213796 124860 ? S 02:03 0:08 \_ /openstack/venvs/nova-master/bin/python /openstack/venvs/nova-master/bin/nova-api-os-compute --log-file=/var/log/nova/nova-api-os-compute.log

####
after killing the run and restarting
####

TASK [os_nova : Perform cell_v2 initial cell setup] ****************************
Friday 10 February 2017 13:27:50 +0000 (0:00:04.843) 0:02:28.021 *******
fatal: [infra01_nova_api_os_compute_container-50570c15]: FAILED! => {"changed": false, "cmd": ["/openstack/venvs/nova-master/bin/nova-manage", "cell_v2", "create_cell", "--name", "cell1", "--database_connection", "mysql+pymysql://nova:edccf532412bce784a39ef75a414847f8f21918cdf027acae59f02@172.29.176.100/nova?charset=utf8", "--transport-url", "rabbit://nova:69beab369bc0a0668598a4a17e8@172.29.177.87:5671,nova:69beab369bc0a0668598a4a17e8@172.29.178.194:5671,nova:69beab369bc0a0668598a4a17e8@172.29.177.24:5671//nova"], "delta": "0:00:04.377561", "end": "2017-02-10 07:27:56.173126", "failed": true, "rc": 2, "start": "2017-02-10 07:27:51.795565", "stderr": "", "stdout": "Cell with the specified transport_url and database_connection combination already exists", "stdout_lines": ["Cell with the specified transport_url and database_connection combination already exists"], "warnings": []}

####
resolution
####

I resolved this by going onto the nova_api container mentioned above and running:

root@infra02-nova-api-os-compute-container-5e331484:~# . /openstack/venvs/nova-master/bin/activate
(nova-master) root@infra02-nova-api-os-compute-container-5e331484:~# nova-manage cell_v2 list_cells
+-------+--------------------------------------+
| Name | UUID |
+-------+--------------------------------------+
| cell0 | 00000000-0000-0000-0000-000000000000 |
| cell1 | 4ca40cd9-3b85-49f5-b4d8-e6141327125a |
+-------+--------------------------------------+

(nova-master) root@infra02-nova-api-os-compute-container-5e331484:~# nova-manage cell_v2 delete_cell --cell_uuid 4ca40cd9-3b85-49f5-b4d8-e6141327125a

Then I re-ran os-nova-install.yml (see the re-run command below).
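
For completeness, the re-run after deleting the stale cell was just the nova playbook from the deployment host (assuming the usual /opt/openstack-ansible/playbooks layout and the same inventory used above; adjust paths to your environment):

cd /opt/openstack-ansible/playbooks
openstack-ansible -f 15 -i inventory os-nova-install.yml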

http://cdn.pasteraw.com/pgflbn94xjijmf0vpxb4d9j69nmfead
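
Until the role handles this itself, a more defensive way to survive re-runs would be to guard create_cell with a list_cells check. This is an illustrative sketch only, with the connection strings as placeholders rather than this deployment's real values:

# Illustrative guard only: skip create_cell when cell1 is already registered
/openstack/venvs/nova-master/bin/nova-manage cell_v2 list_cells | grep -q ' cell1 ' || \
  /openstack/venvs/nova-master/bin/nova-manage cell_v2 create_cell --name cell1 \
    --database_connection "<cell database URL>" --transport-url "<rabbit transport URL>"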

Revision history for this message
Logan V (loganv) wrote :

Resolved by I078caf682aa01db6d5a1472946b25159f3473586

Revision history for this message
Andy McCrae (andrew-mccrae) wrote :

@Melvin - if you're still seeing this let us know or re-open/re-raise a new bug, but this should be fixed per Logan's comment.

Changed in openstack-ansible:
status: New → Fix Released