Containers: hypervisor not up after host reboot

Bug #1822631 reported by Peng Peng
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Gerry Kopec

Bug Description

Brief Description
-----------------
After a forced reboot ('reboot -f') of the host carrying the VMs, the VMs are successfully evacuated, but the host does not recover after the reboot: its hypervisor stays down.

Severity
--------
Major

Steps to Reproduce
------------------
As per the Brief Description: force-reboot a compute host that is carrying VMs and wait for it to come back (a command-level sketch is given below).
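A minimal command-level sketch, assuming an admin openrc is sourced and compute-1 is the host carrying the VMs (the hostname is only an example):

  nova hypervisor-list                     # all hypervisors initially 'up'/'enabled'
  ssh compute-1 'sudo reboot -f'           # force-reboot the VM host; VMs evacuate
  # wait for compute-1 to finish rebooting, then re-check
  nova hypervisor-list                     # in this bug, compute-1 stays 'down'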

Expected Behaviour
------------------
Host recovers; its hypervisor returns to the 'up' state.

Actual Behaviour
----------------
Host does not recover; its hypervisor remains 'down' while still reported 'enabled'.

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Multi-node system

Branch/Pull Time/Commit
-----------------------
master as of 20190331T233001Z

Timestamp/Logs
--------------
[2019-04-01 09:50:38,598] 262 DEBUG MainThread ssh.send :: Send 'nova --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne hypervisor-list'
[2019-04-01 09:50:39,735] 387 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+---------------------+-------+---------+
| ID                                   | Hypervisor hostname | State | Status  |
+--------------------------------------+---------------------+-------+---------+
| 91f0ea76-ca61-4d8d-8e62-fea0002c8b27 | compute-2           | up    | enabled |
| fded3aea-63f2-40c8-99b0-cd8c252090de | compute-0           | up    | enabled |
| 2b4989ef-1d7c-41bf-9f74-c942381f4632 | compute-1           | up    | enabled |
+--------------------------------------+---------------------+-------+---------+

[2019-04-01 09:50:48,061] 262 DEBUG MainThread ssh.send :: Send 'sudo reboot -f'

[2019-04-01 10:49:16,316] 262 DEBUG MainThread ssh.send :: Send 'nova --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne hypervisor-list'
[2019-04-01 10:49:17,452] 387 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+---------------------+-------+---------+
| ID                                   | Hypervisor hostname | State | Status  |
+--------------------------------------+---------------------+-------+---------+
| 91f0ea76-ca61-4d8d-8e62-fea0002c8b27 | compute-2           | up    | enabled |
| fded3aea-63f2-40c8-99b0-cd8c252090de | compute-0           | up    | enabled |
| 2b4989ef-1d7c-41bf-9f74-c942381f4632 | compute-1           | down  | enabled |
+--------------------------------------+---------------------+-------+---------+
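The hypervisor state can also be cross-checked against the platform view of the host; a sketch, assuming the StarlingX system CLI on the active controller (exact field names may differ by release):

  source /etc/platform/openrc
  system host-list                         # compute-1 operational/availability state
  system host-show compute-1 | grep -iE 'availability|operational'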

Ghada Khalil (gkhalil) wrote :

Marking as release gating; may be a duplicate of https://bugs.launchpad.net/starlingx/+bug/1822116.
Assigning to Gerry to triage/confirm and, if confirmed, mark it as a duplicate.

Changed in starlingx:
status: New → Triaged
tags: added: stx.2019.05 stx.containers
Changed in starlingx:
importance: Undecided → High
assignee: nobody → Gerry Kopec (gerry-kopec)
Gerry Kopec (gerry-kopec) wrote :

Yes, this is a duplicate of https://bugs.launchpad.net/starlingx/+bug/1822116

Looking at system in question:
nova-compute on compute-1 is stuck, and one of the placement pods is not responsive.

controller-1 placement logs:
2019-04-01 09:55:50.793763 Traceback (most recent call last):
2019-04-01 09:55:50.793802 File "/var/lib/openstack/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 460, in fire_timers
2019-04-01 09:55:50.793995 timer()
2019-04-01 09:55:50.794004 File "/var/lib/openstack/lib/python2.7/site-packages/eventlet/hubs/timer.py", line 59, in __call__
2019-04-01 09:55:50.794051 cb(*args, **kw)
2019-04-01 09:55:50.794060 File "/var/lib/openstack/lib/python2.7/site-packages/eventlet/semaphore.py", line 147, in _do_acquire
2019-04-01 09:55:50.794136 waiter.switch()
2019-04-01 09:55:50.794150 error: cannot switch to a different thread
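A sketch of how the unresponsive placement pod can be located and inspected (pod names come from the stx-openstack application and vary per deployment):

  kubectl -n openstack get pods -o wide | grep placement
  kubectl -n openstack logs <placement-pod-name> --tail=50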

compute-1 nova-compute log shows the deletion of the evacuated VM but no other activity:

2019-04-01 09:55:44,471.471 47950 INFO nova.compute.manager [req-879c43a8-a86f-46fc-a2fa-e2f36d6ca94b - - - - -] [instance: 6cbfe64e-1b50-457e-bee9-544f997b1e16] Deleting instance as it has been evacuated from this host
2019-04-01 09:55:45,663.663 47950 INFO nova.virt.libvirt.driver [-] [instance: 6cbfe64e-1b50-457e-bee9-544f997b1e16] Instance destroyed successfully.
2019-04-01 09:55:50,348.348 47950 INFO nova.virt.libvirt.driver [req-879c43a8-a86f-46fc-a2fa-e2f36d6ca94b - - - - -] [instance: 6cbfe64e-1b50-457e-bee9-544f997b1e16] Deleting instance files /var/lib/nova/instances/6cbfe64e-1b50-457e-bee9-544f997b1e16_del
2019-04-01 09:55:50,350.350 47950 INFO nova.virt.libvirt.driver [req-879c43a8-a86f-46fc-a2fa-e2f36d6ca94b - - - - -] [instance: 6cbfe64e-1b50-457e-bee9-544f997b1e16] Deletion of /var/lib/nova/instances/6cbfe64e-1b50-457e-bee9-544f997b1e16_del complete

Resource tracker has not run on compute-1 since the time of the reboot:

[wrsroot@controller-1 ~(keystone_admin)]$ date; kubectl exec -it -n openstack mariadb-server-0 -- bash -c "mysql --password=\$MYSQL_ROOT_PASSWORD --user=root nova -e 'select host,updated_at,vcpus_used from compute_nodes;'"
Mon Apr 1 18:10:08 UTC 2019
+-----------+---------------------+------------+
| host      | updated_at          | vcpus_used |
+-----------+---------------------+------------+
| compute-2 | 2019-04-01 15:31:02 | 0          |
| compute-0 | 2019-04-01 09:52:40 | 2          |
| compute-1 | 2019-04-01 09:49:56 | 4          |
+-----------+---------------------+------------+
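The same staleness should also be visible without querying mariadb directly; a sketch using the nova CLI with the same credentials as the hypervisor-list commands above:

  nova service-list --binary nova-compute  # 'Updated At' for compute-1 stops advancing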

Ken Young (kenyis)
tags: added: stx.2.0
removed: stx.2019.05
Ghada Khalil (gkhalil) wrote :

Marking as Fix Released to align with the duplicate bug

From Frank:
This issue was addressed by rebasing the docker images to the Stein release, together with this commit, which makes the system application-upload command pull the new docker images: https://review.openstack.org/#/c/650436/
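For reference, a sketch of how an updated application tarball is pulled in on a StarlingX system (the tarball name is only an example; the exact artifact depends on the build):

  system application-upload stx-openstack-1.0-<build>.tgz
  system application-apply stx-openstack
  system application-list                  # wait for status 'applied'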

Changed in starlingx:
status: Triaged → Fix Released