Comment 11 for bug 1880509

alpha23 (alpha23) wrote :

The issue may be related to the nova database. Curiously, the compute_nodes table produces:

+---------------+---------------------+--------------------------------------+---------------------+---------------------+
| host          | created_at          | uuid                                 | hypervisor_hostname | deleted_at          |
+---------------+---------------------+--------------------------------------+---------------------+---------------------+
| st1           | 2018-06-09 21:04:56 | 9c7056b8-21f4-49d2-833b-b446c9315974 | st1                 | 2018-10-13 16:48:47 |
| st1           | 2018-10-13 16:48:19 | 14fc8f68-92bd-48dd-8826-915d28fb4822 | st1.xyz.local       | 2020-05-26 21:04:00 |
| st1.xyz.local | 2020-05-26 04:36:36 | a3786abe-0389-46d5-ac08-2fce604548e2 | st1.xyz.local       | NULL                |
| st1           | 2020-05-26 21:04:23 | f840a1c8-432a-41f5-a7ba-e5e7d1d78c29 | st1                 | NULL                |
+---------------+---------------------+--------------------------------------+---------------------+---------------------+

ALL of the above refer to the SAME compute node (note the creation and deletion dates).

nova hypervisor list produces:

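+--------------------------------------+---------------------+-------+----------+
| ID                                   | Hypervisor hostname | State | Status   |
+--------------------------------------+---------------------+-------+----------+
| a3786abe-0389-46d5-ac08-2fce604548e2 | st1.xyz.local       | down  | disabled |
| f840a1c8-432a-41f5-a7ba-e5e7d1d78c29 | st1                 | up    | enabled  |
+--------------------------------------+---------------------+-------+----------+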

Both refer to the same compute node, and the latter was enabled after changing the /etc/hosts file on the docker host, which occurred on 5/26 @ 21:04. Both the hostname and the hypervisor hostname should be 'st1' now.

However, the nova instances table contains the following values, for example (I do not see a hypervisor uuid column in this table):

select host,hostname,node from instances limit 1;
+------+-----------------------------------------------+------+
| host | hostname                                      | node |
+------+-----------------------------------------------+------+
| st1  | zeta-13-cluster-qypnojhoxz5q-primary-master-0 | st1  |
+------+-----------------------------------------------+------+
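
To make the comparison concrete, here is a rough sketch (not nova's own code) that loads the four compute_nodes rows shown above into an in-memory SQLite table and checks which non-deleted record pairs with a given instance host/node. It assumes that a NULL deleted_at marks a live record and that an instance is tied to a compute node by the (host, hypervisor_hostname) pair; both are my assumptions, and the function name live_matches is made up for illustration.

```python
import sqlite3

# Rows copied from the compute_nodes output above:
# (host, created_at, uuid, hypervisor_hostname, deleted_at)
COMPUTE_NODES = [
    ("st1", "2018-06-09 21:04:56", "9c7056b8-21f4-49d2-833b-b446c9315974",
     "st1", "2018-10-13 16:48:47"),
    ("st1", "2018-10-13 16:48:19", "14fc8f68-92bd-48dd-8826-915d28fb4822",
     "st1.xyz.local", "2020-05-26 21:04:00"),
    ("st1.xyz.local", "2020-05-26 04:36:36", "a3786abe-0389-46d5-ac08-2fce604548e2",
     "st1.xyz.local", None),
    ("st1", "2020-05-26 21:04:23", "f840a1c8-432a-41f5-a7ba-e5e7d1d78c29",
     "st1", None),
]


def live_matches(host, node):
    """Return uuids of non-deleted rows whose host and hypervisor_hostname match.

    Hypothetical helper: the matching rule is an assumption, not nova's logic.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE compute_nodes ("
        "host TEXT, created_at TEXT, uuid TEXT, "
        "hypervisor_hostname TEXT, deleted_at TEXT)"
    )
    conn.executemany("INSERT INTO compute_nodes VALUES (?, ?, ?, ?, ?)", COMPUTE_NODES)
    rows = conn.execute(
        "SELECT uuid FROM compute_nodes "
        "WHERE deleted_at IS NULL AND host = ? AND hypervisor_hostname = ? "
        "ORDER BY created_at",
        (host, node),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    # host/node values taken from the instances row above
    print(live_matches("st1", "st1"))
    # the FQDN record that still shows up as down/disabled
    print(live_matches("st1.xyz.local", "st1.xyz.local"))
```

Under those assumptions, the instance's (st1, st1) pair matches only f840a1c8-432a-41f5-a7ba-e5e7d1d78c29, the record reported as up/enabled, while a3786abe-0389-46d5-ac08-2fce604548e2 matches only the FQDN pair.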

Is nova-compute attempting to attach a volume to 'st1' but selecting the incorrect host/uuid, which is causing it to show HypervisorUnavailable? And/or, if the hypervisor_hostname was/is an FQDN, did the migration from Queens to Rocky break functionality because of the mismatch between the host and the FQDN (see the 2nd entry, which was the compute node at the time this issue was reported)?

Regardless, the 3rd compute node entry (st1.xyz.local, still not deleted) needs to be deleted. How can this be done without affecting the instances?