Ironic hypervisor disappears once hashring got rebuilt
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | In Progress | Undecided | Nikolay Fedotov |
Bug Description
Steps to reproduce
==================
Precondition: Need fresh openstack deployment. Database tables nova.compute_nodes and nova_api.
It must be an HA deployment: at least two ironic-conductors need to be running on different servers.
Steps:
1. Create a baremetal node: "openstack baremetal node create ..."
2. Change the node's state to manageable
3. After some time, "nova hypervisor-list" should list a hypervisor with the same UUID as the baremetal node.
3.1 The database should look like the following:
MariaDB [(none)]> select uuid, host, mapped from nova.compute_nodes;
+------
| uuid | host | mapped |
+------
| d394aa91-
+------
MariaDB [(none)]> select * from nova_api.
+------
| created_at | updated_at | id | cell_id | host |
+------
| 2019-04-22 09:14:23 | NULL | 22 | 7 | ironic.aio1 |
+------
4. Call "nova hypervisor-show <hypervisor UUID>" to find out which server the ironic-conductor is running on. Log into that server and stop ironic-conductor; this forces the hashring to rebuild its state. Wait for about five minutes.
5. Check output of "nova hypervisor-list". The hypervisor is absent.
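Why stopping one conductor moves the node can be illustrated with a toy consistent hash ring. This is only a sketch of the general technique, not the ring implementation ironic or nova actually use; the class name ToyHashRing, the replica count and the node UUID are made up for illustration. Only the host names ironic.aio1 and ironic.aio3 come from this report.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Map an arbitrary string to a large integer position on the ring.
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class ToyHashRing:
    """Hypothetical minimal consistent hash ring (illustration only)."""

    def __init__(self, hosts, replicas=64):
        # Each host is placed on the ring several times to spread the load.
        self._ring = sorted(
            (_hash(f"{host}-{i}"), host)
            for host in hosts
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def get_host(self, node_uuid: str) -> str:
        # The owner is the first ring point after the node's hash, wrapping around.
        idx = bisect.bisect(self._points, _hash(node_uuid)) % len(self._ring)
        return self._ring[idx][1]


hosts = ["ironic.aio1", "ironic.aio3"]
node_uuid = "hypothetical-node-uuid"  # placeholder, not the UUID from the report

owner_before = ToyHashRing(hosts).get_host(node_uuid)

# Stopping the conductor on the owning server removes it from the ring,
# so the rebuilt ring has to hand the node to one of the survivors.
survivors = [h for h in hosts if h != owner_before]
owner_after = ToyHashRing(survivors).get_host(node_uuid)

print(owner_before, "->", owner_after)
```

Whatever host owned the node first, the rebuilt ring assigns it to a different one, which is the precondition for the 'host' change described under Result.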
Result
==================
Look inside the database (see below). ironic.aio3 took over the baremetal node, so nova changed the 'host' field of the compute node (d394aa91-
Because of mapped = 1, 'nova-manage cell_v2 discover_hosts' (run periodically https:/
MariaDB [(none)]> select uuid, host, mapped from nova.compute_nodes;
+------
| uuid | host | mapped |
+------
| d394aa91-
+------
MariaDB [(none)]> select * from nova_api.
+------
| created_at | updated_at | id | cell_id | host |
+------
| 2019-04-22 09:14:23 | NULL | 22 | 7 | ironic.aio1 |
+------
2019-04-22 19:54:00.813 8 WARNING nova.compute.
2019-04-22 19:54:00.831 8 INFO nova.compute.
2019-04-22 19:54:00.891 8 DEBUG nova.virt.
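The mismatch between the two tables can be checked mechanically. The sketch below rebuilds a cut-down copy of both tables in an in-memory SQLite database and lists compute nodes whose host has no row in host_mappings. It is an illustration under assumptions: the real tables live in two separate MariaDB databases (nova and nova_api) and have more columns, the UUID is a placeholder, and the host ironic.aio3 is taken from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified stand-ins for nova.compute_nodes and the nova_api host_mappings table.
    CREATE TABLE compute_nodes (uuid TEXT, host TEXT, mapped INTEGER);
    CREATE TABLE host_mappings (id INTEGER, cell_id INTEGER, host TEXT);

    -- State after the hashring rebuild: the node moved to ironic.aio3,
    -- but the only host mapping still points at ironic.aio1.
    INSERT INTO compute_nodes VALUES ('hypothetical-node-uuid', 'ironic.aio3', 1);
    INSERT INTO host_mappings VALUES (22, 7, 'ironic.aio1');
    """
)

# Compute nodes whose current host has no host mapping at all.
orphans = conn.execute(
    """
    SELECT cn.uuid, cn.host, cn.mapped
    FROM compute_nodes AS cn
    LEFT JOIN host_mappings AS hm ON hm.host = cn.host
    WHERE hm.host IS NULL
    """
).fetchall()

print(orphans)  # [('hypothetical-node-uuid', 'ironic.aio3', 1)]
```

A non-empty result with mapped = 1 is the state this bug leaves behind.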
The missing record in the host_mappings table causes nova to print the "Unable to find service" DEBUG message (see below). The compute node becomes 'invisible'.
See source code nova/api/
def _get_hypervisor
        links=False):
    """Get hypervisors for the given request.

    :param req: nova.api.
...
    hypervisors_list = []
    for hyp in compute_nodes:
        try:
            instances = None
            if with_servers:
                instances = self.host_
                    context, hyp.host)
            service = self.host_
                context, hyp.host)
            hypervisors_
                self._view_
                    hyp, service, detail, req, servers=instances))
        except (exception.
                exception.
            # The compute service could be deleted which doesn't delete
            # the compute node record, that has to be manually removed
            # from the database so we just ignore it when listing nodes.
            LOG.debug('Unable to find service for compute node %s. The '
                      'service may be deleted and compute nodes need to '
                      'be manually cleaned up.', hyp.host)
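The whole sequence described in this report can be reproduced with a small toy model. This is a sketch under assumptions, not nova's code: ComputeNode, MappingNotFound, discover_hosts and list_hypervisors here are simplified, hypothetical stand-ins that only mirror the behaviour described above (discovery skips nodes that already have mapped = 1, and the listing silently drops nodes whose host lookup fails).

```python
from dataclasses import dataclass


@dataclass
class ComputeNode:
    uuid: str
    host: str
    mapped: int = 0


class MappingNotFound(Exception):
    """Hypothetical stand-in for the lookup failure swallowed by the listing code."""


def discover_hosts(compute_nodes, host_mappings):
    # Toy version of host discovery: only nodes not yet marked as mapped are processed.
    for node in compute_nodes:
        if node.mapped:
            continue
        host_mappings.add(node.host)
        node.mapped = 1


def list_hypervisors(compute_nodes, host_mappings):
    # Toy version of the listing loop: a failed lookup hides the node.
    visible = []
    for node in compute_nodes:
        try:
            if node.host not in host_mappings:
                raise MappingNotFound(node.host)
            visible.append(node.uuid)
        except MappingNotFound:
            continue  # the real code only logs a DEBUG message here
    return visible


node = ComputeNode(uuid="hypothetical-node-uuid", host="ironic.aio1")
nodes, mappings = [node], set()

discover_hosts(nodes, mappings)
before = list_hypervisors(nodes, mappings)

# Hashring rebuild: the node is now served by another host, mapped stays 1.
node.host = "ironic.aio3"
discover_hosts(nodes, mappings)  # skips the node, so no mapping for ironic.aio3
after = list_hypervisors(nodes, mappings)

print(before, after)  # ['hypothetical-node-uuid'] []
```

In this model the hypervisor reappears as soon as either mapped is reset to 0 before discovery runs or a mapping for the new host is created; which of these the proposed fix does is not stated in this report.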
Fix proposed to branch: master
Review: https://review.opendev.org/654584