Node field overwritten with wrong value after ironic instance rebuild

Bug #1623473 reported by Tomasz Czekajło
Affects: OpenStack Compute (nova)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Hi,

When I rebuild an ironic instance via nova, the instance's node field is overwritten with a wrong value after the first rebuild, so the next rebuild is not possible.

Steps to reproduce
==================
1. Spawn new ironic instance
2. Rebuild the instance
After this step, hypervisor_hostname for the instance is completely different from before (I use the "nova show uuid" command to display it). When you look up the node for the instance in ironic (ironic node-show --instance uuid), the node UUID differs from the node recorded in nova.

3. Rebuild the instance a second time. It fails with the error below.

http://paste.openstack.org/show/irCzuu5qucX6kF44X6oe/

Environment
===========
Mitaka release and Ubuntu 16

My workaround
=============
After debugging, I think I've found where the bug(?) is.

https://github.com/openstack/nova/blob/stable/mitaka/nova/compute/manager.py#L2795

2795: compute_node = self._get_compute_info(context, self.host)
2796: scheduled_node = compute_node.hypervisor_hostname

[...]

5118:     def _get_compute_info(self, context, host):
5119:         return objects.ComputeNode.get_first_node_by_host_for_old_compat(
5120:             context, host)

OK, let's dig deeper:

https://github.com/openstack/nova/blob/stable/mitaka/nova/objects/compute_node.py#L274

274:     def get_first_node_by_host_for_old_compat(cls, context, host,
275:                                               use_slave=False):
276:         computes = ComputeNodeList.get_all_by_host(context, host, use_slave)
277:         # FIXME(sbauza): Some hypervisors (VMware, Ironic) can return multiple
278:         # nodes per host, we should return all the nodes and modify the callers
279:         # instead.
280:         # Arbitrarily returning the first node.
281:         return computes[0]

It looks like the method returns the first node for the given host. With an ironic hypervisor, a single host has multiple nodes, and the node that comes back first is arbitrary, so it is usually not the node the instance actually runs on.
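To illustrate the mismatch, here is a minimal sketch in plain Python. It is not nova code: ComputeNode and Instance are hypothetical stand-ins (simple namedtuples), get_first_node_by_host is a made-up helper that only mimics the computes[0] behaviour, and the host and node names are invented.

```python
from collections import namedtuple

# Hypothetical stand-ins for nova's ComputeNode and Instance objects.
ComputeNode = namedtuple("ComputeNode", "host hypervisor_hostname hypervisor_type")
Instance = namedtuple("Instance", "uuid host node")


def get_first_node_by_host(compute_nodes, host):
    """Mimic computes[0]: return the first node registered for the host."""
    computes = [cn for cn in compute_nodes if cn.host == host]
    return computes[0]


# One ironic compute host that manages two bare-metal nodes.
nodes = [
    ComputeNode("ironic-host", "node-uuid-a", "ironic"),
    ComputeNode("ironic-host", "node-uuid-b", "ironic"),
]

# The instance was deployed on the second node.
instance = Instance("inst-1", "ironic-host", "node-uuid-b")

# The lookup only knows the host, so it cannot pick the instance's own node.
first = get_first_node_by_host(nodes, instance.host)
mismatch = first.hypervisor_hostname != instance.node
print(first.hypervisor_hostname, instance.node, mismatch)
```

Because the lookup takes only the host, the result is the same regardless of which node the instance is on. That would explain why the node stored for the instance changes after a rebuild.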

My workaround, nothing sophisticated but works for me:

--- manager.py_org 2016-09-14 13:50:37.807379651 +0200
+++ manager.py 2016-09-14 13:51:40.275126034 +0200
@@ -2793,7 +2793,11 @@
         if not scheduled_node:
             try:
                 compute_node = self._get_compute_info(context, self.host)
-                scheduled_node = compute_node.hypervisor_hostname
+                #workaround for ironic
+                if compute_node.hypervisor_type == 'ironic':
+                    scheduled_node = instance.node
+                else:
+                    scheduled_node = compute_node.hypervisor_hostname
             except exception.ComputeHostNotFound:
                 LOG.exception(_LE('Failed to get compute_info for %s'),
                                 self.host)
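
The same decision can be pulled out as a small standalone function, which makes it easy to check outside of nova. This is only a sketch: pick_scheduled_node is a hypothetical name, and the objects passed in are SimpleNamespace stand-ins that carry just the attributes used in the patch above (hypervisor_type, hypervisor_hostname, node).

```python
from types import SimpleNamespace


def pick_scheduled_node(compute_node, instance):
    """Choose the node name to use for a rebuild (sketch of the workaround)."""
    if compute_node.hypervisor_type == "ironic":
        # One ironic host manages many nodes, so trust the node already
        # recorded on the instance rather than the first node of the host.
        return instance.node
    # Other hypervisors in this sketch: the host's node name is used as before.
    return compute_node.hypervisor_hostname


# Stand-in objects with invented values, for illustration only.
ironic_cn = SimpleNamespace(hypervisor_type="ironic",
                            hypervisor_hostname="node-uuid-a")
kvm_cn = SimpleNamespace(hypervisor_type="QEMU",
                         hypervisor_hostname="compute-1")
inst = SimpleNamespace(node="node-uuid-b")

print(pick_scheduled_node(ironic_cn, inst))
print(pick_scheduled_node(kvm_cn, inst))
```

For ironic the function keeps the instance's existing node. For anything else it falls back to the old behaviour.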

I've tested this on the Mitaka release, but the code seems to be the same in the master branch.

That's all.
Regards

Tags: ironic rebuild
Tomasz Czekajło (coldgunpl) wrote :

It seems this bug is the same as the issue described in https://bugs.launchpad.net/nova/+bug/1564921
