OpenStack Compute (nova)

Activity log for bug #1623473

Date	Who	What changed	Old value	New value	Message
2016-09-14 12:15:50	Tomasz Czekajło	bug			added bug
2016-09-14 12:21:57	Tomasz Czekajło	description

Old value:

Hi,

When I rebuild an ironic instance via nova, after the first rebuild the node recorded for the instance is overwritten with a wrong value, so the next rebuild is not possible.

Steps to reproduce
==================

1. Spawn a new ironic instance.
2. Rebuild the instance.
   After this step the hypervisor_hostname for the instance is completely different than before (I use the "nova show uuid" command to display it). When you display the node for the instance in ironic (ironic node-show --instance uuid), the UUID of the node differs from the node in nova.
3. Rebuild a second time; the error below appears.
   http://paste.openstack.org/show/irCzuu5qucX6kF44X6oe/

Environment
===========

Mitaka release and Ubuntu 16

My workaround
=============

After debugging I found where the bug(?) is.

https://github.com/openstack/nova/blob/stable/mitaka/nova/compute/manager.py#L2795

    2795:    compute_node = self._get_compute_info(context, self.host)
    2796:    scheduled_node = compute_node.hypervisor_hostname
    [...]
    5118:    def _get_compute_info(self, context, host):
    5119:        return objects.ComputeNode.get_first_node_by_host_for_old_compat(
    5120:            context, host)

OK, let's dive deeper:

https://github.com/openstack/nova/blob/stable/mitaka/nova/objects/compute_node.py#L274

    274:    def get_first_node_by_host_for_old_compat(cls, context, host,
    275:                                              use_slave=False):
    276:        computes = ComputeNodeList.get_all_by_host(context, host, use_slave)
    277:        # FIXME(sbauza): Some hypervisors (VMware, Ironic) can return multiple
    278:        # nodes per host, we should return all the nodes and modify the callers
    279:        # instead.
    280:        # Arbitrarily returning the first node.
    281:        return computes[0]

It looks like the method returns the first node for the given host. With the ironic hypervisor there are multiple nodes per host, and which node is returned first is arbitrary.

My workaround, nothing sophisticated but it works for me:

    --- manager.py_org	2016-09-14 13:50:37.807379651 +0200
    +++ manager.py	2016-09-14 13:51:40.275126034 +0200
    @@ -2793,7 +2793,11 @@
             if not scheduled_node:
                 try:
                     compute_node = self._get_compute_info(context, self.host)
    -                scheduled_node = compute_node.hypervisor_hostname
    +                #workaround for ironic
    +                if compute_node.hypervisor_type == 'ironic':
    +                    scheduled_node = instance.node
    +                else:
    +                    scheduled_node = compute_node.hypervisor_hostname
                 except exception.ComputeHostNotFound:
                     LOG.exception(_LE('Failed to get compute_info for %s'),
                                   self.host)

I've tested this on the Mitaka release, but the code seems to be the same in the master branch.

That's all.

Regards

New value:

Text identical to the old value above.
2016-09-14 13:42:44	Tomasz Czekajło	marked as duplicate		1564921
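
To illustrate the failure described in the bug description, here is a minimal, self-contained sketch. It is not nova code: the ComputeNode and Instance classes and the helper functions are hypothetical stand-ins, assuming only what the report states (a host can own several ironic nodes, the lookup returns the first one, and the workaround falls back to instance.node).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ComputeNode:
    """Hypothetical stand-in for a compute node record."""
    host: str
    hypervisor_hostname: str
    hypervisor_type: str


@dataclass
class Instance:
    """Hypothetical stand-in for an instance record."""
    host: str
    node: str


def first_node_for_host(nodes: List[ComputeNode], host: str) -> ComputeNode:
    # Mirrors the "return computes[0]" behaviour quoted in the report:
    # with several nodes per host, the first match is an arbitrary pick.
    matches = [n for n in nodes if n.host == host]
    return matches[0]


def scheduled_node_original(nodes: List[ComputeNode], inst: Instance) -> str:
    # Behaviour before the workaround: always trust the first node.
    return first_node_for_host(nodes, inst.host).hypervisor_hostname


def scheduled_node_workaround(nodes: List[ComputeNode], inst: Instance) -> str:
    # Behaviour with the workaround: for ironic, keep the node the
    # instance already runs on instead of the first node of the host.
    compute_node = first_node_for_host(nodes, inst.host)
    if compute_node.hypervisor_type == 'ironic':
        return inst.node
    return compute_node.hypervisor_hostname


nodes = [
    ComputeNode('ironic-host', 'node-a', 'ironic'),
    ComputeNode('ironic-host', 'node-b', 'ironic'),
    ComputeNode('kvm-host', 'kvm-host', 'QEMU'),
]
ironic_instance = Instance(host='ironic-host', node='node-b')
kvm_instance = Instance(host='kvm-host', node='kvm-host')

print(scheduled_node_original(nodes, ironic_instance))
print(scheduled_node_workaround(nodes, ironic_instance))
```

In this sketch the original lookup reports node-a for an instance that lives on node-b, which matches the symptom in step 2 of the report, while the workaround keeps node-b. Hosts with a single node are unaffected by either variant.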