Activity log for bug #1623473

Date Who What changed Old value New value Message
2016-09-14 12:15:50 Tomasz Czekajło bug added bug
2016-09-14 12:21:57 Tomasz Czekajło description

Hi,

When I rebuild an ironic instance via nova, the instance's node is overwritten with a wrong value after the first rebuild, so the next rebuild is not possible.

Steps to reproduce
==================
1. Spawn a new ironic instance.
2. Rebuild the instance.
   After this step, hypervisor_hostname for the instance is completely different than before (I use the "nova show uuid" command to display the information). When you display information for the instance in ironic (ironic node-show --instance uuid), the UUID of the node is different from the node in nova.
3. Rebuild the instance a second time; it fails with the error below.
   http://paste.openstack.org/show/irCzuu5qucX6kF44X6oe/

Environment
===========
Mitaka release and Ubuntu 16

My workaround
=============
After debugging I've found where the bug(?) is.

https://github.com/openstack/nova/blob/stable/mitaka/nova/compute/manager.py#L2795

2795:                 compute_node = self._get_compute_info(context, self.host)
2796:                 scheduled_node = compute_node.hypervisor_hostname
[...]
5118:     def _get_compute_info(self, context, host):
5119:         return objects.ComputeNode.get_first_node_by_host_for_old_compat(
5120:             context, host)

OK, let's dive deeper:

https://github.com/openstack/nova/blob/stable/mitaka/nova/objects/compute_node.py#L274

274:     def get_first_node_by_host_for_old_compat(cls, context, host,
275:                                               use_slave=False):
276:         computes = ComputeNodeList.get_all_by_host(context, host, use_slave)
277:         # FIXME(sbauza): Some hypervisors (VMware, Ironic) can return multiple
278:         # nodes per host, we should return all the nodes and modify the callers
279:         # instead.
280:         # Arbitrarily returning the first node.
281:         return computes[0]

It looks like the method returns the first node for the given host. With the ironic hypervisor there are multiple nodes per host, and which node comes back first is effectively random, so scheduled_node can end up pointing at a node other than the one the instance is on.

My workaround, nothing sophisticated but it works for me:

--- manager.py_org 2016-09-14 13:50:37.807379651 +0200
+++ manager.py 2016-09-14 13:51:40.275126034 +0200
@@ -2793,7 +2793,11 @@
         if not scheduled_node:
             try:
                 compute_node = self._get_compute_info(context, self.host)
-                scheduled_node = compute_node.hypervisor_hostname
+                #workaround for ironic
+                if compute_node.hypervisor_type == 'ironic':
+                    scheduled_node = instance.node
+                else:
+                    scheduled_node = compute_node.hypervisor_hostname
             except exception.ComputeHostNotFound:
                 LOG.exception(_LE('Failed to get compute_info for %s'),
                               self.host)

I've tested this on the Mitaka release, but the code seems to be the same in the master branch.

That's all.
Regards
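The behaviour described in the report can be illustrated outside nova with a minimal sketch. Everything here is an assumption made for illustration: `ComputeNode` and `Instance` are plain namedtuples standing in for the nova objects, and `get_first_node_by_host` and `pick_scheduled_node` are hypothetical helpers that only mimic the lookup and the proposed workaround. It is not the actual nova code path.

```python
from collections import namedtuple

# Hypothetical stand-ins for the nova objects; not the real classes.
ComputeNode = namedtuple(
    "ComputeNode", ["host", "hypervisor_type", "hypervisor_hostname"])
Instance = namedtuple("Instance", ["uuid", "host", "node"])


def get_first_node_by_host(compute_nodes, host):
    """Mimic the 'arbitrarily return the first node' lookup."""
    return [cn for cn in compute_nodes if cn.host == host][0]


def pick_scheduled_node(instance, compute_nodes, ironic_workaround):
    """Choose the node a rebuild targets, with or without the workaround."""
    compute_node = get_first_node_by_host(compute_nodes, instance.host)
    if ironic_workaround and compute_node.hypervisor_type == "ironic":
        # Keep the node the instance is already deployed on.
        return instance.node
    return compute_node.hypervisor_hostname


# One ironic compute host exposing two bare-metal nodes.
nodes = [
    ComputeNode("ironic-host", "ironic", "node-aaa"),
    ComputeNode("ironic-host", "ironic", "node-bbb"),
]
instance = Instance("inst-1", "ironic-host", "node-bbb")

# Without the workaround the first node of the host wins.
print(pick_scheduled_node(instance, nodes, ironic_workaround=False))
# With the workaround the instance keeps its own node.
print(pick_scheduled_node(instance, nodes, ironic_workaround=True))
```

In this toy setup the first call returns a node the instance is not on, which matches the mismatch seen after the first rebuild; the second call returns the instance's own node.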
2016-09-14 13:42:44 Tomasz Czekajło marked as duplicate 1564921